Examine interrupt latency

There are several different types of interrupt latency that you can measure in a system:

The System Profiler, as a type of software logic analyzer, helps you look at the timing of activities once the interrupt has been acknowledged by the operating system. In order to accurately measure the time between the signal generation and the acknowledgment of it, you need additional hardware tools.

Requirements

To measure interrupt service time (the time taken for the operating system to acknowledge the interrupt, handle it, and return to normal processing), you must log the QNX Neutrino Interrupt Entry/Exit events.

If you're interested in the time from the operating system's acknowledgment to a service handling routine, then you also need to capture the Interrupt Handler Entry/Exit events in the log file.

To properly gauge the latency in triggering a response in user code, you should also log the QNX Neutrino thread READY and RUNNING states, in addition to the communication PULSE events, since these are often used to trigger a user application's behavior in response to an interrupt.

Procedure

Interrupt activity is best viewed in the System Profiler editor using the Timeline editor pane. Open the target log file in the System Profiler editor. Switch to the Timeline editor pane via the menu item System Profiler > Display > Switch Pane > Timeline.

You should see a display that resembles the following. The details will of course be different, but the layout similar:

This display shows the various event owners/sources (interrupts, interrupt handlers, processes and threads) as a tree with their associated events arranged horizontally as a timeline.

If you've logged Interrupt Handler Entry/Exit events, then you should be able to expand the interrupt entries to show the various handlers (more than one handler can be attached to service an interrupt source), such as the following:

Here you can see that the io-pkt process has attached to Interrupt 0x8c and that procnto has attached to Interrupt 0x800000000, which on this system is the timer interrupt firing once every millisecond or so.

You can determine how many interrupt events are occurring in this log file by using the General Statistics view. This view is part of the default System Profiler perspective, and you can also access it via Window > Show View > Other… > General Statistics.

If you use the refresh button, this view extracts the event statistics for the entire log file (default), or for only a selected area if specified. This results in the following display:

This table provides a breakdown for all of the event sources, showing the number of raw events and also the maximum, minimum, average, and total duration of the various QNX Neutrino thread states in this log file.

If you're interested in only the events associated with the timer interrupt (Interrupt 0x80000000), you can select that event owner in the Timeline editor pane:

Next, uncheck the Show statistics for all elements check box at the bottom of the General Statistics view:

The General Statistics view tables will show the content limited to just the selected event owners.

Using this technique, you can get an estimate of the rough order of magnitude of how many events you're looking at in a log file, and in the case of interrupts, you can see some of the statistics about what the maximum, minimum, average, and total times spent were.

This display also lets you drill down further into the results, by allowing navigation in the Timeline editor pane directly to the maximum and minimum times, where you can look at the exact timing sequences. To do this, select one of the entries in the States table, and then right-click or use the toolbar to jump to the appropriate selection.

In order to look at the timing sequence of an interrupt, you usually have to zoom in on the timeline a significant amount to achieve an adequate level of visual detail, since interrupt processing is typically fast compared to the length of the log files. If you zoom into an area where a networking interrupt is being processed, the Timeline editor pane will change to look something like:

At this level of granularity, it also helps to see the trace event log concurrently with the Timeline editor pane. This is part of the standard System Profiler perspective, and you can access it using Window > Show View > Other… > Trace Event Log. The Trace Event Log and the Timeline editor pane are synchronized; when you change your cursor in the editor, the selection in the Trace Event Log view also changes.

The selection synchronization is shown here. In the Trace Event Log view, we've selected the Interrupt 0x8c Entry event through to the Interrupt 0x8c Exit event. This represents the start to end of the processing of the interrupt event. In the timeline display, this selection is made and the timing measurement of 11.304 microseconds is displayed:

So the total interrupt handling time from start to end of the operating system interrupt service routine, including the event handler was 11.304 microseconds. If you want to just look at the handling time for interrupt handler attached by the io-pkt process, you can see that this time is only 8 microseconds. These times represent the earliest and latest points in time that can be measured before entering/exiting control of the software.

You can also see in this example that the io-pkt interrupt handler is returning a pulse that's triggering something in the user's application (event 13515) and that an io-pkt thread is then scheduled to service that request. You can also measure this latency to determine how long it takes to go from operating system awareness of the interrupt to eventual application processing, using the same selection technique:

There are many different choices in terms of what time ranges are of interest to measure. Here we've decided to measure from the time that the operating system is aware of the interrupt (event 13511) through to the point at which the user process has started to respond to the signal generated by the io-pkt interrupt handler. Since the interrupt handler communicates using a pulse (event 13515), then the earliest that the user code can respond is when the MsgReceive kernel call exits (event 13519) with the received pulse. In this case, we can see that the end-to-end latency from OS awareness to the start of user processing (nonprivileged) is 46.304 microseconds:

Alternate measurements that could be of interest and that you can easily examine include: