Detecting priority inversion

If you notice an application missing its deadlines, it may be experiencing priority inversion. The System Profiler lets you examine thread state change events that happened in rapid succession, so you can recognize whether one thread prevented a higher-priority thread from running, even for only a short time.

Note: The following section uses the “System Profiling” IDE example. This sample QNX project is accessible through File > New > Example > QNX Examples > IDE Examples.

For an explanation of priority inversion, see the Priority inheritance section in the Getting Started with QNX Neutrino guide. That topic describes a client-server interaction in which priority inversion happens twice. That example emphasizes the need for priority inheritance, which entails temporarily setting a thread's priority to the highest priority of any other thread that's blocked (even indirectly) by that first thread. Here, we present a scenario in which a lack of priority inheritance causes a medium-priority thread to be run instead of a high-priority thread because that second thread can't receive a reply from a low-priority thread.

Note: Whenever possible, you should use mutexes instead of semaphores in your code. Mutexes propagate priority levels (and hence, support priority inheritance), so priority inversion can't happen when programs use mutexes around critical sections. The program used to generate the results shown here uses semaphores, which don't propagate priorities. If you inherit code from a third party or can't reasonably avoid semaphores in your code, you have to follow the process outlined here to detect priority inversion.

Determining which threads ran at certain times

To detect priority inversion, you must find which higher-priority threads were prevented from running and which lower-priority threads got to run instead. In the Timeline pane, you can examine the thread states at the time your application first became slow or unresponsive.

Initially, the pane displays timelines for all event owners (sources), which include all threads in all processes. This information can be hard to interpret given the large number of owners in most systems.

To read the timelines effectively, you must understand your system well enough to know which threads were likely involved in priority inversion. You can specify exactly which timelines to show by using the Filters view, which is displayed in the bottom left corner. In the Owners tab, you can click the checkboxes next to process or thread labels to show or hide the corresponding timelines; more details are given in the Filters reference.

Given that thread scheduling happens very quickly, you must also know fairly precisely when the non-optimal scheduling occurred and zoom in considerably, likely to within a few milliseconds. To change the zoom level, you can use the zoom in (Icon: Zoom in button) and zoom out (Icon: Zoom out button) buttons in the upper left corner of the IDE. You can also select a region of the graph by clicking and dragging the mouse, then right-click and choose Zoom In. Above the graph, the yellow band in the timeline scale and the start and end times are updated to reflect the time period now displayed.

The Timeline screenshot shown below contains the timelines of all cpu_burner, hw_server, and high_prio_client threads. Here, the user clicked a Running event in the cpu_burner Thread 1 timeline and hovered the pointer over the yellow arrow to show the event details:

Screenshot of Timeline pane showing Running event highlighted in cpu_burner Thread 1, with hw_server Thread 4 in Semaphore Blocked state, Thread 5 in Ready state, and high_prio_client Thread 1 in Reply Blocked state

Based on the thread state coloring used by the System Profiler, we can see that hw_server Thread 5 was Ready, hw_server Thread 4 was Semaphore Blocked, and high_prio_client Thread 1 was Reply Blocked while cpu_burner Thread 1 was Running. For readability, you can show the state names by selecting State Labels in the Toggle Labels dropdown (Icon: Toggle Labels dropdown) in the upper left toolbar. Information about the default mapping of colors to states and how to change that mapping is found in the Timeline reference.

From our application design, we know that the server threads each run at level 10 and the client thread runs at level 40. This explains why QNX Neutrino scheduled the cpu_burner thread, which runs at level 15, instead of hw_server Thread 5. But that particular server thread was blocking two other threads: after it later ran briefly and generated events, hw_server Thread 4 and high_prio_client Thread 1 were able to run, as shown in the next screenshot.

Screenshot of Timeline pane showing that hw_server Thread 5 generated events just after cpu_burner Thread 1 finished running and just before hw_server Thread 4 resumed running

Here, the user clicked a SyncSemPost Enter event from hw_server Thread 5 and hovered the pointer to show the event details. This semaphore post (release) apparently unblocked Thread 4, as seen by the color change in that thread's timeline. Thread 4 then apparently replied to high_prio_client Thread 1, as seen by the color change in the client thread's timeline.

With priority inheritance, the client's higher priority of 40 would have been propagated to Thread 4 and Thread 5. Thread 5 would then have run before the cpu_burner thread and released the semaphore, allowing the other two threads to complete their work. But because semaphores were used instead of mutexes, no priority inheritance happened and thus, priority inversion occurred.

Finding events that caused priority inversion

The Timeline pane works well if you have a good idea of exactly when priority inversion occurred and which threads were involved. But given the huge number of events in most traces, it can be more practical to search for the events emitted when an application misses its deadlines. When this happens, many applications write messages to a log file or output stream. Our program uses printf(), which calls write(), which in turn calls MsgSendV(). We can then search for the event that's emitted when the kernel enters this last function and that's owned by the high-priority thread we think was improperly delayed.

Using the Trace Search tab of the Search dialog (which is accessed through Search > Search or Ctrl+H), we create a search condition for MsgSendV Enter events for high_prio_client Thread 1. Instructions on creating and configuring search conditions are given in the System Profiler reference.

When a search is run, the IDE displays the results in the Search view:

Screenshot of Search view listing MsgSendV Enter events for high_prio_client Thread 1

You can double-click a row in the results to navigate to that event in the timeline display. In addition to highlighting the event with a yellow arrow and dotted vertical line, the Timeline pane draws red circles around it and other events of the same type that occurred within the displayed time period.

The toolbar in the upper left corner of the IDE provides buttons for navigating between events in the timeline. You can jump to the next or previous events within the selection (i.e., the area highlighted in grey), within the same owner as the current event, or anywhere in the trace, regardless of the selection or owner. Note that the editor display must be in focus for these buttons to be active.

To see the full list of events that occurred, access the Trace Event Log view. If the event highlighted in the timeline display isn't selected in this other view, click the Enable / disable event filtering button (Icon: Enable / disable event filtering button) in the upper right view controls. This filters the event log list so you see only those owners (i.e., processes and threads) currently displayed in the Timeline pane.

Screenshot of Trace Event Log view showing a Reply event from hw_server Thread 4, followed by a Ready event from high_prio_client Thread 1, then a series of other events ending with a MsgSendV Enter event from high_prio_client Thread 1

In our case, we want to know why high_prio_client Thread 1 didn't run sooner. To find out, we click the Previous Event By Owner button (Icon: Previous Event By Owner button) to navigate to this thread's last Ready event. If we then click the Previous Event button (Icon: Previous Event button) to navigate to the immediately preceding event, we see that hw_server Thread 4 replied to the client thread. This is evident from the matching TID and PID values given in the Reply and Ready events.

The next question is: why didn't the server reply sooner? To answer this, we must navigate backwards through many events:

Screenshot of Trace Event Log view showing how cpu_burner Thread 1 being scheduled to run delays hw_server Thread 5 from releasing a semaphore needed by hw_server Thread 4, which delays this server thread from replying to high_prio_client Thread 1
The latest screenshot reveals the following event sequence, read backwards in time:
  1. hw_server Thread 4 was waiting on a semaphore, as indicated by its SyncSemWait Exit event.
  2. hw_server Thread 4 couldn't continue its work until hw_server Thread 5 released the same semaphore, as shown by the matching semaphore addresses in the SyncSemWait and SyncSemPost Enter events of these server threads.
  3. hw_server Thread 5 couldn't run until cpu_burner Thread 1 finished running, as shown by the different priority levels listed in their separate Running events.
  4. hw_server Thread 4 was waiting on the semaphore before cpu_burner Thread 1 was scheduled to run, as indicated by the SyncSemWait Enter and Running events for these threads.
  5. high_prio_client Thread 1 was waiting for a message reply from hw_server Thread 4, as shown by the matching PIDs in the Send Message and Receive Message events for these threads.

Viewing the CPU usage of threads of different priorities

You can confirm the shift in CPU assignment from the high-priority thread to the medium-priority thread by switching to the CPU Usage pane. To do this, select CPU Usage in the Switch Pane dropdown (Icon: Switch Pane dropdown) in the editor controls. This pane contains a line graph that illustrates the CPU usage history of any thread selected in the table underneath. The time period displayed matches the Timeline selection, so you should zoom the graph in to a level where the CPU usage changes caused by the priority inversion are easy to see.

The following screenshot shows that CPU consumption for hw_server Thread 5 suddenly goes from 100% to 0%, exactly when the consumption for high_prio_client Thread 1 and hw_server Thread 4 also goes to 0%. For a short but noticeable timeframe, none of these threads runs while cpu_burner Thread 1 consumes all of the CPU:

Screenshot of CPU Usage pane showing CPU usage levels for three selected threads over a very short timeframe

The By Priority tab displays how long the threads ran at specific priorities. Priority inversion is evident because the cpu_burner thread ran at level 15 for about 2 ms while high_prio_client Thread 1 and hw_server Thread 4 ran at level 40 for well under 1 ms each. Also, the graph shows CPU usage rising and dropping quickly for these two threads at around 2.164 s into the trace. This pattern suggests that the client and server were communicating briefly just before both stopped executing for a while. Indeed, both threads began executing again after the medium-priority thread finished running at around 2.166 s, presumably because the server was then able to do the requested work and reply to the client.