Interpreting Trace Data

This chapter includes:

Overview
Using traceprinter and interpreting the output
Building your own parser
The traceparser library
Simple and combine events
Event interlacing
Timestamps

Overview

Once the data has been captured, you may process it, either in real time or offline.


Figure: Possible data interpretation configurations.

The best tool (by far) for interpreting the copious amounts of trace data is the Integrated Development Environment. It provides a sophisticated and versatile user interface that lets you filter and examine the data.


Figure: Examining trace data in the IDE's System Profiler perspective.

For more information, see the Analyzing Your System with Kernel Tracing chapter of the IDE User's Guide.

We also provide a traceprinter utility that simply prints a plain-text version of the trace data, sending its output to stdout or to a file.

You can also build your own custom interpreter, using the traceparser library.

Using traceprinter and interpreting the output

The simplest way to turn the tracing data into a form that you can analyze is to pass the .kev file through traceprinter. For details, see its entry in the Neutrino Utilities Reference.

Let's take a look at an example of the output from traceprinter. This is the output from the Gathering all events from all classes example in the Tutorials chapter.

The output starts with some information about how you ran the trace:

TRACEPRINTER version 1.02
TRACEPARSER LIBRARY version 1.02
 -- HEADER FILE INFORMATION -- 
       TRACE_FILE_NAME:: all_classes.kev
            TRACE_DATE:: Wed Jun 24 10:52:58 2009
       TRACE_VER_MAJOR:: 1
       TRACE_VER_MINOR:: 01
   TRACE_LITTLE_ENDIAN:: TRUE
        TRACE_ENCODING:: 16 byte events
       TRACE_BOOT_DATE:: Tue Jun 23 11:47:46 2009
  TRACE_CYCLES_PER_SEC:: 736629000
         TRACE_CPU_NUM:: 1
         TRACE_SYSNAME:: QNX
        TRACE_NODENAME:: localhost
     TRACE_SYS_RELEASE:: 6.4.1
     TRACE_SYS_VERSION:: 2009/05/20-17:35:56EDT
         TRACE_MACHINE:: x86pc
     TRACE_SYSPAGE_LEN:: 2264
TRACE_TRACELOGGER_ARGS:: tracelogger -d1 -n 3 -f all_classes.kev

The next section includes information about all the processes in existence when the trace started:

 -- KERNEL EVENTS -- 
t:0x4f81e320 CPU:00 CONTROL: BUFFER sequence = 33, num_events = 714
t:0x4f81e320 CPU:00 CONTROL :TIME msb:0x000037b0 lsb(offset):0x4f81e014
t:0x4f82017a CPU:00 PROCESS :PROCCREATE_NAME
                      ppid:0
                       pid:1
                      name:proc/boot/procnto-smp-instr
t:0x4f820f9a CPU:00 THREAD  :THCREATE      pid:1 tid:1
t:0x4f821358 CPU:00 THREAD  :THREADY       pid:1 tid:1 
t:0x4f821698 CPU:00 THREAD  :THCREATE      pid:1 tid:2
t:0x4f821787 CPU:00 THREAD  :THRECEIVE     pid:1 tid:2 
t:0x4f8219ca CPU:00 THREAD  :THCREATE      pid:1 tid:3
t:0x4f821ac6 CPU:00 THREAD  :THRECEIVE     pid:1 tid:3 
t:0x4f821c94 CPU:00 THREAD  :THCREATE      pid:1 tid:4
t:0x4f821d90 CPU:00 THREAD  :THRECEIVE     pid:1 tid:4 
t:0x4f821f6c CPU:00 THREAD  :THCREATE      pid:1 tid:5
t:0x4f82205b CPU:00 THREAD  :THRECEIVE     pid:1 tid:5 
t:0x4f8222aa CPU:00 THREAD  :THCREATE      pid:1 tid:7
t:0x4f822399 CPU:00 THREAD  :THRECEIVE     pid:1 tid:7 
t:0x4f8225bd CPU:00 THREAD  :THCREATE      pid:1 tid:8
t:0x4f8226ac CPU:00 THREAD  :THRECEIVE     pid:1 tid:8 
t:0x4f8228ca CPU:00 THREAD  :THCREATE      pid:1 tid:10
t:0x4f8229b9 CPU:00 THREAD  :THRECEIVE     pid:1 tid:10 
t:0x4f822b7d CPU:00 THREAD  :THCREATE      pid:1 tid:11
t:0x4f822c6c CPU:00 THREAD  :THRECEIVE     pid:1 tid:11 
t:0x4f822dd7 CPU:00 THREAD  :THCREATE      pid:1 tid:12
t:0x4f822ec6 CPU:00 THREAD  :THRECEIVE     pid:1 tid:12 
t:0x4f8230ac CPU:00 THREAD  :THCREATE      pid:1 tid:15
t:0x4f82319b CPU:00 THREAD  :THRECEIVE     pid:1 tid:15 
t:0x4f8233ca CPU:00 THREAD  :THCREATE      pid:1 tid:20
t:0x4f8234b9 CPU:00 THREAD  :THRECEIVE     pid:1 tid:20 
t:0x4f823ad0 CPU:00 PROCESS :PROCCREATE_NAME
                      ppid:1
                       pid:2
                      name:sbin/tinit
t:0x4f823f38 CPU:00 THREAD  :THCREATE      pid:2 tid:1
t:0x4f82402e CPU:00 THREAD  :THREPLY       pid:2 tid:1 
t:0x4f82447d CPU:00 PROCESS :PROCCREATE_NAME
                      ppid:2
                       pid:4099
                      name:proc/boot/pci-bios
t:0x4f824957 CPU:00 THREAD  :THCREATE      pid:4099 tid:1
t:0x4f824a4d CPU:00 THREAD  :THRECEIVE     pid:4099 tid:1 
t:0x4f824ff8 CPU:00 PROCESS :PROCCREATE_NAME
                      ppid:2
                       pid:4100
                      name:proc/boot/slogger

You can suppress this initial information by passing the _NTO_TRACE_STARTNOSTATE command to TraceEvent(), but you'll likely need the information (including process IDs and thread IDs) to make sense out of the actual trace data.

The sample above shows the creation and naming of the instrumented kernel procnto-smp-instr (process ID 1) and its threads (thread ID 1 is the idle thread), followed by tinit (process ID 2), pci-bios, and slogger. Some of these are the processes that were launched when you booted your system; for more information, see the Controlling How Neutrino Starts chapter of the QNX Neutrino User's Guide.

This continues for a while, culminating in the creation of the tracelogger process and our own program, all_classes (process ID 1511472):

t:0x4f852aa8 CPU:00 PROCESS :PROCCREATE_NAME
                      ppid:426015
                       pid:1507375
                      name:usr/sbin/tracelogger
t:0x4f853360 CPU:00 THREAD  :THCREATE      pid:1507375 tid:1
t:0x4f853579 CPU:00 THREAD  :THRECEIVE     pid:1507375 tid:1 
t:0x4f85392a CPU:00 THREAD  :THCREATE      pid:1507375 tid:2
t:0x4f853a19 CPU:00 THREAD  :THSIGWAITINFO pid:1507375 tid:2 
t:0x4f853d96 CPU:00 PROCESS :PROCCREATE_NAME
                      ppid:426022
                       pid:1511472
                      name:./all_classes
t:0x4f854048 CPU:00 THREAD  :THCREATE      pid:1511472 tid:1
t:0x4f854140 CPU:00 THREAD  :THRUNNING     pid:1511472 tid:1 

Next is the exit from our program's call to TraceEvent():

t:0x4f854910 CPU:00 KER_EXIT:TRACE_EVENT/01 ret_val:0x00000000 empty:0x00000000

Why doesn't the trace include the entry to TraceEvent()? Well, tracelogger didn't log anything until our program told it to, by calling TraceEvent()!

So far, so good, but now things get more complicated:

t:0x4f856aac CPU:00 KER_CALL:THREAD_DESTROY/47 tid:-1 status_p:0
t:0x4f857dca CPU:00 KER_EXIT:THREAD_DESTROY/47 ret_val:0x00000030 empty:0x00000000
t:0x4f8588d3 CPU:00 KER_CALL:THREAD_DESTROYALL/48 empty:0x00000000 empty:0x00000000
t:0x4f858ed7 CPU:00 THREAD  :THDESTROY     pid:1511472 tid:1
t:0x4f8598b9 CPU:00 THREAD  :THDEAD        pid:1511472 tid:1 
t:0x4f859c4c CPU:00 THREAD  :THRUNNING     pid:1 tid:1 

You can see that a thread is being destroyed, but which one? The tid of -1 refers to the calling thread, but which process does it belong to? As mentioned earlier, most events don't indicate what caused them to occur; you have to infer it from a previous thread-running (THRUNNING) event. In this case, it's our own program (process ID 1511472) that's ending; it starts the tracing, and then exits. Thread 1 of procnto-smp-instr (the idle thread) then runs.

The trace continues like this:

t:0x4f85c6e3 CPU:00 COMM    :SND_PULSE_EXE scoid:0x40000002 pid:1
t:0x4f85cecd CPU:00 THREAD  :THRUNNING     pid:1 tid:12 
t:0x4f85d5ad CPU:00 THREAD  :THREADY       pid:1 tid:1 
t:0x4f85e5b3 CPU:00 COMM    :REC_PULSE     scoid:0x40000002 pid:1
t:0x4f860ee2 CPU:00 KER_CALL:THREAD_CREATE/46 func_p:f0023170 arg_p:eff6e000
t:0x4f8624c7 CPU:00 THREAD  :THCREATE      pid:1511472 tid:1
t:0x4f8625ff CPU:00 THREAD  :THWAITTHREAD  pid:1 tid:12 
t:0x4f8627b4 CPU:00 THREAD  :THRUNNING     pid:1511472 tid:1 
t:0x4f8636fd CPU:00 THREAD  :THREADY       pid:1 tid:12 
t:0x4f865c34 CPU:00 KER_CALL:CONNECT_SERVER_INFO/41 pid:0 coid:0x00000000
t:0x4f866836 CPU:00 KER_EXIT:CONNECT_SERVER_INFO/41 coid:0x00000000 info->nd:0
t:0x4f86735e CPU:00 KER_CALL:TIMER_TIMEOUT/75 timeout_flags:0x00000050 ntime(sec):30
t:0x4f868445 CPU:00 KER_EXIT:TIMER_TIMEOUT/75 prev_timeout_flags:0x00000000 otime(sec):0
t:0x4f8697d3 CPU:00 INT_ENTR:0x00000000 (0)       IP:0xf008433e
t:0x4f86a276 CPU:00 INT_HANDLER_ENTR:0x00000000 (0)       PID:126997 IP:0x080b7334 AREA:0x0812a060
t:0x4f86afa7 CPU:00 INT_HANDLER_EXIT:0x00000000 (0) SIGEVENT:NONE
t:0x4f86b304 CPU:00 INT_HANDLER_ENTR:0x00000000 (0)       PID:1 IP:0xf0056570 AREA:0x00000000
t:0x4f86ca12 CPU:00 INT_HANDLER_EXIT:0x00000000 (0) SIGEVENT:NONE
t:0x4f86cff6 CPU:00 INT_EXIT:0x00000000 (0) inkernel:0x00000f01
t:0x4f86e276 CPU:00 KER_CALL:MSG_SENDV/11 coid:0x00000000 msg:"" (0x00040116)
t:0x4f86e756 CPU:00 COMM    :SND_MESSAGE   rcvid:0x0000004f pid:159762
t:0x4f86f84a CPU:00 THREAD  :THREPLY       pid:1511472 tid:1 
t:0x4f8705dd CPU:00 THREAD  :THREADY       pid:159762 tid:1 
t:0x4f8707d4 CPU:00 THREAD  :THRUNNING     pid:159762 tid:1 
t:0x4f870bff CPU:00 COMM    :REC_MESSAGE   rcvid:0x0000004f pid:159762
t:0x4f878b6c CPU:00 KER_CALL:MSG_REPLYV/15 rcvid:0x0000004f status:0x00000000
t:0x4f878f4b CPU:00 COMM    :REPLY_MESSAGE tid:1 pid:1511472
t:0x4f8798d2 CPU:00 THREAD  :THREADY       pid:1511472 tid:1 

The SND_PULSE_EXE event indicates that a SIGEV_PULSE was sent to the server connection ID 0x40000002 of procnto-smp-instr, but what is it, and who sent it? Thread 12 of the kernel receives it, and then surprisingly creates a new thread 1 in our process (ID 1511472), and starts chatting with it. What we're seeing here is the teardown of our process. It delivers a death pulse to the kernel, and then one of the kernel's threads receives the pulse and creates a thread in the process to clean up.

In the midst of this teardown, an interrupt occurs, its handler runs, and a message is sent to the process with ID 159762. By looking at the initial system information, we can determine that process ID 159762 is devc-pty.

Farther down in the trace is the actual death of our all_classes process:

t:0x4f8faa68 CPU:00 THREAD  :THRUNNING     pid:1 tid:20 
t:0x4f8fb09f CPU:00 COMM    :REC_PULSE     scoid:0x40000002 pid:1
t:0x4f8ff1a5 CPU:00 PROCESS :PROCDESTROY   ppid:426022 pid:1511472

As even a short look at this trace shows, wading through one can be time-consuming, but doing so can give you a great understanding of exactly what's happening in your system.

You can simplify your task by terminating any processes that you don't want to include in the trace, or by filtering the trace data.

Building your own parser

If you want to create your own parser, you can start with the code for traceprinter itself. To get the source code from Foundry27, go to http://community.qnx.com, log into your myQNX account, go to the project for the QNX Neutrino OS, and look for utils/t/traceprinter.

The traceprinter utility consists of a long list of callback definitions, followed by a fairly simple parsing procedure. Each callback definition simply prints the event that it handles.

In its native form, the traceprinter utility is of limited use. However, it was designed to be easy to borrow from, copy, or modify, so you can use it as the starting point for your own custom utilities. See the copyright header included in the source for the full terms of use.

The following sections give a brief introduction to the building blocks of the parser and some of the issues you'll need to handle.

The traceparser library

The traceparser library provides a front end to facilitate the handling and parsing of events received from the instrumented kernel and the data-capture utility. The library serves as a thin middle layer that sorts the raw binary events and hands them to the callback functions you provide.

You typically use the traceparser functions as follows:

  1. Initialize the traceparser library by calling traceparser_init(). This function returns the state of your parser, which you pass to the other traceparser functions.
  2. Set the traceparser debug mode and specify a FILE stream for the debugging output by calling traceparser_debug().
  3. Set up callbacks for processing the trace events that you're interested in:

     traceparser_cs()
         Attach a callback to a specific event.

     traceparser_cs_range()
         Attach a callback to a range of events.

     When you set up a callback with either of these functions, you can provide a pointer to arbitrary user data to be passed to the callback.

  4. Start parsing your trace data by calling traceparser().
  5. Destroy your parser by calling traceparser_destroy().

You can get information about your parser at any time by calling traceparser_get_info().

For more information about these functions, see their entries in the QNX Neutrino Library Reference.

Simple and combine events

A simple event is an event that can be described in a single event buffer slot; a combine event is an event that is larger and can be fully described only in multiple event buffer slots. Both simple and combine events consist of only one kernel event.

Each event buffer slot is an opaque traceevent_t structure.

The traceevent_t structure


Note: The traceevent_t structure is opaque. Although some details are provided here, you shouldn't access the structure except through the libtraceparser API.

The traceevent_t structure is only 16 bytes long, and only half of that describes the event. This small size reduces instrumentation overhead and improves granularity. Where the information required to represent the event won't fit into a single traceevent_t structure, it spans as many traceevent_t structures as required, resulting in a combine event. A combine event isn't actually several events combined, but rather a single, long event requiring a combination of traceevent_t elements to represent it.

In order to distinguish regular events from combine events, the traceevent_t structure includes a 2-bit flag that indicates whether the event is a single event or whether it's the first, middle, or last traceevent_t structure of the event. The flag is also used as a rudimentary integrity check. The timestamp element of the combine event is identical in each buffer slot; no other event will have the same timestamp.

Adding this “thin” protocol doesn't burden the instrumented kernel and keeps the traceevent_t structure small. The trade-off is that it may take many traceevent_t structures to represent a single kernel event.

Event interlacing

Although events are timestamped immediately, they may not be written to the buffer in a single atomic operation. While the multiple buffer slots of a combine event are being written, the process is frequently interrupted to let other threads and processes run, and events triggered by higher-priority threads are often written to the buffer first. Thus, events may be interlaced. Although the slots of an event may not be contiguous, they are not scrambled (unless there's a buffer overrun); the sequential order of a combine event's slots is always correct, even if a different event is interleaved with them.

In order to maintain speed during runtime, the instrumented kernel writes events unsorted as quickly as possible; reassembling the combine events must be done in post-processing. The libtraceparser API transparently reassembles the events.

Timestamps

The timestamp is the 32 Least Significant Bits (LSB) part of the 64-bit clock. Whenever the 32-bit portion of the clock rolls over, a _NTO_TRACE_CONTROLTIME control event is issued. Although adjacent events will never have the same exact timestamp, there may be some timestamp duplication due to the clock's rolling over.

The rollover control event includes the 32 Most Significant Bits (MSB), so you can reassemble the full clock time, if required. The timestamp includes only the LSB in order to reduce the amount of data being generated. (A 1-GHz clock rolls over every 4.29 seconds, an eternity compared to the rate at which events are generated.)


Note: Although the majority of events are stored chronologically, you shouldn't write code that depends on events being retrieved chronologically. Some multiple buffer slot events (combine events) may be interlaced with other events that have later timestamps. In the case of buffer overruns, the timestamps will definitely be scrambled, with entire blocks of events out of chronological order. Spurious gaps, while theoretically possible, are very unlikely.