Controlling kernel event tracing through API calls

In your application code, you can use the TraceEvent() function to control what kernel event data are written to memory and when the data are captured to a file. Calls to this function modify how the instrumented kernel logs events.

Adapting your code to use this API (and rebuilding and relaunching the application) is admittedly more work than configuring kernel event trace settings in the IDE or specifying tracelogger options. However, the API allows you to turn instrumentation mode on and off, emit events for only certain classes and types, and insert custom events into the trace.

For information about all commands that control kernel event logging, see the TraceEvent() entry in the C Library Reference. Here, we demonstrate how to use some common commands.

Sample program

Your program must include the trace header file (sys/trace.h). TraceEvent() is the only function it must call but as mentioned, these calls are effective only when the kernel's instrumentation mode is active, meaning it's logging events.

In our case, we want close control of kernel instrumentation, so we run tracelogger in daemon mode, using the -d1 option. This option makes the utility wait for the associated TraceEvent() commands before enabling kernel instrumentation or capturing data to the kernel event log (.kev) file. Details on daemon mode are given in the tracelogger entry in the Utilities Reference.

Consider the following program:
#include <stdlib.h>
#include <stdio.h>
#include <process.h>
#include <sys/trace.h>

#define NUM_PROCESSES 16

int main() {
  /* 
     To keep the amount of data generated manageable, we define a static rules filter.
     We do so by first disabling all tracing by suppressing tracing for all classes 
     and for process- and thread-specific events, which must be cleared separately.
  */
  TraceEvent(_NTO_TRACE_DELALLCLASSES);
  TraceEvent(_NTO_TRACE_CLRCLASSPID, _NTO_TRACE_KERCALL);
  TraceEvent(_NTO_TRACE_CLRCLASSPID, _NTO_TRACE_THREAD);
  TraceEvent(_NTO_TRACE_CLRCLASSPID, _NTO_TRACE_VTHREAD);
  TraceEvent(_NTO_TRACE_CLRCLASSPID, _NTO_TRACE_SYSTEM);
  TraceEvent(_NTO_TRACE_CLRCLASSPID, _NTO_TRACE_COMM);

  /*
      Next, we tell the kernel to emit only certain event types and to use wide mode,
      meaning it outputs more event data. With our sorting algorithm, the processes 
      send each other data using many-to-many exchanging. So, we want to see state 
      changes from Ready to Running to any blocked state associated with messaging,
      and message exchange events. We also want to see custom user events, which we
      use to indicate when a process reaches a new phase in sorting. 
  */
  TraceEvent(_NTO_TRACE_ADDEVENT, _NTO_TRACE_THREAD, _NTO_TRACE_THREADY);
  TraceEvent(_NTO_TRACE_ADDEVENT, _NTO_TRACE_THREAD, _NTO_TRACE_THRUNNING);
  TraceEvent(_NTO_TRACE_ADDEVENT, _NTO_TRACE_THREAD, _NTO_TRACE_THSEND);
  TraceEvent(_NTO_TRACE_ADDEVENT, _NTO_TRACE_THREAD, _NTO_TRACE_THRECEIVE);
  TraceEvent(_NTO_TRACE_ADDEVENT, _NTO_TRACE_THREAD, _NTO_TRACE_THREPLY);

  TraceEvent(_NTO_TRACE_ADDEVENT, _NTO_TRACE_COMM, _NTO_TRACE_COMM_SMSG);
  TraceEvent(_NTO_TRACE_ADDEVENT, _NTO_TRACE_COMM, _NTO_TRACE_COMM_RMSG);
  TraceEvent(_NTO_TRACE_ADDEVENT, _NTO_TRACE_COMM, _NTO_TRACE_COMM_REPLY);
  TraceEvent(_NTO_TRACE_ADDEVENT, _NTO_TRACE_COMM, _NTO_TRACE_COMM_ERROR);

  TraceEvent(_NTO_TRACE_ADDEVENT, _NTO_TRACE_USER, _NTO_TRACE_INSERTUSRSTREVENT);

  TraceEvent(_NTO_TRACE_SETALLCLASSESWIDE);

  /*
     To prepare to sort the list, the parent (master) process creates the necessary 
     child processes, which will sort and merge list subsets, as will the parent.
  */
  ...

  /* When we're ready to begin sorting, we tell the kernel to start logging events. */
  TraceEvent(_NTO_TRACE_START);

  /* Do the sorting! */
  psrs();

  /*
     When the sorting finishes, we stop the event logging because we're not interested 
     in kernel activity associated with process termination. Issuing the _NTO_TRACE_STOP
     command when tracelogger is running in daemon mode causes the utility to flush all
     buffer data (i.e., logged events) to the output file.
  */
  TraceEvent(_NTO_TRACE_STOP);
  return 0;
}

// Parallel sorting by regular sampling --
// A four-phase parallel sorting algorithm in which each process:
// 1) Locally sorts a subset of the original list
// 2) Takes samples (element values) from its sorted list subset and sends them to the 
//    master process, which sorts them to determine pivot values
// 3) Based on pivot values received from the master, sends its list partitions to the
//    other processes, with each partition containing values within a certain range 
//    and designated for the process responsible for sorting that range
// 4) Merges (no resorting is necessary) the received partitions to produce a fragment
//    of the final, sorted list
int psrs() {

  // Phase one
  snprintf(phase1str, 64, "Process %d(%d) beginning phase one (local sort)", 
           myIndex, myPID);
  TraceEvent(_NTO_TRACE_INSERTUSRSTREVENT, 1001, phase1str);
  ...

  // Phase two
  snprintf(phase2str, 64, "Process %d(%d) beginning phase two (sampling)", 
           myIndex, myPID);
  TraceEvent(_NTO_TRACE_INSERTUSRSTREVENT, 1002, phase2str);
  ...

  // Phase three
  snprintf(phase3str, 64, "Process %d(%d) beginning phase three (partition exchange)",
           myIndex, myPID);
  TraceEvent(_NTO_TRACE_INSERTUSRSTREVENT, 1003, phase3str);
  ...

  // Phase four
  snprintf(phase4str, 64, "Process %d(%d) beginning phase four (partition merge)", 
           myIndex, myPID);
  TraceEvent(_NTO_TRACE_INSERTUSRSTREVENT, 1004, phase4str);
  ...
}

Based on the TraceEvent() calls, every process in this program emits a custom event when it begins a new phase in sorting. These events make the kernel event trace easier to navigate, by allowing you to quickly find when a given process began a particular phase, within the timeline of all events.

Aside from the custom events, our program emits only:
  • thread events for entry into the Ready, Running, Send, Receive, and Reply states
  • communication events for message Send, Receive, Reply, and Error calls

If you want even more control over kernel event logging, you can manage the kernel buffers by issuing commands to allocate memory for them, specify the mode for writing kernel events into them, and flush their contents when needed. These advanced steps are explained in Using TraceEvent() to control tracing in the System Analysis Toolkit User's Guide.