This version of this document is no longer maintained. For the latest documentation, see http://www.qnx.com/developers/docs.

Introduction

What is the System Analysis Toolkit (SAT)?

In today's computing environments, developers need to monitor a dynamic execution of realtime systems with emphasis on their key architectural components. Such monitoring can reveal hidden hardware faults and design or implementation errors, as well as help improve overall system performance.

In order to accommodate those needs, we provide sophisticated tracing and profiling mechanisms, allowing execution monitoring in real time or offline. Because it works at the operating system level, the SAT, unlike debuggers, can monitor applications without having to modify them in any way.

The main goals for the SAT are:

Ease of use
Insight into system activity
High performance and efficiency with low overhead

Why is the SAT needed?

In a running system, many things occur behind the scenes:

Kernel calls are being made
Messages are being passed
Interrupts are being handled
Threads are changing states— they're being created, blocking, running, restarting, and dying

The result of this activity are changes to the system state that are normally hidden from developers. The SAT is capable of intercepting these changes and logging them. Each event is logged with a timestamp and the ID of the CPU that handled it.

For a full understanding of how the kernel works, see the Neutrino Microkernel chapter in the System Architecture guide.

The SAT offers valuable information at all stages of a product's life cycle, from prototyping to optimization to in-service monitoring and field diagnostics.

SAT vs. Debugger

The SAT view and the debugger view.

In complicated systems, the information provided by standard debugging programs may not be detailed enough to solve the problem. Or, the problem may not be a bug as much as a process that's not behaving as expected. Unlike the SAT, debuggers lack the execution history essential to solving the many complex problems involved in “application tuning.” In a large system, often consisting of many interconnected components or processes, traditional debugging, which lets you look at only a single module, can't easily assist if the problem lies in how the modules interact with each other. Where a debugger can view a single process, the SAT can view all processes at the same time. Also, unlike debugging, the SAT doesn't need code augmentation and can be used to track the impact of external, precompiled code.

Because it offers a system-level view of the internal workings of the kernel, the SAT can be used for performance analysis and optimization of large interconnected systems as well as single processes.

It allows realtime debugging to help pinpoint deadlock and race conditions by showing what circumstances led up to the problem. Rather than just a “snapshot”, the SAT offers a “movie” of what's happening in your system.

Because the instrumented version of the kernel runs with negligible performance penalties, you can optionally leave it in the final embedded system. Should any problems arise in the field, you can use the SAT for low-level diagnostics.

The SAT offers a nonintrusive method of instrumenting the code—programs can literally monitor themselves. In addition to passive/non-intrusive event tracing, you can proactively trace events by injecting your own “flag” events.

How the SAT works

SAT Overall

The Instrumented Kernel

The Instrumented Kernel is actually the regular QNX microkernel with a small, highly efficient event-gathering module included. Except for the instrumentation, its operation is virtually indistinguishable—the Instrumented Kernel runs at 98% of the speed of our regular microkernel.

As threads run, the Instrumented Kernel continuously intercepts information about what the kernel is doing, generating time-stamped and CPU-stamped events that are stored in a circular linked list of buffers. Because the tracing occurs at the kernel level, the SAT can track the performance of all processes, including the data-capturing program.

Kernel buffer management

The kernel buffer is composed of many small buffers. Although the number of buffers is limited only by the amount of system memory, it's important to understand that this space must be managed carefully. If all of the events are being traced on an active system, the number of events can be quite large.

To allow the Instrumented Kernel to write to one part of the kernel buffer and store another part of it simultaneously, the kernel buffer is organized as a circular linked list. As the buffer data reaches a high-water mark (about 70% full), the Instrumented Kernel module sends a signal to the data-capture program with the address of the buffer. The data-capture program can then retrieve the buffer and save it to a storage location for offline processing or pass it to a data interpreter for realtime manipulation. In either case, once the buffer has been “emptied,” it is once again available for use by the kernel.

The data-capture program

The SAT includes a tracelogger that you can use to capture data. This data-capture program outputs the captured data in raw binary format to a device or file for processing. For more information about tracelogger, see the Neutrino Utilities Reference.

Data interpretation

To aid in processing the binary trace event data, we provide the libtraceparser library. The API functions let you set up a series of functions that are called when complete buffer slots of event data have been received/read from the raw binary event stream.

We also provide a linear trace event printer (traceprinter) that outputs all of the trace events ordered linearly by their timestamp as they are emitted by the kernel. This utility uses the libtraceparser library. Advanced users may wish to either customize traceprinter to make their own output program or use the API to create an interface to do the following offline or in real time:

perform analysis
display results
debug applications
system self-monitoring
show events ordered by process or by thread
show thread states and transitions
show currently running threads.

For more information about traceprinter, see the Neutrino Utilities Reference.