Controlling profiling through API calls

In your application code, you can use special macros to turn profiling on and off and profiler API functions to change which metrics are reported. Calls to these macros and functions modify the parameters and behavior of libprofilingS.

Adapting your code to use this API (and rebuilding and relaunching the application) is admittedly more work than configuring Application Profiler settings in the IDE or sending signals to a target process. However, the API allows you to profile specific code regions without trying to perfectly time signal delivery, and to dynamically change the profiling method and time units used.

Here, we demonstrate how to use some of the most common macros and functions.

Sample program

Your program must include the profiler header file (qprofiler.h) and call QPROFILER_START() and QPROFILER_STOP() as needed to profile the appropriate areas. For these macros to do anything, your makefile must define the QPROFILING macro at compile time and link against the libprofilingS library (for details, see “Enabling function instrumentation”).

Consider the following program:

#include <stdlib.h>
#include <stdio.h>
#include <process.h>
#include <qprofiler.h>

#define NUM_ITEMS 32767
int numbers[NUM_ITEMS];
void quickSort(int numbers[], int array_size);

int main() {
  int i;
  srand(getpid());
  // fill array with random integers
  for (i = 0; i < NUM_ITEMS; i++) {
    numbers[i] = rand();
  }

  QPROFILER_START();

  // perform quick sort on array
  quickSort(numbers, NUM_ITEMS);

  QPROFILER_STOP();

  printf("Done with sort.\n");
  for (i = 0; i < NUM_ITEMS; i++) {
    printf("%d\n", numbers[i]);
  }
  return 0;
}

This program generates an array of random numbers, uses the quick sort algorithm to sort it, then displays the sorted array. The profiling is turned on just before the sorting begins and turned off just after it ends. So if you set QPROF_AUTO_START to 0 (to prevent profiling from starting automatically) when you run this program, you'll see function measurements strictly for the sorting code.
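The listing above declares quickSort() but doesn't show its body. For readers who want to build and profile the sample, here is a minimal sketch of one possible implementation. It is an assumption made for illustration, not the code used in the original sample; the helper names swap_ints() and sort_range() are hypothetical.

```c
/* Swap two integers in place. */
static void swap_ints(int *a, int *b) {
    int tmp = *a;
    *a = *b;
    *b = tmp;
}

/* Sort numbers[lo..hi] (inclusive bounds) using a Lomuto-style partition
 * with the middle element as the pivot. */
static void sort_range(int numbers[], int lo, int hi) {
    while (lo < hi) {
        /* Move the middle element to the end and use it as the pivot. */
        int mid = lo + (hi - lo) / 2;
        swap_ints(&numbers[mid], &numbers[hi]);
        int pivot = numbers[hi];

        /* Everything left of 'store' is smaller than the pivot. */
        int store = lo;
        for (int i = lo; i < hi; i++) {
            if (numbers[i] < pivot) {
                swap_ints(&numbers[i], &numbers[store]);
                store++;
            }
        }
        /* Put the pivot into its final position. */
        swap_ints(&numbers[store], &numbers[hi]);

        /* Recurse into the smaller side and loop on the larger one,
         * which keeps the recursion depth logarithmic. */
        if (store - lo < hi - store) {
            sort_range(numbers, lo, store - 1);
            lo = store + 1;
        } else {
            sort_range(numbers, store + 1, hi);
            hi = store - 1;
        }
    }
}

/* Matches the prototype declared in the sample program. */
void quickSort(int numbers[], int array_size) {
    if (array_size > 1) {
        sort_range(numbers, 0, array_size - 1);
    }
}
```

Because the sorting work is split across quickSort() and its helpers, a profiling session limited to the region between QPROFILER_START() and QPROFILER_STOP() would be expected to report measurements for these functions only.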

Using macros to start and stop profiling is convenient because you can leave them in production code that's built without the QPROFILING macro defined or without the libprofilingS library linked. In this case, the macros expand to nothing and therefore have no effect.
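The general technique behind such macros is conditional compilation. The sketch below illustrates it with hypothetical names (MYPROF_ENABLED, MYPROF_START(), MYPROF_STOP()); it does not reproduce the actual contents of qprofiler.h, and the stand-in "profiler" merely prints a message to stderr.

```c
#include <stdio.h>

#ifdef MYPROF_ENABLED
/* Instrumented build: the macros call into a stand-in "profiler".
 * (Hypothetical; a real profiler would toggle its own state here.) */
#define MYPROF_START() do { fputs("profiling on\n", stderr); } while (0)
#define MYPROF_STOP()  do { fputs("profiling off\n", stderr); } while (0)
#else
/* Production build: the macros become no-op expressions, so the calls
 * can stay in the source without any runtime cost. */
#define MYPROF_START() ((void)0)
#define MYPROF_STOP()  ((void)0)
#endif

/* Sum the integers 1..n, with the loop marked as the region of interest. */
long sum_range(int n) {
    long total = 0;

    MYPROF_START();
    for (int i = 1; i <= n; i++) {
        total += i;
    }
    MYPROF_STOP();

    return total;
}
```

Compiling this file with -DMYPROF_ENABLED activates the stand-in calls; compiling it without that flag leaves sum_range() behaving identically, minus the messages.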

The profiler API also offers functions to change the metrics reported; to use these functions, you must build with the appropriate profiling options. Consider this second sample program:

// Parallel sorting by regular sampling --
// A four-phase parallel sorting algorithm in which each process:
// 1) Locally sorts a subset of the original list
// 2) Takes samples (element values) from its sorted list subset and sends them to
//    the master process, which sorts them to determine pivot values
// 3) Based on pivot values received from the master, sends its list partitions to
//    the other processes, with each partition containing values within a certain
//    range and designated for the process responsible for sorting that range
// 4) Merges (no resorting is necessary) the received partitions to produce a
//    fragment of the final, sorted list
int psrs() {
    ...
    // Phase three -- exchange list partitions with other processes
    if ((parentProcess) &&
        (retval = qprofiler_set_mode(MODE_KERNEL_TRACE, METHOD_REALTIME)) == -1) {
        // Error-handling code goes here
    }

    // Send each partition to the process in charge of sorting that value range
    for (i = 0; i < NUM_PROCESSES; i++) {
        if (i != myProcessIndex) {
            write(send_socket_fds[i],
                  &local_list[offsets[i]],
                  partition_sizes[i]);
        }
    }

    // Receive the sorted partitions from the other processes and store them in
    // the local arrays designated for the corresponding value ranges
    for (i = 0; i < NUM_PROCESSES; i++) {
        if (i != myProcessIndex) {
            read(recv_socket_fds[i], &partitions[i], MAX_PARTITION_SIZE);
        }
    }

    // Phase four -- merge list partitions received from the other processes
    if ((parentProcess) &&
        (retval = qprofiler_set_mode(MODE_LOG_FILE, METHOD_DEFAULT)) == -1) {
        // Error-handling code goes here
    }
    ...
}

This program implements a multiphase parallel sorting algorithm. In the third phase, each process sends partitions of its locally sorted list subset to the other processes. Suppose you want to analyze the target system's performance during this communication-intensive phase. One process (the parent, in this sample) can enable kernel tracing mode just before the partition exchange begins, then re-enable function runtime measurement after it ends.

To start the kernel event trace at this exact code location, the sample program calls qprofiler_set_mode() with the MODE_KERNEL_TRACE flag set. Although the Application Profiler controls allow you to set a delay (in seconds) for starting a trace after the application is launched, this feature isn't precise enough for multiprocess computations where timing is unpredictable. When the third phase is completed, the program calls that function again but with the MODE_LOG_FILE flag set, to resume writing function runtimes to the output file.

The API also provides a function for obtaining call chain information:

qprofiler_callstack_t self_backtrace[MAX_BACKTRACE_DEPTH];
...
if (qprofiler_backtrace_self(
        &self_backtrace, MAX_BACKTRACE_DEPTH, 0) == -1) {
    // Error-handling code goes here
}
else {
    // Code to write out or use call chain addresses goes here
}

The qprofiler_backtrace_self() function fills the provided buffer with stack frame information for the current backtrace (call chain). This is useful if you've built your binary with call count instrumentation (and hence are profiling with the MODE_BACKTRACING setting) and you want to display call chain details, perhaps for debugging.