Heap Analysis: Making Memory Errors a Thing of the Past

Introduction

If you develop a program that dynamically allocates memory, you're also responsible for tracking any memory that you allocate whenever a task is performed, and for releasing that memory when it's no longer required. If you fail to track the memory correctly, you may introduce “memory leaks” or unintentionally write to an area outside of the memory space.

Conventional debugging techniques usually prove to be ineffective for locating the source of corruption or leak because memory-related errors typically manifest themselves in an unrelated part of the program. Tracking down an error in a multithreaded environment becomes even more complicated because the threads all share the same memory address space.

In this chapter, we'll describe how Neutrino manages the heap and introduce you to a special version of our memory management functions that will help you to diagnose your memory management problems.

Dynamic memory management

In a program, you'll dynamically request memory buffers or blocks of a particular size from the runtime environment using malloc(), realloc(), or calloc(), and then you'll release them back to the runtime environment when they're no longer required using free().

The memory allocator ensures that your requests are satisfied by managing a region of the program's memory area known as the heap. In this heap, it tracks all of the information — such as the size of the original block — about the blocks and heap buffers that it has allocated to your program, in order that it can make the memory available to you during subsequent allocation requests. When a block is released, it places it on a list of available blocks called a free list. It usually keeps the information about a block in the header that precedes the block itself in memory.

The runtime environment grows the size of the heap when it no longer has enough memory available to satisfy allocation requests, and it returns memory from the heap to the system when the program releases memory.

The basic heap allocation mechanism is broken up into two separate pieces, a chunk-based small block allocator and a list-based large block allocator. By configuring specific parameters, you can select the various sizes for the chunks in the chunk allocator and also the boundary between the small and the large allocator.

Arena allocations

Both the small and the large portions of the allocator allocate and deallocate memory from the system in the form of arena chunks by calling mmap() and munmap(). By default, the arena allocations are performed in 32 KB chunks. This value is specified by one of the following:

This value must be a multiple of 4 KB, and currently is limited to being less than 256 KB. You can also configure this parameter by setting the MALLOC_ARENA_SIZE environment variable, or by calling mallopt() with MALLOC_ARENA_SIZE as the command.

For example, if you want to change the arena size to 16 KB, do one of the following:

Environment variables are checked only at program startup, but changing them is the easiest way of configuring parameters for the allocator so that these parameters are used for allocations that occur before main().

The allocator also attempts to cache recently used arena blocks. This cache is shared between the small- and the large-block allocator. You can configure the arena cache by setting the following environment variables:

MALLOC_ARENA_CACHE_MAXBLK
The number of cached arena blocks.
MALLOC_ARENA_CACHE_MAXSZ
The total size of the cached arena blocks.

Alternatively, you can call:

mallopt(MALLOC_ARENA_CACHE_MAXSZ, size);
mallopt(MALLOC_ARENA_CACHE_MAXBLK, number);

You can tell the allocator to never release memory back to the system from its arena cache by setting the environment variable:

export MALLOC_MEMORY_HOLD=1

or by calling:

mallopt(MALLOC_MEMORY_HOLD, 1);

Once you've changed the values by using mallopt() for either MALLOC_ARENA_CACHE_MAXSZ or MALLOC_ARENA_CACHE_MAXBLK, you must call mallopt() to cause the arena cache to be adjusted immediately:

// Adjust the cache to the current parameters or
// release all cache blocks, but don't change parameters
mallopt(MALLOC_ARENA_CACHE_FREE_NOW, 0);
mallopt(MALLOC_ARENA_CACHE_FREE_NOW, 1);

Without a call with a command of MALLOC_ARENA_CACHE_FREE_NOW, the changes made to the cache parameters will take effect whenever memory is subsequently released to the cache.

So for example, if the arena cache currently has 10 blocks, for a total size of say 320 KB, and if you change the arena parameters to MALLOC_ARENA_CACHE_MAXBLK = 5 and MALLOC_ARENA_CACHE_MAXSZ = 200 KB, an immediate call to mallopt(MALLOC_ARENA_CACHE_FREE_NOW, 0) will reduce the cache so that the number of blocks is no more than 5, and the total cache size is no more than 320 KB. If you don't make the call to mallopt(), then no immediate changes are made. If the application frees some memory, causing a new arena of size 32 KB to get released to the system, this will not be cached, but will be released to the system immediately.

You can use MALLOC_ARENA_CACHE_MAXSZ and MALLOC_ARENA_CACHE_MAXBLK either together or independently. A value of zero is ignored.

You can preallocate and populate the arena cache by setting the MALLOC_MEMORY_PREALLOCATE environment variable to a value that specifies the size of the total arena cache. The cache is populated by multiple arena allocation calls in chunks whose size is specified by the value of MALLOC_ARENA_SIZE.

The preallocation option doesn't alter the MALLOC_ARENA_CACHE_MAXBLK and MALLOC_ARENA_CACHE_MAXSZ options. So if you preallocate 10 MB of memory in cache blocks, to ensure that this memory stays in the application throughout the lifetime of the application, you should also set the values of MALLOC_ARENA_CACHE_MAXBLK and MALLOC_ARENA_CACHE_MAXSZ to something appropriate.

Small block configuration

You configure the small blocks by setting various bands of different sizes. Each band defines a fixed size block, and a number that describes the size of the pool for that size. The allocator initially adjusts all band sizes to be multiples of _MALLOC_ALIGN (which is 8), and then takes the size of the pools and normalizes them so that each band pool is constructed from a pool size of 4 KB.

By default, bands in the allocator are defined as:

so the smallest small block is 16 bytes, and the largest small block is 128 bytes. Allocations larger than the largest band size are serviced by the large allocator.

After initial normalization by the allocator, the band sizes and the pool sizes are adjusted to the following:

Band size Number of items
16 167
24 125
32 100
48 71
64 55
80 45
96 38
128 28

This normalization takes into account alignment restrictions and overhead needed by the allocator to manage these blocks. The number of items is the number of blocks of the given size that are created each time a new “bucket” is allocated.

You can specify you own band configurations by defining the following in your application's code:

typedef struct Block Block;
typedef struct Band Band;

struct Band {
  short nbpe; /* element size */
  short nalloc; /* elements per block */
  size_t slurp;
  size_t esize;
  size_t mem;
  size_t rem;
  unsigned nalloc_stats;
  Block * alist;  /* Blocks that have data to allocate */
  Block * dlist;  /* completely allocated (depleted) Blocks */
  unsigned    blk_alloced;    /* #blocks allocated */
  unsigned    blk_freed;      /* #blocks freed */
  unsigned    alloc_counter;  /* allocs */
  unsigned    free_counter;   /* frees */
  unsigned    blk_size;     /* size of allocated blocks */
};

static Band a1 = { _MALLOC_ALIGN*2, 32, 60};
static Band a2 = { _MALLOC_ALIGN*3, 32, 60};
static Band a3 = { _MALLOC_ALIGN*4, 32, 60};
static Band a4 = { _MALLOC_ALIGN*5, 24, 60};
static Band a5 = { _MALLOC_ALIGN*6, 24, 60};
static Band a6 = { _MALLOC_ALIGN*7, 24, 60};
static Band a7 = { _MALLOC_ALIGN*8, 16, 60};
static Band a8 = { _MALLOC_ALIGN*9, 8, 60};
static Band a9 = { _MALLOC_ALIGN*10, 8, 60};
static Band a10 = { _MALLOC_ALIGN*11, 8, 60};
static Band a11 = { _MALLOC_ALIGN*12, 8, 60};
static Band a12 = { _MALLOC_ALIGN*13, 8, 60};
static Band a13 = { _MALLOC_ALIGN*32, 10, 60};
Band *__dynamic_Bands[] = { &a1, &a2, &a3, &a4, &a5, &a6,
                            &a7, &a8, &a9, &a10, &a11, &a12,
                            &a13,
                          };
unsigned __dynamic_nband=13;

The main variables are *__dynamic_Bands[] and __dynamic_Bands, which specify the band configurations and the number of bands. For example, the following line:

static Band a9 = { _MALLOC_ALIGN*10, 8, 60};

specifies a band size of 80 bytes, with each chunk having at least 8 blocks, and a preallocation value of 60. The allocator first normalizes the band size to 80, and the number of items to 45. Then during initialization of the allocator, it preallocates at least 60 blocks of this size band. (Each bucket will have 45 blocks, so 60 blocks will be constructed from two buckets).


Note: If you specify your own bands:
  • The sizes must all be distinct.
  • The band configuration must be provided in ascending order of sizes (i.e., band 0 size < band 1 size < band 2 size, and so on).

A user-specified band configuration of:

static Band a1 = { 2, 32, 60};
static Band a2 = { 15, 32, 60};
static Band a3 = { 29, 32, 60};
static Band a4 = { 55, 24, 60};
static Band a5 = { 100, 24, 60};
static Band a6 = { 130, 24, 60};
static Band a7 = { 260, 8, 60};
static Band a8 = { 600, 4, 60};
Band *__dynamic_Bands[] = {&a1, &a2, &a3, &a4,
                           &a5, &a6, &a7, &a8, 
                          };
unsigned __dynamic_nband=8;

will be normalized to:

Band size Number of items
8 251
16 167
32 100
56 62
104 35
136 27
264 13
600 5

In addition to specifying the band configurations, you also have to set the environment variable:

export MALLOC_MEMORY_BANDCONFIG=1

to ensure that your configurations are picked.

For the above configuration, allocations larger than 600 bytes will be serviced by the large block allocator.

When used in conjunction with the MALLOC_MEMORY_PREALLOCATE option for the arena cache, the preallocations of blocks in bands are performed by initially populating the arena cache, and then allocating bands from this arena cache.

You can also configure the bands by using the MALLOC_BAND_CONFIG_STR environment variable. The string format is:

N:s1,n1,p1:s2,n2,p2:s3,n3,p3: ... :sN,nN,pN

where the components are:

s
The band size.
n
The number of items.
p
The preallocation value, which can be zero.

You must specify s, n, and p for each band. The string can't include any spaces; the only valid characters are digits, colons (:), and commas (,). Position is important. The parsing is simple and strict: sizes are assumed to be provided in ascending order, further validation is done by the allocator. If the allocator doesn't like the string, it ignores it completely.

Heap corruption

Heap corruption occurs when a program damages the allocator's view of the heap. The outcome can be relatively benign and cause a memory leak (where some memory isn't returned to the heap and is inaccessible to the program afterward), or it may be fatal and cause a memory fault, usually within the allocator itself. A memory fault typically occurs within the allocator when it manipulates one or more of its free lists after the heap has been corrupted.

It's especially difficult to identify the source of corruption when the source of the fault is located in another part of the code base. This is likely to happen if the fault occurs when:

Contiguous memory blocks

When contiguous blocks are used, a program that writes outside of the bounds can corrupt the allocator's information about the block of memory it's using, as well as the allocator's view of the heap. The view may include a block of memory that's before or after the block being used, and it may or may not be allocated. In this case, a fault in the allocator will likely occur during an unrelated attempt to allocate or release memory.

Multithreaded programs

Multithreaded execution may cause a fault to occur in a different thread from the thread that actually corrupted the heap, because threads interleave requests to allocate or release memory.

When the source of corruption is located in another part of the code base, conventional debugging techniques usually prove to be ineffective. Conventional debugging typically applies breakpoints — such as stopping the program from executing — to narrow down the offending section of code. While this may be effective for single-threaded programs, it's often unyielding for multithreaded execution because the fault may occur at an unpredictable time, and the act of debugging the program may influence the appearance of the fault by altering the way that thread execution occurs. Even when the source of the error has been narrowed down, there may be a substantial amount of manipulation performed on the block before it's released, particularly for long-lived heap buffers.

Allocation strategy

A program that works in a particular memory allocation strategy may abort when the allocation strategy is changed in a minor way. A good example of this is a memory overrun condition (for more information see Overrun and underrun errors,” below) where the allocator is permitted to return blocks that are larger than requested in order to satisfy allocation requests. Under this circumstance, the program may behave normally in the presence of overrun conditions. But a simple change, such as changing the size of the block requested, may result in the allocation of a block of the exact size requested, resulting in a fatal error for the offending program.

Fatal errors may also occur if the allocator is configured slightly differently, or if the allocator policy is changed in a subsequent release of the runtime library. This makes it all the more important to detect errors early in the life cycle of an application, even if it doesn't exhibit fatal errors in the testing phase.

Common sources

Some of the most common sources of heap corruption include:

Even the most robust allocator can occasionally fall prey to the above problems. Let's take a look at the last three items in more detail.

Overrun and underrun errors

Overrun and underrun errors occur when your program writes outside of the bounds of the allocated block. They're one of the most difficult type of heap corruption to track down, and usually the most fatal to program execution.

Overrun errors occur when the program writes past the end of the allocated block. Frequently this causes corruption in the next contiguous block in the heap, whether or not it's allocated. When this occurs, the behavior that's observed varies depending on whether that block is allocated or free, and whether it's associated with a part of the program related to the source of the error. When a neighboring block that's allocated becomes corrupted, the corruption is usually apparent when that block is released elsewhere in the program. When an unallocated block becomes corrupted, a fatal error will usually result during a subsequent allocation request. Although this may well be the next allocation request, it actually depends on a complex set of conditions that could result in a fault at a much later point in time, in a completely unrelated section of the program, especially when small blocks of memory are involved.

Underrun errors occur when the program writes before the start of the allocated block. Often they corrupt the header of the block itself, and sometimes, the preceding block in memory. Underrun errors usually result in a fault that occurs when the program attempts to release a corrupted block.

Releasing memory

In order to release memory, your program must track the pointer for the allocated block and pass it to the free() function. If the pointer is stale, or if it doesn't point to the exact start of the allocated block, it may result in heap corruption.

A pointer is stale when it refers to a block of memory that's already been released. A duplicate request to free() involves passing free() a stale pointer — there's no way to know whether this pointer refers to unallocated memory, or to memory that's been used to satisfy an allocation request in another part of the program.

Passing a stale pointer to free() may result in a fault in the allocator, or worse, it may release a block that's been used to satisfy another allocation request. If this happens, the code making the allocation request may compete with another section of code that subsequently allocated the same region of heap, resulting in corrupted data for one or both. The most effective way to avoid this error is to NULL out pointers when the block is released, but this is uncommon, and difficult to do when pointers are aliased in any way.

A second common source of errors is to attempt to release an interior pointer (i.e., one that's somewhere inside the allocated block rather than at the beginning). This isn't a legal operation, but it may occur when the pointer has been used in conjunction with pointer arithmetic. The result of providing an interior pointer is highly dependent on the allocator and is largely unpredictable, but it frequently results in a fault in the free() call.

A more rare source of errors is to pass an uninitialized pointer to free(). If the uninitialized pointer is an automatic (stack) variable, it may point to a heap buffer, causing the types of coherency problems described for duplicate free() requests above. If the pointer contains some other non-NULL value, it may cause a fault in the allocator.

Using uninitialized or stale pointers

If you use uninitialized or stale pointers, you might corrupt the data in a heap buffer that's allocated to another part of the program, or see memory overrun or underrun errors.

Detecting and reporting errors

The primary goal for detecting heap corruption problems is to correctly identify the source of the error, to avoid getting a fault in the allocator at some later point in time.

A first step to achieving this goal is to create an allocator that's able to determine whether the heap was corrupted on every entry into the allocator, whether it's for an allocation request or for a release request. For example, on a release request, the allocator should be capable of determining whether:

To achieve this goal, we'll use a replacement library for the allocator that can keep additional block information in the header of every heap buffer. You can use this library while testing the application to help isolate any heap corruption problems. When this allocator detects a source of heap corruption, it can print an error message indicating:

The library technique can be refined to also detect some of the sources of errors that may still elude detection, such as memory overrun or underrun errors, that occur before the corruption is detected by the allocator. This may be done when the standard libraries are the vehicle for the heap corruption, such as an errant call to memcpy(), for example. In this case, the standard memory manipulation functions and string functions can be replaced with versions that make use of the information in the debugging allocator library to determine if their arguments reside in the heap, and whether they would cause the bounds of the heap buffer to be exceeded. Under these conditions, the function can then call the error-reporting functions to provide information about the source of the error.

Using the malloc debug library

The malloc debug library provides the capabilities described in the above section. It's available when you link to either the normal memory allocator library, or to the debug library:

To access: Link using this option:
Nondebug library -lmalloc
Debug library -lmalloc_g

If you use the debug library, you must also include /usr/lib/malloc_g as the first entry of your LD_LIBRARY_PATH environment variable before running your application.


Note: This library was updated in QNX Momentics 6.3.0 SP2.

Another way to use the debug malloc library is to use the LD_PRELOAD capability to the dynamic loader. The LD_PRELOAD environment variable lets you specify libraries to load prior to any other library in the system. In this case, set the LD_PRELOAD variable to point to the location of the debug malloc library (or the nondebug one as the case may be), by saying:

LD_PRELOAD=/usr/lib/malloc_g/libmalloc.so.2

or:

LD_PRELOAD=/usr/lib/libmalloc.so.2

Note: In this chapter, all references to the malloc library refer to the debug version, unless otherwise specified.

Both versions of the library share the same internal shared object name, so it's actually possible to link against the nondebug library and test using the debug library when you run your application. To do this, you must change the LD_LIBRARY_PATH as indicated above.

The nondebug library doesn't perform heap checking; it provides the same memory allocator as the system library.

By default, the malloc library provides a minimal level of checking. When an allocation or release request is performed, the library checks only the immediate block under consideration and its neighbors, looking for sources of heap corruption.

Additional checking and more informative error reporting can be done by using additional calls provided by the malloc library. The mallopt() function provides control over the types of checking performed by the library. There are also debug versions of each of the allocation and release routines that you can use to provide both file and line information during error-reporting. In addition to reporting the file and line information about the caller when an error is detected, the error-reporting mechanism prints out the file and line information that was associated with the allocation of the offending heap buffer.

To control the use of the malloc library and obtain the correct prototypes for all the entry points into it, it's necessary to include a different header file for the library. This header file is included in <malloc_g/malloc.h>. If you want to use any of the functions defined in this header file, other than mallopt(), make sure that you link your application with the debug library. If you forget, you'll get undefined references during the link.

The recommended practice for using the library is to always use the library for debug variants in builds. In this case, the macro used to identify the debug variant in C code should trigger the inclusion of the <malloc_g/malloc.h> header file, and the malloc debug library option should always be added to the link command. In addition, you may want to follow the practice of always adding an exit handler that provides a dump of leaked memory, and initialization code that turns on a reasonable level of checking for the debug variant of the program.

The malloc library achieves what it needs to do by keeping additional information in the header of each heap buffer. The header information includes additional storage for keeping doubly-linked lists of all allocated blocks, file, line, and other debug information, flags and a CRC of the header. The allocation policies and configuration are identical to the normal system memory allocation routines except for the additional internal overhead imposed by the malloc library. This allows the malloc library to perform checks without altering the size of blocks requested by the program. Such manipulation could result in an alteration of the behavior of the program with respect to the allocator, yielding different results when linked against the malloc library.

All allocated blocks are integrated into a number of allocation chains associated with allocated regions of memory kept by the allocator in arenas or blocks. The malloc library has intimate knowledge about the internal structures of the allocator, allowing it to use short cuts to find the correct heap buffer associated with any pointer, resorting to a lookup on the appropriate allocation chain only when necessary. This minimizes the performance penalty associated with validating pointers, but it's still significant.

The time and space overheads imposed by the malloc library are too great to make it suitable for use as a production library, but are manageable enough to allow them to be used during the test phase of development and during program maintenance.

What's checked?

As indicated above, the malloc library provides a minimal level of checking by default. This includes a check of the integrity of the allocation chain at the point of the local heap buffer on every allocation request. In addition, the flags and CRC of the header are checked for integrity. When the library can locate the neighboring heap buffers, it also checks their integrity. There are also checks specific to each type of allocation request that are done. Call-specific checks are described according to the type of call below.

You can enable additional checks by using the mallopt() call. For more information on the types of checking, and the sources of heap corruption that can be detected, see of Controlling the level of checking, below.

Allocating memory

When a heap buffer is allocated using any of the heap-allocation routines, the heap buffer is added to the allocation chain for the arena or block within the heap that the heap buffer was allocated from. At this time, any problems detected in the allocation chain for the arena or block are reported. After successfully inserting the allocated buffer in the allocation chain, the previous and next buffers in the chain are also checked for consistency.

Reallocating memory

When an attempt is made to resize a buffer through a call to the realloc() function, the pointer is checked for validity if it's a non-NULL value. If it's valid, the header of the heap buffer is checked for consistency. If the buffer is large enough to satisfy the request, the buffer header is modified, and the call returns. If a new buffer is required to satisfy the request, memory allocation is performed to obtain a new buffer large enough to satisfy the request with the same consistency checks being applied as in the case of memory allocation described above. The original buffer is then released.

If fill-area boundary checking is enabled (described in the “Manual Checking” section) the guard code checks are also performed on the allocated buffer before it's actually resized. If a new buffer is used, the guard code checks are done just before releasing the old buffer.

Releasing memory

This includes, but isn't limited to, checking to ensure that the pointer provided to a free() request is correct and points to an allocated heap buffer. Guard code checks may also be performed on release operations to allow fill-area boundary checking.

Controlling the level of checking

The mallopt() function call allows extra checks to be enabled within the library. The call to mallopt() requires that the application be aware that the additional checks are programmatically enabled. The other way to enable the various levels of checking is to use environment variables for each of the mallopt() options. Using environment variables lets you specify options that will be enabled from the time the program runs, as opposed to only when the code that triggers these options to be enabled (i.e., the mallopt() call) is reached. For certain programs that perform a lot of allocations before main(), setting options using mallopt() calls from main() or after that may be too late. In such cases, it's better to use environment variables.

The prototype of mallopt() is:

int mallopt ( int cmd, 
              int value );

The arguments are:

cmd
The command you want to use. The options used to enable additional checks in the library are:

We look at some of the other commands later in this chapter.

value
A value corresponding to the command used.

For more details, see the entry for mallopt() in the Neutrino Library Reference.

Description of optional checks

MALLOC_CKACCESS
Turn on (or off) boundary checking for memory and string operations. Environment variable: MALLOC_CKACCESS.

The value argument can be:

This helps to detect buffer overruns and underruns that are a result of memory or string operations. When on, each pointer operand to a memory or string operation is checked to see if it's a heap buffer. If it is, the size of the heap buffer is checked, and the information is used to ensure that no assignments are made beyond the bounds of the heap buffer. If an attempt is made that would assign past the buffer boundary, a diagnostic warning message is printed.

Here's how you can use this option to find an overrun error:

...
char *p;
int opt;
opt = 1;
mallopt(MALLOC_CKACCESS, opt);
p = malloc(strlen("hello"));
strcpy(p, "hello, there!");  /* a warning is generated
                                          here */
...

The following illustrates how access checking can trap a reference through a stale pointer:

...

char *p;
int opt;
opt = 1;
mallopt(MALLOC_CKACCESS, opt);
p = malloc(30);
free(p);
strcpy(p, "hello, there!");
MALLOC_FILLAREA
Turn on (or off) fill-area boundary checking that validates that the program hasn't overrun the user-requested size of a heap buffer. Environment variable: MALLOC_FILLAREA.

The value argument can be:

It does this by applying a guard code check when the buffer is released or when it's resized. The guard code check works by filling any excess space available at the end of the heap buffer with a pattern of bytes. When the buffer is released or resized, the trailing portion is checked to see if the pattern is still present. If not, a diagnostic warning message is printed.

The effect of turning on fill-area boundary checking is a little different than enabling other checks. The checking is performed only on memory buffers allocated after the point in time at which the check was enabled. Memory buffers allocated before the change won't have the checking performed.

Here's how you can catch an overrun with the fill-area boundary checking option:

...
...
int *foo, *p, i, opt;
opt = 1;
mallopt(MALLOC_FILLAREA, opt);
foo = (int *)malloc(10*4);
for (p = foo, i = 12; i > 0; p++, i--)
    *p = 89;
free(foo);  /* a warning is generated here */
MALLOC_CKCHAIN
Enable (or disable) full chain checking. This option is expensive and should be considered as a last resort when some code is badly corrupting the heap and otherwise escapes the detection of boundary checking or fill-area boundary checking. Environment variable: MALLOC_CKCHAIN.

The value argument can be:

This kind of corruption can occur under a number of circumstances, particularly when they're related to direct pointer assignments. In this case, the fault may occur before a check such as fill-area boundary checking can be applied. There are also circumstances in which both fill-area boundary checking and the normal attempts to check the headers of neighboring buffer fail to detect the source of the problem. This may happen if the buffer that's overrun is the first or last buffer associated with a block or arena. It may also happen when the allocator chooses to satisfy some requests, particularly those for large buffers, with a buffer that exactly fits the program's requested size.

Full-chain checking traverses the entire set of allocation chains for all arenas and blocks in the heap every time a memory operation (including allocation requests) is performed. This lets the developer narrow down the search for a source of corruption to the nearest memory operation.

Forcing verification

You can force a full allocation chain check at certain points while your program is executing, without turning on chain checking. Specify the following option for cmd:

MALLOC_VERIFY
Perform a chain check immediately. If an error is found, perform error handling. The value argument is ignored.

Specifying an error handler

Typically, when the library detects an error, a diagnostic message is printed and the program continues executing. In cases where the allocation chains or another crucial part of the allocator's view is hopelessly corrupted, an error message is printed and the program is aborted (via abort()).

You can override this default behavior by specifying what to do when a warning or a fatal condition is detected:

cmd
The error handler to set; one of:
MALLOC_FATAL
Specify the malloc fatal handler. Environment variable: MALLOC_FATAL.
MALLOC_WARN
Specify the malloc warning handler handler. Environment variable: MALLOC_WARN.
value
An integer value that indicates which one of the standard handlers provided by the library to use:
M_HANDLE_ABORT
Terminate execution with a call to abort().
M_HANDLE_EXIT
Exit immediately.
M_HANDLE_IGNORE
Ignore the error and continue.
M_HANDLE_CORE
Cause the program to dump a core file.
M_HANDLE_SIGNAL
Stop the program when this error occurs, by sending it a stop signal (SIGSTOP). This lets you attach to this process using a debugger. The program is stopped inside the error-handler function, and a backtrace from there should show you the exact location of the error.

If you use environment variables to specify options to the malloc library for either MALLOC_FATAL or MALLOC_WARN, you must pass the value that indicates the handler, not its symbolic name:

Handler Value
M_HANDLE_IGNORE 0
M_HANDLE_ABORT 1
M_HANDLE_EXIT 2
M_HANDLE_CORE 3
M_HANDLE_SIGNAL 4

These values are also defined in /usr/include/malloc_g/malloc-lib.h.


Note: M_HANDLE_CORE and M_HANDLE_SIGNAL were added in QNX Momentics 6.3.0 SP2.

You can OR any of these handlers with the value, MALLOC_DUMP, to cause a complete dump of the heap before the handler takes action.


Here's how you can cause a memory overrun error to abort your program:

...
int *foo, *p, i;
int opt;
opt = 1;
mallopt(MALLOC_FILLAREA,  opt);
foo = (int *)malloc(10*4);
for (p = foo, i = 12; i > 0; p++, i--)
    *p = 89;
opt = M_HANDLE_ABORT;
mallopt(MALLOC_WARN, opt);
free(foo); /* a fatal error is generated here */

Other environment variables

MALLOC_INITVERBOSE
Enable some initial verbose output regarding other variables that are enabled.
MALLOC_BTDEPTH
Set the depth of the backtrace for allocations (i.e., where the allocation occurred) on CPUs that support deeper backtrace levels. Currently the builtin-return-address feature of gcc is used to implement deeper backtraces for the debug malloc library. The default value is 0.
MALLOC_TRACEBT
Set the depth of the backtrace for errors and warnings on CPUs that support deeper backtrace levels. Currently the builtin-return-address feature of gcc is used to implement deeper backtraces for the debug malloc library. The default value is 0.
MALLOC_DUMP_LEAKS
Trigger leak detection on exit of the program. The output of the leak detection is sent to the file named by this variable.
MALLOC_TRACE
Enable tracing of all calls to malloc(), free(), calloc(), realloc(), etc. A trace of the various calls is store in the file named by this variable.
MALLOC_CKACCESS_LEVEL
Specify the level of checking performed by the MALLOC_CKACCESS option. By default, a basic level of checking is performed. By increasing the level of checking, additional things that could be errors are also flagged. For example, a call to memset() with a length of zero is normally safe, since no data is actually moved. If the arguments, however, point to illegal locations (memory references that are invalid), this normally suggests a case where there is a problem potentially lurking inside the code. By increasing the level of checking, these kinds of errors are also flagged.

Note: These environment variables were added in QNX Momentics 6.3.0 SP2.

Caveats

The debug malloc library, when enabled with various checking, uses more stack space (i.e., calls more functions, uses more local variables etc.) than the regular libc allocator. This implies that programs that explicitly set the stack size to something smaller than the default may encounter problems such as running out of stack space. This may cause the program to crash. You can prevent this by increasing the stack space allocated to the threads in question.

MALLOC_FILLAREA is used to do fill-area checking. If fill-area checking isn't enabled, the program can't detect certain types of errors. For example, if an application accesses beyond the end of a block, and the real block allocated by the allocator is larger than what was requested, the allocator won't flag an error unless MALLOC_FILLAREA is enabled. By default, this checking isn't enabled.

MALLOC_CKACCESS is used to validate accesses to the str* and mem* family of functions. If this variable isn't enabled, such accesses won't be checked, and errors aren't reported. By default, this checking isn't enabled.

MALLOC_CKCHAIN performs extensive heap checking on every allocation. When you enable this environment variable, allocations can be much slower. Also since full heap checking is performed on every allocation, an error anywhere in the heap could be reported upon entry into the allocator for any operation. For example, a call to free(x) will check block x as well as the complete heap for errors before completing the operation (to free block x). So any error in the heap will be reported in the context of freeing block x, even if the error itself isn't specifically related to this operation.

When the debug library reports errors, it doesn't always exit immediately; instead it continues to perform the operation that causes the error, and corrupts the heap (since the operation that raises the warning is actually an illegal operation). You can control this behavior by using the MALLOC_WARN and MALLOC_FATAL handler described earlier. If specific handlers are not provided, the heap will be corrupted and other errors could result and be reported later because of the first error. The best solution is to focus on the first error and fix it before moving onto other errors. See the description of MALLOC_CKCHAIN for more information on how these errors may end up getting reported.

Although the debug malloc library allocates blocks to the process using the same algorithms as the standard allocator, the library itself requires additional storage to maintain block information, as well as to perform sanity checks. This means that the layout of blocks in memory using the debug allocator is slightly different than with the standard allocator.

If you use certain optimization options such as -O1, -O2, or -O3, the debug malloc library won't work correctly because these options make gcc use builtin versions of some functions, such as strcpy() and strcmp(). Use the -fno-builtin option to prevent this.

Manual checking (bounds checking)

There are times when it may be desirable to obtain information about a particular heap buffer or print a diagnostic or warning message related to that heap buffer. This is particularly true when the program has its own routines providing memory manipulation and you wish to provide bounds checking. This can also be useful for adding additional bounds checking to a program to isolate a problem such as a buffer overrun or underrun that isn't associated with a call to a memory or string function.

In the latter case, rather than keeping a pointer and performing direct manipulations on the pointer, the program may define a pointer type that contains all relevant information about the pointer, including the current value, the base pointer, and the extent of the buffer. Access to the pointer can then be controlled through macros or access functions. The access functions can perform the necessary bounds checks and print a warning message in response to attempts to exceed the bounds.

Any attempt to dereference the current pointer value can be checked against the boundaries obtained when the pointer was initialized. If the boundary is exceeded, you should call the malloc_warning() function to print a diagnostic message and perform error handling. The prototype is:

void malloc_warning( const char *funcname,
                     const char *file,
                     int line,
                     const void *link );

Getting pointer information

You can use the following to obtain information about the pointer:

find_malloc_ptr()
void* find_malloc_ptr ( const void* ptr,
                        arena_range_t* range );
    

This function finds information about the heap buffer containing the given C pointer, including the type of allocation structure it's contained in and the pointer to the header structure for the buffer. The function returns a pointer to the Dhead structure associated with this particular heap buffer. You can use the returned pointer with the DH_*() macros to obtain more information about the heap buffer. If the pointer doesn't point into the range of a valid heap buffer, the function returns NULL.

Memory leaks

The ability of the malloc library to keep full allocation chains of all the heap memory allocated by the program — as opposed to just accounting for some heap buffers — allows heap memory leaks to be detected by the library in response to requests by the program. Leaks can be detected in the program by performing tracing on the entire heap. This is described in the sections that follow.

Tracing

Tracing is an operation that attempts to determine whether a heap object is reachable by the program. In order to be reachable, a heap buffer must be available either directly or indirectly from a pointer in a global variable or on the stack of one of the threads. If this isn't the case, then the heap buffer is no longer visible to the program and can't be accessed without constructing a pointer that refers to the heap buffer — presumably by obtaining it from a persistent store such as a file or a shared memory object.

The set of global variables and stack for all threads is called the root set. Because the root set must be stable for tracing to yield valid results, tracing requires that all threads other than the one performing the trace be suspended while the trace is performed.

Tracing operates by constructing a reachability graph of the entire heap. It begins with a root set scan that determines the root set comprising the initial state of the reachability graph. The roots that can be found by tracing are:

Once the root set scan is complete, tracing initiates a mark operation for each element of the root set. The mark operation looks at a node of the reachability graph, scanning the memory space represented by the node, looking for pointers into the heap. Since the program may not actually have a pointer directly to the start of the buffer — but to some interior location — and it isn't possible to know which part of the root set or a heap object actually contains a pointer, tracing utilizes specialized techniques for coping with ambiguous roots. The approach taken is described as a conservative pointer estimation since it assumes that any word-sized object on a word-aligned memory cell that could point to a heap buffer or the interior of that heap buffer actually points to the heap buffer itself.

Using conservative pointer estimation for dealing with ambiguous roots, the mark operation finds all children of a node of the reachability graph. For each child in the heap that's found, it checks to see whether the heap buffer has been marked as referenced. If the buffer has been marked, the operation moves on to the next child. Otherwise, the trace marks the buffer, and then recursively initiates a mark operation on that heap buffer.

The tracing operation is complete when the reachability graph has been fully traversed. At this time every heap buffer that's reachable will have been marked, as could some buffers that aren't actually reachable, due to the conservative pointer estimation. Any heap buffer that hasn't been marked is definitely unreachable, constituting a memory leak. At the end of the tracing operation, all unmarked nodes can be reported as leaks.

Causing a trace and giving results

A program can cause a trace to be performed and memory leaks to be reported by calling the malloc_dump_unreferenced() function provided by the library:

int malloc_dump_unreferenced ( int fd, 
                               int detail );

Suspend all threads, clear the mark information for all heap buffers, perform the trace operation, and print a report of all memory leaks detected. All items are reported in memory order.

fd
The file descriptor on which the report should be produced.
detail
How the trace operation should deal with any heap corruption problems it encounters:
1
Any problems encountered can be treated as fatal errors. After the error encountered is printed, abort the program. No report is produced.
0
Print case errors, and a report based on whatever heap information is recoverable.

Analyzing dumps

The dump of unreferenced buffers prints out one line of information for each unreferenced buffer. The information provided for a buffer includes:

File and line information is available if the call to allocate the buffer was made using one of the library's debug interfaces. Otherwise, the return address of the call is reported in place of the line number. In some circumstances, no return address information is available. This usually indicates that the call was made from a function with no frame information, such as the system libraries. In such cases, the entry can usually be ignored and probably isn't a leak.

From the way tracing is performed, we can see that some leaks may escape detection and may not be reported in the output. This happens if the root set or a reachable buffer in the heap has something that looks like a pointer to the buffer.

Likewise, each reported leak should be checked against the suspected code identified by the line or call return address information. If the code in question keeps interior pointers — pointers to a location inside the buffer, rather than the start of the buffer — the trace operation will likely fail to find a reference to the buffer. In this case, the buffer may well not be a leak. In other cases, there is almost certainly a memory leak.

Compiler support

The gcc compiler has a feature called Mudflap that adds extra code to the compiled program to check for buffer overruns. Mudflap slows a program's performance, so you should use it while testing, and turn it off in the production version. In C++ programs, you can also use the techniques described below.

C++ issues

In place of a raw pointer, C++ programs can make use of a CheckedPtr template that acts as a smart pointer. The smart pointer has initializers that obtain complete information about the heap buffer on an assignment operation and initialize the current pointer position. Any attempt to dereference the pointer causes bounds checking to be performed and prints a diagnostic error in response an attempt to dereference a value beyond the bounds of the buffer. The CheckedPtr template is provided in the <malloc_g/malloc.h> header for C++ programs.

You can modify the checked pointer template provided for C++ programs to suit the needs of the program. The bounds checking performed by the checked pointer is restricted to checking the actual bounds of the heap buffer, rather than the program requested size.

For C programs it's possible to compile individual modules that obey certain rules with the C++ compiler to get the behavior of the CheckedPtr template. C modules obeying these rules are written to a dialect of ANSI C that can be referred to as Clean C.

Clean C

The Clean C dialect is that subset of ANSI C that is compatible with the C++ language. Writing Clean C requires imposing coding conventions to the C code that restrict use to features that are acceptable to a C++ compiler. This section provides a summary of some of the more pertinent points to be considered. It is a mostly complete but by no means exhaustive list of the rules that must be applied.

To use the C++ checked pointers, the module including all header files it includes must be compatible with the Clean C subset. All the system headers for Neutrino as well as the <malloc_g/malloc.h> header satisfy this requirement.

The most obvious aspect to Clean C is that it must be strict ANSI C with respect to function prototypes and declarations. The use of K&R prototypes or definitions isn't allowed in Clean C. Similarly, you can't use default types for variable and function declarations.

Another important consideration for declarations is that you must provide forward declarations when referencing an incomplete structure or union. This frequently occurs for linked data structures such as trees or lists. In this case, the forward declaration must occur before any declaration of a pointer to the object in the same or another structure or union. For example, you could declare a list node as follows:

  
struct ListNode;
struct ListNode {
   struct ListNode *next;
   void *data;
};

Operations on void pointers are more restrictive in C++. In particular, implicit coercions from void pointers to other types aren't allowed, including both integer types and other pointer types. You must explicitly cast void pointers to other types.

The use of const should be consistent with C++ usage. In particular, pointers that are declared as const must always be used in a compatible fashion. You can't pass const pointers as non-const arguments to functions unless you typecast the const away.

C++ example

Here's how you could use checked pointers in the overrun example given earlier to determine the exact source of the error:

typedef CheckedPtr<int> intp_t;
...
intp_t foo, p;
int i;
int opt;
opt = 1;
mallopt(MALLOC_FILLAREA, opt);
foo = (int *)malloc(10*4);
opt = M_HANDLE_ABORT;
mallopt(MALLOC_WARN, opt);
for (p = foo, i = 12; i > 0; p++, i--)
    *p = 89; /* a fatal error is generated here */
opt = M_HANDLE_IGNORE;
mallopt(MALLOC_WARN, opt);
free(foo);