Optimizing an application after analysis
The memory-analyzing tools tell you the total memory a process is using, the sizes of its memory segments, and the history and breakdown of its heap usage. This knowledge helps you determine what programming steps are needed to reduce an application's memory footprint, which can improve performance.
Memory efficiency is often critical in embedded systems, where memory is limited (especially in the absence of swapping) and many processes need to run continuously. The optimization steps you'll want to take depend on what the analysis results reveal about memory type distribution. For example, you can spend considerable time optimizing the heap, but if your program uses more static memory than it should, that problem must be addressed as well.
Memory distribution of processes
- Code — Executable code (instructions) belonging to the application or static libraries.
- Shared Code — Executable code from shared libraries. If many processes use the same library, their virtual segments containing its code are mapped to the same physical segment.
- Data — A data segment for the application and data segments for the shared libraries. This memory type is usually referred to as static memory.
- Stack — Memory required for function stacks (there's one stack per thread).
- Heap — All memory dynamically allocated by the process.
- Shared Heap — Other memory allocated by different means, including shared and mapped memory.
The IDE has several tools for viewing process memory distribution. In the System Information perspective, the Memory Information view shows the memory breakdown by type and provides details about individual segments. Note that memory type is different from virtual memory category; the correspondence is given in How memory types relate to virtual memory categories.
You can view the heap distribution through the Malloc Information view, which displays the used, overhead, and free heap memory sizes. The Memory Analysis tool graphs this same information as well as all heap allocations and deallocations, in an interactive editor window. Through the Valgrind UI controls, you can run Massif to collect heap snapshots, then analyze the heap breakdown measured at the detailed snapshots.
After examining the memory distribution data with these tools, you should focus on the areas of high consumption for nonshared memory. Note that nonshared memory can include stack and heap memory used by shared libraries; the term covers anything not created as a shared memory object, a concept explained in the Shared memory entry of the System Architecture guide. Optimizing shared memory is unlikely to notably reduce the overall memory consumption on the target machine.
The techniques for improving memory efficiency vary greatly for different memory types. We outline some of these techniques below.
Heap optimizations
- Eliminate explicit memory leaks
- The easiest way to begin optimizing the heap is to eliminate explicit memory leaks, which occur when blocks become inaccessible because all pointers to them have been lost or overwritten; a minimal sketch follows this item. Memory Analysis lets you check for leaks at fixed intervals and outputs a list of memory errors and tags any leaks with a keyword. Valgrind Memcheck can check for specific leak types, to identify leaks resulting from incorrect pointer values or broken pointer chains.
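The sketch below illustrates an explicit leak in C; the function name and sizes are hypothetical. The only pointer to the first allocation is overwritten before the block is freed, so the block becomes unreachable, which is exactly what tools such as Memory Analysis or Valgrind Memcheck flag.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical example of an explicit leak: the only pointer to the
 * first allocation is overwritten before the block is freed. */
char *make_label(const char *text)
{
    char *label = malloc(64);
    if (label == NULL) {
        return NULL;
    }
    snprintf(label, 64, "label: %s", text);

    /* BUG: reassigning the only pointer loses the 64-byte block; it
     * stays allocated but is now unreachable (an explicit leak).
     * The fix is to free(label) before reassigning it. */
    label = strdup(text);
    return label;
}
```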
- Eliminate implicit memory leaks
- After fixing the explicit leaks, you should fix the implicit leaks: heap objects that keep growing in size but remain accessible through pointers (see the sketch below). To find such cases, Memory Analysis lets you filter the results to see only events for unmatched allocations or deallocations, or for blocks that remain in memory for the program's duration. Viewing these events lets you find places where the program is steadily accumulating memory.
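A sketch of an implicit leak, using a hypothetical cache structure: the list stays reachable through a global pointer, so leak detectors don't report it, yet it grows for the program's entire lifetime because entries are added on every request and never evicted.

```c
#include <stdlib.h>

/* Hypothetical implicit leak: the cache is always reachable, so it's
 * never reported as a leak, but it grows without bound. */
struct entry {
    struct entry *next;
    int           key;
};

static struct entry *cache;     /* global root keeps everything reachable */

void record_request(int key)
{
    struct entry *e = malloc(sizeof *e);
    if (e == NULL) {
        return;
    }
    e->key  = key;
    e->next = cache;
    cache   = e;                /* added on every call, never trimmed */
}
```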
- Reduce heap fragmentation
- Heap fragmentation occurs when a process accumulates many free blocks of varying sizes at noncontiguous addresses. In this case, the process will often allocate another physical page even though its total free heap space seems sufficient, because no single free block is large enough to satisfy the request; the sketch below illustrates the pattern.
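A simplified sketch of an allocation pattern that tends to fragment the heap (sizes and counts are illustrative, and the exact behavior depends on the allocator): small and large blocks are interleaved, then only the small ones are freed, leaving scattered holes that no medium-sized request can use.

```c
#include <stdlib.h>

#define N 128

int main(void)
{
    void *small[N], *large[N];

    /* Interleave small and large allocations. */
    for (int i = 0; i < N; i++) {
        small[i] = malloc(16);      /* small block */
        large[i] = malloc(1024);    /* large block pins the gap open */
    }

    /* Free only the small blocks: N scattered 16-byte holes remain,
     * separated by the surviving large blocks. */
    for (int i = 0; i < N; i++) {
        free(small[i]);
    }

    /* Total free space exceeds 512 bytes, but no single hole fits,
     * so the allocator may grow the heap by another page instead. */
    void *medium = malloc(512);

    free(medium);
    for (int i = 0; i < N; i++) {
        free(large[i]);
    }
    return 0;
}
```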
- Reduce the overhead of allocated objects
- There are several sources of overhead for heap-allocated objects:
- User overhead — The application might request more heap memory than it really needs. This often results from predictive algorithms, such as those used by realloc(). You can reduce this overhead by better estimating the average data size. To do this for a particular call chain, examine the related allocation backtraces in the Memory Backtrace view. Or, if your data model allows it, truncate the memory to fit into the actual size of the object, after the data growth stops.
- Padding overhead — In programs that run on processors with alignment restrictions, the fields in a struct type can get arranged in a way that makes the overall size of the structure larger than the sum of the sizes of its individual fields. You can save some space by rearranging the fields; usually, it's better to put fields of the same type together. You can measure the result by writing a sizeof test, as in the sketch after this list. Typically, this task is valuable when the resulting overall size matches a preallocated band size (see below).
- Block overhead — Sometimes there's extra space in heap blocks because the memory allocated is more than what's requested. In the Memory Analysis results, the Memory Events view shows the requested versus actual allocation sizes and the Usage tab shows what percentage of the heap is overhead (extra space). Whenever possible, choose an allocation size that matches a size for preallocated bands (you can see their sizes in the Bands tab), especially for realloc() calls. Also, if you can, try to align data structures with these band sizes.
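Here is a minimal sizeof test for padding overhead; the struct names are hypothetical, and the exact sizes depend on the target's alignment rules, so measure rather than assume. On a typical target with 4-byte int alignment, the first layout occupies 12 bytes while the rearranged one occupies 8.

```c
#include <stdio.h>

/* char, int, char: padding after each char on aligned targets. */
struct padded {
    char a;
    int  x;
    char b;
};

/* Same fields with like types grouped together: less padding. */
struct packed_by_hand {
    int  x;
    char a;
    char b;
};

int main(void)
{
    printf("padded:         %zu bytes\n", sizeof(struct padded));
    printf("packed_by_hand: %zu bytes\n", sizeof(struct packed_by_hand));
    return 0;
}
```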
- Tune the allocator
- Occasionally, application-driven data structures have fixed sizes, and you can improve memory efficiency by customizing the allocated block sizes. Or, your application may experience free-block overhead, where a lot of memory has been freed by the code but the process hasn't returned many pages to the system; this happens when heap usage never falls below the low watermark, the point at which the process returns pages. In these two cases, you must either write your own allocator (see the sketch below) or contact QNX to obtain a customizable allocator.
Code optimizations
- Ensure that the binary file is compiled without debug information when you measure it. Debug information is the largest contributor to file size.
- Strip the binary to remove any remaining symbol information.
- Remove any unused functions.
- Find and eliminate code clones.
- Try setting compiler optimization flags (e.g., -O, -O2, or -Os, which asks GCC to optimize for size). Note that there's no guarantee that the code will be smaller; it can actually be larger in some cases.
- Don't use the char type to perform int arithmetic, particularly for local variables. Converting between these types requires the compiler to insert extra code, which affects performance and code size, especially on ARM processors.
- Bit fields are also very expensive in arithmetic on all platforms; it's better to use bit arithmetic explicitly to avoid the hidden costs of conversions, as in the sketch after this list.
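A sketch contrasting bit fields with explicit masking; the flag layout is hypothetical. Every access to a bit field makes the compiler emit hidden shift/mask/insert sequences, whereas explicit masks keep the cost visible and let you combine updates.

```c
#include <stdint.h>

/* Bit-field version: each read or write of these fields generates
 * hidden shift-and-mask code. */
struct flags_bf {
    unsigned ready : 1;
    unsigned error : 1;
    unsigned count : 6;
};

/* Explicit version: one plain integer plus masks. */
#define FLAG_READY   0x01u
#define FLAG_ERROR   0x02u
#define COUNT_SHIFT  2
#define COUNT_MASK   (0x3Fu << COUNT_SHIFT)

static inline uint8_t set_count(uint8_t flags, unsigned count)
{
    /* One visible mask-and-insert, written exactly once. */
    return (uint8_t)((flags & ~COUNT_MASK) |
                     ((count << COUNT_SHIFT) & COUNT_MASK));
}
```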
Data optimizations
- Inspect global arrays that consume a lot of static memory. It may be better to use the heap, particularly for objects that aren't used throughout the program's entire lifetime; the sketch after this list shows the idea.
- Find and remove unused global variables.
- Determine if any structures have padding overhead. If so, consider rearranging their fields to achieve a smaller overall size.
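A sketch of moving a large table from static memory to the heap, assuming (hypothetically) that the table is needed only during startup. As a global array it would occupy static memory for the process's whole lifetime; on the heap it can be freed as soon as initialization finishes.

```c
#include <stdlib.h>

#define TABLE_ENTRIES (64 * 1024)

/* static int table[TABLE_ENTRIES];   <-- permanent static-memory cost */

int init_tables(void)
{
    /* Heap version: the memory exists only while it's needed. */
    int *table = malloc(TABLE_ENTRIES * sizeof *table);
    if (table == NULL) {
        return -1;
    }
    /* ... populate and consume the table during initialization ... */
    free(table);            /* memory returned once startup completes */
    return 0;
}
```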
Stack optimizations
Sometimes, it's worth the effort to optimize the stack. For example, your application may have frequent high peaks in stack activity, meaning that large stack segments constantly get mapped to physical memory. These situations can be hard to detect through conventional testing. Although the program might run properly during testing, the system could fail in the field, likely when it's busiest and needed the most.
You can watch the Memory Information view for stack allocation statistics and then locate and fix code that uses the stack heavily. Typically, heavy stack usage occurs in two situations: recursive calls, which should be avoided in embedded systems, and the use of many large local variables, such as arrays kept on the stack. The sketch below shows one way to fix the second case.
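A sketch of replacing a large stack buffer with a heap allocation; the function names and buffer size are hypothetical. The first version maps many stack pages to physical memory on every call; the second keeps the stack shallow at the cost of one malloc/free pair.

```c
#include <stdlib.h>
#include <string.h>

#define BUF_BYTES (256 * 1024)

/* Stack-heavy version: 256 KB of stack consumed on every call. */
int process_stack_heavy(void)
{
    char buf[BUF_BYTES];
    memset(buf, 0, sizeof buf);
    /* ... fill and use buf ... */
    return buf[0];
}

/* Heap-based version: the stack holds only a pointer. */
int process_heap_based(void)
{
    char *buf = malloc(BUF_BYTES);
    if (buf == NULL) {
        return -1;
    }
    memset(buf, 0, BUF_BYTES);
    /* ... fill and use buf ... */
    int result = buf[0];
    free(buf);
    return result;
}
```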