Defragmenting physical memory

Most computer users are familiar with the concept of disk fragmentation, whereby over time, the free space on the disk is split into small blocks scattered among the in-use blocks. A similar problem occurs as the OS allocates and frees pieces of physical memory; as time passes, the system's physical memory can become fragmented. Eventually, even though there might be a significant amount of memory free in total, it's so fragmented that a request for a large piece of contiguous memory fails.

Contiguous memory is often required for device drivers if the device uses DMA. The normal workaround is to ensure that all device drivers are initialized early (before memory is fragmented) and that they hold onto their memory. This is a harsh restriction, particularly for embedded systems that might want to use different drivers depending on the actions of the user; starting all possible device drivers simultaneously may not be feasible.

The algorithms that QNX Neutrino uses to allocate physical memory help to significantly reduce the amount of fragmentation that occurs. However, no matter how smart these algorithms might be, specific application behavior can result in fragmented free memory. Consider a completely degenerate application that routinely allocates 8 KB of memory and then frees half of it. If such an application runs long enough, it will reach a point where half of the system memory is free, but no free block is larger than 4 KB.
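The degenerate case above can be reproduced with a small simulation. This is only a sketch under stated assumptions: the 256-KB memory size, the first-fit allocator, and the choice to free the first half of each 8-KB allocation are all hypothetical, and none of them reflects how the QNX Neutrino allocator actually works.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_QUANTA 64u   /* hypothetical: 64 x 4 KB = 256 KB of physical memory */

/* One entry per 4-KB quantum: 1 = in use, 0 = free. */
static unsigned char in_use[NUM_QUANTA];

/* First-fit search for `count` contiguous free quanta.
   Marks them in use and returns the index of the first one, or -1. */
int alloc_contig(size_t count)
{
    size_t run = 0;
    assert(count > 0 && count <= NUM_QUANTA);
    for (size_t i = 0; i < NUM_QUANTA; i++) {
        run = in_use[i] ? 0 : run + 1;
        if (run == count) {
            size_t start = i + 1 - count;
            for (size_t j = start; j <= i; j++)
                in_use[j] = 1;
            return (int)start;
        }
    }
    return -1;
}

/* Total number of free quanta. */
size_t free_quanta(void)
{
    size_t n = 0;
    for (size_t i = 0; i < NUM_QUANTA; i++)
        n += !in_use[i];
    return n;
}

/* Length, in quanta, of the longest run of contiguous free quanta. */
size_t largest_free_run(void)
{
    size_t best = 0, run = 0;
    for (size_t i = 0; i < NUM_QUANTA; i++) {
        run = in_use[i] ? 0 : run + 1;
        if (run > best)
            best = run;
    }
    return best;
}

/* Degenerate workload: allocate 8 KB (two quanta) and free the first
   half, repeating until no 8-KB allocation can be satisfied. */
void run_degenerate_workload(void)
{
    int start;
    while ((start = alloc_contig(2)) >= 0)
        in_use[start] = 0;
}
```

After run_degenerate_workload() returns, free_quanta() reports half of the simulated memory as free, yet largest_free_run() is a single quantum, so a further request for 8 KB of contiguous memory fails.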

Thus, no matter how good our allocation routines are at avoiding fragmentation, in order to satisfy a request for contiguous memory, it may be necessary to run some form of defragmentation algorithm.

The term "fragmentation" can apply to both in-use memory and free memory:

Memory that's in use by an application is considered to be fragmented if it's discontiguous (that is, a large allocation is satisfied with a number of smaller blocks of memory from different locations in the physical address map).
Free memory is considered to be fragmented if it consists of small blocks separated by blocks of memory that are in use.

In disk-based filesystems, fragmentation of in-use blocks is most important, as it impacts the read and write performance of the device. Fragmentation of free blocks is important only in that it leads to fragmentation of in-use blocks as new blocks are allocated. In general, users of disk-based systems don't care about allocating contiguous blocks, except as it impacts performance.

For the QNX Neutrino memory system, both forms of fragmentation are important but for different reasons:

If in-use memory is fragmented, it prevents the memory subsystem from using large page sizes to map the memory, which in turn leads to poorer performance than might otherwise occur. (Some architectures don't support large page sizes; on these architectures, fragmentation of in-use memory is irrelevant.)
If free memory is fragmented, it prevents an application from allocating contiguous memory, which in turn might lead to complete failure of the application.
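To illustrate why fragmented in-use memory rules out large pages, the following sketch checks whether a group of 4-KB blocks could be covered by one large-page mapping. The 64-KB large-page size and the can_use_large_page() helper are assumptions made for illustration; the page sizes actually available depend on the architecture.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define QUANTUM_SIZE     4096u
#define LARGE_PAGE_SIZE  65536u   /* hypothetical large-page size */
#define QUANTA_PER_LARGE (LARGE_PAGE_SIZE / QUANTUM_SIZE)

/* paddr[i] is the physical address backing the i-th 4-KB block of a
   virtual range that is assumed to be aligned to the large-page size.
   Returns true if a single large-page mapping could cover the range:
   the blocks must be physically contiguous and the first one must be
   aligned to the large-page size. */
bool can_use_large_page(const uint64_t *paddr, size_t count)
{
    assert(paddr != NULL);
    if (count != QUANTA_PER_LARGE)
        return false;
    if (paddr[0] % LARGE_PAGE_SIZE != 0)
        return false;
    for (size_t i = 1; i < count; i++) {
        if (paddr[i] != paddr[0] + i * QUANTUM_SIZE)
            return false;
    }
    return true;
}
```

A range whose blocks are scattered across the physical address map fails this check and has to be mapped one 4-KB page at a time.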

To defragment free memory, the memory manager swaps memory that's in use for memory that's free, in such a way that the free memory blocks coalesce into larger blocks that are sufficient to satisfy a request for contiguous memory.

When an application allocates memory, it's provided by the operating system in quanta, 4-KB blocks of memory that exist on 4-KB boundaries. The operating system programs the MMU so that the application can reference the physical block of memory through a virtual address; during operation, the MMU translates a virtual address into a physical address.

For example, a request for 16 KB of memory is satisfied by allocating four 4-KB quanta. The operating system sets aside the four physical blocks for the application and configures the MMU to ensure that the application can reference them through a contiguous 16-KB range of virtual addresses. However, these blocks might not be physically contiguous; the operating system can arrange the MMU configuration (the virtual-to-physical mapping) so that non-contiguous physical addresses are accessed through contiguous virtual addresses.
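The translation described above can be modeled with a toy, single-level table. This is a simplified sketch, not a description of any real MMU: the struct region type, the translate() function, and the addresses used with them are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define QUANTUM_SIZE 4096u

/* Hypothetical mapping for one contiguous virtual region. */
struct region {
    uintptr_t       vbase;    /* first virtual address of the region */
    size_t          nquanta;  /* number of 4-KB quanta in the region */
    const uint64_t *frames;   /* physical base address of each quantum */
};

/* Translate a virtual address to a physical address the way an MMU
   would: pick the quantum from the upper bits and keep the offset
   within the quantum. Returns UINT64_MAX for an unmapped address. */
uint64_t translate(const struct region *r, uintptr_t vaddr)
{
    assert(r != NULL);
    if (vaddr < r->vbase ||
        vaddr - r->vbase >= r->nquanta * QUANTUM_SIZE)
        return UINT64_MAX;
    size_t index  = (vaddr - r->vbase) / QUANTUM_SIZE;
    size_t offset = (vaddr - r->vbase) % QUANTUM_SIZE;
    return r->frames[index] + offset;
}
```

With frames set to, say, { 0x7000, 0x2000, 0x9000, 0x4000 }, the application sees one unbroken 16-KB range even though consecutive virtual pages land in unrelated physical blocks.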

The task of defragmentation consists of changing existing memory allocations and mappings to use different underlying physical pages. By swapping around the underlying physical quanta, the OS can consolidate the fragmented free blocks into contiguous runs. However, it's careful to avoid moving certain types of memory where the virtual-to-physical mapping can't safely be changed:

Memory allocated by the kernel and addressed through the one-to-one mapping area can't be moved, because the one-to-one mapping area defines the mapping of virtual to physical addresses, and the OS can't change the physical address without also changing the virtual address.
Memory that's locked by the application (see mlock() and mlockall()) can't be moved: by locking the memory, the application is indicating that moving the memory isn't acceptable.
An application that runs with I/O privileges (see the _NTO_TCTL_IO flag for ThreadCtl()) has all pages locked by default, because device drivers often require physical addresses.
Pages of memory that have mutex objects on them aren't currently moved. While it's possible to move these pages, mutex objects are registered with the kernel through their physical addresses, so moving a page with a mutex on it would require rehashing the mutex object in the kernel.

There are other times when memory can't be moved; see "Automatically marking memory as unmovable," below.
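As a rough illustration of the idea, the sketch below frees a contiguous run by swapping movable in-use quanta with free quanta elsewhere, skipping any window that contains unmovable memory. It is an assumption-laden toy: the enum state values, the 16-quantum memory, and the simple sliding-window search are hypothetical and are not the algorithm that the QNX Neutrino memory manager uses.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_QUANTA 16u   /* hypothetical, tiny physical memory */

enum state { FREE, MOVABLE, UNMOVABLE };

/* Try to make `count` contiguous quanta free by relocating movable
   quanta. Returns the index of the first freed quantum, or -1 if no
   window can be cleared. */
int defragment(enum state mem[NUM_QUANTA], size_t count)
{
    assert(count > 0 && count <= NUM_QUANTA);
    for (size_t start = 0; start + count <= NUM_QUANTA; start++) {
        size_t to_move = 0, spare = 0;
        int blocked = 0;
        for (size_t i = 0; i < NUM_QUANTA; i++) {
            int inside = (i >= start && i < start + count);
            if (inside && mem[i] == UNMOVABLE)
                blocked = 1;            /* locked, kernel, mutex pages... */
            if (inside && mem[i] == MOVABLE)
                to_move++;
            if (!inside && mem[i] == FREE)
                spare++;
        }
        if (blocked || to_move > spare)
            continue;                   /* this window can't be cleared */

        /* Swap each in-use quantum in the window with a free quantum
           outside it. A real memory manager would also copy the page
           contents and update the virtual-to-physical mapping. */
        size_t dest = 0;
        for (size_t i = start; i < start + count; i++) {
            if (mem[i] != MOVABLE)
                continue;
            while ((dest >= start && dest < start + count) ||
                   mem[dest] != FREE)
                dest++;
            mem[dest] = MOVABLE;
            mem[i] = FREE;
        }
        return (int)start;
    }
    return -1;
}
```

The total amount of in-use memory is unchanged by the call; only its placement differs, which is why the virtual addresses seen by applications can stay the same.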

Defragmentation is done, if necessary, when an application allocates a piece of contiguous memory. The application does this through the mmap() call, specifying the MAP_PHYS | MAP_ANON flags. If it isn't possible to satisfy a MAP_PHYS allocation with contiguous memory, what happens depends on whether defragmentation is disabled or enabled:

If it's disabled, mmap() fails.
If it's enabled, the memory manager runs a memory-defragmentation algorithm that attempts to rearrange memory mappings across the system in order to allow the MAP_PHYS allocation to be satisfied.
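A request of this kind might look like the following sketch. The alloc_contiguous() and free_contiguous() wrappers are hypothetical names. MAP_PHYS and NOFD are QNX definitions; the fallback macros are an assumption added only so that the sketch also compiles on other POSIX systems, where the resulting mapping is not guaranteed to be physically contiguous. MAP_PRIVATE is likewise included as an assumption, for the same portability reason.

```c
#define _DEFAULT_SOURCE   /* exposes MAP_ANON on glibc; no effect on QNX */
#include <stddef.h>
#include <sys/mman.h>

/* Assumption: these fallbacks exist only for non-QNX builds. With
   MAP_PHYS defined as 0, the mapping is an ordinary anonymous one. */
#ifndef MAP_PHYS
#define MAP_PHYS 0
#endif
#ifndef NOFD
#define NOFD (-1)
#endif

/* Ask for `len` bytes of physically contiguous, anonymous memory.
   Returns NULL if the request can't be satisfied, for example when
   free memory is fragmented and defragmentation is disabled. If
   defragmentation is enabled, the calling thread may block here
   while the memory manager rearranges mappings. */
void *alloc_contiguous(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANON | MAP_PHYS, NOFD, 0);
    return (p == MAP_FAILED) ? NULL : p;
}

/* Release memory obtained from alloc_contiguous(). */
void free_contiguous(void *p, size_t len)
{
    if (p != NULL)
        munmap(p, len);
}
```

A driver would typically check the return value for NULL and either retry later or report the failure, because success depends on the state of physical memory at the time of the call.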

Note: During memory defragmentation, the thread calling mmap() is blocked. Defragmentation can take a significant amount of time (particularly on systems with large amounts of memory), but other system activities are mostly unaffected.

Since other system tasks are running simultaneously, the defragmentation algorithm takes into account that memory mappings can change while the algorithm is running.

Defragmenting is enabled by default. You can disable it by using the procnto command-line option -m~d, and enable it by using the -md option.