Handling mmap() messages

Updated: April 19, 2023

Most of the time, you won't need to provide a handler for _IO_MMAP messages because the default handling is sufficient for most resource managers.

When a process calls mmap(), the function sends a _MEM_MAP message to the OS's memory manager, which then sends an _IO_MMAP message to the appropriate resource manager (most likely for a filesystem). The resource manager replies with the information that the memory manager needs, and then the memory manager replies to the original process.
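For context, the client side of this exchange might look like the following minimal sketch. It uses only standard POSIX calls; map_whole_file() is a hypothetical helper name, not part of any QNX API.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map a whole file read-only. Returns the mapping (and its length through
 * len_out), or NULL on failure. map_whole_file() is a hypothetical helper. */
static void *map_whole_file(const char *path, size_t *len_out)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1) {
        return NULL;
    }

    struct stat st;
    if (fstat(fd, &st) == -1 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }

    /* On QNX Neutrino, this call starts the _MEM_MAP / _IO_MMAP exchange
     * described above. */
    void *addr = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);

    /* The mapping stays valid after the descriptor is closed. */
    close(fd);

    if (addr == MAP_FAILED) {
        return NULL;
    }
    *len_out = (size_t)st.st_size;
    return addr;
}
```

The caller releases the mapping with munmap() when it's done with it.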

The default handler for _IO_MMAP messages is iofunc_mmap_default(). In QNX Neutrino 7.1 and later, there's an extended version, iofunc_mmap_default_ext().

You might provide your own handler in various cases:

Describing the physical layout of a shared memory object

The default implementation of the _IO_MMAP handler replies with a very strict set of information, including protection bits, the offset within the file, the connection ID of the file from the memory manager, and a file descriptor (currently unused):

struct _io_mmap_reply {
    uint32_t                    zero;
    uint32_t                    allowed_prot;
    uint64_t                    offset;
    int32_t                     coid;
    int32_t                     fd;
};

typedef union {
    struct _io_mmap             i;
    struct _io_mmap_reply       o;
} io_mmap_t;

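QNX Neutrino 7.1 or later provides an extended reply structure that carries additional information to help with cache invalidation and reuse, and with avoiding extra messages to the resource manager following the _IO_MMAP message.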

The _io_mmap_reply_ext_stat structure is defined as follows:

union _io_mmap_reply_ext_stat {
        struct stat stat;
        struct __stat_t32_2001  t32_2001;
        struct __stat_t32_2008  t32_2008;
        struct __stat_t64_2008  t64_2008;
#if __PTR_BITS__ == 32
        struct __stat_t32_2001  preferred;
#else
        struct __stat_t64_2008  preferred;
#endif
};

struct _io_mmap_reply_ext {
        struct _io_mmap_reply           base;
        struct {
                _Uint32t                                flags;
                _Uint32t                                zero32;
                _Uint64t                                zero64[4];
                union _io_mmap_reply_ext_stat stat;
        } extended;
};

typedef union {
        struct _io_mmap                 i;
        struct _io_mmap_reply_ext       o;
} io_mmap_ext_t;

The flags field lets the resource manager pass back small bits of information to the memory manager. The memory manager sets the flags field to 0 before sending the message; if the resource manager is using an extended reply, it must set _IO_MMAP_REPLY_FLAGS_USE_EXTENDED in the flags.

The zero32 and zero64 fields are reserved for future extension and must be set to zero. The stat field can hold any of the formats of the stat data structure. On 64-bit systems, the information should be in the TS64/2008 format, and on 32-bit systems it should be in the TS32/2001 format.
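The rules for the flags and reserved fields can be sketched as follows. This is only an illustration: the structure and the DEMO_FLAGS_USE_EXTENDED value are portable stand-ins invented for this example (the stat union is left out), so that the code compiles anywhere. A real resource manager would use struct _io_mmap_reply_ext and _IO_MMAP_REPLY_FLAGS_USE_EXTENDED from the system headers.

```c
#include <stdint.h>
#include <string.h>

/* Placeholder value for illustration only; NOT the real value of
 * _IO_MMAP_REPLY_FLAGS_USE_EXTENDED. */
#define DEMO_FLAGS_USE_EXTENDED 0x1u

/* Stand-in for the "extended" part of struct _io_mmap_reply_ext
 * (the stat union is omitted here). */
struct demo_mmap_reply_ext_part {
    uint32_t flags;
    uint32_t zero32;
    uint64_t zero64[4];
};

/* Prepare the extended part of a reply: clear everything, then announce
 * that the extended reply is in use, plus any extra bits the caller wants. */
static void demo_init_extended(struct demo_mmap_reply_ext_part *ext,
                               uint32_t extra_flags)
{
    memset(ext, 0, sizeof(*ext));   /* reserved fields must be zero */
    ext->flags = DEMO_FLAGS_USE_EXTENDED | extra_flags;
}
```

Clearing the whole structure first, rather than field by field, keeps the reserved fields zero even if more of them are added later.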

For more information about the _io_mmap_reply_ext_stat structure, see the entry for iofunc_mmap_ext() in the C Library Reference.

The message sent to resource managers has the same outgoing format, but the reply part is larger. Resource managers that know about the extension can check the size of the reply buffer to determine whether they're communicating with a version of procnto that supports extended replies. For example:

struct _msg_info info;

// MsgInfo() returns -1 and sets errno on failure
if (MsgInfo(ctp->rcvid, &info) == -1) {
    return errno;
}

// dstmsglen is the size of the client's reply buffer
switch (info.dstmsglen) {
case sizeof(struct _io_mmap_reply_ext):
    // up-to-date procnto
    return do_some_extended_handling(...);
case sizeof(struct _io_mmap_reply):
    // old procnto
    return do_old_style_handling(...);
default:
    // unexpected reply size
    return EBADMSG;
}

The extended reply could be used for the following:

Cache invalidation and reuse
Without the extended reply, the memory manager relies on the mtime information for a file to decide whether or not to invalidate a possibly existing cache entry for the file. This is problematic because the mtime is under control of any user on the system that has access to the file.

Instead, a filesystem could track if a file has been modified or not (including forcibly modifying the mtime) and tell the memory manager to invalidate the cache or not via the following bits in the flags field:

  • _IO_MMAP_REPLY_FLAGS_CACHE_DEFAULT
  • _IO_MMAP_REPLY_FLAGS_CACHE_FORCE_INVALIDATE
  • _IO_MMAP_REPLY_FLAGS_CACHE_READ_FROM_CACHE
  • _IO_MMAP_REPLY_FLAGS_CACHE_DEFER_TO_MEMMGR

If the resource manager asks for the default behavior or defers the decision to the memory manager, the memory manager uses the old behavior (relying on mtime).

Avoiding extra messages to the resource manager following the _IO_MMAP message
The memory manager needs some information found in the stat information from a file, and thus calls fstat(), which is another message to the resource manager. It also needs one bit of info found by invoking fstatvfs() (another message). The resource manager can instead send that information back as part of the extended reply and use some bits in the flags field to indicate what it's returning:
  • _IO_MMAP_REPLY_FLAGS_BYPASS_FSTATVFS
  • _IO_MMAP_REPLY_FLAGS_REMOVABLE

The resource manager can use the _IO_MMAP_REPLY_FLAGS_STAT_FORM_TO_FLAGS(form) macro to indicate the type of stat structure that it provided. Specify _STAT_FORM_T32_2001, _STAT_FORM_T32_2008, _STAT_FORM_T64_2008, or _STAT_FORM_PREFERRED for form.

There's also an _IO_MMAP_REPLY_FLAGS_STAT_FORM(flags) macro for extracting a _STAT_FORM_* value from the flags member.

If the resource manager doesn't reply with this info, the memory manager calls fstat() and fstatvfs() as before.
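As a rough illustration of the cache-invalidation decision described above, a filesystem could keep a per-file change counter and choose a cache flag from it. Everything in this sketch is hypothetical: the DEMO_CACHE_* enumerators merely stand in for the corresponding _IO_MMAP_REPLY_FLAGS_CACHE_* bits, and the bookkeeping structure is an assumption about how a filesystem might track changes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the four cache choices; a real resource manager
 * would use the _IO_MMAP_REPLY_FLAGS_CACHE_* bits from the system headers. */
enum demo_cache_choice {
    DEMO_CACHE_DEFAULT,
    DEMO_CACHE_FORCE_INVALIDATE,
    DEMO_CACHE_READ_FROM_CACHE,
    DEMO_CACHE_DEFER_TO_MEMMGR
};

/* Hypothetical per-file bookkeeping kept by the filesystem. */
struct demo_file_state {
    bool     tracked;            /* does the filesystem track changes? */
    uint64_t change_count;       /* bumped on each write, truncate, or forced mtime change */
    uint64_t count_at_last_map;  /* change_count when the last reply was sent */
};

static enum demo_cache_choice demo_pick_cache_choice(struct demo_file_state *f)
{
    if (!f->tracked) {
        /* No reliable tracking: let the memory manager fall back to mtime. */
        return DEMO_CACHE_DEFER_TO_MEMMGR;
    }
    if (f->change_count != f->count_at_last_map) {
        /* The file changed since the last mapping: any cached copy is stale. */
        f->count_at_last_map = f->change_count;
        return DEMO_CACHE_FORCE_INVALIDATE;
    }
    return DEMO_CACHE_READ_FROM_CACHE;
}
```

Because the counter is maintained by the filesystem itself, a user who resets the file's mtime can't hide a modification from this check.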

We've extended the resource manager API to include functions that handle the extended reply:

iofunc_mmap_ext()
This helper function takes two arguments in addition to the ones from the regular iofunc_mmap() function: flags and stat. These let a resource manager that wants control over the reply specify the flags and stat contents that go directly into the reply.
iofunc_mmap_default_ext()
This handler function has the same signature as iofunc_mmap_default() and is thus a drop-in replacement. In addition to calling iofunc_mmap_ext(), it fills in the stat information if called on a device that's mounted. In that case, it sets the _IO_MMAP_REPLY_FLAGS_STAT flags with _STAT_FORM_PREFERRED.

For more information about these functions, see the C Library Reference.

Direct mapping of shared memory objects in resource managers

A resource manager that supports mmap() usually does so via page I/O: the process manager reads from and writes to the resource manager via read/write calls, caching and flushing the data as needed.

A resource manager that has shared memory objects that the memory manager looks after can use the extended reply to the _IO_MMAP message to provide file descriptors for those objects to its clients, potentially presenting the shared memory objects in a different way to the clients.

The use cases for this include the following:

Note: This approach of sharing the shared memory object with clients doesn't use shm_create_handle(), so it doesn't take part in the policy enacted by the memory manager. The writer of a resource manager using this approach thus has to implement a policy for tracking which clients have mapped the object, which ones had it revoked, and so on.

For example, let's suppose we have a resource manager for somedev that has 1 MB of physical memory at 0x1000000 and 8 KB of registers at 0x2000000, with each region split between two virtual devices managed by the resource manager.

What the resource manager would like to do is the following:

To achieve this, the resource manager would:

  1. Create two shared memory objects:
    struct somedev_shm { int mem; int reg; };
    struct somedev_shm shm;
    shm.mem = shm_open(SHM_ANON, O_CREAT|O_RDWR, 0600);
    shm.reg = shm_open(SHM_ANON, O_CREAT|O_RDWR, 0600);
      
  2. Back the objects with the two different physical ranges:
    #define KB(x) ((x)   << 10)
    #define MB(x) (KB(x) << 10)
    shm_ctl(shm.mem, SHMCTL_PHYS, 0x1000000, MB(1));
    shm_ctl(shm.reg, SHMCTL_PHYS, 0x2000000, KB(8));
      
  3. Present these objects to the clients upon mmap() requests, via the resource manager's handler for _IO_MMAP messages:
    If the request is for:    Return:
    /dev/somedev/1/reg        shm.reg, offset 0
    /dev/somedev/2/reg        shm.reg, offset 4 KB
    /dev/somedev/1/mem        shm.mem, offset 0
    /dev/somedev/2/mem        shm.mem, offset 500 KB
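The mapping in step 3 could be kept in a small lookup table that the _IO_MMAP handler consults. The following is a minimal sketch; the somedev_route structure, the somedev_obj enumeration, and somedev_lookup() are hypothetical names introduced for illustration, and the paths and offsets are taken from the table above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define KB(x) ((uint64_t)(x) << 10)

/* Which of the two shared memory objects backs a device (hypothetical). */
enum somedev_obj { SOMEDEV_REG, SOMEDEV_MEM };

struct somedev_route {
    const char      *path;    /* path the client opened */
    enum somedev_obj obj;     /* shm.reg or shm.mem */
    uint64_t         offset;  /* offset of this device's slice in the object */
};

static const struct somedev_route somedev_routes[] = {
    { "/dev/somedev/1/reg", SOMEDEV_REG, 0 },
    { "/dev/somedev/2/reg", SOMEDEV_REG, KB(4) },
    { "/dev/somedev/1/mem", SOMEDEV_MEM, 0 },
    { "/dev/somedev/2/mem", SOMEDEV_MEM, KB(500) },
};

/* Return the route for a path, or NULL if the path isn't one of ours. */
static const struct somedev_route *somedev_lookup(const char *path)
{
    size_t n = sizeof(somedev_routes) / sizeof(somedev_routes[0]);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(somedev_routes[i].path, path) == 0) {
            return &somedev_routes[i];
        }
    }
    return NULL;
}
```

In a real handler, the route would more likely be stored in the OCB or attribute structure when the device is opened, so that no string comparison is needed at mmap() time.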

In order to do this, the resource manager must use the extended _IO_MMAP reply data structure (_io_mmap_reply_ext) and:

The memory manager then finds the object associated with the file descriptor and associates it with the mmap() request.

The resource manager is also given, in the requested_len member of the io_mmap_t structure, the length that was passed to mmap(), which it can use to make decisions based on the requested mapping size, if it needs to.
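One possible use of requested_len is to reject mappings that would extend past the slice of the object that belongs to a virtual device. The sketch below assumes each virtual device owns a fixed-size window (for instance, 4 KB of the 8 KB register object); somedev_request_fits() is a hypothetical helper, not a QNX function.

```c
#include <stdbool.h>
#include <stdint.h>

/* Return true if a mapping of requested_len bytes, starting offset bytes
 * into a window of window_len bytes, stays inside that window. */
static bool somedev_request_fits(uint64_t window_len, uint64_t offset,
                                 uint64_t requested_len)
{
    if (requested_len == 0 || offset > window_len) {
        return false;
    }
    /* Compare against the remaining space so that offset + requested_len
     * can't overflow. */
    return requested_len <= window_len - offset;
}
```

A handler could return an error such as EINVAL when this check fails, instead of handing out a mapping that reaches into another device's registers or memory.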