Handling mmap() messages

Most of the time, you don't need to provide a handler for _IO_MMAP messages because the default handling is sufficient.

When a process calls mmap(), the function sends a _MEM_MAP message to the OS's memory manager, which then sends an _IO_MMAP message to the appropriate resource manager (most likely for a filesystem). The resource manager replies with the information that the memory manager needs, and then the memory manager replies to the original process.

The default handler for _IO_MMAP messages is iofunc_mmap_default(). There's also an extended version, iofunc_mmap_default_ext().

You might provide your own handler in various cases:

  • If your resource doesn't support memory mapping, you could write a handler that simply returns ENOSYS via _RESMGR_STATUS().
  • If your resource manager can provide stat information for the memory manager, so that it won't have to send an additional message to your resource manager to get it, use iofunc_mmap_default_ext() as the handler, as described below.
  • If you want to provide more security for the shared memory object, you can use the _IO_MMAP handler to do so. Some methods of securing the object aren't specific to resource managers (see Secure Buffer Management in the Shared Memory chapter of the QNX OS Programmer's Guide), and the handler can add to them.

Describing the physical layout of a shared memory object

The default _IO_MMAP handler replies with a fixed set of information: the protection bits, the offset within the file, the connection ID of the file from the memory manager, and a file descriptor (currently unused):
struct _io_mmap_reply {
    uint32_t        zero;
    uint32_t        allowed_prot;
    uint64_t        offset;
    int32_t         coid;
    int32_t         fd;
};

typedef union {
    struct _io_mmap         i;
    struct _io_mmap_reply   o;
} io_mmap_t;

The QNX OS provides an extended reply structure that stores information to help with:

  • management of the memory manager cache that maps the file
  • avoiding multiple messages to the resource manager on a single _IO_MMAP transaction by including stat/fstatvfs information
  • allowing the resource manager to specify an open object to map instead of relying on read/write functions

The extended reply structure, _io_mmap_reply_ext, and the _io_mmap_reply_ext_stat union that it embeds are defined as follows:
union _io_mmap_reply_ext_stat {
    struct stat stat;
    struct __stat_t32_2001  t32_2001;
    struct __stat_t32_2008  t32_2008;
    struct __stat_t64_2008  t64_2008;
#if __PTR_BITS__ == 32
    struct __stat_t32_2001  preferred;
#else
    struct __stat_t64_2008  preferred;
#endif
};

struct _io_mmap_reply_ext {
    struct _io_mmap_reply   base;
    struct {
        _Uint32t        flags;
        _Uint32t        zero32;
        _Uint64t        zero64[4];
        union _io_mmap_reply_ext_stat stat;
    } extended;
};

typedef union {
    struct _io_mmap             i;
    struct _io_mmap_reply_ext   o;
} io_mmap_ext_t;

The flags field lets the resource manager pass back small bits of information to the memory manager. The memory manager sets the flags field to 0 before sending the message; if the resource manager is using an extended reply, it must set _IO_MMAP_REPLY_FLAGS_USE_EXTENDED in the flags.

The zero32 and zero64 fields are reserved for future extension and must be set to zero. The stat field can hold any of the supported formats of the stat data structure; the information should be in the TS64/2008 format.

For more information about the _io_mmap_reply_ext_stat structure, see the entry for iofunc_mmap_ext() in the C Library Reference.

The message sent to resource managers has the same outgoing format, but the reply part is larger; resource managers that know about the extension can examine the size of the reply to know if they're communicating with a version of procnto that supports extended replies. For example:
struct _msg_info info;

// MsgInfo() returns -1 and sets errno on failure
if (MsgInfo(ctp->rcvid, &info) == -1) {
    return errno;
}

switch (info.dstmsglen) {
case sizeof(struct _io_mmap_reply_ext):
    // procnto supports extended replies
    return do_some_extended_handling(...);
case sizeof(struct _io_mmap_reply):
    // older procnto: basic reply only
    return do_old_style_handling(...);
default:
    // unexpected reply size
    return EBADMSG;
}

The extended reply could be used for the following:

Cache invalidation and reuse
Without the extended reply, the memory manager relies on a file's mtime to decide whether to invalidate any existing cache entry for the file. This is problematic because any user on the system who has access to the file can change its mtime.

Instead, a filesystem could track if a file has been modified or not (including forcibly modifying the mtime) and tell the memory manager to invalidate the cache or not via the following bits in the flags field:

  • _IO_MMAP_REPLY_FLAGS_CACHE_DEFAULT
  • _IO_MMAP_REPLY_FLAGS_CACHE_FORCE_INVALIDATE
  • _IO_MMAP_REPLY_FLAGS_CACHE_READ_FROM_CACHE
  • _IO_MMAP_REPLY_FLAGS_CACHE_DEFER_TO_MEMMGR

If the resource manager asks for the default behavior or defers the decision to the memory manager, the memory manager uses the old behavior (relying on mtime).

Avoiding extra messages to the resource manager following the _IO_MMAP message
The memory manager needs some of the stat information for a file, and thus calls fstat(), which sends another message to the resource manager. It also needs one bit of information that it gets by calling fstatvfs() (yet another message). The resource manager can instead send that information back as part of the extended reply and use some bits in the flags field to indicate what it's returning:

  • _IO_MMAP_REPLY_FLAGS_BYPASS_FSTATVFS
  • _IO_MMAP_REPLY_FLAGS_REMOVABLE

The resource manager can use the _IO_MMAP_REPLY_FLAGS_STAT_FORM_TO_FLAGS(form) macro to indicate the type of stat structure that it provided. Specify _STAT_FORM_T64_2008 or _STAT_FORM_PREFERRED for form.

There's also an _IO_MMAP_REPLY_FLAGS_STAT_FORM(flags) macro for extracting a _STAT_FORM_* value from the flags member.

If the resource manager doesn't reply with this information, the memory manager calls fstat() and fstatvfs() as before.

We've extended the resource manager API to include functions that handle the extended reply:

iofunc_mmap_ext()
This helper function takes two arguments in addition to the ones from the regular iofunc_mmap() function: flags and stat. These allow a resource manager that wants control over the reply to specify the flags and stat contents that go directly into the reply contents.
iofunc_mmap_default_ext()
This handler function has the same signature as iofunc_mmap_default() and is thus a drop-in replacement. In addition to calling iofunc_mmap_ext(), it fills in the stat information if called on a device that's mounted. It will thus set the _IO_MMAP_REPLY_FLAGS_STAT flags with _STAT_FORM_PREFERRED.

For more information about these functions, see the C Library Reference.

Direct mapping of shared memory objects in resource managers

A resource manager that supports mmap() usually does so via page I/O: the process manager reads from and writes to the resource manager via read/write calls, caching and flushing the data as needed.

A resource manager that has shared memory objects that the memory manager looks after can use the extended reply to the _IO_MMAP message to provide file descriptors for those objects to its clients, potentially presenting the shared memory objects in a different way to the clients.

The use cases for this include the following:

  • Providing access to hardware without the client having to directly map it. For example, a resource manager could be a driver managing some portion of physical memory and a window into some registers (another physical memory range).

    The resource manager is the only one to know about the physical addresses it manages, but it presents them to its client via files of a well-known format. The clients can map and get access to these files without knowing the physical addresses.

  • Allowing a resource manager to expose a memory allocation interface. This would be done by allocating a shared memory object, backing it with some type of memory, then having the resource manager manage that memory and hand out pieces to clients. An example of this could be buffers for graphics.
Note:
This approach of sharing the shared memory object with clients doesn't use shm_create_handle(), so it doesn't take part in the policy enacted by the memory manager. The writer of a resource manager using this approach thus has to implement a policy for tracking which clients have mapped the object, which ones had it revoked, and so on.
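
As a rough illustration of the bookkeeping that the note describes, the following sketch keeps a small table of clients and whether each one currently has the object mapped or has had it revoked. Everything in it is hypothetical: the client_id key (a real resource manager would pick its own way to identify clients), the table size, and the rule that a revoked client is refused. The table records only the policy decision; actually removing a client's access is outside the scope of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_TRACKED 8   /* hypothetical fixed table size */

enum track_state { TRACK_NONE, TRACK_MAPPED, TRACK_REVOKED };

struct client_entry {
    int              client_id;   /* hypothetical client identifier */
    enum track_state state;
};

struct map_tracker {
    struct client_entry entries[MAX_TRACKED];
    size_t              count;
};

/* Return the entry for client_id, or NULL if the client is unknown. */
static struct client_entry *tracker_find(struct map_tracker *t, int client_id)
{
    for (size_t i = 0; i < t->count; i++) {
        if (t->entries[i].client_id == client_id) {
            return &t->entries[i];
        }
    }
    return NULL;
}

/* Report the recorded state; unknown clients count as TRACK_NONE. */
static enum track_state tracker_state(struct map_tracker *t, int client_id)
{
    struct client_entry *e = tracker_find(t, client_id);
    return e ? e->state : TRACK_NONE;
}

/* Record a mapping; refuse if the client was revoked or the table is full. */
static bool tracker_grant(struct map_tracker *t, int client_id)
{
    struct client_entry *e = tracker_find(t, client_id);
    if (e == NULL) {
        if (t->count == MAX_TRACKED) {
            return false;
        }
        e = &t->entries[t->count++];
        e->client_id = client_id;
        e->state = TRACK_NONE;
    }
    if (e->state == TRACK_REVOKED) {
        return false;
    }
    e->state = TRACK_MAPPED;
    return true;
}

/* Mark a known client as revoked; unknown clients are left alone. */
static void tracker_revoke(struct map_tracker *t, int client_id)
{
    struct client_entry *e = tracker_find(t, client_id);
    if (e != NULL) {
        e->state = TRACK_REVOKED;
    }
}
```

An _IO_MMAP handler could call tracker_grant() before replying and return an error when it yields false.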

For example, let's suppose we have a resource manager for somedev that has 1 MB of physical memory at 0x1000000 and 8 KB of registers at 0x2000000, each split between two virtual devices managed by the resource manager.

What the resource manager would like to do is the following:

  • Create a reference to the first virtual device at /dev/somedev/1, and the second at /dev/somedev/2.
  • Present the memory as 500 KB regions under /dev/somedev/1/mem and /dev/somedev/2/mem.
  • Present the registers as 4 KB pages under /dev/somedev/1/reg and /dev/somedev/2/reg.
  • Let clients map these, but keep some control on who has mapped what.

To achieve this, the resource manager would:

  1. Create two shared memory objects:
    struct somedev_shm { int mem; int reg; };
    struct somedev_shm shm;
    shm.mem = shm_open(SHM_ANON, O_CREAT|O_RDWR, 0600);
    shm.reg = shm_open(SHM_ANON, O_CREAT|O_RDWR, 0600);
    
  2. Back the objects with the two different physical ranges:
    #define KB(x) ((x)   << 10)
    #define MB(x) (KB(x) << 10)
    shm_ctl(shm.mem, SHMCTL_PHYS, 0x1000000, MB(1));
    shm_ctl(shm.reg, SHMCTL_PHYS, 0x2000000, KB(8));
    
  3. Present these objects to the clients upon mmap() requests, via the resource manager's handler for _IO_MMAP messages:
    If the request is for:    Return:
    /dev/somedev/1/reg        shm.reg, offset 0
    /dev/somedev/2/reg        shm.reg, offset 4 KB
    /dev/somedev/1/mem        shm.mem, offset 0
    /dev/somedev/2/mem        shm.mem, offset 500 KB
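
The mapping from pathnames to objects and offsets can be written down as a table and checked before it's used. The sketch below is plain C with no QNX calls; the struct region type, the helper names, and the 4 KB page size are assumptions made for illustration. It looks up a region by path and confirms that each region is page-aligned and fits inside its backing object (1 MB for memory, 8 KB for registers).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define KB(x) ((uint64_t)(x) << 10)
#define MB(x) (KB(x) << 10)
#define PAGE_SIZE_BYTES KB(4)   /* assumed page size */

enum region_kind { REGION_MEM, REGION_REG };

/* Hypothetical description of one client-visible region. */
struct region {
    const char      *path;
    enum region_kind kind;     /* which shared memory object backs it */
    uint64_t         offset;   /* offset within that object */
    uint64_t         length;
};

static const struct region regions[] = {
    { "/dev/somedev/1/reg", REGION_REG, 0,       KB(4)   },
    { "/dev/somedev/2/reg", REGION_REG, KB(4),   KB(4)   },
    { "/dev/somedev/1/mem", REGION_MEM, 0,       KB(500) },
    { "/dev/somedev/2/mem", REGION_MEM, KB(500), KB(500) },
};

/* Size of the shared memory object that backs a given kind of region. */
static uint64_t backing_size(enum region_kind kind)
{
    return kind == REGION_MEM ? MB(1) : KB(8);
}

/* A region is usable if it is page-aligned and lies inside its object. */
static bool region_is_valid(const struct region *r)
{
    bool aligned = (r->offset % PAGE_SIZE_BYTES) == 0 &&
                   (r->length % PAGE_SIZE_BYTES) == 0;
    bool fits = r->offset + r->length <= backing_size(r->kind);
    return aligned && fits;
}

/* Return the region for a path, or NULL if the path is not served. */
static const struct region *find_region(const char *path)
{
    for (size_t i = 0; i < sizeof regions / sizeof regions[0]; i++) {
        if (strcmp(regions[i].path, path) == 0) {
            return &regions[i];
        }
    }
    return NULL;
}
```

With these numbers, the two 500 KB regions occupy 1000 KB of the 1 MB object, leaving the last 24 KB unexposed.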

In order to do this, the resource manager must use the extended _IO_MMAP reply data structure (_io_mmap_reply_ext) and:

  • set the _IO_MMAP_REPLY_FLAGS_SERVER_SHMEM_OBJECT bit in the structure's extended.flags
  • set base.fd to the file descriptor of the resource manager's open shared memory object
  • set base.offset to an offset within the shared memory object, possibly based on the offset requested by the client, depending on the service provided by the resource manager

The memory manager then finds the object associated with the file descriptor and associates it with the mmap() request.

The resource manager is also given, in the requested_len member of the io_mmap_t structure, the length that was passed to mmap(), which it can use to make decisions based on the requested mapping size, if it needs to.
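
The three reply settings and the length check can be sketched together as follows. This is a portable model, not QNX code: struct sketch_mmap_reply is a reduced stand-in for struct _io_mmap_reply_ext that carries only the members touched here, SKETCH_FLAG_SERVER_SHMEM_OBJECT is a placeholder for _IO_MMAP_REPLY_FLAGS_SERVER_SHMEM_OBJECT with an arbitrary value, and struct sketch_window and the choice of ENXIO for out-of-range requests are assumptions. A real handler would also set _IO_MMAP_REPLY_FLAGS_USE_EXTENDED, as described earlier.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Placeholder value; the real flag is defined by the QNX headers. */
#define SKETCH_FLAG_SERVER_SHMEM_OBJECT 0x1u

/* Reduced stand-in for struct _io_mmap_reply_ext. */
struct sketch_mmap_reply {
    struct {
        uint64_t offset;
        int32_t  fd;
    } base;
    struct {
        uint32_t flags;
    } extended;
};

/* Hypothetical window of a server-owned shared memory object. */
struct sketch_window {
    int      shm_fd;   /* the resource manager's descriptor for the object */
    uint64_t start;    /* where the window begins within the object */
    uint64_t length;   /* how much of the object the client may map */
};

/*
 * Fill the reply for a request of requested_len bytes at client_offset
 * within the window. Returns 0 on success, or ENXIO (a policy choice in
 * this sketch) if the request would run past the end of the window.
 */
static int sketch_fill_reply(const struct sketch_window *w,
                             uint64_t client_offset,
                             uint64_t requested_len,
                             struct sketch_mmap_reply *out)
{
    /* Compare against the remaining space so the sum can't overflow. */
    if (client_offset > w->length ||
        requested_len > w->length - client_offset) {
        return ENXIO;
    }

    out->extended.flags |= SKETCH_FLAG_SERVER_SHMEM_OBJECT;
    out->base.fd = w->shm_fd;
    out->base.offset = w->start + client_offset;
    return 0;
}
```

For /dev/somedev/2/mem in the example above, the window would start at 500 KB with a length of 500 KB, so a client request at offset 4 KB maps to offset 504 KB within shm.mem.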
