Resource Managers

This chapter includes:

What is a resource manager?
The client's view
The resource manager's view
The resource manager library
Writing a resource manager
Handler routines
Alphabetical listing of connect and I/O functions
Examples
Advanced topics
Summary

What is a resource manager?

In this chapter, we'll take a look at what you need to understand in order to write a resource manager.

A resource manager is simply a program with some well-defined characteristics. This program is called different things on different operating systems — some call them “device drivers,” “I/O managers,” “filesystems,” “drivers,” “devices,” and so on. In all cases, however, the goal of this program (which we'll just call a resource manager) is to present an abstract view of some service.

Also, since Neutrino is a POSIX-conforming operating system, it turns out that the abstraction is based on the POSIX specification.

Examples of resource managers

Before we get carried away, let's take a look at a couple of examples and see how they “abstract” some “service.” We'll look at an actual piece of hardware (a serial port) and something much more abstract (a filesystem).

Serial port

On a typical system, there usually exists some way for a program to transmit output to and receive input from a serial, RS-232-style hardware interface. This hardware interface consists of a bunch of hardware devices, including a UART (Universal Asynchronous Receiver Transmitter) chip, which knows how to convert the CPU's parallel data stream into a serial data stream and vice versa.

In this case, the “service” being provided by the serial resource manager is the capability for a program to send and receive characters on a serial port.

We say that an “abstraction” occurs, because the client program (the one ultimately using the service) doesn't know (nor does it care about) the details of the UART chip and its implementation. All the client program knows is that to send some characters it should call the fprintf() function, and to receive some characters it should call the fgets() function. Notice that we used standard, POSIX function calls to interact with the serial port.

Filesystem

As another example of a resource manager, let's examine the filesystem. This consists of a number of cooperating modules: the filesystem itself, the block I/O driver, and the disk driver.

The “service” being offered here is the capability for a program to read and write characters on some medium. The “abstraction” that occurs is the same as with the serial port example above — the client program can still use the exact same function calls (e.g., the fprintf() and fgets() functions) to interact with a storage medium instead of a serial port. In fact, the client really doesn't know or need to know which resource manager it's interacting with.
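To make the abstraction concrete, here's a minimal sketch in standard C. The helpers send_line() and recv_line() are hypothetical names of our own; they use only fprintf() and fgets(), so they neither know nor care whether the FILE * came from opening /dev/ser1 or from opening a file on a disk.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

// Hypothetical helper: sends one line of text on any stdio stream.
// Returns 0 on success, -1 on failure.
int send_line (FILE *fp, const char *text)
{
    assert (fp != NULL && text != NULL);
    if (fprintf (fp, "%s\n", text) < 0) {
        return -1;
    }
    // hand the buffered data to the underlying file descriptor now
    return fflush (fp) == 0 ? 0 : -1;
}

// Hypothetical helper: receives one line (without its newline) into buf.
// Returns buf, or NULL on end-of-file or error.
char *recv_line (FILE *fp, char *buf, int size)
{
    assert (fp != NULL && buf != NULL && size > 0);
    if (fgets (buf, size, fp) == NULL) {
        return NULL;
    }
    buf [strcspn (buf, "\n")] = '\0';   // strip the trailing newline, if any
    return buf;
}
```

On a Neutrino system you could hand these the stream that fopen() returns for /dev/ser1; to try them anywhere, a tmpfile() stream works just as well, which is exactly the point.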

Characteristics of resource managers

As we saw in our examples (above), the key to the flexibility of the resource managers is that all the functionality of the resource manager is accessed using standard POSIX function calls — we didn't use “special” functions when talking to the serial port. But what if you need to do something “special,” something very device-specific? For example, setting the baud rate on a serial port is an operation that's very specific to the serial port resource manager — it's totally meaningless to the filesystem resource manager. Likewise, setting the file position via lseek() is useful in a filesystem, but meaningless in a serial port. The solution POSIX chose for this is simple. Some functions, like lseek(), simply return an error code on a device that doesn't support them. Then there's the “catch-all” (and non-POSIX) device control function, called devctl(), that allows device-specific functionality to be provided. Devices that don't understand the particular devctl() command simply return an error, just as devices that don't understand the lseek() command would.
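Here's a small sketch of how a client can detect that "not supported here" answer using only POSIX calls. We use a pipe as our non-seekable object, because POSIX requires lseek() on a pipe to fail with ESPIPE; whether a particular serial port resource manager reports ESPIPE or some other error is up to that resource manager, so treat that part as an assumption. The function name supports_seek() is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: asks whatever is behind fd whether it can seek.
// Returns 1 if lseek() worked, 0 if it was refused with ESPIPE ("seeking
// makes no sense here"), or -1 for any other error (e.g., EBADF).
int supports_seek (int fd)
{
    // SEEK_CUR with an offset of 0 asks for the position without moving it
    if (lseek (fd, 0, SEEK_CUR) != (off_t) -1) {
        return 1;
    }
    return (errno == ESPIPE) ? 0 : -1;
}
```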

Since we've mentioned lseek() and devctl() as two common commands, it's worthwhile to note that pretty much all file-descriptor (or FILE * stream) function calls are supported by resource managers.

This naturally leads us to the conclusion that resource managers will be dealing almost exclusively with file-descriptor based function calls. Since Neutrino is a message-passing operating system, it follows that the POSIX functions get translated into messages, which are then sent to resource managers. It is this “POSIX-function to message-passing” translation trick that lets us decouple clients from resource managers. All a resource manager has to do is handle certain well-defined messages. All a client has to do is generate the same well-defined messages that the resource manager is expecting to receive and handle.

Since the interaction between clients and resource managers is based on message passing, it makes sense to make this “translation layer” as thin as possible. For example, when a client does an open() and gets back a file descriptor, the file descriptor is in fact the connection ID! This connection ID (file descriptor) gets used in the client's C library functions (such as read()) where a message is created and sent to the resource manager.
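The following sketch imitates that translation in ordinary C, purely for illustration. The message layout, the MSG_READ constant, and the functions my_read() and server_handle() are all hypothetical (the real message definitions live in <sys/iomsg.h> and look different), and a direct function call stands in for the real message-passing primitive, MsgSend().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Hypothetical message layout (NOT the real Neutrino one).
enum { MSG_READ = 1 };

struct read_msg {
    uint16_t type;      // which operation the client wants
    uint32_t nbytes;    // how many bytes the client asked for
};

// Hypothetical server: it never sees the client's code, only the message.
// This one behaves like /dev/zero: every read returns zero-filled bytes.
static long server_handle (const struct read_msg *msg, void *reply)
{
    if (msg->type != MSG_READ) {
        return -1;                      // a message type we don't handle
    }
    memset (reply, 0, msg->nbytes);
    return (long) msg->nbytes;
}

// Hypothetical client library routine: the read()-style arguments are
// packed into a message and sent over connection coid (the file
// descriptor). The direct call below stands in for MsgSend().
long my_read (int coid, void *buf, size_t nbytes)
{
    struct read_msg msg;

    assert (buf != NULL);
    (void) coid;                        // a real send would use it
    msg.type   = MSG_READ;
    msg.nbytes = (uint32_t) nbytes;
    return server_handle (&msg, buf);
}
```

Neither side needs the other's code, only the agreed-upon message layout; that's the decoupling described above.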

The client's view

We've already seen a hint of what the client expects. It expects a file-descriptor-based interface, using standard POSIX functions.

In reality, though, there are a few more things going on “under the hood.”

For example, how does the client actually connect to the appropriate resource manager? What happens in the case of union filesystems (where multiple filesystems are responsible for the same “namespace”)? How are directories handled?

Finding the server

The first thing that a client does is call open() to get a file descriptor. (Note that if the client calls the higher-level function fopen() instead, the same discussion applies — fopen() eventually calls open()).

Inside the C library implementation of open(), a message is constructed, and sent to the process manager (procnto) component. The process manager is responsible for maintaining information about the pathname space. This information consists of a tree structure that contains pathnames and node descriptor, process ID, channel ID, and handle associations:

Figure: Neutrino's namespace.

Note that in the diagram above and in the descriptions that follow, I've used the designation fs-qnx4 as the name of the resource manager that implements the QNX 4 filesystem — in reality, it's a bit more complicated, because the filesystem drivers are based on a series of DLLs that get bundled together. So, there's actually no executable called fs-qnx4; we're just using it as a placeholder for the filesystem component.

Let's say that the client calls open():

fd = open ("/dev/ser1", O_WRONLY);

In the client's C library implementation of open(), a message is constructed and sent to the process manager. This message states, “I want to open /dev/ser1; who should I talk to?”

Figure: First stage of name resolution.

The process manager receives the request and looks through its tree structure to see if there's a match (let's assume for now that we need an exact match). Sure enough, the pathname “/dev/ser1” matches the request, and the process manager is able to reply to the client: “I found /dev/ser1. It's being handled by node descriptor 0, process ID 44, channel ID 1, handle 1. Send them your request!”

Remember, we're still in the client's open() code!

So, the open() function establishes a connection to the specified node descriptor (0, meaning our node), process ID (44), and channel ID (1), and creates another message, stuffing the handle into the message itself. This message is really the “connect” message; it's the message that the client's open() library code uses to establish a connection to a resource manager (step 3 in the picture below). When the resource manager gets the connect message, it looks at it and performs validation. For example, you may have tried to open-for-write a resource manager that implements a read-only filesystem, in which case you'd get back an error (in this case, EROFS). In our example, however, the serial port resource manager looks at the request (we specified O_WRONLY; perfectly legal for a serial port) and replies back with an EOK (step 4 in the picture below).

Figure: The _IO_CONNECT message.

Finally, the client's open() returns to the client with a valid file descriptor.

Really, this file descriptor is the connection ID we just used to send a connect message to the resource manager! Had the resource manager not given us an EOK, we would have passed this error back to the client (via errno and a -1 return from open()). (It's worthwhile to note that the process manager can return the node descriptor, process ID, and channel ID of more than one resource manager in response to a name resolution request. In that case, the client tries each of them in turn until one succeeds, one returns an error that's not ENOSYS, ENOENT, or EROFS, or the client exhausts the list, in which case the open() fails. We'll discuss this further when we look at the “before” and “after” flags, later on.)
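The retry rule in that parenthetical is easy to get wrong, so here's a sketch of it. The connect_fn type, try_servers(), and the three toy servers (two of them named after the fs-cache and fs-nfs example later in this section) are hypothetical; the real logic lives inside the C library's open(). We also assume that the error reported after exhausting the list is the last one seen.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

// Hypothetical stand-in for sending a connect message to one resource
// manager: returns 0 for success, or an errno value for failure.
typedef int (*connect_fn) (const char *path, int for_write);

// Toy servers modelled on the fs-cache / fs-nfs example.
static int fs_cache (const char *path, int for_write)
{
    (void) path;
    return for_write ? 0 : ENOENT;      // empty cache: unknown until written
}

static int fs_nfs (const char *path, int for_write)
{
    (void) path; (void) for_write;
    return 0;                           // the "real" filesystem has it
}

static int fs_locked (const char *path, int for_write)
{
    (void) path; (void) for_write;
    return EACCES;                      // a definite "no"
}

// Tries the servers in the order the process manager listed them.
// ENOSYS, ENOENT and EROFS mean "try the next one"; success or any other
// error ends the search. Returns the index of the server that accepted,
// or -1 with *err set (assumed: to the last error seen, or to ENOENT for
// an empty list).
int try_servers (const connect_fn *servers, size_t n,
                 const char *path, int for_write, int *err)
{
    assert (err != NULL);
    *err = ENOENT;
    for (size_t i = 0; i < n; i++) {
        int rc = servers [i] (path, for_write);
        if (rc == 0) {
            return (int) i;
        }
        *err = rc;
        if (rc != ENOSYS && rc != ENOENT && rc != EROFS) {
            break;
        }
    }
    return -1;
}
```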

Finding the process manager

Now that we understand the basic steps used to find a particular resource manager, we need to solve the mystery of, “How did we find the process manager to begin with?” Actually, this one's easy. By definition, the process manager has a node descriptor of 0 (meaning this node), a process ID of 1, and a channel ID of 1. So, the ND/PID/CHID triplet 0/1/1 always identifies the process manager.

Handling directories

The example we used above was that of a serial port resource manager. We also stated an assumption: “let's assume for now that we need an exact match.” The assumption is only half-true — all the pathname matching we'll be talking about in this chapter has to completely match a component of the pathname, but may not have to match the entire pathname. We'll clear this up shortly.

Suppose I had code that does this:

fp = fopen ("/etc/passwd", "r");

Recall that fopen() eventually calls open(), so we have open() asking about the pathname /etc/passwd. But there isn't one in the diagram:

Figure: Neutrino's namespace.

We do notice, however, that fs-qnx4 has registered its association of ND/PID/CHID at the pathname “/.” Although it's not shown on the diagram, fs-qnx4 registered itself as a directory resource manager — it told the process manager that it'll be responsible for “/” and below. This is something that the other, “device” resource managers (e.g., the serial port resource manager) didn't do. By setting the “directory” flag, fs-qnx4 is able to handle the request for “/etc/passwd” because the first part of the request is “/” — a matching component!

What if we tried to do the following?

fd = open ("/dev/ser1/9600.8.1.n", O_WRONLY);

Well, since the serial port resource manager doesn't have the directory flag set, the process manager will look at it and say “Nope, sorry, the pathname /dev/ser1 is not a directory. I'm going to have to fail this request.” The request fails right then and there — the process manager doesn't even return a ND/PID/CHID/handle that the open() function should try.
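Here's a sketch of the matching rules from this section, written as a hypothetical reg_matches() function over a hypothetical struct reg. It illustrates the rules as described (whole-component matching, plus the directory flag for anything past the registered name); it isn't the process manager's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

// Hypothetical registration record: a registered pathname plus the
// "directory" flag (fs-qnx4 sets it for "/"; the serial driver doesn't).
struct reg {
    const char *prefix;
    bool        is_dir;
};

// Returns true if registration r may be handed a request for path:
//  - r->prefix must match the start of path on a whole-component boundary;
//  - if path continues past the prefix, r must be a directory.
bool reg_matches (const struct reg *r, const char *path)
{
    size_t len;

    assert (r != NULL && path != NULL);
    len = strlen (r->prefix);
    if (strncmp (r->prefix, path, len) != 0) {
        return false;               // not a prefix at all
    }
    if (path [len] == '\0') {
        return true;                // exact match, e.g. "/dev/ser1"
    }
    if (!r->is_dir) {
        return false;               // e.g. "/dev/ser1/9600.8.1.n"
    }
    // must stop at a component boundary: "/" itself, or followed by '/'
    return (len > 0 && r->prefix [len - 1] == '/') || path [len] == '/';
}
```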

Obviously, as hinted at in my choice of parameters for the open() call above, it may be a clever idea to allow some “traditional” drivers to be opened with additional parameters past the “usual” name. However, the rule of thumb here is, “If you can get away with it in a design review meeting, knock yourself out.” Some of my students, upon hearing me say that, pipe up with “But I am the design review committee!” To which I usually reply, “You are given a gun. Shoot yourself in the foot. :-)”

Unioned filesystems

Take a closer look at the diagram we've been using:

Figure: Neutrino's namespace.

Notice how both fs-qnx4 and the process manager have registered themselves as being responsible for “/”? This is fine, and nothing to worry about. In fact, there are times when it's a very good idea. Let's consider one such case.

Suppose you have a very slow network connection and you've mounted a networked filesystem over it. You notice that you often use certain files and wish that they were somehow magically “cached” on your system, but alas, the designers of the network filesystem didn't provide a way for you to do that. So, you write yourself a pass-through filesystem (called fs-cache) that sits on top of the network filesystem. Here's how it looks from the client's point of view:

Figure: Overlaid filesystems.

Both fs-nfs (the network filesystem) and your caching filesystem (fs-cache) have registered themselves for the same prefix, namely “/nfs.” As we mentioned above, this is fine, normal, and legal under Neutrino.

Let's say that the system just started up and your caching filesystem doesn't have anything in it yet. A client program tries to open a file, let's say /nfs/home/rk/abc.txt. Your caching filesystem is “in front of” the network filesystem (I'll show you how to do that later, when we discuss resource manager implementation).

At this point, the client's open() code does the usual steps:

Message to the process manager: “Whom should I talk to about the filename /nfs/home/rk/abc.txt?”
Response from the process manager: “Talk to fs-cache first, and then fs-nfs.”

Notice here that the process manager returned two sets of ND/PID/CHID/handle; one for fs-cache and one for fs-nfs. This is critical.

Now, the client's open() continues:

Message to fs-cache: “I'd like to open the file /nfs/home/rk/abc.txt for read, please.”
Response from fs-cache: “Sorry, I've never heard of this file.”

At this point, the client's open() function is out of luck as far as the fs-cache resource manager is concerned. The file doesn't exist! However, the open() function knows that it got a list of two ND/PID/CHID/handle tuples, so it tries the second one next:

Message to fs-nfs: “I'd like to open the file /nfs/home/rk/abc.txt for read, please.”
Response from fs-nfs: “Sure, no problem!”

Now that the open() function has an EOK (the “no problem”), it returns the file descriptor. The client then performs all further interactions with the fs-nfs resource manager.

The only time that we “resolve” to a resource manager is during the open() call. This means that once we've successfully opened a particular resource manager, we will continue to use that resource manager for all file descriptor calls.

So how does our fs-cache caching filesystem come into play? Well, eventually, let's say that the user is done reading the file (they've loaded it into a text editor). Now they want to write it out. The same set of steps happen, with an interesting twist:

1. Message to the process manager: “Whom should I talk to about the filename /nfs/home/rk/abc.txt?”
2. Response from the process manager: “Talk to fs-cache first, and then fs-nfs.”
3. Message to fs-cache: “I'd like to open the file /nfs/home/rk/abc.txt for write, please.”
4. Response from fs-cache: “Sure, no problem.”

Notice that this time, in step 3, we opened the file for write and not read as we did previously. It's not surprising, therefore, that fs-cache allowed the operation this time (in step 4).

Even more interesting, observe what happens the next time we go to read the file:

1. Message to the process manager: “Whom should I talk to about the filename /nfs/home/rk/abc.txt?”
2. Response from the process manager: “Talk to fs-cache first, and then fs-nfs.”
3. Message to fs-cache: “I'd like to open the file /nfs/home/rk/abc.txt for read, please.”
4. Response from fs-cache: “Sure, no problem.”

Sure enough, the caching filesystem handled the request for the read this time (in step 4)!

Now, we've left out a few details, but these aren't important to getting across the basic ideas. Obviously, the caching filesystem will need some way of sending the data across the network to the “real” storage medium. It should also have some way of verifying that no one else modified the file just before it returns the file contents to the client (so that the client doesn't get stale data). It could even handle the very first read request itself, by fetching the data from the network filesystem into its cache at that point. And so on.

Running more than one pass-through filesystem or resource manager on overlapping pathname spaces might cause deadlocks.

UFS versus UMP

A slight terminology digression is in order. The primary difference between a Unioned File System (UFS) and a Unioned Mount Point (UMP) is that the UFS is based on a per-file organization, while the UMP is based on a per-mountpoint organization.

In the above cache filesystem example, we showed a UFS, because no matter how deep the file was in the tree structure, either resource manager was able to service it. Now suppose another resource manager (let's call it “foobar”) takes over “/nfs/other.” In a UFS system, the fs-cache process would be able to cache files from there as well, just by attaching to “/nfs.” In a UMP implementation, only the foobar resource manager would get the open requests for pathnames under /nfs/other. UMP is the default in Neutrino, because the process manager does longest-prefix matching.
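A UMP-style lookup can be sketched as "pick the registration with the longest matching prefix." The function longest_match() below is hypothetical and simplified: it returns a single winner, whereas Neutrino's process manager can return several servers registered at the same prefix, as in the fs-cache/fs-nfs case.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// True if prefix matches the start of path on a whole-component boundary
// ("/nfs" matches "/nfs/home" but not "/nfsx").
static int component_prefix (const char *prefix, const char *path)
{
    size_t len = strlen (prefix);

    if (strncmp (prefix, path, len) != 0) {
        return 0;
    }
    return path [len] == '\0' || path [len] == '/'
        || (len > 0 && prefix [len - 1] == '/');
}

// Hypothetical UMP-style lookup: among the registered mountpoints, pick
// the longest one that matches. Returns its index, or -1 for no match.
int longest_match (const char *const *mounts, size_t n, const char *path)
{
    int    best = -1;
    size_t best_len = 0;

    assert (mounts != NULL && path != NULL);
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen (mounts [i]);
        if (component_prefix (mounts [i], path)
            && (best < 0 || len > best_len)) {
            best = (int) i;
            best_len = len;
        }
    }
    return best;
}
```

With "/", "/nfs", and "/nfs/other" registered, a request for /nfs/other/report.txt goes only to the "/nfs/other" owner (foobar), even though "/nfs" also matches.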

Client summary

We're done with the client side of things. The following are key points to remember:

The client usually triggers communication with the resource manager via open() (or fopen()).
Once the client's request has “resolved” to a particular resource manager, we never change resource managers.
All further messages for the client's session are based on the file descriptor (or FILE * stream), (e.g., read(), lseek(), fgets()).
The session is terminated (or “dissociated”) when the client closes the file descriptor or stream (or terminates for any reason).
All client file-descriptor-based function calls are translated into messages.

The resource manager's view

Let's look at things from the resource manager's perspective. Basically, the resource manager needs to tell the process manager that it'll be responsible for a certain part of the pathname space (it needs to register itself). Then, the resource manager needs to receive messages from clients and handle them. Obviously, things aren't quite that simple.

Let's take a quick overview look at the functions that the resource manager provides, and then we'll look at the details.

Registering a pathname

The resource manager needs to tell the process manager that one or more pathnames are now under its domain of authority — effectively, that this particular resource manager is prepared to handle client requests for those pathnames.

The serial port resource manager might handle (let's say) four serial ports. In this case, it would register four different pathnames with the process manager: /dev/ser1, /dev/ser2, /dev/ser3, and /dev/ser4. The impact of this is that there are now four distinct entries in the process manager's pathname tree, one for each of the serial ports. Four entries isn't too bad. But what if the serial port resource manager handled one of those fancy multiport cards, with 256 ports on it? Registering 256 individual pathnames (i.e., /dev/ser1 through /dev/ser256) would result in 256 different entries in the process manager's pathname tree! The process manager isn't optimized for searching this tree; it assumes that there will be a few entries in the tree, not hundreds.

As a rule, you shouldn't register more than a few dozen individual pathnames at each level, because the process manager searches them linearly. The 256-port registration is certainly beyond that. In that case, what the multiport serial resource manager should do is register a directory-style pathname, for example /dev/multiport. This occupies only one entry in the process manager's pathname tree. When a client opens a serial port, let's say port 57:

fp = fopen ("/dev/multiport/57", "w");

The process manager resolves this to the ND/PID/CHID/handle for the multiport serial resource manager; it's up to that resource manager to decide if the rest of the pathname (in our case, the “57”) is valid. In this example, assuming that the variable path contains the rest of the pathname past the mountpoint, this means that the resource manager could do checking in a very simple manner:

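char *end;
long  devnum = strtol (path, &end, 10);    // needs <stdlib.h>

// reject empty strings, trailing junk ("57x"), and out-of-range ports
if ((end == path) || (*end != '\0') || (devnum < 1) || (devnum > 256)) {
    // bad device number specified
} else {
    // good device number specified (1 through 256)
}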

This search would certainly be faster than anything the process manager could do, because the process manager must, by design, be much more general-purpose than our resource manager.

Handling messages

Once we've registered one or more pathnames, we should then be prepared to receive messages from clients. This is done in the “usual” way, with the MsgReceive() function call. There are fewer than 30 well-defined message types that the resource manager handles. To simplify the discussion and implementation, however, they're broken into two groups:

Connect messages: Always contain a pathname; these are either one-shot messages or they establish a context for further I/O messages.
I/O messages: Always based on a connect message; these perform further work.

Connect messages

Connect messages always contain a pathname. The open() function that we've been using throughout our discussion is a perfect example of a function that generates a connect message. In this case, the handler for the connect message establishes a context for further I/O messages. (After all, we expect to be performing things like read() after we've done an open()).

An example of a “one-shot” connect message is the message generated as a result of the rename() function call. No further “context” is established — the handler in the resource manager is expected to change the name of the specified file to the new name, and that's it.

I/O messages

An I/O message is expected only after a connect message and refers to the context created by that connect message. As mentioned above in the connect message discussion, open() followed by read() is a perfect example of this.
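To show what "context" means here, the sketch below keeps a hypothetical table of per-open state. handle_open() plays the part of the connect handler and hands back a slot number; handle_read() plays an I/O handler and only works with a slot that an earlier open created; handle_close() ends the session. All of these names are our own; a one-shot connect message, such as the one from rename(), would never allocate a slot at all.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_OPENS 8

// Hypothetical per-open context: created by the connect (open) handler,
// consulted by every later I/O message on that connection.
struct open_ctx {
    int    in_use;
    size_t offset;                      // where the next read starts
};

static struct open_ctx table [MAX_OPENS];
static const char contents [] = "hello, world";   // what our "device" holds

// Connect handler: establishes a context and returns a handle, or -1.
int handle_open (const char *path)
{
    (void) path;                        // a real handler would validate it
    for (int i = 0; i < MAX_OPENS; i++) {
        if (!table [i].in_use) {
            table [i].in_use = 1;
            table [i].offset = 0;
            return i;
        }
    }
    return -1;                          // no free slots
}

// I/O handler: only meaningful for a handle that handle_open() created.
long handle_read (int h, char *buf, size_t nbytes)
{
    size_t avail, n;

    if (h < 0 || h >= MAX_OPENS || !table [h].in_use) {
        return -1;                      // no context: never opened
    }
    assert (buf != NULL);
    avail = (sizeof (contents) - 1) - table [h].offset;
    n = nbytes < avail ? nbytes : avail;
    memcpy (buf, contents + table [h].offset, n);
    table [h].offset += n;              // this open's position only
    return (long) n;
}

// Session end (close): the context is released.
void handle_close (int h)
{
    if (h >= 0 && h < MAX_OPENS) {
        table [h].in_use = 0;
    }
}
```

Because each open gets its own offset, two clients reading the same device don't disturb each other's position.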

Three groups, really

Apart from connect and I/O messages, there are also “other” messages that can be received (and handled) by a resource manager. Since they aren't “resource manager” messages proper, we'll defer discussion of them until later.

The resource manager library

Before we get too far into all the issues surrounding resource managers, we have to get acquainted with QSS's resource manager library. Note that this “library” actually consists of several distinct pieces:

thread pool functions (which we discussed in the Processes and Threads chapter under “Pools of threads”)
dispatch interface
resource manager functions
POSIX library helper functions

While you certainly could write resource managers “from scratch” (as was done in the QNX 4 world), that's far more hassle than it's worth.

Just to show you the utility of the library approach, here's the source for a single-threaded version of “/dev/null”:

/*
 *  resmgr1.c
 *
 *  /dev/null using the resource manager library
*/

#include <stdio.h>
#include <stdlib.h>
#include <stddef.h>
#include <string.h>
#include <sys/iofunc.h>
#include <sys/dispatch.h>

int
main (int argc, char **argv)
{
    dispatch_t              *dpp;
    resmgr_attr_t           resmgr_attr;
    dispatch_context_t      *ctp;
    resmgr_connect_funcs_t  connect_func;
    resmgr_io_funcs_t       io_func;
    iofunc_attr_t           attr;

    // create the dispatch structure
    if ((dpp = dispatch_create ()) == NULL) {
        perror ("Unable to dispatch_create");
        exit (EXIT_FAILURE);
    }

    // initialize the various data structures
    memset (&resmgr_attr, 0, sizeof (resmgr_attr));
    resmgr_attr.nparts_max = 1;
    resmgr_attr.msg_max_size = 2048;

    // bind default functions into the outcall tables
    iofunc_func_init (_RESMGR_CONNECT_NFUNCS, &connect_func,
                      _RESMGR_IO_NFUNCS, &io_func);
    iofunc_attr_init (&attr, S_IFNAM | 0666, 0, 0);

    // establish a name in the pathname space
    if (resmgr_attach (dpp, &resmgr_attr, "/dev/mynull",
                       _FTYPE_ANY, 0, &connect_func, &io_func,
                       &attr) == -1) {
        perror ("Unable to resmgr_attach");
        exit (EXIT_FAILURE);
    }

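    // allocate the context block used for receiving and handling messages
    if ((ctp = dispatch_context_alloc (dpp)) == NULL) {
        perror ("Unable to dispatch_context_alloc");
        exit (EXIT_FAILURE);
    }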

    // wait here forever, handling messages
    while (1) {
        if ((ctp = dispatch_block (ctp)) == NULL) {
            perror ("Unable to dispatch_block");
            exit (EXIT_FAILURE);
        }
        dispatch_handler (ctp);
    }
}

There you have it! A complete /dev/null resource manager implemented in a few function calls!

If you were to write this from scratch, and have it support all the functionality that this one does (e.g., stat() works, chown() and chmod() work, and so on), you'd be looking at many hundreds if not thousands of lines of C code.

The library really does what we just talked about

By way of introduction to the library, let's see (briefly) what the calls do in the /dev/null resource manager.

dispatch_create()

Creates a dispatch structure, which is used later when we block waiting to receive messages.

iofunc_attr_init()

Initializes the attributes structure used by the device. We'll discuss attributes structures in more depth later, but for now, the short story is that there's one of these per device name, and they contain information about a particular device.

iofunc_func_init()

Initializes the two data structures connect_func and io_func, which contain pointers to the connect and I/O functions, respectively. You might argue that this call has the most “magic” in it, as this is where the actual “worker” routines for handling all the messages get bound into a data structure. We didn't actually see any code to handle the connect message, or the I/O messages resulting from a client read() or stat() function, and so on. That's because the library supplies default POSIX versions of those functions for us, and it's the iofunc_func_init() function that binds those same default handler functions into the two supplied tables.

resmgr_attach()

Creates the channel that the resource manager will use for receiving messages, and talks to the process manager to tell it that we're going to be responsible for “/dev/mynull.” While there are a lot of parameters, we'll see them all in painful detail later. For now, it's important to note that this is where the dispatch handle (dpp), pathname (the string /dev/mynull), and the connect (connect_func) and I/O (io_func) message handlers all get bound together.

dispatch_context_alloc()

Allocates a dispatch internal context block. It contains information relevant to the message being processed.

Once you've called dispatch_context_alloc(), don't call message_attach() or resmgr_attach() specifying a different maximum message size or a different number of message parts for the same dispatch handle. (This doesn't apply to pulse_attach() or select_attach() because you can't specify the sizes with these functions.)

dispatch_block()

This is the dispatch layer's blocking call; it's where we wait for a message to arrive from a client.

dispatch_handler()

Once the message arrives from the client, this function is called to process it.
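The "magic" of iofunc_func_init() and dispatch_handler() described above boils down to tables of function pointers. Here's a hypothetical miniature: table_init() fills a two-entry table with default handlers (behaving like /dev/null), dispatch_op() routes a request through the table, and overriding one slot (with zero_read(), which behaves like /dev/zero) changes one behavior while keeping the defaults for everything else. None of these names belong to the real library, whose tables are resmgr_connect_funcs_t and resmgr_io_funcs_t.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Hypothetical miniature outcall table (the real tables have many more
// slots and different handler signatures).
struct io_table {
    long (*read)  (char *buf, size_t nbytes);
    long (*write) (const char *buf, size_t nbytes);
};

// Defaults with /dev/null behaviour.
static long default_read (char *buf, size_t nbytes)
{
    (void) buf; (void) nbytes;
    return 0;                           // always at end-of-file
}

static long default_write (const char *buf, size_t nbytes)
{
    (void) buf;
    return (long) nbytes;               // accept and discard everything
}

// An override with /dev/zero behaviour for the read slot.
static long zero_read (char *buf, size_t nbytes)
{
    memset (buf, 0, nbytes);
    return (long) nbytes;
}

// Analogue of iofunc_func_init(): bind a default into every slot.
void table_init (struct io_table *t)
{
    assert (t != NULL);
    t->read  = default_read;
    t->write = default_write;
}

// Analogue of dispatch_handler(): look at the request type and call the
// handler bound into the table for it.
enum op { OP_READ, OP_WRITE };

long dispatch_op (const struct io_table *t, enum op op, char *buf, size_t n)
{
    switch (op) {
    case OP_READ:
        return t->read (buf, n);
    case OP_WRITE:
        return t->write (buf, n);
    }
    return -1;                          // unknown request type
}
```

This is why the /dev/null example needed no handler code of its own: every slot already held a working default.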

Behind the scenes at the library

You've seen that your code is responsible for providing the main message receiving loop:

while (1) {
    // wait here for a message
    if ((ctp = dispatch_block (ctp)) == NULL) {
        perror ("Unable to dispatch_block");
        exit (EXIT_FAILURE);
    }
    // handle the message
    dispatch_handler (ctp);
}

This is very convenient, because it lets you place breakpoints on the receiving function and intercept messages (perhaps with a debugger) during operation.

The library implements the “magic” inside of the dispatch_handler() function, because that's where the message is analyzed and disposed of through the connect and I/O functions tables we mentioned earlier.

In reality, the library consists of two cooperating layers: a base layer that provides “raw” resource manager functionality, and a POSIX layer that provides POSIX helper and default functions. We'll briefly define the two layers, and then in “Resource manager structure,” below, we'll pick up the details.

The base layer

The bottom-most layer consists mostly of functions whose names begin with resmgr_. This class of function is concerned with the mechanics of making a resource manager work.

I'll just briefly mention the functions that are available and where we'd use them. I'll then refer you to QSS's documentation for additional details on these functions.

The base layer functions consist of:

resmgr_msgreadv() and resmgr_msgread(): Reads data from the client's address space using message passing.
resmgr_msgwritev() and resmgr_msgwrite(): Writes data to the client's address space using message passing.
resmgr_open_bind(): Associates the context from a connect function, so that it can be used later by an I/O function.
resmgr_attach(): Creates a channel, associates a pathname, dispatch handle, connect functions, I/O functions, and other parameters together. Sends a message to the process manager to register the pathname.
resmgr_detach(): Opposite of resmgr_attach(); dissociates the binding of the pathname and the resource manager.
pulse_attach(): Associates a pulse code with a function. Since the library implements the message receive loop, this is a convenient way of “gaining control” for handling pulses.
pulse_detach(): Dissociates a pulse code from the function.

In addition to the functions listed above, there are also numerous functions dealing with the dispatch interface.
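Since the library owns the receive loop, pulse_attach() amounts to "when a pulse with this code arrives, call this function." The registry below is a hypothetical sketch of that idea only; my_pulse_attach(), my_pulse_detach(), deliver_pulse() and on_timer() are our own names and don't share the real functions' signatures (the real pulse_attach() takes a dispatch handle and more arguments).

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PULSE_HANDLERS 8

typedef void (*pulse_fn) (int code, int value);

// Hypothetical registry mapping pulse codes to callbacks.
static struct {
    int      code;
    pulse_fn fn;
} handlers [MAX_PULSE_HANDLERS];
static size_t nhandlers;

// Analogue of pulse_attach(): returns 0, or -1 if the registry is full.
int my_pulse_attach (int code, pulse_fn fn)
{
    assert (fn != NULL);
    if (nhandlers == MAX_PULSE_HANDLERS) {
        return -1;
    }
    handlers [nhandlers].code = code;
    handlers [nhandlers].fn   = fn;
    nhandlers++;
    return 0;
}

// Analogue of pulse_detach(): returns 0, or -1 if code wasn't attached.
int my_pulse_detach (int code)
{
    for (size_t i = 0; i < nhandlers; i++) {
        if (handlers [i].code == code) {
            handlers [i] = handlers [--nhandlers];  // fill the hole with the last entry
            return 0;
        }
    }
    return -1;
}

// What the receive loop would do when a pulse arrives: call the function
// attached to its code. Returns 1 if handled, 0 if nobody asked for it.
int deliver_pulse (int code, int value)
{
    for (size_t i = 0; i < nhandlers; i++) {
        if (handlers [i].code == code) {
            handlers [i].fn (code, value);
            return 1;
        }
    }
    return 0;
}

// Example callback: remembers the value of the last pulse it saw.
static int last_value = -1;

static void on_timer (int code, int value)
{
    (void) code;
    last_value = value;
}
```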

One function from the above list that deserves special mention is resmgr_open_bind(). It associates some form of context data when the connect message (typically as a result of the client calling open() or fopen()) arrives, so that this data block is around when the I/O messages are being handled. Why didn't we see this in the /dev/null handler? Because the POSIX-layer default functions call this function for us. If we're handling all the messages ourselves, we'd certainly call this function.

The resmgr_open_bind() function not only sets up the context block for further I/O messages, but also initializes other data structures used by the resource manager library itself.

The rest of the functions from the above list are somewhat intuitive — we'll defer their discussion until we use them.

The POSIX layer

The second layer provided by QSS's resource manager library is the POSIX layer. As with the base layer, you could code a resource manager without using it, but it would be a lot of work! Before we can talk about the POSIX-layer functions in detail, we need to look at some of the base layer data structures, the messages that arrive from the clients, and the overall structure and responsibilities of a resource manager.

Writing a resource manager

Now that we've introduced the basics (how the client looks at the world, how the resource manager looks at the world, and the two cooperating layers in the library), it's time to focus on the details.

In this section, we'll take a look at the following topics:

data structures
resource manager structure
POSIX-layer data structure
handler routines
and of course, lots of examples

Keep in mind the following “big picture,” which contains almost everything related to a resource manager:

Figure: Architecture of a resource manager (the big picture).

Data structures

The first thing we need to understand is the data structures used to control the operation of the library:

resmgr_attr_t control structure
resmgr_connect_funcs_t connect table
resmgr_io_funcs_t I/O table

And one data structure that's used internally by the library:

resmgr_context_t internal context block

Later, we'll see the OCB, attributes structure, and mount structure data types that are used with the POSIX-layer libraries.

`resmgr_attr_t` control structure

The control structure (type resmgr_attr_t) is passed to the resmgr_attach() function, which puts the resource manager's path into the general pathname space and binds requests on this path to a dispatch handle.

The control structure (from <sys/dispatch.h>) has the following contents:

typedef struct _resmgr_attr {
    unsigned flags;
    unsigned nparts_max;
    unsigned msg_max_size;
    int      (*other_func) (resmgr_context_t *ctp, void *msg);
} resmgr_attr_t;

The other_func message handler

In general, you should avoid using this member. This member, if non-NULL, represents a routine that will get called with the current message received by the resource manager library when the library doesn't recognize the message. While you could use this to implement “private” or “custom” messages, this practice is discouraged (use either the _IO_DEVCTL or _IO_MSG handlers, see below). If you wish to handle pulses that come in, I recommend that you use the pulse_attach() function instead.

You should leave this member with the value NULL.

The data structure sizing parameters

These two parameters are used to control various sizes of messaging areas.

The nparts_max parameter controls the size of the dynamically allocated iov member in the resource manager library context block (of type resmgr_context_t, see below). You'd typically adjust this member if you were returning more than a one-part IOV from some of your handling functions. Note that it has no effect on the incoming messages — this is only used on outgoing messages.

The msg_max_size parameter controls how much buffer space the resource manager library should set aside as a receive buffer for the message. The resource manager library will set this value to be at least as big as the header for the biggest message it will be receiving. This ensures that when your handler function gets called, it will be passed the entire header of the message. Note, however, that the data (if any) beyond the current header is not guaranteed to be present in the buffer, even if the msg_max_size parameter is “large enough.” An example of this is when messages are transferred over a network using Qnet. (For more details about the buffer sizes, see “The resmgr_context_t internal context block,” below.)

The flags parameter

This parameter gives additional information to the resource manager library. For our purposes, we'll just pass a 0. You can read up about the other values in the Neutrino Library Reference under the resmgr_attach() function.

`resmgr_connect_funcs_t` connect table

When the resource manager library receives a message, it looks at the type of message and sees if it can do anything with it. In the base layer, there are two tables that affect this behavior: the resmgr_connect_funcs_t table, which contains a list of connect message handlers, and the resmgr_io_funcs_t table, which contains a similar list of I/O message handlers. We'll see the I/O version below.

When it comes time to fill in the connect and I/O tables, we recommend that you use the iofunc_func_init() function to load up the tables with the POSIX-layer default handler routines. Then, if you need to override some of the functionality of particular message handlers, you'd simply assign your own handler function instead of the default routine. We'll see this in the section “Putting in your own functions.” Right now, let's look at the connect functions table itself (this is from <sys/resmgr.h>):

typedef struct _resmgr_connect_funcs {
  unsigned nfuncs;

  int (*open)
      (ctp, io_open_t *msg, handle, void *extra);
  int (*unlink)
      (ctp, io_unlink_t *msg, handle, void *reserved);
  int (*rename)
      (ctp, io_rename_t *msg, handle, io_rename_extra_t *extra);
  int (*mknod)
      (ctp, io_mknod_t *msg, handle, void *reserved);
  int (*readlink)
      (ctp, io_readlink_t *msg, handle, void *reserved);
  int (*link)
      (ctp, io_link_t *msg, handle, io_link_extra_t *extra);
  int (*unblock)
      (ctp, io_pulse_t *msg, handle, void *reserved);
  int (*mount)
      (ctp, io_mount_t *msg, handle, io_mount_extra_t *extra);

} resmgr_connect_funcs_t;

Note that I've shortened the prototypes by omitting the resmgr_context_t * type of the first parameter (the ctp) and the RESMGR_HANDLE_T * type of the third parameter (the handle). For example, the full prototype for open is really:

int (*open) (resmgr_context_t *ctp,
            io_open_t *msg,
            RESMGR_HANDLE_T *handle,
            void *extra);

The very first member of the structure (nfuncs) indicates how big the structure is (how many members it contains). In the above structure, it should contain the value “8,” for there are 8 members (open through to mount). This member is mainly in place to allow QSS to upgrade this library without any ill effects on your code. For example, suppose you had compiled in a value of 8, and then QSS upgraded the library to have 9. Because the member only had a value of 8, the library could say to itself, “Aha! The user of this library was compiled when we had only 8 functions, and now we have 9. I'll provide a useful default for the ninth function.” There's a manifest constant in <sys/resmgr.h> called _RESMGR_CONNECT_NFUNCS that has the current number. Use this constant if manually filling in the connect functions table (although it's best to use iofunc_func_init()).
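To make the versioning idea concrete, here's a small portable model of how a library could use an nfuncs count to supply defaults for entries that a caller's table predates. This is a sketch under stated assumptions, not QSS's library code: demo_funcs_t, demo_normalize(), default_handler(), and my_open() are hypothetical names, and the table is modeled as a fixed-size array for simplicity.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical handler type: takes a message type, returns a status. */
typedef int (*handler_fn) (int msg_type);

#define LIB_NFUNCS 9     /* entries this (newer) library version knows about */

typedef struct {
    unsigned   nfuncs;   /* entries the caller was compiled with */
    handler_fn funcs[LIB_NFUNCS];
} demo_funcs_t;

/* Library-supplied fallback for any entry the caller didn't provide. */
static int
default_handler (int msg_type)
{
    (void) msg_type;
    return 0;
}

/* A caller-supplied handler (hypothetical). */
int
my_open (int msg_type)
{
    (void) msg_type;
    return 1;
}

/* Build the table the library actually dispatches through: trust only the
 * first nfuncs entries of the caller's table, and fill everything else
 * (including NULL entries) with the library default. */
void
demo_normalize (const demo_funcs_t *user, demo_funcs_t *out)
{
    unsigned n, i;

    assert (user != NULL && out != NULL);
    n = user->nfuncs < LIB_NFUNCS ? user->nfuncs : LIB_NFUNCS;
    out->nfuncs = LIB_NFUNCS;
    for (i = 0; i < LIB_NFUNCS; i++) {
        handler_fn f = (i < n) ? user->funcs[i] : NULL;
        out->funcs[i] = (f != NULL) ? f : default_handler;
    }
}
```

A caller "compiled" against the 8-entry layout passes nfuncs = 8; the ninth slot is never read from its table and silently gets default_handler().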

Notice that the function prototypes all share a common format. The first parameter, ctp, is a pointer to a resmgr_context_t structure. This is an internal context block used by the resource manager library, and which you should treat as read-only (except for one field, which we'll come back to).

The second parameter is always a pointer to the message. Because the functions in the table are there to handle different types of messages, the prototypes match the kind of message that each function will handle.

The third parameter is a pointer to a RESMGR_HANDLE_T structure called a handle; it's used to identify the device that this message was targeted at. We'll see this later as well, when we look at the attributes structure.

In order to correctly define RESMGR_HANDLE_T, #include <sys/iofunc.h> before <sys/resmgr.h>.

Finally, the last parameter is either “reserved” or an “extra” parameter for functions that need some extra data. We'll show the extra parameter as appropriate during our discussions of the handler functions.

`resmgr_io_funcs_t` I/O table

The I/O table is very similar in spirit to the connect functions table just shown above. Here it is, from <sys/resmgr.h>:

typedef struct _resmgr_io_funcs {
  unsigned nfuncs;
  int (*read)       (ctp, io_read_t *msg,     ocb);
  int (*write)      (ctp, io_write_t *msg,    ocb);
  int (*close_ocb)  (ctp, void *reserved,     ocb);
  int (*stat)       (ctp, io_stat_t *msg,     ocb);
  int (*notify)     (ctp, io_notify_t *msg,   ocb);
  int (*devctl)     (ctp, io_devctl_t *msg,   ocb);
  int (*unblock)    (ctp, io_pulse_t *msg,    ocb);
  int (*pathconf)   (ctp, io_pathconf_t *msg, ocb);
  int (*lseek)      (ctp, io_lseek_t *msg,    ocb);
  int (*chmod)      (ctp, io_chmod_t *msg,    ocb);
  int (*chown)      (ctp, io_chown_t *msg,    ocb);
  int (*utime)      (ctp, io_utime_t *msg,    ocb);
  int (*openfd)     (ctp, io_openfd_t *msg,   ocb);
  int (*fdinfo)     (ctp, io_fdinfo_t *msg,   ocb);
  int (*lock)       (ctp, io_lock_t *msg,     ocb);
  int (*space)      (ctp, io_space_t *msg,    ocb);
  int (*shutdown)   (ctp, io_shutdown_t *msg, ocb);
  int (*mmap)       (ctp, io_mmap_t *msg,     ocb);
  int (*msg)        (ctp, io_msg_t *msg,      ocb);
  int (*dup)        (ctp, io_dup_t *msg,      ocb);
  int (*close_dup)  (ctp, io_close_t *msg,    ocb);
  int (*lock_ocb)   (ctp, void *reserved,     ocb);
  int (*unlock_ocb) (ctp, void *reserved,     ocb);
  int (*sync)       (ctp, io_sync_t *msg,     ocb);
  int (*power)      (ctp, io_power_t *msg,    ocb);
} resmgr_io_funcs_t;

For this structure as well, I've shortened the prototypes by removing the type of the ctp parameter (resmgr_context_t *) and the type of the last parameter, ocb (RESMGR_OCB_T *). For example, the full prototype for read is really:

int (*read) (resmgr_context_t *ctp,
            io_read_t *msg,
            RESMGR_OCB_T *ocb);

In order to correctly define RESMGR_OCB_T, #include <sys/iofunc.h> before <sys/resmgr.h>.

The very first member of the structure (nfuncs) indicates how big the structure is (how many members it contains). The proper manifest constant for initialization is _RESMGR_IO_NFUNCS.

Note that the parameter list in the I/O table is also very regular. The first parameter is the ctp, and the second parameter is the msg, just as they were in the connect table handlers.

The third parameter is different, however. It's an ocb, which stands for “Open Context Block.” It holds the context that was bound by the connect message handler (e.g., as a result of the client's open() call), and is available to the I/O functions.

As discussed above, when it comes time to fill in the two tables, we recommend that you use the iofunc_func_init() function to load up the tables with the POSIX-layer default handler routines. Then, if you need to override some of the functionality of particular message handlers, you'd simply assign your own handler function instead of the POSIX default routine. We'll see this in the section “Putting in your own functions.”

The `resmgr_context_t` internal context block

Finally, one data structure is used by the lowest layer of the library to keep track of information that it needs to know about. You should view the contents of this data structure as "read-only" (except for the iov member).

Here's the data structure (from <sys/resmgr.h>):

typedef struct _resmgr_context {
  int                 rcvid;
  struct _msg_info    info;
  resmgr_iomsgs_t     *msg;
  dispatch_t          *dpp;
  int                 id;
  unsigned            msg_max_size;
  int                 status;
  int                 offset;
  int                 size;
  iov_t               iov [1];
} resmgr_context_t;

As with the other data structure examples, I've taken the liberty of deleting reserved fields.

Let's look at the contents:

rcvid: The receive ID from the resource manager library's MsgReceivev() function call. Indicates who you should reply to (if you're going to do the reply yourself).
info: Contains the information structure returned by MsgReceivev() in the resource manager library's receive loop. Useful for getting information about the client, including things like the node descriptor, process ID, thread ID, and so on. See the documentation for MsgReceivev() for more details.
msg: A pointer to a union of all possible message types. This isn't very useful to you, because each of your handler functions gets passed the appropriate union member as its second parameter.
dpp: A pointer to the dispatch structure that you passed in to begin with. Again, not very useful to you, but obviously useful to the resource manager library.
id: The identifier for the mountpoint this message was meant for. When you did the resmgr_attach(), it returned a small integer ID. This ID is the value of the id member. Note that you'd most likely never use this parameter yourself, but would instead rely on the attributes structure passed to you in your io_open() handler.
msg_max_size: This contains the msg_max_size that was passed in as the msg_max_size member of resmgr_attr_t (given to the resmgr_attach() function) so that the size, offset, and msg_max_size are all contained in one handy structure/location.
status: This is where your handler function places the result of the operation. Note that you should always use the macro _RESMGR_STATUS() to write this field. For example, if you're handling the connect message from an open(), and you're a read-only resource manager but the client wanted to open you for write, you'd return an EROFS errno via (typically) _RESMGR_STATUS (ctp, EROFS).
offset: The current number of bytes into the client's message buffer. Only relevant to the base layer library when used with resmgr_msgreadv() with combine messages (see below).
size: This tells you how many bytes are valid in the message area that gets passed to your handler function. This number is important because it indicates if more data needs to be read from the client (for example, if not all of the client's data was read by the resource manager base library), or if storage needs to be allocated for a reply to the client (for example, to reply to the client's read() request).
iov: The I/O Vector table where you can write your return values, if returning data. For example, when a client calls read() and your read-handling code is invoked, you may need to return data. This data can be set up in the iov array, and your read-handling code can then return something like _RESMGR_NPARTS (2) to indicate (in this example) that both iov [0] and iov [1] contain data to return to the client. Note that the iov member is defined as only having one element. However, you'll also notice that it's conveniently at the end of the structure. The actual number of elements in the iov array is defined by you when you set the nparts_max member of the control structure (see "resmgr_attr_t control structure," above).
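The multi-part reply idea can be tried outside of a resource manager with POSIX scatter/gather I/O. The sketch below is an analogy, not resource manager code: it uses the POSIX struct iovec and writev() in place of Neutrino's iov_t, SETIOV(), and _RESMGR_NPARTS(), so that it runs on any POSIX system. send_two_parts() is a hypothetical helper.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send a two-part reply in one operation: part 0 is a header, part 1 is
 * the data. Returns the number of bytes written, or -1 on error. */
ssize_t
send_two_parts (int fd, const char *hdr, const char *data)
{
    struct iovec iov[2];

    assert (hdr != NULL && data != NULL);
    iov[0].iov_base = (void *) hdr;      /* cf. SETIOV (&ctp->iov[0], ...) */
    iov[0].iov_len  = strlen (hdr);
    iov[1].iov_base = (void *) data;     /* cf. SETIOV (&ctp->iov[1], ...) */
    iov[1].iov_len  = strlen (data);
    return writev (fd, iov, 2);          /* 2 parts, cf. _RESMGR_NPARTS (2) */
}
```

The receiver sees one contiguous byte stream ("HDR:" followed by "payload"), even though the sender never copied the two pieces into a single buffer; that's the same saving the iov array gives a read handler.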

Resource manager structure

Now that we've seen the data structures, we can discuss interactions between the parts that you'd supply to actually make your resource manager do something.

We'll look at:

The resmgr_attach() function and its parameters
Putting in your own functions
The general flow of a resource manager
Messages that should be connect messages but aren't
Combine messages

The resmgr_attach() function and its parameters

As you saw in the /dev/null example above, the first thing you'll want to do is register your chosen “mountpoint” with the process manager. This is done via resmgr_attach(), which has the following prototype:

int
resmgr_attach (dispatch_t *dpp,
               resmgr_attr_t *resmgr_attr,
               const char *path,
               enum _file_type file_type,
               unsigned flags,
               const resmgr_connect_funcs_t *connect_funcs,
               const resmgr_io_funcs_t *io_funcs,
               RESMGR_HANDLE_T *handle);

Let's examine these arguments, in order, and see what they're used for.

dpp: The dispatch handle. This lets the dispatch interface manage the message receive for your resource manager.
resmgr_attr: Controls the resource manager characteristics, as discussed above.
path: The mountpoint that you're registering. If you're registering a discrete mountpoint (such as would be the case, for example, with /dev/null, or /dev/ser1), then this mountpoint must be matched exactly by the client, with no further pathname components past the mountpoint. If you're registering a directory mountpoint (such as would be the case, for example, with a network filesystem mounted as /nfs), then the match must be exact as well, with the added feature that pathnames past the mountpoint are allowed; they get passed to the connect functions stripped of the mountpoint (for example, the pathname /nfs/etc/passwd would match the network filesystem resource manager, and it would get etc/passwd as the rest of the pathname).
file_type: The class of resource manager. See below.
flags: Additional flags to control the behavior of your resource manager. These flags are defined below.
connect_funcs and io_funcs: These are simply the list of connect functions and I/O functions that you wish to bind to the mountpoint.
handle: This is an “extendable” data structure (aka “attributes structure”) that identifies the resource being mounted. For example, for a serial port, you'd extend the standard POSIX-layer attributes structure by adding information about the base address of the serial port, the baud rate, etc. Note that it does not have to be an attributes structure — if you're providing your own “open” handler, then you can choose to interpret this field any way you wish. It's only if you're using the default iofunc_open_default() handler as your “open” handler that this field must be an attributes structure.
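The path-matching rule for discrete versus directory mountpoints can be modeled in a few lines. This is a toy sketch of the rule as described above, not the process manager's resolver (which also deals with unioning, the before/after ordering, and so on); match_mountpoint() is a hypothetical name, and is_dir stands in for registering with _RESMGR_FLAG_DIR.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns the remainder of `path` that a resource manager registered at
 * `mount` would be handed, or NULL if the path doesn't match.
 * is_dir != 0 models a directory mountpoint; otherwise the mountpoint is
 * discrete and the client's path must match it exactly. */
const char *
match_mountpoint (const char *mount, const char *path, int is_dir)
{
    size_t len;

    assert (mount != NULL && path != NULL);
    len = strlen (mount);
    if (strncmp (mount, path, len) != 0) {
        return NULL;                      /* different prefix */
    }
    if (path[len] == '\0') {
        return path + len;                /* exact match: empty remainder */
    }
    if (is_dir && path[len] == '/') {
        return path + len + 1;            /* strip mountpoint and the '/' */
    }
    return NULL;                          /* e.g. "/nfsx" is not under "/nfs" */
}
```

For example, match_mountpoint ("/nfs", "/nfs/etc/passwd", 1) yields "etc/passwd", while a discrete /dev/null registration rejects "/dev/null/x".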

The flags member can contain any of the following flags (or the constant 0 if none are specified):

_RESMGR_FLAG_BEFORE or _RESMGR_FLAG_AFTER: These flags indicate that your resource manager wishes to be placed before or after (respectively) other resource managers with the same mountpoint. These two flags would be useful with unioned (overlaid) filesystems. We'll discuss the interactions of these flags shortly.
_RESMGR_FLAG_DIR: This flag indicates that your resource manager is taking over the specified mountpoint and below — it's effectively a filesystem style of resource manager, as opposed to a discretely-manifested resource manager.
_RESMGR_FLAG_OPAQUE: If set, prevents resolving to any other manager below your mount point except for the path manager. This effectively eliminates unioning on a path.
_RESMGR_FLAG_FTYPEONLY: This ensures that only requests that have the same _FTYPE_* as the file_type passed to resmgr_attach() are matched.
_RESMGR_FLAG_FTYPEALL: This flag is used when a resource manager wants to catch all client requests, even those with a different _FTYPE_* specification than the one passed to resmgr_attach() in the file_type argument. This can only be used in conjunction with a registration file type of _FTYPE_ALL.
_RESMGR_FLAG_SELF: Allow this resource manager to talk to itself. This really is a “Don't try this at home, kids” kind of flag, because allowing a resource manager to talk to itself can break the send-hierarchy and lead to deadlock (as was discussed in the Message Passing chapter).

You can call resmgr_attach() as many times as you wish to mount different mountpoints. You can also call resmgr_attach() from within the connect or I/O functions — this is kind of a neat feature that allows you to “create” devices on the fly.

When you've decided on the mountpoint, and want to create it, you'll need to tell the process manager if this resource manager can handle requests from just anyone, or if it's limited to handling requests only from clients who identify their connect messages with special tags. For example, consider the POSIX message queue (mqueue) driver. It's not going to allow (and certainly wouldn't know what to do with) "regular" open() messages from any old client. It will allow messages only from clients that use the POSIX mq_open(), mq_receive(), and similar function calls. To prevent the process manager from even allowing regular requests to arrive at the mqueue resource manager, mqueue specifies _FTYPE_MQUEUE as the file_type parameter. This means that when a client requests a name resolution from the process manager, the process manager won't even bother considering the resource manager during the search unless the client has specified that it wants to talk to a resource manager that has identified itself as _FTYPE_MQUEUE.

Unless you're doing something very special, you'll use a file_type of _FTYPE_ANY, which means that your resource manager is prepared to handle requests from anyone. For the full list of _FTYPE_* manifest constants, take a look in <sys/ftype.h>.

With respect to the “before” and “after” flags, things get a little bit more interesting. You can specify only one of these flags or the constant 0.

Let's see how this works. A number of resource managers have started, in the order given in the table. We also see the flags they passed for the flags member. Observe the positions they're given:

Resmgr  Flag                 Resulting order
1       _RESMGR_FLAG_BEFORE  1
2       _RESMGR_FLAG_AFTER   1, 2
3       0                    1, 3, 2
4       _RESMGR_FLAG_BEFORE  1, 4, 3, 2
5       _RESMGR_FLAG_AFTER   1, 4, 3, 5, 2
6       0                    1, 4, 6, 3, 5, 2

As you can see, the first resource manager to actually specify a flag always ends up in that position. (From the table, resource manager number 1 was the first to specify the “before” flag; no matter who registers, resource manager 1 is always first in the list. Likewise, resource manager 2 was the first to specify the “after” flag; again, no matter who else registers, it's always last.) If no flag is specified, it effectively acts as a “middle” flag. When resource manager 3 started with a flag of zero, it got put into the middle. As with the “before” and “after” flags, there's a preferential ordering given to all the “middle” resource managers, whereby newer ones are placed in front of other, existing “middle” ones.
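One way to check your understanding of these rules is to replay the table. The sketch below is a model inferred from the table, not the process manager's implementation: it keeps three groups ("before," "middle," "after"), appends new "before" registrations, and puts new "middle" and "after" registrations at the front of their groups. All names are hypothetical.

```c
#include <assert.h>
#include <string.h>

enum demo_flag { DEMO_MIDDLE = 0, DEMO_BEFORE, DEMO_AFTER };

#define DEMO_MAX 16

/* Three groups; the resolution order is before + middle + after. */
struct demo_order {
    int before[DEMO_MAX], nbefore;
    int middle[DEMO_MAX], nmiddle;
    int after[DEMO_MAX],  nafter;
};

/* Insert x at index pos of arr, whose current length is *n. */
static void
insert_at (int *arr, int *n, int pos, int x)
{
    assert (*n < DEMO_MAX && pos >= 0 && pos <= *n);
    memmove (&arr[pos + 1], &arr[pos], (size_t) (*n - pos) * sizeof *arr);
    arr[pos] = x;
    (*n)++;
}

void
demo_register (struct demo_order *o, int id, enum demo_flag flag)
{
    switch (flag) {
    case DEMO_BEFORE:  /* append: the first "before" keeps the front spot */
        insert_at (o->before, &o->nbefore, o->nbefore, id);
        break;
    case DEMO_AFTER:   /* prepend: the first "after" keeps the back spot */
        insert_at (o->after, &o->nafter, 0, id);
        break;
    default:           /* newer "middle" ones go in front of older ones */
        insert_at (o->middle, &o->nmiddle, 0, id);
        break;
    }
}

/* Flatten the three groups into out[]; returns the number of entries. */
int
demo_flatten (const struct demo_order *o, int *out)
{
    int n = 0, i;

    for (i = 0; i < o->nbefore; i++) out[n++] = o->before[i];
    for (i = 0; i < o->nmiddle; i++) out[n++] = o->middle[i];
    for (i = 0; i < o->nafter;  i++) out[n++] = o->after[i];
    return n;
}
```

Registering resource managers 1 through 6 with the flags from the table reproduces the final row, 1, 4, 6, 3, 5, 2, and each intermediate row along the way.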

However, in reality, there are very few cases where you'd actually mount more than one, and even fewer cases where you'd mount more than two resource managers at the same mountpoint. Here's a design tip: expose the ability to set the flags at the command line of the resource manager so that the end-user of your resource manager is able to specify, for example, -b to use the “before” flag, and -a to use the “after” flag, with no command-line option specified to indicate that a zero should be passed as the flag.

Keep in mind that this discussion applies only to resource managers mounted with the same mountpoint. Mounting “/nfs” with a “before” flag and “/disk2” with an “after” flag will have no effect on each other; only if you were to then mount another “/nfs” or “/disk2” would these flags (and rules) come into play.

Finally, the resmgr_attach() function returns a small integer handle on success (or -1 for failure). This handle can then be used subsequently to detach the pathname from the process manager's internal pathname tables.

Putting in your own functions

When designing your very first resource manager, you'll most likely want to take an incremental design approach. It can be very frustrating to write thousands of lines of code only to run into a fundamental misunderstanding, and then have to make the ugly decision of whether to try to kludge (er, I mean "fix") all that code or scrap it and start from scratch.

The recommended approach for getting things running is to use the iofunc_func_init() POSIX-layer default initializer function to fill the connect and I/O tables with the POSIX-layer default functions. This means that you can literally write your initial cut of your resource manager as we did above, in a few function calls.

Which function you'll want to implement first really depends on what kind of resource manager you're writing. If it's a filesystem type of resource manager where you're taking over a mountpoint and everything below it, you'll most likely be best off starting with the io_open() function. On the other hand, if it's a discretely manifested resource manager that does “traditional” I/O operations (i.e., you primarily access it with client calls like read() and write()), then the best place to start would be the io_read() and/or io_write() functions. The third possibility is that it's a discretely manifested resource manager that doesn't do traditional I/O operations, but instead relies on devctl() or ioctl() client calls to perform the majority of its functionality. In that case, you'd start at the io_devctl() function.

Regardless of where you start, you'll want to make sure that your functions are getting called in the expected manner. The really cool thing about the POSIX-layer default functions is that they can be placed directly into the connect or I/O functions table. This means that if you simply want to gain control, perform a printf() to say “I'm here in the io_open!”, and then “do whatever should be done,” you're going to have an easy time of it. Here's a portion of a resource manager that takes over the io_open() function:

// forward reference
int io_open (resmgr_context_t *, io_open_t *,
             RESMGR_HANDLE_T *, void *);

int
main ()
{
    // everything as before, in the /dev/null example
    // except after this line:
    iofunc_func_init (_RESMGR_CONNECT_NFUNCS, &cfuncs,
                      _RESMGR_IO_NFUNCS, &ifuncs);

    // add the following to gain control:
    cfuncs.open = io_open;

    // ... the rest (resmgr_attach() and the receive loop) as before
}

Assuming that you've prototyped the io_open() function call correctly, as in the code example, you can just use the default one from within your own!

int
io_open (resmgr_context_t *ctp, io_open_t *msg,
         RESMGR_HANDLE_T *handle, void *extra)
{
    printf ("I'm here in the io_open!\n");
    return (iofunc_open_default (ctp, msg, handle, extra));
}

In this manner, you're still using the default POSIX-layer iofunc_open_default() handler, but you've also gained control to do a printf().

Obviously, you could do this for the io_read(), io_write(), and io_devctl() functions as well as any others that have POSIX-layer default functions. In fact, this is a really good idea, because it shows you that the client really is calling your resource manager as expected.

The general flow of a resource manager

As we alluded to in the client and resource manager overview sections above, the general flow of a resource manager begins on the client side with the open(). This gets translated into a connect message and ends up being received by the resource manager's io_open() outcall connect function.

This is really key, because the io_open() outcall function is the “gate keeper” for your resource manager. If the message causes the gate keeper to fail the request, you will not get any I/O requests, because the client never got a valid file descriptor. Conversely, if the message is accepted by the gate keeper, the client now has a valid file descriptor and you should expect to get I/O messages.
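The gate-keeping decision itself is often just a check on how the client wants to open the resource. Here's a portable sketch for the read-only case mentioned in the status description earlier; demo_open_check() is a hypothetical helper, not the io_open() signature, and a real handler would report the error through its return value or ctp->status as described above rather than like this.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>

/* Toy "gate keeper" for a read-only resource: returns 0 to accept the
 * open, or an errno value to refuse it. A refused open means the client
 * never gets a file descriptor, so no I/O messages will follow. */
int
demo_open_check (int oflag)
{
    int acc = oflag & O_ACCMODE;      /* O_RDONLY, O_WRONLY, or O_RDWR */

    assert (acc == O_RDONLY || acc == O_WRONLY || acc == O_RDWR);
    return (acc == O_RDONLY) ? 0 : EROFS;
}
```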

But the io_open() outcall function plays a greater role. Not only is it responsible for verifying whether the client can or can't open the particular resource, it's also responsible for:

initializing internal library parameters
binding a context block to this request
binding an attribute structure to the context block

The first two operations are performed via the base layer function resmgr_open_bind(); the binding of the attribute structure is done via a simple assignment.

Once the io_open() outcall function has been called, it's out of the picture. The client may or may not send I/O messages, but in any case will eventually terminate the "session" with a message corresponding to the close() function. Note that if the client suffers an unexpected death (e.g., gets hit with SIGSEGV, or the node that it's running on crashes), the operating system will synthesize a close() message so that the resource manager can clean up. Therefore, you are guaranteed to get a close() message!

Messages that should be connect messages but aren't

Here's an interesting point you may have noticed. The client's prototype for chown() is:

int
chown (const char *path,
       uid_t owner,
       gid_t group);

Remember, a connect message always contains a pathname and is either a one-shot message or establishes a context for further I/O messages.

So, why isn't there a connect message for the client's chown() function? In fact, why is there an I/O message?!? There's certainly no file descriptor implied in the client's prototype!

The answer is, “to make your life simpler!”

Imagine if functions like chown(), chmod(), stat(), and others required the resource manager to look up the pathname and then perform some kind of work. (This is, by the way, the way it was implemented in QNX 4.) The usual problems with this are:

Each function has to call the lookup routine.
Where file descriptor versions of these functions exist, the driver has to provide two separate entry points; one for the pathname version, and one for the file descriptor version.

In any event, what happens under Neutrino is that the client constructs a combine message — really just a single message that comprises multiple resource manager messages. Without combine messages, we could simulate chown() with something like this:

int
chown (const char *path, uid_t owner, gid_t group)
{
    int fd, sts;

    if ((fd = open (path, O_RDWR)) == -1) {
        return (-1);
    }
    sts = fchown (fd, owner, group);
    close (fd);
    return (sts);
}

where fchown() is the file-descriptor-based version of chown(). The problem here is that we are now issuing three function calls (and three separate message passing transactions), and incurring the overhead of open() and close() on the client side.

With combine messages, under Neutrino a single message that looks like this is constructed directly by the client's chown() library call:

A combine message.

The message has two parts, a connect part (similar to what the client's open() would have generated) and an I/O part (the equivalent of the message generated by the fchown()). There is no equivalent of the close() because we implied that in our particular choice of connect messages. We used the _IO_CONNECT_COMBINE_CLOSE message, which effectively states “Open this pathname, use the file descriptor you got for handling the rest of the message, and when you run off the end or encounter an error, close the file descriptor.”

The resource manager that you write doesn't have a clue that the client called chown() or that the client did a distinct open(), followed by an fchown(), followed by a close(). It's all hidden by the base-layer library.

Combine messages

As it turns out, this concept of combine messages isn't useful just for saving bandwidth (as in the chown() case, above). It's also critical for ensuring atomic completion of operations.

Suppose the client process has two or more threads and one file descriptor. One of the threads in the client does an lseek() followed by a read(). Everything is as we expect it. If another thread in the client does the same set of operations, on the same file descriptor, we'd run into problems. Since the lseek() and read() functions don't know about each other, it's possible that the first thread would do the lseek(), and then get preempted by the second thread. The second thread gets to do its lseek(), and then its read(), before giving up the CPU. The problem is that since the two threads are sharing the same file descriptor, the first thread's lseek() offset is now at the wrong place: it's at the position given by the second thread's read() function! This is also a problem with file descriptors that are dup()'d across processes, not to mention across the network.

An obvious solution to this is to put the lseek() and read() functions within a mutex — when the first thread obtains the mutex, we now know that it has exclusive access to the file descriptor. The second thread has to wait until it can acquire the mutex before it can go and mess around with the position of the file descriptor.

Unfortunately, if someone forgot to obtain a mutex for each and every file descriptor operation, there'd be a possibility that such an “unprotected” access would cause a thread to read or write data to the wrong location.

Let's look at the C library call readblock() (from <unistd.h>):

int
readblock (int fd,
           size_t blksize,
           unsigned block,
           int numblks,
           void *buff);

(The writeblock() function is similar.)

You can imagine a fairly “simplistic” implementation for readblock():

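int
readblock (int fd, size_t blksize, unsigned block,
           int numblks, void *buff)
{
    ssize_t nbytes;

    lseek (fd, (off_t) blksize * block, SEEK_SET); // get to the block
    nbytes = read (fd, buff, blksize * numblks);
    return (nbytes == -1 ? -1 : (int) (nbytes / blksize)); // whole blocks read
}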

Obviously, this implementation isn't useful in a multi-threaded environment. We'd have to at least put a mutex around the calls:

int
readblock (int fd, size_t blksize, unsigned block,
           int numblks, void *buff)
{
    ssize_t nbytes;

    pthread_mutex_lock (&block_mutex);
    lseek (fd, (off_t) blksize * block, SEEK_SET); // get to the block
    nbytes = read (fd, buff, blksize * numblks);
    pthread_mutex_unlock (&block_mutex);
    return (nbytes == -1 ? -1 : (int) (nbytes / blksize));
}

(We're assuming the mutex is already initialized.)

This code is still vulnerable to “unprotected” access; if some other thread in the process does a simple non-mutexed lseek() on the file descriptor, we've got a bug.

The solution to this is to use a combine message, as we discussed above for the chown() function. In this case, the C library implementation of readblock() puts both the lseek() and the read() operations into a single message and sends that off to the resource manager:

The readblock() function's combine message.

This works because message passing is atomic. From the client's point of view, either the entire message has gone to the resource manager, or none of it has. Therefore, an intervening “unprotected” lseek() is irrelevant: when the resource manager receives the readblock() operation, it's done in one shot. (The damage is confined to the unprotected lseek(), because after the readblock() the file descriptor's offset is at a different place than where that lseek() put it.)

But what about the resource manager? How does it ensure that it processes the entire readblock() operation in one shot? We'll see this shortly, when we discuss the operations performed for each message component.

POSIX-layer data structures

There are three data structures that relate to the POSIX-layer support routines. Note that as far as the base layer is concerned, you can use any data structures you want; it's the POSIX layer that requires you to conform to a certain content and layout. The benefits delivered by the POSIX layer are well worth this tiny constraint. As we'll see later, you can add your own content to the structures as well.

The three data structures are illustrated in the following diagram, showing some clients using a resource manager that happens to manifest two devices:

Data structures — the big picture.

The data structures (defined in <sys/iofunc.h>) are:

iofunc_ocb_t — OCB structure: Contains information on a per-file-descriptor basis
iofunc_attr_t — attributes structure: Contains information on a per-device basis
iofunc_mount_t — mount structure: Contains information on a per-mountpoint basis

When we talked about the I/O and connect tables, you saw the OCB and attributes structures — in the I/O tables, the OCB structure was the last parameter passed. The attributes structure was passed as the handle in the connect table functions (third argument). The mount structure is usually a global structure and is bound to the attributes structure “by hand” (in the initialization code that you supply for your resource manager).

Be sure to #include <sys/iofunc.h> before <sys/resmgr.h>, or else the data structures won't be defined properly.

The `iofunc_ocb_t` OCB structure

The OCB structure contains information on a per-file-descriptor basis. What this means is that when a client performs an open() call and gets back a file descriptor (as opposed to an error indication), the resource manager will have created an OCB and associated it with the client. This OCB will be around for as long as the client has the file descriptor open. Effectively, the OCB and the file descriptor are a matched pair. Whenever the client calls an I/O function, the resource manager library will automatically associate the OCB and pass it along with the message to the I/O function specified by the I/O function table entry. This is why the I/O functions all had the ocb parameter passed to them. Finally, the client will close the file descriptor (via close()), which will cause the resource manager to dissociate the OCB from the file descriptor and client. Note that the client's dup() function simply increments a reference count. In this case, the OCB gets dissociated from the file descriptor and client only when the reference count reaches zero (i.e., when close() has been called once for the original open() plus once for each dup()).
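The open/dup/close bookkeeping can be modeled in a few lines of C. This is a sketch under simplifying assumptions: model_ocb_t and the model_*() functions are hypothetical, and in a real resource manager the library, not your code, does this counting.

```c
#include <stdlib.h>

/* Hypothetical model of an OCB: one per open(), shared by dup()'d
 * descriptors through a reference count. */
typedef struct {
    int       refcount;  /* the open() plus every dup() */
    long long offset;    /* per-open state, e.g. the lseek() offset */
} model_ocb_t;

static int ocbs_freed = 0;   /* counts released OCBs, for demonstration */

model_ocb_t *
model_open (void)
{
    model_ocb_t *ocb = calloc (1, sizeof (*ocb));

    if (ocb != NULL)
        ocb->refcount = 1;
    return (ocb);
}

model_ocb_t *
model_dup (model_ocb_t *ocb)
{
    ocb->refcount++;         /* same OCB, so the offset is shared too */
    return (ocb);
}

/* Returns 1 if this close() released the OCB, 0 otherwise. */
int
model_close (model_ocb_t *ocb)
{
    if (--ocb->refcount > 0)
        return (0);
    free (ocb);              /* last close(): dissociate and destroy */
    ocbs_freed++;
    return (1);
}
```

Because a dup()'d descriptor shares the OCB, an offset change made through one descriptor is visible through the other, which is exactly the sharing behind the lseek()/read() race discussed earlier.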

As you might suspect, the OCB contains things that are important on a per-open or per-file-descriptor basis. Here are the contents (from <sys/iofunc.h>):

typedef struct _iofunc_ocb {
  IOFUNC_ATTR_T *attr;
  int32_t       ioflag;
  off_t         offset;   /* type varies; see below */
  uint16_t      sflag;
  uint16_t      flags;
} iofunc_ocb_t;

Ignore the type of the offset field for now; we'll come back to it immediately after this discussion.

The iofunc_ocb_t members are:

attr

A pointer to the attributes structure related to this OCB. A common coding idiom you'll see in the I/O functions is “ocb->attr,” used to access a member of the attributes structure.

ioflag

The open mode; that is, how this resource was opened (e.g., read-only). The open modes (as passed to open() on the client side) correspond to the ioflag values as follows:

Open mode	ioflag value
O_RDONLY	_IO_FLAG_RD
O_RDWR	_IO_FLAG_RD | _IO_FLAG_WR
O_WRONLY	_IO_FLAG_WR
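A handler sometimes needs to reason about this table directly. The sketch below encodes it; the SKETCH_IO_FLAG_* values are assumptions that stand in for the real _IO_FLAG_RD and _IO_FLAG_WR from the QNX headers, so the example compiles anywhere.

```c
#include <fcntl.h>

/* Hypothetical stand-ins for _IO_FLAG_RD and _IO_FLAG_WR; the values are
 * assumed. Use the real constants in an actual resource manager. */
#define SKETCH_IO_FLAG_RD 0x1u
#define SKETCH_IO_FLAG_WR 0x2u

/* Map the access mode in open()'s flags to ioflag read/write bits,
 * following the table above. Other flags (O_CREAT etc.) are ignored. */
unsigned
access_to_ioflag (int oflag)
{
    switch (oflag & O_ACCMODE) {
    case O_RDONLY: return (SKETCH_IO_FLAG_RD);
    case O_WRONLY: return (SKETCH_IO_FLAG_WR);
    case O_RDWR:   return (SKETCH_IO_FLAG_RD | SKETCH_IO_FLAG_WR);
    default:       return (0);   /* unrecognized access mode */
    }
}
```

In an I/O handler, the corresponding test on a real OCB would be along the lines of checking ocb->ioflag & _IO_FLAG_WR before accepting a write.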

offset

The current lseek() offset into this resource.

sflag

The sharing flag (see <share.h>) used with the client's sopen() function call. These are the flags SH_COMPAT, SH_DENYRW, SH_DENYWR, SH_DENYRD, and SH_DENYNO.

flags

System flags. The two flags currently supported are IOFUNC_OCB_PRIVILEGED, which indicates whether a privileged process issued the connect message that resulted in this OCB, and IOFUNC_OCB_MMAP, which indicates whether this OCB is in use by an mmap() call on the client side. No other flags are defined at this time. You can use the bits defined by IOFUNC_OCB_FLAGS_PRIVATE for your own private flags.

If you wish to store additional data along with the “normal” OCB, rest assured that you can “extend” the OCB. We'll discuss this in the “Advanced topics” section.

The strange case of the offset member

The offset field is, to say the least, interesting. Have a look at <sys/iofunc.h> to see how it's implemented. Depending on what preprocessor flags you've set, you may get one of six (!) possible layouts for the offset area. But don't worry too much about the implementation — there are really only two cases to consider, depending on whether you want to support 64-bit offsets:

yes — the offset member is 64 bits
no (32-bit integers) — the offset member is the lower 32 bits; another member, offset_hi, contains the upper 32 bits.
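In the 32-bit case, code that wants the full offset has to stitch the two halves together. A minimal sketch, assuming a hypothetical split_offset_t that holds just those two members:

```c
#include <stdint.h>

/* Hypothetical layout for the 32-bit case: offset holds the low
 * 32 bits and offset_hi the high 32 bits. */
typedef struct {
    uint32_t offset;
    uint32_t offset_hi;
} split_offset_t;

uint64_t
get_offset64 (const split_offset_t *o)
{
    return (((uint64_t) o->offset_hi << 32) | o->offset);
}

void
set_offset64 (split_offset_t *o, uint64_t value)
{
    o->offset    = (uint32_t) (value & 0xFFFFFFFFu);
    o->offset_hi = (uint32_t) (value >> 32);
}
```

Storing 0x123456789 leaves 0x23456789 in offset and 0x1 in offset_hi.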

For our purposes here, unless we're specifically going to talk about 32 versus 64 bits, we'll just assume that all offsets are 64 bits, of type off_t, and that the platform knows how to deal with 64-bit quantities.

The `iofunc_attr_t` attributes structure

Whereas the OCB was a per-open or per-file-descriptor structure, the attributes structure is a per-device data structure. You saw that the standard iofunc_ocb_t OCB had a member called attr that's a pointer to the attribute structure. This was done so the OCB has access to information about the device. Let's take a look at the attributes structure (from <sys/iofunc.h>):

typedef struct _iofunc_attr {
  IOFUNC_MOUNT_T           *mount;
  uint32_t                 flags;
  int32_t                  lock_tid;
  uint16_t                 lock_count;
  uint16_t                 count;
  uint16_t                 rcount;
  uint16_t                 wcount;
  uint16_t                 rlocks;
  uint16_t                 wlocks;
  struct _iofunc_mmap_list *mmap_list;
  struct _iofunc_lock_list *lock_list;
  void                     *list;
  uint32_t                 list_size;
  off_t                    nbytes;   /* type varies; see below */
  ino_t                    inode;    /* type varies; see below */
  uid_t                    uid;
  gid_t                    gid;
  time_t                   mtime;
  time_t                   atime;
  time_t                   ctime;
  mode_t                   mode;
  nlink_t                  nlink;
  dev_t                    rdev;
} iofunc_attr_t;

The nbytes and inode members have the same set of #ifdef conditionals as the offset member of the OCB (see “The strange case of the offset member” above).

Note that some of the fields of the attributes structure are useful only to the POSIX helper routines.

Let's look at the fields individually:

mount: A pointer to the optional iofunc_mount_t mount structure. This is used in the same way that the pointer from the OCB to the attribute structure was used, except that this value can be NULL, in which case the mount structure defaults are used (see “The iofunc_mount_t mount structure” below). As mentioned, the mount structure is generally bound “by hand” into the attributes structure in code that you supply for your resource manager initialization.
flags: Contains flags that describe the state of other attributes structure fields. We'll discuss these shortly.
lock_tid: In order to prevent synchronization problems, multiple threads using the same attributes structure will be mutually exclusive. The lock_tid contains the thread ID of the thread that currently has the attributes structure locked.
lock_count: Indicates how many threads are trying to use this attributes structure. A value of zero indicates that the structure is unlocked. A value of one or more indicates that one or more threads are using the structure.
count: Indicates the number of OCBs that have this attributes structure open for any reason. For example, if one client has an OCB open for read, another client has another OCB open for read/write, and both OCBs point to this attribute structure, then the value of count would be 2, to indicate that two clients have this resource open.
rcount: Count readers. In the example given for count, rcount would also have the value 2, because two clients have the resource open for reading.
wcount: Count writers. In the example given for count, wcount would have the value 1, because only one of the clients has this resource open for writing.
rlocks: Indicates the number of OCBs that have read locks on the particular resource. A value of zero means there are no read locks, though there may still be write locks.
wlocks: Same as rlocks but for write locks.
mmap_list: Used internally by POSIX iofunc_mmap_default().
lock_list: Used internally by POSIX iofunc_lock_default().
list: Reserved for future use.
list_size: Size of area reserved by list.
nbytes: Size of the resource, in bytes. For example, if this resource described a particular file, and that file was 7756 bytes in size, then the nbytes member would contain the number 7756.
inode: Contains a file or resource serial number, that must be unique per mountpoint. The inode should never be zero, because zero traditionally indicates a file that's not in use.
uid: User ID of the owner of this resource.
gid: Group ID of the owner of this resource.
mtime: File modification time, updated or at least invalidated whenever a client write() is processed.
atime: File access time, updated or at least invalidated whenever a client read() that returns more than zero bytes is processed.
ctime: File change time, updated or at least invalidated whenever a client write(), chown(), or chmod() is processed.
mode: File's mode. These are the standard S_* values from <sys/stat.h>, such as S_IFCHR, or in octal representation, such as 0664 to indicate read/write permission for owner and group, and read-only permission for other.
nlink: Number of links to the file, returned by the client's stat() function call.
rdev: For a character special device, this field consists of a major and minor device code (10 bits minor in the least-significant positions; next 6 bits are the major device number). For other types of devices, contains the device number. (See below in “Of device numbers, inodes, and our friend rdev,” for more discussion.)
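The count, rcount, and wcount fields follow a simple rule, illustrated by the two-client example above. The sketch below is a hypothetical model of that bookkeeping (in practice the POSIX-layer helpers are expected to maintain these fields); open_counts_t, SKETCH_RD, and SKETCH_WR are illustrative stand-ins, not the real structure or ioflag bits.

```c
#include <stdint.h>

/* Hypothetical subset of the attributes structure's open counters. */
typedef struct {
    uint16_t count;    /* OCBs open on this resource for any reason */
    uint16_t rcount;   /* of those, how many are open for reading */
    uint16_t wcount;   /* of those, how many are open for writing */
} open_counts_t;

/* Illustrative stand-ins for the ioflag read/write bits. */
#define SKETCH_RD 0x1u
#define SKETCH_WR 0x2u

void
counts_on_open (open_counts_t *c, unsigned ioflag)
{
    c->count++;
    if (ioflag & SKETCH_RD) c->rcount++;
    if (ioflag & SKETCH_WR) c->wcount++;
}

void
counts_on_close (open_counts_t *c, unsigned ioflag)
{
    c->count--;
    if (ioflag & SKETCH_RD) c->rcount--;
    if (ioflag & SKETCH_WR) c->wcount--;
}
```

One read-only open plus one read/write open yields count 2, rcount 2, and wcount 1, matching the example in the field descriptions.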

As with the OCB, you can extend the “normal” attributes structure with your own data. See the “Advanced topics” section.

The `iofunc_mount_t` mount structure

The mount structure contains information that's common across multiple attributes structures.

Here are the contents of the mount structure (from <sys/iofunc.h>):

typedef struct _iofunc_mount {
  uint32_t       flags;
  uint32_t       conf;
  dev_t          dev;
  int32_t        blocksize;
  iofunc_funcs_t *funcs;
} iofunc_mount_t;

The flags member contains just one flag, IOFUNC_MOUNT_32BIT. This flag indicates that offset in the OCB, and nbytes and inode in the attributes structure, are 32-bit. Note that you can define your own flags in flags, using any of the bits from the constant IOFUNC_MOUNT_FLAGS_PRIVATE.

The conf member contains the following flags:

IOFUNC_PC_CHOWN_RESTRICTED

Indicates whether the filesystem is operating in a “chown-restricted” manner, meaning that only root is allowed to chown a file.

IOFUNC_PC_NO_TRUNC

Indicates that the filesystem doesn't truncate names that are too long; it reports an error instead.

IOFUNC_PC_SYNC_IO

Indicates that the filesystem supports synchronous I/O operations. If this bit isn't set, the following may occur:

The default iofunc layer _IO_OPEN handler, iofunc_open_default(), fails if the client specifies O_DSYNC, O_RSYNC, or O_SYNC.
The iofunc_sync_verify() function returns EINVAL.
Attempts to set O_DSYNC, O_RSYNC, or O_SYNC with fcntl() or the DCMD_ALL_SETFLAGS devctl() command fail.

IOFUNC_PC_LINK_DIR

Indicates that linking/unlinking of directories is allowed.

The dev member contains the device number and is described below in “Of device numbers, inodes, and our friend rdev.”

The blocksize describes the native blocksize of the device in bytes. For example, on a typical rotating-medium storage system, this would be the value 512.

Finally, the funcs pointer points to a structure (from <sys/iofunc.h>):

typedef struct _iofunc_funcs {
  unsigned      nfuncs;

  IOFUNC_OCB_T *(*ocb_calloc)
                    (resmgr_context_t *ctp,
                     IOFUNC_ATTR_T *attr);

  void          (*ocb_free)
                    (IOFUNC_OCB_T *ocb);
} iofunc_funcs_t;

As with the connect and I/O functions tables, the nfuncs member should be stuffed with the current size of the table. Use the constant _IOFUNC_NFUNCS for this.

The ocb_calloc and ocb_free function pointers can be filled with addresses of functions to call whenever an OCB is to be allocated or deallocated. We'll discuss why you'd want to use these functions later when we talk about extending OCBs.
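The usual reason for hooking ocb_calloc and ocb_free is the “extended OCB” idiom: allocate a larger structure whose first member is the standard OCB. The sketch below shows that layout with hypothetical base_ocb_t and my_ocb_t types; the real ocb_calloc also takes a resmgr_context_t pointer, which is omitted here so the example stands alone.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for iofunc_ocb_t (fields trimmed). */
typedef struct {
    void    *attr;
    int32_t  ioflag;
    int64_t  offset;
} base_ocb_t;

/* Extended OCB: the standard OCB comes first, so a pointer to the
 * extended OCB can be used wherever a base OCB pointer is expected. */
typedef struct {
    base_ocb_t base;           /* must be the first member */
    unsigned   bytes_written;  /* private per-open data */
} my_ocb_t;

/* Shaped like the ocb_calloc hook: allocate the larger structure,
 * hand back a pointer to its embedded base OCB. */
base_ocb_t *
my_ocb_calloc (void *attr)
{
    my_ocb_t *ocb = calloc (1, sizeof (*ocb));

    if (ocb == NULL)
        return (NULL);
    ocb->base.attr = attr;     /* this model records attr itself */
    return (&ocb->base);
}

/* Shaped like the ocb_free hook. */
void
my_ocb_free (base_ocb_t *ocb)
{
    free ((my_ocb_t *) ocb);   /* base is first, so the addresses coincide */
}
```

Because the base OCB sits at offset zero, a handler that receives the base pointer can cast it back to my_ocb_t to reach its private fields.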

Of device numbers, inodes, and our friend rdev

The mount structure contains a member called dev. The attributes structure contains two members: inode and rdev. Let's look at their relationships by examining a traditional disk-based filesystem. The filesystem is mounted on a block device (which is the entire disk). This block device might be known as /dev/hd0 (the first hard disk in the system). On this disk, there might be a number of partitions, such as /dev/hd0t77 (the first QNX filesystem partition on that particular device). Finally, within that partition, there might be an arbitrary number of files, one of which might be /hd/spud.txt.

The dev (or “device number”) member contains a number that's unique to the node that this resource manager is registered with. The rdev member is the dev number of the root device. Finally, the inode is the file serial number. (Note that you can obtain major and minor device numbers by calling rsrcdbmgr_devno_attach(); see the Neutrino Library Reference for more details. You are limited to 64 major devices and 1024 minor devices per major device.)
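Those limits follow from the rdev layout given for character special devices: 6 bits of major number above 10 bits of minor number. Hypothetical helpers that pack and unpack that layout might look like this (the numbers themselves should come from rsrcdbmgr_devno_attach(), not from these helpers):

```c
#include <stdint.h>

/* Hypothetical helpers for the described rdev layout: minor number in
 * the low 10 bits, major number in the next 6 bits. */
#define SKETCH_MINOR_BITS 10u
#define SKETCH_MAJOR_BITS 6u
#define SKETCH_MINOR_MASK ((1u << SKETCH_MINOR_BITS) - 1u)   /* 0x3FF: 1024 minors */
#define SKETCH_MAJOR_MASK ((1u << SKETCH_MAJOR_BITS) - 1u)   /* 0x3F: 64 majors */

/* Out-of-range inputs are silently truncated by the masks. */
uint32_t
make_rdev (unsigned major, unsigned minor)
{
    return (((major & SKETCH_MAJOR_MASK) << SKETCH_MINOR_BITS)
            | (minor & SKETCH_MINOR_MASK));
}

unsigned
rdev_major (uint32_t rdev)
{
    return ((rdev >> SKETCH_MINOR_BITS) & SKETCH_MAJOR_MASK);
}

unsigned
rdev_minor (uint32_t rdev)
{
    return (rdev & SKETCH_MINOR_MASK);
}
```

For example, major 3 and minor 5 pack to 3 * 1024 + 5 = 3077.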

Let's relate that to our disk example. The following table shows some example numbers; after the table we'll take a look at where these numbers came from and how they're related.

Device	dev	inode	rdev
`/dev/hd0`	6	2	1
`/dev/hd0t77`	1	12	77
`/hd/spud.txt`	77	47343	N/A

For the raw block device, /dev/hd0, the process manager assigned both the dev and inode values (the 6 and the 2 in the table above). The resource manager picked a unique rdev value (of 1) for the device when it started.

For the partition, /dev/hd0t77, the dev value came from the raw block device's rdev number (the 1). The inode was selected by the resource manager as a unique number (within the rdev). This is where the 12 came from. Finally, the rdev number was selected by the resource manager as well — in this case, the writer of the resource manager selected 77 because it corresponded to the partition type.

Finally, for the file, /hd/spud.txt, the dev value (77) came from the partition's rdev value. The inode was selected by the resource manager (in the case of a file, the number is selected to correspond to some internal representation of the file — it doesn't matter what it is so long as it's not zero, and it's unique within the rdev). This is where the 47343 came from. For a file, the rdev field is not meaningful.

Handler routines

Not all outcalls correspond to client messages — some are synthesized by the kernel, and some by the library.

I've organized this section into the following:

general notes
connect functions notes

followed by an alphabetical listing of connect and I/O messages.

General notes

Each handler function gets passed an internal context block (the ctp argument) which should be treated as “read-only,” except for the iov member. This context block contains a few items of interest, as described above in “resmgr_context_t internal context block.” Also, each function gets passed a pointer to the message (in the msg argument). You'll be using this message pointer extensively, as that contains the parameters that the client's C library call has placed there for your use.

The function that you supply must return a value (all functions are prototyped as returning an int). The values are selected from the following list:

_RESMGR_NOREPLY

Indicates to the resource manager library that it should not perform the MsgReplyv() — the assumption is that you've either performed it yourself in your handler function, or that you're going to do it some time later.

_RESMGR_NPARTS (n)

The resource manager library should return an n-part IOV when it does the MsgReplyv() (the IOV is located in ctp->iov). Your function is responsible for filling in the iov member of the ctp structure, and then returning _RESMGR_NPARTS() with the correct number of parts.

The iov member of ctp is allocated dynamically, so you must make sure it's big enough to hold the number of array elements that you're writing into it! See the section “resmgr_attr_t control structure” above for information on setting the nparts_max member.

_RESMGR_DEFAULT

This instructs the resource manager library to perform the low-level default function. (This is not the same as the iofunc_*_default() functions!) You'd rarely use this return value. In general, it causes the resource manager library to return an errno of ENOSYS to the client, which indicates that the function is not supported.

An errno value

Indicates to the resource manager library that it should call MsgError() with this value as the error parameter. This generally causes the client function (e.g. open()) to return -1 and set errno on the client side to the returned value.

_RESMGR_ERRNO (errno)

(Deprecated) This return value had been used to “wrap” an errno number as the return value of the message. For example, if a client issued an open() for writing on a read-only device, it would be appropriate to return the error value EROFS. Since this macro is deprecated, you can return the error number directly instead of wrapping it (e.g., return (EROFS); instead of the more cumbersome return (_RESMGR_ERRNO (EROFS));).

_RESMGR_PTR (ctp, addr, len)

This is a convenience macro that accepts the context pointer ctp, and fills its first IOV element to point to the address specified by addr for the length specified by len, and then returns the equivalent of _RESMGR_NPARTS (1) to the library. You'd generally use this if you return single-part IOVs from your function.
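To make the IOV conventions concrete, here's a model of what a one-part _RESMGR_PTR() reply and a two-part _RESMGR_NPARTS (2) reply set up. It's a sketch, not the library's definition: sketch_ctp_t is a hypothetical stand-in for resmgr_context_t, and the plain part count returned here replaces whatever encoding the real macros use.

```c
#include <stddef.h>
#include <sys/uio.h>

/* Hypothetical, trimmed-down context block: just an IOV array and the
 * number of entries it can hold (compare nparts_max). */
typedef struct {
    struct iovec iov[4];
    int          nparts_max;
} sketch_ctp_t;

/* Model of _RESMGR_PTR (ctp, addr, len): point the first IOV entry at
 * addr/len and report a one-part reply. */
int
sketch_resmgr_ptr (sketch_ctp_t *ctp, void *addr, size_t len)
{
    ctp->iov[0].iov_base = addr;
    ctp->iov[0].iov_len  = len;
    return (1);   /* stands in for _RESMGR_NPARTS (1) */
}

/* Model of a two-part reply (e.g. a header followed by data), which
 * needs room for at least two IOV entries. Returns -1 if there's none. */
int
sketch_two_part (sketch_ctp_t *ctp, void *hdr, size_t hlen,
                 void *data, size_t dlen)
{
    if (ctp->nparts_max < 2)
        return (-1);
    ctp->iov[0].iov_base = hdr;
    ctp->iov[0].iov_len  = hlen;
    ctp->iov[1].iov_base = data;
    ctp->iov[1].iov_len  = dlen;
    return (2);   /* stands in for _RESMGR_NPARTS (2) */
}
```

The size check in the two-part case mirrors the warning above: writing past the IOV entries that were allocated is a bug, so size nparts_max for the largest reply you build.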

Locking, unlocking, and combine message handling

We saw the client side of a combine message when we looked at readblock() (in “Combine messages”). The client was able to atomically construct a message that contained multiple resource manager “submessages” — in the example, these were messages corresponding to the individual functions lseek() and read(). From the client's perspective, the two (or more) functions were at least sent atomically (and, due to the nature of message passing, will be received atomically by the resource manager). What we haven't yet talked about is how we ensure that the messages are processed atomically.

This discussion applies not only to combine messages, but to all I/O messages received by the resource manager library (except the close message, which we'll come back to shortly).

The very first thing that the resource manager library does is to lock the attribute structure corresponding to the resource being used by the received message. Then, it processes one or more submessages from the incoming message. Finally, it unlocks the attribute structure.

This ensures that the incoming messages are handled atomically, for no other thread in the resource manager (in the case of a multithreaded resource manager, of course) can “jump in” and modify the resource while a thread is busy using it. Without the locking in place, two client threads could both issue what they believe to be an atomic combine message (say lseek() and read()). Since the resource manager might have two different threads running in it and processing messages, the two resource manager threads could possibly preempt each other, and the lseek() components could interfere with each other. With locking and unlocking, this is prevented, because each message that accesses a resource will be completed in its entirety atomically.
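The lock/process/unlock sequence can be demonstrated outside a resource manager. In this hypothetical model, a pthread mutex plays the part of the attributes-structure lock and a single offset plays the part of an OCB shared by two client threads; each thread repeatedly issues a seek-then-read “combine” at its own position. Because each combine runs entirely under the lock, neither thread ever reads from the position the other one sought to.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stddef.h>

/* Hypothetical model: the mutex plays the attributes-structure lock,
 * and one offset plays the OCB shared by two client threads. */
typedef struct {
    pthread_mutex_t lock;
    long            offset;
    char            data[8];
} sketch_resource_t;

/* Handle one lseek()+read() combine as a unit: lock, run both
 * submessages, unlock. */
char
process_seek_read (sketch_resource_t *r, long pos)
{
    char c;

    pthread_mutex_lock (&r->lock);
    r->offset = pos;            /* lseek() submessage */
    c = r->data[r->offset];     /* read() submessage: one byte */
    r->offset++;                /* read() advances the offset */
    pthread_mutex_unlock (&r->lock);
    return (c);
}

typedef struct {
    sketch_resource_t *r;
    long               pos;
    int                mismatches;
} worker_arg_t;

/* One "client": repeatedly seek to its own position and read a byte. */
static void *
worker (void *p)
{
    worker_arg_t *a = p;

    for (int i = 0; i < 100000; i++)
        if (process_seek_read (a->r, a->pos) != a->r->data[a->pos])
            a->mismatches++;
    return (NULL);
}

/* Run two clients at different positions. Returns the number of reads
 * that landed somewhere other than requested, or -1 on setup failure. */
int
run_two_clients (void)
{
    sketch_resource_t r = { .offset = 0, .data = "ABCDEFG" };
    worker_arg_t a = { &r, 1, 0 };
    worker_arg_t b = { &r, 5, 0 };
    pthread_t ta, tb;
    int result = -1;

    if (pthread_mutex_init (&r.lock, NULL) != 0)
        return (-1);
    if (pthread_create (&ta, NULL, worker, &a) == 0) {
        if (pthread_create (&tb, NULL, worker, &b) == 0) {
            pthread_join (tb, NULL);
            result = 0;
        }
        pthread_join (ta, NULL);
        if (result == 0)
            result = a.mismatches + b.mismatches;
    }
    pthread_mutex_destroy (&r.lock);
    return (result);
}
```

Removing the lock/unlock pair from process_seek_read() would reintroduce the race (and a data race in C terms), and the mismatch count could then become nonzero.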

Locking and unlocking the resource is handled by default helper functions (iofunc_lock_ocb_default() and iofunc_unlock_ocb_default()), which are placed in the I/O table at the lock_ocb and unlock_ocb positions. You can, of course, override these functions if you want to perform further actions during this locking and unlocking phase.

Note that the resource is unlocked before the io_close() function is called. This is necessary because the io_close() function will free the OCB, which would effectively invalidate the pointer used to access the attributes structure, which is where the lock is stored! Also note that none of the connect functions do this locking, because the handle that's passed to them does not have to be an attribute structure (and the locks are stored in the attribute structure).

Connect functions notes

Before we dive into the individual messages, however, it's worth pointing out that the connect functions all have an identical message structure (rearranged slightly, see <sys/iomsg.h> for the original):

struct _io_connect {
    // Internal use
    uint16_t type;
    uint16_t subtype;
    uint32_t file_type;
    uint16_t reply_max;
    uint16_t entry_max;
    uint32_t key;
    uint32_t handle;
    uint32_t ioflag;
    uint32_t mode;
    uint16_t sflag;
    uint16_t access;
    uint16_t zero;
    uint8_t  eflag;

    // End-user parameters
    uint16_t path_len;
    uint8_t  extra_type;
    uint16_t extra_len;
    char     path [1];
};

You'll notice that I've divided the struct _io_connect structure into two areas, an “Internal use” part and an “End-user parameters” part.

Internal use part

The first part consists of fields that the resource manager library uses to:

determine the type of message sent from the client.
validate (ensure that the message is not spoofed).
track access mode (used by helper functions).

To keep things simple, I recommend that you always use the helper functions (the iofunc_*_default() ones) in all connect functions. These will return a pass/fail indication, and after that point, you can then use the “End-user parameters” members within the connect function.

End-user parameter part

The second half of the members directly concern your implementation of the connect functions:

path_len and path: The pathname (and its length) that's the operand (i.e., the pathname you're operating on).
extra_type and extra_len: Additional parameters (pathnames, for example) relevant to the connect function.

To get a sense of how the path member is used as “the pathname you're operating on,” let's examine something like the rename() function. This function takes two pathnames; the “original” pathname and the “new” pathname. The original pathname is passed in path, because it's the thing being worked on (it's the filename that's undergoing the name change). The new pathname is the argument to the operation. You'll see that the extra parameter passed to the connect functions conveniently contains a pointer to the argument of the operation — in this case, the new pathname. (Implementation-wise, the new pathname is stored just past the original pathname in the path pointer, with alignment taken into consideration, but you don't have to do anything about this — the extra parameter conveniently gives you the correct pointer.)

Alphabetical listing of connect and I/O functions

This section gives an alphabetical listing of the connect and I/O function entry points that you can fill in (the two tables passed to resmgr_attach()). Remember that if you simply call iofunc_func_init(), all these entries will be filled in with the appropriate defaults; you'd want to modify a particular entry only if you wish to handle that particular message. In the “Examples” section, below, you'll see some examples of the common functions.

io_chmod()
io_chown()
io_close_dup()
io_close_ocb()
io_devctl()
io_dup()
io_fdinfo()
io_link()
io_lock()
io_lock_ocb()
io_lseek()
io_mknod()
io_mmap()
io_mount()
io_msg()
io_notify()
io_open()
io_openfd()
io_pathconf()
io_power()
io_read()
io_readlink()
io_rename()
io_shutdown()
io_space()
io_stat()
io_sync()
io_unblock() [CONNECT]
io_unblock() [I/O]
io_unlink()
io_unlock_ocb()
io_utime()
io_write()

It may seem confusing at first, but note that there are in fact two unblock outcalls — one is a connect function and one is an I/O function. This is correct; it's a reflection of when the unblock occurs. The connect version of the unblock function is used when the kernel unblocks the client immediately after the client has sent the connect message; the I/O version of the unblock function is used when the kernel unblocks the client immediately after the client has sent an I/O message.

In order not to confuse the client's C-library call (for example, open()) with the resource manager connect outcall that goes into that particular slot, we've given all of our functions an “io_” prefix. For example, the function description for the open connect outcall slot will be under io_open().

io_chmod()

int io_chmod (resmgr_context_t *ctp, io_chmod_t *msg, RESMGR_OCB_T *ocb)

Classification: I/O function

Default handler: iofunc_chmod_default()

Helper functions: iofunc_chmod()

Client functions: chmod(), fchmod()

Messages: _IO_CHMOD

Data structure:

struct _io_chmod {
  uint16_t type;
  uint16_t combine_len;
  mode_t   mode;
};

typedef union {
  struct _io_chmod i;
} io_chmod_t;

Description: Responsible for changing the mode for the resource identified by the passed ocb to the value specified by the mode message member.

Returns: The status via the helper macro _RESMGR_STATUS().

io_chown()

int io_chown (resmgr_context_t *ctp, io_chown_t *msg, RESMGR_OCB_T *ocb)

Classification: I/O function

Default handler: iofunc_chown_default()

Helper functions: iofunc_chown()

Client functions: chown(), fchown()

Messages: _IO_CHOWN

Data structure:

struct _io_chown {
  uint16_t type;
  uint16_t combine_len;
  int32_t  gid;
  int32_t  uid;
};

typedef union {
  struct _io_chown i;
} io_chown_t;

Description: Responsible for changing the user ID and group ID fields for the resource identified by the passed ocb to uid and gid, respectively. Note that the mount structure flag IOFUNC_PC_CHOWN_RESTRICTED and the OCB flag field should be examined to determine whether the filesystem allows chown() to be performed by non-root users.
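One way to express the permission decision is as a small pure function. This is an assumption-laden sketch: chown_allowed() is hypothetical, its two bool inputs stand for the OCB's IOFUNC_OCB_PRIVILEGED flag and the mount's IOFUNC_PC_CHOWN_RESTRICTED bit, and it ignores the group-membership rules a full POSIX implementation applies to gid changes.

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/types.h>

/* Returns 0 if the chown() may proceed, or EPERM. */
int
chown_allowed (bool privileged, bool restricted,
               uid_t caller_uid, uid_t owner_uid, uid_t new_uid)
{
    if (privileged)
        return (0);              /* root may always chown */
    if (caller_uid != owner_uid)
        return (EPERM);          /* only the owner may try */
    if (restricted && new_uid != owner_uid)
        return (EPERM);          /* restricted: no giving files away */
    return (0);
}
```

On a chown-restricted filesystem, a non-root owner can leave the user ID unchanged but cannot give the file to someone else; without the restriction, the owner can.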

Returns: The status via the helper macro _RESMGR_STATUS().

io_close_dup()

int io_close_dup (resmgr_context_t *ctp, io_close_t *msg, RESMGR_OCB_T *ocb)

Classification: I/O function

Default handler: iofunc_close_dup_default()

Helper functions: iofunc_close_dup()

Client functions: close(), fclose()

Messages: _IO_CLOSE

Data structure:

struct _io_close {
  uint16_t type;
  uint16_t combine_len;
};

typedef union {
  struct _io_close i;
} io_close_t;

Description: This is the real function handler for the client's close() or fclose() function calls. Note that you'd almost never take over this function; you'd leave it as iofunc_close_dup_default() in the I/O table. This is because the base layer keeps track of the number of open(), dup() and close() messages issued for a particular OCB, and will then synthesize an io_close_ocb() outcall (see below) when the last close() message has been received for a particular OCB. Note that the receive IDs present in ctp->rcvid may not necessarily match up with those passed to io_open(). However, it's guaranteed that at least one receive ID will match the receive ID from the io_open() function. The “extra” receive IDs are the result of (possibly internal) dup()-type functionality.

Returns: The status via the helper macro _RESMGR_STATUS().

io_close_ocb()

int io_close_ocb (resmgr_context_t *ctp, void *reserved, RESMGR_OCB_T *ocb)

Classification: I/O function (synthesized by library)

Default handler: iofunc_close_ocb_default()

Helper functions: none

Client function: none — synthesized by library

Messages: none — synthesized by library

Data structure:

// synthesized by library
struct _io_close {
  uint16_t type;
  uint16_t combine_len;
};

typedef union {
  struct _io_close i;
} io_close_t;

Description: This is the function that gets synthesized by the base-layer library when the last close() has been received for a particular OCB. This is where you'd perform any final cleanup you needed to do before the OCB is destroyed. Note that the receive ID present in ctp->rcvid is zero, because this function is synthesized by the library and doesn't necessarily correspond to any particular message.

Returns: The status via the helper macro _RESMGR_STATUS().

io_devctl()

int io_devctl (resmgr_context_t *ctp, io_devctl_t *msg, RESMGR_OCB_T *ocb)

Classification: I/O

Default handler: iofunc_devctl_default()

Helper functions: iofunc_devctl()

Client functions: devctl(), ioctl()

Messages: _IO_DEVCTL

Data structure:

struct _io_devctl {
  uint16_t type;
  uint16_t combine_len;
  int32_t  dcmd;
  int32_t  nbytes;
  int32_t  zero;
};

struct _io_devctl_reply {
  uint32_t zero;
  int32_t  ret_val;
  int32_t  nbytes;
  int32_t  zero2;
};

typedef union {
  struct _io_devctl       i;
  struct _io_devctl_reply o;
} io_devctl_t;

Description: Performs the device I/O operation as passed from the client's devctl() in dcmd. The client encodes a direction into the top two bits of dcmd, indicating how the devctl() is to transfer data (the “to” field refers to the _POSIX_DEVDIR_TO bit; the “from” field refers to the _POSIX_DEVDIR_FROM bit):

to field	from field	Meaning
0	0	No data transfer
0	1	Transfer from driver to client
1	0	Transfer from client to driver
1	1	Transfer bidirectionally

In the case of no data transfer, the driver is expected to simply perform the command given in dcmd. In the case of a data transfer, the driver is expected to transfer the data from and/or to the client, using the helper functions resmgr_msgreadv() and resmgr_msgwritev(). The client indicates the size of the transfer in the nbytes member; the driver is to set the outgoing structure's nbytes member to the number of bytes transferred.
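Decoding the direction is a matter of testing two masks. The sketch below assumes “to” is bit 31 and “from” is bit 30; the text only says they occupy the top two bits, so check _POSIX_DEVDIR_TO and _POSIX_DEVDIR_FROM in your headers rather than relying on these values.

```c
#include <stdint.h>

/* Assumed bit positions (hypothetical names); see the real
 * _POSIX_DEVDIR_TO and _POSIX_DEVDIR_FROM definitions. */
#define SKETCH_DEVDIR_TO   0x80000000u
#define SKETCH_DEVDIR_FROM 0x40000000u

typedef enum { XFER_NONE, XFER_TO_CLIENT, XFER_TO_DRIVER, XFER_BOTH } xfer_t;

/* Classify a dcmd according to the to/from table above. */
xfer_t
devctl_direction (uint32_t dcmd)
{
    int to   = (dcmd & SKETCH_DEVDIR_TO) != 0;    /* client -> driver */
    int from = (dcmd & SKETCH_DEVDIR_FROM) != 0;  /* driver -> client */

    if (to && from) return (XFER_BOTH);
    if (to)         return (XFER_TO_DRIVER);
    if (from)       return (XFER_TO_CLIENT);
    return (XFER_NONE);
}
```

A handler would typically use the result to decide whether to read data from the client, write data back, both, or neither.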

Note that the input and output data structures are zero-padded so that they align with each other. This means that the implicit data area begins at the same address in the input and output structures.

If using the helper routine iofunc_devctl(), beware that it'll return the constant _RESMGR_DEFAULT in the case where it can't do anything with the devctl() message. This return value is there to decouple legitimate errno return values from an “unrecognized command” return value. Upon receiving a _RESMGR_DEFAULT, the base-layer library will respond with an errno of ENOSYS, which the client's devctl() library function will translate into ENOTTY.

It's up to your function to check the open mode against the operation; no checking is done anywhere in either the client's devctl() library or in the resource manager library. For example, it's possible to open a resource manager “read-only” and then issue a devctl() to it telling it to “format the hard disk” (which is very much a “write” operation). It would be prudent to verify the open mode first before proceeding with the operation.
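The open-mode check itself is short. In this hypothetical sketch, the caller classifies the command (modifies_device) and the function rejects it if the OCB wasn't opened for writing; SKETCH_IO_FLAG_* are assumed stand-ins for the real ioflag bits, and returning EBADF is a choice made for illustration, not something the text prescribes.

```c
#include <errno.h>

/* Hypothetical stand-ins for _IO_FLAG_RD and _IO_FLAG_WR. */
#define SKETCH_IO_FLAG_RD 0x1u
#define SKETCH_IO_FLAG_WR 0x2u

/* Returns 0 if the command may proceed given the OCB's ioflag, or an
 * errno value to hand back to the client. */
int
check_devctl_access (unsigned ioflag, int modifies_device)
{
    if (modifies_device && !(ioflag & SKETCH_IO_FLAG_WR))
        return (EBADF);   /* e.g. "format the disk" on a read-only open */
    return (0);
}
```

The “format the hard disk” case from the text is the first assertion-worthy path: a read-only open asking for a modifying command is refused before any work is done.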

Note that the range of dcmd values you can use is limited (0x0000 through 0x0FFF inclusive is reserved for QSS). Other values may be in use; take a look through the include files that have the name <sys/dcmd_*.h>.

Returns: The status via the helper macro _RESMGR_STATUS() and the reply buffer (with reply data, if required).

For an example, take a look at “A simple io_devctl() example,” below.

io_dup()

int io_dup (resmgr_context_t *ctp, io_dup_t *msg, RESMGR_OCB_T *ocb)

Classification: I/O

Default handler: NULL — handled by base layer

Helper functions: none

Client functions: dup(), dup2(), fcntl(), fork(), spawn*(), vfork()

Messages: _IO_DUP

Data structure:

struct _io_dup {
  uint16_t         type;
  uint16_t         combine_len;
  struct _msg_info info;
  uint32_t         reserved;
  uint32_t         key;
};

typedef union {
  struct _io_dup   i;
} io_dup_t;

Description: This is the dup() message handler. As with io_close_dup(), you likely won't handle this message yourself; the base-layer library handles it instead.

Returns: The status via the helper macro _RESMGR_STATUS().

io_fdinfo()

int io_fdinfo (resmgr_context_t *ctp, io_fdinfo_t *msg, RESMGR_OCB_T *ocb)

Classification: I/O

Default handler: iofunc_fdinfo_default()

Helper functions: iofunc_fdinfo()

Client function: iofdinfo()

Messages: _IO_FDINFO

Data structure:

struct _io_fdinfo {
  uint16_t         type;
  uint16_t         combine_len;
  uint32_t         flags;
  int32_t          path_len;
  uint32_t         reserved;
};

struct _io_fdinfo_reply {
  uint32_t         zero [2];
  struct _fdinfo   info;
};

typedef union {
  struct _io_fdinfo        i;
  struct _io_fdinfo_reply  o;
} io_fdinfo_t;

Description: This handler lets clients retrieve information about the attributes and pathname associated with a file descriptor, by way of the client-side function iofdinfo(). The path string implicitly follows the struct _io_fdinfo_reply data structure. The default function is sufficient for resource managers that manifest discrete pathnames.

Returns: The length of the path string being returned is set via the helper macro _IO_SET_FDINFO_LEN().

io_link()

int io_link (resmgr_context_t *ctp, io_link_t *msg, RESMGR_HANDLE_T *handle, io_link_extra_t *extra)

Classification: Connect

Default handler: none

Helper functions: iofunc_link()

Client function: link()

Messages: _IO_CONNECT with subtype _IO_CONNECT_LINK

Data structure:

struct _io_connect {
  // internal fields (as described above)
  uint16_t path_len;
  uint8_t  extra_type;
  uint16_t extra_len;
  char     path [1];
};

struct _io_connect_link_reply {
  uint32_t reserved1;
  uint32_t file_type;
  uint8_t  eflag;
  uint8_t  reserved2[1];
  uint16_t chroot_len;
  uint32_t umask;
  uint16_t nentries;
  uint16_t path_len;
};

struct _io_connect_ftype_reply {
  uint16_t status;      /* Typically an errno */
  uint16_t reserved;
  uint32_t file_type;   /* _FTYPE_? in sys/ftype.h */
};

typedef union {
  struct _io_connect             connect;
  struct _io_connect_link_reply  link_reply;
  struct _io_connect_ftype_reply ftype_reply;
} io_link_t;

typedef union _io_link_extra {
  struct _msg_info              info;
  void                          *ocb;
  char                          path [1];
  struct _io_resmgr_link_extra  resmgr;
} io_link_extra_t;

Description: Creates a new link with the name given in the path member of msg to the already-existing pathname specified by the path member of extra (passed to your function). For convenience, the ocb member of extra contains a pointer to the OCB for the existing pathname.

Returns: The status via the helper macro _RESMGR_STATUS().

io_lock()

int io_lock (resmgr_context_t *ctp, io_lock_t *msg, RESMGR_OCB_T *ocb)

Classification: I/O

Default handler: iofunc_lock_default()

Helper functions: iofunc_lock()

Client functions: fcntl(), lockf(), flock()

Messages: _IO_LOCK

Data structure:

struct _io_lock {
  uint16_t              type;
  uint16_t              combine_len;
  uint32_t              subtype;
  int32_t               nbytes;
};

struct _io_lock_reply {
  uint32_t              zero [3];
};

typedef union {
  struct _io_lock       i;
  struct _io_lock_reply o;
} io_lock_t;

Description: This provides advisory range-based file locking for a device. The default function is likely sufficient for most resource managers.

Returns: The status via the helper macro _RESMGR_STATUS().

io_lock_ocb()

int io_lock_ocb (resmgr_context_t *ctp, void *reserved, RESMGR_OCB_T *ocb)

Classification: I/O (synthesized by library)

Default handler: iofunc_lock_ocb_default()

Helper functions: none

Client functions: all

Messages: none — synthesized by library

Data structure: none

Description: This function is responsible for locking the attributes structure pointed to by the OCB. This is done to ensure that only one thread at a time is operating on both the OCB and the corresponding attributes structure. The lock (and corresponding unlock) functions are synthesized by the resource manager library before and after completion of message handling. See the section on “Combine messages” above for more details. You'll almost never use this outcall yourself; instead, use the POSIX-layer default function.

Returns: The status via the helper macro _RESMGR_STATUS().

io_lseek()

int io_lseek (resmgr_context_t *ctp, io_lseek_t *msg, RESMGR_OCB_T *ocb)

Classification: I/O

Default handler: iofunc_lseek_default()

Helper functions: iofunc_lseek()

Client functions: lseek(), fseek(), rewinddir()

Messages: _IO_LSEEK

Data structure:

struct _io_lseek {
  uint16_t         type;
  uint16_t         combine_len;
  short            whence;
  uint16_t         zero;
  uint64_t         offset;
};

typedef union {
  struct _io_lseek i;
  uint64_t         o;
} io_lseek_t;

Description: Handles the client's lseek() function. Note that a resource manager that handles directories will also need to interpret the _IO_LSEEK message for directory operations. The whence and offset parameters are passed from the client's lseek() function. After interpreting them, the routine should adjust the offset member of the OCB and return either the new offset or an error.

Returns: The status via the helper macro _RESMGR_STATUS(), and optionally (if no error and if not part of a combine message) the current offset.
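The offset arithmetic can be sketched in portable C as follows. The function resolve_seek() is hypothetical. It treats the offset as signed, even though the message carries it in a uint64_t, and it takes the current OCB offset and the resource size as plain parameters. It rejects seeks before the start of the resource. A real handler would then store the result back into the OCB.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>   /* SEEK_SET, SEEK_CUR, SEEK_END */

/* Compute the new offset from whence/offset. cur is the OCB's current
 * offset and size is the resource size (both assumed non-negative).
 * Returns 0 and stores *newoff on success, or an errno value. */
static int resolve_seek(int whence, int64_t offset, int64_t cur,
                        int64_t size, int64_t *newoff)
{
    int64_t base;

    switch (whence) {
    case SEEK_SET: base = 0;    break;
    case SEEK_CUR: base = cur;  break;
    case SEEK_END: base = size; break;
    default:       return EINVAL;          /* unknown whence */
    }
    if (offset > 0 && base > INT64_MAX - offset)
        return EOVERFLOW;                  /* result not representable */
    if (base + offset < 0)
        return EINVAL;                     /* can't seek before the start */
    *newoff = base + offset;
    return 0;
}
```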

io_mknod()

int io_mknod (resmgr_context_t *ctp, io_mknod_t *msg, RESMGR_HANDLE_T *handle, void *reserved)

Classification: Connect

Default handler: none

Helper functions: iofunc_mknod()

Client functions: mknod(), mkdir(), mkfifo()

Messages: _IO_CONNECT, subtype _IO_CONNECT_MKNOD

Data structure:

struct _io_connect {
  // internal fields (as described above)
  uint16_t path_len;
  uint8_t  extra_type;
  uint16_t extra_len;
  char     path [1];
};

struct _io_connect_link_reply {
  uint32_t reserved1;
  uint32_t file_type;
  uint8_t  eflag;
  uint8_t  reserved2[1];
  uint16_t chroot_len;
  uint32_t umask;
  uint16_t nentries;
  uint16_t path_len;
};

struct _io_connect_ftype_reply {
  uint16_t status;      /* Typically an errno */
  uint16_t reserved;
  uint32_t file_type;   /* _FTYPE_? in sys/ftype.h */
};

typedef union {
    struct _io_connect                  connect;
    struct _io_connect_link_reply       link_reply;
    struct _io_connect_ftype_reply      ftype_reply;
} io_mknod_t;

Description: Creates a new filesystem entry point. The message is issued to create a file, named by the path member, using the filetype encoded in the mode member (from the “internal fields” part of the struct _io_connect structure, not shown).

This is really used only for the mkfifo(), mkdir(), and mknod() client functions.

Returns: The status via the helper macro _RESMGR_STATUS().

io_mmap()

int io_mmap (resmgr_context_t *ctp, io_mmap_t *msg, RESMGR_OCB_T *ocb)

Classification: I/O

Default handler: iofunc_mmap_default()

Helper functions: iofunc_mmap()

Client functions: mmap(), munmap(), mmap_device_io(), mmap_device_memory()

Messages: _IO_MMAP

Data structure:

struct _io_mmap {
  uint16_t              type;
  uint16_t              combine_len;
  uint32_t              prot;
  uint64_t              offset;
  struct _msg_info      info;
  uint32_t              zero [6];
};

struct _io_mmap_reply {
  uint32_t              zero;
  uint32_t              flags;
  uint64_t              offset;
  int32_t               coid;
  int32_t               fd;
};

typedef union {
  struct _io_mmap       i;
  struct _io_mmap_reply o;
} io_mmap_t;

Description: Allows the process manager to mmap() files from your resource manager. Generally, you should not code this function yourself (use the defaults provided by iofunc_func_init() — the default handler), unless you specifically wish to disable the functionality (for example, a serial port driver could choose to return ENOSYS, because it doesn't make sense to support this operation).

Only the process manager will call this resource manager function.

Note that a side effect of the process manager's calling this function is that an OCB will be created (i.e., iofunc_ocb_calloc() will be called), but this should have no consequences to a properly implemented resource manager.

Returns: The status via the helper macro _RESMGR_STATUS().

io_mount()

int io_mount (resmgr_context_t *ctp, io_mount_t *msg, RESMGR_HANDLE_T *handle, io_mount_extra_t *extra)

Classification: Connect

Default handler: none

Helper functions: none

Client functions: mount(), umount()

Messages: _IO_CONNECT with the _IO_CONNECT_MOUNT subtype.

Data structure:

struct _io_connect {
  // internal fields (as described above)
  uint16_t path_len;
  uint8_t  extra_type;
  uint16_t extra_len;
  char     path [1];
};

struct _io_connect_link_reply {
  uint32_t reserved1;
  uint32_t file_type;
  uint8_t  eflag;
  uint8_t  reserved2[1];
  uint16_t chroot_len;
  uint32_t umask;
  uint16_t nentries;
  uint16_t path_len;
};

struct _io_connect_ftype_reply {
  uint16_t status;      /* Typically an errno */
  uint16_t reserved;
  uint32_t file_type;   /* _FTYPE_? in sys/ftype.h */
};

typedef union {
  struct _io_connect             connect;
  struct _io_connect_link_reply  link_reply;
  struct _io_connect_ftype_reply ftype_reply;
} io_mount_t;

Description: This function is called whenever a mount() or umount() client function sends your resource manager a message. For more information about the io_mount handler, see “Handling mount()” in the Handling Other Messages chapter of Writing a Resource Manager.

Returns: The status via the helper macro _IO_SET_CONNECT_RET().

io_msg()

int io_msg (resmgr_context_t *ctp, io_msg_t *msg, RESMGR_OCB_T *ocb)

Classification: I/O

Default handler: none

Helper functions: none

Client function: none — manually assembled and sent via MsgSend()

Messages: _IO_MSG

Data structure:

struct _io_msg {
  uint16_t       type;
  uint16_t       combine_len;
  uint16_t       mgrid;
  uint16_t       subtype;
};

typedef union {
  struct _io_msg i;
} io_msg_t;

Description: The _IO_MSG interface is a more general, but less portable, variation on the ioctl()/devctl() theme. The mgrid is used to identify a particular manager — you should not perform actions for requests that don't conform to your manager ID. The subtype is effectively the command that the client wishes to perform. Any data that's transferred implicitly follows the input structure. Data that's returned to the client is sent on its own, with the status returned via _RESMGR_STATUS(). You can get a “manager ID” from QSS.
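The following sketch shows the manager-ID check in isolation. struct sketch_io_msg mirrors the fields listed above. MY_MGRID, MY_CMD_RESET, and the use of ENOSYS for messages that aren't ours are all assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Standalone mirror of the _io_msg header fields shown above. */
struct sketch_io_msg {
    uint16_t type;
    uint16_t combine_len;
    uint16_t mgrid;
    uint16_t subtype;
};

#define MY_MGRID      0x1234   /* hypothetical manager ID */
#define MY_CMD_RESET  1        /* hypothetical subtype (command) */

/* Return 0 (EOK) if the message is ours and understood, else an errno. */
static int check_io_msg(const struct sketch_io_msg *m)
{
    if (m->mgrid != MY_MGRID)
        return ENOSYS;          /* not our manager ID: don't act on it */

    switch (m->subtype) {
    case MY_CMD_RESET:
        return 0;               /* a real handler would do the work here */
    default:
        return ENOSYS;          /* unknown command for this manager */
    }
}
```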

Returns: The status via the helper macro _RESMGR_STATUS().

io_notify()

int io_notify (resmgr_context_t *ctp, io_notify_t *msg, RESMGR_OCB_T *ocb)

Classification: I/O

Default handler: none

Helper functions: iofunc_notify(), iofunc_notify_remove(), iofunc_notify_remove_strict(), iofunc_notify_trigger(), iofunc_notify_trigger_strict()

Client functions: select(), ionotify()

Messages: _IO_NOTIFY

Data structure:

struct _io_notify {
  uint16_t                type;
  uint16_t                combine_len;
  int32_t                 action;
  int32_t                 flags;
  struct sigevent         event;
};

struct _io_notify_reply {
  uint32_t                zero;
  uint32_t                flags;
};

typedef union {
  struct _io_notify       i;
  struct _io_notify_reply o;
} io_notify_t;

Description: The handler is responsible for installing, polling, or removing a notification handler. The action and flags determine the kind of notification operation and conditions; the event is a struct sigevent structure that defines the notification event (if any) that the client wishes to be signaled with. You'd use the MsgDeliverEvent() or iofunc_notify_trigger() functions to deliver the event to the client.

Returns: The status via the helper macro _RESMGR_STATUS(); the flags are returned via message reply.

io_open()

int io_open (resmgr_context_t *ctp, io_open_t *msg, RESMGR_HANDLE_T *handle, void *extra)

Classification: Connect

Default handler: iofunc_open_default()

Helper functions: iofunc_open(), iofunc_ocb_attach()

Client functions: open(), fopen(), sopen() (and others)

Messages: _IO_CONNECT with one of _IO_CONNECT_COMBINE, _IO_CONNECT_COMBINE_CLOSE or _IO_CONNECT_OPEN subtypes.

Data structure:

struct _io_connect {
  // internal fields (as described above)
  uint16_t path_len;
  uint8_t  extra_type;
  uint16_t extra_len;
  char     path [1];
};

struct _io_connect_link_reply {
  uint32_t reserved1;
  uint32_t file_type;
  uint8_t  eflag;
  uint8_t  reserved2[1];
  uint16_t chroot_len;
  uint32_t umask;
  uint16_t nentries;
  uint16_t path_len;
};

struct _io_connect_ftype_reply {
  uint16_t status;      /* Typically an errno */
  uint16_t reserved;
  uint32_t file_type;   /* _FTYPE_? in sys/ftype.h */
};

typedef union {
  struct _io_connect             connect;
  struct _io_connect_link_reply  link_reply;
  struct _io_connect_ftype_reply ftype_reply;
} io_open_t;

Description: This is the main entry point into the resource manager. It checks that the client indeed has the appropriate permissions to open the file, binds the OCB to the internal library structures (via resmgr_bind_ocb(), or iofunc_ocb_attach()), and returns an errno. Note that not all input and output structure members are relevant for this function.

Returns: The status via the helper macro _IO_SET_CONNECT_RET().

io_openfd()

int io_openfd (resmgr_context_t *ctp, io_openfd_t *msg, RESMGR_OCB_T *ocb)

Classification: I/O

Default handler: iofunc_openfd_default()

Helper functions: iofunc_openfd()

Client function: openfd()

Messages: _IO_OPENFD

Data structure:

struct _io_openfd {
  uint16_t          type;
  uint16_t          combine_len;
  uint32_t          ioflag;
  uint16_t          sflag;
  uint16_t          reserved1;
  struct _msg_info  info;
  uint32_t          reserved2;
  uint32_t          key;
};

typedef union {
  struct _io_openfd i;
} io_openfd_t;

Description: This function is similar to the handler provided for io_open(), except that instead of a pathname, an already-open file descriptor is passed (by virtue of passing you the ocb in the function call).

Returns: The status via the helper macro _RESMGR_STATUS().

io_pathconf()

int io_pathconf (resmgr_context_t *ctp, io_pathconf_t *msg, RESMGR_OCB_T *ocb)

Classification: I/O

Default handler: iofunc_pathconf_default()

Helper functions: iofunc_pathconf()

Client functions: fpathconf(), pathconf()

Messages: _IO_PATHCONF

Data structure:

struct _io_pathconf {
  uint16_t            type;
  uint16_t            combine_len;
  short               name;
  uint16_t            zero;
};

typedef union {
  struct _io_pathconf i;
} io_pathconf_t;

Description: The handler for this message is responsible for returning the value of the configurable parameter name for the resource associated with this OCB. Use the default function and add additional cases for the name member as appropriate for your device.
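One way to organize the “additional cases” is a small table function. It answers the names your device cares about and reports everything else as unhandled, so that the caller can fall back to the default handler. device_pathconf() and the values it returns are hypothetical.

```c
#include <assert.h>
#include <unistd.h>  /* _PC_* names */

/* Hypothetical device-specific answers. Returns 1 and sets *value if
 * the name is handled here, or 0 to defer to the default handler. */
static int device_pathconf(int name, long *value)
{
    switch (name) {
    case _PC_NAME_MAX: *value = 48; return 1;  /* short device names */
    case _PC_LINK_MAX: *value = 1;  return 1;  /* no hard links */
    default:           return 0;               /* let the default answer */
    }
}
```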

Returns: The status via the helper macro _IO_SET_PATHCONF_VALUE() and the data via message reply.

io_power()

int io_power (resmgr_context_t *ctp, io_power_t *msg, RESMGR_OCB_T *ocb)

This function is reserved by QSS for future use. You should initialize the I/O table using iofunc_func_init() and not modify this entry.

io_read()

int io_read (resmgr_context_t *ctp, io_read_t *msg, RESMGR_OCB_T *ocb)

Classification: I/O

Default handler: iofunc_read_default()

Helper functions: iofunc_read_verify()

Client functions: read(), readdir()

Messages: _IO_READ

Data structure:

struct _io_read {
  uint16_t        type;
  uint16_t        combine_len;
  int32_t         nbytes;
  uint32_t        xtype;
  uint32_t        zero;
};

typedef union {
  struct _io_read i;
} io_read_t;

Description: Responsible for reading data from the resource. The client specifies the number of bytes it's prepared to read in the nbytes input member. You return the data, advance the offset in the OCB, and update the appropriate time fields.
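For a plain data read, the bookkeeping reduces to clamping the byte count and advancing the offset. In the hypothetical helper read_amount(), the OCB offset and the resource size are passed in directly. Updating the time fields and replying with the data are left out.

```c
#include <assert.h>
#include <stdint.h>

/* How many bytes can be returned: no more than the client asked for
 * (nbytes), and no more than remains between *offset and the end of
 * the resource. Advances *offset by the amount returned. */
static int32_t read_amount(int32_t nbytes, int64_t *offset, int64_t size)
{
    int64_t remaining = size - *offset;

    if (remaining <= 0 || nbytes <= 0)
        return 0;                        /* at or past EOF */

    int32_t n = (remaining < nbytes) ? (int32_t)remaining : nbytes;
    *offset += n;                        /* advance the (modelled) OCB offset */
    return n;
}
```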

Note that the xtype member may specify a per-read-message override flag, so you should examine it. If you don't support any extended override flags, you should return an EINVAL. We'll see the handling of one particularly important (and tricky!) override flag called _IO_XTYPE_OFFSET in the io_read() and io_write() examples below.

Note also that the _IO_READ message arrives not only for regular files, but also for reading the contents of directories. You must ensure that you return an integral number of struct dirent members in the directory case. For more information about returning directory entries, see the example in the “Advanced topics” section under “Returning directory entries.”
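In the directory case, the key constraint is never to split an entry across the reply. The hypothetical whole_entries_that_fit() below models only the counting step, given the sizes of the pending entries. Building the struct dirent records themselves is covered in “Returning directory entries.”

```c
#include <assert.h>
#include <stddef.h>

/* Count how many whole entries (which may differ in length) fit in a
 * buffer of nbytes; *used receives the total bytes they occupy. */
static size_t whole_entries_that_fit(const size_t *entry_sizes, size_t count,
                                     size_t nbytes, size_t *used)
{
    size_t n = 0, total = 0;

    while (n < count && total + entry_sizes[n] <= nbytes) {
        total += entry_sizes[n];         /* this entry fits completely */
        n++;
    }
    *used = total;
    return n;
}
```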

The helper function iofunc_read_verify() should be called to ascertain that the file was opened in a mode compatible with reading. Also, the iofunc_sync_verify() function should be called to verify if the data needs to be synchronized to the medium. (For a read(), that means that the data returned is guaranteed to be on-media.)

Returns: The number of bytes read, or the status, via the helper macro _IO_SET_READ_NBYTES(), and the data itself via message reply.

For an example of returning just data, take a look at “A simple io_read() example” below. For a more complicated example of returning both data and directory entries, look in the “Advanced topics” section under “Returning directory entries.”

io_readlink()

int io_readlink (resmgr_context_t *ctp, io_readlink_t *msg, RESMGR_HANDLE_T *handle, void *reserved)

Classification: Connect

Default handler: none

Helper functions: iofunc_readlink()

Client function: readlink()

Messages: _IO_CONNECT with subtype _IO_CONNECT_READLINK

Data structure:

struct _io_connect {
  // internal fields (as described above)
  uint16_t path_len;
  uint8_t  extra_type;
  uint16_t extra_len;
  char     path [1];
};

struct _io_connect_link_reply {
  uint32_t reserved1;
  uint32_t file_type;
  uint8_t  eflag;
  uint8_t  reserved2[1];
  uint16_t chroot_len;
  uint32_t umask;
  uint16_t nentries;
  uint16_t path_len;
};

struct _io_connect_ftype_reply {
  uint16_t status;      /* Typically an errno */
  uint16_t reserved;
  uint32_t file_type;   /* _FTYPE_? in sys/ftype.h */
};

typedef union {
  struct _io_connect             connect;
  struct _io_connect_link_reply  link_reply;
  struct _io_connect_ftype_reply ftype_reply;
} io_readlink_t;

Description: Responsible for reading the contents of a symbolic link as specified by the path member of the input structure. The bytes returned are the contents of the symbolic link; the status returned is the number of bytes in the reply. A valid return should be done only for a symbolic link; all other accesses should return an error code.

Returns: The status via the helper macro _RESMGR_STATUS() and the data via message reply.

io_rename()

int io_rename (resmgr_context_t *ctp, io_rename_t *msg, RESMGR_HANDLE_T *handle, io_rename_extra_t *extra)

Classification: Connect

Default handler: none

Helper functions: iofunc_rename()

Client function: rename()

Messages: _IO_CONNECT with subtype _IO_CONNECT_RENAME

Data structure:

struct _io_connect {
  // internal fields (as described above)
  uint16_t path_len;
  uint8_t  extra_type;
  uint16_t extra_len;
  char     path [1];
};

struct _io_connect_link_reply {
  uint32_t reserved1;
  uint32_t file_type;
  uint8_t  eflag;
  uint8_t  reserved2[1];
  uint16_t chroot_len;
  uint32_t umask;
  uint16_t nentries;
  uint16_t path_len;
};

struct _io_connect_ftype_reply {
  uint16_t status;      /* Typically an errno */
  uint16_t reserved;
  uint32_t file_type;   /* _FTYPE_? in sys/ftype.h */
};

typedef union _io_rename_extra {
  char     path [1];
} io_rename_extra_t;

typedef union {
  struct _io_connect             connect;
  struct _io_connect_link_reply  link_reply;
  struct _io_connect_ftype_reply ftype_reply;
} io_rename_t;

Description: Performs the rename operation, given the new name in path and the original name in the path member of the passed extra parameter. Implementation note: the pathname of the original name is given (rather than an OCB) specifically for the case of handling a rename of a file that's hard-linked to another file. If the OCB were given, there would be no way to tell apart the two (or more) versions of the hard-linked file.

This function will be called only with two filenames that are on the same filesystem (same device). Therefore, there's no need to check for a case where you'd return EXDEV. This doesn't prevent you from returning EXDEV if you don't wish to perform the rename() yourself (for example, it may be very complicated to do the rename operation from one directory to another). In the case of returning EXDEV, the shell utility mv will perform a cp followed by an rm (the C library function rename() will do no such thing — it will return only an errno of EXDEV).

Also, all symlinks will be resolved, where applicable, before this function is called, and the pathnames passed will be absolute and rooted in the filesystem for which this resource manager is responsible.

Returns: The status via the helper macro _RESMGR_STATUS().

io_shutdown()

int io_shutdown (resmgr_context_t *ctp, io_shutdown_t *msg, RESMGR_OCB_T *ocb)

This function is reserved by QSS for future use. You should initialize the I/O table using iofunc_func_init() and not modify this entry.

io_space()

int io_space (resmgr_context_t *ctp, io_space_t *msg, RESMGR_OCB_T *ocb)

Classification: I/O

Default handler: none

Helper functions: iofunc_space_verify()

Client functions: chsize(), fcntl(), ftruncate(), ltrunc()

Messages: _IO_SPACE

Data structure:

struct _io_space {
  uint16_t         type;
  uint16_t         combine_len;
  uint16_t         subtype;
  short            whence;
  uint64_t         start;
  uint64_t         len;
};

typedef union {
  struct _io_space i;
  uint64_t         o;
} io_space_t;

Description: This is used to allocate or free space occupied by the resource. The subtype parameter indicates whether to allocate (if set to F_ALLOCSP) or deallocate (if set to F_FREESP) storage space. The combination of whence and start gives the location where the allocation or deallocation should begin; the len member indicates the size of the operation.

Returns: The number of bytes (size of the resource) via the helper macro _RESMGR_STATUS().

io_stat()

int io_stat (resmgr_context_t *ctp, io_stat_t *msg, RESMGR_OCB_T *ocb)

Classification: I/O

Default handler: iofunc_stat_default()

Helper functions: iofunc_stat()

Client functions: stat(), lstat(), fstat()

Messages: _IO_STAT

Data structure:

struct _io_stat {
  uint16_t        type;
  uint16_t        combine_len;
  uint32_t        zero;
};

typedef union {
  struct _io_stat i;
  struct stat     o;
} io_stat_t;

Description: Handles the message that requests information about the resource associated with the passed OCB. Note that the attributes structure contains all the information required to fulfill the stat() request; the helper function iofunc_stat() fills a struct stat structure based on the attributes structure. Also, the helper function modifies the stored dev/rdev members to be unique from a single node's point of view (useful for performing stat() calls to files over a network). There's almost no reason to write your own handler for this function.

Returns: The status via the helper macro _RESMGR_STATUS() and the struct stat via message reply.

io_sync()

int io_sync (resmgr_context_t *ctp, io_sync_t *msg, RESMGR_OCB_T *ocb)

Classification: I/O

Default handler: iofunc_sync_default()

Helper functions: iofunc_sync_verify(), iofunc_sync()

Client functions: fsync(), fdatasync()

Messages: _IO_SYNC

Data structure:

struct _io_sync {
  uint16_t        type;
  uint16_t        combine_len;
  uint32_t        flag;
};

typedef union {
  struct _io_sync i;
} io_sync_t;

Description: This is the entry point for a flush command. The helper function iofunc_sync() is passed the flag member from the input message, and returns one of the following values, which indicate what actions your resource manager must take:

0 — do nothing.
O_SYNC — everything associated with the file (including the file contents, directory structures, inodes, etc.) must be present and recoverable from media.
O_DSYNC — only the data portion of the file must be present and recoverable from media.

Note that this outcall will occur only if you've agreed to provide sync services by setting the mount structure flag.
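A sketch of acting on the value returned by iofunc_sync() follows. The flush_log structure stands in for whatever flushing your device actually performs. The return value is passed in as an int rather than obtained from a live call. The comparisons use equality rather than bit tests, because on some systems (Linux, for example) O_SYNC includes the O_DSYNC bit.

```c
#define _POSIX_C_SOURCE 200809L  /* expose O_DSYNC in strict modes */
#include <assert.h>
#include <fcntl.h>               /* O_SYNC, O_DSYNC */

/* Hypothetical record of which flush actions were taken. */
struct flush_log { int data_flushed; int metadata_flushed; };

/* Act on the value iofunc_sync() returned (modelled as an int). */
static void act_on_sync(int how, struct flush_log *log)
{
    if (how == O_SYNC) {
        log->data_flushed = 1;       /* file contents...               */
        log->metadata_flushed = 1;   /* ...plus directories, inodes    */
    } else if (how == O_DSYNC) {
        log->data_flushed = 1;       /* only the data portion          */
    }
    /* 0: nothing to do */
}
```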

Returns: The status via the helper macro _RESMGR_STATUS().

io_unblock() [CONNECT]

int io_unblock (resmgr_context_t *ctp, io_pulse_t *msg, RESMGR_HANDLE_T *handle, void *reserved)

Classification: Connect (synthesized by kernel, synthesized by library)

Default handler: none

Helper functions: iofunc_unblock()

Client function: none — kernel action due to signal or timeout

Messages: none — synthesized by library

Data structure: (See I/O version of io_unblock(), next)

Description: This is the connect message version of the unblock outcall, synthesized by the library as a result of a kernel pulse due to the client's attempt to unblock during the connect message phase. See the I/O version of io_unblock() for more details.

Returns: The status via the helper macro _RESMGR_STATUS().

See the section in the Message Passing chapter, titled “Using the _NTO_MI_UNBLOCK_REQ” for a detailed discussion of unblocking strategies.

io_unblock() [I/O]

int io_unblock (resmgr_context_t *ctp, io_pulse_t *msg, RESMGR_OCB_T *ocb)

Classification: I/O (synthesized by kernel, synthesized by library)

Default handler: iofunc_unblock_default()

Helper functions: iofunc_unblock()

Client function: none — kernel action due to signal or timeout

Messages: none — synthesized by library

Data structure: pointer to message structure being interrupted

Description: This is the I/O message version of the unblock outcall, synthesized by the library as a result of a kernel pulse due to the client's attempt to unblock during the I/O message phase. The connect message phase io_unblock() handler is substantially the same (see the preceding section).

Common to both unblock handlers (connect and I/O) is the characteristic that the client wishes to unblock but is at the mercy of the resource manager: the resource manager must reply to the client's message in order to unblock the client. (We discussed this in the Message Passing chapter when we looked at the ChannelCreate() flags, particularly the _NTO_CHF_UNBLOCK flag.)

Returns: The status via the helper macro _RESMGR_STATUS().

See the section in the Message Passing chapter, titled “Using the _NTO_MI_UNBLOCK_REQ” for a detailed discussion of unblocking strategies.

io_unlink()

int io_unlink (resmgr_context_t *ctp, io_unlink_t *msg, RESMGR_HANDLE_T *handle, void *reserved)

Classification: Connect

Default handler: none

Helper functions: iofunc_unlink()

Client function: unlink()

Messages: _IO_CONNECT with subtype _IO_CONNECT_UNLINK

Data structure:

struct _io_connect {
  // internal fields (as described above)
  uint16_t path_len;
  uint8_t  extra_type;
  uint16_t extra_len;
  char     path [1];
};

struct _io_connect_link_reply {
  uint32_t reserved1;
  uint32_t file_type;
  uint8_t  eflag;
  uint8_t  reserved2[1];
  uint16_t chroot_len;
  uint32_t umask;
  uint16_t nentries;
  uint16_t path_len;
};

struct _io_connect_ftype_reply {
  uint16_t status;      /* Typically an errno */
  uint16_t reserved;
  uint32_t file_type;   /* _FTYPE_? in sys/ftype.h */
};

typedef union {
  struct _io_connect             connect;
  struct _io_connect_link_reply  link_reply;
  struct _io_connect_ftype_reply ftype_reply;
} io_unlink_t;

Description: Responsible for unlinking the file whose pathname is passed in the input message structure's path member.

Returns: The status via the helper macro _RESMGR_STATUS().

io_unlock_ocb()

int io_unlock_ocb (resmgr_context_t *ctp, void *reserved, RESMGR_OCB_T *ocb)

Classification: I/O (synthesized by library)

Default handler: iofunc_unlock_ocb_default()

Helper functions: none

Client functions: all

Messages: none — synthesized by library

Data structure: none

Description: Inverse of io_lock_ocb(), above. That is, it's responsible for unlocking the attributes structure pointed to by the OCB. This operation releases the attributes structure so that other threads in the resource manager may operate on it. See the section on “Combine messages” above for more details.

Returns: The status via the helper macro _RESMGR_STATUS().

io_utime()

int io_utime (resmgr_context_t *ctp, io_utime_t *msg, RESMGR_OCB_T *ocb)

Classification: I/O

Default handler: iofunc_utime_default()

Helper functions: iofunc_utime()

Client function: utime()

Messages: _IO_UTIME

Data structure:

struct _io_utime {
  uint16_t         type;
  uint16_t         combine_len;
  int32_t          cur_flag;
  struct utimbuf   times;
};

typedef union {
  struct _io_utime i;
} io_utime_t;

Description: Changes the access and modification times to either “now” (if they are zero) or the specified values. Note that this message handler may be required to modify the IOFUNC_ATTR_* flags in the attribute structure as per POSIX rules. You'll almost never use this outcall yourself, but will instead use the POSIX-layer helper function.
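The “now or specified” rule can be sketched as follows. resolve_utimes() is hypothetical. It takes the current time as a parameter so that the logic is easy to check. It ignores cur_flag and the IOFUNC_ATTR_* bookkeeping mentioned above.

```c
#include <assert.h>
#include <time.h>
#include <utime.h>

/* Apply the rule above: a zero field means "now"; otherwise the
 * supplied value is used. */
static void resolve_utimes(const struct utimbuf *req, time_t now,
                           time_t *atime, time_t *mtime)
{
    *atime = req->actime  ? req->actime  : now;
    *mtime = req->modtime ? req->modtime : now;
}
```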

Returns: The status via the helper macro _RESMGR_STATUS().

io_write()

int io_write (resmgr_context_t *ctp, io_write_t *msg, RESMGR_OCB_T *ocb)

Classification: I/O

Default handler: iofunc_write_default()

Helper functions: iofunc_write_verify()

Client functions: write(), fwrite(), etc.

Messages: _IO_WRITE

Data structure:

struct _io_write {
  uint16_t         type;
  uint16_t         combine_len;
  int32_t          nbytes;
  uint32_t         xtype;
  uint32_t         zero;
};

typedef union {
  struct _io_write i;
} io_write_t;

Description: This message handler is responsible for getting data that the client wrote to the resource manager. It gets passed the number of bytes the client is attempting to write in the nbytes member; the data implicitly follows the input data structure, unless the xtype override is _IO_XTYPE_OFFSET (see “A simple io_write() example” below). The implementation will need to re-read the data portion of the message from the client, using resmgr_msgreadv() or the equivalent. The return status is the number of bytes actually written or an errno.

Note that the helper function iofunc_write_verify() should be called to ascertain that the file was opened in a mode compatible with writing. Also, the iofunc_sync_verify() function should be called to verify if the data needs to be synchronized to the medium.
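To make the _IO_XTYPE_OFFSET case concrete, the sketch below computes where the client's data starts within the message. That is the offset you would use when re-reading the data with resmgr_msgreadv() or an equivalent.

The structures mirror the fields shown above. The sketch assumes that the explicit offset is a single 64-bit value placed right after the header. The SKETCH_XTYPE_* values are hypothetical stand-ins; take the real _IO_XTYPE_MASK and _IO_XTYPE_OFFSET definitions from the QNX headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Standalone mirror of the _io_write header shown above. */
struct sketch_io_write {
    uint16_t type;
    uint16_t combine_len;
    int32_t  nbytes;
    uint32_t xtype;
    uint32_t zero;
};

/* Assumed layout of the explicit offset that precedes the data. */
struct sketch_xtype_offset { int64_t offset; };

#define SKETCH_XTYPE_MASK   0xffu  /* hypothetical _IO_XTYPE_MASK   */
#define SKETCH_XTYPE_OFFSET 0x02u  /* hypothetical _IO_XTYPE_OFFSET */

/* Byte offset, within the client's message, where the data begins. */
static size_t write_data_start(uint32_t xtype)
{
    size_t start = sizeof(struct sketch_io_write);

    if ((xtype & SKETCH_XTYPE_MASK) == SKETCH_XTYPE_OFFSET)
        start += sizeof(struct sketch_xtype_offset);  /* skip the offset */
    return start;
}
```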

Returns: The status via the helper macro _IO_SET_WRITE_NBYTES().

For an example, take a look at “A simple io_write() example” below.

Examples

I'm now going to show you a number of “cookbook” examples you can cut and paste into your code, to use as a basis for your projects. These aren't complete resource managers — you'll need to add the thread pool and dispatch “skeleton” shown immediately below, and ensure that your versions of the I/O functions are placed into the I/O functions table after you've done the iofunc_func_init(), in order to override the defaults!

I'll start with a number of simple examples that show basic functionality for the various resource manager message handlers:

io_read()
io_write()
io_devctl() (without data transfer)
io_devctl() (with data transfer)

And then in the advanced topics section, we'll look at an io_read() that returns directory entries.

The basic skeleton of a resource manager

The following can be used as a template for a resource manager with multiple threads. (We've already seen a template that can be used for a single-threaded resource manager in “The resource manager library,” above, when we discussed a /dev/null resource manager.)

#include <stdio.h>
#include <stdlib.h>
#include <stddef.h>
#include <sys/iofunc.h>
#include <sys/dispatch.h>
#include <string.h>

static resmgr_connect_funcs_t   connect_func;
static resmgr_io_funcs_t        io_func;
static iofunc_attr_t            attr;

int main (int argc, char **argv)
{
    thread_pool_attr_t    pool_attr;
    thread_pool_t         *tpp;
    dispatch_t            *dpp;
    resmgr_attr_t         resmgr_attr;
    int                   id;

    if ((dpp = dispatch_create ()) == NULL) {
        fprintf (stderr,
                 "%s:  Unable to allocate dispatch context.\n",
                 argv [0]);
        return (EXIT_FAILURE);
    }

    memset (&pool_attr, 0, sizeof (pool_attr));
    pool_attr.handle = dpp;
    pool_attr.context_alloc = (void *) dispatch_context_alloc;
    pool_attr.block_func    = (void *) dispatch_block;
    pool_attr.handler_func  = (void *) dispatch_handler;
    pool_attr.context_free  = (void *) dispatch_context_free;

    // 1) set up the number of threads that you want
    pool_attr.lo_water = 2;
    pool_attr.hi_water = 4;
    pool_attr.increment = 1;
    pool_attr.maximum = 50;

    if ((tpp = thread_pool_create (&pool_attr,
                                   POOL_FLAG_EXIT_SELF)) == NULL) {
        fprintf (stderr,
                 "%s:  Unable to initialize thread pool.\n",
                 argv [0]);
        return (EXIT_FAILURE);
    }

    iofunc_func_init (_RESMGR_CONNECT_NFUNCS, &connect_func,
                      _RESMGR_IO_NFUNCS, &io_func);
    iofunc_attr_init (&attr, S_IFNAM | 0777, 0, 0);

    // 2) override functions in "connect_func" and "io_func" as
    // required here

    memset (&resmgr_attr, 0, sizeof (resmgr_attr));
    resmgr_attr.nparts_max = 1;
    resmgr_attr.msg_max_size = 2048;

    // 3) replace "/dev/whatever" with your device name
    if ((id = resmgr_attach (dpp, &resmgr_attr, "/dev/whatever",
                _FTYPE_ANY, 0, &connect_func, &io_func,
                &attr)) == -1) {
        fprintf (stderr,
                 "%s:  Unable to attach name.\n", argv [0]);
        return (EXIT_FAILURE);
    }

    // Never returns
    thread_pool_start (tpp);
    
    return (EXIT_SUCCESS);
}

For more information about the dispatch interface (i.e., the dispatch_create() function), see the documentation in the Neutrino Library Reference.

Step 1

Here you'd use the thread pool functions to create a pool of threads that will be able to service messages in your resource manager. Generally, I recommend that you start off with a single-threaded resource manager, as we did with the /dev/null example mentioned above. Once you have the basic functionality running, you can then add threads. You'd modify the lo_water, hi_water, increment, and maximum members of the pool_attr structure as described in the “Threads & Processes” chapter where we discuss the thread pool functions.
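To make the roles of those four members concrete, here is a simplified model of the policy as the thread pool documentation describes it: keep at least lo_water threads blocked waiting for work, create new threads increment at a time, never exceed maximum threads in total, and let a thread exit instead of blocking again once hi_water threads are already blocked. This is a sketch for intuition, not the library's implementation; struct pool_limits, threads_to_create(), and thread_should_exit() are hypothetical names.

```c
#include <stdbool.h>

/* Hypothetical mirror of the four pool_attr members discussed above. */
struct pool_limits {
    int lo_water;   /* keep at least this many threads blocked, waiting */
    int hi_water;   /* keep at most this many threads blocked */
    int increment;  /* how many threads to create at a time */
    int maximum;    /* hard cap on the total number of threads */
};

/* How many new threads this simplified policy would create, given the
   number of blocked (idle) threads and the total number of threads. */
int threads_to_create (const struct pool_limits *lim, int blocked, int total)
{
    if (blocked >= lim->lo_water) {
        return 0;                        /* enough idle threads already */
    }
    int room = lim->maximum - total;     /* respect the overall cap */
    if (room <= 0) {
        return 0;
    }
    return lim->increment < room ? lim->increment : room;
}

/* Whether a thread that has just finished handling a message should exit
   rather than block again, given how many threads are already blocked. */
bool thread_should_exit (const struct pool_limits *lim, int blocked)
{
    return blocked >= lim->hi_water;
}
```

With the values from the skeleton (2, 4, 1, 50), a pool with only one idle thread would create one more, and a thread finishing its work while four others are already waiting would exit.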

Step 2

Here you'd add whatever functions you want to supply. These are the outcalls we just discussed (e.g., io_read(), io_devctl(), etc.). For example, to add your own handler for the _IO_READ message that points to a function supplied by you called my_io_read(), you'd add the following line of code:

    io_func.io_read = my_io_read;

This will override the POSIX-layer default function that got put into the table by iofunc_func_init() with a pointer to your function, my_io_read().
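The override works because the I/O functions table is just a structure of function pointers. The following portable sketch models that idea with a one-slot table; io_funcs_model_t, funcs_init(), default_io_read(), and my_io_read() are hypothetical stand-ins for resmgr_io_funcs_t, iofunc_func_init(), the POSIX-layer default, and your own handler. It also shows why the override has to come after the init call: running the init again puts the default back.

```c
/* Hypothetical one-slot model of an I/O functions table. */
typedef int (*read_handler_t) (int request);

typedef struct {
    read_handler_t io_read;
} io_funcs_model_t;

/* Stands in for the POSIX-layer default handler. */
static int default_io_read (int request)
{
    (void) request;
    return 0;
}

/* Stands in for your own handler, such as my_io_read(). */
static int my_io_read (int request)
{
    (void) request;
    return 1;
}

/* Stands in for iofunc_func_init(): every slot gets the default. */
static void funcs_init (io_funcs_model_t *funcs)
{
    funcs->io_read = default_io_read;
}
```

Calling funcs_init() and then assigning io_func.io_read = my_io_read leaves your handler in place; calling funcs_init() afterwards would silently restore the default.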

Step 3

You probably don't want your resource manager called /dev/whatever, so you should select an appropriate name. Note that the resmgr_attach() function is where you bind the attributes structure (the attr parameter) to the name — if you wish to have multiple devices handled by your resource manager, you'd call resmgr_attach() multiple times, with different attributes structures (so that you could tell the different registered names apart at runtime).

A simple io_read() example

To illustrate how your resource manager might return data to a client, consider a simple resource manager that always returns the constant string "Hello, world!\n". There are a number of issues involved, even in this very simple case:

matching of client's data area size to data being returned
handling of EOF case
maintenance of context information (the lseek() index)
updating of POSIX stat() information

Data area size considerations

In our case, the resource manager is returning a fixed string of 14 bytes — there is exactly that much data available. This is identical to a read-only file on a disk that contains the string in question; the only real difference is that this “file” is maintained in our C program via the statement:

char    *data_string = "Hello, world!\n";

The client, on the other hand, can issue a read() request of any size — the client could ask for one byte, 14 bytes, or more. The impact of this on the io_read() functionality you're going to provide is that you must be able to match the client's requested data size with what's available.

Handling of EOF case

A natural fallout of the way you handle the client's data area size considerations is the corner case of dealing with the End-Of-File (EOF) on the fixed string. Once the client has read the final “\n” character, further attempts by the client to read more data should return EOF.

Maintenance of context information

Both the “Data area size considerations” and the “Handling of EOF case” scenarios will require that context be maintained in the OCB passed to your io_read() function, specifically the offset member.

Updating POSIX information

One final consideration: when data is read from a resource manager, the POSIX access time (atime) variable needs to be updated. This is so that a client stat() function will show that someone has indeed accessed the device.

The code

Here's the code that addresses all the above points. We'll go through it step-by-step in the discussion that follows:

/*
 * io_read1.c
*/

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <sys/neutrino.h>
#include <sys/iofunc.h>

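// our data string; for steps 3 and 4 to work, main() must also set
// attr.nbytes = strlen (data_string) after calling iofunc_attr_init()
char    *data_string = "Hello, world!\n";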

int
io_read (resmgr_context_t *ctp, io_read_t *msg, iofunc_ocb_t *ocb)
{
    int     sts;
    int     nbytes;
    int     nleft;
    int     off;
    int     xtype;
    struct _xtype_offset *xoffset;

    // 1) verify that the device is opened for read
    if ((sts = iofunc_read_verify (ctp, msg, ocb, NULL)) != EOK) {
        return (sts);
    }

    // 2) check for and handle an XTYPE override
    xtype = msg -> i.xtype & _IO_XTYPE_MASK;
    if (xtype == _IO_XTYPE_OFFSET) {
        xoffset = (struct _xtype_offset *) (&msg -> i + 1);
        off = xoffset -> offset;
    } else if (xtype == _IO_XTYPE_NONE) {
        off = ocb -> offset;
    } else {   // unknown, fail it
        return (ENOSYS);
    }

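    // 3) how many bytes are left? (none if off is at or past the end,
    // e.g., after an lseek() or pread() beyond the end of the data)
    nleft = ocb -> attr -> nbytes - off;
    if (nleft < 0) {
        nleft = 0;
    }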

    // 4) how many bytes can we return to the client?
    nbytes = min (nleft, msg -> i.nbytes);

    // 5) if returning data, write it to client
    if (nbytes) {
        MsgReply (ctp -> rcvid, nbytes, data_string + off, nbytes);

        // 6) set up POSIX stat() "atime" data
        ocb -> attr -> flags |= IOFUNC_ATTR_ATIME |
                                IOFUNC_ATTR_DIRTY_TIME;

        // 7) advance the lseek() index by the number of bytes
        // read if not _IO_XTYPE_OFFSET
        if (xtype == _IO_XTYPE_NONE) {
            ocb -> offset += nbytes;
        }
    } else {
        // 8) not returning data, just unblock client
        MsgReply (ctp -> rcvid, EOK, NULL, 0);
    }

    // 9) indicate we already did the MsgReply to the library
    return (_RESMGR_NOREPLY);
}

Step 1

Here we ensured that the client's open() call had in fact specified that the device was to be opened for reading. If the client opened the device for writing only, and then attempted to perform a read from it, it would be considered an error. In that case, the helper function iofunc_read_verify() would return EBADF, and not EOK, so we'd return that value to the library, which would then pass it along to the client.

Step 2

Here we checked to see if the client had specified an xtype override, which is a per-message override. (For example, even though the device had been opened in non-blocking mode, the client can ask for blocking behavior for this one request.) Note that the iofunc_read_verify() function can report the blocking aspect of the “xtype” override through its last parameter; since we're illustrating a very simple example, we just passed in NULL, indicating that we don't care about this aspect.

More important, however, is to see how particular “xtype” modifiers are handled. An interesting one is the _IO_XTYPE_OFFSET modifier, which, if present, indicates that the message passed from the client contains an offset and that the read operation should not modify the “current file position” of the file descriptor (this is used by the function pread(), for example). If the _IO_XTYPE_OFFSET modifier is not present, then the read operation can go ahead and modify the “current file position.” We use the variable xtype to store the “xtype” that we received in the message, and the variable off to represent the current offset that we should be using during processing. You'll see some additional handling of the _IO_XTYPE_OFFSET modifier below, in step 7.

If there is a different “xtype override” than _IO_XTYPE_OFFSET (and not the no-op one of _IO_XTYPE_NONE), we fail the request with ENOSYS. This simply means that we don't know how to handle it, and we therefore return the error up to the client.
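The offset selection in step 2 boils down to a three-way decision. This sketch restates it as a stand-alone function; XTYPE_NONE, XTYPE_OFFSET, and select_offset() are hypothetical stand-ins (the real _IO_XTYPE_* constants live in the Neutrino headers and have their own values), and the update_ocb output corresponds to the check made later in step 7.

```c
#include <errno.h>

/* Hypothetical stand-ins for _IO_XTYPE_NONE and _IO_XTYPE_OFFSET. */
enum { XTYPE_NONE = 0, XTYPE_OFFSET = 1 };

/* Choose the offset for this request. Returns 0 (EOK on Neutrino) on
   success, or ENOSYS for an xtype we don't understand. *update_ocb tells
   the caller whether to advance the per-open offset afterwards. */
int select_offset (int xtype, long msg_offset, long ocb_offset,
                   long *off, int *update_ocb)
{
    switch (xtype) {
    case XTYPE_OFFSET:          /* pread()-style: offset comes with the message */
        *off = msg_offset;
        *update_ocb = 0;        /* leave the "current file position" alone */
        return 0;
    case XTYPE_NONE:            /* ordinary read(): use the OCB's position */
        *off = ocb_offset;
        *update_ocb = 1;
        return 0;
    default:                    /* unknown override: fail it */
        return ENOSYS;
    }
}
```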

Steps 3 & 4

To calculate how many bytes we can actually return to the client, we perform steps 3 and 4, which figure out how many bytes are available on the device (by taking the total device size from ocb -> attr -> nbytes and subtracting the current offset into the device). Once we know how many bytes are left, we take the smaller of that number and the number of bytes that the client specified that they wish to read. For example, we may have seven bytes left, and the client wants to only read two. In that case, we can return only two bytes to the client. Alternatively, if the client wanted 4096 bytes, but we had only seven left, we could return only seven bytes.
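The arithmetic of steps 3 and 4, together with the offset bookkeeping of step 7 and the EOF behavior, can be modeled outside a resource manager. In this sketch, model_read() is a hypothetical function that serves data_string starting from an offset supplied by the caller; it clamps nleft at zero so that an offset past the end yields EOF rather than a negative count.

```c
#include <stddef.h>
#include <string.h>

/* The "file" being served, as in io_read1.c (14 bytes plus the NUL). */
static const char data_string[] = "Hello, world!\n";

/* Hypothetical model of steps 3 to 7: copy up to `req` bytes starting at
   *offset into buf, advance *offset, and return the count. A return of 0
   is what the client's read() reports as end-of-file. */
size_t model_read (size_t *offset, char *buf, size_t req)
{
    size_t size   = sizeof data_string - 1;               /* like attr.nbytes */
    size_t nleft  = *offset < size ? size - *offset : 0;  /* step 3 */
    size_t nbytes = nleft < req ? nleft : req;            /* step 4: min () */

    if (nbytes > 0) {
        memcpy (buf, data_string + *offset, nbytes);      /* step 5 */
        *offset += nbytes;                                /* step 7 */
    }
    return nbytes;
}
```

With the offset at 7 (seven bytes left), a request for two bytes returns 2 and a request for 4096 bytes returns 7; the next call returns 0, which the client's read() reports as end-of-file.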

Step 5

Now that we've calculated how many bytes we're going to return to the client, we need to do different things based on whether or not we're returning data. If we are returning data, then after the check in step 5, we reply to the client with the data. Notice that we use data_string + off to return data starting at the correct offset (the off is calculated based on the xtype override). Also notice the second parameter to MsgReply() — it's documented as the status argument, but in this case we're using it to return the number of bytes. This is because the implementation of the client's read() function knows that the return value from its MsgSendv() (which is the status argument to MsgReply(), by the way) is the number of bytes that were read. This is a common convention.

Step 6

Since we're returning data from the device, we know that the device has been accessed. We set the IOFUNC_ATTR_ATIME and IOFUNC_ATTR_DIRTY_TIME bits in the flags member of the attribute structure. This serves as a reminder to the io_stat() function that the access time is not valid and should be fetched from the system clock before replying. If we really wanted to, we could have stuffed the current time into the atime member of the attributes structure, and cleared the IOFUNC_ATTR_DIRTY_TIME flag. But this isn't very efficient, since we're expecting to get a lot more read() requests from the client than stat() requests. However, your usage patterns may dictate otherwise.

So which time does the client see when it finally does call stat()? The iofunc_stat_default() function provided by the resource manager library will look at the flags member of the attribute structure to see if the times are valid (the atime, ctime, and mtime fields). If they are not (as will be the case after our io_read() has been called that returned data), the iofunc_stat_default() function will update the time(s) with the current time. The real value of the time is also updated on a close(), as you'd expect.
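The “mark now, resolve later” approach of step 6 is a general dirty-flag pattern. Here's a minimal model of it; the ATTR_* bit values, struct model_attr, and both functions are hypothetical (the real IOFUNC_ATTR_* constants and the time update are handled by the resource manager library), and the current time is passed in as a parameter so the behavior is easy to check.

```c
#include <time.h>

/* Hypothetical flag bits; the real IOFUNC_ATTR_* values differ. */
#define ATTR_ATIME       0x01u   /* atime needs updating */
#define ATTR_DIRTY_TIME  0x02u   /* some time field is stale */

struct model_attr {
    unsigned flags;
    time_t   atime;
};

/* What a read handler does: just mark the access time stale (cheap). */
void model_on_read (struct model_attr *attr)
{
    attr->flags |= ATTR_ATIME | ATTR_DIRTY_TIME;
}

/* What a stat handler does: resolve stale times only when someone asks.
   Only atime is modeled here, so both bits can be cleared together. */
void model_on_stat (struct model_attr *attr, time_t now)
{
    if (attr->flags & ATTR_ATIME) {
        attr->atime = now;
    }
    attr->flags &= ~(ATTR_ATIME | ATTR_DIRTY_TIME);
}
```

Any number of reads cost only a bit-set; the clock is consulted only when a stat() actually arrives.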

Step 7

Now we advance the lseek() offset by the number of bytes that we returned to the client, only if we are not processing the _IO_XTYPE_OFFSET override modifier. This ensures that, in the non-_IO_XTYPE_OFFSET case, if the client calls lseek() to get the current position, or (more importantly) when the client calls read() to get the next few bytes, the offset into the resource is set to the correct value. In the case of the _IO_XTYPE_OFFSET override, we leave the ocb version of the offset alone.

Step 8

Contrast step 5 with this step. Here we only unblock the client; we don't perform any other functions. Notice also that there is no data area specified to the MsgReply(), because we're not returning data.

Step 9

Finally, in step 9, we perform processing that's common regardless of whether or not we returned data to the client. Since we've already unblocked the client via the MsgReply(), we certainly don't want the resource manager library doing that for us, so we tell it that we've already done that by returning _RESMGR_NOREPLY.

Effective use of other messaging functions

As you'll recall from the Message Passing chapter, we discussed a few other message-passing functions, namely MsgWrite(), MsgWritev(), and MsgReplyv(). I'm mentioning them again here because your io_read() function may be in an excellent position to use them. In the simple example shown above, we were returning a contiguous array of bytes from one memory location. In the real world, you may need to return multiple pieces of data from various buffers that you've allocated.

A classic example of this is a ring buffer, as might be found in a serial device driver. Part of the data may be near the end of the buffer, with the rest of it “wrapped” around to the start of the buffer. In this case, you'll want to use a two-part IOV with MsgReplyv() to return both parts: the first part of the IOV would contain the address (and length) of the data at the end of the buffer, and the second part would contain the address (and length) of the wrapped data at the start of the buffer. Or, if the data is going to arrive in pieces, you may instead choose to use MsgWrite() or MsgWritev() to place the data into the client's address space as it arrives, and then call MsgReply() or MsgReplyv() at the end to unblock the client. As we've seen above, there's no requirement to actually transfer data with the MsgReply() function; you can use it simply to unblock the client.
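Here's a sketch of how the two parts of a wrapped ring buffer might be described. It uses the POSIX struct iovec from <sys/uio.h> so that it compiles on any POSIX system; on Neutrino you'd fill an iov_t array (for example, with SETIOV()) in the same way and hand it to something like MsgReplyv (ctp -> rcvid, count, iov, parts). ring_to_iov() is a hypothetical helper, and it assumes head < size and count <= size.

```c
#include <stddef.h>
#include <sys/uio.h>

/* Describe `count` bytes stored in a ring buffer of `size` bytes, starting
   at index `head`, as one or two IOV parts. Returns the number of parts
   used (0, 1, or 2). */
int ring_to_iov (char *ring, size_t size, size_t head, size_t count,
                 struct iovec iov[2])
{
    if (count == 0) {
        return 0;
    }
    size_t first = size - head;           /* bytes before the buffer wraps */
    if (count <= first) {                 /* contiguous: one part suffices */
        iov[0].iov_base = ring + head;
        iov[0].iov_len  = count;
        return 1;
    }
    iov[0].iov_base = ring + head;        /* data at the end of the buffer */
    iov[0].iov_len  = first;
    iov[1].iov_base = ring;               /* wrapped data at the start */
    iov[1].iov_len  = count - first;
    return 2;
}
```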

A simple io_write() example

The io_read() example was fairly simple; let's take a look at io_write(). The major hurdle to overcome with the io_write() is to access the data. Since the resource manager library reads in a small portion of the message from the client, the data content that the client sent (immediately after the _IO_WRITE header) may have only partially arrived at the io_write() function. To illustrate this, consider the client writing one megabyte — only the header and a few bytes of the data will get read by the resource manager library. The rest of the megabyte of data is still available on the client side — the resource manager can access it at will.

There are really two cases to consider:

the entire contents of the client's write() message were read by the resource manager library, or
they were not.

The real design decision, however, is, “how much trouble is it worth to try to save the kernel copy of the data already present?” The answer is that it's not worth it. There are a number of reasons for this:

Message passing (the kernel copy operation) is extremely fast.
There is overhead required to see if the data all fits or not.
There is additional overhead in trying to “save” the first dribble of data that arrived, in light of the fact that more data is waiting.

I think the first two points are self-explanatory. The third point deserves clarification. Let's say the client sent us a large chunk of data, and we did decide that it would be a good idea to try to save the part of the data that had already arrived. Unfortunately, that part is very small. This means that instead of being able to deal with the large chunk all as one contiguous array of bytes, we have to deal with it as one small part plus the rest. Effectively, we have to “special case” the small part, which may have an impact on the overall efficiency of the code that deals with the data. This can lead to headaches, so don't do this!

The real answer, then, is to simply re-read the data into buffers that you've prepared. In our simple io_write() example, I'm just going to malloc() the buffer each time, read the data into the buffer, and then release the buffer via free(). Granted, there are certainly far more efficient ways of allocating and managing buffers!

One further wrinkle introduced in the io_write() example is the handling of the _IO_XTYPE_OFFSET modifier (and associated data; it's done slightly differently than in the io_read() example).

Here's the code:

/*
 * io_write1.c
*/

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <sys/neutrino.h>
#include <sys/iofunc.h>

void
process_data (int offset, void *buffer, int nbytes)
{
    // do something with the data
}

int
io_write (resmgr_context_t *ctp, io_write_t *msg,
          iofunc_ocb_t *ocb)
{
    int     sts;
    int     nbytes;
    int     off;
    int     start_data_offset;
    int     xtype;
    char    *buffer;
    struct _xtype_offset *xoffset;

    // verify that the device is opened for write
    if ((sts = iofunc_write_verify (ctp, msg, ocb, NULL)) != EOK)
    {
        return (sts);
    }

    // 1) check for and handle an XTYPE override
    xtype = msg -> i.xtype & _IO_XTYPE_MASK;
    if (xtype == _IO_XTYPE_OFFSET) {
        xoffset = (struct _xtype_offset *) (&msg -> i + 1);
        start_data_offset = sizeof (msg -> i) + sizeof (*xoffset);
        off = xoffset -> offset;
    } else if (xtype == _IO_XTYPE_NONE) {
        off = ocb -> offset;
        start_data_offset = sizeof (msg -> i);
    } else {   // unknown, fail it
        return (ENOSYS);
    }

    // 2) allocate a buffer big enough for the data
    nbytes = msg -> i.nbytes;
    if ((buffer = malloc (nbytes)) == NULL) {
        return (ENOMEM);
    }

    // 3) (re-)read the data from the client
    if (resmgr_msgread (ctp, buffer, nbytes,
                        start_data_offset) == -1)
    {
        free (buffer);
        return (errno);
    }

    // 4) do something with the data
    process_data (off, buffer, nbytes);

    // 5) free the buffer
    free (buffer);

    // 6) set up the number of bytes for the client's "write"
    // function to return
    _IO_SET_WRITE_NBYTES (ctp, nbytes);

    // 7) if any data written, update POSIX structures and OCB offset
    if (nbytes) {
        ocb -> attr -> flags |= IOFUNC_ATTR_MTIME | IOFUNC_ATTR_DIRTY_TIME;
        if (xtype == _IO_XTYPE_NONE) {
            ocb -> offset += nbytes;
        }
    }

    // 8) tell the resource manager library to do the reply, and that it
    // was okay
    return (EOK);
}

As you can see, a few of the initial operations performed were identical to those done in the io_read() example — the iofunc_write_verify() is analogous to the iofunc_read_verify() function, and the xtype override check is the same.

Step 1

Here we performed much the same processing for the “xtype override” as we did in the io_read() example, except that we also work out where the client's data starts within the message. The struct _xtype_offset isn't part of the fixed io_write_t header; when it's present, it sits between the header and the data. Because a common practice is to use the size of the incoming message structure to determine the starting point of the actual data being transferred from the client, we take special pains in the xtype handling code to ensure that the offset of the start of the data (start_data_offset) also accounts for the struct _xtype_offset.
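To see the numbers involved, the following sketch computes the start of the data the same way step 1 does. struct io_write_hdr mirrors the struct _io_write fields shown in the io_write() entry; treating struct _xtype_offset as a single 64-bit offset is an assumption here, so check the real definition in your headers.

```c
#include <stddef.h>
#include <stdint.h>

/* Mirror of the struct _io_write header fields from the io_write() entry. */
struct io_write_hdr {
    uint16_t type;
    uint16_t combine_len;
    int32_t  nbytes;
    uint32_t xtype;
    uint32_t zero;
};

/* Assumed shape of struct _xtype_offset: a single 64-bit offset. */
struct xtype_offset_model {
    int64_t offset;
};

/* Byte position of the client's data within the message. */
size_t data_start (int has_xtype_offset)
{
    size_t start = sizeof (struct io_write_hdr);
    if (has_xtype_offset) {
        start += sizeof (struct xtype_offset_model);   /* skip the offset too */
    }
    return start;
}
```

On common ABIs the header is 16 bytes, so the data starts at byte 16 for a plain write() and at byte 24 when the offset structure is present.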

Step 2

Here we allocate a buffer that's big enough for the data. The number of bytes that the client is writing is presented to us in the nbytes member of the msg union. This is stuffed automatically by the client's C library in the write() routine. Note that if we don't have sufficient memory to handle the malloc() request, we return the error number ENOMEM to the client — effectively, we're passing on the return code to the client to let it know why its request wasn't completed.

Step 3

Here we use the helper function resmgr_msgread() to read the entire data content from the client directly into the newly allocated buffer. In most cases we could have just used MsgRead(), but in the case where this message is part of a “combine message,” resmgr_msgread() performs the appropriate “magic” for us (see the “Combine message” section for more information on why we need to do this). The parameters to resmgr_msgread() are fairly straightforward: we give it the internal context pointer (ctp), the buffer into which we want the data placed (buffer), and the number of bytes that we wish to read (the nbytes member of the message msg union). The last parameter is the offset into the current message, which we calculated above, in step 1. The offset effectively skips the header information that the client's C library implementation of write() put there, and proceeds directly to the data. This actually brings up two interesting points:

We could use an arbitrary offset value to read chunks of the client's data in any order and size we want.
We could use resmgr_msgreadv() (note the “v”) to read data from the client into an IOV, perhaps describing various buffers, similar to what we did with the cache buffers in the filesystem discussion in the Message Passing chapter.
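The first point suggests an alternative to allocating a buffer for the whole write in step 2: re-read the client's data through one small, fixed buffer, a chunk at a time. This portable sketch simulates that; client_msg, fake_msgread(), process_chunk(), and CHUNK are hypothetical. On Neutrino the read would be resmgr_msgread (ctp, buf, want, start + done), whose -1 error return a real handler must check, as io_write1.c does.

```c
#include <stddef.h>
#include <string.h>

#define CHUNK 4   /* deliberately tiny so the example needs several chunks */

/* Hypothetical stand-ins for the client's message and resmgr_msgread(). */
static const char *client_msg;
static size_t      client_msg_len;

static size_t fake_msgread (void *dst, size_t size, size_t offset)
{
    if (offset >= client_msg_len) {
        return 0;
    }
    size_t n = client_msg_len - offset;
    if (n > size) {
        n = size;
    }
    memcpy (dst, client_msg + offset, n);
    return n;
}

/* Hypothetical stand-in for process_data(): sums bytes, counts calls. */
static unsigned long checksum;
static int           calls;

static void process_chunk (size_t payload_off, const char *buf, size_t n)
{
    (void) payload_off;   /* a real device would use this position */
    for (size_t i = 0; i < n; i++) {
        checksum += (unsigned char) buf[i];
    }
    calls++;
}

/* Re-read `nbytes` of client data that begins `start` bytes into the
   message, through one small fixed buffer instead of one big malloc(). */
size_t read_in_chunks (size_t start, size_t nbytes)
{
    char   buf[CHUNK];
    size_t done = 0;

    while (done < nbytes) {
        size_t want = nbytes - done < CHUNK ? nbytes - done : CHUNK;
        size_t got  = fake_msgread (buf, want, start + done);
        if (got == 0) {
            break;        /* the client sent less than nbytes */
        }
        process_chunk (done, buf, got);
        done += got;
    }
    return done;
}
```

Here a 4-byte header is followed by 10 bytes of data, which arrive in chunks of 4, 4, and 2 bytes.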

Step 4

Here you'd do whatever you want with the data. I've just called a made-up function called process_data() and passed it the offset, buffer, and size.

Step 5

This step is crucial! Forgetting to do it is easy, and will lead to “memory leaks.” Notice how we also took care to free the memory in the case of a failure in step 3.

Step 6

We're using the macro _IO_SET_WRITE_NBYTES() (see the entry for iofunc_write_verify() in the Neutrino Library Reference) to store the number of bytes we've written, which will then be passed back to the client as the return value from the client's write(). It's important to note that you should return the actual number of bytes! The client is depending on this.

Step 7

Now we do similar housekeeping for stat(), lseek(), and further write() functions as we did for the io_read() routine (and again, we modify the offset in the ocb only in the case of this not being a _IO_XTYPE_OFFSET type of message). Since we're writing to the device, however, we use the IOFUNC_ATTR_MTIME constant instead of the IOFUNC_ATTR_ATIME constant. The MTIME flag means “modification” time, and a write() to a resource certainly “modifies” it.

Step 8

The last step is simple: we return the constant EOK, which tells the resource manager library that it should reply to the client. This ends our processing. The resource manager library will use the number of bytes that we stashed away with the _IO_SET_WRITE_NBYTES() macro in the reply, and the client will unblock; the client's C library write() function will return the number of bytes that were written by our device.

A simple io_devctl() example

The client's devctl() call is formally defined as:

#include <sys/types.h>
#include <unistd.h>
#include <devctl.h>

int
devctl (int fd,
        int dcmd,
        void *dev_data_ptr,
        size_t nbytes,
        int *dev_info_ptr);

We should first understand this function before we look at the resource manager side of things. The devctl() function is used for “out of band” or “control” operations. For example, you may be writing data to a sound card (the actual digital audio samples that the sound card should convert to analog audio), and you may decide that you need to change the number of channels from 1 (mono) to 2 (stereo), or the sampling rate from the CD-standard (44.1 kHz) to the DAT-standard (48 kHz). The devctl() function is the appropriate way to do this. When you write a resource manager, you may find that you don't need any devctl() support at all and that you can perform all the functionality needed simply through the standard read() and write() functions. You may, on the other hand, find that you need to mix devctl() calls with the read() and write() calls, or indeed that your device uses only devctl() functions and does not use read() or write().

The devctl() function takes these arguments:

fd: The file descriptor of the resource manager that you're sending the devctl() to.
dcmd: The command itself, a combination of two bits' worth of direction and 30 bits' worth of command (see discussion below).
dev_data_ptr: A pointer to a data area that can be sent to, received from, or both.
nbytes: The size of the dev_data_ptr data area.
dev_info_ptr: An extra information variable that can be set by the resource manager.

The top two bits in the dcmd encode the direction of data transfer, if any. For details, see the description in the I/O reference section (under io_devctl()).
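To illustrate the idea of packing a direction and a command into one integer, here's a small sketch. The bit assignments and the DIR_* and make_dcmd() names are illustrative assumptions; real code should build dcmd values with the macros that <devctl.h> provides rather than with hand-rolled constants.

```c
#include <stdint.h>

/* Illustrative encoding: top two bits = direction, low 30 bits = command. */
#define DIR_TO_DEVICE    0x80000000u   /* data flows from client to device */
#define DIR_FROM_DEVICE  0x40000000u   /* data flows from device to client */
#define DIR_MASK         0xC0000000u

static inline uint32_t make_dcmd (uint32_t dir, uint32_t cmd)
{
    return (dir & DIR_MASK) | (cmd & ~DIR_MASK);
}

static inline int sends_data (uint32_t dcmd)
{
    return (dcmd & DIR_TO_DEVICE) != 0;
}

static inline int receives_data (uint32_t dcmd)
{
    return (dcmd & DIR_FROM_DEVICE) != 0;
}

static inline uint32_t command_bits (uint32_t dcmd)
{
    return dcmd & ~DIR_MASK;
}
```

A command with neither direction bit set transfers no data, which is why the plain small integers used for the no-data commands in the first io_devctl() example work.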

When the _IO_DEVCTL message is received by the resource manager, it's handled by your io_devctl() function. Here is a very simple example, which we'll assume is used to set the number of channels and the sampling rate for the audio device we discussed above:

/*
 * io_devctl1.c
*/

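#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <errno.h>
#include <sys/neutrino.h>
#include <sys/iofunc.h>

#define DCMD_AUDIO_SET_CHANNEL_MONO         1
#define DCMD_AUDIO_SET_CHANNEL_STEREO       2
#define DCMD_AUDIO_SET_SAMPLE_RATE_CD       3
#define DCMD_AUDIO_SET_SAMPLE_RATE_DAT      4

// fictitious device-specific functions, implemented elsewhere
void audio_set_nchannels (int nchannels);
void audio_set_samplerate (int rate);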

int
io_devctl (resmgr_context_t *ctp, io_devctl_t *msg,
           iofunc_ocb_t *ocb)
{
    int     sts;

    // 1) see if it's a standard devctl()
    if ((sts = iofunc_devctl_default (ctp, msg, ocb)) !=
        _RESMGR_DEFAULT)
    {
        return (sts);
    }

    // 2) see which command it was, and act on it
    switch (msg -> i.dcmd) {
    case    DCMD_AUDIO_SET_CHANNEL_MONO:
        audio_set_nchannels (1);
        break;
    case    DCMD_AUDIO_SET_CHANNEL_STEREO:
        audio_set_nchannels (2);
        break;
    case    DCMD_AUDIO_SET_SAMPLE_RATE_CD:
        audio_set_samplerate (44100);
        break;
    case    DCMD_AUDIO_SET_SAMPLE_RATE_DAT:
        audio_set_samplerate (48000);
        break;

    // 3) in case it's a command that we don't recognize, fail it
    default:
        return (ENOSYS);
    }

    // 4) tell the client it worked
    memset (&msg -> o, 0, sizeof (msg -> o));
    SETIOV (ctp -> iov, &msg -> o, sizeof (msg -> o));
    return (_RESMGR_NPARTS (1));
}

Step 1

In the first step, we see again the use of a helper function, this time iofunc_devctl_default(), which is used to perform all default processing for the devctl() function. If you didn't supply your own io_devctl(), and just let iofunc_func_init() initialize the I/O and connect functions tables for you, the iofunc_devctl_default() function is what would get called. We include it in our io_devctl() function because we want it to handle all the regular devctl() cases for us. We examine the return value; if it's not _RESMGR_DEFAULT, then this means that the iofunc_devctl_default() function “handled” the request, so we just pass along its return value as our return value.

If the constant _RESMGR_DEFAULT is the return value, then we know that the helper function didn't handle the request and that we should check to see if it's one of ours.

Step 2

This checking is done in step 2 via the switch/case statement. We simply compare the dcmd values that the client code would have stuffed into the second argument to devctl() to see if there's a match. Note that we call the fictitious functions audio_set_nchannels() and audio_set_samplerate() to accomplish the actual “work” for the client. An important note that should be mentioned here is that we've specifically avoided touching the data area aspects of devctl() — you may be thinking, “What if I wanted to set the sample rate to some arbitrary number n, how would I do that?” That will be answered in the next io_devctl() example below.

Step 3

This step is simply good defensive programming. We return an error code of ENOSYS to tell the client that we didn't understand their request.

Step 4

Finally, we clear out the return structure and set up a one-part IOV to point to it. Then we return a value to the resource manager library, encoded by the macro _RESMGR_NPARTS(), telling it that we're returning a one-part IOV. This is then returned to the client. We could alternatively have used the _RESMGR_PTR() macro:

// instead of this
    // 4) tell the client it worked
    memset (&msg -> o, 0, sizeof (msg -> o));
    SETIOV (ctp -> iov, &msg -> o, sizeof (msg -> o));
    return (_RESMGR_NPARTS (1));

// we could have done this
    // 4) tell the client it worked
    memset (&msg -> o, 0, sizeof (msg -> o));
    return (_RESMGR_PTR (ctp, &msg -> o, sizeof (msg -> o)));

The reason we cleared out the return structure here (and not in the io_read() or io_write() examples) is because in this case, the return structure has actual contents! (In the io_read() case, the only data returned was the data itself and the number of bytes read — there was no “return data structure,” and in the io_write() case, the only data returned was the number of bytes written.)

An io_devctl() example that deals with data

In the previous io_devctl() example, above, we raised the question of how to set arbitrary sampling rates. Obviously, it's not a good solution to create a large number of DCMD_AUDIO_SET_SAMPLE_RATE_* constants — we'd rapidly use up the available bits in the dcmd member.

From the client side, we'll use the dev_data_ptr pointer to point to the sample rate, which we'll simply pass as an integer. Therefore, the nbytes argument will simply be the number of bytes in an integer (sizeof (int), which is 4 on a 32-bit machine). We'll assume that the constant DCMD_AUDIO_SET_SAMPLE_RATE is defined for this purpose.

Also, we'd like to be able to read the current sampling rate. We'll also use the dev_data_ptr and nbytes as described above, but in the reverse direction — the resource manager will return data into the memory location pointed to by dev_data_ptr (for nbytes) instead of getting data from that memory location. Let's assume that the constant DCMD_AUDIO_GET_SAMPLE_RATE is defined for this purpose.

Let's see what happens in the resource manager's io_devctl(), as shown here (we won't discuss things that have already been discussed in the previous example):

/*
 * io_devctl2.c
*/

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <errno.h>
#include <devctl.h>
#include <sys/neutrino.h>
#include <sys/iofunc.h>

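// The direction bits tell the client's devctl() which way to copy the
// data area: __DIOT() = to the device, __DIOF() = from the device.
#define DCMD_AUDIO_SET_SAMPLE_RATE  __DIOT (_DCMD_MISC, 1, int)
#define DCMD_AUDIO_GET_SAMPLE_RATE  __DIOF (_DCMD_MISC, 2, int)

// fictitious device-specific functions, implemented elsewhere
void audio_set_samplerate (int rate);
int  audio_get_samplerate (void);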

int
io_devctl (resmgr_context_t *ctp, io_devctl_t *msg,
           iofunc_ocb_t *ocb)
{
    int     sts;
    int     nbytes;

    // 1) Declare a pointer to the data area of the message.
    void    *data;

    if ((sts = iofunc_devctl_default (ctp, msg, ocb)) !=
        _RESMGR_DEFAULT)
    {
        return (sts);
    }

    // 2) Preset the number of bytes that we'll return to zero.
    nbytes = 0;

    // Check for all commands; we'll just show the ones we're
    // interested in here.
    switch (msg -> i.dcmd) {

    // 3) Process the SET command.
    case    DCMD_AUDIO_SET_SAMPLE_RATE:
        data = _DEVCTL_DATA (msg->i);
        audio_set_samplerate (* (int *) data);
        break;

    // 4) Process the GET command.
    case    DCMD_AUDIO_GET_SAMPLE_RATE:
        data = _DEVCTL_DATA (msg->o);
        * (int *) data = audio_get_samplerate ();
        nbytes = sizeof (int);
        break;

    // Any other command isn't supported by this device.
    default:
        return (ENOSYS);
    }

    // 5) Return data (if any) to the client.
    memset (&msg -> o, 0, sizeof (msg -> o));
    msg -> o.nbytes = nbytes;
    SETIOV (ctp -> iov, &msg -> o, sizeof (msg -> o) + nbytes);
    return (_RESMGR_NPARTS (1));
}

Step 1

We've declared a void * called data that we're going to use as a general-purpose pointer to the data area. If you refer to the io_devctl() description above, you'll see that the data structure consists of a union of an input and output header structure, with the data area implicitly following that header. We'll use the _DEVCTL_DATA() macro (see the entry for iofunc_devctl() in the Neutrino Library Reference) to get a pointer to that data area. Because the input and output headers aren't necessarily the same size, we apply the macro to msg->i when reading the client's data and to msg->o when writing data back, as you'll see in steps 3 and 4.

Step 2

Here we need to indicate how many bytes we're going to return to the client. Simply for convenience, I've set the nbytes variable to zero before doing any processing — this way I don't have to explicitly set it to zero in each of the switch/case statements.

Step 3

Now for the “set” command. We call the fictitious function audio_set_samplerate() and pass it the sample rate, which we obtain by dereferencing the data pointer after “tricking” it into being a pointer to an integer. (Well, okay, we didn't really trick it; we used a standard C typecast.) This is a key mechanism, because this is how we “interpret” the data area (the client's dev_data_ptr) according to the command. In a more complicated case, you may be typecasting it to a large structure instead of just a simple integer. Obviously, the client's and resource manager's definitions of the structure must be identical, so the best place to define the structure is in the .h file that contains your DCMD_* command code constants.
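To make the cast concrete outside of QNX, here's a portable sketch that simulates the layout. All of its names (struct audio_params, struct fake_devctl_hdr, struct fake_msg, DATA_AFTER(), client_pack(), server_sample_rate()) are hypothetical stand-ins: DATA_AFTER() plays the role of _DEVCTL_DATA(), and the header is not the real io_devctl_t. The point is only that both sides share one structure definition and the server interprets the data area with a cast.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical shared definition; in real code this belongs in the same
 * .h file as your DCMD_* constants so client and server agree on it. */
struct audio_params {
    int32_t sample_rate;
    int32_t channels;
};

/* Hypothetical stand-in for the devctl message header. */
struct fake_devctl_hdr {
    uint32_t dcmd;
    uint32_t nbytes;
};

/* Same idea as _DEVCTL_DATA(): the data area starts right after the header. */
#define DATA_AFTER(hdr) ((void *) ((char *) &(hdr) + sizeof (hdr)))

/* A simulated message buffer: header, then room for the data area. */
struct fake_msg {
    struct fake_devctl_hdr hdr;
    union { struct audio_params params; char raw [64]; } data;
};

/* Client side: place the structure in the data area after the header. */
static void client_pack (struct fake_msg *m, int32_t rate, int32_t channels)
{
    struct audio_params *p;

    memset (m, 0, sizeof (*m));
    p = DATA_AFTER (m->hdr);               /* data area follows the header */
    m->hdr.dcmd = 1;                       /* hypothetical command code */
    m->hdr.nbytes = sizeof (*p);
    p->sample_rate = rate;
    p->channels = channels;
}

/* Server side: the cast is how the data area gets "interpreted". */
static int32_t server_sample_rate (struct fake_msg *m)
{
    struct audio_params *p = (struct audio_params *) DATA_AFTER (m->hdr);
    return p->sample_rate;
}
```

A real resource manager would get the header and data from the message the library received, not from a local structure; only the pointer arithmetic and the shared definition carry over.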

Step 4

For the “get” command in step 4, the processing is very similar (with the typecast), except this time we're writing into the data structure instead of reading from it. Note that we also set the nbytes variable to correspond to the number of bytes that we want to return to the client. For more complicated data accesses, you'd return the size of the data area (i.e., if it's a structure, you'd return the size of the structure).

Step 5

Finally, to return data to the client, we need to note that the client is expecting a header structure, as well as the return data (if any) to immediately follow the header structure. Therefore, in this step, we clear out the header structure to zeros and set the number of bytes (the nbytes member) to the number of bytes that we're returning (recall we had pre-initialized this to zero). Then, we set up a one-part IOV with a pointer to the header and extend the size of the header by the number of bytes we're returning. Lastly, we simply tell the resource manager library that we're returning a one-part IOV to the client.
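Here's a portable sketch of the layout that step 5 produces, again with hypothetical names: struct fake_reply_hdr stands in for the reply header in msg->o, and build_reply() is not a library function. Instead of an IOV, it copies the header and data into one contiguous buffer and returns the total length, which corresponds to sizeof (msg->o) + nbytes in the SETIOV() call.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the devctl reply header (msg->o). */
struct fake_reply_hdr {
    int32_t  ret_val;
    uint32_t nbytes;
};

/* Lay out header + data contiguously, as the one-part IOV does.
 * Returns the total reply length, or 0 if the buffer is too small. */
static size_t build_reply (void *buf, size_t bufsize,
                           const void *data, uint32_t nbytes)
{
    struct fake_reply_hdr *hdr = buf;
    size_t total = sizeof (*hdr) + nbytes;

    if (total > bufsize) {
        return 0;
    }
    memset (hdr, 0, sizeof (*hdr));        /* clear the header */
    hdr->nbytes = nbytes;                  /* how much data follows */
    if (nbytes > 0) {
        memcpy ((char *) buf + sizeof (*hdr), data, nbytes);
    }
    return total;                          /* header plus data */
}
```

With nbytes left at zero (as for the “set” command), the reply is just the cleared header.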

Important note

Recall the discussion in the io_write() sample above about the data area following the header. To recap, we stated that the bytes following the header may or may not be complete (i.e., the data area may or may not have been read in its entirety from the client), depending on how much data the resource manager library read in. Then we went on to discuss how it was inefficient to try to “save” a message pass and to “reuse” the data area. However, things are slightly different with devctl(), especially if the amount of data being transferred is fairly small (as was the case in our examples). In these cases, there's a good chance that the data has in fact been read into the data area, so it is indeed a waste to re-read it. There is a simple way to tell how much space you have: the size member of ctp contains the number of bytes that are available for you starting at the msg parameter. The size of the data area available beyond the end of the message header is therefore calculated by subtracting the size of the message header structure from the size member of ctp:

data_area_size = ctp -> size - sizeof (*msg);

Note that this size is equally valid when you are returning data to the client (as in the DCMD_AUDIO_GET_SAMPLE_RATE command).

For anything larger than the allocated region, you'll want to perform the same processing we did with the io_write() example (above) for getting data from the client, and you'll want to allocate a buffer to be used for returning data to the client.
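The decision described above can be sketched as a small helper. This is a hypothetical, portable illustration: ctp_size stands in for ctp->size, hdr_size for sizeof (*msg), and data_area_size() and choose_source() are not part of the resource manager library.

```c
#include <stddef.h>

enum payload_source { PAYLOAD_IN_PLACE, PAYLOAD_NEEDS_READ };

/* Bytes available after the header in the receive buffer. */
static size_t data_area_size (size_t ctp_size, size_t hdr_size)
{
    return ctp_size > hdr_size ? ctp_size - hdr_size : 0;
}

/* Decide whether an nbytes payload can be used (or returned) in place. */
static enum payload_source choose_source (size_t ctp_size, size_t hdr_size,
                                          size_t nbytes)
{
    /* Small payloads, such as a single int, usually fit already. */
    if (nbytes <= data_area_size (ctp_size, hdr_size)) {
        return PAYLOAD_IN_PLACE;
    }
    /* Otherwise, read the rest from the client (for example, with
     * resmgr_msgread()) into a separately allocated buffer, as in
     * the io_write() example. */
    return PAYLOAD_NEEDS_READ;
}
```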

Advanced topics

Now that we've covered the “basics” of resource managers, it's time to look at some more complicated aspects:

extending the OCB
extending the attributes structure
blocking within the resource manager
returning directory entries

Extending the OCB

In some cases, you may find the need to extend the OCB. This is relatively painless to do. The common uses for extending the OCB are to add extra flags you wish to maintain on a per-open basis. One such flag could be used with the io_unblock() handler to cache the value of the kernel's _NTO_MI_UNBLOCK_REQ flag. (See the Message Passing chapter, under “Using the _NTO_MI_UNBLOCK_REQ” for more details.)

To extend the OCB, you'll need to provide two functions—one to allocate (and initialize) the new OCB and one to free it—to override the defaults, iofunc_ocb_calloc() and iofunc_ocb_free(). Then, you'll need to bind your two customized functions into the mount structure. (Yes, this does mean that you'll need a mount structure, if only for this one purpose.) Finally, you'll need to define your own OCB typedef, so that the prototypes for the code are all correct.

Let's look at the OCB typedef first, and then we'll see how to override the functions:

#define IOFUNC_OCB_T struct my_ocb
#include <sys/iofunc.h>

This tells the included file, <sys/iofunc.h>, that the manifest constant IOFUNC_OCB_T now points to your new and improved OCB structure.

It's very important to keep in mind that the “normal” OCB must appear as the first entry in your extended OCB! This is because the POSIX helper library passes around a pointer to what it expects is a normal OCB — it doesn't know about your extended OCB, so therefore the first data element at the pointer location must be the normal OCB.

Here's our extended OCB:

typedef struct my_ocb
{
    iofunc_ocb_t    normal_ocb;
    int             my_extra_flags;
    …
} my_ocb_t;
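The “first member” rule relies on a guarantee of C itself: a pointer to a structure, suitably converted, points to its first member, and vice versa. The following portable sketch uses hypothetical miniature types (struct base_ocb is not the real iofunc_ocb_t) to show the helper library's view and your handler's view of the same pointer:

```c
#include <stddef.h>

/* Hypothetical miniature of the library's OCB, for illustration only. */
struct base_ocb {
    void *attr;
    long  offset;
};

/* Extended OCB: the "normal" OCB must be the first member. */
struct my_ocb_sketch {
    struct base_ocb normal_ocb;
    int             my_extra_flags;
};

/* The helper library only knows about the base OCB. */
static long library_advance (struct base_ocb *ocb, long n)
{
    ocb->offset += n;
    return ocb->offset;
}

/* Your handler casts the very same pointer back to the extended OCB. */
static int my_flags (struct base_ocb *ocb)
{
    return ((struct my_ocb_sketch *) ocb)->my_extra_flags;
}
```

If normal_ocb were not first, the library would be updating the wrong bytes of your structure.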

Finally, here's the code that illustrates how to override the allocation and deallocation functions in the mount structure:

// declare
iofunc_mount_t      mount;
iofunc_funcs_t      mount_funcs;

// set up the mount functions structure
// with our allocate/deallocate functions

// _IOFUNC_NFUNCS is from the .h file
mount_funcs.nfuncs = _IOFUNC_NFUNCS;

// your new OCB allocator
mount_funcs.ocb_calloc = my_ocb_calloc;

// your new OCB deallocator
mount_funcs.ocb_free = my_ocb_free;

// set up the mount structure
memset (&mount, 0, sizeof (mount));

Then all you have to do is bind the mount functions to the mount structure, and the mount structure to the attributes structure:

…

mount.funcs = &mount_funcs;
attr.mount = &mount;

The my_ocb_calloc() and my_ocb_free() functions are responsible for allocating and initializing an extended OCB and for freeing the OCB, respectively. They are prototyped as:

IOFUNC_OCB_T *
my_ocb_calloc (resmgr_context_t *ctp, IOFUNC_ATTR_T *attr);

void
my_ocb_free (IOFUNC_OCB_T *ocb);

This means that the my_ocb_calloc() function gets passed both the internal resource manager context and the attributes structure. The function is responsible for returning an initialized OCB. The my_ocb_free() function gets passed the OCB and is responsible for releasing the storage for it.

It's important to realize that the OCB may be allocated by functions other than the normal io_open() handler — for example, the memory manager may allocate an OCB. The impact of this is that your OCB allocating function must be able to initialize the OCB with the attr argument.

There are two interesting uses for these two functions (that have nothing to do with extending the OCB):

OCB allocation/deallocation monitor
more efficient allocation/deallocation

OCB monitor

In this case, you can simply “tie in” to the allocator/deallocator and monitor the usage of the OCBs (for example, you may wish to limit the total number of OCBs outstanding at any given time). This may prove to be a good idea if you're not taking over the io_open() outcall, and yet still need to intercept the creation of (and possibly deletion of) OCBs.
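Here's a minimal sketch of such a monitor, written against hypothetical stand-in types rather than the real IOFUNC_OCB_T and IOFUNC_ATTR_T so that it compiles anywhere. The limit of two OCBs and the convention of refusing by returning NULL are assumptions for illustration; the allocator initializes the OCB using only the attr argument, as discussed above.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for IOFUNC_ATTR_T and IOFUNC_OCB_T. */
struct attr_sketch { int id; };
struct ocb_sketch  { struct attr_sketch *attr; int flags; };

#define MAX_OCBS 2              /* hypothetical limit */
static int ocbs_outstanding;    /* OCBs currently allocated */

/* Allocator: refuses (returns NULL) once the limit is reached. */
static struct ocb_sketch *
monitored_ocb_calloc (void *ctp, struct attr_sketch *attr)
{
    struct ocb_sketch *ocb;

    (void) ctp;                 /* unused in this sketch */
    if (ocbs_outstanding >= MAX_OCBS) {
        return NULL;
    }
    if ((ocb = calloc (1, sizeof (*ocb))) == NULL) {
        return NULL;
    }
    ocb->attr = attr;           /* initialize from attr alone */
    ocbs_outstanding++;
    return ocb;
}

/* Deallocator: releases the storage and updates the count. */
static void monitored_ocb_free (struct ocb_sketch *ocb)
{
    if (ocb != NULL) {
        ocbs_outstanding--;
        free (ocb);
    }
}
```

In a multithreaded resource manager, you'd protect ocbs_outstanding with a mutex.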

More efficient allocation

Another use for overriding the library's built-in OCB allocator/deallocator is that you may wish to keep the OCBs on a free list, instead of having them allocated with calloc() and released with free() every time. If you're allocating and deallocating OCBs at a high rate, this may prove to be more efficient.
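A free-list allocator might look like the following sketch. It uses hypothetical stand-in types again; fl_ocb_calloc() and fl_ocb_free() would be installed in the mount functions structure the same way as my_ocb_calloc() and my_ocb_free(). Freed OCBs are pushed onto a singly linked list and handed out again, cleared, on the next allocation.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins; the real types come from <sys/iofunc.h>. */
struct fl_attr { int id; };
struct fl_ocb {
    struct fl_attr *attr;
    int             flags;
    struct fl_ocb  *next_free;   /* link used only while on the free list */
};

static struct fl_ocb *free_list; /* recycled OCBs */

static struct fl_ocb *fl_ocb_calloc (void *ctp, struct fl_attr *attr)
{
    struct fl_ocb *ocb;

    (void) ctp;
    if (free_list != NULL) {               /* reuse without calloc() */
        ocb = free_list;
        free_list = ocb->next_free;
        memset (ocb, 0, sizeof (*ocb));    /* same state calloc() gives */
    } else if ((ocb = calloc (1, sizeof (*ocb))) == NULL) {
        return NULL;
    }
    ocb->attr = attr;
    return ocb;
}

static void fl_ocb_free (struct fl_ocb *ocb)
{
    if (ocb != NULL) {                     /* push instead of free() */
        ocb->next_free = free_list;
        free_list = ocb;
    }
}
```

As written, this isn't thread-safe; a resource manager that uses a thread pool would need to guard free_list with a mutex.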

Extending the attributes structure

You may wish to extend the attributes structure in cases where you need to store additional device information. Since the attributes structure is associated on a “per-device” basis, this means that any extra information you store there will be accessible to all OCBs that reference that device (since the OCB contains a pointer to the attributes structure). Often things like serial baud rate, etc. are stored in extended attributes structures.

Extending the attributes structure is much simpler than dealing with extended OCBs, simply because attributes structures are allocated and deallocated by your code anyway.

You have to perform the same “trick” of overriding the header files with the “new” attributes structure as we did with the extended OCB above:

#define IOFUNC_ATTR_T struct my_attr
#include <sys/iofunc.h>

Next you actually define the contents of your extended attribute structures. Note that the extended attribute structure must have the “normal” attribute structure encapsulated as the very first element, just as we did with the extended OCB (and for the same reasons).
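As a portable sketch with hypothetical miniature types (struct base_attr is not the real iofunc_attr_t), here's why per-device data in an extended attributes structure is visible to every open: each OCB holds a pointer to the same attributes structure, so a change made through one open is seen by all of them.

```c
/* Hypothetical miniature of the normal attributes structure. */
struct base_attr {
    unsigned mode;
    long     nbytes;
};

/* Extended attributes: the normal structure first, then device data. */
struct my_attr_sketch {
    struct base_attr attr;
    int              baud_rate;   /* per-device, shared by every open */
};

/* Each OCB points at the attributes structure of the device it opened. */
struct ocb_view {
    struct base_attr *attr;
};

/* Recover the extension from an OCB's attr pointer. */
static int ocb_baud (const struct ocb_view *ocb)
{
    return ((const struct my_attr_sketch *) ocb->attr)->baud_rate;
}
```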

Blocking within the resource manager

So far we've avoided talking about blocking within the resource manager. We assume that you will supply an outcall function (e.g., a handler for io_read()), and that the data will be available immediately. What if you need to block, waiting for the data? For example, performing a read() on the serial port might need to block until a character arrives. Obviously, we can't predict how long this will take.

Blocking within a resource manager is based on the same principles that we discussed in the Message Passing chapter; after all, a resource manager is really a server that handles certain, well-defined messages. When the message corresponding to the client's read() request arrives, it does so with a receive ID, and the client is blocked. If the resource manager has the data available, it will simply return the data as we've already seen in the various examples above. However, if the data isn't available, the resource manager will need to keep the client blocked (if the client has indeed specified blocking behavior for the operation) while it continues processing other messages. What this really means is that the thread (in the resource manager) that received the message from the client should not block waiting for the data. If it did, you can imagine that this could eventually use up a great number of threads in the resource manager, with each thread waiting for some data from some device.

The correct solution to this is to store the receive ID that arrived with the client's message onto a queue somewhere, and return the special constant _RESMGR_NOREPLY from your handler. This tells the resource manager library that processing for this message has completed, but that the client shouldn't be unblocked yet.

Some time later, when the data arrives, you would then retrieve the receive ID of the client that was waiting for the message, and construct a reply message containing the data. Finally, you would reply to the client.
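The bookkeeping can be sketched as a queue of receive IDs. This is a hypothetical, portable simulation: rcvids are plain ints, fake_reply() stands in for MsgReply(), and park_client() and data_arrived() are illustrative names, not library functions. In a real handler, a successful park_client() would be followed by return (_RESMGR_NOREPLY).

```c
#define MAX_WAITERS 8

static int  waiters [MAX_WAITERS];   /* circular queue of rcvids */
static int  head, count;

static int  last_rcvid = -1;         /* what fake_reply() last "sent" */
static char last_data;

static void fake_reply (int rcvid, char data)
{
    last_rcvid = rcvid;
    last_data = data;
}

/* From the read handler when no data is ready: remember the client.
 * Returns 0 if queued, -1 if the queue is full. */
static int park_client (int rcvid)
{
    if (count == MAX_WAITERS) {
        return -1;
    }
    waiters [(head + count) % MAX_WAITERS] = rcvid;
    count++;
    return 0;
}

/* Later, when a character arrives: reply to the oldest waiting client.
 * Returns the rcvid replied to, or -1 if nobody was waiting. */
static int data_arrived (char c)
{
    int rcvid;

    if (count == 0) {
        return -1;      /* a real driver would buffer c instead */
    }
    rcvid = waiters [head];
    head = (head + 1) % MAX_WAITERS;
    count--;
    fake_reply (rcvid, c);
    return rcvid;
}
```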

You could also extend this concept to implementing timeouts within the server, much as we did with the example in the Clocks, Timers, and Getting a Kick Every So Often chapter (in the “Server-maintained timeouts” section). To summarize, after some period of time, the client's request was deemed to have “timed out” and the server replied with some form of failure message to the receive ID it had stored away.
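Extending the previous idea, each parked receive ID can carry a deadline, and a periodic check replies with an error to any client whose deadline has passed. In this hypothetical sketch, times are plain time_t values passed in by the caller, fake_reply_error() stands in for something like MsgError() with ETIMEDOUT, and how you trigger expire_timeouts() (for example, from a timer pulse) is up to you.

```c
#include <time.h>

#define MAX_PENDING 8

struct pending_req {
    int    rcvid;
    time_t deadline;
    int    in_use;
};

static struct pending_req waitlist [MAX_PENDING];
static int timed_out_count;         /* how many error replies were "sent" */

static void fake_reply_error (int rcvid)
{
    (void) rcvid;
    timed_out_count++;
}

/* Park a client with a timeout. Returns 0 on success, -1 if full. */
static int park_with_timeout (int rcvid, time_t now, time_t timeout)
{
    for (int i = 0; i < MAX_PENDING; i++) {
        if (!waitlist [i].in_use) {
            waitlist [i] = (struct pending_req) { rcvid, now + timeout, 1 };
            return 0;
        }
    }
    return -1;
}

/* Fail every request whose deadline has passed; returns how many. */
static int expire_timeouts (time_t now)
{
    int expired = 0;

    for (int i = 0; i < MAX_PENDING; i++) {
        if (waitlist [i].in_use && waitlist [i].deadline <= now) {
            fake_reply_error (waitlist [i].rcvid);
            waitlist [i].in_use = 0;
            expired++;
        }
    }
    return expired;
}
```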

Returning directory entries

In the example for the io_read() function above, we saw how to return data. As mentioned in the description of the io_read() function (in the “Alphabetical listing of connect and I/O functions”), the io_read() function may return directory entries as well. Since this isn't something that everyone will want to do, I discuss it here.

First of all, let's look at why and when you'd want to return directory entries rather than raw data from io_read().

If you discretely manifest entries in the pathname space, and those entries are not marked with the _RESMGR_FLAG_DIR flag, then you won't have to return directory entries in io_read(). If you think about this from a “filesystem” perspective, you're effectively creating “file” types of objects. If, on the other hand, you do specify _RESMGR_FLAG_DIR, then you're creating a “directory” type of object. Nobody other than you knows what the contents of that directory are, so you have to be the one to supply this data. That's exactly why you'd return directory entries from your io_read() handler.

Generally speaking …

Generally speaking, returning directory entries is just like returning raw data, except:

You must return an integral number of struct dirent entries.
You must fill in the struct dirent entries.

The first point means that you cannot return, for example, seven and a half struct dirent entries. If eight of these structures don't fit into the allotted space, then you must return only seven.

The second point is fairly obvious; it's mentioned here only because filling in the struct dirent can be a little tricky compared to the “raw data” approach for a “normal” io_read().
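The “whole entries only” rule can be sketched portably as follows. struct sketch_dirent is a hypothetical mirror of the dirent layout shown later in this section (32-bit fields assumed, with a C99 flexible array member for the name), and pack_entries() is an illustrative helper, not a library call. It copies each entry with memcpy(), which keeps it free of alignment concerns, and it stops as soon as the next entry wouldn't fit, leaving *offset at the first entry not returned so that the next read resumes there.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of the dirent layout; the real one is in <dirent.h>. */
struct sketch_dirent {
    int32_t  d_ino;
    int32_t  d_offset;
    uint16_t d_reclen;
    uint16_t d_namelen;
    char     d_name [];
};

#define SK_ALIGN(x) (((x) + 3) & ~3)

static size_t sk_entry_size (const char *name)
{
    return SK_ALIGN (offsetof (struct sketch_dirent, d_name)
                     + strlen (name) + 1);
}

/* Pack whole entries starting at names[*offset]; returns bytes used. */
static size_t pack_entries (char *buf, size_t bufsize, const char **names,
                            int nnames, int *offset)
{
    size_t used = 0;

    while (*offset < nnames) {
        const char *name = names [*offset];
        size_t      len  = strlen (name);
        size_t      size = sk_entry_size (name);
        struct sketch_dirent hdr;

        if (used + size > bufsize) {
            break;                        /* next entry would be partial */
        }
        hdr.d_ino     = *offset + 1;      /* inode numbers must be non-zero */
        hdr.d_offset  = *offset;
        hdr.d_reclen  = (uint16_t) size;
        hdr.d_namelen = (uint16_t) len;

        memset (buf + used, 0, size);     /* clear alignment filler */
        memcpy (buf + used, &hdr, offsetof (struct sketch_dirent, d_name));
        memcpy (buf + used + offsetof (struct sketch_dirent, d_name),
                name, len + 1);           /* name plus NUL */
        used += size;
        (*offset)++;
    }
    return used;
}
```

With a 40-byte buffer and 16-byte entries, two entries fit; the third is left for the next read instead of being split.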

The struct dirent structure and friends

Let's take a look at the struct dirent structure, since that's the data structure returned by the io_read() function in case of a directory read. We'll also take a quick look at the client calls that deal with directory entries, since there are some interesting relations to the struct dirent structure.

In order for a client to work with directories, the client uses the functions closedir(), opendir(), readdir(), rewinddir(), seekdir(), and telldir().

Notice the similarity to the “normal” file-type functions (and the commonality of the resource manager messages):

Directory Function	File Function	Message (resmgr)
closedir()	close()	_IO_CLOSE
opendir()	open()	_IO_CONNECT
readdir()	read()	_IO_READ
rewinddir()	lseek()	_IO_LSEEK
seekdir()	lseek()	_IO_LSEEK
telldir()	tell()	_IO_LSEEK

If we assume for a moment that the opendir() and closedir() functions will be handled automatically for us, we can focus on just the _IO_READ and _IO_LSEEK messages and related functions.

Offsets

The _IO_LSEEK message and related functions are used to “seek” (or “move”) within a file. They do exactly the same thing within a directory: you can move to the “first” directory entry (by explicitly giving an offset to seekdir() or by calling rewinddir()) or to any arbitrary entry (by using seekdir()), or you can find out the current location in the directory entry list (by using telldir()).

The “trick” with directories, however, is that the seek offsets are entirely up to you to define and manage. This means that you may decide to call your directory entry offsets “0,” “1,” “2” and so on, or you may instead call them “0,” “64,” “128” and so on. The only important thing here is that the offsets must be consistent in both the io_lseek() and io_read() handler functions.

In the example below, we'll assume that we're using the simple “0,” “1,” “2” … approach. (You might use the “0,” “64,” “128” … approach if those numbers correspond to, for example, some kind of on-media offsets. Your choice.)
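On the client side, these offsets are opaque: telldir() hands back whatever value the resource manager chose, and seekdir() sends it back. The following POSIX client sketch (revisit_second_entry() is a hypothetical helper name) reads an entry, saves the position of the next one, reads it, and then seeks back to read it again. For this to work against your own resource manager, io_lseek() and io_read() must agree on what the offsets mean.

```c
#define _XOPEN_SOURCE 700
#include <dirent.h>
#include <string.h>

/* Returns 1 if re-reading the second entry after seekdir() yields the
 * same name, 0 if it doesn't, and -1 on error or a too-short directory. */
static int revisit_second_entry (const char *path)
{
    DIR           *dir;
    struct dirent *de;
    long           pos;
    char           first [256], again [256];
    int            result = -1;

    if ((dir = opendir (path)) == NULL) {
        return -1;
    }
    if (readdir (dir) != NULL) {               /* skip the first entry */
        pos = telldir (dir);                   /* position of the next one */
        if ((de = readdir (dir)) != NULL) {
            strncpy (first, de->d_name, sizeof (first) - 1);
            first [sizeof (first) - 1] = '\0';
            seekdir (dir, pos);                /* go back to it */
            if ((de = readdir (dir)) != NULL) {
                strncpy (again, de->d_name, sizeof (again) - 1);
                again [sizeof (again) - 1] = '\0';
                result = strcmp (first, again) == 0;
            }
        }
    }
    closedir (dir);
    return result;
}
```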

So now all that's left is to “simply” fill in the struct dirent with the “contents” of our directory. Here's what the struct dirent looks like (from <dirent.h>):

struct dirent {
    ino_t      d_ino;
    off_t      d_offset;
    uint16_t   d_reclen;
    uint16_t   d_namelen;
    char       d_name [1];
};

Here's a quick explanation of the various members:

d_ino: The “inode” — a mountpoint-unique serial number that cannot be zero (zero traditionally indicates that the entry corresponding to this inode is free/empty).
d_offset: The offset into the directory we just talked about above. In our example, this will be a simple number like “0,” “1,” “2,” etc. In some filesystems, this is the offset of the next directory entry.
d_reclen: The size of the entire struct dirent structure and any extensions that may be placed within it. The size includes any alignment filler required.
d_namelen: The number of characters in the d_name field, not including the NUL terminator.
d_name: The name of this directory entry, which must be NUL terminated.

When returning the struct dirent entries, the return code passed back to the client is the number of bytes returned.

Example

In this example, we're going to create a resource manager called /dev/atoz that will be a directory resource manager. It's going to manifest the “files” /dev/atoz/a through /dev/atoz/z, with a cat of any of the files returning the uppercase letter corresponding to the filename. Here's a sample command-line session to give you an idea of how this works:

# cd /dev
# ls
atoz    null    ptyp2   socket  ttyp0   ttyp3
enet0   ptyp0   ptyp3   text    ttyp1   zero
mem     ptyp1   shmem   tty     ttyp2
# ls -ld atoz
dr-xr-xr-x  1 root      0                26 Sep 05 07:59 atoz
# cd atoz
# ls
a       e       i       m       q       u       y
b       f       j       n       r       v       z
c       g       k       o       s       w
d       h       l       p       t       x
# ls -l e
-r--r--r--  1 root      0                 1 Sep 05 07:59 e
# cat m
M# cat q
Q#

The example above illustrates that the directory atoz shows up in the /dev directory, and that you can do an ls of the directory itself and cd into it. The /dev/atoz directory has a size of “26,” which is the number that we selected in the code. Once in the atoz directory, doing another ls shows the contents: the files a through z. Doing an ls of a particular file, say e, shows that the file is readable by all (the -r--r--r-- part) and is one byte in size. Finally, running cat on a few of the files shows that they indeed have the stated contents. (Note that since the files contain only one byte, there's no linefeed after the character is printed, which is why the prompt shows up on the same line as the output.)

Now that we've seen the characteristics, let's take a look at the code, which is organized into the following functions:

main() and declarations: Main function; this is where we initialize everything and start the resource manager running.
my_open(): The handler routine for the _IO_CONNECT message.
my_read(): The handler routine for the _IO_READ message.
my_read_dir() and my_read_file(): These two routines perform the actual work of the my_read() function.
dirent_size() and dirent_fill(): Utility functions to deal with struct dirent structure.

Note that while the code is broken up here into several short sections with text, you can find the complete version of atoz.c in the Sample Programs appendix.

main() and declarations

The first section of code presented is the main() function and some of the declarations. There's a convenience macro, ALIGN(), that's used for alignment by the dirent_fill() and dirent_size() functions.

The atoz_attrs array contains the attributes structures used for the “files” in this example. We declare NUM_ENTS array members, because we have NUM_ENTS (26) files “a” through “z.” The attributes structure used for the directory itself (i.e., the /dev/atoz directory) is declared within main() and is called simply attr. Notice the differences in the way the two types of attributes structures are filled:

file attribute structure: Marked as a regular file (the S_IFREG constant) with an access mode of 0444 (meaning everyone has read access, no one has write access). The size is “1” — the file contains only one byte, namely, the uppercase letter corresponding to the filename. The inodes for these individual files are numbered “1” through “26” inclusive (it would have been more convenient to number them “0” through “25,” but “0” is reserved).
directory attribute structure: Marked as a directory file (the S_IFDIR constant) with an access mode of 0555 (meaning that everyone has read and search access, no one has write access). The size is “26”; this is simply a number picked based on the number of entries in the directory. The inode is “27,” a number known not to be in use by any of the other attributes structures.

Notice how we've overridden only the open member of the connect_func structure and the read member of the io_func structure. We've left all the others to use the POSIX defaults.

Finally, notice how we created the name /dev/atoz using resmgr_attach(). Most importantly, we used the flag _RESMGR_FLAG_DIR, which tells the process manager that it can resolve requests at and below this mountpoint.

/*
 *  atoz.c
 *
 *  /dev/atoz using the resource manager library
*/

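#include <stdio.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <dirent.h>
#include <limits.h>
#include <sys/iofunc.h>
#include <sys/dispatch.h>

#define ALIGN(x) (((x) + 3) & ~3)
#define NUM_ENTS            26

static  iofunc_attr_t   atoz_attrs [NUM_ENTS];

// forward declarations; these functions are defined after main()
static int my_open (resmgr_context_t *ctp, io_open_t *msg,
                    iofunc_attr_t *attr, void *extra);
static int my_read (resmgr_context_t *ctp, io_read_t *msg,
                    iofunc_ocb_t *ocb);
static int my_read_dir (resmgr_context_t *ctp, io_read_t *msg,
                        iofunc_ocb_t *ocb);
static int my_read_file (resmgr_context_t *ctp, io_read_t *msg,
                         iofunc_ocb_t *ocb);
int dirent_size (char *fname);
struct dirent *dirent_fill (struct dirent *dp, int inode, int offset,
                            char *fname);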

int
main (int argc, char **argv)
{
    dispatch_t              *dpp;
    resmgr_attr_t           resmgr_attr;
    dispatch_context_t      *ctp;
    resmgr_connect_funcs_t  connect_func;
    resmgr_io_funcs_t       io_func;
    iofunc_attr_t           attr;
    int                     i;

    // create the dispatch structure
    if ((dpp = dispatch_create ()) == NULL) {
        perror ("Unable to dispatch_create");
        exit (EXIT_FAILURE);
    }

    // initialize the various data structures
    memset (&resmgr_attr, 0, sizeof (resmgr_attr));
    resmgr_attr.nparts_max = 1;
    resmgr_attr.msg_max_size = 2048;

    // bind default functions into the outcall tables
    iofunc_func_init (_RESMGR_CONNECT_NFUNCS, &connect_func,
                      _RESMGR_IO_NFUNCS, &io_func);

    // create and initialize the attributes structure
    // for the directory.  Inodes 1-26 are reserved for the 
    // files 'a' through 'z'.  The number of bytes is 26 
    // because that's how many entries there are.
    iofunc_attr_init (&attr, S_IFDIR | 0555, 0, 0);
    attr.inode = NUM_ENTS + 1;
    attr.nbytes = NUM_ENTS;

    // and for the "a" through "z" names
    for (i = 0; i < NUM_ENTS; i++) {
        iofunc_attr_init (&atoz_attrs [i], 
                          S_IFREG | 0444, 0, 0);
        atoz_attrs [i].inode = i + 1;
        atoz_attrs [i].nbytes = 1;
    }

    // add our functions; we're interested only in
    // io_open and io_read
    connect_func.open = my_open;
    io_func.read = my_read;

    // establish a name in the pathname space
    if (resmgr_attach (dpp, &resmgr_attr, "/dev/atoz", 
                       _FTYPE_ANY, _RESMGR_FLAG_DIR, 
                       &connect_func, &io_func, 
                       &attr) == -1) {
        perror ("Unable to resmgr_attach");
        exit (EXIT_FAILURE);
    }

    // allocate a context
    if ((ctp = dispatch_context_alloc (dpp)) == NULL) {
        perror ("Unable to dispatch_context_alloc");
        exit (EXIT_FAILURE);
    }

    // wait here forever, handling messages
    while (1) {
        if ((ctp = dispatch_block (ctp)) == NULL) {
            perror ("Unable to dispatch_block");
            exit (EXIT_FAILURE);
        }
        dispatch_handler (ctp);
    }

    // you'll never get here
    return (EXIT_SUCCESS);
}

my_open()

While my_open() is very short, it has a number of crucial points. Notice how we decide if the resource being opened is a “file” or a “directory” based only on the pathname length. We can do this “trick” because we know that there are no other directories in this resource manager apart from the main one. If you want to have multiple directories below the mountpoint, you have to do more complicated analysis of the path member of the msg structure. For our simple example, if there's nothing in the pathname, we know it's the directory. Also, notice the extremely simplified pathname validation checking: we simply compare to make sure that there's only one character passed to us, and that the character lies within the range “a” through “z” inclusive. Again, for more complex resource managers, you'd be responsible for parsing the name past the registered mountpoint.

Now, the most important feature! Notice how we used the POSIX-layer default functions to do all the work for us! The iofunc_open_default() function is usually installed in the connect functions table at the same spot that our new my_open() function is now occupying. This means that it takes the identical set of arguments! All we have to do is decide which attributes structure we want to have bound with the OCB that the default function is going to create: either the directory one (in which case we pass attr), or one of the 26 different ones for the 26 different files (in which case we pass an appropriate element out of atoz_attrs). This is key, because the handler that you put in the open slot in the connect functions table acts as the gatekeeper to all further accesses to your resource manager.

static int
my_open (resmgr_context_t *ctp, io_open_t *msg,
         iofunc_attr_t *attr, void *extra)
{
    // an empty path means the directory, is that what we have?
    if (msg -> connect.path [0] == 0) {
        return (iofunc_open_default (ctp, msg, attr, extra));

    // else check if it's a single char 'a' -> 'z'
    } else if (msg -> connect.path [1] == 0 && 
               (msg -> connect.path [0] >= 'a' && 
                msg -> connect.path [0] <= 'z')) {

        // yes, that means it's the file (/dev/atoz/[a-z])
        return (iofunc_open_default (ctp, msg, 
                atoz_attrs + msg -> connect.path [0] - 'a', 
                extra));
    } else {
        return (ENOENT);
    }
}

my_read()

In the my_read() function, to decide what kind of processing we needed to do, we looked at the attribute structure's mode member. If the S_ISDIR() macro says that it's a directory, we call my_read_dir(); if the S_ISREG() macro says that it's a file, we call my_read_file(). (For details about these macros, see the entry for stat() in the Neutrino Library Reference.) Note that if we can't tell what it is, we return EBADF; this indicates to the client that something bad happened.

The code here doesn't know anything about our special devices, nor does it care; it simply makes a decision based on standard, well-known data.

static int
my_read (resmgr_context_t *ctp, io_read_t *msg, 
         iofunc_ocb_t *ocb)
{
    int     sts;

    // use the helper function to decide if valid
    if ((sts = iofunc_read_verify (ctp, msg, ocb, 
                                   NULL)) != EOK) {
        return (sts);
    }

    // decide if we should perform the "file" or "dir" read
    if (S_ISDIR (ocb -> attr -> mode)) {
        return (my_read_dir (ctp, msg, ocb));
    } else if (S_ISREG (ocb -> attr -> mode)) {
        return (my_read_file (ctp, msg, ocb));
    } else {
        return (EBADF);
    }
}

my_read_dir()

my_read_dir() is where the fun begins. From a high-level perspective, we allocate a buffer that's going to hold the result of this operation (called reply_msg). We then use dp to “walk” along the output buffer, stuffing struct dirent entries as we go along. The helper routine dirent_size() is used to determine if we have sufficient room in the output buffer to stuff the next entry; the helper routine dirent_fill() is used to perform the stuffing. (Note that these routines are not part of the resource manager library; they're discussed and documented below.)

At first glance this code may look inefficient; we're using sprintf() to create a two-byte filename (the filename character and a NUL terminator) into a buffer that's _POSIX_PATH_MAX (256) bytes long. This was done to keep the code as generic as possible.

Finally, notice that we use the OCB's offset member to indicate to us which particular filename we're generating the struct dirent for at any given time. This means that we also have to update the offset field whenever we return data.

The return of data to the client is accomplished in the “usual” way, via MsgReply(). Note that the status field of MsgReply() is used to indicate the number of bytes that were sent to the client.

static int
my_read_dir (resmgr_context_t *ctp, io_read_t *msg,
             iofunc_ocb_t *ocb)
{
    int     nbytes;
    int     nleft;
    struct  dirent *dp;
    char    *reply_msg;
    char    fname [_POSIX_PATH_MAX];

    // allocate a buffer for the reply
    reply_msg = calloc (1, msg -> i.nbytes);
    if (reply_msg == NULL) {
        return (ENOMEM);
    }

    // assign output buffer
    dp = (struct dirent *) reply_msg;

    // we have "nleft" bytes left
    nleft = msg -> i.nbytes;
    while (ocb -> offset < NUM_ENTS) {

        // create the filename
        sprintf (fname, "%c", (int) (ocb -> offset + 'a'));

        // see how big the result is
        nbytes = dirent_size (fname);

        // do we have room for it?
        if (nleft - nbytes >= 0) {

            // fill the dirent, and advance the dirent pointer
            dp = dirent_fill (dp, ocb -> offset + 1,
                              ocb -> offset, fname);

            // move the OCB offset
            ocb -> offset++;

            // account for the bytes we just used up
            nleft -= nbytes;
        } else {

            // don't have any more room, stop
            break;
        }
    }

    // return info back to the client
    MsgReply (ctp -> rcvid, (char *) dp - reply_msg,
              reply_msg, (char *) dp - reply_msg);

    // release our buffer
    free (reply_msg);

    // tell resource manager library we already did the reply
    return (_RESMGR_NOREPLY);
}

my_read_file()

In my_read_file(), we see much the same code as we saw in the simple read example above. The only strange thing we're doing is we “know” there's only one byte of data being returned, so if nbytes is non-zero then it must be one (and nothing else). So, we can construct the data to be returned to the client by stuffing the character variable string directly. Notice how we used the inode member of the attribute structure as the basis of which data to return. This is a common trick used in resource managers that must deal with multiple resources. Another trick would be to extend the attributes structure (as discussed above in “Extending the attributes structure”) and have either the data stored there directly or a pointer to it.

static int
my_read_file (resmgr_context_t *ctp, io_read_t *msg,
              iofunc_ocb_t *ocb)
{
    int     nbytes;
    int     nleft;
    char    string;

    // we don't do any xtypes here...
    if ((msg -> i.xtype & _IO_XTYPE_MASK) != 
         _IO_XTYPE_NONE) {
        return (ENOSYS);
    }

    // figure out how many bytes are left
    nleft = ocb -> attr -> nbytes - ocb -> offset;

    // and how many we can return to the client
    nbytes = min (nleft, msg -> i.nbytes);

    if (nbytes) {
        // create the output string
        string = ocb -> attr -> inode - 1 + 'A';

        // return it to the client
        MsgReply (ctp -> rcvid, nbytes, 
                  &string + ocb -> offset,
                  nbytes);

        // update flags and offset
        ocb -> attr -> flags |= IOFUNC_ATTR_ATIME
                             | IOFUNC_ATTR_DIRTY_TIME;
        ocb -> offset += nbytes;
    } else {
        // nothing to return, indicate End Of File
        MsgReply (ctp -> rcvid, EOK, NULL, 0);
    }

    // already done the reply ourselves
    return (_RESMGR_NOREPLY);
}

dirent_size()

The helper routine dirent_size() simply calculates the number of bytes required for the struct dirent, given the alignment constraints:

int
dirent_size (char *fname)
{
  return (ALIGN (sizeof (struct dirent) - 4 + strlen (fname) + 1));
}

We subtract four bytes because the dirent structure includes space for the first four characters of the name, and we add one for the null character at the end of the name. We could also calculate the size like this:

ALIGN (offsetof (struct dirent, d_name) + strlen (fname) + 1)

Again, this routine is slight overkill for our simple resource manager, because we know how big each directory entry is going to be — all filenames are exactly one byte in length. However, it's a useful utility routine.

dirent_fill()

Finally, the helper routine dirent_fill() stuffs the values passed to it (namely, the inode, offset, and fname fields) into the directory entry that's also passed to it. As an added bonus, it returns a pointer to where the next directory entry should begin, taking alignment into account.

struct dirent *
dirent_fill (struct dirent *dp, int inode, int offset, 
             char *fname)
{
    dp -> d_ino = inode;
    dp -> d_offset = offset;
    strcpy (dp -> d_name, fname);
    dp -> d_namelen = strlen (dp -> d_name);
    dp -> d_reclen = ALIGN (sizeof (struct dirent) - 4
                   + dp -> d_namelen + 1);
    return ((struct dirent *) ((char *) dp + 
            dp -> d_reclen));
}

Summary

Writing a resource manager is by far the most complicated task that we've discussed in this book.

A resource manager is a server that receives certain, well-defined messages. These messages fall into two broad categories:

Connect messages: Related to pathname-based operations, these may establish a context for further work.
I/O messages: Always arrive after a connect message and indicate the actual work that the client wishes to have done (e.g., stat()).

The operations of the resource manager are controlled by the thread pool functions (discussed in the Processes and Threads chapter) and the dispatch interface functions.

QSS provides a set of POSIX helper functions (the iofunc_*() functions) in the resource manager library that do much of the work of handling the client's incoming connect and I/O messages.

Keep in mind the following data structures, which relate to the clients and to the devices manifested by the resource manager:

OCB: Allocated on a per-open basis, this contains the context for the client (e.g., current lseek() position).
Attributes structure: Allocated on a per-device basis, this contains information about the device (e.g., size of the device, permissions, etc.).
Mount structure: Allocated on a per-resource-manager basis, this contains information about the characteristics of the entire resource manager.

Clients communicate with the resource manager via message passing, after resolving the pathname (via open() and other calls) into a node descriptor, process ID, channel ID, and handle.

Finally, you supply the functionality that you want your resource manager to provide by overriding some of the callouts in the connect and I/O functions tables.