Combine Messages

This chapter contains the following topics:

Where combine messages are used
The library's combine-message handling

Where combine messages are used

Combine messages exist to conserve network bandwidth and to support atomic operations. A combine message is constructed by the client's C library and consists of a number of I/O and/or connect messages packaged together into one. Let's see how they're used.

Atomic operations

Consider a case where two threads are executing the following code, trying to read from the same file descriptor:

void
a_thread (void)
{
    char buf [BUFSIZ];

    /* fd is shared by both threads, so its file offset is shared too */
    lseek (fd, position, SEEK_SET);
    read (fd, buf, BUFSIZ);
    …
}

The first thread performs the lseek() and then gets preempted by the second thread, which seeks and reads on the same file descriptor. When the first thread resumes executing, the shared file offset will be just past the data that the second thread read, not at the position that the first thread had lseek()'d to.

This can be solved in one of three ways:

The two threads can use a mutex to ensure that only one thread at a time is using the file descriptor.
Each thread can open the file itself, thus generating a unique file descriptor that won't be affected by any other threads.
The threads can use the readblock() function, which performs an atomic lseek() and read().

Let's look at these three methods.

Using a mutex

In the first approach, if the two threads use a mutex between themselves, the following issue arises: every read(), lseek(), and write() operation must use the mutex.

If this practice isn't enforced, then you still have the exact same problem. For example, suppose one thread that's obeying the convention locks the mutex and does the lseek(), thinking that it's protected. However, another thread (that's not obeying the convention) can preempt it and move the offset to somewhere else. When the first thread resumes, we again encounter the problem where the offset is at a different (unexpected) location. Generally, using a mutex will be successful only in very tightly managed projects, where a code review will ensure that each and every thread's file functions obey the convention.

Per-thread files

The second approach, using different file descriptors, is a good general-purpose solution, unless you explicitly want the file descriptor to be shared.
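A minimal sketch of the per-thread approach follows, assuming the file can be opened by pathname. The helper name private_read_at() is hypothetical. For brevity it opens and closes the file on every call; a real thread would more likely open its own descriptor once and keep it.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Open a private descriptor, seek, read, and close. The descriptor, and
 * therefore its file offset, belongs to this call alone, so no other
 * thread can move the offset between lseek() and read().
 * Returns the byte count from read(), or -1 on error.
 */
ssize_t
private_read_at (const char *path, off_t position, void *buf, size_t nbytes)
{
    ssize_t n = -1;
    int fd = open (path, O_RDONLY);

    if (fd == -1) {
        return -1;
    }
    if (lseek (fd, position, SEEK_SET) != (off_t) -1) {
        n = read (fd, buf, nbytes);
    }
    close (fd);

    return n;
}
```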

The readblock() function

In order for the readblock() function to be able to effect an atomic seek/read operation, it must ensure that the requests it sends to the resource manager will all be processed together, as a single unit. This is done by combining the _IO_LSEEK and _IO_READ messages into one message. Thus, when the base layer performs the MsgReceive(), it will receive the entire readblock() request in one atomic message.

Bandwidth considerations

Another place where combine messages are useful is in the stat() function, which can be implemented by calling open(), fstat(), and close() in sequence.

Rather than generate three separate messages (one for each of the functions), the C library combines them into one contiguous message. This boosts performance, especially over a networked connection, and also simplifies the resource manager, because it's not forced to have a connect function to handle stat().

The library's combine-message handling

The resource manager library handles combine messages by presenting each component of the message to the appropriate handler routines. For example, if we get a combine message that has an _IO_LSEEK and _IO_READ in it (e.g. readblock()), the library will call our io_lseek and io_read handlers for us in turn.

But let's see what happens in the resource manager when it's handling these messages. With multiple threads, both of the client's threads may very well have sent in their “atomic” combine messages. Two threads in the resource manager will now attempt to service those two messages. We again run into the same synchronization problem as we originally had on the client end — one thread can be partway through processing the message and can then be preempted by the other thread.

The solution? The resource manager library provides callouts to lock the OCB while processing any message (except _IO_CLOSE and _IO_UNBLOCK; we'll return to these). As an example, when processing the readblock() combine message, the resource manager library performs callouts in this order:

lock_ocb handler
_IO_LSEEK message handler
_IO_READ message handler
unlock_ocb handler

Therefore, in our scenario, the two threads within the resource manager would be mutually exclusive by virtue of the lock: the first thread to acquire the lock would completely process the combine message and release the lock, and then the second thread would perform its processing.

Let's examine several of the issues that are associated with handling combine messages:

Component responses
Component data access
Locking and unlocking the attribute structure
Connect message types
_IO_CONNECT_COMBINE_CLOSE
_IO_CONNECT_COMBINE

Component responses

As we've seen, a combine message really consists of a number of “regular” resource manager messages combined into one large contiguous message. The resource manager library handles each component in the combine message separately by extracting the individual components and then calling out to the handlers you've specified in the connect and I/O function tables, as appropriate, for each component.

This generally doesn't present any new wrinkles for the message handlers themselves, except in one case. Consider the readblock() combine message:

Client call: readblock()
Message(s): _IO_LSEEK, _IO_READ
Callouts: io_lock_ocb, io_lseek, io_read, io_unlock_ocb

Ordinarily, after processing the _IO_LSEEK message, your handler would return the current position within the file. However, the next message (the _IO_READ) also returns data. By convention, only the last data-returning message within a combine message will actually return data. The intermediate messages are allowed to return only a pass/fail indication.

The impact of this is that the _IO_LSEEK message handler has to be aware of whether or not it's being invoked as part of combine message handling. If it is, it should only return either an EOK (indicating that the lseek() operation succeeded) or an error indication to indicate some form of failure.

But if the _IO_LSEEK handler isn't being invoked as part of combine message handling, it should return the EOK and the new offset (or, in case of error, an error indication only).

Here's a sample of the code for the default iofunc-layer lseek() handler:

int
iofunc_lseek_default (resmgr_context_t *ctp,
                      io_lseek_t *msg,
                      iofunc_ocb_t *ocb)
{
    /* 
     *  performs the lseek processing here
     *  may "early-out" on error conditions
     */
     . . .

    /* decision re: combine messages done here */
    if (msg -> i.combine_len & _IO_COMBINE_FLAG) {
        return (EOK);
    }

    msg -> o = offset;
    return (_RESMGR_PTR (ctp, &msg -> o, sizeof (msg -> o)));
}

The relevant decision is made in this statement:

if (msg -> i.combine_len & _IO_COMBINE_FLAG)

If the _IO_COMBINE_FLAG bit is set in the combine_len member, this indicates that the message is being processed as part of a combine message.

When the resource manager library is processing the individual components of the combine message, it looks at the error return from the individual message handlers. If a handler returns anything other than EOK, then processing of further combine message components is aborted. The error that was returned from the failing component's handler is returned to the client.
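The callout order and the abort-on-error rule described above can be modeled in portable C. This is a sketch of the behavior only, not the library's source: every model_* name is hypothetical, the handlers are simplified stand-ins, and the sketch assumes that the unlock callout still runs after a component fails. Each callout appends one letter to a trace so that the order can be inspected.

```c
#include <errno.h>
#include <stddef.h>

#ifndef EOK
#define EOK 0   /* EOK is a QNX name; defined here so the model builds elsewhere */
#endif

/* Hypothetical stand-in for an OCB; not the real iofunc_ocb_t. */
typedef struct model_ocb {
    long   offset;        /* current file offset */
    char   trace [16];    /* callout order, one letter per callout */
    size_t ntrace;
} model_ocb_t;

typedef int (*model_handler_t) (model_ocb_t *ocb, long arg);

typedef struct model_component {
    model_handler_t handler;
    long            arg;
} model_component_t;

static void
model_trace (model_ocb_t *ocb, char c)
{
    if (ocb->ntrace + 1 < sizeof (ocb->trace)) {
        ocb->trace [ocb->ntrace++] = c;
        ocb->trace [ocb->ntrace] = '\0';
    }
}

/* Model _IO_LSEEK handler: reject negative offsets, otherwise move. */
int
model_io_lseek (model_ocb_t *ocb, long arg)
{
    model_trace (ocb, 's');
    if (arg < 0) {
        return EINVAL;
    }
    ocb->offset = arg;
    return EOK;
}

/* Model _IO_READ handler: pretend to read arg bytes and advance. */
int
model_io_read (model_ocb_t *ocb, long arg)
{
    model_trace (ocb, 'r');
    ocb->offset += arg;
    return EOK;
}

/*
 * Model dispatcher: lock, run each component in order, stop at the
 * first non-EOK status, unlock, and return that status for the client.
 */
int
model_dispatch_combine (model_ocb_t *ocb,
                        const model_component_t *parts, size_t nparts)
{
    int    status = EOK;
    size_t i;

    model_trace (ocb, 'L');             /* lock_ocb callout */
    for (i = 0; i < nparts; i++) {
        status = parts [i].handler (ocb, parts [i].arg);
        if (status != EOK) {
            break;                      /* skip the remaining components */
        }
    }
    model_trace (ocb, 'U');             /* unlock_ocb callout */

    return status;
}
```

With a seek followed by a read, the trace for a successful message is "LsrU". If the seek fails, the trace is "LsU" and the read handler never runs.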

Component data access

The second issue associated with handling combine messages is how to access the data area for subsequent message components.

For example, the writeblock() combine message format has an lseek() message first, followed by the write() message. This means that the data associated with the write() request is further in the received message buffer than would be the case for just a simple _IO_WRITE message:

Client call: writeblock()
Message(s): _IO_LSEEK, _IO_WRITE, data
Callouts: io_lock_ocb, io_lseek, io_write, io_unlock_ocb

This issue is easy to work around. There's a resource manager library function called resmgr_msgread() that knows how to get the data corresponding to the correct message component. Therefore, in the io_write handler, if you used resmgr_msgread() instead of MsgRead(), this would be transparent to you.

Resource managers should always use the resmgr_msg*() cover functions rather than calling MsgRead() and MsgWrite() directly.

For reference, here's the source for resmgr_msgread():

int resmgr_msgread( resmgr_context_t *ctp,
                    void *msg,
                    int nbytes,
                    int offset)
{
    return MsgRead(ctp->rcvid, msg, nbytes, ctp->offset + offset);
}

As you can see, resmgr_msgread() simply calls MsgRead() with the offset of the component message from the beginning of the combine message buffer. For completeness, there's also a resmgr_msgwrite() that works in an identical manner to MsgWrite(), except that it dereferences the passed ctp to obtain the rcvid.
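To make the offset arithmetic concrete, here is a portable model of a writeblock()-style buffer and of a cover function that adds the component's offset for you. The header layouts and all model_* names are hypothetical and much simpler than the real message structures. The point is only that the handler passes an offset relative to its own component, and the context supplies where that component starts.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical headers; the real _IO_LSEEK and _IO_WRITE layouts differ. */
typedef struct { unsigned short type; long offset; }     model_lseek_hdr_t;
typedef struct { unsigned short type; unsigned nbytes; } model_write_hdr_t;

/*
 * Hypothetical context: the whole received message, plus the offset of
 * the component currently being handled (the role of ctp->offset).
 */
typedef struct {
    const unsigned char *msg;
    size_t               size;
    size_t               offset;
} model_ctx_t;

/*
 * Model of the cover function. The caller's offset is relative to the
 * current component. Returns the number of bytes copied, or -1 if the
 * resulting position is outside the message.
 */
long
model_msgread (const model_ctx_t *ctx, void *dst, size_t nbytes, size_t offset)
{
    size_t start = ctx->offset + offset;

    if (start > ctx->size) {
        return -1;
    }
    if (nbytes > ctx->size - start) {
        nbytes = ctx->size - start;     /* clamp to what was received */
    }
    memcpy (dst, ctx->msg + start, nbytes);
    return (long) nbytes;
}

/*
 * Pack a writeblock()-style message: lseek header, write header, data.
 * Returns the total length, or 0 if out is too small.
 */
size_t
model_pack_writeblock (unsigned char *out, size_t outsize, long position,
                       const void *data, unsigned nbytes)
{
    model_lseek_hdr_t ls = { 1, position };
    model_write_hdr_t wr = { 2, nbytes };
    size_t total = sizeof (ls) + sizeof (wr) + nbytes;

    if (total > outsize) {
        return 0;
    }
    memcpy (out, &ls, sizeof (ls));
    memcpy (out + sizeof (ls), &wr, sizeof (wr));
    memcpy (out + sizeof (ls) + sizeof (wr), data, nbytes);
    return total;
}
```

In this model, the write handler asks for the bytes at offset sizeof (model_write_hdr_t), exactly as it would for a standalone write message. The context's offset, set to sizeof (model_lseek_hdr_t) while the write component is being handled, accounts for the lseek header that precedes it.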

Locking and unlocking the attribute structure

As mentioned above, another facet of the operation of the readblock() function from the client's perspective is that it's atomic. In order to process the requests for a particular OCB in an atomic manner, we must lock and unlock the attribute structure pointed to by the OCB, thus ensuring that only one resource manager thread has access to the OCB at a time.

The resource manager library provides two callouts for doing this:

lock_ocb
unlock_ocb

These are members of the I/O functions structure. The handlers that you provide for those callouts should lock and unlock the attribute structure pointed to by the OCB by calling iofunc_attr_lock() and iofunc_attr_unlock(). Because these handlers lock the attribute structure, the lock_ocb callout may block for a period of time while another thread holds the lock. This is normal and expected behavior. Note also that the attribute structure is automatically locked for you when your I/O function is called.

Connect message types

Let's take a look at the general case for the io_open handler — it doesn't always correspond to the client's open() call!

For example, consider the stat() and access() client function calls.

_IO_CONNECT_COMBINE_CLOSE

For a stat() client call, we essentially perform the sequence open()/fstat()/close(). Note that if we actually did that, three messages would be required. For performance reasons, we implement the stat() function as one single combine message:

Client call: stat()
Message(s): _IO_CONNECT_COMBINE_CLOSE, _IO_STAT
Callouts: io_open, io_lock_ocb, io_stat, io_unlock_ocb, io_close

The _IO_CONNECT_COMBINE_CLOSE message causes the io_open handler to be called. It then implicitly (at the end of processing for the combine message) causes the io_close_ocb handler to be called.

_IO_CONNECT_COMBINE

For the access() function, the client's C library will open a connection to the resource manager and perform a stat() call. Then, based on the results of the stat() call, the client's C library access() may perform an optional devctl() to get more information. In any event, because access() opened the device, it must also call close() to close it:

Client call: access()
Message(s): _IO_CONNECT_COMBINE, _IO_STAT
Callouts: io_open, io_lock_ocb, io_stat, io_unlock_ocb
Message(s): _IO_DEVCTL (optional)
Callouts: io_lock_ocb (optional), io_devctl (optional), io_unlock_ocb (optional)
Message(s): _IO_CLOSE
Callouts: io_close

Notice how the access() function opened the pathname/device: it sent an _IO_CONNECT_COMBINE message along with the _IO_STAT message. This creates an OCB (when the io_open handler is called), locks the associated attribute structure (via io_lock_ocb), performs the stat (io_stat), and then unlocks the attribute structure (io_unlock_ocb). Note that we don't implicitly close the OCB; this is left for a later, explicit message. Contrast this handling with that of the plain stat() above.
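The difference between the two connect message types can be modeled as a single flag that decides whether the OCB is closed at the end of the combine message. The sketch below is a behavioral model under that assumption, not library code. The conn_* names are hypothetical, and the callout names recorded in the trace follow the callout lists above.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for an OCB; trace must start zero-filled. */
typedef struct conn_ocb {
    int  is_open;
    char trace [128];   /* callout names in order, each followed by a space */
} conn_ocb_t;

static void
conn_call (conn_ocb_t *ocb, const char *name)
{
    size_t used = strlen (ocb->trace);

    snprintf (ocb->trace + used, sizeof (ocb->trace) - used, "%s ", name);
}

/*
 * Model a connect message combined with _IO_STAT. close_after is nonzero
 * for the _IO_CONNECT_COMBINE_CLOSE style and zero for the
 * _IO_CONNECT_COMBINE style, which leaves the OCB open.
 */
void
conn_connect_stat (conn_ocb_t *ocb, int close_after)
{
    conn_call (ocb, "io_open");
    ocb->is_open = 1;
    conn_call (ocb, "io_lock_ocb");
    conn_call (ocb, "io_stat");
    conn_call (ocb, "io_unlock_ocb");
    if (close_after) {
        conn_call (ocb, "io_close");    /* implicit close */
        ocb->is_open = 0;
    }
}

/* Model the later, explicit _IO_CLOSE message. */
void
conn_explicit_close (conn_ocb_t *ocb)
{
    conn_call (ocb, "io_close");
    ocb->is_open = 0;
}
```

In this model, a stat()-style request is conn_connect_stat (&ocb, 1) and ends with the OCB closed. An access()-style request is conn_connect_stat (&ocb, 0), which leaves the OCB open for an optional further message, followed by conn_explicit_close (&ocb).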