Returning directory entries from _IO_READ

When the _IO_READ handler is called, it may need to return data for either a file (if S_ISDIR (ocb->attr->mode) is false) or a directory (if S_ISDIR (ocb->attr->mode) is true). We've seen the algorithm for returning data, especially the method for matching the returned data's size to the smaller of the data available or the client's buffer size.

A similar constraint applies when returning directory data to a client, with the added requirement that the data be block-integral. Instead of returning a stream of bytes that we can package arbitrarily, we're returning a number of struct dirent structures. (In other words, we can't return 1.5 of those structures; we always have to return an integral number.) The dirent structures must be aligned on 8-byte boundaries in the reply.

A struct dirent looks like this:

struct dirent {
#if _FILE_OFFSET_BITS - 0 == 64
    ino_t           d_ino;          /* File serial number. */
    off_t           d_offset;
#elif !defined(_FILE_OFFSET_BITS) || _FILE_OFFSET_BITS == 32
#if defined(__LITTLEENDIAN__)
    ino_t           d_ino;          /* File serial number. */
    ino_t           d_ino_hi;
    off_t           d_offset;
    off_t           d_offset_hi;
#elif defined(__BIGENDIAN__)
    ino_t           d_ino_hi;
    ino_t           d_ino;          /* File serial number. */
    off_t           d_offset_hi;
    off_t           d_offset;
#else
 #error endian not configured for system
#endif
#else
 #error _FILE_OFFSET_BITS value is unsupported
#endif
    int16_t             d_reclen;
    int16_t             d_namelen;
    char                d_name[1];
};

The d_ino member contains a mountpoint-unique file serial number. This serial number is often used in various disk-checking utilities for such operations as determining infinite-loop directory links. (Note that the inode value cannot be zero, which would indicate that the inode represents an unused entry.)

In some filesystems, the d_offset member is used to identify the directory entry itself; in others, it's the offset of the next directory entry. For a disk-based filesystem, this value might be the actual offset into the on-disk directory structure.

The d_reclen member contains the size of this directory entry and any other associated information (such as an optional struct stat structure appended to the struct dirent entry; see below).

The d_namelen member gives the length of the name stored in the d_name member, which holds the actual name of that directory entry. (Since the length is calculated using strlen(), the \0 string terminator, which must be present, is not counted.)
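As a minimal sketch of how d_reclen and d_namelen relate, the following computes the record size for a given name. It assumes a stand-in structure, demo_dirent, that mirrors the 64-bit layout shown above (on a real target you would use the system's struct dirent), and it takes the required alignment as a parameter rather than hard-coding it. The names demo_dirent and demo_dirent_size are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the 64-bit struct dirent layout shown above. */
struct demo_dirent {
    uint64_t d_ino;
    int64_t  d_offset;
    int16_t  d_reclen;
    int16_t  d_namelen;
    char     d_name[1];
};

/*
 * Size of one directory record: the fixed header up to d_name, the name
 * itself, and its '\0' terminator, rounded up to a multiple of `align`.
 * `align` must be a nonzero power of two.
 */
static size_t demo_dirent_size(const char *name, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);

    size_t raw = offsetof(struct demo_dirent, d_name) + strlen(name) + 1;
    return (raw + align - 1) & ~(align - 1);
}
```

The value returned here is what would go into d_reclen, while d_namelen would be just strlen(name). If extra data follows the name in the record, its size would have to be added before rounding.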

Note: The dirent structure includes space only for the first four bytes of the name; your _IO_READ handler needs to return the name and the struct dirent as a bigger structure:

struct {
    struct dirent ent;
    char namebuf[NAME_MAX + 1 + offsetof(struct dirent, d_name) -
                 sizeof( struct dirent)];
} entry;

or as a union:

union {
    struct dirent ent;
    char filler[ offsetof( struct dirent, d_name ) + NAME_MAX + 1];
} entry;
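To check that both forms really leave room for a maximum-length name, a compile-time assertion can help. The sketch below again uses a hypothetical stand-in, demo_dirent, with the 64-bit layout shown earlier, and falls back to 255 for NAME_MAX if the platform's <limits.h> doesn't define it (an assumption made only so the example builds anywhere).

```c
#include <assert.h>   /* static_assert (C11) */
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#ifndef NAME_MAX
#define NAME_MAX 255  /* assumed fallback for this sketch only */
#endif

/* Hypothetical stand-in for the 64-bit struct dirent layout. */
struct demo_dirent {
    uint64_t d_ino;
    int64_t  d_offset;
    int16_t  d_reclen;
    int16_t  d_namelen;
    char     d_name[1];
};

/* Structure form: the name spills from d_name into namebuf. */
struct demo_entry_s {
    struct demo_dirent ent;
    char namebuf[NAME_MAX + 1 + offsetof(struct demo_dirent, d_name) -
                 sizeof(struct demo_dirent)];
};

/* Union form: filler reserves the header plus a full-length name. */
union demo_entry_u {
    struct demo_dirent ent;
    char filler[offsetof(struct demo_dirent, d_name) + NAME_MAX + 1];
};

/* Bytes available from the start of d_name to the end of each object. */
static_assert(sizeof(struct demo_entry_s) -
              offsetof(struct demo_dirent, d_name) >= (size_t)NAME_MAX + 1,
              "struct form is too small for a maximum-length name");
static_assert(sizeof(union demo_entry_u) -
              offsetof(struct demo_dirent, d_name) >= (size_t)NAME_MAX + 1,
              "union form is too small for a maximum-length name");
```

Either form works. The union states the intent (header plus name) more directly, while the structure form makes the extra name storage an explicit member.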

So in our io_read handler, we need to generate a number of struct dirent entries and return them to the client. If we have a cache of directory entries that we maintain in our resource manager, it's a simple matter to construct a set of IOVs to point to those entries. If we don't have a cache, then we must manually assemble the directory entries into a buffer and then return an IOV that points to that.
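For the no-cache case, the assembly step might look like the sketch below. It isn't a complete io_read handler: it only packs an integral number of records into a buffer and reports how many bytes to reply with. The structure demo_dirent is a hypothetical stand-in for the 64-bit struct dirent layout, the function names are made up for illustration, the alignment is passed in as a parameter, and using the entry index for d_ino and d_offset is just one possible choice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the 64-bit struct dirent layout. */
struct demo_dirent {
    uint64_t d_ino;
    int64_t  d_offset;
    int16_t  d_reclen;
    int16_t  d_namelen;
    char     d_name[1];
};

/* Header + name + '\0', rounded up to `align` (a power of two). */
static size_t demo_dirent_size(const char *name, size_t align)
{
    size_t raw = offsetof(struct demo_dirent, d_name) + strlen(name) + 1;
    return (raw + align - 1) & ~(align - 1);
}

/*
 * Pack entries names[start .. count-1] into buf, stopping before the
 * first entry that would not fit in its entirety. Returns the number of
 * entries written; *used receives the number of bytes filled in.
 */
static size_t demo_fill_dirents(char *buf, size_t bufsize,
                                const char *const *names, size_t count,
                                size_t start, size_t align, size_t *used)
{
    const size_t hdrlen = offsetof(struct demo_dirent, d_name);
    size_t off = 0, n = 0;

    assert(align != 0 && (align & (align - 1)) == 0);

    for (size_t i = start; i < count; i++) {
        size_t namelen = strlen(names[i]);
        size_t reclen = demo_dirent_size(names[i], align);

        if (reclen > bufsize - off)
            break;                      /* never return a partial entry */

        struct demo_dirent hdr;
        memset(&hdr, 0, sizeof hdr);
        hdr.d_ino = i + 1;              /* must be nonzero */
        hdr.d_offset = (int64_t)(i + 1); /* here: index of the next entry */
        hdr.d_reclen = (int16_t)reclen;
        hdr.d_namelen = (int16_t)namelen;

        memset(buf + off, 0, reclen);   /* zero the padding bytes too */
        memcpy(buf + off, &hdr, hdrlen);
        memcpy(buf + off + hdrlen, names[i], namelen + 1);

        off += reclen;
        n++;
    }

    *used = off;
    return n;
}
```

A handler built on this would point a single IOV at buf with length *used, and advance its own notion of the directory position by the number of entries returned so that the client's next read continues where this one stopped.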

Figure: Returning directory entries from _IO_READ.