The io_read() function and related utilities

The .tar filesystem's io_read() is the standard one that we saw in the RAM disk: it decides whether the request is for a file or a directory, and calls the appropriate function.
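
Roughly, that dispatcher looks like the following. This is only a sketch: the dispatcher's name and the iofunc_read_verify() call are assumptions on my part; the two handlers are the ones discussed below, and the attributes layout matches the listing that follows.

int
tarfs_io_read (resmgr_context_t *ctp, io_read_t *msg, iofunc_ocb_t *ocb)
{
  int   sts;

  // standard verification (open for reading, etc.)
  if ((sts = iofunc_read_verify (ctp, msg, ocb, NULL)) != EOK) {
    return (sts);
  }

  // hand off to the directory or file version, as appropriate
  if (S_ISDIR (ocb -> attr -> attr.mode)) {
    return (tarfs_io_read_dir (ctp, msg, ocb));
  } else if (S_ISREG (ocb -> attr -> attr.mode)) {
    return (tarfs_io_read_file (ctp, msg, ocb));
  }

  // anything else isn't readable
  return (EBADF);
}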

The .tar filesystem's tarfs_io_read_dir() is identical to the RAM-disk version; after all, the directory entry structures in the extended attributes structure are the same.

The only function that's different is tarfs_io_read_file(), which reads the data from the .tar file on disk.

int
tarfs_io_read_file (resmgr_context_t *ctp, io_read_t *msg,
iofunc_ocb_t *ocb)
{
  int     nbytes;
  int     nleft;
  iov_t   *iovs;
  int     niovs;
  int     i;
  int     pool_flag;
  gzFile  fd;

  // we don't do any xtypes here...
  if ((msg -> i.xtype & _IO_XTYPE_MASK) != _IO_XTYPE_NONE) {
    return (ENOSYS);
  }

  // figure out how many bytes are left
  nleft = ocb -> attr -> attr.nbytes - ocb -> offset;

  // and how many we can return to the client
  nbytes = min (nleft, msg -> i.nbytes);

  if (nbytes) {

    // 1) open the on-disk .tar file
    if ((fd = gzopen (ocb -> attr -> type.vfile.name, "r")) == NULL) {
      return (errno);
    }

    // 2) calculate number of IOVs required for transfer
    niovs = (nbytes + BLOCKSIZE - 1) / BLOCKSIZE;
    if (niovs <= 8) {
      iovs = mpool_malloc (mpool_iov8);
      pool_flag = 1;
    } else {
      iovs = malloc (sizeof (iov_t) * niovs);
      pool_flag = 0;
    }
    if (iovs == NULL) {
      gzclose (fd);
      return (ENOMEM);
    }

    // 3) allocate blocks for the transfer
    for (i = 0; i < niovs; i++) {
      SETIOV (&iovs [i], cfs_block_alloc (ocb -> attr), BLOCKSIZE);
      if (iovs [i].iov_base == NULL) {
        // out of blocks; free everything allocated so far and give up
        for (--i; i >= 0; i--) {
          cfs_block_free (ocb -> attr, iovs [i].iov_base);
        }
        if (pool_flag) {
          mpool_free (mpool_iov8, iovs);
        } else {
          free (iovs);
        }
        gzclose (fd);
        return (ENOMEM);
      }
    }

    // 4) trim last block to correctly read last entry in a .tar file
    if (nbytes % BLOCKSIZE) {
      iovs [niovs - 1].iov_len = nbytes % BLOCKSIZE;
    }

    // 5) get the data
    gzseek (fd, ocb -> attr -> type.vfile.off + ocb -> offset, SEEK_SET);
    for (i = 0; i < niovs; i++) {
      gzread (fd, iovs [i].iov_base, iovs [i].iov_len);
    }
    gzclose (fd);

    // return it to the client
    MsgReplyv (ctp -> rcvid, nbytes, iovs, i);

    // update flags and offset
    ocb -> attr -> attr.flags |= IOFUNC_ATTR_ATIME
                              | IOFUNC_ATTR_DIRTY_TIME;
    ocb -> offset += nbytes;
    for (i = 0; i < niovs; i++) {
      cfs_block_free (ocb -> attr, iovs [i].iov_base);
    }
    if (pool_flag) {
      mpool_free (mpool_iov8, iovs);
    } else {
      free (iovs);
    }
  } else {
    // nothing to return, indicate End Of File
    MsgReply (ctp -> rcvid, EOK, NULL, 0);
  }

  // already done the reply ourselves
  return (_RESMGR_NOREPLY);
}

Many of the steps here are common to the RAM-disk version, so only steps 1 through 5 are documented here:

  1. Notice that we keep the on-disk .tar file closed, and open it only as required. This is an area for improvement: you might find it slightly faster to keep a small cache of open .tar files, perhaps rotated on an LRU basis (a sketch of such a cache follows this list). We keep the file closed so that we don't run out of file descriptors; after all, you can mount hundreds (up to 1000, a QNX Neutrino limit) of .tar files with this resource manager (see the note below).
  2. We're still dealing with blocks, just as we did in the RAM-disk filesystem, because we need a place to transfer the data from the disk file. We calculate the number of IOVs we're going to need for this transfer, and then allocate the iovs array.
  3. Next, we call cfs_block_alloc() to get blocks from the block allocator and bind them to the iovs array. If an allocation fails, we free everything we've allocated so far and fail ungracefully. A better failure mode would be to shrink the client's request to what we can handle and return that. In practice, though, the typical request size is 32 KB (8 blocks), and if we don't have 32 KB lying around, we probably have bigger troubles ahead.
  4. The last block is probably not going to be a full 4096 bytes, so we need to trim it. For example, a 10000-byte transfer needs three blocks, and the last IOV gets trimmed to 10000 % 4096 = 1808 bytes. Nothing bad would happen if we were to gzread() the extra data into the end of the block, because the client's transfer size is limited to the size of the resource stored in the attributes structure; I'm just being extra paranoid.
  5. And in this step, I'm being completely careless: we simply gzread() the data into the blocks with no error checking whatsoever! :-) (A checked version of the transfer is sketched after this list.)
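
As a rough illustration of the caching idea in step 1, here's what a small LRU cache of open .tar files might look like. This is a sketch only; the tarfs_gzopen_cached() helper and the TARFS_FD_CACHE size are invented for this example (and it assumes <limits.h>, <string.h>, and <zlib.h> are included):

#define TARFS_FD_CACHE  8       // hypothetical: number of cached handles

static struct {
  char      name [PATH_MAX];    // on-disk .tar file name
  gzFile    fd;                 // open handle, or NULL if the slot is free
  unsigned  stamp;              // last-use counter, for LRU eviction
} fdcache [TARFS_FD_CACHE];

static unsigned fdcache_clock;  // monotonically increasing use counter

// Return an open gzFile for "name", reusing a cached handle if we have one.
static gzFile
tarfs_gzopen_cached (const char *name)
{
  int   i;
  int   victim = 0;

  for (i = 0; i < TARFS_FD_CACHE; i++) {
    if (fdcache [i].fd != NULL && strcmp (fdcache [i].name, name) == 0) {
      fdcache [i].stamp = ++fdcache_clock;      // hit; refresh its age
      return (fdcache [i].fd);
    }
    if (fdcache [i].stamp < fdcache [victim].stamp) {
      victim = i;                               // remember the oldest slot
    }
  }

  // miss; evict the least recently used slot and open the file
  if (fdcache [victim].fd != NULL) {
    gzclose (fdcache [victim].fd);
    fdcache [victim].fd = NULL;
  }
  if ((fdcache [victim].fd = gzopen (name, "r")) == NULL) {
    return (NULL);
  }
  strncpy (fdcache [victim].name, name, PATH_MAX - 1);
  fdcache [victim].name [PATH_MAX - 1] = '\0';
  fdcache [victim].stamp = ++fdcache_clock;
  return (fdcache [victim].fd);
}

With something like this in place, step 1 would call tarfs_gzopen_cached() instead of gzopen(), the per-read gzclose() would go away, and the gzseek() in step 5 would still position the (cached) handle before each transfer. A multithreaded resource manager would also need a mutex around the cache.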

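Similarly, for step 5, checking the gzread() return values costs only a few lines. One way to package the checked transfer is a small helper along these lines; tarfs_read_iovs() is a name invented for this sketch, and the caller (step 5 above) would free its blocks and IOVs and fail the read with EIO if it gets anything other than EOK back:

// Seek to "off" in the gzipped .tar and fill the IOV list, verifying that
// each gzread() returns exactly as much as was asked for.
static int
tarfs_read_iovs (gzFile fd, off_t off, iov_t *iovs, int niovs)
{
  int   i;
  int   nread;

  if (gzseek (fd, off, SEEK_SET) == -1) {
    return (EIO);               // seek failed
  }
  for (i = 0; i < niovs; i++) {
    nread = gzread (fd, iovs [i].iov_base, iovs [i].iov_len);
    if (nread < 0 || (unsigned) nread != iovs [i].iov_len) {
      return (EIO);             // zlib error or short read
    }
  }
  return (EOK);
}
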
The rest of the code is standard: return the buffer to the client via MsgReplyv(), update the access-time flags and the offset, free the blocks and the IOV array, and so on.

Note: In step 1, I mentioned a limit of 1000 open file descriptors. This limit is controlled by the -F parameter to procnto (the kernel). In version 6.2.1 of QNX Neutrino, whatever value you pass to -F is both the default and the maximum; you cannot go higher than that value. In QNX Neutrino 6.3 or later, whatever value you pass to -F is the default, and you can go higher. You can change the value (lower it in 6.2.1; lower or raise it in 6.3 and later) via the setrlimit() function, using the RLIMIT_NOFILE resource constant.
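
For what it's worth, changing the limit from within a program is a few lines of standard POSIX code. The raise_fd_limit() name is made up for this sketch:

#include <sys/resource.h>

// Ask for a larger soft (and, if needed, hard) RLIMIT_NOFILE.
// On 6.2.1 this can only lower the limit; on 6.3 and later it can raise it.
static int
raise_fd_limit (rlim_t wanted)
{
  struct rlimit   rl;

  if (getrlimit (RLIMIT_NOFILE, &rl) == -1) {
    return (-1);
  }
  rl.rlim_cur = wanted;
  if (wanted > rl.rlim_max) {
    rl.rlim_max = wanted;       // raising the hard limit may require root
  }
  return (setrlimit (RLIMIT_NOFILE, &rl));
}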