The extended attributes structure

Updated: October 28, 2024

The first is the extended attributes structure:

typedef struct cfs_attr_s
{
  iofunc_attr_t     attr;

  int               nels;
  int               nalloc;
  union {
    struct des_s      *dirblocks;
    iov_t             *fileblocks;
    char              *symlinkdata;
  } type;
} cfs_attr_t;

As normal, the regular attributes structure, attr, is the first member. After this, the three fields are:

nels
The number of elements actually in use. The elements themselves are reached through the type union described below.
nalloc
The number of elements allocated. This number may be bigger than nels to make more efficient use of allocated memory. Instead of growing the memory each time we need to add one more element, we grow it in chunks (currently, 64 elements at a time). The nels member indicates how many are actually in use. This also helps with deallocation, because we don't have to shrink the memory; we simply decrement nels.
type
This is the actual type of the entry. As you can see, it's a union of three possible types, corresponding to the three possible data elements that we can store in a filesystem: directories (an array of struct des_s), files (an array of iov_ts), and symbolic links (a string).
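
The nels/nalloc bookkeeping described above can be sketched roughly as follows. This is only an illustration, not the RAM disk's actual code: the blocks_t type, the CHUNK constant, and both helper functions are hypothetical names, and a generic pointer array stands in for the type union.

```c
#include <assert.h>
#include <stdlib.h>

#define CHUNK 64    /* hypothetical name for the "allocate-at-once" size */

/* Simplified stand-in: only the bookkeeping fields and a generic array. */
typedef struct {
    int     nels;    /* elements actually in use */
    int     nalloc;  /* elements allocated */
    void  **elems;   /* stands in for the type union */
} blocks_t;

/* Append one element; returns 0 on success, -1 if out of memory. */
int blocks_append(blocks_t *b, void *elem)
{
    assert(b != NULL && b->nels <= b->nalloc);

    if (b->nels == b->nalloc) {
        /* Full: grow by a whole chunk rather than by one element. */
        void **grown = realloc(b->elems,
                               (size_t)(b->nalloc + CHUNK) * sizeof *grown);
        if (grown == NULL)
            return -1;
        b->elems = grown;
        b->nalloc += CHUNK;
    }
    b->elems[b->nels++] = elem;
    return 0;
}

/* Drop the last element; the storage is kept for later reuse. */
void blocks_remove_last(blocks_t *b)
{
    assert(b != NULL && b->nels > 0);
    b->nels--;
}
```

With this scheme, adding 10 elements to an empty blocks_t leaves nels at 10 and nalloc at 64, and removing an element never calls realloc().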

For reference, here is the struct des_s directory entry type:

typedef struct des_s
{
  char        *name;          // name of entry
  cfs_attr_t  *attr;          // attributes structure
}   des_t;

It's the name of the directory element (i.e., if you had a file called spud.txt, that would be the name of the directory element) and a pointer to the attributes structure corresponding to that element.

From this we can describe the organization of the data stored in the RAM disk.

The root directory of the RAM disk is represented by one cfs_attr_t, whose type union uses the dirblocks member to hold all of the entries within the root directory. Entries can be files, other directories, or symlinks. If there are 10 entries in the RAM disk's root directory, then nels would be equal to 10 (nalloc would be 64 because that's the “allocate-at-once” size), and dirblocks would point to an array of 64 struct des_s elements, of which the first 10 are valid, one for each entry in the root directory.

Each of the 10 struct des_s entries describes its respective element, starting with the name of the element (the name member), and a pointer to the attributes structure for that element.
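
To make the layout concrete, here is a minimal sketch of looking up a name in a directory. The dir_lookup() helper is hypothetical, and it assumes that the first nels entries of dirblocks are all valid. So that the sketch builds without the QNX headers, iofunc_attr_t is replaced by a one-member stand-in and iov_t is assumed to be a struct iovec.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Stand-ins for the QNX types (assumptions, not the real definitions). */
typedef struct { mode_t mode; } iofunc_attr_t;
typedef struct iovec iov_t;

struct des_s;

typedef struct cfs_attr_s
{
  iofunc_attr_t     attr;

  int               nels;
  int               nalloc;
  union {
    struct des_s      *dirblocks;
    iov_t             *fileblocks;
    char              *symlinkdata;
  } type;
} cfs_attr_t;

typedef struct des_s
{
  char        *name;          // name of entry
  cfs_attr_t  *attr;          // attributes structure
}   des_t;

/* Hypothetical helper: return the attributes structure of the entry
   called name in directory dir, or NULL if there is no such entry. */
cfs_attr_t *dir_lookup(const cfs_attr_t *dir, const char *name)
{
  assert(dir != NULL && name != NULL);

  /* Only the first nels of the nalloc slots hold entries. */
  for (int i = 0; i < dir->nels; i++) {
    if (strcmp(dir->type.dirblocks[i].name, name) == 0)
      return dir->type.dirblocks[i].attr;
  }
  return NULL;
}
```

Resolving a longer path would repeat this step once per component, using each returned attributes structure as the directory for the next lookup.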

Figure 1. A directory, with subdirectories and a file, represented by the internal data types.

If the element is a text file (our spud.txt for example), then its attributes structure would use the fileblocks member of the type union, and the content of the fileblocks would be a list of iov_ts, each pointing to the data content of the file.

Note: A direct consequence of this is that we do not support sparse files. A sparse file is one with “gaps” in the allocated space. Some filesystems support this notion. For example, you may write 100 bytes of data at the beginning of the file, lseek() forward 1000000 bytes and write another 100 bytes of data. The file will occupy only a few kilobytes on disk, rather than the expected megabyte, because the filesystem didn't store the “unused” data. If, however, you write one megabyte worth of zeros instead of using lseek(), then the file would actually consume a megabyte of disk storage.

We don't support that, because all of our iov_ts are implicitly contiguous: each block's data begins at the file offset where the previous block's data ends. As an exercise, you could modify the filesystem to have variable-sized iov_ts, with NULL in place of the address member to indicate a “gap.”
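
As a rough sketch of what “implicitly contiguous” means in practice, the hypothetical helper below finds the byte at a given file offset by walking the block array and subtracting each block's length. It isn't taken from the RAM disk source, and it assumes that iov_t is a struct iovec with iov_base and iov_len members.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

typedef struct iovec iov_t;   /* stand-in for the QNX type (assumption) */

/* Hypothetical helper: store the byte at file offset off in *out.
   Returns 0 on success, or -1 if off is past the end of the data. */
int file_byte_at(const iov_t *blocks, int nels, size_t off,
                 unsigned char *out)
{
    assert(out != NULL && (blocks != NULL || nels == 0));

    for (int i = 0; i < nels; i++) {
        if (off < blocks[i].iov_len) {
            *out = ((const unsigned char *)blocks[i].iov_base)[off];
            return 0;
        }
        /* No gaps: block i + 1 starts exactly where block i ends. */
        off -= blocks[i].iov_len;
    }
    return -1;
}
```

Because the offset of a block is just the sum of the lengths before it, there's nowhere to record a hole, which is why the exercise above needs a marker such as a NULL address.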

If the element is a symbolic link, then the symlinkdata union member is used instead; the symlinkdata member contains a strdup()'d copy of the contents of the symbolic link. Note that in the case of symbolic links, the nels and nalloc members are not used, because a symbolic link can have only one value associated with it.

The mode member of the base attributes structure is used to determine whether we should look at the dirblocks, fileblocks, or symlinkdata union member. (That's why there appears to be no “demultiplexing” variable in the structure itself; we rely on the base one provided by the resource manager framework.)

A question that may occur at this point is, “Why isn't the name stored in the attributes structure?” The short answer is: hard links. A file may be known by multiple names, all hard-linked together. So, the actual “thing” that represents the file is an unnamed object, with zero or more named objects pointing to it. (I said “zero” because the file could be open, but unlinked. It still exists, but doesn't have any named object pointing to it.)
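
The following sketch shows the idea of several names pointing at one unnamed object. It is an illustration under assumptions, not the RAM disk's code: name_attach() and name_detach() are hypothetical helpers, the attributes structure is cut down to a link count (assumed here to live in an nlink member of the base structure), and the type union is omitted.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-ins: only what the sketch needs (assumptions). */
typedef struct { unsigned nlink; } iofunc_attr_t;
typedef struct cfs_attr_s { iofunc_attr_t attr; } cfs_attr_t;

typedef struct des_s
{
  char        *name;          // name of entry
  cfs_attr_t  *attr;          // attributes structure
}   des_t;

/* Hypothetical helper: give obj one more name, stored in slot.
   Returns 0 on success, -1 if out of memory. */
int name_attach(des_t *slot, const char *name, cfs_attr_t *obj)
{
  assert(slot != NULL && name != NULL && obj != NULL);

  size_t len = strlen(name) + 1;
  slot->name = malloc(len);
  if (slot->name == NULL)
    return -1;
  memcpy(slot->name, name, len);

  slot->attr = obj;           /* the name points at the object...      */
  obj->attr.nlink++;          /* ...and the object counts its names    */
  return 0;
}

/* Hypothetical helper: remove the name in slot.
   Returns the number of names the object still has. */
unsigned name_detach(des_t *slot)
{
  assert(slot != NULL && slot->attr != NULL && slot->attr->attr.nlink > 0);

  cfs_attr_t *obj = slot->attr;
  free(slot->name);
  slot->name = NULL;
  slot->attr = NULL;
  return --obj->attr.nlink;
}
```

Attaching two names to the same object leaves both directory entries with identical attr pointers. When name_detach() returns zero, the object has no names left, but as noted above it may still be open, so the sketch leaves the decision to free it to the caller.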