Ordering the updates to metadata

Some filesystem operations affect multiple blocks on disk. For example, consider the situation of creating or deleting a file. Most filesystems separate the name of the file (or link) from the actual attributes of the file (the inode); this supports the POSIX concept of hard links, multiple names for the same file.

Typically, the inodes reside in a fixed location on disk (the .inodes file for fs-qnx4.so, or in the header of each cylinder group for fs-ext2.so).

Creating a new filename thus involves allocating a free inode entry and populating it with the details for the new file, and then placing the name for the file into the appropriate directory. Deleting a file involves removing the name from the parent directory and marking the inode as available.

These operations must be performed in this order to prevent corruption should there be a power failure between the two writes; note that for creation the inode should be allocated before the name, as a crash would result in an allocated inode that isn't referenced by any name (an "orphaned resource" that a filesystem's check procedure can later reclaim). If the operations were performed the other way around and a power failure occurred, the result would be a name that refers to a stale or invalid inode, which is undetectable as an error. A similar argument applies, in reverse, for file deletion.

For traditional filesystems, the only way of ordering these writes is to perform the first one (or, more generally, all but the last one of a multiple-block sequence) synchronously (i.e., immediately and waiting for I/O to complete before continuing). A synchronous write is very expensive, because it involves a disk-head seek, interrupts any active sequential disk streaming, and blocks the thread until the write has completed—potentially milliseconds of dead time.