Creating a .tar file

Let's follow along in a command-line session that creates a .tar file and shows the result:

# ls -la
total 73
drwxrwxr-x  2 root      root   4096 Aug 17 17:31 ./
drwxrwxrwt  4 root      root   4096 Aug 17 17:29 ../
-rw-rw-r--  1 root      root   1076 Jan 14  2003 io_read.c
-rw-rw-r--  1 root      root    814 Jan 12  2003 io_write.c
-rw-rw-r--  1 root      root   6807 Feb 03  2003 main.c
-rw-rw-r--  1 root      root  11883 Feb 03  2003 tarfs.c
-rw-rw-r--  1 root      root    683 Jan 12  2003 tarfs.h
-rw-rw-r--  1 root      root   6008 Jan 15  2003 tarfs_io_read.c

# tar cvf x.tar *
io_read.c
io_write.c
main.c
tarfs.c
tarfs.h
tarfs_io_read.c

# ls -l x.tar
-rw-rw-r--  1 root      root  40960 Aug 17 17:31 x.tar

Here I've taken some of the source files in a directory and created a .tar file (called x.tar) that ends up being 40960 bytes—a nice multiple of 512 bytes, as we'd expect.

Within the .tar file, each file is stored as a 512-byte header followed by the file content, which is padded out to the next 512-byte boundary.
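To see where the 40960 bytes come from, here's a minimal sketch of the size arithmetic. It assumes two things that the session above doesn't show: that the archive ends with two all-zero 512-byte blocks, and that tar pads its output to a 10240-byte record (20 blocks, GNU tar's default). The function and macro names are made up for illustration.

```c
#include <stddef.h>

#define TAR_BLOCK   512UL     /* headers and data are padded to this size */
#define TAR_RECORD  10240UL   /* assumption: default record of 20 blocks */

/* Round n up to the next multiple of unit. */
unsigned long round_up(unsigned long n, unsigned long unit)
{
    return (n + unit - 1) / unit * unit;
}

/* Bytes one archived file occupies: one header block plus padded content. */
unsigned long tar_member_size(unsigned long file_size)
{
    return TAR_BLOCK + round_up(file_size, TAR_BLOCK);
}

/* Total archive size for n regular files with the given sizes. */
unsigned long tar_archive_size(const unsigned long *sizes, size_t n)
{
    unsigned long total = 0;
    size_t i;

    for (i = 0; i < n; i++)
        total += tar_member_size(sizes[i]);
    total += 2 * TAR_BLOCK;               /* assumed end-of-archive marker */
    return round_up(total, TAR_RECORD);   /* pad out to a whole record */
}
```

Fed the six file sizes from the listing (1076, 814, 6807, 11883, 683, and 6008), the members add up to 32256 bytes, the end marker brings that to 33280, and rounding up to a whole record gives 40960.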

This is what each header looks like:

Offset  Length  Field Name
     0     100  name
   100       8  mode
   108       8  uid
   116       8  gid
   124      12  size
   136      12  mtime
   148       8  chksum
   156       1  typeflag
   157     100  linkname
   257       6  magic
   263       2  version
   265      32  uname
   297      32  gname
   329       8  devmajor
   337       8  devminor
   345     155  prefix
   500      12  filler
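One way to express this layout in C is a structure made up entirely of char arrays, so the compiler inserts no padding. This is a sketch rather than the structure the filesystem actually uses; the name struct tar_header is my own, and the final filler is sized so that the structure totals exactly one 512-byte block.

```c
#include <stddef.h>

/* One 512-byte tar header block; field names follow the table above. */
struct tar_header {
    char name[100];      /* offset   0 */
    char mode[8];        /* offset 100 */
    char uid[8];         /* offset 108 */
    char gid[8];         /* offset 116 */
    char size[12];       /* offset 124 */
    char mtime[12];      /* offset 136 */
    char chksum[8];      /* offset 148 */
    char typeflag;       /* offset 156 */
    char linkname[100];  /* offset 157 */
    char magic[6];       /* offset 257 */
    char version[2];     /* offset 263 */
    char uname[32];      /* offset 265 */
    char gname[32];      /* offset 297 */
    char devmajor[8];    /* offset 329 */
    char devminor[8];    /* offset 337 */
    char prefix[155];    /* offset 345 */
    char filler[12];     /* offset 500, pads the header to 512 bytes */
};

/* Compile-time checks (C11) that the layout matches the table. */
_Static_assert(sizeof(struct tar_header) == 512, "header must be one block");
_Static_assert(offsetof(struct tar_header, typeflag) == 156, "typeflag offset");
```

With a structure like this, a header can be read straight from the file into memory and its fields picked out by name.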

Here's a description of the fields that we're interested in for the filesystem (all fields are ASCII octal unless noted otherwise):

name
The name of the stored entity (plain ASCII).
mode
The mode: read, write, execute permissions, as well as what the entity is (a file, symlink, etc.).
uid
The user ID.
gid
The group ID.
size
The size of the resource (symlinks and links get a 0 size).
typeflag
POSIX says one thing, GNU says another. Under POSIX, the ustar format uses the single ASCII digits “0” through “7” (with an ASCII NUL accepted as an older synonym for “0”), and the later pax format adds “x” and “g” for extended headers. GNU tar accepts the same digits and NUL, and adds letter codes of its own. Each value indicates a different type of entity. Sigh. “The nice thing about standards is there are so many to choose from.”
mtime
The modification time.
linkname
The name of the file that this file is linked to (or blank if not linked), in plain ASCII.
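As a rough illustration of how the typeflag might be turned into something a filesystem can act on, here's a sketch that maps the digit values to entity kinds. The enum and function names are hypothetical, and treating “7” (a contiguous file) as a regular file is my own simplification.

```c
/* Entity kinds a filesystem might distinguish (illustrative names). */
enum tar_kind {
    TAR_KIND_FILE,
    TAR_KIND_HARDLINK,
    TAR_KIND_SYMLINK,
    TAR_KIND_CHARDEV,
    TAR_KIND_BLOCKDEV,
    TAR_KIND_DIR,
    TAR_KIND_FIFO,
    TAR_KIND_OTHER
};

/* Map the one-byte typeflag field to an entity kind. */
enum tar_kind tar_kind_from_typeflag(char typeflag)
{
    switch (typeflag) {
    case '\0':                          /* older archives use NUL */
    case '0': return TAR_KIND_FILE;
    case '1': return TAR_KIND_HARDLINK;
    case '2': return TAR_KIND_SYMLINK;
    case '3': return TAR_KIND_CHARDEV;
    case '4': return TAR_KIND_BLOCKDEV;
    case '5': return TAR_KIND_DIR;
    case '6': return TAR_KIND_FIFO;
    case '7': return TAR_KIND_FILE;     /* contiguous file, simplified */
    default:  return TAR_KIND_OTHER;    /* extended headers and the like */
    }
}
```

Anything that lands in TAR_KIND_OTHER is a candidate for being skipped, since it doesn't describe an entity of its own.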

We've skipped a bunch of fields, such as the checksum, because we don't need them for our filesystem. (For the checksum, for example, we're simply assuming that the file has been stored properly—in the vast majority of cases, it's not actually on an antique 9-track tape—so data integrity shouldn't be a problem!)

What I meant above by “ASCII octal” fields is that the value of the number is encoded as a sequence of ASCII digits in base 8. Really.
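Decoding such a field takes only a few lines. The following is a minimal sketch, not the code from the filesystem itself; the function name tar_octal is made up, and it assumes the field holds optional leading spaces, then octal digits, then a NUL or space terminator.

```c
#include <stddef.h>

/*
 * Convert an ASCII octal field of at most len bytes to a number.
 * Leading spaces are skipped; parsing stops at the first byte that is
 * not an octal digit (normally the terminating NUL or a space).
 */
unsigned long long tar_octal(const char *field, size_t len)
{
    unsigned long long value = 0;
    size_t i = 0;

    while (i < len && field[i] == ' ')
        i++;
    for (; i < len && field[i] >= '0' && field[i] <= '7'; i++)
        value = value * 8 + (unsigned long long)(field[i] - '0');
    return value;
}
```

Passing the length explicitly matters because a field isn't guaranteed to be NUL-terminated; the function never reads past the end of the field.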

For example, here's the very first header in the sample x.tar that we created above (addresses on the left-hand side, as well as the dump contents, are in hexadecimal, with printable ASCII characters on the right-hand side):

0000000 69 6f 5f 72 65 61 64 2e 63 00 00 00 00 00 00 00 io_read.c.......
0000010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0000040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0000050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0000060 00 00 00 00 30 31 30 30 36 36 34 00 30 30 30 30 ....0100664.0000
0000070 30 30 30 00 30 30 30 30 30 30 30 00 30 30 30 30 000.0000000.0000
0000080 30 30 30 32 30 36 34 00 30 37 36 31 31 31 34 31 0002064.07611141
0000090 34 36 35 00 30 31 31 33 33 34 00 20 30 00 00 00 465.011334..0...
00000A0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00000B0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00000C0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00000D0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00000E0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00000F0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0000100 00 75 73 74 61 72 20 20 00 72 6f 6f 74 00 00 00 .ustar...root...
0000110 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0000120 00 00 00 00 00 00 00 00 00 72 6f 6f 74 00 00 00 .........root...
0000130 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0000140 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0000150 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0000160 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0000170 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0000180 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0000190 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00001A0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00001B0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00001C0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00001D0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00001E0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00001F0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................

Here are the fields that we're interested in for our .tar filesystem:

Offset 0
The name of the entity, in this case io_read.c.
Offset 100 (0x64)
The ASCII octal digits 0100664 representing S_IFREG (indicating this is a regular file), with a permission mode of 0664 (indicating it's readable by everyone, but writable by the owner and group only).
Offset 108 (0x6C)
The ASCII octal digits 0000000 representing the user ID (in this case, 0, or root).
Offset 116 (0x74)
The ASCII octal digits 0000000 representing the group ID (0000000, or group 0).
Offset 124 (0x7C)
The ASCII octal digits 00000002064 (or decimal 1076) representing the size of the entity. (This does limit the file size to 77777777777 octal, or just under 8 gigabytes; not bad for something invented in the days of hard disks the size of washing machines with capacities of tens of megabytes!)
Offset 136 (0x88)
The ASCII octal digits 07611141465 (or decimal 1042596661 seconds after January 1st, 1970, which converts to “Tue Jan 14 21:11:01 EST 2003”) representing the modification time.
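Once the fields have been converted to numbers, interpreting them is ordinary bit masking and time conversion. Here's a sketch under a few assumptions: the TAR_S_ constants are my own names for the usual file-type bit values, the helper names are hypothetical, and the time is formatted in UTC so that the result doesn't depend on the local time zone.

```c
#include <time.h>

#define TAR_S_IFMT   0170000UL  /* file-type bits of the mode */
#define TAR_S_IFREG  0100000UL  /* regular file */

/* Nonzero if the mode value describes a regular file. */
int tar_is_regular(unsigned long mode)
{
    return (mode & TAR_S_IFMT) == TAR_S_IFREG;
}

/* Permission bits only (read, write, execute for owner, group, other). */
unsigned long tar_permissions(unsigned long mode)
{
    return mode & 0777UL;
}

/* Format an mtime value as UTC text; returns the characters written. */
size_t tar_format_mtime(unsigned long mtime, char *buf, size_t buflen)
{
    time_t t = (time_t)mtime;
    struct tm *utc = gmtime(&t);

    if (utc == NULL)
        return 0;
    return strftime(buf, buflen, "%Y-%m-%d %H:%M:%S UTC", utc);
}
```

For the sample header, the mode 0100664 is a regular file with permissions 0664, and the mtime 1042596661 formats as 2003-01-15 02:11:01 UTC, which is the same instant as 21:11:01 EST on January 14.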

The one interesting wrinkle has to do with items that are in subdirectories.

Depending on how tar was invoked when the archive was created, the directories may or may not be listed individually within the .tar file. That is, if you add the file dir/spud.txt to the archive, is there a tar header corresponding to dir? In some cases there will be, in others there won't, so our .tar filesystem will need to be smart enough to create any intermediate directories that aren't explicitly mentioned in the headers.

Note that in all cases the full pathname is given; that is, we will always have dir/spud.txt. We never have a header for dir followed by a header that names just spud.txt; every header in a .tar file carries the full path to the entity it describes.
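Because every header carries a full path, finding the intermediate directories comes down to walking the path one separator at a time. The sketch below shows one way to do that; tar_each_parent and its callback signature are hypothetical, and what the callback does (look the directory up, create it if it's missing) is left to the filesystem.

```c
#include <stddef.h>
#include <string.h>

/*
 * Call visit() once for every directory prefix of path, shortest first.
 * For "dir/sub/spud.txt" that is "dir" and then "dir/sub".
 * The prefix is passed as a pointer plus length; it is not NUL-terminated.
 * Returns the number of prefixes found.
 */
size_t tar_each_parent(const char *path,
                       void (*visit)(const char *dir, size_t len, void *arg),
                       void *arg)
{
    size_t count = 0;
    const char *slash = path;

    while ((slash = strchr(slash, '/')) != NULL) {
        size_t len = (size_t)(slash - path);

        if (len > 0) {              /* ignore a leading '/' */
            if (visit != NULL)
                visit(path, len, arg);
            count++;
        }
        slash++;                    /* continue after this separator */
    }
    return count;
}
```

Visiting the shortest prefix first means a parent directory is always handled before anything that lives inside it.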

Let's stop and think about how this resource manager compares to the RAM-disk resource manager in the previous chapter. If you squint your eyes a little bit, and ignore a few minor details, you can say they are almost identical. We need to:

The only thing that's really different is that instead of storing the file contents in RAM, we're storing them on disk! (The fact that this is a read-only filesystem isn't really a difference; it's a subset.)