The analyze_tar_file() function

At the highest level, the analyze_tar_file() function opens the .tar file, processes each file inside by calling add_tar_entry(), and then closes the .tar file. There's a wonderful library called zlib, which lets us open even compressed files and pretend that they are just normal, uncompressed files. That's what gives us the flexibility to open either a .tar or a .tar.gz file with no additional work on our part. (The limitation of the library is that seeking may be slow, because zlib may have to decompress all the data it skips over.)
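How does zlib know whether to decompress? A gzip stream starts with the two-byte magic number 0x1f 0x8b (RFC 1952), and when gzopen() reads a file that doesn't start that way, gzread() simply passes the bytes through unchanged. The helper below, is_gzip_header(), is a hypothetical illustration of that check, not part of zlib's API or of this filesystem.

```c
#include <stddef.h>

// Hypothetical helper: returns 1 if buf begins with the gzip magic
// number (0x1f 0x8b), 0 otherwise. zlib makes an equivalent check
// internally to decide between decompressing and plain reading.
int
is_gzip_header (const unsigned char *buf, size_t len)
{
    return (len >= 2 && buf[0] == 0x1f && buf[1] == 0x8b);
}
```

A plain .tar starts with the first entry's file name, so the check fails and the data is read as-is.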

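#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <zlib.h>
// cfs_attr_t, ustar_t, dump_tar_header() and add_tar_entry() are
// defined elsewhere in the filesystem's own source

int
analyze_tar_file (cfs_attr_t *a, char *fname)
{
    gzFile          fd;
    off_t           off;
    ustar_t         t;
    unsigned long   size;
    int             sts;
    char            *f;

    // 1) the .tar (or .tar.gz) file must exist :-)
    if ((fd = gzopen (fname, "rb")) == NULL) {
        // errno can be 0 if zlib itself failed (e.g. out of memory)
        return (errno ? errno : ENOMEM);
    }

    off = 0;
    // private copy of the archive name, handed to add_tar_entry()
    if ((f = strdup (fname)) == NULL) {
        gzclose (fd);
        return (ENOMEM);
    }

    // 2) read each 512-byte header into "t"; an empty name marks
    //    the all-zero block at the end of the archive
    while (gzread (fd, &t, sizeof (t)) == (int) sizeof (t) && *t.name) {
        dump_tar_header (off, &t);

        // 3) get the size (the header stores it as ASCII octal)
        if (sscanf (t.size, "%lo", &size) != 1) {
            size = 0;
        }
        off += sizeof (t);

        // 4) add this entry to the database
        if ((sts = add_tar_entry (a, off, &t, f)) != EOK) {
            gzclose (fd);
            return (sts);
        }

        // 5) skip the data for the entry, padded to a 512-byte boundary
        off += ((size + 511) / 512) * 512;
        if (gzseek (fd, off, SEEK_SET) == -1) {
            gzclose (fd);
            return (EIO);
        }
    }
    gzclose (fd);

    return (EOK);
}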
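The listing uses ustar_t without showing its definition. Assuming it mirrors the POSIX ustar header, it is a 512-byte block of fixed-width character fields, which is why sizeof (t) doubles as the header size and why t.size has to be parsed from text. This is a sketch; the field names are assumptions and the original ustar_t may differ.

```c
#include <stddef.h>

// Sketch of a 512-byte POSIX ustar header (field names assumed).
// Numeric fields are ASCII octal strings, usually NUL- or
// space-terminated.
typedef struct {
    char name[100];      // offset   0: empty in end-of-archive blocks
    char mode[8];        // offset 100
    char uid[8];         // offset 108
    char gid[8];         // offset 116
    char size[12];       // offset 124: data size, in octal
    char mtime[12];      // offset 136
    char chksum[8];      // offset 148
    char typeflag;       // offset 156
    char linkname[100];  // offset 157
    char magic[6];       // offset 257: "ustar"
    char version[2];     // offset 263
    char uname[32];      // offset 265
    char gname[32];      // offset 297
    char devmajor[8];    // offset 329
    char devminor[8];    // offset 337
    char prefix[155];    // offset 345
    char pad[12];        // offset 500: pads the header to 512 bytes
} ustar_t;

_Static_assert (sizeof (ustar_t) == 512, "ustar header must be 512 bytes");
```

Because every member is a char array (or a single char), the compiler inserts no padding and the offsets line up with the on-disk format.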

The code walkthrough is:

  1. zlib's gzopen() call looks just like fopen(), and it reads a plain .tar file as-is when the file isn't gzip-compressed.
  2. We read each header into the variable t, and dump_tar_header() displays it if debugging is enabled.
  3. We convert the header's size field (the size of the file data that follows the header, stored as an ASCII octal string) into size, then advance off past the header.
  4. The real work is done in add_tar_entry().
  5. The best part is that we skip the file content, rounded up to the next 512-byte boundary, without reading it, which makes loading fast.
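Steps 3 and 5 boil down to two small computations: parsing an octal text field, and rounding a byte count up to whole 512-byte blocks. The hypothetical helpers parse_octal() and padded_size() below sketch them in isolation; unlike the sscanf() call, parse_octal() never reads past the end of the field, even if it isn't terminated.

```c
#include <stddef.h>

// Hypothetical helper: parse an ASCII octal field such as the 12-byte
// ustar size field. Skips leading spaces and stops at the first
// non-octal character (typically a NUL or space) or at the field end.
unsigned long long
parse_octal (const char *field, size_t len)
{
    unsigned long long v = 0;
    size_t             i = 0;

    while (i < len && field[i] == ' ') {
        i++;
    }
    for (; i < len && field[i] >= '0' && field[i] <= '7'; i++) {
        v = v * 8 + (unsigned) (field[i] - '0');
    }
    return (v);
}

// Hypothetical helper: round a data size up to a whole number of
// 512-byte blocks, the same arithmetic as step 5.
unsigned long long
padded_size (unsigned long long size)
{
    return ((size + 511) / 512 * 512);
}
```

For example, a size field of "00000001750" is 1000 bytes, which pads to 1024, so the next header starts 512 + 1024 = 1536 bytes after the current one.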

In step 5 we skip the file content. I'm surprised that not all of today's tar utilities do this when the archive is a regular file rather than a tape: doing a tar tvf to get a listing of a huge tar file takes forever!
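For an uncompressed archive on disk, the skip can be a genuine seek, so listing costs one 512-byte read per entry no matter how large the files are. (With a .tar.gz, gzseek() still has to decompress the skipped bytes.) Below is a minimal sketch using only standard C stdio; count_tar_entries() is a hypothetical name, and the code assumes the POSIX ustar layout, with the name at offset 0 and a 12-byte octal size field at offset 124.

```c
#include <stdio.h>

// Hypothetical sketch: walk an uncompressed tar archive, reading only
// the 512-byte headers and seeking over the data. Stops at the first
// header with an empty name (end of archive) or at end of file.
// Returns the number of entries and stores the sum of their data
// sizes in *total_bytes, or returns -1 if a seek fails.
long
count_tar_entries (FILE *fp, unsigned long long *total_bytes)
{
    unsigned char hdr[512];
    long          count = 0;

    *total_bytes = 0;
    if (fseek (fp, 0L, SEEK_SET) != 0) {
        return (-1);
    }

    while (fread (hdr, 1, sizeof (hdr), fp) == sizeof (hdr) && hdr[0] != '\0') {
        unsigned long long size = 0;
        int                i = 124;

        // parse the octal size field: skip leading spaces, then stop at
        // the first non-octal character or at the end of the field
        while (i < 136 && hdr[i] == ' ') {
            i++;
        }
        for (; i < 136 && hdr[i] >= '0' && hdr[i] <= '7'; i++) {
            size = size * 8 + (unsigned) (hdr[i] - '0');
        }

        count++;
        *total_bytes += size;

        // skip the data (padded to 512 bytes) with a real seek; archives
        // over 2 GB would need fseeko() on systems with a 32-bit long
        if (fseek (fp, (long) ((size + 511) / 512 * 512), SEEK_CUR) != 0) {
            return (-1);
        }
    }
    return (count);
}
```

The data blocks are never read, so the running time depends on the number of entries rather than on the archive's total size.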