At the highest level, the analyze_tar_file() function opens the .tar file, processes each entry inside by calling add_tar_entry(), and then closes the .tar file. There's a wonderful library called zlib, which lets us open even compressed files and pretend that they are just normal, uncompressed files. That's what gives us the flexibility to open either a .tar or a .tar.gz file with no additional work on our part. (The limitation of the library is that seeking in a compressed file may be slow, because the data up to the target offset may need to be decompressed.)
int
analyze_tar_file (cfs_attr_t *a, char *fname)
{
    gzFile      fd;
    off_t       off;
    ustar_t     t;
    unsigned    size;       // "%o" stores into an unsigned int
    int         sts;
    char        *f;

    // 1) the .tar (or .tar.gz) file must exist :-)
    if ((fd = gzopen (fname, "r")) == NULL) {
        return (errno);
    }

    off = 0;
    f = strdup (fname);

    // 2) read the 512-byte header into "t"
    while (gzread (fd, &t, sizeof (t)) > 0 && *t.name) {
        dump_tar_header (off, &t);

        // 3) get the size
        sscanf (t.size, "%o", &size);
        off += sizeof (t);

        // 4) add this entry to the database
        if ((sts = add_tar_entry (a, off, &t, f)) != EOK) {
            gzclose (fd);
            return (sts);
        }

        // 5) skip the data for the entry
        off += ((size + 511) / 512) * 512;
        gzseek (fd, off, SEEK_SET);
    }
    gzclose (fd);
    return (EOK);
}
The code walkthrough is:
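In step 5 we skip the file content: the size is rounded up to the next multiple of 512 bytes, and gzseek() moves straight to the next header. I'm surprised that not all of today's tar utilities do this when they're dealing with files; doing a tar tvf to get a listing of the tar file takes forever for huge files!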
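The arithmetic behind steps 3 and 5 can be tried in isolation. The following is a minimal sketch, not part of the filesystem code: the helper names tar_entry_size() and tar_padded_size() are hypothetical, and strtoul() is swapped in for the sscanf() call so that the 12-byte size field can be copied and terminated first.

```c
#include <stdlib.h>
#include <string.h>

#define TAR_BLOCK 512

/*
 * Parse the 12-byte octal size field of a tar header.
 * The field is copied into a local buffer first, because
 * it isn't guaranteed to be NUL-terminated.
 */
static unsigned long
tar_entry_size (const char field [12])
{
    char    buf [13];

    memcpy (buf, field, 12);
    buf [12] = '\0';
    return (strtoul (buf, NULL, 8));
}

/*
 * Round a data size up to a whole number of 512-byte blocks;
 * this is how far to seek to reach the next header.
 */
static unsigned long
tar_padded_size (unsigned long size)
{
    return (((size + TAR_BLOCK - 1) / TAR_BLOCK) * TAR_BLOCK);
}
```

For example, a size field of "00000001750" is 1000 bytes in decimal, so the entry's data occupies two blocks and the next header starts 1024 bytes after the data begins.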