As mentioned above, there are two main things that happen in the news-processing world: news comes in, and news expires.

The first trick is to realize that most news expires unread, and all news expires at some point. So by looking at the header for the article, we can determine when the article will expire, and place it in a file with all the other articles that will expire at the same time:


Here we have three files, with the filenames representing the date that the batch of news articles expires, ranging from August 4, 2003 to August 6, 2003.
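
The mapping from an article's header to a bulk file name might be sketched as follows. This is only an illustration under stated assumptions: it reads the standard Usenet `Expires` header, falls back to a hypothetical seven-day retention policy when the header is missing or unparseable, and uses a `YYYYMMDD` file name. The text above fixes none of these details.

```python
from datetime import datetime, timedelta, timezone
from email import message_from_string
from email.utils import parsedate_to_datetime

# Assumed site policy for articles without a usable Expires header.
DEFAULT_RETENTION = timedelta(days=7)


def expiry_date(raw_article, received):
    """Return the date on which the article should expire."""
    headers = message_from_string(raw_article)
    expires = headers.get("Expires")
    if expires:
        try:
            return parsedate_to_datetime(expires).date()
        except (TypeError, ValueError):
            pass  # malformed header: fall through to the default policy
    return (received + DEFAULT_RETENTION).date()


def bulk_file_name(raw_article, received):
    """Name of the bulk file that collects everything expiring that day."""
    # The YYYYMMDD format is an assumption made for this sketch.
    return expiry_date(raw_article, received).strftime("%Y%m%d")
```

Every article that yields the same name lands in the same bulk file, which is what makes the later one-shot expiry possible.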

All we need to do is make a virtual filesystem that knows how to index into a real on-disk file, at a certain offset, for a certain length, and present those contents as if they were the real contents of the virtual file. Well, we've just done the exact same thing with the .tar filesystem! Effectively (with a few optimizations), what we're doing is very similar to creating several different .tar files and placing articles into them. The files are named based on when they expire. When an article is appended to a file, we track the article's name (like /var/spool/news/comp/os/qnx/1145), the name of the bulk file we put it into, and its offset and size within that file. (Some of the optimizations stem from the fact that, unlike .tar, we don't need to be 512-byte aligned and we don't need a header in the article's storage file.)

Figure 1. Articles from different newsgroups stored in a bulk file that expires on August 4, 2003.
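
The bookkeeping described above (virtual name, bulk file, offset, size) can be sketched in a few lines. This is a minimal illustration, not the VFNews implementation: `BulkStore` and `ArticleRef` are hypothetical names, and a real resource manager would serve the reads through the filesystem interface rather than a method call.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ArticleRef:
    bulk_name: str  # bulk file holding the article
    offset: int     # byte offset of the article within that file
    size: int       # article length in bytes


class BulkStore:
    def __init__(self, root):
        self.root = root
        self.index = {}  # virtual path -> ArticleRef, kept in memory

    def add(self, virtual_path, bulk_name, data):
        """Append the article to its bulk file and record where it went."""
        with open(os.path.join(self.root, bulk_name), "ab") as f:
            offset = f.seek(0, os.SEEK_END)  # current end of the bulk file
            f.write(data)  # no per-article header, no 512-byte padding
        ref = ArticleRef(bulk_name, offset, len(data))
        self.index[virtual_path] = ref
        return ref

    def read(self, virtual_path):
        """Return the article's bytes as if it were a file of its own."""
        ref = self.index[virtual_path]
        with open(os.path.join(self.root, ref.bulk_name), "rb") as f:
            f.seek(ref.offset)
            return f.read(ref.size)
```

Because articles are only ever appended, each write lands at the current end of a bulk file, so disk access stays sequential.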

When the time comes to expire old news, we simply remove the bulk file and any in-memory references to it.
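
Expiry then reduces to two steps, sketched here under the assumption that the in-memory index is a plain dictionary mapping each virtual path to a `(bulk_name, offset, size)` tuple. The function name and the index layout are hypothetical.

```python
import os


def expire_bulk_file(root, index, bulk_name):
    """Drop every in-memory reference to bulk_name, then remove the file.

    Returns the number of articles that were expired.
    """
    stale = [path for path, (name, _, _) in index.items() if name == bulk_name]
    for path in stale:
        del index[path]  # in-memory only: no disk blocks to release
    try:
        os.remove(os.path.join(root, bulk_name))  # one delete for the whole batch
    except FileNotFoundError:
        pass  # already gone; the index is consistent either way
    return len(stale)
```

A per-bulk-file list of article names would avoid scanning the whole index; the linear scan just keeps the sketch short.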

This is the real beauty of this filesystem, and why we gain so much by doing things this way rather than the original way. We've changed our disk access from writing tons of tiny files all over the disk to writing sequentially to a few very large files. For expiration, we've done the same thing: instead of deleting tons of tiny files all over the disk, we delete one very large file when it expires. (Deleting the files from the in-memory virtual filesystem is very fast, orders of magnitude faster than releasing blocks on disk.)

Note: To give you an idea of the efficiency of this approach, consider that this system was running on a QNX 4 box in the early 1990s, with a 386 at 20 MHz, and a “full” news feed (19.2 kilobaud Trailblazer modem busy over 20 hours per day). When running with cnews, the hard disk was constantly thrashing. When running with VFNews, there was no disk activity, except every five seconds when a cache was flushed. In those days, system administrators would run cnews “expiry” once a day because of the overhead. I was able to run it once per hour with no noticeable impact on the CPU/disk! Also, ISPs would replace their hard disks every so often due to the amount of disk thrashing that was occurring.