So why is this a problem?

There are a number of reasons why this isn't an ideal solution. Articles can arrive in any order; we're not always going to get all of the articles for comp.os.qnx, then all of the articles for comp.os.rsx11, then comp.os.svr4, and so on. This means that as articles arrive, the news storage program is creating files in an ad hoc manner, scattered all over the disk. Not only that, it's creating anywhere from a few hundred thousand to many millions of files per day, depending on the size of the feed! (Do the math: ten million articles spread over the 86,400 seconds in a day works out to more than a hundred file creations per second, sustained, around the clock. The poor disk, and filesystem, is getting quite a workout.)
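To make that pattern concrete, here's a minimal sketch of the one-file-per-article storage step, assuming the traditional spool layout where comp.os.qnx article 1234 lands in /var/spool/news/comp/os/qnx/1234; the function name and paths are illustrative, not any particular news server's code.

```c
/* A minimal sketch of one-file-per-article storage, assuming the
 * traditional spool layout (comp.os.qnx article 1234 lives in
 * /var/spool/news/comp/os/qnx/1234). Names are illustrative. */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

int
store_article(const char *spool_path, const char *body, size_t len)
{
    /* Every article is a brand-new file; the filesystem's allocator
     * decides where on disk it lands, so back-to-back articles can
     * end up far apart. */
    int fd = open(spool_path, O_WRONLY | O_CREAT | O_EXCL, 0644);

    if (fd == -1) {
        perror("open");
        return -1;
    }
    if (write(fd, body, len) != (ssize_t)len) {
        perror("write");
        close(fd);
        return -1;
    }
    return close(fd);
}
```

A call like store_article("/var/spool/news/comp/os/qnx/1234", text, text_len), repeated a hundred or more times a second for articles in unrelated groups, is exactly the scattered-creation workload described above.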

Given the volume of news these days, even terabyte-sized disks would fill up fairly quickly. So, all news systems have an expiry policy, which describes how long articles hang around before being deleted. This is usually tuned per newsgroup, and can range from a few days for high-traffic groups to weeks or even months for low-traffic ones. With current implementations, expiry processing takes a significant amount of time; sometimes it takes so long that the machine appears to be doing nothing but expiring!
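It's easy to see where the time goes. Under the same assumed spool layout, an expiry pass has to visit every article file individually; the sketch below (names again illustrative) pays one readdir() entry, one stat(), and possibly one unlink() per article, multiplied by millions of files.

```c
/* A sketch of why expiry is expensive: one directory entry, one
 * stat(), and possibly one unlink() per article, repeated across
 * millions of files. Function and path names are illustrative. */
#include <stdio.h>
#include <time.h>
#include <dirent.h>
#include <sys/stat.h>
#include <unistd.h>

void
expire_group(const char *group_dir, time_t max_age)
{
    DIR           *dir = opendir(group_dir);
    struct dirent *ent;
    struct stat    sb;
    char           path[1024];
    time_t         now = time(NULL);

    if (dir == NULL)
        return;
    while ((ent = readdir(dir)) != NULL) {
        if (ent->d_name[0] == '.')
            continue;                   /* skip "." and ".." */
        snprintf(path, sizeof(path), "%s/%s", group_dir, ent->d_name);
        if (stat(path, &sb) == -1)      /* one stat() per article */
            continue;
        if (now - sb.st_mtime > max_age)
            unlink(path);               /* one unlink() per expired article */
    }
    closedir(dir);
}
```

A two-week policy for one group would be expire_group("/var/spool/news/comp/os/qnx", 14 * 24 * 60 * 60); multiply that loop by tens of thousands of newsgroups and the "doing nothing but expiring" behaviour becomes easy to believe.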

The problem is that every second, huge numbers of files are being created at random places on the disk, while roughly the same number are being deleted from other random places; nearly every operation lands somewhere different, so the disk spends its time seeking instead of transferring data. This is suboptimal, and it's exacerbated by the fact that each article gets copied around a few times before ending up in its final location.