Compression Rules with the Flash Filesystem
|QNX Neutrino flash filesystem version 3 no longer provides built-in decompression. The flash filesystem's decompression functionality has moved into the inflator resource manager. Also, you should now use the deflate utility to compress files. This technote therefore applies only to FSv2, not FSv3 (the new flash filesystem for 6.3.0). For an explanation of these terms, see “Migrating to the New Flash Filesystem.”|
The file compression mechanism provided with our flash filesystem is a convenient way to cut flash memory costs for customers. The flash filesystem uses popular deflate/inflate algorithms for fast and efficient compression/decompression. In short, the deflate algorithm is a combination of two algorithms. The first one removes duplicated data in files; the second favors the data sequences that appear most often by assigning them shorter symbols. Combined, the two algorithms provide excellent lossless compression of data and executable files. The part that is actually built into the flash filesystem is the inflate algorithm, which reverses what the deflate algorithm does.
The flash filesystem never compresses any files. It only detects compressed files on the media and decompresses them as they are accessed. An abstraction layer embedded in the flash filesystem code achieves efficiency and preserves POSIX compliance. Special compressed data headers on top of the flash files provide fast seek times.
This layering is quite straightforward. Specific I/O functions handle the three basic access operations for compressed files: reading, seeking, and getting the file size.
The compression headers contain a synchronization cue, flags, compressed size, and normal size for the data that follows the header. These headers can be used by the three basic access calls to read decompressed data, seek into the file using a virtual offset, and find the effective size of the file.
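The way the headers support those three operations can be sketched in C. This is only an illustration: the technote doesn't describe the on-media layout, so the `cmp_header` structure, its field widths, and the function names below are all hypothetical. The sketch assumes a file is a chain of headers, each followed by its compressed payload.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of one compression header. The fields mirror the ones
 * named in the text; the real layout and widths are not documented here. */
struct cmp_header {
    uint32_t sync;      /* synchronization cue */
    uint32_t flags;
    uint32_t cmp_size;  /* compressed bytes that follow the header */
    uint32_t raw_size;  /* size of the same data once decompressed */
};

/* Virtual (decompressed) size: the sum of raw_size over all chunks. */
uint64_t virtual_size(const struct cmp_header *h, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += h[i].raw_size;
    return total;
}

/* Media size: every header plus its compressed payload. */
uint64_t media_size(const struct cmp_header *h, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += sizeof(struct cmp_header) + h[i].cmp_size;
    return total;
}

/* Map a virtual offset to the chunk that holds it and the offset inside
 * that chunk's decompressed data. Returns 0 on success, -1 if the offset
 * lies at or past the end of the file. */
int locate_chunk(const struct cmp_header *h, size_t n, uint64_t voff,
                 size_t *chunk, uint32_t *within)
{
    assert(chunk != NULL && within != NULL);
    uint64_t start = 0;
    for (size_t i = 0; i < n; i++) {
        if (voff < start + h[i].raw_size) {
            *chunk = i;
            *within = (uint32_t)(voff - start);
            return 0;
        }
        start += h[i].raw_size;
    }
    return -1;
}
```

A seek only has to walk the headers, never the compressed payloads, which is presumably why the headers give fast seek times.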
This is where compression gets tricky. A compressed file will have two sizes:
- virtual size — this is, for the end user, the real size of the decompressed data.
- media size — the size that the file actually occupies on the media.
As a convenience, our flash filesystems offer a handy namespace that totally replicates the regular flash file's namespace, but gives the media sizes when the files are stat()'ed (rather than the virtual or effective size). Using this new namespace, files are never decompressed, so read operations will yield raw compressed data instead of the decompressed data. This namespace is accessible by default through the .cmp mountpoint directory, right under the regular flash mountpoint.
For instance, running the disk usage utility du would be practically meaningless under a flash directory with data that is decompressed on the fly. It wouldn't reflect flash media usage at all. But running the du utility under .cmp would render a better approximation of media usage.
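As a rough sketch, an application could compare the two sizes by calling stat() on the same file through both namespaces. The helper names, the 512-byte path buffers, and the `/flash` mountpoint used in the usage note are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Build "<mount>/.cmp/<rel>" into out.
 * Returns 0 on success, -1 if the result would not fit. */
int cmp_path(const char *mount, const char *rel, char *out, size_t outlen)
{
    assert(mount != NULL && rel != NULL && out != NULL);
    size_t need = strlen(mount) + strlen("/.cmp/") + strlen(rel) + 1;
    if (need > outlen)
        return -1;
    snprintf(out, outlen, "%s/.cmp/%s", mount, rel);
    return 0;
}

/* Fetch the virtual size (regular namespace) and the media size
 * (.cmp namespace) of one file. Returns 0 on success, -1 on error. */
int file_sizes(const char *mount, const char *rel, off_t *virt, off_t *media)
{
    char regular[512], raw[512];
    struct stat st;

    int n = snprintf(regular, sizeof regular, "%s/%s", mount, rel);
    if (n < 0 || (size_t)n >= sizeof regular)
        return -1;
    if (cmp_path(mount, rel, raw, sizeof raw) != 0)
        return -1;

    if (stat(regular, &st) != 0)
        return -1;
    *virt = st.st_size;   /* size of the decompressed data */

    if (stat(raw, &st) != 0)
        return -1;
    *media = st.st_size;  /* size actually occupied on the media */
    return 0;
}
```

For example, `file_sizes("/flash", "etc/app.conf", &v, &m)` would stat `/flash/etc/app.conf` and `/flash/.cmp/etc/app.conf`, assuming the flash filesystem is mounted at `/flash`.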
You can use three methods to get compressed files onto your flash:
- mkefs utility
- flashlzo utility
- <minilzo.h> library
The first method, which is the most common case, is to use the mkefs utility. The flashlzo utility can be used as a filter for mkefs to compress the files that get built into the flash filesystem. The files can also be pre-compressed by the flashlzo utility; mkefs detects this and puts the data on the flash filesystem with the proper information. What information? A single bit that tells the flash filesystem that the file should be handled by the flash decompression layer.
The second method is to run flashlzo on the target itself and write the compressed files directly to the flash filesystem. This is where the .cmp mountpoint comes in again. Any file created under this mountpoint will have the compressed attribute bit. Needless to say, these files should be written using the compression headers, which the flashlzo utility does well.
The third method is from C code. In this case, both the headers and the data compressed by the <minilzo.h> library will be written to the flash file. The user application that generates this data is responsible for compressing it and for putting the compression headers and the compressed data properly into the file.
|For more information on putting compressed files onto flash, see the section on “Building a flash filesystem image” in the Building Embedded Systems guide in the Embedding SDK package.|
You use the .cmp mountpoint to create previously compressed files, write previously compressed data, and check the size of compressed files. If you read a file from this mountpoint, the file won't be decompressed for you, as it is in the regular mountpoint. Now this is where we start talking about rules. All this reading and getting the size of files is fairly simple; things get ugly when it's time to write those files.
- When you write to a file created under the .cmp mountpoint, the data must be compressed.
- You can't write all over the place! Although the flash filesystem supports random writes, the same is not true for compressed files.
- Only appends are permitted when writing to a file created from the .cmp mountpoint. This has to be clear and respected, because the flash filesystem will reject any random writes to compressed files.
- The flash filesystem will never transparently compress any data.
- If compressed data needs to be put on the flash during the life of a product, this data has to be pre-compressed.
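A minimal sketch of an append-only writer that follows these rules is shown below, using only standard POSIX calls. The function name is hypothetical, and the caller is assumed to pass a path under the .cmp mountpoint together with a buffer that already holds the compression headers and compressed data; nothing here performs the compression itself.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Append len bytes of pre-compressed data (headers included) to path.
 * O_APPEND makes every write land at the end of the file, which is the
 * only kind of write the rules above allow for compressed files.
 * Returns 0 on success, -1 on error. */
int append_precompressed(const char *path, const void *data, size_t len)
{
    assert(path != NULL && data != NULL);

    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0)
        return -1;

    const char *p = data;
    while (len > 0) {
        ssize_t w = write(fd, p, len);
        if (w <= 0) {          /* treat a failed or empty write as an error */
            close(fd);
            return -1;
        }
        p += w;
        len -= (size_t)w;
    }
    return close(fd);
}
```

The function never calls lseek(), so it cannot attempt a random write by accident.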
What if you need to write uncompressed data to a compressed file? You can do this, but it has to be from the regular mountpoint. And the append-only rule applies for this file as well.
|Writing uncompressed data to a compressed file can be quite wasteful, because the uncompressed data will still be encapsulated into compressed headers, so a layer of code will be used for nothing. This means that at system design time, files that are meant to be writable during the product life should not be compressed. Preferably, compressed files will remain read-only.|
As a convenience, though, it's still possible to append compressed or uncompressed data to compressed files. But we have to emphasize that this might not always be the most efficient way to store data. The compression algorithms need a minimum amount of data to compress effectively, and the result has to be good enough to justify the header overhead. Because buffering isn't possible with compressed files, you can't assume that the overhead of appending will stay small.
|Although it's possible to write uncompressed data without the header overhead to a compressed file (provided it's done from the .cmp namespace), this isn't a very good idea. The file will lose the capability of rendering virtual uncompressed size and will become unseekable to positions after the first chunk of uncompressed data. The file data will still be readable, but the lost POSIX functionality should dissuade you from trying this.|
So those are the rules, and here is the exception. Truncation is a special case. If a compressed file is opened with O_TRUNC from the regular virtual namespace, the file status will become just as if it were created from this namespace. This gives you full POSIX capabilities, with neither compression nor its accompanying restrictions.
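The trade-off can be put into numbers with a small sketch. The 16-byte header size is purely an assumption for illustration (the technote doesn't state the real size), and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed header size, for illustration only. */
#define ASSUMED_HDR_SIZE 16u

/* Bytes one appended chunk occupies on the media: header plus payload. */
size_t stored_size(size_t payload_len)
{
    return ASSUMED_HDR_SIZE + payload_len;
}

/* Nonzero if storing the compressed chunk with its header takes less
 * space than the raw_len bytes would take in an uncompressed file. */
int compression_pays_off(size_t raw_len, size_t cmp_len)
{
    assert(raw_len > 0);  /* an empty append has nothing to compress */
    return stored_size(cmp_len) < raw_len;
}
```

Under this assumption, a 4096-byte block that compresses to 1500 bytes is a clear win, while a 20-byte record that compresses to 18 bytes ends up larger on the media than the raw data, because every append carries its own header.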
The opposite is also true: if a non-compressed file is opened with truncation on the .cmp side, then compression rules apply. By the way, the ftruncate() functionality isn't provided with compressed files, but is supported with regular files.
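In code, the truncation exception amounts to a plain POSIX open() with O_TRUNC on the regular path. The sketch below uses a hypothetical function name and assumes the caller passes a path in the regular namespace, not one under .cmp. It deliberately avoids ftruncate(), which isn't provided for compressed files.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Truncate an existing file by opening it with O_TRUNC.
 * When regular_path points into the regular namespace, the text above
 * says the file then behaves as if it had been created there.
 * Returns 0 on success, -1 on error. */
int truncate_via_regular_namespace(const char *regular_path)
{
    assert(regular_path != NULL);

    int fd = open(regular_path, O_WRONLY | O_TRUNC);
    if (fd < 0)
        return -1;
    return close(fd);
}
```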
Always consider the slowdown of compressed data access and increased CPU usage when designing a system. We've seen systems with restricted flash budget increase their boot time by large factors when using compression.
The buffer size is also selectable. This buffer represents the decompressed data size that will be associated with each compression header. Of course, a larger buffer size might allow better compression, but RAM usage will be increased for the flash filesystem driver. Buffers greater than 32K won't give any advantage, because this is the size of the <minilzo.h> window. The default buffer size is 4K.
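The effect of the buffer size on the number of headers is simple arithmetic, sketched below. The function names are hypothetical; the 4K default and the 32K limit come from the text, and the clamping only models the statement that larger buffers bring no benefit, not actual driver behavior.

```c
#include <assert.h>
#include <stddef.h>

#define DEFAULT_BUF  (4u * 1024u)   /* default buffer size per the text */
#define WINDOW_LIMIT (32u * 1024u)  /* larger buffers give no advantage */

/* Buffer size worth using: 0 selects the default, and anything above
 * the window limit is capped because it cannot improve compression. */
size_t useful_buffer(size_t requested)
{
    if (requested == 0)
        return DEFAULT_BUF;
    return requested > WINDOW_LIMIT ? WINDOW_LIMIT : requested;
}

/* Number of compression headers needed for a file whose decompressed
 * size is virtual_size, with buf_size decompressed bytes per header. */
size_t header_count(size_t virtual_size, size_t buf_size)
{
    assert(buf_size > 0);
    return (virtual_size + buf_size - 1) / buf_size;  /* round up */
}
```

A 10000-byte file needs three headers with the default 4K buffer and only one with a 32K buffer, at the cost of a larger decompression buffer in RAM.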
On a slightly different note, don't be tempted to reuse the flash filesystem as a decompression engine over a block-oriented or a network filesystem. These filesystems are often available in very high-storage capacity and high-bandwidth formats. Compression over these media is pure overkill and will always be a waste of CPU resources. The flash filesystem with compression is really meant for constrained systems. The best approach for the long-term life of a product is to read Moore's Law* carefully. This law holds for flash as well, so plan ahead.
* In 1965, Intel co-founder Gordon Moore observed that the pace of microchip technology change is such that the amount of data storage a microchip can hold doubles every year.