Updated: April 19, 2023 |
Register with the cache-coherency library
#include <sys/cache.h> int cache_init(int flags, struct cache_ctrl *cinfo, const char *dllname);
The cache_init() function initializes the cache coherency library (libcache). Your driver must call cache_init() before using the library.
Members of the cache_ctrl structure
The cache_ctrl structure includes at least the following members:
Don't reference or modify the other fields in the structure.
Cache coherency
Device drivers for hardware that performs Direct Memory Access (DMA) use this cache coherency library. These devices are either bus-mastering devices that can directly read or modify memory, or devices that use a DMA controller to transfer data between the device and memory. The key factor is that memory may be accessed by an agent other than the CPU(s).
On some systems, cache coherency is performed by a bus-snooping protocol that is implemented in hardware. In such systems, the CPU(s) snoop all transactions on the memory bus and automatically keep their caches in sync with the memory.
For example, if an external agent modifies data at a given memory location, the CPU will observe the memory write cycle, and if it has a cached copy of the data at that memory location, it will invalidate the corresponding cache entry. The next time the CPU tries to examine the memory location, it will fetch the modified data from memory, instead of retrieving stale data from the cache.
Similarly, special action is taken if an external agent attempts to read a memory location, and a CPU has modified the memory location, but the modified copy of the data is in its cache, and hasn't yet been written to memory. In this case, the read cycle is suspended while the CPU copies the updated version of the data out to memory. Once memory has been updated with the modified version, the read cycle can continue, and the external agent gets the updated copy of the data.
On other systems, where there is no such snooping protocol implemented in hardware, cache coherency between the CPU and external devices must be maintained by driver software. These are typically single-CPU systems, since on SMP systems, bus-snooping is the usual mechanism of keeping the CPUs in sync with each other. To work on these systems, special action needs to be taken by driver software, to ensure data coherency between the CPU and external devices.
A driver ensures data coherency for systems that don't have a bus-snooping protocol in different ways. The first one is the big hammer approach that simply disables caching for any memory location that can be accessed by both the CPU and the external device. This approach, however, has a severe performance penalty; for some network devices on certain systems, the throughput reduces to roughly half of the original value.
You can solve the above throughput problem by using cacheable data buffers, but perform synchronization operations on the data buffers at strategic points in the driver. For example, in the case of packet transmission for a network device, the driver must ensure that any data pertaining to the packet had been flushed out to memory, before allowing the DMA agent to begin copying the data. In the case of packet reception, the driver should invalidate any cached data pertaining to the packet buffer, before letting the DMA agent transfer data into the receive buffer. This eliminates the possibility that the CPU could write a cached portion of the data out to the memory buffer, after the network device had updated the buffer with new packet data.
The driver can perform cache flushing and invalidation operations in one of two ways. It can issue special CPU-dependent instructions that operate on the cache, or it can use the cache-coherency library (libcache). The latter approach is preferable, since it makes your driver portable. The library performs the correct thing based on the type of CPU it's running on. For maximum portability, the library must be used whether the system has a bus-snooping protocol or not. If the system implements a bus-snooping protocol, the library determines this, and ensures that there are no unnecessary synchronization operations being performed.
Safety: | |
---|---|
Cancellation point | Yes |
Interrupt handler | No |
Signal handler | Yes |
Thread | Yes |
The cache-invalidation operation could have certain negative side effects, which the driver must take measures to avoid. If a data buffer partially shares a cache line with some other piece of data (including another data buffer), data corruption could occur. Since the invalidation is performed with cache-line granularity, invalidating data at the start or end of the buffer could potentially invalidate important data, such as program variables, which means that changes made to the data by the CPU could be inadvertently lost.
You can avoid this by padding the data buffers. The driver should pad the start of the buffer, so that it starts on the next cache line boundary. It should also pad the buffer out to the end of the last cache line of the buffer. To do this, the driver can use the cache_line_size field of the cache_ctrl structure. Note that this value could be zero (e.g., if there is a cache-coherency protocol implemented in hardware), in which case the driver doesn't need to do any padding.