cache_init()
Register with the cache-coherency library
Synopsis:
#include <sys/cache.h>
int cache_init(int flags,
struct cache_ctrl *cinfo,
const char *dllname);
Arguments:
- flags
- Zero, or flags that control the behavior of the cache-coherency library.
The only flag currently defined is:
- CACHE_INIT_FLAG_IGNORE_SCAN: specify that memory accesses to and from the device aren't snooped, regardless of whether the data caches in the system are reported as snooping. This might be required for devices that bypass the bus-snooping mechanism, or as a workaround for hardware bugs.
- cinfo
- A pointer to a cache_ctrl structure that contains state information. The library uses this structure to store information that it uses when performing synchronization operations. The driver should allocate and initialize this structure: all of its members should be set to zero, except for the fd field. For more information about the members of this structure, see the Description section.
- dllname
- The path of a DLL to load that provides cache-synchronization routines, or NULL to use the library specified by the LIBCACHE_DLL_PATH environment variable.
Library:
libcache
Use the -l cache option to qcc to link against this library.
Description:
The cache_init() function initializes the cache coherency library (libcache). Your driver must call cache_init() before using the library.
Members of the cache_ctrl structure
The cache_ctrl structure includes at least the following members:
- cache_line_size
- When this function returns, this field will specify the size, in bytes, of a cache line's worth of data. If the system implements a bus-snooping protocol, this field may contain zero.
- cache_flush_rate
- Provides a runtime indication to the driver of the cost of flushing the cache. The possible values are:
- CACHE_OP_RATE_SNOOP
- Due to a bus-snooping mechanism, a cache flush operation has negligible cost.
- CACHE_OP_RATE_INLINE
- Cache flush operations are implemented with CPU-specific inline routines, and are inexpensive.
- CACHE_OP_RATE_CALLOUT
- Cache flush operations are implemented by calling an external function, which incurs a small CPU overhead.
- CACHE_OP_RATE_MSYNC
- Cache flush operations are implemented by calling msync(). Because msync() is implemented as a system call, the operation is very expensive. The library is very unlikely to end up calling msync(), but if it does, the driver could achieve better performance by mapping data buffers as noncacheable, so that it avoids cache synchronization operations altogether.
- cache_invalidate_rate
- Provides a runtime indication to the driver of the cost of invalidating the cache. The defined values for this field are similar to those defined for the cache_flush_rate field.
- fd
- The driver should set this field to NOFD.
Don't reference or modify the other fields in the structure.
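The rate fields are informational; how a driver acts on them is up to the driver. The sketch below shows one possible policy, taken from the advice under CACHE_OP_RATE_MSYNC: prefer noncacheable buffers if either operation would go through msync(). The enum op_rate and struct rates are hypothetical local stand-ins so the sketch compiles anywhere; a real driver would read cinfo.cache_flush_rate and cinfo.cache_invalidate_rate and compare them against the CACHE_OP_RATE_* constants from <sys/cache.h>.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the documented CACHE_OP_RATE_* values.
   Only equality is tested, so no ordering between them is assumed. */
enum op_rate { RATE_SNOOP, RATE_INLINE, RATE_CALLOUT, RATE_MSYNC };

/* Hypothetical mirror of the two rate fields in struct cache_ctrl. */
struct rates {
    enum op_rate flush_rate;       /* stands in for cache_flush_rate */
    enum op_rate invalidate_rate;  /* stands in for cache_invalidate_rate */
};

/* One possible policy: if either synchronization operation would be
   implemented with msync() (a system call), map DMA buffers noncacheable
   instead of synchronizing cacheable ones. */
bool prefer_noncacheable_buffers(const struct rates *r)
{
    return r->flush_rate == RATE_MSYNC || r->invalidate_rate == RATE_MSYNC;
}
```

Other policies are possible; for example, a driver might take only cache_flush_rate into account for transmit-only buffers.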
Cache coherency
Device drivers for hardware that performs Direct Memory Access (DMA) use this cache coherency library. These devices are either bus-mastering devices that can directly read or modify memory, or devices that use a DMA controller to transfer data between the device and memory. The key factor is that memory may be accessed by an agent other than the CPU(s).
On some systems, cache coherency is performed by a bus-snooping protocol that is implemented in hardware. In such systems, the CPU(s) snoop all transactions on the memory bus and automatically keep their caches in sync with the memory.
For example, if an external agent modifies data at a given memory location, the CPU will observe the memory write cycle, and if it has a cached copy of the data at that memory location, it will invalidate the corresponding cache entry. The next time the CPU tries to examine the memory location, it will fetch the modified data from memory, instead of retrieving stale data from the cache.
Similarly, special action is taken if an external agent attempts to read a memory location that a CPU has modified, but the modified data is still in the CPU's cache and hasn't yet been written to memory. In this case, the read cycle is suspended while the CPU copies the updated version of the data out to memory. Once memory has been updated with the modified version, the read cycle can continue, and the external agent gets the updated copy of the data.
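The snooping behavior described above can be modeled in a few lines of C. This is a toy, single-location model of a write-back cache; the names (memory, cpu_read(), device_write(), and so on) are hypothetical, and the model illustrates the protocol rather than how any particular hardware implements it.

```c
#include <stdbool.h>

/* Toy model of a snooping, write-back cache: one memory word and the
   CPU's cached copy of it. All names are hypothetical. */
int memory;

struct snoop_line {
    bool valid;  /* CPU holds a copy of the word */
    bool dirty;  /* CPU copy is newer than memory */
    int data;
} cached;

/* CPU read: use the cached copy, or fetch from memory on a miss. */
int cpu_read(void)
{
    if (!cached.valid) {
        cached.data = memory;
        cached.valid = true;
        cached.dirty = false;
    }
    return cached.data;
}

/* CPU write: write-back cache, so only the cached copy changes. */
void cpu_write(int v)
{
    cached.data = v;
    cached.valid = true;
    cached.dirty = true;
}

/* External write: the CPU snoops the write cycle and invalidates its
   copy, so its next read fetches the new data instead of stale data. */
void device_write(int v)
{
    memory = v;
    cached.valid = false;
    cached.dirty = false;
}

/* External read: if the CPU holds a modified copy, the read is held off
   while the CPU writes the data out, then completes with that value. */
int device_read(void)
{
    if (cached.valid && cached.dirty) {
        memory = cached.data;
        cached.dirty = false;
    }
    return memory;
}
```

For example, after memory = 1 and a cpu_read(), a device_write(2) makes the next cpu_read() return 2 rather than the stale 1; after cpu_write(3), memory still holds 2, but device_read() returns 3.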
On other systems, where no such snooping protocol is implemented in hardware, driver software must maintain cache coherency between the CPU and external devices. These are typically single-CPU systems, since on SMP systems bus snooping is the usual mechanism for keeping the CPUs in sync with each other.
A driver can ensure data coherency on systems that don't have a bus-snooping protocol in two ways. The first is the "big hammer" approach, which simply disables caching for any memory location that can be accessed by both the CPU and the external device. This approach, however, carries a severe performance penalty: for some network devices on certain systems, throughput drops to roughly half of its original value.
You can avoid this throughput penalty by using cacheable data buffers and performing synchronization operations on them at strategic points in the driver. For example, when transmitting a packet on a network device, the driver must ensure that any data pertaining to the packet has been flushed out to memory before allowing the DMA agent to begin copying the data. When receiving a packet, the driver should invalidate any cached data pertaining to the packet buffer before letting the DMA agent transfer data into the receive buffer. This eliminates the possibility that the CPU could write a cached portion of the data out to the memory buffer after the network device has updated the buffer with new packet data.
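To see why the order of these operations matters, the following toy model simulates one write-back cache line covering a DMA buffer on a system without bus snooping. sim_flush() and sim_invalidate() are hypothetical stand-ins for the flush and invalidate operations that libcache performs; they are not libcache calls. sim_evict() models the cache writing a dirty line back on its own schedule.

```c
#include <stdbool.h>
#include <string.h>

/* Toy model of a system WITHOUT bus snooping: one DMA buffer that fits
   in a single write-back cache line. All names are hypothetical. */
#define WORDS 4

int ram[WORDS];            /* the DMA buffer as seen in memory */

struct sim_line {
    bool valid;            /* CPU holds a copy of the buffer */
    bool dirty;            /* CPU copy is newer than memory */
    int data[WORDS];
} cpu_cache;

void sim_reset(void)
{
    memset(ram, 0, sizeof ram);
    memset(&cpu_cache, 0, sizeof cpu_cache);
}

/* Load the line from memory on a cache miss. */
static void fill(void)
{
    if (!cpu_cache.valid) {
        memcpy(cpu_cache.data, ram, sizeof ram);
        cpu_cache.valid = true;
        cpu_cache.dirty = false;
    }
}

int cpu_load(int i) { fill(); return cpu_cache.data[i]; }

void cpu_store(int i, int v) { fill(); cpu_cache.data[i] = v; cpu_cache.dirty = true; }

/* Flush: write a dirty line back to memory; the line stays cached. */
void sim_flush(void)
{
    if (cpu_cache.valid && cpu_cache.dirty) {
        memcpy(ram, cpu_cache.data, sizeof ram);
        cpu_cache.dirty = false;
    }
}

/* Invalidate: discard the CPU's copy WITHOUT writing it back. */
void sim_invalidate(void) { cpu_cache.valid = false; cpu_cache.dirty = false; }

/* The cache may evict (and write back) a dirty line at any time. */
void sim_evict(void) { sim_flush(); cpu_cache.valid = false; }

/* The device accesses memory only; nothing snoops these transfers. */
void dma_read(int out[WORDS]) { memcpy(out, ram, sizeof ram); }
void dma_write(const int in[WORDS]) { memcpy(ram, in, sizeof ram); }
```

In this model, a transmit that skips sim_flush() hands the device stale memory, and a receive that skips sim_invalidate() lets a later sim_evict() write the CPU's old line over the freshly received packet.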
The driver can perform cache flushing and invalidation operations in one of two ways. It can issue special CPU-dependent instructions that operate on the cache, or it can use the cache-coherency library (libcache). The latter approach is preferable, since it makes your driver portable: the library does the correct thing based on the type of CPU it's running on. For maximum portability, the driver must use the library whether or not the system has a bus-snooping protocol. If the system implements a bus-snooping protocol, the library detects this and ensures that no unnecessary synchronization operations are performed.
Returns:
- 0
- Success.
- -1
- Failure; errno is set. If this function fails, it isn't safe for devices to DMA to or from cacheable buffers. Additionally, calling other functions with the cache_ctrl structure that was provided will have unpredictable results.
Classification:
| Safety: | |
|---|---|
| Cancellation point | Yes |
| Signal handler | Yes |
| Thread | Yes |
Caveats:
The cache-invalidation operation could have certain negative side effects, which the driver must take measures to avoid. If a data buffer partially shares a cache line with some other piece of data (including another data buffer), data corruption could occur. Since the invalidation is performed with cache-line granularity, invalidating data at the start or end of the buffer could potentially invalidate important data, such as program variables, which means that changes made to the data by the CPU could be inadvertently lost.
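Whether a buffer is exposed to this problem depends only on its address, its length, and the cache line size. The helpers below are a hypothetical sketch (not part of libcache) that report whether a cache-line-granular invalidation of a buffer would also discard bytes outside it, and how many lines the buffer touches.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if invalidating [addr, addr + len) one whole
   cache line at a time would also discard bytes outside the buffer, i.e.
   the buffer doesn't both start and end on a cache-line boundary.
   A line_size of zero (as cache_line_size may report on snooping systems)
   means no padding is needed, so nothing is reported as at risk. */
bool invalidate_is_unsafe(uintptr_t addr, size_t len, size_t line_size)
{
    if (line_size == 0 || len == 0)
        return false;
    return addr % line_size != 0 || (addr + len) % line_size != 0;
}

/* Number of cache lines the buffer touches.
   Requires len > 0 and line_size > 0. */
size_t lines_touched(uintptr_t addr, size_t len, size_t line_size)
{
    uintptr_t first = addr / line_size;
    uintptr_t last = (addr + len - 1) / line_size;
    return (size_t)(last - first + 1);
}
```

For example, with 64-byte lines an 8-byte buffer at address 60 straddles two lines and is unsafe to invalidate, while a 64-byte buffer at address 128 occupies exactly one line and is safe.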
You can avoid this by padding the data buffers. The driver should pad the start of the buffer, so that it starts on the next cache line boundary. It should also pad the buffer out to the end of the last cache line of the buffer. To do this, the driver can use the cache_line_size field of the cache_ctrl structure. Note that this value could be zero (e.g., if there is a cache-coherency protocol implemented in hardware), in which case the driver doesn't need to do any padding.
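A minimal sketch of this padding, assuming the buffer comes from malloc(): alloc_padded() is a hypothetical helper, not a libcache function. It rounds the length up to a whole number of cache lines and over-allocates by line_size - 1 bytes so that the start can be moved forward to the next line boundary. Every line covered by the returned buffer therefore lies entirely inside the allocation. The caller keeps the raw pointer for free().

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper (not part of libcache): allocate at least len bytes
   such that the buffer starts on a cache-line boundary and its padded
   length ends on one. line_size would come from cache_line_size; zero
   means no padding is needed. *raw receives the pointer to pass to free();
   *padded_len receives the line-rounded length. Overflow checks omitted. */
void *alloc_padded(size_t len, size_t line_size, void **raw, size_t *padded_len)
{
    if (line_size == 0) {
        *raw = malloc(len);
        *padded_len = len;
        return *raw;
    }
    /* Pad the end out to the end of the last cache line. */
    size_t plen = (len + line_size - 1) / line_size * line_size;
    /* Over-allocate so the start can move forward to the next boundary. */
    unsigned char *p = malloc(plen + line_size - 1);
    *raw = p;
    if (p == NULL)
        return NULL;
    uintptr_t addr = (uintptr_t)p;
    uintptr_t aligned = (addr + line_size - 1) / line_size * line_size;
    *padded_len = plen;
    return p + (aligned - addr);
}
```

For example, with a 64-byte cache line, a 100-byte request yields a line-aligned buffer with a padded length of 128 bytes. Where posix_memalign() is available, calling it with the cache line size as the alignment and a length rounded up to a whole number of lines achieves the same result.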