Accessing hardware with dual-ported memory

Hardware devices with dual-ported memory may "pack" their respective fields on nonaligned boundaries.

For example, if we had a piece of hardware with the following layout, we'd have a problem:

Address Size Name
0x18000000 1 PKTTYPE
0x18000001 4 PKTCRC
0x18000005 2 PKTLEN

Let's see why.

The first field, PKTTYPE, is fine—it's a 1-byte field, which according to the rules could be located anywhere. But the second and third fields aren't fine. The second field, PKTCRC, is a 4-byte object, but it's not located on a 4-byte boundary (the address is not evenly divisible by 4). The third field, PKTLEN, suffers from a similar problem—it's a 2-byte field that's not on a 2-byte boundary.

The ideal solution would be for the hardware manufacturer to obey the same alignment rules that are present on the target processor, but this isn't always possible. For example, if the hardware presented a raw data buffer at certain memory locations, the hardware would have no idea how you wish to interpret the bytes present—it would simply manifest them in memory.

To access these fields, you'd make a set of manifest constants for their offsets:

#define PKTTYPE_OFF     0x0000
#define PKTCRC_OFF      0x0001
#define PKTLEN_OFF      0x0005

Then, you'd map the memory region via mmap_device_memory(). Let's say it gave you a char * pointer called ptr. Using this pointer, you'd be tempted to:

cr1 = *(ptr + PKTTYPE_OFF);
// wrong!
sr1 = * (uint32_t *) (ptr + PKTCRC_OFF);
er1 = * (uint16_t *) (ptr + PKTLEN_OFF);

However, this would give you an alignment fault on non-x86 processors for the sr1 and er1 lines.

One solution would be to manually assemble the data from the hardware, byte by byte. And that's exactly what the UNALIGNED_*() macros do. Here's the rewritten example:

cr1 = *(ptr + PKTTYPE_OFF);
// correct!
sr1 = UNALIGNED_RET32 (ptr + PKTCRC_OFF);
er1 = UNALIGNED_RET16 (ptr + PKTLEN_OFF);

The access for cr1 didn't change, because it was already an 8-bit variable—these are always "aligned." However, the access for the 16- and 32-bit variables now uses the macros.

An implementation trick used here is to make the pointer that serves as the base for the mapped area by a char *—this lets us do pointer math on it.

To write to the hardware, you'd again use macros, but this time the UNALIGNED_PUT*() versions:

*(ptr + PKTTYPE_OFF) = cr1;
UNALIGNED_PUT32 (ptr + PKTCRC_OFF, sr1);
UNALIGNED_PUT16 (ptr + PKTLEN_OFF, er1);

Of course, if you're writing code that should be portable to different-endian processors, you'll want to combine the above tricks with the previous endian macros. Let's define the hardware as big-endian. In this example, we've decided that we're going to store everything that the program uses in host order and do translations whenever we touch the hardware:

cr1 = *(ptr + PKTTYPE_OFF);  // endian neutral
sr1 = ENDIAN_BE32 (UNALIGNED_RET32 (ptr + PKTCRC_OFF));
er1 = ENDIAN_BE16 (UNALIGNED_RET16 (ptr + PKTLEN_OFF));

And:

*(ptr + PKTTYPE_OFF) = cr1;  // endian neutral
UNALIGNED_PUT32 (ptr + PKTCRC_OFF, ENDIAN_BE32 (sr1));
UNALIGNED_PUT16 (ptr + PKTLEN_OFF, ENDIAN_BE16 (er1));

Here's a simple way to remember which ENDIAN_*() macro to use. Recall that the ENDIAN_*() macros won't change the data on their respective platforms (i.e. the LE macro will return the data unchanged on a little-endian platform, and the BE macro will return the data unchanged on a big-endian platform). Therefore, to access the data (which we know has a defined endianness), we effectively want to select the same macro as the type of data. This way, if the platform is the same as the type of data present, no changes will occur (which is what we expect).