Freedom from Hardware and Platform Dependencies

Common problems
Solutions

Common problems

With the advent of multiplatform support, which involves non-x86 platforms as well as peripheral chipsets across these multiple platforms, we don't want to have to write different versions of device drivers for each and every platform.

While some platform dependencies are unavoidable, let's talk about some of the things that you as a developer can do to minimize the impact. At QNX Software Systems, we've had to deal with these same issues — for example, we support the 8250 serial chip on several different types of processors. Ethernet controllers, SCSI controllers, and others are no exception.

Let's look at these problems:

I/O space vs memory-mapped
Big-endian vs little-endian
Alignment and structure packing
Atomic operations

I/O space vs memory-mapped

The x86 architecture has two distinct address spaces:

16-address-line I/O space
32-address-line instruction and data space

The processor asserts a hardware line to the external bus to indicate which address space is being referenced. The x86 has special instructions to deal with I/O space (e.g. IN AL, DX vs MOV AL, address). By convention, x86 hardware designs place the control ports for peripherals in the I/O address space. On non-x86 platforms, this requirement doesn't exist: all peripheral devices are mapped into various locations within the same address space as the instruction and data memory.

Big-endian vs little-endian

Big-endian vs little-endian is another compatibility issue with various processor architectures. The issue stems from the byte ordering of multibyte constants. The x86 architecture is little-endian. For example, the hexadecimal number 0x12345678 is stored in memory as:

address contents
      0 0x78
      1 0x56
      2 0x34
      3 0x12

A big-endian processor would store the data in the following order:

address contents
      0 0x12
      1 0x34
      2 0x56
      3 0x78

This issue is worrisome on a number of fronts:

typecast mangling
hardware access
network transparency

The first and second points are closely related.

Typecast mangling

Consider the following code:

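#include <stdint.h>
#include <stdio.h>

void func (void)
{
    uint32_t a = 0x12345678;    // exactly 32 bits on every platform
    unsigned char *p;

    // look at the first byte of a as it's laid out in memory
    p = (unsigned char *) &a;
    printf ("%02X\n", *p);
}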

On a little-endian machine, this prints “78”; on a big-endian machine, it prints “12” (when a is 32 bits wide). This is one of the big (pardon the pun) reasons why structured programmers generally frown on typecasts.
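One way to sidestep the problem is to pick bytes out of the value with shifts and masks instead of looking at its memory layout. The following is a minimal sketch; byte_by_value() and byte_by_address() are hypothetical names used only for illustration:

```c
#include <stdint.h>

/* Hypothetical helper: returns byte number `index` of `value`, counting
 * from the least significant byte (index 0). It operates on the value,
 * not on its memory layout, so the result is the same on any host. */
uint8_t byte_by_value(uint32_t value, unsigned index)
{
    return (uint8_t)((value >> (8u * index)) & 0xFFu);
}

/* Hypothetical helper: returns the byte stored at offset `index` inside
 * the object, the way a pointer typecast does. The result depends on
 * the host's byte order. */
uint8_t byte_by_address(const uint32_t *object, unsigned index)
{
    const unsigned char *p = (const unsigned char *)object;
    return p[index];
}
```

For 0x12345678, byte_by_value() with index 0 gives 0x78 on every host, whereas byte_by_address() with index 0 gives 0x78 or 0x12 depending on the byte order.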

Hardware access

Sometimes the hardware can present you with a conflicting choice of the “correct” size for a chunk of data. Consider a piece of hardware that has a 4 KB memory window. If the hardware brings various data structures into view with that window, it's impossible to determine a priori what the data size should be for a particular element of the window. Is it a 32-bit long integer? An 8-bit character? Blindly performing operations as in the above code sample will land you in trouble, because the CPU will determine what it believes to be the correct endianness, regardless of what the hardware manifests.

Network transparency

These issues are naturally compounded when heterogeneous CPUs are used in a network with messages being passed among them. If the implementor of the message-passing scheme doesn't decide up front what byte order will be used, then some form of identification needs to be done so that a machine with a different byte ordering can receive and correctly decode a message from another machine. This problem has been solved with protocols like TCP/IP, where a defined network byte order is always adhered to, even between homogeneous machines whose byte order differs from the network byte order.
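To illustrate what agreeing on a byte order up front can look like, here's a small sketch that always places a 32-bit value in a message buffer most significant byte first (big-endian, which is what TCP/IP uses as its network byte order). The names put_be32() and get_be32() are hypothetical; they aren't taken from any Neutrino header:

```c
#include <stdint.h>

/* Hypothetical helper: stores `value` in buf[0..3], most significant
 * byte first, regardless of the host's own byte order. */
void put_be32(uint8_t *buf, uint32_t value)
{
    buf[0] = (uint8_t)(value >> 24);
    buf[1] = (uint8_t)(value >> 16);
    buf[2] = (uint8_t)(value >> 8);
    buf[3] = (uint8_t)value;
}

/* Hypothetical helper: rebuilds the value from buf[0..3], treating
 * buf[0] as the most significant byte. */
uint32_t get_be32(const uint8_t *buf)
{
    return ((uint32_t)buf[0] << 24) |
           ((uint32_t)buf[1] << 16) |
           ((uint32_t)buf[2] << 8)  |
            (uint32_t)buf[3];
}
```

Because both sides use shifts rather than typecasts, a sender and receiver with different native byte orders see the same value.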

Alignment and structure packing

On the x86 CPU, you can access any sized data object at any address (albeit some accesses are more efficient than others). On non-x86 CPUs, you can't — as a general rule, you can access only N-byte objects on an N-byte boundary. For example, to access a 4-byte long integer, it must be aligned on a 4-byte address (e.g. 0x7FBBE008). An address like 0x7FBBE009 will cause the CPU to generate a fault. (An x86 processor happily generates multiple bus cycles and gets the data anyway.)

Generally, this will not be a problem with structures defined in the header files for Neutrino, as we've taken care to ensure that the members are aligned properly. The major place that this occurs is with hardware devices that can map a window into the address space (for configuration registers, etc.), and protocols where the protocol itself presents data in an unaligned manner (e.g. CIFS/SMB protocol).
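The N-byte rule is easy to check in code. As a rough sketch (is_aligned() is a hypothetical helper, not something from the Neutrino headers):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if `addr` is a multiple of `size`
 * (a nonzero object size such as 2, 4, or 8), else 0. */
int is_aligned(uintptr_t addr, size_t size)
{
    return (addr % size) == 0;
}
```

With the addresses used above, is_aligned(0x7FBBE008, 4) returns 1 and is_aligned(0x7FBBE009, 4) returns 0.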

Atomic operations

One final problem that can occur with different families of processors, and SMP configurations in general, is that of atomic access to variables. Since this is so prevalent with interrupt service routines and their handler threads, we've already talked about this in the chapter on Writing an Interrupt Handler.

Solutions

Now that we've seen the problems, let's take a look at some of the solutions you can use. The following header files are shipped standard with Neutrino:

<gulliver.h>: isolates big-endian vs little-endian issues
<hw/inout.h>: provides input and output functions for I/O or memory address spaces

Determining endianness

The file <gulliver.h> contains macros to help resolve endian issues. The first thing you may need to know is the target system's endianness, which you can find out via the following macros:

__LITTLEENDIAN__: defined if little-endian
__BIGENDIAN__: defined if big-endian

A common coding style in the header files (e.g. <gulliver.h>) is to check which macro is defined and to report an error if none is defined:

#if defined(__LITTLEENDIAN__)
// do whatever for little-endian
#elif defined(__BIGENDIAN__)
// do whatever for big-endian
#else
#error ENDIAN Not defined for system
#endif

The #error statement will cause the compiler to generate an error and abort the compilation.

Swapping data if required

Suppose you need to ensure that data obtained in the host order (i.e. whatever is “native” on this machine) is returned in a particular order, either big- or little-endian. Or vice versa: you want to convert data from host order to big- or little-endian. You can use the following macros (described here as if they're functions for syntactic convenience):

ENDIAN_LE16()

uint16_t ENDIAN_LE16 (uint16_t var)

If the host is little-endian, this macro does nothing (expands simply to var); else, it performs a byte swap.

ENDIAN_LE32()

uint32_t ENDIAN_LE32 (uint32_t var)

If the host is little-endian, this macro does nothing (expands simply to var); else, it reverses the order of the four bytes.

ENDIAN_LE64()

uint64_t ENDIAN_LE64 (uint64_t var)

If the host is little-endian, this macro does nothing (expands simply to var); else, it reverses the order of the eight bytes.

ENDIAN_BE16()

uint16_t ENDIAN_BE16 (uint16_t var)

If the host is big-endian, this macro does nothing (expands simply to var); else, it performs a byte swap.

ENDIAN_BE32()

uint32_t ENDIAN_BE32 (uint32_t var)

If the host is big-endian, this macro does nothing (expands simply to var); else, it reverses the order of the four bytes.

ENDIAN_BE64()

uint64_t ENDIAN_BE64 (uint64_t var)

If the host is big-endian, this macro does nothing (expands simply to var); else, it reverses the order of the eight bytes.
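For reference, here's roughly what the byte reversal in the non-native case amounts to. This is only a sketch with hypothetical function names (swap16(), swap32(), swap64()); it isn't the actual <gulliver.h> implementation, which may differ:

```c
#include <stdint.h>

/* Hypothetical helper: exchanges the two bytes of a 16-bit value. */
uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v >> 8) | (v << 8));
}

/* Hypothetical helper: reverses the four bytes of a 32-bit value. */
uint32_t swap32(uint32_t v)
{
    return ((v >> 24) & 0x000000FFu) |
           ((v >> 8)  & 0x0000FF00u) |
           ((v << 8)  & 0x00FF0000u) |
            (v << 24);
}

/* Hypothetical helper: reverses the eight bytes of a 64-bit value by
 * reversing each 32-bit half and exchanging the halves. */
uint64_t swap64(uint64_t v)
{
    return ((uint64_t)swap32((uint32_t)v) << 32) |
            (uint64_t)swap32((uint32_t)(v >> 32));
}
```

Applying a swap twice gives back the original value, which is why the same macro can be used for both directions of a conversion.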

Accessing unaligned data

To access data on nonaligned boundaries, you have to access the data one byte at a time (the correct endian order is preserved during byte access). The following macros (documented as functions for convenience) accomplish this:

UNALIGNED_RET16()

uint16_t UNALIGNED_RET16 (uint16_t *addr16)

Returns a 16-bit quantity from the address specified by addr16.

UNALIGNED_RET32()

uint32_t UNALIGNED_RET32 (uint32_t *addr32)

Returns a 32-bit quantity from the address specified by addr32.

UNALIGNED_RET64()

uint64_t UNALIGNED_RET64 (uint64_t *addr64)

Returns a 64-bit quantity from the address specified by addr64.

UNALIGNED_PUT16()

void UNALIGNED_PUT16 (uint16_t *addr16, uint16_t val16)

Stores the 16-bit value val16 into the address specified by addr16.

UNALIGNED_PUT32()

void UNALIGNED_PUT32 (uint32_t *addr32, uint32_t val32)

Stores the 32-bit value val32 into the address specified by addr32.

UNALIGNED_PUT64()

void UNALIGNED_PUT64 (uint64_t *addr64, uint64_t val64)

Stores the 64-bit value val64 into the address specified by addr64.
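To show the general idea in portable C, the sketch below moves a 32-bit value to and from an arbitrary address with memcpy(), which copies bytes and therefore has no alignment requirement. This is a common C idiom offered for illustration; the names load32_unaligned() and store32_unaligned() are hypothetical, and the <gulliver.h> macros aren't necessarily implemented this way:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reads a 32-bit value, in host byte order, from
 * an address that may not be 4-byte aligned. */
uint32_t load32_unaligned(const void *addr)
{
    uint32_t value;
    memcpy(&value, addr, sizeof value);   /* bytewise copy */
    return value;
}

/* Hypothetical helper: writes a 32-bit value, in host byte order, to
 * an address that may not be 4-byte aligned. */
void store32_unaligned(void *addr, uint32_t value)
{
    memcpy(addr, &value, sizeof value);   /* bytewise copy */
}
```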

Examples

Here are some examples showing how to access different pieces of data using the macros introduced so far.

Mixed-endian accesses

This code is written to be portable. It accesses little_data (i.e. data that's known to be stored in little-endian format, perhaps as a result of some on-media storage scheme), and then manipulates it, writing the data back. This illustrates that the ENDIAN_*() macros are bidirectional.

uint16_t    native_data;
uint16_t    little_data;

native_data = ENDIAN_LE16 (little_data);// used as "from little-endian"
native_data++;                          // do something with native form
little_data = ENDIAN_LE16 (native_data);// used as "to little-endian"

Accessing hardware with dual-ported memory

Hardware devices with dual-ported memory may “pack” their respective fields on nonaligned boundaries. For example, if we had a piece of hardware with the following layout, we'd have a problem:

Address	Size	Name
0x18000000	1	PKTTYPE
0x18000001	4	PKTCRC
0x18000005	2	PKTLEN

Let's see why.

The first field, PKTTYPE, is fine — it's a 1-byte field, which according to the rules could be located anywhere. But the second and third fields aren't fine. The second field, PKTCRC, is a 4-byte object, but it's not located on a 4-byte boundary (the address is not evenly divisible by 4). The third field, PKTLEN, suffers from a similar problem — it's a 2-byte field that's not on a 2-byte boundary.

The ideal solution would be for the hardware manufacturer to obey the same alignment rules that are present on the target processor, but this isn't always possible. For example, if the hardware presented a raw data buffer at certain memory locations, the hardware would have no idea how you wish to interpret the bytes present — it would simply manifest them in memory.
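It might seem that you could simply overlay a C structure on the device. The sketch below suggests why that usually doesn't work: a compiler that follows the alignment rules inserts padding, so the member offsets no longer match the hardware. The structure and function names are hypothetical, and the exact padding depends on the compiler and target:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical structure with the same three fields as the device. */
struct pkt_natural {
    uint8_t  pkttype;
    uint32_t pktcrc;    /* typically padded out to offset 4 */
    uint16_t pktlen;    /* typically lands at offset 8 */
};

/* Returns 1 if the compiler's layout happens to match the device's
 * packed layout (offsets 0, 1, and 5), else 0. */
int layout_matches_device(void)
{
    return offsetof(struct pkt_natural, pkttype) == 0
        && offsetof(struct pkt_natural, pktcrc)  == 1
        && offsetof(struct pkt_natural, pktlen)  == 5;
}
```

On targets where a 32-bit integer must be aligned on more than a 1-byte boundary, layout_matches_device() returns 0.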

To access these fields, you'd make a set of manifest constants for their offsets:

#define PKTTYPE_OFF     0x0000
#define PKTCRC_OFF      0x0001
#define PKTLEN_OFF      0x0005

Then, you'd map the memory region via mmap_device_memory(). Let's say it gave you a char * pointer called ptr. Using this pointer, you'd be tempted to:

cr1 = *(ptr + PKTTYPE_OFF);
// wrong!
sr1 = * (uint32_t *) (ptr + PKTCRC_OFF);
er1 = * (uint16_t *) (ptr + PKTLEN_OFF);

However, this would give you an alignment fault on non-x86 processors for the sr1 and er1 lines.

One solution would be to manually assemble the data from the hardware, byte by byte. And that's exactly what the UNALIGNED_*() macros do. Here's the rewritten example:

cr1 = *(ptr + PKTTYPE_OFF);
// correct!
sr1 = UNALIGNED_RET32 (ptr + PKTCRC_OFF);
er1 = UNALIGNED_RET16 (ptr + PKTLEN_OFF);

The access for cr1 didn't change, because it was already an 8-bit variable — these are always “aligned.” However, the access for the 16- and 32-bit variables now uses the macros.

An implementation trick used here is to make the pointer that serves as the base for the mapped area be a char *; this lets us do pointer math on it in units of bytes.

To write to the hardware, you'd again use macros, but this time the UNALIGNED_PUT*() versions:

*(ptr + PKTTYPE_OFF) = cr1;
UNALIGNED_PUT32 (ptr + PKTCRC_OFF, sr1);
UNALIGNED_PUT16 (ptr + PKTLEN_OFF, er1);

Of course, if you're writing code that should be portable to different-endian processors, you'll want to combine the above tricks with the previous endian macros. Let's define the hardware as big-endian. In this example, we've decided that we're going to store everything that the program uses in host order and do translations whenever we touch the hardware:

cr1 = *(ptr + PKTTYPE_OFF);  // endian neutral
sr1 = ENDIAN_BE32 (UNALIGNED_RET32 (ptr + PKTCRC_OFF));
er1 = ENDIAN_BE16 (UNALIGNED_RET16 (ptr + PKTLEN_OFF));

And:

*(ptr + PKTTYPE_OFF) = cr1;  // endian neutral
UNALIGNED_PUT32 (ptr + PKTCRC_OFF, ENDIAN_BE32 (sr1));
UNALIGNED_PUT16 (ptr + PKTLEN_OFF, ENDIAN_BE16 (er1));

Here's a simple way to remember which ENDIAN_*() macro to use. Recall that the ENDIAN_*() macros won't change the data on their respective platforms (i.e. the LE macro will return the data unchanged on a little-endian platform, and the BE macro will return the data unchanged on a big-endian platform). Therefore, to access the data (which we know has a defined endianness), we effectively want to select the same macro as the type of data. This way, if the platform is the same as the type of data present, no changes will occur (which is what we expect).

Accessing I/O ports

When you port code that accesses hardware, keep in mind that the x86 architecture has a set of instructions that manipulate a separate address space called the I/O address space. This address space is completely separate from the memory address space. On non-x86 platforms (PPC, etc.), such an address space doesn't exist; all devices are mapped into memory.

In order to keep code portable, we've defined a number of functions that isolate this behavior. By including the file <hw/inout.h>, you get the following functions:

in8(): Reads an 8-bit value.
in16(), inbe16(), inle16(): Reads a 16-bit value.
in32(), inbe32(), inle32(): Reads a 32-bit value.
in8s(): Reads a number of 8-bit values.
in16s(): Reads a number of 16-bit values.
in32s(): Reads a number of 32-bit values.
out8(): Writes an 8-bit value.
out16(), outbe16(), outle16(): Writes a 16-bit value.
out32(), outbe32(), outle32(): Writes a 32-bit value.
out8s(): Writes a number of 8-bit values.
out16s(): Writes a number of 16-bit values.
out32s(): Writes a number of 32-bit values.

On the x86 architecture, these functions perform the machine instructions in, out, rep ins*, and rep outs*. On non-x86 architectures, they dereference the supplied address (the addr parameter) and perform memory accesses.

The bottom line is that code written for the x86 using these functions will be portable to MIPS and PPC. Consider the following fragment:

iir = in8 (baseport);
if (iir & 0x01) {
    return;
}

On an x86 platform, this will perform IN AL, DX, whereas on a MIPS or PPC, it will dereference the 8-bit value stored at location baseport.
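In the memory-mapped case, an 8-bit port access boils down to a volatile load or store. The following is a rough sketch only: mmio_read8() and mmio_write8() are hypothetical names, not part of <hw/inout.h>, and a real implementation may add barriers or other platform-specific handling:

```c
#include <stdint.h>

/* Hypothetical stand-in for an 8-bit read on a memory-mapped platform.
 * The volatile qualifier keeps the compiler from caching or dropping
 * the access. */
uint8_t mmio_read8(uintptr_t addr)
{
    return *(volatile const uint8_t *)addr;
}

/* Hypothetical stand-in for an 8-bit write on a memory-mapped platform. */
void mmio_write8(uintptr_t addr, uint8_t value)
{
    *(volatile uint8_t *)addr = value;
}
```

You can try the sketch on any host by passing the address of an ordinary variable in place of a device register.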

Note that the calling process must use mmap_device_io() to access the device's I/O registers.