Writing Network Drivers for io-sock

Updated: April 19, 2023

This appendix is intended to help you understand and write network drivers for io-sock.

Any network driver can be viewed as the “glue” between the underlying network hardware and the software infrastructure of io-sock, the protocol stack above it. The “bottom half” of the driver is coded specifically for the particular hardware it supports, and the “top half” of the driver is coded specifically for io-sock.

This appendix covers the “top half” of the driver, which interfaces with the io-sock software infrastructure.

Drivers for io-sock are written using the same driver APIs as FreeBSD drivers. In the following discussion and examples, the API function calls and structures (up to and including “Modifying the makefile”) are the same as in FreeBSD. This common API allows you to compile FreeBSD driver source code for io-sock with few to no code changes.

The io-sock network stack and networking drivers are also provided in diagnostic versions, which are useful when you are developing network drivers. For more information, see “Running io-sock with diagnostic features.”

Driver versioning

To work with io-sock, your driver must establish its version. This driver API version should match the current io-sock driver API version (currently 1). The version value is used to reject incompatible driver libraries (e.g., if a future version of the io-sock driver API introduces changes that are incompatible with earlier versions).

The version is checked whenever a driver is loaded to ensure that it is compatible.

For example:

…
#include <qnx/qnx_modload.h>
…
int drvr_ver = IOSOCK_VERSION_CUR;
SYSCTL_INT(_qnx_driver, OID_AUTO, sample_drvr, CTLFLAG_RD, &drvr_ver, 0,
            "Version");
…
struct _iosock_module_version iosock_module_version = IOSOCK_MODULE_VER_SYM_INIT;
static void
sample_uninit(void *arg)
{
}
SYSUNINIT(sample_uninit, SI_SUB_DUMMY, SI_ORDER_ANY, sample_uninit, NULL);

You can use sysctl to display driver versions. For example:

sysctl qnx.driver
qnx.driver.libusbdci: 1
qnx.driver.libpci: 1
qnx.driver.phy: 1

Registering the driver module

The driver module uses a device_method_t structure to provide probe, attach, detach, and shutdown callbacks. If required, it also provides implementations for PHY read, write, and status change callbacks, which are hardware-specific.

For example:

static device_method_t sample_methods[] = {       
    DEVMETHOD(device_probe,         sample_probe),       
    DEVMETHOD(device_attach,        sample_attach),       
    DEVMETHOD(device_detach,        sample_detach),       
    DEVMETHOD(device_shutdown,      sample_shutdown),

    /* MII Interface Callback*/       
    DEVMETHOD(miibus_readreg,       sample_miibus_read_reg),       
    DEVMETHOD(miibus_writereg,      sample_miibus_write_reg),       
    DEVMETHOD(miibus_statchg,       sample_miibus_statchg),

    DEVMETHOD_END
};

driver_t sample_driver = {
    "sample",       
    sample_methods,       
    sizeof(struct sample_softc),
};
...

DRIVER_MODULE(sample, simplebus, sample_driver, sample_devclass, 0, 0);
For more information about device_method_t, see https://www.freebsd.org/cgi/man.cgi?query=driver&sektion=9&manpath=FreeBSD+13.0-RELEASE+and+Ports.

The PHY callbacks are discussed in “Loading PHY-specific handling using media-independent interface (MII).”

The DRIVER_MODULE macro registers the driver with the system and adds it to the list of device drivers for a particular bus type (held by a devclass object). For more information, see https://www.freebsd.org/cgi/man.cgi?query=DRIVER_MODULE&sektion=9&manpath=FreeBSD+13.0-RELEASE+and+Ports and https://www.freebsd.org/cgi/man.cgi?query=devclass&sektion=9&manpath=FreeBSD+13.0-RELEASE+and+Ports.

Checking if a device is supported

Use the probe function (e.g., sample_probe()) to determine whether the device is supported. The function searches the device tree for suitable devices that have matching information in the OFW (Open Firmware bus/DTB), PCI, or USB info. For example:
static int
sample_probe(device_t dev)
{
        if (!ofw_bus_status_okay(dev)) {
                return (ENXIO);
        }

        if (!ofw_bus_is_compatible(dev, "Hardware Descriptor string from DTB file")) {
                return (ENXIO);
        }
        device_set_desc(dev, "Sample Controller");

        return (BUS_PROBE_DEFAULT);
}

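The device_t type is an opaque pointer to the structure that represents a device; the dev argument identifies the device being probed. The device method callbacks (probe, attach, detach, and shutdown) all take a device_t as their first parameter.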

When a device is loaded, all probe functions that are associated with the bus are called. The value that the probe function returns determines which driver is the best one to use for the device. For example, BUS_PROBE_DEFAULT indicates that the device is a normal device matching a Plug and Play ID and is the normal return value for drivers to use. For a list and description of all the conventional return values, see https://www.freebsd.org/cgi/man.cgi?query=DEVICE_PROBE&sektion=9&manpath=FreeBSD+13.0-RELEASE+and+Ports.

For example, suppose that the devs-re.so, devs-em.so, and devs-ixgbe.so drivers are loaded. If you add a re PCI device, all probe functions are called (re_probe(), em_probe(), ixgbe_probe()). The return values indicate that devs-re.so is the best driver to use, so the re_attach() function is called. Although typically there is only one match, io-sock supports having multiple drivers that support the same hardware. For example, a system may have a generic driver that supports all PHY devices in addition to more specific drivers.
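The selection rule can be modeled in a few lines of ordinary C. This is only a userspace sketch, not io-sock code: pick_best_probe() is a hypothetical helper, and the BUS_PROBE_* values are stand-ins copied from FreeBSD's sys/bus.h on the assumption that io-sock uses the same convention (positive values are errors; among values less than or equal to zero, the one closest to zero wins).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Stand-in values modeled on FreeBSD's <sys/bus.h>; a driver would
 * include the real header instead of defining these. */
#define BUS_PROBE_SPECIFIC   0       /* only this driver can use the device */
#define BUS_PROBE_VENDOR     (-10)   /* vendor-supplied driver */
#define BUS_PROBE_DEFAULT    (-20)   /* normal return value */
#define BUS_PROBE_GENERIC    (-100)  /* generic fallback driver */

/*
 * Hypothetical helper: given the value returned by each driver's probe
 * function, return the index of the winning driver, or -1 if no driver
 * claimed the device. Ties go to the earliest driver in the array
 * (an assumption of this model).
 */
static int
pick_best_probe(const int *results, size_t count)
{
    int best = -1;

    assert(results != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (results[i] > 0)
            continue;               /* positive: probe failed (e.g., ENXIO) */
        if (best < 0 || results[i] > results[best])
            best = (int)i;
    }
    return best;
}
```

With the results { BUS_PROBE_GENERIC, ENXIO, BUS_PROBE_DEFAULT }, the third driver wins because -20 is closer to zero than -100, which mirrors how a specific driver takes precedence over a generic one.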

You implement the probe callback as part of driver module initialization (see Registering the driver module).

Attaching a device

The attach function (e.g., sample_attach()) is executed for each device that is detected. This callback implementation should contain everything needed to initialize the hardware, allocate resources, attach to the interrupt, initialize the PHY, and so on. You implement it as part of driver module initialization (see Registering the driver module).

For example, use sample_attach() to set the sc variable, which specifies the driver-specific software context of dev, using device_get_softc():

sample_attach(device_t dev) {
...
	sc = device_get_softc(dev);
...
In most cases, sc is a pointer to a driver-specific structure where all driver information is kept. The software context is automatically allocated and zeroed when the device is attached.

A node variable set by sample_attach() can provide a handle to a specific place within the DT file where hardware information related to that device is kept. For example:

...
    node = ofw_bus_get_node(dev);
...

Example of a DT file:

ethernet@4033c000 {                               
      compatible = "Hardware Descriptor string from DTB file";
      reg = <0x0 0x4033c000 0x0 0x2000 0x0 0x4007c004 0x0 0x4>;
      interrupt-parent = <0x1>;
      interrupts = <0x0 0x39 0x4>;
      interrupt-names = "macirq";
      tx-fifo-depth = <0x5000>;
      rx-fifo-depth = <0x5000>;
      clocks = <0x4 0x2e 0x4 0x2e 0x4 0x38>;
      clock-names = "stmmaceth", "pclk", "tx";
      pinctrl-names = "default";
      pinctrl-0 = <0x1c>;
      phy-mode = "rgmii";
      status = "okay";
};

The following code from the example sample_attach() reads a string value from a DT file:

/* OF_getprop_alloc() returns the property length, or -1 if the property is missing. */
if (OF_getprop_alloc(node, "phy-mode", (void **)&phy_mode) > 0) {
    if (strcmp(phy_mode, "rgmii") == 0) {
        sc->phy_mode = PHY_MODE_RGMII;
    }
    if (strcmp(phy_mode, "rmii") == 0) {
        sc->phy_mode = PHY_MODE_RMII;
    }
    OF_prop_free(phy_mode);
}
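The string-to-mode mapping can also be factored into a small function that has an explicit result for unrecognized values. The following is a hedged sketch that compiles on its own: the sample_phy_mode enum and sample_parse_phy_mode() are hypothetical names standing in for the driver's PHY_MODE_* constants, and the "unknown" case is an assumption about how a driver might want to treat other phy-mode strings.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for the driver's PHY_MODE_* constants. */
enum sample_phy_mode {
    SAMPLE_PHY_MODE_UNKNOWN = 0,
    SAMPLE_PHY_MODE_RGMII,
    SAMPLE_PHY_MODE_RMII
};

/* Map the value of the "phy-mode" property to a driver constant.
 * Only exact matches are recognized in this sketch. */
static enum sample_phy_mode
sample_parse_phy_mode(const char *value)
{
    assert(value != NULL);
    if (strcmp(value, "rgmii") == 0)
        return SAMPLE_PHY_MODE_RGMII;
    if (strcmp(value, "rmii") == 0)
        return SAMPLE_PHY_MODE_RMII;
    return SAMPLE_PHY_MODE_UNKNOWN;
}
```

Returning an explicit "unknown" value lets the attach code decide whether to fail or fall back to a default, instead of silently leaving the field at zero.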

Allocating resources

Use bus_alloc_resources() to allocate resources from a parent bus. For example:

...
    bus_alloc_resources(dev, sample_spec, sc->res);
...

The res argument is an array of struct resource pointers that receives the allocated resources, both the memory regions that the device uses and its interrupts. The array needs one element for each memory region and interrupt that the hardware uses (i.e., one for each entry in sample_spec, excluding the terminator). The sample_spec argument is an array in which each entry describes one resource to allocate (its type, resource ID, and flags). For example:

static struct resource_spec sample_spec[] = {
        { SYS_RES_MEMORY,       0,      RF_ACTIVE },
        { SYS_RES_IRQ,          0,      RF_ACTIVE },
        RESOURCE_SPEC_END
};

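After bus_alloc_resources() returns successfully, each element of the third argument (sc->res) holds a pointer to the allocated resource that corresponds to the entry at the same index in sample_spec. In this example, sc->res[0] is the memory region and sc->res[1] is the interrupt. The interrupt resource can be passed to bus_setup_intr().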
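Because the resource array is indexed in step with the specification array, its length follows from the number of specification entries. The sketch below models that relationship in userspace C. The model_* names and constants are hypothetical stand-ins patterned on FreeBSD's struct resource_spec, and the terminator with type -1 is an assumption taken from FreeBSD's RESOURCE_SPEC_END.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins patterned on FreeBSD's struct resource_spec. */
struct model_resource_spec {
    int type;    /* resource type (memory, IRQ, ...) */
    int rid;     /* resource ID within that type */
    int flags;   /* allocation flags */
};

#define MODEL_RES_IRQ     1
#define MODEL_RES_MEMORY  3
#define MODEL_RF_ACTIVE   0x0002
#define MODEL_SPEC_END    { -1, 0, 0 }   /* terminator entry */

static const struct model_resource_spec model_spec[] = {
    { MODEL_RES_MEMORY, 0, MODEL_RF_ACTIVE },
    { MODEL_RES_IRQ,    0, MODEL_RF_ACTIVE },
    MODEL_SPEC_END
};

/* Count the entries before the terminator; the resource array passed
 * alongside the spec needs at least this many elements. */
static size_t
model_spec_count(const struct model_resource_spec *spec)
{
    size_t n = 0;

    assert(spec != NULL);
    while (spec[n].type != -1)
        n++;
    return n;
}
```

For the two-entry specification shown here, the count is 2, so a driver following this layout would declare its resource array with two elements.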

Registering interrupts

Use bus_setup_intr() to perform interrupt registration within sample_attach(). The following example creates and attaches an interrupt handler to an interrupt that has been allocated by bus_alloc_resources():
bus_setup_intr(dev, sc->res[1], INTR_TYPE_NET | INTR_MPSAFE, NULL,
    sample_intr, sc, &sc->intr_cookie);
The interrupt handler does not run in the kernel interrupt context. Instead, a dedicated thread is created to run the handler.

The &sc->intr_cookie argument is a pointer to a void pointer that bus_setup_intr() fills in if it successfully establishes the interrupt. The driver passes this cookie to bus_teardown_intr() when it later removes the handler.

After the interrupt is fired, the driver should clear or mask the interrupt source before it returns from the interrupt handler function.

Specifying the memory for Direct Memory Access

Direct Memory Access (DMA) improves performance by transferring data without involving the CPU. A DMA transaction can transfer data between a device and memory, a device and another device, or memory and memory. You specify the memory for DMA transactions using the following three tasks:

Creating a memory tag

A DMA memory tag is a machine-dependent opaque type that describes the characteristics of DMA transactions. These tags are organized into a hierarchy. Because each child tag inherits the restrictions of its parent, all devices along the path of DMA transactions contribute to the constraints that apply. For example:
error = bus_dma_tag_create(
                bus_get_dma_tag(sc->dev),       /* Parent tag. */
                1, 0,                           /* alignment, boundary */
                BUS_SPACE_MAXADDR_32BIT,        /* lowaddr */
                BUS_SPACE_MAXADDR,              /* highaddr */
                NULL, NULL,                     /* filter, filterarg */
                MCLBYTES, TX_DMA_MFUF_CHUNK,    /* maxsize, nsegments */
                MCLBYTES,                       /* maxsegsize */
                0,                              /* flags */
                NULL, NULL,                     /* lockfunc, lockarg */
                &sc->txbuf_tag);
                

Allocating memory

Memory can be allocated as a single segment using bus_dmamem_alloc(). For example:
bus_dmamem_alloc(sc->txdesc_tag, (void **)&sc->txdesc_ring,
    BUS_DMA_COHERENT | BUS_DMA_WAITOK | BUS_DMA_ZERO, &sc->txdesc_map);
Then, the initial load operation is required to obtain the bus address of the allocated memory. For example:
bus_dmamap_load(sc->txdesc_tag, sc->txdesc_map, sc->txdesc_ring,
    TX_DESC_SIZE, sample_get1paddr, &sc->txdesc_ring_paddr, 0);
where sample_get1paddr is a pointer to a callback that returns the physical address of that segment.
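A callback of this kind is usually only a few lines long. The following sketch shows one plausible shape for sample_get1paddr, following the FreeBSD bus_dmamap_load() callback signature. So that it compiles outside a driver, it declares simplified stand-ins for bus_addr_t, bus_size_t, and bus_dma_segment_t; in a real driver those types come from the bus DMA headers, and the body shown here is an assumption rather than the actual sample driver code.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-ins; a driver gets these types from the bus DMA headers. */
typedef uint64_t bus_addr_t;
typedef uint64_t bus_size_t;
typedef struct {
    bus_addr_t ds_addr;   /* bus address of the segment */
    bus_size_t ds_len;    /* length of the segment in bytes */
} bus_dma_segment_t;

/*
 * Load callback: arg is the pointer that was passed to bus_dmamap_load()
 * (&sc->txdesc_ring_paddr in the call above). On success, store the bus
 * address of the single segment there; on error, leave it untouched.
 */
static void
sample_get1paddr(void *arg, bus_dma_segment_t *segs, int nsegs, int error)
{
    if (error != 0)
        return;
    assert(nsegs == 1);   /* the ring was allocated as one segment */
    *(bus_addr_t *)arg = segs[0].ds_addr;
}
```

Because the callback silently returns on error, the caller should initialize the destination (for example, to 0) before the load and check it afterward.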
Alternatively, you can use bus_dmamap_create(), which allocates and initializes a DMA map. For example:
for (idx = 0; idx < TX_MAP_BUFFER_LEN; idx++)
    bus_dmamap_create(sc->txbuf_tag, BUS_DMA_COHERENT, &sc->txbuf_map[idx].map);

Load the memory map of physical addresses

After either the hardware or the CPU writes to the memory they share, the driver must synchronize that memory by calling bus_dmamap_sync(). In the following example, bus_dmamap_sync() is called after the hardware has finished writing data to the receive buffer and before the driver returns the buffer to io-sock, so that io-sock can safely access that memory:
bus_dmamap_sync(sc->rxdesc_tag, sc->rxdesc_map, BUS_DMASYNC_POSTREAD);

BUS_DMASYNC_POSTREAD specifies synchronization after the device updates host memory and before the CPU accesses host memory. For a description of all the available memory synchronization operation specifiers, see https://www.freebsd.org/cgi/man.cgi?query=bus_dmamap_sync&sektion=9&manpath=FreeBSD+13.0-RELEASE+and+Ports.

DMA shutdown handlers

Any driver that uses DMA needs to implement a shutdown handler method that stops all DMA. It is called when io-sock exits (either terminating normally or crashing). Without a shutdown handler, the hardware continues to modify memory that was reserved for DMA even though the system now considers that memory freed. This situation can corrupt whatever the system later places at the same memory location.

This function should shut down DMA in the quickest and simplest way possible (e.g., by resetting the device) and ignore other resources (e.g., memory) because those resources are cleaned up automatically when the process terminates. A more complete shutdown is done with detach().

Initialize an Ethernet interface structure

An Ethernet interface structure needs to be initialized for each network interface. For example:
struct ifnet *ifp;
...
sc->ifp = ifp = if_alloc(IFT_ETHER);

ifp->if_softc = sc;
if_initname(ifp, device_get_name(dev), device_get_unit(dev));
ifp->if_flags = IFF_BROADCAST | IFF_SIMPLEX | IFF_MULTICAST;
ifp->if_capabilities = IFCAP_VLAN_MTU;
ifp->if_capenable = ifp->if_capabilities;
ifp->if_transmit = sample_transmit;
ifp->if_qflush = sample_qflush;
ifp->if_ioctl = sample_ioctl;
ifp->if_init = sample_init;
IFQ_SET_MAXLEN(&ifp->if_snd, TX_DESC_COUNT - 1);
ifp->if_snd.ifq_drv_maxlen = TX_DESC_COUNT - 1;
IFQ_SET_READY(&ifp->if_snd);
ifp->if_hdrlen = sizeof(struct ether_vlan_header);
ether_ifattach(ifp, macaddr);
where:
  • sample_transmit is a driver-defined asynchronous transmit callback. The callback receives a pointer to an mbuf structure that the driver should transmit (see Transmitting a packet).
  • sample_ioctl is a driver-defined I/O control handler. Possible I/O control codes are defined in sockio.h.
  • sample_qflush is a driver-defined synchronous callback that should flush packet queues.
  • sample_init is a driver-defined synchronous callback that should initialize and bring up the hardware (e.g., reset the chip and the watchdog timer, and enable the receiver unit). Because it is called on a per-interface basis, it receives the interface's if_softc pointer as a parameter.

Loading PHY-specific handling using media-independent interface (MII)

PHY-specific handling is implemented within a PHY-specific driver, and loaded with mii_attach(). For example:
mii_attach(dev, &sc->miibus, ifp, sample_media_change,
    sample_media_status, BMSR_DEFCAPMASK, phynum,
    MII_OFFSET_ANY, MIIF_FORCEANEG);
The driver has to provide implementations for PHY read, write, and status change callbacks as part of driver module initialization (see Registering the driver module). For example, in the device_method_t example provided above:
static device_method_t sample_methods[] = {
        DEVMETHOD(device_probe,         sample_probe),
        DEVMETHOD(device_attach,        sample_attach),
        DEVMETHOD(device_detach,        sample_detach),
        DEVMETHOD(device_shutdown,      sample_shutdown),

        /* MII Interface */
        DEVMETHOD(miibus_readreg,       sample_miibus_read_reg),
        DEVMETHOD(miibus_writereg,      sample_miibus_write_reg),
        DEVMETHOD(miibus_statchg,       sample_miibus_statchg),

        DEVMETHOD_END
};

Receiving a packet

Packet reception starts with an interrupt from the hardware. The driver then drains the filled receive buffers from the hardware, passes new empty buffers to the hardware, and passes the received packets to io-sock. For example, the driver hands each received mbuf (m) to io-sock through the interface's input routine:

(*ifp->if_input)(ifp, m);

Transmitting a packet

The transmit callback is called by io-sock and is registered early in the attach phase. In addition, a pointer to a transmit queue flushing function should be provided to io-sock. For example (from sample_attach()):
ifp->if_transmit = sample_transmit;
ifp->if_qflush = sample_qflush;

The transmit callback should be implemented asynchronously. Generally speaking, the driver first needs to determine whether the hardware resources needed to transmit a packet (descriptors, buffers, etc.) are available. If the hardware has run out of transmit resources, the driver should return from the transmit function without blocking and transmit the packet later, when those resources become available.
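The resource check can be illustrated with a userspace model of a transmit descriptor ring. Everything here is hypothetical (the model_* names, the ring size of 8, and the head/tail bookkeeping); real hardware defines its own descriptor layout and ownership rules. The sketch only shows the non-blocking pattern: reserve descriptors if enough are free, otherwise report failure so the caller can keep the packet and retry after a transmit-complete event.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_TX_DESC_COUNT 8   /* hypothetical ring size */

struct model_tx_ring {
    unsigned head;   /* next descriptor the driver fills */
    unsigned tail;   /* next descriptor the hardware completes */
};

/* Number of descriptors the driver may still fill. One slot always
 * stays empty so that head == tail unambiguously means "ring empty". */
static unsigned
model_tx_free(const struct model_tx_ring *r)
{
    return (r->tail + MODEL_TX_DESC_COUNT - r->head - 1) % MODEL_TX_DESC_COUNT;
}

/* Try to claim nsegs descriptors for one packet. Returns false without
 * blocking if the ring is too full; the caller keeps the packet queued. */
static bool
model_tx_reserve(struct model_tx_ring *r, unsigned nsegs)
{
    assert(nsegs > 0 && nsegs < MODEL_TX_DESC_COUNT);
    if (model_tx_free(r) < nsegs)
        return false;
    r->head = (r->head + nsegs) % MODEL_TX_DESC_COUNT;
    return true;
}

/* Transmit-complete path: the hardware has finished n descriptors,
 * so they become available for new packets. */
static void
model_tx_complete(struct model_tx_ring *r, unsigned n)
{
    r->tail = (r->tail + n) % MODEL_TX_DESC_COUNT;
}
```

In this model, at most MODEL_TX_DESC_COUNT - 1 descriptors are ever in flight, which is one common reason for sizing a software send queue one entry below the descriptor count.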

Note: The if_start() function pointer is deprecated and should be set to NULL (the default).

Working with mbuf chains

An mbuf (short for memory buffer) is a basic unit of memory management that is used to store network packets and socket buffers. A network packet may span multiple mbufs arranged into an mbuf chain (linked list), which allows adding or trimming network headers with little overhead.

To avoid compatibility issues with future versions, QNX recommends that you don't modify mbuf internals when you develop an io-sock driver. However, it is useful to understand the general structure of an mbuf, which is defined in include/devs/sys/mbuf.h.

The following example allocates an mbuf:
struct mbuf *m;

m = m_getcl(M_NOWAIT, MT_DATA, M_PKTHDR);
(For flag definitions, see https://www.freebsd.org/cgi/man.cgi?query=mbuf&sektion=9&manpath=FreeBSD+13.0-RELEASE+and+Ports or mbuf.h.)

The bus_dmamap_load_mbuf_sg() function allows you to map mbuf chains for DMA transfers. In the following example, an mbuf chain (m0) is received in the transmit callback and mapped into a DMA memory transaction for the hardware:

bus_dmamap_load_mbuf_sg(sc->txbuf_tag,
                sc->txbuf_map[sc->txbuf_idx_head].map, m0, seg, &nsegs, 0);

The seg argument specifies a scatter/gather segment array that the caller provides and the function fills in. The nsegs argument is returned with the number of segments filled in.

If the bus_dmamap_load_mbuf_sg() call above fails, you can “collapse” (i.e., defragment) the mbuf chain into a smaller number of segments and try again. For example:

/* Returns the new head of the chain, or NULL on failure (m0 is then unchanged). */
m = m_collapse(m0, M_NOWAIT, TX_DMA_MFUF_CHUNK);

If the hardware does not support scatter/gather addressing, you can “collapse” the mbuf chain to a contiguous buffer. This method is slower. For example:

/* Returns the new head of the chain, or NULL on failure (m0 is then unchanged). */
m = m_defrag(m0, M_NOWAIT);
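The choice between mapping directly, collapsing, and defragmenting can be summarized with a small userspace model. This sketch does not use the real mbuf structure: struct model_buf and the model_* functions are hypothetical, and it assumes one DMA segment per non-empty buffer, which is a simplification. A real driver typically does not count segments in advance; it attempts bus_dmamap_load_mbuf_sg() and falls back only when the load fails.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for an mbuf; not the real layout. */
struct model_buf {
    struct model_buf *next;   /* next buffer in the chain, or NULL */
    size_t            len;    /* bytes of packet data in this buffer */
};

/* Count segments, assuming one DMA segment per non-empty buffer. */
static int
model_chain_segments(const struct model_buf *m)
{
    int n = 0;

    for (; m != NULL; m = m->next)
        if (m->len > 0)
            n++;
    return n;
}

enum model_tx_action {
    MODEL_MAP_DIRECT,       /* chain already fits: map it as is */
    MODEL_COLLAPSE_FIRST,   /* too many segments: reduce the count first */
    MODEL_DEFRAG_FIRST      /* no scatter/gather: repack the data first */
};

/* Decide how to prepare a chain for hardware that accepts at most
 * max_segs segments per packet (max_segs == 1: no scatter/gather). */
static enum model_tx_action
model_tx_plan(const struct model_buf *m, int max_segs)
{
    int nsegs = model_chain_segments(m);

    assert(max_segs >= 1);
    if (nsegs <= max_segs)
        return MODEL_MAP_DIRECT;
    return (max_segs == 1) ? MODEL_DEFRAG_FIRST : MODEL_COLLAPSE_FIRST;
}
```

The model keeps the cheaper path first: a chain that already fits is mapped without copying, and the slower repacking step is reserved for hardware without scatter/gather support.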

Modifying the makefile

Add include devs/devs.mk to the end of common.mk, as shown in the following example common.mk:

ifndef QCONFIG
QCONFIG=qconfig.mk
endif
include $(QCONFIG)

define PINFO
PINFO DESCRIPTION=Sample io-sock driver
endef

include devs/devs.mk

Loading an io-sock driver

After you locate a *.dtb file for your platform, there are two ways you can then load the driver:

For more information, see “Starting io-sock and driver management.”