Programming Overview

Process model
Processes and threads
Priorities and scheduling
Scheduling policies
Why threads?
Talking to hardware

Process model

The Neutrino OS architecture consists of the microkernel and some number of cooperating processes. These processes communicate with each other via various forms of interprocess communication (IPC). Message passing is the primary form of IPC in Neutrino.

Figure showing software bus with IPC

The Neutrino architecture acts as a kind of “software bus” that lets you dynamically plug in/out OS modules.

The above diagram shows the graphics driver sending a message to the font manager when it wants the bitmap for a font. The font manager responds with the bitmap.

The Photon microGUI windowing system is also made up of a number of cooperating processes: the GUI manager (Photon), a font manager (phfontFA), the graphics driver manager (io-graphics), and others. If the graphics driver needs to draw some text, it sends a message to the font manager asking for bitmaps in the desired font for the text to be drawn in. The font manager responds with the requested bitmaps, and the graphics driver then draws the bitmaps on the screen.

An application as a set of processes

This idea of using a set of cooperating processes isn't limited to the OS “system processes.” Your applications should be written in exactly the same way. You might have some driver process that gathers data from some hardware and then needs to pass that data on to other processes, which then act on that data.

Let's use the example of an application that's monitoring the level of water in a reservoir. Should the water level rise too high, then you'll want to alert an operator as well as open some flow-control valve.

In terms of hardware, you'll have some water-level sensor tied to an I/O board in a computer. If the sensor detects some water, it will cause the I/O board to generate an interrupt.

The software consists of a driver process that talks to the I/O board and contains an interrupt handler to deal with the board's interrupt. You'll also have a GUI process that will display an alarm window when told to do so by the driver, and finally, another driver process that will open/close the flow-control valve.

Why break this application into multiple processes? Why not have everything done in one process? There are several reasons:

Each process lives in its own protected memory space. If there's a bug such that a pointer has a value that isn't valid for the process, then when the pointer is next used, the hardware will generate a fault, which the kernel handles (the kernel will set the SIGSEGV signal on the process).
This approach has two benefits. The first is that a stray pointer won't cause one process to overwrite the memory of another process. The implications are that one process can go bad while other processes keep running.
The second benefit is that the fault will occur precisely when the pointer is used, not when it's overwriting some other process's memory. If a pointer were allowed to overwrite another process's memory, then the problem wouldn't manifest itself until later and would therefore be much harder to debug.
It's very easy to add or remove processes from an application as need be. This implies that applications can be made scalable — adding new features is simply a matter of adding processes.
Processes can be started and stopped on the fly, which comes in handy for dynamic upgrading or simply for stopping an offending process.
Processing can be easily distributed across multiple processors in a networked environment.
The code for a process is much simpler if it concentrates on doing a single job. For example, a single process that acts as a driver, a GUI front-end, and a data logger would be fairly complex to build and maintain. This complexity would increase the chances of a bug, and any such bug would likely affect all the activities being done by the process.
Different programmers can work on different processes without fear of overwriting each other's work.

Processes and threads

Different operating systems often have different meanings for terms such as “process,” “thread,” “task,” “program,” and so on.

Some definitions

In the Neutrino OS, an application typically means a collection of processes, though sometimes, especially for UI pieces, it can mean just one process. A program is the file generated as the result of a compile and link operation; when you run a program on a QNX Neutrino target, this creates a process; a process is, basically, a particular instance of a running program.

A thread is a single flow of execution or control. At the lowest level, this equates to the program counter or instruction pointer register advancing through some machine instructions. Each thread has its own current value for this register.

A process is a collection of resources shared by one or more threads. These resources include at least the following:

all code and data in the process, including all nonlocal (nonstack) variables
signal handlers (although you typically have one thread that handles signals, and you block them in all the other threads)

It isn't safe to use floating-point operations in signal handlers.
signal ignore state
file descriptors (fds), and their relatives, such as sockets, mqueue descriptors, etc.
channels
side-channel connections

Along with ownership of these resources goes cleanup. When a process terminates, all process-owned resources are cleaned up, including terminating all threads, releasing all memory, closing all file descriptors, etc. This happens for both normal termination (e.g., calling exit() ) and abnormal termination (e.g., dying due to accessing invalid memory).

Threads don't share such things as stack, values for the various registers, SMP thread-affinity mask, and a few other things.

Two threads residing in two different processes don't share very much. About the only thing they do share is the CPU, and maybe not even that if they're running on a multicore processor. Threads in different processes can share memory, but this takes a little setup (see shm_open() in the Library Reference for an example).

When you run a program (creating a process), you're automatically running a thread. This thread is called the “main” thread, since the first programmer-provided function that runs in a C program is main(). The main thread can then create additional threads if need be.

A few things are special about the main thread. One is that if it returns normally, the code it returns to calls exit(). Calling exit() terminates the process, meaning that all threads in the process are terminated. So when you return normally from the main thread, the process is terminated. When other threads in the process return normally, the code they return to calls pthread_exit(), which terminates just that thread.

Another special thing about the main thread is that if it terminates in such a manner that the process is still around (e.g. it calls pthread_exit() and there are other threads in the process), then the memory for the main thread's stack is not freed up. This is because the command-line arguments are on that stack and other threads may need them. If any other thread terminates, then that thread's stack is freed.

Priorities and scheduling

Although there's a good discussion of priorities and scheduling policies in the System Architecture manual (see “Thread scheduling” in the chapter on the microkernel), it will help to go over that topic here in the context of a programmer's guide.

Neutrino provides a priority-driven preemptive architecture. Priority-driven means that each thread can be given a priority and will be able to access the CPU based on that priority. If a low-priority thread and a high-priority thread both want to run, then the high-priority thread will be the one that gets to run.

Preemptive means that if a low-priority thread is currently running and then a high-priority thread suddenly wants to run, then the high-priority thread will take over the CPU and run, thereby preempting the low-priority thread.

On a multicore (SMP) system, QNX Neutrino still runs the highest-priority runnable thread on one of the available cores. Additional cores run other threads in the system, although not necessarily the next highest-priority thread or threads.

Priority range

Threads can have a scheduling priority ranging from 1 to 255 (the highest priority), independent of the scheduling policy. A thread inherits the priority of its parent thread by default. Non-root threads can have a priority ranging from 1 to 63 (by default); root threads (i.e. those with an effective uid of 0) are allowed to set priorities above 63.

The process manager has a set of special idle threads (one per available CPU core) that have priority 0 and are always ready to run. A CPU core is considered to be idle when the idle thread is scheduled to run on that core. At any instant, a core is either idle or busy; only by averaging over time can a CPU be said to be some percentage busy (e.g., 75% CPU usage).

A thread has both a real priority and an effective priority, and is scheduled in accordance with its effective priority. The thread itself can change both its real and effective priority together, but the effective priority may change because of priority inheritance or the scheduling policy. Normally, the effective priority is the same as the real priority.

Interrupt handlers are of higher priority than any thread, but they're not scheduled in the same way as threads. If an interrupt occurs, then:

Whatever thread was running loses the CPU on the core that's handling the interrupt. On multi-core machines this may vary due to hardware configuration and BSP setup.
The kernel runs some hardware-specific code supplied by the BSP to identify the interrupt.
The kernel calls the appropriate interrupt handler or handlers.
The kernel runs some hardware-specific code supplied by the BSP to do end-of-interrupt processing.

Figure showing thread priorities

Thread priorities range from 0 (lowest) to 255 (highest).

Although interrupt handlers aren't scheduled in the same way as threads, they're considered to be of a higher priority because an interrupt handler will preempt any running thread.

BLOCKED and READY states

To fully understand how scheduling works, you must first understand what it means when we say a thread is BLOCKED and when a thread is in the READY state. You must also understand a particular data structure in the kernel called the ready queue.

A thread is BLOCKED if it doesn't want the CPU, which might happen for several reasons, such as:

The thread is sleeping.
The thread is waiting for a message from another thread.
The thread is waiting on a mutex that some other thread owns.

When designing an application, you always try to arrange it so that if any thread is waiting for something, make sure it isn't spinning in a loop using up the CPU. In general, try to avoid polling. If you do have to poll, then you should try to sleep for some period between polls, thereby giving lower-priority threads the CPU should they want it.

For each type of blocking there is a blocking state. We'll discuss these states briefly as they come up. Examples of some blocking states are REPLY-blocked, RECEIVE-blocked, MUTEX-blocked, INTERRUPT-blocked, and NANOSLEEP-blocked. There's also a STOPPED state, when a thread has been suspended by a SIGSTOP signal and is waiting for a SIGCONT.

A thread is READY if it wants a CPU but something else currently has it. If a thread currently has a CPU, then it's in the RUNNING state. Simply put, a thread that's either READY or RUNNING isn't blocked.

The ready queue

The ready queue is a simplified version of a kernel data structure consisting of a queue with one entry per priority. Each entry in turn consists of another queue of the threads that are READY at the priority. Any threads that aren't READY aren't in any of the queues — but they will be when they become READY.

Let's first consider the ready queue on a single-core system.

Figure showing READY processes

The ready queue for five threads on a single-core system.

In the above diagram, threads B–F are READY. Thread A is currently running. All other threads (G–Z) are BLOCKED. Threads A, B, and C are at the highest priority, so they'll share the processor based on the running thread's scheduling policy.

The active thread is the one in the RUNNING state.

Every thread is assigned a priority. The scheduler selects the next thread to run by looking at the priority assigned to every thread in the READY state (i.e. capable of using the CPU). The thread with the highest priority that's at the head of its priority's queue is selected to run. In the above diagram, thread A was formerly at the head of priority 10's queue, so thread A was moved to the RUNNING state.

On a multicore system, this becomes a lot more complex, with issues such as core-affinity optimizations and CPU masks for various threads, making the scheduling decisions more complicated. But the ready queue concept carries over as the primary driver of scheduling decisions for multicore systems as well.

Rescheduling a running thread

The microkernel makes scheduling decisions whenever it's is entered as the result of a kernel call, exception, or hardware interrupt. A scheduling decision is made whenever the execution state of any thread changes — it doesn't matter which processes the threads might reside within. Threads are scheduled globally across all processes.

Normally, the running thread continues to run, but the scheduler performs a context switch from one thread to another whenever the running thread:

Blocks

The running thread blocks when it must wait for some event to occur (response to an IPC request, wait on a mutex, etc.). The blocked thread is removed from the running array, and another thread is selected to run:

In the single-core (simple) case, the highest-priority ready thread that's at the head of its priority's queue is chosen and allowed to run.
In a multicore system, the scheduler has some flexibility in deciding exactly how to schedule the other threads, with an eye towards optimizing cache usage and minimizing thread migration. This could mean that some processors will be running lower-priority threads while a higher-priority thread is waiting to run on the processor it last ran on. The next time a processor that's running a lower-priority thread makes a scheduling decision, it will choose the higher-priority one.
In any case, the realtime scheduling rules that were in place on a uniprocessor system are guaranteed to be upheld on a multicore system.

When the blocked thread is subsequently unblocked, it's usually placed on the end of the ready queue for its priority level.

Is preempted

The running thread is preempted when a higher-priority thread is placed on the ready queue (it becomes READY as the result of its block condition being resolved). The preempted thread is moved to the start of the ready queue for that priority, and the higher-priority thread runs. When it's time for a thread at that priority level to run again, that thread resumes execution — a preempted thread will not lose its place in the queue for its priority level.

Yields

The running thread voluntarily yields the processor (e.g, via sched_yield()) and is placed on the end of the ready queue for that priority. The highest-priority thread then runs (which may still be the thread that just yielded).

Scheduling policies

To meet the needs of various applications, Neutrino provides these scheduling policies:

FIFO scheduling — SCHED_FIFO
Round-robin scheduling — SCHED_RR
Sporadic scheduling — SCHED_SPORADIC

Another scheduling policy (called “other” — SCHED_OTHER) behaves in the same way as round-robin. We don't recommend using the “other” scheduling policy, because its behavior may change in the future.

Each thread in the system may run using any method. Scheduling methods are effective on a per-thread basis, not on a global basis for all threads and processes on a node.

Remember that these scheduling policies apply only when two or more threads that share the same priority are READY (i.e. the threads are directly competing with each other). If a higher-priority thread becomes READY, it immediately preempts all lower-priority threads.

In the following diagram, three threads of equal priority are READY. If Thread A blocks, Thread B will run.

Figure showing scheduling of equal-priority threads

Thread A blocks; Thread B runs.

A thread can call pthread_attr_setschedparam() or pthread_attr_setschedpolicy() to set the scheduling parameters and policy to use for any threads that it creates.

Although a thread gets its scheduling policy from its parent thread (usually by inheritance), the thread can call pthread_setschedparam() to request to change the algorithm and priority applied by the kernel, or pthread_setschedprio() to change just the priority. A thread can get information about its current algorithm and policy by calling pthread_getschedparam(). Both these functions take a thread ID as their first argument; you can call pthread_self() to get the calling thread's ID. For example:

struct sched_param param;
int policy, retcode;

/* Get the scheduling parameters. */

retcode = pthread_getschedparam( pthread_self(), &policy, &param);
if (retcode != EOK) {
    printf ("pthread_getschedparam: %s.\n", strerror (retcode));
    return EXIT_FAILURE;
}

printf ("The assigned priority is %d, and the current priority is %d.\n",
        param.sched_priority, param.sched_curpriority);

/* Increase the priority. */

param.sched_priority++;

/* Set the scheduling algorithm to FIFO */

policy = SCHED_FIFO;

retcode = pthread_setschedparam( pthread_self(), policy, &param);
if (retcode != EOK) {
    printf ("pthread_setschedparam: %s.\n", strerror (retcode));
    return EXIT_FAILURE;
}

When you get the scheduling parameters, the sched_priority member of the sched_param structure is set to the assigned priority, and the sched_curpriority member is set to the priority that the thread is currently running at (which could be different because of priority inheritance).

Our libraries provide a number of ways to get and set scheduling parameters:

pthread_getschedparam(), pthread_setschedparam(), pthread_setschedprio()

These are your best choice for portability.

SchedGet(), SchedSet()

You can use these to get and set the scheduling priority and policy, but they aren't portable because they're kernel calls.

sched_getparam(), sched_setparam(), sched_getscheduler(), and sched_setscheduler(),

These functions are intended for use in single-threaded processes.

Our implementations of these functions don't conform completely to POSIX. In multi-threaded applications, they get or set the parameters for thread 1 in the process pid, or for the calling thread if pid is 0. If you depend on this behavior, your code won't be portable. POSIX 1003.1 says these functions should return -1 and set errno to EPERM in a multi-threaded application.

getprio(), setprio()

QNX Neutrino supports these functions only for compatibility with QNX 4 programs; don't use them in new programs.

getpriority(), setpriority()

Deprecated; don't use these functions.

FIFO scheduling

In FIFO (SCHED_FIFO) scheduling, a thread selected to run continues executing until it:

voluntarily relinquishes control (e.g. it blocks)
is preempted by a higher-priority thread

Figure showing FIFO scheduling

FIFO scheduling. Thread A runs until it blocks.

Round-robin scheduling

In round-robin (SCHED_RR) scheduling, a thread selected to run continues executing until it:

voluntarily relinquishes control
is preempted by a higher-priority thread
consumes its timeslice

Figure showing round-robin scheduling

Round-robin scheduling. Thread A ran until it consumed its timeslice; the next READY thread (Thread B) now runs.

A timeslice is the unit of time assigned to every process. Once it consumes its timeslice, a thread is put at the end of its queue in the ready queue and the next READY thread at the same priority level is given control.

A timeslice is calculated as:

4 × ticksize

If your processor speed is greater than 40 MHz, then the ticksize defaults to 1 millisecond; otherwise, it defaults to 10 milliseconds. So, the default timeslice is either 4 milliseconds (the default for most CPUs) or 40 milliseconds (the default for slower hardware).

Apart from time-slicing, the round-robin scheduling method is identical to FIFO scheduling.

Sporadic scheduling

The sporadic (SCHED_SPORADIC) scheduling policy is generally used to provide a capped limit on the execution time of a thread within a given period of time. This behavior is essential when Rate Monotonic Analysis (RMA) is being performed on a system that services both periodic and aperiodic events. Essentially, this algorithm allows a thread to service aperiodic events without jeopardizing the hard deadlines of other threads or processes in the system.

Under sporadic scheduling, a thread's priority can oscillate dynamically between a foreground or normal priority and a background or low priority. For more information, see “Sporadic scheduling” in the QNX Neutrino Microkernel chapter of the System Architecture guide.

Why threads?

Now that we know more about priorities, we can talk about why you might want to use threads. We saw many good reasons for breaking things up into separate processes, but what's the purpose of a multithreaded process?

The basic purpose of having multiple threads is so you can start something new before finishing something old, that is, something that's already in process.

Let's take the example of a driver. A driver typically has two obligations: one is to talk to the hardware and the other is to talk to other processes. Generally, talking to the hardware is more time-critical than talking to other processes. When an interrupt comes in from the hardware, it needs to be serviced in a relatively small window of time — the driver shouldn't be busy at that moment talking to another process.

One way of fixing this problem is to choose a way of talking to other processes where this situation simply won't arise (e.g. don't send messages to another process such that you have to wait for acknowledgment, don't do any time-consuming processing on behalf of other processes, etc.).

Another way is to use two threads: a higher-priority thread that deals with the hardware and a lower-priority thread that talks to other processes. The lower-priority thread can be talking away to other processes without affecting the time-critical job at all, because when the interrupt occurs, the higher-priority thread will preempt the lower-priority thread and then handle the interrupt.

Although this approach does add the complication of controlling access to any common data structures between the two threads, Neutrino provides synchronization tools such as mutexes (mutual exclusion locks), which can ensure exclusive access to any data shared between threads.

Talking to hardware

As mentioned earlier in this chapter, writing device drivers is like writing any other program. Only core OS services reside in kernel address space; everything else, including device drivers, resides in process or user address space. This means that a device driver has all the services that are available to regular applications.

Many models are available to driver developers under QNX Neutrino. Generally, the type of driver you're writing will determine the driver model you'll follow. For example, graphics drivers follow one particular model, which allows them to plug into the Screen graphics subsystem, network drivers follow a different model, and so on.

On the other hand, depending on the type of device you're targeting, it may not make sense to follow any existing driver model at all.

This section provides an overview of accessing and controlling device-level hardware in general.

Probing the hardware

If you're targeting a closed embedded system with a fixed set of hardware, your driver may be able to assume that the hardware it's going to control is present in the system and is configured in a certain way. But if you're targeting more generic systems, you want to first determine whether the device is present. Then you need to figure out how the device is configured (e.g., what memory ranges and interrupt level belong to the device).

For some devices, there's a standard mechanism for determining configuration. Devices that interface to the PCI bus have such a mechanism; each PCI device has a unique vendor and device ID assigned to it. The following piece of code demonstrates how, for a given PCI device, to determine whether the device is present in the system and what resources have been assigned to it:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <hw/pci.h>

int main() {

   struct pci_dev_info info;
   void *hdl;
   int i;

   memset(&info, 0, sizeof (info));

   if (pci_attach(0) < 0) {
      perror("pci_attach");
      exit(EXIT_FAILURE);
   }

   /*
    * Fill in the Vendor and Device ID for a 3dfx VooDoo3
    * graphics adapter.
    */

   info.VendorId = 0x121a;
   info.DeviceId = 5;

   if ((hdl = pci_attach_device(0, PCI_SHARE|PCI_INIT_ALL, 0, &info)) == 0) {
      perror("pci_attach_device");
      exit(EXIT_FAILURE);
   }

   for (i = 0; i < 6; i++) {
      if (info.BaseAddressSize[i] > 0)
         printf("Aperture %d: "
                "Base 0x%llx Length %d bytes Type %s\n", i,
                PCI_IS_MEM(info.CpuBaseAddress[i]) ? PCI_MEM_ADDR(info.CpuBaseAddress[i]) :
                   PCI_IO_ADDR(info.CpuBaseAddress[i]),
                info.BaseAddressSize[i],
                PCI_IS_MEM(info.CpuBaseAddress[i]) ? "MEM" : "IO");
   }

   printf("IRQ 0x%x\n", info.Irq);
   pci_detach_device(hdl);

   return EXIT_SUCCESS;
}

For more information, see the pci*() functions in the C Library Reference.

Different buses have different mechanisms for determining which resources have been assigned to the device. On some buses, such as the ISA bus, there's no such mechanism. How do you determine whether an ISA device is present in the system and how it's configured? The answer is card-dependent (with the exception of PnP ISA devices).

Accessing the hardware

Once you've determined what resources have been assigned to the device, you're now ready to start communicating with the hardware. How you do this depends on the resources.

I/O resources

Before a thread may attempt any port I/O operations, it must be running at the correct privilege level; otherwise you'll get a protection fault. To get I/O privileges, call ThreadCtl():

ThreadCtl(_NTO_TCTL_IO, 0);

Next you need to map the I/O base address (one of the addresses returned in the CpuBaseAddress array of the pci_dev_info structure above) into your process's address space, using mmap_device_io(). For example:

uintptr_t iobase;

iobase = mmap_device_io(info.BaseAddressSize[2], info.CpuBaseAddress[2]);

Now you may perform port I/O, using functions such as in8(), in32(), out8(), and so on, adding the register index to iobase to address a specific register:

out32(iobase + SHUTDOWN_REGISTER, 0xdeadbeef);

Note that the call to mmap_device_io() isn't necessary on x86 systems, but it's still a good idea to include it for the sake of portability. In the case of some legacy x86 hardware, it may not make sense to call it; for example, a VGA-compatible device has I/O ports at well-known, fixed locations (e.g., 0x3c0, 0x3d4, 0x3d5) with no concept of an I/O base as such. You could access the VGA controller, for example, as follows:

out8(0x3d4, 0x11);
out8(0x3d5, in8(0x3d5) & ~0x80);

Memory-mapped resources

For some devices, registers are accessed via regular memory operations. To gain access to a device's registers, you need to map them to a pointer in the driver's virtual address space by calling mmap_device_memory(). For example:

volatile uint32_t *regbase; /* device has 32-bit registers */

regbase = mmap_device_memory(NULL, info.BaseAddressSize[0],
             PROT_READ|PROT_WRITE|PROT_NOCACHE, 0, info.CpuBaseAddress[0]);

Note the following:

We declared regbase with the volatile keyword to prevent the compiler from optimizing out accesses to the device's registers.
We specified the PROT_NOCACHE flag to ensure that the CPU won't defer or omit read/write cycles to the device's registers, nor will it deliver the reads or writes to the registers in a different order than the driver issued them.

Now you may access the device's memory using the regbase pointer. For example:

regbase[SHUTDOWN_REGISTER] = 0xdeadbeef;

IRQs

You can attach an interrupt handler to the device by calling either InterruptAttach() or InterruptAttachEvent(). For example:

InterruptAttach(_NTO_INTR_CLASS_EXTERNAL | info.Irq,
                handler, NULL, 0, _NTO_INTR_FLAGS_END);

The driver must successfully call ThreadCtl(_NTO_TCTL_IO, 0) before it can attach an interrupt.

The essential difference between InterruptAttach() and InterruptAttachEvent() is the way in which the driver is notified that the device has triggered an interrupt:

With InterruptAttach(), the driver's handler function is called directly by the kernel. Since it's running in kernel space, the handler is severely restricted in what it can do:
- From within this handler, it isn't safe to call most of the C library functions.
- If you spend too much time in the handler, other processes and interrupt handlers of a lower or equal priority won't be able to run.
- Doing too much work in the interrupt handler can negatively affect the system's realtime responsiveness.
We recommend that you do the bare minimum within the handler (e.g., acknowledge the interrupt at the hardware level) and then deliver an event to the driver. The driver then completes the rest of the work at process level at the driver's normal priority.
With InterruptAttachEvent(), you specify an event to be delivered to the driver when the device triggers an interrupt. The driver then handles the interrupt at the process level.

For more information, see the “Writing an Interrupt Handler” chapter in this guide.

Summary

The modular architecture is apparent throughout the entire system: the Neutrino OS itself consists of a set of cooperating processes, as does an application. And each individual process can comprise several cooperating threads. What “keeps everything together” is the priority-based preemptive scheduling in Neutrino, which ensures that time-critical tasks are dealt with by the right thread or process at the right time.