Thread affinity

One issue that often arises in a multicore environment can be put like this: "Can I make it so that one processor handles the GUI, another handles the database, and the other two handle the realtime functions?"

The answer is: "Yes, absolutely."

This is done through the magic of thread affinity, the ability to associate certain programs (or even threads within programs) with a particular processor or processors.

Thread affinity works like this. When a thread starts up, its affinity mask (or runmask) is set by default to allow it to run on all processors. In other words, a new thread doesn't inherit its creator's runmask unless you arrange for it to, so it's up to the thread to use ThreadCtl() with the _NTO_TCTL_RUNMASK control flag to set its own runmask:

if (ThreadCtl( _NTO_TCTL_RUNMASK, (void *)my_runmask) == -1) {
    /* An error occurred. */
}

The runmask is simply a bitmap; each bit position indicates a particular processor. For example, the runmask 0x05 (binary 00000101) allows the thread to run on processors 0 (the 0x01 bit) and 2 (the 0x04 bit).
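As a minimal sketch of the bit arithmetic described above, the following portable helper builds a single-word runmask from a list of CPU numbers. The name runmask_from_cpus() is hypothetical and isn't part of any QNX header; it only illustrates how bit positions map to processors.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper (not a QNX API): build a one-word runmask from
 * a list of zero-based CPU numbers. CPU numbers that don't fit in an
 * unsigned int are skipped. */
unsigned runmask_from_cpus(const unsigned *cpus, size_t count)
{
    const unsigned word_bits = (unsigned)(sizeof(unsigned) * CHAR_BIT);
    unsigned mask = 0;
    size_t i;

    for (i = 0; i < count; i++) {
        if (cpus[i] < word_bits) {
            mask |= 1u << cpus[i];   /* bit n stands for CPU n */
        }
    }
    return mask;
}
```

Passing the CPU numbers 0 and 2 yields 0x05, the value used in the example above. The result could then be supplied as my_runmask in the ThreadCtl() call shown earlier.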

Note: If you use _NTO_TCTL_RUNMASK, the runmask is limited to the size of an int (currently 32 bits). Threads created by the calling thread don't inherit the specified runmask.

If you want to support more processors than will fit in an int, or you want to set the inherit mask, you'll need to use the _NTO_TCTL_RUNMASK_GET_AND_SET_INHERIT command described below.

The <sys/neutrino.h> file defines some macros that you can use to work with a runmask:

RMSK_SET(cpu, p)
Set the bit for cpu in the mask pointed to by p.
RMSK_CLR(cpu, p)
Clear the bit for cpu in the mask pointed to by p.
RMSK_ISSET(cpu, p)
Determine if the bit for cpu is set in the mask pointed to by p.

The CPUs are numbered from 0. These macros work with runmasks of any length.
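To illustrate what operations of this kind do on a mask longer than one word, here is a portable sketch. It assumes that CPU n is stored in bit (n modulo W) of word (n divided by W), where W is the number of bits in an unsigned; that layout is an assumption made for illustration, and mask_set(), mask_clr(), and mask_isset() are hypothetical stand-ins. On a QNX system, use the RMSK_*() macros from <sys/neutrino.h> instead.

```c
#include <limits.h>

/* Number of bits in one mask word. */
#define WORD_BITS ((unsigned)(sizeof(unsigned) * CHAR_BIT))

/* Hypothetical stand-ins for the set, clear, and test operations.
 * The caller must make sure that p points to enough words to hold
 * the bit for the given CPU. */
void mask_set(unsigned cpu, unsigned *p)
{
    p[cpu / WORD_BITS] |= 1u << (cpu % WORD_BITS);
}

void mask_clr(unsigned cpu, unsigned *p)
{
    p[cpu / WORD_BITS] &= ~(1u << (cpu % WORD_BITS));
}

int mask_isset(unsigned cpu, const unsigned *p)
{
    return (int)((p[cpu / WORD_BITS] >> (cpu % WORD_BITS)) & 1u);
}
```

With a two-word mask that starts out zeroed, setting CPUs 0 and 2 leaves 0x05 in the first word, which matches the single-word example given earlier.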

Bound multiprocessing (BMP) is a variation on SMP that lets you specify which processors a process or thread and its children can run on. To specify this, you use an inherit mask.

To set a thread's inherit mask, you use ThreadCtl() with the _NTO_TCTL_RUNMASK_GET_AND_SET_INHERIT control flag. Conceptually, the structure that you pass with this command is as follows:

struct _thread_runmask {
    int size;
    unsigned runmask[size];
    unsigned inherit_mask[size];
};

If you set the runmask member to a nonzero value, ThreadCtl() sets the runmask of the calling thread to the specified value. If you set the runmask member to zero, the runmask of the calling thread isn't altered.

If you set the inherit_mask member to a nonzero value, ThreadCtl() sets the calling thread's inheritance mask to the specified value(s); if the calling thread creates any children by calling pthread_create(), fork(), spawn(), or exec(), the children inherit this mask. If you set the inherit_mask member to zero, the calling thread's inheritance mask isn't changed.

If you look at the definition of _thread_runmask in <sys/neutrino.h>, you'll see that it's actually declared like this:

struct _thread_runmask {
    int         size;
/*  unsigned    runmask[size];      */
/*  unsigned    inherit_mask[size]; */
};

This is because the number of elements in the runmask and inherit_mask arrays depends on the number of processors in your multicore system. You can use the RMSK_SIZE() macro to determine how many unsigned integers you need for the masks; pass the number of CPUs (found in the system page) to this macro.
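The sizing arithmetic can be sketched portably as follows. This is an illustration only: mask_words() is a hypothetical stand-in that assumes one bit per CPU, rounded up to whole unsigned words, and runmask_buffer_bytes() is a hypothetical helper that adds up the size field and the two masks. On a QNX system, use RMSK_SIZE() itself.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in: number of unsigned words needed to hold
 * one bit for each of num_cpus processors (rounded up). */
unsigned mask_words(unsigned num_cpus)
{
    const unsigned word_bits = (unsigned)(sizeof(unsigned) * CHAR_BIT);
    return (num_cpus + word_bits - 1) / word_bits;
}

/* Hypothetical helper: bytes needed for the whole structure, that is,
 * the size field plus the runmask and the inherit mask. */
size_t runmask_buffer_bytes(unsigned num_cpus)
{
    return sizeof(int) + 2 * (size_t)mask_words(num_cpus) * sizeof(unsigned);
}
```

With a 32-bit unsigned, a 4-processor system needs one word per mask, while a 33-processor system would need two words per mask.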

Here's a code snippet that shows how to set up the runmask and inherit mask:

/* Requires <stdlib.h>, <string.h>, <sys/neutrino.h>,
 * and <sys/syspage.h>. */
unsigned    num_elements = 0;
int         *rsizep, masksize_bytes, size;
unsigned    *rmaskp, *imaskp;
void        *my_data;

/* Determine the number of array elements required to hold
 * the runmasks, based on the number of CPUs in the system. */
num_elements = RMSK_SIZE(_syspage_ptr->num_cpu);

/* Determine the size of the runmask, in bytes. */
masksize_bytes = num_elements * sizeof(unsigned);

/* Allocate memory for the data structure that we'll pass
 * to ThreadCtl(). We need space for an integer (the number
 * of elements in each mask array) and the two masks
 * (runmask and inherit mask). */

size = sizeof(int) + 2 * masksize_bytes;
if ((my_data = malloc(size)) == NULL) {
    /* Not enough memory. */
    …
} else {
    memset(my_data, 0x00, size);

    /* Set up pointers to the "members" of the structure. */
    rsizep = (int *)my_data;
    rmaskp = (unsigned *)(rsizep + 1);
    imaskp = rmaskp + num_elements;

    /* Set the size. */
    *rsizep = num_elements;

    /* Set the runmask. Call this macro once for each processor
       the thread can run on. Here, cpu1 stands for the zero-based
       number of the CPU to allow. */
    RMSK_SET(cpu1, rmaskp);

    /* Set the inherit mask. Call this macro once for each
       processor the thread's children can run on. */
    RMSK_SET(cpu1, imaskp);

    if ( ThreadCtl( _NTO_TCTL_RUNMASK_GET_AND_SET_INHERIT,
                   my_data) == -1) {
        /* Something went wrong. */
        …
    }

    /* The buffer isn't needed after the call returns. */
    free(my_data);
}

You can also use the -C and -R options to the on command to launch processes with a runmask (assuming they don't set their runmasks programmatically); for example, use on -C 1 io-pkt-v4 to start io-pkt-v4 and lock all threads to CPU 1. This command sets both the runmask and the inherit mask.

You can also use the same options to the slay command to modify the runmask of a running process or thread. For example, slay -C 0 io-pkt-v4 moves all of io-pkt-v4's threads to run on CPU 0. If you use the -C or -R option, slay sets the runmask; if you also use the -i option, slay sets the process's or thread's inherit mask to the same value as the runmask.