Processor affinity, clusters, runmasks, and inherit masks
A question that often arises in a multicore environment is: Can I make it so that one
processor handles the GUI, another handles the database, and the other two handle
the realtime functions?
The answer is: Yes, absolutely.
This is done through processor affinity, the ability to associate a thread with a particular processor or cluster of processors.
A cluster is a group of associated processors. All clusters on a system are defined by the startup code. Some clusters are always defined, including one representing all the processors on the system, and a set of clusters where each one represents a different processor. So on a system with N processors, there are always (at least) N+1 clusters.
Custom clusters may be defined by the startup program, either directly in the program code or configured by the -c command option. This allows processors to be grouped based on hardware properties. For example, on a machine with heterogeneous processors, the startup program might define one cluster for the faster processors (or CPUs) and another for the slower ones. A compute-bound thread could then set its affinity mask (i.e., its association with a particular cluster) to the faster cluster.
To list the clusters defined on your system, run:
    pidin syspage=cluster
For details about the naming of clusters, refer to the -c option in the startup-* entry in the Utilities Reference.
The following rules apply to clusters and runmasks:
- No processor may be in more than eight clusters.
- Each thread can be associated with only one cluster.
- A thread can run only on processors that belong to the cluster indicated by its runmask.
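As a rough illustration of the default clusters described above, the following sketch builds the N+1 cluster masks for a small system and checks whether a candidate mask corresponds to one of them. The helper names default_clusters() and matches_cluster() are hypothetical, the sketch is limited to 32 processors, and it assumes that "matching" a cluster means exact equality of the bitmaps. It models the rule only and is not how the kernel stores or checks clusters.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: fill `out` with the N+1 default cluster masks for a
 * system with `num_cpu` processors (1 <= num_cpu <= 32): one mask covering
 * every processor, followed by one mask per individual processor.
 * `out` must have room for num_cpu + 1 entries.
 * Returns the number of masks written. */
size_t default_clusters(unsigned num_cpu, uint32_t *out)
{
    uint32_t all = (num_cpu >= 32) ? UINT32_MAX
                                   : ((UINT32_C(1) << num_cpu) - 1);
    size_t n = 0;

    out[n++] = all;
    for (unsigned cpu = 0; cpu < num_cpu; cpu++) {
        out[n++] = UINT32_C(1) << cpu;
    }
    return n;
}

/* Hypothetical helper: true if `mask` is exactly equal to one of the
 * cluster masks (an assumption about what "matches a cluster" means). */
bool matches_cluster(uint32_t mask, const uint32_t *clusters, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (clusters[i] == mask) {
            return true;
        }
    }
    return false;
}
```

Under these assumptions, a four-processor system with only the default clusters accepts 0x0F and each single-processor mask, while a mask covering processors 0 and 2 is accepted only if the startup program defines a custom cluster for that pair.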
Runmasks can be set explicitly through functions, as explained in the next section, or implicitly by causing any thread that receives a message to inherit the sender's runmask, as explained in "Client runmask inheritance" below.
However, setting the runmask affects the processor affinity for only the calling thread, not further threads that it may create. To change the processor affinity of any further threads, you use an inherit mask, as explained in "Setting inherit masks" below.
Setting runmasks
When a thread starts up, its runmask depends on whether it was created as an additional thread in an existing process or as the first (main) thread of a new process. In the first case, the runmask is the inherit mask of the creator thread (i.e., the thread that called a function such as thrd_create() or pthread_create()). In the second case, the runmask can be defined by attribute-setting functions (refer to "Runmasks and inherit masks used in new processes" below).
To change the runmask of the calling thread, call ThreadCtl() with the _NTO_TCTL_RUNMASK command:
    if (ThreadCtl( _NTO_TCTL_RUNMASK, (void *)my_runmask ) == -1) {
        /* An error occurred. */
    }
The runmask is a bitmap; each bit position indicates a particular processor. For example, the setting 0x05 (binary 00000101) allows the thread to run on processors 0 (the 0x01 bit) and 2 (the 0x04 bit).
If this runmask matches a cluster, the call succeeds; otherwise it fails with EINVAL.
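The bitmap encoding can be sketched in portable C as follows. The helpers runmask_from_cpus() and runmask_allows() are hypothetical names introduced for illustration. They only do the bit arithmetic for a single 32-bit mask and don't call into the OS.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: build a runmask by OR-ing together one bit per
 * processor index. Every index must be less than 32. */
uint32_t runmask_from_cpus(const unsigned *cpus, size_t count)
{
    uint32_t mask = 0;

    for (size_t i = 0; i < count; i++) {
        mask |= UINT32_C(1) << cpus[i];
    }
    return mask;
}

/* Hypothetical helper: return 1 if the mask allows the given processor
 * (cpu < 32), or 0 if it doesn't. */
int runmask_allows(uint32_t mask, unsigned cpu)
{
    return (int)((mask >> cpu) & 1u);
}
```

For example, passing the processor list {0, 2} to runmask_from_cpus() yields 0x05, and runmask_allows(0x05, 1) returns 0 because bit 1 is clear.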
Setting inherit masks
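To set the inherit mask of the calling thread, call ThreadCtl() with the _NTO_TCTL_RUNMASK_GET_AND_SET_INHERIT command, passing a pointer to a structure laid out as follows:
    struct _thread_runmask {
        int size;
        /* unsigned runmask[size]; */
        /* unsigned inherit_mask[size]; */
    };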
The last two fields are commented out because the number of elements in the runmask and inherit_mask arrays depends on the number of processors on your multicore system. For example, on a system with 48 processors, the runmask and inherit_mask arrays contain two elements each, because each unsigned element holds the bits for 32 processors.
The following macros operate on these masks:
- RMSK_SIZE(num_cpu): Determine how many unsigned integers are needed for the masks in the _thread_runmask structure. You must pass the number of processors, which is found in the system page, to this macro.
- RMSK_SET(cpu, p): Set the bit for cpu in the mask pointed to by p.
- RMSK_CLR(cpu, p): Clear the bit for cpu in the mask pointed to by p.
- RMSK_ISSET(cpu, p): Determine if the bit for cpu is set in the mask pointed to by p.
The CPUs (processors) are numbered from 0. These macros work with runmasks of any length.
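To make the behavior of these macros concrete, here is a sketch of portable stand-ins that can be compiled on any host. The names mask_words(), mask_set(), mask_clr(), and mask_isset() are hypothetical. The word-and-bit layout shown (processor n lives in element n / bits-per-word, at bit n % bits-per-word) is an assumption about how a multi-word mask is organized, not a copy of the system header.

```c
#include <limits.h>
#include <stddef.h>

/* Number of processor bits held by one array element. */
#define WORD_BITS (sizeof(unsigned) * CHAR_BIT)

/* Stand-in for RMSK_SIZE(): elements needed to hold num_cpu bits. */
size_t mask_words(unsigned num_cpu)
{
    return (num_cpu + WORD_BITS - 1) / WORD_BITS;
}

/* Stand-in for RMSK_SET(): set the bit for cpu in the mask at p. */
void mask_set(unsigned cpu, unsigned *p)
{
    p[cpu / WORD_BITS] |= 1u << (cpu % WORD_BITS);
}

/* Stand-in for RMSK_CLR(): clear the bit for cpu in the mask at p. */
void mask_clr(unsigned cpu, unsigned *p)
{
    p[cpu / WORD_BITS] &= ~(1u << (cpu % WORD_BITS));
}

/* Stand-in for RMSK_ISSET(): return 1 if the bit for cpu is set, else 0. */
int mask_isset(unsigned cpu, const unsigned *p)
{
    return (int)((p[cpu / WORD_BITS] >> (cpu % WORD_BITS)) & 1u);
}
```

With a 32-bit unsigned, mask_words(48) is 2, and setting processor 40 touches bit 8 of the second element while leaving the first element untouched.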
Effects of setting zero and non-zero values
If you set the runmask member to a nonzero value and it matches an existing cluster, ThreadCtl() sets the runmask of the calling thread to this value. If the value doesn't match a cluster, the call fails with EINVAL. If you set this member to zero, the runmask of the calling thread isn't changed.
If you set the inherit_mask member to a nonzero value that matches an existing cluster, ThreadCtl() sets the calling thread's inherit mask to this value. If the value doesn't match a cluster, the call fails with EINVAL. If you set this member to zero, the calling thread's inherit mask isn't changed.
If the calling thread goes on to create any new threads, then the runmask and inherit_mask fields for the new threads are set to the inherit mask of the creator thread. This occurs whether the threads are created directly (e.g., by calling thrd_create() or pthread_create()) or indirectly as part of a new process (e.g., by calling fork() or posix_spawn()). In the case of exec(), these fields are unchanged.
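The rules in this section can be summarized with a small toy model. Everything in it is hypothetical: struct model_thread, model_set_masks(), and model_create_thread() are illustration-only names, masks are limited to a single 32-bit word, and the choice to validate both masks before applying either one is an assumption that the text above doesn't confirm.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy representation of a thread's two masks. */
struct model_thread {
    uint32_t runmask;
    uint32_t inherit_mask;
};

/* True if `mask` is exactly equal to one of the cluster masks. */
static bool is_cluster(uint32_t mask, const uint32_t *clusters, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (clusters[i] == mask) {
            return true;
        }
    }
    return false;
}

/* Model of the zero/nonzero rules: a zero value leaves that mask unchanged,
 * and a nonzero value must match a cluster or the call fails with EINVAL.
 * Returns 0 on success. */
int model_set_masks(struct model_thread *t, uint32_t runmask,
                    uint32_t inherit_mask,
                    const uint32_t *clusters, size_t n)
{
    if (runmask != 0 && !is_cluster(runmask, clusters, n)) {
        return EINVAL;
    }
    if (inherit_mask != 0 && !is_cluster(inherit_mask, clusters, n)) {
        return EINVAL;
    }
    if (runmask != 0) {
        t->runmask = runmask;
    }
    if (inherit_mask != 0) {
        t->inherit_mask = inherit_mask;
    }
    return 0;
}

/* Model of thread creation: the new thread's runmask and inherit mask are
 * both set to the creator's inherit mask. */
struct model_thread model_create_thread(const struct model_thread *creator)
{
    struct model_thread child = { creator->inherit_mask,
                                  creator->inherit_mask };
    return child;
}
```

In this model, a thread that narrows only its own runmask still creates children with its unchanged inherit mask, which is why the two masks are set separately.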
Code sample
    #include <stdlib.h>
    #include <string.h>
    #include <sys/neutrino.h>
    #include <sys/syspage.h>

    unsigned num_elements = 0;
    int *rsizep, masksize_bytes, size;
    unsigned *rmaskp, *imaskp;
    void *my_data;

    /* Determine the number of array elements required to hold the masks,
     * based on the number of processors on the system. */
    num_elements = RMSK_SIZE(_syspage_ptr->num_cpu);

    /* Determine the size of the runmask, in bytes. */
    masksize_bytes = num_elements * sizeof(unsigned);

    /* Allocate memory for the data structure that we'll pass to ThreadCtl().
     * We need space for an integer (the number of elements in each mask array)
     * and the two masks (runmask and inherit mask). */
    size = sizeof(int) + 2 * masksize_bytes;
    if ((my_data = malloc(size)) == NULL) {
        /* Not enough memory. */
        …
    } else {
        memset(my_data, 0x00, size);

        /* Set up pointers to the "members" of the structure. The cast is
         * needed because the masks are unsigned but the size is an int. */
        rsizep = (int *)my_data;
        rmaskp = (unsigned *)(rsizep + 1);
        imaskp = rmaskp + num_elements;

        /* Set the size. */
        *rsizep = num_elements;

        /* Set the runmask. Call this macro once for each processor that the
         * current thread can run on (cpu1 stands for one such processor
         * number). */
        RMSK_SET(cpu1, rmaskp);

        /* Set the inherit mask. Call this macro once for each processor that
         * you want any threads created by the current thread to be able to
         * run on. */
        RMSK_SET(cpu1, imaskp);

        if ( ThreadCtl(_NTO_TCTL_RUNMASK_GET_AND_SET_INHERIT, my_data) == -1) {
            /* Something went wrong. */
            …
        }
    }
Runmasks and inherit masks used in new processes
If you're starting a new process, you can specify a runmask by calling posix_spawnattr_setxflags() to set the POSIX_SPAWN_EXPLICIT_CPU extended flag, then posix_spawnattr_setrunmask() to set the runmask for a new process, and then posix_spawn() to create the process.
If you specify the runmask attribute, then when the new process starts up, the main thread's runmask and inherit mask are given that runmask value. If you don't define a runmask, then when the new process starts up, the main thread's runmask and inherit mask are set to the inherit mask of the creator thread (i.e., the one that called posix_spawn()). In both cases, if no further changes are made in the new process (i.e., none of the mask-setting functions described in previous sections are called), then this results in normal inherit mask behavior, meaning all threads in the new process are given the runmask and inherit mask of the main thread.
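The sequence of calls described above might be wrapped as in the following sketch. The helper name spawn_with_runmask() is hypothetical, and the code assumes that posix_spawnattr_setxflags() and posix_spawnattr_setrunmask() each take the attributes object and a 32-bit value, so check the signatures against the C Library Reference for your release. The affinity-specific code is compiled only on QNX; on other hosts the function simply returns ENOSYS.

```c
#include <errno.h>
#include <spawn.h>
#include <stdint.h>
#include <sys/types.h>

extern char **environ;

/* Hypothetical helper: spawn `path` with the given runmask.
 * Returns 0 on success or an errno-style error code. */
int spawn_with_runmask(const char *path, char *const argv[],
                       uint32_t runmask, pid_t *pid_out)
{
#ifdef __QNXNTO__
    posix_spawnattr_t attr;
    int rc = posix_spawnattr_init(&attr);

    if (rc != 0) {
        return rc;
    }

    /* Enable the explicit-CPU extended flag, then supply the runmask. */
    rc = posix_spawnattr_setxflags(&attr, POSIX_SPAWN_EXPLICIT_CPU);
    if (rc == 0) {
        rc = posix_spawnattr_setrunmask(&attr, runmask);
    }
    if (rc == 0) {
        rc = posix_spawn(pid_out, path, NULL, &attr, argv, environ);
    }

    posix_spawnattr_destroy(&attr);
    return rc;
#else
    /* The runmask attributes aren't available on this host. */
    (void)path;
    (void)argv;
    (void)runmask;
    (void)pid_out;
    return ENOSYS;
#endif
}
```

Because the runmask must correspond to a cluster, a caller would typically pass either a single-processor mask or the mask of a cluster defined by the startup program.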
Client runmask inheritance
When creating a channel, you can enable the _NTO_CHF_INHERIT_RUNMASK flag to cause any thread that receives a message on this channel to inherit the sender's runmask. With this client runmask inheritance, the server's runmask has no effect on the server's execution behavior when it receives messages. This is because the server is either RECEIVE-blocked and, hence, not running, or running with the runmask of the client whose message it's handling.
If this flag is set, then when a server thread receives a pulse on the channel, the thread's runmask is set to the thread's inherit mask.
For more information, refer to the _NTO_CHF_INHERIT_RUNMASK description in the ChannelCreate() entry in the C Library Reference.
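The effect of the flag on a server thread can be pictured with the following toy model. The structure and function names are hypothetical, and the code only mirrors the two rules stated above (a message brings the sender's runmask, a pulse restores the thread's own inherit mask). It doesn't create a channel or reproduce any kernel behavior.

```c
#include <stdint.h>

/* Toy representation of a server thread on a channel that was created
 * with the _NTO_CHF_INHERIT_RUNMASK flag. */
struct model_server {
    uint32_t runmask;
    uint32_t inherit_mask;
};

/* Receiving a message: the server thread runs with the sender's runmask. */
void model_receive_message(struct model_server *s, uint32_t sender_runmask)
{
    s->runmask = sender_runmask;
}

/* Receiving a pulse: the server thread's runmask is set to its own
 * inherit mask. */
void model_receive_pulse(struct model_server *s)
{
    s->runmask = s->inherit_mask;
}
```

In this model the server's original runmask never influences where it runs while handling messages, which matches the explanation that the server is either blocked or running on behalf of a client.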