Designing with multiprocessing in mind
This section contains some general tips on how to design programs so that they can scale to N processors.
Use the multicore primitives
Don't assume that your program will run only on one processor. This means staying away from the FIFO synchronization trick mentioned above. Also, you should use the multicore-aware InterruptLock() and InterruptUnlock() functions.
Assume that threads really do run concurrently
As mentioned above, it isn't merely a useful programming abstraction to pretend that threads run simultaneously; you should design as if they really do. That way, when you move to a multicore system, you won't have any nasty surprises (but you can use bound multiprocessing, or BMP, if you have problems and don't want to modify the code).
Break the problem down
void do_graphics (void)
{
    int x;

    for (x = 0; x < XRESOLUTION; x++) {
        do_one_line (x);
    }
}
In the above example, we're doing ray-tracing. We've looked at the problem and decided that the function do_one_line() only generates output to the screen—it doesn't rely on the results from any other invocation of do_one_line().
To make optimal use of a multicore system, you would start multiple threads, each running on one processor.
The question then becomes how many threads to start. Starting XRESOLUTION threads (where XRESOLUTION is far greater than the number of processors, perhaps 1024 versus 4) isn't a good idea: you'd create a lot of threads, all of which consume stack and kernel resources as they compete for the limited pool of CPUs. A better approach is to ask the system how many processors it has and start one thread per processor:
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/syspage.h>

int num_x_per_cpu;

void *do_lines (void *arg);

void do_graphics (void)
{
    int num_cpus;
    int i;
    pthread_t *tids;

    // figure out how many CPUs there are...
    num_cpus = _syspage_ptr -> num_cpu;

    // allocate storage for the thread IDs
    tids = malloc (num_cpus * sizeof (pthread_t));
    if (tids == NULL) {
        return;
    }

    // figure out how many X lines each CPU can do
    // (assumes XRESOLUTION is evenly divisible by num_cpus)
    num_x_per_cpu = XRESOLUTION / num_cpus;

    // start up one thread per CPU, passing it the ID
    for (i = 0; i < num_cpus; i++) {
        pthread_create (&tids[i], NULL, do_lines, (void *) (intptr_t) i);
    }

    // now all the "do_lines" are off running on the processors
    // we need to wait for their termination
    for (i = 0; i < num_cpus; i++) {
        pthread_join (tids[i], NULL);
    }

    // now they are all done
    free (tids);
}

void *
do_lines (void *arg)
{
    int cpunum = (int) (intptr_t) arg;  // convert void * to an integer
    int x;

    for (x = cpunum * num_x_per_cpu; x < (cpunum + 1) * num_x_per_cpu; x++) {
        do_one_line (x);
    }
    return NULL;
}
The above approach lets the maximum number of threads run simultaneously on the multicore system. There's no point creating more threads than there are CPUs, because they'll simply compete with each other for CPU time.
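One detail the listing above glosses over: because it uses integer division, any lines left over when XRESOLUTION isn't an exact multiple of the CPU count are never drawn. The following is a minimal sketch of one way to split the work so that every line is covered. The helper name worker_range() and its parameters are hypothetical, not part of any QNX API.

```c
#include <assert.h>

/*
 * Hypothetical helper: compute the half-open range [*start, *end) of
 * lines that worker `id` (0-based) should handle when `total` lines are
 * split across `workers` threads. The first (total % workers) workers
 * each take one extra line, so every line is assigned exactly once.
 */
static void worker_range(int total, int workers, int id, int *start, int *end)
{
    assert(workers > 0 && id >= 0 && id < workers);

    int base  = total / workers;   /* lines every worker gets */
    int extra = total % workers;   /* leftover lines to hand out */

    *start = id * base + (id < extra ? id : extra);
    *end   = *start + base + (id < extra ? 1 : 0);
}
```

Under this scheme, each thread would loop from its own start up to (but not including) its own end instead of relying on a shared num_x_per_cpu value.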
Note that in this example, we didn't specify which processor to run each thread on. We don't need to in this case, because the READY thread with the highest priority always runs on the next available processor. The threads will tend to run on different processors (depending on what else is running in the system). You typically use the same priority for all the worker threads if they're doing similar work.
An alternative approach is to use a semaphore. You could preload the semaphore with the count of available CPUs. Then, you create threads whenever the semaphore indicates that a CPU is available. This is conceptually simpler, but involves the overhead of creating and destroying threads for each iteration.
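The semaphore approach might look something like the sketch below. It is an illustration only, under several assumptions: it uses portable POSIX unnamed semaphores, NUM_LINES stands in for XRESOLUTION, and the names cpu_slots, line_done, line_worker(), and render_with_semaphore() are all hypothetical. The worker just records that its line was handled, where a real program would call do_one_line().

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdint.h>

#define NUM_LINES 64                 /* hypothetical stand-in for XRESOLUTION */

static sem_t cpu_slots;              /* counts the CPUs that are currently free */
static int   line_done[NUM_LINES];   /* each worker marks its own line here */

static void *line_worker(void *arg)
{
    int x = (int)(intptr_t)arg;

    line_done[x] = 1;                /* placeholder for do_one_line(x) */
    sem_post(&cpu_slots);            /* hand the CPU slot back */
    return NULL;
}

/* Process every line with at most num_cpus workers in flight.
 * Returns 0 on success, -1 on failure. */
static int render_with_semaphore(int num_cpus)
{
    pthread_t tids[NUM_LINES];
    int x, started = 0, rc = 0;

    assert(num_cpus > 0);
    if (sem_init(&cpu_slots, 0, (unsigned)num_cpus) != 0) {
        return -1;
    }

    for (x = 0; x < NUM_LINES; x++) {
        /* block until a CPU slot is free; retry if interrupted by a signal */
        while (sem_wait(&cpu_slots) == -1 && errno == EINTR) {
            continue;
        }
        if (pthread_create(&tids[x], NULL, line_worker,
                           (void *)(intptr_t)x) != 0) {
            sem_post(&cpu_slots);    /* no thread started, release the slot */
            rc = -1;
            break;
        }
        started++;
    }

    /* wait for every worker that was actually started */
    for (x = 0; x < started; x++) {
        pthread_join(tids[x], NULL);
    }
    sem_destroy(&cpu_slots);
    return rc;
}
```

Note that this sketch creates and joins one thread per line, which is exactly the per-iteration overhead described above; the one-thread-per-CPU version avoids it.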