Joining
The simplest method of synchronization is to join the threads as they terminate. Joining really means waiting for termination.
#include <pthread.h>
int pthread_join (pthread_t thread, void **value_ptr);
To use pthread_join(), you pass it the thread ID of the thread that you wish to join, and an optional value_ptr, which can be used to store the termination return value from the joined thread. (You can pass in a NULL if you aren't interested in this value—we're not, in this case.)
int num_lines_per_cpu, num_cpus;

int main (int argc, char **argv)
{
    int         cpu;
    pthread_t   *thread_ids;

    ... // perform initializations
    thread_ids = malloc (sizeof (pthread_t) * num_cpus);

    num_lines_per_cpu = num_x_lines / num_cpus;
    for (cpu = 0; cpu < num_cpus; cpu++) {
        pthread_create (&thread_ids [cpu], NULL,
                        do_one_batch, (void *) cpu);
    }

    // synchronize to termination of all threads
    for (cpu = 0; cpu < num_cpus; cpu++) {
        pthread_join (thread_ids [cpu], NULL);
    }

    ... // display results
}
You'll notice that this time we passed the first argument to pthread_create()
as a pointer to a pthread_t.
This is where the thread ID of the newly created thread gets stored.
After the first for loop finishes, we have num_cpus threads running, plus the thread that's running main().
We're not too concerned about the main() thread consuming all our
CPU; it's going to spend its time waiting.
The waiting is accomplished by doing a pthread_join() to each of our
threads in turn.
First, we wait for thread_ids [0] to finish.
When it completes, the pthread_join() will unblock.
The next iteration of the for loop will cause us to wait for thread_ids [1] to finish, and so on, for all num_cpus threads.
A common question that arises at this point is, "What if the threads finish in the reverse order?"
In other words, what if there are 4 CPUs, and, for whatever reason, the
thread running on the last CPU (CPU 3) finishes first, and then the
thread running on CPU 2 finishes next, and so on?
Well, the beauty of this scheme is that nothing bad happens.
The first thing that's going to happen is that the pthread_join()
will block on thread_ids [0].
Meanwhile, thread_ids [3] finishes.
This has absolutely no impact on the main() thread, which is still
waiting for the first thread to finish.
Then thread_ids [2] finishes.
Still no impact.
And so on, until finally thread_ids [0] finishes, at which point the pthread_join() unblocks, and we immediately proceed to the next iteration of the for loop.
The second iteration of the for loop executes a pthread_join() on thread_ids [1], which will not block; it returns immediately.
Why?
Because the thread identified by thread_ids [1] is already finished.
Therefore, our for loop will whip through the other threads, and then exit.
At that point, we know that we've synched up with all the computational threads,
so we can now display the results.