Now, let's compare the three methods using various categories, and we'll also describe some of the trade-offs.

With system 1, we see the loosest coupling. This has the advantage that each of the three processes can be easily (i.e., via the command line, as opposed to recompile/redesign) replaced with a different module. This follows naturally, because the "unit of modularity" is the entire module itself. System 1 is also the only one that can be distributed among multiple nodes in a QNX Neutrino network. Since the communications pathway is abstracted over some connectioned protocol, it's easy to see that the three processes can be executing on any machine in the network. This may be a very powerful scalability factor for your design—you may need your system to scale up to having hundreds of machines distributed geographically (or in other ways, e.g., for peripheral hardware capability) and communicating with each other.

Once we commit to a shared memory region, however, we lose the ability to distribute over a network. QNX Neutrino doesn't support network-distributed shared memory objects. So in system 2, we've effectively limited ourselves to running all three processes on the same box. We haven't lost the ability to easily remove or change a component, because we still have separate processes that can be controlled from the command line. But we have added the constraint that all the removable components need to conform to the shared-memory model.

In system 3, we've lost all the above abilities. We definitely can't run different threads from one process on multiple nodes (we can run them on different processors in an SMP system, though). And we've lost our configurability aspects—now we need to have an explicit mechanism to define which "input," "processing," or "output" algorithm we want to use (which we can solve with shared objects, also known as DLLs.)

So why would I design my system to have multiple threads like system 3? Why not go for the maximally flexible system 1?

Well, even though system 3 is the most inflexible, it is most likely going to be the fastest. There are no thread-to-thread context switches for threads in different processes, I don't have to set up memory sharing explicitly, and I don't have to use abstracted synchronization methods like pipes, POSIX message queues, or message passing to deliver the data or control information—I can use basic kernel-level thread-synchronization primitives. Another advantage is that when the system described by the one process (with the three threads) starts, I know that everything I need has been loaded off the storage medium (i.e., I'm not going to find out later that "Oops, the processing driver is missing from the disk!"). Finally, system 3 is also most likely going to be the smallest, because we won't have three individual copies of "process" information (e.g., file descriptors).

To sum up: know what the trade-offs are, and use what works for your design.