Using the thread scheduler and multicore together

On a multicore system, you can use scheduler partitions and symmetric multiprocessing (SMP) to reap the rewards of both. For more information, see the Multicore Processing User's Guide.

Note the following facts:

On an SMP machine, the thread scheduler considers the total available time to be 100%, not (say) 400% for a four-processor machine.
The thread scheduler first attempts to keep every processor busy; only then does it apply budgets. For example, on a four-processor machine whose partitions have budgets of 70%, 10%, 10%, and 10%, if there's only one thread running in each partition, the thread scheduler runs all four threads all the time. Because a single thread can occupy at most one of the four processors, the thread scheduler and the aps command report the partitions' consumed time as 25%, 25%, 25%, and 25%.

It may seem unlikely to have only one thread per partition, since most systems have many threads. However, this situation can also arise on a multithreaded system.

The runmask controls which CPUs a thread is allowed to run on. With careful (or foolish) use of the runmask, it's possible to arrange things so that too few threads are permitted to run on a particular processor for the scheduler to meet its budgets.

If several threads are ready to run and they're permitted to run on every CPU, the thread scheduler correctly guarantees each partition's minimum budget.

Note: On a hyperthreaded machine, actual throughput of partitions may not match the percentage of CPU time usage reported by the thread scheduler. This discrepancy occurs because on a hyperthreaded machine, throughput isn't always proportional to time, regardless of what kind of scheduler is being used. This scenario is most likely to occur when a partition doesn't contain enough ready threads to occupy all of the pseudo-processors on a hyperthreaded machine.

Scheduler partitions and BMP

Certain combinations of runmasks and partition budgets can have surprising results.

For example, suppose we have a two-CPU SMP machine, with these partitions:

Pa, with a budget of 50%
System, with a budget of 50%

Now, suppose the system is idle. If you run a priority-10 thread that's locked to CPU 1 and is in an infinite loop in partition Pa, the thread scheduler interprets this to mean that you intend Pa to monopolize CPU 1. That's because CPU 1 can provide only 50% of the entire machine's processing time, which is exactly Pa's budget, so Pa can run continuously on CPU 1 without exceeding that budget.

If you run another thread at priority 9, also locked to CPU 1, but in the System partition, the thread scheduler interprets that to mean you also want the System partition to monopolize CPU 1.

The thread scheduler has a dilemma: it can't satisfy the requirements of both partitions. What it actually does is allow partition Pa to monopolize CPU 1.

Here's why: from an idle start, the thread scheduler observes that both partitions have available budget. When partitions have available budget, the thread scheduler schedules in realtime mode, which is strict priority scheduling, so partition Pa runs. However, because CPU 1 can never give partition Pa more than its 50% budget, Pa never runs out of budget. Therefore, the thread scheduler remains in realtime mode, and the lower-priority thread in the System partition never runs.

For this example, the aps show command might display:

                    +-------- CPU Time -------+-- Critical Time --
Partition name   id | Budget |  Max |    Used | Budget |      Used
--------------------+-------------------------+-------------------
System            0 |    50% | 100% |   0.09% |  200ms |   0.000ms
Pa                1 |    50% | 100% |  49.93% |    0ms |   0.000ms
--------------------+-------------------------+-------------------
Total               |   100% |      |  50.02% |

The System partition receives no CPU time even though it contains a thread that is ready to run.

Similar situations can occur when several partitions each have a budget of less than 50%, but their budgets sum to 50% or more.

Avoiding infinite loops is a good way to prevent these situations. However, if you're running third-party software, you may not have control over the code.