Averaging window

How does the averaging window work?

The averaging window consists of tables. There are two tables per partition, one for the CPU time spent while critical, and another for any CPU time spent. The tables have one slot per timer tick. So a 100-ms averaging window, with a 1-ms clock period, has 100 slots. Each slot is used to hold the CPU time spent during a particular tick interval. For example:

[99ms ago][98 ms ago][97 ms ago]....[1 ms ago][current ms]

The slots hold the total CPU times of all threads in that partition as measured by consecutive calls to ClockCycles(). Note that total CPU times are then scaled by a carefully chosen factor so that all numbers fit into a 32-bit unsigned integer register.

At any time, the sum of the elements of a table represents the total CPU time used by that partition over the averaging period.

When the scheduler stops a thread running, it adds the time spent by that thread since when it started, or since the last tick, into the current ms slot of the table. If the thread was running as critical, the scheduler also adds the time to the current ms slot of that partition's critical time table. The scheduler also does this when a clock tick occurs.

However, on a clock tick, after billing the current thread to its partition's [current ms] slot, the scheduler also rotates the table. To rotate the table, it does the following:

This is called window rotation. Each rotation effectively provides available budget back to the partition that ran 99 ms ago. Window rotation is implemented without summing the entire table, shifting the table, or calls to the malloc() or free() functions.

What is the window-rotation algorithm?

The averaging window isn't physically rotated. It's logically rotated:

Can I change the window size?

See the next answer.

How does changing the window size affect scheduling?

You can change the window size with SchedCtl(SCHED_APS_SET_PARMS,...) on the fly. The scheduler doesn't malloc() new tables, but it does zero the history in all tables, zeros all the totals, and zeros the table indexes.

The effect is to wipe the memory of the scheduler. Here the scheduler assumes that no partition has run in the last x ms, where x is the new window size.

We recommend you leave the window size at the default, or set it during startup. Also, you shouldn't change the window size often.

How do maximum latencies relate to the averaging window size?

In general, the longer the averaging window, the longer the partition has to wait before it gets the CPU time.

For example, with a 100 milliseconds averaging window and a partition p with a 10% budget, the partition p will exhaust its budget if it runs continuously for 10 milliseconds. It has to wait another 90 milliseconds before window rotations cause the averaging window to lose memory of its past execution. So, it will be 90 milliseconds before the partition p gets some available budget back and runs again.

However, in most real systems that engage in inter-partition interaction, partition p's 10 milliseconds of running time is likely to get spread out in the averaging window. So even if p exhausts the budget soon, it will most likely get available budget back in much less than 90 milliseconds.

The Adaptive Partitioning User's Guide describes an unlikely scenario where two interacting partitions result in a larger latency than the window size budget.