System startup and HA

A point that arises directly out of our cascade failure discussion concerns system startup. Even HA systems are often designed so that starting up the system and running it normally are two distinct modes of operation.

When you stop to think about it, they really don't need to be. What's the difference between a system that's starting up and a system in which every component has crashed? If the system is designed properly, there might not be any difference. Each component restarts (we'll see how that's done below), and when it starts up, it treats the absence of a lower-layer component as if that component had just failed. Soon the lower-layer component starts up as well, and operation resumes as if the layer below had suffered a brief outage.
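A minimal sketch of this idea follows, written in Python purely for illustration. Every name in it is hypothetical and none of it is a QNX API: `Component`, `LowerLayerUnavailable`, the `connect` callable, and the simulated `FlakyLowerLayer`. The point is that the component has no separate startup path. Both `start()` and `on_lower_layer_failure()` call the same retry loop, so a lower layer that hasn't started yet looks exactly like one that has just failed. The fixed retry delay and the optional per-reconnect `max_attempts` limit are assumptions; a real system might use backoff, or give up and let a restart supervisor take over.

```python
import time
from typing import Callable, Optional


class LowerLayerUnavailable(Exception):
    """Hypothetical error: the lower-layer component cannot be reached."""


class Component:
    """Hypothetical component that depends on a single lower-layer service.

    Startup and failure recovery share one code path: a missing lower layer
    at startup is handled exactly like a lower layer that just failed.
    """

    def __init__(self, connect: Callable[[], object], retry_delay: float = 0.1,
                 max_attempts: Optional[int] = None) -> None:
        self._connect = connect            # returns a handle or raises LowerLayerUnavailable
        self._retry_delay = retry_delay    # assumed fixed delay between attempts
        self._max_attempts = max_attempts  # per-reconnect limit; None means retry forever
        self.handle: Optional[object] = None
        self.total_attempts = 0

    def _reconnect(self) -> None:
        """Retry until the lower layer answers (or the attempt limit is hit)."""
        self.handle = None
        attempt = 0
        while self.handle is None:
            attempt += 1
            self.total_attempts += 1
            try:
                self.handle = self._connect()
            except LowerLayerUnavailable:
                if self._max_attempts is not None and attempt >= self._max_attempts:
                    raise  # give up; an external supervisor could restart us
                time.sleep(self._retry_delay)

    def start(self) -> None:
        # Startup is not special: it is just recovery from a missing lower layer.
        self._reconnect()

    def on_lower_layer_failure(self) -> None:
        # A runtime failure of the lower layer takes exactly the same path.
        self._reconnect()


class FlakyLowerLayer:
    """Simulated lower layer that only answers after `up_after` failed calls."""

    def __init__(self, up_after: int) -> None:
        self.up_after = up_after
        self.calls = 0

    def connect(self) -> str:
        self.calls += 1
        if self.calls <= self.up_after:
            raise LowerLayerUnavailable("lower layer not up yet")
        return f"handle-{self.calls}"
```

For example, if `FlakyLowerLayer(up_after=3)` stands in for a lower layer that is still booting, `start()` keeps retrying and succeeds on the fourth attempt. A later call to `on_lower_layer_failure()` runs through the identical loop, so the order in which the two layers come up doesn't matter.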