Achieving cold standby

Updated: May 06, 2022

For some systems, a cold standby approach may be sufficient. Although cold standby has a higher MTTR (mean time to repair) than warm or hot standby, it is significantly easier to implement. All the overlord needs to do is notice that the process has failed and then start a new instance of the process.
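The restart loop described above can be sketched with plain POSIX process calls. This is not any QNX-specific high-availability interface. It is a minimal illustration that assumes the overlord is the parent of the service it watches, so that `waitpid()` reports the failure. The names `start_service()` and `supervise()` and the `max_restarts` limit are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Start one instance of the service; returns its pid, or -1 if fork fails.
 * Every start is a cold start: the child execs the program from scratch. */
static pid_t start_service(const char *path, char *const argv[])
{
    pid_t pid = fork();
    if (pid == 0) {
        execv(path, argv);
        _exit(127); /* exec failed; the overlord sees this as a failure */
    }
    return pid;
}

/* Hypothetical cold-standby overlord: wait for the service to exit and,
 * if it failed (nonzero exit status or killed by a signal), start a fresh
 * instance. Returns the number of restarts performed, or -1 on a system
 * error. A zero exit status is treated as a deliberate shutdown. */
int supervise(const char *path, char *const argv[], int max_restarts)
{
    assert(path != NULL && argv != NULL && max_restarts >= 0);
    int restarts = 0;
    for (;;) {
        pid_t pid = start_service(path, argv);
        if (pid < 0) {
            perror("fork");
            return -1;
        }
        int status;
        while (waitpid(pid, &status, 0) < 0) {
            if (errno != EINTR) {
                perror("waitpid");
                return -1;
            }
        }
        int clean_exit = WIFEXITED(status) && WEXITSTATUS(status) == 0;
        if (clean_exit || restarts >= max_restarts)
            return restarts;
        restarts++;
        fprintf(stderr, "overlord: service failed, restart #%d\n", restarts);
    }
}
```

For example, calling `supervise("/bin/sh", argv, 3)` with `argv` set to `{"sh", "-c", "exit 1", NULL}` runs the failing service four times (one start plus three restarts) and returns 3. Treating a zero exit status as "stop supervising" is a choice made for this sketch; a real overlord might restart on any exit.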

Usually this means that the newly started process initializes itself exactly as it would on a first-time start: it may read a configuration file, test its dependent subsystems, bind to whatever services it needs, and then advertise itself to higher-level processes as ready to service their requests.
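Because a restarted instance goes through the same start-up path as a first start, that path can be written once as an ordered list of steps. In this sketch, `cold_start()`, `struct driver_ctx`, and the step functions are hypothetical placeholders for the real work. The point is that the "advertise" step runs last, so clients never see a half-initialized process.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* One step of the start-up sequence; returns true on success. */
typedef bool (*init_step_fn)(void *ctx);

struct init_step {
    const char *name;   /* used only for diagnostics */
    init_step_fn run;
};

/* Run the steps in order, exactly as on a first-time start. Stops at the
 * first failing step and returns how many steps succeeded, so the caller
 * can compare the result with n to see whether the process came up. */
size_t cold_start(const struct init_step *steps, size_t n, void *ctx)
{
    assert(steps != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!steps[i].run(ctx)) {
            fprintf(stderr, "cold start: step '%s' failed\n", steps[i].name);
            return i;
        }
    }
    return n;
}

/* Hypothetical driver state, for illustration only. */
struct driver_ctx {
    bool config_readable;  /* simulates whether the config file can be read */
    bool advertised;       /* set once the driver is visible to clients */
};

static bool read_config(void *ctx)       { return ((struct driver_ctx *)ctx)->config_readable; }
static bool test_dependencies(void *ctx) { (void)ctx; return true; }
static bool bind_services(void *ctx)     { (void)ctx; return true; }
static bool advertise(void *ctx)         { ((struct driver_ctx *)ctx)->advertised = true; return true; }

/* Advertising comes last, after everything it depends on has succeeded. */
static const struct init_step driver_steps[] = {
    {"read configuration", read_config},
    {"test dependent subsystems", test_dependencies},
    {"bind to services", bind_services},
    {"advertise to clients", advertise},
};
```

If reading the configuration fails, `cold_start()` returns 0 and the driver is never advertised; the overlord can then treat that instance as failed and try again.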

A serial port driver is a typical cold standby process. If it faults, the overlord simply starts a new instance of the driver, which initializes the serial ports and then advertises itself as available (for example, by putting /dev/ser1 and /dev/ser2 into the pathname space). Higher-level processes may notice that the serial port disappeared for a little while, but once it is back in operation, the system can proceed.
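On the client side, a higher-level process can ride out the gap by retrying. This sketch assumes that, while the driver is down and its names are missing from the pathname space, opening its path fails with `ENOENT`; `open_with_retry()` and its parameters are hypothetical, and a real client might also need to recover state after reopening.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/* Try to open path up to `attempts` times, sleeping delay_ms between tries.
 * ENOENT is treated as "the server is not back yet"; any other error is
 * returned immediately. Returns a file descriptor, or -1 on failure. */
int open_with_retry(const char *path, int flags, int attempts, long delay_ms)
{
    assert(path != NULL && attempts > 0 && delay_ms >= 0);
    struct timespec delay = { delay_ms / 1000, (delay_ms % 1000) * 1000000L };
    for (int i = 0; i < attempts; i++) {
        int fd = open(path, flags);
        if (fd >= 0)
            return fd;
        if (errno != ENOENT)
            return -1;              /* a real error, not a missing server */
        if (i + 1 < attempts)
            nanosleep(&delay, NULL);
    }
    return -1;
}
```

For example, `open_with_retry("/dev/ser1", O_RDWR, 50, 100)` would wait roughly five seconds for the restarted driver to reappear before giving up.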