High Availability
The term High Availability (HA) is commonly used in telecommunications and other industries to describe a system's ability to remain up and running without interruption for extended periods of time.
The celebrated "five nines" availability metric refers to the percentage of uptime a system can sustain in a year: 99.999% uptime amounts to about five minutes of downtime per year.
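The arithmetic behind that figure can be checked with a short sketch. It assumes a 365-day year, and the function name `downtime_minutes_per_year` is purely illustrative, not part of any product or library API.

```python
# Assumption: a non-leap year of 365 days (525,600 minutes).
MINUTES_PER_YEAR = 365 * 24 * 60


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the yearly downtime budget, in minutes, for an availability percentage."""
    return (1 - availability_percent / 100) * MINUTES_PER_YEAR


# Compare a few common availability targets.
for label, pct in [("three nines", 99.9), ("four nines", 99.99), ("five nines", 99.999)]:
    print(f"{label} ({pct}%): {downtime_minutes_per_year(pct):.2f} minutes/year")
```

For 99.999% this works out to roughly 5.26 minutes, which is the "about five minutes" quoted above. Each additional nine cuts the downtime budget by a factor of ten.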
An effective HA solution combines various hardware and software components into a stable, working system. Assuming reliable hardware components with sufficient redundancy, how can an OS best remain stable and responsive when a particular component or application program fails? And in cases where redundant hardware may not be an option (e.g., consumer appliances), how can the OS itself support HA?