High Availability
In this chapter, we'll take a look at the concept of high availability (HA). We'll define availability, examine the related terms and concepts, and see how we can make our software more highly available.
All software has bugs, and bugs manifest themselves in a variety of ways. For example, a module could run out of memory and not handle it properly, or leak memory, or get hit with a SIGSEGV, and so on. This leads to two questions:
- How do you recover from those bugs?
- How do you upgrade the software once you've found and fixed bugs?
Obviously, it's not a satisfactory solution to simply say to the customer, "What? Your system crashed? Oh, no problem, just reboot your computer!"
For the second question, it's also not reasonable to suggest that the customer shut everything down, upgrade everything to the latest version, and then restart it.
Some customers simply cannot afford the downtime that either of those solutions involves.
Let's define some terms, and then we'll talk about how we can address these (very important) concerns.
