High Availability

In this chapter, we'll look at the concept of high availability (HA). We'll define availability, examine the related terms and concepts, and see how we can make our software more highly available.

All software has bugs, and bugs manifest themselves in a variety of ways. For example, a module could run out of memory and not handle it properly, leak memory, or get hit with a SIGSEGV. This leads to two questions: how do you recover when a bug strikes, and how do you upgrade the software once the bug has been found and fixed?

For the first question, it's obviously not satisfactory to simply say to the customer, “What? Your system crashed? Oh, no problem, just reboot your computer!”

For the second point, it's equally unreasonable to suggest that the customer shut everything down, “upgrade” to the latest version, and then restart.

Some customers simply cannot afford the downtime that either of those “solutions” imposes.

Let's define some terms, and then we'll talk about how we can address these (very important) concerns.