High Availability

In this chapter, I discuss the concept of high availability: what it is, how it's measured, and how to achieve it. We'll look at Mean Time Between Failures (MTBF), Mean Time To Repair (MTTR), and the formula used to calculate availability from them.

I'll also talk about how you can design your systems to be "highly available" and some of the problems that you'll run into along the way. Unfortunately, in many of today's designs, high availability is treated as an afterthought, which almost always leads to disaster.

By thinking about high availability up front, you'll gain the architectural insight you need to design highly available systems.
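
As a quick preview of how MTBF and MTTR combine, here is a minimal sketch of the commonly used steady-state formula, availability = MTBF / (MTBF + MTTR). The function names and the sample figures (1000 hours between failures, 1 hour to repair) are illustrative assumptions of mine, not measurements from any real system.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Fraction of time the system is up, using the common
    steady-state formula MTBF / (MTBF + MTTR)."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def downtime_minutes_per_year(avail: float) -> float:
    """Expected downtime in a 365-day year for a given availability."""
    return (1.0 - avail) * 365 * 24 * 60


if __name__ == "__main__":
    # Hypothetical system: fails every 1000 hours on average,
    # and takes 1 hour on average to repair.
    a = availability(mtbf_hours=1000.0, mttr_hours=1.0)
    print(f"availability: {a:.5f}")
    print(f"downtime per year: {downtime_minutes_per_year(a):.1f} minutes")
```

Notice that the formula can be improved from either side: you can make failures rarer (raise MTBF) or make recovery faster (lower MTTR).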