Decreasing the MTTR

A much simpler and less expensive alternative, though, is to decrease the MTTR. Recall that the MTTR is in the denominator of the availability formula and is what really drives the availability away from 100%: if the MTTR were zero, the availability would be MTBF / MTBF, or 100%, regardless of the actual value of the MTBF. So anything you can do to make the system recover faster goes a long way towards increasing your availability number.
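To make the effect concrete, here is a minimal sketch in Python. It assumes the availability formula referred to above is MTBF / (MTBF + MTTR). The function name `availability`, the 1000-hour MTBF, and the MTTR values are hypothetical, chosen only for illustration.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Return availability as a fraction between 0 and 1.

    Assumes availability = MTBF / (MTBF + MTTR).
    """
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR must be non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical system: one failure every 1000 hours on average.
MTBF_HOURS = 1000.0

# Hold the MTBF fixed and shrink only the recovery time.
for mttr_hours in (10.0, 1.0, 0.1, 0.0):
    fraction = availability(MTBF_HOURS, mttr_hours)
    print(f"MTTR {mttr_hours:5.1f} h -> availability {fraction:.4%}")
```

With the MTBF held constant, cutting the MTTR from 10 hours to 1 hour to 6 minutes moves the availability from roughly 99.01% to 99.90% to 99.99%, and an MTTR of zero gives exactly 100%.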

Speed of recovery is often not thought about until late in development. This is usually due to the philosophy of “Who cares how long it takes to boot up? Once it's up and running it'll be fast!” Once again, it's a trade-off: sometimes a long boot time is the result of doing some work “up front” so that the application or system runs faster afterwards, for example by precalculating tables or doing extensive hardware testing.

Another important factor is that decreasing MTTR generally needs to be designed into the system from the start. This statement applies to HA in general: it's a lot more work to patch a system that doesn't take HA into account than it is to design one with HA in mind.