Where's the problem?

QNX SDP 8.0, High Availability Framework Developer's Guide

Obviously, systems fail. For one reason or another, systems aren't as available for use as their users and designers would like them to be. Of all the possible causes of system failure (power outages, component breakdowns, operator errors, software faults, and so on), the lion's share belongs to software faults.

Many HA systems try to address the problem of system failure by turning to hardware solutions such as:

  • rugged hardware
  • redundant systems/components
  • hot-swap CompactPCI components
  • clustering

But if so many system crashes are caused by software faults, then throwing more hardware at the problem may not solve it at all. What if the system's memory state isn't properly restored after recovery? What if yours is an HA system (e.g., a consumer appliance) where redundant hardware simply isn't an option? Or what if your particular HA system is based on a custom chassis for which a PCI-based HA solution would be pointless?
