What's in the High Availability Framework?

The QNX Neutrino High Availability Framework consists of the following main components:

QNX Neutrino RTOS

We're not listing the OS here just to be thorough. It comes first in the list for good reason: the QNX Neutrino microkernel architecture inherently provides a robust environment for building highly reliable applications. Many of the features that an HA application requires (system stability, isolation of software modules, dynamic upgrading of software components, and so on) are already included in the OS.

The microkernel provides system-wide stability by offering full memory protection to all processes. And there's very little code running in kernel mode that could cause the microkernel itself to fail. All individual processes, whether applications or OS services — including device drivers — can be started and stopped dynamically, without jeopardizing system uptime.
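To make the isolation point concrete, here's a minimal sketch in generic POSIX C (nothing in it is specific to QNX Neutrino or to this framework). It runs a task in a separate process and shows that a hard failure in that process is reported to the parent instead of bringing the parent down. The names run_isolated(), crashing_task(), and healthy_task() are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical task that fails hard: abort() raises SIGABRT. */
void crashing_task(void) { abort(); }

/* Hypothetical task that completes normally. */
void healthy_task(void) { }

/*
 * Run task() in a child process and wait for it to finish.
 * Returns 0 if the child exited cleanly, the terminating signal number
 * if the child was killed by a signal, and -1 on any other outcome.
 */
int run_isolated(void (*task)(void))
{
    assert(task != NULL);

    pid_t pid = fork();
    if (pid < 0) {
        return -1;                      /* could not create the process */
    }
    if (pid == 0) {
        task();                         /* child: a crash here stays here */
        _exit(EXIT_SUCCESS);
    }

    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR) {
            return -1;
        }
    }
    if (WIFSIGNALED(status)) {
        return WTERMSIG(status);        /* child died abnormally */
    }
    if (WIFEXITED(status) && WEXITSTATUS(status) == 0) {
        return 0;
    }
    return -1;
}
```

Calling run_isolated(crashing_task) returns SIGABRT to the caller, which keeps running and is free to decide what to do next.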

For more on the suitability of the QNX Neutrino RTOS for HA, see the next chapter in this guide.

High Availability Manager (HAM)

A HAM is a "smart watchdog" — a highly resilient manager process that can monitor your system and perform multistage recovery whenever system services or processes fail or no longer respond.

As a self-monitoring manager, a HAM is resilient to internal failures. If, for whatever reason, the HAM itself is stopped abnormally, it can immediately and completely reconstruct its own state by handing over to a mirror process called the Guardian.
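The monitor-and-restart idea behind a watchdog can be sketched in a few lines of generic POSIX C. This is not how the HAM is implemented, and it doesn't use the HAM API; a real HAM watches independently running processes and has its Guardian as a backup, while this sketch simply blocks on a child that it started itself. The names supervise(), flaky_task(), and always_failing_task() are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical service that crashes on its first two runs, then succeeds. */
int flaky_task(int attempt)
{
    if (attempt < 2) {
        abort();
    }
    return 0;
}

/* Hypothetical service that always exits with a failure status. */
int always_failing_task(int attempt)
{
    (void)attempt;
    return 1;
}

/*
 * Start task() in a child process and restart it each time it fails,
 * up to max_restarts times. Returns the number of restarts that were
 * needed before the task succeeded, or -1 if the supervisor gave up
 * (the point at which a second recovery stage would take over).
 */
int supervise(int (*task)(int attempt), int max_restarts)
{
    assert(task != NULL);

    for (int attempt = 0; attempt <= max_restarts; attempt++) {
        pid_t pid = fork();
        if (pid < 0) {
            return -1;
        }
        if (pid == 0) {
            _exit(task(attempt));       /* child: run the service */
        }

        int status;
        while (waitpid(pid, &status, 0) < 0) {
            if (errno != EINTR) {
                return -1;
            }
        }
        if (WIFEXITED(status) && WEXITSTATUS(status) == 0) {
            return attempt;             /* service finished cleanly */
        }
        /* The service crashed or reported failure: loop and restart it. */
    }
    return -1;
}
```

With these tasks, supervise(flaky_task, 5) returns 2 (two restarts were needed), and supervise(always_failing_task, 3) returns -1.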

For details on the HAM, see the chapter Using the High Availability Manager in this guide.

HAM API

The HAM API library of more than 35 ham_*() functions gives you a simple mechanism to talk to a HAM. This API is implemented as a thread-safe library you can link against.

You use the API to interact with a HAM in order to begin monitoring processes and to set up the various conditions (e.g., the death of a server) that will trigger certain recovery actions.
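The relationship between conditions and recovery actions can be pictured as a small rule table. The following sketch is a toy model in plain C, not the HAM API: the types event_t and rule_t, the table example_rules, and the functions dispatch(), log_failure(), and restart_entity() are all hypothetical. It only shows the shape of the idea, namely that each condition can have several actions that run in order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical conditions that a monitor might detect. */
typedef enum { EVENT_DEATH, EVENT_MISSED_HEARTBEAT } event_t;

/* A recovery action returns 0 on success, nonzero on failure. */
typedef int (*action_fn)(const char *entity);

/* One rule: when this event occurs, run this action. */
typedef struct {
    event_t   event;
    action_fn action;
} rule_t;

int restart_count = 0;  /* counts simulated restarts */

int log_failure(const char *entity)
{
    fprintf(stderr, "recovery: %s needs attention\n", entity);
    return 0;
}

int restart_entity(const char *entity)
{
    (void)entity;       /* a real action would relaunch the process */
    restart_count++;
    return 0;
}

/* Actions for the same event run in the order in which they're listed. */
const rule_t example_rules[] = {
    { EVENT_DEATH,            log_failure    },
    { EVENT_DEATH,            restart_entity },
    { EVENT_MISSED_HEARTBEAT, log_failure    },
};

/*
 * Run every action registered for ev, in table order, stopping at the
 * first action that fails. Returns the number of actions that succeeded.
 */
int dispatch(const rule_t *rules, size_t n, event_t ev, const char *entity)
{
    assert(rules != NULL);

    int succeeded = 0;
    for (size_t i = 0; i < n; i++) {
        if (rules[i].event != ev) {
            continue;
        }
        if (rules[i].action(entity) != 0) {
            break;
        }
        succeeded++;
    }
    return succeeded;
}
```

Here, dispatching EVENT_DEATH for an entity runs both the logging action and the restart action; dispatching EVENT_MISSED_HEARTBEAT runs only the logging action.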

For descriptions of the functions in the HAM API, see the HAM API Reference chapter in this guide.

Client Recovery Library

The client recovery library provides drop-in enhancements for many standard libc I/O operations. Its cover functions automatically recover failed connections whenever recovery is possible in an HA scenario.
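As a rough illustration of what a cover function does, here's a minimal sketch in generic POSIX C. It wraps write() so that a failed call triggers a caller-supplied recovery function and then retries on the new descriptor. This isn't the client recovery library's interface; recovering_write(), recover_fn, and reopen_dev_null() are hypothetical names, and /dev/null merely stands in for a server that has been restarted.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical recovery callback: takes ownership of the failed
 * descriptor and returns a replacement, or -1 if recovery failed.
 */
typedef int (*recover_fn)(int failed_fd);

/* Example recovery: discard the old descriptor and "reconnect". */
int reopen_dev_null(int failed_fd)
{
    if (failed_fd >= 0) {
        close(failed_fd);
    }
    return open("/dev/null", O_WRONLY);
}

/*
 * Cover function for write(). If the write fails, call recover() to get
 * a fresh descriptor, store it in *fd, and try again, up to
 * max_recoveries times. Returns the number of bytes written (which, as
 * with write(), may be less than len) or -1 if recovery wasn't possible.
 */
ssize_t recovering_write(int *fd, const void *buf, size_t len,
                         recover_fn recover, int max_recoveries)
{
    assert(fd != NULL && recover != NULL);

    int recoveries = 0;
    for (;;) {
        ssize_t n = write(*fd, buf, len);
        if (n >= 0) {
            return n;
        }
        if (errno == EINTR) {
            continue;                   /* interrupted: just retry */
        }
        if (recoveries++ >= max_recoveries) {
            return -1;                  /* out of recovery attempts */
        }
        *fd = recover(*fd);             /* old descriptor is now invalid */
        if (*fd < 0) {
            return -1;
        }
    }
}
```

A caller that starts with a dead descriptor ends up with a working one after the call, without having to write its own retry logic around each I/O operation.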

For descriptions of the client library functions, see the Client Recovery Library Reference chapter in this guide.

Examples

You'll find several sample code listings, with source, that illustrate tasks such as restarting and heartbeating. Because the examples cover typical fault-recovery scenarios, you may be able to adapt this source for your own HA applications.

For details, see the Examples appendix in this guide.