I've worked at a few companies that have HA systems.
QNX Software Systems offers the High Availability Toolkit (HAT) and the High Availability Manager (HAM). The HAT includes the HAM, various APIs for client recovery, and many source code examples. The HAM is the component that monitors processes on your system.
QNX Neutrino also includes the /proc filesystem, which exposes information about running processes, so you can write your own policies and monitor whatever is of interest in your system.
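As a rough illustration of a roll-your-own check, here is a minimal sketch in C that looks for a process ID among the numeric entries under /proc. It uses only POSIX directory calls and assumes each process appears there as an entry named by its decimal pid. The function names is_pid_name and pid_is_listed are made up for this example and are not part of the HAT or HAM.

```c
#include <assert.h>
#include <ctype.h>
#include <dirent.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper: return 1 if name is a non-empty string of
 * decimal digits (the assumed form of a process entry), else 0. */
int is_pid_name(const char *name)
{
    assert(name != NULL);
    if (*name == '\0')
        return 0;
    for (; *name != '\0'; name++) {
        if (!isdigit((unsigned char)*name))
            return 0;
    }
    return 1;
}

/* Hypothetical helper: return 1 if pid has an entry under proc_root,
 * 0 if it does not, or -1 if proc_root cannot be opened. */
int pid_is_listed(const char *proc_root, pid_t pid)
{
    DIR *dir = opendir(proc_root);
    struct dirent *entry;
    int found = 0;

    if (dir == NULL)
        return -1;

    /* Walk the directory and compare each numeric entry to pid. */
    while ((entry = readdir(dir)) != NULL) {
        if (is_pid_name(entry->d_name) &&
            strtol(entry->d_name, NULL, 10) == (long)pid) {
            found = 1;
            break;
        }
    }
    closedir(dir);
    return found;
}
```

A monitor could call pid_is_listed("/proc", pid) on a timer and apply whatever policy it likes (log, restart, alert) when the result is 0. This only tells you the process exists, not that it is healthy, which is the kind of gap the HAM is meant to cover.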
There are several other HA packages available.