Glossary
QNX SDP 8.0, High Availability Framework Developer's Guide
- action
- A specific task the HAM will perform under certain associated conditions. Examples of actions include executing an external process, restarting a process that has died, sending a signal or pulse notification, etc.
- availability
- The ability of a system to provide its intended service without interruption for extended periods of time.
- clustering
- A method of distributing processing among several computers in order to reduce the number of single points of failure (SPOFs). This concept isn't related to clusters, which are groups of associated processors and are explained in the Programmer's Guide.
- condition
- An event that will trigger certain actions for the HAM to perform. Examples of conditions include the death of an entity, a missed heartbeat, etc.
- entity
- A process that the HAM will monitor. Entities can explicitly ask to be monitored (i.e., as self-attached entities), or they may be monitored without ever realizing it.
- five nines
- The celebrated availability metric that refers to a system's ability to remain up and running 99.999% of the time per year.
- Guardian
- The HAM's "clone", a stand-in process that the HAM creates to ensure uninterrupted HA management within the QNX OS environment.
- HAM
- High Availability Manager.
- heartbeat
- A "wellness" or "liveness" notification sent at specific intervals by a client to the HAM.
- hot swap
- The ability to remove or insert a component in a live system.
- MMU
- Memory Management Unit. A device on many CPUs that alerts the OS if a process tries to access memory that's been allocated to another process.
- MTTF
- Mean Time To Failure. This is the average length of time that the system will remain in service before failing. You want this to be as long as possible.
- MTTR
- Mean Time To Repair. This is the amount of time it takes for the system to resume operation after any component fails or is upgraded. You want this to be as small as possible.
- SPOF
- Single point of failure. Any particular "weak link" in a system would be considered a SPOF, because its demise would put the entire system at risk.
- watchdog
- A trusted piece of hardware whose main purpose is to trigger code that will check the sanity of the system. There are software watchdogs as well; the HAM may be considered a smart watchdog.
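
The availability, five nines, MTTF, and MTTR entries above can be tied together with a little arithmetic. The sketch below is illustrative only and is not part of the HA framework: it assumes the commonly used steady-state relation availability = MTTF / (MTTF + MTTR), and the function names and the 365-day year are assumptions made for this example.

```python
# Illustrative arithmetic only; these helpers are hypothetical and are not
# part of the HAM API.

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def availability(mttf_hours, mttr_hours):
    """Fraction of time in service, assuming availability = MTTF / (MTTF + MTTR)."""
    return mttf_hours / (mttf_hours + mttr_hours)


def downtime_minutes_per_year(avail):
    """Minutes per year that a system with the given availability is out of service."""
    return (1.0 - avail) * MINUTES_PER_YEAR


if __name__ == "__main__":
    # Downtime budget implied by "five nines" (99.999%).
    print(round(downtime_minutes_per_year(0.99999), 3))

    # A system that runs 9,999 hours between failures and takes 1 hour to repair.
    print(round(availability(9999.0, 1.0), 6))
```

Under these assumptions, five nines leaves a budget of a little over five minutes of downtime per year, which is why a long MTTF and a short MTTR both matter.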