Glossary
- 5 nines (high availability)
- A system that's characterized as having a 5 nines availability rating (99.999%).
This means that the system has a downtime of only about 5 minutes per year.
A 6 nines system (99.9999%) has a downtime of only about 20 minutes every forty years.
This level of availability is also known in the industry as carrier-class
or telco-class availability.
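The downtime figures above come from multiplying the unavailable fraction (1 minus the availability) by the number of minutes in a year. A minimal sketch in C, assuming a 365-day year; the function name is hypothetical:

```c
/* Minutes in a 365-day year (leap years ignored). */
#define MINUTES_PER_YEAR (365.0 * 24.0 * 60.0)

/* Expected downtime per year, in minutes, for an availability given
 * as a fraction (e.g. 0.99999 for 5 nines). */
double downtime_minutes_per_year(double availability)
{
    return (1.0 - availability) * MINUTES_PER_YEAR;
}
```

For 5 nines this gives about 5.26 minutes per year; for 6 nines it gives about 0.53 minutes per year, or roughly 21 minutes over forty years.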
- analog (data acquisition)
- Indicates an input or output signal that corresponds to a range of voltages.
In the real world, an analog signal is continuously variable, meaning that it
can take on any value within the range.
When the analog value is used with a data acquisition card, it will be digitized,
meaning that only a finite number of discrete values are represented.
Analog inputs are digitized by an analog to digital (A/D) convertor,
and analog outputs are synthesized by a digital to analog (D/A) convertor.
Most convertors have a resolution of 8, 12, 16, or more bits.
Compare digital.
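The effect of digitizing can be seen by mapping a voltage onto the output codes of an ideal convertor. A minimal sketch, assuming a linear, unipolar convertor with no offset, gain, or linearity errors that rounds to the nearest code; adc_code() and dac_volts() are hypothetical names, not part of any data acquisition card's API:

```c
#include <stdint.h>

/* Code an ideal N-bit A/D convertor would produce for `volts` over the
 * range [v_min, v_max]. Out-of-range inputs are clamped.
 * Assumes 1 <= bits <= 31. */
uint32_t adc_code(double volts, double v_min, double v_max, unsigned bits)
{
    uint32_t max_code = (UINT32_C(1) << bits) - 1u;   /* e.g. 4095 for 12 bits */

    if (volts <= v_min)
        return 0;
    if (volts >= v_max)
        return max_code;
    /* Scale into [0, max_code] and round to the nearest discrete step. */
    double scaled = (volts - v_min) / (v_max - v_min) * (double)max_code;
    return (uint32_t)(scaled + 0.5);
}

/* The reverse mapping, as an ideal D/A convertor would synthesize it. */
double dac_volts(uint32_t code, double v_min, double v_max, unsigned bits)
{
    uint32_t max_code = (UINT32_C(1) << bits) - 1u;
    return v_min + (double)code / (double)max_code * (v_max - v_min);
}
```

With a 12-bit convertor over 0 to 10 V there are 4096 codes, so each step is 10 / 4095 V, or about 2.44 mV; adc_code(5.0, 0.0, 10.0, 12) returns 2048. Any voltage between two steps is represented by the nearer one, which is the "finite number of discrete values" described above.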
- asynchronous
- Used to indicate that a given operation is not synchronized to another
operation. For example, a pulse
is a form of asynchronous message-passing
in that the sender is not blocked
waiting for the message to be received.
Contrast with synchronous.
See also
blocking
and
receive a message.
- availability (high availability)
- Availability is a ratio that expresses the fraction of time that a
system is up and available for use. It is calculated by
dividing the MTBF by
the sum of the MTBF and the
MTTR, and is usually expressed
as a percentage.
To increase a system's availability, you need to raise the MTBF
and/or lower the MTTR.
The availability is usually stated as the number of leading 9s
in the ratio (see 5 nines).
An availability of 100% (also known as continuous availability)
is extremely difficult, if not impossible, to attain, because it would require either an
MTTR of zero (so that the availability is MTBF divided by MTBF, or 1)
or an infinite MTBF.
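The calculation above, together with the "count the leading 9s" convention, can be sketched in C. This is an illustration under assumptions: MTBF and MTTR are passed in the same unit, and both function names are hypothetical:

```c
/* Availability as a fraction: MTBF / (MTBF + MTTR).
 * Both arguments must use the same unit (e.g. hours). */
double availability(double mtbf, double mttr)
{
    return mtbf / (mtbf + mttr);
}

/* Number of leading 9s in an availability fraction, e.g. 0.99999 -> 5.
 * Returns -1 for 100% availability, which has no finite count. */
int leading_nines(double a)
{
    double unavailable = 1.0 - a;
    double bound = 0.1;      /* n nines means unavailable <= 10^-n */
    int n = 0;

    if (unavailable <= 0.0)
        return -1;
    /* The small relative slack absorbs floating-point rounding,
     * so that exactly 0.99999 still counts as 5 nines. */
    while (unavailable <= bound * (1.0 + 1e-9)) {
        n++;
        bound /= 10.0;
    }
    return n;
}
```

For example, an MTBF of 10,000 hours with an MTTR of 0.1 hours (6 minutes) gives an availability of about 0.99999, or 5 nines. Halving the MTTR or doubling the MTBF improves the ratio by the same amount, which is why both levers are mentioned above.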
- blocking
- A means for a thread to synchronize with other threads or events. A thread in
one of the blocked states (there are about a dozen) doesn't
consume any CPU; it's waiting on a list maintained within the
kernel. When the event that the thread was waiting for occurs
(or a timeout occurs),
the thread is unblocked and is able to consume CPU again.
See also
unblock.
- cascade failure (high availability)
- A cascade failure is one in which several modules fail as the result of a single
failure.
A good example of this is a process
that's using a driver when the driver fails.
If the process that used the driver isn't fault tolerant,
then it too may fail.
If other processes that depend on this driver aren't fault tolerant,
then they too may fail.
This chain of failures is called a cascade failure.
The North American power outage of August 14, 2003 is a good example.
See also fault tolerance.
- client (message-passing)
- QNX Neutrino's message-passing architecture is
structured around a client/server relationship.
The client is the one requesting services from
a server. The client generally accesses these services using
standard file-descriptor-based function calls (e.g., lseek()),
which are synchronous, in that the client's
call doesn't return until the request is completed by the server.
A thread can be both a client and a server
at the same time.
- code (memory)
- A code segment is one that is executable.
This means that instructions can be executed from it.
Also known as a text segment.
Contrast with data or stack
segments.
- cold standby (high availability)
- Cold-standby mode refers to the way a failed software component is restarted.
In cold-standby mode, the software component is generally restarted by loading
it from media (disk, network), having the component go through initializations,
and then having the component advertise itself as ready.
Cold standby is the slowest of the three modes (cold, warm, and
hot), and, while its timing is
system-specific, it usually takes on the order of tens of milliseconds
to seconds.
Cold standby is the simplest standby model to implement, but also the one that
increases MTTR the most.
See also
hot standby,
restartability,
and
warm standby.
- continuous availability (high availability)
- A system with an availability
of 100%.
Such a system has no downtime, which makes it
difficult, if not impossible, to attain with even moderately complex systems.
The reason it's difficult to attain is that every piece of software, hardware,
and infrastructure has some kind of failure rate.
There is always some non-zero probability of a catastrophic
failure for the system as a whole.
- data (memory)
- A data segment is one that is not executable.
It's typically used for storing data, and as such, can be marked read-only, write-only,
read/write, or no access.
Contrast with code or stack
segments.
- deadlock
- A failure condition reached when two threads are mutually
blocked on each other, with each thread waiting for the other to respond.
This condition can be generated quite easily; simply have two threads send
each other a message — at this point, both threads are waiting for the
other thread to reply to the request.
Since each thread is blocked, it will not have a chance to reply, hence deadlock.
To avoid deadlock, clients and servers
should be structured around a send hierarchy.
(Of course, deadlock can occur with more than two threads; A sends to B, B sends to C, and C
sends back to A, for example.)
See also
blocking,
client,
reply to a message,
send a message,
server,
and
thread.
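Because a send-blocked thread waits on exactly one other thread, the "who is blocked on whom" relation can be followed like a linked list; a deadlock shows up as a chain that loops back on itself (as in the A sends to B, B to C, C to A case above). A minimal sketch, assuming the relation is available as an array; the array layout and deadlocked_from() are hypothetical and not a QNX Neutrino API:

```c
#include <stdbool.h>

/* blocked_on[i] is the index of the thread that thread i is blocked on,
 * or -1 if thread i isn't blocked. Returns true if the chain starting at
 * `start` never reaches a runnable thread, i.e. it runs into a cycle. */
bool deadlocked_from(const int *blocked_on, int n, int start)
{
    int cur = start;

    /* A chain of distinct threads has at most n of them, so if we can
     * still follow links after n + 1 hops we must be going round a cycle. */
    for (int steps = 0; steps <= n; steps++) {
        if (cur < 0)
            return false;          /* reached a thread that can run */
        cur = blocked_on[cur];
    }
    return cur >= 0;
}
```

A thread that is blocked on a member of a cycle (without being part of it) is also reported, since it can never be unblocked either.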
- digital (data acquisition)
- Indicates an input or output signal that has only two states, usually identified as on or off (other
names, such as energized and de-energized, are also commonly used).
Compare analog.
- exponential backoff (high availability)
- A policy
that determines the intervals at which a failed process is restarted; the interval grows geometrically (for example, doubling) with each consecutive failure.
Its purpose is to prevent a component that keeps failing from overburdening the system.
See also
restartability.
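A policy might compute the delay before each restart as follows. A minimal sketch, assuming the delay starts at a base value, doubles after each consecutive failure, and is capped at a maximum; the function name and its parameters are hypothetical policy settings:

```c
/* Delay in milliseconds before restart number `attempt` (0 for the first
 * restart): base, 2*base, 4*base, ..., never more than max_ms. */
unsigned long backoff_delay_ms(unsigned attempt, unsigned long base_ms,
                               unsigned long max_ms)
{
    unsigned long delay = base_ms;

    for (unsigned i = 0; i < attempt; i++) {
        if (delay > max_ms / 2)
            return max_ms;         /* doubling would pass the cap */
        delay *= 2;                /* cannot overflow: delay <= max_ms / 2 */
    }
    return delay < max_ms ? delay : max_ms;
}
```

With a 100 ms base and a 10 s cap, successive restarts wait 100, 200, 400 ms and so on, until the delay reaches 10,000 ms and stays there. A real policy would probably also reset the attempt counter once the component has stayed up for a while; that part is not shown.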
- fault tolerance (high availability)
- A term used in conjunction with high availability
that refers to a system's ability to handle a fault.
When a fault occurs in a fault-tolerant system, the software is able to
work around the fault, for example, by retrying the operation or switching
to an alternate server.
Generally, fault tolerance is incorporated into a system to avoid
cascade failures.
See also
cascade failure.
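The two workarounds mentioned above, retrying and switching to an alternate server, can be combined in one loop. A minimal sketch, assuming requests are made through a caller-supplied function and that server ids are non-negative; request_fn, request_with_failover(), and simulated_request() are hypothetical names used only for illustration:

```c
#include <stdbool.h>
#include <stddef.h>

/* Sends one request to the given server; returns true on success.
 * Hypothetical: in a real system this might wrap a message to a server. */
typedef bool (*request_fn)(int server_id, void *ctx);

/* Try each server in turn, retrying each one up to `retries` extra times
 * before switching to the next (alternate) server. Returns the id of the
 * server that succeeded, or -1 if every attempt failed. */
int request_with_failover(request_fn send_request, void *ctx,
                          const int *servers, size_t nservers,
                          unsigned retries)
{
    for (size_t s = 0; s < nservers; s++)
        for (unsigned attempt = 0; attempt <= retries; attempt++)
            if (send_request(servers[s], ctx))
                return servers[s];
    return -1;   /* all servers failed: the caller must decide what to do */
}

/* Simulated transport for illustration: server 2 works, any other
 * server id fails. ctx points to an attempt counter. */
static bool simulated_request(int server_id, void *ctx)
{
    int *attempts = ctx;
    (*attempts)++;
    return server_id == 2;
}
```

Because the caller gets a result (or a clear failure) instead of crashing when server 1 is down, the failure stops here rather than cascading to the caller's own clients.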
- guard page (stack)
- An inaccessible data area present at the end of the valid virtual address range for a
stack.
The purpose of the guard page is to cause a memory-access exception should the
stack overflow past its defined range.
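On a POSIX system, the same idea can be reproduced by hand with mmap() and mprotect(). A minimal sketch, assuming a stack that grows toward lower addresses (so the guard page goes at the bottom of the region) and a platform that provides MAP_ANONYMOUS; alloc_stack_with_guard() is a hypothetical helper, not how any particular kernel sets up thread stacks:

```c
#define _DEFAULT_SOURCE          /* for MAP_ANONYMOUS on glibc */
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Reserve `pages` usable pages plus one inaccessible guard page below
 * them. Returns the lowest usable address (just above the guard page),
 * or NULL on failure. */
void *alloc_stack_with_guard(size_t pages)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t total = (pages + 1) * page;
    unsigned char *base = mmap(NULL, total, PROT_READ | PROT_WRITE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (base == MAP_FAILED)
        return NULL;
    /* Make the lowest page inaccessible: a stack growing down past the
     * usable region touches it and raises a memory-access exception. */
    if (mprotect(base, page, PROT_NONE) != 0) {
        munmap(base, total);
        return NULL;
    }
    return base + page;
}
```

For thread stacks, POSIX also lets you request a guard area through pthread_attr_setguardsize() rather than building one manually.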
- HA (or high availability)
- A designation applied to a system to indicate that it
has a high level of availability.
A system that's designed for high availability needs to consider
cascade failures,
restartability, and
fault tolerance.
Generally speaking, a system designated as high availability
will have an availability of 5 nines
or better.
See also
cascade failure.
- hot standby (high availability)
- Hot-standby mode refers to the way in which a failed software component is restarted.
In hot-standby mode, the software component is actively running, and effectively
shadows the state of the primary process.
The primary process feeds it updates, so that
the secondary (or standby) process is ready to take over the
instant that the primary process fails.
Hot standby is the fastest of the three modes (cold,
warm, and hot), but also the most expensive to implement; while its timing is
system-specific, it's usually thought of as being on the order of microseconds to
milliseconds.
It's expensive because the standby must continually shadow
the data updates from the primary process,
and must be able to assume operation when the primary dies.
Hot standby, however, is the preferred solution for minimizing MTTR
and hence increasing availability.
See also
cold standby,
restartability,
and
warm standby.
- in-service upgrade or ISU (high availability)
- An upgrade performed on a live system, with the least
amount of impact to the operation of the system.
The basic algorithm is to simulate a fault
and then have the overlord process
start a new version of the failed component instead of restarting the old one.
In certain cases, the policy of the
overlord may be to perform a version downgrade instead of an upgrade.
See also
fault tolerance
and
restartability.
- message-passing
- The QNX Neutrino operating system is based on a message-passing model, where all
services are provided in a synchronous manner by
passing messages around from client to
server.
The client will send a message to the server and
block. The server will receive a message from
the client, perform some amount of processing, and then reply
to the client's message, which will unblock
the client.
See also
blocking
and
reply to a message.
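The send/receive/reply sequence above can be imitated with POSIX threads, which makes it easy to see who blocks when. This is a minimal sketch, not the QNX Neutrino API (MsgSend(), MsgReceive(), MsgReply()): the channel structure and the ch_send(), ch_receive(), ch_reply(), and doubling_server() names are hypothetical, it handles only one client at a time, and it doesn't distinguish the separate send-blocked and reply-blocked states that the kernel tracks:

```c
#include <pthread.h>

/* One-slot channel shared by a single client and a single server. */
struct channel {
    pthread_mutex_t lock;
    pthread_cond_t  changed;
    int msg, reply;
    int has_msg, has_reply;
};

#define CHANNEL_INIT { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0, 0, 0, 0 }

/* Client: deliver the message, then block until the server replies. */
int ch_send(struct channel *ch, int msg)
{
    pthread_mutex_lock(&ch->lock);
    ch->msg = msg;
    ch->has_msg = 1;
    pthread_cond_broadcast(&ch->changed);     /* wake a waiting server */
    while (!ch->has_reply)                    /* client is blocked here */
        pthread_cond_wait(&ch->changed, &ch->lock);
    ch->has_reply = 0;
    int result = ch->reply;
    pthread_mutex_unlock(&ch->lock);
    return result;
}

/* Server: block until a message arrives, then return it. */
int ch_receive(struct channel *ch)
{
    pthread_mutex_lock(&ch->lock);
    while (!ch->has_msg)                      /* server is blocked here */
        pthread_cond_wait(&ch->changed, &ch->lock);
    ch->has_msg = 0;
    int msg = ch->msg;
    pthread_mutex_unlock(&ch->lock);
    return msg;
}

/* Server: deliver the result; this is what unblocks the client. */
void ch_reply(struct channel *ch, int result)
{
    pthread_mutex_lock(&ch->lock);
    ch->reply = result;
    ch->has_reply = 1;
    pthread_cond_broadcast(&ch->changed);
    pthread_mutex_unlock(&ch->lock);
}

/* Example server thread: receive one request, reply with twice its value. */
static void *doubling_server(void *arg)
{
    struct channel *ch = arg;
    int msg = ch_receive(ch);
    ch_reply(ch, msg * 2);
    return NULL;
}
```

A client thread calling ch_send(&ch, 21) stays blocked, using no CPU, until doubling_server() has received the message and replied, and then gets 42 back.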
- MTBF or Mean Time Between Failures (high availability)
- The MTBF is expressed in hours and indicates the mean time that elapses between
failures.
MTBF is applied to both software and hardware, and is used, in conjunction with
the MTTR, in the calculation of
availability.
A computer backplane, for example, may have an MTBF that's measured in the tens
of thousands of hours of operation (several years).
Software usually has a lower MTBF than hardware.
- MTTR or Mean Time To Repair (high availability)
- The MTTR is expressed in hours, and indicates the mean time required to repair
a system.
MTTR is applied to both software and hardware, and is used, in conjunction with
the MTBF, in the calculation of
availability.
A software server, for example, may have an MTTR that's measured in milliseconds,
whereas a hardware component may have an MTTR that's measured in minutes or
hours, depending on the component.
Software usually has a much lower MTTR than hardware.
- overlord (high availability)
- A process that monitors the stability of various
system processes according to the policy,
and takes action when needed (such as restarting processes based on a restart policy).
The overlord may also be involved with an in-service upgrade
or downgrade.
See also
restartability.
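On a POSIX system, the restart part of an overlord's job can be sketched with fork() and waitpid(). This is a minimal sketch under assumptions: the "policy" is reduced to a restart limit, a component that returns normally counts as a clean exit, and supervise(), always_fails(), and succeeds() are hypothetical names; it isn't a QNX-specific mechanism:

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run `component` in a child process, restarting it each time it exits
 * abnormally, up to `max_restarts` times. Returns the number of restarts
 * performed, or -1 if fork() or waitpid() failed. */
int supervise(void (*component)(void), int max_restarts)
{
    int restarts = 0;

    for (;;) {
        pid_t pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0) {              /* child: run the component */
            component();
            _exit(0);                /* returning normally is a clean exit */
        }
        int status;
        if (waitpid(pid, &status, 0) < 0)
            return -1;
        if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
            return restarts;         /* clean exit: nothing to restart */
        if (restarts == max_restarts)
            return restarts;         /* policy limit reached: give up */
        restarts++;                  /* a fuller policy might wait here first */
    }
}

/* Example components for illustration. */
static void always_fails(void) { _exit(1); }
static void succeeds(void) { }
```

supervise(always_fails, 3) runs the component four times (the original start plus three restarts) and returns 3; a real overlord would typically also space the restarts out according to its policy.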
- policy (high availability)
- A set of rules used in a high-availability
system to determine the limits that are enforced by the overlord
process against other processes in the system.
The policy also determines how such processes are restarted,
and may include algorithms such as exponential backoff.
See also
restartability.
- primary (high availability)
- The primary designation refers to the active process when used
in discussions of cold, warm, and hot standby.
The primary system is running, and the secondary
system or systems are the backups.
See also
cold standby,
warm standby,
and
hot standby.
- process (noun)
- A non-schedulable entity that occupies memory, effectively acting as a container
for one or more threads.
See also
thread.
- pulse (message-passing)
- A nonblocking message received in a manner similar to a regular message.
It is nonblocking for the sender, and can be waited on by the receiver using
the standard message-passing functions MsgReceive() and MsgReceivev()
or the special pulse-only receive function MsgReceivePulse().
While messages are typically sent from client to
server, pulses are generally sent in the opposite
direction, so as not to break the send hierarchy
(which could cause deadlock).
See also
receive a message.
- QNX Software Systems
- The company responsible for the QNX 2, QNX 4, and QNX Neutrino operating systems.
- QSS
- An abbreviation for QNX Software Systems.
- receive a message (message-passing)
- A thread can receive a message by calling MsgReceive() or MsgReceivev().
If there is no message available, the thread will block, waiting for one.
A thread that receives a message is said to be a server.
See also
blocking.
- reply to a message (message-passing)
- A server will reply to a client's
message to deliver the results of the client's request back to the client, and unblock the client.
See also
client.
- resource manager
- A server process that provides certain
well-defined file-descriptor-based services to arbitrary clients.
A resource manager supports a limited set of messages that correspond to standard
client C library functions such as open(), read(), write(),
lseek(), devctl(), etc.
See also
client.
- restartability (high availability)
- The characteristic of a system or process that lets it
be gracefully restarted from a faulted state.
Restartability is key in lowering MTTR,
and hence in increasing availability.
The overlord process is responsible for
determining that another process has exceeded some kind of limit; then, based on the policy,
the overlord may restart the component.
- secondary (or standby) (high availability)
- Refers to the inactive process when used
in discussions of cold, warm, and hot standby.
The primary system is the one that's currently
running; the secondary system is the backup system.
There may be more than one secondary process.
See also
cold standby,
warm standby,
and
hot standby.
- segment (memory)
- A contiguous chunk of memory with the same accessibility permissions throughout.
Note that this is different from the (now archaic) x86 term, which indicated something accessible
via a segment register.
In this definition, a segment can be of an arbitrary size.
Segments typically represent code (or text),
data, stack,
or other uses.
- send a message (message-passing)
- A thread can send a message to another thread. The MsgSend*() series
of functions is used to send the message; the sending thread blocks until
the receiving thread replies to the message.
A thread that sends a message is said to be a client.
See also
blocking,
message-passing,
and
reply to a message.
- send hierarchy
- A design paradigm where messages are sent in one direction, and replies flow in the opposite direction.
The primary purpose of having a send hierarchy is to avoid deadlock.
A send hierarchy is accomplished by assigning clients
and servers a level, and ensuring that
messages that are being sent go only to a higher level.
This rules out the deadlock in which two threads send to each other:
only the thread at the lower level may send to the other, and the thread at the higher level
must not send back, because its target is at a lower level.
See also
client,
reply to a message,
send a message,
server,
and
thread.
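Because every send goes strictly upward, a chain of sends can never return to the thread it started from, so no cycle of send-blocked threads (and hence no deadlock) can form. A design can be checked against this rule mechanically. A minimal sketch, assuming each thread has been assigned an integer level; send_allowed() and first_violation() are hypothetical helpers:

```c
#include <stdbool.h>

/* A thread may send only to a thread at a strictly higher level.
 * Replies (and pulses) flow the other way and aren't checked here. */
bool send_allowed(int sender_level, int receiver_level)
{
    return receiver_level > sender_level;
}

/* Check planned sends (thread from[i] sends to thread to[i]) against each
 * thread's level. Returns the index of the first violation, or -1 if the
 * design respects the send hierarchy. */
int first_violation(const int *level, const int *from, const int *to,
                    int nsends)
{
    for (int i = 0; i < nsends; i++)
        if (!send_allowed(level[from[i]], level[to[i]]))
            return i;
    return -1;
}
```

With threads at levels 0, 1, and 2, the sends 0 to 1 and 1 to 2 pass, while adding 2 to 0 (the send that would close an A, B, C cycle) is reported as a violation.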
- server (message-passing)
- A regular, user-level process that provides
certain types of functionality (usually file-descriptor-based) to clients.
Servers are typically resource managers.
QNX Neutrino provides an extensive library that performs much
of the functionality of a resource manager for you.
The server's job is to receive messages from clients,
process them, and then reply to the messages, which
unblocks the clients.
A thread within a process can be both a client and a server
at the same time.
See also
client,
receive a message,
reply to a message,
resource manager,
and
unblock.
- stack (memory)
- A stack segment is one used for the stack of a thread.
It's generally placed at a special virtual address
location, can be grown on demand,
and has a guard page.
Contrast with data or code
segments.
- synchronous
- Used to indicate that a given operation has some synchronization to another
operation. For example, during a message-passing operation,
when the server does a MsgReply() (to reply to
the client), unblocking the client is said to be
synchronous to the reply operation.
Contrast with asynchronous.
See also
message-passing
and
unblock.
- timeout
- Many kernel calls support the concept of a timeout, which limits the time spent in a
blocked state.
The thread leaves the blocked state when the condition it was waiting for is
satisfied or when the timeout period elapses, whichever happens first.
See also
blocking.
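With POSIX threads, pthread_cond_timedwait() follows this pattern: it blocks until the condition variable is signaled or an absolute deadline passes, in which case it returns ETIMEDOUT. A minimal sketch, assuming the condition variable uses the default CLOCK_REALTIME clock; wait_flag_timeout() is a hypothetical helper:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Wait up to `ms` milliseconds for *flag to become nonzero.
 * Returns 0 if the condition was satisfied, or ETIMEDOUT if the timeout
 * elapsed first. The caller supplies the mutex and condition variable
 * that protect *flag. */
int wait_flag_timeout(pthread_mutex_t *m, pthread_cond_t *cv,
                      const int *flag, long ms)
{
    struct timespec deadline;

    /* The timeout is an absolute time on the condvar's clock. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += ms / 1000;
    deadline.tv_nsec += (ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec++;
        deadline.tv_nsec -= 1000000000L;
    }

    int rc = 0;
    pthread_mutex_lock(m);
    /* Loop to ride out spurious wakeups; the thread uses no CPU while blocked. */
    while (!*flag && rc == 0)
        rc = pthread_cond_timedwait(cv, m, &deadline);
    if (*flag)
        rc = 0;                  /* condition met, even if right at the deadline */
    pthread_mutex_unlock(m);
    return rc;
}
```

Using an absolute deadline rather than a relative interval means that repeated spurious wakeups don't extend the total time spent waiting.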
- thread
- A single, schedulable flow of execution. Threads are implemented directly within
the QNX Neutrino kernel and are manipulated by the POSIX pthread*() function
calls. A thread will need to synchronize with other threads (if any) by using
various synchronization primitives such as mutexes,
condition variables, semaphores, etc.
Threads are scheduled in either FIFO or Round Robin scheduling mode.
A thread is always associated with a process.
- unblock
- A thread that had been blocked will be unblocked
when the condition it has been blocked on is met,
or a timeout occurs.
For example, a thread
might be blocked waiting to receive a message.
When the message is sent, the thread will be unblocked.
See also
blocking
and
send a message.
- warm standby (high availability)
- Warm-standby mode refers to the way a failed software component is restarted.
In warm-standby mode, the software component is lying in a dormant
state, perhaps having performed some rudimentary initialization.
The component is waiting for the failure of its primary
component; when that happens, the component completes its initializations,
and then advertises itself as being ready to serve requests.
Warm standby is the middle-of-the-road version of the three modes (cold, warm, and
hot).
While its timing is
system-specific, this is usually thought of as being on the order of milliseconds.
Warm standby is relatively easy to implement, because it performs its usual
initializations (as if it were running in primary mode),
then halts and waits for the failure of the primary before
continuing operation.
See also
cold standby,
hot standby,
restartability,
and
server.