server-monitor
Watch designated servers and take action if they don't handle an unblock pulse
Syntax:
server-monitor [-fv] [-p n] [-U {user_name | uid[:gid[,gid]*]}] max_unblock_time
[server_ident server_action]*
Runs on:
QNX OS
Options:
- -f
- Leave the server-monitor process in the foreground rather than moving it to the background.
- -p n
- The size of the pulse pool that procnto can use to inform server-monitor of misbehaving servers. The argument n must be greater than 0. If this option is omitted or set to default, procnto uses the default of 200 pulses, which means the kernel can inform server-monitor of 200 concurrently misbehaving servers.
If the pulse pool becomes depleted, a SIGKILL is sent to server-monitor. In a system where server-monitor is configured as a critical process, this causes the system to enter its design safe state (DSS).
- -U {user_name | uid[:gid[,gid]*]}
- Switch to the specified user or uid and gid(s) once setup is complete.
- -v
- Log server actions and warnings to slogger2. You can use slog2info to view the logs. Errors are always logged, whether or not you specify this option.
- max_unblock_time
- The number of milliseconds a server is given to handle an unblock pulse before server-monitor takes action. This value must be in the range from 1 through UINT32_MAX - 1 (4294967294); otherwise server-monitor logs a message to slogger2 and then exits.
- server_ident
- Either a name or process ID for the server.
Actions specified for an executable name apply to all processes with that name. You should specify a name if the server is likely to be terminated and restarted in the normal operation of your system.
- server_action
- A comma-separated list of actions, each in the form one_action[:delay].
Each one_action is one of the following:
- A signal name or number — server-monitor sends the specified signal to the server.
- reboot — reboot the system; server-monitor calls sysmgr_reboot().
- ignore — stop any further processing of this notification; server-monitor tells the process manager to stop sending it pulses about the server.
If you specify a delay, server-monitor waits the specified number of milliseconds before moving to the next action in the list; otherwise it waits for 100 ms.
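To make the server_action format concrete, here is a minimal sketch in portable C that splits a list such as SIGTERM:300,SIGKILL,reboot into entries and applies the 100 ms default when no delay is given. It is not server-monitor's own parser: the struct action type, the parse_actions() function, and the name-length limit are hypothetical, and the sketch doesn't check that a name is a valid signal, reboot, or ignore.

```c
#include <stdlib.h>
#include <string.h>

#define DEFAULT_DELAY_MS 100u   /* default delay stated in this reference */
#define MAX_ACTION_NAME 32      /* arbitrary limit chosen for this sketch */

/* Hypothetical record for one parsed one_action[:delay] entry. */
struct action {
    char name[MAX_ACTION_NAME]; /* signal name or number, "reboot", or "ignore" */
    unsigned long delay_ms;     /* wait before moving to the next action */
};

/* Splits spec on commas and fills out[]. Returns the number of entries,
 * or -1 on a malformed entry or when out[] is too small. */
int parse_actions(const char *spec, struct action *out, int max_out)
{
    int count = 0;
    const char *p = spec;

    while (*p != '\0') {
        size_t len = strcspn(p, ",");            /* length of this entry */
        const char *colon = memchr(p, ':', len); /* optional delay separator */
        size_t name_len = colon ? (size_t)(colon - p) : len;

        if (count >= max_out || name_len == 0 || name_len >= MAX_ACTION_NAME) {
            return -1;
        }
        memcpy(out[count].name, p, name_len);
        out[count].name[name_len] = '\0';
        out[count].delay_ms = DEFAULT_DELAY_MS;

        if (colon != NULL) {
            char *end = NULL;

            if (colon[1] < '0' || colon[1] > '9') {
                return -1;                       /* delay must start with a digit */
            }
            out[count].delay_ms = strtoul(colon + 1, &end, 10);
            if (end != p + len) {
                return -1;                       /* trailing junk after the delay */
            }
        }
        count++;
        p += len;
        if (*p == ',') {
            p++;                                 /* skip the separator */
        }
    }
    return count;
}
```

Under these assumptions, parsing SIGTERM:300,SIGKILL,reboot yields three entries with delays of 300, 100, and 100 ms.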
Description:
The server-monitor provides a way to detect servers that don't respond to unblock pulses.
If a server sets the _NTO_CHF_UNBLOCK flag when it calls ChannelCreate(), then the process manager delivers an unblock pulse when a thread that's REPLY-blocked on the channel attempts to unblock before the server replies to its message. Servers must respond to these unblock pulses, or else the clients can be blocked forever.
The server-monitor watches a list of servers and takes the specified actions if a server doesn't handle unblock pulses in a certain time.
When server-monitor starts, it registers itself with the process manager. In order to do this, server-monitor needs to have the PROCMGR_AID_SERVER_MONITOR ability enabled (see procmgr_ability() in the C Library Reference). There can be only one instance of server-monitor running at any time.
When a client requests an unblock, the kernel sends an unblock pulse to the server that the client is blocked on. If server-monitor is running, and the server doesn't unblock the client within max_unblock_time milliseconds, then the kernel sends a pulse to server-monitor. When server-monitor receives this pulse, it checks to see if the server's process ID is in its list, or if a pattern in the list matches the server's process executable name:
- If there's no match, server-monitor sends an ignore message to the process manager to inform it that it isn't interested in this server.
- If a match is found, server-monitor starts the appropriate action sequence for the server by immediately invoking the first action. After the specified or default delay, server-monitor invokes the next action if the client thread is still blocked. This continues until the client thread is unblocked or server-monitor has taken all the actions in the sequence. If the client gets unblocked and then blocks again on a new message before the timer for the next action expires on the original blockage, then server-monitor starts the sequence again.
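The lookup described above can be sketched in portable C as follows. This is an illustration only, not server-monitor's implementation: struct rule and select_actions() are hypothetical names, and the sketch assumes an exact comparison of executable names, because this reference doesn't define the pattern syntax.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical list entry: a rule keyed by process ID or by executable name. */
struct rule {
    long pid;             /* > 0 when the rule was given as a process ID */
    const char *name;     /* non-NULL when the rule was given as a name */
    const char *actions;  /* action sequence, e.g. "SIGTERM:300,SIGKILL" */
};

/* Returns the action sequence for the reported server, or "ignore" when no
 * rule matches. Name matching here is an exact strcmp(), which is an
 * assumption made for this sketch. */
const char *select_actions(const struct rule *rules, size_t n,
                           long pid, const char *exe_name)
{
    for (size_t i = 0; i < n; i++) {
        if (rules[i].pid > 0 && rules[i].pid == pid) {
            return rules[i].actions;          /* matched by process ID */
        }
        if (rules[i].name != NULL && exe_name != NULL &&
            strcmp(rules[i].name, exe_name) == 0) {
            return rules[i].actions;          /* matched by executable name */
        }
    }
    return "ignore";                          /* not a server we are watching */
}
```

With one rule for process ID 123456 and one for io-sock, a pulse about any other process falls through to ignore, which mirrors the no-match case in the list above.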
If you want to determine whether server-monitor is running, you can send a _PROC_SERVMON message with a subtype of _PROC_SERVER_MONITOR_ALIVE to the process manager. For example:
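int is_server_monitor_running (const pid_t pid)
{
    proc_servmon_t msg = { 0 };
    int ret;

    /* Build the "is server-monitor alive?" query. */
    msg.i.type = _PROC_SERVMON;
    msg.i.subtype = _PROC_SERVER_MONITOR_ALIVE;

    /* Send the query to the process manager; no reply data is expected. */
    ret = MsgSendnc(PROCMGR_COID, &msg.i, sizeof(msg.i), NULL, 0);
    return ret;
}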
The result is EOK if server-monitor is running, or ESRCH if it isn't.
Examples:
Start server-monitor with a max_unblock_time of 500 ms, and a list of actions for process ID 123456:
server-monitor 500 123456 SIGTERM:300,SIGKILL,reboot
In this case, when server-monitor gets a pulse indicating that the server with the process ID 123456 has failed to unblock a client, it sends the server a SIGTERM signal. If the client thread is still blocked after a delay of 300 ms, server-monitor sends the server a SIGKILL signal. If the client thread is still blocked after a further delay of 100 ms (the default), server-monitor reboots the system.
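The timing in this example can be worked through with a small portable C sketch. It only models the arithmetic described above: the first action fires as soon as server-monitor gets the pulse, and each later action fires after the accumulated delays if the client is still blocked. The actions_fired() function is hypothetical, and the sketch doesn't model the sequence restarting when the client blocks again.

```c
#include <stddef.h>

/* Hypothetical helper: given the per-action delays (ms) and how long after
 * the first action the client thread becomes unblocked, return how many
 * actions in the sequence are invoked. */
size_t actions_fired(const unsigned long *delays_ms, size_t n,
                     unsigned long unblocked_after_ms)
{
    unsigned long offset = 0;  /* when the next action is due, relative to the first */
    size_t fired = 0;

    while (fired < n) {
        if (fired > 0 && unblocked_after_ms <= offset) {
            break;             /* client already unblocked when this action came due */
        }
        offset += delays_ms[fired];  /* the following action is due after this delay */
        fired++;
    }
    return fired;
}
```

For SIGTERM:300,SIGKILL,reboot the delays are 300, 100, and 100 ms, so SIGTERM is due at 0 ms, SIGKILL at 300 ms, and reboot at 400 ms. A client that unblocks after 250 ms sees only SIGTERM; one that unblocks after 350 ms sees SIGTERM and SIGKILL; one that never unblocks triggers all three.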
Start server-monitor with a max_unblock_time of 500 ms, and a list of actions for all processes with the name io-sock:
server-monitor 500 io-sock SIGTERM:300,SIGKILL,reboot
If io-sock will be terminated and restarted during normal target system operation, you should use the process name because io-sock will have a new process ID every time it's restarted.
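If you launch server-monitor from your own startup code, you might want to check max_unblock_time before passing it on, because an out-of-range value makes server-monitor log a message and exit. The following is a minimal sketch in portable C; parse_max_unblock_time() is a hypothetical helper, not part of server-monitor, and it assumes the value is written as a plain decimal number.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: returns true and stores the value if text is a
 * decimal integer in the documented range 1 .. UINT32_MAX - 1. */
bool parse_max_unblock_time(const char *text, uint32_t *out_ms)
{
    char *end = NULL;
    unsigned long long value;

    if (text == NULL || *text < '0' || *text > '9') {
        return false;              /* reject empty input, signs, and whitespace */
    }
    errno = 0;
    value = strtoull(text, &end, 10);
    if (errno != 0 || *end != '\0') {
        return false;              /* overflow or trailing characters */
    }
    if (value < 1 || value > (unsigned long long)UINT32_MAX - 1) {
        return false;              /* outside the documented range */
    }
    *out_ms = (uint32_t)value;
    return true;
}
```

With this check, 500 and 4294967294 are accepted, while 0 and 4294967295 are rejected.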
Exit status:
- 0
- Success.
- -1
- An error occurred; check the system log for more information.