server-monitor

Updated: April 19, 2023

Watch designated servers and take action if they don't handle an unblock pulse

Syntax:

server-monitor [-fv] [-U {user_name | uid[:gid[,gid]*]}] max_unblock_time 
  [server_ident server_action]*

Runs on:

QNX Neutrino

Options:

-f
Leave the server-monitor process in the foreground rather than moving it to the background.
-U {user_name | uid[:gid[,gid]*]}
Switch to the specified user or uid and gid(s) once setup is complete.
-v
Log server actions and warnings to slogger2. You can use slog2info to view the logs. Errors are always logged, whether or not you specify this option.
max_unblock_time
The number of milliseconds a server is given to handle an unblock pulse before server-monitor takes action. This value must be in the range from 1 through UINT32_MAX - 1 (4294967294); otherwise server-monitor logs a message to slogger2 and then exits.
server_ident
Either a name or process ID for the server.

Actions specified for an executable name apply to all processes with that name. You should specify a name if the server is likely to be terminated and restarted in the normal operation of your system.

server_action
A comma-separated list of actions, each in the form one_action[:delay]. Each one_action is one of the following:
  • A signal name or number — server-monitor sends the specified signal to the server.
  • reboot — reboot the system; server-monitor calls sysmgr_reboot().
  • ignore — stop any further processing of this notification; server-monitor tells the process manager to stop sending it pulses about the server.

If you specify a delay, server-monitor waits the specified number of milliseconds before moving to the next action in the list; otherwise it waits for 100 ms.
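The argument formats above can be sketched in portable C. This is only an illustration, not the code server-monitor uses: struct action, parse_u32(), parse_max_unblock_time(), and parse_actions() are hypothetical names, the accepted delay range (0 through UINT32_MAX) is an assumption, and the sketch doesn't check that an action name is a valid signal, reboot, or ignore.

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DEFAULT_DELAY_MS 100u  /* default wait between actions, per the text */

/* Hypothetical record for one parsed action; not a server-monitor type. */
struct action {
    char     name[32];  /* signal name or number, "reboot", or "ignore" */
    uint32_t delay_ms;  /* wait before moving on to the next action */
};

/* Parse an unsigned decimal string in [min, max]; 0 on success, -1 otherwise. */
static int parse_u32(const char *text, uint32_t min, uint32_t max, uint32_t *out)
{
    char *end;

    if (!isdigit((unsigned char)text[0]))  /* rejects "", signs, and spaces */
        return -1;
    errno = 0;
    unsigned long long v = strtoull(text, &end, 10);
    if (errno != 0 || *end != '\0' || v < min || v > max)
        return -1;
    *out = (uint32_t)v;
    return 0;
}

/* max_unblock_time must lie in the range 1 through UINT32_MAX - 1. */
static int parse_max_unblock_time(const char *text, uint32_t *out)
{
    return parse_u32(text, 1, UINT32_MAX - 1, out);
}

/*
 * Split "one_action[:delay],..." into out[]. Returns the number of actions
 * parsed, or -1 if the list is malformed or has more than max entries.
 */
static int parse_actions(const char *list, struct action *out, int max)
{
    char buf[256];
    int count = 0;

    if (strlen(list) >= sizeof(buf))
        return -1;
    strcpy(buf, list);  /* strtok() modifies its input, so work on a copy */

    for (char *tok = strtok(buf, ","); tok != NULL; tok = strtok(NULL, ",")) {
        uint32_t delay = DEFAULT_DELAY_MS;
        char *colon = strchr(tok, ':');

        if (count == max)
            return -1;
        if (colon != NULL) {
            *colon = '\0';  /* terminate the action name at the colon */
            if (parse_u32(colon + 1, 0, UINT32_MAX, &delay) != 0)
                return -1;  /* assumed delay range: 0 .. UINT32_MAX */
        }
        if (tok[0] == '\0' || strlen(tok) >= sizeof(out[count].name))
            return -1;
        strcpy(out[count].name, tok);
        out[count].delay_ms = delay;
        count++;
    }
    return count;
}
```

With this sketch, parsing SIGTERM:300,SIGKILL,reboot yields three actions with delays of 300, 100, and 100 ms, and a max_unblock_time of 0 or 4294967295 is rejected.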

Description:

The server-monitor utility provides a way to detect servers that don't respond to unblock pulses.

If a server sets the _NTO_CHF_UNBLOCK flag when it calls ChannelCreate(), then the process manager delivers an unblock pulse when a thread that's REPLY-blocked on the channel attempts to unblock before the server replies to its message. Servers must respond to these unblock pulses, or else the clients can be blocked forever.

The server-monitor utility watches a list of servers and takes the specified actions if a server doesn't handle an unblock pulse within max_unblock_time milliseconds.

When server-monitor starts, it registers itself with the process manager. In order to do this, server-monitor needs to have the PROCMGR_AID_SERVER_MONITOR ability enabled (see procmgr_ability() in the C Library Reference). There can be only one instance of server-monitor running at any time.

When a client requests an unblock, the kernel sends an unblock pulse to the server that the client is blocked on. If server-monitor is running, and the server doesn't unblock the client within max_unblock_time milliseconds, then the kernel sends a pulse to server-monitor. When server-monitor receives this pulse, it checks to see if the server's process ID is in its list, or if a pattern in the list matches the server's process executable name. If it finds a match, it carries out the actions specified for that server.

Note: If server-monitor has a set of actions registered for a process's ID, and another set registered for the process's executable name, it invokes the set registered for the process ID.
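The precedence rule in the note can be sketched as a table lookup. This is a minimal illustration under stated assumptions, not server-monitor's implementation: struct entry and find_actions() are hypothetical names, and a plain strcmp() stands in for whatever pattern matching server-monitor applies to executable names.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical registration record; not a server-monitor type. */
struct entry {
    pid_t       pid;      /* > 0 if registered by process ID, otherwise 0 */
    const char *name;     /* non-NULL if registered by executable name */
    const char *actions;  /* the server_action list for this entry */
};

/*
 * Return the entry to use for a server with the given process ID and
 * executable name. A process ID match always wins over a name match;
 * NULL means the server isn't in the list.
 */
static const struct entry *find_actions(const struct entry *tab, size_t n,
                                        pid_t pid, const char *exe_name)
{
    const struct entry *by_name = NULL;

    for (size_t i = 0; i < n; i++) {
        if (tab[i].pid > 0 && tab[i].pid == pid)
            return &tab[i];  /* process ID match: use it immediately */
        if (by_name == NULL && tab[i].name != NULL &&
            strcmp(tab[i].name, exe_name) == 0)
            by_name = &tab[i];  /* remember the first name match as a fallback */
    }
    return by_name;
}
```

If the table holds one entry for the name io-audio and another for process ID 123456, a lookup for an io-audio process with ID 123456 returns the process ID entry, while any other io-audio process gets the name entry.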

If you want to determine whether server-monitor is running, you can send a _PROC_SERVMON message with a subtype of _PROC_SERVER_MONITOR_ALIVE to the process manager. For example:

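/* Returns the MsgSendnc() result for the liveness query.
 * Note: the pid parameter isn't used by this example. */
int is_server_monitor_running (const pid_t pid)
{
  proc_servmon_t msg = { 0 };
  int ret;

  /* Build a server-monitor liveness query. */
  msg.i.type = _PROC_SERVMON;
  msg.i.subtype = _PROC_SERVER_MONITOR_ALIVE;

  /* Send it to the process manager; no reply data is expected. */
  ret = MsgSendnc(PROCMGR_COID, &msg.i, sizeof(msg.i), NULL, 0);

  return ret;
}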

The result is EOK if server-monitor is running, or ESRCH if it isn't.

Examples:

Start server-monitor with a max_unblock_time of 500 ms, and a list of actions for process ID 123456:

server-monitor 500 123456 SIGTERM:300,SIGKILL,reboot

In this case, when server-monitor gets a pulse indicating that the server with the process ID 123456 has failed to unblock a client, it sends the server a SIGTERM signal. If the client thread is still blocked after a delay of 300 ms, server-monitor sends the server a SIGKILL signal. If the client thread is still blocked after a further delay of 100 ms (the default), server-monitor reboots the system.
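The timing in this example can be worked through with a few lines of C. This is a rough sketch: action_offsets() is a hypothetical helper, and it assumes that the first action is taken as soon as server-monitor receives the pulse and that processing time is negligible.

```c
#include <stddef.h>

/*
 * Given the delay that follows each action, compute when each action is
 * taken, in milliseconds after start_ms. Action 0 happens at start_ms;
 * each later action happens after the previous action's delay has elapsed.
 */
static void action_offsets(unsigned start_ms, const unsigned *delays_ms,
                           size_t n, unsigned *out)
{
    unsigned elapsed = start_ms;

    for (size_t i = 0; i < n; i++) {
        out[i] = elapsed;
        elapsed += delays_ms[i];
    }
}
```

For SIGTERM:300,SIGKILL,reboot, the delays are 300, 100, and 100 ms, so with start_ms set to 0 the actions fall at 0, 300, and 400 ms after the pulse arrives. Because the pulse is sent only after max_unblock_time (500 ms here) has passed, setting start_ms to 500 gives approximately 500, 800, and 900 ms after the server was sent the unblock pulse.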

Start server-monitor with a max_unblock_time of 500 ms, and a list of actions for all processes with the name io-audio:

server-monitor 500 io-audio SIGTERM:300,SIGKILL,reboot

If io-audio will be terminated and restarted during normal target system operation, you should use the process name because io-audio will have a new process ID every time it's restarted.

Exit status:

0
Success.
-1
An error occurred; check the system log for more information.