slm

System launch and monitor: launch complex applications consisting of many processes that must be started in a specific order

Syntax:

slm [-avV] [-D debug_mode] [-n subsystem_path] [-p priority]
    [-P search_path] [-r recovery_mode] [-R frequency/sec|min|hour]
    [-s comp_name] [-t polling_interval] [-T total_wait]
    [-x comp_name] config_file

Runs on:

QNX Neutrino

Options:

-a
Adopt running daemon processes. Use this option to integrate SLM with an existing system where some server processes may already be running. If you place component entries for all relevant system processes in the configuration file, SLM will adopt these processes at startup as if it had launched them itself (and can thus control the processes via the command interface or restart them automatically if they terminate abnormally; see Normal vs. abnormal termination).
-D debug_mode
Specify when to use the <SLM:debug> argument list (instead of the normal <SLM:args> list). One of: cmd (default), startup, or always. With cmd, the debug list is used only when the module is started using slmctl with -d. With startup, all components launched at startup (see the -s option) initially use the debug list, and subsequent restarts honor the -d option. With always, the debug list is always used.
-n subsystem_path
Set the access point (default is /dev/slm) for client applications to write control and query commands.
-p priority
Set the priority of the SLM server thread (default is 30).
-P search_path
Set the search path for executables (default is $PATH). When launching a process, SLM looks in the search path to find the executable if the corresponding command element doesn't contain a full path.
-r recovery_mode
Set the recovery mode for components monitored by SLM. One of: none, stop, or replace (the default). The action specified with the -r option is performed when a component terminates abnormally if that component doesn't override this setting in its repair element.
-R frequency
Set how frequently SLM attempts to recover a component that has terminated abnormally. The frequency argument specifies the maximum number of recovery attempts as an integer and one of the following suffixes, separated by a forward slash: sec (seconds), min (minutes), or hour. For example, 1/min. Default is 2/min (2 times per minute).
-s comp_name
Name a component or module to launch when SLM starts. For convenience, you can use the built-in pseudo-modules all and none (default is all).
-t polling_interval
Set the polling interval in milliseconds for the wait property. Default is 100.
-T total_wait
Set the total wait time in milliseconds for the wait property. Default is 50000.
-v
Set the output verbosity (messages are written to the system log, which you can view with slog2info). The -v option is cumulative; each additional v adds a level of verbosity, up to 7 levels. Default level is warning messages.
-V
Log output messages to the console. The -V option is cumulative; each additional V adds a level of verbosity. Default level is error messages.
-x comp_name
Name a component or module to terminate when SLM terminates. For convenience, you can use the built-in pseudo-modules all and none (default is all).
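The -R argument pairs an attempt count with a time unit. The following C sketch shows one way to decode such a value into a maximum number of attempts and a window length in seconds. The struct recovery_rate and the function parse_recovery_rate() are hypothetical names for illustration; they are not part of SLM or any QNX API.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical representation of an -R value such as "2/min". */
struct recovery_rate {
    long max_attempts;   /* maximum number of recovery attempts in the window */
    long window_sec;     /* length of the window, in seconds */
};

/* Parse "N/sec", "N/min", or "N/hour".
 * Returns 0 on success, or -1 if the value is malformed. */
int parse_recovery_rate(const char *arg, struct recovery_rate *out)
{
    char *end;
    long count = strtol(arg, &end, 10);

    if (end == arg || count < 0 || *end != '/')
        return -1;                      /* need an integer followed by '/' */

    const char *unit = end + 1;
    long window;
    if (strcmp(unit, "sec") == 0)
        window = 1;
    else if (strcmp(unit, "min") == 0)
        window = 60;
    else if (strcmp(unit, "hour") == 0)
        window = 3600;
    else
        return -1;                      /* unknown suffix */

    out->max_attempts = count;
    out->window_sec = window;
    return 0;
}
```

With the default value 2/min, this yields max_attempts == 2 and window_sec == 60, so a supervisor built this way would refuse a third recovery attempt within any 60-second window.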

Description:

The System Launch and Monitor (SLM) service automates the management of complex, multi-process applications that must be started in a specific order.

One or more configuration files control SLM's behavior. Configuration files specify the processes to run, their properties, and any interprocess dependencies. SLM uses this information to construct an internal directed acyclic graph (DAG), which determines the order in which SLM starts the processes.

Similarly, when a process fails, SLM uses the DAG to determine which dependent processes it must terminate, and then restarts them when it starts the failed process again.
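To illustrate how a DAG yields a start order, here is a C sketch of Kahn's topological sort over a small, fixed-size dependency matrix. The matrix layout, the MAX_COMP limit, and the function name start_order() are assumptions for illustration; this page doesn't describe SLM's internal data structures. The sketch also detects a dependency cycle, which is the condition SLM reports as an error.

```c
#include <stdbool.h>

#define MAX_COMP 16

/* dep[i][j] is true when component i depends on component j,
 * i.e., j must be running before i can start (hypothetical layout). */

/* Write a valid start order into order[]; return the number of
 * components ordered, or -1 if the dependencies contain a cycle. */
int start_order(int n, bool dep[][MAX_COMP], int order[])
{
    int pending[MAX_COMP];   /* unstarted prerequisites per component */
    int queue[MAX_COMP];     /* components whose prerequisites are met */
    int head = 0, tail = 0, count = 0;

    for (int i = 0; i < n; i++) {
        pending[i] = 0;
        for (int j = 0; j < n; j++)
            if (dep[i][j])
                pending[i]++;
        if (pending[i] == 0)
            queue[tail++] = i;           /* no prerequisites: ready now */
    }

    while (head < tail) {
        int j = queue[head++];
        order[count++] = j;              /* "start" component j */
        for (int i = 0; i < n; i++)      /* release components waiting on j */
            if (dep[i][j] && --pending[i] == 0)
                queue[tail++] = i;
    }

    return count == n ? count : -1;      /* leftovers mean a cycle */
}
```

Kahn's algorithm is just one standard choice here; a depth-first traversal produces an equally valid order.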

When you start SLM, you must specify a configuration file, but all the other parameters are optional.

Client applications can control SLM using the slmctl utility or by directly writing commands to the /dev/slm interface.

Control and query commands

Client applications can control SLM by writing commands to the /dev/slm interface.

Control commands can start, stop, restart, or replace a specified module or component. When you start a component, SLM starts any dependencies that aren't already running and waits for them as required. When you stop a component, SLM first stops any components that depend on it. Restarting is a sequential composition of stop and start operations and is typically applied to set a specific high-level module state. Replacing stops and relaunches a component and then restarts any currently active components that depend on it. This is typically applied to update a low-level component process.
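Stopping reverses the start order: components that depend on a component must stop before it does. The C sketch below walks the dependency graph depth-first and produces a stop order for a component and everything that depends on it. The edge encoding, the ignore_stateless flag (modeled on the -s option described below), and the function names stop_order() and visit() are hypothetical.

```c
#include <stdbool.h>

#define MAX_COMP 16

enum dep_kind { DEP_NONE, DEP_SESSION, DEP_STATELESS };

/* kind[i][c] describes how component i depends on component c. */
static void visit(int c, int n, enum dep_kind kind[][MAX_COMP],
                  bool ignore_stateless, bool seen[], int order[], int *count)
{
    if (seen[c])
        return;
    seen[c] = true;
    for (int i = 0; i < n; i++) {
        enum dep_kind k = kind[i][c];
        if (k == DEP_SESSION || (k == DEP_STATELESS && !ignore_stateless))
            visit(i, n, kind, ignore_stateless, seen, order, count);
    }
    order[(*count)++] = c;   /* every dependent of c is already listed */
}

/* Fill order[] with the components to stop, dependents first, and
 * return how many there are. Assumes the graph is acyclic. */
int stop_order(int target, int n, enum dep_kind kind[][MAX_COMP],
               bool ignore_stateless, int order[])
{
    bool seen[MAX_COMP] = { false };
    int count = 0;
    visit(target, n, kind, ignore_stateless, seen, order, &count);
    return count;
}
```

Passing ignore_stateless as true leaves stateless clients running, which matches the idea that a stateless server keeps no client information that a restart could invalidate.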

Query commands can list the dependencies (depend), running components (active), or components that terminated abnormally (dead). Command lines consist of the command, any options, and a module or component name, if appropriate.
Note:

Only the system superuser (UID 0) can execute the control and query commands (except active and depend).

The following table summarizes the control and query commands:
Command   Options        Description
active    -p -v          List the active (running) components.
dead      -v -w          List the dead (faulted) components.
depend    -s -u -v       List dependencies or dependents for the specified component.
start     -d -v -x       Start the specified component.
stop      -s -v -x       Stop the specified component.
restart   -d -s -v -x    Stop, then start the specified component.
replace   -d -s -v -x    Update the specified component.
The following table describes all the options:
Option    Description
-d        Debug mode: start components with their debug argument list.
-p pid    Only display information for the process with the specified ID. Used only with active.
-s        Stateless: ignore any stateless dependencies when stopping components.
-u        Used by: list components that depend on the specified component.
-v        Verbose: give details of each action performed when responding to a command.
-w        Wait: block until a process terminates abnormally.
-x        Explain: list the required actions but don't perform them.

Command example

After you write a command to /dev/slm, you can read the results from the same file descriptor. Here's a simple example (with no error handling):
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int     slm;
char    text[128];
ssize_t n;

slm = open("/dev/slm", O_RDWR);
write(slm, "start -v all", 12);
while ((n = read(slm, text, sizeof(text) - 1)) > 0) {
    text[n] = '\0';    /* read() doesn't NUL-terminate the data */
    printf("%s\n", text);
}
close(slm);

Issuing commands via the slmctl utility

Besides writing control/query commands programmatically, you can use the slmctl utility to send SLM commands (via the command line or typed interactively). It uses the following syntax:

slmctl [-n subsystem_path] "command [component]"...

where -n subsystem_path sets the access point that client applications write control and query commands to. It should match the path specified with the -n option of slm. The default is /dev/slm.

The utility displays the results of each action in a line describing the operation on the specified component or module as follows:
Utility output              Meaning
START component pid|error   Component was started.
start component             Component already active (no errors).
WAIT component error        Waiting for component.
wait component              Component already active (no errors).
STOP component error        Component stopped.
stop component              Component already inactive.
BEGIN module                Encapsulation of multiple components.
END module error            Reported only via slog2info, not slmctl.
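In this output, an uppercase keyword reports an action that SLM performed, while the lowercase form reports that the component was already in the requested state. A client that post-processes these lines might split them as in the following C sketch. The line layout (keyword, name, optional pid or error, separated by whitespace) is inferred from the table above, and struct slm_result and parse_result() are hypothetical names.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical parsed form of one slmctl output line. */
struct slm_result {
    char keyword[16];     /* START, start, WAIT, wait, STOP, stop, BEGIN, END */
    char name[64];        /* component or module name */
    char detail[64];      /* first word of the pid or error; empty if absent */
    bool acted;           /* true if the keyword is uppercase */
};

/* Split a line such as "START io-pkt 12345". Returns 0 on success,
 * or -1 if the line lacks a keyword and a name. */
int parse_result(const char *line, struct slm_result *r)
{
    r->detail[0] = '\0';
    int fields = sscanf(line, "%15s %63s %63s", r->keyword, r->name, r->detail);
    if (fields < 2)
        return -1;
    r->acted = isupper((unsigned char)r->keyword[0]) != 0;
    return 0;
}
```

Only the first word of an error message lands in detail; a real client would keep the rest of the line if it needs the full text.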

SLM configuration file

SLM uses an XML configuration file to determine the appropriate order for starting processes. The configuration file lists all the programs for SLM to manage, any dependencies between the programs, the commands for launching the processes, and other properties.

Configuration file structure

The root XML element of the configuration file is system. All element names start with SLM:, so the root element (and the outline of the file) looks like this:

<SLM:system>
    -- component and module descriptions --
</SLM:system>

Components

A process managed by SLM is represented by a component. You must provide a component name (usually based on the process name) to use within the configuration file when specifying interprocess dependencies or membership in a module.

All component elements are children of the root element and contain other elements that describe the properties of individual components. The component element uses the following syntax:
<SLM:component name="qconn">
    -- component properties --
</SLM:component>

The following table describes the component elements:

Tag Attribute Value(s) Description
ability   ability1[,ability2, ... abilityn]

List of the process's procnto abilities. This tag is equivalent to the -A option of the on command, and the syntax of the ability specification is the same.

Using many ability specifications to launch processes is generally a bad idea; using types to configure abilities is simpler and safer.

args   command_args The list of command-line arguments to provide the binary executable.
cd   dir_name

The directory to switch into when launching the process; this directory becomes the process's working directory ($CWD).

command launch bg Controls process creation.
  nohup Controls how signals are handled (no hangup).
  pathname

The full path of the binary executable (e.g., /usr/bin/qconn).

When calling posix_spawn(), SLM passes the full pathname in argv[0] instead of truncating the value to a filename. Some utilities, such as sshd, require this information.

  builtin
The name of a built-in SLM command. Options are:
  • mkdir—creates one or more directories. List the directories to create in the args element.
  • no_op—does nothing, but allows waiting for a filepath. This mechanism can be used to detect whether a process started outside of SLM is ready.
  • pathmgr_symlink—creates one or more fast kernel symlinks. List the symlinks to create in the args element.
  session

To start a process as a session leader, include the value session in the launch attribute of the <SLM:command> element. The component must also have an <SLM:tty> element, whose value specifies where to redirect the stdin, stdout, and stderr of the process. See the sample configuration files for an example of using SLM to start a shell.

debug   command_args

An alternative list of command-line arguments to provide the binary executable when SLM is run in debug mode. This list might contain options (such as -v to increase verbosity).

depend state [ session | stateless ]

A component may need other services to be active before the component can run. Any prerequisites must be expressed as dependencies.

There are two forms of dependency: session (stateful) and stateless. With session dependency (the default), a client/server relationship is assumed; the server stores state information on all its clients. In this model, if the server must be stopped or restarted, then all its clients must be stopped.

With stateless dependency, the server doesn't maintain any client information, so it's not necessary to restart clients if the server is restarted.

  component_name Name of the prerequisite component. A component can have zero, one, or many dependencies.
Note: You must define a separate tag for each dependency.

SLM won't start a component until all the prerequisites are running.

envvar clear [ none | login | all ]
Specifies changes to environment variables. By default, the variables are inherited from the SLM server. The clear attribute specifies which current environment variables to clear or preserve:
  • none—All current environment variables are preserved
  • login—Only the initial login environment variables are preserved
  • all—All current environment variables are cleared
  environment_variables A list of environment variables to either merge with or override the current environment variables. Use the format VAR=value to specify each variable.
partition content partition_name Specifies the adaptive scheduler partition to put the process in. For detailed information, see the Adaptive Partitioning User's Guide.
priority   priority_algorithm

An alphanumeric value that indicates the priority level and scheduling policy to assign the process (e.g., 10r). The number is the priority; the letter selects the policy:

  • f: SCHED_FIFO (FIFO scheduling)
  • r: SCHED_RR (round-robin scheduling)
  • o: SCHED_OTHER (other scheduling)

For descriptions of the scheduling policies, see Scheduling policies in the Programmer's Guide.

repair   [ default | none | stop | replace ]
Specifies the action to take if the component terminates abnormally:
  • default—tells SLM to perform the action specified by the -r command-line option
  • none—SLM takes no recovery action
  • stop—SLM stops any other components that depend on the component that failed
  • replace—SLM restarts the failed component
runmask content component_runmask

A value that is interpreted as a bitmask, which specifies on which processors a process can run. It is a 32-bit integer and can be specified using any format that strtol() recognizes.

For example, the decimal value 5 corresponds to the bitmask 00000101, which allows the process to run on CPUs 0 and 2.

Only specify the runmask once.

A valid runmask is always inherited by children.

For more information about runmasks, see the Multicore Processing chapter of the System Architecture guide, and the Multicore Processing chapter of the QNX Neutrino Programmer's Guide.

stderr iomode [ w[+] | a[+] ] The access mode: overwrite (w), read and overwrite (w+), append (a), or read and append (a+).
  filename Name of the file for redirecting standard error (stderr).
stdin iomode [ r[+] ] The access mode: read only (r) or read and write (r+).
  filename Name of the file for redirecting standard input (stdin).
stdout iomode [ w | a ] The access mode: overwrite (w) or append (a).
  filename Name of the file for redirecting standard output (stdout).
stop stop [ none | signal ]

The signal setting (the default) causes SLM to send a signal to the underlying process. The none setting disables the signaling; in this case, SLM takes no action to stop a process.

child [ self | before | after ]

For any process launched by SLM, its child processes are out of SLM's direct control. You can specify the shutdown of these child processes as relative to when the SLM-controlled parent process is terminated. The settings are: self (the default), before, and after.

timeout timeout_time The maximum amount of time to try to stop a process nicely, in milliseconds. If the process can't be stopped nicely, SIGKILL is sent to it. For no timeout, specify 0 (the default).
  data
The number of the signal to send to the process to stop it. By default, SIGTERM is sent, but you can change this to any signal. If repeated attempts to stop the process fail, SIGKILL is sent.
Note: This tag value isn't needed when the stop attribute is set to none.
tty   filename Name of the file to which stderr, stdin, and stdout are redirected when a process is started as the session leader.
type   typename Name of the security type to launch the component as. The name is a label that reflects the security policy being enforced. Generally, you should pick a name based on what you're trying to launch. For information about security policies, see the Security Policy and Mandatory Access Control chapter in the Security Developer's Guide.
user   uid:gid The user ID and group ID to assign to the underlying process. The two strings are separated by a colon (e.g., jgarvey:techies).
waitfor wait [ none | delay | pathname | exits | blocks ]
Once a component has been launched, SLM can wait for that component to set itself up before starting any dependent components. Values:
  • none (the default)—Causes SLM to start other components immediately.
  • delay—SLM pauses for the specified number of milliseconds.
  • pathname—SLM probes for the appearance of the specified pathname.
  • exits—SLM waits for the process to exit with the specified exit code. If the exit code is different from the expected one, SLM restarts the process.
  • blocks—SLM waits for a specified thread in the process to reach the RECV-blocked state.
polltime poll_time:timeout_time

Use with wait="pathname" or wait="exits" to specify a polling interval and total wait time (both in milliseconds) that override the global values.

For example, polltime="100:20000" results in polling every 100 milliseconds and timing out after 20 seconds.

  data
Contains data for the specified wait condition:
  • none — No data required.
  • delay — A time in milliseconds (e.g., 5000 for a 5-second delay).
  • pathname — A path.
  • exits — The expected exit code (default is 0).
  • blocks — A thread ID.
Note:

Only the command element is mandatory—all components must have a path to the binary. The remaining elements are optional.
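Several component properties are compact strings: priority (e.g., 10r), runmask (any strtol() format), and the polltime attribute of waitfor (e.g., 100:20000). The following C sketch decodes each one. The function names are hypothetical. The priority parser requires a policy letter because the examples here always include one, and the runmask helper uses strtoul() with base 0, which accepts the same decimal, octal, and hexadecimal forms as strtol() while avoiding overflow on 0xFFFFFFFF.

```c
#include <sched.h>
#include <stdlib.h>

/* Decode a priority value such as "10r" into a priority and a POSIX
 * scheduling policy. Returns 0 on success, -1 on error. */
int parse_priority(const char *s, int *prio, int *policy)
{
    char *end;
    long p = strtol(s, &end, 10);
    if (end == s)
        return -1;                       /* no priority number */
    switch (*end) {
    case 'f': *policy = SCHED_FIFO;  break;
    case 'r': *policy = SCHED_RR;    break;
    case 'o': *policy = SCHED_OTHER; break;
    default:  return -1;                 /* missing or unknown policy letter */
    }
    if (end[1] != '\0')
        return -1;                       /* trailing characters */
    *prio = (int)p;
    return 0;
}

/* Return nonzero if the runmask lets the process run on CPU 'cpu'. */
int runmask_allows(const char *s, int cpu)
{
    unsigned long mask = strtoul(s, NULL, 0);   /* "5", "0x5", "05", ... */
    return cpu >= 0 && cpu < 32 && ((mask >> cpu) & 1UL);
}

/* Split a polltime value such as "100:20000" into a polling interval
 * and a total wait time, both in milliseconds. Returns 0 on success. */
int parse_polltime(const char *s, long *poll_ms, long *timeout_ms)
{
    char *end;
    *poll_ms = strtol(s, &end, 10);
    if (end == s || *end != ':')
        return -1;
    const char *rest = end + 1;
    *timeout_ms = strtol(rest, &end, 10);
    return (end == rest || *end != '\0') ? -1 : 0;
}
```

For instance, runmask_allows("5", 2) is nonzero and runmask_allows("5", 1) is zero, matching the CPUs 0 and 2 example above.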

Modules

You can group components into modules. The processes within a module could make up a subsystem or could be used to establish a set of system states, such as a base level of operation and various higher levels. Modules must be named so they can be internally referenced. Each module must be described in an element, as follows:

<SLM:module name="device_monitors">
    -- module description --
</SLM:module>

To list the components within a module, use member. There are no attributes for member elements; the element values refer to member components by the internal names defined in their respective component elements. Modules cannot contain depend elements.

Note:

You can include multiple components in a module by using one member element with wildcards in the component names. For example, you can write:

<SLM:member>devb-*</SLM:member>

Components and modules may be specified in any order in the XML configuration file, but SLM raises an error if any circular dependencies are found.
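This page doesn't define the exact wildcard syntax for member names. Assuming shell-style patterns, a member such as devb-* could be expanded against the defined component names with the POSIX fnmatch() function, as in this C sketch; expand_member() is a hypothetical name.

```c
#include <fnmatch.h>
#include <stddef.h>

/* Store in matched[] the indices of the component names that match the
 * member pattern, and return how many matched. */
size_t expand_member(const char *pattern, const char *const names[], size_t n,
                     size_t matched[])
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (fnmatch(pattern, names[i], 0) == 0)
            matched[count++] = i;
    return count;
}
```

A plain name without wildcards matches only itself, so the same routine handles ordinary member elements too.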

Reusing SLM modules and components

You can define modules and components in separate files so that you can reuse them in more than one SLM configuration file.

In the SLM configuration file that reuses modules and components from other files, you must declare the files where these reusable sections reside. The syntax is as follows:

<!DOCTYPE SLM_system [
    <!ENTITY inclusion_name SYSTEM 'filename'>
]>
where inclusion_name is a name that you use in your SLM configuration file to identify the reusable entities and filename is a separate file on your system where your reusable SLM modules and components are defined.

At the point in your SLM configuration file where you want to include the reusable entities, include them by specifying the following:

&inclusion_name;

For example, in your system you have a file called my_reusable_modules.xml where you have defined the SLM modules and components that can be included in different SLM configuration files. Then, in one of your SLM configuration files, you can define an entity named reuseModules and later include it:

<!DOCTYPE SLM_system [
    <!ENTITY reuseModules SYSTEM 'my_reusable_modules.xml'>
]>
...
<SLM:system>
    ...
    <!-- Include the contents of what's specified in 'my_reusable_modules.xml'
            by specifying the entity 'reuseModules' -->
    &reuseModules;
    ...
</SLM:system>

Sample configuration files

Suppose you want to automate the setup of your system's IP connectivity. This would require running io-pkt, which creates an IP socket for network traffic, and then running ifconfig to bind an IP address to the socket. You can create a module to include two components that correspond to the two services, and then describe the dependency of ifconfig on io-pkt in the component entries. The XML file would then look like this:

<SLM:system>
    <SLM:component name="io-pkt">
        <SLM:command>/sbin/io-pkt-v6-hc</SLM:command>
        <SLM:args>-ptcpip stacksize=8192</SLM:args>
        <SLM:waitfor wait="pathname">/dev/socket</SLM:waitfor>
    </SLM:component>
    <SLM:component name="ifconfig">
        <SLM:depend>io-pkt</SLM:depend>
        <SLM:command>/sbin/ifconfig</SLM:command>
        <SLM:args>en0 192.168.1.5 up</SLM:args>
        <SLM:waitfor wait="exits"></SLM:waitfor>
    </SLM:component>
    <SLM:module name="net-setup">
        <SLM:member>io-pkt</SLM:member>
        <SLM:member>ifconfig</SLM:member>
    </SLM:module>
</SLM:system>
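Because the io-pkt component uses wait="pathname", SLM doesn't start ifconfig until /dev/socket appears. A polling wait of this kind can be sketched in C with stat() and nanosleep(). The default arguments you would pass mirror the -t (100 ms) and -T (50000 ms) options, but the function name wait_for_path() and the loop itself are illustrative assumptions, not SLM's implementation.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <sys/stat.h>
#include <time.h>

/* Poll every poll_ms milliseconds until path exists or about total_ms
 * milliseconds have elapsed. Returns true if the path appeared in time. */
bool wait_for_path(const char *path, long poll_ms, long total_ms)
{
    struct stat st;
    struct timespec interval;

    if (poll_ms <= 0)
        poll_ms = 1;                      /* avoid spinning without sleeping */
    interval.tv_sec = poll_ms / 1000;
    interval.tv_nsec = (poll_ms % 1000) * 1000000L;

    for (long waited = 0; ; waited += poll_ms) {
        if (stat(path, &st) == 0)
            return true;                  /* the path has appeared */
        if (waited >= total_ms)
            return false;                 /* total wait time exceeded */
        nanosleep(&interval, NULL);       /* elapsed time is approximate */
    }
}
```

For the sample above, the equivalent call would be wait_for_path("/dev/socket", 100, 50000).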

The following example shows how to use SLM to start a shell:

<SLM:component name="console">
    <SLM:command launch="session">/bin/ksh</SLM:command>
    <SLM:args>-l</SLM:args>
    <SLM:tty>/dev/ser1</SLM:tty>
    ...
</SLM:component>

The following example shows how sshd could be started by SLM (so that sshd could be monitored):

<SLM:component name="sshd">
    <SLM:command launch="pathname">/system/xbin/sshd</SLM:command>
    <SLM:args>-D</SLM:args>
    ...
</SLM:component>

Normal vs. abnormal termination

SLM considers a process to have terminated normally in the following situations only:
  • SLM terminates a component's process because:
    • a stop action was requested by running slmctl stop component.
    • a dependency required SLM to stop the component's process.
  • The component is configured with wait="exits" in its waitfor element, and the component's process exits with the expected exit code.
All other process terminations are considered abnormal and cause SLM to take the component's recovery action (set by its repair element or by the -r option; by default, SLM restarts the component). If a component's process dies more often than the -R option allows within the given time period, SLM stops trying to recover it, even though the termination is abnormal.
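These rules can be condensed into a single predicate. The C sketch below classifies a wait()-style status using the standard <sys/wait.h> macros. The function and parameter names are hypothetical, and the sketch assumes the caller already knows whether SLM itself requested the stop (through slmctl stop or a dependency).

```c
#include <stdbool.h>
#include <sys/wait.h>

/* Return true if the termination counts as normal under the rules above.
 *   stop_requested: SLM stopped the process (slmctl stop or a dependency)
 *   waits_for_exit: the component is configured with wait="exits"
 *   expected:       the expected exit code from the waitfor element */
bool is_normal_termination(int status, bool stop_requested,
                           bool waits_for_exit, int expected)
{
    if (stop_requested)
        return true;
    return waits_for_exit && WIFEXITED(status)
           && WEXITSTATUS(status) == expected;
}
```

Any call that returns false corresponds to an abnormal termination, which is where the recovery action and the -R rate limit come into play.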