slm
System launch and monitor: launch complex applications consisting of many processes that must be started in a specific order
Syntax:
slm [-avV] [-b seconds] [-D debug_mode] [-n subsystem_path]
[-p priority] [-P search_path] [-r recovery_mode]
[-R frequency/sec|min|hour] [-s comp_name] [-t polling_interval]
[-T total_wait] [-x comp_name] config_file
Runs on:
QNX OS
Options:
- -a
- Adopt running processes. Use this option to integrate SLM with an existing system
where some processes may already be running. If you place component entries
for all relevant system processes in the configuration file, SLM will adopt
these processes at startup as if it had launched them itself (and can thus
control the processes via the command interface or restart them
automatically if they terminate abnormally; see Normal vs. abnormal termination).
The adoption mechanism works only if the component in the configuration file that corresponds to the running process uses exactly the same arguments (specified using the args element) as the running process, and the arguments are in the same order. Otherwise, SLM does not recognize that the component corresponds to the running process.
- -b seconds
- Specify a back-off period for all components. When a component terminates
abnormally, SLM waits this number of seconds before it attempts to restart
it. If the restart is not successful, it adds this number of seconds to the
wait time for each subsequent attempt. For example, if you start SLM with
-b 2, SLM waits two seconds before it attempts to restart the
process for the first time, waits four seconds before the second attempt,
waits six seconds before the third attempt, and so on. After SLM
successfully starts a component, it resets the back-off period to its
original value.
You can set a different back-off period for an individual component by specifying a repair element with a backoff=seconds attribute for the component in the SLM configuration file.
If not specified, the back-off period is 0 seconds.
- -D debug_mode
- Specify when to use the <SLM:debug> argument list (instead of the normal <SLM:args> list). One of: cmd (default), startup, or always.
With cmd, the debug list is used only when the module is started using slmctl with -d.
With startup, all components launched at startup (see the -s option) initially use the debug list, but then honor the -d option of subsequent restarts.
With always, the debug list is always used.
- -n subsystem_path
- Set the access point (default is /dev/slm) for client applications to write control and query commands. For more information on the control and query commands, see slmctl.
- -p priority
- Set the priority of the SLM server thread (default is 30).
- -P search_path
- Set the search path for executables (default is $PATH). When launching a process, SLM looks in the search path to find the executable if the corresponding command element doesn't contain a full path.
- -r recovery_mode
- Set the recovery mode for components monitored by SLM. One of: none, stop, or restart (the default). The action specified with the -r option is performed when a component terminates abnormally, unless that component overrides this setting in its repair element.
- -R frequency
- Set how frequently SLM attempts to recover a component that has terminated abnormally. The frequency argument specifies the maximum number of recovery attempts as an integer and one of the following suffixes, separated by a forward slash: sec (seconds), min (minutes), or hour. For example, 1/min. Default is 2/min (2 times per minute).
- -s comp_name
- Name a component or module to launch when slm starts. For convenience, you can use the built-in pseudo-modules all and none (default is all).
- -t polling_interval
- Set the polling interval in milliseconds for the wait property. Default is 100.
- -T total_wait
- Set the total wait time in milliseconds. Default is 50000 (50 seconds).
- -v
- Specifies output verbosity (messages are written to slog2info). The -v option is cumulative; each additional v adds a level of verbosity, up to 7 levels. Default level is warning messages.
- -V
- Log output messages to the console. The -V option is cumulative; each additional V adds a level of verbosity. Default level is error messages.
- -x comp_name
- Name a component or module to terminate when slm terminates. For convenience, you can use the built-in pseudo-modules all and none (default is all).
Description:
The System Launch and Monitor (SLM) service automates the management of complex, multi-process applications that must be started in a specific order.
A configuration file controls SLM's behavior. It specifies the processes to run, their properties, and any interprocess dependencies. You can include other XML files using XML external entities, if needed.
SLM uses the information in the configuration file to internally construct a directed acyclic graph (DAG). SLM uses the DAG to determine the order in which it starts the processes.
Similarly, when a process fails, SLM uses the DAG to determine which dependent processes to terminate and then restart when it starts the failed process again.
SLM can specify that a process is critical to the system functionality, meaning that if the process dies, the system crashes (for more information, see the description of the launch attribute that is available for the configuration file's command element).
When you start SLM, you must make sure that slogger2 is running and specify a configuration file, but all the other parameters are optional.
Client applications can control SLM using the slmctl utility or by directly writing commands to the /dev/slm interface. For more information including the control and query commands, see slmctl.
SLM configuration file
SLM uses an XML configuration file to determine the appropriate order for starting processes. The configuration file lists all the programs for SLM to manage, any dependencies between the programs, the commands for launching the processes, and other properties.
Configuration file structure
The root XML element of the configuration file is system. All element names start with SLM:, so the root element (and the outline of the file) looks like this:
<SLM:system>
-- component and module descriptions --
</SLM:system>
Components
A process managed by SLM is represented by a component. You must provide a component name (usually based on the process name) to use within the configuration file when specifying interprocess dependencies or membership in a module.
All component elements are children of the root element and contain other elements that describe the properties of individual components. The component element uses the following syntax:
<SLM:component name="component_name">
<SLM:ability> ability </SLM:ability>
<SLM:args> args </SLM:args>
<SLM:cd> directory </SLM:cd>
<SLM:command [launch=" launch_option[ launch_option]... "]>
executable_path</SLM:command>
<SLM:debug> command_args </SLM:debug>
<SLM:depend [state="session|stateless"]>
component_name </SLM:depend>
<SLM:envvar [clear="none|login|all"]>
environment_variables </SLM:envvar>
<SLM:groups> gid_1[,gid_2]... </SLM:groups>
<SLM:priority> priority_algorithm </SLM:priority>
<SLM:repair [backoff=seconds]> default|none|stop|restart </SLM:repair>
<SLM:rlimit> resource:soft_limit:hard_limit[,resource:soft_limit:hard_limit,...] </SLM:rlimit>
<SLM:runmask> component_runmask </SLM:runmask>
<SLM:stderr [iomode="w[+]|a[+]"]> filename </SLM:stderr>
<SLM:stdin [iomode="r[+]"]> filename </SLM:stdin>
<SLM:stdout [iomode="w|a"]> filename </SLM:stdout>
<SLM:stop
[stop="none|signal"] [child="false|true"] [timeout="timeout_time"]>
data </SLM:stop>
<SLM:tty> filename </SLM:tty>
<SLM:type> type_name </SLM:type>
<SLM:user> uid|:gid|uid:gid </SLM:user>
<SLM:waitfor [wait="none|delay|pathname|exits|blocks"]
[polltime="poll_time:timeout_time"]> data </SLM:waitfor>
</SLM:component>
Only the command element is mandatory; every component must have a path to the binary. The remaining elements are optional.
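As a minimal sketch of this rule, a component needs only a name and a command element. The component name and the executable path below are hypothetical placeholders.

```xml
<SLM:system>
    <!-- Hypothetical component: "my_daemon" and its path are placeholders -->
    <SLM:component name="my_daemon">
        <!-- command is the only mandatory child element -->
        <SLM:command>/usr/bin/my_daemon</SLM:command>
    </SLM:component>
</SLM:system>
```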
<SLM:ability>
<SLM:ability> ability </SLM:ability>
- ability
- A procnto ability to give the process. This element is equivalent to the -A option of the on command, and the syntax of the ability specification is the same. Specify an ability element for each required ability.
Using many ability specifications to launch processes is generally a bad idea; using types to configure abilities is simpler and safer.
<SLM:args>
<SLM:args> args </SLM:args>
- args
- The list of command-line arguments to launch the process. Use spaces to separate individual
arguments. If an argument includes embedded spaces, enclose it in single or
double quotes.
If you are specifying arguments for a built-in command, separate arguments with spaces and sets of arguments with a semicolon (;). For more information, see the <SLM:command> description.
If the component corresponds to a running process that SLM will adopt as if SLM had launched it (specified by the -a option), make sure that the arguments are identical to the ones the running process uses.
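The quoting rule for arguments with embedded spaces can be sketched as follows. The component name, executable path, and option letters are hypothetical; only the quoting pattern is the point.

```xml
<!-- Hypothetical component; the executable and its options are placeholders -->
<SLM:component name="log_demo">
    <SLM:command>/usr/bin/log_demo</SLM:command>
    <!-- Three arguments: -n, the quoted string (one argument despite the spaces), and -q -->
    <SLM:args>-n "my log name" -q</SLM:args>
</SLM:component>
```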
<SLM:cd>
<SLM:cd> directory </SLM:cd>
- directory
- The directory to switch into when launching the process; this directory becomes the process's working directory.
<SLM:command>
<SLM:command [launch=" launch_option[ launch_option]... "]>
executable_path</SLM:command>
launch=" launch_option[ launch_option]..."
- Specify one or more of the following options, separated by spaces:
  - builtin: The element value is the name of a built-in SLM command. The following commands are valid:
    - chmod path mode: Call chmod() and pass it the specified path and mode values.
    - no_op: Do nothing. Can be used to wait for a filepath or detect whether a process started outside of SLM is ready.
    - link existing new: Call link() and pass the specified arguments to it.
    - mkdir path mode: Call mkdir() and pass it the specified path and mode values.
    - pathmgr_symlink path symlink: Call pathmgr_symlink() and pass the specified arguments to it.
    - pathmgr_unlink path: Call pathmgr_unlink() and pass the specified argument to it.
    - remove filename: Call remove() and pass the specified argument to it.
    - symlink path symlink: Call symlink() and pass the specified arguments to it.
    - system command: Call system() and pass the specified argument to it.
    - unlink path: Call unlink() and pass the specified argument to it.
    Specify the arguments for a built-in command using the <SLM:args> element. Separate arguments with spaces and sets of arguments with a semicolon (;). For example, the following entry creates two sets of arguments to use with pathmgr_symlink:
    <SLM:args>path1 symlink1; path2 symlink2</SLM:args>
    The sample configuration files provided below include a component that uses the builtin option.
  - critical: Start a process with the POSIX_SPAWN_CRITICAL flag. This flag indicates that if this process dies, the system crashes.
  - pathname: When calling posix_spawn(), pass the full pathname in argv[0] instead of truncating the value to a filename. This information is required by some utilities, such as sshd.
  - session: Start a process as a session leader. The launch attribute of the <SLM:command> element must include the value session and the <SLM:component> element must have a <SLM:tty> child element. The <SLM:tty> element value specifies where to redirect the stdin, stdout, and stderr of the process. See the examples for more information on how to use SLM to start a shell.
- executable_path
- Specifies the path (absolute or relative) of the binary or script to execute.
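One possible use of the no_op built-in command, sketched under the assumption that a process started outside of SLM creates the path /dev/my_resmgr when it is ready. The component names, the path, and the client executable are hypothetical.

```xml
<!-- Does nothing itself; completes once the (hypothetical) path appears -->
<SLM:component name="wait-my-resmgr">
    <SLM:command launch="builtin">no_op</SLM:command>
    <SLM:waitfor wait="pathname">/dev/my_resmgr</SLM:waitfor>
</SLM:component>

<!-- Hypothetical client that is held back until the path exists -->
<SLM:component name="my_client">
    <SLM:depend>wait-my-resmgr</SLM:depend>
    <SLM:command>/usr/bin/my_client</SLM:command>
</SLM:component>
```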
<SLM:debug>
<SLM:debug> command_args </SLM:debug>
- command_args
- An alternative list of command-line arguments to use when SLM launches the process in debug mode. For example, including -vvvvvv in the list starts the associated process with increased verbosity when it's run in debug mode.
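A sketch of a component that carries both a normal and a debug argument list. The component name, executable, and configuration path are hypothetical, and the example assumes the executable accepts -v for verbosity.

```xml
<!-- Hypothetical component with separate normal and debug argument lists -->
<SLM:component name="my_server">
    <SLM:command>/usr/bin/my_server</SLM:command>
    <!-- Used for normal launches -->
    <SLM:args>-c /etc/my_server.conf</SLM:args>
    <!-- Used instead of the args list when the component runs in debug mode -->
    <SLM:debug>-c /etc/my_server.conf -vvvvvv</SLM:debug>
</SLM:component>
```

Which list is used is governed by the -D command-line option and the -d option of slmctl.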
<SLM:depend>
<SLM:depend [state="session|stateless"]> component_name </SLM:depend>
state="session|stateless"
- One of:
  - session: If the prerequisite component (specified by component_name) must be stopped or restarted, SLM first stops the component that depends on it. Typically, you specify session when there is a client-server relationship between the components and the server maintains client information.
  - stateless: If the prerequisite component must be stopped or restarted, the component that depends on it is unaffected. For example, if there is a client-server relationship between the components but the server doesn't maintain any client information, it's not necessary to restart clients if the server is restarted.
- component_name
- The name of the prerequisite component. A component can have zero or more dependencies.
You must define a separate element for each dependency.
SLM won't start a component until all the prerequisites are running and any waitfors are complete.
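The two dependency states can be sketched side by side. All component names and paths here are hypothetical; the example assumes one server with one client of each kind.

```xml
<!-- Hypothetical server that both clients depend on -->
<SLM:component name="db_server">
    <SLM:command>/usr/bin/db_server</SLM:command>
</SLM:component>

<!-- Stopped first whenever db_server must be stopped or restarted -->
<SLM:component name="session_client">
    <SLM:depend state="session">db_server</SLM:depend>
    <SLM:command>/usr/bin/session_client</SLM:command>
</SLM:component>

<!-- Left running when db_server is stopped or restarted -->
<SLM:component name="stateless_client">
    <SLM:depend state="stateless">db_server</SLM:depend>
    <SLM:command>/usr/bin/stateless_client</SLM:command>
</SLM:component>
```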
<SLM:envvar>
<SLM:envvar [clear="none|login|all"]> environment_variables </SLM:envvar>
clear="none|login|all"
- Specifies changes to environment variables. By default, the variables are inherited from SLM. The clear attribute specifies which current environment variables to clear or preserve:
  - none: Preserve all current environment variables (default).
  - login: Clear all environment variables except for any that the envvar element specifies and BAUD, DISPLAY, HZ, PHOTON, SYSNAME, TERM, TZ, HOME, LOGNAME, PATH, SHELL, and USERNAME.
  - all: Clear all current environment variables.
- environment_variables
- A list of environment variables to either merge with or override the current environment variables. Use the format VAR=value to specify each variable.
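A sketch of an envvar element that keeps only the login-style variables and sets one more. The component, the variable name, and its value are hypothetical; a single variable is shown to keep the example within the VAR=value format described above.

```xml
<!-- Hypothetical component; MY_APP_MODE is a placeholder variable -->
<SLM:component name="my_app">
    <SLM:command>/usr/bin/my_app</SLM:command>
    <!-- Clears the inherited environment except the login variables, then sets MY_APP_MODE -->
    <SLM:envvar clear="login">MY_APP_MODE=production</SLM:envvar>
</SLM:component>
```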
<SLM:groups>
<SLM:groups> gid_1[,gid_2]... </SLM:groups>
- gid_1[,gid_2]...
- A list of group IDs that specifies the group access list for the component's process.
<SLM:priority>
<SLM:priority> priority_algorithm </SLM:priority>
- priority_algorithm
- An alphanumeric value that indicates the priority level and scheduling policy to assign the process (e.g., 10r). The letter selects the scheduling policy:
  - f: SCHED_FIFO (FIFO scheduling)
  - r: SCHED_RR (round-robin scheduling)
  - o: SCHED_OTHER (other scheduling)
  For more information, see Scheduling policies in the Programmer's Guide.
<SLM:repair>
<SLM:repair [backoff=seconds]> default|none|stop|restart </SLM:repair>
- Specifies the action to take if the component terminates abnormally:
  - backoff=seconds: SLM waits the specified number of seconds and then attempts to restart the failed component. If the restart is not successful, it adds this number of seconds to the wait time for each subsequent attempt. For an example, see the -b command-line option description.
  - default: SLM performs the action specified by the -r command-line option.
  - none: SLM takes no recovery action.
  - stop: SLM stops any other components that depend on the component that failed.
  - restart: SLM restarts the failed component.
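A sketch of per-component recovery settings that override the -r and -b command-line options. The component names and paths are hypothetical. The attribute value is quoted because well-formed XML requires quoted attribute values.

```xml
<!-- Hypothetical component: restart after abnormal termination, waiting 5 s, then 10 s, then 15 s, ... -->
<SLM:component name="flaky_service">
    <SLM:command>/usr/bin/flaky_service</SLM:command>
    <SLM:repair backoff="5">restart</SLM:repair>
</SLM:component>

<!-- Hypothetical component: SLM takes no recovery action if it terminates abnormally -->
<SLM:component name="one_shot_tool">
    <SLM:command>/usr/bin/one_shot_tool</SLM:command>
    <SLM:repair>none</SLM:repair>
</SLM:component>
```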
<SLM:rlimit>
<SLM:rlimit> resource:soft_limit:hard_limit[,resource:soft_limit:hard_limit,...] </SLM:rlimit>
- resource
- A system resource whose consumption by the component's process is limited. For information on possible resources and the actions taken when limits are exceeded, see prlimit() in the C Library Reference.
- soft_limit
- The soft limit for the resource.
- hard_limit
- The hard limit for the resource.
<SLM:runmask>
<SLM:runmask> component_runmask </SLM:runmask>
- component_runmask
- A value that is interpreted as a bitmask, which specifies on which processors a process can run.
It is a 32-bit integer and can be specified using any format that
strtol() recognizes.
For example, the decimal value 5 corresponds to the bitmask 00000101, which allows the thread to run on CPUs 0 and 2.
Only specify the runmask once.
A valid runmask is always inherited by children.
For more information about runmasks, see the Multicore Processing chapter of the QNX OS Programmer's Guide.
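A sketch of the runmask from the example above, written in hexadecimal (a format that strtol() recognizes) and combined with a priority setting. The component name and path are hypothetical.

```xml
<!-- Hypothetical component pinned to CPUs 0 and 2 -->
<SLM:component name="pinned_worker">
    <SLM:command>/usr/bin/pinned_worker</SLM:command>
    <!-- Priority 10 with round-robin scheduling -->
    <SLM:priority>10r</SLM:priority>
    <!-- 0x5 is decimal 5, bitmask 00000101: CPUs 0 and 2 -->
    <SLM:runmask>0x5</SLM:runmask>
</SLM:component>
```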
<SLM:stderr>
<SLM:stderr [iomode="w[+]|a[+]"]> stderr_filename </SLM:stderr>
iomode="w[+]|a[+]"
- The access mode: overwrite (w), read and overwrite (w+; default), append (a), or read and append (a+).
- stderr_filename
- Name of the file to which the standard error stream (stderr) is redirected.
<SLM:stdin>
<SLM:stdin [iomode="r[+]"]> stdin_filename </SLM:stdin>
iomode="r[+]"
- The access mode: read only (r) or read and write (r+).
- stdin_filename
- Name of the file from which standard input (stdin) is read.
<SLM:stdout>
<SLM:stdout [iomode="w|a"]> stdout_filename </SLM:stdout>
iomode="w|a"
- The access mode: overwrite (w) or append (a).
- stdout_filename
- Name of the file to which standard output (stdout) is redirected.
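The three redirection elements can be combined in one component, as in the following sketch. The component name, executable, and file paths are hypothetical.

```xml
<!-- Hypothetical component with all three standard streams redirected -->
<SLM:component name="batch_job">
    <SLM:command>/usr/bin/batch_job</SLM:command>
    <!-- Read input from a file -->
    <SLM:stdin iomode="r">/etc/batch_job.input</SLM:stdin>
    <!-- Overwrite the output log on each launch -->
    <SLM:stdout iomode="w">/tmp/batch_job.out</SLM:stdout>
    <!-- Append errors so that messages from earlier launches are kept -->
    <SLM:stderr iomode="a">/tmp/batch_job.err</SLM:stderr>
</SLM:component>
```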
<SLM:stop>
<SLM:stop
[stop="none|signal"] [child="false|true"] [timeout="timeout_time"]>
stop_data </SLM:stop>
stop="none|signal"
- The signal setting (the default) causes SLM to send a signal, specified by number or name, to the underlying process. The none setting disables the signaling; in this case, SLM takes no action to stop a process.
child="false|true"
- When set to false (the default), no child of the process is terminated. When set to true, SLM uses application groups to reliably terminate all children of the process before it terminates the process itself.
timeout="timeout_time"
- The maximum amount of time to try to stop a process nicely, in milliseconds. If the process can't be stopped nicely, SIGKILL is sent to it. For no timeout, specify 0 (the default).
- stop_data
- Contains the signal number or name to send to the process to stop it. By default, SIGTERM is sent, but you can change this to any signal. Because a signal name does not need to begin with "SIG", all of the following example values are valid:
15
TERM
SIGTERM
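A sketch of a stop element that combines the attributes described above. The component name and path are hypothetical, and the two-second timeout is an arbitrary illustrative value.

```xml
<!-- Hypothetical component that is stopped with SIGINT instead of the default SIGTERM -->
<SLM:component name="graceful_app">
    <SLM:command>/usr/bin/graceful_app</SLM:command>
    <!-- Terminate children too; send SIGKILL if the process hasn't stopped after 2000 ms -->
    <SLM:stop stop="signal" child="true" timeout="2000">INT</SLM:stop>
</SLM:component>
```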
<SLM:tty>
<SLM:tty> tty_filename </SLM:tty>
- tty_filename
- Name of the file to which stderr, stdin, and stdout are redirected when a process is opened as the session leader.
<SLM:type>
<SLM:type> type_name </SLM:type>
- type_name
- Name of the security type to launch the component as. The name is a label
that reflects the security policy being enforced. Generally, you should pick
a name based on what you're trying to launch. For information about security
policies, see the Security Policies chapter in the Security Developer's Guide.
<SLM:user>
<SLM:user> uid|:gid|uid:gid </SLM:user>
uid|:gid|uid:gid
- Assigns a user ID (UID), group ID (GID), or both to the underlying process. The values can be names to look up in /etc/passwd and /etc/group.
<SLM:waitfor>
<SLM:waitfor [wait="none|delay|pathname|exits|blocks"]
[polltime="poll_time:timeout_time"]> waitfor_data </SLM:waitfor>
wait="none|delay|pathname|exits|blocks"
- Once a component has been launched, SLM can wait for that component to set itself up before starting any dependent components. Values:
  - none (the default): SLM starts other dependent components immediately.
  - delay: SLM pauses for the specified number of milliseconds before it starts the dependent components.
  - pathname: SLM probes for the appearance of the specified pathname.
  - exits: SLM waits for the process to exit with the specified exit code. If the exit code is different from the expected one, SLM restarts the process.
  - blocks: SLM waits for a specified thread in the process to reach the RECV-blocked state.
polltime="poll_time:timeout_time"
- Use with wait="pathname" or wait="exits" to specify a polling interval and total wait time (both in milliseconds) that override the global values. For example, polltime="100:20000" results in polling every 100 milliseconds and timing out after 20 seconds.
- waitfor_data
- Contains data for the specified wait condition:
  - none: No data required.
  - delay: A time in milliseconds (e.g., 5000 for a 5-second delay).
  - pathname: A path.
  - exits: The expected exit code (default is 0).
  - blocks: A thread ID.
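The wait conditions can be sketched as follows. All component names, executables, and the /dev/my_driver path are hypothetical.

```xml
<!-- Dependents start once /dev/my_driver appears; poll every 100 ms, give up after 20 s -->
<SLM:component name="my_driver">
    <SLM:command>/usr/bin/my_driver</SLM:command>
    <SLM:waitfor wait="pathname" polltime="100:20000">/dev/my_driver</SLM:waitfor>
</SLM:component>

<!-- Dependents start 5 seconds after this component is launched -->
<SLM:component name="slow_starter">
    <SLM:command>/usr/bin/slow_starter</SLM:command>
    <SLM:waitfor wait="delay">5000</SLM:waitfor>
</SLM:component>

<!-- Dependents start after this component exits with code 0 -->
<SLM:component name="setup_step">
    <SLM:depend>my_driver</SLM:depend>
    <SLM:command>/usr/bin/setup_step</SLM:command>
    <SLM:waitfor wait="exits">0</SLM:waitfor>
</SLM:component>
```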
Modules
You can group components into modules. The processes within a module could make up a subsystem or could be used to establish a set of system states, such as a base level of operation and various higher levels. Modules must be named so they can be internally referenced. Each module must be described in a module element, as follows:
<SLM:module name="device_monitors">
-- module description --
</SLM:module>
To list the components within a module, use the member element. There are no attributes for member elements; the element values refer to member components by the internal names defined in their respective component elements. Modules cannot contain depend elements.
You can include multiple components in a module by using one member element with wildcards in the component names. For example, you can write:
<SLM:member>devb-*</SLM:member>
Components and modules may be specified in any order in the XML configuration file, but SLM raises an error if any circular dependencies are found.
Reusing SLM modules and components
You can define modules and components in separate files and reuse them in one or more SLM configuration files.
In the SLM configuration file that reuses modules and components from other files, you need to declare the filenames where these reusable sections reside. The syntax is as follows:
<!DOCTYPE SLM_system [
<!ENTITY inclusion_name SYSTEM 'filename'>
]>
where inclusion_name is a name that you use in your SLM configuration file to identify the reusable entities and filename is a separate file on your system where your reusable SLM modules and components are defined. At the point in your SLM configuration file where you want to include the reusable entities, specify the following:
&inclusion_name;
For example, suppose your system has a file called my_reusable_modules.xml that defines the SLM modules and components that can be included in different SLM configuration files. In one of your SLM configuration files, you can define an entity named reuseModules and later include it:
<!DOCTYPE SLM_system [
<!ENTITY reuseModules SYSTEM 'my_reusable_modules.xml'>
]>
...
<SLM:system>
...
<!-- Include the contents of what's specified in 'my_reusable_modules.xml'
by specifying the entity 'reuseModules' -->
&reuseModules;
...
</SLM:system>
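A sketch of what my_reusable_modules.xml might contain. It assumes that the included file holds bare component and module elements with no enclosing root element, because the entity is expanded inside the including file's <SLM:system> element. The component and module names and the paths are hypothetical.

```xml
<!-- Contents of my_reusable_modules.xml (assumed layout: no root element) -->
<SLM:component name="shared_logger">
    <SLM:command>/usr/bin/shared_logger</SLM:command>
</SLM:component>

<SLM:component name="shared_watchdog">
    <SLM:depend>shared_logger</SLM:depend>
    <SLM:command>/usr/bin/shared_watchdog</SLM:command>
</SLM:component>

<SLM:module name="shared_services">
    <!-- The wildcard matches both components defined above -->
    <SLM:member>shared_*</SLM:member>
</SLM:module>
```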
Sample configuration files
Suppose you want to automate the setup of your system's IP connectivity. This would require running io-sock, which creates an IP socket for network traffic, running if_up to wait for an interface to be ready for configuration, and then running ifconfig to bind an IP address to the socket. You can create a module that includes three components that correspond to io-sock and the two utilities. You can then describe the dependency of if_up on io-sock, and ifconfig on if_up, in the component entries. The XML file would then look like this:
<SLM:system>
<SLM:component name="io-sock">
<SLM:command>/sbin/io-sock</SLM:command>
<SLM:args>-m phy -m pci -m em</SLM:args>
<SLM:waitfor wait="pathname">/dev/socket</SLM:waitfor>
</SLM:component>
<SLM:component name="if_up">
<SLM:depend>io-sock</SLM:depend>
<SLM:command>/sbin/if_up</SLM:command>
<SLM:args>-p em0</SLM:args>
<SLM:waitfor wait="exits"></SLM:waitfor>
</SLM:component>
<SLM:component name="ifconfig">
<SLM:depend>if_up</SLM:depend>
<SLM:command>/sbin/ifconfig</SLM:command>
<SLM:args>em0 192.168.1.5 up</SLM:args>
<SLM:waitfor wait="exits"></SLM:waitfor>
</SLM:component>
<SLM:module name="net-setup">
<SLM:member>io-sock</SLM:member>
<SLM:member>if_up</SLM:member>
<SLM:member>ifconfig</SLM:member>
</SLM:module>
</SLM:system>
The following example shows how to use SLM to start a shell:
<SLM:component name="console">
<SLM:command launch="session">/bin/ksh</SLM:command>
<SLM:args>-l</SLM:args>
<SLM:tty>/dev/ser1</SLM:tty>
...
</SLM:component>
The following example shows how sshd could be started by SLM (so that sshd could be monitored):
<SLM:component name="sshd">
<SLM:command launch="pathname">/system/xbin/sshd</SLM:command>
<SLM:args>-D</SLM:args>
...
</SLM:component>
The following example shows how to use the builtin option to call a built-in SLM function (system()) and pass arguments to it using the args element:
<SLM:component name="root-fs">
<SLM:depend>mount-fs</SLM:depend>
<SLM:command launch="builtin">system</SLM:command>
<SLM:args>setconf _CS_HOSTNAME __HOSTNAME__</SLM:args>
</SLM:component>
Normal vs. abnormal termination
- SLM terminates a component's process because:
  - a stop action was created by executing slmctl stop component.
  - a dependency required SLM to stop the component's process.
- The component is configured with a waitfor element that uses wait="exits", and the component's process exits with the expected exit code.