Starting a process with the exec() and spawn() calls

The next process-creation functions we should look at are the exec() and spawn() families. Before we go into the details, let's see how these two groups of functions differ.

The exec() family transforms the current process into another one: when a process issues an exec() function call, it stops running the current program and begins running another program. The process ID doesn't change; the same process is now running a different program. What happens to all the threads in the process? We'll come back to that when we look at fork().

The spawn() family, on the other hand, doesn't do that. Calling a member of the spawn() family creates another process (with a new process ID) that corresponds to the program specified in the function's arguments.
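
To make the "process ID doesn't change" point concrete, here's a minimal sketch. It uses only POSIX calls (fork(), which we haven't covered yet, plus execl()) rather than the QNX spawn() family, and the helper name pid_survives_exec() is made up for illustration. The child transforms itself into /bin/sh, which prints its own process ID ($$); the parent compares that with the ID the child had before the exec. It assumes /bin/sh exists on your system.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if the PID printed by the exec'd shell
 * equals the PID the child had before exec, 0 if not, -1 on error. */
int pid_survives_exec(void)
{
    int fds[2];
    if (pipe(fds) == -1)
        return -1;

    pid_t child = fork();
    if (child == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (child == 0) {
        /* Child: route stdout into the pipe, then transform into sh. */
        dup2(fds[1], STDOUT_FILENO);
        close(fds[0]);
        close(fds[1]);
        execl("/bin/sh", "sh", "-c", "echo $$", (char *)NULL);
        _exit(127);             /* reached only if execl() failed */
    }

    /* Parent: read the PID that the exec'd program printed. */
    close(fds[1]);
    char buf[32] = {0};
    ssize_t n = read(fds[0], buf, sizeof buf - 1);
    close(fds[0]);

    int status;
    if (waitpid(child, &status, 0) == -1 || n <= 0)
        return -1;

    return strtol(buf, NULL, 10) == (long)child;
}
```

The parent, by contrast, keeps its own process ID throughout, which is the behavior you get from the spawn() family: a second process with a new ID.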

Let's look at the different variants of the spawn() and exec() functions. In the table that follows, you'll see which ones are POSIX and which aren't. Of course, for maximum portability, you'll want to use only the POSIX functions. (The spawn() and spawnp() functions were in a POSIX draft, but never made it into the standard. The POSIX versions are posix_spawn() and posix_spawnp().)

Spawn	POSIX?	Exec	POSIX?
spawn()	No
spawnl()	No	execl()	Yes
spawnle()	No	execle()	Yes
spawnlp()	No	execlp()	Yes
spawnlpe()	No	execlpe()	No
spawnp()	No
spawnv()	No	execv()	Yes
spawnve()	No	execve()	Yes
spawnvp()	No	execvp()	Yes
spawnvpe()	No	execvpe()	No

While these variants might appear to be overwhelming, there is a pattern to their suffixes:

A suffix of:	Means:
`l` (lowercase "L")	The argument list is specified via a list of parameters given in the call itself, terminated by a NULL argument.
`e`	An environment is specified.
`p`	The PATH environment variable is used in case the full pathname to the program isn't specified.
`v`	The argument list is specified via a pointer to an argument vector.

The argument list is a list of command-line arguments passed to the program.

Also, note that in the C library, spawnlp(), spawnvp(), and spawnlpe() all call spawnvpe(), which in turn calls spawnp(). The functions spawnle(), spawnv(), and spawnl() all eventually call spawnve(), which then calls spawn(). Finally, spawnp() calls spawn(). So, the root of all spawning functionality is the spawn() call.

Figure 1. How the spawn*() functions are related.

Let's now take a look at the various spawn() and exec() variants in detail so that you can get a feel for the various suffixes used. Then, we'll see the spawn() call itself.

"l" suffix

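The "l" suffix variants take the argument list directly in the call, while the "v" suffix variants take it as an array. For example, if I want to invoke the ls command with the arguments -t, -r, and -l (meaning "sort the output by time, in reverse order, and show me the long version of the output"), I could specify it as either: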

/* To run ls and keep going: */
spawnl (P_WAIT, "/bin/ls", "/bin/ls", "-t", "-r", "-l", NULL);

/* To transform into ls: */
execl ("/bin/ls", "/bin/ls", "-t", "-r", "-l", NULL);

or, using the v suffix variant:

char *argv [] =
{
    "/bin/ls",
    "-t",
    "-r",
    "-l",
    NULL
};

/* To run ls and keep going: */
spawnv (P_WAIT, "/bin/ls", argv);

/* To transform into ls: */
execv ("/bin/ls", argv);

Why the choice? It's provided as a convenience. You may have a parser already built into your program, and it would be convenient to pass around arrays of strings. In that case, I'd recommend using the "v" suffix variants. Or, you may be coding up a call to a program where you know what the parameters are. In that case, why bother setting up an array of strings when you know exactly what the arguments are? Just pass them to the "l" suffix variant.
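
Here's a rough sketch of the "parser" case. The split_args() helper is hypothetical (it isn't part of any library) and only splits on spaces and tabs, with no quoting rules; it builds the NULL-terminated vector that the "v" suffix variants expect.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: splits `line` in place on spaces and tabs,
 * stores up to max-1 pointers in argv, and terminates the vector
 * with NULL. Returns the number of arguments stored. */
int split_args(char *line, char *argv[], int max)
{
    int argc = 0;
    char *save = NULL;

    if (max < 1)
        return 0;               /* no room even for the terminator */

    for (char *tok = strtok_r(line, " \t", &save);
         tok != NULL && argc < max - 1;
         tok = strtok_r(NULL, " \t", &save)) {
        argv[argc++] = tok;
    }
    argv[argc] = NULL;          /* the "v" variants need this terminator */
    return argc;
}
```

Once the vector is filled in, a caller could hand it straight to execv(argv[0], argv) or to one of the spawnv() variants.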

Note that we passed the actual pathname of the program (/bin/ls) and the name of the program again as the first argument. We passed the name twice to support programs that behave differently based on how they're invoked.

For example, the GNU compression and decompression utilities (gzip and gunzip) are actually links to the same executable. When the executable starts, it looks at argv[0] (passed to main()) and decides whether it should compress or decompress.
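
As an illustration only (this is not the actual gzip source, and the names mode_from_argv0(), MODE_COMPRESS, and MODE_DECOMPRESS are made up), a program could choose its behavior from argv[0] along these lines:

```c
#include <string.h>

enum mode { MODE_COMPRESS, MODE_DECOMPRESS };

/* Hypothetical helper: pick a mode from the name the program was
 * invoked as, typically argv[0] as passed to main(). */
enum mode mode_from_argv0(const char *argv0)
{
    /* Skip any directory portion so "/bin/gunzip" and "gunzip" match. */
    const char *slash = strrchr(argv0, '/');
    const char *name = slash ? slash + 1 : argv0;

    return strcmp(name, "gunzip") == 0 ? MODE_DECOMPRESS : MODE_COMPRESS;
}
```

In main(), you'd call mode_from_argv0(argv[0]) once at startup and branch on the result.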

"e" suffix

The "e" suffix versions pass an environment to the program. An environment is just that—a kind of "context" for the program to operate in. For example, you may have a spelling checker that has a dictionary of words. Instead of specifying the dictionary's location every time on the command line, you could provide it in the environment:

$ export DICTIONARY=/home/rk/.dict

$ spellcheck document.1

The export command tells the shell to create a new environment variable (in this case, DICTIONARY), and assign it a value (/home/rk/.dict).

If you ever wanted to use a different dictionary, you'd have to alter the environment before running the program. This is easy from the shell:

$ export DICTIONARY=/home/rk/.altdict

$ spellcheck document.1

But how can you do this from your own programs? To use the "e" versions of spawn() and exec(), you specify an array of strings representing the environment:

char *env [] =
{
    "DICTIONARY=/home/rk/.altdict",
    NULL
};

// To start the spell-checker:
spawnle (P_WAIT, "/usr/bin/spellcheck", "/usr/bin/spellcheck",
         "document.1", NULL, env);

// To transform into the spell-checker:
execle ("/usr/bin/spellcheck", "/usr/bin/spellcheck",
        "document.1", NULL, env);

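If you want to try the "e" suffix without a spellcheck program handy, here's a small sketch that runs a shell command under an environment you supply. It uses the POSIX fork() and execle() calls rather than spawnle(); the helper name run_sh_with_env() is hypothetical, and /bin/sh is assumed to exist. Note that the environment you pass replaces the parent's environment; it isn't added to it.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs `script` under /bin/sh with exactly the
 * environment in envp. Returns the shell's exit status, or -1. */
int run_sh_with_env(const char *script, char *const envp[])
{
    pid_t child = fork();
    if (child == -1)
        return -1;

    if (child == 0) {
        /* The environment pointer follows the NULL that ends the
         * argument list. */
        execle("/bin/sh", "sh", "-c", script, (char *)NULL, envp);
        _exit(127);             /* reached only if execle() failed */
    }

    int status;
    if (waitpid(child, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

For example, passing the env array shown above together with the script test "$DICTIONARY" = /home/rk/.altdict should give an exit status of zero.
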
"p" suffix

The "p" suffix versions will search the directories in your PATH environment variable to find the executable. You've probably noticed that all the examples have a hard-coded location for the executable: /bin/ls and /usr/bin/spellcheck. What about other executables? Unless you want to first find out the exact path for that particular program, it would be best to have the user tell your program all the places to search for executables. The standard PATH environment variable does just that. Here's the one from a minimal system:

PATH=/proc/boot:/bin

This tells the shell that when I type a command, it should first look in the directory /proc/boot, and if it can't find the command there, it should look in the binaries directory, /bin. PATH is a colon-separated list of places to look for commands. You can add as many elements to the PATH as you want, but keep in mind that every directory listed will be searched (in order) for the executable.

If you don't know the path to the executable, then you can use the "p" variants. For example:

// Using an explicit path:
execl ("/bin/ls", "/bin/ls", "-l", "-t", "-r", NULL);

// Search your PATH for the executable:
execlp ("ls", "ls", "-l", "-t", "-r", NULL);

If execl() can't find ls in /bin, it returns an error. The execlp() function will search all the directories specified in the PATH for ls, and will return an error only if it can't find ls in any of those directories. This is also great for multiplatform support: your program doesn't have to be coded to know about the different CPU names; it just finds the executable.

What if you do something like this?

execlp ("/bin/ls", "ls", "-l", "-t", "-r", NULL);

Does it search the PATH? No. You told execlp() to use an explicit pathname, which overrides the normal PATH searching rule. If it doesn't find ls in /bin, that's it; no other attempts are made (this is identical to the way execl() works in this case).
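
The search rule can be sketched in a few lines. This is my own approximation of the lookup, not the C library's implementation, and find_in_path() and is_executable_file() are hypothetical names: a name containing a slash is used as-is, and anything else is tried against each PATH directory in order.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* True if p names a regular file that we're allowed to execute. */
static int is_executable_file(const char *p)
{
    struct stat st;
    return stat(p, &st) == 0 && S_ISREG(st.st_mode) && access(p, X_OK) == 0;
}

/* Hypothetical helper: resolves `name` the way a "p" variant would,
 * using the colon-separated list in `path`. On success, writes the
 * pathname into out and returns 0; otherwise returns -1. */
int find_in_path(const char *name, const char *path, char *out, size_t outsz)
{
    /* A name containing a slash is used as-is; PATH is not consulted. */
    if (strchr(name, '/') != NULL) {
        int n = snprintf(out, outsz, "%s", name);
        return (n > 0 && (size_t)n < outsz && is_executable_file(out)) ? 0 : -1;
    }

    /* Otherwise, try each directory in order. */
    for (const char *dir = path; dir != NULL; ) {
        const char *colon = strchr(dir, ':');
        size_t len = colon ? (size_t)(colon - dir) : strlen(dir);

        /* An empty entry stands for the current directory. */
        int n = (len == 0)
            ? snprintf(out, outsz, "./%s", name)
            : snprintf(out, outsz, "%.*s/%s", (int)len, dir, name);

        if (n > 0 && (size_t)n < outsz && is_executable_file(out))
            return 0;

        dir = colon ? colon + 1 : NULL;
    }
    return -1;
}
```

You'd normally pass getenv("PATH") as the second argument; taking it as a parameter just makes the function easier to experiment with.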

Is it dangerous to mix an explicit path with a plain command name (e.g., the path argument /bin/ls, and the command name argument ls, instead of /bin/ls)? This is usually pretty safe, because:

A large number of programs ignore argv[0] anyway.
Those that do care usually call basename(), which strips off the directory portion of argv[0] and returns just the name.

The only compelling reason for specifying the full pathname for the first argument is that the program can display diagnostics including this first argument, which can instantly tell you where the program was invoked from. This may be important when the program can be found in multiple locations along the PATH.

The spawn*() variants used above all take an extra first parameter, the mode; in all the above examples, I've always specified P_WAIT. There are four mode flags you can pass to change the behavior:

P_WAIT: The calling process (your program) is blocked until the newly created program has run to completion and exited.
P_NOWAIT: The calling program doesn't block while the newly created program runs. This allows you to start a program in the background, and continue running while the other program does its thing.
P_NOWAITO: Identical to P_NOWAIT, except that the SPAWN_NOZOMBIE flag is set, meaning that you don't have to worry about doing a waitpid() to clear the process's exit code.
P_OVERLAY: This flag turns the spawn() call into the corresponding exec() call! Your program transforms into the specified program, with no change in process ID.
It's generally clearer to use the exec() call if that's what you meant—it saves the maintainer of the software from having to look up P_OVERLAY in the C Library Reference!
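
The P_* modes belong to the spawn*() functions. If you're limited to the POSIX calls, a rough analog (my own mapping, not how the C library implements the modes) is that posix_spawnp() on its own behaves like P_NOWAIT, and following it with waitpid() gives you the P_WAIT behavior. The helper names start_background() and wait_for_exit() below are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Hypothetical helper: starts argv[0] (searched for along the PATH)
 * without waiting for it, much like P_NOWAIT.
 * Returns the child's process ID, or -1 on failure. */
pid_t start_background(char *const argv[])
{
    pid_t child;
    if (posix_spawnp(&child, argv[0], NULL, NULL, argv, environ) != 0)
        return -1;
    return child;
}

/* Hypothetical helper: blocks until the child exits, which together
 * with start_background() approximates P_WAIT.
 * Returns the child's exit status, or -1 on failure. */
int wait_for_exit(pid_t child)
{
    int status;
    if (waitpid(child, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Between the two calls, your program is free to do other work while the child runs.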

Plain spawn()

As we mentioned above, all spawn() functions eventually call the plain spawn() function. Here's the prototype for the spawn() function:

#include <spawn.h>

pid_t
spawn (const char *path,
       int fd_count,
       const int fd_map [],
       const struct inheritance *inherit,
       char * const argv [],
       char * const envp []);

We can immediately dispense with the path, argv, and envp parameters. We've already seen those above as representing the location of the executable (the path parameter), the argument vector (argv), and the environment (envp).

The fd_count and fd_map parameters go together. If you specify zero for fd_count, then fd_map is ignored, and all file descriptors (except those marked close-on-exec with fcntl()'s FD_CLOEXEC flag) will be inherited by the newly created process. If fd_count is nonzero, then it indicates the number of file descriptors contained in fd_map; only the specified ones will be inherited.
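
I can't show fd_map itself in portable code, but the POSIX posix_spawn() family has a comparable idea: you describe the child's file descriptors before it starts, using "file actions." The sketch below is that POSIX analog, not the QNX spawn() call, and capture_stdout() is a hypothetical helper. It connects the child's standard output to a pipe and reads back what the child wrote.

```c
#define _POSIX_C_SOURCE 200809L
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Hypothetical helper: runs argv[0] with its stdout connected to a pipe
 * and copies what it writes into buf (NUL-terminated).
 * Returns the number of bytes captured, or -1 on failure. */
ssize_t capture_stdout(char *const argv[], char *buf, size_t bufsz)
{
    int fds[2];
    if (bufsz == 0 || pipe(fds) == -1)
        return -1;

    /* Describe the child's descriptor layout before it starts. */
    posix_spawn_file_actions_t fa;
    posix_spawn_file_actions_init(&fa);
    posix_spawn_file_actions_adddup2(&fa, fds[1], STDOUT_FILENO);
    posix_spawn_file_actions_addclose(&fa, fds[0]);
    posix_spawn_file_actions_addclose(&fa, fds[1]);

    pid_t child;
    int err = posix_spawnp(&child, argv[0], &fa, NULL, argv, environ);
    posix_spawn_file_actions_destroy(&fa);
    close(fds[1]);              /* the parent keeps only the read end */
    if (err != 0) {
        close(fds[0]);
        return -1;
    }

    /* Read until the buffer is full or the child closes its end. */
    size_t used = 0;
    ssize_t n;
    while (used < bufsz - 1 &&
           (n = read(fds[0], buf + used, bufsz - 1 - used)) > 0)
        used += (size_t)n;
    buf[used] = '\0';
    close(fds[0]);

    waitpid(child, NULL, 0);    /* reap the child */
    return (ssize_t)used;
}
```

Descriptors that you don't mention in the file actions are inherited as usual, unless they're marked FD_CLOEXEC.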

The inherit parameter is a pointer to a structure that contains a set of flags, signal masks, and so on. For more details, you should consult the QNX Neutrino C Library Reference.