Appendix: Examples

In this appendix...

Simple restart

The most basic form of recovery is the simple death-restart mechanism. Since the QNX Neutrino realtime operating system provides virtually all non-kernel functionality via user-installable programs, and since it offers complete memory protection, not only for user applications, but also for OS components (device drivers, filesystems, etc.), a resource manager or other server program can be easily decoupled from the OS.

This decoupling lets you safely stop, start, and upgrade resource managers or other key programs dynamically, without compromising the availability of the rest of the system.

Consider the following code, where we restart the inetd daemon:

/* addinet.c */

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/stat.h>
#include <sys/netmgr.h>
#include <fcntl.h>
#include <ha/ham.h>

int main(int argc, char *argv[])
{
    ham_entity_t *ehdl;
    ham_condition_t *chdl;
    ham_action_t *ahdl;
    char *inetdpath;
    int inetdpid;

    if (argc > 1)
        inetdpath = strdup(argv[1]);
    else
        inetdpath = strdup("/usr/sbin/inetd -D");
    if (argc > 2)
        inetdpid = atoi(argv[2]);
    else
        inetdpid = -1;

    ham_connect(0);
    ehdl = ham_attach("inetd", ND_LOCAL_NODE, inetdpid, inetdpath, 0);
    if (ehdl != NULL) {
        chdl = ham_condition(ehdl, CONDDEATH, "death", HREARMAFTERRESTART);
        if (chdl != NULL) {
            ahdl = ham_action_restart(chdl, "restart", inetdpath,
                                      HREARMAFTERRESTART);
            if (ahdl == NULL)
                printf("add action failed\n");
        }
        else
            printf("add condition failed\n");
    }
    else
        printf("add entity failed\n");
    ham_disconnect(0);
    exit(0);
}

The above example attaches the inetd process to a HAM as an entity, then establishes a death condition under the entity and a restart action under that condition.


Note: If inetd isn't a self-attached entity, you need to specify the -D option to it, to force inetd to daemonize by calling procmgr_daemon() instead of by calling daemon(). The HAM can see death messages only from self-attached entities, processes that terminate abnormally, and tasks that are running in session 1, and the call to daemon() doesn't put the caller into that session.

If inetd is a self-attached entity, you don't need to specify the -D option because the HAM automatically switches to monitoring the new process that daemon() creates.
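For reference, here's what self-attaching looks like from inside a server's own startup code. This is a hedged sketch, not part of inetd: the entity name is illustrative, error handling is elided, and passing zeros for the heartbeat parameters is assumed to request no heartbeating.

```c
/* Sketch: a server registering itself with a HAM as a
   self-attached entity. */
#include <ha/ham.h>

int init_ha(void)
{
    /* Register this process under the name "myserver"; the
       zero heartbeat period and watermarks mean we don't
       promise any heartbeats (assumption for this sketch). */
    ham_entity_t *ehdl = ham_attach_self("myserver", 0, 0, 0, 0);
    if (ehdl == NULL)
        return -1;
    /* ... on clean shutdown: ham_detach_self(ehdl, 0); */
    return 0;
}
```

A self-attached entity is removed from the HAM when it detaches cleanly, so its normal exit doesn't trigger the death condition.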


When inetd terminates, the HAM automatically restarts it by running the program specified by inetdpath. If inetd is already running on the system, you can pass the PID of the existing process in inetdpid, and the HAM will attach to it directly. Otherwise, the HAM starts inetd and begins to monitor it.

You could use the same code to monitor, say, slogger (by specifying /usr/sbin/slogger), mqueue (by specifying /sbin/mqueue), etc. Just remember to specify the full path of the executable with all its required command-line parameters.

Compound restart

Recovery often involves more than restarting a single component. The death of one component might actually require restarting and resetting many other components. We might also have to do some initial cleanup before the dead component is restarted.

A HAM lets you specify a list of actions to perform when a given condition is triggered. For example, suppose the entity being monitored is fs-nfs2, and a set of directories has been mounted and is currently in use. If fs-nfs2 dies, simply restarting it won't remount those directories and make them available again! We'd have to restart fs-nfs2, and then follow up by explicitly mounting the appropriate directories.

Similarly, if io-pkt* were to die, it would take down the network drivers and TCP/IP stack (npm-tcpip.so) with it. So restarting io-pkt* also involves reinitializing the network driver, and any other components that use the network connection (such as inetd) need to be reset so that they can reestablish their connections.

Consider the following example, which implements a compound restart mechanism.

/* addnfs.c */

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/stat.h>
#include <sys/netmgr.h>
#include <fcntl.h>
#include <ha/ham.h>

int main(int argc, char *argv[])
{
    ham_entity_t *ehdl;
    ham_condition_t *chdl;
    ham_action_t *ahdl;
    char *fsnfspath;
    int fsnfs2pid;

    if (argc > 1)
        fsnfspath = strdup(argv[1]);
    else
        fsnfspath = strdup("/usr/sbin/fs-nfs2");
    if (argc > 2)
        fsnfs2pid = atoi(argv[2]);
    else
        fsnfs2pid = -1;

    ham_connect(0);
    ehdl = ham_attach("Fs-nfs2", ND_LOCAL_NODE, fsnfs2pid, fsnfspath, 0);
    if (ehdl != NULL) {
        chdl = ham_condition(ehdl, CONDDEATH, "Death", HREARMAFTERRESTART);
        if (chdl != NULL) {
            ahdl = ham_action_restart(chdl, "Restart", fsnfspath,
                                      HREARMAFTERRESTART);
            if (ahdl == NULL)
                printf("add action failed\n");
            else {
                ahdl = ham_action_waitfor(chdl, "Delay1", NULL, 2000,
                                          HREARMAFTERRESTART);
                if (ahdl == NULL)
                    printf("add action failed\n");
                ahdl = ham_action_execute(chdl, "MountPPCBE",
                        "/bin/mount -t nfs 10.12.1.115:/ppcbe /ppcbe",
                        HREARMAFTERRESTART |
                        ((fsnfs2pid == -1) ? HACTIONDONOW : 0));
                if (ahdl == NULL)
                    printf("add action failed\n");
                ahdl = ham_action_waitfor(chdl, "Delay2", NULL, 2000,
                                          HREARMAFTERRESTART);
                if (ahdl == NULL)
                    printf("add action failed\n");
                ahdl = ham_action_execute(chdl, "MountWeb",
                        "/bin/mount -t nfs 10.12.1.115:/web /web",
                        HREARMAFTERRESTART |
                        ((fsnfs2pid == -1) ? HACTIONDONOW : 0));
                if (ahdl == NULL)
                    printf("add action failed\n");
            }
        }
        else
            printf("add condition failed\n");
    }
    else
        printf("add entity failed\n");
    ham_disconnect(0);
    exit(0);
}

This example attaches fs-nfs2 as an entity, and then attaches a restart action followed by a series of waitfor and execute actions to the death condition. When fs-nfs2 dies, the HAM restarts it and then remounts the required remote directories in sequence. Note that actions can include delays, and can also wait for specific names to appear in the namespace.
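Instead of a fixed delay, a waitfor action can poll for a pathname to appear in the namespace, which avoids guessing how long a restart will take. A hedged sketch under the same setup as addnfs.c (chdl is the death condition's handle; the path, action name, and timeout are illustrative):

```c
/* Sketch: wait up to 10 seconds for /ppcbe to appear in the
   namespace before the next action in the sequence runs. */
ham_action_t *ahdl = ham_action_waitfor(chdl, "WaitPPCBE",
                         "/ppcbe",   /* pathname to wait for */
                         10000,      /* timeout, in ms       */
                         HREARMAFTERRESTART);
if (ahdl == NULL)
    printf("add waitfor action failed\n");
```

Passing NULL for the pathname, as addnfs.c does, turns the waitfor into a pure delay.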

Death/condition notification

Fault notification is a crucial part of the availability of a system. Apart from performing recovery per se, we also need to keep track of failures in order to be able to analyze the system at a later point.

For fault notification, you can use standard notification mechanisms such as pulses or signals. Clients specify the pulse or signal, along with specific values, that they want for each notification, and a HAM delivers the notifications at the appropriate times.

/* regevent.c */

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <errno.h>
#include <sys/neutrino.h>
#include <sys/iomsg.h>
#include <sys/netmgr.h>
#include <signal.h>
#include <ha/ham.h>

#define PCODEINETDDEATH      (_PULSE_CODE_MINAVAIL+1)
#define PCODEINETDDETACH     (_PULSE_CODE_MINAVAIL+2)
#define PCODENFSDELAYED      (_PULSE_CODE_MINAVAIL+3)
#define PCODEINETDRESTART1   (_PULSE_CODE_MINAVAIL+4)
#define PCODEINETDRESTART2   (_PULSE_CODE_MINAVAIL+5)

#define MYSIG SIGRTMIN+1

int fsnfs_value;

/* Signal handler to handle the death notify of fs-nfs2 */
void MySigHandler(int signo, siginfo_t *info, void *extra)
{
  printf("Received signal %d, with code = %d, value %d\n",
        signo, info->si_code, info->si_value.sival_int);
  if (info->si_value.sival_int == fsnfs_value)
    printf("FS-nfs2 died, this is the notify signal\n");
  return;
}

int main(int argc, char *argv[])
{
  int chid, coid, rcvid;
  struct _pulse pulse;
  pid_t pid;
  int value;
  ham_entity_t *ehdl;
  ham_condition_t *chdl;
  ham_action_t *ahdl;
  struct sigaction sa;
  int scode;
  int svalue;

  /* we need a channel to receive the pulse notification on */
  chid = ChannelCreate( 0 ); 

  /* and we need a connection to that channel for the pulse to be
     delivered on */
  coid = ConnectAttach( 0, 0, chid, _NTO_SIDE_CHANNEL, 0 );

  /* fill in the event structure for a pulse */
  pid = getpid();
  value = 13;
  ham_connect(0);
  /* Assumes there is already an entity by the name "inetd" */
  chdl = ham_condition_handle(ND_LOCAL_NODE, "inetd","death",0);
  ahdl = ham_action_notify_pulse(chdl, "notifypulsedeath",ND_LOCAL_NODE, pid, chid, 
                              PCODEINETDDEATH, value, HREARMAFTERRESTART);
  ham_action_handle_free(ahdl);
  ham_condition_handle_free(chdl);
  ehdl = ham_entity_handle(ND_LOCAL_NODE, "inetd", 0);
  chdl = ham_condition(ehdl, CONDDETACH, "detach", HREARMAFTERRESTART);
  ahdl = ham_action_notify_pulse(chdl, "notifypulsedetach",ND_LOCAL_NODE, pid, chid, 
                              PCODEINETDDETACH, value, HREARMAFTERRESTART);
  ham_action_handle_free(ahdl);
  ham_condition_handle_free(chdl);
  ham_entity_handle_free(ehdl);
  fsnfs_value = 18; /* value we expect when fs-nfs dies */
  scode = 0;
  svalue = fsnfs_value; 
  sa.sa_sigaction = MySigHandler;
  sigemptyset(&sa.sa_mask);
  sa.sa_flags = SA_SIGINFO;
  sigaction(MYSIG, &sa, NULL);
  /*
   Assumes there is an entity by the name "Fs-nfs2".
   We use "Fs-nfs2" to symbolically represent the entity
   fs-nfs2. Any name can be used to represent the
   entity, but it's best to use a readable and meaningful name.
  */
  ehdl = ham_entity_handle(ND_LOCAL_NODE, "Fs-nfs2", 0);

  /* 
   Add a new condition, which will be an "independent" condition
   this means that notifications/actions inside this condition
   are not affected by "waitfor" delays in other action
   sequence threads
  */
  chdl = ham_condition(ehdl,CONDDEATH, "DeathSep",
                    HCONDINDEPENDENT|HREARMAFTERRESTART);
  ahdl = ham_action_notify_signal(chdl, "notifysignaldeath",ND_LOCAL_NODE, pid, MYSIG, 
                    scode, svalue, HREARMAFTERRESTART);
  ham_action_handle_free(ahdl);
  ham_condition_handle_free(chdl);
  ham_entity_handle_free(ehdl);
  chdl = ham_condition_handle(ND_LOCAL_NODE, "Fs-nfs2","Death",0);
  /*
   This action is added to a condition that isn't independent
   (HCONDINDEPENDENT). Since we don't know what the condition
   already contains, we might get a delayed notification,
   because the action sequence might include arbitrary delays
   and waits.
  */
  ahdl = ham_action_notify_pulse(chdl, "delayednfsdeathpulse", ND_LOCAL_NODE, 
             pid, chid, PCODENFSDELAYED, value, HREARMAFTERRESTART);
  ham_action_handle_free(ahdl);
  ham_condition_handle_free(chdl);
  ehdl = ham_entity_handle(ND_LOCAL_NODE, "inetd", 0);
  chdl = ham_condition(ehdl, CONDRESTART, "restart", 
                             HREARMAFTERRESTART|HCONDINDEPENDENT);
  ahdl = ham_action_notify_pulse(chdl, "notifyrestart_imm", ND_LOCAL_NODE, 
                    pid, chid, PCODEINETDRESTART1, value, HREARMAFTERRESTART);
  ham_action_handle_free(ahdl);
  ahdl = ham_action_waitfor(chdl, "delay",NULL,6532, HREARMAFTERRESTART); 
  ham_action_handle_free(ahdl);
  ahdl = ham_action_notify_pulse(chdl, "notifyrestart_delayed", ND_LOCAL_NODE, 
                    pid, chid, PCODEINETDRESTART2, value, HREARMAFTERRESTART);
  ham_action_handle_free(ahdl);
  ham_condition_handle_free(chdl);
  ham_entity_handle_free(ehdl);
  while (1) {
    rcvid = MsgReceivePulse(chid, &pulse, sizeof(pulse), NULL);
    if (rcvid < 0) {
      if (errno != EINTR) {
        exit(-1);
      }
    }
    else {
      switch (pulse.code) {
        case PCODEINETDDEATH:
          printf("Inetd Death Pulse\n");
          break;
        case PCODENFSDELAYED:
          printf("Fs-nfs2 died: this is the possibly delayed pulse\n");
          break;
        case PCODEINETDDETACH:
          printf("Inetd detached, so quitting\n");
          goto the_end;
        case PCODEINETDRESTART1:
          printf("Inetd Restart Pulse: Immediate\n");
          break;
        case PCODEINETDRESTART2:
          printf("Inetd Restart Pulse: Delayed\n");
          break;
      }
    }
  }
  /*
   At this point we're no longer waiting for information about
   inetd, since we know that it has exited. We'd still continue
   to get information about the death of fs-nfs2, since we
   didn't remove those actions. If we exit now, the next time
   those actions are executed they'll fail (notifications fail
   if the receiver no longer exists), and they'll automatically
   be removed and cleaned up.
  */
the_end:
  ham_disconnect(0);
  exit(0);
}

In the above example, a client registers for various types of notification relating to significant events concerning inetd and fs-nfs2. Notifications can be sent immediately or after a certain delay.

The notifications can also be received for each condition independently — for the entity's death (CONDDEATH), restart (CONDRESTART), and detaching (CONDDETACH).

The CONDRESTART condition is asserted by a HAM when an entity is successfully restarted.

Heartbeating clients (liveness detection)

Sometimes components become unavailable not because a specific “bad” event occurred, but because they become unresponsive, stuck somewhere to the point that the service they provide is effectively unavailable.

One example of this is when a process or a collection of processes/threads enters a state of deadlock or starvation, where none or only some of the involved processes can make any useful progress. Such situations are often difficult to pinpoint since they occur quite randomly.

You can have your clients assert “liveness” properties by actively sending heartbeats to a HAM. When a process deadlocks (or starves) and makes no progress, it will no longer heartbeat, and the HAM will automatically detect this condition and take corrective action.

The corrective action can range from simply terminating the offending application to restarting it and also delivering notifications about its state to other components that depend on the safe and correct functioning of this component. If necessary, a HAM can restart those other components as well.
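In outline, a heartbeating client attaches itself with a heartbeat period and then calls ham_heartbeat() from its main loop. A minimal sketch, following the pattern used later in this appendix (the entity name, one-second period, and watermarks of 5 are illustrative; error handling is elided):

```c
/* Sketch: a client that promises the HAM a heartbeat every
   second; after 5 missed beats the HAM's missed-heartbeat
   conditions fire. */
#include <unistd.h>
#include <ha/ham.h>

int main(void)
{
    ham_entity_t *ehdl =
        ham_attach_self("myserver", 1000000000UL, 5, 5, 0);
    if (ehdl == NULL)
        return 1;

    for (;;) {
        ham_heartbeat();   /* tell the HAM we're still alive */
        /* ... one unit of real work ... */
        sleep(1);
    }
}
```

The examples below add a CONDHBEATMISSEDHIGH condition to such an entity so that the HAM acts when the beats stop.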

We can demonstrate this condition with a simple process whose two threads, because of a design flaw, use mutual-exclusion locks incorrectly, causing them on occasion to enter a state of deadlock — each thread holds a resource that the other wants.

Essentially, each thread runs through a segment of code that involves the use of two mutexes.

Thread 1                             Thread 2

...                                  ...
while true                           while true
do                                   do
    obtain lock a                        obtain lock b
        (compute section1)                   (compute section1)
    obtain lock b                        obtain lock a
        (compute section2)                   (compute section2)
    release lock b                       release lock a
    release lock a                       release lock b
done                                 done
...                                  ...

The code segments for each thread are shown below. The only difference between the two is the order in which the locks are obtained. The two threads deadlock upon execution, quite randomly; the exact moment of deadlock is related to the lengths of the “compute sections” of the two threads.

/* mutexdeadlock.c */

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <signal.h>
#include <pthread.h>
#include <process.h>
#include <sys/neutrino.h>
#include <sys/procfs.h>
#include <sys/procmgr.h>
#include <ha/ham.h>

pthread_mutex_t mutex_a = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_t mutex_b = PTHREAD_MUTEX_INITIALIZER;

FILE *logfile;
pthread_t  threadID;
int doheartbeat=0;

#define COMPUTE_DELAY 100

void *func1(void *arg)
{
    int id;
    /* obtain the two locks in the order 
       a -> b 
       perform some computation and then 
       release the locks ...
       do this continuously 
    */

    id = pthread_self();
    while (1) {
        delay(85); /* delay to let the other one go */
        if (doheartbeat)
            ham_heartbeat();
        pthread_mutex_lock(&mutex_a);
        fprintf(logfile, "Thread 1: Obtained lock a\n");
        fprintf(logfile, "Thread 1: Waiting for lock b\n");
        pthread_mutex_lock(&mutex_b);
        fprintf(logfile, "Thread 1: Obtained lock b\n");
        fprintf(logfile, "Thread 1: Performing computation\n");
        delay(rand()%COMPUTE_DELAY+5); /* delay for computation */
        fprintf(logfile, "Thread 1: Unlocking lock b\n");
        pthread_mutex_unlock(&mutex_b);
        fprintf(logfile, "Thread 1: Unlocking lock a\n");
        pthread_mutex_unlock(&mutex_a);
    }
    return(NULL);
}

void *func2(void *arg)
{
    
    int id;
    /* obtain the two locks in the order 
       b -> a 
       perform some computation and then 
       release the locks ...
       do this continuously 
    */

    id = pthread_self();
    while (1) {
        delay(25);
        if (doheartbeat)
            ham_heartbeat();
        pthread_mutex_lock(&mutex_b);
        fprintf(logfile, "\tThread 2: Obtained lock b\n");
        fprintf(logfile, "\tThread 2: Waiting for lock a\n");
        pthread_mutex_lock(&mutex_a);
        fprintf(logfile, "\tThread 2: Obtained lock a\n");
        fprintf(logfile, "\tThread 2: Performing computation\n");
        delay(rand()%COMPUTE_DELAY+5); /* delay for computation */
        fprintf(logfile, "\tThread 2: Unlocking lock a\n");
        pthread_mutex_unlock(&mutex_a);
        fprintf(logfile, "\tThread 2: Unlocking lock b\n");
        pthread_mutex_unlock(&mutex_b);
    }
    return(NULL);
}

int main(int argc, char *argv[])
{
    pthread_attr_t attrib;
    struct sched_param param;
    ham_entity_t *ehdl;
    ham_condition_t *chdl;
    ham_action_t *ahdl;
    int c;
    
    logfile = stderr;
    while ((c = getopt(argc, argv, "f:l" )) != -1 ) {
        switch(c) {
            case 'f': /* log file */
                logfile = fopen(optarg, "w");
                break;
            case 'l': /* do liveness heartbeating */
                if (access("/proc/ham",F_OK) == 0)
                    doheartbeat=1;
                break;
         }
    }

    setbuf(logfile, NULL);
    srand(time(NULL));
    fprintf(logfile, "Creating separate competing compute thread\n");    

    pthread_attr_init (&attrib);
    pthread_attr_setinheritsched (&attrib, PTHREAD_EXPLICIT_SCHED);
    pthread_attr_setschedpolicy (&attrib, SCHED_RR);
    param.sched_priority = getprio (0);
    pthread_attr_setschedparam (&attrib, &param);

    if (doheartbeat) {
        /* attach to a HAM, promising a heartbeat every second */
        ehdl = ham_attach_self("mutex-deadlock", 1000000000UL, 5, 5, 0);
        chdl = ham_condition(ehdl, CONDHBEATMISSEDHIGH, "heartbeat-missed-high", 0);
        ahdl = ham_action_execute(chdl, "terminate",
                                  "/proc/boot/mutex-deadlock-heartbeat.sh", 0);
    }
    /* create competitor thread */
    pthread_create (&threadID, &attrib, func1, NULL);
    pthread_detach(threadID);
    
    func2(NULL);

    exit(0);
}

Upon execution, what we see is:

  1. Starting two-threaded process.

    The threads will execute as described earlier, but will eventually deadlock. We'll wait for a reasonable amount of time (a few seconds) until they do end in deadlock. The threads write a simple execution log in /dev/shmem/mutex-deadlock.log.

  2. Waiting for them to deadlock.

    Here's the current state of the threads in process 73746:

         pid tid name               prio STATE       Blocked
       73746   1 oot/mutex-deadlock  10r MUTEX       73746-02 #-21474
       73746   2 oot/mutex-deadlock  10r MUTEX       73746-01 #-21474

    And here's the tail from the threads' log file:

    Thread 2: Obtained lock b
    Thread 2: Waiting for lock a
    Thread 2: Obtained lock a
    Thread 2: Performing computation
    Thread 2: Unlocking lock a
    Thread 2: Unlocking lock b
    Thread 2: Obtained lock b
    Thread 2: Waiting for lock a
    Thread 1: Obtained lock a
    Thread 1: Waiting for lock b
  3. Extracting current process information from the core file:
    /tmp/mutex-deadlock.core:
     processor=PPC num_cpus=2
      cpu 1 cpu=602370 name=604e speed=299
      flags=0xc0000001 FPU MMU EAR
      cpu 2 cpu=602370 name=604e speed=299
       flags=0xc0000001 FPU MMU EAR
     cyc/sec=16666666 tod_adj=999522656000000000 nsec=5190771360840 inc=999960
     boot=999522656 epoch=1970 intr=-2147483648
     rate=600000024 scale=-16 load=16666
       MACHINE="mtx604-smp" HOSTNAME="localhost"
     hwflags=0x000004
     pretend_cpu=0 init_msr=36866
     pid=73746 parent=49169 child=0 pgrp=73746 sid=1
     flags=0x000300 umask=0 base_addr=0x48040000 init_stack=0x4803fa20
     ruid=0 euid=0 suid=0  rgid=0 egid=0 sgid=0
     ign=0000000006801000 queue=ff00000000000000 pending=0000000000000000
     fds=4 threads=2 timers=0 chans=1
     thread 1 REQUESTED
      ip=0xfe32f838 sp=0x4803f920 stkbase=0x47fbf000 stksize=528384
      state=MUTEX flags=0 last_cpu=1 timeout=00000000
      pri=10 realpri=10 policy=RR
     thread 2
      ip=0xfe32f838 sp=0x47fbef80 stkbase=0x47f9e000 stksize=135168
      state=MUTEX flags=4020000 last_cpu=2 timeout=00000000
      pri=10 realpri=10 policy=RR

The threads are deadlocked, with each thread holding one lock and waiting for the other.

The process is made to heartbeat

Now consider the case where the client can be made to heartbeat so that a HAM will automatically detect when it's unresponsive and will terminate it.

Thread 1                             Thread 2

...                                  ...
while true                           while true
do                                   do
  obtain lock a                        obtain lock b
    (compute section1)                   (compute section1)
    obtain lock b                        obtain lock a
      send heartbeat                       send heartbeat
      (compute section2)                   (compute section2)
    release lock b                       release lock a
  release lock a                       release lock b
done                                 done
...                                  ...

Here the process is expected to send heartbeats to a HAM. By placing the heartbeat call within the inner loop, the deadlock condition is trapped: the HAM notices that the heartbeats have stopped and can then perform recovery.
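Following the pseudocode above, a sketch of Thread 1's loop body with the heartbeat inside the section that holds both locks (the compute sections are elided):

```c
/* Sketch: once this thread blocks on either mutex, it stops
   reaching ham_heartbeat(), and the HAM's missed-heartbeat
   condition eventually fires. */
while (1) {
    pthread_mutex_lock(&mutex_a);       /* obtain lock a    */
    /* (compute section1) */
    pthread_mutex_lock(&mutex_b);       /* obtain lock b    */
    ham_heartbeat();                    /* send heartbeat   */
    /* (compute section2) */
    pthread_mutex_unlock(&mutex_b);     /* release lock b   */
    pthread_mutex_unlock(&mutex_a);     /* release lock a   */
}
```

Any placement that the thread must pass through on every iteration works; what matters is that a blocked thread can no longer reach the call.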

Let's look at what happens now:

  1. Starting two-threaded process.

    The threads will execute as described earlier, but will eventually deadlock. We'll wait for a reasonable amount of time (a few seconds) until they do end in deadlock. The threads write a simple execution log in /dev/shmem/mutex-deadlock-heartbeat.log. The HAM detects that the threads have stopped heartbeating and terminates the process, after saving its state for postmortem analysis.

  2. Waiting for them to deadlock.

    Here's the current state of the threads in process 462866 and the state of mutex-deadlock when it missed heartbeats:

         pid tid name               prio STATE       Blocked
      462866   1 oot/mutex-deadlock  10r MUTEX       462866-03 #-2147
      462866   2 oot/mutex-deadlock  63r RECEIVE     1
      462866   3 oot/mutex-deadlock  10r MUTEX       462866-01 #-2147
    
    
        Entity state from HAM
    
    Path            : mutex-deadlock
    Entity Pid      : 462866
    Num conditions  : 1
    Condition type  : ATTACHEDSELF
    Stats:
    HeartBeat Period: 1000000000
    HB Low Mark     : 5
    HB High Mark    : 5
    Last Heartbeat  : 2001/09/03 14:40:41:406575120
    HeartBeat State : MISSEDHIGH
    Created         : 2001/09/03 14:40:40:391615720
    Num Restarts    : 0

    And here's the tail from the threads' log file:

    Thread 2: Obtained lock b
    Thread 2: Waiting for lock a
    Thread 2: Obtained lock a
    Thread 2: Performing computation
    Thread 2: Unlocking lock a
    Thread 2: Unlocking lock b
    Thread 2: Obtained lock b
    Thread 2: Waiting for lock a
    Thread 1: Obtained lock a
    Thread 1: Waiting for lock b
  3. Extracting current process information from the core file:
    /tmp/mutex-deadlock.core:
     processor=PPC num_cpus=2
      cpu 1 cpu=602370 name=604e speed=299
       flags=0xc0000001 FPU MMU EAR
      cpu 2 cpu=602370 name=604e speed=299
       flags=0xc0000001 FPU MMU EAR
     cyc/sec=16666666 tod_adj=999522656000000000 nsec=5390696363520 inc=999960
     boot=999522656 epoch=1970 intr=-2147483648
     rate=600000024 scale=-16 load=16666
       MACHINE="mtx604-smp" HOSTNAME="localhost"
     hwflags=0x000004  
     pretend_cpu=0 init_msr=36866 
     pid=462866 parent=434193 child=0 pgrp=462866 sid=1
     flags=0x000300 umask=0 base_addr=0x48040000 init_stack=0x4803f9f0
     ruid=0 euid=0 suid=0  rgid=0 egid=0 sgid=0
     ign=0000000006801000 queue=ff00000000000000 pending=0000000000000000
     fds=5 threads=3 timers=1 chans=4
     thread 1 REQUESTED
      ip=0xfe32f838 sp=0x4803f8f0 stkbase=0x47fbf000 stksize=528384
      state=MUTEX flags=0 last_cpu=2 timeout=00000000
      pri=10 realpri=10 policy=RR
     thread 2
      ip=0xfe32f1a8 sp=0x47fbef50 stkbase=0x47f9e000 stksize=135168
      state=RECEIVE flags=4000000 last_cpu=2 timeout=00000000
      pri=63 realpri=63 policy=RR
      blocked_chid=1
     thread 3
      ip=0xfe32f838 sp=0x47f9df80 stkbase=0x47f7d000 stksize=135168
      state=MUTEX flags=4020000 last_cpu=1 timeout=00000000
      pri=10 realpri=10 policy=RR

Process starvation

We can demonstrate this condition by showing a simple process containing two threads that use mutual exclusion locks to manage a critical section. Thread 1 runs at a high priority, while Thread 2 runs at a lower priority. Essentially, each thread runs through a segment of code that looks like this:

Thread 1                             Thread 2

...                                  ...
(Run at high priority)               (Run at low priority)
while true                           while true
do                                   do
    obtain lock a                        obtain lock a
        (compute section1)                   (compute section1)
    release lock a                       release lock a
done                                 done
...                                  ...

The code segments for each thread are shown below; the only difference is the priorities of the two threads. Upon execution, Thread 2 eventually starves.

/* mutexstarvation.c */

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <signal.h>
#include <pthread.h>
#include <process.h>
#include <sys/neutrino.h>
#include <sys/procfs.h>
#include <sys/procmgr.h>
#include <ha/ham.h>

pthread_mutex_t mutex_a = PTHREAD_MUTEX_INITIALIZER;

FILE *logfile;
int doheartbeat=0;

#define COMPUTE_DELAY 900

void *func1(void *arg)
{
    int id;

    id = pthread_self();
    while (1) {
        pthread_mutex_lock(&mutex_a);
        fprintf(logfile, "Thread 1: Locking lock a\n");
        delay(rand()%COMPUTE_DELAY+50); /* delay for computation */
        fprintf(logfile, "Thread 1: Unlocking lock a\n");
        pthread_mutex_unlock(&mutex_a);
    }
    return(NULL);
}

void *func2(void *arg)
{
    
    int id;

    id = pthread_self();
    while (1) {
        pthread_mutex_lock(&mutex_a);
        fprintf(logfile, "\tThread 2: Locking lock a\n");
        if (doheartbeat)
            ham_heartbeat();
        delay(rand()%COMPUTE_DELAY+50); /* delay for computation */
        fprintf(logfile, "\tThread 2: Unlocking lock a\n");
        pthread_mutex_unlock(&mutex_a);
    }
    return(NULL);
}

int main(int argc, char *argv[])
{
    pthread_attr_t attrib;
    struct sched_param param;
    ham_entity_t *ehdl;
    ham_condition_t *chdl;
    ham_action_t *ahdl;
    int c;
    pthread_attr_t attrib2;
    struct sched_param param2;
    pthread_t  threadID;
    pthread_t  threadID2;
    
    logfile = stderr;
    while ((c = getopt(argc, argv, "f:l" )) != -1 ) {
        switch(c) {
            case 'f': /* log file */
                logfile = fopen(optarg, "w");
                break;
            case 'l': /* do liveness heartbeating */
                if (access("/proc/ham",F_OK) == 0)
                    doheartbeat=1;
                break;
         }
    }

    setbuf(logfile, NULL);
    srand(time(NULL));
    fprintf(logfile, "Creating separate competing compute thread\n");    

    if (doheartbeat) {
        /* attach to a HAM, promising a heartbeat every second */
        ehdl = ham_attach_self("mutex-starvation", 1000000000UL, 5, 5, 0);
        chdl = ham_condition(ehdl, CONDHBEATMISSEDHIGH, "heartbeat-missed-high", 0);
        ahdl = ham_action_execute(chdl, "terminate",
                                  "/proc/boot/mutex-starvation-heartbeat.sh", 0);
    }
    /* create competitor thread */
    pthread_attr_init (&attrib2);
    pthread_attr_setinheritsched (&attrib2, PTHREAD_EXPLICIT_SCHED);
    pthread_attr_setschedpolicy (&attrib2, SCHED_RR);
    param2.sched_priority = sched_get_priority_min(SCHED_RR);
    pthread_attr_setschedparam (&attrib2, &param2);

    pthread_create (&threadID2, &attrib2, func2, NULL);
    
    delay(3000); /* let the other thread go on for a while... */

    pthread_attr_init (&attrib);
    pthread_attr_setinheritsched (&attrib, PTHREAD_EXPLICIT_SCHED);
    pthread_attr_setschedpolicy (&attrib, SCHED_RR);
    param.sched_priority = sched_get_priority_max(SCHED_RR);
    pthread_attr_setschedparam (&attrib, &param);

    pthread_create (&threadID, &attrib, func1, NULL);
    
    pthread_join(threadID, NULL);
    pthread_join(threadID2, NULL);
    exit(0);
}

Upon execution, here's what we see:

  1. Starting two-threaded process.

    The threads will execute as described earlier, but eventually Thread 2 will starve. We'll wait for a reasonable amount of time (some seconds) until Thread 2 ends up starving. The threads write a simple execution log in /dev/shmem/mutex-starvation.log.

  2. Waiting for them to run for a while.

    Here's the current state of the threads in process 622610:

        
         pid tid name               prio STATE       Blocked
      622610   1 t/mutex-starvation  10r JOIN        3
      622610   2 t/mutex-starvation   1r MUTEX       622610-03 #-2147
      622610   3 t/mutex-starvation  63r NANOSLEEP

    And here's the tail from the threads' log file:

    Thread 1: Unlocking lock a
    Thread 1: Locking lock a
    Thread 1: Unlocking lock a
    Thread 1: Locking lock a
    Thread 1: Unlocking lock a
    Thread 1: Locking lock a
    Thread 1: Unlocking lock a
    Thread 1: Locking lock a
    Thread 1: Unlocking lock a
    Thread 1: Locking lock a
  3. Extracting current process information from the core file:
    /tmp/mutex-starvation.core:
     processor=PPC num_cpus=2
      cpu 1 cpu=602370 name=604e speed=299
       flags=0xc0000001 FPU MMU EAR
      cpu 2 cpu=602370 name=604e speed=299
       flags=0xc0000001 FPU MMU EAR
     cyc/sec=16666666 tod_adj=999522656000000000 nsec=5561011550640 inc=999960   
     boot=999522656 epoch=1970 intr=-2147483648
     rate=600000024 scale=-16 load=16666
       MACHINE="mtx604-smp" HOSTNAME="localhost"
     hwflags=0x000004  
     pretend_cpu=0 init_msr=36866
     pid=622610 parent=598033 child=0 pgrp=622610 sid=1
     flags=0x000300 umask=0 base_addr=0x48040000 init_stack=0x4803fa10
     ruid=0 euid=0 suid=0  rgid=0 egid=0 sgid=0
     ign=0000000006801000 queue=ff00000000000000 pending=0000000000000000
     fds=4 threads=3 timers=0 chans=1
     thread 1 REQUESTED
      ip=0xfe32f8c8 sp=0x4803f8a0 stkbase=0x47fbf000 stksize=528384  
      state=JOIN flags=0 last_cpu=1 timeout=00000000
      pri=10 realpri=10 policy=RR
     thread 2
      ip=0xfe32f838 sp=0x47fbef80 stkbase=0x47f9e000 stksize=135168
      state=MUTEX flags=4000000 last_cpu=2 timeout=00000000
      pri=1 realpri=1 policy=RR
     thread 3
      ip=0xfe32f9a0 sp=0x47f9df20 stkbase=0x47f7d000 stksize=135168
      state=NANOSLEEP flags=4000000 last_cpu=2 timeout=0x1001000
      pri=63 realpri=63 policy=RR

Thread 2 is made to heartbeat

Now consider the case where Thread 2 is made to heartbeat. A HAM will automatically detect when the thread becomes unresponsive, and can then terminate the process and/or perform other recovery.
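The HAM side of this arrangement can be set up in the same style as the inetd example earlier. The following is a minimal sketch, not the actual example source: it assumes the process has already attached itself to the HAM as the entity mutex-starvation, and the recovery script /scripts/dump-and-slay is hypothetical.

```c
/* hbmon.c -- attach a missed-heartbeat condition to an existing
   self-attached entity (a sketch; the recovery script is hypothetical). */

#include <stdio.h>
#include <stdlib.h>
#include <sys/netmgr.h>
#include <ha/ham.h>

int main(void)
{
    ham_entity_t    *ehdl;
    ham_condition_t *chdl;
    ham_action_t    *ahdl;

    ham_connect(0);
    /* Get a handle to the entity that attached itself
       with ham_attach_self(). */
    ehdl = ham_entity_handle(ND_LOCAL_NODE, "mutex-starvation", 0);
    if (ehdl == NULL) {
        printf("no such entity\n");
        exit(1);
    }
    /* Trigger when the count of missed heartbeats reaches the
       high watermark. */
    chdl = ham_condition(ehdl, CONDHBEATMISSEDHIGH, "heartbeat", 0);
    if (chdl != NULL) {
        /* Save the process state for postmortem analysis, then
           terminate it; dump-and-slay is a hypothetical script
           that does both. */
        ahdl = ham_action_execute(chdl, "recover",
                   "/scripts/dump-and-slay mutex-starvation", 0);
        if (ahdl == NULL)
            printf("add action failed\n");
    }
    else
        printf("add condition failed\n");
    ham_disconnect(0);
    exit(0);
}
```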

Thread 1                             Thread 2

...                                  ...
(Run at high priority)               (Run at low priority)
while true                           while true
do                                   do
    obtain lock a                        obtain lock a
                                             send heartbeat
        (compute section1)                   (compute section1)
    release lock a                       release lock a
done                                 done
...                                  ...

Here Thread 2 is expected to send heartbeats to a HAM. By placing the heartbeat call inside the loop, the HAM can detect when Thread 2 begins to starve.
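In code, Thread 2's side of this might look like the following sketch (not the actual example source; lock_a and the compute section are placeholders). The attach call uses a one-second heartbeat period with low and high watermarks of 5 missed beats, matching the entity state the HAM reports later in this section.

```c
/* Thread 2's loop with a heartbeat (a sketch; lock_a and the
   compute section are placeholders). */

#include <pthread.h>
#include <ha/ham.h>

extern pthread_mutex_t lock_a;      /* shared with Thread 1 */

void *thread2(void *arg)
{
    /* Become a self-attached entity: heartbeat period of 1 s,
       with the low and high watermarks both at 5 missed beats. */
    ham_attach_self("mutex-starvation", 1000000000ULL, 5, 5, 0);

    for (;;) {
        pthread_mutex_lock(&lock_a);
        ham_heartbeat();   /* if this stops arriving, the HAM notices */
        /* ... compute section 1 ... */
        pthread_mutex_unlock(&lock_a);
    }
    return NULL;
}
```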

The threads will execute as described earlier, but eventually Thread 2 will starve. We'll wait a reasonable amount of time (a few seconds) until it does. The threads write a simple execution log in /dev/shmem/mutex-starvation-heartbeat.log. The HAM detects that the thread has stopped heartbeating and terminates the process, after saving its state for postmortem analysis.
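The detection itself is just bookkeeping against the heartbeat period and the two watermarks. The helpers below are a HAM-independent sketch (hb_missed and hb_state are our names, not part of the HAM API) of how a monitor could classify a client from the time elapsed since its last heartbeat:

```c
typedef enum { HB_OK, HB_MISSEDLOW, HB_MISSEDHIGH } hb_state_t;

/* Beats missed so far, given the heartbeat period and the time
   elapsed since the last beat was received (both in nanoseconds). */
static int hb_missed(long long period_ns, long long since_last_ns)
{
    return (int)(since_last_ns / period_ns);
}

/* Classify a client against its low and high watermarks (the
   "HB Low Mark" and "HB High Mark" fields of the entity state). */
static hb_state_t hb_state(int missed, int low_mark, int high_mark)
{
    if (missed >= high_mark)
        return HB_MISSEDHIGH;
    if (missed >= low_mark)
        return HB_MISSEDLOW;
    return HB_OK;
}
```

With a 1-second period and both marks at 5, five seconds without a beat is enough to reach MISSEDHIGH, which matches the heartbeat state the HAM reports for the entity.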

Let's look at what happens:

  1. Waiting for some time.

    Here's the current state of the threads in process 753682, along with the state of the mutex-starvation entity in the HAM after the heartbeats were missed:

         pid tid name               prio STATE       Blocked
      753682   1 t/mutex-starvation  10r JOIN        4
      753682   2 t/mutex-starvation  63r RECEIVE     1
      753682   3 t/mutex-starvation   1r MUTEX       753682-04 #-2147
      753682   4 t/mutex-starvation  63r NANOSLEEP
    
    
        Entity state from HAM
    
    Path            : mutex-starvation
    Entity Pid      : 753682
    Num conditions  : 1
    Condition type  : ATTACHEDSELF
    Stats:
    HeartBeat Period: 1000000000
    HB Low Mark     : 5
    HB High Mark    : 5
    Last Heartbeat  : 2001/09/03 14:44:37:796119160
    HeartBeat State : MISSEDHIGH
    Created         : 2001/09/03 14:44:34:780239800
    Num Restarts    : 0

    And here's the tail from the threads' log file:

    Thread 1: Unlocking lock a
    Thread 1: Locking lock a
    Thread 1: Unlocking lock a
    Thread 1: Locking lock a
    Thread 1: Unlocking lock a
    Thread 1: Locking lock a
    Thread 1: Unlocking lock a
    Thread 1: Locking lock a
    Thread 1: Unlocking lock a
    Thread 1: Locking lock a
  2. Extracting the current process information from the core file:
    /tmp/mutex-starvation.core:
     processor=PPC num_cpus=2
      cpu 1 cpu=602370 name=604e speed=299
       flags=0xc0000001 FPU MMU EAR
      cpu 2 cpu=602370 name=604e speed=299
       flags=0xc0000001 FPU MMU EAR
     cyc/sec=16666666 tod_adj=999522656000000000 nsec=5627098907040 inc=999960
     boot=999522656 epoch=1970 intr=-2147483648
     rate=600000024 scale=-16 load=16666
       MACHINE="mtx604-smp" HOSTNAME="localhost"
     hwflags=0x000004  
     pretend_cpu=0 init_msr=36866 
     pid=753682 parent=729105 child=0 pgrp=753682 sid=1
     flags=0x000300 umask=0 base_addr=0x48040000 init_stack=0x4803f9f0
     ruid=0 euid=0 suid=0  rgid=0 egid=0 sgid=0
     ign=0000000006801000 queue=ff00000000000000 pending=0000000000000000
     fds=5 threads=4 timers=1 chans=4
     thread 1 REQUESTED
      ip=0xfe32f8c8 sp=0x4803f880 stkbase=0x47fbf000 stksize=528384
      state=JOIN flags=0 last_cpu=2 timeout=00000000
      pri=10 realpri=10 policy=RR
     thread 2
      ip=0xfe32f1a8 sp=0x47fbef50 stkbase=0x47f9e000 stksize=135168
      state=RECEIVE flags=4000000 last_cpu=2 timeout=00000000
      pri=63 realpri=63 policy=RR
      blocked_chid=1
     thread 3
      ip=0xfe32f838 sp=0x47f9df80 stkbase=0x47f7d000 stksize=135168
      state=MUTEX flags=4000000 last_cpu=2 timeout=00000000
      pri=1 realpri=1 policy=RR
     thread 4
      ip=0xfe32f9a0 sp=0x47f7cf20 stkbase=0x47f5c000 stksize=135168
      state=NANOSLEEP flags=4000000 last_cpu=1 timeout=0x1001000
      pri=63 realpri=63 policy=RR