Recovery example

Updated: May 06, 2022

The following example demonstrates a simple recovery scenario, where a client opens a file across a network filesystem.

If the NFS server were to die, the HA Manager would restart it and remount the filesystem. Normally, any clients that previously had files open across the old connection would now have a stale connection handle. But if the client uses the ha_attach functions, it can recover from the lost connection.

The ha_attach functions allow the client to provide a custom recovery function that's automatically invoked by the cover-function library. This recovery function could simply reopen the connection (thereby getting a connection to the new server), or it could perform a more complex recovery (e.g., adjusting the file position offsets and reconstructing its state with respect to the connection). This mechanism thus lets you develop arbitrarily complex recovery scenarios, while the cover-function library takes care of the details (detecting a failure, invoking recovery functions, and retransmitting state information).

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <errno.h>
#include <ha/cover.h>

#define TESTFILE "/net/machine99/home/test/testfile"

typedef struct handle {
  int nr;
  int curr_offset;
} Handle ;

int recover_conn(int oldfd, void *hdl)
{
  int newfd;
  Handle *thdl;
  thdl = (Handle *)hdl;
  newfd = ha_reopen(oldfd, TESTFILE, O_RDONLY);
  if (newfd >= 0) {
    // adjust file offset to previously known point
    lseek(newfd, thdl->curr_offset, SEEK_SET); 
    // increment our count of successful recoveries
    (thdl->nr)++;
  }
  return(newfd);
}

int main(int argc, char *argv[])
{
  int status;
  int fd;
  int fd2;
  Handle hdl;
  char buf[80];

  hdl.nr = 0;
  hdl.curr_offset = 0;
  // open a connection
  // recovery will use "recover_conn", and "hdl" will
  // be passed to it as a parameter
  fd = ha_open(TESTFILE, O_RDONLY, recover_conn, (void *)&hdl, 0);
  if (fd < 0) {
    printf("could not open file\n");
    exit(-1);
  }
  status = read(fd,buf,15);
  if (status < 0) {
    printf("error: %s\n",strerror(errno));
    exit(-1);
  }
  else {
    hdl.curr_offset += status;
  }
  fd2 = ha_dup(fd);
  // at this point fs-nfs3 fails and is restarted, and the
  // network mounts are reinstated;
  // our previous "fd" to the file is now stale
  sleep(18);
  // reading from the stale fd (which has also been dup-ped)
  // will fail, and the library will recover via recover_conn
  status = read(fd,buf,15);
  if (status < 0) {
    printf("error: %s\n",strerror(errno));
    exit(-1);
  }
  else {
    hdl.curr_offset += status;
  }
  printf("total recoveries, %d\n",hdl.nr);
  ha_close(fd);
  ha_close(fd2);
  exit(0);
}
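
The example above updates hdl.curr_offset by hand after each read(), and the recovery function depends on that value being current. As a minimal sketch, the bookkeeping could be folded into a small wrapper. The names read_tracker and tracked_read are hypothetical and not part of the HA library, and the sketch uses off_t for the offset where the example uses int.

```c
#include <assert.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper type: mirrors the offset field of the
 * example's Handle, but uses off_t (an assumption of this sketch). */
typedef struct {
    off_t curr_offset;
} read_tracker;

/* Read from fd and, on success, advance the tracked offset by the
 * number of bytes actually read. Errors (-1) and end-of-file (0)
 * leave the offset unchanged. */
ssize_t tracked_read(int fd, void *buf, size_t len, read_tracker *t)
{
    assert(t != NULL);
    ssize_t n = read(fd, buf, len);
    if (n > 0)
        t->curr_offset += n;
    return n;
}
```

With a wrapper like this, a recovery function only has to seek to the tracked offset after reopening the file, because every successful read has already been accounted for.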

Since the cover-function library takes over the lowest MsgSend*() calls, most standard library functions (read(), write(), printf(), scanf(), etc.) are also automatically HA-aware. The library also provides an ha_dup() function, which is semantically equivalent to the standard dup() function in the context of HA-aware connections. You can replace recovery functions during the lifetime of a connection, which greatly simplifies the task of developing highly customized recovery mechanisms.
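
To make the pattern concrete, the following is a small, portable simulation of the idea: a cover function that detects a stale connection, invokes the registered recovery function, and retries, plus a setter that swaps the recovery function while the connection is in use. It isn't the HA library's implementation or API. Every sim_* name is hypothetical, and the failure is simulated with a flag rather than a real server crash.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical callback type, shaped like recover_conn() above. */
typedef int (*recover_fn)(int old_id, void *handle);

/* Hypothetical connection record used only by this simulation. */
typedef struct {
    int id;              /* current connection id */
    int alive;           /* 0 simulates a stale connection */
    recover_fn recover;  /* replaceable recovery callback */
    void *handle;        /* passed through to the callback */
} sim_conn;

/* Stand-in for the low-level send: fails with EBADF when stale. */
static int sim_send(sim_conn *c)
{
    if (!c->alive) {
        errno = EBADF;
        return -1;
    }
    return 0;
}

/* Cover function: try the send; on a stale connection, call the
 * recovery callback once and retry on the new connection id. */
int sim_cover_send(sim_conn *c)
{
    assert(c != NULL);
    if (sim_send(c) == 0)
        return 0;
    if (errno != EBADF || c->recover == NULL)
        return -1;
    int new_id = c->recover(c->id, c->handle);
    if (new_id < 0)
        return -1;       /* recovery refused or failed */
    c->id = new_id;
    c->alive = 1;
    return sim_send(c);
}

/* Replace the recovery callback during the connection's lifetime. */
void sim_set_recovery(sim_conn *c, recover_fn fn, void *handle)
{
    c->recover = fn;
    c->handle = handle;
}

/* Sample callback: counts recoveries and pretends to reconnect. */
int sim_count_recover(int old_id, void *handle)
{
    (*(int *)handle)++;
    return old_id + 1;
}

/* Sample callback: always gives up. */
int sim_refuse_recover(int old_id, void *handle)
{
    (void)old_id;
    (void)handle;
    return -1;
}
```

A caller would initialize a sim_conn with sim_count_recover and a pointer to an int counter, clear alive to simulate the server dying, and call sim_cover_send(); the call succeeds after one recovery. Swapping in sim_refuse_recover with sim_set_recovery() makes the next simulated failure surface to the caller as -1.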