MsgSend*() functions

Updated: April 19, 2023

Normally, the MsgSend*() functions fail with EBADF or ESRCH when a connection is stale or has been closed on the server end (e.g., because the server died). In an HA scenario, the server often comes back (e.g., it's restarted) and begins offering its services again almost immediately. In such cases, rather than merely terminating the message transmission with an error, it might be possible to perform recovery and continue with the transmission.

The HA library functions that “cover” all the MsgSend*() varieties are designed to do exactly this. When an invocation of one of the MsgSend*() functions fails in this way, a client-provided recovery function is called. This recovery function can attempt to reestablish the connection and then return control to the HA library's MsgSend*() cover function. As long as the connection ID returned by the recovery function is the same as the old connection ID (which in many cases is easy to ensure via a close()/open()/dup2() sequence), the cover function can then attempt to retransmit the data.
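The close/open/dup2() idea mentioned above can be illustrated with plain POSIX calls. The following is a minimal sketch, not part of the HA library: the helper name reopen_same_fd() is hypothetical, and it assumes the connection is a file descriptor obtained from open() on a path that the restarted server serves again.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper (not an HA library function): reopen `path` so that
 * the new open file ends up on descriptor number `old_fd`.
 * Returns old_fd on success, or -1 with errno set on failure. */
int reopen_same_fd(int old_fd, const char *path, int oflag)
{
    int new_fd = open(path, oflag);
    if (new_fd == -1) {
        return -1;
    }
    if (new_fd == old_fd) {
        /* The old descriptor was already closed and its slot was reused. */
        return old_fd;
    }
    /* dup2() closes old_fd if it is still open, then makes old_fd
     * refer to the same open file as new_fd. */
    if (dup2(new_fd, old_fd) == -1) {
        int saved = errno;
        close(new_fd);
        errno = saved;
        return -1;
    }
    close(new_fd);
    return old_fd;
}
```

Because the caller keeps using the same descriptor number, code that holds the old connection ID doesn't need to be told that a reconnection took place.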

If at any point the error returned by MsgSend*() is anything other than EBADF or ESRCH, the error is propagated back to the client. Likewise, if the connection ID isn't an HA-aware connection ID, if the client hasn't provided a recovery function, or if that function can't re-obtain the same connection ID, the error propagates back to the client, which can handle it in whatever way it likes.
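The retry and propagation rules described above can be modeled in a few lines of portable C. This is only a simulation under stated assumptions, not the HA library's implementation: send_with_recovery(), the two function-pointer types, the sim_* stubs, and the max_attempts bound are all hypothetical names and choices made for illustration.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical types that model the described behavior. */
typedef int (*send_fn)(int coid, const char *msg);    /* 0, or -1 with errno set */
typedef int (*recovery_fn)(int coid, void *handle);   /* recovered ID, or -1 */

/* Retry only on EBADF/ESRCH, and only if recovery yields the same ID.
 * max_attempts is an assumption added here to keep the loop bounded. */
int send_with_recovery(int coid, const char *msg, send_fn do_send,
                       recovery_fn recover, void *handle, int max_attempts)
{
    int err = EINVAL;  /* reported if max_attempts < 1 */

    for (int attempt = 0; attempt < max_attempts; attempt++) {
        if (do_send(coid, msg) == 0) {
            return 0;
        }
        err = errno;
        if (err != EBADF && err != ESRCH) {
            break;  /* any other error goes straight back to the client */
        }
        if (recover == NULL || attempt + 1 == max_attempts) {
            break;  /* no recovery function, or no attempts left */
        }
        if (recover(coid, handle) != coid) {
            break;  /* recovery could not re-obtain the same connection ID */
        }
    }
    errno = err;
    return -1;
}

/* --- Simulated server and recovery function, for demonstration only --- */
int sim_server_up = 1;
int sim_delivered = 0;

int sim_send(int coid, const char *msg)
{
    (void)coid;
    (void)msg;
    if (!sim_server_up) {
        errno = ESRCH;  /* the server has gone away */
        return -1;
    }
    sim_delivered++;
    return 0;
}

int sim_recover(int coid, void *handle)
{
    int *recoveries = handle;  /* client-supplied recovery information */
    (*recoveries)++;
    sim_server_up = 1;         /* pretend the server was restarted */
    return coid;               /* same connection ID as before */
}
```

With the simulated server marked as down, a call such as send_with_recovery(5, "hi", sim_send, sim_recover, &count, 3) fails once with ESRCH, runs the recovery function, and then succeeds on the retry. Passing NULL as the recovery function leaves the error for the caller to handle.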

Clients can change their recovery functions. Because clients can also pass around “recovery/connection” information (which the HA library in turn passes to the recovery function), they can construct complex recovery mechanisms that can be modified dynamically.

The client-side recovery library lets clients reconstruct the state required to continue the message transmission after reconnecting to either the same server or a different one. The client is responsible for determining what state must be reconstructed and for reconstructing it when the recovery function is called.
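As an illustration of client-managed state, the sketch below assumes the only state worth restoring is a file position. The struct recovery_state type and the recover_file_connection() function are hypothetical; the function merely follows the "connection ID plus client-supplied information" shape described above and uses plain POSIX calls rather than any HA library API.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical recovery information that the client keeps up to date. */
struct recovery_state {
    const char *path;    /* where to reconnect */
    int         oflag;   /* flags to reopen with */
    off_t       offset;  /* position to restore after reconnecting */
};

/* Hypothetical recovery function: reconnect on the same descriptor number,
 * then rebuild the state (here, just the file offset).
 * Returns fd on success, -1 on failure. */
int recover_file_connection(int fd, void *handle)
{
    const struct recovery_state *st = handle;

    int new_fd = open(st->path, st->oflag);
    if (new_fd == -1) {
        return -1;
    }
    if (new_fd != fd) {
        /* Move the new open file onto the old descriptor number. */
        if (dup2(new_fd, fd) == -1) {
            close(new_fd);
            return -1;
        }
        close(new_fd);
    }
    /* Reconstruct the state needed to continue where the client left off. */
    if (lseek(fd, st->offset, SEEK_SET) == (off_t)-1) {
        return -1;
    }
    return fd;
}
```

Which fields belong in the recovery information depends entirely on the client and the server. A client talking to a stateful server would typically need to replay more than a file position.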