Using Qnet for Transparent Distributed Processing

This chapter includes:

What is Qnet?
When should you use Qnet?
Conventions for naming nodes
Software components for Qnet networking
Starting Qnet
Checking out the neighborhood
Troubleshooting

What is Qnet?

A Neutrino native network is a group of interconnected computers running only Neutrino. In this network, a program can transparently access any resource—whether it's a file, a device, or a process—on any other node (computer) in your local subnetwork. You can even run programs on other nodes.

The Qnet protocol provides transparent networking across a Neutrino network; Qnet implements a local area network that's optimized to provide a fast, seamless interface between Neutrino computers, whatever the type of hardware.

For QNX 4, the protocol used for native networking is called FLEET; it isn't compatible with Neutrino's Qnet.

In essence, the Qnet protocol extends interprocess communication (IPC) transparently over a network of microkernels — taking advantage of Neutrino's message-passing paradigm to implement native networking.

When you run Qnet, entries for all the nodes in your local subnetwork that are running Qnet appear in the /net namespace. (Under QNX 4, you use a double slash followed by a node number to refer to another node.)

For more details, see the Native Networking (Qnet) chapter of the System Architecture guide. For information about programming with Qnet, see the Transparent Distributed Networking via Qnet chapter of the Programmer's Guide.

When should you use Qnet?

When should you use Qnet, and when TCP/IP or some other protocol? It all depends on what machines you need to connect.

Qnet is intended for a network of trusted machines that are all running Neutrino and that all use the same endian-ness. It lets these machines share all their resources with little overhead. Using Qnet, you can use the Neutrino utilities (cp, mv, and so on) to manipulate files anywhere on the Qnet network as if they were on your machine.

Because it's meant for a group of trusted machines (such as you'd find in an embedded system), Qnet doesn't do any authentication of requests. Files are protected by the normal permissions that apply to users and groups (see “File ownership and permissions” in Working with Files), although you can use Qnet's maproot and mapany options to control, in a limited way, what other users can do on your machine. Qnet isn't connectionless like NFS; network errors are reported back to the client process.

TCP/IP is intended for more loosely connected machines that can run different operating systems. TCP/IP does authentication to control access to a machine; it's useful for connecting machines that you don't necessarily trust. It's used as the base for specialized protocols such as FTP and Telnet, and can provide high throughput for data streaming. For more information, see the TCP/IP Networking chapter in this guide.

NFS was designed for filesystem operations between hosts of any type and any endianness, and it's widely supported. It's a connectionless protocol; the server can shut down and be restarted, and the client resumes automatically. It also uses authentication and controls directory access. For more information, see “NFS filesystem” in Working with Filesystems.

Conventions for naming nodes

In order to resolve node names, the Qnet protocol follows certain conventions:

node name

A character string that identifies the node you're talking to. This name must be unique in the domain and can't contain slashes or periods.

The default node name is the value of the _CS_HOSTNAME configuration string. If your hostname is localhost (the default when you first boot), Qnet uses a hostname based on your NIC hardware's MAC address, so that nodes can still communicate.

node domain

A character string that lsm-qnet.so adds to the end of the node name. Together, the node name and node domain must form a string that's unique for all nodes that are talking to each other. The default is the value of the _CS_DOMAIN configuration string.

fully qualified node name (FQNN)

The string formed by concatenating the node name and node domain. For example, if the node name is karl and the node domain name is qnx.com, the resulting FQNN is karl.qnx.com.

network directory

A directory in the pathname space implemented by lsm-qnet.so. Each network directory — there can be more than one on a node — has an associated node domain. The default is /net, as used in the examples in this chapter.

The entries in /net for nodes in the same domain as your machine don't include the domain name. For example, if your machine is in the qnx.com domain, the entry for karl is /net/karl; if you're in a different domain, the entry is /net/karl.qnx.com.

name resolution

The process by which lsm-qnet.so converts an FQNN to a list of destination addresses that the transport layer knows how to get to.

name resolver

A piece of code that implements one method of converting an FQNN to a list of destination addresses. Each network directory has a list of name resolvers that are applied in turn to attempt to resolve the FQNN. The default is the Node Discovery Protocol (NDP).
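
The naming conventions above can be sketched as a few shell helpers. This is only an illustration, not part of Qnet: the function names (valid_node_name, fqnn, net_entry) are hypothetical, and the sketch assumes the default network directory, /net.

```shell
#!/bin/sh
# Hypothetical helpers that mirror the naming conventions; not part of Qnet.

# Succeed if NAME could be a node name: non-empty, no slashes or periods.
valid_node_name() {
    case "$1" in
        ''|*/*|*.*) return 1 ;;
        *) return 0 ;;
    esac
}

# Print the FQNN for a node name and a node domain (for example, karl.qnx.com).
# usage: fqnn NODE DOMAIN
fqnn() {
    printf '%s.%s\n' "$1" "$2"
}

# Print the network-directory entry for a node, as seen from a machine
# in LOCAL_DOMAIN. Nodes in the local domain are listed without the domain.
# usage: net_entry NODE NODE_DOMAIN LOCAL_DOMAIN [NETDIR]
net_entry() {
    netdir=${4:-/net}
    if [ "$2" = "$3" ]; then
        printf '%s/%s\n' "$netdir" "$1"
    else
        printf '%s/%s.%s\n' "$netdir" "$1" "$2"
    fi
}
```

For example, net_entry karl qnx.com qnx.com prints /net/karl, while the same node seen from a machine in another domain is /net/karl.qnx.com.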

Software components for Qnet networking

You need the following software entities (along with the hardware) for Qnet networking:

Figure: Components of Qnet (the Qnet framework).

io-pkt*: Manager to provide support for dynamically loaded networking modules.
devn-*, devnp-*: Managers that form an interface with the hardware.
lsm-qnet.so: Native network manager to implement Qnet protocols.

Starting Qnet

You can start Qnet either by creating a useqnet file and then rebooting, or by explicitly starting the network manager, protocols, and drivers. Both methods are described below.

If you run Qnet, anyone else on your network who's running Qnet can examine your files and processes, if the permissions on them allow it. For more information, see:

“File ownership and permissions” in the Working with Files chapter in this guide
“Qnet” in the Securing Your System chapter in this guide
“Autodiscovery vs static” in the Transparent Distributed Processing Using Qnet chapter of the Neutrino Programmer's Guide

Creating `useqnet`

To start Qnet automatically when you boot your system, log in as root and create an empty useqnet file, like this:

touch /etc/system/config/useqnet

If this file exists, your /etc/system/sysinit script starts Qnet when you boot your machine. If you need to specify any options to Qnet, edit sysinit and change these lines:

# Enable Qnet if the user has enabled it.
if test -r /etc/system/config/useqnet -a -d /dev/io-net; then
        mount -Tio-pkt lsm-qnet.so
fi

For example, if the hardware is unreliable, you might want to enable Cyclic Redundancy Checking on the packets. Change the above lines to:

# Enable Qnet if the user has enabled it.
if test -r /etc/system/config/useqnet -a -d /dev/io-net; then
        mount -Tio-pkt -o do_crc=1 lsm-qnet.so
fi

For more information about what happens when you boot your system, see Controlling How Neutrino Starts.
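
If you want to confirm ahead of time that the condition in sysinit will be satisfied, you could evaluate the same test yourself. The following is a minimal sketch; the function name would_start_qnet and its optional root-prefix argument are hypothetical and exist only so that the check can be tried against a scratch directory.

```shell
#!/bin/sh
# Hypothetical helper: evaluates the same condition that the sysinit
# fragment uses to decide whether to mount lsm-qnet.so.
# usage: would_start_qnet [ROOT]
#   ROOT is an optional prefix (default: empty, i.e. the real filesystem).
would_start_qnet() {
    root=${1:-}
    [ -r "$root/etc/system/config/useqnet" ] && [ -d "$root/dev/io-net" ]
}

# Example:
#   would_start_qnet && echo "sysinit will start Qnet at boot"
```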

Starting the network manager, protocols, and drivers

The io-pkt* manager is a process that assumes the central role to load a number of shared objects. It provides the framework for the entire protocol stack and lets data pass between modules. In the case of native networking, the shared objects are lsm-qnet.so and networking drivers, such as devn-ppc800-ads.so. The shared objects are arranged in a hierarchy, with the end user on the top, and hardware on the bottom.

The device enumerator starts io-pkt* automatically when you boot and loads the appropriate drivers for the detected devices. For information about customizing how the enumerator starts io-pkt*, see “Device enumeration” in the Controlling How Neutrino Starts chapter in this guide.

It's possible to run more than one instance of io-pkt*, but doing so requires a special setup. If you want to start io-pkt* “by hand,” you should slay the running io-pkt* first.

You can have at most one instance of Qnet running on a node, even if you're running more than one instance of io-pkt*.

You can start io-pkt* from the command line, telling it which drivers and protocols to load:

$ io-pkt-v4 -del900 -p qnet &

This causes io-pkt-v4 to load the devn-el900.so Ethernet driver and the Qnet protocol stack.

Or, you can use the mount and umount commands to start and stop modules dynamically, like this:

$ io-pkt-v6-hc &
$ mount -Tio-pkt devn-el900.so
$ mount -Tio-pkt lsm-qnet.so

To unload the driver, type:

umount /dev/io-net/en0

You can't unmount a protocol stack such as TCP/IP or Qnet.

Checking out the neighborhood

Once you've started Qnet, the /net directory includes (after a short while — see below) an entry for all other nodes on your local subnetwork that are running Qnet. You can access files and processes on other machines as if they were on your own computer (at least as far as the permissions allow).

For example, to display the contents of a file on another machine, you can use less, specifying the path through /net:

less /net/alonzo/etc/TIMEZONE

To get system information about all of the remote nodes that are listed in /net, use pidin with the net argument:

$ pidin net

You can use pidin with the -n option to get information about the processes on another machine:

pidin -n alonzo | less

You can even run a process on another machine, using your console for input and output, by using the -f option to the on command:

on -f alonzo date
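
Because the nodes appear as entries under /net, a script can loop over them. The sketch below assumes that each node shows up as a directory in the network directory; the function name list_nodes is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the name of each node entry in the
# network directory, one per line. Non-directory entries are skipped.
# usage: list_nodes [NETDIR]
list_nodes() {
    netdir=${1:-/net}
    for entry in "$netdir"/*; do
        [ -d "$entry" ] || continue
        printf '%s\n' "${entry##*/}"
    done
}

# Example (on a node running Qnet): run date on every node listed in /net.
#   for node in $(list_nodes); do on -f "$node" date; done
```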

Populating `/net`

When a node boots and starts Qnet along with a network driver, if that node is quiet (i.e. there are no applications on it that try to communicate with other nodes via Qnet), the /net directory is slowly populated by the rest of the Qnet nodes, which occasionally broadcast their node information.

The default time interval for this is 30 seconds; it's controlled by the auto_add=X command-line option to lsm-qnet.so. So, 30 seconds after booting, /net is probably as full as it's going to get.

You don't have to wait 30 seconds to talk to a remote node; immediately after Qnet and the network driver initialize, an application on your node may attempt to communicate with a remote node via Qnet.
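
A startup script that depends on one particular remote node might poll for that node instead of waiting a fixed time. The following is a rough sketch under stated assumptions: the function name wait_for_node is hypothetical, and it treats the appearance of the node's directory under /net as the sign that the node is known to Qnet.

```shell
#!/bin/sh
# Hypothetical helper: wait until NODE has an entry in the network
# directory, checking once per second, and give up after TIMEOUT seconds.
# usage: wait_for_node NODE [TIMEOUT_SECONDS] [NETDIR]
wait_for_node() {
    node=$1
    timeout=${2:-30}
    netdir=${3:-/net}
    elapsed=0
    while [ ! -d "$netdir/$node" ]; do
        [ "$elapsed" -ge "$timeout" ] && return 1
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}

# Example:
#   wait_for_node alonzo 30 || echo "alonzo didn't appear in /net"
```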

When there's an entry in the /net directory, all it means is that Qnet now has a mapping from an ASCII text node name to an Ethernet MAC address. It speeds up the node resolution process ever so slightly, and is convenient for people to see what other nodes might be on the network.

Entries in /net aren't deleted until someone tries to use them, and they're found to be invalid.

For example, someone might have booted a node an hour ago, run it for a minute, then shut it down. It will still have an entry in the /net directories of the other Qnet nodes, if they never talk to it. If they did talk to it, and established session connections, everything will eventually be torn down as the session connections time out.

To flush out invalid entries from /net, type:

ls -l /net &

To completely clean out /net, type:

rmdir /net/*

Troubleshooting

All the software components for the Qnet network should work in unison with the hardware to build a native network. If your Qnet network isn't working, you can use various Qnet utilities to fetch diagnostic information to troubleshoot your hardware as well as the network. Some of the typical questions are:

Is Qnet running?
Are io-pkt* and the drivers running?
Is the network card functional?
How do I get diagnostic information?
Is the hostname unique?
Are the nodes in the same domain?

Is Qnet running?

Qnet creates the /net directory. Use the following command to make sure that it exists:

$ ls /net

If the directory doesn't exist, Qnet isn't running. Ideally, the directory should include at least an entry with the name of your machine (i.e. the output of the hostname command). If you're using the Ethernet binding, all other reachable machines are also displayed. For example:

joseph/ eileen/
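
The two checks described above (does /net exist, and does it contain an entry for your own machine) could be combined in a small script. This is a sketch only; the function name check_qnet and its return codes are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: report whether the network directory exists and
# whether it has an entry for this machine.
# usage: check_qnet [NETDIR] [HOSTNAME]
# returns: 0 = entry found, 1 = no network directory, 2 = no entry for host
check_qnet() {
    netdir=${1:-/net}
    host=${2:-$(hostname)}
    if [ ! -d "$netdir" ]; then
        echo "Qnet isn't running: $netdir doesn't exist"
        return 1
    fi
    if [ ! -e "$netdir/$host" ]; then
        echo "Qnet is running, but $netdir has no entry for $host"
        return 2
    fi
    echo "Qnet is running: found $netdir/$host"
}
```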

Are `io-pkt*` and the drivers running?

As mentioned before, io-pkt* is the framework used to connect drivers and protocols. To check that it's running and to see which shared objects it has loaded, use the following pidin command:

$ pidin -P io-pkt-v4-hc mem

Look for the Qnet shared object in the output:

     pid tid name               prio STATE            code  data        stack
  118802   1 sbin/io-pkt-v4-hc   21o SIGWAITINFO      876K  672K  4096(516K)*
  118802   2 sbin/io-pkt-v4-hc   21o RECEIVE          876K  672K  8192(132K) 
  118802   3 sbin/io-pkt-v4-hc   21r RECEIVE          876K  672K  4096(132K) 
  118802   4 sbin/io-pkt-v4-hc   21o RECEIVE          876K  672K  4096(132K) 
  118802   5 sbin/io-pkt-v4-hc   20o RECEIVE          876K  672K  4096(132K) 
  118802   6 sbin/io-pkt-v4-hc   10o RECEIVE          876K  672K  4096(132K) 
            libc.so.2          @b0300000             436K   12K
            devnp-shim.so      @b8200000              28K  4096
            devn-pcnet.so      @b8208000              40K  4096
            lsm-qnet.so        @b8213000             168K   36K

If the output includes an lsm-qnet.so shared object, Qnet is running.
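
In a script, you might filter the pidin output instead of reading it by eye. The sketch below assumes the layout shown above, where each shared object is listed on its own line followed by a load address that starts with @; the function names loaded_objects and qnet_module_loaded are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers that read "pidin ... mem" output on standard input.

# Print the name of each shared object (lines whose second field is @address).
loaded_objects() {
    awk '$2 ~ /^@/ { print $1 }'
}

# Succeed if lsm-qnet.so is among the loaded shared objects.
qnet_module_loaded() {
    loaded_objects | grep -x 'lsm-qnet\.so' >/dev/null
}

# Example (on a node running io-pkt-v4-hc):
#   pidin -P io-pkt-v4-hc mem | qnet_module_loaded && echo "Qnet is loaded"
```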

Is the network card functional?

To determine whether or not the network card is functional, i.e. transmitting and receiving packets, use the nicinfo command. If you're logged in as root, your PATH includes the directory that contains the nicinfo executable; if you're logged in as another user, you have to specify the full path:

$ /usr/sbin/nicinfo

The output includes diagnostic information such as the following:

en0: 
  AMD PCNET-32 Ethernet Controller

  Physical Node ID ........................... 000C29 DD3528
  Current Physical Node ID ................... 000C29 DD3528
  Current Operation Rate ..................... 10.00 Mb/s
  Active Interface Type ...................... UTP
  Maximum Transmittable data Unit ............ 1514
  Maximum Receivable data Unit ............... 1514
  Hardware Interrupt ......................... 0x9
  I/O Aperture ............................... 0x1080 - 0x10ff
  Memory Aperture ............................ 0x0
  Promiscuous Mode ........................... Off
  Multicast Support .......................... Enabled

  Packets Transmitted OK ..................... 588
  Bytes Transmitted OK ....................... 103721
  Memory Allocation Failures on Transmit ..... 0

  Packets Received OK ........................ 11639
  Bytes Received OK .......................... 934712
  Memory Allocation Failures on Receive ...... 0

  Single Collisions on Transmit .............. 0
  Deferred Transmits ......................... 0
  Late Collision on Transmit errors .......... 0
  Transmits aborted (excessive collisions) ... 0
  Transmit Underruns ......................... 0
  No Carrier on Transmit ..................... 0
  Receive Alignment errors ................... 0
  Received packets with CRC errors ........... 0
  Packets Dropped on receive ................. 0

You should take special note of the Packets Transmitted OK and Packets Received OK counters. If they're zero, the driver might not be working, or the network might not be connected. Verify that the driver has correctly auto-detected the Current Operation Rate.
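
To check those two counters from a script, you could pick them out of the nicinfo output. This sketch assumes the output format shown above, with the count as the last field on each line; the function name check_nic_counters and its return codes are hypothetical. If nicinfo reports more than one interface, this simple version looks only at the last one.

```shell
#!/bin/sh
# Hypothetical helper: read nicinfo output on standard input, print the
# transmit and receive packet counts, and fail if either count is zero.
# returns: 0 = both nonzero, 1 = at least one is zero, 2 = counters not found
check_nic_counters() {
    awk '
        /Packets Transmitted OK/ { tx = $NF; seen_tx = 1 }
        /Packets Received OK/    { rx = $NF; seen_rx = 1 }
        END {
            if (!seen_tx || !seen_rx) { print "counters not found"; exit 2 }
            printf "tx=%d rx=%d\n", tx, rx
            if (tx == 0 || rx == 0) exit 1
        }
    '
}

# Example:
#   /usr/sbin/nicinfo | check_nic_counters || echo "check the driver and cable"
```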

How do I get diagnostic information?

You can find diagnostic information in /proc/qnetstats. If this file doesn't exist, Qnet isn't running.

The qnetstats file contains a lot of diagnostic information that's meaningful to a Qnet developer, but not to you. However, you can use grep to extract certain fields:

# cat /proc/qnetstats | grep "compiled"
**** Qnet compiled on Jun  3 2008 at 14:08:23 running on EAdd3528

or:

# cat /proc/qnetstats | grep -e "ok" -e "bad"
  txd ok       930
  txd bad      0
  rxd ok       2027
  rxd bad dr   0
  rxd bad L4   0

If you need help getting Qnet running, our Technical Support department might ask you for this information.
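
If you only care whether any of the "bad" counters is nonzero, a small filter can do the comparison for you. This is a sketch that assumes the line format shown in the excerpt above (a label containing "bad" followed by a number); the function name report_bad_counters is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read qnetstats-style text on standard input and
# print every "bad" counter whose value is greater than zero.
# returns: 0 = no nonzero bad counters, 1 = at least one was printed
report_bad_counters() {
    awk '
        / bad/ && $NF ~ /^[0-9]+$/ && $NF > 0 { print; found = 1 }
        END { exit (found ? 1 : 0) }
    '
}

# Example:
#   report_bad_counters < /proc/qnetstats || echo "Qnet reported bad packets"
```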

Is the hostname unique?

Use the hostname command to see the hostname. This hostname must be unique for Qnet to work.

Are the nodes in the same domain?

If the nodes aren't in the same domain, you have to specify the domain. For example:

ls /net/kenneth.qnx.com