Using Qnet for Transparent Distributed Processing
|This version of this document is no longer maintained. For the latest documentation, see http://www.qnx.com/developers/docs.|
This chapter includes:
- What is Qnet?
- When should you use Qnet?
- Conventions for naming nodes
- Software components for Qnet networking
- Starting Qnet
- Checking out the neighborhood
- Troubleshooting
What is Qnet?
A Neutrino native network is a group of interconnected workstations running only Neutrino. In this network, a program can transparently access any resource -- whether it's a file, a device, or a process -- on any other node (a computer or a workstation) in your local subnetwork. You can even run programs on other nodes.
The Qnet protocol provides transparent networking across a Neutrino network; Qnet implements a local area network that's optimized to provide a fast, seamless interface between Neutrino workstations, whatever the type of hardware.
|For QNX 4, the protocol used for native networking is called FLEET; it isn't compatible with Neutrino's Qnet.|
In essence, the Qnet protocol extends interprocess communication (IPC) transparently over a network of microkernels -- taking advantage of Neutrino's message-passing paradigm to implement native networking.
When you run Qnet, entries for all the nodes in your local subnetwork that are running Qnet appear in the /net namespace. (Under QNX 4, you use a double slash followed by a node number to refer to another node.)
For more details, see the Native Networking (Qnet) chapter of the System Architecture guide. For information about programming with Qnet, see the Transparent Distributed Networking via Qnet chapter of the Programmer's Guide.
When should you use Qnet?
When should you use Qnet, and when TCP/IP or some other protocol? It all depends on what machines you need to connect.
Qnet is intended for a network of trusted machines that are all running Neutrino and that all use the same endianness. It lets these machines share all their resources with little overhead. Using Qnet, you can use the Neutrino utilities (cp, mv, and so on) to manipulate files anywhere on the Qnet network as if they were on your machine.
Because it's meant for a group of trusted machines (such as you'd find in an embedded system), Qnet doesn't do any authentication of requests. Files are protected by the normal permissions that apply to users and groups (see "File ownership and permissions" in Working with Files), although you can use Qnet's maproot and mapany options to control, in a limited way, what other users can do on your machine. Qnet isn't connectionless like NFS; network errors are reported back to the client process.
TCP/IP is intended for more loosely connected machines that can run different operating systems. TCP/IP does authentication to control access to a machine; it's useful for connecting machines that you don't necessarily trust. It's used as the base for specialized protocols such as FTP and Telnet, and can provide high throughput for data streaming. For more information, see the TCP/IP Networking chapter in this guide.
NFS was designed for filesystem operations between all kinds of hosts, whatever their endianness, and is widely supported. It's a connectionless protocol; the server can shut down and be restarted, and the client resumes automatically. It also uses authentication and controls directory access. For more information, see "NFS filesystem" in Working with Filesystems.
Conventions for naming nodes
In order to resolve node names, the Qnet protocol follows certain conventions:
- node name
- A character string that identifies the node you're talking to. This name must be unique in the domain and can't contain slashes or periods.
The default node name is the value of the _CS_HOSTNAME configuration string. If your hostname is localhost (the default when you first boot), Qnet uses a hostname based on your NIC hardware's MAC address, so that nodes can still communicate.
- node domain
- A character string that npm-qnet.so adds to the end of the node name. Together, the node name and node domain must form a string that's unique for all nodes that are talking to each other. The default is the value of the _CS_DOMAIN configuration string.
- fully qualified node name (FQNN)
- The string formed by concatenating the node name and node domain. For example, if the node name is karl and the node domain name is qnx.com, the resulting FQNN is karl.qnx.com.
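The FQNN rule amounts to a simple string join. The following C sketch is only an illustration: make_fqnn() is a hypothetical helper (not part of any Neutrino library), it assumes the node name and node domain are joined by a single period as in the karl.qnx.com example, and it rejects node names that contain slashes or periods.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not a Neutrino API: build an FQNN by joining a
 * node name and a node domain with a period (an assumption based on the
 * karl + qnx.com -> karl.qnx.com example). Returns 0 on success, or -1 if
 * the node name is empty, contains '/' or '.', or doesn't fit in buf. */
int make_fqnn(const char *name, const char *domain, char *buf, size_t buflen)
{
    if (name[0] == '\0' || strpbrk(name, "/.") != NULL)
        return -1;                      /* invalid node name */

    int n = snprintf(buf, buflen, "%s.%s", name, domain);
    if (n < 0 || (size_t)n >= buflen)
        return -1;                      /* result would be truncated */
    return 0;
}
```

For example, make_fqnn("karl", "qnx.com", buf, sizeof buf) stores karl.qnx.com in buf.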
- network directory
- A directory in the pathname space implemented by npm-qnet.so. Each network directory (there can be more than one on a node) has an associated node domain. The default is /net, as used in the examples in this chapter.
The entries in /net for nodes in the same domain as your machine don't include the domain name. For example, if your machine is in the qnx.com domain, the entry for karl is /net/karl; if you're in a different domain, the entry is /net/karl.qnx.com.
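The rule for /net entries can be expressed as a small C sketch. net_entry() is a hypothetical helper that just reproduces the behavior described above: the domain is left off when the node is in the same domain as your machine, and appended otherwise.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not a Neutrino API: build the pathname under a
 * network directory (normally /net) for a node. The domain is omitted
 * when the node is in the local domain. Returns 0 on success, or -1 if
 * the result doesn't fit in buf. */
int net_entry(const char *netdir, const char *node, const char *node_domain,
              const char *local_domain, char *buf, size_t buflen)
{
    int n;

    if (strcmp(node_domain, local_domain) == 0)
        n = snprintf(buf, buflen, "%s/%s", netdir, node);
    else
        n = snprintf(buf, buflen, "%s/%s.%s", netdir, node, node_domain);

    return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}
```

With a local domain of qnx.com, net_entry("/net", "karl", "qnx.com", "qnx.com", buf, sizeof buf) yields /net/karl; with a (hypothetical) local domain of example.org, the same call yields /net/karl.qnx.com.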
- name resolution
- The process by which npm-qnet.so converts an FQNN to a list of destination addresses that the transport layer knows how to get to.
- name resolver
- A piece of code that implements one method of converting an FQNN to a list of destination addresses. Each network directory has a list of name resolvers that are applied in turn to attempt to resolve the FQNN. The default is the Node Discovery Protocol (NDP).
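The idea of a list of name resolvers that are "applied in turn" can be modeled as a chain of function pointers in which the first resolver that produces any addresses wins. This is only a conceptual sketch: the resolver_fn type, the representation of addresses as opaque unsigned long values, and both example resolvers are hypothetical, and the code says nothing about how NDP or npm-qnet.so actually work internally.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical resolver signature: fill up to max addresses for fqnn and
 * return how many were found (0 means this resolver couldn't resolve it).
 * Addresses are modeled as opaque unsigned long values. */
typedef int (*resolver_fn)(const char *fqnn, unsigned long *addrs, int max);

/* Apply each resolver in turn; the first one that returns at least one
 * address wins. Returns the number of addresses, or 0 if none succeeded. */
int resolve_fqnn(const resolver_fn *resolvers, size_t count,
                 const char *fqnn, unsigned long *addrs, int max)
{
    for (size_t i = 0; i < count; i++) {
        int n = resolvers[i](fqnn, addrs, max);
        if (n > 0)
            return n;
    }
    return 0;
}

/* Example resolver that never succeeds. */
int null_resolver(const char *fqnn, unsigned long *addrs, int max)
{
    (void)fqnn;
    (void)addrs;
    (void)max;
    return 0;
}

/* Example resolver backed by a fixed table; the entry is made up. */
int table_resolver(const char *fqnn, unsigned long *addrs, int max)
{
    if (max > 0 && strcmp(fqnn, "karl.qnx.com") == 0) {
        addrs[0] = 0x0a000001UL;
        return 1;
    }
    return 0;
}
```

With resolver_fn chain[] = { null_resolver, table_resolver }, a call to resolve_fqnn(chain, 2, "karl.qnx.com", addrs, 4) falls through the first resolver and returns 1 from the second.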
Software components for Qnet networking
You need the following software entities (along with the hardware) for Qnet networking:
- io-net: a manager that provides support for dynamically loaded networking modules.
- Network drivers (devn-*): managers that form an interface with the hardware.
- npm-qnet.so: the native network manager that implements the Qnet protocols.
Neutrino currently includes these versions:
- npm-qnet-compat.so: the original stack.
- npm-qnet-l4_lite.so: the new, lightweight version, which is faster and more reliable. This version of the Qnet stack isn't compatible with the earlier version with regard to packet and protocol format.
By default, npm-qnet.so is a symbolic link to the latest version of the Qnet protocol stack. To determine which version you're using, type:
ls -l /lib/dll/npm-qnet.so
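A program can make the same check as ls -l by reading the symbolic link with the POSIX readlink() function. The helper name qnet_stack_target() is hypothetical; only the path /lib/dll/npm-qnet.so comes from the text above.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: store the target of a symbolic link such as
 * /lib/dll/npm-qnet.so in target (len bytes). Returns 0 on success, or -1
 * if the path doesn't exist, isn't a symbolic link, or the target may
 * have been truncated. */
int qnet_stack_target(const char *link, char *target, size_t len)
{
    if (len == 0)
        return -1;

    ssize_t n = readlink(link, target, len - 1);
    if (n < 0 || (size_t)n >= len - 1)
        return -1;                     /* error, or possibly truncated */
    target[n] = '\0';                  /* readlink() doesn't add a NUL */
    return 0;
}
```

After qnet_stack_target("/lib/dll/npm-qnet.so", buf, sizeof buf) succeeds, checking strstr(buf, "l4_lite") tells you whether the lightweight stack is the one in use.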
If any conflict arises, see "Troubleshooting," later in this chapter.
|If you need to customize Qnet, ask your sales representative about the Transparent Distributed Processing Software Kit (TDP SK).|
Starting Qnet
You can start Qnet by:
- creating a useqnet file, then rebooting
- explicitly starting the network manager, protocols, and drivers
as described below.
|If you run Qnet, anyone else on your network who's running Qnet can examine your files and processes, if the permissions on them allow it. For more information, see "File ownership and permissions" in Working with Files.|
To start Qnet automatically when you boot your system, log in as root and create an empty useqnet file, like this:
touch /etc/system/config/useqnet
If this file exists, your /etc/system/sysinit script starts Qnet when you boot your machine. If you need to specify any options to Qnet, edit sysinit and change these lines:
# Enable qnet if user has enabled it.
if test -r /etc/system/config/useqnet -a -d /dev/io-net; then
    mount -Tio-net npm-qnet.so
fi
For example, if the hardware is unreliable, you might want to enable Cyclic Redundancy Checking on the packets. Change the above lines to:
# Enable qnet if user has enabled it.
if test -r /etc/system/config/useqnet -a -d /dev/io-net; then
    mount -Tio-net -o do_crc=1 npm-qnet.so
fi
For more information about what happens when you boot your system, see Controlling How Neutrino Starts.
The io-net manager is a process that plays the central role of loading a number of shared objects. It provides the framework for the entire protocol stack and lets data pass between modules. In the case of native networking, the shared objects are npm-qnet.so and networking drivers, such as devn-ppc800-ads.so. The shared objects are arranged in a hierarchy, with the end user on the top, and hardware on the bottom.
|The device enumerator starts io-net automatically when you boot and loads the appropriate drivers for the detected devices. For information about customizing how the enumerator starts io-net, see the Controlling How Neutrino Starts chapter in this guide.
It's possible to run more than one instance of io-net, but doing so requires a special setup. If you want to start io-net "by hand," you should slay the running io-net first.|
You can start io-net from the command line, telling it which drivers and protocols to load:
$ io-net -del900 -p npm-qnet &
This causes io-net to load the devn-el900.so Ethernet driver and the Qnet protocol stack.
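Alternatively, you can start io-net by itself and then mount the driver and the Qnet protocol stack:
$ io-net &
$ mount -Tio-net devn-el900.so
$ mount -Tio-net npm-qnet.so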
To unload the driver, type:
umount /dev/io-net/en0
|You can't unmount a protocol stack such as TCP/IP or Qnet.|
Checking out the neighborhood
Once you've started Qnet, the /net directory includes an entry for all other nodes on your local subnetwork that are running Qnet. You can access files and processes on other machines as if they were on your own computer (at least as far as the permissions allow).
For example, to display the contents of a file on another machine, you can use less, specifying the path through /net:
less /net/alonzo/etc/TIMEZONE
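Because Qnet extends message passing across the network, a program doesn't need a special API to reach another node's files: the ordinary POSIX open() and read() calls work on a /net pathname, and network errors come back to the caller as ordinary errors. The following minimal sketch copies a file to standard output; the function name is hypothetical, and the example path in the comment assumes a node named alonzo.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Copy a file to standard output with plain POSIX I/O. With Qnet running,
 * path can name a file on another node (for example, a hypothetical
 * "/net/alonzo/etc/TIMEZONE"); nothing Qnet-specific is needed.
 * Returns the number of bytes copied, or -1 on error. */
long copy_to_stdout(const char *path)
{
    char buf[4096];
    long total = 0;
    ssize_t n;

    int fd = open(path, O_RDONLY);
    if (fd == -1) {
        perror(path);                  /* report why open() failed */
        return -1;
    }
    while ((n = read(fd, buf, sizeof buf)) > 0) {
        /* For simplicity, treat a short write as an error. */
        if (write(STDOUT_FILENO, buf, (size_t)n) != n) {
            total = -1;
            break;
        }
        total += n;
    }
    if (n < 0)
        total = -1;                    /* read() failed */
    close(fd);
    return total;
}
```

Calling copy_to_stdout("/net/alonzo/etc/TIMEZONE") behaves exactly as it would for a local file, subject to the permissions on the remote node.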
To get system information about all of the remote nodes that are listed in /net, use pidin with the net argument:
$ pidin net
You can use pidin with the -n option to get information about the processes on another machine:
pidin -n alonzo | less
You can even run a process on another machine, using your console for input and output, by using the -f option to the on command:
on -f alonzo date
Troubleshooting
All the software components for the Qnet network should work in unison with the hardware to build a native network. If your Qnet network isn't working, you can use various Qnet utilities to fetch diagnostic information to troubleshoot your hardware as well as the network. Some of the typical questions are:
- Is Qnet running?
- Are io-net and the drivers running?
- Is the Qnet protocol stack or Ethernet driver installed?
- Is the network card functional?
- How do I get diagnostic information?
- Is the Qnet version correct?
- Is the hostname unique?
- Are the nodes in the same domain?
Is Qnet running?
Qnet creates the /net directory. Use the following command to make sure that it exists:
$ ls /net
If the directory doesn't exist, Qnet isn't running. Ideally, the directory should include at least an entry with the name of your machine (i.e. the output of the hostname command). If you're using the Ethernet binding, all other reachable machines are also displayed.
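A program can run the same check by opening the network directory with the POSIX opendir() and readdir() functions. In this hypothetical helper, failing to open the directory corresponds to the case where Qnet isn't running; when Qnet is running, the count includes your own node.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: print each entry in a network directory such as
 * /net and return how many there are, or -1 if the directory can't be
 * opened (for /net, a sign that Qnet isn't running). */
int list_nodes(const char *netdir)
{
    DIR *d = opendir(netdir);
    if (d == NULL)
        return -1;

    int count = 0;
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        if (strcmp(e->d_name, ".") == 0 || strcmp(e->d_name, "..") == 0)
            continue;                  /* skip the directory's own links */
        printf("%s\n", e->d_name);
        count++;
    }
    closedir(d);
    return count;
}
```

A call such as list_nodes("/net") prints one node name per line, much like ls /net.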
Are io-net and the drivers running?
As mentioned before, io-net is the framework used to connect drivers and protocols. To check that it's running and has loaded the Qnet protocol stack, use the following pidin command:
$ pidin -P io-net mem
Look for the Qnet shared object in the output:
pid   tid name        prio STATE       code  data  stack
86034   1 sbin/io-net  10o SIGWAITINFO  56K  684K  8192(516K)*
86034   2 sbin/io-net  10o RECEIVE      56K  684K  4096(68K)
86034   3 sbin/io-net  10o RECEIVE      56K  684K  4096(68K)
86034   4 sbin/io-net  10o RECEIVE      56K  684K  4096(68K)
86034   5 sbin/io-net  20o RECEIVE      56K  684K  4096(132K)
86034   6 sbin/io-net  10o RECEIVE      56K  684K  4096(68K)
86034   7 sbin/io-net  21r RECEIVE      56K  684K  4096(132K)
86034   8 sbin/io-net  10r RECEIVE      56K  684K  4096(132K)
86034   9 sbin/io-net  10o RECEIVE      56K  684K  4096(132K)
86034  10 sbin/io-net  10o RECEIVE      56K  684K  4096(132K)
            ldqnx.so.2          @b0300000   312K   16K
            npm-tcpip.so        @b8200000   592K  144K
            devn-el900.so       @b82b8000    56K  4096
            devn-epic.so        @b82c7000    44K  4096
            npm-qnet-l4_lite.so @b82d3000   132K   16K
If the output includes an npm-qnet shared object, Qnet is running.
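If you capture the output of pidin -P io-net mem in a program (for example, with the POSIX popen() function), looking for Qnet comes down to a substring search. This hypothetical helper returns the name of the first shared object that starts with npm-qnet, which also tells you which version of the stack is loaded.

```c
#include <string.h>

/* Hypothetical helper: search captured "pidin -P io-net mem" output for a
 * Qnet shared object. On success, copy its name (e.g.
 * "npm-qnet-l4_lite.so") into name and return 1; return 0 if no npm-qnet
 * module is listed. */
int find_qnet_module(const char *pidin_out, char *name, size_t len)
{
    const char *p = strstr(pidin_out, "npm-qnet");
    if (p == NULL || len == 0)
        return 0;

    size_t n = strcspn(p, " \t\r\n");  /* the name ends at whitespace */
    if (n >= len)
        n = len - 1;                   /* truncate to fit the buffer */
    memcpy(name, p, n);
    name[n] = '\0';
    return 1;
}
```

Run against the sample output above, the helper reports npm-qnet-l4_lite.so; output that lists only npm-tcpip.so and drivers yields 0.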
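Is the Qnet protocol stack or Ethernet driver installed?
To check whether the Qnet protocol stack and the Ethernet driver are installed, use the following command: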
$ ls /dev/io-net
Ideally, you should see the following output:
en0 ip0 ip_en ipv6_en qnet_en
The en0 entry represents the first (and only) Ethernet driver, and qnet_en represents the Qnet protocol stack.
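The same check can be made from a program by testing for the entries with the POSIX access() function. The helper below is a hypothetical sketch; the entry names en0 and qnet_en come from the example output above and will differ if your drivers differ.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: return 1 if entry (e.g. "en0" or "qnet_en") exists
 * in dir (normally /dev/io-net), or 0 if it doesn't or the pathname would
 * be too long. */
int ionet_has(const char *dir, const char *entry)
{
    char path[256];
    int n = snprintf(path, sizeof path, "%s/%s", dir, entry);

    if (n < 0 || (size_t)n >= sizeof path)
        return 0;
    return access(path, F_OK) == 0;
}
```

For example, ionet_has("/dev/io-net", "en0") && ionet_has("/dev/io-net", "qnet_en") is true only when both the driver and the Qnet stack are installed.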
Is the network card functional?
To determine whether or not the network card is functional, i.e. transmitting and receiving packets, use the nicinfo command. If you're logged in as root, your PATH includes the directory that contains the nicinfo executable; if you're logged in as another user, you have to specify the full path:
$ /usr/sbin/nicinfo
The output looks something like this:
3COM (90xC) 10BASE-T/100BASE-TX Ethernet Controller
  Physical Node ID ................. 000103 E8433F
  Current Physical Node ID ......... 000103 E8433F
  Media Rate ....................... 10.00 Mb/s half-duplex UTP
  MTU .............................. 1514
  Lan .............................. 0
  I/O Port Range ................... 0xA800 -> 0xA87F
  Hardware Interrupt ............... 0x7
  Promiscuous ...................... Disabled
  Multicast ........................ Enabled
  Total Packets Txd OK ............. 1283237
  Total Packets Txd Bad ............ 9
  Total Packets Rxd OK ............. 7923747
  Total Rx Errors .................. 0
  Total Bytes Txd .................. 82284687
  Total Bytes Rxd .................. 1612645356
  Tx Collision Errors .............. 34380
  Tx Collisions Errors (aborted) ... 0
  Carrier Sense Lost on Tx ......... 0
  FIFO Underruns During Tx ......... 0
  Tx deferred ...................... 83301
  Out of Window Collisions ......... 0
  FIFO Overruns During Rx .......... 0
  Alignment errors ................. 0
  CRC errors ....................... 0
You should take special note of the Total Packets Txd OK and Total Packets Rxd OK counters. If they're zero, the driver might not be working, or the network might not be connected. Verify that the Media Rate has been auto-detected correctly by the driver.
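If you capture nicinfo's output in a program, you can check these counters automatically. The helper below is a hypothetical sketch that assumes the layout shown above (a label, a run of padding dots, then the value); other drivers may format their output differently.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: extract a counter such as "Total Packets Txd OK"
 * from captured nicinfo output. Assumes the label is followed by padding
 * dots and spaces and then the value. Returns the value, or -1 if the
 * label isn't present. */
long long nic_counter(const char *out, const char *label)
{
    const char *p = strstr(out, label);
    if (p == NULL)
        return -1;

    p += strlen(label);
    while (*p == ' ' || *p == '.')
        p++;                           /* skip the padding */
    return strtoll(p, NULL, 10);
}
```

A result of 0 for either "Total Packets Txd OK" or "Total Packets Rxd OK" points to the same problems described above: a driver that isn't working or a network that isn't connected.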
How do I get diagnostic information?
You can find diagnostic information in /proc/qnetstats. If this file doesn't exist, Qnet isn't running.
The qnetstats file contains a lot of diagnostic information that's meaningful to a Qnet developer, but not to you. However, you can use grep to extract certain fields:
$ cat /proc/qnetstats | grep "compiled"
**** Qnet compiled on Jun 25 2003 at 17:14:27 running on qnet02
$ cat /proc/qnetstats | grep -e "ok" -e "bad"
txd ok 19415966
txd bad 31
rxd ok 10254788
rxd bad dr 0
rxd bad L4 0
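To collect the same fields from a program, you can scan the file line by line instead of running grep. This is a hypothetical sketch: it assumes each counter appears on its own line as a label followed by a number, as in the grep output above, and the real layout of /proc/qnetstats may differ between Qnet versions. It requires the exact label, so "rxd bad" alone doesn't match "rxd bad dr" or "rxd bad L4".

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: look up a counter such as "txd ok" or "rxd bad L4"
 * in a qnetstats file (normally /proc/qnetstats). Assumes each counter is
 * on its own line as "<label> <value>". Returns the value, or -1 if the
 * file can't be read or no line has that exact label followed by a number. */
long long qnetstat(const char *path, const char *label)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;                     /* no file: Qnet probably isn't running */

    char line[256];
    long long value = -1;
    size_t len = strlen(label);

    while (fgets(line, sizeof line, f) != NULL) {
        char *p = line + strspn(line, " \t");   /* ignore indentation */
        if (strncmp(p, label, len) != 0 || (p[len] != ' ' && p[len] != '\t'))
            continue;
        char *end;
        long long v = strtoll(p + len, &end, 10);
        if (end != p + len) {          /* a number followed the label */
            value = v;
            break;
        }
    }
    fclose(f);
    return value;
}
```

For example, qnetstat("/proc/qnetstats", "txd bad") returns the count of bad transmissions, under the layout assumption stated above.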
If you need help getting Qnet running, our Technical Support department might ask you for this information.
Is the Qnet version correct?
Since Neutrino includes two versions of the Qnet stack that are incompatible with regard to packet format, a conflict could arise, and native networking might not work. If this happens, make sure that npm-qnet.so is a symbolic link to the same version of the Qnet protocol stack on both machines. For more information, see "Software components for Qnet networking," earlier in this chapter.
You can also use the ping command to verify that everything else (such as the network cards and the TCP/IP stack) is working; for example:
ping alonzo
If ping works, it's likely that the only problem lies with the versions of Qnet.
Is the hostname unique?
Use the hostname command to see the hostname. This hostname must be unique for Qnet to work.
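Since /net should contain an entry for your own machine once Qnet is running, a program can at least confirm that its hostname has been registered. This hypothetical sketch uses the POSIX gethostname() and stat() functions. It can't prove that the name is unique on the network; for that, you still need to compare the hostnames of the other nodes.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: check that this machine's hostname appears under
 * netdir (normally /net). Returns 1 if it does, 0 if it doesn't, and -1 if
 * the hostname can't be read. This confirms registration, not uniqueness. */
int self_in_netdir(const char *netdir)
{
    char host[256];
    char path[512];
    struct stat st;

    if (gethostname(host, sizeof host) != 0)
        return -1;
    host[sizeof host - 1] = '\0';      /* a truncated name may lack a NUL */

    int n = snprintf(path, sizeof path, "%s/%s", netdir, host);
    if (n < 0 || (size_t)n >= sizeof path)
        return 0;
    return stat(path, &st) == 0;
}
```

A result of 0 from self_in_netdir("/net") on a machine where /net exists suggests that Qnet hasn't registered the node under its current hostname.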
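Are the nodes in the same domain?
If the nodes aren't in the same domain, you have to specify the domain. For example, if karl is in the qnx.com domain and your machine isn't, refer to it as /net/karl.qnx.com:
ls /net/karl.qnx.com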