Improving the network throughput

Increasing the iflib Rx budget

Many of the PCI network drivers use the iflib framework. This framework has a number of sysctl and tunable parameters that you can set, including rx_budget. The rx_budget tunable determines the maximum number of packets that the Rx task queue thread can process in a batch. Under heavy load, increasing this value can improve the Rx performance. For example:

# sysctl -w dev.ix.0.iflib.rx_budget=100

A value of 0 means the default of 16 is used.
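
To check which value is currently in effect, you can read the variable back; the dev.ix.0 prefix depends on your driver and interface instance, and the output below assumes the change made above:

# sysctl dev.ix.0.iflib.rx_budget
dev.ix.0.iflib.rx_budget: 100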

Considerations

Increasing the number of packets processed in a batch increases the latency on other packet flows in the network stack.

Changing the thread priorities

Generally speaking, there are three threads involved in the transmission and reception of a packet:

  • A resource manager thread talking to the client application
  • A task queue thread performing most of the work
  • An interrupt service thread (IST)

The relative priorities of these threads can influence performance because they may briefly contend for the same resources as packets are passed among them. The resource manager thread inherits its priority from the client on the message pass for read(), write(), or similar calls. The priorities of the task queue thread and the IST are set by tunables and default to 21. For example:

# sysctl -d qnx.taskq_prio
qnx.taskq_prio: taskq thread priority
# sysctl -d qnx.ist_prio
qnx.ist_prio: interrupt thread priority
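
To see the values currently in effect rather than their descriptions, read the variables without -d; you can also confirm the priorities of the io-sock threads with pidin. The output below assumes the default priority of 21 mentioned above:

# sysctl qnx.taskq_prio
qnx.taskq_prio: 21
# sysctl qnx.ist_prio
qnx.ist_prio: 21
# pidin -p io-sock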

Considerations

Changing the relative priorities may increase Rx performance at the expense of Tx performance and vice versa. Take care when you set priorities to make sure you do not starve other functionality of CPU time.

Changing the network dispatch service (netisr) policies

The netisr code is invoked as part of packet processing. It has three different dispatch policies (descriptions taken from the code):

NETISR_DISPATCH_DEFERRED
All work is deferred for a netisr, regardless of context (may be overridden by protocols).
NETISR_DISPATCH_HYBRID
If the executing context allows direct dispatch, and we're running on the CPU the work would be performed on, then direct dispatch it if it wouldn't violate ordering constraints on the workstream.
NETISR_DISPATCH_DIRECT
If the executing context allows direct dispatch, always direct dispatch. (The default.)

Also from the code: Notice that changing the global policy could lead to short periods of misordered processing, but this is considered acceptable as compared to the complexity of enforcing ordering during policy changes. Protocols can override the global policy; when they don't, they select NETISR_DISPATCH_DEFAULT, which means the global policy applies.

The NETISR_DISPATCH_DIRECT policy guarantees that all packet ordering constraints are met, but means that the netisr code is invoked directly by the task queue thread.

When the policy is NETISR_DISPATCH_DEFERRED, the netisr occurs in its own thread. This adjustment divides the work among the task queue and netisr threads, which potentially increases the throughput.

You set the netisr (kernel network dispatch service) dispatch policy via a sysctl variable. For more information, see the sysctl entry in the Utilities Reference.
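
As a sketch, on the FreeBSD-derived io-sock stack the variable is typically net.isr.dispatch, with the values direct, hybrid, and deferred; confirm the exact name and accepted values in the Utilities Reference before relying on it. Switching from the default direct dispatch to deferred dispatch might look like this:

# sysctl net.isr.dispatch
net.isr.dispatch: direct
# sysctl net.isr.dispatch=deferred
net.isr.dispatch: direct -> deferred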

Considerations

The NETISR_DISPATCH_DEFERRED policy may violate ordering constraints.

Additional work must be performed to pass packets between threads in a deferred dispatch scenario. If the task queue thread isn't already saturated, deferring may result in worse throughput or increased CPU usage.

Increasing the UDS stream buffer sizes

Unix Domain Sockets (UDS) have sysctl parameters that define their socket buffer sizes. For SOCK_STREAM types, the transmit and receive buffers default to 8 kB. For example:

# sysctl net.local.stream
net.local.stream.sendspace: 8192
net.local.stream.recvspace: 8192

If large amounts of data are to be transferred over a stream UDS using reads and writes of more than 8 kB, increasing the sysctl parameter values can reduce the number of message passes and improve throughput. The following example increases them to 64 kB:

# sysctl -w net.local.stream.sendspace=65536
net.local.stream.sendspace: 8192 -> 65536
# sysctl -w net.local.stream.recvspace=65536
net.local.stream.recvspace: 8192 -> 65536

Considerations

Increasing the socket buffer sizes increases the memory consumed by each UDS. In a system with a large number of UDS, this additional memory usage can be significant.

Disabling the ICMP redirects for improved routing performance

By default, io-sock sends ICMP redirects when it acts as a router. The redirect functionality in io-sock requires packets to go through a slower, full-functionality path in the code. If you don't need redirects to be sent when routing, disable them so that packets can be forwarded via the fast-forwarding path. For example:

# sysctl net.inet.ip.redirect=0
net.inet.ip.redirect: 1 -> 0
# sysctl net.inet6.ip6.redirect=0
net.inet6.ip6.redirect: 1 -> 0

You can use protocol statistics to find out the number of packets forwarded and fast forwarded and the number of redirects sent. For example:

# netstat -s -p ip
ip:
...
0 packets forwarded (0 packets fast forwarded)
...
0 redirects sent

Considerations

ICMP redirects are defined by RFC 792, but they are rarely needed in a modern network with a single router on each network segment.

Enabling the TCP ACK threshold settings

LRO (large receive offload)

The io-sock networking manager supports the large receive offload (LRO) feature, which reassembles incoming network packets into larger buffers for Rx processing. LRO also significantly reduces the number of ACK responses: because io-sock processes fewer ingress (batched) packets, it generates fewer ACKs.
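
If you want to confirm whether a given interface currently uses LRO, its state appears in the options field of the ifconfig output; on drivers that support the feature, the lro and -lro options typically toggle it. The interface name ix0 below is only an example:

# ifconfig ix0
# ifconfig ix0 lro
# ifconfig ix0 -lro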

ACK threshold sysctl variables

If your hardware or driver doesn't support LRO, you can use sysctl variables to reduce the number of ACK responses. (If LRO is supported, these variables have no effect because the ACK rate is already throttled.)

qnx.net.inet.tcp.tcp_ack_thres
Normally, a TCP receiver sends an acknowledgment packet (ACK) after two Rx packets: the first one sets a delayed ACK timer, and the second one sends an ACK (if the previous ACK is delayed). The qnx.net.inet.tcp.tcp_ack_thres variable delays the second ACK until the outstanding data exceeds the specified size, expressed as a multiple of the maximum segment size (MSS). For example, if it's set to 20, the second ACK isn't sent until the size of the outstanding data is more than 20 times the MSS. However, if the threshold exceeds the space remaining in the Rx socket buffer, the receiver resumes sending ACKs and qnx.net.inet.tcp.tcp_ack_thres is ignored. The delayed ACK is still sent if the timer expires before the threshold is reached.
qnx.net.inet.tcp.tcp_win_update_thres
Adjusts when a window update packet is sent, according to the rate at which the receiving application reads data. If the read buffer is emptied by the specified multiple of the maximum segment size (MSS) and is within a certain range of the socket buffer size, a window update packet is sent. This value should be equal to or smaller than the value set by qnx.net.inet.tcp.tcp_ack_thres.
Because io-sock enables auto-scaling buffers by default, in most cases the socket buffer size quickly scales beyond this window update threshold. The threshold is more likely to affect the ACK rate if smaller, fixed socket buffer sizes are used.
qnx.net.inet.tcp.tcp_ack_on_push
Enables an event-based ACK. This is designed for sockets that send less data and thus generate fewer ACKs. When the sender sends a TCP data packet with the PUSH flag set (to indicate that it has no further data in its send socket buffer), the receiver sends the ACK instead of relying on the delayed ACK.

These sysctl variables let you set ACK thresholds that reduce the number of ACKs under best-case conditions (no packet loss). Because the delayed ACK timer is still armed, that ACK is sent when the timer expires. In addition, because algorithms such as fast retransmit are still engaged and missed segments still generate ACK responses, the thresholds are in effect only under ideal conditions (no detected packet loss).
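
As an illustrative sketch only (the values below are examples carried over from the descriptions above, not recommendations, and tcp_ack_on_push is assumed to be a boolean switch), you could read the current settings and then apply new thresholds:

# sysctl qnx.net.inet.tcp.tcp_ack_thres
# sysctl qnx.net.inet.tcp.tcp_win_update_thres
# sysctl qnx.net.inet.tcp.tcp_ack_thres=20
# sysctl qnx.net.inet.tcp.tcp_win_update_thres=10
# sysctl qnx.net.inet.tcp.tcp_ack_on_push=1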
