io-blk.so

Block I/O support

Syntax:

driver [blk option[,option…]] [fstype [options]]

Runs on:

QNX Neutrino

Options:

The driver is one of the devb-* drivers, such as devb-eide, and option is one of the options described below.

The optional fstype argument is one of the filesystem drivers (fs-*); you can follow it with options specific to the filesystem.

Suffixes for size, memory, and time arguments

You can use suffixes on the arguments to some options to specify the units. These suffixes aren't case-sensitive.

For size arguments, the suffixes are:

For memory arguments, the suffixes also include:

For time arguments, the suffixes are:

blk options

You can specify the following options only in the blk section:

alloc=mode
Set the cache/memory allocation policy to one of the following:
  • cache — allocate all of the buffer cache (the cache=size) at startup, but allocate all other caches (e.g., names) on demand and let them grow to their specified limit, and then start removing the Least Recently Used (LRU) items.
  • demand — allocate the buffer cache on demand; it will grow from 0 to the size specified by the cache option as you access the disk.
  • upfront — pregrow all caches to their full size. This option can be useful in RAM-tuning a system, to see how much memory the filesystem will eventually consume (things such as the name and vnode caches tend to grow over time).

The default is cache.

auto=amount
Set the amount of automounting to be performed; amount is one of:
  • none — only raw block devices appear.
  • partition — enumerate any partition tables.

The default is partition.

automount=[+]dev[@ptype]:mountpoint[:fstype[:options]]
Create a mountpoint for dev at mountpoint. If you don't specify a full path for the device, io-blk.so uses the value of its devdir option as a prefix. For example, if devdir is /dev (the default), an option of automount=hd0t177:/disk mounts /dev/hd0t177 at /disk.
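The prefixing rule can be sketched as a small shell function. This is only an illustration: resolve_dev is a hypothetical helper, not part of io-blk.so, and it assumes that a "full path" is any name that begins with a slash.

```shell
# Hypothetical helper: mimic how a device name given to automount= is resolved.
# Assumption: a "full path" is any name that starts with "/".
resolve_dev() {
    dev=$1
    devdir=${2:-/dev}       # /dev is the documented default for devdir
    case $dev in
        /*) printf '%s\n' "$dev" ;;             # already a full path
        *)  printf '%s\n' "$devdir/$dev" ;;     # prefix with devdir
    esac
}

resolve_dev hd0t177         # prints /dev/hd0t177
resolve_dev /dev/umass0     # prints /dev/umass0
```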

The optional ptype specifies the partition type for a partitioned medium (e.g., hard disk, SD card), and must be one of dos, ext2, mac, nt, or qnx6.

The optional fstype specifies the filesystem type, and may be followed by filesystem options; if you don't specify fstype, the library tries to determine the filesystem automatically. The choices of filesystem and the associated shared objects are:

cd
fs-udf.so
dos
fs-dos.so
ext2
fs-ext2.so
mac
fs-mac.so
nt
fs-nt.so
qnx6
fs-qnx6.so (Power-Safe filesystem)
udf
fs-udf.so

To mount multiple filesystems on a (removable) device, specify that the device is shared with a + prefix. For example:

automount=+umass0:/dos/a:dos,automount=+umass0:/fd:qnx6

For a list of common partition types, see the Filesystems chapter of the System Architecture guide.

automount=@filename
The automounts are as specified in the given file. The file is a list of mounts (using the same syntax as above), separated by newline characters or commas.
Note: You can't locate the filename file in the filesystem to be automounted: it has to be available in an existing filesystem such as the image filesystem. Optionally, you could locate it in any devb filesystem that's already running.
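As a rough sketch, the following creates such a file with one mount per line. The file location and the mounts themselves are placeholders (the entries reuse the examples given for the automount option above); on a real target the file would have to live on a filesystem that's already available, as the note says.

```shell
# Hypothetical location, for illustration only.
automount_file=${TMPDIR:-/tmp}/automount.example

# One mount per line, using the dev:mountpoint[:fstype] syntax.
cat > "$automount_file" <<'EOF'
hd0t177:/disk
+umass0:/dos/a:dos
+umass0:/fd:qnx6
EOF

# The option string you'd then place in the blk section of a devb-* driver.
blk_opt="automount=@$automount_file"
printf '%s\n' "$blk_opt"
```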
cache=total[:hash]
Specify the total size of the disk buffer cache. The buffer cache is used as intermediate storage for all disk I/O, as well as providing LRU caching for dirty delayed-write blocks and recently-read blocks.

By default, 2 MB plus 2% of system RAM is assigned, with a maximum of 512 MB. If you specify an explicit size, bounds of 512 KB and 3 GB are applied.
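To get a feel for these numbers, here's a sketch of the sizing rule in shell arithmetic. It assumes that the 2% is taken of total RAM using integer math, and that "bounds are applied" means an out-of-range explicit size is clamped to the nearest bound; default_cache and clamp_cache are hypothetical helper names.

```shell
MB=$((1024 * 1024))

# Default size in bytes: 2 MB plus 2% of RAM, capped at 512 MB.
default_cache() {           # $1 = system RAM in bytes
    size=$((2 * MB + $1 * 2 / 100))
    max=$((512 * MB))
    if [ "$size" -gt "$max" ]; then size=$max; fi
    echo "$size"
}

# Explicit size in bytes, held between 512 KB and 3 GB (assumed clamping).
clamp_cache() {             # $1 = requested cache= size in bytes
    size=$1
    min=$((512 * 1024))
    max=$((3 * 1024 * MB))
    if [ "$size" -lt "$min" ]; then size=$min; fi
    if [ "$size" -gt "$max" ]; then size=$max; fi
    echo "$size"
}

default_cache $((1024 * MB))        # 1 GB of RAM
default_cache $((64 * 1024 * MB))   # 64 GB of RAM reaches the 512 MB ceiling
```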

Normally this cache is allocated at startup; if you specify alloc=demand, or the initial allocation failed, then the cache is dynamically grown as required up to the specified size.

The optional hash argument specifies the size of the buffer cache hash list. You can specify any of the suffixes described above. The default is 25% of the number of entries in the cache.

Note:
  • The default cache size is excessive for devb-ram. You'll probably want to reduce it to the minimum:
    devb-ram blk cache=512k &
    
  • By default in QNX Neutrino 6.5 and later, io-blk.so allocates the filesystem buffer cache (blk cache=) on affected ARM platforms from a global memory region (SHMCTL_ANON | SHMCTL_GLOBAL) to avoid the per-process 32 MB limitation. To override this and make the allocation from the normal devb-* process heap, specify blk memory=sysram.
contig=yes|no
(QNX Neutrino 7.0 or later) Allocate contiguous physical memory for the block cache (the majority of io-blk.so's memory). This speeds up I/O by eliminating virtual-to-physical translation later. The default is yes.
delwri=delay1[:delay2[:postpone]]
Specify the delay time for write-behinds to the media. A dirty disk block may remain in the cache without being physically written to the disk, to improve performance. The default is up to 3 seconds (delay1) for fixed media, and 1 second (delay2) for removable media. For more information, see “Controlling writing operations,” below.

The postpone argument specifies the number of seconds to keep a dirty disk block in memory if it's being continuously modified, before physically writing it to the disk. This applies only to fixed media; for removable media, writes aren't delayed beyond the delay2 period. By default, postpone is the same as delay1.

devdir=path
The directory in which io-blk.so presents the physical devices as block-special files. The default is /dev.
devno=type
Controls how major device numbers are requested; type is one of:
  • name — use the name of the device (e.g., hd, cd).
  • class — use the CAM class of the device (e.g., direct, readonly).
  • common — use a single class for all block devices.

The default is name.

directio=no|rd|wr|rw
(QNX Neutrino 7.0 or later) Specify the rules for performing the DCMD_FSYS_DIRECT_IO devctl() command, disallowing it completely, or allowing it for read-only, write-only, or read-write operations. The default is rw.
enumpart=order
Set the order for enumerating disk partitions; one of the following:
  • forward — enumerate slots 1 through 4, followed by any extended partitions.
  • reverse — enumerate slots 4 through 1, followed by any extended partitions.
  • windows — for MBR-partitioned disks, enumerate the active partition, followed by any extended partitions, and then non-booting primary partitions.

The default is forward.

exclusive
Require/obtain exclusive access of the mount device. This means that when a filesystem is mounted on a partition, nothing else is allowed to open that raw partition until the filesystem is unmounted.
fdinfo=mode
Specify how to store open file names for the iofdinfo() query. The options for mode are:
  • ncache — try to reconstruct the file name from the contents of the directory name cache. Don't rely on this option to supply the names of all open files (a file's name is supplied only if all components of its pathname are in the name cache).
  • always — store the name used in each open() call to ensure that this name is always available.
  • never — never supply the name of an open file.

The default is always.

fixed-priority=priority
(QNX Neutrino 7.0 or later) Force all of io-blk.so's threads that handle I/O to run at the given priority. The default is for them to inherit priorities.
fse-device=name
(QNX Neutrino 6.6 or later) Set the filesystem event manager name. The default is defined by FSE_DEFAULT_MANAGER_NAME in <sys/fs_events.h> and is currently /dev/fsevents.
fse-period=msecs
(QNX Neutrino 6.6 or later) Set the maximum period of time to delay before sending events to the event manager. The default is defined by FSE_DEFAULT_PERIOD_MS in <sys/fs_events.h> and is currently 250 ms.
fse-size=size
(QNX Neutrino 6.6 or later) Set the size of the event buffer, in kilobytes. The default is defined by FSE_DEFAULT_BUFFER_SIZE in <sys/fs_events.h> and is currently 50 KB.
map=size[:hash]
Set the number of entries in a cache used to map translations from logical blocks to physical ones. If this option isn't specified, the size is based on the value of the vnode option.

The hash argument specifies the size of the associated hash list; the default is 1/6 of the number of entries in the map.

maxcio=write_num:read_num
(QNX Neutrino 6.6 or later) Set the maximum allowed I/O between vnode locks. You can use this option to break up I/O requests to improve the latency of transactions during long I/O operations. The default is 2m:2m.
If you don't want to break up I/O requests, set write_num and/or read_num to 0 (zero), as appropriate.
maxio=num
Limit the size of I/O requests that io-blk.so composes and hands over to the cam-disk.so library. This option is expressed in terms of disk sectors; the default is 128 disk sectors, or 64 KB. The minimum setting is 32 disk sectors (16 KB). The maximum setting depends on the CPU:
  • For 32-bit CPUs, the maximum setting is 2040 sectors.
  • For 64-bit CPUs, it's 1352 sectors.

You should specify a power of two for num (e.g., 256, 512, or 1024 sectors, for 128 KB, 256 KB, or 512 KB respectively).

Note: This option simply caps the size of I/O requests that io-blk.so composes; it doesn't actually force io-blk.so to build large I/O requests.
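The sector arithmetic can be checked with a short sketch. It assumes 512-byte sectors (as implied by "128 disk sectors, or 64 KB"), and it doesn't check the CPU-dependent maximum; maxio_kb and maxio_ok are hypothetical helper names.

```shell
# Convert a maxio value in sectors to kilobytes (512-byte sectors assumed).
maxio_kb() {                # $1 = maxio value in sectors
    echo $(( $1 * 512 / 1024 ))
}

# Succeed if $1 meets the documented minimum of 32 sectors and follows the
# power-of-two recommendation.
maxio_ok() {
    n=$1
    [ "$n" -ge 32 ] || return 1
    while [ $((n % 2)) -eq 0 ]; do n=$((n / 2)); done
    [ "$n" -eq 1 ]
}

for s in 128 256 512 1024; do
    echo "maxio=$s -> $(maxio_kb "$s") KB"
done
```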
memory=type1[:type2[:type3[:type4]]]
Specify the typed memory pool or pools to use. For example, memory=sysram&below4G:sysram says to try sysram with the below4G modifier, and if no such region exists, then try plain sysram. (The same option works on systems with more or less than 4 GB of RAM.)
Note:
  • It's up to the startup to set up typed memory. Use pidin syspage=asinfo to see the list.
  • Generally you don't need to specify the memory option, in which case io-blk.so uses the normal mmap() pool; but on a system with more than 4 GB of RAM, it's mandatory.
  • You might have to quote this option, in order to prevent the shell from interpreting special characters such as an ampersand (&).

For more information about typed memory, see Typed memory in the “Interprocess Communication (IPC)” chapter of the System Architecture guide.
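The quoting issue mentioned in the note can be seen with a tiny experiment that runs in any POSIX shell. Here count_args is a hypothetical stand-in for the driver; it just reports how many arguments it received.

```shell
# Stand-in for a devb-* driver: report how many arguments arrive.
count_args() { echo "$#"; }

# Quoted: the whole option, ampersand included, stays in one argument,
# so this prints 2 (blk, plus the memory option).
count_args blk 'memory=sysram&below4G:sysram'

# Unquoted, the shell would treat "&" as a control operator: it would run
# everything before it in the background, and then try to run
# "below4G:sysram" as a separate command.
```

On a target you'd apply the same quoting on the driver's command line, for example devb-eide blk 'memory=sysram&below4G:sysram'.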

mfu=segmentation
Specify the MFU:MRU segmentation (typically as a percentage, but it can be a size). You can specify any of the suffixes described above. The default is a 50:50 split.

The first time a sector is accessed, it goes into the MRU (Most Recently Used) region; if it's accessed again, it goes into the MFU (Most Frequently Used). The oldest cache blocks are removed from either the MRU or MFU region, so as to preserve this ratio.

naming=scheme
Set the device/partition naming scheme. The default is 0#0. For more information, see “Naming schemes,” below.
ncache=size[:hash]
Specify a name cache of size entries. Using more name cache entries speeds up path/file lookups at the expense of memory. Setting the size to 0 disables name caching. If this option isn't specified, the size is determined from the vnode option.

The hash argument specifies the size of the associated hash list; the default is 1/6 of the number of entries in the name cache.

pregrow-fill=zero|none
(QNX Neutrino 6.6 or later) Specify whether or not the DCMD_FSYS_PREGROW_FILE devctl() command (see the Devctl and Ioctl Commands reference) zeroes the content when growing files. Zeroing the content is the default but increases the time for pregrowing files; not zeroing is very fast but insecure, as it allows access to the old content of the disk blocks.
priority=prio
Set the priority of periodic filesystem callouts. The default is 21.
ra=min[:max]
Set the minimum and maximum sizes of the read-ahead buffers. You can specify any of the suffixes described above. The default minimum is the system page size; the default maximum is 64 times the system page size.
ramdisk=size[:sector[:paddr]]
Create an internal ramdisk device (/dev/ramX) of the specified size, with the specified sector size. The size and sector variables can use the suffixes described above. The sector size must be a power of 2 in the range from 512 through 4096 bytes; the default is 512 bytes.

If you specify the physical address (paddr), the contiguous physical memory starting at that location is mapped in and is unaltered for use. The physical address should be aligned to the system page size. You can use this argument to “reload” a RAM filesystem.

Note: If you don't specify paddr, the initial contents of this memory device are unspecified, so you must format it before using it as a filesystem (for example, with mkqnx6fs for a Power-Safe filesystem).
rapolicy=traditional|aggressive
Choose the read-ahead policy. Read ahead means the speculative prefetching of the contents of files when the disk cache has good reasons to believe that the user will request these contents soon. Fetching the file contents in advance allows the subsequent read() to find the data already resident in the disk cache, so this system call won't need to generate an I/O request to bring this data in from disk and then wait for the disk I/O to complete.
  • traditional — this policy aims to strike a balance between the expected benefits of read-ahead and its potential costs (reading blocks that the client won't request later on). It's a good choice for memory-constrained systems, or systems where files are accessed in a predominantly random manner.
  • aggressive — this policy assumes that a file will be accessed in large chunks in a sequential manner, and that read() calls will come in a rapid sequence. It's therefore beneficial to load as many of the file's subsequent blocks into page cache as possible in anticipation that the client will request them very soon. This is a speculative bet; if the client doesn't behave as expected, there's a higher volume of disk I/O, and space is wasted in the page cache. This policy is a good choice for multimedia systems with a sizable disk cache.

The default is traditional.

rmvpoll=period
The polling period for removable media. The default is 0 seconds.
rmvto=delay
Specify a removable media timeout (default: 2 seconds). After the specified period of inactivity, a disk access prompts validation of the media with the driver; if the driver reports that the media has been changed, all data blocks and cached information for that device are discarded and relearned.

This option can take a value of none, which disables removable media relearning. This isn't very useful for real removable devices (e.g., CDs), but if your device is on-board SD, or USB that isn't removable but the driver is advertising it as such, you can disable the verification overheads.

spacenotifydelay=msec
Specify the delay in milliseconds before sending a free-space notification. When the amount of free space on a drive changes, a filesystem event (fse) notification is sent from io-blk.so. In case the free space is fluctuating rapidly, this option makes io-blk.so delay before delivering the event. The value must be in the range from 100 ms through 10000 ms. The default is 1000 ms.
stack=size
(QNX Neutrino 7.0 or later) Set the stack size for all threads. The size must be in the range from 20 KB through 100 KB; the default is 20k.
thread=max[:low[:high]]
Set the thread pool parameters (maximum, low water, and high water). The default is 12:2:5.
verbose[=level]
Be verbose. The output is sent to the system logger, slogger2.

The optional level argument is a series of alphabetic characters that indicates the categories of event to log:

  • b — Bad blocks.
  • c — Configuration.
  • d — Direct I/O.
  • e — Log errors that occur while handling an application request (i.e., errors during read(), write(), or devctl()).
  • f — fsys module (fs-*).
  • h — Logs any "housekeeping" tasks, including background TRIM operations.
  • i — Input.
  • o — Output.
  • r — Removable.
  • t — Trace events.
  • v — Virtual filesystem (VFS).
  • x — eXtra verbose logging. All errors, including benign ones (e.g., ENOENT on an open() call), are logged.

An option of blk verbose means all the categories, blk verbose=io means input plus output, blk verbose=!r means everything except removable, and so on. The default is none.

vnode=size[:max]
Specify the number of vnode entries (filesystem-independent inodes). The default is 1024 entries. Up to size vnodes may be active. Vnodes remain in this cache when the corresponding file is closed, making subsequent opens faster.

The max argument allows a momentarily large number of files to be open at the same time; the cache tries to stay at size entries but grows if needed up to max entries before giving an error of ENFILE. The default value of max is 3 times size.
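As a quick illustration of how the ceiling follows from size when you don't give max, here's a one-line sketch; vnode_max is a hypothetical helper name.

```shell
# Effective ceiling for vnode=size[:max]; when max is omitted it's 3 * size.
vnode_max() {               # $1 = size, $2 = optional max
    echo "${2:-$(( $1 * 3 ))}"
}

vnode_max 1024              # default size: prints 3072
vnode_max 1024 2000         # explicit max wins: prints 2000
```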

Filesystem options

You can apply the following options globally (in the blk section) or to a specific filesystem:

after
Mount the filesystem so that it's resolved after any other filesystems mounted at the same pathname (in other words, it's placed behind any existing mount). When you access a file, the system looks on this filesystem last, and only if the file wasn't found on any other filesystems.
before
Mount the filesystem so that it's resolved before any other filesystems mounted at the same pathname (in other words, it's placed in front of any existing mount). When you access a file, the system looks on this filesystem first.
commit=level
Set the committing level of the filesystem, which controls how dirty system/user blocks are written to disk. The level is one of none, low, medium (the default), and high. If it's none, all writes are time-delayed (as specified by the delwri option); at high, all writes are performed synchronously. For more information, see “Controlling writing operations,” below.
error=action
Set the action to perform when a fs-* filesystem module detects an internal error. The action is one of:
  • ebadfsys — simply return EBADFSYS to the client.
  • mountro — return EBADFSYS to the client and remount the affected filesystem as read-only.

The default is ebadfsys.

marking=mode
Set the filesystem-dirty marking behavior. The mode must be none or mount (the default). If marking is on, the filesystem is marked as being dirty when it's mounted, and it's marked as being clean when it's unmounted. The method of marking depends on the filesystem.
mntgid=numeric_gid
(QNX Neutrino 6.6 or later) Specify the group ID to use for the mountpoint.
mntperms=octal_permissions
(QNX Neutrino 6.6 or later) Specify the permissions to use for the mountpoint. For more information about octal permission values, see the entry for chmod.

If you specify this as a global blk option, it causes the device special files (/dev/hd0, /dev/hd0tXXX, etc.) to have the specified file access permissions. It also causes the filesystem mountpoint to have the same permissions. If you use it in a specific filesystem section, then only mounts of that filesystem type are affected.

mntuid=numeric_uid
(QNX Neutrino 6.6 or later) Specify the user ID to use for the mountpoint.
Alternatively, you can specify the mnt* options via the -o option to mount. If you do, the options given to the specific mount override those provided on the driver command line, whether they were global or filesystem type-specific. For example:
# devb-eide cam blk cache=10m,mntperms=0741,mntuid=32,mntgid=709
# ls -l /dev/hd0*
total 13784609
brwxr----x 1 32 709 5, 0 Jan 18 02:44 hd0
brwxr----x 1 32 709 1, 9 Jan 18 02:44 hd0t177

# mount -t qnx6 -omntperms=0555,mntuid=600,mntgid=1034 /dev/hd0t177 /fs/qnx6
# ls -ld /fs/qnx6
dr-xr-xr-x 6 600 1034 8192 Jan 18 02:41 /fs/qnx6
In this case, the permissions mask of 0555, user ID of 600, and group ID of 1034 are applied to the /fs/qnx6 mountpoint only. Other mountpoints created by devb-eide keep the permissions mask of 0741, user ID of 32, and group ID of 709 that were given on the driver command line.

Because the raw block and partition devices don't have a type name for command-line use, and because they get automatically mounted (with the default auto=partition), the only way to set the mnt* options on these devices is to specify them globally.

[no]atime
Update/don't update the file's directory entry if the only change is the access time. The noatime option isn't strict POSIX 1003.2 behavior, but it's faster.
[no]creat
Allow/don't allow files to be created on this filesystem.
[no]exec
Allow/don't allow file execution from this filesystem.
[no]lock
Lock/don't lock removable media. If locked, the medium is treated as fixed.
[no]rmv
Don't/do allow invalid mounts on removable media (re-insert).
[no]suid
Ignore/don't ignore the set-user ID bit on files in this filesystem.
ro
Mount all drives/filesystems as read-only.
rw
Mount all drives/filesystems as read-write (if the physical media permit). This is the default.

For more information about the before and after options, see Pathname Management in the “Process Manager” chapter of the System Architecture guide.

Description:

The io-blk.so library provides block I/O support, as used by the devb-* drivers, and loads filesystem drivers (fs-*) as necessary.

The default values of the map and ncache options are based on the value of the vnode option. This arrangement lets you configure a system by specifying the cache size and the number of files, and letting the library set the other options.

Controlling writing operations

There are various types of writing operations:

Synchronous (SYNC)
Start immediately and wait for completion.
Asynchronous (ASYNC)
Start immediately but don't wait for completion.
Delayed (DELWRI)
Don't start until after a timeout period and then perform as asynchronous. The blk delwri= option controls the timeout for the delayed format; if you set this option to 0, a delayed writing operation is the same as asynchronous.
As required
Write only if you have to.

The types of data include:

User
What you read() and write().
Metadata
Things associated with stat(), such as times and IDs.
Filesystem
Things such as bitmaps, extents, etc.

If a file has no links, the “as required” form of write operation is used, never going to disk unless the buffer or cache is needed (since the file has no links, the data isn't expected to be accessible after a power failure). If you open a file with O_SYNC, the synchronous format is always used.

Otherwise, the blk commit level controls the type of write to use for each level of data:

commit=   Filesystem data   Metadata   User data
none      DELWRI            DELWRI     DELWRI
low       ASYNC             DELWRI     DELWRI
medium    SYNC              DELWRI     DELWRI
high      SYNC              SYNC       SYNC
CAUTION:
If you specify commit=none, you lose all write ordering (both for single multiblock updates and multiple-user operations). Hence, your chances of a useful recovery following a power failure are poor. We recommend that you use this option only if you have an uninterruptible power supply (UPS).

Calling close() might force a metadata update, but does nothing to the user data. Calling fsync() always forces out any delayed-write blocks for the file, and so is useful only when commit isn't high.
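The commit table above can be transcribed into a small lookup, which may be handy when reasoning about a configuration. This is only a sketch: write_type and the fs, meta, and user spellings are hypothetical, and the special cases for unlinked files and O_SYNC aren't modeled.

```shell
# Print the write type used for a commit level and a class of data.
write_type() {              # $1 = none|low|medium|high, $2 = fs|meta|user
    case "$1:$2" in
        none:fs|none:meta|none:user)   echo DELWRI ;;
        low:fs)                        echo ASYNC ;;
        low:meta|low:user)             echo DELWRI ;;
        medium:fs)                     echo SYNC ;;
        medium:meta|medium:user)       echo DELWRI ;;
        high:fs|high:meta|high:user)   echo SYNC ;;
        *) echo "unknown level or data class" >&2; return 1 ;;
    esac
}

write_type medium fs        # prints SYNC
write_type low user         # prints DELWRI
```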

Naming schemes

You can use the naming=scheme option to specify how to name devices and Master Boot Record (MBR) and Globally Unique Identifier (GUID) partitions. The scheme must be a string of at least two characters; the default is 0#0.

CAUTION:
Change to a different naming scheme at your own risk:
  • Some system components could have hard-coded assumptions about disk names.
  • Don't use a different scheme unless you're in control of the entire system.
  • If you use a different scheme, you'll need some external knowledge about what filesystem to mount on a partition, because you won't have the tXXX naming hint.

The first two characters of the naming=scheme option govern the naming of the raw disk and MBR partitions, as well as the maximum number of MBR partitions that you can have:

Character 1: a digit; character 2: #; maximum partitions: 100
The raw devices are numbered from the given digit, and MBR partitions are named from the device with a t followed by the OS type of the partition (see Partitions in the “Filesystems” chapter of the System Architecture guide). For example, a Power-Safe partition could be named hd0t177.

For duplicate partitions, a period (.) and sequence number are appended (e.g., hd0t12, hd0t12.1, and hd0t12.2 for logical/extended DOS partitions). This is the QNX Neutrino naming scheme.

Character 1: a digit; character 2: a lowercase letter; maximum partitions: 26
The raw devices are numbered from the given digit, and MBR partitions are named starting with the given letter (e.g., /dev/hd0, /dev/hd0a, /dev/hd0b, /dev/hd0c, and so on). The name doesn't indicate the OS type of the partitions, just the order in which they were found.

Character 1: a lowercase letter; character 2: a digit; maximum partitions: 100
The raw devices are named starting with the given letter. Primary MBR partitions are named 1, 2, 3, and 4; if you don't have four of them, the unused numbers are skipped. Any extended partitions are numbered without gaps from 5 (e.g., /dev/hda, /dev/hda1, /dev/hda2, /dev/hda5, and so on).

The name doesn't indicate the OS type of the partition, just its location. This is the Linux naming scheme.

The limits on the number of MBR partitions include primary and logical partitions. If you try to use a disk with more than the maximum number of partitions, only those partitions below the limit are enumerated. All others are ignored, and io-blk.so displays an error message.

The third character of the naming scheme governs the names of GUID Partition Table (GPT) partitions. The io-blk.so library uses the following elements of the GPT when it creates /dev entries for GPT partitions:

Slot
The slot number in the GPT where this partition is defined. Slot numbers start at zero.
Partition type GUID
A 16-byte GUID that identifies the type of the partition. For example, Power-Safe partitions have a partition type GUID of cef5a9ad-73bc-4601-89f3-cdeeeee321a1.
Unique partition GUID
A 16-byte GUID that's unique to the partition. This is expected to be unique for every partition on every disk anywhere in the world (but it isn't enforced in any way).
User-defined partition name
A descriptive string. Partition names are stored in a GPT in UTF-16, and io-blk.so converts them to UTF-8. Names are limited to 36 UTF-16 characters. It's valid (and common) for GPT partitions not to have a name.

By default, io-blk.so creates /dev entries for GPT partitions in the form /dev/hd0.str1.identifier, where identifier is the number of the slot in the GPT where this partition is defined. The str1 is the partition type GUID; io-blk.so recognizes a few partition type GUIDs and replaces them with short, easily readable aliases:

Partition type GUID                     Alias   Description
ebd0a0a2-b9e5-4433-87c0-68b6b72699c7    ms      Microsoft Basic Data Partition (BDP), for any kind of FAT, exFAT, or NTFS filesystem
48465300-0000-11aa-aa11-00306543ecac    hfs     Apple hierarchical filesystem
cef5a9ad-73bc-4601-89f3-cdeeeee321a1    qnx6    Power-Safe filesystem

There's no limit on the number of GPT partitions that you can have.

Here are some examples of names of GPT partitions on disk /dev/hd1:

Partition name                                      Description
/dev/hd1.ms.0                                       Microsoft filesystem defined in slot 0
/dev/hd1.hfs.1                                      Apple HFS+ filesystem defined in slot 1
/dev/hd1.qnx6.5                                     Power-Safe filesystem defined in slot 5
/dev/hd1.qnx6.9                                     Power-Safe filesystem defined in slot 9
/dev/hd1.8da63339-0007-60c0-c436-083ac8230908.12    This partition is defined in slot 12 and doesn't have an alias, so the full partition type GUID is used.
/dev/hd1.8da63339-0007-60c0-c436-083ac8230908.13    Another partition with the same partition type GUID defined in slot 13

The third character of the naming=scheme option governs the identifier part of the names for GPT partitions as follows:

Character 3           identifier
A digit               The sum of the given digit and the number of the slot in the GPT that defines the partition
@                     The user-defined partition name. If a partition doesn't have a user-defined name, its slot number in the GPT is used.
Any other character   The partition unique GUID
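For the case where the third character is a digit, the resulting names can be sketched as follows. This is an illustration only: gpt_alias and gpt_devname are hypothetical helpers, /dev is assumed for devdir, and GUIDs are assumed to be written in lowercase.

```shell
# Map a partition type GUID to one of the short aliases listed earlier;
# any other GUID is used verbatim.
gpt_alias() {
    case $1 in
        ebd0a0a2-b9e5-4433-87c0-68b6b72699c7) echo ms ;;
        48465300-0000-11aa-aa11-00306543ecac) echo hfs ;;
        cef5a9ad-73bc-4601-89f3-cdeeeee321a1) echo qnx6 ;;
        *) echo "$1" ;;
    esac
}

# Build the name when character 3 of naming= is a digit:
# identifier = digit + slot number.
gpt_devname() {   # $1 = raw device, $2 = type GUID, $3 = slot, $4 = digit (default 0)
    echo "/dev/$1.$(gpt_alias "$2").$(( ${4:-0} + $3 ))"
}

gpt_devname hd1 cef5a9ad-73bc-4601-89f3-cdeeeee321a1 5    # /dev/hd1.qnx6.5
gpt_devname hd1 8da63339-0007-60c0-c436-083ac8230908 12
```

With the default scheme of 0#0 the digit is 0, so the identifier is simply the slot number.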

If you specify blk naming=0##, then the entries for the GPT partitions might look like this:

If you specify blk naming=0#@, then the entries for the GPT partitions might look like this:

Note the trailing 13 in the last entry. This GPT partition doesn't have a user-defined name, so the number of the slot in the GPT where this partition is defined was used instead.

Files:

charset.so
This library includes the character sets for Japanese, standard Chinese, and traditional Chinese (code pages 932, 936, and 950, respectively); io-blk.so loads this library if it needs to mount a FAT volume, CD, or DVD that's formatted in one of these locales.