Block I/O support
Syntax:
driver [blk option[,option…]] [fstype [options]]
Options:
The driver is one of the devb-* drivers,
such as
devb-eide,
and option is one of the options described below.
The optional fstype argument is one of the filesystem drivers
(fs-*);
you can follow it with options specific to the filesystem.
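As a rough illustration of the syntax above, the following sketch assembles a command line from its parts. The helper name build_command is hypothetical, the option values are arbitrary examples, and joining the filesystem options with commas is an assumption (the syntax line specifies the separator only for the blk options).

```python
def build_command(driver, blk_options=None, fstype=None, fs_options=None):
    """Assemble: driver [blk option[,option...]] [fstype [options]]."""
    parts = [driver]
    if blk_options:
        # blk options are comma-separated, as in the syntax line
        parts += ["blk", ",".join(blk_options)]
    if fstype:
        parts.append(fstype)
        if fs_options:
            # assumption: filesystem options are comma-separated too
            parts.append(",".join(fs_options))
    return " ".join(parts)


command = build_command("devb-eide",
                        blk_options=["cache=64m", "vnode=4000"],
                        fstype="qnx4",
                        fs_options=["noatime"])
print(command)
```

Every option used here (cache, vnode, noatime) is described later in this entry; only the values are made up.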
Suffixes for size, memory, and time arguments
You can use suffixes on the arguments to some options to specify the units.
These suffixes aren't case-sensitive.
For size arguments, the suffixes are:
- b — bytes
- k — kilobytes
- m — megabytes
- % — percent of the total amount of cache, etc.,
depending on the option
For memory arguments, the suffixes also include:
For time arguments, the suffixes are:
- h — hours
- m — minutes
- s — seconds
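The suffix rules listed above can be modeled with a small sketch. This is not the library's parser: the function names are hypothetical, kilobytes and megabytes are assumed to be binary multiples, and a bare number is assumed to pass through unchanged.

```python
SIZE_UNITS = {"b": 1, "k": 1024, "m": 1024 * 1024}  # assumption: binary multiples
TIME_UNITS = {"h": 3600, "m": 60, "s": 1}           # results are in seconds


def parse_size(text, total=None):
    """Return a size in bytes; a '%' value is taken relative to `total`."""
    text = text.strip().lower()  # suffixes aren't case-sensitive
    if text.endswith("%"):
        if total is None:
            raise ValueError("a percentage needs a total to refer to")
        return total * int(text[:-1]) // 100
    if text and text[-1] in SIZE_UNITS:
        return int(text[:-1]) * SIZE_UNITS[text[-1]]
    return int(text)  # assumption: no suffix means the number is used as-is


def parse_time(text):
    """Return a time in seconds."""
    text = text.strip().lower()
    if text and text[-1] in TIME_UNITS:
        return int(text[:-1]) * TIME_UNITS[text[-1]]
    return int(text)  # assumption: no suffix means seconds
```

Note that m means megabytes in a size argument but minutes in a time argument, so the meaning depends on the option being parsed.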
blk options
You can specify the following options only in the blk section:
- alloc=mode
- Set the cache/memory allocation policy to one of the following:
- cache — allocate all of the buffer
cache (the cache=size) at startup, but allocate all
other caches (e.g. names) on demand and let them grow to their specified
limit, and then start removing the Least Recently Used (LRU) items.
- demand — allocate the buffer cache on demand too, like the other
caches (it grows from 0 to the size specified by the
cache option as you access the disk).
- upfront — pregrow all caches to their full size.
This option can be useful in RAM-tuning a system, to see how much
memory the filesystem will eventually consume (things such as the name
and vnode caches tend to grow over time).
The default is cache.
- auto=amount
- Set the amount of automounting to be performed; amount is one
of:
- none — only raw block devices appear.
- partition — enumerate any partition tables.
The default is partition.
- automount=dev[:mountpoint[:fstype[:options]]] or
automount=@filename
- Create a mountpoint for dev at mountpoint.
If you don't specify a full path for the device, io-blk.so
uses the value of its devdir option as a prefix.
For example, if devdir is /dev (the
default), an option of automount=hd0t77:/disk mounts
/dev/hd0t77 at /disk.
The optional fstype specifies the filesystem type, after
which you can set options.
The choices of filesystem and the associated shared objects are:
- cd
- fs-cd.so
- dos
- fs-dos.so
- ext2
- fs-ext2.so
- mac
- fs-mac.so
- nt
- fs-nt.so
- qnx4
- fs-qnx4.so
- qnx6
- fs-qnx6.so
(Power-Safe filesystem)
- udf
- fs-udf.so
If you don't specify the type of filesystem, the library tries to determine it
automatically.
If the @filename version of this option
is used, the automounts are as specified in the given file.
The file is a list of mounts (using the same syntax as above),
separated by newline characters or commas.
Note:
You can't locate the filename file in the filesystem to be automounted:
it has to be available in an existing filesystem such as the image filesystem.
Optionally, you could locate it in any devb filesystem that's
already running.
To mount multiple filesystems on a (removable) device, specify that
the device is shared with a + prefix. For example,
automount=+fd0:/dos/a:dos,automount=+fd0:/fd:qnx4
For a list of common partition types, see the
Filesystems chapter
of the System Architecture guide.
- cache=total[:hash]
- Specify the total size of the disk buffer cache.
The buffer cache serves
as intermediate storage for all disk I/O, and provides
LRU caching for dirty delayed-write blocks and recently-read blocks.
By default, 15% of system RAM is assigned, with a minimum of 512 KB and
a maximum of 512 MB.
If you specify an explicit size, bounds of 512 KB and 3 GB are applied.
Normally this cache is allocated at startup;
if you specify alloc=demand, or if the initial allocation fails,
then the cache grows dynamically as required, up to the specified size.
The optional hash argument specifies the size of the buffer
cache hash list.
You can specify any of the
suffixes
described above.
The default is 25% of the number of entries in the cache.
- delwri=delay1[:delay2[:postpone]]
- Specify the delay time for write-behinds to the media.
A dirty disk block may remain in the cache without being physically
written to the disk, to improve performance.
The default is up to 3 seconds (delay1) for fixed media,
and 1 second (delay2) for removable media.
For more information, see
"Controlling writing operations,"
below.
The postpone argument specifies the number of seconds for
which to keep a dirty disk block in memory if it's
being continuously modified, before physically writing it to the disk.
This applies only to fixed media; for removable media, writes aren't
delayed beyond the delay2 period.
By default, postpone is the same as delay1.
- devdir=path
- The directory in which io-blk presents the physical devices
as block-special files. The default is /dev.
- devno=type
- Controls how major device numbers are requested; type
is one of:
- name — use the name of the device (e.g. hd,
cd).
- class — use the CAM class of the device
(e.g. direct, readonly).
- common — use a single class for all block devices.
The default is name.
- enumpart=order
- Set the order for enumerating disk partitions; one of the following:
- forward — enumerate slots 1 through 4, followed by
any extended partitions.
- reverse — enumerate slots 4 through 1, followed by
any extended partitions.
- windows — enumerate the active partition, followed
by any extended partitions, and then non-booting primary partitions.
For more information about this order, see
http://support.microsoft.com/kb/q51978/.
The default is forward.
- exclusive
- Require/obtain exclusive access to the mount device.
This means that when a filesystem is mounted on a partition, nothing
else is allowed to open that raw partition until the filesystem is unmounted.
- fdinfo=mode
- Specify how the names of open files are stored for the iofdinfo()
query. The options for mode are:
- ncache — try to reconstruct the file name from
the contents of the
directory name cache. Don't rely on this option to supply the names of all
open files (a file's name is supplied only if all components of its pathname
are in the name cache).
- always — store the name used in each
open() call to ensure that this name is always available.
- never — never supply the name of an open file.
The default is always.
- map=size[:hash]
- Set the number of entries in a cache used to map translations from
logical blocks to physical ones.
If this option isn't specified, the size is
based on the value of the vnode option.
The hash argument specifies the size of the associated hash
list; the default is 1/6 of the number of entries in the map.
- memory=type1[:type2[:type3[:type4]]]
- Specify the typed memory pool or pools to use.
For example, memory=sysram&below4G:sysram
says to try sysram with the below4G modifier,
and if no such region exists, then try plain sysram.
(The same option works on systems with more or less than 4 GB of RAM.)
Note:
- It's up to the startup to set up typed memory.
Use pidin syspage=asinfo to see the list.
- Generally you don't need to specify the memory option, in
which case io-blk.so uses the normal mmap() pool;
but on a system with more than 4 GB of RAM, it's mandatory.
- You might have to quote this option, in order to prevent the shell from
interpreting special characters such as an ampersand (&).
For more information about typed memory, see
"Typed memory"
in the Interprocess Communication (IPC) chapter of the
System Architecture guide.
- mfu=segmentation
- Specify the MFU:MRU segmentation (typically as a percentage, but it
can be a size).
You can specify any of the
suffixes
described above.
The default is a 50:50 split.
The first time a sector is accessed, it goes into the MRU
(Most Recently Used) region;
if it's accessed again, it goes into the MFU (Most Frequently Used).
The oldest cache blocks are removed from either the MRU or MFU region, so
as to preserve this ratio.
- naming=scheme
- Set the device/partition naming scheme. The default is
0#.
For more information, see
"Naming schemes,"
below.
- ncache=size[:hash]
- Specify a name cache of size entries.
Using more name cache entries speeds
up path/file lookups at the expense of memory. Setting the
size to 0 disables name caching.
If this option isn't specified, the size is
determined from the vnode option.
The hash argument specifies the size of the associated hash
list; the default is 1/6 of the number of entries in the
name cache.
- priority=prio
- Set the priority of periodic filesystem callouts.
The default is 21.
- ra=min[:max]
- Set the minimum and maximum sizes of the read-ahead buffers.
You can specify any of the
suffixes
described above.
The default minimum is the system page size; the default maximum is
64 times the system page size.
- ramdisk=size[:sector]
- Create an internal ramdisk device (/dev/ramX)
of the specified size, with the specified sector size.
The size and sector variables can use the
suffixes described above.
The sector size must be a power of 2 in the range from 512 through 4096 bytes;
the default is 512 bytes.
Note:
The initial contents of this memory device are unspecified, so you must
format it before using it as a filesystem (for example, with
dinit
for a QNX 4 filesystem, or
mkqnx6fs
for a Power-Safe filesystem).
- rmvpoll=period
- The polling period for removable media.
The default is 0 seconds.
- rmvto=delay
- Specify a removable media timeout (default: 2 seconds).
After the specified period of inactivity,
a disk access prompts validation of the media with the driver;
if the driver reports that the media has been changed, all data blocks and
cached information for that device are discarded and relearned.
This option can take a value of none, which disables removable
media relearning.
Disabling relearning isn't very useful for truly removable devices (e.g. CDs), but if
your device is an on-board SD card or a USB device that isn't
removable, yet the driver advertises it as such, you can use none to avoid
the verification overhead.
- thread=max[:low[:high]]
- Set the thread pool parameters (maximum, low water, and high water).
The default is 12:2:5.
- verbose[=level]
- Be verbose.
The output is sent to the system logger,
slogger.
The optional level argument is a series of alphabetic
characters that indicates the categories of event to log:
- b — bad blocks
- c — configuration
- d — direct I/O
- f — fsys module (fs-*)
- i — input
- o — output
- r — removable
- v — virtual filesystem (VFS)
An option of blk verbose means all (bcdfiorv),
blk verbose=io
means input plus output, blk verbose=!r means everything except
removable, and so on.
The default is none.
- vnode=size[:max]
- Specify the number of vnode entries (filesystem-independent inodes).
The default is 1024 entries. Up to
size vnodes may be active. Vnodes remain in this cache
when the corresponding file is closed, making subsequent opens
faster.
The max argument allows a momentarily large number of files
to be open at the same time;
the cache tries to stay at size entries, but grows if
needed up to max entries before giving an error of
ENFILE.
The default value of max is 3 times size.
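The default-sizing rules given above for the cache and vnode options can be worked through with a short sketch. It models only the stated arithmetic, not the library's code; the function names are hypothetical, and binary units with integer rounding are assumptions.

```python
KB, MB, GB = 1024, 1024 ** 2, 1024 ** 3  # assumption: binary units


def clamp(value, low, high):
    """Limit value to the range low..high."""
    return max(low, min(value, high))


def buffer_cache_bytes(system_ram, explicit=None):
    """cache=total: 15% of RAM within 512 KB..512 MB by default,
    or an explicit size within 512 KB..3 GB."""
    if explicit is None:
        return clamp(system_ram * 15 // 100, 512 * KB, 512 * MB)
    return clamp(explicit, 512 * KB, 3 * GB)


def vnode_limits(size=1024, maximum=None):
    """vnode=size[:max]: max defaults to 3 times size."""
    return size, (3 * size if maximum is None else maximum)


print(buffer_cache_bytes(1 * GB) // MB, "MB by default on a 1 GB system")
print(vnode_limits())
```

On a system with 8 GB of RAM, 15% would exceed the default ceiling, so the sketch returns 512 MB unless an explicit size is given.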
Filesystem options
You can apply the following options globally (in the blk section)
or to a specific filesystem (for example, in the qnx4 section
for a QNX 4 filesystem):
- after
- Mount the filesystem so that it's resolved after any other filesystems
mounted at the same pathname (in other words, it's placed behind any
existing mount).
When you access a file, the system looks on this filesystem last, and
only if the file wasn't found on any other filesystems.
- before
- Mount the filesystem so that it's resolved before any other filesystems
mounted at the same pathname (in other words, it's placed in front of
any existing mount).
When you access a file, the system looks on this filesystem first.
- commit=level
- Set the committing level of the filesystem, which controls how
dirty system/user blocks are written to disk. The level is
one of none, low, medium (the default),
and high. If it's none, all writes are time-delayed
(as specified by the delwri option); at high, all
writes are performed synchronously.
For more information, see
"Controlling writing operations,"
below.
- error=action
- Set the action to perform when a fs-* filesystem module
detects an internal error. The action is one of:
- ebadfsys — simply return EBADFSYS to
the client.
- mountro — return EBADFSYS to the client
and remount the affected filesystem as read-only.
The default is ebadfsys.
- marking=mode
- Set the filesystem-dirty marking behavior.
The mode must be none or mount (the
default).
If marking is on, the filesystem is marked as being dirty when it's
mounted, and it's marked as being clean when it's unmounted.
The method of marking depends on the filesystem.
- [no]atime
- Update/don't update the file's directory entry if the only change
is the access time. The noatime option isn't strict
POSIX 1003.1 behavior, but it's faster.
- [no]creat
- Allow/don't allow files to be created on this filesystem.
- [no]exec
- Allow/don't allow file execution from this filesystem.
- [no]lock
- Lock/don't lock removable media. If locked, the medium is treated as
fixed.
- [no]rmv
- Don't/do allow invalid mounts on removable media (re-insert).
- [no]suid
- Honor/ignore the set-user ID bit on files in this filesystem.
- ro
- Mount all drives/filesystems as read-only.
- rw
- Mount all drives/filesystems as read-write (if the physical media permit).
This is the default.
For more information about the before and after
options, see
"Ordering mountpoints"
in the Process Manager chapter of the System Architecture guide.
Note:
If you specify a filesystem option (e.g. noatime) on a block
filesystem and then remount the filesystem (mount -u) without
specifying the option again, the earlier setting is lost.
The absence of the flag is interpreted as your asking for access time
updates to be turned on.
There's no way for the code in
io-blk to determine if you
wanted to use the default, and therefore didn't specify anything,
or really did want access time updates to be turned on, and therefore
didn't specify anything.
Similarly, if you mounted the filesystem as read-only and then remount
it, the filesystem returns to its default setting.
To maintain the settings, specify the options again using the -o
option for the mount command.
For example:
mount -u -o noatime ...
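The remount behavior described in this note can be pictured with a toy model. It isn't io-blk code; the names are hypothetical, and the only defaults it assumes are the two stated above (access time updates on, read-write).

```python
# Defaults stated in this section: atime updates are on, mounts are read-write.
DEFAULTS = {"atime": True, "rw": True}


def remount(current, respecified):
    """Return the settings in effect after a remount.

    `current` is deliberately ignored: an option that isn't given again
    can't be told apart from a request for the default.
    """
    settings = dict(DEFAULTS)
    settings.update(respecified)
    return settings


mounted = {"atime": False, "rw": False}    # originally mounted with noatime, ro
print(remount(mounted, {}))                # both settings return to the defaults
print(remount(mounted, {"atime": False}))  # noatime kept because it was given again
```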
Description:
The io-blk.so library provides block I/O support, as used by the
devb-*
drivers, and loads filesystem drivers
(fs-*)
as necessary.
The default values of the map and ncache options are
based on the value of the vnode option.
This arrangement lets you configure a system by specifying the cache size
and the number of files, and letting the library set the other options.
Controlling writing operations
There are various types of writing operations:
- Synchronous (SYNC)
- Start immediately and wait for completion.
- Asynchronous (ASYNC)
- Start immediately but don't wait for completion.
- Delayed (DELWRI)
- Don't start until after a timeout period and then perform as asynchronous.
The blk delwri=
option controls the timeout for the delayed format; if you set this
option to 0, a delayed writing operation is the same as asynchronous.
- As required
- Write only if you have to.
The types of data include:
- User
- What you
read()
and
write().
- Metadata
- Things associated with
stat(),
such as times and IDs.
- Filesystem
- Things such as bitmaps, extents, etc.
If a file has no links, the "as required" form of write operation
is used, never going to disk unless the buffer or cache is needed (since the
file has no links, the data isn't expected to be accessible after a
power failure).
If you open a file with O_SYNC, the synchronous format is
always used.
Otherwise, the blk
commit
level controls the type of write to use
for each level of data:
commit= | Filesystem data | Metadata | User data |
none | DELWRI | DELWRI | DELWRI |
low | ASYNC | DELWRI | DELWRI |
medium | SYNC | DELWRI | DELWRI |
high | SYNC | SYNC | SYNC |
CAUTION:
If you specify
commit=none, you lose
all write ordering (both for single multiblock updates and multiple-user
operations).
Hence, your chances of a useful recovery following a power failure are poor.
We recommend that you use this option only if you have an uninterruptible
power supply (UPS),
or if you don't mind using
dinit
on your filesystem as a recovery tool.
Calling
close()
might force a metadata update, but does nothing to the user data.
Calling
fsync()
always forces out any delayed-write blocks for the file,
and so is useful only when commit isn't high.
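The selection rules in this section can be summarized as a lookup. The sketch below copies the commit table as given above; the function name and the AS_REQUIRED label are hypothetical, and checking O_SYNC before the link count is an assumption, since the text doesn't say which rule wins when both apply.

```python
# Write type per commit level and kind of data, copied from the table above.
COMMIT_TABLE = {
    "none":   {"filesystem": "DELWRI", "metadata": "DELWRI", "user": "DELWRI"},
    "low":    {"filesystem": "ASYNC",  "metadata": "DELWRI", "user": "DELWRI"},
    "medium": {"filesystem": "SYNC",   "metadata": "DELWRI", "user": "DELWRI"},
    "high":   {"filesystem": "SYNC",   "metadata": "SYNC",   "user": "SYNC"},
}


def write_type(data_kind, commit="medium", o_sync=False, has_links=True):
    """Pick the kind of write for one block of data."""
    if o_sync:
        return "SYNC"         # O_SYNC: always synchronous (assumed to take precedence)
    if not has_links:
        return "AS_REQUIRED"  # unlinked file: written only if the buffer is needed
    return COMMIT_TABLE[commit][data_kind]


print(write_type("filesystem"))           # medium is the default commit level
print(write_type("user", commit="high"))
```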
Naming schemes
You can use the naming=scheme option to specify
the naming scheme to use for devices and partitions.
The format of scheme is as follows:
- 0# (where 0 is any digit and sets the first/base number)
- The raw devices are named 0, 1, and so on,
and partitions are named after the
device, with a t followed by the OS type of the partition (see
"Partitions"
in the Filesystems chapter of the System Architecture guide).
For example, a QNX partition could be named hd0t77.
For duplicate partitions, a period (.) and sequence number
are appended (e.g. hd0t12, hd0t12.1, and
hd0t12.2 for logical/extended DOS partitions).
This is the QNX Neutrino naming scheme.
- 0a (actually any digit and any letter; these set the
first/base name)
- The raw devices are named 0, 1, and so on, and partitions are named a,
b, and so on (e.g. /dev/hd0,
/dev/hd0a, /dev/hd0b,
/dev/hd0c, and so on).
The name doesn't indicate the OS type of the partitions, just the order
in which they were found.
- a1 (actually any letter and any digit; these set the
first/base name)
- The raw devices are named a, b, and so on.
Primary partitions are named 1, 2, 3, and 4;
if you don't have four of them, the unused numbers are skipped.
Any extended partitions are numbered without gaps from 5 (e.g.
/dev/hda, /dev/hda1, /dev/hda2,
/dev/hda5, and so on).
The name doesn't indicate the OS type of the partition, just its
location.
This is the Linux naming scheme.
The default naming scheme is 0#.
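To make the three schemes concrete, here is a sketch that generates names the way the descriptions above read. It isn't the library's algorithm: the function names are hypothetical, the base is fixed at 0 or a, and partition types are taken as already-formatted numbers such as 77.

```python
from collections import Counter
import string


def names_0hash(prefix, unit, partition_types):
    """0# scheme: hd0, then hd0t<type>; duplicates get .1, .2, and so on."""
    device = f"{prefix}{unit}"
    seen = Counter()
    names = [device]
    for ptype in partition_types:
        suffix = f".{seen[ptype]}" if seen[ptype] else ""
        names.append(f"{device}t{ptype}{suffix}")
        seen[ptype] += 1
    return names


def names_0a(prefix, unit, partition_count):
    """0a scheme: hd0, then hd0a, hd0b, ... in the order found."""
    device = f"{prefix}{unit}"
    letters = string.ascii_lowercase[:partition_count]
    return [device] + [device + letter for letter in letters]


def names_a1(prefix, unit, primary_slots, extended_count):
    """a1 scheme: hda, primaries named by slot (1-4), extended from 5 up."""
    device = prefix + string.ascii_lowercase[unit]
    names = [device]
    names += [f"{device}{slot}" for slot in sorted(primary_slots)]
    names += [f"{device}{5 + i}" for i in range(extended_count)]
    return names


print(names_0hash("hd", 0, [12, 12, 12]))
print(names_0a("hd", 0, 3))
print(names_a1("hd", 0, [1, 2], 1))
```

The three calls reproduce the example names given in the descriptions above for each scheme.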
CAUTION:
Change to a different naming scheme at your own risk:
- Some system components could have hard-coded assumptions about disk names.
- Don't use a different scheme unless you're in control of the entire system.
(For example, don't change it in a desktop installation, where
diskboot
scans for well-known hd0t77-style names).
- If you use a different scheme, you'll need
some external knowledge about what filesystem to mount on a partition,
because you won't have the tXXX naming hint.