This chapter covers the following topics:
- Making a recovery floppy
- Overview of QNX disk structure
- File maintenance utilities
- Disk recovery procedures
- What to do if your system will no longer boot
- Recovering lost files and directories
The QNX filesystem achieves high throughput without sacrificing reliability. Although the filesystem is designed to be as robust as possible, there will always be situations in the real world where disk corruption will occur. Hardware will fail eventually, power will be interrupted, and so on.
The QNX filesystem has been designed to tolerate such catastrophes. It is based on the principle that the integrity of the filesystem as a whole should be consistent at all times. While most data is held in the buffer cache and written after only a short delay, critical filesystem data is written immediately. Updates to directories, inodes, extent blocks, and the bitmap are forced to disk to ensure that the filesystem structure on disk is never corrupt (i.e. the data on disk should never be internally inconsistent).
If a crash occurs, you can use the following file maintenance and recovery utilities:
These utilities will let you determine whether any damage was done to files that were open for writing at the time of the crash. These same utilities can also fix such damage, and in many cases will completely restore the filesystem.
Sometimes the damage may be more severe. For example, it's possible that a hard disk will develop a bad block in the middle of a file, or worse, in the middle of a directory or some other critical block.
Again, the utilities we've provided can help you determine the extent of such damage. You can often rebuild the filesystem in such a way as to avoid the damaged areas. In this case, some data will be lost, but with some effort, a large portion of the affected data may be recovered.
You should always have a recovery floppy on hand if, for whatever reason, a machine won't boot from hard disk.
|This procedure applies only to QNX systems that were shipped on diskette. If your QNX system came on CD-ROM, refer to the technote in /etc/readme/technotes/qnx_install, which documents a script for creating a boot floppy.|
Before you begin, make sure that you're logged in as root and that Fsys.floppy is running.
Now follow these steps:
- Insert a QNX boot disk in your floppy drive.
- Copy the image to a temporary file on your hard disk:
dd if=/dev/fd0 of=/tmp/floppy_image
- Insert a blank floppy in the drive. Format the floppy:
fdformat -k0 -z2 /dev/fd0
- Copy the image (from your temp file) to the floppy:
dd if=/tmp/floppy_image of=/dev/fd0
- Run dcheck to check the new floppy:
If this fails, retry steps 3 and 4 (fdformat and dd); if it fails twice, try a new floppy.
- Mount the floppy drive filesystem:
mount /dev/fd0 /fd
- To make some room on the floppy, remove the following:
- disktrap utility (14K):
- All the disk drivers from /fd/bin that aren't needed for this machine.
- Now copy these useful utilities to /fd/bin:
cp /bin/sin /fd/bin/sin
cp /bin/zap /fd/bin/zap
cp /bin/rm /fd/bin/rm
cp /bin/ls /fd/bin/ls
cp /bin/spatch /fd/bin/spatch
cp /bin/chkfsys /fd/bin/chkfsys
cp /usr/bin/elvis /fd/bin/elvis
- Create a /etc directory:
- Copy a termcap file:
cp /etc/termcap /fd/etc/termcap
You should also edit the termcap file and remove the entries you won't need. The only entry you'll need is the one for QNX.
- Now we need to create two links:
cd /fd/bin
ln -s elvis vi
ln -s fcat melt
- Finally, you'll need to modify the system initialization file (/fd/etc/config/sysinit) so that it now contains these lines:
Dev -n 10 &
Dev.con -n 4 -O 256 &
reopen /dev/con1
export PATH=/ram:.:/bin:/usr/bin
export HOME=/
dinit /dev/ram
mount /dev/ram /ram
prefix -A /pipe=/ram
prefix -A /tmp=/ram
fcat /util.tar.F | pax -vr
cp /bin/esh /ram/sh
melt -z </etc/logo.F
rtc hw
echo Welcome to QNX 4.25
ontty /dev/con1 /bin/sh
ontty /dev/con2 /bin/sh
That's it! Keep your recovery floppy in a safe place. If and when you ever need to use it, simply insert the floppy in a dead machine and power on -- the machine will boot QNX from the floppy.
In this section, we describe how the QNX filesystem stores data on a disk. Reading this section should help you recognize and possibly correct filesystem damage if you ever have to rebuild a filesystem.
If you have a C development package, the header file <sys/fsys.h> contains the definitions for all terms used in this section.
For an overall description of the QNX filesystem, see the Filesystem Manager chapter in System Architecture.
A QNX filesystem may be an entire disk (in the case of floppies) or it may be one of many partitions on a hard disk. Within a disk partition, a QNX filesystem contains the following components:
The following blocks are always found, in this order, on a QNX disk partition:
- root directory
- other data
The loader block is the first block of a QNX partition. It contains the bootstrap loader that loads the QNX OS into memory.
The root block is the second block of a QNX partition. It contains the directory entry for the root (/), the inode entries for the inode file, and a label field.
Several consecutive blocks follow the root block. The bitmap blocks form the bitmap for the QNX partition. One bit exists for each block on the partition, thus one bitmap block will be used for every 4096 disk blocks (corresponding to 2M of disk space).
If the value of a bit is zero, its corresponding block is unused. Unused bits at the end of the last bitmap block (for which there are no corresponding disk blocks) are turned on.
Bit assignments start with the least-significant bit of byte 0 of the first bitmap block -- which corresponds to QNX block #1.
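The bitmap arithmetic described above can be sketched in a few lines of Python. This is only an illustration of the layout as stated in this section (512-byte blocks, one bit per block, block #1 at the least-significant bit of byte 0); the function names are hypothetical and are not part of any QNX utility or of <sys/fsys.h>.

```python
BLOCK_SIZE = 512                         # bytes per QNX disk block
BITS_PER_BITMAP_BLOCK = BLOCK_SIZE * 8   # 4096 disk blocks tracked per bitmap block


def bitmap_blocks_needed(total_blocks):
    """Number of bitmap blocks required to cover a partition."""
    return -(-total_blocks // BITS_PER_BITMAP_BLOCK)  # ceiling division


def padding_bits(total_blocks):
    """Bits at the end of the last bitmap block that have no disk block
    behind them (the text says these are turned on)."""
    return bitmap_blocks_needed(total_blocks) * BITS_PER_BITMAP_BLOCK - total_blocks


def locate_bit(block_number):
    """Map a 1-based QNX block number to (bitmap block index, byte, bit)."""
    if block_number < 1:
        raise ValueError("QNX block numbers start at 1")
    bit_index = block_number - 1         # block #1 is bit 0 of byte 0
    bitmap_block, within = divmod(bit_index, BITS_PER_BITMAP_BLOCK)
    byte, bit = divmod(within, 8)
    return bitmap_block, byte, bit


def is_block_used(bitmap, block_number):
    """Test a block's bit in a bytes object holding all bitmap blocks."""
    bitmap_block, byte, bit = locate_bit(block_number)
    offset = bitmap_block * BLOCK_SIZE + byte
    return bool(bitmap[offset] >> bit & 1)
```

For example, a partition of 5000 blocks needs two bitmap blocks, and block #4097 is tracked by the first bit of the second bitmap block.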
The root directory follows the bitmap blocks. The root directory is a "normal" directory (see the "Directories" section). It is initially created by the dinit utility with enough room for 32 directory entries (4 blocks).
As the following illustration shows, the root directory (/) contains directory entries for several special files that always exist in a QNX filesystem. The dinit utility creates these files when the filesystem is first initialized.
|/.||A link to the / directory|
|/..||Also a link to the / directory|
|/.bitmap||Represents a read-only file consisting of the bitmap blocks.|
|/.inodes||A normal file of at least one block on a floppy/RAM disk and 16 blocks on other disks, /.inodes is a collection of inode entries. The first entry is reserved and used as a signature/info area. The first bytes of the /.inodes file are "IamTHE.inodeFILE".|
|/.boot||Represents an OS image file that will be loaded into memory during the standard boot process. This file will be of zero length if no boot file exists.|
|/.altboot||Represents an OS image file that will be loaded into memory during the alternate boot process. This file will be of zero length if no alternate boot file exists.|
A directory is simply a file that has special meaning to the filesystem. A directory file contains a collection of directory entries as shown in the following illustration:
The type of directory entry is determined by the bits in the d_status field, as follows:
|Bit 3 (_FILE_LINK)||Bit 0 (_FILE_USED)||Comment:|
|0||0||unused directory entry|
|0||1||normal, used directory entry|
|1||0||link to an entry in /.inodes (which should be used)|
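As a rough illustration of the table above, the following Python sketch classifies a d_status value by testing bit 0 and bit 3. The constant and function names are hypothetical stand-ins for the definitions in <sys/fsys.h>, and the combination with both bits set is reported as unspecified because the table does not describe it.

```python
FILE_USED = 0x01  # bit 0, shown as _FILE_USED in the table
FILE_LINK = 0x08  # bit 3, shown as _FILE_LINK in the table


def entry_kind(d_status):
    """Classify a directory entry from its d_status bits."""
    used = bool(d_status & FILE_USED)
    link = bool(d_status & FILE_LINK)
    if not link and not used:
        return "unused"
    if not link and used:
        return "normal"
    if link and not used:
        return "inode link"   # points to an entry in /.inodes
    return "unspecified"      # both bits set: not covered by the table
```

Other bits in d_status are ignored here, so entry_kind(0xF1) is still reported as a normal entry.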
The first directory entry is always for the file "." and includes a directory signature ("I[heart-symbol]QNX"). The hexadecimal equivalent of the [heart-symbol] character is 0x03. This entry refers to the directory itself by pointing to the entry within the parent directory that describes this directory.
The second entry is always for the ".." file. This entry refers to the parent directory by pointing to the first block of the parent directory.
Every directory entry either defines a file or points to an entry within the /.inodes file. Inode entries are used when the filename exceeds 16 characters or when two or more names are linked to a single file.
The first extent (if any) of a file is described in the directory/inode entry. Additional file extents require a linked list of extent blocks whose header is also in the directory/inode entry. Each extent block in the chain points to between 1 and 60 extents.
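To make the extent bookkeeping concrete, here is a small Python sketch that estimates the minimum number of extent blocks a file needs. It assumes only what the paragraph above states: the first extent lives in the directory/inode entry, and each extent block holds at most 60 extents. The function name is hypothetical.

```python
EXTENTS_PER_XBLK = 60  # upper bound per extent block, from the text


def min_extent_blocks(extent_count):
    """Minimum number of extent blocks for a file with extent_count extents."""
    if extent_count < 0:
        raise ValueError("extent count cannot be negative")
    extra = max(extent_count - 1, 0)          # first extent needs no extent block
    return -(-extra // EXTENTS_PER_XBLK)      # ceiling division
```

This is a lower bound: since an extent block may hold fewer than 60 extents, a real chain can be longer than the figure returned here.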
Files with names longer than 16 characters and links to other files are implemented with a special form of directory entry. These entries are identified by the _FILE_LINK bit (0x08) of the d_status field being set.
For these files, a portion of the directory entry is moved into the /.inodes file.
Extent blocks are used for any file that has more than a single extent. The directory entry di_xblk points to one of these extent blocks, which in turn defines where the second and subsequent extents are to be found.
An extent block is exactly one 512-byte disk block with the following form:
Each extent block contains:
- forward/backward pointers
- a count of extents
- a count of all the blocks in all the extents defined by this extent block
- pointers and block counts for each extent
- a signature ("IamXblk")
The first extent block also contains a redundant pointer to the first file extent (also described within the directory/inode entry). This lets you recover all data in the file by locating this block alone.
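The signatures mentioned in this section ("IamTHE.inodeFILE", "I" + 0x03 + "QNX", and "IamXblk") can help you spot structural blocks when browsing a damaged disk. The Python sketch below scans a raw image in 512-byte blocks and labels any block containing one of them. It is a heuristic illustration only: the exact offset of each signature within its block is not given here, so the sketch searches the whole block, and ordinary file data that happens to contain these bytes would produce a false match. All names are hypothetical.

```python
DIR_SIGNATURE = b"I\x03QNX"              # "I", the 0x03 heart character, "QNX"
INODES_SIGNATURE = b"IamTHE.inodeFILE"   # first bytes of the /.inodes file
XBLK_SIGNATURE = b"IamXblk"              # extent block signature


def classify_block(block):
    """Guess what kind of structural block this is, if any."""
    if block.startswith(INODES_SIGNATURE):
        return "inodes"
    if XBLK_SIGNATURE in block:
        return "extent block"
    if DIR_SIGNATURE in block:
        return "directory"
    return "unknown"


def scan_image(data, block_size=512):
    """Yield (1-based block number, kind) for every recognised block."""
    for offset in range(0, len(data), block_size):
        kind = classify_block(data[offset:offset + block_size])
        if kind != "unknown":
            yield offset // block_size + 1, kind
```

A typical use would be to read a copy of the partition into memory (or process it in chunks) and print the block numbers that scan_image() reports, then inspect those blocks by hand.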
Files or file extents are groupings of blocks described by directory/inode entries; they have no structure imposed on them by the QNX filesystem.
Most files in QNX have the following overall structure:
The fdisk utility creates and maintains the partition block on a hard disk. This block is compatible with other operating systems and may be maintained by other OS versions of fdisk (although ours has the advantage of recognizing QNX-specific information). If the partition loader is missing or damaged, fdisk can create it.
|We recommend you keep a hard copy of the partition table information for every disk in your network.|
The dinit utility creates (but Fsys maintains) the following:
- boot block
- root block
- bitmap blocks
- root directory
- /.inodes file
The chkfsys utility is your principal filesystem maintenance tool. This utility:
- checks the directory structure of an entire disk partition, reports any inconsistencies, and fixes them, if possible
- verifies overall disk block allocation
- writes a new /.bitmap, upon your approval
The chkfsys utility assumes that the root block is valid. If the root block isn't valid, chkfsys will complain and give up -- you'll need to try restoring the root block with the dinit utility.
The dcheck utility verifies that a disk has been correctly formatted by attempting to read every block on the drive. When the -m option is specified, dcheck removes any bad blocks from the disk allocation bitmap (/.bitmap).
If the file /.bad_blks is found, dcheck will update the bitmap and recreate the /.bad_blks file. You can run dcheck a few times to increase your chances of bad blocks being recognized and added to the /.bad_blks file.
The zap utility lets root remove files or directories from the filesystem without returning the used blocks to the free list. You might do this for several reasons, including the following:
- the directory entry is damaged
- two files occupy the same space on the disk (an error)
If you zapped a file in error, it's sometimes possible to recover the zapped file using the zap utility with the -u option immediately after the deletion. You can recover a zapped file using zap under these conditions:
- the directory entry for that (now deleted) file must not have been reused
- the disk blocks previously used by the file must not have been reassigned to another file
The spatch utility lets you browse the raw disk and patch minor problems. You can sometimes cure transient disk problems by reading and writing the failing block with spatch.
The chkfsys utility is your principal tool for checking and restoring a potentially damaged filesystem. It can identify and correct a host of minor problems as well as verify the integrity of the disk system as a whole.
Normally, chkfsys requires that the filesystem be idle and that no files be currently open on that device. You'll have to shut down any processes that have opened files or that may need to open files while chkfsys is running.
To run chkfsys on a mount point, you'd simply type:
The utility scans the entire disk partition from the root down, building an internal copy of the bitmap and verifying the consistency of all files and directories it finds in the process.
When it has finished processing all files, chkfsys compares the internal bitmap to the bitmap on the disk. If they match, chkfsys is finished. If any discrepancies are found, chkfsys will -- upon your approval -- rewrite the bitmap with data consistent with the files it was able to find and verify.
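The idea of rebuilding a bitmap from the files found and comparing it with the one on disk can be sketched as follows. This is not the chkfsys implementation, just a minimal Python model under stated assumptions: extents are supplied as (first block, block count) pairs with 1-based block numbers, and the bitmap uses the bit ordering described in "Overview of QNX disk structure". The function names are hypothetical.

```python
def build_bitmap(total_blocks, extents):
    """Rebuild a bitmap from (first_block, block_count) extents.

    Returns (bitmap, collisions), where collisions is the set of blocks
    claimed by more than one extent.
    """
    bitmap = bytearray(-(-total_blocks // 8))
    collisions = set()
    for first, count in extents:
        if first < 1 or first + count - 1 > total_blocks:
            raise ValueError("extent lies outside the partition")
        for block in range(first, first + count):
            byte, bit = divmod(block - 1, 8)   # block #1 is bit 0 of byte 0
            if bitmap[byte] >> bit & 1:
                collisions.add(block)          # two files claim this block
            bitmap[byte] |= 1 << bit
    return bitmap, collisions


def differing_blocks(on_disk, rebuilt, total_blocks):
    """List the block numbers whose bits differ between two bitmaps."""
    diffs = []
    for block in range(1, total_blocks + 1):
        byte, bit = divmod(block - 1, 8)
        if (on_disk[byte] >> bit & 1) != (rebuilt[byte] >> bit & 1):
            diffs.append(block)
    return diffs
```

An empty list from differing_blocks() corresponds to the case where the two bitmaps match; a non-empty list is the kind of discrepancy that would prompt a bitmap rewrite.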
In addition to verifying block allocation (bitmap), chkfsys attempts to fix any problems it finds during the scan. For example, chkfsys can:
- "unbusy" files that were written during a crash
- fix the file size in a directory entry to match the real data
It's a good idea to run chkfsys as part of your regularly scheduled maintenance procedures -- this lets you verify that the data on your disk is intact. For example, you might consider running chkfsys on your network servers every time they boot. An automated check on the filesystem at boot time guarantees that chkfsys will attempt to fix any problems it finds during the scan. To automate this process, add chkfsys to the server's sysinit.node file.
It's especially important to run chkfsys after a system crash, power outage, or unexpected system reboot so that you can identify whether any files have been damaged. The chkfsys utility checks the "clean" flag on the disk to determine whether the system was in a consistent state at the time.
The clean flag is stored on disk and is maintained by the system. The flag is turned off whenever a file is opened for update and is turned on after all open files have been closed and the associated data has been flushed from cache to disk. When the clean flag is set, chkfsys assumes that the filesystem is intact. If chkfsys finds the clean flag off, it tries to fix the problem.
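The behavior of the clean flag can be modeled with a toy state machine. The Python sketch below is an assumption-laden illustration of the rule described above (off when a file is opened for update, on again only when no files are open and cached data has been flushed); it does not reflect how Fsys stores or updates the flag internally, and the class and method names are hypothetical.

```python
class CleanFlagModel:
    """Toy model of the on-disk clean flag."""

    def __init__(self):
        self.open_for_update = 0    # files currently open for update
        self.dirty_buffers = False  # cached data not yet written to disk
        self.clean = True

    def open_update(self):
        self.open_for_update += 1
        self.clean = False          # flag is turned off on open for update

    def write(self):
        self.dirty_buffers = True

    def close(self):
        self.open_for_update -= 1
        self._maybe_set_clean()

    def flush(self):
        self.dirty_buffers = False
        self._maybe_set_clean()

    def _maybe_set_clean(self):
        # Flag is turned on only after all files are closed and data is flushed.
        if self.open_for_update == 0 and not self.dirty_buffers:
            self.clean = True
```

If a crash happens at any point between open_update() and the final flush(), the model's flag is still off, which is the condition that leads chkfsys to examine the filesystem.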
The chkfsys utility supports a -u option, which overrides a set clean flag and tells chkfsys to run unconditionally. You might want to override the clean flag when:
- dcheck discovers bad blocks
- files have been deleted or zapped intentionally
- you want to force a general sanity check
The chkfsys utility normally requires exclusive use of the filesystem to provide a comprehensive verification of the disk.
|There is some risk to running chkfsys on a live system -- both chkfsys and the filesystem are reading and possibly writing the same blocks on the disk. Also, the filesystem has internal cached data about files and directories that can't be updated when chkfsys makes a change. But static changes, in place, on files or directories that Fsys doesn't currently have opened will probably not cause problems.|
If you're running an application that can't afford downtime or you couldn't run chkfsys because files were open for updating, try to run chkfsys with the -f option:
chkfsys -f /dev/hd0t77
This invokes a special read-only mode of chkfsys. It will give you a feeling for the overall sanity of your filesystem.
Hard disks occasionally develop bad blocks as they age. In some cases, you might be able to recover most or even all the data in a file containing a bad block.
Some bad blocks are the result of power failures or of weak media on the hard disk. In these cases, sometimes simply reading then rewriting a block will "restore" the block for a short period of time. This may allow you to copy the entire file somewhere else before the block goes bad again. This procedure certainly can't hurt, and is often worth a try.
To examine the blocks within a file, you use the spatch utility. When you get to a bad block, spatch should report an error, but it may have actually read a portion of "good" bytes from that block. Writing that same block back will often succeed.
At the same time, spatch will rewrite a correct CRC (Cyclic Redundancy Check) that will make the block good again (but with possibly incorrect data).
You can then copy the entire file somewhere else, and then zap the previously damaged file. To complete the procedure, you mark the marginal block as bad (by adding it to the /.bad_blks file), then run chkfsys to recover the remaining good blocks.
If this procedure fails, you can use the spatch utility to copy as much of the file as possible to another file, and then zap the bad file and run chkfsys.
If a previously working QNX system suddenly stops working and will no longer boot, then one of the following may have occurred:
- the hardware has failed or the data on the hard disk has been damaged
- someone has either changed/overwritten the boot file or changed the system initialization file -- these are the two most common scenarios
The following steps can help you identify the problem. Where possible, corrective actions are suggested.
Step 1 -- Try booting from floppy or across the network
If you have a network to boot over, try booting your machine over the network. Once the machine is booted, you'll need to log in as root and then start up a local filesystem:
If you don't have a network, you'll need to boot from your recovery floppy (described earlier in this section) or from the QNX boot floppy that was used to install your system onto the hard disk. The filesystem will already be running in this case, and you'll be logged in as root.
Step 2 -- Start the hard disk driver
You now have to start the appropriate hard disk driver. For example, to start a driver for an Adaptec series 4 SCSI adapter, you would type:
If you're using another type of driver, enter its name instead.
This should create a block special file called /dev/hd0 that represents the entire hard disk.
Step 3 -- Run fdisk
Running the fdisk utility will immediately give you useful information about the state of your hard disk.
The fdisk utility might report one of several types of problems:
|Error reading block 1||Either the disk controller or the hard disk itself has failed.||If the disk is good, replacing the controller card might let you continue using the disk. Otherwise, you'll have to replace the hard drive, reinstall QNX, and restore your files from backup.|
|Wrong disk parameters||Your hardware has probably "lost" its information about this hard drive -- likely because the battery for the CMOS memory is running low.||Rerunning the hardware setup procedure (or the programmable option select procedure on a PS/2) will normally clear this up. Of course, replacing the battery will make this a more permanent fix.|
|Bad partition information||If the disk size is reported correctly by fdisk, but the partition information is wrong, then the data in block 1 of the physical disk has somehow been damaged.||Use fdisk to recreate the correct partition information. It's a good idea to write down or print out a hard copy of the correct partition information in case you ever have to do this step.|
Step 4 -- Mount the partition and the filesystem
At this point, you have verified that the hardware is working (at least for block 1) and that a valid partition is defined for QNX. You now need to create a block special file for the QNX partition itself and to mount the block special file as a QNX filesystem:
mount -p /dev/hd0 /dev/hd0t77 /hd
This should create a volume called /dev/hd0t77. Depending on the state of the QNX partition, the mount may or may not fail. If the partition information is correct, there shouldn't be any problem. Since the root (/) already exists (on a floppy or on a remote disk on the network), we've mounted the local hard disk partition as a filesystem with the name /hd.
Your goal now would be to run the chkfsys utility on the disk to examine -- and possibly fix -- the filesystem.
|If you booted from floppy and you don't suspect there's any damage to the filesystem on your hard disk (e.g. the system was unable to boot because of a simple error introduced in the boot file or system initialization file), you can change the root prefix to your hard disk partition at this point with the following command, which will resume normal operation of the system:
/hd/bin/prefix -R /=/hd/
If you run this command, you can skip the rest of this section.|
If the mount fails, the first portion of the QNX partition is probably damaged (since Fsys will refuse to mount what it considers to be a corrupted filesystem).
In this case, you can use the dinit utility to overlay enough good information onto the disk to satisfy Fsys:
dinit -hr /dev/hd0t77
The -r option tells dinit to rewrite:
- the root block
- the bitmap (with all blocks allocated)
- the constant portions of the root directory
You should now be able to reissue the mount command and once again try to create a mount point for a QNX filesystem called /hd.
After doing this, you'll need to rebuild the bitmap with chkfsys, even on a good partition.
Step 5 -- Run chkfsys
At least a portion of your QNX filesystem should now be accessible. You can use chkfsys to examine the filesystem and recover as much data as possible.
If the hard disk is mounted as /hd (e.g. the machine boots from floppy), enter:
If the hard disk is mounted as / (e.g. a network boot), enter:
In either case, you should make note of any problems reported and allow chkfsys to fix as much as it can. What you do next depends on the result of running chkfsys.
If, for any reason, your disk is completely unrecoverable, read the next section, "Recovering lost files and directories." In some cases, you may need to reinstall QNX from floppy and restore your disk from your backup files.
If significant portions of the filesystem are irreparably damaged, or important files are lost, then restoring from backup might be your best alternative.
If your filesystem is intact, yet the machine still refuses to boot from hard disk, then either of the following is probably damaged:
- the partition loader program in physical block 1
- the QNX loader in the first block of the QNX partition
To rewrite a partition loader, use fdisk:
fdisk /dev/hd0 loader
To rewrite the QNX loader, use dinit:
dinit -b /dev/hd0t77
You should now be able to boot your system.
You may sometimes find that files or directories have been completely lost due to disk corruption. If after running chkfsys you know that certain key files or directories were not recovered, then you might be able to use spatch to recover some or all of this data.
Before attempting this, you should first familiarize yourself with the details of a QNX filesystem (see "Overview of QNX disk structure" in this chapter). You should also study the documentation for the spatch utility in the Utilities Reference.