Backing Up and Recovering Data
This chapter includes:
- Backup strategies
- Archiving your data
- Storage choices
- Remote backups
- QNX 4 disk structure
- File-maintenance utilities
- Recovering disks and files
- What to do if your system will no longer boot
No matter how reliable your hardware and electrical supply are, or how sure you are that you'll never accidentally erase all your work, it's just common sense to keep backups of your files. Backup strategies differ in ease of use, speed, robustness, and cost.
Although we'll discuss different types of archives below, here's a quick summary of the file extensions associated with the different utilities:
|Extension||Utility|
|.tar||pax or tar|
|.cpio||pax or cpio|
|.gz||gzip or gunzip|
|.tar.gz or .tgz||tar -z|
|.z or .F||melt|
No matter how robust a filesystem is designed to be, there will always be situations in the real world where disk corruption will occur. Hardware will fail eventually, power will be interrupted, and so on.
The QNX 4 filesystem has been designed to tolerate such catastrophes. It is based on the principle that the integrity of the filesystem as a whole should be consistent at all times. While most data is held in the buffer cache and written after only a short delay, critical filesystem data is written immediately. Updates to directories, inodes, extent blocks, and the bitmap are forced to disk to ensure that the filesystem structure on disk is never corrupt (i.e. the data on disk should never be internally inconsistent).
|The Power-Safe filesystem is designed so that it should never be corrupted; you'll always have a complete version of its data. For more information, see "Power-Safe filesystem" in the Filesystems chapter of the System Architecture guide. It's still a good idea to back up your data, but the part of this chapter on recovering data applies only to QNX 4 filesystems.|
If a crash occurs, you can use such utilities as fdisk, dinit, chkfsys, and spatch to detect and repair any damage that happened to files that were open for writing at the time of the crash. In many cases, you can completely restore the filesystem.
Sometimes the damage may be more severe. For example, it's possible that a hard disk will develop a bad block in the middle of a file, or worse, in the middle of a directory or some other critical block.
Again, the utilities we've provided can help you determine the extent of such damage. You can often rebuild the filesystem in such a way as to avoid the damaged areas. In this case, some data will be lost, but with some effort, you can recover a large portion of the affected data.
Your backup strategy will consist of making one or more backups on a periodic or triggered basis. For each backup you incorporate in your strategy, you have to choose:
- the storage media and location of the backup data
- how to archive, and optionally, compress your data
- the contents, and frequency or trigger condition of the backup
- automated versus manual backup
- local versus remote control of the backup
Often, a comprehensive backup strategy incorporates some backups on the local side (i.e. controlled and stored on the same machine that the data is located on), and others that copy data to a remote machine. For example, you might automatically back up a developer's data to a second hard drive partition on a daily basis and have a central server automatically back up the developer's data to a central location on a weekly basis.
Early in the process of determining your backup strategy, you're likely to choose the location of your data backups and the media to store the backups on, because these choices are the primary factors that affect the hardware and media costs associated with the system. To make the best choice, first take a close look at what you need to back up, and how often you need to do it. This information determines the storage capacity, transfer bandwidth, and the degree to which multiple users can share the resource.
Your choices of backup media vary, depending on whether you create backup copies of your data on a local machine or on a remote machine by transferring the data via a network:
- Local backups offer the advantage of speed and potentially greater control by the end user, but are limited to backup technologies and media types that Neutrino supports directly.
- Remote backups often allow use of company-wide backup facilities and open up additional storage options, but are limited by the need to transfer data across a network and by the fact that the facilities are often shared, restricting your access for storing or retrieving your backups.
Here's a summary of some of the backup media you might consider, and their availability for local or remote backups:
|Backup media||Local||Remote|
|USB mass-storage device||Yes||Yes|
When backing up your data, you need to decide whether to back up each file and directory separately, or in an archive with a collection of other files. You also need to decide whether or not to compress your data to reduce the storage requirements for your backups.
The time lost to compression and decompression may be offset to a degree by the reduced time it takes to write or read the compressed data to media or to transfer it through a network. To reduce the expense of compression, you may choose to compress the backup copies of your data as a background task after the data has been copied -- possibly days or weeks after -- to reduce the storage requirements of older backups while keeping newer backups as accessible as possible.
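As a rough sketch of the deferred-compression idea, the following function compresses archives once they reach a certain age. The function name compress_old_backups, the *.tar naming, and the 7-day default are illustrative assumptions, not part of Neutrino.

```shell
#!/bin/sh
# Sketch: compress backup archives that are older than a given number of days.
# compress_old_backups, the *.tar naming, and the 7-day default are assumptions.
compress_old_backups() {
    dir=$1
    days=${2:-7}
    # -mtime +N matches files last modified more than N days ago;
    # gzip replaces each matching file.tar with file.tar.gz.
    find "$dir" -type f -name '*.tar' -mtime +"$days" -exec gzip {} \;
}
```

You might run it as a background task, for example `compress_old_backups /backups 14 &`, where /backups is a hypothetical directory that holds your archives.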
You should back up often enough so that you can recover data that's still current or can be made current with minimal work. In a software development group, this may range from a day to a week. Each day of out-of-date backup will generally cost you a day of redevelopment. If you're saving financial or point-of-sale data, then daily or even twice-daily backups are common. It's a good idea to maintain off-site storage.
You can store backups of each of your files separately, or you can store them in an archive with other files that you're backing up. Files stored in an archive can be more readily identified as belonging to a certain time or machine (by naming the archive), more easily transferred in bulk to other systems (transfer of a single archive file), and can sometimes be more readily compressed than individual files can.
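For example, here's one way you might build an archive name that records the machine and the date. This is only a sketch: the function name archive_name and the backup- prefix are illustrative choices.

```shell
#!/bin/sh
# Sketch: build an archive name that identifies the machine and the day.
# archive_name and the "backup-" prefix are illustrative assumptions.
archive_name() {
    # uname -n prints the node (host) name; date +%Y%m%d prints e.g. 20240131.
    echo "backup-$(uname -n)-$(date +%Y%m%d).tar"
}
```

You could then pass the result to an archiver, for instance `pax -wf "$(archive_name)" workspace`.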
You have several archive formats to choose from under Neutrino, including pax and tar. Neutrino also supports cpio (*.cpio), but we recommend it only when the archive needs to be readable by other systems that use cpio archives.
To back up a single file, type:
cp -t my_file backup_directory
or:
echo my_file | pax -rw backup_directory
To back up an entire directory, type:
cp -Rt my_directory backup_directory
or:
find my_directory -print | pax -rw backup_directory
To back up only certain files matching some criteria, use the find utility or other means of identifying the files to be backed up, and pipe the output to pax -rw, like this:
find my_directory -name '*.[ch]' | pax -rw backup_directory
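The same approach can select only the files that changed since your last backup. The sketch below assumes that you keep a "stamp" file whose modification time marks the previous backup; the function name changed_since and the stamp-file convention are illustrative assumptions.

```shell
#!/bin/sh
# Sketch: print the files under a directory that changed since the last backup.
# changed_since and the stamp-file convention are illustrative assumptions.
changed_since() {
    src=$1
    stamp=$2
    if [ -f "$stamp" ]; then
        # Only regular files modified more recently than the stamp file.
        find "$src" -type f -newer "$stamp" -print
    else
        # No stamp yet: treat every file as changed (a full backup).
        find "$src" -type f -print
    fi
}
```

You could pipe the output to pax in the same way as above, for example `changed_since my_directory my_stamp | pax -rw backup_directory`, and then update the stamp with `touch my_stamp`.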
To combine individual files into a single archive, use tar or pax. These utilities take all the files that you give them and place them into one big contiguous file. You can use the same utilities to extract discrete files from the archives.
|The filesystem can't support archives -- or any other files -- that are larger than 2 GB.|
When you use pax as an archiver (pax -w mode), it writes tar-format archives. Your choice of which to use is based on the command-line syntax that works better for you, not the format of the archives, because the formats are identical. The pax utility was created as part of the POSIX standard to provide a consistent mechanism for archive exchange (pax stands for Portable Archive eXchange), thus avoiding conflict between variants of the tar utility that behave differently.
You can create archives of:
- Single files (although there isn't much point in doing so with tar and pax).
pax -wf my_archive.tar code.c
This command takes code.c and creates an archive (sometimes referred to as a "tarball") called my_archive.tar. The -w option tells pax to write an archive, and -f names the archive file.
- Multiple files -- to archive more than one file, pass more files on the end of the command line.
pax -wf my_archive.tar code.c header.h readme.txt
The pax utility archives them all together in a single archive, my_archive.tar.
- Directories -- just specify a directory name on the command line:
pax -wf my_archive.tar workspace
This command archives all the contents of workspace into my_archive.tar.
- Partitions -- specify the directory name of the partition:
pax -wf my_archive.tar /fs/hd0-t79
This command archives all the contents of the t79 partition into one very large archive, my_archive.tar.
You can keep the archive on your local system, but we recommend that you keep a copy of it on a remote system; if the local system gets physically damaged, or the hard disk is corrupted, you'll lose a local archive.
To extract from the archive, you can use pax with the -r option:
pax -rf my_archive.tar
or tar with the -x (extract), -v (verbose), and -f (filename) options:
tar -xvf my_archive.tar
|To view the contents of the archive without extracting them, use tar with the -t option instead of -x.|
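Listing an archive is also a cheap way to check a backup before you rely on it. The following sketch compares the names stored in an archive with the names currently on disk; verify_archive is a hypothetical helper, and it assumes that the archive was created from the current directory with a relative path (as in the examples above).

```shell
#!/bin/sh
# Sketch: check that an archive lists the same names as the directory on disk.
# verify_archive is a hypothetical helper; it compares names only, not contents.
verify_archive() {
    archive=$1      # e.g. my_archive.tar
    srcdir=$2       # the directory that was archived, e.g. workspace
    # Names stored in the archive (tar -t lists without extracting);
    # strip the trailing "/" that tar prints for directories.
    listed=$(tar -tf "$archive" | sed 's|/$||' | sort)
    # Names currently on disk under the same relative path.
    ondisk=$(find "$srcdir" -print | sort)
    [ "$listed" = "$ondisk" ]
}
```

For example, `verify_archive my_archive.tar workspace && echo "names match"`. A mismatch means that files were added or removed after the archive was written.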
An archive can be quite large -- especially if you archive the entire partition. To conserve space, you can compress archives, although it takes some time to compress on storage and decompress on retrieval.
Neutrino includes the following compressors and decompressors:
The best choice is usually gzip, because it's supported on many operating systems, while freeze is used mainly for compatibility with QNX 4 systems. There are also many third-party compressors.
|The gzip utility is licensed under the GNU General Public License (GPL), which is a consideration if you're going to distribute gzip to others as part of the backup solution you're developing.|
For example, to compress my_archive.tar to create a new file called my_archive.tar.gz, type:
gzip my_archive.tar
This file is much smaller than the original one, which makes it easier to store. Some of the utilities -- including gzip -- have options that let you control the amount of compression. Generally, the better the compression, the longer it takes to do.
|The default extension is .tar.gz, but you'll see others, such as .tgz. You can use the -S option to gzip to specify the suffix.|
To decompress the archive, use the compressor's corresponding utility. In the case of a .gz or .tgz file, use gunzip:
gunzip my_archive.tar.gz
or:
gunzip my_archive.tgz
These commands decompress the file, resulting in my_archive.tar. You can also use tar with the -z option to extract from the archive without decompressing it first:
tar -xzf my_archive.tgz
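If you'd rather keep the uncompressed archive until you know the compressed copy is good, you might do something like the following. This is a minimal sketch that uses standard gzip options (-9, -c, -t); the function name compress_and_check is an illustrative assumption.

```shell
#!/bin/sh
# Sketch: compress an archive while keeping the original, then check the result.
# compress_and_check is a hypothetical helper name.
compress_and_check() {
    archive=$1                      # e.g. my_archive.tar
    # -c writes to standard output, so the original .tar is left in place;
    # -9 asks for the best (and slowest) compression.
    gzip -9 -c "$archive" > "$archive.gz" || return 1
    # -t tests the integrity of the compressed file without writing any output.
    gzip -t "$archive.gz" || return 1
    # Confirm that decompressing reproduces the original byte for byte.
    gunzip -c "$archive.gz" | cmp -s - "$archive"
}
```

Once `compress_and_check my_archive.tar` succeeds, you can remove the uncompressed archive if you need the space.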
You can back up to a CD by using a CD burner on the Neutrino system or by creating an ISO image and copying it to a system with a CD burner that can burn ISO images.
You can use cdrecord to burn CDs on a Neutrino system. To get this software, go to the Third-party software section of the Download area on our website, http://www.qnx.com/.
In either case, you have to create an ISO image of the data that you want to burn to a CD. You can do this with mkisofs, a utility that's included with cdrecord.
Before you can create an ISO image, you need to arrange the files into the directory structure that you want to have on the CD. Then use mkisofs, like this:
mkisofs -l -f -r -joliet -quiet -V"My Label" -o my_iso_image.iso
This command creates an ISO image named my_iso_image.iso with the label, My Label, using the Joliet file format, allowing full 31-character filenames (-l), following all symbolic links when generating the filesystem (-f), and generating SUSP and RR records using the Rock Ridge protocol (-r).
Once you've created the ISO image, you can send the image to a system that can burn an ISO image or you can burn it using cdrecord:
cdrecord -v speed=2 dev=/dev/cd0 my_iso_image.iso
This command burns a CD at dual speed (2), using the CD burner called cd0, from the ISO image called my_iso_image.iso. For more information, see the documentation for cdrecord.
|For a list of supported CD drives, see the README file that comes with the cdrecord source code.|
You can also make the CD bootable, using cdrecord and its associated utilities, as follows:
- Create a bootable floppy that calls the needed scripts and includes the needed binaries in the image.
- Make an image of the floppy, using the dd utility:
dd if=/dev/fd0 of=/floppy.img
- Create a directory with all the needed binaries, in the layout that you want in your CD-ROM ISO image.
mkdir iso_image
cp -Rc /bin iso_image/bin
cp -Rc /etc iso_image/etc
....
- Make sure that the isocatalog is in /usr/share/cdburning on the system.
- Create the ISO image using mkisofs, making sure to specify the catalog with the -c option.
mkisofs -l -f -r -joliet -quiet -V"My Label" -b floppy.img \
    -c /usr/share/cdburning/isocatalog -o my_iso_image.iso
- Burn the ISO image to a CD.
Other forms of removable media are also useful for backing up data. Neutrino supports LS-120, magneto-optical (MO) drives, internal ZIP drives, and USB mass-storage devices. Each has its own benefits and weaknesses; it's up to you to determine which form of media is best for backing up your data. For instructions on how to install this hardware, see the Connecting Hardware chapter in this guide.
|The instructions here are for copying from one hard disk to another of identical properties (size, make, model). To make a copy of a drive that differs in size and make, contact technical support for the QNX_Drive_Copy utility.|
You can make identical images of hard drives under Neutrino, using simple utilities. This is called making a raw copy of the drive.
If you have an identical hard drive (manufacturer, size, model number), you can simply attach the drive to the system. Make sure you know which position the drive is set up as (e.g. EIDE Primary Slave).
Once you've attached the drive, boot the Neutrino system. The system should automatically detect the hard drive and create an entry in the /dev directory for it. The new entry should appear as /dev/hd1 if there are only two drives in the system. If there are more than two, then the drive could be hd1, hd2, and so on. In this case, use fdisk to identify which drive is which. The new drive shouldn't have any partitions set up on it and should be blank.
|Be absolutely positive about the drives before continuing, because if you don't identify the drives correctly, you could copy the contents of the blank hard drive onto your original drive, and you'll lose all your data. There's no way to recover from this.|
Once you've identified the drives, type:
cp -V /dev/hd0 /dev/hd1
where hd0 is the original hard disk, and hd1 is the new drive that you're copying to.
This command copies everything from the first drive, including partition tables, boot loaders, and so on, onto the second drive. To test that the copy was successful, remove the original drive and put the backup drive in its place, then boot the system from the backup drive. The system should boot into Neutrino and look the same as your original drive. Keep the backup in a safe location.
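Before you swap drives, you can also compare the copy with the original. The following is a minimal sketch that uses the standard cmp utility; verify_raw_copy is a hypothetical helper, and comparing two whole drives can take as long as the copy itself.

```shell
#!/bin/sh
# Sketch: confirm that a raw copy matches the original.
# verify_raw_copy is a hypothetical helper name.
verify_raw_copy() {
    original=$1     # e.g. /dev/hd0
    copy=$2         # e.g. /dev/hd1
    # cmp -s prints nothing and exits with 0 only if both are identical.
    if cmp -s "$original" "$copy"; then
        echo "identical"
    else
        echo "DIFFERENT"
        return 1
    fi
}
```

For the drives in this example, you would run `verify_raw_copy /dev/hd0 /dev/hd1`.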
Some Neutrino users have used ghost images for backups, but we don't recommend them. Partition information might not be restored properly, causing filesystems to not boot correctly. If you run fdisk again on the drive, the drive reports incorrect information, and fdisk writes incorrect data to the drive.
Remote backups are generally a much safer solution than storing a backup on a local system, because a remote server is generally more reliable -- as the saying goes, don't put all your eggs in one basket.
Depending on your situation, it might make sense to buy a good system with lots of server-grade hardware, and then buy regular systems to develop on. Make regular backups of your server.
Neutrino ships with a copy of the CVS (Concurrent Versions System) client utility. In order to use CVS, you need to have a CVS server (preferably one that your company administers). CVS lets you manage your source archives safely and remotely. For more details, see the Using CVS chapter in this guide.
Storing a second backup on a remote system is often a simple yet effective way to prevent the loss of data. For example, if you have a basic archive of your code in a separate directory on your local system, and then the hard disk breaks down for some unforeseen reason, you've lost your local backup as well. Placing a copy on a remote filesystem effectively lowers the chance of losing data -- we highly recommend it.
|If you place a file on a non-Neutrino filesystem, you might lose the file's permissions. Files under Neutrino (like other UNIX systems) have special file permissions (see Working with Files) that are lost if you store individual files on a Windows-based filesystem. If you create an archive (see "Archiving your data," above), the permissions are preserved.|
There are other remote version systems (similar to CVS) that are available to Neutrino via third-party solutions. Many of them are free; search the Internet for the tools that are right for your company and project.
If you ever have a problem with a QNX 4 filesystem, you'll need to understand how it stores data on a disk. This knowledge will help you recognize and possibly correct damage if you ever have to rebuild a filesystem. The <sys/fs_qnx4.h> header file contains the definitions for the structures that this section describes.
For an overall description of the QNX 4 filesystem, see the Working with Filesystems chapter.
A QNX 4 filesystem may be an entire disk (in the case of floppies) or it may be one of many partitions on a hard disk. Within a disk partition, a QNX 4 filesystem contains the following components:
- loader block
- root block
- bitmap blocks
- root directory
- other directories, files, free blocks, etc.
These structures are created when you initialize the filesystem with the dinit utility.
The first physical block of a disk partition is the loader block. It contains the bootstrap code that the BIOS loads and then executes to load an OS from the partition. If a disk hasn't been partitioned (e.g. it's a floppy), this block is the first physical block on the disk.
The root block is the second block of a QNX 4 partition. It's structured as a standard directory and contains a label field and the inode information for these special files:
- the root directory of the filesystem (usually /)
- /.inodes
- /.boot
- /.altboot
The files /.boot and /.altboot contain images of the operating system that can be loaded by the QNX bootstrap loader.
Normally, the QNX loader loads the OS image stored in the /.boot file. But if the /.altboot file isn't empty, you can load the image stored in it. For more information, see the Controlling How Neutrino Starts chapter.
Several consecutive blocks follow the root block. The bitmap blocks form the bitmap for the QNX 4 partition. One bit exists for each block on the partition; thus one 512-byte bitmap block is used for every 4096 disk blocks (corresponding to 2 MB of disk space).
If the value of a bit is zero, the corresponding block is unused. Unused bits at the end of the last bitmap block (for which there are no corresponding disk blocks) are turned on.
Bit assignments start with the least-significant bit of byte 0 of the first bitmap block -- which corresponds to QNX 4 block #1.
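The arithmetic above can be sketched in a few lines of shell, assuming 512-byte blocks (4096 bits per bitmap block). The function names bitmap_blocks and bitmap_position are illustrative only, and the positions they print count from 0.

```shell
#!/bin/sh
# Sketch of the bitmap arithmetic described above (512-byte blocks assumed).
BITS_PER_BITMAP_BLOCK=4096      # 512 bytes * 8 bits

# Number of bitmap blocks needed for a partition of N blocks (rounded up).
bitmap_blocks() {
    echo $(( ($1 + BITS_PER_BITMAP_BLOCK - 1) / BITS_PER_BITMAP_BLOCK ))
}

# Where the allocation bit for QNX 4 block number N (counted from 1) lives.
# Bit 0 is the least-significant bit of the byte.
bitmap_position() {
    index=$(( $1 - 1 ))
    echo "bitmap_block=$(( index / BITS_PER_BITMAP_BLOCK ))" \
         "byte=$(( (index % BITS_PER_BITMAP_BLOCK) / 8 ))" \
         "bit=$(( index % 8 ))"
}
```

For instance, a 1 GB partition has 2097152 blocks, so `bitmap_blocks 2097152` prints 512, and `bitmap_position 1` prints bitmap_block=0 byte=0 bit=0.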
The root directory follows the bitmap blocks. The root directory is a "normal" directory (see the "Directories" section), with two exceptions:
- Both "dot" (.) and "dot dot" (..) are links to the same inode information, namely the root directory inode in the root block.
- The root directory always has entries for the /.bitmap, /.inodes, /.boot, and /.altboot files. These entries are provided so programs that report information on filesystem usage see the entries as normal files.
The dinit utility creates this directory with initially enough room for 32 directory entries (4 blocks).
The root directory (/) contains directory entries for several special files that always exist in a QNX 4 filesystem. The dinit utility creates these files when the filesystem is first initialized.
|/.||A link to the / directory|
|/..||Also a link to the / directory|
|/.bitmap||Represents a read-only file that contains a map of all the blocks on the disk, indicating which blocks are used.|
|/.inodes||A normal file of at least one block on a floppy/RAM disk and 16 blocks on other disks, /.inodes is a collection of inode entries. The first entry is reserved and used as a signature/info area. The first bytes of the .inodes file are set to IamTHE.inodeFILE.|
|/.longfilenames||An optional file that stores information about files whose names are longer than 48 characters; see "QNX 4 filesystem" in Working with Filesystems.|
|/.boot||Represents an OS image file that will be loaded into memory during the standard boot process. This file will be of zero length if no boot file exists.|
|/.altboot||Represents an OS image file that will be loaded into memory during the alternate boot process. This file will be of zero length if no alternate boot file exists.|
A directory is simply a file that has special meaning to the filesystem; the file contains a collection of directory entries.
The bits in the i_status field indicate the type of the directory entry:
|Link bit||Used bit||Type of entry|
|0||0||Unused directory entry|
|0||1||Normal, used directory entry|
|1||0||Link to an entry in /.inodes (which should be used)|
The first directory entry is always for the . ("dot") link and includes a directory signature ("I♥QNX"). The hexadecimal equivalent of the ♥ character is 0x03. This entry refers to the directory itself by pointing to the entry within the parent directory that describes this directory.
The second entry is always for the .. ("dot dot") link. This entry refers to the parent directory by pointing to the first block of the parent directory.
Every directory entry either defines a file or points to an entry within the /.inodes file. Inode entries are used when the filename exceeds 16 characters or when two or more names are linked to a single file. If you've enabled support for long filenames, the root directory of the filesystem also includes the .longfilenames file, which stores information about files whose names are longer than 48 characters.
The first extent (if any) of a file is described in the directory/inode entry. Additional file extents require a linked list of extent blocks whose header is also in the directory/inode entry. Each extent block can hold location information for up to 60 extents.
Files with names greater than 16 characters, and files that are links to other files, are implemented with a special form of directory entry. These entries have the QNX4FS_FILE_LINK bit (0x08) set in the i_status field.
For these files, a portion of the directory entry is moved into the /.inodes file.
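If you're inspecting raw directory entries by hand, a small helper like the following can classify an entry from its i_status byte. It's a sketch under stated assumptions: the value 0x08 for QNX4FS_FILE_LINK comes from the text above, but the value 0x01 for the "used" bit is an assumption that you should confirm against <sys/fs_qnx4.h>, and entry_type is a hypothetical name.

```shell
#!/bin/sh
# Sketch: classify a directory entry from its i_status byte.
# QNX4FS_FILE_LINK (0x08) is given in the text; the value of the "used"
# bit (0x01) is an assumption to confirm against <sys/fs_qnx4.h>.
QNX4FS_FILE_USED=1      # 0x01 (assumed)
QNX4FS_FILE_LINK=8      # 0x08

entry_type() {
    status=$1
    # Any entry with the link bit set is reported as a link here.
    if [ $(( $status & $QNX4FS_FILE_LINK )) -ne 0 ]; then
        echo "link to an entry in /.inodes"
    elif [ $(( $status & $QNX4FS_FILE_USED )) -ne 0 ]; then
        echo "normal, used directory entry"
    else
        echo "unused directory entry"
    fi
}
```

For example, `entry_type 8` reports a link to an entry in /.inodes, and `entry_type 0` reports an unused entry.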
If the filename is longer than 48 characters:
- the l_fname field in the directory entry holds a 48-character truncated version of the name
- the l_lfn_block field points to an entry in .longfilenames
Extent blocks are used for any file that has more than a single extent. The i_xblk field in the directory entry points to one of these extent blocks, which in turn defines where the second and subsequent extents are to be found.
An extent block is exactly one 512-byte disk block with the following form:
Each extent block contains:
- forward/backward pointers
- a count of extents
- a count of all the blocks in all the extents defined by this extent block
- pointers and block counts for each extent
- a signature (IamXblk)
The first extent block also contains a redundant pointer to the first file extent (also described within the directory/inode entry). This lets you recover all data in the file by locating this block alone.
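Because every extent block carries the IamXblk signature, you can search a saved disk image for candidates. The following sketch scans an image file one 512-byte block at a time; find_xblks is a hypothetical helper, the block numbers it prints count from 0 at the start of the image (not QNX 4 block numbers), and reading one block per dd call is slow on large images.

```shell
#!/bin/sh
# Sketch: list the 512-byte blocks of a raw image that contain the
# extent-block signature. find_xblks is a hypothetical helper; the block
# numbers printed count from 0 at the start of the image.
find_xblks() {
    image=$1
    size=$(wc -c < "$image")
    nblocks=$(( size / 512 ))
    n=0
    while [ "$n" -lt "$nblocks" ]; do
        # Read one block and look for the signature anywhere inside it.
        if dd if="$image" bs=512 skip="$n" count=1 2>/dev/null |
           LC_ALL=C grep -q 'IamXblk'; then
            echo "$n"
        fi
        n=$(( n + 1 ))
    done
}
```

You can then examine each reported block with spatch to decide whether it belongs to the file you're trying to recover.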
Files or file extents are groupings of blocks described by directory/inode entries; they have no structure imposed on them by the QNX 4 filesystem.
Most files in Neutrino have the following overall structure:
If a crash occurs, you can use the following file-maintenance and recovery utilities:
- fdisk
- dinit
- chkfsys
- dcheck
- zap
- spatch
This section gives a brief description of these utilities; for more information, see the Utilities Reference.
The fdisk utility creates and maintains the partition block on a hard disk. This block is compatible with other operating systems and may be maintained by other OS versions of fdisk (although ours has the advantage of recognizing QNX-specific information). If the partition loader is missing or damaged, fdisk can create it.
|We recommend that you keep a hard copy of the partition table information for every disk in your network.|
The dinit utility creates (but the QNX 4 filesystem maintains) the following:
- loader block
- root block
- bitmap blocks
- root directory
- /.inodes file
- /.longfilenames file
If something destroys the first few blocks of your filesystem, you can try to recover them by using the -r option to dinit and then running chkfsys. For more information, see dinit in the Utilities Reference.
The chkfsys utility is your principal filesystem-maintenance tool.
|The chkfsys utility will claim that a Power-Safe filesystem is corrupt; use chkqnx6fs on this type of filesystem.|
The chkfsys utility:
- checks the directory structure of an entire disk partition, reports any inconsistencies, and fixes them, if possible
- verifies overall disk block allocation
- writes a new /.bitmap, upon your approval
The chkfsys utility assumes that the root block is valid. If the root block isn't valid, chkfsys complains and gives up -- you'll need to try restoring the root block with the dinit utility.
The dcheck utility checks for bad blocks on a disk by attempting to read every block on the drive. When you specify the -m option, dcheck removes any bad blocks from the disk allocation bitmap (/.bitmap).
If it finds the file /.bad_blks, dcheck updates the bitmap and recreates the /.bad_blks file. You can run dcheck a few times to increase your chances of recognizing bad blocks and adding them to the /.bad_blks file.
The zap utility lets root remove files or directories from the filesystem without returning the used blocks to the free list. You might do this, for example, if the directory entry is damaged, or if two files occupy the same space on the disk (an error).
If you zapped a file in error, it's sometimes possible to recover the zapped file using the zap utility with the -u option immediately after the deletion. You can recover a zapped file using zap under these conditions:
- the directory entry for that (now deleted) file must not be reused
- the disk blocks previously used by the file must not be reassigned to another file
You may sometimes find that files or directories have been completely lost due to disk corruption. If, after running chkfsys, you know that certain key files or directories weren't recovered, then you might be able to use spatch to recover some or all of this data.
The spatch utility lets you browse the raw disk and patch minor problems. You can sometimes cure transient disk problems by reading and writing the failing block with spatch.
|Before using spatch, make sure you understand the details of a QNX 4 filesystem; see "QNX 4 disk structure" earlier in this chapter.|
The chkfsys utility is your principal tool for checking and restoring a potentially damaged filesystem. It can identify and correct a host of minor problems as well as verify the integrity of the disk system as a whole.
Normally, chkfsys requires that the filesystem be idle and that no files be currently open on that device. You'll have to shut down any processes that have opened files or that may need to open files while chkfsys is running.
To run chkfsys on a mountpoint, type:
chkfsys mountpoint
The utility scans the entire disk partition from the root down, building an internal copy of the bitmap and verifying the consistency of all files and directories it finds in the process.
When it has finished processing all files, chkfsys compares the internal bitmap to the bitmap on the disk. If they match, chkfsys is finished. If any discrepancies are found, chkfsys will -- upon your approval -- rewrite the bitmap with data consistent with the files it was able to find and verify.
In addition to verifying block allocation (bitmap), chkfsys attempts to fix any problems it finds during the scan. For example, chkfsys can:
- "unbusy" files that were being written when a crash occurred
- fix the file size in a directory entry to match the real data
It's a good idea to run chkfsys as part of your regularly scheduled maintenance procedures -- this lets you verify that the data on your disk is intact. For example, you might consider running chkfsys on your network servers every time they boot. An automated check on the filesystem at boot time guarantees that chkfsys will attempt to fix any problems it finds during the scan. To automate this process, add chkfsys to the server's rc.local file (see Controlling How Neutrino Starts).
It's especially important to run chkfsys after a system crash, power outage, or unexpected system reboot so that you can identify whether any files have been damaged. The chkfsys utility checks the "clean" flag on the disk to determine whether the system was in a consistent state at the time.
The clean flag is stored on disk and is maintained by the system. The flag is turned off when the filesystem is mounted and is turned on when the filesystem is unmounted. When the clean flag is set, chkfsys assumes that the filesystem is intact. If chkfsys finds the clean flag off, it tries to fix the problem.
The chkfsys utility supports a -u option, which overrides a set clean flag and tells chkfsys to run unconditionally. You might want to override the clean flag when:
- dcheck discovers bad blocks
- you've intentionally deleted or zapped some files
- you want to force a general sanity check
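As an illustration of the first case, the following sketch runs dcheck with -m and then forces a check with chkfsys -u. It uses only the options named in this chapter, but the function names, the device argument, and the dry-run default are assumptions; see the Utilities Reference for the exact arguments each utility accepts.

```shell
#!/bin/sh
# Sketch: a post-crash check sequence built from options named in this chapter.
# DRY_RUN=1 (the default here) only prints the commands; post_crash_check
# and the device argument are illustrative assumptions.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

post_crash_check() {
    dev=$1                     # e.g. /dev/hd0t79
    # Remove any bad blocks from the allocation bitmap first ...
    run dcheck -m "$dev" || return 1
    # ... then check unconditionally, overriding a set clean flag.
    run chkfsys -u "$dev"
}
```

With the default settings, `post_crash_check /dev/hd0t79` only prints the two commands; set DRY_RUN=0 on a QNX 4 filesystem when you're ready to run them.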
The chkfsys utility normally requires exclusive use of the filesystem to provide a comprehensive verification of the disk.
|There is some risk in running chkfsys on a live system -- both chkfsys and the filesystem are reading and possibly writing the same blocks on the disk.
If you do this, and chkfsys writes something, it sends a message to the filesystem to invalidate itself, and that makes the filesystem remount itself and go back to the disk to reread all data. This marks any open files as stale; you'll get an error of EIO whenever you read or write, unless you close and reopen the files. This can affect things such as your system log file.
Static changes, in place, on files or directories that the filesystem doesn't currently have opened will probably not cause problems.
If you're running an application that can't afford downtime or you couldn't run chkfsys because files were open for updating, try to run chkfsys with the -f option:
chkfsys -f /dev/hd0t79
This invokes a special read-only mode of chkfsys that can give you an idea of the overall sanity of your filesystem.
Hard disks occasionally develop bad blocks as they age. In some cases, you might be able to recover most or even all the data in a file containing a bad block.
Some bad blocks are the result of power failures or of weak media on the hard disk. In these cases, sometimes simply reading then rewriting a block will "restore" the block for a short period of time. This may allow you to copy the entire file somewhere else before the block goes bad again. This procedure certainly can't hurt, and is often worth a try.
To examine the blocks within a file, use the spatch utility. When you get to a bad block, spatch should report an error, but it may have actually read a portion of "good" bytes from that block. Writing that same block back will often succeed.
At the same time, spatch will rewrite a correct CRC (Cyclic Redundancy Check) that will make the block good again (but with possibly incorrect data).
You can then copy the entire file somewhere else, and then zap the previously damaged file. To complete the procedure, you mark the marginal block as bad (by adding it to the /.bad_blks file), then run chkfsys to recover the remaining good blocks.
If this procedure fails, you can use the spatch utility to copy as much of the file as possible to another file, and then zap the bad file and run chkfsys.
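As a complementary first attempt (a general technique, not the spatch procedure described above), the standard dd utility can copy whatever is readable and keep going past read errors. The sketch below assumes a POSIX dd; the function name and paths are hypothetical, and the 512-byte block size is chosen to match the QNX 4 filesystem block size.

```shell
#!/bin/sh
# Sketch: copy as much of a damaged file as possible with POSIX dd.
# conv=noerror continues after a read error; conv=sync pads each short
# or failed input block with NUL bytes so that later data keeps its offset.

salvage_file() {
    src=$1    # damaged file (hypothetical path)
    dst=$2    # destination, ideally on a different, healthy disk
    dd if="$src" of="$dst" bs=512 conv=noerror,sync
}
```

Note that conv=sync also pads the final block, so the copy can be up to 511 bytes longer than the original, and any block that couldn't be read appears as zeros in the copy.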
If a previously working Neutrino system suddenly stops working and will no longer boot, then one of the following may have occurred:
- the hardware has failed or the data on the hard disk has been damaged
- someone has either changed/overwritten the boot file or changed the system initialization file -- these are the two most common scenarios
The following steps can help you identify the problem. Where possible, corrective actions are suggested.
- Try booting from CD or across the network.
- If you have a network to boot over, try booting your machine over the network. Once the machine is booted, you'll need to log in as root.
- If you don't have a network, boot from your installation CD. The filesystem will already be running in this case, and you'll be logged in as root.
- Start the hard disk driver.
For example, to start a driver for an Adaptec series 4 SCSI adapter, type:
devb-aha4 options &
If you're using another type of driver, enter its name instead. For example:
devb-eide options qnx4 options &
This should create a block special file called /dev/hd0 that represents the entire hard disk.
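Because the driver is started in the background, the device file might not exist at the instant the next command runs. A minimal sketch of a polling wait follows; wait_for_path is a hypothetical helper, and the default of ten one-second tries is arbitrary.

```shell
#!/bin/sh
# Sketch: poll once per second until a path exists or the tries run out.
# wait_for_path is a hypothetical helper name.

wait_for_path() {
    path=$1
    tries=${2:-10}      # maximum number of polls (arbitrary default)
    while [ "$tries" -gt 0 ]; do
        [ -e "$path" ] && return 0
        sleep 1
        tries=$((tries - 1))
    done
    echo "$path did not appear" >&2
    return 1
}
```

For example, wait_for_path /dev/hd0 && fdisk /dev/hd0 runs fdisk only once the device is present. Testing with -b instead of -e would additionally confirm that the path is a block special file.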
- Run fdisk.
Running the fdisk utility will immediately give you useful information about the state of your hard disk.
The fdisk utility might report one of several types of problems:
|Problem:||Probable cause:||Remedy:|
|Error reading block 1||Either the disk controller or the hard disk itself has failed.||If the disk is good, replacing the controller card might let you continue using the disk. Otherwise, you'll have to replace the hard drive, reinstall Neutrino, and restore your files from backup.|
|Wrong disk parameters||Your hardware has probably "lost" its information about this hard drive -- likely because the battery for the CMOS memory is running low.||Rerunning the hardware setup procedure (or the programmable option select procedure on a PS/2) will normally clear this up. Of course, replacing the battery will make this a more permanent fix.|
|Bad partition information||If the disk size is reported correctly by fdisk, but the partition information is wrong, then the data in block 1 of the physical disk has somehow been damaged.||Use fdisk to recreate the correct partition information. It's a good idea to write down or print out a hard copy of the correct partition information in case you ever have to do this step.|
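In addition to a written record of the partition information, you might keep a raw copy of the first block of the disk while the system is healthy. The sketch below assumes a POSIX dd and a 512-byte first block; save_first_block and the output path are hypothetical names.

```shell
#!/bin/sh
# Sketch: save the first 512-byte block of a disk (which holds the
# partition information) to a file. save_first_block is a hypothetical helper.

save_first_block() {
    disk=$1     # e.g. /dev/hd0
    out=$2      # backup file, ideally stored on other media
    dd if="$disk" of="$out" bs=512 count=1
}
```

For example, save_first_block /dev/hd0 /backup/hd0-block1.bin. Writing the saved block back to a disk overwrites its partition information, so treat a restore as a last resort and only for the same disk.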
- Mount the partition and the filesystem.
At this point, you have verified that the hardware is working (at least for block 1) and that a valid partition is defined for Neutrino. You now need to create a block special file for the QNX 4 partition itself and to mount the block special file as a QNX 4 filesystem:
mount -e /dev/hd0
mount /dev/hd0t79 /hd
This should create a volume called /dev/hd0t79. Depending on the state of the QNX 4 partition, the mount may or may not fail. If the partition information is correct, there shouldn't be any problem. Since the root (/) already exists (on a CD or on a remote disk on the network), we've mounted the local hard disk partition as a filesystem with the name /hd.
Your goal now is to run the chkfsys utility on the disk to examine -- and possibly fix -- the filesystem.
|If you booted from CD and you don't suspect there's any damage to the filesystem on your hard disk (e.g. the system was unable to boot because of a simple error introduced in the boot file or system initialization file), you can set up a symbolic link to your hard disk partition in the process manager's in-memory prefix tree:
ln -sP /hd /
If you run this command, you can skip the rest of this section.|
If the mount fails, the first portion of the QNX 4 partition is probably damaged (since the driver will refuse to mount what it considers to be a corrupted filesystem).
In this case, you can use the dinit utility to overlay enough good information onto the disk to satisfy the driver:
dinit -hr /dev/hd0t79
The -r option tells dinit to rewrite:
- the root block
- the bitmap (with all blocks allocated)
- the constant portions of the root directory
You should now be able to reissue the mount command and once again try to create a mountpoint for a QNX 4 filesystem called /hd.
After doing this, you'll need to rebuild the bitmap with chkfsys, even on a good partition.
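The mount and dinit steps above can be sketched as one shell function. This is only an illustration using the commands shown in this section; the function name and its arguments are hypothetical. Because dinit -hr rewrites part of the disk, you would probably want to confirm by hand before letting a script reach that step.

```shell
#!/bin/sh
# Sketch: try to mount a QNX 4 partition, falling back to dinit -hr once.
# mount_qnx4_partition is a hypothetical helper name.

mount_qnx4_partition() {
    disk=$1     # e.g. /dev/hd0
    part=$2     # e.g. /dev/hd0t79
    mnt=$3      # e.g. /hd

    # Enumerate the partitions so the partition's block special file appears.
    mount -e "$disk" || return 1

    # First attempt: mount the partition as it is.
    if mount "$part" "$mnt"; then
        echo "mounted $part on $mnt"
        return 0
    fi

    # The mount failed: rewrite the root block, the bitmap, and the
    # constant portions of the root directory, then try once more.
    echo "mount failed; running dinit -hr $part" >&2
    dinit -hr "$part" || return 1
    if mount "$part" "$mnt"; then
        echo "mounted $part on $mnt after dinit; rebuild the bitmap with chkfsys"
        return 0
    fi
    echo "could not mount $part" >&2
    return 1
}
```

For the disk used in this section, the call would be mount_qnx4_partition /dev/hd0 /dev/hd0t79 /hd.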
At least a portion of your QNX 4 filesystem should now be accessible. You can use chkfsys to examine the filesystem and recover as much data as possible.
If the hard disk is mounted as /hd (e.g. the machine boots from CD), enter:
chkfsys /hd
If the hard disk is mounted as / (e.g. a network boot), enter:
chkfsys /
In either case:
- If possible, you should run chkfsys from somewhere other than the filesystem that you're trying to recover.
- Make note of any problems reported and allow chkfsys to fix as much as it can.
What you do next depends on the result of running chkfsys.
If chkfsys can't repair your disk, you might still be able to use spatch (see above) to patch your files and directories by hand. In some cases, you may need to reinstall Neutrino and restore your disk from your backup files.
If significant portions of the filesystem are irreparably damaged, or important files are lost, then restoring from backup might be your best alternative.
If your filesystem is intact, yet the machine still refuses to boot from hard disk, then either of the following is probably damaged:
- the partition loader program in physical block 1
- the Neutrino loader in the first block of the QNX 4 partition
To rewrite a partition loader, use fdisk:
fdisk /dev/hd0 loader
To rewrite the QNX loader, use dinit:
dinit -b /dev/hd0t79
You should now be able to boot your system.