Rebuilding Failed XRAIDs

If, upon start-up (or at any other time, although this is highly unlikely), you experience a disk failure in a set of VLBI disks, fear not! The disk sets are built using RAID5, which stripes data and parity across the disks so that a 7-disk set in which any one disk fails can still have all its data retrieved. The parity costs one disk's worth of storage space, but given these disks are shipped around Australia using road freight, we think it's a loss well worth taking!
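To see why a single failed disk is recoverable, here is a minimal sketch of the XOR parity idea behind RAID5. It is an illustration only, not the XRAID firmware's actual implementation: the block contents, the 4-byte block size and the helper `xor_blocks` are all made up for the example.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte (hypothetical helper)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Six made-up data blocks, one per data disk in a single stripe of a 7-disk set.
data_blocks = [bytes([i] * 4) for i in range(1, 7)]

# The seventh block in the stripe holds the parity of the other six.
parity_block = xor_blocks(data_blocks)

# Simulate losing the third disk, then rebuild its block from the
# five surviving data blocks plus the parity block.
lost_index = 2
survivors = [b for i, b in enumerate(data_blocks) if i != lost_index]
rebuilt_block = xor_blocks(survivors + [parity_block])

print(rebuilt_block == data_blocks[lost_index])
```

With two blocks missing from the same stripe there is only one parity equation for two unknowns, which is why a second failure is fatal.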

Hopefully you won't need to deal with a fully un-initialised disk set. If you do, see Creating XRAIDs.

The rebuilding process involves only the XRAID firmware, not the I/O on the Fibre Channel link, so a disk failure does not stop you from using the disk set immediately. Even so, you should begin the rebuild as soon as you detect a failure.

  • First, note that most disk failures are NOT due to broken or damaged disks, but to poor seating in the chassis or dirt on the contacts.
  • Power down the XRAID.
  • Remove the faulty disk. Inspect all contact pins on the disk and in the chassis, and clean as necessary. Reinsert the disk carefully.

NOTE: simply removing a disk, or having a red light, is enough to destroy the disk's membership in the RAID. So regardless of whether cleaning and reinserting the disk is successful (i.e. you can select the disk in the XRAID admin tools), you will still need to rebuild the array. THE RED LIGHT WILL NOT GO OUT, but the alarm does turn off.

  • If the disk really does appear to have failed, you may need to substitute a new disk. However, there are no spare 750GB disks, and Curtin does not have spare disks of any other size either. Contact ATNF if you need a disk, but there may not be any available.
  • In the Disk Utilities menu in the XRAID admin tools, select the faulty disk and choose Make Available.
  • If the XRAID recognises an available disk alongside a degraded array, the rebuilding process will commence automatically. When rebuilding a disk set, reslicing, repartitioning and reformatting are not necessary, thanks to the clever RAID5 protocol!
  • Rebuilding a 750GB set requires approximately 24 hours (compared to a from-scratch build time of over 48 hours). During this time you can access your data as usual, although with some loss of speed.
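As a rough back-of-envelope check on the figures above, the sketch below works out the usable capacity of a set and the average rebuild rate. It assumes (these are assumptions, not statements from the XRAID documentation) that a set is seven 750GB disks, that GB means decimal gigabytes, and that a rebuild rewrites one full disk in about 24 hours.

```python
DISK_SIZE_GB = 750    # per-disk size quoted above (decimal GB assumed)
DISKS_IN_SET = 7      # assumed: the 7-disk RAID5 set described earlier
REBUILD_HOURS = 24    # approximate rebuild time quoted above


def average_rate_mb_per_s(gigabytes, hours):
    """Average transfer rate in MB/s for moving `gigabytes` in `hours`."""
    return gigabytes * 1000 / (hours * 3600)


# RAID5 gives up one disk's worth of space to parity.
usable_gb = (DISKS_IN_SET - 1) * DISK_SIZE_GB

# Average rate at which the replacement disk is filled during a rebuild.
rebuild_rate = average_rate_mb_per_s(DISK_SIZE_GB, REBUILD_HOURS)

print(f"Usable capacity: {usable_gb} GB")
print(f"Average rebuild rate: {rebuild_rate:.2f} MB/s")
```

Under these assumptions the rebuild averages under 10 MB/s, which is consistent with the array staying usable, if slower, while it rebuilds.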

You will lose all your data only if two or more disks fail on the same start-up. Insert disks carefully!

Back to XRAID menu

correlator/rebuild.txt · Last modified: 2008/11/07 12:35 by chotan