
How to load XRAID disk sets

  • Ensure no one has files open on the disks: run lsof on the directories where the data is located.
  • You may also want to see whether anyone is using the machine. Use the “w” command.
  • If necessary, disable NFS on the computer to which the destination XRAID is attached. I don't know how to do this safely on cuppa02, because it removes /nfs/apps and /home as well.
    • We don't typically use NFS to export disk data, but we can by simply editing /etc/exports.
  • Stop the disks and prepare to power down the xraid:
    • This should be done by unmounting the file systems and removing the SCSI device entries. Each of the cuppa XRAID nodes has a script designed to do this (each script is specific to its machine because the device IDs can differ). The script must be run from a root shell because it requires access to the /proc file system.
> sudo tcsh
# /root/unload_cuppa01.csh (replace 01 with the number of the node you are using)
# exit
  • If any of the xraid file systems cannot be unmounted, the script exits and you will have to figure out what is hanging on to the file system before proceeding.
  • If (for any reason) you need to manually unmount one of the xraid file systems without removing the associated SCSI device, use the following:
sudo umount /exports/xraid/*
  • To make sure that the devices have been properly removed, try:
 sudo fdisk -l

You should only see the internal disks.
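
The check above can be scripted. The following is a minimal sketch and not one of the existing /root scripts: it assumes that fdisk -l introduces each device with a line of the form "Disk /dev/sdX: ...", and both the function name list_disks and the internal disk name /dev/sda are hypothetical.

```shell
#!/bin/sh
# list_disks: read `fdisk -l` output on stdin and print one device per line.
# Assumes each device is introduced by a line starting "Disk /dev/...:".
list_disks() {
    awk '$1 == "Disk" && $2 ~ /^\/dev\// { sub(/:$/, "", $2); print $2 }'
}

# Usage (hypothetical; assumes /dev/sda is the only internal disk):
#   sudo fdisk -l 2>/dev/null | list_disks | grep -v '^/dev/sda$'
# Any device printed by that pipeline is still attached to the node.
```

If the pipeline in the usage comment prints nothing, only the (assumed) internal disk remains visible and the XRAID can be powered down.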

  • Go to the cluster room. Power down the xraid (this could be done in software from the comfort of your office through “XRAID admin tools”, which isn't yet installed).
  • Remove the disks and put them back in their appropriate cases, in order!
  • Insert the new disks, #1 on the left, #7 on the right. Check the contacts on the disks (and in the XRAID if possible) before inserting them into the XRAID chassis. Disks slide in and then require a final push. You should hear them lock (thump!) into place, at which stage, with the handle depressed, they will be flush with the chassis.
  • Power on XRAID. This should be done in the cluster room so you can watch for alarms/red lights. The power button is on the BACK of the chassis.
  • If all disks come up green, you can go back to your office! Otherwise, open XRAID admin tools and identify the source of the problem. If a disk has failed, it might be worth trying to power-cycle and re-insert. Sometimes the disk itself is fine, but not properly seated in the chassis. If a spare disk is required, you should be able to insert it in place of the failed one (even while the chassis is running) and the array will automatically rebuild using the new disk. If using an OLD disk or trying to re-insert an improperly seated disk, you may have to explicitly make it available to the array, because it will already contain RAID information. See Rebuilding XRAIDs.
  • Once the XRAID is running, you can reload the SCSI devices and mount their file systems. There should be no reason to reboot the host node.
> sudo tcsh
# /root/load_cuppa01.csh (replace 01 with the number of the node you are using)
# exit
  • The script will print out a list of each file system that was successfully mounted (and an error message for everything that did not mount). Hopefully the list of mounted devices will match your expectations. If not, take a good look at the error messages.
  • If you need to mount any of the file systems manually, trust the system! Mount devices using
    sudo mount /exports/xraid/?_?

    where ?_? is l_1, l_2, l_3, r_1, r_2 or r_3 as required (l and r refer to the left and right disk banks in a chassis). We believe that this will mount the first left device on l_1 and so forth. However, you should check this by ensuring that the data you expect to be on the device is in fact there before you try doing anything with it! ABSOLUTELY NO WARRANTY and all that. If you're concerned, mount them manually in the cluster room, mounting /dev/sd?1 on the exports directory where you want it, and making sure the left or right set of lights comes on when you run the command. I believe devices always come up in the same order within a disk set, but disk sets are not always recognised in the same order (left-right, right-left, alternating) in a chassis.

  • Sometimes mounting doesn't work: no matter how long you wait or how many times you try, it complains that there is no valid file system on the partition. Until we figure out a better solution, it may be necessary to restart the machine if it refuses to come good. Hopefully this will no longer be a problem now that we have scripts to properly reload the SCSI devices.
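
The six mount point names used for manual mounting can be checked in one pass after a load. This is a sketch under assumptions: it reads a mount table in /proc/mounts format (device first, mount point second on each line), the function name report_xraid_mounts is hypothetical, and it only reports whether something is mounted at each point, not whether it is the disk set you expect.

```shell
#!/bin/sh
# report_xraid_mounts: read a mount table (/proc/mounts format) on stdin and
# print, for each expected XRAID mount point, the device mounted there,
# or MISSING if nothing is mounted at that point.
report_xraid_mounts() {
    awk '
        { dev[$2] = $1 }   # remember which device is mounted at each mount point
        END {
            n = split("l_1 l_2 l_3 r_1 r_2 r_3", name, " ")
            for (i = 1; i <= n; i++) {
                mp = "/exports/xraid/" name[i]
                print mp, ((mp in dev) ? dev[mp] : "MISSING")
            }
        }
    '
}

# Usage: report_xraid_mounts < /proc/mounts
```

You still need to look at the data on each file system to confirm that the left and right banks ended up where you intended.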

NFS mounting

  • It is now possible to mount the XRAIDs over NFS as any user, with e.g. mount /nfs/xraid01/l_1. To make life easier, there are shell scripts to do it in /home/corr/LBA/scripts/: mountxraids.sh and unmountxraids.sh.
  • On cuppa01, type cssh cuppa to get a multi-window command prompt, so the scripts can be run simultaneously on all cuppas.
  • To change disks, after unmounting from NFS, it seems to be necessary to restart the nfs server on the local host node:
    sudo /etc/init.d/nfs-kernel-server restart

    in order to

    umount /exports/xraid01/l_1

    etc. without getting “device busy” errors. Be a bit careful doing this: try to check that no one is running anything first. If other people are logged on, they may get stale file handle errors.
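
Before restarting the NFS server it helps to see who else is logged on to the node. A minimal sketch, assuming that who prints the login name in the first column of each line; the function name other_users is hypothetical.

```shell
#!/bin/sh
# other_users: read `who` output on stdin and print each login name other
# than the one given as $1, once per user. Blank lines are ignored.
other_users() {
    awk -v me="$1" 'NF > 0 && $1 != me && !seen[$1]++ { print $1 }'
}

# Usage: who | other_users "$(id -un)"
```

An empty result only means that no one else has an interactive session on that node. It does not detect batch jobs or other machines that have the file systems mounted over NFS.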



correlator/loaddisks.1226547570.txt.gz · Last modified: 2008/11/13 14:39 by chotan