====== How to load XRAID disk sets ======
  
  * **Note:** as we seldom (never?) use NFS mounting now, it is usually OK to simply unmount the Xraids, power off, change disks, power on and mount the new disks. Otherwise, the following instructions may be useful.
  * Ensure no one has left open files on the disks: run ''lsof'' in the directories where the data is located.
  * You may also want to see whether anyone is using the machine. Use the ''w'' command.
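For example (''/exports/xraid01/l_1'' is just a placeholder for wherever the data is mounted):
<code>
> lsof /exports/xraid01/l_1
> w
</code>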
  * Disable NFS on the computer to which the destination xraid is attached (may not be necessary, but always check). The safest way to do this, especially on cuppa02 (as it also exports /home and /nfs/apps), is to edit the NFS exports table ''/etc/exports'' and comment out the xraid entries (by adding a '#' at the start of each line), then reload the NFS server (hopefully this will not cause any interruption to its operation):
<code>
> sudo /etc/init.d/nfs-kernel-server reload
</code>
    * We don't typically use NFS to export disk data, but we can by simply editing /etc/exports.
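For illustration only (the path, host and options below are made up), a commented-out xraid entry in ''/etc/exports'' would look something like:
<code>
# /exports/xraid01/l_1  cuppa*(rw,no_root_squash)
</code>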
    * If one of your partitions IS NFS mounted at the time, this may just produce errors. In this case you may have to manually figure out which node(s) the partition is mounted on. I don't know if there's an easy way to do this. You'll need to visit each node, check if the partition is mounted (eg ''ls /nfs/xraid0?/?_?''), and if so, unmount it (''sudo umount /nfs/xraid0?/?_?'').
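For example, on each node (xraid01 and l_1 are placeholders for the actual array and partition):
<code>
> mount | grep xraid01
> sudo umount /nfs/xraid01/l_1
</code>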
  * Stop the disks and unmount the file systems.
<code>
> sudo umount /exports/xraid0X/*
</code>

  * If any of the xraid file systems cannot be unmounted, you will have to figure out which process is hanging on to it before proceeding.
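If something is still holding a file system open, ''fuser'' can list the processes using it (the path is just an example):
<code>
> sudo fuser -vm /exports/xraid01/l_1
</code>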
  * Power down the xraid (this could be done in software from the comfort of your office through "XRAID Admin Tools", which aren't yet installed).
  * Remove disks and replace in appropriate cases in the correct order!
  * Insert new disks, #1 on the left, #7 on the right. Check contacts on disks (and in XRAID if possible) before inserting into the XRAID chassis. Disks slide in and then require a final push; you should hear them lock (thump!) into place, at which stage, with the handle depressed, they will be flush with the chassis.
  * Power on the XRAID. This should be done in the cluster room so you can watch for alarms/red lights. The power button is on the BACK of the chassis. Hold it in for a couple of seconds.
  * If all disks come up green, you can go back to your office! Otherwise, open XRAID Admin Tools and identify the source of the problem. If a disk has failed, it might be worth trying to power-cycle and re-insert. Sometimes the disk itself is fine, but not properly seated in the chassis. If a spare disk is required, you should be able to insert it in place of the failed one (even while the chassis is running) and the array will automatically rebuild using the new disk. If using an OLD disk or trying to re-insert an improperly seated disk, you may have to explicitly make it available to the array, because it will already contain RAID information. See [[rebuild|Rebuilding XRAIDs]].
  * Once the XRAID is running, you can reload the SCSI devices and mount their file systems. //There should be no reason to reboot the host node//.
 +
<code>
> sudo /nfs/apps/vlbi/refresh_xraid
</code>
  
  * The program will print out a list of detected APPLE devices, and any error messages if the refresh was not successful.
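To double-check that the xraid SCSI devices are visible again, listing the partition tables should now show them alongside the internal disks:
<code>
> sudo fdisk -l
</code>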
  * Create the mount points for the xraid (if the SCSI devices have changed, we'll need new mount points accordingly).
  
<code>
> sudo /nfs/apps/vlbi/udevrules.pl
</code>
  
  * Mount the new file systems using:

<code>
> sudo mount /exports/xraid0X/?_?
</code>

where ?_? is l_1, l_2, l_3, r_1, r_2, r_3 as required (left and right referring to disk banks in a chassis).
    * We //believe// that this will mount the first left device in **l_1** and so forth. However, you should check this by ensuring that the data you expect to be on the device is in fact there before you try doing anything with it! ABSOLUTELY NO WARRANTY and all that.
    * If you are concerned, try mounting ''/dev/sd?1'' to the exports directory you want it to be, and make sure the left/right set of lights come on when you run the command. I believe devices always come up in the same order within a disk set, but disk sets are not always recognised in the same order (left-right, right-left, alternating) in a chassis.
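A minimal sketch of such a manual mount (the device name ''/dev/sdc1'' is purely an example; verify which device is which before trusting it):
<code>
> sudo mount /dev/sdc1 /exports/xraid01/l_1
</code>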
  * Now restart NFS if desired, by editing ''/etc/exports'' and removing the ''#'' character from the start of each xraid line, then run
<code>
> sudo /etc/init.d/nfs-kernel-server reload
</code>
and on each node where you need to use these data, mount the device over NFS:
<code>
> mount /nfs/xraid0?/?_?
</code>
(Note that sudo access is not required for this.)
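A quick sanity check that the NFS mount worked (example path):
<code>
> df -h /nfs/xraid01/l_1
</code>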
  
**Tip:** parallel-ssh is useful for NFS mounting on multiple nodes.
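For example, assuming a file ''cuppa_hosts'' listing the node names (the file name and the mount point are only examples):
<code>
> parallel-ssh -h cuppa_hosts -i "mount /nfs/xraid01/l_1"
</code>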
\\
  
[[xraid|Back to XRAID menu]]
  