These instructions are for data transfer from Parkes (pam-store-ext, pkvsi1-ext, pkvsi2-ext, pkstore-ext), ATCA (cavsi1-ext, cavsi2-ext) and Mopra (mpvsi1-ext, mpvsi2-ext).
Requirements:
ssh authentication key setup:
In order to use globus GridFTP to push data to lba-1 (or vlbi-data or Pawsey) via your own user account, it helps to have ssh keys set up.
On each recorder host that you need to transfer data from (only if each has its own independent home directory), use ssh-keygen to generate your own ssh keys with a passphrase that you can remember. ssh-keygen will prompt for a filename to save to; use a new, unique file in the ~/.ssh area, e.g. "myname_rsa". Append the public key to your authorized_keys on the destination machine with e.g. ssh-copy-id -i ~/.ssh/myname_rsa.pub user@lba-1.it.csiro.au.
After this you should be able to ssh from the recorder host to lba-1 without needing to enter your password.
To use your keys on the recorder host, type:
ssh-agent bash
ssh-add ~/.ssh/myname_rsa
ssh-add will ask for your passphrase. After this you should be able to log into lba-1 without being prompted for your passphrase.
Doing this within screen will save you having to repeat this every time you log in (man screen for info on screen) and will permit long-running transfers.
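Put together, the sequence on a recorder host looks roughly like this (the key filename and username are placeholders):
<code>
screen                                      # so the agent session survives (see above)
ssh-keygen                                  # save to a new file, e.g. ~/.ssh/myname_rsa, with a passphrase
ssh-copy-id -i ~/.ssh/myname_rsa.pub user@lba-1.it.csiro.au
ssh-agent bash
ssh-add ~/.ssh/myname_rsa                   # prompts for the passphrase once
ssh user@lba-1.it.csiro.au                  # should now connect without a password prompt
</code>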
Note that the command gcopy described below is a bash alias.
To save typing your password every time you log into a recorder host from venice, you may also want to append your own ssh public key from your ATNF user account to the vlbi account's ssh authorized_keys on the recorder hosts.
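For example (the recorder host and key file here are illustrative; substitute your own):
<code>
# from your ATNF account on venice
ssh-copy-id -i ~/.ssh/id_rsa.pub vlbi@pkvsi1-ext
</code>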
After each run (including mini-runs):
How to transfer data:
Use screen (see above) and gcopy (available on most VLBI machines). You can also use parallelisation, e.g. -p 2, or UDT with -udt (not both!) to speed up transfers. The environment variable $GLOBUS_LOCATION should be set (add it to .mybashrc and source this from .bashrc, if not already there).
Destination directories:
/scratch/pawsey0169/transfers/{year}/{experiment}/{antenna} (at Pawsey)
/datasets/data_cass/data/{year}/{experiment}/{antenna}/ (on lba-1)
/data/lba/{year}/{experiment}/{antenna}
Data should end up in {experiment}/{antenna}/{DASn} directories. For example:
cd ~/transfers
ln -s /path/to/data
ln -s /path/to/other/data
gcopy file://$PWD/ sshftp://user@remote//remotedir/
where gcopy is a bash function: globus-url-copy -cd -r -sync -sync-level 1 -restart -vb -c $@
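A minimal sketch of how this might be defined, e.g. in your .mybashrc (the actual definition on each machine may differ slightly):
<code>
# wrapper around globus-url-copy with the options listed above
gcopy () {
    globus-url-copy -cd -r -sync -sync-level 1 -restart -vb -c "$@"
}
# e.g. with two parallel streams to lba-1 (username and destination path illustrative):
gcopy -p 2 file://$PWD/ sshftp://user@lba-1.it.csiro.au//datasets/data_cass/data/{year}/{experiment}/{antenna}/
</code>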
Data may need to be rearranged from {antenna}/{experiment} into the correct {experiment}/{antenna} directories. Note the following is specific to Pawsey, but could also be made available on lba-1: to help with this rearrangement, Chris has a script ~cphillips/bin/movedir.sh which works with the alias 'movedir' (to be run from within the experiment directory where the data are currently located). To use it, you can add alias movedir='. ~cphillips/bin/run_movedir.sh' to your own ~/.profile or similar, and add Chris' ~/bin directory to your path (add the line export PATH=/home/cphillips/bin:$PATH to your .bashrc).
Run a file summary script (e.g. ~cphillips/bin/filesum.sh at Pawsey) on both the copy and the original data at the observatories to compare the two sets of data (check that the total data volume, number of files, smallest and largest files, and first and last files are the same). Once this comparison is made, use the alias "movedir" on the original data (at the observatories; the alias is already set in the vlbi account) to move it to a directory called "checked".
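If filesum.sh is not available at one end, a rough manual equivalent of the checks above might be (generic commands, not Chris' script):
<code>
du -sb .                      # total data volume in bytes
find . -type f | wc -l        # number of files
ls -lS | sed -n 2p            # largest file
ls -lS | tail -1              # smallest file
ls | head -1 ; ls | tail -1   # first and last files
</code>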
Medusa records data to a FlexBuff recorder (pkstore). Log in with ssh vlbi@pkstore.atnf.csiro.au (via venice if necessary; the address may also be pkstore-ext depending on where you connect from).
Data are stored on a VBS (virtual) file system, which you need to access with the vbs command set and/or jive5ab.
To get a file listing:
vbs_ls -lrth vt031a*
To transfer, make sure jive5ab is running at both source and destination, then use m5copy:
m5copy --resume -p 40040 vbs://localhost/vt031a* file://lba-1.it.csiro.au:40100/datasets/data_cass/data/2021/vt031a/md/
There is also an alias jcopy on pkstore which sets --resume and runs under a retry loop in case the connection falls over. This is very convenient for large unattended transfers.
The Mark6 recorder (kasi-mark6) is used mostly for the OCTAD backend.
ssh oper@kasi-mark6.atnf.csiro.au
Data are typically found under /mnt/mark6g* and appear as a more-or-less normal file system, as mk6fuse is used (YMMV). Use du --apparent-size to check file sizes.
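For example (the path pattern is illustrative):
<code>
du --apparent-size -sh /mnt/mark6g*/*
</code>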
Use jive5ab to transfer data, similar to the procedure for Medusa (pkstore) above, either accessing the files over the fuse mount or directly from the mark6.
jcopy is an alias for restart "/home/oper/cormac/bin/m5copy -t 40 -m 1500 --resume $args" 5
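For reference, a guess at the shape of such a 'restart' wrapper (the actual script may differ):
<code>
# retry the quoted command up to N times, pausing between attempts
restart () {
    local cmd="$1" tries="$2" i
    for i in $(seq 1 "$tries"); do
        eval "$cmd" && return 0   # stop as soon as the copy succeeds
        sleep 30                  # brief pause before retrying
    done
    return 1
}
</code>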
jcopy -udt -p 40050 file://localhost:2621${file} file://150.229.194.15:40100/datasets/data_cass/data/2022/bp258a/mp/if2/
or
jcopy -udt -p 40050 mk6://localhost:2621/13/${file} file://150.229.194.15:40100/datasets/data_cass/data/2022/bp258b/mp/if1/
The /13 in the mk6 URL indicates that data are on diskpacks 1 and 3 in this case.
m5copy on the route to lba-1 is quite flaky with frequent timeouts. The jcopy alias for m5copy therefore uses a long timeout (-t 40) and it may be helpful to transfer one file at a time using a restart loop for failed transfers:
for file in $(cat bp258b.filelist) ; do (jcopy -udt -p 40050 mk6://localhost:2621/12/${file} file://150.229.194.15:40100/datasets/data_cass/data/2022/bp258b/mp/if1/) ; done
The below assumes firewall settings allow traffic on the relevant IP addresses/ports. You can test with traceroute (uses UDP by default), e.g. traceroute -p 40070 {IP address} (use the -T option for TCP).
Use jive5ab. This should always be running already on the JIVE end; coordinate with Martin/Bob/e-Bob. If it is not running already on lba-1 or vlbi-data (check with lsof -i), start it in screen with jive5ab. Then run m5copy with something like:
m5copy -udt -p 40001 file://$PWD/*.m5b vbs://{IP address of flexbuff at JIVE}:40000/
See m5copy -h
for more options/usage. For file to file transfers, use something like:
m5copy -udt -p 40001 --resume file:///OSM/PER/CASS_LBA/data/2018/{path/to/data}/* file://{remote IP}/path/to/data_directory/
There is an alias jcopy on both lba-1 and vlbi-data that runs m5copy with sensible parameters and under a 'retry' loop that is convenient for large unattended transfers.
NB wildcards are supported for SRC file URIs only if m5copy is running on the local machine (remote wildcards are not supported).
Use either jive5ab (see below) or tsunami. (jive5ab needs to be running at both ends.) Coordinate with Gabriele Bruni re. Radioastron data transfers to Bonn.
To run the tsunami client, log in to the remote machine with ssh evlbi@{IP address of machine to send data to} (NB this requires that ssh keys are set up).
Then, e.g. if 350 Mbps has been reserved for the transfer:
> cd /data{directory where data are to go}
> tsunami set port 40070 set udpport 40070 rate 350m connect cass-01-per.it.csiro.au
> dir
> get *
(dir lists the available files, just to make sure the server is working)
We normally use the following settings in /etc/sysctl.conf to get better performance in both TCP and UDT transfers.
# increase TCP max buffer size
net.core.rmem_max = 201326592
net.core.wmem_max = 33554432
# increase Linux autotuning TCP buffer limits
# min, default, and max number of bytes to use
net.ipv4.tcp_rmem = 4096 87380 33554432
net.ipv4.tcp_wmem = 4096 87380 33554432
net.ipv4.udp_rmem_min = 262144
net.ipv4.udp_wmem_min = 262144
net.ipv4.udp_mem = "536870912 805306368 1073741824"
# Marjolein recommendation for Monsterbuff @16Gbps
net.core.netdev_max_backlog = 8192
To check current settings:
sysctl -a
To set parameters on the fly:
sysctl -w variable=value
To load a new sysctl.conf after editing:
sysctl -p /etc/sysctl.conf
All data recorded by ATNF telescopes for RadioAstron experiments needs to go to , usually in the directory /OSM/PER/CASS_LBA/data/{year}/Radioastron/{experiment}. On Pawsey, the script for moving RadioAstron data to the correct subdirectory structure can be accessed using alias movera='. ~cphillips/bin/run_movera.sh' - this could be copied to cass-01-per.
The data are transferred to ASC for correlation, and need to be converted to mark5b then copied using the tsunami transfer protocol. ASC requests a copy of all Radioastron data, however some experiments are correlated at JIVE or Bonn and can also be transferred there directly (see below).
You will need to use the following programs/scripts. Make an alias or add e.g. /home/phi196/bin to your path.
<code>
/home/phi196/bin/lba2mk5b
/home/phi196/bin/tsunamid
/home/big040/bin/domd5.pl
</code>
The following assumes these are in your path.
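For example, something like this in your ~/.bashrc would cover both directories listed above (adding both permanently is an assumption; adjust to taste):
<code>
export PATH=/home/phi196/bin:/home/big040/bin:$PATH
</code>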
To convert LBADR data to mark5b change directory to where the data are and run lba2mk5b, e.g.
<code>
> cd transfers/2016/radioastron/rk12ol/ATCA
> lba2mk5b *.lba
</code>
If you wish to keep all the mark5b data in a subdirectory, add the option -o mark5b. You must create the directory by hand first.
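For example (following the note above that the directory must exist first):
<code>
> mkdir mark5b
> lba2mk5b -o mark5b *.lba
</code>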
ASC insists on md5sums calculated for all files; domd5.pl will do this for you. It works on multiple directories recursively, only runs md5sum on files with a .m5b extension, and checks whether the md5 sum has already been calculated for a particular file. This means you can rerun domd5.pl if more data are added or the script is interrupted. Note, however, that if a file is updated you will have to edit the dir.md5 file by hand to remove that file's entry (or remove the dir.md5 file to redo the whole directory).
To md5sum a number of directories at the same time run:
<code>
> domd5.pl rg14e rgs14d rgs14e
</code>
If you do not pass any directories on the command line, it does nothing.
Once the mark5b files are created and md5sums run, you need to transfer the data using Tsunami. Multiple directories can be transferred at the same time. Run something like:
<code>
> tsunamid --port=40070 rg14e/*/*.m* rg12ol/*/*.m*
</code>
The exact command will depend on the directory structure used and number of experiments/stations transferred at once. Note the above should make the .m5b data files and .md5 md5sum files available.
Finally, send an email to ASC staff (currently Dmitry) to let them know which experiments/stations you have made available, and note the Tsunami port used (40070 above; you can choose anything between 40000 and 40100) and which machine (e.g. cass-01-per.it.csiro.au). Request that Dmitry lets you know when the transfer is completed. It can take 24 hrs or more for the transfer to start, so run tsunami from a VNC or screen session.
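For instance, a minimal way to leave the server running unattended (the session name is arbitrary):
<code>
> screen -S tsunami
> tsunamid --port=40070 rg14e/*/*.m*
# detach with Ctrl-a d; reattach later with: screen -r tsunami
</code>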