The correlator archive now lives on Pawsey's data store. (You'll need your own Pawsey account for access.)
If you log in to the data store via your web browser, under “Tools” there is a data management script available for command line use: pshell.
archivec.py is a wrapper script for pshell that automatically tars up and transfers the data required for the archive:
archivec.py $CORR_DATA/<expname> /projects/VLBI/Archive/LBA/<exp_parent>
where <exp_parent> is the overall project code, not the <expname> for this session (e.g. the 'v534' in 'v534a'). You must first log in to pshell and set up a session delegate. E.g.
pshell login delegate 7
will prompt for your username and password, then set up passwordless delegate access for the next 7 days.
N.B. it is advisable to run transfers inside a screen session, as they can take a while.
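For example, one way to do this (the session name is arbitrary):
screen -S archive     # start a named screen session
archivec.py $CORR_DATA/<expname> /projects/VLBI/Archive/LBA/<exp_parent>
Detach with Ctrl-a d while the transfer runs, and reattach later with screen -r archive to check on progress.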
To archive data by hand (deprecated):
pshell.py
pawsey:offline> login
Username: hbignall                        # insert your Pawsey user name here
Password:
pawsey:online> cd VLBI/Archive/LBA/<exp_parent>   # changes to this directory on the data store
pawsey:online> lcd /path/to/local/data    # changes the local working directory
pawsey:online> put ./                     # will transfer the whole <expname> directory to the data store
pawsey:online> logout                     # when finished, to prevent someone else using your login
pawsey:offline> exit
The FITS files (with an md5sum) need to be uploaded to ATOA. There is a script to generate the md5sum file and upload it and the FITS file to the appropriate place:
archatoa.py *FITS*
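The md5sum file the script generates is just a standard checksum of each FITS file; the manual equivalent would be something like the following (the file naming here is indicative only, the script chooses its own):
md5sum <expname>.FITS > <expname>.FITS.md5   # checksum uploaded alongside the FITS file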
Be careful to upload only the production FITS files: removing errant files from ATOA is problematic, so it is recommended to leave a cooling-off period before running archatoa.py. From Pawsey, archatoa.py requires ssh tunneling to access the ATNF web site; you can set that up to happen automatically.
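One way to make the tunnel automatic is a LocalForward entry in your ~/.ssh/config, so the forward is established whenever you connect. The host names and port below are placeholders, not the actual ATNF setup:
# ~/.ssh/config sketch; replace the hypothetical host names with the real ones
Host atnf-web-tunnel
    HostName gateway.atnf.csiro.au           # hypothetical ssh gateway at ATNF
    User your_atnf_username
    LocalForward 8080 www.atnf.csiro.au:80   # local port 8080 forwards to the ATNF web server
With such an entry in place, ssh -N -f atnf-web-tunnel opens the tunnel in the background before running archatoa.py.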
The pipeline outputs are distributed to PIs via the wiki. lba_feedback.py will create the wiki page, and archpipe will automatically send the archive plots and the wiki page to the wiki. The wiki pages are linked from the correlator records spreadsheet (LBA or Auscope).
cd $PIPE/<expname>/out
lba_feedback.py <expname>.wikilog > <expname>.txt
archpipe <expname>
As with archatoa.py, running archpipe from Pawsey requires ssh tunneling to access the ATNF web site; the same setup covers both.
Once archived, log in to My Data at https://data.pawsey.org.au, locate the FITS files (under VLBI/Archive/LBA/), and make them public. Email the link to the PI.
Once the correlated data are verified and released, the baseband data should be deleted. On Pawsey systems, don't use the normal 'rm' command, as the overheads are likely to cause file system problems; see https://support.pawsey.org.au/documentation/display/US/Tips+and+Best+Practices
There is a convenient alias (rml) defined in ~cormac/.bash_aliases. To use it, e.g.:
. ~cormac/.bash_aliases
cd ~/scratch/corr/baseband
rml v454b-??
will efficiently delete the baseband data directories for v454b.
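If you prefer not to source someone else's aliases, a function with the same effect might look like the sketch below. This assumes rml follows Pawsey's recommendation of unlinking files with Lustre's munlink rather than rm; it is not the actual contents of ~cormac/.bash_aliases:
rml () {
    # Unlink files with munlink (avoids rm's costly metadata operations on
    # Lustre), then remove the emptied directory tree, deepest first.
    local dir
    for dir in "$@"; do
        find "$dir" -type f -print0 | xargs -0 munlink
        find "$dir" -depth -type d -exec rmdir {} \;
    done
}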
If you are using espresso.py for correlation and archivec.py for archiving, the following is all done automatically.
Additional files for pulsar modes only:
Jobs may live in subdirectories, and files from different jobs may have identical filenames within their respective subdirectories, so for accountability it is important to keep the directory structure.
Ideally we want to keep all relevant files for all production jobs.
NB: the following is mainly relevant for old versions of DiFX (pre DiFX-2). In some cases the SWIN format output data may be impractically large for online storage, and it may not be desirable to keep this intermediate-stage data. An example is output from DiFX 1.5 when a full band is correlated at high spectral resolution but the user only wants the subset of the band containing the spectral lines. In this case the output FITS data will generally be a manageable size, but the SWIN output data in the .difx directories will typically be at least several times larger (e.g. covering 16 MHz when the region of interest is only 2 MHz wide).
It may be useful to keep all jobs (clock search and test as well, e.g. to have some data at higher spectral resolution for checking). Clock search jobs are usually in a “clocks” subdirectory, but it won't necessarily be obvious which jobs are test or final production; test jobs can be moved manually to a subdirectory (e.g. “test”). Other (dated) subdirectories may exist for production jobs, especially where multiple runs were needed. Note: running with espresso now creates a comment.txt file containing a description of each job (the operator is prompted to edit the file at completion of correlation). Espresso also allows test jobs to be flagged when run, with their output written to a “test” subdirectory in the output data area.
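As an illustration only (names indicative, not prescriptive), an experiment's output area might end up looking like this:
<expname>/
    clocks/            # clock search jobs
    test/              # test jobs (moved here by hand, or written here by espresso)
    2015-06-12/        # dated subdirectory from a production re-run
    comment.txt        # job descriptions written by espresso
    <expname>_1.input  # final production job files
    <expname>_1.difx/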
Not useful to keep: