User parameters - Archiving

The final stage of the pipeline involves preparing data for storage in CASDA, the CSIRO ASKAP Science Data Archive. This involves four steps:

  • Diagnostic plots are created. These are intended to be used for Quality Analysis & validation. Currently the script that does this is only a prototype, producing both greyscale plots of the continuum images, with weights contours and the component catalogue overlaid, and greyscale plots of the noise maps produced by Selavy. See examples on Validation and Diagnostics.

  • All images are converted to FITS format. FITS is the format required for storage in CASDA - the ASKAPsoft tasks are able to write directly to FITS (IMAGETYPE_CONT etc), but if CASA-format images are created, an additional step is required. At the same time, the FITS header is given the following keywords: PROJECT (the OPAL project code), SBID (the scheduling block ID), DATE-OBS (the date/time of the observation), DURATION (the length of the observation). Additionally, information on the version of the askapsoft, askappipeline and aces software is added to the HISTORY.

  • All two-dimensional images have two “thumbnail” images made. The thumbnail image is designed as a “quick-look” image that can be incorporated into the CASDA search results. Two sizes are made - a small one for easy display, and a larger one to provide a little more detail. The greyscale limits are chosen from a robust measurement of the overall image noise, and are -10 to +40 sigma by default. The format is currently PNG.

  • The Casda Upload Utility is run, copying the relevant data products to the staging directory, and producing an XML file detailing the contents of this directory. The files copied are all FITS files conforming to a particular pattern, any Selavy catalogues, measurement sets (just the averaged MSs, using the calibrated ones if they have been made), the thumbnail images, and the stats summary file detailing how the processing tasks went.
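
As a rough illustration of the thumbnail step, the sketch below derives greyscale limits from a noise estimate and builds the two thumbnail file names. It assumes shell-style variables named after the thumbnail parameters tabulated on this page; the rms value, its units, the input file name and the loop itself are made up for illustration and are not the pipeline's own script.

```shell
# Hypothetical sketch only: not the pipeline's thumbnail script.
THUMBNAIL_SUFFIX="png"
THUMBNAIL_GREYSCALE_MIN=-10
THUMBNAIL_GREYSCALE_MAX=40
THUMBNAIL_SIZE_INCHES="16,5"
THUMBNAIL_SIZE_TEXT="large,small"

image="image.fits"   # example input name
rms=0.05             # assumed robust noise estimate (arbitrary units)

# Greyscale limits are multiples of the overall image noise.
vmin=$(LC_ALL=C awk -v r="$rms" -v k="$THUMBNAIL_GREYSCALE_MIN" 'BEGIN { printf "%g", k * r }')
vmax=$(LC_ALL=C awk -v r="$rms" -v k="$THUMBNAIL_GREYSCALE_MAX" 'BEGIN { printf "%g", k * r }')

# Walk the two comma-separated lists in step: 16 -> large, 5 -> small.
sizes="$THUMBNAIL_SIZE_INCHES"
labels="$THUMBNAIL_SIZE_TEXT"
thumbnails=""
while [ -n "$labels" ]; do
    size="${sizes%%,*}"
    label="${labels%%,*}"
    name="${image%.fits}_${label}.${THUMBNAIL_SUFFIX}"
    thumbnails="${thumbnails}${thumbnails:+ }${name}"
    echo "${name}: ${size} inches, greyscale ${vmin} to ${vmax}"
    # Drop the entry just handled; stop once no comma remains.
    case "$labels" in
        *,*) sizes="${sizes#*,}"; labels="${labels#*,}" ;;
        *)   labels="" ;;
    esac
done
```

With the default limits, a noise of 0.05 maps to a greyscale range of -0.5 to 2 in the same units, and image.fits yields image_large.png and image_small.png.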

The following is a list of files included in the upload to CASDA:

  • Images: all FITS files that have been processed (i.e. continuum images, spectral cubes if requested, continuum cubes if requested) and whose names match the prefixes given in IMAGE_LIST. Model images used for continuum subtraction will also be included.

  • Additional images produced by Selavy - the noise map, and the component map and its associated residual image.

  • Measurement sets: all continuum measurement sets are included as individual files, along with all spectral measurement sets if requested. These will be tarred for access through CASDA. The pipeline metadata directory (see How to run the ASKAP pipelines) will be copied into the measurement set, where it will appear as a directory called ASKAP_METADATA. Note that since this is a directory of text files, rather than a table, it will not be programmatically accessible in the same way as other MS tables. It is simply provided as a way of co-locating the metadata information with the rest of the MS, to ensure that a CASDA user can acquire information such as the beam footprint at the same time as the MS.

  • UV grids: if requested, the directories containing the gzipped UV grid images are included as tar files. These will be provided alongside the measurement sets through CASDA.

  • Catalogues: all Selavy catalogues created for the final mosaics, in VOTable/XML format, along with any catalogues used for the continuum subtraction (if the components method is used).

  • Evaluation files: several files are included for use with the validation:

    • An XML file listing the validation metrics from the continuum source-finding validation.

    • A tar file containing the validation directory

    • The stats files that summarise the resource usage of each job.

    • When DO_SOURCE_FINDING_BEAMWISE=true, an additional tar file is archived that contains the results of the beam-wise sourcefinding: VOTable/XML catalogues and, if made, tarred directories containing source spectra.

    • A tar file containing the processing directory structure, called calibration-metadata-processing-logs, with the following:

      • All calibration tables in a “CalibrationTables” directory

      • The diagnostics directory

      • Logs, slurm output and slurm file directories individually tarred.

      • The beam logs for the spectral cubes

      • The metadata directory

    • There is also a tar file of the diagnostics directory alone (to make it easier to access, instead of getting the larger calibration tarfile).

Each data product is assigned a project code (used to govern who is responsible for the quality validation in CASDA). This will ordinarily be the PROJECT_ID assigned to the scheduling block prior to observation, although the overall project code can be changed at pipeline-runtime. To support commensal observations, we also allow different project codes to be given for different types of data products:

  • CONTINUUM_PROJECT_ID - all continuum/MFS images, the catalogues that go with them, and any continuum validation products.

  • POLARISATION_PROJECT_ID - all continuum cubes, the polarisation catalogue and associated spectra (from RM synthesis), and polarisation validation products.

  • SPECTRAL_PROJECT_ID - all spectral-line cubes, the spectral-line catalogue (and any associated spectra/moment-maps), and the spectral validation products.

  • MS_CONT_PROJECT_ID - all averaged (continuum) measurement sets.

  • MS_SPEC_PROJECT_ID - all full-resolution (spectral) measurement sets.

If any of these are not given, they fall back to the PROJECT_ID, although the continuum/spectral MS IDs fall back to the continuum/spectral projects initially.
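
The fallback rule in the preceding paragraph can be sketched with shell parameter expansion. This assumes the configuration is a set of shell-style variables; the project codes AS000 and AS111 are invented for the example, and the pipeline's own implementation may differ.

```shell
# Hypothetical values: only the overall and continuum codes are set.
PROJECT_ID="AS000"
CONTINUUM_PROJECT_ID="AS111"
POLARISATION_PROJECT_ID=""
SPECTRAL_PROJECT_ID=""
MS_CONT_PROJECT_ID=""
MS_SPEC_PROJECT_ID=""

# Image and catalogue codes fall back to the overall PROJECT_ID.
CONTINUUM_PROJECT_ID="${CONTINUUM_PROJECT_ID:-$PROJECT_ID}"
POLARISATION_PROJECT_ID="${POLARISATION_PROJECT_ID:-$PROJECT_ID}"
SPECTRAL_PROJECT_ID="${SPECTRAL_PROJECT_ID:-$PROJECT_ID}"

# Measurement-set codes fall back first to the matching continuum or
# spectral code, which has itself already been resolved above.
MS_CONT_PROJECT_ID="${MS_CONT_PROJECT_ID:-$CONTINUUM_PROJECT_ID}"
MS_SPEC_PROJECT_ID="${MS_SPEC_PROJECT_ID:-$SPECTRAL_PROJECT_ID}"

echo "continuum=${CONTINUUM_PROJECT_ID} polarisation=${POLARISATION_PROJECT_ID} spectral=${SPECTRAL_PROJECT_ID}"
echo "ms_cont=${MS_CONT_PROJECT_ID} ms_spec=${MS_SPEC_PROJECT_ID}"
```

Here the averaged measurement sets inherit AS111 from the continuum code, while everything else ends up with the overall AS000.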

The specific products that will be archived can be selected through the use of the following parameters:

  • ARCHIVE_IMAGE_CONT - all continuum images and related catalogues

  • ARCHIVE_IMAGE_CONTCUBE - all continuum (coarse-resolution) cubes and polarisation catalogues

  • ARCHIVE_IMAGE_SPECTRAL - all spectral cubes, extracted spectra, and spectral catalogues

  • ARCHIVE_CONT_MS - all averaged (continuum) measurement sets

  • ARCHIVE_SPECTRAL_MS - all full-resolution (spectral) measurement sets
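
For example, a configuration excerpt along the following lines would stage continuum and spectral image products together with the averaged measurement sets, while leaving out the continuum cubes and the full-resolution measurement sets. It is a sketch that assumes shell-style VARIABLE=value assignments; the particular combination of values is illustrative only.

```shell
# Hypothetical configuration excerpt.
DO_STAGE_FOR_CASDA=true

ARCHIVE_IMAGE_CONT=true        # continuum images and related catalogues
ARCHIVE_IMAGE_CONTCUBE=false   # skip continuum cubes and polarisation catalogues
ARCHIVE_IMAGE_SPECTRAL=true    # spectral cubes, extracted spectra, catalogues
ARCHIVE_CONT_MS=true           # averaged (continuum) measurement sets
ARCHIVE_SPECTRAL_MS=false      # skip full-resolution measurement sets
```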

For each of these categories, one can select particular “image codes” to archive, using the ARCHIVE_IMAGECODE_CONT, ARCHIVE_IMAGECODE_CONTCUBE and ARCHIVE_IMAGECODE_SPECTRAL parameters. The possible image codes are:

  • restored - the restored image out of the Imager, before any continuum subtraction or common-PSF-convolution.

  • convrestored - the restored image after convolution to a common PSF across beams

  • altrestored - the restored image made with the “restore preconditioner” (see User Parameters - Continuum imaging)

  • contsub - the restored image after image-based continuum-subtraction

  • image - the clean model image

  • residual - the clean residual image

These parameters default to a blank string, in which case they match the equivalent MOSAIC_IMAGECODE_ parameter (see User Parameters - Mosaicking).
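
A minimal sketch of that default, assuming shell-style variables: a blank ARCHIVE_IMAGECODE_ value picks up the corresponding MOSAIC_IMAGECODE_ value, while an explicit value is kept. The values used here are examples rather than the real defaults, and a single code is given for each parameter because the list syntax is not described on this page.

```shell
# Hypothetical values for illustration.
MOSAIC_IMAGECODE_CONT="restored"
MOSAIC_IMAGECODE_SPECTRAL="contsub"

ARCHIVE_IMAGECODE_CONT=""               # blank: follow the mosaicking choice
ARCHIVE_IMAGECODE_SPECTRAL="restored"   # explicit: kept as given

ARCHIVE_IMAGECODE_CONT="${ARCHIVE_IMAGECODE_CONT:-$MOSAIC_IMAGECODE_CONT}"
ARCHIVE_IMAGECODE_SPECTRAL="${ARCHIVE_IMAGECODE_SPECTRAL:-$MOSAIC_IMAGECODE_SPECTRAL}"

echo "continuum codes to archive: ${ARCHIVE_IMAGECODE_CONT}"
echo "spectral codes to archive:  ${ARCHIVE_IMAGECODE_SPECTRAL}"
```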

There are a number of user parameters that govern aspects of these scripts, and they are detailed here.

| Variable | Default | Parset equivalent | Description |
|---|---|---|---|
| DO_DIAGNOSTICS | true | none | Whether to run the diagnostic script upon completion of imaging and source-finding. |
| JOB_TIME_DIAGNOSTICS | JOB_TIME_DEFAULT (24:00:00) | none | Time request for the diagnostic script. |
| DO_CONVERT_TO_FITS | true | none | Whether to convert the CASA images to FITS format. |
| JOB_TIME_FITS_CONVERT | JOB_TIME_DEFAULT (24:00:00) | none | Time request for the FITS conversion. |
| DO_MAKE_THUMBNAILS | true | none | Whether to make the PNG thumbnail images of the 2D FITS images. |
| JOB_TIME_THUMBNAILS | JOB_TIME_DEFAULT (24:00:00) | none | Time request for the thumbnail creation. |
| DO_STAGE_FOR_CASDA | false | none | Whether to stage data for ingest into CASDA. |
| JOB_TIME_CASDA_UPLOAD | JOB_TIME_DEFAULT (24:00:00) | none | Time request for the CASDA-upload task. |

General

| Variable | Default | Parset equivalent | Description |
|---|---|---|---|
| IMAGE_LIST | "image psf psf.image residual sensitivity" | none | The list of image prefixes that will be used for generating FITS files and determining the list of images to be uploaded to CASDA. In addition, the images image.XXX.restored and image.XXX.alt.restored (in the latter’s case, if present) will also be processed. |
| ARCHIVE_IMAGE_CONT | true | none | Whether to archive the continuum image products and catalogues. |
| ARCHIVE_IMAGE_CONTCUBE | true | none | Whether to archive the continuum cubes and polarisation catalogues. |
| ARCHIVE_IMAGE_SPECTRAL | true | none | Whether to archive the spectral image cubes and spectral catalogues. |
| ARCHIVE_CONT_MS | true | none | Whether to archive the continuum (averaged) measurement sets. |
| ARCHIVE_CONT_MS_FULLY_CALIBRATED | true | none | Whether the continuum MSs that are being archived are those calibrated with the self-cal gains, or those prior to that step. |
| ARCHIVE_SPECTRAL_MS | false | none | Whether to archive the individual full-spectral-resolution measurement sets. |
| ARCHIVE_BEAM_IMAGES | false | none | Whether the individual beam images should be included in the archiving (true) or if only the mosaicked image should be uploaded. |
| ARCHIVE_SELFCAL_LOOP_MOSAICS | false | none | Whether to archive the mosaics of the intermediate self-calibration loop images (see User Parameters - Continuum imaging and User Parameters - Mosaicking). |
| ARCHIVE_FIELD_MOSAICS | false | none | Whether to archive the mosaics for each individual field, as well as for each tile and the final mosaicked image. See User Parameters - Mosaicking for a description. |
| ARCHIVE_IMAGECODE_CONT | "" | none | Which continuum image imagecodes should be archived. Default is to match MOSAIC_IMAGECODE_CONT (User Parameters - Mosaicking). |
| ARCHIVE_IMAGECODE_CONTCUBE | "" | none | Which continuum cube imagecodes should be archived. Default is to match MOSAIC_IMAGECODE_CONTCUBE (User Parameters - Mosaicking). |
| ARCHIVE_IMAGECODE_SPECTRAL | "" | none | Which spectral cube imagecodes should be archived. Default is to match MOSAIC_IMAGECODE_SPECTRAL (User Parameters - Mosaicking). |
| PROJECT_ID | "" | <key>.project (Casda Upload Utility) | The project ID that is written to the FITS header, and used by the casdaupload script to describe each data product. This is usually taken from the SB parset, but can be given in the configuration file. The configuration file value will override that from the SB parset (unless it is a blank string ""). Note that this is new behaviour from version 1.0.9. |
| CONTINUUM_PROJECT_ID | "" | <key>.project (Casda Upload Utility) | The project ID to use for continuum data products (continuum images, catalogues, validation files). If not given, we fall back to the overall PROJECT_ID. |
| POLARISATION_PROJECT_ID | "" | <key>.project (Casda Upload Utility) | The project ID to use for polarisation data products (continuum cubes, polarisation catalogues, extracted spectra & validation files). If not given, we fall back to the overall PROJECT_ID. |
| SPECTRAL_PROJECT_ID | "" | <key>.project (Casda Upload Utility) | The project ID to use for spectral-line data products (spectral cubes, catalogues, validation files). If not given, we fall back to the overall PROJECT_ID. |
| MS_CONT_PROJECT_ID | "" | <key>.project (Casda Upload Utility) | The project ID to use for continuum (averaged) measurement sets. If not given, we fall back to the CONTINUUM_PROJECT_ID. |
| MS_SPEC_PROJECT_ID | "" | <key>.project (Casda Upload Utility) | The project ID to use for spectral (full-resolution) measurement sets. If not given, we fall back to the SPECTRAL_PROJECT_ID. |

Thumbnails

| Variable | Default | Parset equivalent | Description |
|---|---|---|---|
| THUMBNAIL_SUFFIX | png | none | Suffix for thumbnail image files, which in turn determines the format of these files. |
| THUMBNAIL_GREYSCALE_MIN | -10 | none | Minimum greyscale level for the thumbnail image colourmap. In units of the overall image rms noise. |
| THUMBNAIL_GREYSCALE_MAX | 40 | none | Maximum greyscale level for the thumbnail image colourmap. In units of the overall image rms noise. |
| THUMBNAIL_SIZE_INCHES | "16,5" | none | The sizes (in inches) of the thumbnail images. The sizes correspond to the size names given below. Don’t change unless you know what you are doing. |
| THUMBNAIL_SIZE_TEXT | "large,small" | none | The labels that go with the thumbnail sizes. These are incorporated into the thumbnail name, so that image.fits gets a thumbnail image_large.png etc. Don’t change unless you know what you are doing. |

CASDA upload

| Variable | Default | Parset equivalent | Description |
|---|---|---|---|
| OBS_PROGRAM | "" | obsprogram (Casda Upload Utility) | The name of the observational program to be associated with this data set. |
| CASDA_UPLOAD_DIR | /askapbuffer/casda/prd | outputdir (Casda Upload Utility) | The output directory to put the staged data. It may be that some users will not have write access to this directory - in this case the data is written to a local directory and the user must then contact CASDA or Operations staff. |
| CASDA_USE_ABSOLUTE_PATHS | true | useAbsolutePaths (Casda Upload Utility) | If true, refer to filenames in the observation.xml file by their absolute paths. This will mean they remain where they are, and are not copied to the upload directory. The exceptions are the XML file itself, and the tarred-up MS files. |
| CASDA_CLOBBER_TARFILES | false | none | If true, the tar files created by the getArchivesList script will be created from scratch, even if they exist. If false, then no new file will be created if they exist already. |
| WRITE_CASDA_READY | false | writeREADYfile (Casda Upload Utility) | Whether to write the READY file in the staging directory, indicating that no further changes are to be made and the data is ready to go into CASDA. Setting this to true will also transition the scheduling block from PROCESSING to PENDINGARCHIVE. |
| TRANSITION_SB | false | none | If true, the scheduling block status is transitioned from PROCESSING to PENDINGARCHIVE once the casdaupload task is complete. This can only be done by the ‘askapops’ user. |
| POLLING_DELAY_SEC | 1800 | none | The time, in seconds, between slurm jobs that poll the CASDA upload directory for the DONE file, indicating ingestion into CASDA is complete. |
| MAX_POLL_WAIT_TIME | 172800 | none | The maximum time (in seconds) to poll for the DONE file, before timing out and raising an error. (Default is 2 days.) |
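
To get a feel for the two polling parameters, the arithmetic below works out how many polling intervals fit into the maximum wait time with the default values. It is only a back-of-the-envelope sketch: how the pipeline actually schedules and counts its polling jobs is not described here, and the variable names max_polls and max_days are introduced purely for this example.

```shell
# Default values from the table.
POLLING_DELAY_SEC=1800      # 30 minutes between polls
MAX_POLL_WAIT_TIME=172800   # 2 days in total

# Number of polling intervals that fit into the maximum wait time.
max_polls=$(( MAX_POLL_WAIT_TIME / POLLING_DELAY_SEC ))
# The same limit expressed in whole days (86400 seconds per day).
max_days=$(( MAX_POLL_WAIT_TIME / 86400 ))

echo "Up to ${max_polls} polling intervals over ${max_days} days"
```

With the defaults this gives 96 intervals over 2 days, so halving POLLING_DELAY_SEC would double the number of polling jobs for the same time limit.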