User Parameters - Pipeline & job control

Here we detail the input parameters that cover overall process control, and the switches that turn different parts of the pipeline on and off.

Values for parameters that act as flags (i.e. those that accept true/false values) should be given in lower case only, to ensure comparisons work properly.
ASKAPsoft versions

By default, the slurm jobs use the askapsoft module that is loaded in your ~/.bashrc file. However, the pipeline can be run with a different askapsoft module by setting ASKAPSOFT_VERSION to the module version (0.19.2, for instance). If the requested version is not available, the default version is used instead.

This behaviour is also robust to there being no askapsoft module defined in ~/.bashrc - in that case the default module is used, unless ASKAPSOFT_VERSION is given in the configuration file.

The pipeline looks for the modules in a standard place, given by the setup used at the Pawsey Centre. If the pipelines are being run on a different system, the location of the module files can be given by ASKAP_MODULE_DIR (this is passed to the module use command).
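As a sketch, the relevant configuration entries might look like the following (the version number and module path are illustrative only, not recommendations):

```shell
# Use a specific askapsoft module instead of the one loaded in ~/.bashrc.
# If this version is not available, the default version is used instead.
ASKAPSOFT_VERSION=0.19.2

# Only needed when running away from Pawsey: the directory passed to
# "module use" so the pipeline can find the module files (path illustrative).
ASKAP_MODULE_DIR=/path/to/modulefiles
```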
Slurm control

These parameters affect how the slurm jobs are set up and where the output data products go. To run the jobs, you need to set SUBMIT_JOBS=true. Each job has a time request associated with it - see the Slurm time requests section below for details.
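A minimal job-control fragment might look like this (the time value is just an example):

```shell
# Nothing is submitted unless this is true; with false, the slurm files
# are still generated, allowing a "dry run" inspection of the setup.
SUBMIT_JOBS=true

# Override the default 24-hour time request for all jobs (illustrative).
JOB_TIME_DEFAULT=12:00:00
```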
| Variable | Default | Description |
|---|---|---|
| | false | The ultimate switch controlling whether things are run on the galaxy queue or not. If false, the slurm files etc. will be created but nothing will run (useful for checking that things are to your liking). |
| | 3.7.4 (galaxy), 3.7.3 (petrichor), or either 4.1.0-askap (setonix askaprt) or 4.1.0-mpi (setonix work partition) | Specific version of the singularity module to use. Depends on the cluster being used and, for setonix, the partition. This can be overridden by the |
| | | Specific version of the askappy module to use. |
| | | The version of the python module loaded at the start of slurm jobs. This is only necessary for use with legacy versions of askapsoft (version 1.0.14 or earlier). For later versions, the "askappy" module is used to provide python. When a legacy askapsoft is used, if this is given as blank or as a version that is not available, we fall back to the system default. |
| | 1.13.3 | The version of the numpy module loaded at the start of slurm jobs, and in the execution of the pipeline scripts. If given as blank or as a version that is not available, we fall back to the system default. This is only used with legacy versions of askapsoft (1.0.14 or earlier), and only when |
| | | The version number of the askapsoft module to use for the processing. If not given, or if the requested version is not valid, the version defined in the ~/.bashrc file is used, or the system default if none is defined there. |
| | | The version of the bptool module to use for the processing. Giving |
| | /group/askap/modulefiles (galaxy) or /software/projects/askaprt/modulefiles (setonix) | The location of the modules loaded by the pipeline and its slurm jobs. Change this to reflect the setup on the system you are running on; it should not be changed if running at Pawsey. |
| | work | Slurm partition ("queue") used for the bulk of the processing jobs. Can be specified via the -q option of processASKAP.sh. |
| | askaprt | The queue used to run tasks that need to access the /askapbuffer filesystem - operationally important jobs such as pipeline startup, accessing raw data, or setting up CASDA archiving. Leave as is unless you know better. |
| | | Allows one to provide slurm with additional constraints. While not needed for galaxy, this can be of use on other clusters (particularly those with a mix of technologies). |
| | | The account that the jobs should be charged to. If left blank, the user's default account is used. |
| | | If you have a reservation, specify its name here. Otherwise, leave this alone and the jobs will be submitted as regular jobs. |
| | 24:00:00 | The default time request for the slurm jobs. It is possible to specify a different time for individual jobs - see the list below and the individual pages describing the jobs. If those parameters are not given, the time requested is the value of |
| | . | The sub-directory in which to put the images, tables, catalogues, MSs etc. The name should be relative to the directory in which the script was run, with the default being that directory. |
| | | An email address to which you want slurm notifications sent (this will be passed to the |
| | ALL | The types of notifications that are sent (this is passed to the |
| | 30 | The length (in seconds) of a sleep call inserted before the second and subsequent srun calls within the slurm jobs. This adds extra waiting time before the srun task, with the aim of reducing the likelihood of compute-node errors. |
Filesystem control

There are a couple of parameters that affect how files interact with the Lustre filesystem. We set the striping of the directories at the start to a value configurable by the user. This only affects the directories where MSs, images & tables go - parsets, logs, metadata and the rest are given a stripe count of 1.

There is also a parameter to control the I/O bucketsize of the measurement sets created by mssplit. This is particularly important in governing the I/O performance and the splitting run-time. The default, 1MB, matches the stripe size on /group, and has been found to work well.

A new parameter stman.files has been introduced in mssplit that makes use of the casacore storage manager functionality to control the number/type of files written to a measurement set. This should be useful in achieving better overall I/O performance. The pipeline now allows users to specify this parameter using FILETYPE_MSSPLIT.

There is also a parameter PURGE_FULL_MS that allows the deletion of the full-spectral-resolution measurement set once the averaging to continuum channels has been done. The idea here is that such a dataset is not needed for some types of processing (continuum & continuum-cube imaging in particular), so rather than leave a large MS lying around on disk, we delete it. This parameter defaults to true, but is turned off if any of the spectral-line processing tasks are turned on (DO_COPY_SL, DO_APPLY_CAL_SL, DO_CONT_SUB_SL or DO_SPECTRAL_IMAGING). The deletion is done in the averaging job, once the averaging has completed successfully; if the averaging fails, the MS is not removed.

Similarly, if DO_COPY_SL=true (so that a channel range is copied out of the spectral dataset), there is an option to remove the full MS after the copying has completed successfully - set PURGE_FULL_MS_AFTER_COPY=true to enable this.
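Putting the above together, a configuration fragment controlling the MS file layout and purging behaviour could look like this (the FILETYPE_MSSPLIT value is assumed from the defaults table below; treat all values as illustrative):

```shell
# Passed to mssplit as stman.files, controlling the number/type of files
# written into each measurement set.
FILETYPE_MSSPLIT=combined

# Remove the full-spectral-resolution MS once averaging succeeds.
# Forced off if any spectral-line task (DO_COPY_SL, DO_APPLY_CAL_SL,
# DO_CONT_SUB_SL, DO_SPECTRAL_IMAGING) is turned on.
PURGE_FULL_MS=true

# When DO_COPY_SL=true, also remove the full MS after the channel-range
# copy completes successfully.
PURGE_FULL_MS_AFTER_COPY=true
```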
| Variable | Default | Description |
|---|---|---|
| | 4 | The stripe count to assign to the data directories. |
| | 1048576 | The bucketsize passed to mssplit (as "stman.bucketsize"), in bytes. |
| | combined | Can be one of |
| | 1 | The number of channels in the measurement set tile for the science data, once the local version is created. |
| | 1 | The number of channels in the measurement set tile for the bandpass calibrator data, once the local version is created. |
| | true | Whether to remove the interim science MSs created when splitting and merging is required. |
| | true | Whether to remove the interim bandpass calibrator MSs created when splitting and merging is required. |
| | true | Whether to remove the full-spectral-resolution measurement set once the averaging has been done. See notes above. |
| | false | Whether to remove the full-spectral-resolution measurement set once the copying of a channel range of the spectral data has completed. |
| **Important locations** | | |
| | /askapbuffer/processing/pipeline-errors | Directory in which information about job failures is archived. |
| | /askapbuffer/payne/askapops/SST-templates | Directory where the standard templates are found. |
| | /askapbuffer/payne/askap-beams | Standard location of primary beam (holography) images. |
Control of Online Services

The pipeline makes use of two online databases: the scheduling block service, which provides information about individual scheduling blocks and their parsets; and the footprint service, which translates descriptive names of beam footprints into celestial positions.

These are hosted at the MRO, and it may be that the MRO is offline while Pawsey is still available. If so, use of these services can be turned off via the USE_CLI parameter (CLI = "command-line interface"). If you have previously created the relevant metadata files, the pipeline will be able to proceed as usual. If the footprint information is not available but you know what the footprint name was, you can use the IS_BETA option. See User Parameters - Mosaicking for more information and related parameters.
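For instance, to run while the MRO services are unreachable (assuming the relevant metadata files already exist from a previous run):

```shell
# Disable the command-line interfaces to the schedblock and footprint
# services; previously created metadata files are used instead.
USE_CLI=false

# Set to true only for data taken with BETA.
IS_BETA=false
```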
| Variable | Default | Description |
|---|---|---|
| | true | Determines whether to use the command-line interfaces to the online services, specifically schedblock and footprint. |
| | false | A special parameter that, if true, indicates the dataset was taken with BETA and so needs to be treated differently (many of the online services will not work with BETA Scheduling Blocks, and the raw data is in a different place). |
Reprocessing without raw data present

It is possible that re-processing needs to be done (by re-running processASKAP.sh) after the raw data has been removed or is otherwise unavailable. Normally, various bits of metadata are obtained from the raw MSs and used to set up the processing. Some of this is determined through the mslist (Measurement summary and data inspection utility) tool, producing a metadata file that is later parsed. Also required is the total number of channels in the input data (summed over all MSs for a given beam).

If the raw data is not available, these metadata should be provided through the configuration file using the parameters listed below. For the pipeline run to work, the local versions of the MS data must exist - the splitting/merging will not be run again (since there is no raw data to obtain them from).
| Variable | Default | Description |
|---|---|---|
| | "" | The total number of channels in the dataset. If a beam's data is spread over more than one MS, this is the sum of all channels for that beam. If |
| | "" | The total number of channels in the dataset. If a beam's data is spread over more than one MS, this is the sum of all channels for that beam. If |
| | "" | The file produced by mslist (typically from a previous pipeline run) that shows the MS metadata for a calibration observation. The full path to the file must be given. |
| | "" | The file produced by mslist (typically from a previous pipeline run) that shows the MS metadata for a science observation. The full path to the file must be given. |
Calibrator switches

These parameters control the different types of processing done on the calibrator observation. The three aspects are splitting by beam/scan, flagging, and finding the bandpass. The DO_1934_CAL parameter acts as the "master switch" for the calibrator processing.
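In the configuration file the master switch is simply:

```shell
# Master switch for the 1934-638 calibrator processing. Setting it to
# false disables the splitting, flagging and bandpass-fitting steps below.
DO_1934_CAL=true
```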
| Variable | Default | Description | |
|---|---|---|---|
| | true | Whether to process the 1934-638 calibrator observations. If set to | |
| | true | Whether to split a given beam/scan from the input 1934 MS. From rev10559 onwards, users can additionally split out bandpass MS data from a specified time range (see below). | |
| | true | Whether to flag the split-out 1934 MS. | |
| | true | Whether to fit for the bandpass using all 1934-638 MSs. | |
Science field switches

These parameters control the different types of processing done on the science field, with DO_SCIENCE_FIELD acting as a master switch for the science field processing.

The pipeline now allows users to convolve continuum images and cubes to a common resolution, defined by the beam having the coarsest PSF. For the coarse cubes, each frequency plane is convolved independently to a resolution common across the mosaic.
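As a sketch, the master switch and the spectral-line switches named elsewhere in this document might be set as follows (values shown match the defaults in the table below, but treat the fragment as illustrative):

```shell
# Master switch for all science-field processing.
DO_SCIENCE_FIELD=true

# Spectral-line steps (named in the Filesystem control notes above):
DO_COPY_SL=false          # copy a channel range into a new MS
DO_APPLY_CAL_SL=true      # apply selfcal gains to the full-resolution MS
DO_CONT_SUB_SL=true      # subtract a continuum model
DO_SPECTRAL_IMAGING=true  # spectral-line imaging
```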
| Variable | Default | Description |
|---|---|---|
| | true | Whether to process the science field observations. If set to |
| | false | Whether to use the rapid survey mode of the pipeline - suitable for continuum observations of many fields within a single scheduling block. See How to run the ASKAP pipelines for details. |
| | true | Whether to split out the given beam from the science MS. |
| | true | Whether to flag the split-out science MS. |
| | true | Whether to apply the bandpass calibration to the science observation. |
| | true | Whether to average the science MS to continuum resolution. |
| | true | Whether to run the preimaging tasks (bandpass application, flagging, averaging) in a single slurm job, and likewise the pre-spectral-imaging jobs (application of gains/leakages, splitting of channels and/or averaging). |
| | true | Whether to image the science MS. |
| | true | Whether to self-calibrate the science data when imaging. |
| | | Whether to do the continuum source-finding with Selavy. If not given, the default value is that of |
| | true | Whether to run the continuum validation script upon completion of the source-finding. |
| | false | Whether to image the continuum cube(s), optionally in multiple polarisations. |
| | false | Whether to image the continuum data in Stokes other than I. |
| | false | Whether to image the continuum data at a series of short intervals spanning the course of the observation. |
| | true | Whether to apply the gains calibration determined from the continuum self-calibration to the averaged MS. |
| | false | Whether to do the spectral-line processing. Acts as a master switch that can turn off the following parameters. |
| | false | Whether to copy a channel range of the original full-spectral-resolution measurement set into a new MS. |
| | true | Whether to apply the gains calibration determined from the continuum self-calibration to the full-spectral-resolution MS. |
| | true | Whether to subtract a continuum model from the spectral-line dataset. |
| | true | Whether to do the spectral-line imaging. |
| | false | Whether to image the spectral-line data with joint imaging. |
| | true | Whether to do the image-based continuum subtraction. |
| | | Whether to do the spectral-line source-finding with Selavy. If not given, the default value is that of |
| | true | Whether to mosaic the individual beam images, forming a single, primary-beam-corrected image. Mosaics of each field can be done via the |
| | true | Whether to use the new imager (imager) for all imaging. Its use for specific modes can be selected by the parameters |
| | true | Whether to convolve the single-beam continuum images and cubes to a common resolution before mosaicking. For the MFS images, the final resolution is dictated by the beam having the coarsest PSF. For coarse cubes, each frequency channel is convolved independently to a resolution defined by the beam having the coarsest PSF for that frequency channel. If |
Post-processing switches

After the calibration, imaging and source-finding, there are several tasks that can be done to prepare the data for archiving in CASDA; these tasks are controlled by the following parameters.
| Variable | Default | Description |
|---|---|---|
| | true | Whether to run the diagnostic script upon completion of imaging and source-finding. (This is not the continuum validation, but rather other diagnostic tasks.) |
| | true | Whether to produce a summary file showing the fraction of data flagged as a function of integration, baseline and channel for each of the Cal MSs. See Validation and Diagnostics for further description. |
| | true | Whether to produce a summary file showing the fraction of data flagged as a function of integration, baseline and channel for each of the continuum-averaged MSs. See Validation and Diagnostics for further description. |
| | true | Whether to produce a summary file showing the fraction of data flagged as a function of integration and baseline for each of the spectral-line MSs. See Validation and Diagnostics for further description. |
| | true | Run specific science validation tasks, such as plotting the cube statistics. |
| | true | Whether to convert remaining CASA images and image cubes to FITS format (some will have been converted by the source-finding tasks). |
| | true | Whether to make the PNG thumbnail images that are used within CASDA to provide previews of the image data products. |
| | false | Whether to run the casda upload script to copy the data to the staging directory for ingest into the archive. |
| | false | Whether to run the casda upload script to copy the bandpass data to the staging directory for ingest into the archive. |
Slurm time and memory requests

Each slurm job has a time request associated with it. These default to 24 hours (24:00:00), given by the user parameter JOB_TIME_DEFAULT. You can use this parameter to set a different default. Additionally, you can set a different time for individual jobs using the following set of parameters. Acceptable time formats include (taken from the sbatch man page): "minutes", "minutes:seconds", "hours:minutes:seconds", "days-hours", "days-hours:minutes" and "days-hours:minutes:seconds".

Each slurm job may also have a memory request. This has not been necessary on galaxy, since jobs are allocated an entire node, but setonix allows multiple jobs per node, with the allocation size controlled by both the requested number of tasks and the memory. There is a JOB_MEMORY_ equivalent to each JOB_TIME_ parameter, as well as JOB_PRIORITY_ and JOB_CPUFREQ_ versions.

For the memory, if a specific value is given, it can be a number followed by a unit indicator (one of [K|M|G|T], where the default is M for megabytes). Note that a value of zero (0) results in the entire compute node(s) being allocated to the job.

Each job has a priority setting, which changes the --qos setting for the jobs. The priority can be one of normal or high. It is set to "normal" for most jobs, although some get "high": all bandpass processing jobs, and all jobs in the "finalisation" phase (diagnostics, fits update, validation, casda upload), so that a nearly-complete pipeline run will finish off in preference to starting something else. See the table below for specific default values.

Each job may also have its --cpu-freq setting adjusted. This is unset by default, but may be specified on a per-job basis, or globally by setting JOB_CPUFREQ_DEFAULT. Setting this to, say, medium will reduce the CPU frequency, causing the job to run slower but potentially use less energy (particularly if the job is very IO-bound).

The following table lists the variable suffixes, which can be used to form the JOB_TIME_, JOB_MEMORY_, JOB_PRIORITY_ or JOB_CPUFREQ_ variables (e.g. there can be a JOB_TIME_SPLIT_1934 and a JOB_MEMORY_SPLIT_1934 parameter). The default values for the memory and priority are shown; the defaults for time fall back to JOB_TIME_DEFAULT, which takes the value 24:00:00.
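For example, to override the requests for the calibrator-splitting job (the SPLIT_1934 suffix mentioned above) and set a global CPU frequency (all values illustrative):

```shell
JOB_TIME_SPLIT_1934=06:00:00   # any sbatch-accepted time format
JOB_MEMORY_SPLIT_1934=16G      # number plus optional K/M/G/T (default M);
                               # 0 allocates the entire node(s)
JOB_PRIORITY_SPLIT_1934=high   # "normal" or "high" (passed as --qos)
JOB_CPUFREQ_DEFAULT=medium     # global --cpu-freq setting
```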
| Variable Suffix | Description | Default Memory | Default Priority |
|---|---|---|---|
| | Request for splitting the calibrator MS | 16G | high |
| | Request for flagging the calibrator data | 8G | high |
| | Request for plotting the bandpass & leakage solutions, as well as the flagging statistics for the bandpass data | 8G | high |
| | Request for finding the flagging summary (bandpass data) | 8G | high |
| | Request for finding the bandpass solution | 0 | high |
| | Request for finding the bandpass leakage solutions | 0 | high |
| | Request for applying the bandpass to the bandpass data | 16G | high |
| | Request for averaging the channels of the bandpass data | 8G | high |
| | Request for splitting the science MS | 16G | normal |
| | Request for the pre-imaging job (encompassing bandpass application, flagging, averaging) | 0 | normal |
| | Request for flagging the science data | 8G | normal |
| | Request for subsequent flagging of bad (nearly-totally-flagged) channels | 8G | normal |
| | Request for combining the flag statistics | 8G | normal |
| | Request for plotting the selfcal gains following self-calibration | 8G | normal |
| | Request for combining plots made per beam | 8G | normal |
| | Request for plotting the flagging statistics | 8G | normal |
| | Request for making the flagging summary for the averaged data | 8G | normal |
| | Request for making the flagging summary for the spectral data | 8G | normal |
| | Request for applying the bandpass to the science data | 16G | normal |
| | Request for applying the leakage solutions to the science data | 8G | normal |
| | Request for averaging the channels of the science data | 8G | normal |
| | Request for the concatenation of time-window MSs (averaged data) | 8G | normal |
| | Request for the concatenation of time-window MSs (spectral data) | 8G | normal |
| | Request for imaging the continuum (both types - with and without self-calibration) | 0 | normal |
| | Request for the selfcal job when done as a separate job (when | 0 | normal |
| | Request for the application of calibration when done as a separate job (when | 0 | normal |
| | Request for imaging the continuum cubes | 0 | normal |
| | Request for the continuum subtraction from the continuum MS prior to transient imaging | 4G | normal |
| | Request for imaging a dataset at short time intervals | 8G | normal |
| | Request for searching for variables & transients in the fast images | 16G | normal |
| | Request for the pre-spectral-imaging job (encompassing spectral splitting, applying the gains & leakages, and continuum subtraction) | 16G | normal |
| | Request for splitting out a subset of the spectral data | 16G | normal |
| | Request for applying the gains calibration to the spectral data | 8G | normal |
| | Request for subtracting the continuum from the spectral data | 16G | normal |
| | Request for imaging the spectral-line data | 0 | normal |
| | Request for the joint imaging of spectral-line data | 0 | normal |
| | Request for performing the image-based continuum subtraction | 0 | normal |
| | Request for making a cut of the MS around a known source | 0 | normal |
| | Request for convolving the per-beam images to common resolution | 0 | normal |
| | Request for mosaicking | 0 | normal |
| | Request for continuum source-finding jobs | 0 | normal |
| | Request for spectral-line source-finding jobs | 0 | normal |
| | Request for the diagnostics job | 0 | high |
| | Request for converting to and/or fixing up the FITS files | 0 | high |
| | Request for making the thumbnail images | 0 | high |
| | Request for the various validation jobs | 0 | high |
| | Request for the casdaupload job | 0 | high |
| | Request for the final wrap-up job that gathers job statistics | 0 | high |
| | Request for the relaunch job to restart the pipeline | 0 | high |
Speed-up switches

The ASKAPsoft release 1.1.0 allows some of the processes in the imaging tasks run by the master to exploit thread-level parallelisation across spare CPUs using OpenMP. This can significantly speed up the preconditioning and CLEANing stages.
| Variable | Default | Description |
|---|---|---|
| | 16 | This number is passed to the The default (16) is set for the |
| | 16 | This number is passed to the The default (16) is set for the |
| | /software/projects/askaprt/fftw-wisdom/baremetal | The location of 'wisdom' files used by the FFTW library to help speed up the FFTs. |