CASDA Users Guide
Version 1.3 December 2016
Welcome to the CSIRO ASKAP Science Data Archive (CASDA) Users Guide. This Guide is intended to help astronomers get started with finding and making use of data products from the Australian Square Kilometre Array Pathfinder (ASKAP).
The first ASKAP data products have been produced from science commissioning observations taken with the Boolardy Engineering Test Array (BETA). These show the potential and unique wide-field fast survey capabilities of ASKAP and provide some demonstration data sets to the astronomy community.
For information on the initial data release see the BETA Data Release Notes.
For a reference document that preceded the construction of CASDA see the pdf document:
CSIRO ASKAP Science Data Archive: Overview, Requirements and Use Cases (Chapman et al. 2014)
Contents
1 CASDA Overview
- 1.1 About CASDA
- 1.2 ASKAP data products
- 1.3 CASDA services on the CSIRO Data Access Portal
- 1.4 CASDA Virtual Observatory services
- 1.5 User Authentication and OPAL registration
- 1.6 Restricted CASDA tasks and project roles
- 1.7 Pawsey Supercomputing Centre user accounts
- 1.8 Getting help
2 Using CASDA with the CSIRO Data Access Portal
- 2.1 Login to the DAP using an OPAL or NEXUS account
- 2.2 Find information about ASKAP Data Collections
- 2.3 Find a persistent link and Digital Object Identifier for a data collection
- 2.4 Search for data products using the DAP search form
- 2.5 Carry out a cone search using the CASDA Observation Search form
- 2.6 Download a single file to an external location
- 2.7 Download multiple files to an external location
- 2.8 Generate and download a single cutout of an image or cube
3 Using CASDA with Virtual Observatory Services for Catalogues
- 3.1 Install TOPCAT
- 3.2 Find and download catalogues using the VO Table Access Protocol (TAP)
- 3.3 Run simple queries in Astronomical Data Query Language (ADQL)
- 3.4 Try out some TOPCAT Features
- 3.5 Create sky plots showing positions of radio detections
- 3.6 Create an x-y scatter plot
- 3.7 Create a histogram plot
- 3.8 Create a new column and add to existing table
- 3.9 Carry out cone searches on a catalogue using the VO Cone Search Protocol
- 3.10 Search for data products programmatically using user scripts
- 3.11 Plot positions of radio detections on an optical image covering the sky region using TOPCAT with Aladin
- 3.12 Cross-match information from a CASDA catalogue with a catalogue obtained from VizieR and generate a merged catalogue
4 Using CASDA Virtual Observatory Services for Images and Image Cubes
- 4.1 Find images and image cubes and see the access services
- 4.2 Use the VO Simple Image Access Protocol for programmatic data access
- 4.3 Generate cut outs from images and image cubes using a script
5 Restricted CASDA Services
- 5.1 Download data to a location in the Pawsey Supercomputing Centre
- 5.2 Set data validation flags and information
- 5.3 See the list of individuals assigned to a Science Team in CASDA
- 5.4 Manage roles for team members
- 5.5 Add an individual to a Science Team
- 5.6 Publish a Science Team catalogue: General advice
- 5.7 Submit a VO catalogue for CASDA publication
- 5.8 Approve and release a 'level 7' Science Team catalogue
6 Publications and Acknowledgements
7 Links to external documentation
8 Document versions
1 CASDA Overview
1.1 About CASDA
The CSIRO ASKAP Science Data Archive (CASDA) provides the long term storage for ASKAP data products and the hardware and software facilities that enable astronomers to make use of these. Data products are stored at the Pawsey Supercomputing Centre in Perth, Western Australia.
ASKAP is a data driven facility where the data rates are extremely high. In full operations, the ASKAP data rates arriving at the Pawsey Centre will reach around 75 Petabytes (PB) per year. This is beyond the current ability to archive data and so raw data are not archived. Such high data rates require instead that ASKAP data processing is carried out using automated pipelines to produce 'data products' and associated metadata. These are stored and made available through the science archive. The archive can be thought of as the end stage of the full system.
ASKAP Early Science observations are due to begin later in 2016. So that users can gain experience and provide feedback to the CASDA team, CASDA now provides access to demonstration data products obtained during science commissioning and pilot observing programs that have been carried out with a small number of ASKAP antennas from the BETA array.
1.2 CASDA data products
CASDA provides three types of data products:
- Calibrated visibility files
Calibrated visibility files are stored for total intensity and polarisation continuum observations. During Early Science, calibrated visibility data files for spectral line observations will be archived on a best efforts basis. Visibilities are stored in CASA measurement set format.
- Images and image cubes
Radio continuum and spectral line images and image cubes are stored in FITS format.
- Catalogues
As part of the data processing pipelines, detection algorithms are used to search images and image cubes for source detections. Detection parameters such as positions and flux densities are captured in catalogues. The CASDA catalogue registry includes source detection catalogues with parameters determined from continuum, polarisation and spectral line images or image cubes.
1.3 CASDA services on the CSIRO Data Access Portal
CASDA provides data services in two ways: using search interfaces and tools that are accessed through the CSIRO Data Access Portal (DAP), and using Virtual Observatory (VO) Services.
The CSIRO DAP provides access to many archival data products managed by CSIRO across the organisation. These include a wide range of science areas in addition to astronomy. Many customised tools have been added to the DAP to support CASDA. See section 2 for instructions on how to use the CASDA DAP services.
1.4 CASDA Virtual Observatory services
The VO uses standard protocols to enable catalogue or image data obtained from one facility to be easily compared with similar data from other facilities. For example, a user might wish to compare source detections from a radio survey with detections from optical or infrared surveys, or to plot a set of radio source positions on top of an optical image.
For catalogues, we strongly recommend using TOPCAT (Tool for OPerations on Catalogues And Tables). TOPCAT is a freely available program that provides an interface to Virtual Observatory compatible data products. TOPCAT is like a 'browser' that provides access to VO supported data products together with many tools for working with these.
TOPCAT is a richly tooled program with many features that support working with catalogues. In this guide we give some start-up instructions for using CASDA with TOPCAT. For full tutorial and reference documentation please see the TOPCAT website.
As TOPCAT does not currently support images, CASDA provides access to images and image cubes through web-based interfaces. In addition, users can programmatically access images using Python or other scripts. Image cut-outs, where sub-sections of images or cubes are extracted, can be generated and downloaded.
1.5 User Authentication and OPAL registration
CASDA allows users to search for data and to see what is available in the archive, without any authentication. However, user authentication is required to download data files including calibrated visibilities, images and image cubes.
For science users, CASDA supports two types of user authentication as follows:
- OPAL For general use, we recommend using OPAL authentication. To register with OPAL, go to the OPAL Home Page and click on the link to 'Register'.
Enter your email address, name, affiliation and a password. The OPAL application will register you straight away.
OPAL user accounts are self-managed. Please keep your account details up to date. To change user-registration details, or to request a new OPAL password, use the link to 'Log in or reset password'.
- CSIRO Nexus authentication is available for individuals who have CSIRO NEXUS accounts.
1.6 Restricted CASDA tasks and project roles
In addition to general user authentication, a small number of CASDA tasks are restricted to archive administrators and/or members of the Survey Science teams. These include setting data validation flags, and/or accessing project data prior to public release. Tasks with restricted access are described in section 5.
CASDA recognises special roles for Administrators and Survey Science team members as follows:
- Individuals with project administration rights can add and remove individuals from the project team. They assign roles to team members, can enter data validation information and access unreleased data products for their projects. At present the project-administration tasks are restricted to CASDA staff administrators.
- 'Validators' enter data validation information and access unreleased data products for their projects.
- Other team members are given access to unreleased data products for their projects but do not have permission to enter data validation information.
1.7 Pawsey Supercomputing Centre user accounts
For users who have user accounts at the Pawsey Supercomputing Centre, CASDA provides fast downloads to the Pawsey Galaxy and/or Magnus supercomputers.
Pawsey accounts are restricted to individuals who are part of science teams that have been granted access to Pawsey facilities through competitive merit allocation processes. For further information please refer to the Pawsey website or contact the CASDA helpdesk.
1.8 Getting Help
CASDA provides user support and documentation in several ways:
- This User Guide is intended to provide an overview of the CASDA data services and to help new users get started. To get going we recommend trying some of the items described in sections 2 and 3.
- The CASDA application pages on the CSIRO Data Access Portal provide some 'tooltips'. These are shown as yellow question marks. Click on a question mark to read the tip.
- Online documentation is available with the DAP. Click on the 'help' link at the top of the page.
- For enquiries and staff support relating to all CSIRO radio astronomy data archives, including CASDA, please send an email to the helpdesk: atnf-datasup@csiro.au. You will receive an automated email from our helpdesk to acknowledge that your request has been logged. A CASS staff member will reply soon afterwards. We aim to send an initial reply to user queries within four business hours.
Suggestions for improvements to CASDA tools and to this Guide are always welcome. Please send all comments to the helpdesk.
2 Using CASDA with the CSIRO Data Access Portal
The information in this and the following section is presented as a set of 'how to instructions' using specific data sets as examples.
2.1 Login to the CSIRO Data Access Portal (DAP) using an OPAL or NEXUS account |
---|
|
Notes |
2.2 Find information about ASKAP Data Collections |
---|
|
Notes In general terms, a data collection is a group of similar data files. For CASDA, each ASKAP project has two collections. One collection holds the data catalogues and the other the data product files for images, image cubes and visibilities. The CSIRO DAP holds data for thousands of collections. You might like to explore the DAP using different keyword searches. Some example keywords are 'pulsars', 'mosquitos' and 'climate change' or try guessing. AS031 is the (only) project code used for ASKAP BETA science commissioning observations. On the DAP Home Page, lists of collections can be sorted using the sort options that are shown under the large blue search button. |
2.3 Find a persistent link and a Digital Object Identifier for a data collection |
---|
|
Notes For CASDA the DOIs will be updated at intervals of around six months from March 2016 onwards. DOIs are part of the collections information and can be included with journal publications. |
2.4 Search for data products using the DAP search form |
---|
|
Notes The search form will return information for all collections and data products that match the criteria. Only tabs that have associated data files are shown. Selecting a data tab and then clicking on the project number will open the collections information that is relevant to that tab. 'Unreleased' data are only available to team members. Data are released following a data validation process. |
2.5 Carry out a cone search using the CASDA Observation Search form |
---|
This example finds data products associated with a single cone search in the Tucana region.
CASDA can resolve some objects by name, based on Simbad and NED catalogues. Just type in the object name and click on the 'Resolve' button. Cone searches can also be run for multiple sky positions. Click on the 'multiple positions' radio button. An ascii (text) file can now be uploaded, with one position per line. The search returns the combined results from running multiple cone searches. Each row of the source files should contain: right ascension (J2000), declination (J2000), radius in arcmin (optional) with spaces between entries. Position separators can be either spaces or colons. Blank lines are permitted. If no radius is included on one or more lines, then a default radius must be entered as the search radius. Examples of accepted formats are:
|
Notes The online help tip (small question mark) also explains the format for source files. Source lists are currently limited to 50 rows for Internet Explorer browsers and 100 rows for other browsers. |
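As a rough illustration of the source-file layout described above, the following Python sketch assembles one position per line, using colon-separated sexagesimal coordinates and an optional radius in arcmin. The function name, the second example position and the default row limit of 100 (the limit quoted in the note for browsers other than Internet Explorer) are illustrative assumptions and are not part of CASDA.

```python
def build_position_lines(sources, max_rows=100):
    """Return text lines for a multiple-position cone search upload.

    sources: iterable of (ra, dec, radius_arcmin) tuples. ra and dec are
    J2000 sexagesimal strings; radius_arcmin may be None, in which case a
    default radius has to be entered on the search form instead.
    """
    lines = []
    for ra, dec, radius_arcmin in sources:
        entry = f"{ra} {dec}"
        if radius_arcmin is not None:
            entry += f" {radius_arcmin:g}"
        lines.append(entry)
    if len(lines) > max_rows:
        raise ValueError(f"{len(lines)} rows exceeds the limit of {max_rows}")
    return lines


if __name__ == "__main__":
    example = [
        ("22:58:04.88", "-36:25:39.4", 10),   # position used in section 4.2
        ("00:30:00.0", "-60:00:00", None),    # hypothetical position, no radius
    ]
    print("\n".join(build_position_lines(example)))
```

The printed lines can be saved to a text file and uploaded with the 'multiple positions' option.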
2.6 Download a single file to an external location |
---|
In this example an image covering 150 square degrees of the Tucana region, created by Ian Heywood from data combined over four sub-bands and three epochs, is downloaded to your local computer. This approach can be used to download any available data products including images, visibilities and catalogues.
|
Notes For this example, the image file size is 113 MB, so the download may take some time. For each data file, an additional file is provided with checksum information. This can be used to check that the download completed correctly. The Schedule Block ID (1206) corresponds to the first of four schedule blocks used for this data product. For BETA observations, where more than one observation block is used to create a data product, CASDA associates the data file with the first scheduling block only. For ASKAP Early Science data products (not yet available), a search for a scheduling block will return all data products associated with that scheduling block. |
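One way to make use of the checksum information is to compute a digest of the downloaded file locally and compare it with the value supplied by the archive. The sketch below is a minimal example under stated assumptions: this guide does not specify which algorithm or file layout the CASDA checksum file uses, so the algorithm is left as a parameter (defaulting to SHA-1 purely for illustration) and the expected value is passed in as a hexadecimal string.

```python
import hashlib


def file_digest(path, algorithm="sha1", chunk_size=1024 * 1024):
    """Compute a hex digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex, algorithm="sha1"):
    """Return True if the file's digest equals the expected hex string."""
    return file_digest(path, algorithm) == expected_hex.strip().lower()
```

Reading in chunks matters here because image cubes can be much larger than the 113 MB example file.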
2.7 Download multiple files to an external location |
---|
If you wish to download a number of files, a better approach is to save the links as a text file to your local computer by clicking on the link to 'Save links as text file'.
This can then be used together with a script to access the files.
There are many options for downloading files from a list of URLs. Here we describe two methods. These use the unix commands 'xargs', 'wget' and 'parallel'.
|
Notes
If the Unix commands are not already available then you may need to ask a system administrator to install them for you. For downloading multiple files, an alternative approach is to use a download manager. Many options are freely available on the internet. One example, Download Fast, is an open source, free multiplatform download manager that supports multithreaded (i.e. parallel) downloading of large files. |
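For users who prefer Python to the Unix commands, the same idea (read a saved list of links, then fetch them a few at a time) can be sketched with the standard library alone. This is an illustrative alternative and not one of the two methods described above. The file name links.txt, the rule for choosing local file names and the number of parallel workers are all assumptions, and any authentication the links require is not handled here.

```python
import concurrent.futures
import os
import urllib.parse
import urllib.request


def read_links(path):
    """Read one URL per line from a text file, skipping blank lines."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


def local_name(url, index):
    """Use the last part of the URL path as the file name, else a counter."""
    name = os.path.basename(urllib.parse.urlparse(url).path)
    return name or f"download_{index}"


def download_all(urls, dest_dir, fetch=urllib.request.urlretrieve, workers=4):
    """Download every URL into dest_dir using a small thread pool."""
    os.makedirs(dest_dir, exist_ok=True)
    targets = [os.path.join(dest_dir, local_name(url, i))
               for i, url in enumerate(urls)]
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for all downloads and re-raises the first error, if any.
        list(pool.map(fetch, urls, targets))
    return targets


# Example call (not run here):
# download_all(read_links("links.txt"), "output")
```

The fetch argument is there so that a different download function can be substituted, for example one that adds credentials.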
2.8 Generate and download a single cutout of an image or cube |
---|
In this example a cutout is generated from an image. To begin with we select the 150 square degree image of the Tucana region from Section 2.6.
|
Notes
The background image in the Aladin-lite GUI defaults to SUMSS, to show bright radio sources in the area of interest. You can click on the icon in the top left corner of the GUI to select different images (e.g. DSS or WISE). A checksum file is provided so that the user can double check that the cutout file download has worked. |
3 Using CASDA with Virtual Observatory Services for Catalogues
The notes given in this section provide an introduction to finding and working with CASDA VO tables, mostly together with TOPCAT. These are intended to be sufficient to get started but do not cover many of the available tools and features.
The notes given here correspond to TOPCAT version 4.3.
3.1 Install TOPCAT |
---|
|
Notes TOPCAT and Java are freely available. To see which version of TOPCAT you are using open TOPCAT then click on Help > About TOPCAT. |
3.2 Find and download CSIRO astronomy catalogues using the VO Table Access Protocol (TAP) |
---|
In this example, you will find and download a list of VO catalogues supported by CSIRO and then select a catalogue, casda.continuum_island. This contains parameters for radio astronomy 'islands' detected using source-finding algorithms for an image of the Tucana region.
|
Notes
TOPCAT write options include a range of table formats including VO-table, html, FITS, ascii and csv. To write out a VO table use the format option = 'votable-tabledata'. To navigate to a selected location on your computer use the 'Filestore Browser'. To find all CSIRO TAP services enter 'CSIRO' as a keyword. |
3.3 Run simple queries in Astronomical Data Query Language (ADQL) |
---|
Here are three simple examples using ADQL to query a catalogue.
|
Notes The asterisk in examples 1 and 3 indicates that all columns are selected. Line returns are optional in ADQL queries. ADQL queries are generally straightforward to construct. Here is a link to the full ADQL documentation . The ADQL examples provided in TOPCAT provide an easy way of seeing some of the syntax. |
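To give a feel for the general shape of such queries, the short Python sketch below assembles a few simple ADQL statements as strings that can be pasted into the TOPCAT TAP window. These are illustrative queries rather than a reproduction of the three examples in this section. The column names ra_deg_cont, dec_deg_cont and flux_peak are assumptions about the table layout and should be checked against the column metadata shown in TOPCAT.

```python
def build_adql(table, columns="*", where=None, top=None):
    """Assemble a simple ADQL SELECT statement as a single string."""
    parts = ["SELECT"]
    if top is not None:
        parts.append(f"TOP {int(top)}")          # limit the number of rows
    parts.append(columns if isinstance(columns, str) else ", ".join(columns))
    parts.append(f"FROM {table}")
    if where:
        parts.append(f"WHERE {where}")           # row filter
    return " ".join(parts)


queries = [
    # All columns, first ten rows only.
    build_adql("casda.continuum_island", top=10),
    # Selected columns (assumed names) for every row.
    build_adql("casda.continuum_island",
               columns=["ra_deg_cont", "dec_deg_cont", "flux_peak"]),
    # All columns for rows passing a flux threshold (assumed column name).
    build_adql("casda.continuum_island", where="flux_peak > 100"),
]

for query in queries:
    print(query)
```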
3.4 Try out some TOPCAT Table Icons |
---|
|
Notes Using the mouse to hover over an icon will bring up a description of it. Note that the icons are arranged in groups. This group of five icons provides different displays of the table data and metadata. |
3.5 Create sky plots showing positions of radio detections |
---|
This example shows how to do a sky plot for a set of source positions.
To get started with plots, here are a few suggestions to try out: - To change the plot centre - use the mouse with a left-click.
|
Notes TOPCAT provides powerful tools for plotting table data and allows a considerable degree of customization. In this guide we provide a few examples. For full tutorial and reference documentation, see the TOPCAT user manual . |
3.6 Create an x-y scatter plot |
---|
This example generates an x-y plot that shows the major axis determined for radio continuum components, plotted against the ratio of the integrated to peak flux densities. Point sizes are scaled using the ratio of the major to minor axes.
|
Notes
Data subsets can also be plotted and could be used, for example, to show data subsets using different colours. Note that selected plot options can be turned on and off using tick boxes located to the left. In effect, plots are constructed using a set of layers that can be included or excluded as needed. |
3.7 Create a histogram plot |
---|
This example shows how to create a plot with two overlaid histograms that show the peak and integrated flux densities for radio continuum components.
|
Notes |
3.8 Create a new column and add to existing table |
---|
In this example an extra column is added to a table, corresponding to the logarithm of the peak flux density.
|
Notes
If no value is given the column is added as the last column in the table. Note that column headers and table contents can be edited. Universal Content Descriptors (UCDs) are part of a formal, controlled vocabulary for astronomical data, provided by the IVOA. The use of UCDs facilitates sharing information. For further information, see UCD1+ controlled vocabulary. |
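The same derived column can be computed outside TOPCAT once a table has been exported. The sketch below is a minimal Python equivalent that works on rows held as dictionaries; the column names flux_peak and log_flux_peak are assumptions chosen for illustration.

```python
import math


def add_log_column(rows, source="flux_peak", target="log_flux_peak"):
    """Return copies of the rows with log10 of the source column appended.

    Missing or non-positive values give None, since log10 is undefined there.
    The new key is added last, matching the default column position.
    """
    result = []
    for row in rows:
        value = row.get(source)
        new_row = dict(row)
        if value is not None and value > 0:
            new_row[target] = math.log10(value)
        else:
            new_row[target] = None
        result.append(new_row)
    return result
```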
3.9 Carry out cone searches on a catalogue using the VO Cone Search Protocol |
---|
This example searches the CASDA table casda.continuum_island for sources within a given radius of a position and writes the results into another VO table.
|
Notes CASDA currently restricts cone searches to a maximum radius of five degrees. If this is a problem for you please let us know. |
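Outside TOPCAT, a cone search is an ordinary HTTP request. In the IVOA Simple Cone Search protocol the position and radius are passed as the parameters RA, DEC and SR, all in decimal degrees. The sketch below builds such a request and applies the five degree limit from the note above. The base URL is a placeholder, since the address of the CASDA cone search service is not given in this section.

```python
import urllib.parse

MAX_RADIUS_DEG = 5.0  # limit quoted in the note above


def cone_search_url(base_url, ra_deg, dec_deg, radius_deg):
    """Build a Simple Cone Search request URL (RA, DEC, SR in decimal degrees)."""
    if not 0.0 < radius_deg <= MAX_RADIUS_DEG:
        raise ValueError(f"radius must be in (0, {MAX_RADIUS_DEG}] degrees")
    if not (0.0 <= ra_deg < 360.0 and -90.0 <= dec_deg <= 90.0):
        raise ValueError("position is outside the valid coordinate range")
    query = urllib.parse.urlencode({"RA": ra_deg, "DEC": dec_deg, "SR": radius_deg})
    return f"{base_url}?{query}"


# Placeholder service address; replace with the real cone search endpoint.
print(cone_search_url("https://example.org/scs", 344.52039, -36.42758, 0.5))
```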
3.10 Search source detection catalogues programmatically using user scripts |
---|
The CASDA TAP service can be used together with user scripts. This can be helpful, for example, to facilitate access to large numbers of files.
In this example we provide a sample Python script, tapquery_vo.py, that can easily be modified for your own purposes. The script connects to the CSIRO TAP Service and finds a catalogue, casda.continuum_component. A simple ADQL query is used to search the catalogue for radio components with peak flux densities below 600 mJy. The output is written in XML/VO format to a file data.xml in the local folder.
To run the script:
|
Notes
You may need to ask computing staff for assistance with installing Python modules such as astropy. For users with Pawsey accounts, astropy is installed on the Galaxy supercomputer in the Pawsey Supercomputing Centre. For this script the astropy module is used to write the output to a VO-format table. A similar script, tapquery_csv.py, can be run without astropy. This version writes the output to a csv file, which can also be read by TOPCAT. The sample script can be modified to connect to different catalogues and to carry out different ADQL commands. For additional documentation see the internal comments in the script. |
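For readers who want to see what a TAP request looks like at the protocol level, the following is a minimal standard-library sketch. It is not the tapquery_vo.py script described above. It uses the synchronous query parameters defined by the IVOA TAP standard (REQUEST, LANG, FORMAT, QUERY); the service address and the column name flux_peak are assumptions that should be checked against current CASDA documentation.

```python
import urllib.parse
import urllib.request

# Assumed address of the CASDA TAP service; verify before use.
CASDA_TAP_URL = "https://casda.csiro.au/casda_vo_tools/tap"


def tap_sync_request(adql, base_url=CASDA_TAP_URL, fmt="votable"):
    """Return the URL and POST body for a synchronous TAP query."""
    params = {"REQUEST": "doQuery", "LANG": "ADQL", "FORMAT": fmt, "QUERY": adql}
    body = urllib.parse.urlencode(params).encode("ascii")
    return base_url.rstrip("/") + "/sync", body


def run_tap_query(adql, outfile, **kwargs):
    """Send the query and write the raw response (e.g. a VOTable) to a file."""
    url, body = tap_sync_request(adql, **kwargs)
    with urllib.request.urlopen(url, data=body) as response:
        with open(outfile, "wb") as out:
            out.write(response.read())


# Example call (not run here); flux_peak is an assumed column name:
# run_tap_query("SELECT * FROM casda.continuum_component WHERE flux_peak < 600",
#               "data.xml")
```

Passing fmt="csv" would request a csv table instead, provided the service supports that output format.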
3.11 Plot positions of radio detections on an optical image covering the sky region using TOPCAT with Aladin (version 9) |
---|
|
Notes Aladin is an interactive sky atlas from the Centre de Données astronomiques de Strasbourg (CDS). Note the other surveys provided by Aladin Desktop. The same approach can be used with any of these. |
3.12 Cross-match information from a CASDA catalogue with a catalogue obtained from VizieR and generate a merged catalogue |
---|
|
Notes TOPCAT provides several options for joining tables and matching positions. For full details please see the TOPCAT documentation. |
4 Using CASDA Virtual Observatory Services for Images and Image Cubes
For images and image cubes, the Virtual Observatory provides two protocols. The SIAP (Simple Image Access Protocol) is used to 'discover' relevant files, whilst the SODA (Server-side Operations for Data Access) protocol provides the data access.
These services are primarily intended for use with scripts written by science teams using python or other scripting languages. This approach allows a high degree of customisation together with the ability to handle many files at a time.
These protocols can be somewhat difficult to use, and some knowledge of Python is needed to develop scripts. We are happy to help users get started and would welcome any feedback to help us improve the features described in this section. Please send any comments to ATNF Data Support.
4.1 Find images and image cubes and see the access services |
---|
This item describes how to navigate to a list of image and image cube services. When one image or cube is selected, several data access services are shown. Sections 4.2 and 4.3 provide examples of accessing files and generating cut-outs using Python scripts.
|
Notes The query string 'BAND = 0.25 0.30' refers to a wavelength range, with units given in metres. |
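Because radio observations are usually described by frequency rather than wavelength, it can help to convert between the two when choosing a BAND range. The sketch below applies the relation wavelength = c / frequency; the function names are illustrative. For the range quoted in the note above, 0.25 to 0.30 m corresponds to roughly 999 to 1199 MHz.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def wavelength_band_to_frequency_mhz(lambda_min_m, lambda_max_m):
    """Convert a wavelength range in metres to (low, high) frequency in MHz."""
    if not 0.0 < lambda_min_m <= lambda_max_m:
        raise ValueError("expected 0 < lambda_min_m <= lambda_max_m")
    # The longest wavelength gives the lowest frequency, and vice versa.
    low = SPEED_OF_LIGHT_M_PER_S / lambda_max_m / 1e6
    high = SPEED_OF_LIGHT_M_PER_S / lambda_min_m / 1e6
    return low, high


def frequency_band_to_wavelength_m(freq_low_mhz, freq_high_mhz):
    """Convert a frequency range in MHz to (min, max) wavelength in metres."""
    if not 0.0 < freq_low_mhz <= freq_high_mhz:
        raise ValueError("expected 0 < freq_low_mhz <= freq_high_mhz")
    return (SPEED_OF_LIGHT_M_PER_S / (freq_high_mhz * 1e6),
            SPEED_OF_LIGHT_M_PER_S / (freq_low_mhz * 1e6))


low, high = wavelength_band_to_frequency_mhz(0.25, 0.30)
print(f"{low:.1f} MHz to {high:.1f} MHz")
```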
4.2 Use the VO Simple Image Access Protocol for programmatic data access |
---|
In this example we provide a sample python script, siap.py, that can be used to download images and image cubes overlapping a particular sky region.
Supported formats for right ascension are 22:58:04.88, 22h58m4.88s and 344.52039, and for declination -36:25:39.4, -36d25m39.4s and -36.42758.
The script will query the CASDA SIAP service for images that overlap a 0.1 degree radius circle centred on the coordinates provided. All images and image cubes will then be downloaded in full. To run this script:
As an example, the following command will download HI images of the galaxy group IC1459. The output files are written into a folder 'output'. For a 0.1 degree radius, three images are produced.
>> python siap.py OPAL_username 344.52039 -36.42758 output |
Notes This script provides an example that can be customised for your own purposes. Extensive internal documentation is included. To provide more detail, the script:
To access files, the script uses OPAL authentication and this must be provided for all downloads. You will only be able to retrieve images which have been openly released or that you have access to as a member of a science team. Please note that the OPAL_Password should be surrounded by single quotes on a mac or linux command line if there are any non-alphanumeric characters in the password. |
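The coordinate formats listed in this section can all be reduced to decimal degrees before a query is built. The sketch below shows one way to do that and to form a circular search region in the 'CIRCLE ra dec radius' form used by version 2 of the Simple Image Access protocol. It is an illustration only and is not taken from siap.py; the function names are assumptions.

```python
import re


def to_degrees(text, is_ra):
    """Convert a coordinate string to decimal degrees.

    Accepts forms such as '22:58:04.88', '22h58m4.88s', '-36d25m39.4s'
    or a plain decimal value, which is taken to be in degrees already.
    Sexagesimal right ascension is in hours and is multiplied by 15.
    """
    text = text.strip()
    parts = [p for p in re.split(r"[:hdms\s]+", text) if p]
    if len(parts) == 1:
        return float(parts[0])
    if len(parts) != 3:
        raise ValueError(f"unrecognised coordinate: {text!r}")
    # Take the sign from the string so that values like -00:30:00 work.
    sign = -1.0 if text.startswith("-") else 1.0
    first, minutes, seconds = (abs(float(p)) for p in parts)
    value = first + minutes / 60.0 + seconds / 3600.0
    return sign * value * (15.0 if is_ra else 1.0)


def circle_pos(ra_text, dec_text, radius_deg=0.1):
    """Return a 'CIRCLE ra dec radius' string with all values in degrees."""
    ra = to_degrees(ra_text, is_ra=True)
    dec = to_degrees(dec_text, is_ra=False)
    return f"CIRCLE {ra:.5f} {dec:.5f} {radius_deg:g}"


print(circle_pos("22:58:04.88", "-36:25:39.4"))
```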
4.3 Generate cut outs from images and image cubes using a script |
---|
In this example we provide a sample python script, cutouts.py, that can be used to generate a set of cut outs obtained from the images and image cubes generated from observations associated with a scheduling block.
The script reads a VO-format catalogue, casda_continuum_component, that includes positions for radio continuum components. For each component with a peak flux density above 150 mJy it generates a small cut-out image from the associated image files where the detected position falls within the full-size image or cube. The cut-out images have sizes of 0.2 x 0.2 degrees in right ascension and declination and contain the same number of planes as the full-sized image or cube. Thus a cut-out from a full spectral line image cube may have up to 16,200 channels along the third axis. A cut-out from a single plane image will also be a single plane image. To run this script:
As an example, the following command will generate cut outs from a radio continuum image of the Tucana region. The output files are written into a folder 'output'. For a flux cut-off of 150 mJy, 89 cut-out images are produced.
>> python cutouts.py OPAL_username 609 output |
Notes This script provides an example that can be customised for your own purposes. Extensive internal documentation is included. Note that the script includes a switch to retrieve entire image cubes instead of cutouts. To provide more detail, the script:
To access files, the script uses OPAL authentication and this must be provided for all downloads. You will only be able to retrieve images which have been openly released or that you have access to as a member of a science team. Please note that the OPAL_Password should be surrounded by single quotes on a mac or linux command line if there are any non-alphanumeric characters in the password. |
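The selection step described in this section (keep components above a flux threshold, then define a small region around each) can be sketched as follows. This is not the cutouts.py script itself. The column names flux_peak, ra_deg_cont and dec_deg_cont are assumptions, and the region is written in the 'RANGE ra1 ra2 dec1 dec2' form defined by the SODA standard; whether the real script uses this form is not stated here.

```python
def cutout_regions(components, flux_limit_mjy=150.0, size_deg=0.2):
    """Return one 'RANGE ra1 ra2 dec1 dec2' string per bright component.

    components: iterable of dicts with (assumed) keys 'flux_peak' in mJy
    and 'ra_deg_cont', 'dec_deg_cont' in decimal degrees.
    Simplifications: the box is size_deg wide in coordinate degrees, with
    no cos(dec) correction and no handling of wrap-around at RA 0/360.
    """
    half = size_deg / 2.0
    regions = []
    for comp in components:
        if comp["flux_peak"] > flux_limit_mjy:
            ra, dec = comp["ra_deg_cont"], comp["dec_deg_cont"]
            regions.append(
                f"RANGE {ra - half:.5f} {ra + half:.5f} "
                f"{dec - half:.5f} {dec + half:.5f}"
            )
    return regions


# Hypothetical components: only the first exceeds the 150 mJy threshold.
example = [
    {"flux_peak": 200.0, "ra_deg_cont": 10.0, "dec_deg_cont": -60.0},
    {"flux_peak": 50.0, "ra_deg_cont": 11.0, "dec_deg_cont": -61.0},
]
print(cutout_regions(example))
```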
5 Restricted CASDA Tools
5.1 Download data to a location in the Pawsey Supercomputing Centre (Pawsey user account required) |
---|
In this example CASDA is used to generate a set of data links that are then used to transfer the files to a location in the Pawsey Supercomputing Centre (usually on Galaxy or Magnus). The files can only be accessed from a Pawsey user account. The following steps are carried out from a Nexus or OPAL account:
|
Notes
See section 1.7 for information on Pawsey accounts. If you are waiting some time for files to download, bookmark or otherwise save the URL for the webpage with links. As a note for Windows users: to cut and paste text between files in different locations or between Windows and Unix, i) highlight the text to copy, ii) press Ctrl-C to copy, iii) right-click to insert. |
5.2 Set data validation flags and information (Survey Science Team members with validator permissions ) |
---|
Data validation is carried out using the CSIRO DAP or Survey Science Projects by members of the Survey Science teams. This task can only be carried out by individuals with validation-level access to the data products.
The validation process involves setting a flag in the database and adding additional information. Validation is applied at the level of individual data products, so for a set of images created for a project, a validation flag is set for each image. Following validation, data products are 'released'. The steps below describe the process:
|
Notes For help with the validation process please contact the helpdesk: atnf-datasup@csiro.au. |
5.3 See the list of individuals assigned to a science team (CASDA Administrator Task) |
---|
|
Notes In a later CASDA release, this task will be made available to project-level administrators. |
5.4 Manage roles for team members (CASDA Administrator Task) |
---|
|
Notes In a later CASDA release, this task will be made available to project-level administrators. |
5.5 Add an individual to a science team (CASDA Administrator Task) |
---|
|
Notes In a later CASDA release, this task will be made available to project-level administrators. As a starting point, CASDA generates a list of team members from the PI and co-authors on the associated OPAL project proposals. This list is then managed through CASDA. |
5.6 Publish a Science Team catalogue: General advice (for Survey Science Teams) |
---|
For some ASKAP surveys, science teams will generate catalogues with final science results. These are referred to here as 'Science Team catalogues'. As an example, a catalogue might contain a set of polarisation properties for a list of detected sources. Another catalogue could contain properties derived from HI spectra for nearby galaxies, together with optical identifications. Compiling such catalogues is the responsibility of Science Teams.
CASDA provides a service so that Science Team catalogues can be published as part of the ASKAP science archive. If your team is interested in publishing level 7 catalogues please note the following advice.
|
Notes CASDA catalogues are published together with digital object identifiers (DOIs). These can be cited in journal publications. |
5.7 Submit a catalogue for CASDA publication (Nexus account required) |
---|
|
Notes You may need some assistance for catalogue deposits. Please see the help tips (yellow question marks) and/or contact ATNF data support. |
5.8 Approve a Science Team catalogue (Approvers only) |
---|
|
Notes |
6 Publications and Acknowledgements
See CASS publications and acknowledgments
7 Links to external documentation
8 Document versions
Author | User Guide Version | Date | Latest CASDA version (release date) | TOPCAT Version | Notes |
---|---|---|---|---|---|
M Huynh | 1.3 | 07 Dec 2016 | 1.4 (Oct 2016) | 4.3 | Small updates and new sub-section on DAP cutout service. |
J Chapman | 1.2 | 01 Jun 2016 | 1.2 (Apr 2016) | 4.3 | Many small updates and new sub-sections on generating plots. |
J Chapman | 1.1 | 24 Mar 2016 | 1.1 (Feb 2016) | 4.3 | Updates and new content in sections 1 to 5. |
J Chapman | 1.0 | 05 Nov 2015 | 1.0 (5 Nov 2015) | 4.2 | Initial release |