Report on data activities in the International Astronomical Union to the 23rd CODATA General Assembly
(
This report describes the current status of data handling in astronomy, including the growth and development of interoperable terabyte data archives, and describes the vigorous current activity in the international development of the Virtual Observatory.
The International Astronomical Union (IAU),
founded in 1919, promotes and safeguards the science of astronomy in all its
aspects through international co-operation. The IAU plays a key role in
promoting and co-ordinating world-wide co-operation in astronomy. Over 8,300
professional astronomers are members of the IAU, 66 countries adhere to the
Details about the
organisation of the IAU and current information on its activities may be found
at the
Within the International
Astronomical Union, one commission (Commission 5) is primarily concerned with
the technical aspects of data handling. Working groups and task groups within
Commission 5 deal specifically with information handling, data centres and
networks, technical aspects of collection, archiving, storage and dissemination
of data, designations and classification of astronomical objects, library
services, editorial policies, computer communications, ad hoc methodologies, and with various standards, reference frames,
etc.
Astronomy has
always been data-rich and data-sophisticated, as exemplified by its data
centres and electronic publishing, which arrived well in advance of similar
developments in most other disciplines. Astronomy is also at the forefront in
building links between distributed, heterogeneous systems, thanks to exchange
standards such as FITS for data and the Bibcode for the description of
bibliographic references, shared by the community. Partnership between data providers, data
centres and journal publishers contributed to the success. All major refereed astronomical
journals produce an electronic edition in addition to the paper version.
Abstracts of articles in all leading journals are available electronically on
ADS (http://adsabs.harvard.edu/abstract_service.html),
which also offers facilities for searches by various criteria. Preprints are
also routinely published on http://xxx.lanl.gov/archive/astro-ph
and other specialist web sites.
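As an illustration of how a shared identifier such as the Bibcode helps link distributed systems, the following minimal Python sketch splits a bibcode into its fields. It assumes the commonly documented 19-character layout (year, journal abbreviation, volume, qualifier, page, first author initial), which is not spelled out in this report. The names `Bibcode` and `parse_bibcode` are hypothetical, and the example string is illustrative only.

```python
from typing import NamedTuple


class Bibcode(NamedTuple):
    """Fields of a 19-character bibcode (layout assumed: YYYYJJJJJVVVVMPPPPA)."""
    year: int
    journal: str
    volume: str
    qualifier: str
    page: str
    author_initial: str


def parse_bibcode(code: str) -> Bibcode:
    """Split a bibcode into its fixed-width fields, dropping the '.' padding."""
    if len(code) != 19:
        raise ValueError("a bibcode is expected to be exactly 19 characters long")
    return Bibcode(
        year=int(code[0:4]),
        journal=code[4:9].rstrip("."),       # left-justified, padded on the right
        volume=code[9:13].lstrip("."),       # right-justified, padded on the left
        qualifier=code[13].replace(".", ""),  # e.g. "L" for a letters section
        page=code[14:18].lstrip("."),        # right-justified, padded on the left
        author_initial=code[18],
    )


# Illustrative bibcode string in the assumed layout.
example = parse_bibcode("1981A&AS...44..363W")
print(example.journal, example.volume, example.page)
```

Because every field sits at a fixed position, a data centre, a journal publisher and an abstract service can each build or decode the same identifier without consulting one another.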
The international
and the major national and local data centres are the repositories for thousands
of catalogues of astronomical objects and for many data tables from journal
articles. The level of collaboration between the different data centres is such
that work is seldom duplicated. In order to serve different geographical
communities optimally, well maintained copies of frequently requested data
(like those of the Hubble Space Telescope) reside at more than one centre. The
three most widely-accessed data centres are the international data centres at:
Centre de Données
NASA/IPAC Extragalactic Database,
Until about 20 years ago astronomy was
split into a number of sub-disciplines corresponding to the different
wavelengths at which observations were made (optical astronomy, radio
astronomy, etc). Each of these tended to use its own data formats, as a result
of which it was awkward to combine data from different wavelengths. In 1981 the
FITS (Flexible Image Transport System) format was proposed (http://fits.gsfc.nasa.gov/documents.html),
and adopted enthusiastically by all sub-disciplines of astronomy. This
permitted easy interchange of data between the sub-disciplines, and was largely
responsible for breaking down cultural barriers between these sub-disciplines.
As a result, many astrophysicists today take their data at whichever wavelength
is needed to solve the astrophysical problem being addressed. This is probably
partly responsible for the current healthy state of astronomy and astrophysics
internationally, with new discoveries about the origin and evolution of the
Universe being made at a breathtaking rate.
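Part of what made FITS easy to adopt across sub-disciplines is that its headers are self-describing: plain text records of 80 characters each, with a keyword, a value and an optional comment. The sketch below is a simplified Python illustration of that idea, not a conforming FITS reader. The helper names `make_card` and `parse_card` are hypothetical, and details such as continued strings and Fortran-style `D` exponents are deliberately ignored.

```python
def make_card(keyword, value_text, comment=""):
    """Build one 80-character header card (helper for this example only)."""
    # Strings are left-justified from column 11; other values are
    # right-justified so that they end in column 30.
    if value_text.startswith("'"):
        card = f"{keyword:<8}= {value_text:<20}"
    else:
        card = f"{keyword:<8}= {value_text:>20}"
    if comment:
        card += f" / {comment}"
    return card.ljust(80)


def parse_card(card):
    """Return (keyword, value, comment) for one 80-character card."""
    if len(card) != 80:
        raise ValueError("a FITS header card is exactly 80 characters long")
    keyword = card[:8].rstrip()
    if card[8:10] != "= ":
        # Commentary cards (COMMENT, HISTORY, END) carry no value.
        return keyword, None, card[8:].strip()
    rest = card[10:]
    if rest.lstrip().startswith("'"):
        # Quoted string: read up to the closing quote; '' is an escaped quote.
        chars = []
        i = rest.index("'") + 1
        while i < len(rest):
            if rest[i] == "'":
                if rest[i + 1 : i + 2] == "'":
                    chars.append("'")
                    i += 2
                    continue
                break
            chars.append(rest[i])
            i += 1
        value = "".join(chars).rstrip()
        tail = rest[i + 1 :]
    else:
        raw, _, after = rest.partition("/")
        raw = raw.strip()
        if raw == "":
            value = None                 # keyword present, value undefined
        elif raw in ("T", "F"):
            value = raw == "T"           # logical value
        else:
            try:
                value = int(raw)
            except ValueError:
                value = float(raw)
        tail = "/" + after
    comment = tail.partition("/")[2].strip()
    return keyword, value, comment


for card in (
    make_card("SIMPLE", "T", "file conforms to the standard"),
    make_card("NAXIS", "2", "number of axes"),
    make_card("OBJECT", "'M31     '", "target name"),
):
    print(parse_card(card))
```

A radio map and an optical image written this way can be read by the same software, since the reader learns the array dimensions and other metadata from the header rather than from prior knowledge of the instrument.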
While FITS has been enormously and demonstrably
successful, astronomical data are now seen by some as outgrowing the 1980s
technology on which FITS was based. The
An even more sophisticated approach has
been taken by organisations in the US and Europe that are building the necessary
components and infrastructure for a Virtual Observatory (see below), resulting
in a prototype standard called VOTable, relying on cutting-edge web services
and directories.
The International Virtual Observatory will
link the archives of all the world's major observatories (each of which may
contain several terabytes of data) into one distributed database, with powerful
tools to optimise the extraction of science from the data. As a result, data
from all the world's major observatories will be available to all users, and to
the public. A user can simply request some data or an image of some particular
part of the sky, or perhaps the result of some operation on several data sets,
and the IVO will provide the result to the user. If the data do not yet exist,
the IVO may tell the user how to obtain them, or might in some cases direct a
robotic telescope to obtain the data.
The concept of a Virtual Observatory is
based on the fact that scientific discoveries are generated as much by use of
archive data as by use of "live" observations. For example, data from
the Hubble Space Telescope are typically used four times: once by the original
investigator and three times more by other astronomers accessing the HST
archive. To extend this grand concept to all major observatories requires a
great deal of IT development, and there is a strong effort in the US (http://www.us-vo.org/) and Europe (http://www.eso.org/avo/) to develop this concept, funded at a
level of tens of millions of dollars. Smaller efforts have been mounted in
However, the concept of the Virtual
Observatory goes well beyond merely accessing data from an observatory or data
centre. It may also encompass theoretical or numerical modelling, or other
processes. The real power will come when a user can perform a query on
distributed databases, each of several terabytes, residing in different parts
of the world, with a query that itself may require significant computational
power (e.g. “give me the spectral distribution of all objects in this French
database which satisfy this complex criterion on these databases in the US and
Australia”). Such problems lead naturally to interesting Grid Computing
challenges which have excited significant interest and investment both from
commercial companies and from IT researchers.
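The shape of such a distributed query can be sketched in a few lines of Python. This is a toy illustration under strong assumptions: the three "archives" are small in-memory dictionaries, objects share a common identifier (real archives would normally need to be cross-matched by sky position), and all names, fields and values are hypothetical.

```python
# Hypothetical stand-ins for three remote archives, keyed by a shared object ID.
FRENCH_SPECTRA = {
    "obj-1": {"radio": 12.0, "optical": 3.1, "xray": 0.4},
    "obj-2": {"radio": 0.8, "optical": 5.6, "xray": 0.0},
    "obj-3": {"radio": 7.5, "optical": 1.2, "xray": 2.2},
}
US_CATALOGUE = {
    "obj-1": {"redshift": 1.4},
    "obj-2": {"redshift": 0.1},
    "obj-3": {"redshift": 2.3},
}
AUSTRALIAN_CATALOGUE = {
    "obj-1": {"radio_flux_mjy": 15.0},
    "obj-3": {"radio_flux_mjy": 0.2},
}


def select_ids(catalogue, predicate):
    """Run a selection 'at the archive' and return only the matching IDs."""
    return {oid for oid, row in catalogue.items() if predicate(row)}


def federated_query(spectra, criteria):
    """Return spectra for objects that satisfy every (catalogue, predicate) pair."""
    matching = set(spectra)
    for catalogue, predicate in criteria:
        # Only identifiers are exchanged between archives, not the bulk data.
        matching &= select_ids(catalogue, predicate)
    return {oid: spectra[oid] for oid in sorted(matching)}


result = federated_query(
    FRENCH_SPECTRA,
    [
        (US_CATALOGUE, lambda row: row["redshift"] > 1.0),
        (AUSTRALIAN_CATALOGUE, lambda row: row["radio_flux_mjy"] > 1.0),
    ],
)
print(result)
```

The design point is that each selection runs where the data reside and only short lists of identifiers are combined, so the bulk data leave an archive only for the objects that survive every criterion. With terabyte archives on different continents, deciding where each step runs is where the Grid Computing questions arise.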
All the Virtual Observatory groups have
recently combined into the International Virtual Observatory Alliance (http://ivoa.org/).
Ray Norris,