Report on data activities in the
to the 23rd CODATA General Assembly
This report describes the current status of data handling in astronomy, including the growth and development of interoperable terabyte data archives, and describes the vigorous current activity in the international development of the Virtual Observatory.
The International Astronomical Union (IAU),
founded in 1919, promotes and safeguards the science of astronomy in all its
aspects through international co-operation. The IAU plays a key role in
promoting and co-ordinating world-wide co-operation in astronomy. Over 8,300
professional astronomers are members of the IAU, 66 countries adhere to the
Details about the
organisation of the IAU and current information on its activities may be found
Within the IAU, one commission (Commission 5) is primarily concerned with the technical aspects of data handling. Its working groups and task groups deal specifically with information handling; data centres and networks; the technical aspects of collecting, archiving, storing and disseminating data; the designation and classification of astronomical objects; library services; editorial policies; computer communications; ad hoc methodologies; and various standards and reference frames.
Astronomy has always been data-rich and data-sophisticated, as exemplified by its data centres and electronic publishing, which arrived well in advance of similar developments in most other disciplines. Astronomy is also at the forefront of building links between distributed, heterogeneous systems, thanks to community-wide exchange standards such as FITS for data and the Bibcode for bibliographic references. Partnership between data providers, data centres and journal publishers has contributed to this success. All major refereed astronomical journals produce an electronic edition in addition to the paper version. Abstracts of articles in all leading journals are available electronically through ADS (http://adsabs.harvard.edu/abstract_service.html), which also supports searches by various criteria. Preprints are also routinely posted at http://xxx.lanl.gov/archive/astro-ph and on other specialist web sites.
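As an illustration of how a compact shared identifier helps link independent systems, the following sketch splits a Bibcode into its fields. It assumes the commonly documented 19-character layout (year, journal abbreviation, volume, qualifier, page, first-author initial, with dots as padding). The name `parse_bibcode` and the `Bibcode` tuple are hypothetical helpers introduced here for illustration only.

```python
from typing import NamedTuple


class Bibcode(NamedTuple):
    year: int
    journal: str
    volume: str
    qualifier: str
    page: str
    author_initial: str


def parse_bibcode(code: str) -> Bibcode:
    """Split a Bibcode into fields, assuming the layout YYYYJJJJJVVVVMPPPPA."""
    if len(code) != 19:
        raise ValueError(f"expected 19 characters, got {len(code)}")
    if not code[:4].isdigit():
        raise ValueError("the first four characters must be the year")
    return Bibcode(
        year=int(code[0:4]),
        journal=code[4:9].strip("."),     # 5 characters, dot-padded
        volume=code[9:13].strip("."),     # 4 characters, dot-padded
        qualifier=code[13].strip("."),    # e.g. "L" for a letters section
        page=code[14:18].strip("."),      # 4 characters, dot-padded
        author_initial=code[18],
    )


# An example string in the assumed layout.
example = parse_bibcode("1992ApJ...400L...1W")
```

Because every field sits at a fixed position, any data centre or journal can build or decode such a code without consulting a central registry, which is much of what makes it useful as a link between services.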
The international data centres, together with the major national and local ones, are the repositories for thousands of catalogues of astronomical objects and for many data tables from journal articles. Collaboration between the data centres is close enough that work is seldom duplicated. To serve different geographical communities optimally, well-maintained copies of frequently requested data (such as those from the Hubble Space Telescope) reside at more than one centre. The three most widely accessed data centres are the international data centres at:
Centre de Données
NASA/IPAC Extragalactic Database,
Until about 20 years ago astronomy was split into a number of sub-disciplines corresponding to the different wavelengths at which observations were made (optical astronomy, radio astronomy, etc.). Each of these tended to use its own data formats, which made it awkward to combine data from different wavelengths. In 1981 the FITS (Flexible Image Transport System) format was proposed (http://fits.gsfc.nasa.gov/documents.html) and was adopted enthusiastically by all sub-disciplines of astronomy. This permitted easy interchange of data between the sub-disciplines and was largely responsible for breaking down the cultural barriers between them. As a result, many astrophysicists today take their data at whichever wavelength is needed to solve the astrophysical problem being addressed. This is probably partly responsible for the current healthy state of astronomy and astrophysics internationally, with new discoveries about the origin and evolution of the Universe being made at a breathtaking rate.
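Part of the reason FITS could be adopted so widely is that its header is plain ASCII with a very simple layout: 80-character "card" records, terminated by an END card and padded to 2880-byte blocks. The following is a minimal sketch of that layout using only the Python standard library. It handles only logical, integer and short string values in fixed format, omits comments and continuation cards, and the helper names (`make_card`, `build_header`, `parse_header`) are hypothetical rather than part of any FITS library.

```python
CARD_LEN = 80      # each header record ("card") is 80 ASCII characters
BLOCK_LEN = 2880   # FITS files are organised in 2880-byte blocks


def make_card(keyword: str, value) -> str:
    """Format one 'KEYWORD = value' card (simplified fixed format)."""
    if len(keyword) > 8:
        raise ValueError("FITS keywords are at most 8 characters")
    if isinstance(value, bool):          # test bool before int: bool is an int
        text = ("T" if value else "F").rjust(20)
    elif isinstance(value, int):
        text = str(value).rjust(20)      # right-justified in columns 11-30
    else:                                # anything else becomes a quoted string
        text = "'" + str(value).replace("'", "''").ljust(8) + "'"
    card = f"{keyword.upper():<8}= {text}"
    if len(card) > CARD_LEN:
        raise ValueError("value too long for a single card")
    return card.ljust(CARD_LEN)


def build_header(cards: dict) -> bytes:
    """Join the cards, append END, and pad with spaces to a whole block."""
    body = "".join(make_card(k, v) for k, v in cards.items())
    body += "END".ljust(CARD_LEN)
    padded_len = -(-len(body) // BLOCK_LEN) * BLOCK_LEN   # round up
    return body.ljust(padded_len).encode("ascii")


def parse_header(raw: bytes) -> dict:
    """Read cards back until END; understands only what make_card writes."""
    result = {}
    text = raw.decode("ascii")
    for i in range(0, len(text), CARD_LEN):
        card = text[i:i + CARD_LEN]
        keyword = card[:8].strip()
        if keyword == "END":
            break
        if card[8:10] != "= ":
            continue                     # commentary or blank card
        field = card[10:].strip()
        if field.startswith("'"):
            result[keyword] = field[1:-1].replace("''", "'").rstrip()
        elif field in ("T", "F"):
            result[keyword] = field == "T"
        else:
            result[keyword] = int(field)
    return result


header = build_header({"SIMPLE": True, "BITPIX": 16, "NAXIS": 0, "OBJECT": "M31"})
```

Here `parse_header(header)` recovers the same four keyword-value pairs. A header like this can be inspected in any text viewer, which helps explain why very different instrument communities found the format easy to share.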
While FITS has been enormously and demonstrably
successful, astronomical data are now seen by some as outgrowing the 1980s
technology on which FITS was based. The
An even more sophisticated approach has been taken by organisations in the US and Europe that are building the components and infrastructure needed for a Virtual Observatory (see below). Their work has resulted in a prototype standard called VOTable, which relies on cutting-edge web services and directories.
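To give a feel for a table format in which the column descriptions travel with the data, the sketch below reads a small XML document. Its element names (VOTABLE, RESOURCE, TABLE, FIELD, DATA, TABLEDATA, TR, TD) follow the layout generally associated with VOTable. The sample values are invented, and the absence of XML namespaces and the `read_table` helper are simplifying assumptions for illustration, not the text of the prototype standard.

```python
import xml.etree.ElementTree as ET

# Invented sample data; object names and coordinates are placeholders.
SAMPLE = """\
<VOTABLE>
  <RESOURCE>
    <TABLE name="demo">
      <FIELD name="name" datatype="char" arraysize="*"/>
      <FIELD name="ra" datatype="double" unit="deg"/>
      <FIELD name="dec" datatype="double" unit="deg"/>
      <DATA>
        <TABLEDATA>
          <TR><TD>obj-1</TD><TD>10.5</TD><TD>41.25</TD></TR>
          <TR><TD>obj-2</TD><TD>83.75</TD><TD>-5.5</TD></TR>
        </TABLEDATA>
      </DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>
"""

NUMERIC_TYPES = {"double", "float"}


def read_table(xml_text: str) -> list:
    """Return the rows of the first table as dictionaries keyed by field name."""
    root = ET.fromstring(xml_text)
    table = root.find("./RESOURCE/TABLE")
    fields = table.findall("FIELD")
    rows = []
    for tr in table.iterfind("./DATA/TABLEDATA/TR"):
        row = {}
        for field, td in zip(fields, tr.findall("TD")):
            text = td.text or ""
            # The FIELD metadata tells the reader how to interpret each cell.
            is_number = field.get("datatype") in NUMERIC_TYPES
            row[field.get("name")] = float(text) if is_number else text
        rows.append(row)
    return rows
```

Because the FIELD elements declare names, types and units alongside the data, a generic tool can interpret a table it has never seen before, which is the property a network of independent archives depends on.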
The International Virtual Observatory (IVO) will link the archives of all the world's major observatories (each of which may contain several terabytes of data) into one distributed database, with powerful tools to optimise the extraction of science from the data. As a result, data from all the world's major observatories will be available to all users, and to the public. A user will simply request some data, an image of a particular part of the sky, or perhaps the result of some operation on several data sets, and the IVO will provide the result. If the data do not yet exist, the IVO may tell the user how to obtain them, or might in some cases direct a robotic telescope to obtain them.
The concept of a Virtual Observatory rests
on the observation that scientific discoveries come as much from
archival data as from "live" observations. For example, data from
the Hubble Space Telescope are typically used four times: once by the original
investigator and three more times by other astronomers accessing the HST
archive. Extending this concept to all major observatories requires a
great deal of IT development, and there are strong efforts in the US (http://www.us-vo.org/) and Europe (http://www.eso.org/avo/) to do so, funded at a
level of tens of millions of dollars. Smaller efforts have been mounted in
However, the concept of the Virtual Observatory goes well beyond merely accessing data from an observatory or data centre. It may also encompass theoretical or numerical modelling, or other processes. The real power will come when a user can run a query across distributed databases, each of several terabytes and residing in different parts of the world, where the query itself may require significant computational power (for example, “give me the spectral distribution of all objects in this French database which satisfy this complex criterion on these databases in the US and Australia”). Such problems lead naturally to interesting Grid Computing challenges, which have attracted significant interest and investment from both commercial companies and IT researchers.
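The shape of such a cross-archive query can be pictured with a toy example. In the sketch below, three in-memory dictionaries stand in for remote archives; every name, identifier, threshold and value is invented for illustration, and a real system would ship each sub-query to the site holding the data rather than run everything in one process.

```python
# Hypothetical stand-ins for three remote archives, keyed by a shared object id.
FRENCH_SPECTRA = {"a": [1.0, 2.0, 3.0], "b": [0.5, 0.4, 0.3], "c": [2.0, 2.0, 2.0]}
US_CATALOGUE = {"a": {"redshift": 0.8}, "b": {"redshift": 0.1}, "c": {"redshift": 1.2}}
AU_CATALOGUE = {"a": {"radio_flux": 12.0}, "c": {"radio_flux": 0.2}}


def select_ids(catalogue: dict, predicate) -> set:
    """Apply a filter 'at' one archive and return only the matching ids."""
    return {obj_id for obj_id, row in catalogue.items() if predicate(row)}


def federated_query(min_redshift: float, min_radio_flux: float) -> dict:
    # Each archive evaluates its own part of the criterion and returns ids only,
    # so the bulk of each data set never has to be moved.
    us_hits = select_ids(US_CATALOGUE, lambda r: r["redshift"] > min_redshift)
    au_hits = select_ids(AU_CATALOGUE, lambda r: r["radio_flux"] > min_radio_flux)
    wanted = us_hits & au_hits
    # Only the spectra of the surviving objects are fetched from the third archive.
    return {
        obj_id: FRENCH_SPECTRA[obj_id]
        for obj_id in sorted(wanted)
        if obj_id in FRENCH_SPECTRA
    }


result = federated_query(min_redshift=0.5, min_radio_flux=1.0)
```

The design point is that the filters run where the data live and only small sets of identifiers cross the network. Deciding where each step should execute, and how to schedule it, is the kind of problem that Grid Computing addresses.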
All the Virtual Observatory groups have recently combined into the International Virtual Observatory Alliance (http://ivoa.org/).