Report on data activities in the


International Astronomical Union


to the 23rd CODATA General Assembly


(Montreal, October 2002)


This report describes the current status of data handling in astronomy, including the growth and development of interoperable terabyte data archives, and describes the vigorous current activity in the international development of the Virtual Observatory.

Background: The International Astronomical Union

The International Astronomical Union (IAU), founded in 1919, promotes and safeguards the science of astronomy in all its aspects through international co-operation. The IAU plays a key role in promoting and co-ordinating world-wide co-operation in astronomy. Over 8,300 professional astronomers are members of the IAU, and 66 countries adhere to the Union. Eleven Scientific Divisions and, through them, about 40 specialised Commissions cover the full spectrum of astronomy. General Assemblies, held every three years, define the IAU’s long-term policy, which is implemented by the Executive Committee. The permanent IAU secretariat is located at the Institut d’Astrophysique in Paris.

Details about the organisation of the IAU and current information on its activities may be found at the Union’s web-site. The Information Bulletins, a complete membership directory, and details about the Union’s Divisions and Commissions are available there.

Within the IAU, one commission (Commission 5) is primarily concerned with the technical aspects of data handling. Working groups and task groups within Commission 5 deal specifically with information handling, data centres and networks, technical aspects of the collection, archiving, storage and dissemination of data, designations and classification of astronomical objects, library services, editorial policies, computer communications, ad hoc methodologies, and various standards, reference frames, etc.

Data Handling in Astronomy

Astronomy has always been data-rich and data-sophisticated, as exemplified by its data centres and electronic publishing, which arrived well in advance of similar developments in most other disciplines. Astronomy is also at the forefront in building links between distributed, heterogeneous systems, thanks to exchange standards shared by the community, such as FITS for data and the Bibcode for the description of bibliographic references. Partnership between data providers, data centres and journal publishers has contributed to this success. All major refereed astronomical journals produce an electronic edition in addition to the paper version. Abstracts of articles in all leading journals are available electronically on ADS, which also offers facilities for searches by various criteria. Preprints are also routinely published on specialist web sites.

Astronomical Data Centres

The international data centres, together with the major national and local ones, are the repositories for thousands of catalogues of astronomical objects and for many data tables from journal articles. The level of collaboration between the different data centres is such that work is seldom duplicated. In order to serve different geographical communities optimally, well-maintained copies of frequently requested data (like those of the Hubble Space Telescope) reside at more than one centre. The three most widely accessed data centres are the international data centres at:

Centre de Données Strasbourg, France

Astronomical Data Center, Maryland, USA

NASA/IPAC Extragalactic Database, Caltech, USA

Data Formats

Until about 20 years ago astronomy was split into a number of sub-disciplines corresponding to the different wavelengths at which observations were made (optical astronomy, radio astronomy, etc). Each of these tended to use its own data formats, so it was awkward to combine data from different wavelengths. In 1981 the FITS (Flexible Image Transport System) format was proposed, and it was adopted enthusiastically by all sub-disciplines of astronomy. This permitted easy interchange of data between the sub-disciplines, and was largely responsible for breaking down the cultural barriers between them. As a result, many astrophysicists today take their data at whichever wavelength is needed to solve the astrophysical problem being addressed. This is probably partly responsible for the current healthy state of astronomy and astrophysics internationally, with new discoveries about the origin and evolution of the Universe being made at a breathtaking rate.
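As an illustration of why FITS was so easy to exchange, the sketch below builds and then reads back a minimal FITS-style header in plain Python. It relies only on the basic layout of the format: header records ("cards") of 80 ASCII characters, the keyword in columns 1 to 8, a value indicator "= " in columns 9 and 10, a closing END card, and padding to blocks of 2880 bytes. The helper names make_card and parse_header, and the keyword values used, are assumptions made for this example; it handles only integer and logical values and is not a substitute for a full FITS library.

```python
CARD_LEN = 80      # every FITS header record ("card") is 80 ASCII characters
BLOCK_LEN = 2880   # header and data units are padded to multiples of 2880 bytes


def make_card(keyword, value, comment=""):
    """Build one fixed-format card for an integer or logical value (sketch only)."""
    if isinstance(value, bool):
        text = "T" if value else "F"      # FITS logicals are the letters T and F
    else:
        text = str(value)
    card = f"{keyword:<8}= {text:>20}"    # keyword cols 1-8, '= ' cols 9-10
    if comment:
        card += f" / {comment}"
    return card[:CARD_LEN].ljust(CARD_LEN)


def parse_header(raw):
    """Return {keyword: value string} from raw header bytes, stopping at END."""
    header = {}
    for start in range(0, len(raw), CARD_LEN):
        card = raw[start:start + CARD_LEN].decode("ascii")
        keyword = card[:8].strip()
        if keyword == "END":
            break
        if card[8:10] == "= ":
            # Drop the optional comment that follows the slash.
            header[keyword] = card[10:].split("/", 1)[0].strip()
    return header


# Illustrative header for a hypothetical 512 x 512 image of 16-bit integers.
cards = [
    make_card("SIMPLE", True, "conforms to FITS standard"),
    make_card("BITPIX", 16, "bits per data value"),
    make_card("NAXIS", 2, "number of axes"),
    make_card("NAXIS1", 512),
    make_card("NAXIS2", 512),
    "END".ljust(CARD_LEN),
]
raw_header = "".join(cards).encode("ascii")
raw_header += b" " * (-len(raw_header) % BLOCK_LEN)   # pad to a full block

header = parse_header(raw_header)
print(header)
```

Because the header is plain fixed-width ASCII, any sub-discipline could read another's files with a few lines of code, which is consistent with the rapid adoption described above.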

While FITS has been enormously and demonstrably successful, astronomical data are now seen by some as outgrowing the 1980s technology on which FITS was based. The Astronomical Data Center at NASA's Goddard Space Flight Center has applied newer technology such as the eXtensible Markup Language (XML) to solve astrophysics data interchange issues. These include automated data ingest, panchromatic data search, access to large databases, and development of new meta-data standards.

An even more sophisticated approach has been taken by organisations in the US and Europe that are building the necessary components and infrastructure for a Virtual Observatory (see below). This work has resulted in a prototype standard called VOTable, which relies on cutting-edge web services and directories.
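To give a flavour of an XML-based table format of this kind, the sketch below parses a small VOTable-style document with Python's standard library. The element names (VOTABLE, RESOURCE, TABLE, FIELD, DATA, TABLEDATA, TR, TD) follow the general VOTable layout, but the sample document is a simplification: it omits namespaces and most metadata, and the source names and coordinates are invented for illustration. The function name read_votable is likewise an assumption of this example.

```python
import xml.etree.ElementTree as ET

# Simplified, invented sample: real VOTable files carry more metadata.
SAMPLE = """<VOTABLE>
  <RESOURCE>
    <TABLE name="demo">
      <FIELD name="name" datatype="char" arraysize="*"/>
      <FIELD name="ra" datatype="double" unit="deg"/>
      <FIELD name="dec" datatype="double" unit="deg"/>
      <DATA>
        <TABLEDATA>
          <TR><TD>src-A</TD><TD>10.5</TD><TD>-30.25</TD></TR>
          <TR><TD>src-B</TD><TD>200.0</TD><TD>45.0</TD></TR>
        </TABLEDATA>
      </DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>
"""


def read_votable(xml_text):
    """Return the first table as a list of dicts keyed by FIELD name."""
    root = ET.fromstring(xml_text)
    table = root.find("./RESOURCE/TABLE")
    fields = table.findall("FIELD")
    names = [f.get("name") for f in fields]
    types = [f.get("datatype") for f in fields]
    rows = []
    for tr in table.findall("./DATA/TABLEDATA/TR"):
        cells = [td.text for td in tr.findall("TD")]
        row = {}
        for name, dtype, cell in zip(names, types, cells):
            # The FIELD metadata tells the reader how to interpret each cell.
            row[name] = float(cell) if dtype in ("float", "double") else cell
        rows.append(row)
    return rows


rows = read_votable(SAMPLE)
for row in rows:
    print(row)
```

The point of the design is that the column descriptions travel with the data, so a generic client can interpret a table it has never seen before.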

The Virtual Observatory

The International Virtual Observatory (IVO) will link the archives of all the world's major observatories (each of which may contain several terabytes of data) into one distributed database, with powerful tools to optimise the extraction of science from the data. As a result, data from all the world's major observatories will be available to all users, and to the public. A user can simply request some data or an image of some particular part of the sky, or perhaps the result of some operation on several data sets, and the IVO will provide the result to the user. If the data do not yet exist, the IVO may tell the user how to obtain them, or might in some cases direct a robotic telescope to obtain the data.

The concept of a Virtual Observatory is based on the fact that scientific discoveries are generated as much by use of archive data as by use of "live" observations. For example, data from the Hubble Space Telescope are typically used four times: once by the original investigator and three times more by other astronomers accessing the HST archive. To extend this grand concept to all major observatories requires a great deal of IT development, and there is a strong effort in the US and Europe to develop this concept, funded at a level of tens of millions of dollars. Smaller efforts have been mounted in Canada and Australia, and incipient projects are under development in several other countries. The Virtual Observatory is likely to become the primary means of accessing astronomical data, with gains in productivity and cost-effectiveness for the observatories that participate in it.

However, the concept of the Virtual Observatory goes well beyond merely accessing data from an observatory or data centre. It may also encompass theoretical or numerical modelling, or other processes. The real power will come when a user can perform a query on distributed databases, each of several terabytes, residing in different parts of the world, with a query that itself may require significant computational power (e.g. “give me the spectral distribution of all objects in this French database which satisfy this complex criterion on these databases in the US and Australia”). Such problems lead naturally to interesting Grid Computing challenges, which have excited significant interest and investment both from commercial companies and from IT researchers.
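The kind of cross-archive query quoted above can be sketched in miniature as follows. This is a toy model only: three small in-memory lists stand in for remote archives, every source and value is invented, and the names federated_query, cone_search and MATCH_RADIUS_DEG are assumptions of this example rather than part of any Virtual Observatory interface. The sketch matches sources by position on the sky and returns the spectra of those objects in the first archive whose counterparts in the other two satisfy simple thresholds.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def cone_search(catalogue, ra, dec, radius_deg):
    """Return all sources in a catalogue within radius_deg of a position."""
    return [src for src in catalogue
            if angular_separation_deg(ra, dec, src["ra"], src["dec"]) <= radius_deg]


# Hypothetical stand-ins for three remote archives; all values are invented.
SPECTRA_ARCHIVE = [
    {"id": "S1", "ra": 150.0000, "dec": 2.0000, "spectrum": [1.0, 1.2, 0.9]},
    {"id": "S2", "ra": 150.1000, "dec": 2.1000, "spectrum": [0.4, 0.5, 0.6]},
    {"id": "S3", "ra": 151.0000, "dec": 3.0000, "spectrum": [2.0, 2.1, 2.2]},
]
RADIO_ARCHIVE = [
    {"id": "R1", "ra": 150.0001, "dec": 2.0001, "flux_mjy": 12.0},
    {"id": "R2", "ra": 150.1001, "dec": 2.1000, "flux_mjy": 0.3},
]
XRAY_ARCHIVE = [
    {"id": "X1", "ra": 150.0000, "dec": 1.9999, "counts": 40},
    {"id": "X3", "ra": 151.0001, "dec": 3.0000, "counts": 55},
]
MATCH_RADIUS_DEG = 2.0 / 3600   # 2 arcseconds


def federated_query(min_flux_mjy, min_counts):
    """Spectra of sources with a bright radio AND a bright X-ray counterpart."""
    results = {}
    for src in SPECTRA_ARCHIVE:
        radio = cone_search(RADIO_ARCHIVE, src["ra"], src["dec"], MATCH_RADIUS_DEG)
        xray = cone_search(XRAY_ARCHIVE, src["ra"], src["dec"], MATCH_RADIUS_DEG)
        if (any(r["flux_mjy"] >= min_flux_mjy for r in radio)
                and any(x["counts"] >= min_counts for x in xray)):
            results[src["id"]] = src["spectrum"]
    return results


selected = federated_query(min_flux_mjy=1.0, min_counts=10)
print(selected)
```

With catalogues of a few entries this brute-force loop is trivial. With terabyte archives on different continents, each cone search becomes a remote, computationally expensive operation, which is where the Grid Computing challenges arise.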

All the Virtual Observatory groups have recently combined into the International Virtual Observatory Alliance.




Ray Norris, 25 September 2002, with acknowledgements to Ernst Raimond
