Report from the Working Group on Astronomical Data

 

Ray Norris, 26 Sept 2002

Overview

The management of astronomical data has become an increasingly significant issue in recent years, because large surveys now routinely produce terabytes of data, and because of the growing importance of web-based data centres, remote observing, data archiving, etc. There have been two dominant issues since the last GA: the Virtual Observatory, and the protection of freedom of access to large databases. The Virtual Observatory work will be described in detail by the VO working group, but the WGAD has handled some of its implications, and these are described here. A major issue for the IAU is the proposed database protection legislation, which is opposed by CODATA on behalf of the ICSU.

CODATA

CODATA is the Committee on Data for Science and Technology of the ICSU (International Council for Science), and it coordinates data-related activities between the various scientific unions such as the IAU and URSI. Given the increasing significance of data in astronomy and the threats to our freedom to use it, IAU involvement in CODATA is very important.

 

Ray Norris represented the IAU at the October 2000 meeting in Italy, and will do so again at the next meeting in Montreal, Canada, in October 2002. CODATA offers significant benefits to the IAU, particularly in terms of (a) looking after our data interests, as in the data protection legislation fiasco, and (b) cross-fertilisation with other disciplines. Because astronomy is a data-intensive discipline, the IAU should continue to play an active role in CODATA, and should perhaps explore ways to obtain even more value from CODATA.

Database Protection Legislation

There is a worldwide shift towards increased protection for intellectual property. In particular, the World Intellectual Property Organisation (WIPO) has proposed legislation with the very reasonable aim of protecting commercial databases, but with unwanted side-effects that will cause problems for open access to scientific data (see http://www.codata.org/data_access/summary.html for a summary of issues). Various groups, particularly in the US, have been arguing that such legislation needs to include clauses such as "fair use" provisions similar to those included in copyright law. Without such clauses, our current freedom of access to public-domain databases (like NED, ADS, Simbad) is threatened. For example, in Europe, all but a few countries have now enacted database protection legislation. Most have included "fair use" clauses, which permit free use of data for science and education. But Italy and France have no fair use provision, which means that citing data (e.g. quoting a redshift) from a journal published in France or Italy is now technically illegal unless you first obtain permission from the author or publisher. In practice, scientists are either unaware of this or are ignoring it. If a publisher chose to enforce this legislation, data centres could be severely affected by the need to maintain a paper trail authorising use of each item of data, and the virtual observatory could be swamped in a mass of contracts.

Astronomical Data Archive Issues

The growth of databases from large surveys, and the advent of the virtual observatory, have raised a number of issues around astronomical data. One of these is the need to ensure future freedom of access to astronomical archive data from major observatories, and a resolution to this effect will be tabled at the next IAU GA. Another is the need for the IAU to build its current widely-dispersed set of resolutions, definitions of astronomical quantities, reference frames, etc. into a well-defined, consistent and easily available set of rules. The WGAD has just initiated a discussion on how this might be achieved.

 

Data Formats

Until about 20 years ago, astronomy was split into a number of sub-disciplines corresponding to the different wavelengths at which observations were made (optical astronomy, radio astronomy, etc). Each of these tended to use its own data formats, which made it awkward to combine data from different wavelengths. In 1981 the FITS (Flexible Image Transport System) format was proposed (http://fits.gsfc.nasa.gov/documents.html), and adopted enthusiastically by all sub-disciplines of astronomy. This permitted easy interchange of data between the sub-disciplines, and was largely responsible for breaking down cultural barriers between them. As a result, many astrophysicists today take their data at whichever wavelength is needed to solve the astrophysical problem being addressed. This is probably partly responsible for the current healthy state of astronomy and astrophysics internationally, with new discoveries about the origin and evolution of the Universe being made at a breathtaking rate.

While FITS has been enormously and demonstrably successful, astronomical data are now seen by some as outgrowing the 1980s technology on which FITS was based. The Astronomical Data Center at NASA's Goddard Space Flight Center (http://adc.gsfc.nasa.gov/) has applied newer technology such as the eXtensible Markup Language (XML) to solve astrophysics data interchange issues. These include automated data ingest, panchromatic data search, access to large databases, and development of new meta-data standards.
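
As a rough illustration of the kind of translation involved in moving FITS metadata into XML, the sketch below converts a few FITS-style header records into an XML fragment. It is not the ADC's schema or software: the element and attribute names (header, keyword, name, comment), the function names and the sample records are all assumptions made for this example, and the parser handles only simple cases.

```python
import xml.etree.ElementTree as ET

CARD_LENGTH = 80  # FITS header records ("cards") are fixed-width, 80 characters


def parse_card(card):
    """Split one FITS-style header card into (keyword, value, comment).

    Simplified: does not handle doubled quotes inside strings or
    continued (multi-card) values.
    """
    keyword = card[:8].strip()
    if card[8:10] != "= ":
        # Cards without a value indicator (e.g. COMMENT, HISTORY, END)
        return keyword, None, card[8:].strip()
    body = card[10:].strip()
    if body.startswith("'"):
        # Character-string value enclosed in single quotes
        end = body.index("'", 1)
        value = body[1:end].rstrip()
        comment = body[end + 1:].partition("/")[2].strip()
    else:
        # Logical or numeric value, optionally followed by "/ comment"
        value, _, comment = body.partition("/")
        value, comment = value.strip(), comment.strip()
    return keyword, value, comment


def cards_to_xml(cards):
    """Build an XML string from header cards (hypothetical element names)."""
    root = ET.Element("header")
    for card in cards:
        keyword, value, comment = parse_card(card.ljust(CARD_LENGTH))
        if keyword == "END":
            break
        if value is None:
            continue  # skip commentary cards in this sketch
        element = ET.SubElement(root, "keyword", name=keyword)
        element.text = value
        if comment:
            element.set("comment", comment)
    return ET.tostring(root, encoding="unicode")


# Illustrative sample records, not taken from any real file
EXAMPLE_CARDS = [
    "SIMPLE  =                    T / conforms to FITS standard",
    "BITPIX  =                   16 / bits per pixel",
    "NAXIS   =                    2",
    "OBJECT  = 'NGC 1068'           / target name",
    "END",
]

if __name__ == "__main__":
    print(cards_to_xml(EXAMPLE_CARDS))
```

The fixed-width records need a purpose-built parser, whereas the XML form can be read back by any generic XML library, which is part of the appeal of XML for automated ingest and cross-archive searching.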

An even more sophisticated approach has been taken by organisations in the US and Europe that are building the necessary components and infrastructure for the Virtual Observatory. This has resulted in a prototype standard called VOTable, which relies on cutting-edge web services and directories.

 
