Report to the 24th CODATA General Assembly
on data activities in the International Astronomical Union

Compiled and Edited by Ray Norris, September 2004.

Abstract

This report describes the current status of data handling in astronomy, including the growth and development of interoperable terabyte data archives, and summarises the vigorous current activity in the international development of the Virtual Observatory. It identifies the need for the IAU to develop a strategic framework for data management in astronomy, with recommendations to guide and assist individual observatories and organisations. It recommends that the IAU should be proactive in working with the ICSU and CODATA, both to participate in the ICSU framework, and to bring that experience to the development of an astronomical framework.

 

Sections 1-4 of this report give background and describe the activities since the last CODATA General Assembly. Section 5 discusses the issue of Open Access and the need for a strategic framework for data management.

1. Introduction and Background

1.1 The International Astronomical Union

The International Astronomical Union (IAU) promotes, safeguards, and co-ordinates world-wide astronomy. Its membership includes 9100 professional astronomers and 70 adherent nations. Its activities are divided amongst 12 Scientific Divisions and, through them, 37 specialised Commissions. General Assemblies, held every three years, review the status of worldwide astronomy, and define the IAU’s long-term policy, which is implemented by the Executive Committee. Further details may be found at the Union’s web-site (http://www.iau.org).

Within the Astronomical Union, one commission (Commission 5) is primarily concerned with the management of documentation and scientific data. Working groups and task groups within Commission 5 deal specifically with Astronomical Data, Designations, Nomenclature, Libraries, FITS (Flexible Image Transport System), Virtual Observatories, and the Preservation and Digitization of Photographic Plates.

1.2 Data Handling in Astronomy

Astronomy has always been data-rich and data-sophisticated, best exemplified by the data centres and electronic publishing which arrived well in advance of similar developments in most other disciplines. Astronomy is also at the forefront in building links between distributed, heterogeneous systems, thanks to exchange standards such as FITS for data, and the Bibcode for the description of bibliographic references. Partnerships between data providers, data centres and journal publishers have contributed to this success. All major refereed astronomical journals produce an electronic edition in addition to a paper version. Abstracts of articles in all leading journals are available electronically on ADS (http://adsabs.harvard.edu/abstract_service.html), which offers facilities for searches by various criteria. Preprints are also routinely published on http://xxx.arxiv.org/archive/astro-ph and other specialist web sites, and are also indexed by ADS.

1.3 Data Formats

Until about 20 years ago astronomers were hindered by the way the subject was divided into sub-disciplines corresponding to the different observational wavelengths (optical, radio, etc.).  Each sub-discipline was wedded to its own data formats, as a result of which it was awkward to combine data from different wavelengths. In 1981 the FITS (Flexible Image Transport System) format was proposed (http://fits.gsfc.nasa.gov/documents.html), and adopted enthusiastically throughout astronomy. This permitted easy interchange of data between the sub-disciplines, and was partly responsible for breaking down cultural barriers between them. As a result, many astrophysicists today take their data at whichever wavelength is needed to solve the astrophysical problem being addressed. This is probably partly responsible for the current healthy state of astronomy and astrophysics internationally, with new discoveries about the origin and evolution of the Universe being made at a breathtaking rate.

FITS has continued to evolve and develop, and is now one of the underpinning data formats for the Virtual Observatory.
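As an illustration of why FITS proved easy to exchange, the sketch below reads individual header "cards". It assumes only the basic FITS layout: each card is 80 characters, with the keyword in columns 1-8, a value indicator "= " in columns 9-10, and an optional comment after a "/". The function name parse_card is hypothetical, and the sketch deliberately ignores continuation cards, complex values, and Fortran-style "D" exponents.

```python
def parse_card(card):
    """Split one 80-character FITS header card into (keyword, value, comment)."""
    if len(card) != 80:
        raise ValueError("a FITS header card is exactly 80 characters")
    keyword = card[:8].strip()
    if card[8:10] != "= ":
        # Commentary cards (COMMENT, HISTORY, END) carry no value.
        return keyword, None, card[8:].strip()
    body = card[10:]
    if body.lstrip().startswith("'"):
        # String value: runs to the closing quote; '' encodes a literal quote.
        i = body.index("'") + 1
        chars = []
        while i < len(body):
            if body[i] == "'":
                if body[i + 1:i + 2] == "'":
                    chars.append("'")
                    i += 2
                    continue
                break
            chars.append(body[i])
            i += 1
        value = "".join(chars).rstrip()
        rest = body[i + 1:]
    else:
        # Logical or numeric value: everything before the first "/".
        raw, slash, after = body.partition("/")
        rest = slash + after
        token = raw.strip()
        if token == "":
            value = None
        elif token in ("T", "F"):
            value = token == "T"
        else:
            try:
                value = int(token)
            except ValueError:
                value = float(token)
    comment = rest.partition("/")[2].strip()
    return keyword, value, comment


# Hand-written example cards, padded to the required 80 characters.
example_cards = [
    "SIMPLE  =                    T / conforms to FITS standard",
    "NAXIS1  =                  512 / length of axis 1",
    "OBJECT  = 'NGC 1068'           / target name",
    "END",
]
for text in example_cards:
    print(parse_card(text.ljust(80)))
```

Because every card has the same fixed width and the same keyword/value/comment layout, a reader written for one sub-discipline's files can read headers from any other, which is the interchange property described above.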

2. The Virtual Observatory

2.1 Introduction

The vision of the Virtual Observatory (VO) is to make access to astronomical databases as seamless and transparent as browsing the World Wide Web. It will federate the data flows from current and future facilities and large scale surveys, and the computational resources and new tools necessary to exploit them. This requires both technological developments and an international commitment to standardisation. Increasingly, it will alter the way that astronomers do science, and the way that future facilities and projects plan for the management and scientific exploitation of their data.

There are currently fifteen VO projects worldwide, which co-ordinate their efforts through an International Virtual Observatory Alliance (IVOA: http://www.ivoa.net). This body evolves and agrees on technical standards as well as sharing best practice and software, wherever possible adhering to other international standards. It is widely regarded as a strikingly successful example of international co-operation, and was highlighted as such in the recent OECD report on future large scale facilities in astronomy. The various VO projects have already laid the foundations for the IVOA (international standards, fundamental infrastructure, early demonstrations, and the first published science papers using VO tools) and we expect in the next few years to see the VO become a working reality. The next stages will involve

(i) deployment of the new infrastructure at data centres,

(ii) science results using VO tools,

(iii) making links to existing and planned facilities,

(iv) much more ambitious data mining analysis services, and

(v) exploiting the emerging high-performance networking and computational services comprising the 'Grid'.

Formed in June 2002, the IVOA has already fostered the creation of a new, widely accepted international astronomical data format, VOTable, which builds upon the industry-standard eXtensible Markup Language (XML). Following developments in the grid community (e.g. http://www.gridforum.org/), it has set up technical working groups devoted to defining standards for service registries, content description, data access, data models and query languages. These new standards and technologies are being used to build science prototypes, demonstrations, and applications, many of which have been shown in international meetings in the past two years.

2.2 Background

The cost of developing forefront astronomical research facilities often greatly exceeds the funding capacities of individual universities, research organizations, and nations (e.g., Atacama Large Millimetre Array [ALMA], The Square Kilometre Array [SKA], and Extremely Large (optical) Telescopes [ELTs]). Collaborative alliances of organisations and nations are therefore being formed to build new, facility-class astronomical observatories around the globe. This expansion and globalisation of the astronomical research effort raises a number of major issues that must be confronted and solved by astronomers, research funding bodies, and governments. These issues include:

-         How will global astronomical research projects meet the data volume and computational challenges associated with tackling forefront research problems?

-         How will globally distributed teams of researchers be able to share data, from across the electromagnetic spectrum, originating from multiple new facilities? How will they work collaboratively with common resources, and finally publish their discoveries and achievements in a manner that can be used by others?

-         How will new observatories with facility-class instruments provide the maximal scientific return on the investment of global public funds that created them?

Some of these issues are being addressed by other sciences, and some are unique to the research diversity inherent in exploring the Universe through multiple, complementary wavelength windows. In each case, the challenge of managing, maximally utilizing, and collaboratively sharing the huge volume of digital information focuses and guides the discussion of critical issues for success.

2.3 A Path Forward

In 2001-2, several independent groups of astronomers from around the world began to grapple with some of the difficult questions raised above. These groups have now coalesced, and what is emerging is a collective vision of the path forward. This vision centres on creating a new global astronomical research infrastructure called the Virtual Observatory.

The power of the World Wide Web is its transparency - it is as if all the documents in the world are inside your PC. The idea of the Virtual Observatory (VO) is to achieve the same transparency for astronomical data. In the VO, all the world's data are available from your desktop. All archives understand the same query language, can be accessed through a uniform interface, and diverse data can be analysed by the same tools. A central goal is democratisation: the power the scientist has at her fingertips should be independent of location, or whether she is located in a developing nation or a wealthy nation. Such an infrastructure will also enable “collaboratories”, which are informal distributed research teams sharing data, workflows, and analysis results in a transparent virtual storage system.

Transparency is also a goal of computational Grids, where a set of distributed computers functions like one supercomputer on your desktop. The VO concept can be seen as a domain-specific example of a data grid. However, it goes one step further: what is offered is not just access to the data, but also the operations on the data, and the returned results, that are essential for their full exploitation. Today such analysis is done by end-users after downloading data. In the future, the normal mode will be for such calculations (many of which are quite standard) to be data services offered by the expert data centres holding the data. These operations then also need to be standardised to be compatible across many archives. The result is a service grid.

The VO will not be a monolithic system, but, like the Web, will be a set of standards that make all the components of the system interoperable: data and metadata standards, agreed protocols and methods, and standardised mix-and-match software components. These standards and software modules constitute the VO Framework. To achieve the whole vision, however, data centres, software developers, and facility builders all need to accept the new framework and work within it. Five strands of work are needed:

-         Development of standards and protocols, and their international agreement.

-         Construction of "glue" software components: portal, registry, workflow, user authentication, virtual storage.

-         Adoption by data centres, who need to "publish" to the system, i.e., to write VO compliant data services connected to their holdings.

-         Construction of tools to do science with the data.

-         Establishment and maintenance of resource registries and user support systems.

The VO concept has a high priority in most national astronomy programmes, and large organizations such as ESA, NASA, ESO, and NSF have recognised the strategic importance of the VO. The community itself, as well as its political leaders, has made its interest clear over the last two years through a number of dedicated VO conferences and workshops and special sessions at large general meetings, such as the General Assembly of the International Astronomical Union. Some of this drive comes from the widespread interest in grid computing and e-science, but it mainly comes from awareness of the imminent data flood, and the constantly rising expectations of astronomers concerning the quality and power of web-based tools. The general feeling of most astronomers is that something like this simply has to happen: they are very keen that it happens in an organised and professional fashion.

The VO must naturally compete in a limited funding environment against initiatives to build new telescopes and instrumentation. Since VO infrastructure and compliance come at very low cost compared to the physical infrastructure of telescopes and instrumentation, VO capabilities are increasingly being built into the very design of new astronomical facilities, where the differential cost is minimal and the benefits are immense.

2.4 Making It Happen

In June 2002, at an international VO meeting in Munich (co-sponsored by ESO, ESA, NASA, and NSF), the IVOA was formed. This alliance seeks to ensure that the essential VO infrastructural technologies and interoperability standards are developed to enable a VO capability on a global scale. IVOA is also participating in the Global Grid Forum (GGF), making the VO effort a research group of the GGF. This ensures that astronomical requirements are discussed at the GGF and that the VO provides feedback to the GGF community on prototype Grid middleware. A working group has also been created within IAU Commission 5 to examine and endorse proposed IVOA standards in much the same way FITS has been endorsed by the IAU as an international astronomical data format over the past 20 years.

The IVOA has created working groups and interest groups to pursue the discussion and definition of IVOA standards and mechanisms. To facilitate the global advancement of working group activities, the IVOA organizes Interoperability Workshops twice a year at which all working groups discuss progress, and work towards final definitions of standards. The process of defining standards within the working groups is described by a standards process (http://www.ivoa.net/Documents/REC/DocStandard) modelled on the W3C (World Wide Web Consortium) process.

At its January 2003 meeting the IVOA identified six major technical initiatives necessary to make progress toward the scientific goals of the IVO.

-         Registries. Registries function as the “yellow pages” of the Virtual Observatory, collecting metadata about data resources and information services into a queryable database. But, like the VO resources and services themselves, the registry is also distributed. Replicas will exist around the network, both for redundancy and for more specialized collections. The VO projects are investigating a variety of industry standards for implementation of registries, including the Open Archive Initiative (OAI) developed in the digital library community. Registry metadata are using the Dublin Core definitions, also developed for the library community, wherever possible.

-         Data Models. Although the international astronomy community has long agreed on a common format for data, the FITS standard, there are many variations in which metadata can be encoded in FITS files, and many options for storing associated data objects (a spectrum, its wavelength scale, and its variances, for example). FITS is a syntactic standard, not a semantic standard. The VO data models initiative aims to define the common elements of astronomical data structures and to provide a framework for describing their relationships. Data models will allow software to be designed to operate on many data storage variants without needing to modify the source data structures.

-         Uniform Content Descriptors (UCDs). The CDS (Centre de Données astronomiques de Strasbourg) pioneered the development of UCDs in order to make semantic sense of its large collection of astronomical catalogues and tables. Among the tens of thousands of column names in its collection, they found that there were only about 1500 unique types of content. Astronomers are creative, having found some 250 labels for a Johnson V magnitude! UCDs will provide a lingua franca for metadata definitions throughout the VO. The next step (UCD1+) aims at building a more coherent and expandable set of content elements, with a reduced number of "atomic" UCDs.

-         Data Access Layer (DAL). Building on the VO data models and UCDs, the data access layer provides standardized access mechanisms to distributed data objects. Three initial prototypes for the DAL have been developed so far: a ConeSearch protocol, a Simple Image Access Protocol, and a Simple Spectrum Access Protocol for spectra and time series. The ConeSearch returns catalogue entries for a specified location and search radius on the sky, and the Simple Image Access Protocol returns pointers to sky images given similar selection criteria. Work is underway to extend the DAL to other data types and to enable legacy software systems to incorporate DAL interfaces.

-         VO Query Language. The many distributed databases of the VO need a standard query language. Although SQL (Structured Query Language) can be used to query most modern astronomical databases, it has limitations in areas fundamental to astronomical research, such as region specifications on the sky. A superset of SQL, called the Astronomical Data Query Language (ADQL), has been implemented; it supports access to tabular information and integrates the simple image and spectrum access protocols.

-         Grid & Web Services. The VO is a service grid, which means that there are nodes where sophisticated tools are located together with data collections. Those tools are wrapped into web services, and standard web service interfaces are therefore in preparation. Further aspects are asynchronous messaging methods, single sign-on authentication, and workflow management. In order to combine data excerpts from different services, storage nodes (VOSpace) need to be set up. Automatic processes as well as human users will allocate scratch space that is visible on the web for exchange of data with other processes or collaborators.

-         VOTable. The first international agreement reached by the VO projects was VOTable, an XML mark-up standard for astronomical tables. The heritage of VOTable comes from FITS, the CDS Astrores format, and the industry-standard eXtensible Mark-up Language. The data access layer ConeSearch and SIAP (Simple Image Access Protocol) services return results in VOTable. VOTable software libraries have been developed in Perl, Java, and C++, and VO India has developed a general purpose VOPlot program in Java for data display. VOTable has been in use for 2 years, and the IVOA is now looking into what enhancements or extensions might be necessary.
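How UCDs, the ConeSearch protocol, and VOTable fit together can be sketched in a few lines. The example below is illustrative only: the three catalogue rows are invented, the function names are hypothetical, and the UCD strings (ID_MAIN, POS_EQ_RA_MAIN, POS_EQ_DEC_MAIN, PHOT_JHN_V) are taken from the original UCD1 vocabulary as an assumption about what a service might attach to its columns. The element names (VOTABLE, RESOURCE, TABLE, FIELD, DATA, TABLEDATA, TR, TD) are those of the VOTable format; later VOTable versions add an XML namespace, which this sketch omits.

```python
import math
import xml.etree.ElementTree as ET

# A tiny hand-made VOTable; the rows are invented for illustration.
VOTABLE_XML = """<VOTABLE version="1.0">
  <RESOURCE>
    <TABLE name="toy_catalogue">
      <FIELD name="Name" ucd="ID_MAIN" datatype="char" arraysize="*"/>
      <FIELD name="RAJ2000" ucd="POS_EQ_RA_MAIN" datatype="double" unit="deg"/>
      <FIELD name="DEJ2000" ucd="POS_EQ_DEC_MAIN" datatype="double" unit="deg"/>
      <FIELD name="Vmag" ucd="PHOT_JHN_V" datatype="float" unit="mag"/>
      <DATA>
        <TABLEDATA>
          <TR><TD>A</TD><TD>10.00</TD><TD>20.00</TD><TD>12.1</TD></TR>
          <TR><TD>B</TD><TD>10.05</TD><TD>20.02</TD><TD>14.3</TD></TR>
          <TR><TD>C</TD><TD>12.00</TD><TD>25.00</TD><TD>9.8</TD></TR>
        </TABLEDATA>
      </DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>
"""


def read_votable(xml_text):
    """Return table rows as dicts keyed by each column's UCD, not its name."""
    table = ET.fromstring(xml_text).find("./RESOURCE/TABLE")
    ucds = [field.get("ucd") for field in table.findall("FIELD")]
    rows = []
    for tr in table.findall("./DATA/TABLEDATA/TR"):
        cells = [td.text for td in tr.findall("TD")]
        rows.append(dict(zip(ucds, cells)))
    return rows


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def cone_search(rows, ra, dec, radius):
    """Identifiers of rows within `radius` degrees of the position (ra, dec)."""
    hits = []
    for row in rows:
        sep = angular_separation_deg(
            ra, dec,
            float(row["POS_EQ_RA_MAIN"]), float(row["POS_EQ_DEC_MAIN"]))
        if sep <= radius:
            hits.append(row["ID_MAIN"])
    return hits


rows = read_votable(VOTABLE_XML)
print(cone_search(rows, ra=10.0, dec=20.0, radius=0.1))
```

Because the position columns are located by UCD rather than by name, the same cone_search function would work unchanged on a table whose columns were labelled "RA" and "Dec", or "alpha" and "delta". That is the practical benefit of a shared content vocabulary.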

The detailed status of IVOA working group progress towards definition of standards in these areas can be found in the IVOA document repository: http://www.ivoa.net/Documents/latest/ and in the individual working group pages: http://www.ivoa.net/twiki/bin/view/IVOA/WebHome.

A number of the VO projects are using science prototypes, or demonstration projects, to help guide technical developments of IVOA standards and show the user community the benefits of the federated archives, catalogues, and computational services. These demonstrations and other technical achievements are outlined in the following summary of individual VO project highlights.

2.5 IVOA Roadmap

In 2002, the IVOA published a development roadmap (http://www.ivoa.net/pub/info) to highlight the essential steps necessary for the implementation of a VO with global reach. The key elements of the roadmap were coordinated demonstrations of new functionality following well-defined science cases and the roll-out of the necessary interoperability and technological standards. The roadmap is now being refined to take into account the standards developed and the experience gained through demonstrations and the working group activities. The aim is to enable demonstrations in January 2005 that show full international VO functionality via interoperating registries. At that point, the international VO effort will be ready to transition from VO R&D to VO take-up by participating data and service providers. This will lay the groundwork for VO-enabled science on an international scale.

The work of the IVOA will continue actively until the essential elements of the new astronomical research infrastructure are agreed, developed, and tested. At that time, the community of data centres and data providers will have the enabling technology for them to become VO publishers of data and services. The VO will be this collection of collaborating, interconnected, uniformly accessible data and service providers. Research teams and individual astronomers will be able to design tools and run projects within the VO and to return VO-enabled scientific results and data products to the broader community. The IVOA’s long-term role will be to preserve and guide the development of VO standards, to act as a clearing-house for those standards, and to act as a forum to promote the empowerment of global astronomy through appropriate standards and technologies.

3. Other Developments in 2002-4

3.1 Astronomical Data Centres

The international and the major national and local data centres are the repositories for thousands of catalogues of astronomical objects and for many data tables from journal articles. In order to serve different geographical communities optimally, well maintained copies of frequently requested data (such as those of the Hubble Space Telescope) reside at more than one centre. Several such Data Centres exist, and are located throughout Europe, the USA, Canada, Russia, and the Asia-Pacific region. Of the three most widely-used data centres, two (CDS and NED) continue to develop and flourish, but one (ADC) was shut down in 2002 as the result of NASA funding cuts.

3.1.1 Centre de Données astronomiques de Strasbourg, France (http://cdsweb.u-strasbg.fr/)

The Centre de Données astronomiques de Strasbourg (CDS), established in 1972, continues the development of major on-line reference services, which are widely used by the world-wide astronomy community:

-         SIMBAD, the reference database for the nomenclature and bibliography of astronomical objects

-         VizieR, which integrates catalogues, tables published in journals, and very large surveys

-         Aladin, which allows users to overlay images, catalogues, and data from SIMBAD and NED.

The CDS is also very active in the definition and maintenance of standards and tools for the general astronomy community.

CDS has collaborated for a long time with other major information producers, data centres, journals, and the ADS, to develop a network of astronomy on-line resources. In recent years, it has been very active in the development of the IVOA, both as a partner of the European Astrophysical Virtual Observatory project, and in collaboration with other projects. CDS has developed techniques for accessing distributed, heterogeneous resources, using the emerging VO standards, and Aladin has been used as a prototype VO portal in several VO demonstrations. The first international Interoperability Working Group was established under CDS leadership in 2001 within the European OPTICON Network, and the first VO Interoperability meeting, which set the foundation of the first VO standard, VOTable, was organised by CDS in January 2002. Since then CDS has organized a further IVOA meeting in October 2003 and it continues to play a major role in the development of VO standards.

3.1.2 NASA/IPAC Extragalactic Database, Caltech, USA (http://nedwww.ipac.caltech.edu/)

The NASA/IPAC Extragalactic Database (NED) has been freely providing data on extragalactic sources for over 15 years. The database covers all frequencies from radio to gamma rays, and includes positions, redshifts, photometry and spectral energy distributions, sizes, and images. NED currently responds to over 2.5 million web hits every month from research astronomers and the general public around the world. The database now contains over 7 million objects known and recognized by over 11 million names. It provides access to 21 million photometric measurements, 2.4 million references to 57,000 published papers, 435,000 redshifts, 2 million images, 60,000 notes and 33,000 on-line abstracts. An extensive selection of object diameters will soon be added to the database as searchable attributes.

NED services have traditionally supplied data in HTML format for connections from Web browsers, and a custom ASCII data structure for connections by remote computer programs written in the C programming language. By late 2004 NED will offer new services that provide responses to NED queries in XML documents compliant with the international virtual observatory VOTable protocol. The XML/VOTable services support cone searches, all-sky searches based on object attributes (survey names, cross-IDs, redshifts, flux densities), and requests for detailed object data. Initial services have been inserted into the NVO registry, and others will follow soon. The first client application is a Style Sheet specification for rendering NED VOTable query results in Web browsers that support XML. The new XML/VOTable output mode will also simplify the integration of data from NED into visualization and analysis packages, software agents, and other virtual observatory applications.
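From the client side, a cone-search request of this kind amounts to an HTTP GET with three parameters. The sketch below assumes the parameter names RA, DEC, and SR (all in decimal degrees) used by the IVOA cone-search convention. The base URL is a placeholder and is not NED's actual service address, and the function name is hypothetical. The sketch only builds the request URL and does not contact any service.

```python
from urllib.parse import urlencode


def cone_search_url(base_url, ra_deg, dec_deg, radius_deg):
    """Build a cone-search GET URL from a position and search radius in degrees."""
    if not 0.0 <= ra_deg < 360.0:
        raise ValueError("RA must be in the range [0, 360) degrees")
    if not -90.0 <= dec_deg <= 90.0:
        raise ValueError("Dec must be in the range [-90, +90] degrees")
    if radius_deg < 0.0:
        raise ValueError("search radius must be non-negative")
    query = urlencode({"RA": ra_deg, "DEC": dec_deg, "SR": radius_deg})
    # Some services already carry fixed parameters in their base URL.
    separator = "&" if "?" in base_url else "?"
    return base_url + separator + query


# Placeholder endpoint, for illustration only.
print(cone_search_url("http://example.org/cone", 180.0, -30.5, 0.1))
```

A service following the convention would answer such a request with a VOTable document, which is what allows the same client code to query many different archives.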

3.1.3 Astronomical Data Center (http://adc.gsfc.nasa.gov/)

On 1 October 2002, NASA terminated all support for the Astronomical Data Center (ADC) at the Goddard Space Flight Center and routed ADC users to other sites that perform similar services. For 25 years, the ADC was a key centre for published astronomy data, catalogues, and journal tables. The ADC made these data sets computer readable and freely accessible to the astronomical community. It was also at the forefront of developing new methods, tools, and techniques for their preparation and use.  An example was its leadership in leveraging XML-related technologies to improve access to astronomical data. The ADC represented the U.S. in exchanging data and collaborating on techniques and standards with data centres in other countries. 

Among the reasons given for the closure of ADC were:

(a) budgetary constraints,

(b) availability of data from similar international data services, and

(c) availability of electronic tables from journal publishers.

These are issues that any astronomical data centre and service must grapple with. Other underlying reasons may have been:

(d)   the transformation of its parent organisation (NASA) into a highly focused mission-directed entity, reducing the ADC’s priority in budgetary tradeoffs, and

(e)    a lack of understanding among decision makers of the difference between data that are simply in electronic form and those that are truly accessible. 

The legacy of ADC will live on in the Virtual Observatory. The ADC-developed XML data sets and applications continue to support the IVOA in its high-level queries and science demonstrations.

3.2 Cartographic Coordinates for comets and asteroids

A significant change, relevant to several data systems, was made by the IAU's Working Group on Cartographic Coordinates (which is also a Working Group of the International Association of Geodesy). The working group has adopted a new standard for defining coordinates for small bodies of the solar system (asteroids and cometary nuclei), but not for bodies for which there is a lengthy historical precedent, namely planets and satellites. The new standard recognises that the axis of rotation can move significantly relative to surface features of the body, and so introduces a stable and consistent definition of the axis of such an object in terms of its moment of inertia. All cartographic coordinates for small bodies then follow the right-hand rule based on this axis (planets and satellites do not consistently follow this convention). This will directly affect NASA’s Planetary Data System and ESA's Planetary Science Archive, both of which were widely consulted before the decision was made. A report describing the new standard is available at http://astrogeology.usgs.gov/Projects/ISPRS/PREPRINTS/index_preprints.html.

3.3 The Preservation and Digitization of Photographic Plates (PDPP) (http://www.lizardhollow.net/PDPP.htm)

Astronomy possesses a hugely valuable reserve of heritage data - about 3 million photographic observations - that have accumulated in plate archives since the late 19th century. However, only a few of those data are accessible in digital form, and most are therefore inaccessible to most potential users. Little attention has been paid to the salvage of historic material, and expertise and equipment have become lost. A rescue is now becoming urgent, as plate archives face loss and deterioration through ageing, disasters, and ignorant destruction.

 

The PDPP Task Force was created in 2000 to promote this rescue mission. It acts as an advisory body and watchdog, and issues an annual Newsletter with reports and comments from groups and individuals around the world who are involved in in-house projects to digitize or catalogue plates. It sponsored specialist meetings in 2003 and 2004, and regional workshops are planned for the near future.

 

In projects to digitize plates, high priority will be given to those having identifiable scientific benefits, whilst weighing plate quality (particularly grain size) against length of time base. For spectra, priority will be given to material showing greatest detail (i.e. Coudé plates), spectra of variable objects and spectra for studying the Earth's atmosphere.  For direct plates, emphasis is on orbital refinements (particularly for asteroid detections), and time-sensitive studies such as pre- and post-supernova events, astrometry and long-term photometry. Priority will also be accorded to individual requests for specific observations.

 

The fully reduced data will be made available worldwide via the Internet. Historic data complement, and add value to, data obtained with modern instruments, and will substantially increase the scope of the VO by adding a significant time dimension. They will be invaluable both for stand-alone research, and for analyses whose solutions hinge critically upon the inclusion of these unprecedented time-spans of 50+ years.

 

Because the PDPP Task Force is a community service, it is difficult to identify funding sources with sufficiently broad remit and long-term view to fund the necessary equipment on an international or even national basis. An astronomical photograph cannot be digitised to the required precision and accuracy with a desktop scanner, but requires specialised equipment. So far, individual groups at observatories in Italy, the Vatican, Belgium, and the USA have digitized, or are digitizing, selected in-house collections. The US National Science Foundation has recently awarded significant grants to three projects: 100+ years of sunspot images from Mount Wilson, the re-analysis of photographic solar polarimetry from a 50-year collection, and the construction of a high-speed scanner to digitize the entire plate archive at Harvard, the world's largest (650,000 plates).

 

Nevertheless, those are still individual initiatives and leave many other worthy collections untouched. The broader vision is to establish scanning laboratories for "direct plates" (sky images) in Brussels (at the Royal Observatory) and in the USA (at the Pisgah Astronomical Research Institute, PARI), and one for spectra in Canada (Dominion Astrophysical Observatory).

3.4 Access to data tables in published journals

Since most journals now publish in electronic form, it might be expected that all data-rich articles (containing data on more than about 50 objects) would automatically enter the CDS archive or NED. However, of the electronic tables of 1500 such articles collated from the literature by H. Andernach, fewer than 50% appear in the CDS archive. The problems of making accessible this collection, and eventually the data content of the entire astronomical literature, are:

  • lack of manpower for writing metadata,
  • non-standard data formats in the electronic publication,
  • inadequate nomenclature for astronomical objects used by authors,
  • the need to scan articles from older (non-electronic) journals.

To make the data in journals available to the data centres and the virtual observatory, standards for presentation of tables in journals need to be established and adhered to. While some journals are already making excellent progress in this direction, others are not. This task will be amongst those addressed by the data framework discussed below.

4. Open Access to Astronomical Data

4.1 Background

Because the advance of astronomy frequently depends on the comparison and merging of disparate data, it is important that astronomers have access to all available data on the objects or phenomena that they are studying. Astronomical data have therefore always enjoyed a tradition of open access, best exemplified by the data centres (Section 3.1 of this report) which provide access to data for all astronomers at no charge.

 

There exist a number of exceptions to this open access tradition, including (but not confined to):

  • Short-term protection of data by national facilities, to give observers and instrument builders sole use, for a limited time (typically 1-2 years), of the data that they obtained,
  • Normal copyright laws for publications, which allow open access to data and information contained within them, and use of text or diagrams for research purposes,
  • Private observatories which do not have facilities or policies for disseminating their data except through normal publication channels,
  • Historical data recorded in non-electronic form (e.g. photographic plates) which have not been converted to electronic form and therefore cannot easily be disseminated,
  • Tables in journals that are not available in a standard format and so cannot easily be disseminated by the data centres (see Section 3.4 of this report).

In addition, not all major facilities provide open access to their archive data. A first step towards promoting open access to data was taken with the adoption of the resolution described below.

4.2 Resolution at 2003 IAU GA

The 2003 General Assembly of the IAU adopted the following resolution. It should be noted that it applies only to data taken by publicly-funded observatories, and only to archived data.

 

Resolution: Public Access to Astronomical Archives

 

The General Assembly of the International Astronomical Union

 

Recognising

 

1        That scientific advances rely on full and open access to data.

2        That it is in the interests of astronomy generally that archive data be made as widely accessible as possible, and that the technology exists via the world-wide web to do so cheaply and effectively

3        That the development of the Virtual Observatory will enable effective use to be made of such archives, thus increasing the effectiveness and scientific return of astronomical research

 

Considering

 

1        That access to observing time on major astronomical facilities is sometimes necessarily and legitimately restricted for funding or other reasons,

2        That after data have been obtained on such a facility, that access to such data is often necessarily and legitimately restricted for some period (the “proprietary period”, typically of one to two years), to the observer, students, instrument builder, or other defined groups, so that they may have a reasonable opportunity to publish their results, and thereby capitalise on their investment of time and resources put into the observations,

3        That in many cases, after this proprietary period the data are placed in a data archive where they are made more widely available

 

Recommends that

 

1        Data obtained on publicly-funded Major National or International Astronomical facilities should, after a reasonable proprietary period in which they are available only to observers or other designated users of the facility, be placed in an archive where they may be accessed via the internet by all research astronomers. As far as possible, the data should be accompanied by appropriate metadata and other information or tools to make them scientifically valuable.

2        Such data should not be subject to intellectual property rights. The form in which data are made available, and the subsequent processing of such data, may be appropriately protected by copyright laws, but the fair usage (including educational purposes) of the archive data themselves should not be subject to restrictions.

3        Funding agencies provide encouragement and support to enable data produced by astronomical research that they fund to be deposited, after some proprietary period as defined above, in recognized data archives, which provide unrestricted access to these data.

4.3 WIPO legislation

Our current freedom of access to public-domain astronomical databases was threatened in 2000-2002 by poorly conceived treaties and legislation proposed by WIPO (World Intellectual Property Organisation), the European Union, and other bodies. At present, the threat appears to have diminished, thanks to quick and effective action by members of CODATA and other bodies, but we need to remain vigilant in case the threat reappears. A lesson learnt from this is that the science community needs to ensure that its data needs are better articulated and understood, which in turn underscores the need for an effective data management framework.

4.4 Berlin declaration

In October 2003, a conference took place in Berlin on “Open Access to Knowledge in the Sciences and Humanities”. This resulted in the “Berlin Declaration” (see http://www.zim.mpg.de/openaccess-berlin/berlindeclaration.html), which was subsequently signed by many scientific institutions and organisations. The IAU is not currently a signatory to the Berlin Declaration, although the Declaration is well-aligned with the spirit of open access that prevails in astronomy. The implications of the Declaration for astronomy are currently being examined, and, if appropriate, a recommendation will be made to the IAU Executive by the end of 2004.

4.5  ICSU Review on Scientific Data and Information

The International Council for Science (ICSU) has set up a panel of independent experts to perform a Priority Area Assessment (PAA) on Scientific Data and Information. This panel was charged with assessing the strategic issues in this arena and reviewing ICSU’s current activities. Their report recommends that ICSU assume an international leadership role in identifying and addressing critical policy and management issues related to scientific data and information, and that it create a new global framework for data and information policy and management. ICSU played a seminal role in the 1980s and 1990s in establishing an interdisciplinary and internationally coordinated research program on global environmental change. The report argues that there is now a need for an equally strong ICSU role in establishing an international infrastructure and capacity for scientific data and information management and access.

4.6 World Summit on the Information Society

Recognising the growing impact of information and data on our society, the International Telecommunication Union (ITU) resolved in 1998 to hold a World Summit on the Information Society (WSIS) and place it on the agenda of the United Nations. This was endorsed by the UN General Assembly (Resolution 56/183). The Summit is taking place in two phases: 10-12 December 2003 in Geneva, and 16-18 November 2005 in Tunis.

The Geneva meeting resulted in a declaration (see http://www.itu.int/wsis/documents/) which includes the following principles:

  • The sharing and strengthening of global knowledge for development can be enhanced by removing barriers to equitable access to information for economic, social, political, health, cultural, educational, and scientific activities and by facilitating access to public domain information, including by universal design and the use of assistive technologies.
  • We strive to promote universal access with equal opportunities for all to scientific knowledge and the creation and dissemination of scientific and technical information, including open access initiatives for scientific publishing.

5. Towards a Strategic Framework for the Management of Astronomical Data

Several vigorous and effective groups in astronomy (e.g. the VO and the data centres) are individually achieving ambitious goals in the area of data management and handling. However, between and outside these active groups are gaps in which data management is neglected or dealt with in an ad hoc way. Astronomy does not have any strategic data framework that links these activities together, provides policies or guidelines for astronomical data management, or is able to represent the interests of astronomical data management to external parties. As a result:

  • We are vulnerable to external threats such as the WIPO (World Intellectual Property Organisation) legislation (Section 4.3),
  • We are unable to represent astronomical data requirements in a coordinated way to external groups, such as ICSU, funding agencies, or journal publishers,
  • There is no uniform approach across astronomy to preservation and dissemination of data,
  • While some groups in astronomy adopt a professional approach to data management, others treat it as an afterthought, or neglect it completely, so that astronomy as a whole loses value,
  • There is poor coordination between astronomy and other disciplines, and poor recognition in other disciplines of the data needs and strengths of astronomy.

 

IAU Commission 5 therefore proposes to develop a strategic framework for data management in astronomy, with recommendations to guide and assist individual observatories and organisations, and encouraging principles of open access as far as possible. It will do so in close liaison with the IVOA, which can provide the tools and infrastructure for facilitating this process. Recognising that the ICSU is also engaging in a similar activity across all sciences (Section 4.5), the IAU should be proactive in working with the ICSU, both to participate in the ICSU framework, and to bring that experience to the development of an astronomical framework.

 

Elements of achieving this goal include:

  • Active participation by IAU in ICSU and CODATA discussions
  • Email/wiki discussions within IAU to reach broad agreement on the way forward
  • Close collaboration with the IVOA on developing requirements for implementing these strategies within the VO
  • An open meeting at the Prague IAU GA in 2006, at which a draft framework will be debated
  • A resolution to be proposed at the IAU GA in 2006 for the IAU to adopt and develop the data management framework.

 

As an outcome of these measures, we expect that the next few years will see vigorous growth in the availability and interoperability of astronomical data, resulting in even more cross-fertilisation and idea generation in astronomy.

Acknowledgements

I thank the following contributors to this report: M. A’Hearn, H. Andernach, B. Archinal, W. Brouw, C. Cheung, F. Genova, E. Griffin, R. Hanisch, A. Lawrence, B. Madore, F. Murtagh, G. Oertel, M. Schmitz, K. Seidelmann, and P. Uhlir. Much of the material describing the VO was taken from a paper by P. Quinn, D. Barnes, I. Csabai, C. Cui, F. Genova, R. Hanisch, A. Kembhavi, S.C. Kim, A. Lawrence, O. Malkov, M. Ohishi, F. Pasian, D. Schade, and W. Voges.

Appendix: Astronomical Data-related Meetings in 2002-4

2002

2003

2004
