Australian Virtual Observatory
Background information

Update: ESO AVO site
Update: DoE Science Grid Documents

Astronomy is not alone in being swamped with data. The particle physics community is building, and starting to run, projects that will generate petabytes of data, and these are driving some of the most ambitious developments noted below. Earth observation science now generates huge databases of highly detailed measurements. In genetics, the speed at which genes can now be sequenced has given rise to a new computing discipline: bioinformatics.

Setting up a virtual observatory is a long-term process and needs as much thought about the future as about handling present data. In fact, having a VO necessarily changes the way in which observing is done. Here are a few points to consider.

  • Most proposals recognize that the data needs to be calibrated before being presented to the requester.

    This puts new responsibilities on those running the telescopes, who must make sure that adequate calibration data is taken and associated with the science target observations. Queue-scheduled observatories like Gemini are already tackling this, but other existing observatories will need to find ways to change.

  • To minimise the costs of sending data across networks, each node supplying data needs to process it locally and transmit only the results. Methods of coordinating computations across many widely separated computers will be needed.

    The VOs must therefore have a protocol in place so that they can trust each other to run processing requests and return the results. There is a lot of activity in this area, as part of the development broadly known as Grid Computing. Projects that have already solved parts of the problem include the long-running Condor project, the Parallel Virtual Machine package and seti@home. Good starting points for learning more about current work in this area are the Globus project and the Grid Forum.

  • There will be cases where the query requires data to be brought to a single machine for processing, or where one wants to visualise the results in some rich form, such as a high-resolution movie.

    These cases will require fast network connections, in effect a next-generation internet, to make the query possible. The U.S. government started a sizeable effort to improve its public networks and this is yielding interesting results. A recent conference on gigabit networking (pdf, 1.5 Mbyte) included virtual observatories (ps, 0.3 Mbyte) as a technology demonstrator. See also this promotional piece about that meeting.

    Australia has long been strapped for this sort of network capacity, but that is set to change with the advent of AARNet's research-class connection to the US and Hawaii. This link ties into the privately funded Internet2 and the Asia-Pacific Advanced Network, both of which hook into the hub of the US high-speed networks, StarTap.

  • The capability to recommend a list of telescopes that could make the observation you want will require observatories to advertise the capabilities of their instrumentation, in a form that the request agent can understand and use to decide whether an instrument is suitable.

    Some discipline (or automation) will be required to keep this information up to date. This will probably also affect the way instrument software packages are designed. Projects addressing this include the memes work at Lick Observatory and NASA's Instrument Remote Control project (which includes development of the Astronomical Instrument Markup Language, AIML).
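The case for processing data where it lives, and for faster links when that is not possible, comes down to simple arithmetic. The Python sketch below compares idealised transfer times for a raw dataset and a reduced result over a few link speeds. The 1 TB and 10 MB volumes and the three link rates are illustrative assumptions, not figures from any particular project, and protocol overhead and latency are ignored.

```python
# Rough transfer-time arithmetic: why shipping results beats shipping raw data.
# ASSUMPTION: the data volumes and link speeds below are illustrative only.

def transfer_seconds(n_bytes: float, link_bits_per_second: float) -> float:
    """Idealised time to move n_bytes over a link, ignoring overhead and latency."""
    return n_bytes * 8 / link_bits_per_second

TB = 1e12  # decimal terabyte, in bytes
MB = 1e6   # decimal megabyte, in bytes

raw_survey = 1 * TB        # hypothetical raw data a query would otherwise pull
reduced_result = 10 * MB   # hypothetical extract computed at the data node

links = [("10 Mbit/s", 10e6), ("155 Mbit/s", 155e6), ("1 Gbit/s", 1e9)]

for label, rate in links:
    raw = transfer_seconds(raw_survey, rate)
    reduced = transfer_seconds(reduced_result, rate)
    print(f"{label:>10}: raw {raw / 3600:8.1f} h, reduced {reduced:8.3f} s")
```

Even at 1 Gbit/s the terabyte takes 8,000 s, a little over two hours, while the reduced result arrives in well under a second. That gap is why shipping results is preferred, and why queries that genuinely must move raw data depend on research-class networks.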
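The point above about advertising instrument capabilities implies a request agent that can compare a user's requirements against machine-readable descriptions. The Python sketch below shows one minimal way to do that. It assumes a hypothetical capability record with a wavelength range, a maximum spectral resolving power and a declination range (a crude stand-in for sky visibility). The field names, telescopes and numbers are all invented for illustration and are not drawn from AIML or any observatory's real description format.

```python
from dataclasses import dataclass

# HYPOTHETICAL capability record; not AIML or any real observatory schema.
@dataclass(frozen=True)
class InstrumentCapability:
    telescope: str
    instrument: str
    min_wavelength_nm: float
    max_wavelength_nm: float
    max_resolution: float        # spectral resolving power R = lambda / delta-lambda
    min_declination_deg: float   # crude visibility limits for the site
    max_declination_deg: float

# HYPOTHETICAL request that a user's agent checks against advertised records.
@dataclass(frozen=True)
class ObservationRequest:
    wavelength_nm: float
    resolution: float
    declination_deg: float

def is_suitable(cap: InstrumentCapability, req: ObservationRequest) -> bool:
    """Suitable if the instrument covers the wavelength, reaches the
    requested resolution, and the target lies within its declination range."""
    return (
        cap.min_wavelength_nm <= req.wavelength_nm <= cap.max_wavelength_nm
        and cap.max_resolution >= req.resolution
        and cap.min_declination_deg <= req.declination_deg <= cap.max_declination_deg
    )

def recommend(caps: list[InstrumentCapability], req: ObservationRequest) -> list[tuple[str, str]]:
    """Filter to suitable instruments, then rank by resolving power (highest first)."""
    matches = [c for c in caps if is_suitable(c, req)]
    matches.sort(key=lambda c: c.max_resolution, reverse=True)
    return [(c.telescope, c.instrument) for c in matches]

# Invented example catalogue of advertised capabilities.
catalogue = [
    InstrumentCapability("Scope A", "Spec-1", 350, 900, 2000, -90, 30),
    InstrumentCapability("Scope B", "Echelle-2", 380, 1000, 40000, -60, 60),
    InstrumentCapability("Scope C", "IR-Cam", 1000, 2500, 500, -90, 90),
]

request = ObservationRequest(wavelength_nm=656.3, resolution=1500, declination_deg=-45)
print(recommend(catalogue, request))
```

A real scheme would need much richer descriptions (detectors, scheduling, weather, proprietary periods), which is the kind of detail a shared instrument markup language could carry; the filter-then-rank structure is the part this sketch is meant to show.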

Plans for VOs are afoot in Europe, the U.S. and possibly Japan. See:

In the future, virtual observatories will probably become semantic webs of knowledge, with the data and literature intimately tied together in a new form of digital library.

Data Grids

VOs are a subset of Data Grids. Plans for these are advancing rapidly, and some of the concepts have already been demonstrated. Lawrence Berkeley National Laboratory has been tackling some of the thorny implementation issues. Two sample documents from that site are here: a project proposal and a discussion of the issues involved in scaling grids up to large data volumes and high reliability. Other groups are working on Distributed Data Management issues.

Acknowledgements: