Australian Virtual Observatory
Background information
Update: ESO AVO site
Update: DoE Science Grid Documents
Astronomy is not unique in being swamped with data. The particle physics
community is building and starting to run projects that will generate
petabytes of data. This is driving some of the most ambitious
developments noted below. Earth observation science is now generating
huge databases of highly detailed measurements. In genetics, the speed
at which genes can now be sequenced has given rise to a new computing
discipline: bioinformatics.
Setting up a virtual observatory is a long-term process and needs as
much thought about the future as about handling present data. In fact,
having a VO necessarily changes the way in which observing is done. Here
are a few points to consider.
- Most proposals recognize that the data needs to be calibrated before
being presented to the requester.
This puts new responsibilities on those running the telescopes: they
must make sure adequate calibration data is taken and associated with
the science target observations. This is already being tackled by
queue-scheduled observatories like Gemini, but existing observatories
will need to find ways to change. A sketch of such an association step
appears after this list.
- To minimise the costs of sending data across networks, each node supplying
data needs to process it locally and transmit only the results. Methods of
coordinating computations across many widely-separated computers will
be needed.
The VOs must therefore have a protocol in place so they can trust
each other to run processing requests and return the results. There
is a lot of activity in this area, as part of the development
broadly known as Grid
Computing. Projects that have solved pieces of the problem include
the long-running Condor
project, the Parallel
Virtual Machine package and seti@home.
Good starting points for learning what's currently going on
in this area are the Globus project
and the Grid Forum. A sketch of this ship-the-query pattern also
appears after this list.
- There will be cases where the query requires data to be brought to
a single machine for processing, or where one wants to visualise the
results in some rich form, such as a high-resolution movie.
These cases will require fast network connections, a next-generation
internet, to make the query possible. The U.S. government has started
a sizeable effort to improve its public networks and this is yielding
interesting results. A recent conference on gigabit networking (pdf,
1.5 Mbyte) included virtual observatories
(ps, 0.3 Mbyte) as a technology demonstrator.
See also this promotional
piece about that meeting.
Australia has long been strapped for this sort of network capacity
but that is set to change with the advent of AARNet's
research-class connection to the US & Hawaii. This link ties into
the privately-funded Internet2
and the Asia-Pacific Advanced
Network, both of which hook into the hub of the US high-speed
networks, StarTap.
- The capability to recommend a list of telescopes that could make the
observation you want will require observatories to advertise the capabilities
of their instrumentation, in a form that the request agent can understand
and use to decide whether an instrument is suitable.
Some discipline (or automation) will be required to keep this data
up to date. This probably also affects the way that instrument
software packages are designed. Some projects addressing this are
the memes
work at Lick Observatory and NASA's Instrument
Remote Control project (which includes development of the Astronomical
Instrument Markup Language, AIML). A sketch of such a machine-readable
capability record appears after this list.
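
To make the calibration point concrete, here is a minimal sketch, in
Python, of how an archive might automatically associate calibration
frames with a science exposure by matching instrument set-up and
observation time. The frame records and field names are invented for
illustration and are not taken from any existing VO system.

    from datetime import datetime, timedelta

    # Hypothetical frame records; a real archive would draw these from
    # FITS headers or a database.
    science = {"id": "sci-042", "instrument": "SPEC1", "filter": "R",
               "obs_time": datetime(2002, 3, 14, 12, 30)}

    calibrations = [
        {"id": "flat-007", "type": "flat", "instrument": "SPEC1",
         "filter": "R", "obs_time": datetime(2002, 3, 14, 6, 0)},
        {"id": "flat-008", "type": "flat", "instrument": "SPEC1",
         "filter": "B", "obs_time": datetime(2002, 3, 14, 6, 5)},
        {"id": "bias-101", "type": "bias", "instrument": "SPEC1",
         "filter": None, "obs_time": datetime(2002, 3, 13, 22, 0)},
    ]

    def associate(science, calibrations, max_age=timedelta(days=2)):
        """For each calibration type, pick the frame closest in time to
        the science exposure that matches its instrument set-up."""
        best = {}
        for cal in calibrations:
            if cal["instrument"] != science["instrument"]:
                continue
            # Bias frames carry no filter; flats must match the science
            # filter.
            if cal["filter"] is not None and cal["filter"] != science["filter"]:
                continue
            age = abs(science["obs_time"] - cal["obs_time"])
            if age > max_age:
                continue
            kind = cal["type"]
            if (kind not in best or
                    age < abs(science["obs_time"] - best[kind]["obs_time"])):
                best[kind] = cal
        return best

    for kind, cal in associate(science, calibrations).items():
        print(kind, "->", cal["id"])
    # prints: flat -> flat-007, then bias -> bias-101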
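The ship-the-query pattern from the second point can be illustrated
with a short sketch built on Python's standard-library XML-RPC modules.
The node URL, the count_sources operation and its stub answer are all
hypothetical; a real grid protocol would add the authentication and
trust machinery discussed above.

    import threading
    import xmlrpc.client
    from xmlrpc.server import SimpleXMLRPCServer

    # Data-node side: expose a processing operation so that only the
    # small result, not the bulk image data, crosses the network.
    def count_sources(ra, dec, radius):
        # A real node would search its locally held images within
        # 'radius' degrees of (ra, dec); this stub stands in for that
        # local work.
        return {"node": "node-A", "n_sources": 1234}

    server = SimpleXMLRPCServer(("localhost", 8000), logRequests=False)
    server.register_function(count_sources)
    # Run the node in a background thread so one script can show both
    # ends; in reality each node is a separate, remote machine.
    threading.Thread(target=server.serve_forever, daemon=True).start()

    # Requesting side: fan the same query out to every participating
    # node and combine the small per-node answers locally.
    nodes = ["http://localhost:8000"]  # in practice, one URL per node
    total = 0
    for url in nodes:
        result = xmlrpc.client.ServerProxy(url).count_sources(83.8, -5.4, 0.5)
        total += result["n_sources"]
    print("sources matching the query across all nodes:", total)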
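And for the last point, a minimal sketch of a machine-readable
capability record and a request agent that filters on it. The field
names, telescope names and instrument entries are invented, only
loosely in the spirit of what AIML aims at.

    # Hypothetical capability records that observatories might publish.
    instruments = [
        {"telescope": "Telescope-A", "instrument": "SPEC-1",
         "wavelength_nm": (360.0, 1000.0),
         "modes": ["imaging", "spectroscopy"]},
        {"telescope": "Telescope-B", "instrument": "CAM-2",
         "wavelength_nm": (1000.0, 2500.0),
         "modes": ["imaging"]},
    ]

    def suitable(instruments, wavelength_nm, mode):
        """Return the instruments whose advertised capabilities cover
        the requested wavelength and observing mode."""
        matches = []
        for inst in instruments:
            lo, hi = inst["wavelength_nm"]
            if lo <= wavelength_nm <= hi and mode in inst["modes"]:
                matches.append(inst)
        return matches

    # A request agent asking "who can take a spectrum at 656 nm?"
    for inst in suitable(instruments, 656.3, "spectroscopy"):
        print(inst["telescope"], inst["instrument"])
    # prints: Telescope-A SPEC-1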
Plans for VOs are afoot in Europe, the U.S. and possibly Japan.
In the future, virtual observatories will probably become semantic
webs of knowledge, with the data and literature intimately tied together
in a new form of digital library.
Data Grids
VOs are a subset of Data Grids. Plans for these are advancing rapidly,
and some of the concepts have already been demonstrated. Lawrence
Berkeley National Laboratory has been tackling some of the thorny
implementation issues. Two sample documents from that site are here: a
project proposal and a discussion of issues involved in scaling up
grids to large volumes and high reliability. Other groups are working
on Distributed Data Management issues.