Distributed Archives Interoperability
Cynthia Y. Cheung
NASA Goddard Space Flight Center
IAU 2000, Commission 5
Manchester, UK, August 12, 2000
This page contains a text-only overview of Cynthia's talk and excludes the graphics from the original slides.
Current Status
Global Astrophysics Data Resources Loosely Connected By the Internet
Observational data archives or repositories
Derived data products (astronomical catalogs, browse images, video)
Data analysis packages
Visualization/presentation packages
Special services (bibliography, discipline-specific knowledge bases, directories)
Distributed Storage, Processing, and Management
Multispectral surveys (Data volume ~ terabytes)
Islands of Information?
Requires both Vertical and Horizontal Integration
Path to the Future
Current (connections via hyperlinks): one to one
Near Future (connections to multiple DBs all at once, via middleware): one to many
Long Term (multiple inter-connectivity, federated databases): many to many
Distributed Autonomous data centers
Intelligent Agents
User-defined Profiles and Preferences
Access via Multiple Interfaces
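The "one to many" stage can be pictured as a middleware layer that sends a single query to several autonomous data centers at once and merges the answers. The minimal sketch below assumes each archive is a local Python function standing in for a remote service; `radio_archive`, `xray_archive`, and `federated_query` are hypothetical names, and a real middleware layer would make network calls and deal with slow or unreachable sites.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Hypothetical archive stand-ins: each takes a target name and returns records.
# A real middleware layer would call remote services here.
def radio_archive(target: str) -> List[dict]:
    return [{"target": target, "band": "radio", "source": "radio_archive"}]

def xray_archive(target: str) -> List[dict]:
    return [{"target": target, "band": "x-ray", "source": "xray_archive"}]

def federated_query(target: str,
                    archives: Dict[str, Callable[[str], List[dict]]]
                    ) -> Dict[str, List[dict]]:
    """Send one query to every archive concurrently; collect results per archive."""
    results: Dict[str, List[dict]] = {}
    with ThreadPoolExecutor(max_workers=max(1, len(archives))) as pool:
        futures = {name: pool.submit(fn, target) for name, fn in archives.items()}
        for name, future in futures.items():
            try:
                results[name] = future.result(timeout=10)
            except Exception as exc:  # one failing site should not sink the whole query
                results[name] = [{"error": str(exc)}]
    return results
```

Calling `federated_query("M87", {"radio": radio_archive, "xray": xray_archive})` returns one list of records per archive. Catching failures per site is a deliberate choice: with autonomous data centers, one unavailable service should degrade the answer rather than abort it.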
Components of Interoperability
Integrated search and discovery
URL Registry (e.g., Yellow Pages, GLU, AstroBrowse)
Query processor (e.g., AMASE, ISAIA)
Browsing/visualization to support selection (ADC Data Viewer, AEQ)
Batch queries (Feed output stream of one data service to another)
Tools to support integration of results
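Two of the components above, a URL registry and batch queries that feed one service's output into another, can be sketched together. Everything here is hypothetical: the registry is an in-memory dictionary keyed by capability rather than a real service such as GLU or AstroBrowse, and the two "services" are local functions with approximate coordinates hard-coded for illustration.

```python
from typing import Callable, Dict, List

Record = dict
Service = Callable[[List[Record]], List[Record]]

# Hypothetical registry: capability name -> service. A real registry would
# map capabilities to URLs and hand back endpoints to call.
REGISTRY: Dict[str, Service] = {}

def register(capability: str):
    """Decorator that records a service under a capability name."""
    def wrap(fn: Service) -> Service:
        REGISTRY[capability] = fn
        return fn
    return wrap

@register("catalog_lookup")
def catalog_lookup(records: List[Record]) -> List[Record]:
    # Resolve names to approximate positions (degrees); unknown names are dropped.
    positions = {"M87": (187.706, 12.391), "M31": (10.685, 41.269)}
    return [dict(r, ra=positions[r["name"]][0], dec=positions[r["name"]][1])
            for r in records if r["name"] in positions]

@register("image_finder")
def image_finder(records: List[Record]) -> List[Record]:
    # Attach a (made-up) preview image reference to each record.
    return [dict(r, preview=f"preview_{r['name']}.png") for r in records]

def batch_query(capabilities: List[str], seed: List[Record]) -> List[Record]:
    """Run services in sequence, feeding each one's output stream to the next."""
    stream = seed
    for cap in capabilities:
        stream = REGISTRY[cap](stream)
    return stream
```

For example, `batch_query(["catalog_lookup", "image_finder"], [{"name": "M87"}, {"name": "XYZ"}])` returns a single M87 record carrying position and preview fields, because the unknown name is dropped at the first stage.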
Data and software exchange
FTP of data and software updates (pull)
Download of Browser Plug-in (pull)
Automated Updates (HST DB replication) (push)
Hybrid Techniques (with data cache or aircache) (push & pull)
Packaging of software with data (XDF)
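The hybrid push-and-pull approach can be sketched as a small cache: clients pull from the origin when a local copy is missing or stale, and the origin can push fresh copies in ahead of demand, much as database replication does. This is a minimal sketch under assumed semantics (a fixed maximum age, whole-object replacement); `HybridCache` and its `fetch` callback are hypothetical names, not part of any system named above.

```python
import time
from typing import Callable, Dict, Tuple

class HybridCache:
    """Local cache that pulls from an origin on a miss or stale entry and
    also accepts pushed updates (e.g. replicated records)."""

    def __init__(self, fetch: Callable[[str], bytes], max_age_s: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch          # hypothetical pull function for the remote origin
        self._max_age = max_age_s    # entries older than this are re-pulled
        self._clock = clock
        self._store: Dict[str, Tuple[float, bytes]] = {}
        self.pulls = 0               # how many times the origin was contacted

    def push(self, key: str, data: bytes) -> None:
        """Origin-initiated update: overwrite the local copy and reset its age."""
        self._store[key] = (self._clock(), data)

    def get(self, key: str) -> bytes:
        """Client-initiated read: serve from cache if fresh, otherwise pull."""
        entry = self._store.get(key)
        if entry is not None and self._clock() - entry[0] <= self._max_age:
            return entry[1]
        data = self._fetch(key)
        self.pulls += 1
        self._store[key] = (self._clock(), data)
        return data
```

Injecting the clock keeps the staleness rule testable; in use, the default monotonic clock applies.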
Technical Issues and Challenges
Example: Positional correlation of objects in a region of the sky across multiple wavelengths (Radio, IR, Optical, UV, X-rays, Gamma Rays)
- Data volume and network bandwidth
- Cache of pre-computed results (e.g., astronomical catalogs)
- Data filtering at data site, ship results only
- Deployment of user code (platform independent S/W)
- Data visualization for exploration and selection
- Registration, Sensitivity, Positional Accuracy
- Coordinate transformation on a large scale
- Calibration and normalization
- Query Optimization across Multiple Sites
- Query execution plan for efficient cross-correlation
- Indexing for fast access
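The cross-wavelength correlation example touches several items in this list at once: pre-computed catalogs, indexing for fast access, and a query plan that avoids comparing every pair of objects. A minimal sketch, assuming both catalogs are already in the same coordinate frame and epoch (so registration and coordinate transformation are set aside) and that a single fixed match radius is acceptable: catalog B is bucketed into declination strips as tall as the match radius, so each source in A is compared against three strips rather than the whole catalog. Function names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Source = Tuple[str, float, float]  # (id, ra_deg, dec_deg)

def angular_sep_deg(ra1: float, dec1: float, ra2: float, dec2: float) -> float:
    """Great-circle separation in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))

def build_zone_index(catalog: List[Source], zone_deg: float) -> Dict[int, List[Source]]:
    """Bucket sources into declination strips of height zone_deg."""
    index: Dict[int, List[Source]] = defaultdict(list)
    for src in catalog:
        index[math.floor(src[2] / zone_deg)].append(src)
    return index

def cross_match(cat_a: List[Source], cat_b: List[Source],
                radius_deg: float) -> List[Tuple[str, str, float]]:
    """Return (id_a, id_b, separation_deg) for every pair within radius_deg.
    Strip height equals the radius, so a match can only lie in the same
    strip or one of its two neighbours."""
    index = build_zone_index(cat_b, radius_deg)
    matches = []
    for id_a, ra_a, dec_a in cat_a:
        zone = math.floor(dec_a / radius_deg)
        for z in (zone - 1, zone, zone + 1):
            for id_b, ra_b, dec_b in index.get(z, []):
                sep = angular_sep_deg(ra_a, dec_a, ra_b, dec_b)
                if sep <= radius_deg:
                    matches.append((id_a, id_b, sep))
    return matches
```

Bucketing by declination only is the simplest index that works here. RA is not bucketed, and the haversine formula handles the 0/360 degree wrap correctly, so pairs straddling RA = 0 are still found.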
Semantic Interoperability
Content-based Searches
Science goal driven queries instead of SQL
Data Understanding (Domain Context)
Human Interface → S/W Mapping → Object-oriented Mapping
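The move from SQL toward science-goal-driven queries implies a mapping layer between what the user asks for and what a given database stores. The sketch below is one minimal way to picture that layer, assuming a made-up vocabulary (band names with wavelength limits, object-class codes) and a hypothetical table `obs`; it uses Python's built-in `sqlite3` only so the example runs, and the wavelength boundaries are rough illustrative values.

```python
import sqlite3
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical vocabulary mapping science terms onto the local schema.
BAND_RANGES_NM = {"optical": (380.0, 750.0), "near-infrared": (750.0, 2500.0)}
OBJECT_CLASSES = {"galaxy": "GAL", "quasar": "QSO", "star": "STAR"}

@dataclass
class ScienceQuery:
    """A goal-level request, e.g. 'galaxies seen in the optical, brighter than X'."""
    object_class: str
    band: str
    max_magnitude: float  # brighter means numerically smaller magnitude

def to_sql(q: ScienceQuery) -> Tuple[str, tuple]:
    """Translate the goal-level request into parameterized SQL against an
    assumed table obs(name, class_code, wavelength_nm, magnitude)."""
    lo, hi = BAND_RANGES_NM[q.band]
    sql = ("SELECT name FROM obs WHERE class_code = ? "
           "AND wavelength_nm BETWEEN ? AND ? AND magnitude <= ? ORDER BY name")
    return sql, (OBJECT_CLASSES[q.object_class], lo, hi, q.max_magnitude)

def run(conn: sqlite3.Connection, q: ScienceQuery) -> List[str]:
    """Execute the translated query and return matching object names."""
    sql, params = to_sql(q)
    return [row[0] for row in conn.execute(sql, params)]
```

The user never writes SQL; changing the local schema only requires changing `to_sql` and the vocabulary tables, not the goal-level request.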
Data Annotation for Correct Interpretation
Measured parameters, units, quality, range of validity
Algorithm and calibration used, pedigree
Theoretical models applied
Data Organization
File directory structure
Database schema
Need Information in both Machine-understandable and Human-understandable form
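The annotation items above (measured parameters, units, quality, range of validity, algorithm and calibration pedigree, theoretical model) and the need for both machine- and human-understandable forms can be pictured as one record rendered two ways. The field names and the JSON rendering below are assumptions for illustration, not a proposed standard.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional, Tuple

@dataclass
class ParameterAnnotation:
    """Hypothetical annotation record for one measured parameter."""
    name: str
    unit: str
    valid_range: Tuple[float, float]
    quality: str                 # e.g. a quality flag or grade
    algorithm: str               # processing/calibration used (pedigree)
    model: Optional[str] = None  # theoretical model applied, if any

    def to_machine(self) -> str:
        """Machine-understandable form: JSON with fixed keys."""
        return json.dumps(asdict(self), sort_keys=True)

    def to_human(self) -> str:
        """Human-understandable form: one descriptive sentence."""
        lo, hi = self.valid_range
        text = (f"{self.name} is given in {self.unit}, valid from {lo} to {hi}, "
                f"quality '{self.quality}', derived with {self.algorithm}")
        return text + (f", assuming {self.model}." if self.model else ".")

    def is_valid(self, value: float) -> bool:
        """Check a value against the stated range of validity."""
        lo, hi = self.valid_range
        return lo <= value <= hi
```

Keeping both renderings on one object means the human-readable description cannot drift out of step with the machine-readable one.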
Metadata Standards
Syntax
Directory Structure
Size, Format, Location, URL
Semantics
Usage Convention (e.g., FITS)
Extensible Standards to Encompass Different Disciplines (DTD, XML)
Astronomical Nomenclature and Designation
Conceptual Data Model
Metadata Language or Representation
FITS, ASCII, IEEE Binary
Astronomical XML
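To make the syntax/semantics split concrete, here is one way a dataset description could be written in XML, with syntactic facts (format, size, location) kept apart from semantic ones (column meaning and units). It is a sketch only: the element and attribute names are invented for illustration, and it uses Python's standard `xml.etree.ElementTree` rather than any particular astronomical XML proposal.

```python
import xml.etree.ElementTree as ET

def dataset_metadata_xml(name: str, fmt: str, size_bytes: int, url: str,
                         columns: list) -> str:
    """Build a small XML description of a dataset. Element and attribute
    names are hypothetical, not taken from any astronomical XML schema.
    columns is a list of (column name, unit, concept) tuples."""
    root = ET.Element("dataset", name=name)
    syntax = ET.SubElement(root, "syntax")
    ET.SubElement(syntax, "format").text = fmt
    ET.SubElement(syntax, "size", unit="byte").text = str(size_bytes)
    ET.SubElement(syntax, "location").text = url
    semantics = ET.SubElement(root, "semantics")
    for col_name, unit, concept in columns:
        ET.SubElement(semantics, "column", name=col_name, unit=unit, concept=concept)
    return ET.tostring(root, encoding="unicode")

def column_units(xml_text: str) -> dict:
    """Read the document back and return {column name: unit}."""
    root = ET.fromstring(xml_text)
    return {c.get("name"): c.get("unit") for c in root.iter("column")}
```

For example, `dataset_metadata_xml("tile", "FITS", 2880, "http://example.org/t.fits", [("RA", "deg", "pos.ra")])` yields a document whose `syntax` part says where and what the file is, while the `semantics` part says what the RA column means.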
Aspects of Metadata Usage
[Ref: Bretherton & Singley 1994, Proc. 7th SSDBM, p. 166]
Search, browse, retrieval (Human)
Data extraction and interpretation
Navigate among services
Ingest, quality assurance, (re-)processing
Science product generation pipeline
Content analysis
Storage, archive (Data Management)
Information relevant for effective system design and operation
Application to application transfer (Machine)
Enable "context" interchange (distributed queries and transformations)
Need transfer language with mappings from the conceptual level to different logical representations
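The "context interchange" idea, one conceptual quantity mapped onto different logical representations at different sites, can be shown with right ascension, which one archive might store in decimal degrees and another as sexagesimal hours. In this minimal sketch the site names and the `translate_ra` helper are hypothetical; the only fixed fact it relies on is that one hour of right ascension equals 15 degrees.

```python
from typing import Callable, Dict, Tuple

def hms_to_deg(text: str) -> float:
    """'HH:MM:SS.s' right ascension -> degrees (1 hour = 15 degrees)."""
    h, m, s = (float(part) for part in text.split(":"))
    return 15.0 * (h + m / 60.0 + s / 3600.0)

def deg_to_hms(deg: float) -> str:
    """Degrees -> 'HH:MM:SS.ss' right ascension."""
    # Round to 0.01 s first so the seconds field does not round up to "60.00".
    total_s = round((deg % 360.0) / 15.0 * 3600.0, 2) % 86400.0
    h, rem = divmod(total_s, 3600.0)
    m, s = divmod(rem, 60.0)
    return f"{int(h):02d}:{int(m):02d}:{s:05.2f}"

# Each (hypothetical) site declares how it stores the conceptual quantity
# "right ascension in degrees": (to_canonical, from_canonical).
SITE_RA_ENCODINGS: Dict[str, Tuple[Callable, Callable]] = {
    "site_deg": (float, lambda d: d),        # stores plain degrees
    "site_hms": (hms_to_deg, deg_to_hms),    # stores sexagesimal hours
}

def translate_ra(value, from_site: str, to_site: str):
    """Move an RA value between sites via the shared conceptual form (degrees)."""
    to_canonical = SITE_RA_ENCODINGS[from_site][0]
    from_canonical = SITE_RA_ENCODINGS[to_site][1]
    return from_canonical(to_canonical(value))
```

Routing every conversion through one conceptual form means adding a site costs one pair of functions, rather than one converter per pair of sites.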
Other Supporting Tools
Interface Standards for Software Tools
Tools for Schema Mapping
Document logical structure of database (key elements and relationships)
Mapping of local definitions into common terminology
Track changes and updates at other sites
Tools for Data Integration and Fusion
Dynamic Interface with user preferences
Intelligent Software Agents to mediate interaction
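Two of the schema-mapping tasks listed above, mapping local definitions into common terminology and tracking changes at other sites, can be sketched with plain dictionaries. The vocabulary, site names, and column names are all invented for illustration; a real tool would also need to map units and meaning, not just names.

```python
from typing import Dict, List, Set

# Hypothetical common vocabulary and per-site mappings of local column names.
COMMON_TERMS: Set[str] = {"ra_deg", "dec_deg", "flux_mjy", "obs_date"}

SITE_MAPPINGS: Dict[str, Dict[str, str]] = {
    "site_a": {"RA": "ra_deg", "DEC": "dec_deg", "F_1400": "flux_mjy"},
    "site_b": {"alpha": "ra_deg", "delta": "dec_deg", "date_obs": "obs_date"},
}

def to_common(site: str, row: Dict[str, object]) -> Dict[str, object]:
    """Rename a site's local columns to common terms; unmapped columns are dropped."""
    mapping = SITE_MAPPINGS[site]
    return {mapping[k]: v for k, v in row.items() if k in mapping}

def schema_drift(site: str, current_columns: List[str]) -> Dict[str, List[str]]:
    """Compare a site's current columns with the recorded mapping, so changes
    made at the remote site can be noticed and the mapping updated."""
    known = set(SITE_MAPPINGS[site])
    current = set(current_columns)
    return {"added": sorted(current - known), "removed": sorted(known - current)}
```

Running `schema_drift` periodically against each site's published column list is one simple way to notice that a mapping needs attention before queries start silently losing fields.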
Goal:
Global queries across many distributed, autonomous, evolving data resources