Distributed Archives Interoperability

Cynthia Y. Cheung

NASA Goddard Space Flight Center

IAU 2000, Commission 5

Manchester, UK, August 12, 2000


This page contains a text-only overview of Cynthia's talk and excludes graphics.

Current Status

  • Global Astrophysics Data Resources Loosely Connected by the Internet
      • Observational data archives or repositories
      • Derived data products (astronomical catalogs, browse images, video)
      • Data analysis packages
      • Visualization/presentation packages
      • Special services (bibliography, discipline-specific knowledge bases, directories)
  • Distributed Storage, Processing, and Management
      • Multispectral surveys (data volume ~ terabytes)
  • Islands of Information?
  • Requires both Vertical and Horizontal Integration

Path to the Future

  • Current (connections via hyperlinks):
      • one to one
  • Near Future (connections to multiple DBs all at once, via middleware):
      • one to many
  • Long Term (multiple inter-connectivity, federated databases):
      • many to many
  • Distributed Autonomous Data Centers
  • Intelligent Agents
  • User-defined Profiles and Preferences
  • Access via Multiple Interfaces
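The "one to many" stage can be sketched as middleware that sends a single query to several archives at once and collects the replies. This is a minimal sketch, assuming each archive is wrapped by a Python callable that takes a query string and returns a list of records; `query_all` and that adapter signature are hypothetical and do not correspond to any particular archive's interface.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Hypothetical archive adapter: takes a query string, returns a list of records.
Archive = Callable[[str], List[dict]]


def query_all(archives: Dict[str, Archive], query: str,
              timeout_s: float = 30.0) -> Dict[str, List[dict]]:
    """Send one query to every archive in parallel and gather the results.

    A failing or slow archive does not abort the whole search: its entry
    is recorded as an empty list so the other archives' results still return.
    """
    results: Dict[str, List[dict]] = {}
    # max(1, ...) because ThreadPoolExecutor rejects max_workers=0.
    with ThreadPoolExecutor(max_workers=max(1, len(archives))) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in archives.items()}
        for name, fut in futures.items():
            try:
                results[name] = fut.result(timeout=timeout_s)
            except Exception:
                results[name] = []
    return results
```

For example, `query_all({"optical": optical_search, "xray": xray_search}, "M31")` would return one result list per archive, where `optical_search` and `xray_search` stand for hypothetical adapters around remote services. Recording an unreachable archive as an empty list, rather than raising, reflects the autonomy of the data centers: one unavailable site should not block the rest of the search.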
Components of Interoperability

  • Integrated search and discovery
      • URL Registry (e.g., Yellow Pages, GLU, AstroBrowse)
      • Query processor (e.g., AMASE, ISAIA)
      • Browsing/visualization to support selection (ADC Data Viewer, AEQ)
      • Batch queries (feed the output stream of one data service to another)
      • Tools to support integration of results
  • Data and software exchange
      • FTP of data and software updates (pull)
      • Download of browser plug-in (pull)
      • Automated updates, e.g., HST DB replication (push)
      • Hybrid techniques with data cache or aircache (push & pull)
      • Packaging of software with data (XDF)
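Batch queries, where the output stream of one data service feeds another, can be sketched with Python generators. This is a minimal sketch: both "services" are hypothetical local stand-ins for remote ones, and the object positions in the lookup table are approximate J2000 values included only for illustration.

```python
from typing import Iterable, Iterator, Tuple

Row = Tuple[str, float, float]  # (name, ra_deg, dec_deg)


def catalog_search(names: Iterable[str]) -> Iterator[Row]:
    """Hypothetical service 1: resolve object names to positions, streaming rows."""
    # Stand-in lookup table (approximate positions); a real service would query a catalog.
    positions = {"M31": (10.6847, 41.2692), "M87": (187.7059, 12.3911)}
    for name in names:
        if name in positions:          # unresolved names are silently skipped
            ra, dec = positions[name]
            yield name, ra, dec


def cutout_requests(rows: Iterable[Row], size_deg: float = 0.1) -> Iterator[dict]:
    """Hypothetical service 2: turn each incoming position into an image-cutout request."""
    for name, ra, dec in rows:
        yield {"target": name, "ra": ra, "dec": dec, "size_deg": size_deg}
```

Chaining is then `cutout_requests(catalog_search(["M31", "M87"]))`. Because both stages are generators, each row is handed to the next service as soon as it is produced, so the intermediate result never has to be materialized in full.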

Technical Issues and Challenges

Example: Positional correlation of objects in a region of the sky across multiple wavelengths (Radio, IR, Optical, UV, X-rays, Gamma Rays)

  • Data volume and network bandwidth
      • Cache of pre-computed results (e.g., astronomical catalogs)
      • Data filtering at the data site; ship results only
      • Deployment of user code (platform-independent S/W)
      • Data visualization for exploration and selection
  • Registration, Sensitivity, Positional Accuracy
      • Coordinate transformation on a large scale
      • Calibration and normalization
  • Query Optimization across Multiple Sites
      • Query execution plan for efficient cross-correlation
      • Indexing for fast access
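The positional-correlation example, together with the points on execution plans and indexing, can be illustrated by a cross-match of two catalogs. This is a minimal sketch, assuming both catalogs are already in the same coordinate frame and epoch (the coordinate transformation step is not shown), that sources are plain `(id, ra_deg, dec_deg)` tuples, and that a simple declination-zone index is an acceptable stand-in for a real spatial index; all function names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Source = Tuple[str, float, float]  # (id, ra_deg, dec_deg)


def ang_sep_deg(ra1: float, dec1: float, ra2: float, dec2: float) -> float:
    """Great-circle separation in degrees (haversine formula; handles RA wrap at 360)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def build_zone_index(catalog: List[Source], zone_deg: float) -> Dict[int, List[Source]]:
    """Bucket sources into horizontal declination zones of height zone_deg."""
    index: Dict[int, List[Source]] = defaultdict(list)
    for src in catalog:
        index[math.floor(src[2] / zone_deg)].append(src)
    return index


def cross_match(cat_a: List[Source], cat_b: List[Source],
                radius_deg: float) -> List[Tuple[str, str, float]]:
    """Return (id_a, id_b, sep_deg) for every pair separated by at most radius_deg."""
    zone_deg = radius_deg                      # zones exactly as tall as the search radius
    index = build_zone_index(cat_b, zone_deg)
    matches = []
    for id_a, ra_a, dec_a in cat_a:
        z = math.floor(dec_a / zone_deg)
        for zone in (z - 1, z, z + 1):         # own zone plus the two neighbours
            for id_b, ra_b, dec_b in index.get(zone, []):
                sep = ang_sep_deg(ra_a, dec_a, ra_b, dec_b)
                if sep <= radius_deg:
                    matches.append((id_a, id_b, sep))
    return matches
```

The angular separation of two sources is never smaller than their difference in declination, so any match within radius r lies at most r away in declination; with zones r tall, only a source's own zone and its two neighbours need scanning. A production cross-match would also sort each zone by right ascension and process very large catalogs out of core; neither is shown here.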

Semantic Interoperability

  • Content-based Searches
      • Science-goal-driven queries instead of SQL
  • Data Understanding (Domain Context)
      • Human Interface → S/W Mapping → Object-oriented Mapping
  • Data Annotation for Correct Interpretation
      • Measured parameters, units, quality, range of validity
      • Algorithm and calibration used, pedigree
      • Theoretical models applied
  • Data Organization
      • File directory structure
      • Database schema
  • Need Information in both Machine-understandable and Human-understandable Form
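The annotation items above (parameter, units, quality, range of validity, pedigree) and the need for both machine-understandable and human-understandable forms can be sketched as one record rendered two ways. This is a minimal sketch using a Python dataclass; the class name and field names are hypothetical and not an established metadata standard.

```python
import json
from dataclasses import asdict, dataclass
from typing import Tuple


@dataclass
class AnnotatedValue:
    """A measured parameter carrying the annotations needed to interpret it.

    Field names are illustrative assumptions, not a standard schema.
    """
    name: str
    value: float
    unit: str
    quality: str                      # e.g. "good", "suspect"
    valid_range: Tuple[float, float]  # range over which the calibration holds
    pedigree: str                     # algorithm / calibration that produced the value

    def in_valid_range(self) -> bool:
        lo, hi = self.valid_range
        return lo <= self.value <= hi

    def to_machine(self) -> str:
        """Machine-understandable form: JSON (the tuple becomes a JSON array)."""
        return json.dumps(asdict(self), sort_keys=True)

    def to_human(self) -> str:
        """Human-understandable form: a one-line summary."""
        lo, hi = self.valid_range
        return (f"{self.name} = {self.value} {self.unit} "
                f"(quality: {self.quality}; valid from {lo} to {hi} {self.unit}; "
                f"pedigree: {self.pedigree})")
```

For instance, `AnnotatedValue("flux", 3.2e-13, "erg/s/cm^2", "good", (1e-15, 1e-10), "hypothetical pipeline v1")` yields JSON for software via `to_machine()` and a readable summary via `to_human()`, both generated from the same fields so the two forms cannot drift apart.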

Metadata Standards

  • Syntax
      • Directory Structure
      • Size, Format, Location, URL
  • Semantics
      • Usage Convention (e.g., FITS)
  • Extensible
      • Standards to Encompass Different Disciplines (DTD, XML)
      • Astronomical Nomenclature and Designation
  • Conceptual Data Model
  • Metadata Language or Representation
      • FITS, ASCII, IEEE Binary
      • Astronomical XML
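The point that one piece of metadata may have several representations can be sketched by rendering the same keyword as a FITS-style header card and as an XML element. This is a simplified sketch: the card follows the fixed-format layout (keyword in columns 1-8, "= " in columns 9-10, numbers and logicals right-justified to column 30, strings quoted from column 11) but does not validate the keyword's character set or handle long strings, complex values, or other parts of the FITS standard. The XML element name `param` and its attributes are hypothetical, not a published astronomical XML schema.

```python
import xml.etree.ElementTree as ET


def fits_card(keyword: str, value, comment: str = "") -> str:
    """Render one keyword as a simplified fixed-format 80-character FITS header card."""
    key = keyword.upper()
    if len(key) > 8:
        raise ValueError("FITS keywords are at most 8 characters")
    if isinstance(value, bool):                  # test bool first: bool is an int subclass
        val = ("T" if value else "F").rjust(20)
    elif isinstance(value, int):
        val = str(value).rjust(20)               # right-justified to end in column 30
    elif isinstance(value, float):
        val = f"{value:.8E}".rjust(20)           # e.g. 1.50000000E+00
    else:
        text = str(value).replace("'", "''")     # embedded quotes are doubled
        val = f"'{text.ljust(8)}'".ljust(20)     # at least 8 characters between quotes
    card = f"{key:<8}= {val}"
    if comment:
        card += f" / {comment}"
    if len(card) > 80:
        raise ValueError("card exceeds 80 characters")
    return card.ljust(80)


def xml_param(keyword: str, value, comment: str = "") -> str:
    """Render the same keyword as a hypothetical XML metadata element."""
    el = ET.Element("param", name=keyword.upper(), value=str(value))
    if comment:
        el.set("description", comment)
    return ET.tostring(el, encoding="unicode")
```

For example, `fits_card("telescop", "HST", "telescope name")` and `xml_param("telescop", "HST", "telescope name")` carry the same content in two syntaxes, which is the kind of mapping between logical representations the metadata discussion calls for.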

Aspects of Metadata Usage

[Ref: Bretherton & Singley 1994, Proc. 7th SSDBM, p. 166]

  • Search, browse, retrieval (Human)
      • Data extraction and interpretation
      • Navigate among services
  • Ingest, quality assurance, (re-)processing
      • Science product generation pipeline
      • Content analysis
  • Storage, archive (Data Management)
      • Information relevant for effective system design and operation
  • Application to application transfer (Machine)
      • Enable "context" interchange (distributed queries and transformations)

A transfer language is needed, with mappings from the conceptual level to different logical representations.

Other Supporting Tools

  • Interface Standards for Software Tools
  • Tools for Schema Mapping
      • Document the logical structure of a database (key elements and relationships)
      • Map local definitions into common terminology
      • Track changes and updates at other sites
  • Tools for Data Integration and Fusion
      • Dynamic interface with user preferences
      • Intelligent software agents to mediate interaction

Goal: Global query to many distributed, autonomous, evolving data resources
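The schema-mapping tools and the closing goal can be illustrated together: each site keeps its own column names and units, a per-site mapping translates rows into a common terminology, and a "global" query is evaluated over the translated rows. This is a minimal sketch; the sites, column names, and mapping tables are hypothetical, and a real mediator would push filtering to each site (shipping only results) rather than translating every row centrally.

```python
from typing import Callable, Dict, List, Tuple

# Per-site mapping: local column name -> (common term, unit converter).
Mapping = Dict[str, Tuple[str, Callable[[float], float]]]

# Hypothetical site A already stores degrees.
SITE_A: Mapping = {
    "RA_DEG": ("ra_deg", lambda v: v),
    "DEC_DEG": ("dec_deg", lambda v: v),
    "MAG_V": ("v_mag", lambda v: v),
}
# Hypothetical site B stores right ascension in hours.
SITE_B: Mapping = {
    "alpha_h": ("ra_deg", lambda v: v * 15.0),  # hours -> degrees
    "delta": ("dec_deg", lambda v: v),
    "vmag": ("v_mag", lambda v: v),
}


def to_common(row: Dict[str, float], mapping: Mapping) -> Dict[str, float]:
    """Translate one local row into the common terminology; unmapped columns are dropped."""
    out: Dict[str, float] = {}
    for col, val in row.items():
        if col in mapping:
            term, convert = mapping[col]
            out[term] = convert(val)
    return out


def global_query(sources: List[Tuple[List[Dict[str, float]], Mapping]],
                 predicate: Callable[[Dict[str, float]], bool]) -> List[Dict[str, float]]:
    """Apply one predicate, written in common terms, to rows from every site."""
    results = []
    for rows, mapping in sources:
        for row in rows:
            common = to_common(row, mapping)
            if predicate(common):
                results.append(common)
    return results
```

The predicate is written once against the common terms (e.g. `lambda r: r["v_mag"] < 10`) and works unchanged for every site; tracking schema changes at the other sites, also listed above, would amount to keeping these mapping tables up to date.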