
5. XML

5.1 A Brief Description of XML

XML stands for eXtensible Markup Language. It differs from HTML in that it is designed to describe data rather than control its appearance. For instance, if you were publishing a paper on the web, you might write the following HTML code:


<h1>The Paper Title</h1>

<p><i>Joe Bloggs</i></p>

<h2>Abstract</h2>

<p>This paper is a demonstration of HTML and XML...</p>

<h2>Introduction</h2>

<p>We begin this paper with this sentence...</p>


The above tags tell you nothing about the content of the document, only how each section should be formatted. There is nothing to distinguish the text of the abstract from the text of the introduction. Contrast this with an XML scheme:


<title>The Paper Title</title>

<author>Joe Bloggs</author>

<subtitle>Abstract</subtitle>

<abstract>This paper is a demonstration of HTML and XML...</abstract>

<subtitle>Introduction</subtitle>

<introduction>We begin this paper with this sentence...</introduction>


Here, each element is clearly identified. As with fields in a database, information can now be selectively extracted from the document. To make a table of contents, it would suffice to list the contents of every <subtitle> element. Likewise, the paper title, author and abstract can be retrieved for cataloguing purposes.
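
The extraction described above can be sketched in a few lines of Python using the standard library's xml.etree.ElementTree module. This is only an illustration: the <paper> root element is an assumption added so that the fragment forms a single well-formed document, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# The fragment from the text, wrapped in a hypothetical <paper> root element
# so that it forms a single well-formed XML document.
PAPER_XML = """\
<paper>
  <title>The Paper Title</title>
  <author>Joe Bloggs</author>
  <subtitle>Abstract</subtitle>
  <abstract>This paper is a demonstration of HTML and XML...</abstract>
  <subtitle>Introduction</subtitle>
  <introduction>We begin this paper with this sentence...</introduction>
</paper>
"""


def table_of_contents(xml_text):
    """Return the text of every <subtitle> element, in document order."""
    root = ET.fromstring(xml_text)
    return [element.text for element in root.iter("subtitle")]


def catalogue_entry(xml_text):
    """Return the fields a catalogue might want: title, author and abstract."""
    root = ET.fromstring(xml_text)
    return {name: root.findtext(name) for name in ("title", "author", "abstract")}


print(table_of_contents(PAPER_XML))  # ['Abstract', 'Introduction']
print(catalogue_entry(PAPER_XML))
```

The same selection is not possible with the HTML version, where the abstract and the introduction are both plain <p> elements.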


The formatting of the XML output is performed using stylesheets, something that can also be done with HTML. Each tag can be given a format in the stylesheet. For instance, it may be decided that the title element should be displayed in a 36-point sans-serif font, while the author element is displayed in an italic serif font. Formatting is thus separated from the document and can be changed without affecting the information in it.
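
As a rough sketch of this separation, the formats can be held in a table keyed by element name and rendered as CSS rules, leaving the document itself untouched. The STYLES table and the to_css function are hypothetical names used only for illustration; the two formats are the ones suggested in the paragraph above.

```python
# Hypothetical per-element formats, following the example in the text:
# the title in 36-point sans-serif, the author in italic serif.
STYLES = {
    "title": {"font-family": "sans-serif", "font-size": "36pt"},
    "author": {"font-family": "serif", "font-style": "italic"},
}


def to_css(styles):
    """Render a mapping of element name -> declarations as CSS rules."""
    rules = []
    for element, declarations in styles.items():
        body = " ".join(f"{prop}: {value};" for prop, value in declarations.items())
        rules.append(f"{element} {{ {body} }}")
    return "\n".join(rules)


print(to_css(STYLES))
# title { font-family: sans-serif; font-size: 36pt; }
# author { font-family: serif; font-style: italic; }
```

Changing the appearance of every title on a site then means editing one entry in the stylesheet rather than every document.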


The XML tags are also defined separately from the document in a Document Type Definition (DTD) file. Different types of documents can have different DTDs. The elements of a newsflash are obviously different from those in a paper. Both the DTDs and the stylesheets can be centrally managed, ensuring consistency for all authors across an organisation.
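
To give a feel for what a DTD contains, the sketch below shows one possible DTD for the paper example and a simple check of a document against it. The element names come from the example above; the <paper> root element and the content model are assumptions. Python's standard library parsers do not validate against DTDs, so the check here is a deliberately reduced stand-in: it only reports elements that the DTD does not declare, whereas a validating parser would also check their order and nesting.

```python
import re
import xml.etree.ElementTree as ET

# A hypothetical DTD for the paper example. The <paper> root and the
# content model (order and repetition of elements) are assumptions.
PAPER_DTD = """\
<!ELEMENT paper (title, author, (subtitle, (abstract | introduction))+)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT subtitle (#PCDATA)>
<!ELEMENT abstract (#PCDATA)>
<!ELEMENT introduction (#PCDATA)>
"""


def declared_elements(dtd_text):
    """Return the set of element names declared in the DTD text."""
    return set(re.findall(r"<!ELEMENT\s+([\w.-]+)", dtd_text))


def undeclared_elements(xml_text, dtd_text):
    """List, in document order, the tags used in the document but not declared."""
    allowed = declared_elements(dtd_text)
    root = ET.fromstring(xml_text)
    return [element.tag for element in root.iter() if element.tag not in allowed]


good = (
    "<paper><title>The Paper Title</title><author>Joe Bloggs</author>"
    "<subtitle>Abstract</subtitle><abstract>A short abstract.</abstract></paper>"
)
bad = "<paper><title>The Paper Title</title><footnote>Not declared</footnote></paper>"

print(undeclared_elements(good, PAPER_DTD))  # []
print(undeclared_elements(bad, PAPER_DTD))   # ['footnote']
```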

5.2 XML Implementation Issues

The preceding discussion illustrates some of the benefits of using XML for the publication of information. However, there are a number of issues that must be considered.


The use of XML involves a large amount of initial planning compared with the existing loose HTML standards: before any pages can be written in XML, the Document Type Definitions have to be created. That is, the tags that can be used in a document must be defined. Each type of document to be published using XML probably requires its own DTD. Furthermore, each DTD must be carefully designed, as it may be difficult to change poorly defined documents at a later stage.


Under an XML scheme, the pre-specification requirement means that authors lose the ability to publish "anything at any time", which reduces the flexibility of the website. This has the benefit of encouraging higher structural quality and consistency across the website, but may, in some circumstances, mean more work for the authors and reduce the quantity of information being placed online.


Another difficulty in using XML is the lack of software support for publishing documents as XML. Many documents are currently written in Microsoft Word, PowerPoint or LaTeX and then exported to HTML. Third-party products may become available to resolve this problem, and there are specialist tools for producing XML. One alternative is to permit some HTML documents alongside the XML files.


With the inability of familiar software to produce usable XML, document authors may be forced to learn to use new software or code the XML directly. The imposition of structure on documents will also mean a change of thinking for many authors. While the latter may be beneficial, such changes take time.


Many older browsers, Netscape 4.x included, are unable to directly handle XML documents. This necessitates the conversion of the XML into HTML, a process that is normally performed by the server. Additional complexity must be built into the web server to handle the XML processes, usually in the form of additional components.
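
The server-side conversion can be illustrated with a minimal Python sketch that maps each XML element back to the HTML used at the start of this section. The <paper> root element, the HTML_TEMPLATES table and the xml_to_html function are assumptions made for illustration; a production system would more likely use a dedicated transformation tool such as an XSLT processor.

```python
import xml.etree.ElementTree as ET
from html import escape

# The paper example, wrapped in a hypothetical <paper> root element.
PAPER_XML = """\
<paper>
  <title>The Paper Title</title>
  <author>Joe Bloggs</author>
  <subtitle>Abstract</subtitle>
  <abstract>This paper is a demonstration of HTML and XML...</abstract>
  <subtitle>Introduction</subtitle>
  <introduction>We begin this paper with this sentence...</introduction>
</paper>
"""

# Hypothetical mapping from XML element names to HTML templates,
# mirroring the HTML example earlier in the section.
HTML_TEMPLATES = {
    "title": "<h1>{}</h1>",
    "author": "<p><i>{}</i></p>",
    "subtitle": "<h2>{}</h2>",
    "abstract": "<p>{}</p>",
    "introduction": "<p>{}</p>",
}


def xml_to_html(xml_text):
    """Convert each child of the root element to its HTML equivalent."""
    root = ET.fromstring(xml_text)
    lines = []
    for element in root:
        # Unknown elements fall back to a plain paragraph.
        template = HTML_TEMPLATES.get(element.tag, "<p>{}</p>")
        lines.append(template.format(escape(element.text or "")))
    return "\n".join(lines)


print(xml_to_html(PAPER_XML))
```

Because the conversion runs on the server, older browsers receive ordinary HTML and need no knowledge of XML.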

5.3 Should the ATNF Use XML?

Despite the above caveats, the benefits of using XML are many. It ensures a focus on the information in documents rather than their appearance, which is a very important issue for a knowledge-based organisation such as the ATNF. XML appears to have gained wide acceptance as a standard and will be used increasingly both on the Internet and to exchange data between organisations and software.


The future use of XML for the ATNF website deserves strong consideration. It is already used on the CSIRO Corporate website. Like CSIRO Corporate, we will probably require a system that encompasses both XML and legacy HTML. XML is a complex topic and requires further research before implementation.
