XML, is it the next great Web technology?
The following discussion will give a general overview of this new language and its uses on the web.
XML is a new language that is just coming into use. The W3C XML Working Group has been putting together the specifications since 1996. XML, unlike HTML, is not a fixed language. It allows authors to create their own tags as necessary for their document(s). The other great advantage of XML is that, using an XSL style sheet and an XML processor, an XML document can be converted to .RTF, .HTML or .TXT formats. The web is simply the method used to distribute the documents, not necessarily the only way of viewing them.
| HTML | XML |
| <CENTER>Apollo</CENTER> | <GREEKGOD>Apollo</GREEKGOD> |
The above table shows the advantage of XML. With the HTML markup, all that the reader knows is that the word 'Apollo' is to be centered on the page; the markup says nothing about the purpose of the word or what it means. The XML markup, by contrast, describes the meaning of the word. This form of markup is advantageous for authors in specialized fields: it allows them to mark up their data according to its meaning and so better communicate the relevance of their content.
XML is a simplified subset of SGML (Standard Generalized Markup Language) but, unlike SGML, it is fairly simple to learn. Its only relation to HTML is through SGML: HTML is a single document type defined in SGML, whereas XML, like SGML, lets you define document types of your own.
Unlike HTML, XML does not tell a browser how to display a document. To display the page the author must first create a style sheet that defines the layout for each tag being used. The style sheet language that will be used with XML is called XSL. Valid XML also requires a DTD, or Document Type Definition; well-formed XML can do without one. The DTD describes the relationship between the tags and how they can be structured.
XSL allows authors to apply formatting to the XML elements. In other words, a style rule for the <GREEKGOD> element can tell the browser to center the text of <GREEKGOD>Apollo</GREEKGOD>, display it in red and use a specific font face and size. I will cover XSL in more detail later in this work.
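The first step in creating a well-formed XML document is writing the XML declaration. It states which version of XML the document uses and, when present, must be the very first thing in the file. Without it a browser might not recognize the page as XML.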
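Below are the XML declarations that can be used. Note that they must be written in lowercase -
| <?xml version="1.0" standalone="yes"?> | This declaration is for well-formed XML that does not rely on an external DTD. |
| <?xml version="1.0" standalone="no"?> | This declaration is for use when writing valid XML that relies on an external DTD. |
| <?xml version="1.0" encoding="UTF-8" standalone="no"?> | This declaration spells out the defaults: UTF-8 encoding, and standalone="no" for a document with an external DTD. |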
Next you create a root element that will contain your entire document. Every XML document must have exactly one. The root element in HTML is the <HTML></HTML> element.
Decide on a topic for your document and choose a root element that describes your document in the broadest general terms. For example, if you are putting a novel on the web, a good root element would be <NOVEL> </NOVEL>.
Now, below is what we have created so far -
|
<?xml version="1.0" standalone="yes"?>
<NOVEL>
<TITLE> </TITLE>
</NOVEL>
|
Now comes the fun part - writing the page. Write your document, creating your own custom tags as you go. Each time that your content changes significantly from previous content make up a new tag.
|
<?xml version="1.0" standalone="yes"?>
<NOVEL>
<TITLE> </TITLE>
<CONTENTS>Table of Contents</CONTENTS>
<CHAPTER>One</CHAPTER>
<PAGE>one</PAGE>
<CHAPTER>Two</CHAPTER>
</NOVEL>
|
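To see why descriptive tags pay off, here is a short sketch in Python. The language is my choice for illustration and not something XML requires. It uses the standard library's xml.etree.ElementTree module to read the novel document and pick out the chapters by tag name. The names NOVEL_XML, root and chapters are made up for this example.

```python
import xml.etree.ElementTree as ET

# The novel document from above, stored as a string for the example.
# XML 1.0 parsers expect the declaration to be written in lowercase.
NOVEL_XML = """<?xml version="1.0" standalone="yes"?>
<NOVEL>
<TITLE> </TITLE>
<CONTENTS>Table of Contents</CONTENTS>
<CHAPTER>One</CHAPTER>
<PAGE>one</PAGE>
<CHAPTER>Two</CHAPTER>
</NOVEL>"""

# Parse the text into a tree; root is the <NOVEL> element.
root = ET.fromstring(NOVEL_XML)

# Because the tags describe the content, chapters can be found by name.
chapters = [chapter.text for chapter in root.findall("CHAPTER")]

print(root.tag)
print(chapters)
```

A program reading HTML could only learn how the text should look; here it can ask for the chapters directly.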
Parsing your document is simply checking it for errors. The rules for XML are strict in one respect: every element must be closed. HTML allows some closing tags to be left out, and writes empty elements as a lone opening tag, but XML does not.
For example - the following is valid HTML
<P>
or
<IMG SRC="">
But it is not acceptable for XML. The following would be the requirements for XML
<P> </P>
<IMG SRC=""/>
For an empty tag, such as <IMG> or <BR>, you can place a slash at the end of the tag and this will close it. All other tags, such as <P>, must have a separate closing tag.
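The closing-tag rule is easy to check for yourself. Below is a minimal sketch, assuming Python and its standard xml.etree.ElementTree parser; the helper name is_well_formed is my own and not part of any library.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if the XML parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        # The parser stops at the first well-formedness error it meets.
        return False


# The HTML-style and XML-style samples from the text above.
for sample in ['<P>', '<IMG SRC="">', '<P> </P>', '<IMG SRC=""/>']:
    print(sample, is_well_formed(sample))
```

The first two samples are rejected because the element is never closed; the last two are accepted.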
The first step in writing valid XML is the XML declaration, followed by your Document Type Declaration, which names or contains the Document Type Definition (DTD). In the previous example of well-formed XML we used the standalone="yes" declaration. For valid XML that relies on an external DTD you must use this version -
<?xml version="1.0" standalone="no"?>
This informs the XML processor that declarations outside the document, such as an external DTD, may be needed to process it.
The DTD is the foundation of valid XML and is borrowed from SGML. The DTD contains the information necessary for the document reader to know the structure of your page. In other words, it lets the browser know which tags are allowed, how they can be nested, if at all, and whether multiple instances of a tag are allowed. HTML has DTDs of the same kind; for example, the HTML DTD allows only one occurrence of the <BODY> tag within a document. Valid HTML must also begin with a document type declaration, naming its DTD, as the first line in the file.
Below is the document type declaration for an HTML file -
|
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN">
<HTML>
|
Before you begin writing a DTD you must have a clear idea of what you are going to be writing about and the type of document you are going to create. Below is a sample XML DTD that I have borrowed -
<?xml version="1.0" standalone="no"?>
<!DOCTYPE MEMO [
<!ELEMENT MEMO (TO,FROM,SUBJECT,BODY,SIGN)>
<!ELEMENT TO (#PCDATA)>
<!ELEMENT FROM (#PCDATA)>
<!ELEMENT SUBJECT (#PCDATA)>
<!ELEMENT BODY (P+)>
<!ELEMENT P (#PCDATA)>
<!ELEMENT SIGN (#PCDATA)>
]>
| Statement | Description |
| <!DOCTYPE | The document type declaration states which document type a document conforms to. |
| [ | This symbol marks the start of the DTD declarations written inside the document. |
| <!ELEMENT | An element declaration names an element and states which other elements or text it may contain, so that document readers understand how documents conforming to the DTD are structured. |
| +, *, ?, and "|" | These symbols control how elements may be used. "+" means required and multiple: the element must be used at least once and may be repeated without limit. "*" means optional and multiple. "?" means optional but singular. "|" means OR: only one of a set of options can be used. |
| #PCDATA | This statement, to a document reader, means text. If a DTD designer wants text to be allowed in an element, "#PCDATA" is used in the DTD to state that. |
| ]> | This symbol marks the end of the document type definition. |
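The occurrence symbols behave much like the same symbols in regular expressions, and the comma in a declaration such as (TO,FROM,SUBJECT,BODY,SIGN) means "in this order". The sketch below, in Python, leans on that similarity to check the children of an element against the MEMO declarations. It is only an illustration under stated assumptions: CONTENT_MODELS, model_to_regex and children_match are names I made up, the code ignores #PCDATA, and a real validating parser does a great deal more.

```python
import re
import xml.etree.ElementTree as ET

# Content models copied by hand from the MEMO DTD above.
# Elements declared as (#PCDATA) are left out of this sketch.
CONTENT_MODELS = {
    "MEMO": "(TO,FROM,SUBJECT,BODY,SIGN)",
    "BODY": "(P+)",
}


def model_to_regex(model):
    """Turn a simple DTD content model into a regular expression."""
    compact = re.sub(r"\s+", "", model)
    # Each element name becomes a group that matches "NAME ".
    with_names = re.sub(
        r"[A-Za-z_][\w.-]*",
        lambda m: "(?:" + re.escape(m.group(0)) + " )",
        compact,
    )
    # "," means sequence, which is plain concatenation in a regex.
    # +, *, ?, | and parentheses keep their meaning.
    return re.compile(with_names.replace(",", ""))


def children_match(element, model):
    """Check the child tags of element, in order, against the model."""
    names = "".join(child.tag + " " for child in element)
    return model_to_regex(model).fullmatch(names) is not None


memo = ET.fromstring(
    "<MEMO><TO>Staff</TO><FROM>Editor</FROM><SUBJECT>XML</SUBJECT>"
    "<BODY><P>First.</P><P>Second.</P></BODY><SIGN>H.G.</SIGN></MEMO>"
)
print(children_match(memo, CONTENT_MODELS["MEMO"]))
print(children_match(memo.find("BODY"), CONTENT_MODELS["BODY"]))
```

Swapping <TO> and <FROM>, or leaving <BODY> empty, makes the matching check fail, which is the kind of error a validating parser reports.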
When you've reached this point the actual writing of the XML is simple. The DTD is the hardest part to write, but once you've got that the rest is easy. Just make sure that you follow the rules that you've set out in your DTD and everything should work out fine. If you find that the DTD is too strict or, conversely, not detailed enough, just edit it to suit your needs.
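As an illustration of following the DTD's rules, here is a small sketch in Python that assembles a memo in the order the MEMO DTD demands. The function build_memo and its parameter names are hypothetical, and this is only one of many ways to produce such a document.

```python
import xml.etree.ElementTree as ET


def build_memo(to, sender, subject, paragraphs, signature):
    """Build a MEMO element in the order the DTD requires."""
    if not paragraphs:
        # BODY is declared as (P+), so at least one paragraph is needed.
        raise ValueError("a memo needs at least one paragraph")
    memo = ET.Element("MEMO")
    ET.SubElement(memo, "TO").text = to
    ET.SubElement(memo, "FROM").text = sender
    ET.SubElement(memo, "SUBJECT").text = subject
    body = ET.SubElement(memo, "BODY")
    for text in paragraphs:
        ET.SubElement(body, "P").text = text
    ET.SubElement(memo, "SIGN").text = signature
    return memo


memo = build_memo("Staff", "Editor", "XML", ["First.", "Second."], "H.G.")
print(ET.tostring(memo, encoding="unicode"))
```

Putting the ordering rules in one function means every memo the program writes starts out consistent with the DTD.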
Written by Harald Gill ©1998