Get ready for the next big thing: XML
Most Web developers know Extensible Markup Language looms large in their future, but few know whether it will be next month or next year.
www.iso.ch/cate/d16387.html
Flexible format
| An XML Glossary |
| Attribute: A property of an element that can be assigned a value. In HTML, hyperlink targets and embedded image sources are set through attributes. |
| CDF: Channel Definition Format, a push technology based on XML. |
| DTD: Document Type Definition, a set of rules governing the tags in an XML document, set at the top of the document. |
| DSSSL: Document Style Semantics and Specification Language, an SGML style standard. |
| Element: The key word that starts a declaration of element type. |
| Entity: A phrase or character that represents text or data stored elsewhere. |
| Parser: A program that checks an XML document to ensure it is valid. |
| Style sheets: Can be associated with an XML document to control information display. |
| Well-formed: An XML document whose open and close tags match and are nested correctly, and whose entities and attributes are properly declared. |
| XLL: Extensible Linking Language, the linking standard for XML. |
| XSL: Extensible Style Language, the style standard for XML. |
The main page for the W3C's XML efforts is at www.w3.org/XML/Activity. XML only recently became a W3C recommendation and is not yet an official standard.
Internet Explorer 5.0 so far is the only browser that understands XML elements, based on the draft specification. The parts Microsoft adopted for Explorer will likely be part of the official XML standard. Netscape Communications Corp. is taking a wait-and-see attitude and likely will not release an XML-ready Communicator 5 until the specification becomes official.
Here's how XML makes documents readable by users and by browsers and other software programs. Say you want to create a document about some machine parts stored in a warehouse. In the HTML world, you would start with a document that looked something like this:
Machine Parts
Left-handed widgets
Then you would add more lines of description to produce a basic Web page. If your colleagues later wanted to put the information into a report or add it to a database, they would gather the page, strip out the font and alignment tags, and then reformat the information.
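The stripping step described above can be sketched in a few lines of Python. This is only an illustration: the HTML snippet, including its heading and font tags, is a hypothetical stand-in for the kind of page discussed here, and the class and function names are invented for the example.

```python
from html.parser import HTMLParser

# Hypothetical page; the tag and attribute choices are illustrative only.
HTML_PAGE = """
<h1 align="center">Machine Parts</h1>
<p><font size="4">Left-handed widgets</font></p>
"""


class TextOnly(HTMLParser):
    """Collects the text between tags and throws the tags away."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:  # skip the whitespace between tags
            self.chunks.append(text)


def strip_presentation(html):
    """Return the bare text chunks of an HTML page, in document order."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return parser.chunks


if __name__ == "__main__":
    print(strip_presentation(HTML_PAGE))
```

The text survives, but nothing in the result says which chunk is a page title and which is a part name. Someone has to supply that meaning again by hand.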
Now here's how the same document might look in XML:
In XML, tags can be invented to describe data types. Anyone searching for an occurrence of left-handed widgets within a recognized tag could find it directly.
Once tags are generally recognized, software can deal with them automatically. It's simple to tell a program to look at a specific directory and pull in the contents of a given tag from the documents stored there.
If every bit of information is properly tagged, you can pull all of it into a database. At that point you no longer need to maintain the original document, just the database.
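Here is a minimal sketch of that idea in Python, using the standard library's XML parser and an in-memory SQLite database. The tag names (parts, part, name, bin, quantity), the sample values and the function names are all assumptions made up for the illustration, not part of any recognized tag set.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical parts document; every tag name and value here is invented.
PARTS_XML = """
<parts>
  <part>
    <name>Left-handed widget</name>
    <bin>A-12</bin>
    <quantity>40</quantity>
  </part>
  <part>
    <name>Right-handed widget</name>
    <bin>A-13</bin>
    <quantity>15</quantity>
  </part>
</parts>
"""


def load_parts(xml_text, conn):
    """Copy every <part> element into a database table, one row per part."""
    conn.execute("CREATE TABLE parts (name TEXT, bin TEXT, quantity INTEGER)")
    root = ET.fromstring(xml_text.strip())
    for part in root.findall("part"):
        conn.execute(
            "INSERT INTO parts VALUES (?, ?, ?)",
            (
                part.findtext("name"),
                part.findtext("bin"),
                int(part.findtext("quantity")),
            ),
        )
    conn.commit()


def find_bins(conn, name):
    """Return the warehouse bins that hold the named part."""
    rows = conn.execute("SELECT bin FROM parts WHERE name = ?", (name,))
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_parts(PARTS_XML, conn)
    print(find_bins(conn, "Left-handed widget"))
```

Because each value sits inside a tag that names it, the loader needs no guesswork about which text is a part name and which is a quantity.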
Get together
But you cannot keep adding new tag names, especially if you share data with other offices. How would they know what your tags meant? That's why groups have gotten together to develop standardized tag sets.
Given the appropriate tags, you can stack an enormous amount of data into an XML document. Anything becomes a data field just by tagging it, including the document itself. Take a look at this XML document:
It looks like HTML, but it has no presentation data. That comes from another source, such as a style sheet. It's like having your word processor import addresses or names via mail merge rather than typing and formatting them directly in the document. Many systems merge XML and style data back into a presentational language such as HTML for easier reading.
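The merge step can be sketched roughly as follows. This is not XSL; it is a plain Python stand-in in which a small dictionary plays the role of the style sheet. The tag names, the STYLE mapping and the render_html function are hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical data document with no presentation information in it.
DATA_XML = (
    "<parts>"
    "<title>Machine Parts</title>"
    "<part>Left-handed widgets</part>"
    "</parts>"
)

# Hypothetical "style sheet": maps each data tag to an HTML presentation tag.
STYLE = {"title": "h1", "part": "p"}


def render_html(xml_text, style):
    """Merge XML data with a style mapping to produce HTML for reading."""
    root = ET.fromstring(xml_text)
    lines = []
    for child in root:
        html_tag = style.get(child.tag, "div")  # unknown tags fall back to div
        lines.append(f"<{html_tag}>{escape(child.text or '')}</{html_tag}>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_html(DATA_XML, STYLE))
```

Changing the look of every page then means editing the mapping, not the data.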
The downside of XML's flexibility is that it is less forgiving than HTML. Browsers ignore HTML commands they fail to understand. If items aren't properly nested, it's no big deal to the browsers. But in XML, an improperly formatted file creates a fatal error. Applications will refuse to process the file.
That means a document must be what XML experts call well-formed to work right. It has to be ready for a computer program to read, and thus ready to be used in multiple ways for network delivery.
In a well-formed document:
- All begin tags and end tags match up.
- Empty tags use the special XML syntax, with a slash before the closing angle bracket.
- All the attribute values are properly quoted.
- All the entities, or reusable data chunks, are declared.
Checking for code errors across thousands of documents is tough, so XML users turn to automated tools such as the Lark parser. An online demo of Lark at xml.com/xml/pub/tools/ruwf/check.html can check whether your document is well-formed.
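Any XML parser can run the same kind of check. The sketch below uses the parser in Python's standard library, not Lark, and the is_well_formed function and sample documents are made up for the illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return (True, "") if the text parses, else (False, parser message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return False, str(err)
    return True, ""


# Hypothetical sample documents, one per rule in the list above.
SAMPLES = {
    "good": '<part id="7"><name>widget</name><instock/></part>',
    "mismatched tags": "<part><name>widget</part></name>",
    "unquoted attribute": "<part id=7/>",
    "undeclared entity": "<part>&maker;</part>",
}

if __name__ == "__main__":
    for label, text in SAMPLES.items():
        ok, message = is_well_formed(text)
        print(label, "->", "well-formed" if ok else "error: " + message)
```

The parser stops at the first problem it finds and reports where it occurred, which is the fatal-error behavior described earlier.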
XML designers recognized that document authors sometimes omit important information or include extraneous text. The document type definition, or DTD, makes sure that XML coding will do what was intended.
For the parts file above, a DTD might work like this:
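- A document type declaration with bracketed text. The text entered inside the brackets would represent the DTD for the document, and the declaration names the document's root element.
- A declaration for the root element. The root element contains all other elements.
- A declaration for an empty element. This simply says to expect a standalone tag.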
. This defines the
An XML document can have an internal or an external DTD. It must be external if the DTD applies to multiple XML files.
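The elements can get more complex. For example, in a declaration that gives an "author" element the content type #PCDATA, the #PCDATA term stands for parsed character data: plain text that the parser reads, as opposed to binary information such as an image. An "author" element declared this way would hold the author's name as text.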
The DTD checks to confirm that items within the tags follow its rules. For details about how DTDs are constructed, visit www.w3.org/TR/REC-xml#dt-doctype. But an XML document need not have a DTD to function. If the document is well-formed, it requires no special rules to tell a browser or other device how to read it.
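The sketch below shows both points with the non-validating parser in Python's standard library. The internal DTD, the element names, the maker entity and the part_texts function are all hypothetical. A validating parser, which this one is not, would reject the second document.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with an internal DTD at the top.
DOC = """<!DOCTYPE parts [
  <!ELEMENT parts (part+)>
  <!ELEMENT part (#PCDATA)>
  <!ENTITY maker "Acme Widget Co.">
]>
<parts>
  <part>Left-handed widget from &maker;</part>
</parts>"""

# Well-formed, but breaks the DTD's rules: <crate> is never declared.
NONCONFORMING = """<!DOCTYPE parts [
  <!ELEMENT parts (part+)>
  <!ELEMENT part (#PCDATA)>
]>
<parts><crate>not declared in the DTD</crate></parts>"""


def part_texts(xml_text):
    """Return the text of every <part> element under the root."""
    root = ET.fromstring(xml_text)
    return [part.text for part in root.findall("part")]


if __name__ == "__main__":
    # The entity declared in the DTD is replaced with its text.
    print(part_texts(DOC))
    # This parser does not validate, so the well-formed file still loads.
    print(ET.fromstring(NONCONFORMING).tag)
```

Enforcing the DTD's rules takes a validating parser; a non-validating one only insists that the document be well-formed.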
Any XML parser can tell whether a document is well-formed; a validating parser also checks the document against its DTD. For a quick, simple check, save a document with an .xml extension, then view it in Internet Explorer 5.0, which will show whether anything is incomplete.
The key to writing successful XML is to do a great deal of advance planning. Decide how documents will be stored and served, how databases will be accessed, what tag sets will be used, and how they will nest so that the resulting documents are not only well-formed but also make sense to readers.
Decide whether you will need a DTD. If so, should it be internal or external, and how should it be structured? Don't worry about style sheets until you have everything else in place.
Above all, learn what others in your agency are doing about tag set creation. Because the government shares so much information, it needs a governmentwide tag set.
Then set up some experiments with a few dozen documents. Check the resources in this article to get started.
You can read the full XML specification at www.w3.org/TR/REC-xml.




