What about XML schema

September 8, 2007

What is a DTD?

The purpose of a Document Type Definition (DTD) is to define the legal building blocks of any SGML-based (SGML = Standard Generalized Markup Language) document. It defines the document structure with a list of legal elements.
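
As a rough illustration, the sketch below embeds a small DTD in a document and uses Python's standard-library expat bindings to list the element types that the DTD declares. The `note` vocabulary and the `declared_elements` helper are invented for this example. Note that expat only reads the declarations; it does not validate the document against them.

```python
import xml.parsers.expat

# Hypothetical document: the DTD sits in the internal subset of the DOCTYPE.
DOCUMENT = """<!DOCTYPE note [
  <!ELEMENT note (to, body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<note><to>Alice</to><body>Hello</body></note>"""


def declared_elements(xml_text):
    """Return the element names declared in the document's DTD, in order."""
    names = []
    parser = xml.parsers.expat.ParserCreate()
    # expat calls this handler once per <!ELEMENT ...> declaration.
    parser.ElementDeclHandler = lambda name, model: names.append(name)
    parser.Parse(xml_text, True)
    return names


print(declared_elements(DOCUMENT))
```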

DTDs have been used since the 1970s.

What is a schema?

A schema (plural: schemata) is “a diagrammatic representation; an outline or model.”

Something that formally describes the abstract structure of a set of data can therefore be called a schema.

An XML-Schema is a document that describes the valid format of an XML data set. This definition includes which elements are (and are not) allowed at any point, what the attributes of any element may be, how many times an element may occur, and so on.
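
To make that concrete, here is a minimal sketch of a schema and a few lines of Python (standard library only) that read the rules back out of it. The `note` vocabulary and the helper names `child_rules` and `attribute_names` are assumptions made up for this example. The code only inspects the schema as an ordinary XML document; it does not validate anything.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical schema: a <note> holds one <to>, one or more <body>
# elements, and an optional "lang" attribute.
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="note">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="to" type="xs:string"/>
        <xs:element name="body" type="xs:string" maxOccurs="unbounded"/>
      </xs:sequence>
      <xs:attribute name="lang" type="xs:string" use="optional"/>
    </xs:complexType>
  </xs:element>
</xs:schema>"""


def child_rules(schema_text):
    """Return (name, minOccurs, maxOccurs) for each element in a sequence."""
    root = ET.fromstring(schema_text)
    rules = []
    for sequence in root.iter(f"{{{XS}}}sequence"):
        for element in sequence.findall(f"{{{XS}}}element"):
            # For these local declarations both limits default to 1.
            rules.append((element.get("name"),
                          element.get("minOccurs", "1"),
                          element.get("maxOccurs", "1")))
    return rules


def attribute_names(schema_text):
    """Return the names of all attributes declared in the schema."""
    root = ET.fromstring(schema_text)
    return [attr.get("name") for attr in root.iter(f"{{{XS}}}attribute")]


print(child_rules(SCHEMA))
print(attribute_names(SCHEMA))
```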

Note: XML-Schemas are not known for their brevity. An XML-Schema document for a reasonably sized XML instance document will be fairly large. Disk space is cheap and bandwidth is rarely a bottleneck, so the size itself is nothing to worry about.
It does mean that you will do a lot of typing, though.

Why use a DTD or Schema?

The majority of XML documents are “well-formed” rather than “valid”. A well-formed document has exactly one root element, every sub-element (at any depth) has delimiting start- and end-tags, and the elements are properly nested within each other. A valid document, on the other hand, is well-formed and also conforms to a specified set of production rules.
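
The well-formedness half of that distinction can be checked with any XML parser. A minimal sketch using Python's standard library follows; the `is_well_formed` helper is a name chosen for this example, and it says nothing about validity, which needs a DTD or schema to check against.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed("<a><b>text</b></a>"))  # properly nested
print(is_well_formed("<a><b>text</a></b>"))  # start- and end-tags overlap
print(is_well_formed("<a/><b/>"))            # more than one root element
```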

To validate an XML document, some form of validation rules needs to be provided. These rules can be supplied by a DTD (referenced through a Document Type Declaration) or by a schema.

Why schemas instead of DTDs?

An XML-Schema sounds very much like a DTD; however, there are some critical differences, the most notable being that XML-Schema can deal with namespaces and DTDs can’t (see the sidebar at http://www-106.ibm.com/developerworks/xml/library/xml-schema/#sidebar1 for some of the limitations of a DTD).

Namespaces

Since the main reason for using a schema instead of a DTD is the ability to mix namespaces, and XML-Schemas themselves depend heavily on namespaces, we need to go over them first.

Question: What is a namespace?
Answer: From the W3C web site:
We envision applications of Extensible Markup Language (XML) where a single XML document may contain elements and attributes (here referred to as a “markup vocabulary”) that are defined for and used by multiple software modules. One motivation for this is modularity; if such a markup vocabulary exists which is well-understood and for which there is useful software available, it is better to re-use this markup rather than re-invent it.
Such documents, containing multiple markup vocabularies, pose problems of recognition and collision. Software modules need to be able to recognize the tags and attributes which they are designed to process, even in the face of “collisions” occurring when markup intended for some other software package uses the same element type or attribute name.
These considerations require that document constructs should have universal names, whose scope extends beyond their containing document. This specification describes a mechanism, XML namespaces, which accomplishes this.
[Definition:] An XML namespace is a collection of names, identified by a URI reference [RFC2396], which are used in XML documents as element types and attribute names. XML namespaces differ from the “namespaces” conventionally used in computing disciplines in that the XML version has internal structure and is not, mathematically speaking, a set. These issues are discussed in “A. The Internal Structure of XML Namespaces”.

What this means, basically, is that the validating rules for some elements are defined in one place, and the rules for others are defined somewhere else.
For example, HTML (and XHTML) is defined in one single place, by the W3C. A vocabulary like this can be defined with a DTD.

RDF (Resource Description Framework), on the other hand, is specifically designed to be a framework for various parties to share data using a common set of XML elements. In the bibliographic world, there is another vocabulary (called the Dublin Core) which is often used in conjunction with RDF. This combination is far more complex, with multiple markup vocabularies, so it requires namespaces, which in turn require a schema.
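
As a small sketch of what such a mixed document looks like, the record below combines RDF and Dublin Core elements, and a few lines of Python (standard library only) group the element names by the namespace they belong to. The two namespace URIs are the published ones for RDF and the Dublin Core element set; the example.org resource, the title and creator values, and the helper names are invented for illustration.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"

# Hypothetical record: the resource URL and the values are made up.
RECORD = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://example.org/book/1">
    <dc:title>An Example Book</dc:title>
    <dc:creator>A. N. Author</dc:creator>
  </rdf:Description>
</rdf:RDF>"""


def split_tag(tag):
    """Split ElementTree's '{uri}local' form into (uri, local).

    Assumes the tag is namespaced, which holds for every element in RECORD.
    """
    uri, _, local = tag[1:].partition("}")
    return uri, local


def names_by_namespace(xml_text):
    """Group local element names by namespace URI, in document order."""
    groups = {}
    for node in ET.fromstring(xml_text).iter():
        uri, local = split_tag(node.tag)
        groups.setdefault(uri, []).append(local)
    return groups


for uri, names in names_by_namespace(RECORD).items():
    print(uri, names)
```

The prefixes `rdf:` and `dc:` are only shorthand inside the document; the parser identifies each element by its namespace URI plus local name, which is what lets two vocabularies share a document without their names colliding.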
