1 - Introduction
Published online by Cambridge University Press: 05 July 2014
Summary
Documents, schemas, and schema languages
The data format known as extensible mark-up language (XML) describes tree structures using marked-up text. The tree structures are formed by inserting start (open) and end tags between text fragments; the tags must be balanced, like parentheses. A data set obtained in this way is often called a document. On the surface, XML resembles hypertext mark-up language (HTML), the most popular display format for the Web. The essential difference, however, is that in XML the structure permitted for documents, including the set of tag names and their usage conventions, is not fixed a priori.
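To make the tag-balancing idea concrete, the following Python sketch builds a tree from a document by matching each end tag against the most recently opened start tag, just as one matches parentheses with a stack. It handles only a toy subset of XML assumed for illustration (start tags of the form <name>, end tags of the form </name>, and text); attributes, empty-element tags, comments, and entity references are deliberately omitted, and the names Node and parse are hypothetical. A real application would use a conforming parser such as Python's xml.etree.ElementTree.

```python
import re
from dataclasses import dataclass, field

# Toy tag syntax assumed for this sketch: <name> or </name>, no attributes.
TOKEN = re.compile(r"<(/?)([A-Za-z_][\w.-]*)>|([^<]+)")


@dataclass
class Node:
    tag: str
    children: list = field(default_factory=list)  # child Nodes and text strings


def parse(text: str) -> Node:
    """Build a tree from balanced start/end tags using a stack."""
    top = Node("#top")  # sentinel parent for everything outside the root element
    stack = [top]
    pos = 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"malformed mark-up at offset {pos}")
        closing, name, chars = m.groups()
        if chars is not None:
            stack[-1].children.append(chars)  # text fragment between tags
        elif not closing:
            node = Node(name)
            stack[-1].children.append(node)
            stack.append(node)  # start tag: descend into a new node
        else:
            if stack[-1] is top or stack[-1].tag != name:
                raise ValueError(f"unbalanced end tag </{name}>")
            stack.pop()  # end tag: return to the parent
        pos = m.end()
    if stack[-1] is not top:
        raise ValueError(f"unclosed start tag <{stack[-1].tag}>")
    roots = [c for c in top.children if isinstance(c, Node)]
    stray = [c for c in top.children if isinstance(c, str) and c.strip()]
    if len(roots) != 1 or stray:
        raise ValueError("a document must consist of exactly one root element")
    return roots[0]


if __name__ == "__main__":
    doc = parse("<person><name>Taro</name><tel>123</tel></person>")
    print(doc.tag, [c.tag for c in doc.children])  # person ['name', 'tel']
```

An input such as <a><b></a></b> is rejected, because the end tag </a> arrives while <b> is still the innermost open tag, exactly as the parentheses in "( [ ) ]" fail to balance.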
More precisely, XML allows users to define their own schemas; a schema determines the permitted structure of a document. In this sense, it is often said that a schema defines a “subset of XML” and thus XML is a “format for data formats.” With the support of schemas, each individual application can define its own data format, while virtually all applications can share generic software tools for manipulating XML documents. This genericity is a prominent strength of XML in comparison with other existing formats. Indeed, XML has been adopted with unprecedented speed and range: an enormous number of XML schemas have been defined and used in practice. To give a few examples, extensible HTML (XHTML) is the XML version of HTML, simple object access protocol (SOAP) is an XML message format for remote procedure calls, scalable vector graphics (SVG) is a vector graphics format in XML, and MathML is an XML format for mathematical formulas.
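A schema can be pictured, in its simplest form, as assigning to each tag name a regular expression that constrains the sequence of its children's tag names. The Python sketch below checks a document against such a schema. The tag names (addrbook, person, name, tel, email), the SCHEMA table, and the functions conforms and validate are hypothetical and chosen only for illustration; parsing uses the standard xml.etree.ElementTree module, and text content is not checked.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical schema: each tag name maps to a regular expression over the
# sequence of its children's tag names, each name followed by one space.
SCHEMA = {
    "addrbook": r"(person )*",
    "person": r"name (tel )*(email )*",
    "name": r"",  # no child elements; text is not checked in this sketch
    "tel": r"",
    "email": r"",
}


def conforms(elem: ET.Element, schema: dict) -> bool:
    """Check one element's children against its content model, then recurse."""
    model = schema.get(elem.tag)
    if model is None:
        return False  # tag name not declared in the schema
    children = "".join(child.tag + " " for child in elem)
    if re.fullmatch(model, children) is None:
        return False
    return all(conforms(child, schema) for child in elem)


def validate(xml_text: str, schema: dict, root: str) -> bool:
    """Parse with the standard library, then check the root name and structure."""
    elem = ET.fromstring(xml_text)
    return elem.tag == root and conforms(elem, schema)


if __name__ == "__main__":
    ok = "<addrbook><person><name>Taro</name><tel>123</tel></person></addrbook>"
    bad = "<addrbook><person><tel>123</tel><name>Taro</name></person></addrbook>"
    print(validate(ok, SCHEMA, "addrbook"))   # True
    print(validate(bad, SCHEMA, "addrbook"))  # False
```

Because each tag name has exactly one content model regardless of where the element occurs, this check corresponds roughly to DTD-style schemas; more expressive schema languages lift that restriction.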
Foundations of XML Processing: The Tree-Automata Approach, pp. 1-8. Publisher: Cambridge University Press. Print publication year: 2010.