XML - SourceForge

XML
eXtensible Markup Language
XML
• A method of defining a format for
exchanging documents and data.
– Allows one to define a dialect of XML
– A library of tags, with associated structure
<config>
<descriptor type="FILE" name="source">
<attribute name="media_type" type="svalue"/>
<attribute name="frame_rate" type="svalue"/>
</descriptor>
</config>
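A minimal sketch of how a program might read the fragment above, using Python's standard xml.etree.ElementTree module. The helper name list_attributes is hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# The configuration fragment from the slide, reproduced as a string.
CONFIG_XML = """\
<config>
  <descriptor type="FILE" name="source">
    <attribute name="media_type" type="svalue"/>
    <attribute name="frame_rate" type="svalue"/>
  </descriptor>
</config>
"""


def list_attributes(xml_text):
    """Return (descriptor name, attribute name, attribute type) triples."""
    root = ET.fromstring(xml_text)  # root is the <config> element
    rows = []
    for descriptor in root.findall("descriptor"):
        for attribute in descriptor.findall("attribute"):
            rows.append(
                (descriptor.get("name"), attribute.get("name"), attribute.get("type"))
            )
    return rows


if __name__ == "__main__":
    for row in list_attributes(CONFIG_XML):
        print(row)
```

The tags carry the structure, so the reading code only names the elements and attributes it cares about.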
The Social Benefits
• Can specify an interchange format
concisely and accurately enough to set
up a validation service easily
• There is plenty of available software for
dealing with XML files and translating
from one format into another
Downsides
• Sometimes defining a representation can be a
pain
– Deciding what to leave as content and what to
move to attributes.
– XML Schemas are confusing, while DTDs do not
offer enough control
• Verbose
– ViPER files grew about 2x uncompressed and
about 4/3x when gzip-compressed
• Difficult to read
– Lots of </…> and end tags get in the way of the
data
The Real Benefits to The
Programmer
• XML Schema (or DTDs) lets you validate a
document without writing your own checking code
• XPath allows you to specify a node, or set of
nodes, in a document quickly and easily
• SAX makes it easy to write a quick parser
• DOM makes it so you don’t even have to do
that
• XSLT allows you to transform an XML
document into another document, possibly
not even standard XML
• Etc.
XML As A File Format
• Makes parsing simpler, but there are currently
no comparable tools that make saving easier
• Saves you from dealing with things like
character encoding and date formatting
• No more difficult than making up your own
• An unfamiliar or forgotten XML file offers more
affordances than an ad hoc text or binary file
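A minimal sketch of the encoding and date points, assuming Python's standard xml.etree.ElementTree as the library. The note element, its author and created attributes, and the sample values are all hypothetical. The serializer takes care of escaping and character encoding, while mapping your own data onto elements is still manual work.

```python
import datetime
import xml.etree.ElementTree as ET


def save_note(text, created):
    """Serialize a hypothetical <note> element to UTF-8 bytes."""
    note = ET.Element("note")
    note.set("author", "Zoë")                # non-ASCII attribute value
    note.set("created", created.isoformat())  # ISO 8601, e.g. 2002-01-15
    note.text = text                          # escaping happens on output
    return ET.tostring(note, encoding="utf-8")


def load_note(data):
    """Parse the bytes back into (author, created date, text)."""
    note = ET.fromstring(data)
    created = datetime.date.fromisoformat(note.get("created"))
    return note.get("author"), created, note.text


if __name__ == "__main__":
    data = save_note("5 < 6 & café", datetime.date(2002, 1, 15))
    print(data)
    print(load_note(data))
```

The markup characters in the text are escaped on the way out and restored on the way in, with no hand-written quoting rules.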
Defining A Dialect
• XML Schema – Structure and Data
– Define elements and attributes
– Associate them with data types
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://lamp.cfar.umd.edu/viper"
xmlns:viper="http://lamp.cfar.umd.edu/viper"
elementFormDefault="qualified">
<xsd:element name="viper"/>
<xsd:element name="config"/>
</xsd:schema>
Schema Datatypes
• Can create and assign datatypes to
attributes and elements. For example:
<xsd:element name="data" type="xsd:base64Binary"/>
<xsd:attribute name="span" type="viper:framespanType"/>
<xsd:simpleType name="framespanType">
<xsd:restriction base="xsd:string">
<xsd:pattern value="\d+:\d+" />
</xsd:restriction>
</xsd:simpleType>
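A rough illustration of what the framespanType restriction accepts (digits, a colon, digits), using Python's re module as a stand-in for a schema validator. This is not how a validator is implemented; the function name is_framespan is hypothetical. XML Schema patterns must match the whole value, which re.fullmatch mirrors here.

```python
import re

# Stand-in for the schema pattern: one or more digits, a colon, one or more digits.
FRAMESPAN = re.compile(r"\d+:\d+")


def is_framespan(value):
    """Return True if the whole string looks like 'start:end'."""
    return FRAMESPAN.fullmatch(value) is not None


if __name__ == "__main__":
    for candidate in ["1:10", "1-10", "1:10 ", "a:1"]:
        print(repr(candidate), is_framespan(candidate))
```

With a real validator, the same check comes from the schema alone, so the application code never needs to contain it.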
Schema Structures
• Can specify order and contents of
elements
– Sequence, choice, mixed, etc. allow
specifying how and where elements appear
– Substitution groups allow one tag to take
the place of another
• Can group elements without placing them
into types
Extensibility
• Inheritance
– Can extend complex elements by adding more
attributes and elements to the bottom
– Can restrict the data using the <restriction/>
element
• The <any/> and <anyAttribute/> elements
– The ultimate in extensibility, allow any valid XML
in from a given namespace or range of namespaces
Parsing
• Using the DOM:
– The DOM provides a tree structure that
represents the document
– Memory heavy
• Using SAX:
– Event driven
– Lightweight
– Better for large documents
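The two styles can be compared side by side. This is a minimal sketch using Python's standard xml.dom.minidom and xml.sax modules; the sample document, the OBJECT descriptor, and the function names are assumptions made for illustration.

```python
import xml.dom.minidom
import xml.sax

# Hypothetical sample document, modeled on the config fragment shown earlier.
DOC = """<viper><config>
<descriptor type="FILE" name="source"/>
<descriptor type="OBJECT" name="person"/>
</config></viper>"""


def names_with_dom(text):
    """DOM: build the whole tree in memory, then query it."""
    document = xml.dom.minidom.parseString(text)
    return [
        node.getAttribute("name")
        for node in document.getElementsByTagName("descriptor")
    ]


class DescriptorHandler(xml.sax.ContentHandler):
    """SAX: react to start-tag events as they arrive; no tree is kept."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        if name == "descriptor":
            self.names.append(attrs.get("name"))


def names_with_sax(text):
    handler = DescriptorHandler()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.names


if __name__ == "__main__":
    print(names_with_dom(DOC))
    print(names_with_sax(DOC))
```

Both functions return the same names. The DOM version holds the full document in memory, while the SAX version keeps only the list it is building.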
XPath
• The common language for selecting individual
pieces of an XML document, shared between
XPointer (used by XLink) and XSLT
– Also used for defining uniqueness constraints in
Schemas
– DOM Level 3 will support selecting by XPath
• Looks sort of like a JavaScript DOM call:
– /viper/config/descriptor[@type="FILE"]
• Selects every descriptor node whose type
attribute is "FILE"
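A minimal sketch of the same selection in Python. The standard xml.etree.ElementTree module supports only a subset of XPath, and its paths are relative to the element they are called on, so the leading /viper step is dropped here. The sample document is hypothetical and leaves out namespaces for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; real files may use a namespace.
DOC = """<viper><config>
<descriptor type="FILE" name="source"/>
<descriptor type="OBJECT" name="person"/>
</config></viper>"""


def file_descriptors(text):
    """Return the descriptor elements whose type attribute is FILE."""
    root = ET.fromstring(text)  # root is the <viper> element
    return root.findall("./config/descriptor[@type='FILE']")


if __name__ == "__main__":
    for element in file_descriptors(DOC):
        print(element.get("name"))
```

A full XPath engine would accept the absolute path exactly as written on the slide.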
Resources
• www.xml.com
– O'Reilly's XML resource
• www.w3.org
– The standards themselves, and lots of good
links to implementations.
• xml.apache.org
– DOM, SAX, and XSLT for C++ and Java