CBXML: Experience with Binary XML

 

Mike Conner

IBM Corporation

8/8/2003

Abstract

Some time ago we began an investigation into alternative encodings of XML data, with the goals of preserving the advantages of XML’s standard character-based encoding while reducing document size and improving document processing speed. This paper provides an overview of the encoding that emerged from this investigation, which we have called CBXML (Compact Binary XML). The rationale for, and benefits of, CBXML are also given. In addition, some of the alternatives considered, and why they were rejected, are discussed.

Table of Contents

Overview

Basic Concepts of CBXML

Compression and Performance

CBXML Design Issues

Character Encodings

Coverage of XML Features

Capture of Information not in XML Infoset

C Language Support

Bulk Decoding Support

Idiom CIDs

Binary Values

Inline Attachments

Alternative Encodings

Appendix A: Specification of the CBXML encoding

Non-negative Integers

Strings

Items

Item representations

The Four Parts of a CBXML Document

Appendix B: Analysis of "personal.xml" Conversion to CBXML

Compression Data

Performance Results

Encoding

personal.xml

 

Overview

XML offers many benefits as a portable and widely adopted standard representation of data. It is being successfully employed in a wide range of contexts such as: publishing, program-to-program communication, user interface, and databases. However, two issues are consistently raised with its use.

1.      Its size. When compared to various binary representations of data, the XML representation can be quite a bit bigger: up to 10 times bigger (or even more in extreme cases). This raises issues when XML is transmitted across communication lines, as it results in considerable additional bandwidth requirements.

2.      Processing overhead. When compared to typical scenarios for converting binary data into a form that can be processed by applications, XML can be much more expensive—2 to 10 times more expensive. This stems from the need to parse XML data which involves a fair amount of complexity and a number of cycle-consuming operations such as scanning for delimiters, and processing alternative character representations such as entity expressions.

Compact Binary XML (CBXML) is intended to address these issues by offering an encoding of XML data that preserves almost all of the traditional values associated with XML while greatly reducing both document size and processing requirements.

The basic idea of CBXML is to provide an encoding of XML information that retains all the platform neutrality and self-descriptive benefits of the standard character-oriented encoding of XML while greatly reducing document size and allowing documents to be processed at greater speed with simpler software. The only major aspect of XML that is compromised is the ability for humans to directly read or produce CBXML documents, or for documents to be directly processed by text-based tools. However, a very simple program can convert a CBXML encoding of XML information into a standard XML encoding and from a standard XML encoding to a CBXML encoding.

Some of the design goals of CBXML are:

·        Retain all the platform neutrality, self-descriptive and loose coupling aspects of the standard XML representation.

·        Provide an encoding that can represent all of the information in a standard XML document and that could be converted back into an XML document equivalent to the original without requiring any additional document-specific information (such as the document's Schema). (Note: this goal was not fully achieved, but full encoding of the information contained in the XML Infoset derived from a well-formed XML document is achieved. It would be straightforward to capture more of the information in an XML document. For example, which type of quote is used on an attribute value could be captured by having two clause ids (CIDs) for attributes, one for single quote and one for double quote.)

·        Be very space efficient with respect to representing the non-content aspects of XML, thus reducing the overhead of XML.

·        Support very rapid parsing (decoding) with minimal complexity.

·        Support a sequential processing model where it is not necessary to parse (decode) the entire document in order to decode an initial fragment of the document or to have knowledge of the entire document before starting to produce a CBXML encoding of the document.

Basic Concepts of CBXML

The document is broken down into a sequence of tokens. Each token is either an integer id or length; a string; or a typed clause consisting of a known sequence of integer, string, or clause tokens.

Integer ids and lengths are encoded as a variable number of bytes (7 bits of value per byte). The id space is partitioned (e.g., element names are separated from attribute names) so that most ids are less than 128 and therefore fit in a single byte. Most lengths also fit in a single byte.

Strings are encoded as byte sequences preceded by a length count. The byte sequences may follow any of several character encoding standards such as UTF-8 or UTF-16. Using a length count to delimit strings means that no characters are reserved as delimiters and therefore escapes (such as "&lt;") are not needed. It also allows very rapid scanning of CBXML data streams because character sequences can be “jumped” over without having to scan for delimiters. This might prove to be very valuable for sparse processing of XML data.
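The "jump over" property can be sketched in a few lines of Python. This is only an illustration, assuming the 7-bits-per-byte integer format described in Appendix A; the function names read_uint and skip_string are hypothetical and not part of any CBXML implementation.

```python
from typing import Tuple


def read_uint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Read a variable-length integer: 7 value bits per byte, high bit set on all but the last byte."""
    value = 0
    while True:
        byte = buf[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:
            return value, pos


def skip_string(buf: bytes, pos: int) -> int:
    """Return the offset just past a length-prefixed string without inspecting its bytes."""
    length, pos = read_uint(buf, pos)
    return pos + length


# Two back-to-back strings; the first contains '<' and '&' with no escaping.
data = bytes([5]) + b"a<b&c" + bytes([2]) + b"ok"
second = skip_string(data, 0)            # jump over the first string
length, start = read_uint(data, second)  # then read the second one
print(data[start:start + length].decode("utf-8"))  # prints "ok"
```

The first string is never scanned, so its cost is independent of its length and content.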

Strings are not usually repeated. After a string's first occurrence it may be inserted into the document’s representation by reference to its first occurrence. This means that an element name that occurs 100 times in a text XML document will occur once, followed by 99 references. This also means that it only has to be converted from a byte sequence to a UNICODE character array once.

Each type of XML information is represented by an explicit format so that it is not necessary to "parse" the CBXML document. For example, the start of an element section (something like '<element attrName="value">') has a unique id and then is followed by the element's name followed by a list of attributes. Lists are always preceded by a count which may be zero.

The main space savings in CBXML come from not repeating strings and from representing lexical boundaries efficiently. The main processing performance improvements come from eliminating the need for complex lexical analysis and from reduced object creation, since repeated objects are almost always references back to the original object.

Compression and Performance

Data Compression

Because of its design, CBXML does a good job of compressing XML messages. The amount of compression depends on the XML message; however, a 6X improvement is not unreasonable, as shown in the following figures.

Figure 1: Compression results for short messages.

Figure 2: Compression results for longer messages.

 

The above figures also show the compression achievable by ZIP compression. One benefit of CBXML compression is that it comes for free (actually it reduces processing costs). With ZIP, the XML message must be converted to ZIP format, and then decompressed at the other end.  CBXML can be rendered directly and consumed directly, resulting in less processing cost than using the standard XML encoding, as discussed in the next section.

Performance

CBXML impacts only the initial steps of processing an XML message. To measure this impact we compared the time necessary for a Xerces parser to produce XNI events (similar to SAX events) to the time for a CBXML parser to do the same.  This shows the difference for this aspect of XML processing. However, this is only part of the processing that is typically applied to XML documents. As the amount of additional processing increases the relative impact of CBXML will decrease. Measurements of Web Services performance (not included in this paper) show a noticeable speed-up in the full round-trip processing of a Web Services request that includes no business logic.  However, for Web Services requests that involve very significant business logic overhead, the impact of CBXML on the round trip processing time becomes negligible unless bandwidth is constrained. When scenarios with constrained bandwidth are considered CBXML dramatically reduces round-trip processing time due to compression, but so does ZIP compression. The following figures show how CBXML impacts the initial steps of XML processing as discussed above.

 

Figure 3: Parsing Impact from CBXML relative to Xerces J  2.02.  Units indicate the multiplicative factor of improvement over Xerces.

CBXML Design Issues

This is a brief discussion of issues that have arisen in the design of the CBXML encoding or with the software support for processing CBXML.

Character Encodings

All character information in a CBXML document is encoded; however some encodings are more efficient in space and/or time than others for some character data. Therefore CBXML supports different encodings for character data. However only one encoding is used for each document, except that the header portion of documents uses 7-bit ASCII, US-ASCII, a.k.a. ISO646-US, a.k.a. the Basic Latin block of the Unicode character set.

With respect to character encodings we are considering the following issues:

§         Should CBXML be extended to allow multiple character encodings in a single document, by adding "startEncoding" and "endEncoding" clauses?

§         Should this be in addition to the document wide specification in the header or should the header specification be removed? Note that this may have an impact on "Bulk decoding support" discussed below.

Coverage of XML Features


Many of the features of XML are designed to enable hand authored and maintained XML documents. CBXML is primarily focused on program generated and consumed XML such as is used in Web Services and database environments. Therefore, to reduce complexity and processing overhead, CBXML does not support the embedded DTD subset. However, DTD and Schema references are supported. We are considering whether we should add support for the internal DTD subset.

Capture of Information not in XML Infoset

Information such as the type of quotes (single or double) used for attribute values is lost in the current CBXML encoding. It would be simple to retain this type of information in the CBXML encoding by, for example, having two CIDs for attribute values, one for single quotes, and one for double quotes. Other information such as the interleaving of namespace declarations and attribute name-value pairs could also be captured. Doing so would likely increase the processing time for CBXML documents, but not their size.

C Language Support

The C programming language and its libraries work with null terminated strings. A pad character could be added after each new string definition in CBXML so that C-based processing could insert a null and not necessarily have to copy the string from its input buffer.

Bulk Decoding Support

Currently CBXML puts the length in bytes in front of any new string definition. This could be changed to the length in 16-bit characters (UNICODE). If this was combined with a change to the encoding of variable length integers to use base64 encoding (reducing the content of each byte to 5 bits plus a flag bit) then the entire CBXML message could be decoded from, say, UTF-8 in one operation. This would, of course, hamper any optimized processors that work on the CBXML file without converting some or all of the strings.

Idiom CIDs

There are a number of idioms that occur very frequently in XML, for example a simple leaf node such as <tag>value</tag>. These could be given special CIDs. This would both reduce message size (by two bytes in the example given) and reduce processing from three clauses ("begin element", "characters", "end element") to one. However, this would come at the cost of increased complexity in generating CBXML in some environments.
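As a rough illustration of the two-byte saving, the sketch below encodes a leaf element both ways. The idiom clause id 15 and the function names are hypothetical and are not part of the CBXML specification; the sketch also assumes every integer involved is below 128 and so fits in one byte.

```python
START_ELEMENT, CHARACTERS, END_ELEMENT = 2, 1, 5  # CIDs from Appendix A
LEAF_ELEMENT = 15  # hypothetical idiom CID, not defined by CBXML


def leaf_as_three_clauses(name_ref: int, text: bytes) -> bytes:
    """StartElement(name reference), CharacterData(new shareable string), EndElement."""
    return bytes([START_ELEMENT, name_ref, CHARACTERS, 2, len(text)]) + text + bytes([END_ELEMENT])


def leaf_as_idiom(name_ref: int, text: bytes) -> bytes:
    """One hypothetical clause carrying the element name and its character content."""
    return bytes([LEAF_ELEMENT, name_ref, 2, len(text)]) + text


three = leaf_as_three_clauses(5, b"Boss")
one = leaf_as_idiom(5, b"Boss")
print(len(three), len(one))  # prints "10 8"
```

The two bytes saved are the CharacterData and EndElement clause ids, which the single idiom clause makes implicit.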

Binary Values

It would be fairly straightforward to extend CBXML to allow non-character-based value types, e.g., integer or floating point. Some initial analysis indicates that this is not justified for integers. However, no analysis has been done for more complex binary types such as floating point or date. Of course, any use of binary values would introduce portability or interoperability issues in that no binary value encoding is native to every platform. It is my guess that even for documents heavy in, say, floating point numbers, the benefits would be a few percent.

Inline Attachments

There has been a lot of focus on the attachment model for SOAP lately. A lot of this has focused on including attachments in the body of a message in binary (not base64) form. CBXML uses a length-prefixed encoding model for variable length values (which are currently restricted to being characters). This means that it would be simple and non-disruptive to inline attachments. Probably the best way to do this would be to include a new value CID, something like "mime value".

Alternative Encodings

There are other directions for binary XML being considered. On one extreme, only large values such as images could be represented in binary. This would seem to suffer all the negatives of CBXML, such as breaking human readability and being a new XML encoding, but would not yield as much value as full CBXML.

One could also go beyond the information currently encoded in CBXML and include structural navigation data such as offsets for various parts of the logical XML tree structure. This might accelerate such things as partial message processing and provide support for more efficient XPath evaluation. However, this would seem to require breaking the streamability requirement.

On the other extreme are encodings that are schema specific. In these encodings, the tags and other descriptive aspects of XML are removed. A document so encoded can only be understood if it is processed with exactly the same schema used to encode it. This introduces a fragility to XML messages similar to that of DCOM, CORBA, RMI, and other binary content models, which has proven to be an evolution management nightmare for customers.

Appendix A: Specification of the CBXML encoding

The CBXML encoding is based on a straightforward serialization of the XML Information Set (see http://www.w3.org/TR/xml-infoset). Each information item defined in the XML Information Set contains two types of information.

Explicit - this is information that is not covered by other information items. For example, the document information item explicitly provides information like the XML version and the standalone indication.

Contained - this is information that consists of an ordered or unordered list of other information items.

The information items are divided into two categories, those that cannot have contained information, called simple items, and those that can have contained information called compound items. Simple information items are serialized by generating a single encoded item that includes all the information item's explicit information. Compound information items are serialized by generating a start and end encoded item. The start item will include any of the information item's explicit information and all of its contained information, if any, will be generated between its start and its end encoded item.

A CBXML document consists of four parts as follows and in the indicated order.

a specific integer value, 1764953, indicating that this is a CBXML encoded document,

a CBXML encoding version indication,

an extensible header which does not carry any of the document's structure or content, but rather provides information about encoding options, and

a body that represents all of the document's structure and content information in the manner discussed above.

All four parts of a CBXML document are represented by a sequence of two basic types of tokens:

Non-negative integers

Strings

In the header and body these are arranged into a sequence of typed clauses, each of which begins with an identifying integer value (referred to as a clause id, or CID), which determines the type of the clause. Each clause then has a type-specific sequence of arguments. Details for the two types of values and for each of the four parts follow.

Non-negative Integers

Integers are represented by a variable number of bytes with the first byte containing the highest order bits for the integer, the next byte containing the next highest order bits, and so on. Each byte contains 7 bits of the integer's value with the highest order bit of each byte designated as a flag bit. A byte's flag bit is off if and only if the byte is the last byte (lowest order byte) of a variable length byte sequence for a number. Only as many bytes as necessary to represent an integer are used. Thus integers between 0 and 127 are represented in one byte with the flag bit off; integers between 128 and 16,383 are represented in two bytes with the flag bit set in the first byte only, and so on.  Integer values must be between 0 and 2,147,483,647 (2^31 - 1).
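The integer rules above can be sketched as a small encoder and decoder. This is a minimal illustration rather than the reference implementation; the names encode_uint, decode_uint and MAX_UINT are assumptions introduced here.

```python
from typing import Tuple

MAX_UINT = 2**31 - 1  # largest value the specification allows


def encode_uint(value: int) -> bytes:
    """Encode a non-negative integer, highest-order 7-bit group first."""
    if not 0 <= value <= MAX_UINT:
        raise ValueError("value must be between 0 and 2^31 - 1")
    groups = [value & 0x7F]  # last (lowest-order) byte: flag bit off
    value >>= 7
    while value:
        groups.append((value & 0x7F) | 0x80)  # earlier bytes: flag bit on
        value >>= 7
    return bytes(reversed(groups))


def decode_uint(buf: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode one integer starting at pos; return (value, offset of the next token)."""
    value = 0
    while True:
        byte = buf[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:  # flag bit off marks the last byte
            return value, pos


print(encode_uint(127).hex())      # prints "7f"
print(encode_uint(128).hex())      # prints "8100"
print(encode_uint(1764953).hex())  # prints "ebdc59"
```

The last line shows that the CBXML indicator value 1764953 occupies three bytes, which agrees with the byte offsets listed in Appendix B, where the major version starts at offset 3.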

Strings

String representations consist of two parts, in sequence, as follows:

A non-negative integer giving the number of bytes in the string's value representation. Zero is a valid value indicating a non-null string of zero length.

The bytes of the string's value representation in order. These bytes must be interpreted according to the document's character encoding type as defined in the document header.

The byte encodings allowed for strings are given in the table below. Any of these can be used to represent string values, however the choice must be consistent for all strings in the body of the CBXML document and is indicated by a clause in the CBXML document's header. UTF-8 is the default string value representation.

 

US-ASCII - Seven-bit ASCII, a.k.a. ISO646-US, a.k.a. the Basic Latin block of the Unicode character set

ISO-8859-1 - ISO Latin Alphabet No. 1, a.k.a. ISO-LATIN-1

UTF-8 - Eight-bit UCS Transformation Format

UTF-16BE - Sixteen-bit UCS Transformation Format, big-endian byte order

UTF-16LE - Sixteen-bit UCS Transformation Format, little-endian byte order

UTF-16 - Sixteen-bit UCS Transformation Format, byte order identified by an optional byte-order mark
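A string token under these rules might be written and read as in the following sketch. The function names are hypothetical. The sketch relies on Python's codec registry, which accepts all six encoding names listed above; note that the length prefix counts bytes, not characters.

```python
from typing import Tuple


def encode_uint(value: int) -> bytes:
    """7 value bits per byte, highest-order group first, flag bit off on the last byte."""
    groups = [value & 0x7F]
    value >>= 7
    while value:
        groups.append((value & 0x7F) | 0x80)
        value >>= 7
    return bytes(reversed(groups))


def decode_uint(buf: bytes, pos: int) -> Tuple[int, int]:
    value = 0
    while True:
        byte = buf[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:
            return value, pos


def encode_string(text: str, encoding: str = "UTF-8") -> bytes:
    """Byte-length prefix followed by the encoded bytes; no delimiters or escapes."""
    payload = text.encode(encoding)
    return encode_uint(len(payload)) + payload


def decode_string(buf: bytes, pos: int, encoding: str = "UTF-8") -> Tuple[str, int]:
    """Return (decoded text, offset of the next token)."""
    length, pos = decode_uint(buf, pos)
    end = pos + length
    return buf[pos:end].decode(encoding), end


print(encode_string("a<b"))               # prints b'\x03a<b'
print(encode_string("A", "UTF-16BE"))     # prints b'\x02\x00A'
print(decode_string(b"\x05UTF-8rest", 0)) # prints ('UTF-8', 6)
```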

Items

The body part of the document which provides all of the document's content consists of a sequence of clauses as discussed above each of which consists of a CID followed by a CID-specific, ordered sequence of items or lists of items. Items have two basic representations, defining and reference.

A defining representation consists of either an integer value of 0, indicating a null value, or an integer value of 1 or 2 followed by an ordered sequence of type-specific basic types or items. A value of 1 indicates that the item is non-referenceable (only simple strings can be non-referenceable). A value of 2 indicates that the item is referenceable (sharable). As no references to non-referenceable items can occur, they do not participate in the numbering of strings for future reference.

A reference representation consists of a single integer greater than 2 (which distinguishes it from the first part of a defining representation). Its value is 2 plus the index, in document order, of an occurrence of a referenceable defining representation for an item of the same type (see Item Types directly below) in the same document, and it indicates that the same item should be used at the reference point. For example, a reference integer of 5 for an element name string means to use the 3rd referenceable element name string, counting from the beginning of the CBXML document.
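The defining and reference representations for simple string items can be sketched as a writer that keeps one table per item type. The class name ItemWriter and its method are hypothetical; the sketch covers only string-valued items and uses UTF-8 throughout.

```python
from typing import Dict, Optional


def encode_uint(value: int) -> bytes:
    """7 value bits per byte, highest-order group first, flag bit off on the last byte."""
    groups = [value & 0x7F]
    value >>= 7
    while value:
        groups.append((value & 0x7F) | 0x80)
        value >>= 7
    return bytes(reversed(groups))


class ItemWriter:
    """Emits defining (0, 1, 2) or reference (> 2) representations for string items."""

    def __init__(self) -> None:
        # item type -> {string: 1-based index among referenceable items of that type}
        self._index: Dict[str, Dict[str, int]] = {}

    def write(self, item_type: str, text: Optional[str], shareable: bool = True) -> bytes:
        if text is None:
            return encode_uint(0)  # null value
        table = self._index.setdefault(item_type, {})
        if text in table:
            return encode_uint(table[text] + 2)  # reference = index + 2
        if shareable:
            table[text] = len(table) + 1  # only referenceable items are numbered
        payload = text.encode("utf-8")
        return encode_uint(2 if shareable else 1) + encode_uint(len(payload)) + payload


# Replays the first four character-content strings of Appendix B.
writer = ItemWriter()
for text in ["\n\n  \n  ", "\n    ", "\n      ", "Boss"]:
    writer.write("characters", text)
print(writer.write("characters", "\n      "))  # prints b'\x05' (third string, 3 + 2)
```

Because each item type has its own table, the same string used as an element name and as character content is numbered independently in each.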

Item types: All items are divided into types. This makes for smaller reference indexes as an index is only over items of one type. The item types are as follows:

Item Type - Defining Representation

Qualified element names - 2 parts: a namespace map reference, followed by a local element name part definition or reference

Local element name part - a string

Qualified attribute names - 2 parts: a namespace map reference, followed by a local attribute name part definition or reference

Local attribute names - a string

Attribute values - a string

Characters (i.e., character content) - a string

Processing instruction targets - a string

Processing instruction data - a string

Comments - a string

Entity names - a string

Namespace prefixes - a string

Namespace URIs - a string

Namespace maps - 2 parts: a namespace prefix reference or definition, followed by a namespace URI reference or definition

Miscellaneous (all other items) - a string

The type of an item is determined from its context in a CBXML document as discussed in the body description below.

Item representations

All items except for the following are represented by a single basic string value. The exceptions are:

Namespace maps are represented by a sequence of two items, a namespace prefix item, followed by a namespace URI item.

Qualified Element Names are encoded as two items: a namespace map reference (it can never be a defining instance) followed by a local element name.

Qualified attribute names are encoded as two items: a namespace map reference (it can never be a defining instance) followed by a local attribute name.

The Four Parts of a CBXML Document

CBXML documents consist of four parts in sequence as defined below.

Part 1: The CBXML indicator

This is the non-negative integer, 1764953

Part 2: Version indicator

The version indicator consists of two integers. The first integer indicates the major version of the CBXML encoding specification and the second integer indicates the minor version of the specification. A CBXML parser should be able to read and fully parse a CBXML document whose major version equals the major version for which the parser was designed and whose minor version is less than or equal to the minor version of the specification for which it was designed.

Part 3: Header

The CBXML header contains information about encoding options that were followed in the body part of the CBXML document. This information is given as a sequence of encoding clauses followed by the integer value of 0 which indicates the end of the header. Currently only one header clause is defined. It is:

String value encoding - CID = 1

This clause has a single string argument which indicates the encoding of string values in the body of the document. It must be one of the following:

 

US-ASCII

ISO-8859-1

UTF-8

UTF-16BE

UTF-16LE

UTF-16
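Parts 1 through 3 can be read with a short routine such as the sketch below. The function name parse_preamble and the parser_version parameter are assumptions made for illustration. As in the Appendix B listing, the header clause's string argument is read as a plain length-prefixed US-ASCII string.

```python
from typing import Tuple

MAGIC = 1764953
ALLOWED = {"US-ASCII", "ISO-8859-1", "UTF-8", "UTF-16BE", "UTF-16LE", "UTF-16"}


def decode_uint(buf: bytes, pos: int) -> Tuple[int, int]:
    """7 value bits per byte; the flag (high) bit is off only on the last byte."""
    value = 0
    while True:
        byte = buf[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:
            return value, pos


def parse_preamble(buf: bytes, parser_version: Tuple[int, int] = (1, 1)):
    """Return ((major, minor), string encoding, offset of the first body clause)."""
    magic, pos = decode_uint(buf, 0)
    if magic != MAGIC:
        raise ValueError("not a CBXML document")
    major, pos = decode_uint(buf, pos)
    minor, pos = decode_uint(buf, pos)
    if major != parser_version[0] or minor > parser_version[1]:
        raise ValueError("unsupported CBXML version %d.%d" % (major, minor))
    encoding = "UTF-8"  # default when no string value encoding clause is present
    while True:
        cid, pos = decode_uint(buf, pos)
        if cid == 0:  # end of header
            return (major, minor), encoding, pos
        if cid != 1:
            raise ValueError("unknown header clause %d" % cid)
        length, pos = decode_uint(buf, pos)
        encoding = buf[pos:pos + length].decode("ascii")  # header text is US-ASCII
        pos += length
        if encoding not in ALLOWED:
            raise ValueError("unsupported string encoding " + encoding)


header = bytes([0xEB, 0xDC, 0x59, 1, 1, 1, 5]) + b"UTF-8" + bytes([0])
print(parse_preamble(header))  # prints ((1, 1), 'UTF-8', 13)
```

The 13-byte result matches the Appendix B listing, where the body's first clause begins at offset 13.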

Part 4: Body

The CBXML body contains all the information that was encoded from the original XML document. It consists of a series of clauses followed by an integer value of 0 to indicate the end of the body. The following clauses are defined.

Each entry below gives the clause's CID and name, followed by the format of its information content.

CID 1 - CharacterData

Format: data

Where:
data is a string with the character data

CID 2 - StartElement

Format: ename

Where:
ename is the element's qualified name

CID 3 - StartElementAttributes

Format: ename, numAtts, [aname_i, avalue_i]*

Where:
ename is the element's qualified name
numAtts is the number of attributes
aname_i is the ith attribute's qualified name
avalue_i is the ith attribute's string value

CID 4 - StartElementNamespaceDecls

Format: numMaps, [prefix_i, uri_i]*, ename, numAtts, [aname_i, avalue_i]*

Where:
numMaps is the number of namespace maps
prefix_i is the ith map's prefix name
uri_i is the ith map's URI string value
ename is the element's qualified name
numAtts is the number of attributes
aname_i is the ith attribute's qualified name
avalue_i is the ith attribute's string value

CID 5 - EndElement

Format: (no arguments)

This can only be used to end an element that did not introduce any namespace maps.

CID 6 - EndElementMaps

Format: numMaps, mapReference_i*

Where:
numMaps is the number of namespace maps introduced by the corresponding element start
mapReference_i is the ith map reference (it must be a reference as the map must already have been declared)

CID 7 - EmptyElement

Format: ename

Where:
ename is the element's qualified name

CID 8 - EmptyElementAttributes

Format: ename, numAtts, [aname_i, avalue_i]*

Where:
ename is the element's qualified name
numAtts is the number of attributes
aname_i is the ith attribute's qualified name
avalue_i is the ith attribute's string value

CID 9 - EmptyElementNamespaceDecls

Format: numMaps, [prefix_i, uri_i]*, ename, numAtts, [aname_i, avalue_i]*

Where:
numMaps is the number of namespace maps
prefix_i is the ith map's prefix name
uri_i is the ith map's URI string value
ename is the element's qualified name
numAtts is the number of attributes
aname_i is the ith attribute's qualified name
avalue_i is the ith attribute's string value

CID 10 - ProcessingInstruction

Format: target, data

Where:
target is the qualified name of the processing instruction target
data is the data associated with the named target

CID 11 - XMLDeclaration

Format: version, standalone

Where:
version is a string giving the document's version, may be null
standalone is a string that is either "yes", "no", or null

CID 12 - DocumentTypeDeclaration

Format: rootElementName, systemId, publicId

CID 13 - EntityReference

Format: name

Where:
name is a string giving the name of the entity

CID 14 - Comment

Format: data

Where:
data is a string with the comment's text
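To show how little work a decoder has to do, here is a sketch of a reader for a small subset of the body clauses (CharacterData, StartElement, EndElement and Comment). The class BodyReader and the event tuples it yields are hypothetical. Namespace maps, attributes and the remaining clauses are deliberately left out, so only elements with a null namespace map are accepted.

```python
from typing import Dict, Iterator, List, Optional, Tuple


class BodyReader:
    """Decodes a subset of CBXML body clauses into simple event tuples."""

    def __init__(self, buf: bytes, pos: int = 0, encoding: str = "utf-8") -> None:
        self.buf, self.pos, self.encoding = buf, pos, encoding
        self.tables: Dict[str, List] = {}  # item type -> referenceable items in document order

    def _uint(self) -> int:
        value = 0
        while True:
            byte = self.buf[self.pos]
            self.pos += 1
            value = (value << 7) | (byte & 0x7F)
            if not byte & 0x80:
                return value

    def _string_item(self, item_type: str) -> Optional[str]:
        marker = self._uint()
        if marker == 0:
            return None  # null value
        if marker >= 3:
            return self.tables[item_type][marker - 3]  # reference = index + 2
        length = self._uint()
        text = self.buf[self.pos:self.pos + length].decode(self.encoding)
        self.pos += length
        if marker == 2:  # referenceable: number it for later references
            self.tables.setdefault(item_type, []).append(text)
        return text

    def _element_name(self) -> Tuple[None, Optional[str]]:
        marker = self._uint()
        if marker >= 3:
            return self.tables["qualified element name"][marker - 3]
        if marker != 2:
            raise ValueError("unexpected marker for a qualified element name")
        if self._uint() != 0:
            raise ValueError("namespace maps are outside this sketch")
        name = (None, self._string_item("local element name"))
        self.tables.setdefault("qualified element name", []).append(name)
        return name

    def events(self) -> Iterator[tuple]:
        while True:
            cid = self._uint()
            if cid == 0:  # end of body
                return
            if cid == 1:
                yield ("characters", self._string_item("characters"))
            elif cid == 2:
                yield ("startElement", self._element_name())
            elif cid == 5:
                yield ("endElement",)
            elif cid == 14:
                yield ("comment", self._string_item("comment"))
            else:
                raise ValueError("clause id %d is outside this sketch" % cid)


# <email>chief@foo.com</email> as in Appendix B, then the same element again by reference.
body = (bytes([2, 2, 0, 2, 5]) + b"email" + bytes([1, 2, 13]) + b"chief@foo.com"
        + bytes([5]) + bytes([2, 3, 1, 3, 5, 0]))
for event in BodyReader(body).events():
    print(event)
```

The second occurrence of the element costs five bytes in total, since both the name and the content are single-byte references.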

Appendix B: Analysis of "personal.xml" Conversion to CBXML

Following is a detailed analysis of the conversion of a simple xml file, "personal.xml", to "personal.cbxml". This analysis includes compression and parsing speed data. It also includes detailed analysis of the cbxml encoding of "personal.xml". This shows the exact sequence of bytes produced and describes the meaning of each.

Compression Data

Convert: personal.xml to personal.cbxml
 
Size Information:
    Size of XML source                     2097 bytes
    Size of CBXML                           723 bytes
    Reducing the source by                   65 %, or  2.9 times better.
    Size of zipped xml                      593 bytes
    Size of zipped cbxml                    576 bytes
    

Performance Results

All timing runs are for 300 iterations after a warm up of 500 iterations.

    XML to CBXML via Xerces2 front-end:   1.770 milliseconds
    CBXML to CBXML Roundtrip:             0.633 milliseconds
    CBXML to CBXML via XMLDocHandler:     1.470 milliseconds
 
    CBXML to Null CBXML handler:          0.267 milliseconds
    XML to Null Xerces2 handler:          0.567 milliseconds
    CBXML parsing time reduction is          52 %, or  2.1 times better.
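The percentages and factors reported in the two listings above follow from simple ratios, as the sketch below reproduces. The helper name reduction is an assumption; the input figures are taken directly from the listings. The reported percentages (65 and 52) appear to be truncated rather than rounded.

```python
from typing import Tuple


def reduction(before: float, after: float) -> Tuple[float, float]:
    """Return (percent reduction, improvement factor) going from before to after."""
    return 100.0 * (1.0 - after / before), before / after


# Document size in bytes: XML source versus CBXML.
size_pct, size_factor = reduction(2097, 723)
# Parse time in milliseconds: Xerces2 null handler versus CBXML null handler.
time_pct, time_factor = reduction(0.567, 0.267)

print("size: %.1f%% smaller, %.1f times better" % (size_pct, size_factor))
print("time: %.1f%% faster, %.1f times better" % (time_pct, time_factor))
```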

Encoding

The following is a description of the actual byte sequence of the cbxml file. The first section of each line is the byte offset being described. The next section is an indication of the interpretation of the value at this byte offset. The third section is the actual value in the cbxml byte stream. Values are either integers, strings, or references. For an integer, the integer value is given. For a string, its length is given followed by the string value. For a reference, its reference id is given followed by an equal sign and the referenced value.

Header

[    0] Magic number----------------------- 1764953
[    3] Major version---------------------- 1
[    4] Minor version---------------------- 1
[    5] Character encoding option---------- 1
[    6] Character encoding----------------- 5,"UTF-8"
[   12] End of header---------------------- 0

Body

===========================================
[   13] XMLDecl---------------------------- 11
[   14] new: miscellaneous-XML Version----- 2
[   15] ----------------------------------- 3,"1.0"
[   19] null miscellaneous-standalone------ 0
===========================================
[   20] startElementNamespaceDecls--------- 4
[   21] number of NS Maps------------------ 1
[   22] new: namespace map----------------- 2
[   23] new: namespace prefix-------------- 2
[   24] ----------------------------------- 7,"empType"
[   32] new: namespace uri----------------- 2
[   33] ----------------------------------- 25,"http://foo.com/hr/empType"
[   59] new: qualified element name-------- 2
[   60] null namespace map----------------- 0
[   61] new: localpart of an element name-- 2
[   62] ----------------------------------- 9,"personnel"
[   72] number of attributes--------------- 0
===========================================
[   73] comment---------------------------- 14
[   74] new: comment----------------------- 2
[   75] ----------------------------------- 27," Email Contact information "
===========================================
[  103] characters------------------------- 1
[  104] new: content----------------------- 2
[  105] ----------------------------------- 7,"\n\n  \n  "
===========================================
[  113] startElementAttributes------------- 3
[  114] new: qualified element name-------- 2
[  115] null namespace map----------------- 0
[  116] new: localpart of an element name-- 2
[  117] ----------------------------------- 6,"person"
[  124] number of attributes--------------- 1
[  125] new: qualified attribute name------ 2
[  126] null namespace map----------------- 0
[  127] new: localpart of an attribute name 2
[  128] ----------------------------------- 2,"id"
[  131] new: attribute value--------------- 2
[  132] ----------------------------------- 8,"Big.Boss"
===========================================
[  141] characters------------------------- 1
[  142] new: content----------------------- 2
[  143] ----------------------------------- 5,"\n    "
===========================================
[  149] startElementNamespaceDecls--------- 4
[  150] number of NS Maps------------------ 1
[  151] new: namespace map----------------- 2
[  152] new: namespace prefix-------------- 2
[  153] ----------------------------------- 0,""
[  154] new: namespace uri----------------- 2
[  155] ----------------------------------- 17,"http://foo.com/hr"
[  173] new: qualified element name-------- 2
[  174] old: namespace map----------------- 4 = "xmlns=http://foo.com/hr"
[  175] new: localpart of an element name-- 2
[  176] ----------------------------------- 4,"name"
[  181] number of attributes--------------- 0
===========================================
[  182] characters------------------------- 1
[  183] new: content----------------------- 2
[  184] ----------------------------------- 7,"\n      "
===========================================
[  192] startElement----------------------- 2
[  193] old: qualified element name-------- 5 = "(http://foo.com/hr):name"
===========================================
[  194] characters------------------------- 1
[  195] new: content----------------------- 2
[  196] ----------------------------------- 4,"Boss"
===========================================
[  201] endElement------------------------- 5
===========================================
[  202] characters------------------------- 1
[  203] old: content----------------------- 5 = "\n      "
===========================================
[  204] startElement----------------------- 2
[  205] old: qualified element name-------- 5 = "(http://foo.com/hr):name"
===========================================
[  206] characters------------------------- 1
[  207] new: content----------------------- 2
[  208] ----------------------------------- 3,"Big"
===========================================
[  212] endElement------------------------- 5
===========================================
[  213] characters------------------------- 1
[  214] old: content----------------------- 4 = "\n    "
===========================================
[  215] endElement------------------------- 5
===========================================
[  216] comment---------------------------- 14
[  217] old: comment----------------------- 3 = " Email Contact information "
===========================================
[  218] characters------------------------- 1
[  219] new: content----------------------- 2
[  220] ----------------------------------- 10,"\n    \n    "
===========================================
[  231] startElement----------------------- 2
[  232] new: qualified element name-------- 2
[  233] null namespace map----------------- 0
[  234] new: localpart of an element name-- 2
[  235] ----------------------------------- 5,"email"
===========================================
[  241] characters------------------------- 1
[  242] new: content----------------------- 2
[  243] ----------------------------------- 13,"chief@foo.com"
===========================================
[  257] endElement------------------------- 5
===========================================
[  258] characters------------------------- 1
[  259] old: content----------------------- 4 = "\n    "
===========================================
[  260] startElement----------------------- 2
[  261] new: qualified element name-------- 2
[  262] old: namespace map----------------- 3 = "xmlns:empType=http://foo.com/hr/empType"
[  263] new: localpart of an element name-- 2
[  264] ----------------------------------- 12,"employeeType"
===========================================
[  277] characters------------------------- 1
[  278] new: content----------------------- 2
[  279] ----------------------------------- 19,"\n      manager\n    "
===========================================
[  299] endElement------------------------- 5
===========================================
[  300] characters------------------------- 1
[  301] old: content----------------------- 4 = "\n    "
===========================================
[  302] emptyElementAttributes------------- 8
[  303] new: qualified element name-------- 2
[  304] null namespace map----------------- 0
[  305] new: localpart of an element name-- 2
[  306] ----------------------------------- 4,"link"
[  311] number of attributes--------------- 1
[  312] new: qualified attribute name------ 2
[  313] null namespace map----------------- 0
[  314] new: localpart of an attribute name 2
[  315] ----------------------------------- 12,"subordinates"
[  328] new: attribute value--------------- 2
[  329] ----------------------------------- 58,"one.worker two.worker three.worker four.worker five.worker"
===========================================
[  388] characters------------------------- 1
[  389] new: content----------------------- 2
[  390] ----------------------------------- 3,"\n  "
===========================================
[  394] endElement------------------------- 5
===========================================
[  395] characters------------------------- 1
[  396] new: content----------------------- 2
[  397] ----------------------------------- 4,"\n\n  "
===========================================
[  402] startElementAttributes------------- 3
[  403] old: qualified element name-------- 4 = "():person"
[  404] number of attributes--------------- 1
[  405] old: qualified attribute name------ 3 = "():id"
[  406] new: attribute value--------------- 2
[  407] ----------------------------------- 10,"one.worker"
===========================================
[  418] characters------------------------- 1
[  419] old: content----------------------- 4 = "\n    "
===========================================
[  420] startElementNamespaceDecls--------- 4
[  421] number of NS Maps------------------ 1
[  422] old: namespace map----------------- 4 = "xmlns=http://foo.com/hr"
[  423] old: qualified element name-------- 5 = "(http://foo.com/hr):name"
[  424] number of attributes--------------- 0
===========================================
[  425] characters------------------------- 1
[  426] old: content----------------------- 5 = "\n      "
===========================================
[  427] startElement----------------------- 2
[  428] old: qualified element name-------- 5 = "(http://foo.com/hr):name"
===========================================
[  429] characters------------------------- 1
[  430] new: content----------------------- 2
[  431] ----------------------------------- 6,"Worker"
===========================================
[  438] endElement------------------------- 5
===========================================
[  439] characters------------------------- 1
[  440] old: content----------------------- 5 = "\n      "
===========================================
[  441] startElement----------------------- 2
[  442] old: qualified element name-------- 5 = "(http://foo.com/hr):name"
===========================================
[  443] characters------------------------- 1
[  444] new: content----------------------- 2
[  445] ----------------------------------- 3,"One"
===========================================
[  449] endElement------------------------- 5
===========================================
[  450] characters------------------------- 1
[  451] old: content----------------------- 4 = "\n    "
===========================================
[  452] endElement------------------------- 5
===========================================
[  453] comment---------------------------- 14
[  454] old: comment----------------------- 3 = " Email Contact information "
===========================================
[  455] characters------------------------- 1
[  456] old: content----------------------- 8 = "\n    \n    "
===========================================
[  457] startElement----------------------- 2
[  458] old: qualified element name-------- 6 = "():email"
===========================================
[  459] characters------------------------- 1
[  460] new: content----------------------- 2
[  461] ----------------------------------- 11,"one@foo.com"
===========================================
[  473] endElement------------------------- 5
===========================================
[  474] characters------------------------- 1
[  475] old: content----------------------- 4 = "\n    "
===========================================
[  476] emptyElementAttributes------------- 8
[  477] old: qualified element name-------- 8 = "():link"
[  478] number of attributes--------------- 1
[  479] new: qualified attribute name------ 2
[  480] null namespace map----------------- 0
[  481] new: localpart of an attribute name 2
[  482] ----------------------------------- 7,"manager"
[  490] old: attribute value--------------- 3 = "Big.Boss"
===========================================
[  491] characters------------------------- 1
[  492] old: content----------------------- 4 = "\n    "
===========================================
[  493] startElement----------------------- 2
[  494] old: qualified element name-------- 7 = "(http://foo.com/hr/empType)empType:employeeType"
===========================================
[  495] characters------------------------- 1
[  496] new: content----------------------- 2
[  497] ----------------------------------- 19,"\n      regular\n    "
===========================================
[  517] endElement------------------------- 5
===========================================
[  518] characters------------------------- 1
[  519] old: content----------------------- 11 = "\n  "
===========================================
[  520] endElement------------------------- 5
===========================================
[  521] characters------------------------- 1
[  522] old: content----------------------- 12 = "\n\n  "
===========================================
[  523] startElementAttributes------------- 3
[  524] old: qualified element name-------- 4 = "():person"
[  525] number of attributes--------------- 1
[  526] old: qualified attribute name------ 3 = "():id"
[  527] new: attribute value--------------- 2
[  528] ----------------------------------- 10,"two.worker"
===========================================
[  539] characters------------------------- 1
[  540] old: content----------------------- 4 = "\n    "
===========================================
[  541] startElementNamespaceDecls--------- 4
[  542] number of NS Maps------------------ 1
[  543] old: namespace map----------------- 4 = "xmlns=http://foo.com/hr"
[  544] old: qualified element name-------- 5 = "(http://foo.com/hr):name"
[  545] number of attributes--------------- 0
===========================================
[  546] characters------------------------- 1
[  547] old: content----------------------- 5 = "\n      "
===========================================
[  548] startElement----------------------- 2
[  549] old: qualified element name-------- 5 = "(http://foo.com/hr):name"
===========================================
[  550] characters------------------------- 1
[  551] old: content----------------------- 13 = "Worker"
===========================================
[  552] endElement------------------------- 5
===========================================
[  553] characters------------------------- 1
[  554] old: content----------------------- 5 = "\n      "
===========================================
[  555] startElement----------------------- 2
[  556] old: qualified element name-------- 5 = "(http://foo.com/hr):name"
===========================================
[  557] characters------------------------- 1
[  558] new: content----------------------- 2
[  559] ----------------------------------- 3,"Two"
===========================================
[  563] endElement------------------------- 5
===========================================
[  564] characters------------------------- 1
[  565] old: content----------------------- 4 = "\n    "
===========================================
[  566] endElement------------------------- 5
===========================================
[  567] comment---------------------------- 14
[  568] old: comment----------------------- 3 = " Email Contact information "
===========================================
[  569] characters------------------------- 1
[  570] old: content----------------------- 8 = "\n    \n    "
===========================================
[  571] startElement----------------------- 2
[  572] old: qualified element name-------- 6 = "():email"
===========================================
[  573] characters------------------------- 1
[  574] new: content----------------------- 2
[  575] ----------------------------------- 11,"two@foo.com"
===========================================
[  587] endElement------------------------- 5
===========================================
[  588] characters------------------------- 1
[  589] old: content----------------------- 4 = "\n    "
===========================================
[  590] emptyElementAttributes------------- 8
[  591] old: qualified element name-------- 8 = "():link"
[  592] number of attributes--------------- 1
[  593] old: qualified attribute name------ 5 = "():manager"
[  594] old: attribute value--------------- 3 = "Big.Boss"
===========================================
[  595] characters------------------------- 1
[  596] old: content----------------------- 4 = "\n    "
===========================================
[  597] startElement----------------------- 2
[  598] old: qualified element name-------- 7 = "(http://foo.com/hr/empType)empType:employeeType"
===========================================
[  599] characters------------------------- 1
[  600] old: content----------------------- 16 = "\n      regular\n    "
===========================================
[  601] endElement------------------------- 5
===========================================
[  602] characters------------------------- 1
[  603] old: content----------------------- 11 = "\n  "
===========================================
[  604] endElement------------------------- 5
===========================================
[  605] characters------------------------- 1
[  606] old: content----------------------- 12 = "\n\n  "
===========================================
[  607] startElementAttributes------------- 3
[  608] old: qualified element name-------- 4 = "():person"
[  609] number of attributes--------------- 1
[  610] old: qualified attribute name------ 3 = "():id"
[  611] new: attribute value--------------- 2
[  612] ----------------------------------- 12,"three.worker"
===========================================
[  625] characters------------------------- 1
[  626] old: content----------------------- 4 = "\n    "
===========================================
[  627] startElementNamespaceDecls--------- 4
[  628] number of NS Maps------------------ 1
[  629] old: namespace map----------------- 4 = "xmlns=http://foo.com/hr"
[  630] old: qualified element name-------- 5 = "(http://foo.com/hr):name"
[  631] number of attributes--------------- 0
===========================================
[  632] characters------------------------- 1
[  633] old: content----------------------- 5 = "\n      "
===========================================
[  634] startElement----------------------- 2
[  635] old: qualified element name-------- 5 = "(http://foo.com/hr):name"
===========================================
[  636] characters------------------------- 1
[  637] old: content----------------------- 13 = "Worker"
===========================================
[  638] endElement------------------------- 5
===========================================
[  639] characters------------------------- 1
[  640] old: content----------------------- 5 = "\n      "
===========================================
[  641] startElement----------------------- 2
[  642] old: qualified element name-------- 5 = "(http://foo.com/hr):name"
===========================================
[  643] characters------------------------- 1
[  644] new: content----------------------- 2
[  645] ----------------------------------- 5,"Three"
===========================================
[  651] endElement------------------------- 5
===========================================
[  652] characters------------------------- 1
[  653] old: content----------------------- 4 = "\n    "
===========================================
[  654] endElement------------------------- 5
===========================================
[  655] comment---------------------------- 14
[  656] old: comment----------------------- 3 = " Email Contact information "
===========================================
[  657] characters------------------------- 1
[  658] old: content----------------------- 8 = "\n    \n    "
===========================================
[  659] startElement----------------------- 2
[  660] old: qualified element name-------- 6 = "():email"
===========================================
[  661] characters------------------------- 1
[  662] new: content----------------------- 2
[  663] ----------------------------------- 13,"three@foo.com"
===========================================
[  677] endElement------------------------- 5
===========================================
[  678] characters------------------------- 1
[  679] old: content----------------------- 4 = "\n    "
===========================================
[  680] emptyElementAttributes------------- 8
[  681] old: qualified element name-------- 8 = "():link"
[  682] number of attributes--------------- 1
[  683] old: qualified attribute name------ 5 = "():manager"
[  684] old: attribute value--------------- 3 = "Big.Boss"
===========================================
[  685] characters------------------------- 1
[  686] old: content----------------------- 4 = "\n    "
===========================================
[  687] startElement----------------------- 2
[  688] old: qualified element name-------- 7 = "(http://foo.com/hr/empType)empType:employeeType"
===========================================
[  689] characters------------------------- 1
[  690] old: content----------------------- 16 = "\n      regular\n    "
===========================================
[  691] endElement------------------------- 5
===========================================
[  692] characters------------------------- 1
[  693] old: content----------------------- 11 = "\n  "
===========================================
[  694] endElement------------------------- 5
===========================================
[  695] characters------------------------- 1
[  696] old: content----------------------- 12 = "\n\n  "
===========================================
[  697] startElementAttributes------------- 3
[  698] old: qualified element name-------- 4 = "():person"
[  699] number of attributes--------------- 1
[  700] old: qualified attribute name------ 3 = "():id"
[  701] new: attribute value--------------- 2
[  702] ----------------------------------- 11,"four.worker"
===========================================
[  714] characters------------------------- 1
[  715] old: content----------------------- 4 = "\n    "
===========================================
[  716] startElementNamespaceDecls--------- 4
[  717] number of NS Maps------------------ 1
[  718] old: namespace map----------------- 4 = "xmlns=http://foo.com/hr"
[  719] old: qualified element name-------- 5 = "(http://foo.com/hr):name"
[  720] number of attributes--------------- 0
===========================================
[  721] characters------------------------- 1
[  722] old: content----------------------- 5 = "\n      "
===========================================
[  723] startElement----------------------- 2
[  724] old: qualified element name-------- 5 = "(http://foo.com/hr):name"
===========================================
[  725] characters------------------------- 1
[  726] old: content----------------------- 13 = "Worker"
===========================================
[  727] endElement------------------------- 5
===========================================
[  728] characters------------------------- 1
[  729] old: content----------------------- 5 = "\n      "
===========================================
[  730] startElement----------------------- 2
[  731] old: qualified element name-------- 5 = "(http://foo.com/hr):name"
===========================================
[  732] characters------------------------- 1
[  733] new: content----------------------- 2
[  734] ----------------------------------- 4,"Four"
===========================================
[  739] endElement------------------------- 5
===========================================
[  740] characters------------------------- 1
[  741] old: content----------------------- 4 = "\n    "
===========================================
[  742] endElement------------------------- 5
===========================================
[  743] comment---------------------------- 14
[  744] old: comment----------------------- 3 = " Email Contact information "
===========================================
[  745] characters------------------------- 1
[  746] old: content----------------------- 8 = "\n    \n    "
===========================================
[  747] startElement----------------------- 2
[  748] old: qualified element name-------- 6 = "():email"
===========================================
[  749] characters------------------------- 1
[  750] new: content----------------------- 2
[  751] ----------------------------------- 12,"four@foo.com"
===========================================
[  764] endElement------------------------- 5
===========================================
[  765] characters------------------------- 1
[  766] old: content----------------------- 4 = "\n    "
===========================================
[  767] emptyElementAttributes------------- 8
[  768] old: qualified element name-------- 8 = "():link"
[  769] number of attributes--------------- 1
[  770] old: qualified attribute name------ 5 = "():manager"
[  771] old: attribute value--------------- 3 = "Big.Boss"
===========================================
[  772] characters------------------------- 1
[  773] old: content----------------------- 4 = "\n    "
===========================================
[  774] startElement----------------------- 2
[  775] old: qualified element name-------- 7 = "(http://foo.com/hr/empType)empType:employeeType"
===========================================
[  776] characters------------------------- 1
[  777] new: content----------------------- 2
[  778] ----------------------------------- 20,"\n      contract\n    "
===========================================
[  799] endElement------------------------- 5
===========================================
[  800] characters------------------------- 1
[  801] old: content----------------------- 11 = "\n  "
===========================================
[  802] endElement------------------------- 5
===========================================
[  803] characters------------------------- 1
[  804] old: content----------------------- 12 = "\n\n  "
===========================================
[  805] startElementAttributes------------- 3
[  806] old: qualified element name-------- 4 = "():person"
[  807] number of attributes--------------- 1
[  808] old: qualified attribute name------ 3 = "():id"
[  809] new: attribute value--------------- 2
[  810] ----------------------------------- 11,"five.worker"
===========================================
[  822] characters------------------------- 1
[  823] old: content----------------------- 4 = "\n    "
===========================================
[  824] startElementNamespaceDecls--------- 4
[  825] number of NS Maps------------------ 1
[  826] old: namespace map----------------- 4 = "xmlns=http://foo.com/hr"
[  827] old: qualified element name-------- 5 = "(http://foo.com/hr):name"
[  828] number of attributes--------------- 0
===========================================
[  829] characters------------------------- 1
[  830] old: content----------------------- 5 = "\n      "
===========================================
[  831] startElement----------------------- 2
[  832] old: qualified element name-------- 5 = "(http://foo.com/hr):name"
===========================================
[  833] characters------------------------- 1
[  834] old: content----------------------- 13 = "Worker"
===========================================
[  835] endElement------------------------- 5
===========================================
[  836] characters------------------------- 1
[  837] old: content----------------------- 5 = "\n      "
===========================================
[  838] startElement----------------------- 2
[  839] old: qualified element name-------- 5 = "(http://foo.com/hr):name"
===========================================
[  840] characters------------------------- 1
[  841] new: content----------------------- 2
[  842] ----------------------------------- 4,"Five"
===========================================
[  847] endElement------------------------- 5
===========================================
[  848] characters------------------------- 1
[  849] old: content----------------------- 4 = "\n    "
===========================================
[  850] endElement------------------------- 5
===========================================
[  851] comment---------------------------- 14
[  852] old: comment----------------------- 3 = " Email Contact information "
===========================================
[  853] characters------------------------- 1
[  854] old: content----------------------- 8 = "\n    \n    "
===========================================
[  855] startElement----------------------- 2
[  856] old: qualified element name-------- 6 = "():email"
===========================================
[  857] characters------------------------- 1
[  858] new: content----------------------- 2
[  859] ----------------------------------- 12,"five@foo.com"
===========================================
[  872] endElement------------------------- 5
===========================================
[  873] characters------------------------- 1
[  874] old: content----------------------- 4 = "\n    "
===========================================
[  875] emptyElementAttributes------------- 8
[  876] old: qualified element name-------- 8 = "():link"
[  877] number of attributes--------------- 1
[  878] old: qualified attribute name------ 5 = "():manager"
[  879] old: attribute value--------------- 3 = "Big.Boss"
===========================================
[  880] characters------------------------- 1
[  881] old: content----------------------- 4 = "\n    "
===========================================
[  882] startElement----------------------- 2
[  883] old: qualified element name-------- 7 = "(http://foo.com/hr/empType)empType:employeeType"
===========================================
[  884] characters------------------------- 1
[  885] old: content----------------------- 16 = "\n      regular\n    "
===========================================
[  886] endElement------------------------- 5
===========================================
[  887] characters------------------------- 1
[  888] old: content----------------------- 11 = "\n  "
===========================================
[  889] endElement------------------------- 5
===========================================
[  890] characters------------------------- 1
[  891] new: content----------------------- 2
[  892] ----------------------------------- 2,"\n\n"
===========================================
[  895] endElement------------------------- 5
[  896] End of file------------------------ 0
======================================================================
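
The "new:" and "old:" entries in the dump follow a simple pattern, which the Python sketch below illustrates. It is inferred from the dump rather than taken from the specification in Appendix A. It assumes that each item type (content, element name, and so on) has its own table, that the value 2 announces a string seen for the first time, that identifiers are handed out sequentially starting at 3, and that lengths are UTF-8 byte counts small enough to fit in a single byte. The names CidTable and encode_small_int are hypothetical.

```python
def encode_small_int(n: int) -> bytes:
    """Encode a non-negative integer in one byte.

    Assumption: only values below 128 are handled here, which covers every
    length and identifier visible in the dump. Larger values would need the
    integer encoding defined in Appendix A.
    """
    if not 0 <= n < 128:
        raise ValueError("sketch only handles single-byte integers")
    return bytes([n])


class CidTable:
    """Hypothetical per-type table mapping strings to identifiers (CIDs)."""

    NEW = 2        # marker that precedes a first occurrence in the dump
    FIRST_CID = 3  # first identifier assigned (inferred, not normative)

    def __init__(self) -> None:
        self._cids: dict[str, int] = {}

    def encode(self, text: str) -> bytes:
        cid = self._cids.get(text)
        if cid is not None:
            # "old:" case: the string is already known, emit only its CID.
            return encode_small_int(cid)
        # "new:" case: assign the next CID, then emit marker, length, bytes.
        self._cids[text] = self.FIRST_CID + len(self._cids)
        data = text.encode("utf-8")
        return encode_small_int(self.NEW) + encode_small_int(len(data)) + data


content = CidTable()
first = content.encode("Worker")   # marker + length + 6 bytes of text
again = content.encode("Worker")   # a single identifier byte
print(len(first), len(again))
```

Under these assumptions the first "Worker" costs 8 bytes, matching offsets 430 to 437 in the dump, and each later occurrence costs 1 byte, as at offset 551.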

personal.xml

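The encoding dump above was produced from the following XML document. The entity references (&mgr;, &reg;, &con;) are declared in personal.dtd and appear in the dump in expanded form, for example as "manager" and "regular".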

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE personnel SYSTEM "personal.dtd">
<personnel xmlns:empType="http://foo.com/hr/empType">

  <!-- Email Contact information -->
  <person id="Big.Boss">
    <name xmlns="http://foo.com/hr" >
      <family>Boss</family>
      <given>Big</given>
    </name>
    <!-- Email Contact information -->
    <email>chief@foo.com</email>
    <empType:employeeType>
      &mgr;
    </empType:employeeType>
    <link subordinates="one.worker two.worker three.worker four.worker five.worker"/>
  </person>

  <person id="one.worker">
    <name xmlns="http://foo.com/hr">
      <family>Worker</family>
      <given>One</given>
    </name>
    <!-- Email Contact information -->
    <email>one@foo.com</email>
    <link manager="Big.Boss"/>
    <empType:employeeType>
      &reg;
    </empType:employeeType>
  </person>

  <person id="two.worker">
    <name xmlns="http://foo.com/hr">
      <family>Worker</family>
      <given>Two</given>
    </name>
    <!-- Email Contact information -->
    <email>two@foo.com</email>
    <link manager="Big.Boss"/>
    <empType:employeeType>
      &reg;
    </empType:employeeType>
  </person>

  <person id="three.worker">
    <name xmlns="http://foo.com/hr">
      <family>Worker</family>
      <given>Three</given>
    </name>
    <!-- Email Contact information -->
    <email>three@foo.com</email>
    <link manager="Big.Boss"/>
    <empType:employeeType>
      &reg;
    </empType:employeeType>
  </person>

  <person id="four.worker">
    <name xmlns="http://foo.com/hr">
      <family>Worker</family>
      <given>Four</given>
    </name>
    <!-- Email Contact information -->
    <email>four@foo.com</email>
    <link manager="Big.Boss"/>
    <empType:employeeType>
      &con;
    </empType:employeeType>
  </person>

  <person id="five.worker">
    <name xmlns="http://foo.com/hr">
      <family>Worker</family>
      <given>Five</given>
    </name>
    <!-- Email Contact information -->
    <email>five@foo.com</email>
    <link manager="Big.Boss"/>
    <empType:employeeType>
      &reg;
    </empType:employeeType>
  </person>

</personnel>
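
The dump records the text "\n      manager\n    " (19 characters) where the document contains the reference &mgr;, because a parser expands entity references before reporting character data. The Python sketch below shows that behaviour on a small fragment. Since personal.dtd is not reproduced in this paper, the entity declaration is supplied in an internal subset, and its replacement text "manager" is an assumption based on the dump, not a copy of the real DTD.

```python
import xml.etree.ElementTree as ET

# Fragment of personal.xml. The internal subset stands in for personal.dtd;
# the replacement text "manager" is assumed from the dump.
DOC = (
    '<!DOCTYPE personnel [<!ENTITY mgr "manager">]>\n'
    '<personnel xmlns:empType="http://foo.com/hr/empType">\n'
    '    <empType:employeeType>\n'
    '      &mgr;\n'
    '    </empType:employeeType>\n'
    '</personnel>'
)

root = ET.fromstring(DOC)
elem = root.find("{http://foo.com/hr/empType}employeeType")

# The parser has already replaced &mgr; with its declared text.
text = elem.text
print(repr(text), len(text))
```

The length of the expanded text agrees with the length 19 recorded at offset 279 of the dump.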