XML

This section describes functions which import XML data. Two separate sets of functions implement two approaches to parsing XML data:

Document Object Model (DOM): XML is loaded entirely in memory from a file (xmlread) or a character string (xmlreadstring). Additional functions make it possible to traverse the DOM tree and to retrieve its structure, element names, attributes and text.
Simple API for XML (SAX): XML is parsed from a file descriptor (saxnew) and events are generated for document start and end, element start and end, and character sequences.

With both approaches, creation and modification of the document are not possible.

DOM

Two opaque types are implemented: DOM nodes (including document, element and text nodes), and attribute lists. A document node object is created with the functions xmlreadstring (XML string) or xmlread (XML file or other input channel). Other DOM nodes and attribute lists are obtained by using DOM methods and properties.

Methods and properties of DOM node objects

Method	Description
`fieldnames`	List of property names
`getElementById`	Get a node specified by id
`getElementsByTagName`	Get a list of all descendant nodes of the given tag name
`subsref`	Get a property value
`xmlrelease`	Release a document node

Property	Description
`attributes`	Attribute list (opaque object)
`childElementCount`	Number of element children
`childNodes`	List of child nodes
`children`	List of element child nodes
`depth`	Node depth in document tree
`documentElement`	Root element of a document node
`firstChild`	First child node
`firstElementChild`	First element child node
`lastChild`	Last child node
`lastElementChild`	Last element child node
`line`	Line number in original XML document
`nextElementSibling`	Next sibling element node
`nextSibling`	Next sibling node
`nodeName`	Node tag name, `'#document'`, or `'#text'`
`nodeValue`	Text of a text node
`offset`	Offset in original XML document
`ownerDocument`	Owner DOM document node
`parentNode`	Parent node
`previousElementSibling`	Previous sibling element node
`previousSibling`	Previous sibling node
`textContent`	Concatenated text of all descendant text nodes
`xml`	XML representation, including all children
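As a minimal sketch of how these properties combine, the loop below walks the direct children of the root element of a small document. Text nodes are recognized by their nodeName '#text' and printed with nodeValue; element nodes are printed with nodeName, depth and textContent. The sample XML string is hypothetical; only functions and properties listed above are used.

```lme
// Hypothetical sample document
doc = xmlreadstring('<a>one <b id="x">two</b> <c>three</c></a>');
root = doc.documentElement;
kids = root.childNodes;      // all child nodes, text and elements
for i = 1:length(kids)
  node = kids{i};
  if strcmp(node.nodeName, '#text')
    // text node: its text is in nodeValue
    fprintf('text: "%s"\n', node.nodeValue);
  else
    // element node: tag name, depth and text of all its descendants
    fprintf('element <%s> at depth %d: %s\n', node.nodeName, node.depth, node.textContent);
  end
end
fprintf('%d element children\n', root.childElementCount);
xmlrelease(doc);
```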

A document node object is released with the xmlrelease method. Once a document node object is released, all associated node objects become invalid. Attribute lists and native LME types (strings and numbers) remain valid.
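The following sketch illustrates which values survive xmlrelease, assuming a small hypothetical document: the attribute list and the string copied from nodeName stay usable, while the node object b must not be used after the release.

```lme
doc = xmlreadstring('<a><b id="x" lang="en">two</b></a>');
b = getElementById(doc, 'x');
attr = b.attributes;   // attribute list: remains valid after xmlrelease
name = b.nodeName;     // native LME string: remains valid too
xmlrelease(doc);
// b is now invalid; attr and name can still be used
disp(name);
disp(attr.lang);
```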

Methods and properties of DOM attribute list objects

Method	Description
`fieldnames`	List of attribute names
`length`	Number of attributes
`subsref`	Get an attribute

Properties of attribute lists are the attribute values as strings. Properties whose names are compatible with LME field name syntax can be retrieved with the dot syntax, such as attr.id. Attributes whose names contain invalid characters, such as accented letters, can be accessed by indexing with either parentheses or braces; indexing is also the way to enumerate unknown attributes. The result of indexing is a structure with two fields, name and value.
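Both access styles described above can be sketched as follows, with a hypothetical element: the dot syntax for a known attribute, and indexing with length to enumerate all attributes without knowing their names. Note that num is returned as the string '3', not as a number.

```lme
doc = xmlreadstring('<c id="y" num="3">three</c>');
a = doc.documentElement.attributes;
disp(a.num);              // dot syntax: value as a string
for i = 1:length(a)
  at = a(i);              // structure with fields name and value (a{i} also works)
  fprintf('%s = %s\n', at.name, at.value);
end
names = fieldnames(a);    // list of attribute names
xmlrelease(doc);          // a remains valid after the release
```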

SAX

XML is read from a file descriptor, typically obtained with fopen, by a parser created with saxnew. The next event is retrieved with saxnext, which returns its description in a structure.

getElementById

Get a node specified by id.

Syntax

node = getElementById(root, id)

Description

getElementById(root,id) gets the node which is a descendant of node root and whose attribute id matches argument id. It throws an error if the node is not found.

In valid XML documents, every id must be unique. If the document is invalid, the first element with the specified id is returned.
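Since getElementById throws an error when no element matches, a lookup of an id which may be absent can be wrapped in try/catch. A minimal sketch with a hypothetical document, using lasterr to retrieve the error message:

```lme
doc = xmlreadstring('<a><b id="x">two</b></a>');
try
  node = getElementById(doc, 'z');   // no element has id 'z'
  disp(node.xml);
catch
  // reached because the id was not found
  fprintf('not found: %s\n', lasterr);
end
xmlrelease(doc);
```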

getElementsByTagName

Get a list of all descendant nodes of the given tag name.

Syntax

node = getElementsByTagName(root, name)

Description

getElementsByTagName(root,name) collects a list of all the element nodes which are direct or indirect descendants of node root and whose name matches argument name.

Examples

doc = xmlreadstring('<p>Abc <b>de</b> <i>fg <b>hijk</b></i></p>');
b = getElementsByTagName(doc, 'b')
  b = 
    {DOMNode,DOMNode}
b2 = b{2}.xml
  b2 =
    <b>hijk</b>
xmlrelease(doc);

saxcurrentline

Get current line number of SAX parser.

Syntax

n = saxcurrentline(sax)

Description

saxcurrentline(sax) gets the current line of the XML file parsed by the SAX parser passed as argument. It can also be used after an error.
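Because saxcurrentline remains usable after an error, it can locate malformed input. A minimal sketch, assuming a hypothetical file data.xml and assuming that saxnext throws an error on malformed XML:

```lme
fd = fopen('data.xml');   // hypothetical file name
sax = saxnew(fd);
try
  ev = saxnext(sax);
  while ~strcmp(ev.event, 'docEnd')
    ev = saxnext(sax);
  end
catch
  // the parser is still valid here and reports where it stopped
  fprintf('XML error near line %d: %s\n', saxcurrentline(sax), lasterr);
end
saxrelease(sax);
fclose(fd);
```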

saxcurrentpos

Get current position in input stream of SAX parser.

Syntax

n = saxcurrentpos(sax)

Description

saxcurrentpos(sax) gets the current position in the XML input of the SAX parser passed as argument, i.e. the number of bytes consumed so far. It can also be used after an error.

The value given by saxcurrentpos differs from the result of ftell on the file descriptor, because the SAX parser input is buffered.
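The difference can be observed with a sketch like the one below, assuming a hypothetical file data.xml: after the whole document has been parsed, the number of bytes consumed by the parser is printed next to the position of the underlying file descriptor.

```lme
fd = fopen('data.xml');   // hypothetical file name
sax = saxnew(fd);
ev = saxnext(sax);
while ~strcmp(ev.event, 'docEnd')
  ev = saxnext(sax);
end
// bytes consumed by the parser vs. position of the file descriptor;
// they can differ because the parser input is buffered
fprintf('parser: %d bytes, ftell: %d\n', saxcurrentpos(sax), ftell(fd));
saxrelease(sax);
fclose(fd);
```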

saxnew

Create a new SAX parser.

Syntax

sax = saxnew(fd)
sax = saxnew(fd, Trim=t, HTML=h)

Description

saxnew(fd) creates a new SAX parser to parse XML from file descriptor fd. The parser is an opaque (non-numeric) type. Once it is not needed anymore, it should be released with the saxrelease function.

Named argument Trim (a boolean value) specifies whether white space around tags is trimmed. The default value is false.

Named argument HTML (a boolean value) enables HTML mode. The default value is false (XML mode). HTML mode differs from XML mode as follows:

unknown entities and less-than characters not followed by tag names are considered as plain text;
attribute values can be missing (same as attribute names) or unquoted;
tag and attribute names are converted to lowercase;
text following a <script> start tag is not interpreted until the closing script tag (the literal character sequence </script>, possibly with spaces before >).

HTML mode can be used as the lowest level of a rudimentary HTML parser.
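As a sketch of such a rudimentary HTML parser, the loop below prints the href attribute of every link, assuming a hypothetical file page.html. Since HTML mode converts tag and attribute names to lowercase, comparing with 'a' and 'href' also matches <A HREF=...>.

```lme
fd = fopen('page.html');   // hypothetical file name
sax = saxnew(fd, HTML=true, Trim=true);
ev = saxnext(sax);
while ~strcmp(ev.event, 'docEnd')
  if (strcmp(ev.event, 'elBegin') || strcmp(ev.event, 'elEmpty')) && strcmp(ev.tag, 'a')
    // ev.attr is a structure array with fields name and value
    for i = 1:length(ev.attr)
      if strcmp(ev.attr(i).name, 'href')
        disp(ev.attr(i).value);
      end
    end
  end
  ev = saxnext(sax);
end
saxrelease(sax);
fclose(fd);
```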

Example

fd = fopen('data.xml');
sax = saxnew(fd);
while true
  ev = saxnext(sax);
  switch ev.event
    case 'docBegin'
      // beginning of document
    case 'docEnd'
      // end of document
      break;
    case 'elBegin'
      // beginning of element ev.tag with attr ev.attr
    case 'elEnd'
      // end of element ev.tag
    case 'elEmpty'
      // empty element ev.tag with attr ev.attr
    case 'text'
      // character data ev.text
  end
end
saxrelease(sax);
fclose(fd);

saxnext

Get next SAX event.

Syntax

event = saxnext(sax)

Description

saxnext(sax) gets the next SAX event and returns its description in a structure. Argument sax is the SAX parser created with saxnew.

The event structure contains the following fields:

event: Event type as a string: 'docBegin', 'docEnd', 'elBegin', 'elEnd', 'elEmpty', or 'text'.
tag: For 'elBegin', 'elEnd' and 'elEmpty', element tag.
attr: For 'elBegin' and 'elEmpty', structure array containing the element attributes. Each attribute is defined by two string fields, name and value.
text: For 'text', text string.
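The fields above are enough to rebuild the document structure. A minimal sketch, assuming a hypothetical file data.xml, prints an indented outline by tracking the nesting depth with 'elBegin' and 'elEnd' events:

```lme
fd = fopen('data.xml');   // hypothetical file name
sax = saxnew(fd, Trim=true);
depth = 0;                // current element nesting level
ev = saxnext(sax);
while ~strcmp(ev.event, 'docEnd')
  switch ev.event
    case 'elBegin'
      fprintf('%s<%s> (%d attributes)\n', repmat(' ', 1, 2*depth), ev.tag, length(ev.attr));
      depth = depth + 1;
    case 'elEnd'
      depth = depth - 1;
    case 'elEmpty'
      fprintf('%s<%s/>\n', repmat(' ', 1, 2*depth), ev.tag);
    case 'text'
      fprintf('%s"%s"\n', repmat(' ', 1, 2*depth), ev.text);
  end
  ev = saxnext(sax);
end
saxrelease(sax);
fclose(fd);
```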

saxrelease

Release a SAX parser.

Syntax

saxrelease(sax)

Description

saxrelease(sax) releases the SAX parser sax created with saxnew.

xmlread

Load a DOM document object from a file descriptor.

Syntax

doc = xmlread(fd)

Description

xmlread(fd) reads XML from file descriptor fd until the end of the file and returns a new DOM document node object. The file descriptor can be closed before the document node object is used. Once the document is not needed anymore, it should be released with the xmlrelease method.

Example

Load an XML file 'doc.xml' (this assumes support for files with the function fopen).

fd = fopen('doc.xml');
doc = xmlread(fd);
fclose(fd);
root = doc.documentElement;
// ...
xmlrelease(doc);

xmlreadstring

Parse an XML string into a DOM document object.

Syntax

doc = xmlreadstring(str)

Description

xmlreadstring(str) parses XML from a string to a new DOM document node object. Once the document is not needed anymore, it should be released with the xmlrelease method.

Examples

xml = '<a>one <b id="x">two</b> <c id="y" num="3">three</c></a>';
doc = xmlreadstring(xml)
  doc =
  DOM document
root = doc.documentElement;
root.nodeName
  ans =
    a
root.childNodes{1}.nodeValue
  ans =
    one
root.childNodes{2}.xml
  ans =
    <b id="x">two</b> 
a = root.childNodes{2}.attributes
  a =
  DOM attributes (1 item)
a.id
  x
getElementById(doc,'y').xml
  <c id="y" num="3">three</c>
xmlrelease(doc);

xmlrelease

Release a DOM document object.

Syntax

xmlrelease(doc)

Description

xmlrelease(doc) releases a DOM document object. All DOM node objects obtained directly or indirectly from it become invalid.

Releasing a node which is not a document has no effect.
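The rule above can be checked with a small hypothetical document: calling xmlrelease on the root element does nothing, and the root node remains usable until the document itself is released.

```lme
doc = xmlreadstring('<a><b>two</b></a>');
root = doc.documentElement;
xmlrelease(root);          // not a document node: no effect
disp(root.textContent);    // root is still valid
xmlrelease(doc);           // releases the document; root is now invalid
```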
