XML

This section describes functions which import XML data. Two separate sets of functions implement two approaches to parsing XML data:

  • Document Object Model (DOM): XML is loaded entirely in memory from a file (xmlread) or a character string (xmlreadstring). Additional functions make it possible to traverse the DOM tree and to retrieve its structure, the element names and attributes, and the text.
  • Simple API for XML (SAX): XML is parsed from a file descriptor (saxnew) and events are generated for document start and end, element start and end, and character sequences.

Both approaches are read-only: documents can be neither created nor modified.

DOM

Two opaque types are implemented: DOM nodes (including document, element and text nodes), and attribute lists. A document node object is created with the functions xmlreadstring (XML string) or xmlread (XML file or other input channel). Other DOM nodes and attribute lists are obtained by using DOM methods and properties.

Methods and properties of DOM node objects

Method                  Description
fieldnames              List of property names
getElementById          Get a node specified by id
getElementsByTagName    Get a list of all descendent nodes of the given tag name
subsref                 Get a property value
xmlrelease              Release a document node

Property                Description
attributes              Attribute list (opaque object)
childElementCount       Number of element children
childNodes              List of child nodes
children                List of element child nodes
depth                   Node depth in document tree
documentElement         Root element of a document node
firstChild              First child node
firstElementChild       First element child node
lastChild               Last child node
lastElementChild        Last element child node
line                    Line number in original XML document
nextElementSibling      Next sibling element node
nextSibling             Next sibling node
nodeName                Node tag name, '#document', or '#text'
nodeValue               Text of a text node
offset                  Offset in original XML document
ownerDocument           Owner DOM document node
parentNode              Parent node
previousElementSibling  Previous sibling element node
previousSibling         Previous sibling node
textContent             Concatenated text of all descendent text nodes
xml                     XML representation, including all children
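
As a minimal sketch of how these properties combine, the following lists the element children of a root element. The XML string is made up for illustration, and the loop assumes that children is a list indexed from 1 with braces, in the same way as childNodes in the xmlreadstring example.

```lme
doc = xmlreadstring('<a>one <b>two</b> <c>three</c></a>');
root = doc.documentElement;
// loop over element children only (text nodes are not in children)
for i = 1:root.childElementCount
  child = root.children{i};
  fprintf('%s: %s\n', child.nodeName, child.textContent);
end
xmlrelease(doc);
```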

A document node object is released with the xmlrelease method. Once a document node object is released, all associated node objects become invalid. Attribute lists and native LME types (strings and numbers) remain valid.
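
The following sketch illustrates that rule under the assumption that textContent returns an ordinary LME string: the text is copied out before the document is released, so it can still be used afterwards, whereas the node object root cannot.

```lme
doc = xmlreadstring('<a>one <b>two</b></a>');
root = doc.documentElement;
txt = root.textContent;   // native string, copied out of the DOM
xmlrelease(doc);
// root is now invalid and should not be used anymore;
// txt is a native LME string and remains valid
fprintf('%s\n', txt);
```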

Methods and properties of DOM attribute list objects

Method      Description
fieldnames  List of attribute names
length      Number of attributes
subsref     Get an attribute

Properties of attribute lists are the attribute values as strings. Properties whose name is compatible with LME field name syntax can be retrieved with the dot syntax, such as attr.id. For names containing invalid characters, such as accented letters, or to enumerate unknown attributes, attributes can be accessed by index, with either parentheses or braces. The result is a structure with two fields, name and value.
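
Both access styles can be sketched as follows. The XML string is illustrative, and the loop assumes that indices start at 1, as elsewhere in LME.

```lme
doc = xmlreadstring('<c id="y" num="3">three</c>');
attr = doc.documentElement.attributes;
// dot syntax for a name which is a valid LME field name
id = attr.id;
// indexing to enumerate attributes whose names are not known in advance
for i = 1:length(attr)
  a = attr{i};   // structure with fields name and value
  fprintf('%s = %s\n', a.name, a.value);
end
xmlrelease(doc);
```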

SAX

XML is read from a file descriptor, typically obtained with fopen. The next event is retrieved with saxnext, which returns its description in a structure.

getElementById

Get a node specified by id.

Syntax

node = getElementById(root, id)

Description

getElementById(root,id) gets the node which is a descendant of node root and whose attribute id matches argument id. It throws an error if the node is not found.

In valid XML documents, every id must be unique. If the document is invalid, the first element with the specified id is obtained.
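
Because a missing id raises an error, code that looks up an id which might not exist can wrap the call in a try block. This is a minimal sketch; the variable found and the id 'missing' are hypothetical names chosen for the illustration.

```lme
doc = xmlreadstring('<a>one <b id="x">two</b></a>');
try
  node = getElementById(doc, 'missing');
  found = true;
catch
  // no element with this id in the document
  found = false;
end
xmlrelease(doc);
```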

See also

xmlread, getElementsByTagName

getElementsByTagName

Get a list of all descendent nodes of the given tag name.

Syntax

node = getElementsByTagName(root, name)

Description

getElementsByTagName(root,name) collects a list of all the element nodes which are direct or indirect descendants of node root and whose name matches argument name.

Examples

doc = xmlreadstring('<p>Abc <b>de</b> <i>fg <b>hijk</b></i></p>');
b = getElementsByTagName(doc, 'b')
  b = 
    {DOMNode,DOMNode}
b2 = b{2}.xml
  b2 =
    <b>hijk</b>
xmlrelease(doc);

See also

xmlread, getElementById

saxcurrentline

Get current line number of SAX parser.

Syntax

n = saxcurrentline(sax)

Description

saxcurrentline(sax) gets the current line of the XML file parsed by the SAX parser passed as argument. It can also be used after an error.

See also

saxcurrentpos, saxnew, saxnext

saxcurrentpos

Get current position in input stream of SAX parser.

Syntax

n = saxcurrentpos(sax)

Description

saxcurrentpos(sax) gets the current position of the XML file parsed by the SAX parser passed as argument (the number of bytes consumed thus far). It can also be used after an error.

The value given by saxcurrentpos differs from the result of ftell on the file descriptor, because the SAX parser input is buffered.
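
A possible use of saxcurrentline and saxcurrentpos is to report where parsing failed. The sketch below assumes that saxnext throws an LME error on malformed input and that the message is available with lasterr; the file name 'data.xml' is illustrative.

```lme
fd = fopen('data.xml');
sax = saxnew(fd);
try
  while true
    ev = saxnext(sax);
    if strcmp(ev.event, 'docEnd')
      break;
    end
  end
catch
  // the parser can still be queried after the error
  fprintf('XML error near line %d (byte %d): %s\n', ...
    saxcurrentline(sax), saxcurrentpos(sax), lasterr);
end
saxrelease(sax);
fclose(fd);
```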

See also

saxcurrentline, saxnew, saxnext

saxnew

Create a new SAX parser.

Syntax

sax = saxnew(fd)
sax = saxnew(fd, Trim=t, HTML=h)

Description

saxnew(fd) creates a new SAX parser to parse XML from file descriptor fd. The parser is an opaque (non-numeric) type. Once it is no longer needed, it should be released with the saxrelease function.

Named argument Trim (a boolean value) specifies whether white space is trimmed around tags. The default value is false.

Named argument HTML (a boolean value) specifies HTML mode. The default value is false (XML mode). HTML mode has the following differences with respect to XML mode:

  • unknown entities and less-than characters not followed by tag names are considered as plain text;
  • attribute values can be missing (the value is then the same as the attribute name) or unquoted;
  • tag and attribute names are converted to lowercase;
  • text following a start script tag is not interpreted until the closing script tag (the literal character sequence </script>, possibly with spaces before >).

HTML mode can be used as the lowest level of a rudimentary HTML parser.

Example

fd = fopen('data.xml');
sax = saxnew(fd);
while true
  ev = saxnext(sax);
  switch ev.event
    case 'docBegin'
      // beginning of document
    case 'docEnd'
      // end of document
      break;
    case 'elBegin'
      // beginning of element ev.tag with attr ev.attr
    case 'elEnd'
      // end of element ev.tag
    case 'elEmpty'
      // empty element ev.tag with attr ev.attr
    case 'text'
      // text element ev.text
  end
end
saxrelease(sax);
fclose(fd);
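
The named arguments can be combined with the same loop. As a minimal sketch, the following prints the text fragments of an HTML file; the file name 'page.html' is illustrative, and the loop assumes that a 'docEnd' event is also produced in HTML mode.

```lme
fd = fopen('page.html');
sax = saxnew(fd, Trim=true, HTML=true);
while true
  ev = saxnext(sax);
  if strcmp(ev.event, 'docEnd')
    break;
  end
  if strcmp(ev.event, 'text')
    // text between tags, with surrounding white space trimmed
    fprintf('%s\n', ev.text);
  end
end
saxrelease(sax);
fclose(fd);
```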

See also

saxrelease, saxnext, xmlread

saxnext

Get next SAX event.

Syntax

event = saxnext(sax)

Description

saxnext(sax) gets the next SAX event and returns its description in a structure. Argument sax is the SAX parser created with saxnew.

The event structure contains the following fields:

event
Event type as a string: 'docBegin', 'docEnd', 'elBegin', 'elEnd', 'elEmpty', or 'text'.
tag
For 'elBegin', 'elEnd' and 'elEmpty', element tag.
attr
For 'elBegin' and 'elEmpty', structure array containing the element attributes. Each attribute is defined by two string fields, name and value.
text
For 'text', text string.
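
The fields can be used as in the following sketch, which prints each element tag followed by its attributes. It assumes that the structure array attr is indexed with parentheses and that length gives its number of elements; the file name 'data.xml' is illustrative.

```lme
fd = fopen('data.xml');
sax = saxnew(fd);
while true
  ev = saxnext(sax);
  if strcmp(ev.event, 'docEnd')
    break;
  end
  // only these two event types have both tag and attr
  if strcmp(ev.event, 'elBegin') || strcmp(ev.event, 'elEmpty')
    fprintf('<%s>\n', ev.tag);
    for i = 1:length(ev.attr)
      fprintf('  %s = "%s"\n', ev.attr(i).name, ev.attr(i).value);
    end
  end
end
saxrelease(sax);
fclose(fd);
```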

See also

saxnew, saxcurrentline

saxrelease

Release a SAX parser.

Syntax

saxrelease(sax)

Description

saxrelease(sax) releases the SAX parser sax created with saxnew.

See also

saxnew

xmlread

Load a DOM document object from a file descriptor.

Syntax

doc = xmlread(fd)

Description

xmlread(fd) reads XML from file descriptor fd until the end of the file and returns a new DOM document node object. The file descriptor can be closed before the document node object is used. Once the document is no longer needed, it should be released with the xmlrelease method.

Example

Load an XML file 'doc.xml' (this assumes support for files with the function fopen).

fd = fopen('doc.xml');
doc = xmlread(fd);
fclose(fd);
root = doc.documentElement;
...
xmlrelease(doc);

See also

xmlreadstring, xmlrelease, saxnew

xmlreadstring

Parse an XML string into a DOM document object.

Syntax

doc = xmlreadstring(str)

Description

xmlreadstring(str) parses the XML in string str and returns a new DOM document node object. Once the document is no longer needed, it should be released with the xmlrelease method.

Examples

xml = '<a>one <b id="x">two</b> <c id="y" num="3">three</c></a>';
doc = xmlreadstring(xml)
  doc =
  DOM document
root = doc.documentElement;
root.nodeName
  ans =
    a
root.childNodes{1}.nodeValue
  ans =
    one
root.childNodes{2}.xml
  ans =
    <b id="x">two</b> 
a = root.childNodes{2}.attributes
  a =
  DOM attributes (1 item)
a.id
  ans =
    x
getElementById(doc,'y').xml
  ans =
    <c id="y" num="3">three</c>
xmlrelease(doc);

See also

xmlread, xmlrelease

xmlrelease

Release a DOM document object.

Syntax

xmlrelease(doc)

Description

xmlrelease(doc) releases a DOM document object. All DOM node objects obtained directly or indirectly from it become invalid.

Releasing a node which is not a document has no effect.

See also

xmlreadstring, xmlread