XML
This section describes functions which import XML data. Two separate sets of functions implement two approaches to parsing XML data:
- Document Object Model (DOM): XML is loaded entirely in memory from a file (xmlread) or a character string (xmlreadstring). Additional functions make it possible to traverse the DOM tree and to retrieve its structure, element names, attributes and text.
- Simple API for XML (SAX): XML is parsed from a file descriptor (saxnew) and events are generated for document start and end, element start and end, and character sequences.
With both approaches, creation and modification of the document are not possible.
DOM
Two opaque types are implemented: DOM nodes (including document, element and text nodes), and attribute lists. A document node object is created with the functions xmlreadstring (XML string) or xmlread (XML file or other input channel). Other DOM nodes and attribute lists are obtained by using DOM methods and properties.
Methods and properties of DOM node objects
Method | Description |
---|---|
fieldnames | List of property names |
getElementById | Get a node specified by id |
getElementsByTagName | Get a list of all descendent nodes of the given tag name |
subsref | Get a property value |
xmlrelease | Release a document node |
Property | Description |
---|---|
attributes | Attribute list (opaque object) |
childElementCount | Number of element children |
childNodes | List of child nodes |
children | List of element child nodes |
depth | Node depth in document tree |
documentElement | Root element of a document node |
firstChild | First child node |
firstElementChild | First element child node |
lastChild | Last child node |
lastElementChild | Last element child node |
line | Line number in original XML document |
nextElementSibling | Next sibling element node |
nextSibling | Next sibling node |
nodeName | Node tag name, '#document', or '#text' |
nodeValue | Text of a text node |
offset | Offset in original XML document |
ownerDocument | Owner DOM document node |
parentNode | Parent node |
previousElementSibling | Previous sibling element node |
previousSibling | Previous sibling node |
textContent | Concatenated text of all descendent text nodes |
xml | XML representation, including all children |
A document node object is released with the xmlrelease method. Once a document node object is released, all associated node objects become invalid. Attribute lists and native LME types (strings and numbers) remain valid.
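As a minimal sketch of how these properties combine, the following loop visits the element children of the root node and prints each tag name with its concatenated text. It assumes that the children property is a list indexed with braces (like childNodes in the xmlreadstring example) and that fprintf called without a file descriptor writes to standard output.

```lme
// Parse a small document held in a string
doc = xmlreadstring('<a>one <b id="x">two</b> <c id="y" num="3">three</c></a>');
root = doc.documentElement;

// children contains element nodes only; childNodes would also
// include the text nodes found between elements
kids = root.children;
for i = 1:length(kids)
  el = kids{i};
  fprintf('%s: %s\n', el.nodeName, el.textContent);
end

// Release the document; el, kids and root are invalid afterwards
xmlrelease(doc);
```

Strings copied out of the tree before the call to xmlrelease (for instance the result of el.nodeName) are native LME values and stay usable after the release.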
Methods and properties of DOM attribute list objects
Method | Description |
---|---|
fieldnames | List of attribute names |
length | Number of attributes |
subsref | Get an attribute |
Properties of attribute lists are the attribute values as strings. Properties whose name is compatible with LME field name syntax can be retrieved with the dot syntax, such as attr.id. For names containing invalid characters, such as accented letters, or to enumerate unknown attributes, attributes can be accessed with indexing, with either parentheses or braces. The result is a structure with two fields, name and value.
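The following sketch enumerates attributes whose names are not known in advance. It assumes that indices start at 1, as elsewhere in LME, and that fprintf without a file descriptor writes to standard output.

```lme
doc = xmlreadstring('<c id="y" num="3">three</c>');
attr = doc.documentElement.attributes;

// length gives the number of attributes; indexing with parentheses
// gives a structure with fields name and value
for i = 1:length(attr)
  a = attr(i);
  fprintf('%s = %s\n', a.name, a.value);
end

xmlrelease(doc);
```

When the attribute name is known and is a valid LME field name, the dot syntax (attr.id) is shorter.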
SAX
XML is read from a file descriptor, typically obtained with fopen. The next event is retrieved with saxnext, which returns its description in a structure.
getElementById
Get a node specified by id.
Syntax
node = getElementById(root, id)
Description
getElementById(root,id) gets the node which is a descendant of node root and whose attribute id matches argument id. It throws an error if the node is not found.
In valid XML documents, every id must be unique. If the document is invalid, the first element with the specified id is obtained.
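Because a missing id raises an error rather than returning an empty value, lookups of ids which might be absent can be wrapped in a try block. This is a sketch only; it assumes the standard LME try/catch construct and the lasterr function, and the id 'missing' is a made-up example.

```lme
doc = xmlreadstring('<a><b id="x">two</b></a>');

try
  // No element has id "missing", so this call throws an error
  node = getElementById(doc, 'missing');
  fprintf('%s\n', node.xml);
catch
  // lasterr holds the message of the error which was caught
  fprintf('Lookup failed: %s\n', lasterr);
end

xmlrelease(doc);
```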
See also
getElementsByTagName
Get a list of all descendent nodes of the given tag name.
Syntax
node = getElementsByTagName(root, name)
Description
getElementsByTagName(root,name) collects a list of all the element nodes which are direct or indirect descendants of node root and whose name matches argument name.
Examples
doc = xmlreadstring('<p>Abc <b>de</b> <i>fg <b>hijk</b></i></p>');
b = getElementsByTagName(doc, 'b')
  b =
    {DOMNode,DOMNode}
b2 = b{2}.xml
  b2 =
    <b>hijk</b>
xmlrelease(doc);
See also
saxcurrentline
Get current line number of SAX parser.
Syntax
n = saxcurrentline(sax)
Description
saxcurrentline(sax) gets the current line of the XML file parsed by the SAX parser passed as argument. It can also be used after an error.
See also
saxcurrentpos, saxnew, saxnext
saxcurrentpos
Get current position in input stream of SAX parser.
Syntax
n = saxcurrentpos(sax)
Description
saxcurrentpos(sax) gets the current position of the XML file parsed by the SAX parser passed as argument (the number of bytes consumed thus far). It can also be used after an error.
The value given by saxcurrentpos differs from the result of ftell on the file descriptor, because the SAX parser input is buffered.
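Since both functions remain usable after an error, they can be used to report where parsing stopped. The sketch below assumes that saxnext throws an error on malformed input, that try/catch and lasterr are available, and that a file named data.xml exists (a hypothetical name).

```lme
fd = fopen('data.xml');
sax = saxnew(fd);

try
  // Consume events until the end of the document
  while true
    ev = saxnext(sax);
    if strcmp(ev.event, 'docEnd')
      break;
    end
  end
catch
  // Report the parser position, not ftell(fd): the parser input
  // is buffered, so the two values differ
  fprintf('XML error near line %d (byte %d): %s\n', ...
    saxcurrentline(sax), saxcurrentpos(sax), lasterr);
end

saxrelease(sax);
fclose(fd);
```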
See also
saxcurrentline, saxnew, saxnext
saxnew
Create a new SAX parser.
Syntax
sax = saxnew(fd)
sax = saxnew(fd, Trim=t, HTML=h)
Description
saxnew(fd) creates a new SAX parser to parse XML from file descriptor fd. The parser is an opaque (non-numeric) type. Once it is no longer needed, it should be released with the saxrelease function.
Named argument Trim (a boolean value) specifies whether whitespace is trimmed around tags. The default value is false.
Named argument HTML (a boolean value) specifies HTML mode. The default value is false (XML mode). HTML mode has the following differences with respect to XML mode:
- unknown entities and less-than characters not followed by tag names are considered as plain text;
- attribute values can be missing (same as attribute names) or unquoted;
- tag and attribute names are converted to lowercase;
- text following a start script tag is not interpreted until the closing script tag (the literal character sequence </script>, possibly with spaces before >).
This can be used for the lowest level of a rudimentary HTML parser.
Example
fd = fopen('data.xml');
sax = saxnew(fd);
while true
  ev = saxnext(sax);
  switch ev.event
    case 'docBegin'
      // beginning of document
    case 'docEnd'
      // end of document
      break;
    case 'elBegin'
      // beginning of element ev.tag with attr ev.attr
    case 'elEnd'
      // end of element ev.tag
    case 'elEmpty'
      // empty element ev.tag with attr ev.attr
    case 'text'
      // text element ev.text
  end
end
saxrelease(sax);
fclose(fd);
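HTML mode can be exercised with the same kind of loop. The sketch below lists the href attribute of every a element in a file; the file name page.html is hypothetical, and the code assumes that length applied to the attr structure array gives the number of attributes (0 when there are none) and that a docEnd event is also generated in HTML mode.

```lme
fd = fopen('page.html');
sax = saxnew(fd, HTML=true, Trim=true);

while true
  ev = saxnext(sax);
  if strcmp(ev.event, 'docEnd')
    break;
  end
  // Only start tags and empty elements have the fields tag and attr;
  // in HTML mode, tag and attribute names are already lowercase
  isStart = strcmp(ev.event, 'elBegin') || strcmp(ev.event, 'elEmpty');
  if isStart && strcmp(ev.tag, 'a')
    for i = 1:length(ev.attr)
      if strcmp(ev.attr(i).name, 'href')
        fprintf('%s\n', ev.attr(i).value);
      end
    end
  end
end

saxrelease(sax);
fclose(fd);
```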
See also
saxnext
Get next SAX event.
Syntax
event = saxnext(sax)
Description
saxnext(sax) gets the next SAX event and returns its description in a structure. Argument sax is the SAX parser created with saxnew.
The event structure contains the following fields:
- event: Event type as a string: 'docBegin', 'docEnd', 'elBegin', 'elEnd', 'elEmpty', or 'text'.
- tag: For 'elBegin', 'elEnd' and 'elEmpty', element tag.
- attr: For 'elBegin' and 'elEmpty', structure array containing the element attributes. Each attribute is defined by two string fields, name and value.
- text: For 'text', text string.
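Because events arrive in document order, state such as the nesting depth has to be maintained by the caller. The sketch below prints an indented outline of element tags. It assumes that an 'elEmpty' event is not followed by a matching 'elEnd' event, that repmat and fprintf behave as in standard LME, and that data.xml (a hypothetical name) exists.

```lme
fd = fopen('data.xml');
sax = saxnew(fd, Trim=true);
depth = 0;

while true
  ev = saxnext(sax);
  switch ev.event
    case 'elBegin'
      // Indent by two spaces per nesting level, then go one level deeper
      fprintf('%s<%s>\n', repmat(' ', 1, 2 * depth), ev.tag);
      depth = depth + 1;
    case 'elEmpty'
      // Empty element: printed at the current level, depth unchanged
      fprintf('%s<%s/>\n', repmat(' ', 1, 2 * depth), ev.tag);
    case 'elEnd'
      depth = depth - 1;
    case 'docEnd'
      break;
  end
end

saxrelease(sax);
fclose(fd);
```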
See also
saxrelease
Release a SAX parser.
Syntax
saxrelease(sax)
Description
saxrelease(sax) releases the SAX parser sax created with saxnew.
See also
xmlread
Load a DOM document object from a file descriptor.
Syntax
doc = xmlread(fd)
Description
xmlread(fd) reads XML from file descriptor fd until its end and returns it as a new DOM document node object. Because the whole document is loaded in memory, the file descriptor can be closed before the document node object is used. Once the document is no longer needed, it should be released with the xmlrelease method.
Example
Load an XML file 'doc.xml' (this assumes support for files with the function fopen).
fd = fopen('doc.xml');
doc = xmlread(fd);
fclose(fd);
root = doc.documentElement;
...
xmlrelease(doc);
See also
xmlreadstring, xmlrelease, saxnew
xmlreadstring
Parse an XML string into a DOM document object.
Syntax
doc = xmlreadstring(str)
Description
xmlreadstring(str) parses XML from a string to a new DOM document node object. Once the document is no longer needed, it should be released with the xmlrelease method.
Examples
xml = '<a>one <b id="x">two</b> <c id="y" num="3">three</c></a>';
doc = xmlreadstring(xml)
  doc =
    DOM document
root = doc.documentElement;
root.nodeName
  ans =
    a
root.childNodes{1}.nodeValue
  ans =
    one
root.childNodes{2}.xml
  ans =
    <b id="x">two</b>
a = root.childNodes{2}.attributes
  a =
    DOM attributes (1 item)
a.id
  x
getElementById(doc,'y').xml
  <c id="y" num="3">three</c>
xmlrelease(doc);
See also
xmlrelease
Release a DOM document object.
Syntax
xmlrelease(doc)
Description
xmlrelease(doc) releases a DOM document object. All DOM node objects obtained directly or indirectly from it become invalid.
Releasing a node which is not a document has no effect.