XML parsing


untangle

untangle is a simple library that takes an XML document and returns a Python object mirroring the document's nodes and attributes.

For example, an XML file like this:

<?xml version="1.0"?>
<root>
    <child name="child1"/>
</root>

can be loaded like this:

import untangle
obj = untangle.parse('path/to/file.xml')

and then you can get the child element’s name attribute like this:

obj.root.child['name']

untangle also supports loading XML from a string or a URL.
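To make that mapping concrete, here is a minimal sketch of an untangle-style object built on the standard library's xml.etree.ElementTree, parsing from a string. It is not untangle's implementation (the names Element and parse_string are hypothetical). In this sketch, child elements become attributes, XML attributes are read with item access, and repeated tags are gathered into a list.

```python
import xml.etree.ElementTree as ET


class Element:
    """Hypothetical untangle-style node (a sketch, not untangle's code)."""

    def __init__(self, node):
        self._attributes = dict(node.attrib)  # XML attributes, read via obj['name']
        self.cdata = (node.text or "").strip()  # element text, whitespace stripped
        self._children = {}
        for child in node:
            wrapped = Element(child)
            existing = self._children.get(child.tag)
            if existing is None:
                self._children[child.tag] = wrapped  # first occurrence stays scalar
            elif isinstance(existing, list):
                existing.append(wrapped)
            else:
                self._children[child.tag] = [existing, wrapped]  # repeated tag -> list

    def __getattr__(self, name):
        # Only called when normal lookup fails: expose child elements as attributes.
        children = self.__dict__.get("_children", {})
        if name in children:
            return children[name]
        raise AttributeError(name)

    def __getitem__(self, key):
        # Missing attributes return None instead of raising.
        return self._attributes.get(key)


def parse_string(xml_text):
    """Hypothetical helper: return a wrapper whose only attribute is the root tag."""
    root = ET.fromstring(xml_text)
    document = Element(ET.Element("document"))
    document._children[root.tag] = Element(root)
    return document


obj = parse_string('<?xml version="1.0"?><root><child name="child1"/></root>')
print(obj.root.child["name"])  # child1
```

Wrapping the root in an outer object is what lets the access path start with obj.root, matching the usage shown earlier.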

xmltodict

xmltodict is another simple library; it aims to make working with XML feel like working with JSON.

An XML file like this:

<mydocument has="an attribute">
  <and>
    <many>elements</many>
    <many>more elements</many>
  </and>
  <plus a="complex">
    element as well
  </plus>
</mydocument>

can be loaded into a Python dict like this:

import xmltodict

with open('path/to/file.xml') as fd:
    doc = xmltodict.parse(fd.read())

and then you can access elements, attributes, and values like this:

doc['mydocument']['@has'] # == 'an attribute'
doc['mydocument']['and']['many'] # == ['elements', 'more elements']
doc['mydocument']['plus']['@a'] # == 'complex'
doc['mydocument']['plus']['#text'] # == 'element as well'
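The conventions behind those keys can be approximated with the standard library alone: attributes gain an @ prefix, text that shares an element with attributes or children goes under #text, and repeated child tags become a list. This is an illustrative sketch, not xmltodict's code; the function names are hypothetical, and it ignores namespaces, tail text, and mixed content.

```python
import xml.etree.ElementTree as ET


def element_to_dict(node):
    """Convert an Element using xmltodict-style key conventions (a sketch)."""
    result = {f"@{key}": value for key, value in node.attrib.items()}
    for child in node:
        value = element_to_dict(child)
        if child.tag in result:
            # A repeated tag: promote the existing value to a list, then append.
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    text = (node.text or "").strip()
    if text:
        if not result:
            return text  # a leaf holding only text collapses to a plain string
        result["#text"] = text
    return result or None  # an empty element becomes None


def parse(xml_text):
    root = ET.fromstring(xml_text)
    return {root.tag: element_to_dict(root)}


SAMPLE = """<mydocument has="an attribute">
  <and>
    <many>elements</many>
    <many>more elements</many>
  </and>
  <plus a="complex">
    element as well
  </plus>
</mydocument>"""

doc = parse(SAMPLE)
print(doc["mydocument"]["and"]["many"])  # ['elements', 'more elements']
```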

xmltodict also lets you round-trip back to XML with its unparse function, offers a streaming mode for handling files that don’t fit in memory, and supports XML namespaces.
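xmltodict's own streaming mode is driven by the item_depth and item_callback arguments of xmltodict.parse. The sketch below swaps in a different technique, the standard library's ElementTree.iterparse, to show the underlying idea: handle each element as soon as its closing tag has been read, then clear it so its contents can be freed. The helper iter_texts is hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def iter_texts(source, tag):
    """Yield the stripped text of each <tag> element while the input is parsed."""
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            yield (elem.text or "").strip()
            # Free the element's text and children; an empty shell stays
            # attached to its parent until the parent itself is cleared.
            elem.clear()


SAMPLE = b"<and><many>elements</many><many>more elements</many></and>"
for text in iter_texts(io.BytesIO(SAMPLE), "many"):
    print(text)
```

Because iter_texts is a generator, only the elements seen so far are ever held in memory, which is the property that makes streaming useful for very large files.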

xmlschema

xmlschema provides support for using XSD schemas in Python. Unlike the libraries above, it performs automatic type parsing: for example, if the schema defines an element to be of type int, the parsed dict will also contain an int value for that element. The library also supports both automatic and explicit validation of XML documents against a schema.

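from xmlschema import XMLSchema, etree_tostring

# load an XSD schema file
schema = XMLSchema("your_schema.xsd")

# validate against the schema (raises a validation error if the document is invalid)
schema.validate("your_file.xml")

# or get a True/False result instead of an exception
schema.is_valid("your_file.xml")

# decode a file into Python data, converting values to the schema's types
data = schema.decode("your_file.xml")

# encode the data back into an XML tree and serialize it to a string
s = etree_tostring(schema.encode(data))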
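What automatic type parsing saves you becomes clearer without a schema: a plain parser returns every value as a string, and the conversion has to be written by hand. In the sketch below, a hypothetical DECLARED_TYPES mapping stands in for the type declarations an XSD would provide. xmlschema derives this information from the schema itself, so none of these names belong to its API.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical stand-in for an XSD: element name -> declared simple type.
DECLARED_TYPES = {"count": "xs:int", "price": "xs:decimal", "active": "xs:boolean"}

# Converters from XSD lexical forms to Python values (sketch only).
CONVERTERS = {
    "xs:int": int,
    "xs:decimal": Decimal,
    "xs:boolean": lambda text: text in ("true", "1"),
    "xs:string": str,
}


def decode_with_types(xml_text):
    """Decode a flat record, converting each child by its declared type."""
    root = ET.fromstring(xml_text)
    result = {}
    for child in root:
        xsd_type = DECLARED_TYPES.get(child.tag, "xs:string")  # undeclared -> string
        result[child.tag] = CONVERTERS[xsd_type]((child.text or "").strip())
    return result


record = decode_with_types(
    "<item><count>3</count><price>9.99</price>"
    "<active>true</active><name>widget</name></item>"
)
print(record["count"] + 1)  # 4
```

Keeping the type table separate from the parsing loop mirrors the division of labor in a schema-aware library: the schema describes types once, and decoding applies them everywhere.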