XML

There is quite a bit of software written for the financial industry where the data moved around is stored in XML format. To open up the powerful LSQL engine to those environments, XML support has been added. To instruct the engine to use the XML decoder is as simple as writing the following:

SELECT *
FROM  `swap_trades`

Supporting XML has been added through the stream processing side of LSQL. Here is an example of joining two topics containing XML data:

-- LEFT JOIN two topics with XML data
INSERT INTO `toTopic`
SELECT STREAM
    od.orderNumber
    , od.productCode
    , od.quantityOrdered
    , od.priceEach
    , od.orderLineNumber
    , concat(od.productCode,'-',p.productName) as productName
FROM  `OrdersDetailsTopic` AS od
    LEFT JOIN `ProductTopic` AS p
        ON p.productCode = od.productCode
GROUP BY TUMBLE(4,s)

XML rules

With XML there is no type inferred for a field. Therefore all fields are STRING; using the CAST function the field can be marshaled to a different type i.e. INT/LONG/DOUBLE/DECIMAL. The next LSQL code computes the total value for a given order entry.

SELECT CAST(priceEach as decimal)  * CAST ( quantityOrdered as INT) as total
FROM  `order_item`

XML contains XML elements. An XML element is everything from (including) the element’s start tag to (including) the element’s end tag.

<price>29.99</price>

Taking an example let’s see how the elements can be addressed via LSQL. Imagine a topic containing the book records like the one below:

<book>
    <title>Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
</book>

To select all fields the following LSQL code is required:

SELECT  book.title,
        book.author,
        cast(book.year as int) as year,
        cast(book.price as decimal) as price
FROM `books`

But XML contains XML attributes. Attributes are designed to contain data related to a specific element.

<note date="2008-01-10">some text</note>

Taking the book example earlier let’s add a category attribute:

<book category="children">
    <title>Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
</book>

Retrieving the value for the category value the following code should be used:

SELECT   book.category,
         book.title,
         book.author,
         cast(book.year as int) as year,
         cast(book.price as decimal) as price
FROM  `books`

XML can contain attributes and elements with the same name.

<book category="children">
    <title>Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
    <category>novel</category>
</book>

In this scenario, to address the category field array index needs to be used.

SELECT   book.category[0] as firstCategory,
         book.category[1] as secondCategory,
         book.title,
         book.title,
         book.author,
         cast(book.year as int) as year,
         cast(book.price as decimal) as price
FROM  `books`

Having nested elements with the same name is quite common. Consider this XML to represent a list of books:

<bookstore>
  <book category="children">
    <title>Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
  <book category="web">
    <title>Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <price>39.95</price>
  </book>
</bookstore>

Using the fields from the second book entry is as simple as doing this:

SELECT   bookstore.book[0].category as category,
         cast(bookstore.book[1].price as decimal) as price
FROM  `books`

Earlier an example was given for a note entry. It contains a date attribute and the XML element retains the text associated with the node. Here is the code to address the attribute but also the XML element content

SELECT  note.date,
        node._content
FROM `notes`

LSQL streaming with XML

XML is just another payload type supported by the Lenses engine. Therefore all the rules discussed earlier on AVRO, JSON apply for XML as well. One thing to keep in mind with XML, each field is a string. Always use the CAST function to convert to the type required.

XML-LSQL examples

<entry va="2"><va>30</va></entry>
SELECT entry.va[0] as firstVa, --returns 2
        entry.va[1] as secondVa --returns 30
FROM `topic`