The engine has been extended to support CSV files and allow users to browse the data as well as do stream processing on top of comma-separated payloads. Using the CSV decoder is as simple as writing the following code:

FROM  temperature_values

It is expected that CSV payload comes as comma-separated list of values. Therefore, the payload resulted from CSV files has the logical structure of the following JSON format:


Stream processing on CSV payloads is possible as well. Here is an example of joining two topics containing CSV data:

 -- LEFT JOIN two topics with CSV data
INSERT INTO target_topic

     cast(o.content[0] as long) as orderNumber,
     o.content[4] as status,
     SUM(cast(od.content[2] as int) * cast(od.content[3] as decimal)) total
 FROM  orders AS o LEFT JOIN orders_details AS od ON o.content[0] = od.content[0]
 GROUP BY tumble(2,s), o.content[0]

CSV rules

With CSV there is no type inferred for a field. Therefore all fields are considered to be STRING values. Using the CAST function, each field can be marshaled to a different data type, for example, INT, LONG, DOUBLE or DECIMAL. To be able to reference the values in the incoming payload, the user is expected to provide the index in the incoming data. The index is always 0 bound; this means that in order to retrieve the first element, content[0] has to be provided. Making the data accessible is requiring the presence of the content keyword.

The next SQL code computes the total value for a given order entry.

SELECT CAST(content[1] as decimal) * CAST (content[2] as INT) as total
FROM  `order_item`