SQL engine

For querying existing data in the middleware, Lenses SQL engine can access the entire data, as it exists at execution time. With a messaging system like Apache Kafka, new data can be added continuously, and as a query needs to terminate, it will only address the data as a point in time query.

A typical use case is identifying a specific message. Identifying a specific payment transaction which your system has processed, or maybe identifying all the thermostats readings for a specific customer if you are working for an energy provider, should be achieved quickly. The usage of SQL is not limited to these scenarios alone. More often than not, a user might need to count all the transactions processed within a given time window - a specific day or within a specific hour. Counting the transactions processed by the company in the previous business day or even summing the total amount of sales made on a given day should take a few lines of SQL code. Last but not least, the user might need to set a continuous query to join two streams of data. The person can use the SQL engine first to try the query on a fixed set of data before applying it as a streaming query via the SQL streaming engine.

For the SQL engine, the term table is equivalent to Kafka topic. The two are used interchangeably.

Before learning about the SQL statements support, users are encouraged to familiarize themselves with a few concepts first.

What’s a Message

A single entry in a Kafka topic is called a message. Earlier, both the topic schema and the Kafka record structure were introduced. Lenses considers a message to have three distinct components Key, Value, and Meta(data). The structure emulates more or less the underlying Kafka message structure. The Key contains the underlying Kafka message Key (primitive or composite), Value (primitive or composite) and _Meta (retains the underlying record extra information, which are things like partition, timestamp, offset and so on.).

The two rules that are listed below are required for understanding how the engine computes and returns a record:

  • When querying a table with a wildcard *, the projection of the records will be returned as follows:
{
    "Key":   ...
    "Value": ...
    "_Meta": ...
}

When the Key or Value part is not present, it means that the underlying Kafka record equivalent is null.

Note

A wildcard SELECT * statement might not provide content for the Key and Value parts. This indicates that the underlying Kafka record Key and Value are null.

  • A non-wildcard statement returns only the Value component and if no aggregates or joins are involved the _Meta(data) as well. At this point, the engine returns the record parts the user has requested only. In such a case the resulting Key is null. However, this does not mean that the underlying record Key is null.

Note

When specifying specific fields to be returned in a SELECT statement, only those fields (key / value / header / metadata) are returned.

Selecting Fields

Having the data and yet not being able to pick specific parts of it would not provide much use to the user. Being able to instruct the engine to return only those records where the user identifier is a specific value is mandatory. Since messages can contain nested elements and embedded arrays, a few operators are needed. The . operator is used to refer to children, and the [] operator is used for referring to an element in an array. You can use a combination of these two operators to access data of any depth.

SELECT dependencies[0].first_name AS childName
FROM policy_holder
WHERE policyId='100001'

Since a Lenses record has three distinctive parts, it is important for the user to access any one of them as well as their parts. The engine allows the user to prefix the fields. Given that there are three components, it is expected that there are three different prefixes. For the Key use _key, for the Value use _value, and for Metadata use _meta. Since the bulk of the data is found within the Value component, the engine accepts a shortcut. This means that any field selection not starting with any of the prefixes specified earlier is automatically qualified to the Value part. For example, the following two queries are identical:

SELECT amount
FROM payments

SELECT _value.amount
FROM payments

There are those cases where a Key or a Value is a primitive data type. To select them is as simple as just using the prefix. Imagine the table messages retain the device identifier as the Key and the temperature as the Value. In this scenario, the SQL code would be:

SELECT _key as deviceId
     , _value as temperature
FROM iot_data

Accessing the metadata for a record follows the same track. The following query is returning only the timestamp and the offset for a Kafka message:

SELECT _meta.timestamp as timestamp
     , _meta.offset as index
FROM iot_data