Table-based engine

For querying existing data in the middleware, Lenses SQL engine can address the entire data as it exists at execution time. With a messaging system like Apache Kafka, new data can be added continuously, and as a query needs to terminate, it will only address the data as a point in time query.

A typical use case for the tabled query is identifying a specific record(-s). Identifying a specific payment transaction your system has processed, or maybe identifying all the thermostats readings for a specific customer if you are working for an energy provider should be achieved in quickly. However, the scenarios for when a user will require these types of queries do not stop there. More often than not, a user might need to count all the transactions processed within a given time window - a specific day, or within a specific hour. Counting the transactions processed by the company in the previous business day, or even summing the total amount of sales made on a given day, should take a few lines of SQL. Last but not least, the user might need to set a continuous query to join two streams of data, and uses the tabled query to try the query on a fixed set of data before applying it as a streaming query.

Before learning about the SQL statements support the user is encouraged to familiarize with a few concepts first.

What’s a Record

A single entry in a table is called a record. Earlier, both the table schema and the Kafka record structure were introduced. Lenses considers a record to have three distinct components Key, Value, and Meta(data). The structure emulates more or less the underlying Kafka record structure. The Key contains the underlying Kafka record Key (primitive or composite value), Value (primitive or composite value) and _Meta (retains the underlying record extra information, things like partition, timestamp, offset and so on.)

The two rules listed below are required to understand how the engine computes and returns a record:

  • When querying a table with a wildcard * projection the records will be returned as
{
    "Key":   ...
    "Value": ...
    "_Meta": ...
}

When the Key or Value part is not present, it means they underlying Kafka record equivalent is null.

Note

A wildcard SELECT * statement might not provide content for the Key and Value parts. This indicates the underlying Kafka record Key and Value are null.

  • A non-wildcard statement will return only the Value component and if not aggregates or joins are involved the _Meta(data) as well. At this point, the engine returns only the record parts the user has requested. In such a case the resulting Key is null, however, it does not mean the underlying record Key is null.

Note

When specifying particular fields to be returned in a SELECT statement, only those fields ( key / value / header / metadata ) will be returned.

Selecting Fields

Having the data and yet not being able to pick specific parts of it, would not provide much use to the user. Being able to instruct the engine to return only those records where the user identifier is a specific value is mandatory. As table records can contain nested elements and embedded arrays, a few operators are needed. The . operator is used to refer to children, and the [] is used to refer to an element in an array. You can use a combination of these two operators to access data at any depth.

SELECT dependencies[0].first_name AS childName
FROM policy_holder
WHERE policyId='100001'

Since a Lenses record has three distinctive parts, it is important for the user to access any of them and their parts. To be explicit the engine allows the user to prefix the fields. Given there are three components then it is expected there are three different prefixes. For the Key use _key, for the Value use _value, and for Metadata it is _meta. Since the bulk of the data is found within the Value component, the engine accepts a shortcut. This means any field selection not starting with any of the prefixes specified earlier are automatically qualified to the Value part. For example, the following two queries are identical:

SELECT amount
FROM payments

SELECT _value.amount
FROM payments

There are those cases where a Key or Value are primitive types. To select them is as simple as just using the prefix. Imagine the table records retain the device identifier as the Key and the temperature as the Value. In this scenario the SQL code would be:

SELECT _key as deviceId
     , _value as temperature
FROM iot_data

Accessing the metadata for a record follows the same track. Next query is returning only the timestamp and the offset for a record:

SELECT _meta.timestamp as timestamp
     , _meta.offset as index
FROM iot_data

Track all Queries

Through configuration, Lenses can be instructed to store all the Tabled-based queries made. Alongside the actual query, the information stored includes the user and the point in time it was executed. To activate the functionality, the following configuration needs to be set before Lenses starts: lenses.topics.lsql.storage=_kafka_lenses_lsql_storage. Once the queries are stored, the table associated with this topic can be queried in order to understand the way the users are using the data.