Snapshot mode

Kafka SQL pull with snapshot queries

LSQL engine Snapshot mode accesses the data at point in time the query is executed. This means, for Apache Kafka, data added just after the query was initiated will not be processed.

Typical use cases are, but limited to:

  • Identifying a specific message.
  • Identifying a specific payment transaction which your system has processed
  • Identifying all the thermostats readings for a specific customer if you are working for an energy provider
  • Counting transactions processed within a given time window.

What is a message? 

A single entry in a Kafka topic is called a message.

The engine considers a message to have three distinct components key, value, and metadata.

The structure emulates the underlying Kafka message structure.

The key contains the underlying Kafka message key, value and meta, which includes the partition, timestamp, offset.

The two rules listed below are required for understanding how the engine computes and returns a record:

  • When querying a table with a wildcard *, the projection of the records will be returned as follows:
{
    "key":   ...
    "value": ...
    "_meta": ...
}

  • When the key or value part is not present, it means that the underlying Kafka record equivalent is null.
  • A non-wildcard statement returns only the value component and if no aggregates or joins are involved the meta data as well. In such a case the resulting key is null. However, this does not mean that the underlying record key is null.

Selecting nested fields 

Messages can contain nested elements and embedded arrays. The . operator is used to refer to children, and the [] operator is used for referring to an element in an array.

You can use a combination of these two operators to access data of any depth.

SELECT 
    dependencies[0].first_name AS childName
FROM policy_holder
WHERE policyId='100001'

You explicitly reference the key, value and metadata.

For the key use _key, for the value use _value, and for metadata use _meta. When there is no prefix, the engine will resolve the field(s) as being part of the message value. For example, the following two queries are identical:

SELECT 
    amount
FROM payments;

SELECT 
    _value.amount
FROM payments;

Primitive types 

When the key or a value content is a primitive data type use the prefix only to address them.

For example, if messages contain a device identifier as the key and the temperature as the value, SQL code would be:

SELECT 
    _key AS deviceId
  , _value AS temperature
FROM iot_data

Accessing metadata 

Use the _meta keyword to address the metadata. For example:

SELECT 
    _meta.timestamp AS timestamp
    , _meta.offset AS index
FROM iot_data