Concepts

This page describes the concepts of the Lenses SQL snapshot engine, which drives SQL Studio and allows you to query data in Kafka.

Escape topic names with backticks if they contain non-alphanumeric characters.
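As a rough illustration of the escaping rule, the Python sketch below wraps a topic name in backticks when it contains anything other than letters, digits or underscores. The helper name quote_topic and the sample topic names are hypothetical, and treating underscores as safe is an assumption based on this page's own examples (such as source_topic), not a statement of the engine's exact rule.

```python
import re

# Assumption: letters, digits and underscores need no escaping,
# because examples on this page use such names without backticks.
_PLAIN_NAME = re.compile(r"[A-Za-z0-9_]+")

def quote_topic(name):
    """Return the topic name, wrapped in backticks if it is not plain."""
    if _PLAIN_NAME.fullmatch(name):
        return name
    return f"`{name}`"

# Hypothetical topic names, used only for illustration.
print(f"SELECT * FROM {quote_topic('iot_data')}")
# prints SELECT * FROM iot_data
print(f"SELECT * FROM {quote_topic('payments-eu.v1')}")
# prints SELECT * FROM `payments-eu.v1`
```

If in doubt, quoting every topic name is the more conservative choice.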

Snapshot queries on streaming data provide the answer to a direct question, e.g. "the current balance is $10". The query is active; the data is passive.

What is a message?

A single entry in a Kafka topic is called a message.

The engine considers a message to have four distinct components: key, value, headers and metadata.

Facets

Currently, the snapshot engine supports four facets: _key, _value, _headers and _meta. These qualifiers reference properties of the corresponding message components when building a query.

By default, unqualified properties are assumed to belong to the _value facet:

SELECT 
  property
FROM source_topic;

In order to reference a different facet, a facet qualifier can be added:

SELECT 
  _value.valueField,
  _key.keyField,
  _meta.metaField,
  _headers.headerField
FROM source_topic;
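The defaulting rule can be modelled outside the engine. The Python sketch below is illustrative only and is not how Lenses implements resolution; the resolve helper, the dictionary layout and the sample values are all assumptions made for the example.

```python
# Illustrative model only: a message as four facets, keyed by the
# qualifiers used on this page. The sample values are hypothetical.
FACETS = ("_key", "_value", "_headers", "_meta")

def resolve(message, reference):
    """Resolve a dotted field reference against a message.

    A reference whose first segment is not a facet qualifier is
    treated as belonging to the _value facet.
    """
    parts = reference.split(".")
    if parts[0] in FACETS:
        facet, path = parts[0], parts[1:]
    else:
        facet, path = "_value", parts
    current = message[facet]
    for name in path:
        current = current[name]
    return current

message = {
    "_key": {"keyField": "k-1"},
    "_value": {"valueField": 42},
    "_headers": {"headerField": "abc"},
    "_meta": {"metaField": 7},
}

print(resolve(message, "valueField"))         # prints 42
print(resolve(message, "_value.valueField"))  # prints 42
print(resolve(message, "_key.keyField"))      # prints k-1
```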

When more than one source topic is specified in a query (as happens when two topics are joined), a table reference can be added to the selection to resolve the ambiguity:

SELECT 
  users._value.field
FROM users JOIN purchases

The same can be done for any of the other facets (_key, _meta, _headers).

Note: A wildcard selection, SELECT *, returns only the value component of a message.

Headers are interpreted as a simple mapping of strings to strings. This means that if a header contains JSON, XML or any other structured content, the snapshot engine will still read it as a string value.

Selecting nested fields

Messages can contain nested elements and embedded arrays. The . operator is used to refer to children, and the [] operator is used for referring to an element in an array.

You can use a combination of these two operators to access data of any depth.

SELECT 
    dependencies[0].first_name AS childName
FROM policy_holder
WHERE policyId='100001'
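To make the path semantics concrete, the following Python sketch walks a path such as dependencies[0].first_name through nested dictionaries and lists. It is a model of the idea, not the engine's parser; the get_path helper and the sample record are hypothetical, and the index is treated as zero-based, which is an assumption.

```python
import re

# Matches either a field name or an [index] segment of a path.
_TOKEN = re.compile(r"([A-Za-z_][A-Za-z0-9_]*)|\[(\d+)\]")

def get_path(record, path):
    """Follow a path like 'dependencies[0].first_name' through a record."""
    current = record
    for name, index in _TOKEN.findall(path):
        if name:
            current = current[name]        # the . operator: child field
        else:
            current = current[int(index)]  # the [] operator: array element
    return current

# Hypothetical message value, for illustration only.
policy = {
    "policyId": "100001",
    "dependencies": [
        {"first_name": "Ada"},
        {"first_name": "Alan"},
    ],
}

print(get_path(policy, "dependencies[0].first_name"))  # prints Ada
```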

You can explicitly reference the key, value and metadata.

For the key use _key, for the value use _value, and for metadata use _meta. When there is no prefix, the engine will resolve the field(s) as being part of the message value. For example, the following two queries are identical:

SELECT 
    amount
FROM payments;

SELECT 
    _value.amount
FROM payments;

Primitive types

When the key or the value is a primitive data type, use the facet prefix alone to address it.

For example, if messages contain a device identifier as the key and the temperature as the value, the SQL would be:

SELECT 
    _key AS deviceId
  , _value AS temperature
FROM iot_data

Accessing metadata

Use the _meta keyword to address the metadata. For example:

SELECT 
    _meta.timestamp AS timestamp
    , _meta.offset AS index
FROM iot_data

Projections and nested aliases

When projecting a field into a target record, Lenses allows complex structures to be built. This can be done by using a nested alias like below:

SELECT 
    amount as user.amount,
    userId as user.id
FROM payments;

The result would be a struct with the following shape:

{
  "user": {
    "amount" : 10.19,
    "id": 10
  }  
}
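The way dotted aliases fold into a nested structure can be sketched in Python. This is a model under stated assumptions, not the engine's code: the project helper and the input row are hypothetical.

```python
def project(row, aliases):
    """Build a nested record from (source_field, dotted_alias) pairs."""
    result = {}
    for source, alias in aliases:
        *parents, leaf = alias.split(".")
        target = result
        for name in parents:
            # Create the intermediate struct on first use, reuse it after.
            target = target.setdefault(name, {})
        target[leaf] = row[source]
    return result

# Hypothetical payments record.
row = {"amount": 10.19, "userId": 10}
print(project(row, [("amount", "user.amount"), ("userId", "user.id")]))
# prints {'user': {'amount': 10.19, 'id': 10}}
```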

Alias clashes (repeated fields)

When two alias names clash, the snapshot engine does not “override” that field. Lenses will instead generate a new name by appending a unique integer. This means that a query like the following:

SELECT 
    amount as result.amount,
    amount + 5 as result.amount
FROM payments;

will generate a structure like the following:

{
  "result": {
    "amount" : 10, 
    "amount0": 15
  }  
}
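The renaming behaviour can be mimicked with a small Python sketch. The suffix starting at 0 mirrors the example output above; how the engine numbers a third or later clash is not stated on this page, so the counting scheme here is an assumption, as are the helper names.

```python
def unique_name(existing, name):
    """Return name, or name plus the first free integer suffix."""
    if name not in existing:
        return name
    suffix = 0
    while f"{name}{suffix}" in existing:
        suffix += 1
    return f"{name}{suffix}"

def project_values(pairs):
    """Build a nested record from (value, dotted_alias) pairs without overwriting."""
    result = {}
    for value, alias in pairs:
        *parents, leaf = alias.split(".")
        target = result
        for name in parents:
            target = target.setdefault(name, {})
        target[unique_name(target, leaf)] = value
    return result

amount = 10  # hypothetical field value
print(project_values([(amount, "result.amount"), (amount + 5, "result.amount")]))
# prints {'result': {'amount': 10, 'amount0': 15}}
```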

Nested queries

A tabled query allows you to nest queries. For example, take a query that counts customers per country, and suppose we are only interested in the countries that have more than one customer.

SELECT *
FROM (
    SELECT 
        COUNT(*) AS count
        , country
    FROM customer
    GROUP BY country
    )
WHERE count > 1

Run the query, and you will only see those entries for which there is more than one person registered per country.
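The two stages of the nested query can be traced by hand. The Python sketch below models the inner aggregation and the outer filter over a small, made-up customer list; it only illustrates the logic and says nothing about how the engine evaluates the query.

```python
from collections import Counter

# Hypothetical contents of the customer topic.
customers = [
    {"name": "Ann", "country": "UK"},
    {"name": "Bob", "country": "UK"},
    {"name": "Chloe", "country": "FR"},
    {"name": "Dirk", "country": "DE"},
    {"name": "Eva", "country": "DE"},
    {"name": "Fritz", "country": "DE"},
]

# Inner query: SELECT COUNT(*) AS count, country ... GROUP BY country
per_country = Counter(c["country"] for c in customers)
inner = [{"count": n, "country": country} for country, n in per_country.items()]

# Outer query: SELECT * FROM (inner) WHERE count > 1
outer = [row for row in inner if row["count"] > 1]

print(outer)
# prints [{'count': 2, 'country': 'UK'}, {'count': 3, 'country': 'DE'}]
```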

Functions

Functions can be used directly.

For example, the ROUND function allows you to round numeric values:


SELECT
    name 
    , ROUND(quantity * price) AS rounded_total
FROM groceries

/*The output:
Fairtrade Bananas                                   2
Meridian Crunchy Peanut Butter                      3
Green & Black's organic 85% dark chocolate bar      4
Activia fat free cherry yogurts                     6
Green & Blacks Organic Chocolate Ice Cream          8
*/

For a full list of functions see SQL Reference.

