1 of 14

Concepts

This page describes the concepts of the Lenses SQL snapshot engine that drives the SQL Studio allowing you to query data in Kafka.

Escape topic names with backticks if they contain non-alpha numeric characters

Snapshot queries on streaming data provide answers to a direct question, e.g. The current balance is $10. The query is active, the data is passive.

What is a message?

A single entry in a Kafka topic is called a message.

The engine considers a message to have four distinct components key, value, headers and metadata.

Facets

Currently, the Snapshot Engine supports four different facets _key, _value, _headers and _metadata; These strings can be used to reference properties of each of the aforementioned message components and build a query that way.

By default, unqualified properties are assumed to belong to the _value facet:

In order to reference a different facet, a facet qualifier can be added:

When more than one sources/topics are specified in a query (like it happens when two topics are joined) a table reference can be added to the selection to fix the ambiguity:

the same can be done for any of the other facets (_key,_meta,_headers).

Note Using a wildcard selection statement SELECT * provides only the value component of a message.

Headers are interpreted as a simple mapping of strings to strings. This means that if a header is a JSON, XML or any other structured type, the snapshot engine will still read it as a string value.

Selecting nested fields

Messages can contain nested elements and embedded arrays. The . operator is used to refer to children, and the [] operator is used for referring to an element in an array.

You can use a combination of these two operators to access data of any depth.

You explicitly reference the key, value and metadata.

For the key use _key, for the value use _value, and for metadata use _meta. When there is no prefix, the engine will resolve the field(s) as being part of the message value. For example, the following two queries are identical:

Primitive types

When the key or a value content is a primitive data type use the prefix only to address them.

For example, if messages contain a device identifier as the key and the temperature as the value, SQL code would be:

Accessing metadata

Use the _meta keyword to address the metadata. For example:

Projections and nested aliases

When projecting a field into a target record, Lenses allows complex structures to be built. This can be done by using a nested alias like below:

The result would be a struct with the following shape:

Alias clashes (repeated fields)

When two alias names clash, the snapshot engine does not “override” that field. Lenses will instead generate a new name by appending a unique integer. This means that a query like the following:

will generate a structure like the following:

Nested queries

The tabled query allows you to nest queries. Let us take the query in the previous section and say we are only interested in those entries where there exist more than 1 customer per country.

Run the query, and you will only see those entries for which there is more than one person registered per country.

Functions

Functions can be used directly.

For example, the ROUND function allows you to round numeric functions:

Filter by timestamp or offset

This page describes how to view the most recents messages in Lenses.

A popular question is "how do I see the most recent messages?" or "how do i filter for a specific date"

SQL studio shows you the oldest events (the earliest) because this is how Kafka is designed.

To skip the old events and see the most recent very fast, you need to either filter on time or offset with these filters. A list of functions is available here SQL reference

View the most recent messages

Filter by timestamp

Use the now() function to view events from the last 5 minutes:

SELECT * FROM mytopic WHERE _meta.timestamp > now() - "5m" LIMIT 100;

Other time filters are available here

Filter by offset

View the most recent 100 events :

SELECT * FROM mytopic WHERE _meta.offset >= LAST_OFFSET() - 10 LIMIT 100;

Slice from the past

Filter by timestamp

SELECT *
FROM position_reports
WHERE
   _meta.timestamp > "2020-04-01" AND
   _meta.timestamp < "2020-04-02";

Same example with time filter :

SELECT * from mytopic
WHERE 
    _meta.timestamp > '2024-10-16 14:30:00' AND 
    _meta.timestamp < "2025-04-02 14:30:00"
;

Filter by Offset

the same logic can be applied for filtering by offset :

SELECT * FROM mytopic 
WHERE 
    _meta.offset >= 100 and
    _meta.offset < 1000;

Example and tutorials are available here

Limit & Sampling

This page describes how to limit return and sample data in Kafka with Lenses SQL Studio.

Limit the output

To limit the output of the query you can use two approaches:

use the LIMIT clause
set the max size of the data to be returned

Set a time limit for a query

To restrict the time to run the query, use SET max.query.time:

Sampling data

To sample data and discard the first rows:

This statement instructs Lenses to skip the first record matched and then sample the next two.

Metadata fields

This page describes access Kafka message metadata in Lenses SQL Studio.

When running queries against Kafka, Snapshot Engine enables you to access the record metadata through the special _meta facet.

These are the available meta fields:

Select all the meta fields

The following query will select all the meta fields listed above:

View headers

To view the value of a specific header you can run:

Filter on record timestamp

Filter on table partition

To read records from a specific partition, the following query can be used:

Search for a record on a specific offset

Here is the query to use when the record offset and partition are known:

Get the latest N records per partition

This query will get the latest 100 records per partition (assuming the topic is not compacted):

This instead will get the latest 100 records for a given partition (again assuming the topic is not compacted):