Filtering

This page describes common filtering of data in Kafka with Lenses SQL Studio.

WHERE clause

WHERE clause allows you to define a set of logical predicates the data needs to match in order to be returned. Standard comparison operators are supported (>, >=, <, <=, =, and !=) as well as calling functions.

We are going to use the groceries table created earlier. Select all items purchased where the prices are greater or equal to 2.00:

SELECT
     name
     , price
FROM groceries
WHERE price >= 2.0

/* Output
Meridian Crunchy Peanut Butter              2.5
Activia fat free cherry yogurts             2
Green & Blacks Organic Chocolate Ice Cream  4.2
*/

Select all customers whose last name length equals to 5:

SELECT *
FROM customers
WHERE LEN(last_name) = 5

/* Output
key         value.first_name    value.last_name
mikejones       Mike                Jones
anasmith        Ana                 Smith
*/

Search all customers containing Ana in their first name:

Keep in mind that text search is case-sensitive. To use case insensitive text search, you can write:

Missing values

Sometimes data can contain explicit NULL values, or it can omit fields entirely. Using IS [ NOT ] NULL, or EXISTS functions allows you to check for these situations.

Exists is a keyword in Lenses SQL grammar so it needs to be escaped, the escape character is `````.

Lenses supports JSON. JSON does not enforce a schema allowing you to insert null values.

Create the following table named customers_json:

Query this table for all its entries:

The middle_name is only present on the mikejones record.

Write a query which filters out records where middle_name is not present:

This can also be written as:

When a field is actually NULL or is missing, checking like in the above query has the same outcome.

Multiple WHERE conditions

You can use AND/OR to specify complex conditions for filtering your data.

To filter the purchased items where more than one item has been bought for a given product, and the unit price is greater than 2:

Now try changing the AND logical operand to OR and see the differences in output.

HAVING clause

To filter the entries returned from a grouping query. As with the WHERE statement, you can use HAVING syntax to achieve the same result when it comes to grouped queries.

Read a table partition only

To select data from a specific partition access the metadata of the topic.

In the following example, a table is created with three partitions and the message key is hashed and then the remainder HashValue % partitions will be the table partition the record is sent to.

Next, run the following query:

As you can see from the results (your timestamps will be different) the records span over the three partitions. Now query specific partitions:

Query by partitions

Kafka reads are non-deterministic over multiple partitions. The Snapshot engine may reach its max.size before it finds your record in one run, next time it might.

If we specify in our query that we are only interested in partition 1, and for the sake of example the above Kafka topic has 50 x partitions. Then Lenses will automatically push this predicate down, meaning that we will only need to scan 1GB instead of 50GB of data.

Query by offsets

If we specify the offset range and the partition, we would only need to scan the specific range of 100K messages resulting in scanning 5MB.

Query by timestamp

The above will query only the data added to the topic up to 1 hour ago. Thus we would query just 10MB.Time-traveling

The above will query only the data that have been added to the Kafka topic on a specific day. If we are storing 1,000 days of data, we would query just 50MB.

Last updated

Was this helpful?