This page describes common ways of filtering data in Kafka with Lenses SQL Studio.
The `WHERE` clause allows you to define a set of logical predicates that the data needs to match in order to be returned. Standard comparison operators are supported (`>`, `>=`, `<`, `<=`, `=`, and `!=`), as well as calling functions.
We are going to use the `groceries` table created earlier. Select all purchased items whose price is greater than or equal to 2.00:
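A query along these lines might look like this (the field names are assumptions, since the original example is not shown):

```sql
-- Assumed schema: groceries(name, price, quantity)
SELECT *
FROM groceries
WHERE price >= 2.00;
```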
Select all customers whose last name is exactly 5 characters long:
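A sketch, assuming a `customers` table with a `last_name` field and a string-length function named `LENGTH`:

```sql
-- LENGTH and the field name are assumptions; check your Lenses version
SELECT *
FROM customers
WHERE LENGTH(last_name) = 5;
```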
Search for all customers containing `Ana` in their first name:
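A substring search like this can be sketched with a `LIKE` predicate (the `first_name` field is an assumption):

```sql
-- Case-sensitive substring match
SELECT *
FROM customers
WHERE first_name LIKE '%Ana%';
```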
Keep in mind that text search is case-sensitive. For a case-insensitive search, you can write:
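One common way, assuming a `LOWER` function is available, is to lower-case the field before matching:

```sql
-- Lower-casing both sides makes the match case-insensitive
SELECT *
FROM customers
WHERE LOWER(first_name) LIKE '%ana%';
```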
Sometimes data can contain explicit NULL values, or it can omit fields entirely. The `IS [ NOT ] NULL` and `EXISTS` functions allow you to check for these situations.
`exists` is a keyword in the Lenses SQL grammar, so it needs to be escaped; the escape character is the backtick (`` ` ``).
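For instance, an existence check on a hypothetical `middle_name` field would be written with the keyword escaped:

```sql
-- `exists` must be wrapped in backticks because it is a grammar keyword
SELECT *
FROM customers
WHERE `exists`(middle_name);
```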
Lenses supports JSON. JSON does not enforce a schema, allowing you to insert null values.
Create the following table, named `customers_json`:
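The original DDL is not reproduced here; a minimal sketch, assuming Lenses SQL Studio's `CREATE TABLE ... FORMAT` syntax and hypothetical field names:

```sql
-- Key stored as a string, value as JSON; exact DDL varies by Lenses version
CREATE TABLE customers_json (
    _key        STRING,
    first_name  STRING,
    middle_name STRING,
    last_name   STRING
) FORMAT (string, json);

INSERT INTO customers_json (_key, first_name, middle_name, last_name)
VALUES ('mikejones', 'Mike', 'Albert', 'Jones');

-- This record omits middle_name entirely
INSERT INTO customers_json (_key, first_name, last_name)
VALUES ('anasmith', 'Ana', 'Smith');
```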
Query this table for all its entries:
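```sql
SELECT * FROM customers_json;
```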
The `middle_name` field is present only on the `mikejones` record.
Write a query which filters out records where `middle_name` is not present:
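A sketch of such a query, using the escaped `exists` keyword:

```sql
-- Keep only records where middle_name is actually present
SELECT *
FROM customers_json
WHERE `exists`(middle_name);
```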
This can also be written as:
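```sql
SELECT *
FROM customers_json
WHERE middle_name IS NOT NULL;
```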
Whether a field is explicitly `NULL` or missing entirely, both of these checks have the same outcome.
You can use `AND`/`OR` to specify complex conditions for filtering your data.
To filter the purchased items where more than one unit of a product was bought and the unit price is greater than 2:
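A sketch, assuming `quantity` and `price` fields on the `groceries` table:

```sql
-- Both predicates must hold for a record to be returned
SELECT *
FROM groceries
WHERE quantity > 1
  AND price > 2;
```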
Now try changing the `AND` logical operator to `OR` and see how the output differs.
To filter the entries returned by a grouping query, use the `HAVING` clause. It plays the same role for grouped queries that the `WHERE` clause plays for plain ones.
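For example, a grouped query keeping only products bought more than once might look like this (field names assumed):

```sql
-- Aggregate per product, then filter the groups with HAVING
SELECT name, SUM(quantity) AS total
FROM groceries
GROUP BY name
HAVING SUM(quantity) > 1;
```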
To select data from a specific partition, access the metadata of the topic. In the following example, a table is created with three partitions; the message key is hashed, and the remainder `HashValue % partitions` determines the partition the record is sent to.
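A sketch of such a table, assuming a `PROPERTIES` clause controls the partition count (the table name `groceries_partitioned` is hypothetical):

```sql
-- PROPERTIES syntax is an assumption; check your Lenses version
CREATE TABLE groceries_partitioned (
    _key  STRING,
    name  STRING,
    price DOUBLE
) FORMAT (string, json)
PROPERTIES (partitions = 3, replication = 1);
```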
Next, run the following query:
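A query along these lines, assuming the Snapshot engine exposes `_meta.partition`, `_meta.offset`, and `_meta.timestamp` (the table name is hypothetical):

```sql
-- _meta fields carry each record's Kafka metadata
SELECT _meta.partition, _meta.offset, _meta.timestamp, *
FROM groceries_partitioned;
```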
As you can see from the results (your timestamps will differ), the records span the three partitions. Now query specific partitions:
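A sketch restricting the scan to a single partition via the assumed `_meta.partition` field:

```sql
-- Only records from partition 1 are scanned and returned
SELECT *
FROM groceries_partitioned
WHERE _meta.partition = 1;
```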
Kafka reads are non-deterministic over multiple partitions: the Snapshot engine may reach its `max.size` limit before it finds your record in one run, while the next run might succeed.
If we specify in our query that we are only interested in partition 1, and, for the sake of example, the above Kafka topic has 50 partitions holding 1GB each, then Lenses will automatically push this predicate down, meaning we only need to scan 1GB instead of 50GB of data.
If we also specify an offset range along with the partition, we only need to scan that specific range of 100K messages, resulting in roughly 5MB scanned.
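A sketch combining both predicates (the offsets are hypothetical; `_meta.offset` is assumed):

```sql
-- Lenses pushes both predicates down, bounding the scan
SELECT *
FROM groceries_partitioned
WHERE _meta.partition = 1
  AND _meta.offset >= 100000
  AND _meta.offset <= 200000;
```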
Restricting on the record timestamp to, say, the last hour means we query only the data added to the topic in that hour, roughly 10MB in this example.
Time-traveling
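A day-bounded query might be sketched as follows (the date is hypothetical, and the timestamp-literal syntax may vary by Lenses version):

```sql
-- Only records produced during this one day are scanned
SELECT *
FROM groceries_partitioned
WHERE _meta.timestamp >= '2024-03-01T00:00:00Z'
  AND _meta.timestamp <  '2024-03-02T00:00:00Z';
```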
The above will query only the data that has been added to the Kafka topic on a specific day. If we are storing 1,000 days of data, we would query just 50MB.