Query

In this page we are going to explore some SQL commands in order to query data stored in topics. We can directly access this data using Lenses tabled and continuous SQL. Even though this may seem to have a lot of similarities with the way Relational Databases work, it is worth mentioning that Kafka does not have the same behavior as Relational Databases. Kafka keeps the data according to the Topic retention policies – once a policy kicks in, the data is no longer available and theres no index. Kafka does provide APIs to efficently seek to a particular offset (message id) and timestamp there is no secondary indexing like and relational database.

The SQL commands presented in this page are for querying table data, that is data that is stored in a Kafka topic. This is pretty close to the way Relational Databases work.

Since a Lenses record has three distinctive parts, it is important for the user to access any one of them. To be explicit, the engine allows the user to prefix the fields. Given that there are three components, it is expected that there will be three different prefixes. For the Key use _key, for the Value use _value, and for Metadata use _meta. Notice that there are those cases where a Key or Value in a Kafka record is a primitive type. To select them is as simple as just using the prefix.

Your first Lenses SQL Query

The following query will select log messages based on their length:

SELECT _value AS LOG_MESSAGE
FROM var_log_broker
WHERE LENGTH(_value) > 1000

This query selects the values of the Kafka records in the var_log_broker topic that have a length that is greater than 1000. Additionally, the name of the field will be printed as LOG_MESSAGE instead of _value.

Creating another query

The following query will find the average length of all log messages:

SELECT AVG(LENGTH(_value)) AS AVERAGE_LENGTH
FROM var_log_broker

The output will be as single number. The returned value in JSON format will be similar to the following:

[
    {
        "value": {
            "AVERAGE_LENGTH": 199.08857979502196
        }
    }
]

Doing something tricky

In this last example query we are going to calculate a number of values for our sample log file:

  • The number of entries in the topic.
  • The length of the biggest log message.
  • The total number of characters of the log messages.
  • The average number of characters per log entry.
SELECT
    MAXIMUM
    , TOTAL
    , CHARS
    , CHARS/TOTAL AS AVERAGE
FROM
    (SELECT
        COUNT(*) AS TOTAL
        , SUM(LENGTH(_value)) AS CHARS
        , MAX(LENGTH(_value)) AS MAXIMUM
     FROM var_log_broker)

The results will be similar to the following:

[
    {
        "value": {
            "MAXIMUM": 4700,
            "TOTAL": 1387,
            "CHARS": 275175,
            "AVERAGE": 198
        }
    }
]

Please have in mind that all queries presented in this page are executed using the table based query engine.