Topics and Messages¶
A Kafka record (message) is composed of:
- headers ( a collection of key-value pairs)
By using LSQL you can access any of the above. For example, if you want to select a field from the key the syntax is
_key.id or even
to use the entire key value. A detail explanation is provided in the next chapter.
To extract maximum performance, Kafka is not looking at the message content. Kafka takes and sends
However, the applications consuming these data need to interpret meaningful data, beyond bytes.
There is multiple data formats that you can use (JSON, XML, CSV, etc) but Apache Avro
is becoming the go-to approach for data serializing in fast and big data systems.
AVRO for Kafka is recommended
One of the advantages using AVRO for Kafka, is that enforces the data schema and provides schema evolution rules, out of the box. Schema management can be achived by introducing a Schema Registry to your infrastructure. Lenses, support Confluent’s and Hortonworks’ schema registries and provides the web interface to explore and manage the schemas, get the history changes, edit and configure the evolution rules.
Google Protobuf is another popular option for data in Kafka, or even use custom serializers.
LSQL can handle popular data formats or you can configure any custom serde (*Ser*ialization-*De*serialization)
- BYTES (default)
The W… decoders are for Windowed keys. When aggregating based on a key and over the time window the resulting Kafka message key contains the actual key (INT/LONG/STRING) plus the timestamp (epoch time).
How Lenses identifies data types
The format type for the
key and the
value is configurable in Lenses for each topic. If the configuration
is not available, Lenses will try to detect it. You can override the types in two ways:
- You can set the default values for each topic, which will reflect the types for your queries:
- Override the default values, on your LSQL query by specifying the
For example, imagine a topic where the message key is an INT and the message value is a JSON, but the user decides the value part should be read as STRING. To achieve that the following code can be used:
SELECT * FROM `payments` WHERE _vtype=STRING
If Lenses fails to use the serializers will fall back to the default