Topics and Messages

A Kafka record (message) is composed of:

  • key
  • value
  • timestamp
  • topic
  • partition
  • offset
  • headers ( a collection of key-value pairs)

By using LSQL you can access any of the above. For example, if you want to select a field from the key the syntax is _key.id or even _key.* to use the entire key value. A detail explanation is provided in the next chapter.

Serializers

To extract maximum performance, Kafka is not looking at the message content. Kafka takes and sends BYTES. However, the applications consuming these data need to interpret meaningful data, beyond bytes. There is multiple data formats that you can use (JSON, XML, CSV, etc) but Apache Avro is becoming the go-to approach for data serializing in fast and big data systems.

AVRO for Kafka is recommended

One of the advantages using AVRO for Kafka, is that enforces the data schema and provides schema evolution rules, out of the box. Schema management can be achived by introducing a Schema Registry to your infrastructure. Lenses, support Confluent’s and Hortonworks’ schema registries and provides the web interface to explore and manage the schemas, get the history changes, edit and configure the evolution rules.

Google Protobuf is another popular option for data in Kafka, or even use custom serializers.

Lenses Serializers

LSQL can handle popular data formats or you can configure any custom serde (*Ser*ialization-*De*serialization)

  • JSON
  • AVRO
  • XML
  • CSV
  • PROTOBUF
  • STRING/WSTRING
  • INT/WINT
  • LONG/WLONG
  • BYTES (default)
  • Custom

The W… decoders are for Windowed keys. When aggregating based on a key and over the time window the resulting Kafka message key contains the actual key (INT/LONG/STRING) plus the timestamp (epoch time).

How Lenses identifies data types

The format type for the key and the value is configurable in Lenses for each topic. If the configuration is not available, Lenses will try to detect it. You can override the types in two ways:

  • You can set the default values for each topic, which will reflect the types for your queries:
../_images/topic_serdes.png
  • Override the default values, on your LSQL query by specifying the _ktype and _vtype keywords.

For example, imagine a topic where the message key is an INT and the message value is a JSON, but the user decides the value part should be read as STRING. To achieve that the following code can be used:

SELECT *
FROM `payments`
WHERE _vtype=STRING

If Lenses fails to use the serializers will fall back to the default BYTES type.