Topic schemas


Messages in Kafka are bytes, but they can have a schema, especially when a Schema Registry is used. Lenses supports schema registries and infers data types and schemas.

Schemas via Schema Registry 

For AVRO types, Lenses pulls the schema from Schema Registry. It periodically checks for new versions and updates its internal index. The naming policy is based on the Schema Registry’s TopicNameStrategy, which names the schemas with the topic name plus the -key or -value suffix. Schema References are also supported.
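As a minimal illustration of the naming policy described above, the sketch below derives the two subject names that TopicNameStrategy associates with a topic. The function name `subject_names` and the topic `payments` are hypothetical and used only for this example.

```python
def subject_names(topic):
    """Return the subjects TopicNameStrategy derives for a topic's key and value."""
    # TopicNameStrategy appends "-key" or "-value" to the topic name.
    return {"key": topic + "-key", "value": topic + "-value"}


if __name__ == "__main__":
    # "payments" is a made-up topic name.
    print(subject_names("payments"))
```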

Alternative strategies (for example, multiple message types with different schemas in the same topic) and types other than AVRO are not currently supported via this integration.

Other schemas 

Schemas give structure to your data and data operations. Lenses uses them to serve the data, build SQL queries and processors, apply field-level data policies, power catalog searches and more. In your Kafka ecosystem, it’s recommended to use a Schema Registry for better governance and schema management across apps. Lenses supports this integration for AVRO topics and provides the Schema Registry capabilities.

When Schema Registry isn’t available, Lenses will attempt to auto-detect the schema in addition to the type. The schema is maintained internally, and users can amend or correct it. These schemas are represented in Lenses with AVRO syntax. Primitive types do not require a schema as they are self-descriptive. For headers, schema declaration is not currently supported.

Example of a JSON value:

Lenses Schema
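To make the idea of an AVRO-syntax schema for a JSON value concrete, here is a rough sketch that derives a record schema from a sample JSON object. It is not Lenses' inference logic: the function `infer_avro_type`, the sample message, the record name `payment` and the type mapping (for example, integers to `long`) are all assumptions made for illustration.

```python
import json


def infer_avro_type(value, name="record"):
    """Map a decoded JSON value to an AVRO-syntax type (assumed mapping)."""
    if value is None:
        return "null"
    # bool is checked before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        # Use the first element as the item type; empty arrays fall back to "null".
        items = infer_avro_type(value[0], name + "_item") if value else "null"
        return {"type": "array", "items": items}
    if isinstance(value, dict):
        fields = [{"name": k, "type": infer_avro_type(v, k)} for k, v in value.items()]
        return {"type": "record", "name": name, "fields": fields}
    raise TypeError("unsupported JSON value: %r" % (value,))


# Hypothetical JSON message value.
sample = '{"id": 42, "customer": "alice", "amount": 12.5, "paid": true}'
schema = infer_avro_type(json.loads(sample), name="payment")

if __name__ == "__main__":
    print(json.dumps(schema, indent=2))
```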

Data types and schemas 

For every Kafka topic, Lenses tracks and maintains a type for the key and the value (and, in some cases, the headers), with an associated schema where required. These are needed to read, serve and process the data, and to enable field-level capabilities such as catalog search and data protection policies.

Topic Schemas

When Schema Registry is integrated and there are AVRO topics, Lenses will identify those topics and use those schemas (see Schema Registry integration).

When Schema Registry isn’t available, Lenses will try to auto-detect the type of the key and the value, as well as their schemas. If detection does not find a matching type, the type defaults to BYTES. You can change both the type and the schema detected by Lenses to better serve the users viewing or processing this topic.

Supported types 

JSON, AVRO, PROTOBUF, XML, CSV, STRING, INT, LONG, BYTES (default), as well as custom pluggable formats.

For the AVRO and the PROTOBUF formats, Lenses relies on the configured schema registry to detect the topic format and schema. For other formats, detection happens by inspecting the raw bytes of the first message in the first topic partition.
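The sketch below illustrates the general idea of detecting a format from the raw bytes of a single message and falling back to BYTES. It is a simplified assumption, not Lenses' detection algorithm: the function `detect_format` is hypothetical, and it only distinguishes JSON objects and XML documents, ignoring CSV, STRING, INT and LONG.

```python
import json
import xml.etree.ElementTree as ET


def detect_format(raw):
    """Guess a format name from a message's raw bytes, defaulting to BYTES."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8 text, so treat it as opaque bytes.
        return "BYTES"
    try:
        # Only a JSON object (not a bare number or string) counts as JSON here.
        if isinstance(json.loads(text), dict):
            return "JSON"
    except ValueError:
        pass
    try:
        ET.fromstring(raw)
        return "XML"
    except ET.ParseError:
        pass
    # Nothing matched: fall back to the default type.
    return "BYTES"


if __name__ == "__main__":
    for payload in (b'{"id": 1}', b"<order><id>1</id></order>", b"\xff\x00\x01"):
        print(payload, detect_format(payload))
```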

For semi-structured formats such as JSON and XML, Lenses tries to infer a schema if the message value is an object with one or more fields. Please note that this schema is inferred and set only once: it is up to you to manage its evolution safely (e.g. re-setting the schema when a new field is introduced, avoiding backward-incompatible changes, etc).
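Because the inferred schema is set only once, one way to notice that it needs re-setting is to compare incoming messages against it. The following is a minimal sketch under that assumption; the helper `undeclared_fields`, the schema and the message are hypothetical and not part of Lenses.

```python
def undeclared_fields(schema, message):
    """Return top-level message fields that the AVRO-syntax record schema lacks."""
    declared = {field["name"] for field in schema["fields"]}
    return sorted(key for key in message if key not in declared)


# Hypothetical schema as it was inferred from an earlier message.
current_schema = {
    "type": "record",
    "name": "payment",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "amount", "type": "double"},
    ],
}

# A newer message that introduces a "currency" field.
new_message = {"id": 7, "amount": 3.0, "currency": "EUR"}

if __name__ == "__main__":
    print(undeclared_fields(current_schema, new_message))
```

A non-empty result suggests the stored schema should be reviewed and updated before consumers rely on the new field.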

Built-in types 

| Types | Description |
| --- | --- |
| STRING, INT, LONG, BYTES | Primitive types don’t require a schema. The default type of a topic without an associated type is BYTES. |
| JSON, XML, CSV | JSON, XML and CSV require a schema for Lenses to deserialize the data and enable its capabilities. Lenses attempts to detect the schema automatically; it can also be added by the user. |
| AVRO | When Schema Registry is available and the topic has an AVRO schema, Lenses uses this schema. Schemas are matched to topics using the naming convention topicName-key and topicName-value. Schema references are also supported. |
| PROTOBUF | When Schema Registry is available and the topic has a Protobuf schema, Lenses uses this schema. Schemas are matched to topics using the naming convention topicName-key and topicName-value. |
| TW[<the_other_formats>] | Used by Streaming mode with hopping or tumbling windows. |
| SW[<the_other_formats>] | Used by Streaming mode with session windowing. |

Custom serdes

Lenses supports custom serializers and deserializers (serdes) for handling custom data formats.

If custom serdes are configured for your environment, you can select them directly in the UI and associate them with your topic:

Topic Serdes

For more on custom formats see

--
Last modified: September 13, 2024