JDBC concepts

In the JDBC specification, the terminology for a record is a row; historically, the interfaces have been designed with relational databases in mind. In this guide, when you see the term row, think of the value part of a Kafka record.

The Java Database Connectivity (JDBC) API is the industry standard for database-independent connectivity between the Java programming language and a wide range of SQL databases and other tabular data sources, such as spreadsheets or flat files. In the tabular model, each row represents a unique record, and each column represents a field in the record. For example, a table that contains employee data for a company contains a row for each employee and columns representing employee information such as number, name, job title, email, office code, extension, and the person reported to.

Kafka JDBC example

To query such a table, one would write this SQL:

SELECT employeeNumber,
        lastName,
        firstName,
        extension,
        email,
        officeCode,
        reportsTo,
        jobTitle
FROM employee

JDBC interfaces have been built with tabular data in mind.

Kafka record data is anything but tabular. A record has an optional key, an optional value, metadata (partition, offset, timestamp), and headers. Furthermore, the key and value are not necessarily linear structures. For example, it is quite common to have an employee payload like the one below:

    {
      "number": 1056,
      "name": {
        "first": "Mary",
        "last": "Patterson"
      },
      "extension": "x4611",
      "email": "mpatterso@classicmodelcars.com",
      "officeCode": 1,
      "reportsTo": 1002,
      "jobTitle": "VP Sales"
    }

The Lenses SQL engine can handle multiple data formats and multiple data structures, and it can address key and value fields as well as metadata. When it comes to field selection, it can handle the full spectrum of nested fields, arrays, and nested fields within arrays. A wildcard query like the following:

SELECT * FROM employee

returns for each record the information split into three sections:

  • key
  • value
  • metadata
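
For the employee payload shown earlier, the result of such a wildcard query can be pictured as the sketch below. The section layout follows the three categories above; the key and metadata values (partition, offset, timestamp) are purely illustrative, not taken from a real topic.

    {
      "key": "1056",
      "value": {
        "number": 1056,
        "name": {
          "first": "Mary",
          "last": "Patterson"
        },
        "extension": "x4611",
        "email": "mpatterso@classicmodelcars.com",
        "officeCode": 1,
        "reportsTo": 1002,
        "jobTitle": "VP Sales"
      },
      "metadata": {
        "partition": 0,
        "offset": 42,
        "timestamp": 1562000000000
      }
    }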

Any field selection made outside of a wildcard projection is attached to the value category. Given the query below,

SELECT _key as employeeNumber,
        jobTitle,
        name.first as firstName,
        _meta.offset,
        _meta.__keysize,
        _meta.valsize
FROM employee

the result for each record is returned in two sections:

  • value
  • metadata (always available)
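
Using the same illustrative record, one possible reading of the result of the projection above is sketched below, assuming the selected fields land in the value section and the `_meta` selections surface under metadata. The offset and size values are made up for illustration.

    {
      "value": {
        "employeeNumber": "1056",
        "jobTitle": "VP Sales",
        "firstName": "Mary"
      },
      "metadata": {
        "offset": 42,
        "__keysize": 8,
        "valsize": 178
      }
    }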

To be able to query Kafka via JDBC, some rules have to be put in place to accommodate the multiple value sections of a Kafka record. For example, a query like:

SELECT * FROM mytopic

returns multi-section data for each record, a consequence of the Kafka message structure. This does not fit the JDBC model: you can think of it as trying to represent three tables (key, value, metadata) as one.

The convention for the JDBC driver is that it only picks up the information from the value category. With a wildcard selection, the JDBC row therefore contains only the data present in the Kafka record value. However, through projections, the key and metadata fields can be brought into the JDBC row.
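
As a sketch of that convention, a projection like the one below (reusing the `_key` and `_meta.offset` fields from the earlier query; the column aliases are illustrative) surfaces the record key and a piece of metadata as ordinary JDBC columns alongside value fields:

SELECT _key AS employeeNumber,
        name.first AS firstName,
        jobTitle,
        _meta.offset AS recordOffset
FROM employee

Each projected field then appears as a regular column in the JDBC ResultSet row.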