Managing Tables

This section introduces all the supported Table commands. The Table-query (or bound query) has been developed to support the typical SQL commands supported by a relational database: CREATE, DROP, TRUNCATE, DELETE, alongside non-standard SHOW TABLES, DESCRIBE TABLE, DESCRIBED FORMATTED, SHOW QUERIES, SHOW ALL QUERIES, KILL QUERY.

Create a Table

Before storing any data, a user has to create a table. The syntax for creating a Kafka topic (table) is as follows:

CREATE TABLE $Table($field $fieldType[, $field fieldType,...])

FORMAT ($keyStorageFormat, $valueStorageFormat)

[PROPERTIES(partitions=$partitionCount, replication=$replication, compacted=true/false)]

The CREATE statement has the following parts:

  • CREATE TABLE - Instructs the construction of a table.
  • $Table - The actual name given to the table.
  • Schema - Constructed as a list of (field, type) tuple, it describes the data each record in the table contains.
  • FORMAT - Defines the storage format. Since it is an Apache Kafka topic both the Key and the Value formats are required. Valid values are STRING, INT, LONG, JSON, AVRO.
  • PROPERTIES - Specifies the number of partitions the final Kafka topic should have, the replication factor in order to ensure high availability (it cannot be a number higher than the current Kafka Brokers number) and if the topic should be compacted.

A Kafka topic which is compacted is a special type of topic with a finer-grained retention mechanism that retains the last update record for each key. A compacted topic (once the compaction has been completed) contains a full snapshot of the final record values for every record key and not just the recently changed keys. They are useful for in-memory services, persistent data stores, reloading caches, etc. For more details on the subject you should look at Kafka Documentation

Given the aforementioned theory, here is an example of creating a table for storing customer entries:

CREATE TABLE customer (id string, address.line string, address.city string, address.postcode int, email string)
FORMAT (string, json)
PROPERTIES (partitions=1, compacted=true)

Executing the statement will end up creating the topic. Best practices dictate to use Avro as a storage format over other formats. In this case, the Key can still be stored as STRING (although nothing stops the user to set it as Avro) but the Value should be Avro. By adapting the previous query the new statement will be as follows:

CREATE TABLE customer_avro (id string, address.line string, address.city string, address.postcode int, email string)
FORMAT (string, avro)
PROPERTIES (partitions=1, compacted=true)

List all tables

A system could have many tables and the user could be interested in just listing their names. Therefore this type of syntax is supported:

SHOW TABLES
/* The output for tables mapping to Kafka topics

name            internal replicas  partitions
customer        false       1            1
customer_avro   false       1            1
*/

Table Schema

For each table, the SQL allows the user to get its schema. The schema describes the data the table retains on each record. The syntax for the statement is as follows:

DESCRIBE TABLE $tableName

The $tableName should contain the name of the table to describe. Given the two tables created earlier, a user can run the following SQL to get the information on each table:

DESCRIBE TABLE customer;

DESCRIBE TABLE customer_avro;
/*
The output:
_key                        String
_value.address.postcode     Int
_value.address.city         String
_value.address.line         String
_value.email                String
_value.id                   String
*/

Table Extended Information

Since a table is associated with a Kafka topic, and the table information goes beyond the data schema, Lenses SQL supports another statement allowing the user to retrieve more information on the table. This information will return the actual Kafka topic properties. It is expected this type of query to be used mainly by the developers and DevOps looking after the health of the infrastructure:

DESCRIBE FORMATTED $Table

Taking the tables referred to so far, a user can execute the following:

DESCRIBE FORMATTED customer_avro

/* Output:
# Column Name                           # Data Type
_key                                    String
_value.address.postcode                 Int
_value.address.city                     String
_value.address.line                     String
_value.email                            String
_value.id                               String

# Config Key                            # Config Value
cleanup.policy                          compact
compression.type                        producer
delete.retention.ms                     86400000
file.delete.delay.ms                    60000
flush.messages                          9223372036854775807
flush.ms                                9223372036854775807
index.interval.bytes                    4096
max.message.bytes                       1000012
message.format.version                  1.1-IV0
message.timestamp.difference.max.ms     9223372036854775807
message.timestamp.type                  CreateTime
min.cleanable.dirty.ratio               0.5
min.compaction.lag.ms                   0
min.insync.replicas                     1
preallocate                             false
retention.bytes                         2147483648
retention.ms                            604800000
segment.bytes                           1073741824
segment.index.bytes                     10485760
segment.jitter.ms                       0
segment.ms                              604800000
unclean.leader.election.enable          false
*/

Deleting a Table

Being able to drop an existing table is a common request. Maybe a large table is not needed anymore or, in the case of a development environment, being a good citizen and cleaning after yourself should be the norm. In order to remove a table, a user would need to run this code:

DROP TABLE $Table

If the Lenses user permission allows it, running the SQL statement for the customer/customer_avro tables will result in the underlying Kafka topics to be removed.