Is it a Table or a Stream?

If you are familiar with SQL for RDBMS (Relational Database Management System - Oracle, MySQL, Postgres, MS SQL) world, you will likely know a table consists of rows and columns. The order and the columns number is fixed and the rows count is variable. Under the hood, a database retains an append-only log. This log retains all the transactions (a transaction is a logical unit that is independently executed for data updates) applied to a table. The content of these entries is then applied to the table (in the order they occurred) to materialize the data. In the context of streaming processing, the log is nothing but a stream of data. Therefore, if you have a stream of data and you think of it as the table log, you can construct a table from a stream. To go the other way, if you have a table you can construct a stream out of it by using its underlying log. As a result, a table is a stream and a stream is a table.

Understanding the duality of table and stream is fundamental for building complex SQL targeting streaming processing. Tables represent data at rest. Since a table can be constructed out of a stream, the likelihood of it being static is slim. Most tables tend to change over time. Streams represent data in motion. A stream captures the way data has evolved over time. This is different to a table, where a table provides a view of the dataset at a given point in time.

Although a table and a stream look a lookalike, it is important to not be taken away by the thought they are the same thing. The subtle differences are important.

Construct a stream

Lenses SQL allows the user to create a stream by using the STREAM keyword when declaring a SELECT statement. Here is how to declare a stream:

SELECT STREAM ...
FROM iot_data

Since the SQL engine materializes the flow using Kafka Streams API, the syntax above will create a KStream at runtime.

Construct a table

Any SELECT statement where the keyword STREAM is left out is considered to be a table.

SELECT *
FROM customers

A table is created from the underlying Kafka topic. When a table is created there are a few rules which apply. Please make sure the concepts presented in the tables section, are understood first.

  • the record Key component is the unique identifier for a record
  • if the record Key is null then the entire record is ignored
  • a record will replace an existing one if the Key K already exists.

Building a table, in terms of streaming processing, means the application worker running the continuous query accumulates the data locally. To avoid exhausting the available memory (the data stream can easily exceed the available machine memory), a disk storage is involved via RocksDB. Advanced users can tweak the setting for the database. All the details about how can be found in the settings section.

Important

Use SELECT STREAM to create a stream. Use SELECT to create a table.