Streaming-based engine

Important

Always escape the topic name with backticks (`). If you do not and the topic name contains non-alphanumeric characters, parsing will fail.
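For example, a topic whose name contains non-alphanumeric characters (the topic names below are illustrative) would be referenced like this:

INSERT INTO target_topic
SELECT *
FROM  `sensor-data.v1`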

With growing data volume, velocity, and variety, it is essential for enterprises to manage and process data as soon as possible. These days, a company's competitive advantage comes from analyzing, understanding, and responding to its data. Alongside accuracy, time is vital. One important aspect of data to keep in mind is that it degrades with time. Knowing that last week there were N incidents of fraud, or that N IoT sensors failed, provides knowledge from which you can learn how to react to such problems in the future. It would be much better, though, to gain those insights at the time the events are occurring, so that someone can react and manage the situation immediately.

Databases and batch processing technologies have been the traditional approach to analyzing data. While this approach is still well suited to many applications, it is not the best choice when a real-time response to data is required. The technology designed to handle these requirements is known as stream processing.

Important

Lenses SQL for streaming enables real-time data processing using only SQL.

Like standard SQL, SQL for streaming gives the user the means to manipulate streaming data declaratively, without having to write custom code. Here are some of the advantages of streaming SQL:

  • Easy to understand and learn for people who are already familiar with SQL. Hiding some of the complexity of stream processing behind the simplicity of SQL allows more than just software developers to engage in processing data in real time.
  • Expressive, short, and with a faster deployment cycle.
  • It covers around 80% of the problem space. For some scenarios, there is no alternative to writing code (Java, C++, Go).

Lenses SQL streaming leverages the Apache Kafka Streams API. At runtime, the SQL you write is materialized into a Kafka Streams flow.

Here is a continuous query example, processing records as soon as they arrive:

INSERT INTO hot_sensors
SELECT
      ip
    , lat
    , long
    , (temp * 1.8 + 32) AS metric
FROM  sensors
WHERE temp > 30

Streaming Syntax

Unlike SQL, there is no standard streaming SQL syntax. There are many flavors, which follow SQL but with variations. Here is what the Lenses streaming SQL syntax looks like:

[ SET autocreate = true;]
[ SET partitions = 1;]
[ SET replication = 2;]
[ SET `decimal.scale` = 18;]
[ SET `decimal.precision` = 38;]
[ SET `ANY KAFKA STREAMS CONFIG. See Kafka documentation StreamsConfig, ConsumerConfig and ProducerConfig` = '';]
[ SET `topic.[ANY KAFKA Log CONFIG. See LogConfig]` = '';]
[ SET `rocksdb.[RocksDB specific configurations. See the section below on RocksDB]` = '';]

INSERT INTO _TARGET_TOPIC_
[WITH
   _ID_NAME_ AS (SELECT [STREAM] ... FROM _TOPIC_ ...),
   _ID_NAME_ AS (SELECT [STREAM] ... FROM _TOPIC_ ...)
]
SELECT select_expr [, select_expr ...]
FROM _ID_NAME_ INNER JOIN _OTHER_ID_NAME_ ON join_expr
[WHERE condition_expr]
[GROUP BY group_by_expr]
[HAVING having_expr]
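As a concrete instance of this skeleton, here is the earlier sensors query extended with a few of the optional SET statements controlling the target topic (a sketch; the settings shown are illustrative):

SET autocreate = true;
SET partitions = 2;
SET replication = 2;

INSERT INTO hot_sensors
SELECT
      ip
    , lat
    , long
    , (temp * 1.8 + 32) AS metric
FROM  sensors
WHERE temp > 30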

If you are not familiar with the Apache Kafka stream processing API, please follow the Kafka documentation.

Important

Streaming SQL is not your typical RDBMS SQL. Core concepts around stream processing with Apache Kafka, such as the Table/Stream duality and the implications of creating a Table versus a Stream instance, need to be understood first.
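To illustrate the duality, using the `SELECT [STREAM]` form from the syntax above (a sketch: the topic names are illustrative, and the behavior described is the usual Kafka Streams table/stream semantics):

-- Table semantics: the topic is read as a changelog, so a new
-- record for an existing key is an update to that key.
_ID_NAME_ AS (SELECT * FROM customers)

-- Stream semantics: every record is an independent, append-only
-- event; nothing is overwritten.
_ID_NAME_ AS (SELECT STREAM * FROM payments)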

What you get

SQL is a language for querying structured data. It provides a set of operators which allow the user to perform projections, filters, joins, and grouping. These operators can be combined to create powerful queries. There is a great similarity with standard SQL; the slight differences appear when doing joins and aggregations (grouping). Given these operators, a person can handle scenarios where data:

  • needs to be transformed and filtered. Imagine a stream of records received from the sensors of your farming machines, where records must be handled based on the sensor type and specific measurements.

    Or maybe the incoming data needs to be reshaped to match an expected structure. This is achieved via the projection and filter operators.

  • needs to be aggregated based on grouping criteria. From simply counting the current users on a website to computing the amount transacted per hour, all of this is possible via Lenses SQL (see the sketch after this list).

  • needs to be enriched by combining two or more streams of data. For a payment system, it is a common scenario to have a continuous stream of transactions which is enriched with customer data. This means the transaction table has to be joined with the customer table.
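As an example of the aggregation case, a grouped count might look like the following sketch (the topic and field names are illustrative):

INSERT INTO visitors_per_page
SELECT
      page
    , COUNT(*) AS visitors
FROM  website_visits
GROUP BY page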

It is also possible to combine all the operators at once, which is likely to be a requirement when delivering continuous SQL for complex scenarios. Taking the payment system as an example, it is possible to filter the transactions for a specific region, join them with the customer table, and aggregate over a time window of 1 hour.
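A sketch of such a combined query follows. The topics, fields, and join condition are illustrative, and the hourly window is omitted because the exact windowing syntax is specific to the engine; consult the Lenses documentation for the precise form:

INSERT INTO region_payment_totals
WITH
    customersTable AS (SELECT * FROM customers),
    paymentsStream AS (SELECT STREAM * FROM payments)
SELECT
      customersTable.region
    , SUM(paymentsStream.amount) AS total_amount
FROM paymentsStream INNER JOIN customersTable
     ON paymentsStream.customer_id = customersTable.id
WHERE customersTable.region = 'EMEA'
GROUP BY customersTable.region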