SQL Processors

Data formats

Changing data formats

Tutorial on how to change the format of data in a Kafka topic from JSON to AVRO with Lenses SQL Processors.

In this example, we will show how to create an AVRO topic from an existing JSON topic.

Requirements

For this to work, Lenses has to know what the source topic schema is. Lenses can do this in one of three ways:

  • through direct user action where the schema is manually set

  • through inference; Lenses will try to infer the schema of a topic by looking at the topic data

  • and lastly, if the topic is created through Lenses, the schema will be automatically set

Creating the JSON data

With the following SQL we can create our initial JSON topic:

to which we can add data using:

We can quickly verify that the format of our newly created topic is JSON for both key and value by searching for car_speed_events_json in the Explore view:

Creating the AVRO topic

To create a new topic with format AVRO, we can create a processor that will copy the data from our original topic to a new topic changing the format in the process.

To do this, we start by going to “SQL Processors”, clicking “New SQL Processor”, and defining our processor with the following code:

Notice the addition of STORE KEY AS AVRO VALUE AS AVRO. This statement tells our processor which format we want each facet (key or value) to be stored as.

Hitting “Create New Processor” will start the new processor.

We can see the events were added, and from now on Lenses will keep pushing any new events from car_speed_events_json into car_speed_events_avro.

We can also verify that the topic format of our new topic is also AVRO for both the key and value facets:

CREATE TABLE car_speed_events_json(
    _key.plate string
    , speedMph int
    , sensor.id string
    , sensor.lat double
    , sensor.long double
    , event_time long
)
FORMAT (json, json);
INSERT INTO car_speed_events_json (
    _key.plate
    , speedMph
    , sensor.id 
    , sensor.lat
    , sensor.long 
    , event_time 
) VALUES
("20-GH-38", 50, "sensor_1", 45.1241, 1.177, 1591970345),
("20-VL-12", 30, "sensor_1", 45.1241, 1.177, 1591970345),
("20-JH-98", 90, "sensor_1", 45.1241, 1.177, 1591970345);
SET defaults.topic.autocreate=true;

INSERT INTO car_speed_events_avro
STORE 
    KEY AS AVRO 
    VALUE AS AVRO
SELECT STREAM *
FROM car_speed_events_json;

Filtering & Joins

Complex types

Aggregations

Rekeying data

This page describes a tutorial to rekey data in a Kafka topic with Lenses SQL Processors.

Sometimes you have a topic that is almost exactly what you need, except that the key of the record requires a bit of massaging.

In Lenses SQL you can use the special SELECT ... as _key syntax to quickly re-key your data.

In our example, we have a topic containing events coming from temperature sensors.

Each record contains the sensor’s measured temperature, the time of the measurement, and the id of the sensor. The key is a unique string (for example, a UUID) that the upstream system assigns to the event.

You can replicate the example, by creating a topic in SQL Studio:

CREATE TABLE temperature_events(
    sensor_id string
    , temperature int
    , event_time long
)
FORMAT (string, avro);

We can also insert some example data to do our experiments:

You can explore the topic in Lenses to check the content and the shape of what you just inserted.

Let’s say that what you need is that same stream of events, but the record key should be the sensor_id instead of the UUID.

With the special SELECT ... as _key syntax, a few lines are enough to define our new re-keying processor:
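A minimal sketch of such a processor (assuming the output topic events_by_sensor is auto-created) could look like this:

SET defaults.topic.autocreate=true;

INSERT INTO events_by_sensor
SELECT STREAM sensor_id AS _key
FROM temperature_events;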

The query above will take the sensor_id from the value of the record and put it as the new key. All the value fields will remain untouched:

Maybe the sensor_id is not enough, and for some reason, you also need the hour of the measurement in the key. In this case, the key will become a composite object with two fields: the sensor_id and the event_hour:
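A sketch of such a processor, building a composite key from the sensor_id and the hour of the event, might look like the following:

SET defaults.topic.autocreate=true;

INSERT INTO events_by_sensor_and_hour
SELECT STREAM
    temperature_events.sensor_id AS _key.sensor_id
    , temperature_events.event_time / 3600 AS _key.event_hour
FROM temperature_events;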

As you can see, you can build composite objects in Lenses with ease, just by listing all the structure’s fields one after the other.

In the last example, the _key output storage format will be inferred automatically by the system as JSON. If you need more control, you can use the STORE AS clause before the SELECT.

The following example will create a topic like the previous one, but where the keys are stored as AVRO:
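A sketch of that processor, with the key stored as AVRO, could be:

SET defaults.topic.autocreate=true;

INSERT INTO events_by_sensor_and_hour_avro_key
STORE KEY AS AVRO
SELECT STREAM
    temperature_events.sensor_id AS _key.sensor_id
    , FLOOR(temperature_events.event_time / 3600) AS _key.event_hour
FROM temperature_events;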

Happy re-keying!

Controlling event time

This page describes how to control event time for data in your Kafka topics with Lenses SQL Processors.

Every message in Kafka comes with a timestamp, and Lenses Engine Streaming mode uses that by default when doing time-dependent operations, like aggregations and joins.

Sometimes though that timestamp is not exactly what you need, and you would like to use a field in the record value or key as the new timestamp.

In Lenses SQL you can use the special EVENTTIME BY ... syntax to control record timestamps.

Setting up our example

In our toy example, we have a simple topic where electricity meter readings events are collected:

We can also insert some example data to do our experiments:

If you query the events, you can see that Kafka sets a timestamp for each record. That timestamp is, in our case, the time when the record was inserted. As you can see, it is totally unrelated to the event_time field we have in the payload.

Computing moving averages

We would like to transform our original stream of events, aggregating events with a hopping window of 10s width and an increment of 5s, computing the average for each window.

You can create a new processor that streams those averages, using the special WINDOW BY ... syntax:
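A minimal sketch of such a processor might look like this:

SET defaults.topic.autocreate=true;

INSERT INTO electricity_events_avg_wrong
SELECT STREAM 
    customer_id
    , AVG(KW) AS KW
FROM electricity_events
WINDOW BY HOP 10s,5s
GROUP BY customer_id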

For customer 1, we have three events in input, with a 5s delay between them, so we expect four output events for that customer, since 4 is the number of hopping windows involved.

But checking the emitted records, we see that only two are produced.

This is because, by default, windowing operations work on the record timestamp, and in our case all the timestamps are pretty much the same and coincide with the time the records were inserted.

Fortunately we can change this behavior using the special EVENTTIME BY ... syntax, specifying an expression to be used as the timestamp:
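A sketch of the corrected processor, using event_time as the record timestamp, might be:

SET defaults.topic.autocreate=true;

INSERT INTO electricity_events_avg
SELECT STREAM 
    customer_id
    , AVG(KW) AS KW
FROM electricity_events
EVENTTIME BY event_time
WINDOW BY HOP 10s,5s
GROUP BY customer_id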

As you can see, the results have been windowed using event_time as the timestamp:

Unwrapping complex types

This page describes a tutorial to unwrap a complex data type in a Kafka topic using Lenses SQL Processors.

In this example, we will show how Lenses can be used to transform complex data types into simple primitive ones.

Setting up

We start this tutorial by creating a topic which will hold information regarding visits to our website:

CREATE TABLE lenses_monitoring(
   _key.landing_page string
  , _key.user string
  , time_spent_s int
)
FORMAT(avro, avro);

Firstly we’ll add some data to our newly created topic:

INSERT INTO lenses_monitoring(
    _key.landing_page
    , _key.user
    , time_spent_s
) VALUES
("homepage", "anon_21", 30),
("why-lenses", "anon_32", 45),
("use-cases", "anon_56", 12),
("customers", "anon_36", 12),
("use-cases", "anon_126", 12);          

Unwrapping the data

For example, let’s say we’re interested in sending this data to a service that analyses the time spent on a page and how it changes over time.

This system has a caveat though: it only accepts data where keys are strings and values are integers.

Rather than having to reimplement our analysis system, we can create a SQL Processor that will continuously send data to a new topic in a format the target system can work with:
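A minimal sketch of such a processor, writing to an analysis_topic output, could be:

SET defaults.topic.autocreate=true;

INSERT INTO analysis_topic 
SELECT STREAM 
    _key.landing_page AS _key
    , time_spent_s AS _value
FROM lenses_monitoring;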

Notice the addition of the as _key and as _value aliases; these tell Lenses to “unwrap” the values, effectively making Lenses write them as primitive types (string and integer respectively) instead of (in this particular case) Avro objects.

Lenses will also automatically infer the format of each topic facet; in this case it sets them to STRING and INT respectively.

--

INSERT INTO temperature_events(
    _key
    , sensor_id
    , temperature
    , event_time
) VALUES
("9E6F72C8-6C9F-4EC2-A7B2-C8AC5A95A26C", "sensor-1", 12, 1591970345),
("0958C1E2-804F-4110-B463-7BEA25DA680C", "sensor-2", 13, 1591970346),
("552E9A77-D23D-4EC7-96AB-7F106637ADBC", "sensor-3", 28, 1591970347),
("00F44EDD-1BF1-4AE8-A7FE-622F720CA3BC", "sensor-1", 13, 1591970348),
("A7F19576-6EFA-40A0-91C3-D8999A41F453", "sensor-1", 12, 1591970348),
("5273803E-7F34-44F8-9A3B-B6CB9D24C1F9", "sensor-2", 31, 1591970349),
("8401CD82-ABFF-4D1B-A564-986DC971F913", "sensor-2", 30, 1591970349),
("F1CF77A0-6CFB-472F-B368-28F011F46C64", "sensor-1", 13, 1591970349);
CREATE TABLE electricity_events(
    KW double
    , customer_id int
    , event_time long
)
FORMAT (string, avro);
SET defaults.topic.autocreate=true;
INSERT INTO analysis_topic 
SELECT STREAM 
    _key.landing_page as _key
    , time_spent_s as _value
FROM lenses_monitoring;
SET defaults.topic.autocreate=true;

INSERT INTO events_by_sensor
SELECT STREAM sensor_id as _key
FROM temperature_events;
SET defaults.topic.autocreate=true;

INSERT INTO events_by_sensor_and_hour
SELECT STREAM
    temperature_events.sensor_id as _key.sensor_id
    , temperature_events.event_time / 3600 as _key.event_hour
FROM temperature_events;
SET defaults.topic.autocreate=true;

INSERT INTO events_by_sensor_and_hour_avro_key
STORE KEY AS AVRO
SELECT STREAM
    temperature_events.sensor_id as _key.sensor_id
    , FLOOR(temperature_events.event_time / 3600) as _key.event_hour
FROM temperature_events;
INSERT INTO electricity_events(
    KW
    , customer_id
    , event_time
) VALUES
(1.0, 1, 1592848400000),
(2.0, 2, 1592848400000),
(1.5, 3, 1592848400000),
(2.0, 1, 1592848405000),
(1.0, 2, 1592848405000),
(2.5, 3, 1592848405000),
(3.0, 1, 1592848410000),
(0.5, 2, 1592848410000),
(4.0, 3, 1592848405000)
;
[
{"value":{"KW":1,"customer_id":1,"event_time":1592848400000},"metadata":{...,"timestamp":1594041840812,...}},
{"value":{"KW":2,"customer_id":2,"event_time":1592848400000},"metadata":{...,"timestamp":1594041840821,...}},
{"value":{"KW":1.5,"customer_id":3,"event_time":1592848400000},"metadata":{...,"timestamp":1594041840828,...}},
{"value":{"KW":2,"customer_id":1,"event_time":1592848405000},"metadata":{...,"timestamp":1594041840834,...}},
{"value":{"KW":1,"customer_id":2,"event_time":1592848405000},"metadata":{...,"timestamp":1594041840842,...}},
{"value":{"KW":2.5,"customer_id":3,"event_time":1592848405000},"metadata":{...,"timestamp":1594041840848,...}},
{"value":{"KW":3,"customer_id":1,"event_time":1592848410000},"metadata":{...,"timestamp":1594041840855,...}},
{"value":{"KW":0.5,"customer_id":2,"event_time":1592848410000},"metadata":{...,"timestamp":1594041840862,...}},
{"value":{"KW":4,"customer_id":3,"event_time":1592848405000},"metadata":{...,"timestamp":1594041840868,...}}
]
SET defaults.topic.autocreate=true;

INSERT INTO electricity_events_avg_wrong
SELECT STREAM 
    customer_id
    , AVG(KW) as KW
FROM electricity_events
WINDOW BY HOP 10s,5s
GROUP BY customer_id
SET defaults.topic.autocreate=true;

INSERT INTO electricity_events_avg
SELECT STREAM 
    customer_id
    , AVG(KW) as KW
FROM electricity_events
EVENTTIME BY event_time
WINDOW BY HOP 10s,5s
GROUP BY customer_id
[
{"key":{"value":1,"window":{"start":1592848395000,"end":null}},"value":{"customer_id":1,"KW":1}, ...},
{"key":{"value":1,"window":{"start":1592848400000,"end":null}},"value":{"customer_id":1,"KW":1.5}, ...},
{"key":{"value":1,"window":{"start":1592848405000,"end":null}},"value":{"customer_id":1,"KW":2.5}, ...},
{"key":{"value":1,"window":{"start":1592848410000,"end":null}},"value":{"customer_id":1,"KW":3}, ...}
]

Controlling AVRO record names and namespaces

This page describes a tutorial to control AVRO record names and namespaces with Lenses SQL Processors.

When writing output as AVRO, Lenses creates schemas for you, automatically generating AVRO record names.

In this tutorial we will learn how to override the default record naming strategy.

In Lenses SQL you can use a SET statement to control the record and namespace name generated for the AVRO schema.

Setting up our input topic

We are going to create and populate a topic that we will later use in a couple of SQL Processors.

In SQL Studio, create the topic running the following query:

For the purposes of our tutorial, it is enough to insert a single record:

Create a simple SQL Processor

We are now going to create a processor that will show the default behavior of AVRO record naming in Lenses.

The processor does not do much, it just reshapes the fields of the original topic, putting some of them in a nested field:

We then start the processor. Lenses will create the new topic mytopic_2, and a new schema will be created in the Schema Registry as soon as the first (and only) record is processed.

If we inspect the value schema of mytopic_2, we see that this is the one generated:

As we can see, each record type has a name (it is mandatory in AVRO), and Lenses has generated those names automatically for us (record, record0, record1 etc.).

Set the record name and the namespace of the value schema

We are now going to see how to override that default behavior.

Let’s create and start the new processor with the following SQL:

Notice how we added the new SET statements to the query:
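The two SET statements in question are the following:

SET defaults.topic.value.avro.record="myRecordName";
SET defaults.topic.value.avro.namespace="myNamespace";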

These settings are telling Lenses to set the root record name and namespace to the values specified.

If we now check the value schema for mytopic_3 we get:

As we can see, the root record element now has the name myRecordName and the namespace myNamespace.

Notice how the settings did not affect nested records.

Set the record name and the namespace of the key schema

If the key of the generated topic has AVRO format as well, you can use the following analogous settings to control the key record name and namespace:
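For example:

SET defaults.topic.key.avro.record="myRecordName";
SET defaults.topic.key.avro.namespace="myNamespace";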

More control on the topic affected by the setting

A setting like the one we used before for the value schema:

will affect all the topics used by the processor.

If you want instead to target a single topic, you can use the topic-specific version:
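For instance, reusing mytopic_3 from this tutorial:

SET topic.mytopic_3.value.avro.record="myRecordName";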

The setting above will override the record name only for topic mytopic_3. Other topics will not be affected and will keep using the default naming strategy.

CREATE TABLE mytopic(a string, b string, c string) FORMAT (STRING, AVRO);
INSERT INTO mytopic(a, b, c) VALUES ("a", "b", "c");
SET defaults.topic.autocreate=true;

INSERT INTO mytopic_2
SELECT STREAM a as x.a, b as x.y.b, c as x.y.c
FROM mytopic
{
  "type": "record",
  "name": "record1",
  "fields": [
    {
      "name": "x",
      "type": {
        "type": "record",
        "name": "record0",
        "fields": [
          {
            "name": "a",
            "type": "string"
          },
          {
            "name": "y",
            "type": {
              "type": "record",
              "name": "record",
              "fields": [
                {
                  "name": "b",
                  "type": "string"
                },
                {
                  "name": "c",
                  "type": "string"
                }
              ]
            }
          }
        ]
      }
    }
  ]
}
SET defaults.topic.autocreate=true;
SET defaults.topic.value.avro.record="myRecordName";
SET defaults.topic.value.avro.namespace="myNamespace";

INSERT INTO mytopic_3
SELECT STREAM a as x.a, b as x.y.b, c as x.y.c
FROM mytopic
SET defaults.topic.value.avro.record="myRecordName";
SET defaults.topic.value.avro.namespace="myNamespace";
{
  "type": "record",
  "name": "myRecordName",
  "namespace": "myNamespace",
  "fields": [
    {
      "name": "x",
      "type": {
        "type": "record",
        "name": "record0",
        "fields": [
          {
            "name": "a",
            "type": "string"
          },
          {
            "name": "y",
            "type": {
              "type": "record",
              "name": "record",
              "fields": [
                {
                  "name": "b",
                  "type": "string"
                },
                {
                  "name": "c",
                  "type": "string"
                }
              ]
            }
          }
        ]
      }
    }
  ]
}
SET defaults.topic.key.avro.record="myRecordName";
SET defaults.topic.key.avro.namespace="myNamespace";
SET defaults.topic.value.avro.record="myRecordName"
SET topic.mytopic_3.value.avro.record="myRecordName"

Enriching data streams

This page describes a tutorial to enrich a Kafka topic using Lenses SQL Processors.

In this article, we will be enriching customer call events with their customer details.

Enriching data streams with extra information by performing an efficient lookup is a common scenario for streaming SQL on Apache Kafka.

Topics involved:

  • customer_details messages contain information about the customer

  • customer_call_details messages contain information about calls

Streaming SQL enrichment on Apache Kafka

Testing data

To simplify our testing process and manage to run the above example in less than 60 seconds, we will be using SQL to create and populate the two Apache Kafka topics:

CREATE TOPIC customer_details

POPULATE TOPIC customer_details

CREATE TOPIC customer_call_details

POPULATE TOPIC customer_call_details

Validate results

Changing the shape of data

This page describes a tutorial to change the shape (fields) of data in a Kafka topic using Lenses SQL Processors.

In this tutorial, we will see how to use Lenses SQL to alter the shape of your records.

In Lenses SQL you can quickly reshape your data with a simple SELECT and some built-in functions.

We will learn how to:

  • put value fields into the key

  • lift key fields into the value

  • call functions to transform your data

  • build nested structures for your keys/values

  • unwrap singleton objects into primitive types

Setting up our example

In our example, we are getting data from speed sensors from a speed car circuit.

The upstream system registers speed measurement events as records in a Kafka topic.

An example of such an event is the following:

We can replicate such a structure running the following query in SQL Studio:

Each event is keyed by a unique string generated by the upstream system.

We can again use SQL Studio to insert some data to play with:

Simple projections

In this section, we are only interested in the speed of single cars, and we do not care about all the other fields.

We want to use the car id, which now is part of the record Value, to become the new key (using the special as _key syntax). We also want the car speed as the new record Value.

To achieve that we can create a new SQL Processor using some simple projections.
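A sketch of such a processor, writing to an only_car_speeds topic, might look like this:

SET defaults.topic.autocreate=true;

INSERT INTO only_car_speeds
SELECT STREAM 
    car_id AS _key
    , speedMph
FROM car_speed_events;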

Checking the records emitted by the processor, we see that the shape of the records is:

We want to avoid that intermediate wrapping of speedMph inside an object. To do that we can tell Lenses to unwrap the value with the special as _value syntax, saving some bytes and CPU cycles:
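A sketch of the unwrapped version could be:

SET defaults.topic.autocreate=true;

INSERT INTO only_car_speeds_unwrapped
SELECT STREAM 
    car_id AS _key
    , speedMph AS _value
FROM car_speed_events;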

Now the shape of the records is what we had in mind:

Using built-in functions

This time we want to do some more complex manipulation. We want to convert the speed from Mph to Kmph, and we would also want to build a nice string describing the event.

An example of an output record would be:

In this case, we are using CONCATENATE to concatenate multiple strings, CAST to convert an expression to another type, and *, the usual multiplication operator.
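A sketch of a processor doing this transformation might look like the following:

SET defaults.topic.autocreate=true;

INSERT INTO car_speeds_kmph
SELECT STREAM
    speedMph * 1.60934 AS speedKmph
    , CONCATENATE(
        'Car ',
        car_id,
        ' was running at ',
        CAST(speedMph AS STRING),
        'Mph') AS description
FROM car_speed_events;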

If we check the resulting records, we can see that we obtained the shape we were looking for. Please note that the keys have been left untouched:

Composite objects

In this last example, we will show how to create composite keys and values in our projections.

We want both the sensor id and the event_time as the record Key. For the record Value, we want the car_id and the speed, expressed both as Mph and Kmph.

Lenses SQL allows us to use nested aliases to build nested structures: you just have to put some dots in your aliases.
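A sketch of a processor building these composite structures might be:

SET defaults.topic.autocreate=true;

INSERT INTO car_speeds_by_sensor_and_time
SELECT STREAM
    sensor.id AS _key.sensor_id
    , event_time AS _key.event_time
    , car_id
    , speedMph AS speed.mph
    , speedMph * 1.60934 AS speed.kmph
FROM car_speed_events;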

The resulting shape of the record is what we were aiming for:

Happy re-shaping!

Joining streams of data

This page describes a tutorial joining Kafka topics with Lenses SQL Processors.

Imagine you are the next Amazon, and you want to track the orders and shipment events to work out which orders have been shipped and how long it took. In this case, there will be two data streams, one for each event type, and the resulting stream will answer the questions above.

Enriching two streams of data requires a sliding window join. The events are said to be “close” to each other if the difference between their timestamps is within the time window specified.

Topics involved:

  • orders messages contain information about an order placed by a customer

  • shipments messages contain information about the shipment

Streaming SQL enrichment on Apache Kafka

The query combines the data from orders and shipments if the orders are processed within 24 hours. The resulting records contain the order and shipment identifiers, and the time elapsed from when the order was registered to when it was shipped.
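A sketch of such a streaming join might look like this:

SET defaults.topic.autocreate=true;
SET auto.offset.reset = 'earliest';

WITH o as (
 SELECT STREAM *
 FROM orders
 EVENTTIME BY orderTimestamp
);

WITH s as (
 SELECT STREAM *
 FROM shipments
 EVENTTIME BY timestamp
);

INSERT INTO orders_and_shipment
SELECT STREAM
    o._key AS orderId 
    , s._key AS shipmentId
    , DATE_TO_STR(TO_TIMESTAMP(s.timestamp - o.orderTimestamp), 'HH:mm:ss') AS time_difference
FROM  o INNER JOIN s
    ON o._key = s.orderId
WITHIN 24h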

Testing data

To simplify our testing process and manage to run the above example in less than 60 seconds, we will be using SQL to create and populate the two Apache Kafka topics:

CREATE TOPIC orders

POPULATE TOPIC orders

CREATE TOPIC shipments

POPULATE TOPIC shipments

The output seen in the next screenshot shows two records. For the order with identifier o2, there is no shipment entry because it has not been processed. For the order with identifier o3, the shipment happened after one day.

Validate results

Let’s switch to the Snapshot engine by navigating to the SQL Studio menu item. With the entries in both topics, we can write the following query to see which data is joinable without the window interval:
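A Snapshot query along these lines could be:

SELECT 
    o._key AS order 
    , DATE_TO_STR(o.orderTimestamp, 'yyyy-MM-dd HH:mm:ss') AS order_time
    , s._key AS shipment 
    , DATE_TO_STR(s.timestamp, 'yyyy-MM-dd HH:mm:ss') AS shipment_time
FROM orders o INNER JOIN shipments s
    ON o._key = s.orderId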

These are the results for the non-streaming (i.e., Snapshot) query:

Running the query returned three records. But you can see the order o3 was processed two days after it was placed. Let’s apply the sliding window restriction for the Snapshot query by adding a filter to only match those records having their timestamp difference within a day.
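A sketch of the filtered Snapshot query might be:

SELECT 
    o._key AS order 
    , DATE_TO_STR(o.orderTimestamp, 'yyyy-MM-dd HH:mm:ss') AS order_time
    , s._key AS shipment 
    , DATE_TO_STR(s.timestamp, 'yyyy-MM-dd HH:mm:ss') AS shipment_time
FROM orders o INNER JOIN shipments s
    ON o._key = s.orderId
WHERE to_timestamp(s.timestamp) - '1d' <= to_timestamp(o.orderTimestamp)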

Now the result matches the one from the Streaming query.

Conclusion

In this tutorial you learned how to join two Streams together using a sliding window. You achieved all the above using the Lenses SQL engine.

Good luck and happy streaming!

Filtering data

This page describes a tutorial to filter records in a Kafka topic with Lenses SQL Processors.

Filtering messages and copying them to a topic can be achieved using the WHERE clause.

Setting up our example

In our example, we have a topic where our application registers bank transactions.

We have a topic called payments where records have this shape:

We can replicate such a structure running the following query in SQL Studio:

Time window aggregations

This page describes a tutorial to perform time windowed aggregations on Kafka topic data with Lenses SQL Processors.

In this tutorial we will see how data in a Stream can be aggregated continuously using GROUP BY over a time window, and how the results are emitted downstream.

In Lenses SQL you can read your data as a STREAM and quickly aggregate over it using the GROUP BY clause and SELECT STREAM

Content of result topic
Content of SQL snapshot result topic
Each event has a unique string key generated by the upstream system.

We can again use SQL Studio to insert some data to play with:

Find big transactions

Let’s assume we need to detect significant transactions that will be then fed into our anti-fraud system.

We want to copy those transactions into a new topic, maintaining the content of the records as it is.

For our first example, we will use a simple predicate to filter transactions with an amount larger than 5000, regardless of the currency.

Lenses SQL supports all the common comparison operators to compare values, so for our goal it is enough to use a WHERE statement with a >= condition:
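A minimal sketch of such a processor could be:

SET defaults.topic.autocreate=true;

INSERT INTO big_payments
SELECT STREAM *
FROM payments
WHERE amount >= 5000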

Checking the records emitted by the processor, we see that we got the transactions we were looking for.

Because of the * projection, the content of the records has not changed.

Not all currencies are the same, so we would like to add a specific threshold for each currency. As a first cut, we combine multiple conditions with ANDs and ORs:
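For example:

SET defaults.topic.autocreate=true;

INSERT INTO big_eur_usd_payments
SELECT STREAM *
FROM payments
WHERE
    (amount >= 5000 AND currency = 'EUR') OR
    (amount >= 5500 AND currency = 'USD')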

As an improvement, we want to capture the threshold for each currency in a single expression. We will use a CASE statement for that:
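A sketch using CASE might look like this:

SET defaults.topic.autocreate=true;

INSERT INTO big_payments_case
SELECT STREAM *
FROM payments
WHERE
    amount >= (CASE 
        WHEN currency = 'EUR' THEN 5000
        WHEN currency = 'USD' THEN 5500
        WHEN currency = 'GBP' THEN 4500
        ELSE 5000
    END)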

getting the results:

Filtering by date

In this section, we will find all the transactions that happened during the (UTC) night. To do that we can use one of our many date and time functions.

You will also see how to use a CAST expression to convert from one type to another.
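A sketch of such a processor, keeping only payments made between midnight and 5am UTC, could be:

SET defaults.topic.autocreate=true;

INSERT INTO night_payments
SELECT STREAM *
FROM payments
WHERE
    CAST(HOUR(time) AS INT) >= 0 AND
    CAST(HOUR(time) AS INT) <= 5 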

Checking the output, we can see that only one transaction satisfied our predicate:

Sampling with random functions

Let’s imagine that we have to build some intelligence around all the payments we process, but we do not have the capacity and the need to process all of them.

We decided then to build a reduced copy of the payments topic, with only about 1% of the original records.

To do that we are going to use our RANDINT function:
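A sketch of the sampling processor might be:

SET defaults.topic.autocreate=true;

INSERT INTO payments_sample
SELECT STREAM *
FROM payments
WHERE CAST(ABS(RANDINT()) AS DOUBLE) / 2147483647 < 0.01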

RANDINT generates a random integer; we take its absolute value to make sure it is positive, and we then normalise the result by dividing it by the maximum possible integer, getting an (almost) uniform sample of numbers between 0 and 1.

We have to CAST to double on the way; otherwise, the division would be between integers, and the result would always be 0.

SET defaults.topic.autocreate=true;

INSERT INTO customers_callInfo
SELECT STREAM
    calls._value AS call 
    , customer._value AS customer
FROM  customer_call_details AS calls
        INNER JOIN  (SELECT TABLE * FROM customer_details) AS customer
            ON customer._key.customer.id = calls._key.customer.id
CREATE TABLE customer_details(
        _key.customer.typeID string
        , _key.customer.id string
        , customer.name string
        , customer.middleName string null
        , customer.surname string
        , customer.nationality string
        , customer.passportNumber string
        , customer.phoneNumber string
        , customer.email string null
        , customer.address string
        , customer.country string
        , customer.driverLicense string null
        , package.typeID string
        , package.description string
        , active boolean
)
FORMAT(avro, avro)
PROPERTIES(partitions=5, replication=1, compacted=true);
INSERT INTO customer_details(
        _key.customer.typeID
        , _key.customer.id
        , customer.name
        , customer.middleName
        , customer.surname
        , customer.nationality
        , customer.passportNumber
        , customer.phoneNumber
        , customer.email
        , customer.address
        , customer.country
        , customer.driverLicense
        , package.typeID
        , package.description
        , active
) VALUES
("userType1","5162258362252394","April","-","Paschall","GBR","APGBR...","1999153354","[email protected]","-","GBR","-","TypeA","Desc.",true),
("internal","5290441401157247","Charisse","-","Daggett","USA","CDUSA...","6418577217","[email protected]","-","USA","-","TypeC","Desc.",true),
("internal","5397076989446422","Gibson","-","Chunn","USA","GCUSA...","8978860472","[email protected]","-","USA","-","TypeC","Desc.",true),
("partner","5248189647994492","Hector","-","Swinson","NOR","HSNOR...","8207437436","[email protected]","-","NOR","-","TypeA","Desc.",true),
("userType1","5196864976665762","Booth","-","Spiess","CAN","BSCAN...","6220504387","[email protected]","-","CAN","-","TypeA","Desc.",true),
("userType2","5423023313257503","Hitendra","-","Sibert","SWZ","HSSWZ...","6731834082","[email protected]","-","SWZ","-","TypeA","Desc.",true),
("userType2","5337899393425317","Larson","-","Asbell","SWE","LASWE...","2844252229","[email protected]","-","SWE","-","TypeA","Desc.",true),
("partner","5140590381876333","Zechariah","-","Schwarz","GER","ZSGER...","4936431929","[email protected]","-","GER","-","TypeB","Desc.",true),
("internal","5524874546065610","Shulamith","-","Earles","FRA","SEFRA...","2119087327","[email protected]","-","FRA","-","TypeC","Desc.",true),
("userType1","5204216758311612","Tangwyn","-","Gorden","GBR","TGGBR...","9172511192","[email protected]","-","GBR","-","TypeA","Desc.",true),
("userType1","5336077954566768","Miguel","-","Gonzales","ESP","MGESP...","5664871802","[email protected]","-","ESP","-","TypeA","Desc.",true),
("userType3","5125835811760048","Randie","-","Ritz","NOR","RRNOR...","3245795477","[email protected]","-","NOR","-","TypeA","Desc.",true),
("userType1","5317812241111538","Michelle","-","Fleur","FRA","MFFRA...","7708177986","[email protected]","-","FRA","-","TypeA","Desc.",true),
("userType1","5373595752176476","Thurborn","-","Asbell","GBR","TAGBR...","5927996719","[email protected]","-","GBR","-","TypeA","Desc.",true),
("userType3","5589753170506689","Noni","-","Gorden","AUT","NGAUT...","7288041910","[email protected]","-","AUT","-","TypeA","Desc.",true),
("userType2","5588152341005179","Vivian","-","Glowacki","POL","VGPOL...","9001088901","[email protected]","-","POL","-","TypeA","Desc.",true),
("partner","5390713494347532","Elward","-","Frady","USA","EFUSA...","2407143487","[email protected]","-","USA","-","TypeB","Desc.",true),
("userType1","5322449980897580","Severina","-","Bracken","AUT","SBAUT...","7552231346","[email protected]","-","AUT","-","TypeA","Desc.",true);
CREATE TABLE customer_call_details(
    _key.customer.typeID string
    , _key.customer.id string
    , callInfoCustomerID string
    , callInfoType string
    , callInfoDuration int
    , callInfoInit int)
FORMAT(avro, avro)
PROPERTIES(partitions=1, replication=1, compacted=false)
INSERT INTO customer_call_details(
    _key.customer.typeID
    , _key.customer.id
    , callInfoCustomerID
    , callInfoType
    , callInfoDuration
    , callInfoInit
) VALUES
("userType1", "5322449980897580","5322449980897580", "CallTypeA", 470, 0),
("internal", "5290441401157247","5290441401157247", "CallTypeC", 67, 0),
("partner", "5140590381876333","5140590381876333", "CallTypeB", 377, 0),
("internal", "5397076989446422","5397076989446422", "CallTypeC", 209, 0),
("userType2", "5337899393425317","5337899393425317", "CallTypeA", 209, 0),
("partner", "5140590381876333","5140590381876333", "CallTypeB", 887, 0),
("userType1", "5322449980897580","5322449980897580", "CallTypeA", 203, 0),
("partner", "5140590381876333","5140590381876333", "CallTypeB", 1698, 0),
("userType3", "5589753170506689","5589753170506689", "CallTypeA", 320, 1),
("internal", "5290441401157247","5290441401157247", "CallTypeC", 89, 0),
("partner", "5140590381876333","5140590381876333", "CallTypeB", 355, 0),
("internal", "5290441401157247","5290441401157247", "CallTypeC", 65, 0),
("userType2", "5337899393425317","5337899393425317", "CallTypeA", 43, 1),
("partner", "5390713494347532","5390713494347532", "CallTypeB", 530, 0),
("internal", "5397076989446422","5397076989446422", "CallTypeC", 270, 0),
("userType3", "5589753170506689","5589753170506689", "CallTypeA", 1633, 0),
("internal", "5290441401157247","5290441401157247", "CallTypeC", 110, 0),
("userType1", "5322449980897580","5322449980897580", "CallTypeA", 540, 0),
("internal", "5290441401157247","5290441401157247", "CallTypeC", 168, 0),
("userType3", "5589753170506689","5589753170506689", "CallTypeA", 1200, 0),
("internal", "5290441401157247","5290441401157247", "CallTypeC", 1200, 0),
("partner", "5390713494347532","5390713494347532", "CallTypeB", 22, 0),
("userType3", "5589753170506689","5589753170506689", "CallTypeA", 333, 1),
("internal", "5397076989446422","5397076989446422", "CallTypeC", 87, 0),
("partner", "5390713494347532","5390713494347532", "CallTypeB", 123, 0),
("userType2", "5337899393425317","5337899393425317", "CallTypeA", 182, 1),
("partner", "5140590381876333","5140590381876333", "CallTypeB", 844, 0),
("partner", "5390713494347532","5390713494347532", "CallTypeB", 56, 1),
("internal", "5397076989446422","5397076989446422", "CallTypeC", 36, 0),
("partner", "5140590381876333","5140590381876333", "CallTypeB", 794, 0),
("userType3", "5589753170506689","5589753170506689", "CallTypeA", 440, 0),
("internal", "5397076989446422","5397076989446422", "CallTypeC", 52, 0),
("userType1", "5322449980897580","5322449980897580", "CallTypeA", 770, 0),
("internal", "5397076989446422","5397076989446422", "CallTypeC", 627, 0),
("partner", "5140590381876333","5140590381876333", "CallTypeB", 555, 0),
("userType2", "5337899393425317","5337899393425317", "CallTypeA", 55, 1);
SELECT
    p.callInfoCustomerID AS customerID
    , p.callInfoType
    , p.callInfoInit
FROM customer_call_details AS p
        INNER JOIN customer_details AS c
            ON p._key.customer.id = c._key.customer.id
KEY: "9E6F72C8-6C9F-4EC2-A7B2-C8AC5A95A26C"
VALUE: {
  "car_id": "car_2",
  "speedMph": 84,
  "sensor": {
    "id": "sensor_1",
    "lat": 45,
    "long": 0
  },
  "event_time": 159230614
}
CREATE TABLE car_speed_events(
    car_id string
    , speedMph int
    , sensor.id string
    , sensor.lat double
    , sensor.long double
    , event_time long
)
FORMAT (string, avro);
INSERT INTO car_speed_events(
    _key
    , car_id
    , speedMph
    , sensor.id
    , sensor.lat
    , sensor.long
    , event_time
) VALUES
("9E6F72C8-6C9F-4EC2-A7B2-C8AC5A95A26C", "car-1", 50, "sensor_1", 45.1241, 1.177, 1591970345),
("0958C1E2-804F-4110-B463-7BEA25DA680C", "car-2", 54, "sensor_2", 45.122, 1.75, 1591970346),
("552E9A77-D23D-4EC7-96AB-7F106637ADBC", "car-3", 60, "sensor_1", 45.1241, 1.177, 1591970347),
("00F44EDD-1BF1-4AE8-A7FE-622F720CA3BC", "car-1", 55, "sensor_2", 45.122, 1.75, 1591970348),
("A7F19576-6EFA-40A0-91C3-D8999A41F453", "car-1", 56, "sensor_1", 45.1241, 1.177, 1591970348),
("5273803E-7F34-44F8-9A3B-B6CB9D24C1F9", "car-2", 58, "sensor_3", 45.123, 1.176, 1591970349),
("8401CD82-ABFF-4D1B-A564-986DC971F913", "car-2", 60, "sensor_1", 45.1241, 1.177, 1591970349),
("F1CF77A0-6CFB-472F-B368-28F011F46C64", "car-1", 53, "sensor_3", 45.123, 1.176, 1591970349);
SET defaults.topic.autocreate=true;

INSERT INTO only_car_speeds
SELECT STREAM 
    car_id AS _key
    , speedMph
FROM car_speed_events;
KEY: "car-1"
VALUE: { "speedMph": 50 }
SET defaults.topic.autocreate=true;

INSERT INTO only_car_speeds_unwrapped
SELECT STREAM 
    car_id AS _key
    , speedMph AS _value
FROM car_speed_events;
KEY: "car-1"
VALUE: 50
{
    "speedKmph": 160.93,
    "description": "Car car_1 was running at 100Mph "
}
SET defaults.topic.autocreate=true;

INSERT INTO car_speeds_kmph
SELECT STREAM
    speedMph * 1.60934 AS speedKmph
    , CONCATENATE(
        'Car ',
        car_id,
        ' was running at ',
        CAST(speedMph AS STRING),
        'Mph') AS description
FROM car_speed_events;
KEY: "9E6F72C8-6C9F-4EC2-A7B2-C8AC5A95A26C"
VALUE: {
    speedKmph:"80.467",
    description:"Car car-1 was running at 50Mph"
}
KEY: {
    "sensor_id": "sensor_1",
    "event_time": 1591970345
}
VALUE: {
    "car_id": "car_1",
    "speed": {
        "mph": 100,
        "kmph": 161
    } 
}
SET defaults.topic.autocreate=true;

INSERT INTO car_speeds_by_sensor_and_time
SELECT STREAM
    sensor.id AS _key.sensor_id
    , event_time AS _key.event_time
    , car_id
    , speedMph AS speed.mph
    , speedMph * 1.60934 AS speed.kmph
FROM car_speed_events;
KEY: {
    "sensor_id": "sensor_1",
    "event_time": 1591970345
}
VALUE: {
    "car_id": "car_1",
    "speed": {
        "mph": 100,
        "kmph": 160
    } 
}
SET defaults.topic.autocreate=true;
SET auto.offset.reset = 'earliest';

WITH o as (
 SELECT STREAM *
 FROM orders
 EVENTTIME BY orderTimestamp
);

WITH s as (
 SELECT STREAM *
 FROM shipments
 EVENTTIME BY timestamp
);

INSERT INTO orders_and_shipment
SELECT STREAM
    o._key AS orderId 
    , s._key AS shipmentId
    , DATE_TO_STR(TO_TIMESTAMP(s.timestamp - o.orderTimestamp), 'HH:mm:ss') AS time_difference
FROM  o INNER JOIN s
    ON o._key = s.orderId
WITHIN 24h
CREATE TABLE orders(
        _key string
        , customerId string
        , orderTimestamp long
        , amount decimal(18,2)
)
FORMAT(string, avro)
INSERT INTO orders(
        _key
        , customerId
        , orderTimestamp
        , amount
) VALUES
("o1","[email protected]",1596374427000,99.9),   --Sunday, 2 August 2020 13:20:27
("o2","[email protected]",1596453627000,240.01), --Monday, 3 August 2020 11:20:27
("o3","[email protected]",1596280827000,81.81),        --Saturday, 1 August 2020 11:20:27
("o4","[email protected]",1596453627000,300.00);       --Monday, 3 August 2020 11:20:27
CREATE TABLE shipments(
    _key string
    , orderId string
    , timestamp long)
FORMAT(string, avro)
INSERT INTO shipments(
    _key
    , orderId
    , `timestamp`
) VALUES
("s1", "o1", 1596445927000),   --Monday, 3 August 2020 09:12:07
("s2", "o3", 1596456112000),   --Monday, 3 August 2020 12:01:52
("s3", "o4", 1596460271000);   --Monday, 3 August 2020 13:11:11
SELECT 
    o._key AS order 
   , DATE_TO_STR(o.orderTimestamp, 'yyyy-MM-dd HH:mm:ss') AS order_time
    , s._key AS shipment 
   , DATE_TO_STR(s.timestamp, 'yyyy-MM-dd HH:mm:ss') AS shipment_time
FROM orders o INNER JOIN  shipments s
    ON o._key = s.orderId
SELECT 
    o._key AS order 
    , DATE_TO_STR(o.orderTimestamp, 'yyyy-MM-dd HH:mm:ss') AS order_time
    , s._key AS shipment 
    , DATE_TO_STR(s.timestamp, 'yyyy-MM-dd HH:mm:ss') AS shipment_time
FROM orders o INNER JOIN  shipments s
    ON o._key = s.orderId
WHERE to_timestamp(s.timestamp) - '1d' <= to_timestamp(o.orderTimestamp)

KEY: "6A461C60-02F3-4C01-94FB-092ECBDE0837"
VALUE: {
  "amount": "12.10",
  "currency": "EUR",
  "from_account": "xxx",
  "to_account": "yyy", 
  "time": 1591970345000
}
CREATE TABLE payments(
    amount double
    , currency string
    , from_account string
    , to_account string
    , time datetime
)
FORMAT (string, avro);
INSERT INTO payments(
    _key
    , amount
    , currency 
    , from_account
    , to_account 
    , time
)
VALUES
("6A461C60-02F3-4C01-94FB-092ECBDE0837", 10, "EUR", "account-1", "account-2", 1590970345000),
("E5DA60E8-F622-43B2-8A93-B958E01E8AB3", 100000, "EUR", "account-1", "account-3", 1591070346000),
("0516A309-FB2B-4F6D-A11F-3C06A5D64B68", 5300, "USD", "account-2", "account-3", 1591170347000),
("0871491A-C915-4163-9C4B-35DEA0373B41", 6500, "EUR", "account-3", "account-1", 1591270348000),
("2B557134-9314-4F96-A640-1BF90887D846", 300, "EUR", "account-1", "account-4", 1591370348000),
("F4EDAE35-45B4-4841-BAB7-6644E2BBC844", 3400, "EUR", "account-2", "account-1", 1591470349000),
("F51A912A-96E9-42B1-9AC4-42E923A0A6F8", 7500, "USD", "account-2", "account-3", 1591570352000),
("EC8A08F1-75F0-49C8-AA08-A5E57997D27A", 2500000, "USD", "account-1", "account-3", 1591670356000),
("9DDBACFF-D42B-4042-95AC-DCDD84F0AC32", 1000, "GBP", "account-2", "account-3", 1591870401000)
;
SET defaults.topic.autocreate=true;

INSERT INTO big_payments
SELECT STREAM *
FROM payments
WHERE amount >= 5000
KEY:"E5DA60E8-F622-43B2-8A93-B958E01E8AB3"
VALUE: { amount:100000, ... }
------------------------------------
KEY:"0516A309-FB2B-4F6D-A11F-3C06A5D64B68"
VALUE: { amount:5300, ... }
------------------------------------
KEY:"0871491A-C915-4163-9C4B-35DEA0373B41"
VALUE: { amount:6500, ... }
------------------------------------
KEY:"F51A912A-96E9-42B1-9AC4-42E923A0A6F8"
VALUE: { amount:7500, ... }
------------------------------------
KEY:"EC8A08F1-75F0-49C8-AA08-A5E57997D27A"
VALUE: { amount:2500000, ... }
SET defaults.topic.autocreate=true;

INSERT INTO big_eur_usd_payments
SELECT STREAM *
FROM payments
WHERE
    (amount >= 5000 AND currency = 'EUR') OR
    (amount >= 5500 AND currency = 'USD')
SET defaults.topic.autocreate=true;

INSERT INTO big_payments_case
SELECT STREAM *
FROM payments
WHERE
    amount >= (CASE 
        WHEN currency = 'EUR' THEN 5000
        WHEN currency = 'USD' THEN 5500
        WHEN currency = 'GBP' THEN 4500
        ELSE 5000
    END)
KEY:"E5DA60E8-F622-43B2-8A93-B958E01E8AB3"
VALUE: { amount:100000, currency:"EUR", ... }
------------------------------------
KEY:"0871491A-C915-4163-9C4B-35DEA0373B41"
VALUE: { amount:6500, currency:"EUR", ... }
------------------------------------
KEY:"F51A912A-96E9-42B1-9AC4-42E923A0A6F8"
VALUE: { amount:7500, currency:"USD", ... }
------------------------------------
KEY:"EC8A08F1-75F0-49C8-AA08-A5E57997D27A"
VALUE: { amount:2500000, currency:"USD", ... }
SET defaults.topic.autocreate=true;

INSERT INTO night_payments
SELECT STREAM *
FROM payments
WHERE
    CAST(HOUR(time) AS INT) >= 0 AND
    CAST(HOUR(time) AS INT) <= 5 
KEY:"6A461C60-02F3-4C01-94FB-092ECBDE0837"
VALUE: { amount:10, currency:"EUR", time:1590970345000, ... }
------------------------------------
KEY:"E5DA60E8-F622-43B2-8A93-B958E01E8AB3"
VALUE: { amount:100000, currency:"EUR", time:1591070346000, ... }
------------------------------------
KEY:"EC8A08F1-75F0-49C8-AA08-A5E57997D27A"
VALUE: { amount:2500000, currency:"USD", time:1591670356000, ... }
SET defaults.topic.autocreate=true;

INSERT INTO payments_sample
SELECT STREAM *
FROM payments
WHERE CAST(ABS(RANDINT()) AS DOUBLE) / 2147483647 < 0.01
Setting up our example

Let’s assume that we have a topic (game-sessions) that contains data regarding remote gaming sessions by users.

Each gaming session will contain:

  • the points the user achieved throughout the session

  • Metadata information regarding the session:

    • The country where the game took place

    • The startedAt, the date and time the game commenced

    • The endedAt, the date and time the game finished

The above structure represents the value of each record in our game-sessions topic.

Additionally, each record will be keyed by user information, including the following:

  • A pid, or player id, representing this user uniquely

  • Some additional denormalised user details:

    • a name

    • a surname

    • an age

Keep in mind this is just an example in the context of this tutorial. Putting denormalised data in keys is not something that should be done in a production environment.

In light of the above, a record might look like the following (in json for simplicity):

We can replicate such structure using SQL Studio and the following query:

We can then use SQL Studio again to insert the data we will use in the rest of the tutorial:

The time a game started and completed is expressed in epoch time. To see the human readable values, run this query:

Count how many games were played per user every 10 seconds

Now we can start processing the data we have inserted above.

One requirement could be to count how many games each user has played every 10 seconds.

We can achieve the above with the following query:
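A possible sketch of such a processor is shown below; the tumbling window clause (WINDOW BY TUMBLE 10s) is an assumption, and backticks are used to escape the hyphenated topic name:

SET defaults.topic.autocreate=true;

INSERT INTO games_per_user_every_10_seconds
SELECT STREAM COUNT(*) AS gamesPlayed
FROM `game-sessions`
WINDOW BY TUMBLE 10s
GROUP BY _key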

The content of the output topic, games_per_user_every_10_seconds, can now be inspected and eventually it will look similar to this:

As you can see, the keys of the records did not change, but their value is the result of the specified aggregation. The gamer Billy Lagrange has two entries because he played 3 games: the first two fall within the window between 2020-07-23 17:08:00 and 2020-07-23 17:08:10 (exclusive), and the third within the window between 2020-07-23 17:08:10 (inclusive) and 2020-07-23 17:08:20 (exclusive).

You might have noticed that games_per_user_every_10_seconds has been created as a compacted topic, and that is by design.

All aggregations result in a Table because they maintain a running, fault-tolerant, state of the aggregation and when the result of an aggregation is written to a topic, then the topic will need to reflect these semantics (which is what a compacted topic does).

Count how many games were played per country every 10 seconds

We can expand on the example from the previous section. We now want to know, for each country, on a 10 second interval, the following:

  • count how many games were played

  • what are the top 3 results

All the above can be achieved with the following query:

The content of the output topic, games_per_country_every_10_seconds, can now be inspected in the SQL Studio screen by running:

There are 2 entries for Italy, since there is one game played at 2020-07-23 18:08:11. Also, notice that the other entry for Italy has 4 occurrences and 3 max points. The reason for the 4 occurrences is that there were 4 games, two each from Billy Lagrange and Maria Rossi, within the 10 second time window between 2020-07-23 18:08:00 and 2020-07-23 18:08:10 (exclusive).

Conclusion

In this tutorial you learned how to use aggregation over Streams to:

  • group by the current key of a record

  • group by a field in the input record

  • use a time window to define the aggregation over.

Good luck and happy streaming!

Aggregating streams

This page describes a tutorial to aggregate Kafka topic data into a stream using Lenses SQL Processors.

In this tutorial we will see how data in a stream can be aggregated continuously using GROUP BY and how the aggregated results are emitted downstream.

In Lenses SQL you can read your data as a STREAM and quickly aggregate over it using the GROUP BY clause and SELECT STREAM

Setting up our example

Let’s assume that we have a topic (game-sessions) that contains data regarding remote gaming sessions by users.

Each gaming session will contain:

  • the points the user achieved throughout the session

  • Metadata information regarding the session:

    • The country where the game took place

    • The language the user played the game in

The above structure represents the value of each record in our game-sessions topic.

Additionally, each record will be keyed by user information, including the following:

  • A pid, or player id, representing this user uniquely

  • Some additional denormalised user details:

    • a name

    • a surname

Keep in mind this is just an example in the context of this tutorial. Putting denormalised data in keys is not something that should be done in a production environment.

In light of the above, a record might look like the following (in json for simplicity):

We can replicate such structure using SQL Studio and the following query:

We can then use SQL Studio again to insert the data we will use in the rest of the tutorial:

Count how many games each user played

Now we can start processing the data we have inserted above.

One requirement could be to count how many games each user has played. Additionally, we want to ensure that, should new data come in, it will update the calculations and return the up to date numbers.

We can achieve the above with the following query:
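A minimal sketch of such a processor, grouping by the record key (backticks escape the hyphenated topic names), could be:

SET defaults.topic.autocreate=true;

INSERT INTO `groupby-key`
SELECT STREAM COUNT(*) AS gamesPlayed
FROM `game-sessions`
GROUP BY _key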

The content of the output topic, groupby-key, can now be inspected in the Lenses Explore screen and it will look similar to this:

As you can see, the keys of the records did not change, but their value is the result of the specified aggregation.

You might have noticed that groupby-key has been created as a compacted topic, and that is by design.

All aggregations result in a Table because they maintain a running, fault-tolerant, state of the aggregation and when the result of an aggregation is written to a topic, then the topic will need to reflect these semantics (which is what a compacted topic does).

Add each user’s best results, and the average over all games

We can expand on the example from the previous section. We now want to know, for each user, the following:

  • count how many games the user has played

  • what are the user’s best 3 results

  • what is the user’s average of points

All the above can be achieved with the following query:
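A possible sketch, assuming a top-k aggregate such as MAXK is available for the best-3 results:

SET defaults.topic.autocreate=true;

INSERT INTO `groupby-key-multi-aggs`
SELECT STREAM
    COUNT(*) AS gamesPlayed
    , MAXK(points, 3) AS best3Results
    , AVG(points) AS averagePoints
FROM `game-sessions`
GROUP BY _key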

The content of the output topic, groupby-key-multi-aggs, can now be inspected in the Lenses Explore screen, and it will look similar to this:

Gather statistics about users playing from the same country and using the same language

Our analytics skills are so good that we are now asked for more. We now want to calculate the same statistics as before, but grouping together players that played from the same country and used the same language.

Here is the query for that:
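A sketch of such a query might be the following; the sessionMetadata.country and sessionMetadata.language field names are assumed from the record shape described above:

SET defaults.topic.autocreate=true;

INSERT INTO `groupby-country-and-language`
SELECT STREAM
    COUNT(*) AS gamesPlayed
    , MAXK(points, 3) AS best3Results
    , AVG(points) AS averagePoints
    , sessionMetadata.language AS sessionLanguage
FROM `game-sessions`
GROUP BY sessionMetadata.country, sessionMetadata.language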

The content of the output topic, groupby-country-and-language, can now be inspected in the Lenses Explore screen and it will look similar to this:

Notice how we projected sessionMetadata.language as sessionLanguage in the query. We could do that because sessionMetadata.language is part of the GROUP BY clause. Lenses SQL only supports Full Group By mode, so if a projected field is not part of the GROUP BY clause, the query will be invalid.

Filtering aggregation data

One final scenario we will cover in this tutorial is when we want to filter some data within our aggregation.

There are two possible types of filtering we might want to do, when it comes to aggregations:

  • Pre-aggregation: we want some rows to be ignored by the grouping, so they will not be part of the calculation done by aggregation functions. In these scenarios we will use the WHERE clause.

  • Post-aggregation: we want to filter the aggregation results themselves, so that those aggregated records which meet some specified condition are not emitted at all. In these scenarios we will use the HAVING clause.

Let’s see an example.

We want to calculate the usual statistics from the previous scenarios, but grouping by the session language only. However, we are interested only in languages that are used a small number of times (we might want to focus our marketing team’s effort there); additionally, we are aware that some users have been using VPNs to access our platform, so we want to exclude some records from our calculations if a given user appeared to have played from a given country.

For the sake of this example, we will:

  • Show statistics for languages that are used less than 9 times

  • Ignore sessions that Dave made from Spain (because we know he was not there)

The query for all the above is:
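A possible sketch, combining a WHERE clause (pre-aggregation) with a HAVING clause (post-aggregation); the field names and the MAXK aggregate are assumptions based on the examples above:

SET defaults.topic.autocreate=true;

INSERT INTO `groupby-language-filtered`
SELECT STREAM
    COUNT(*) AS gamesPlayed
    , MAXK(points, 3) AS best3Results
    , AVG(points) AS averagePoints
FROM `game-sessions`
WHERE _key.name != 'Dave' OR sessionMetadata.country != 'Spain'
GROUP BY sessionMetadata.language
HAVING COUNT(*) < 9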

The content of the output topic, groupby-language-filtered, can now be inspected in the Lenses Explore screen and it will look similar to this:

Notice that IT (which is the only language that has 9 sessions in total) appears in the output but without any data in the value section.

This is because aggregations are Tables, and the key IT used to be present (while it was lower than 9), but then it was removed. Deletion is expressed, in Tables, by setting the value section of a record to null, which is what we are seeing here.

Conclusion

In this tutorial you learned how to use aggregation over Streams to:

  • group by the current key of a record

  • calculate multiple results in a single processor

  • group by a combination of different fields of the input record

  • filter both the data that is to be aggregated and the data that will be emitted by the aggregation itself

You achieved all the above using Lenses SQL engine.

You can now proceed to learn about more complex scenarios like aggregation over Tables and windowed aggregations.

Good luck and happy streaming!

Aggregating data in a table

This page describes a tutorial to aggregate Kafka topic data into a table using Lenses SQL Processors.

In this tutorial, we will see how data in a table can be aggregated continuously using GROUP BY and how the aggregated results are emitted downstream.

In Lenses SQL you can read your data as a TABLE and quickly aggregate over it using the GROUP BY clause and SELECT TABLE.

Setting up our example

Let’s assume that we have a topic (game-sessions) containing data regarding remote gaming sessions by users.

Each gaming session will contain:

  • the points the user achieved throughout the session

  • Metadata information regarding the session:

    • The country where the game took place

    • The language the user played the game in

The above structure represents the value of each record in our game-sessions topic.

Additionally, each record will be keyed by user information, including the following:

  • A pid, or player id, representing this user uniquely

  • Some additional denormalised user details:

    • a name

    • a surname

Putting denormalised data in keys is not something that should be done in a production environment.

In light of the above, a record might look like the following (in JSON for simplicity):

We can replicate such structure using SQL Studio and the following query:

We can then use SQL Studio again to insert the data we will use in the rest of the tutorial:

Count the users that are in a given country

Now we can start processing the data we have inserted above.

Let’s imagine that we want to keep a running count of how many users are in a given country. To do this, we can assume that a user is currently in the same country where their last game took place.

We can achieve the above with the following query:
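A minimal sketch of such a processor, reading the topic as a table (the sessionMetadata.country field name is assumed from the record shape above), could be:

SET defaults.topic.autocreate=true;

INSERT INTO `groupby-table-country`
SELECT TABLE COUNT(*) AS usersCount
FROM `game-sessions`
GROUP BY sessionMetadata.country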

The content of the output topic, groupby-table-country, can now be inspected in the Lenses Explore screen and it will look similar to this:

The key results to notice here are the ones for Spain and the UK:

  • Spain is 2 because Jorge and Dave had their last game played there.

  • UK is 1 because, while Dave initially played from the UK, only Nigel’s last game took place there.

The last point from above is the main difference (and power) of Tables vs. Streams: they represent the latest state of the world for each of their keys, so any aggregation will apply only to that latest data.

Given what a Table is, it will have by definition only a single value for any given key, so doing GROUP BY _key on a Table is a pointless operation because it will always only generate 1-element groups.

Calculate the total and average points of games played in a given language

We can expand on the example from the previous section, imagining that our requirement was extended.

Just as before, we want to calculate statistics based on the current country of a user, as defined in Example 1, but now we want to know all the following:

  • count how many users are in a given country

  • what is the total amount of points these users achieved

  • what is the average amount of points these users achieved

All of the above can be achieved with the following query:
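A sketch of the extended query might look like this:

SET defaults.topic.autocreate=true;

INSERT INTO `groupby-table-country-multi`
SELECT TABLE
    COUNT(*) AS usersCount
    , SUM(points) AS totalPoints
    , AVG(points) AS averagePoints
FROM `game-sessions`
GROUP BY sessionMetadata.country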

The content of the output topic, groupby-table-country-multi, can now be inspected in the Lenses Explore screen and it will look similar to this:

One thing to highlight here is that the functions we are using in this query (COUNT, SUM, and AVG) all support aggregating over Tables. However, that is not true of all functions: some aggregation functions only support Streams.

Filtering aggregation data

We will cover one final scenario where we want to filter some data within our aggregation.

There are two possible types of filtering we might want to do when it comes to aggregations:

  • Pre-aggregation: we want some rows to be ignored by the grouping, so they will not be part of the calculation done by aggregation functions. In these scenarios, we will use the WHERE clause.

  • Post-aggregation: we want to filter the aggregation results themselves so that those aggregated records that meet some specified condition are not emitted at all. In these scenarios, we will use the HAVING clause.

Let’s see an example.

We want to calculate the statistics from Example 2, but grouping by the session language. Here we will again make the assumption that a user’s language is represented only by their latest recorded game session.

Additionally, we are only interested in languages used by players who don’t achieve a high total of points (we might want to focus our marketing team’s effort there, to keep them entertained). Finally, we are aware that some users have been using VPNs to access our platform, so we want to exclude some records from our calculations if a given user appeared to have played from a given country.

For the sake of this example, we will:

  • Show statistics for languages with total points lower than 100

  • Ignore sessions that Dave made from Spain (because we know he was not there)

The query for all of the above is:
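A possible sketch, again with the field names assumed from the record shape above:

SET defaults.topic.autocreate=true;

INSERT INTO `groupby-table-language-filtered`
SELECT TABLE
    COUNT(*) AS usersCount
    , SUM(points) AS totalPoints
    , AVG(points) AS averagePoints
FROM `game-sessions`
WHERE _key.name != 'Dave' OR sessionMetadata.country != 'Spain'
GROUP BY sessionMetadata.language
HAVING SUM(points) < 100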

The content of the output topic, groupby-table-language-filtered, can now be inspected in the Lenses Explore screen and it will look similar to this:

Notice that IT (which is the only language that has 120 points in total) appears in the output but without any data in the value section.

This is because aggregations are Tables, and the key IT used to be present (while it was lower than 100), but then it was removed. Deletion is expressed, in Tables, by setting the value section of a record to null, which is what we are seeing here.

Conclusion

In this tutorial, you learned how to use aggregation over Tables to:

  • group by arbitrary fields, based on the latest state of the world

  • calculate multiple results in a single processor

  • filter both the data that is to be aggregated and the data that is emitted as a result of the aggregation itself

Good luck and happy streaming!

Using multiple topics

This page describes a tutorial to use multiple Kafka topics in a Lenses SQL Processor.

In this tutorial, we will see how we can read data from multiple topics, process it as needed, and write the results to as many output topics as we need, all by using a single SQL Processor.

Setting up our example

Let’s assume that we have a topic (game-sessions) that contains data regarding remote gaming sessions by users.

Each gaming session will contain:

  • the points the user achieved throughout the session

  • Metadata information regarding the session:

    • The country where the game took place

    • The language the user played the game in

The above structure represents the value of each record in our game-sessions topic.

Additionally, each record is keyed by user details:

  • A pid, or player id, representing this user uniquely

  • Some additional denormalised user details:

    • a name

    • a surname

    • an age

In light of the above, a record might look like the following (in JSON for simplicity):

Finally, let’s assume we also have another, normalised, compacted topic user-details, keyed by an int matching the pid from topic game-sessions and containing user information like address and period of membership to the platform.

In light of the above, a record might look like the following (in JSON for simplicity):

We can replicate such structures using SQL Studio and the following query:

We can then use SQL Studio again to insert the data we will use in the rest of the tutorial:

Multiple transformations all in one go

Let’s imagine that, given the above data, we are given the following requirements:

  • For each country in game-sessions, create a record with the count of games played from that country. Write the results to the games-per-country topic.

  • For each record in game-sessions, reshape the record to remove everything from the key besides pid. Additionally, add the user’s memberYears to the value. Write the results to the games-sessions-normalised topic.

We can obtain the above with the following query:

The result of this processor in the UI will be a processor graph similar to the following:

Finally, the content of the output topics games-per-country and games-sessions-normalised can now be inspected in the Lenses Explore screen:

Conclusion

In this tutorial, we learned how to read data from multiple topics, combine it, process it in different ways, and save the results to as many output topics as needed.

Good luck and happy streaming!

Working with Arrays

This page describes a tutorial on how to work with array data in your Kafka topics using Lenses SQL Processors.

In this tutorial, we will see how to use Lenses SQL to extract, manipulate and inspect the single elements of an array.

In Lenses SQL you can use a LATERAL JOIN to treat the elements of an array as if they were a normal field.

You will learn to:

  • Extract the single elements of an array with a single LATERAL join.

{
  "key": {
    "pid": 1,
    "name": "Billy",
    "surname": "Lagrange",
    "age": 30
  },
  "value": {
    "points": 5,
    "country": "Italy",
    "language": "IT",
    "startedAt": 1595435228,
    "endedAt": 1595441828
  }
}
CREATE TABLE game-sessions(
  _key.pid int,
  _key.name string,
  _key.surname string,
  _key.age int,
  points double,
  country string,
  startedAt long,
  endedAt long)
FORMAT (avro, avro);
INSERT into game-sessions(
  _key.pid,
  _key.name,
  _key.surname,
  _key.age,
  points,
  country,
  startedAt,
  endedAt
) VALUES
(1, 'Billy', 'Lagrange', 35, 5, 'Italy', 1595524080000, 1595524085000),
(1, 'Billy', 'Lagrange', 35, 30, 'Italy', 1595524086000, 1595524089000),
(1, 'Billy', 'Lagrange', 35, 0, 'Italy', 1595524091000, 1595524098000),
(2, 'Maria', 'Rossi', 27, 50, 'Italy', 1595524080000, 1595524085000),
(2, 'Maria', 'Rossi', 27, 10, 'Italy', 1595524086000, 1595524089000),
(3, 'Jorge', 'Escudero', 27, 10, 'Spain', 1595524086000, 1595524089000),
(4, 'Juan', 'Suarez', 22, 80, 'Mexico', 1595524080000, 1595524085000),
(5, 'John', 'Bolden', 40, 10, 'USA', 1595524080000, 1595524085000);
SELECT
  startedAt
  , DATE_TO_STR(startedAt, 'yyyy-MM-dd HH:mm:ss') as started
  , endedAt
  , DATE_TO_STR(endedAt, 'yyyy-MM-dd HH:mm:ss') as ended
FROM game-sessions;
SET defaults.topic.autocreate=true;
SET commit.interval.ms='2000';  -- this is just to speed up the output generation in this tutorial

INSERT INTO games_per_user_every_10_seconds
SELECT STREAM
    COUNT(*) as occurrences
    , MAXK_UNIQUE(points,3) as maxpoints
    , AVG(points) as avgpoints
FROM game-sessions
EVENTTIME BY startedAt
WINDOW BY TUMBLE 10s
GROUP BY _key;
SET defaults.topic.autocreate=true;
SET commit.interval.ms='2000';  -- this is just to speed up the output generation in this tutorial

INSERT INTO games_per_country_every_10_seconds
SELECT STREAM
    COUNT(*) as occurrences
    , MAXK_UNIQUE(points,3) as maxpoints
    , country
FROM game-sessions
EVENTTIME BY startedAt
WINDOW BY TUMBLE 10s
GROUP BY country;
SELECT *
FROM games_per_country_every_10_seconds
Content of game-sessions displaying the human readable timestamps
Content of games_per_user_every_10_seconds topic
Content of games_per_country_every_10_seconds topic

Content of `groupby-key` topic
Content of `groupby-key-multi-aggs` topic
Content of `groupby-country-and-language` topic
Content of `groupby-language-filtered` topic

Dave played first from the UK, but then from Italy and finally from Spain. Dave’s contribution was, therefore, subtracted from the UK count value.
Content of groupby-table-country topic
Content of groupby-table-country-multi topic

Processor graph for the above query
Content of `games-per-country` topic
Content of `games-sessions-normalised` topic
  • Extract elements of a multi-level array with nested LATERAL joins.

  • Use arbitrarily complex array expressions as the right-hand side of a LATERAL join.

    Lateral Join on a simple array

    In this example, we are getting data from a sensor.

    Unfortunately, the upstream system registers the readings in batches, while what we need is a single record for every reading.

    An example of such a record is the following:

    Notice how each record contains multiple readings, inside the reading array field.

    We can replicate such a structure running the following query in SQL Studio:

    We can again use SQL Studio to insert some data to play with:

    What we want to obtain is a new topic readings, where each record contains a single reading, together with its meter_id. Considering the first record, we expect to explode it into four different records, one for each reading:

    In Lenses SQL you can easily achieve that with the special LATERAL syntax.

    You can create a processor defined as:

    The magic happens in batched_readings LATERAL readings as reading. With that we are basically saying:

    • for each record in batched_readings

    • for each element inside the readings array of that record

    • build a new record with all the fields of the original batched_readings record, plus an extra reading field, that will contain the value of the current element of readings.

    We can then use both the original fields and the new reading field in the SELECT.

    If you save the processor and run it, you will see that it will emit the records we expected.

    Filtering

    One of the powerful features of a LATERAL join is that the expression you put in the LATERAL can be used as a normal field. This means that you can then use it for example also in a WHERE or in a GROUP BY.
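    For instance, here is a minimal sketch that groups by the lateral field itself, counting how many times each reading value was observed (reading_counts is a hypothetical output topic, not part of this tutorial):

    SET defaults.topic.autocreate=true;

    INSERT INTO reading_counts      -- hypothetical output topic
    SELECT STREAM
      COUNT(*) AS occurrences       -- how many times each reading value appeared
    FROM
      batched_readings
      LATERAL readings as reading
    GROUP BY reading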

    In this section we will see how to filter records generated by a LATERAL using a normal WHERE.

    We want to modify our previous processor in order to emit only the readings greater than 95.

    To do that, it is enough to use reading as if it were a normal field in the WHERE section:

    Running the processor, we get the records:

    Lateral Join on a multi-level array

    This example is similar to the previous one. The only difference is that the readings are now stored in batches of batches:

    As you can see, nested_readings is an array whose elements are arrays of integers.

    We can again use SQL Studio to insert some data:

    We would like to define a processor that emits the same records as the previous one.

    In this case, though, we are dealing with nested_readings, which is a multi-level array, so a single LATERAL join is not enough. Nesting a LATERAL inside another will, however, do the job.

    This is roughly what happens in the FROM clause (an illustration of the intermediate records follows the list):

    • We first unwrap the first level of the array, doing a batched_readings_nested LATERAL nested_readings as readings.

    • At that point, readings will be an array of integers.

    • We can then use it in the outer ... LATERAL readings as reading join, and the single integers will finally be extracted and made available as reading.
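    To visualise the intermediate step, after the inner LATERAL the first input record is conceptually expanded into records like the following (these are never written to a topic; they are simply what the outer LATERAL sees):

    KEY: 1
    VALUE: { "meter_id": 1, "readings": [100, 92] }
    -----------------------------
    KEY: 1
    VALUE: { "meter_id": 1, "readings": [93, 101] }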

    Complex array expressions

    In this section we will see how it is possible to use any expression as the right hand side of a LATERAL join, as long as it gets evaluated to an array.

    We have a table where the meter readings are split into two columns, readings_day and readings_night.

    Let’s insert the same data as the first example, but where the readings are split across the two columns.

    To extract the readings one by one, we need first to concatenate the two arrays readings_day and readings_night. We can achieve that using flatten. We can then use the concatenated array in a LATERAL join:

    The processor defined above will emit the records

    as we expected.

    {
      "key": {
        "pid": 1,
        "name": "Billy",
        "surname": "Lagrange",
        "age": 30
      },
      "value": {
        "points": 5,
        "sessionMetadata": {
          "country": "Italy",
          "language": "IT"
        }
      }
    }
    CREATE TABLE game-sessions(
        _key.pid int
        , _key.name string
        , _key.surname string
        , _key.age int
        , points double
        , sessionMetadata.country string
        , sessionMetadata.language string)
    FORMAT (avro, avro);
    INSERT into game-sessions(
        _key.pid
        , _key.name
        , _key.surname
        , _key.age
        , points
        , sessionMetadata.country
        , sessionMetadata.language
    ) VALUES
    (1, 'Billy', 'Lagrange', 35, 5, 'Italy', 'IT'),
    (1, 'Billy', 'Lagrange', 35, 30, 'Italy', 'IT'),
    (1, 'Billy', 'Lagrange', 35, 0, 'Italy', 'IT'),
    (2, 'Maria', 'Rossi', 27, 50, 'Italy', 'IT'),
    (2, 'Maria', 'Rossi', 27, 10, 'Italy', 'IT'),
    (3, 'Jorge', 'Escudero', 27, 10, 'Spain', 'ES'),
    (4, 'Juan', 'Suarez', 22, 80, 'Mexico', 'ES'),
    (5, 'John', 'Bolden', 40, 10, 'USA', 'EN'),
    (6, 'Dave', 'Holden', 31, 30, 'UK', 'EN'),
    (7, 'Nigel', 'Calling', 50, 5, 'UK', 'EN'),
    (2, 'Maria', 'Rossi', 27, 10, 'UK', 'EN'),
    (1, 'Billy', 'Lagrange', 35, 50, 'Italy', 'IT'),
    (3, 'Jorge', 'Escudero', 27, 16, 'Spain', 'ES'),
    (4, 'Juan', 'Suarez', 22, 70, 'Mexico', 'ES'),
    (5, 'John', 'Bolden', 40, 10, 'USA', 'EN'),
    (6, 'Dave', 'Holden', 31, 50, 'Italy', 'IT'),
    (6, 'Dave', 'Holden', 31, 70, 'Spain', 'ES'),
    (2, 'Maria', 'Rossi', 27, 70, 'Italy', 'IT'),
    (1, 'Billy', 'Lagrange', 35, 50, 'Italy', 'IT')
    ;
    SET defaults.topic.autocreate=true;
    SET commit.interval.ms='1000';  -- this is just to speed up the output generation in this tutorial
    
    INSERT INTO groupby-key
    SELECT STREAM
      COUNT(*) AS gamesPlayed
    FROM game-sessions
    GROUP BY _key;
    SET defaults.topic.autocreate=true;
    SET commit.interval.ms='1000';
    
    INSERT INTO groupby-key-multi-aggs
    SELECT STREAM
        COUNT(*) AS gamesPlayed
        , MAXK(points,3) as maxpoints
        , AVG(points) as avgpoints
    FROM game-sessions
    GROUP BY _key;
    SET defaults.topic.autocreate=true;
    SET commit.interval.ms='1000';
    
    INSERT INTO groupby-country-and-language
    SELECT STREAM
        COUNT(*) AS gamesPlayed
        , MAXK(points,3) as maxpoints
        , AVG(points) as avgpoints
        , sessionMetadata.language as sessionLanguage
    FROM game-sessions
    GROUP BY
        sessionMetadata.country
        , sessionMetadata.language;
    SET defaults.topic.autocreate=true;
    SET commit.interval.ms='1000';
    
    INSERT INTO groupby-language-filtered
    SELECT STREAM
        COUNT(*) AS gamesPlayed
        , MAXK(points,3) as maxpoints
        , AVG(points) as avgpoints
    FROM game-sessions
    WHERE _key.name != 'Dave'
        OR sessionMetadata.country != 'Spain'
    GROUP BY sessionMetadata.language
    HAVING gamesPlayed < 9;
    {
      "key":{
        "pid": 1,
        "name": "Billy",
        "surname": "Lagrange",
        "age": 30
      },
      "value":{
        "points": 5,
        "sessionMetadata": {
          "country": "Italy",
          "language": "IT"
        }
      }
    }
    CREATE TABLE game-sessions(
        _key.pid int
        , _key.name string
        , _key.surname string
        , _key.age int
        , points double
        , sessionMetadata.country string
        , sessionMetadata.language string)
    FORMAT (avro, avro);
    INSERT into game-sessions(
        _key.pid
        , _key.name
        , _key.surname
        , _key.age
        , points
        , sessionMetadata.country
        , sessionMetadata.language
    ) VALUES
    (1, 'Billy', 'Lagrange', 35, 5, 'Italy', 'IT'),
    (1, 'Billy', 'Lagrange', 35, 30, 'Italy', 'IT'),
    (1, 'Billy', 'Lagrange', 35, 0, 'Italy', 'IT'),
    (2, 'Maria', 'Rossi', 27, 50, 'Italy', 'IT'),
    (2, 'Maria', 'Rossi', 27, 10, 'Italy', 'IT'),
    (3, 'Jorge', 'Escudero', 27, 10, 'Spain', 'ES'),
    (4, 'Juan', 'Suarez', 22, 80, 'Mexico', 'ES'),
    (5, 'John', 'Bolden', 40, 10, 'USA', 'EN'),
    (6, 'Dave', 'Holden', 31, 30, 'UK', 'EN'),
    (7, 'Nigel', 'Calling', 50, 5, 'UK', 'EN'),
    (2, 'Maria', 'Rossi', 27, 10, 'UK', 'EN'),
    (1, 'Billy', 'Lagrange', 35, 50, 'Italy', 'IT'),
    (3, 'Jorge', 'Escudero', 27, 16, 'Spain', 'ES'),
    (4, 'Juan', 'Suarez', 22, 70, 'Mexico', 'ES'),
    (5, 'John', 'Bolden', 40, 10, 'USA', 'EN'),
    (6, 'Dave', 'Holden', 31, 50, 'Italy', 'IT'),
    (6, 'Dave', 'Holden', 31, 70, 'Spain', 'ES'),
    (2, 'Maria', 'Rossi', 27, 70, 'Italy', 'IT'),
    (1, 'Billy', 'Lagrange', 35, 50, 'Italy', 'IT')
    ;
    SET defaults.topic.autocreate=true;
    SET commit.interval.ms='1000';  -- this is to speed up the output generation in this tutorial
    
    INSERT INTO groupby-table-country
    SELECT TABLE
      COUNT(*) AS gamesPlayed
    FROM game-sessions
    GROUP BY sessionMetadata.country;
    SET defaults.topic.autocreate=true;
    SET commit.interval.ms='1000';
    
    INSERT INTO groupby-table-country-multi
    SELECT TABLE
        COUNT(*) AS gamesPlayed
        , SUM(points) as totalpoints
        , AVG(points) as avgpoints
    FROM game-sessions
    GROUP BY sessionMetadata.country;
    SET defaults.topic.autocreate=true;
    SET commit.interval.ms='1000';
    
    INSERT INTO groupby-table-language-filtered
    SELECT TABLE
        COUNT(*) AS gamesPlayed
        , SUM(points) as totalpoints
        , AVG(points) as avgpoints
    FROM game-sessions
    WHERE _key.name != 'Dave' 
        OR sessionMetadata.country != 'Spain'
    GROUP BY sessionMetadata.language
    HAVING totalpoints < 100;
    {
      "key":{
        "pid": 1,
        "name": "Billy",
        "surname": "Lagrange",
        "age": 30
      },
      "value":{
        "points": 5,
        "sessionMetadata": {
          "country": "Italy",
          "language": "IT"
        }
      }
    }
    {
      "key": 1,
      "value":{
        "fullName": "Billy Lagrange",
        "memberYears": 3,
        "address": {
          "country": "Italy",
          "street": "Viale Monza 5",
          "city": "Milan"
        }
      }
    }
    CREATE TABLE game-sessions(
        _key.pid int
        , _key.name string
        , _key.surname string
        , _key.age int
        , points double
        , sessionMetadata.country string
        , sessionMetadata.language string
    )
    FORMAT (avro, avro);
    
    CREATE TABLE user-details(
        fullName string
        , memberYears int
        , address.country string
        , address.street string
        , address.city string
    ) FORMAT (int, avro);
    INSERT into game-sessions(
        _key.pid
        , _key.name
        , _key.surname
        , _key.age
        , points
        , sessionMetadata.country
        , sessionMetadata.language
    ) VALUES
    (1, 'Billy', 'Lagrange', 35, 5, 'Italy', 'IT'),
    (1, 'Billy', 'Lagrange', 35, 30, 'Italy', 'IT'),
    (1, 'Billy', 'Lagrange', 35, 0, 'Italy', 'IT'),
    (2, 'Maria', 'Rossi', 27, 50, 'Italy', 'IT'),
    (2, 'Maria', 'Rossi', 27, 10, 'Italy', 'IT'),
    (3, 'Jorge', 'Escudero', 27, 10, 'Spain', 'ES'),
    (4, 'Juan', 'Suarez', 22, 80, 'Mexico', 'ES'),
    (5, 'John', 'Bolden', 40, 10, 'USA', 'EN'),
    (6, 'Dave', 'Holden', 31, 30, 'UK', 'EN'),
    (7, 'Nigel', 'Calling', 50, 5, 'UK', 'EN'),
    (2, 'Maria', 'Rossi', 27, 10, 'UK', 'EN'),
    (1, 'Billy', 'Lagrange', 35, 50, 'Italy', 'IT'),
    (3, 'Jorge', 'Escudero', 27, 16, 'Spain', 'ES'),
    (4, 'Juan', 'Suarez', 22, 70, 'Mexico', 'ES'),
    (5, 'John', 'Bolden', 40, 10, 'USA', 'EN'),
    (6, 'Dave', 'Holden', 31, 50, 'Italy', 'IT'),
    (6, 'Dave', 'Holden', 31, 70, 'Spain', 'ES'),
    (2, 'Maria', 'Rossi', 27, 70, 'Italy', 'IT'),
    (1, 'Billy', 'Lagrange', 35, 50, 'Italy', 'IT')
    ;
    
    INSERT into user-details(
        _key
        , fullName
        , memberYears
        , address.country
        , address.street
        , address.city
    ) VALUES
    (1, 'Billy Lagrange', 3, 'Italy', 'Viale Monza 5', 'Milan'),
    (2, 'Maria Rossi', 1, 'Italy', 'Stazione Termini', 'Rome'),
    (3, 'Jorge Escudero', 5, 'Spain', '50 Passeig de Gracia', 'Barcelona'),
    (4, 'Juan Suarez', 0, 'Mexico', 'Plaza Real', 'Tijuana'),
    (5, 'John Bolden', 2, 'USA', '100 Wall Street', 'New York'),
    (6, 'Dave Holden', 1, 'UK', '25 Bishopsgate', 'London'),
    (7, 'Nigel Calling', 6, 'UK', '10 Queen Anne Street', 'Brighton')
    ;
    SET defaults.topic.autocreate=true;
    SET commit.interval.ms='1000';
    
    WITH userDetailsTable AS(
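      -- a Table view of user-details, so the stream below can join against it by key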
      SELECT TABLE *
      FROM user-details
    );
    
    WITH joinedAndNormalised AS(
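      -- enrich each game session with the user's memberYears, joining on pid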
      SELECT STREAM
        gs.*
        , ud.memberYears
      FROM game-sessions AS gs JOIN userDetailsTable AS ud 
            ON (gs._key.pid = ud._key)
    );
    
    INSERT INTO games-per-country
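    -- first requirement: count of games played from each country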
    SELECT STREAM
      COUNT(*) AS gamesPlayed
    FROM game-sessions
    GROUP BY sessionMetadata.country;
    
    INSERT INTO games-sessions-normalised
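    -- second requirement: the joined and reshaped sessions defined above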
    SELECT STREAM *
    FROM joinedAndNormalised;
    KEY: 1
    VALUE: {
      "meter_id": 1,
      "readings": [100, 101, 102]
    }
    CREATE TABLE batched_readings(
      meter_id int,
      readings int[]
    ) FORMAT(int, AVRO);
    INSERT INTO batched_readings(_key, meter_id, readings) VALUES
    (1, 1, [100, 92, 93, 101]),
    (2, 2, [81, 82, 81]),
    (3, 1, [95, 94, 93, 96]),
    (4, 2, [80, 82])
    KEY: 1
    VALUE: { "meter_id": 1, "reading": 100 }
    -----------------------------
    KEY: 1
    VALUE: { "meter_id": 1, "reading": 92 }
    -----------------------------
    KEY: 1
    VALUE: { "meter_id": 1, "reading": 93 }
    -----------------------------
    KEY: 1
    VALUE: { "meter_id": 1, "reading": 101 }
    SET defaults.topic.autocreate=true;
    
    INSERT INTO
      readings
    SELECT STREAM
      meter_id,
      reading
    FROM
      batched_readings
      LATERAL readings as reading
    SET defaults.topic.autocreate=true;
    
    INSERT INTO
      readings
    SELECT STREAM
      meter_id,
      reading
    FROM
      batched_readings
      LATERAL readings as reading
    WHERE
      reading > 95
    KEY: 1
    VALUE: { "meter_id": 1, "reading": 100 }
    -----------------------------
    KEY: 1
    VALUE: { "meter_id": 1, "reading": 101 }
    -----------------------------
    KEY: 3
    VALUE: { "meter_id": 1, "reading": 96 }
    CREATE TABLE batched_readings_nested(
      meter_id int,
      nested_readings int[][]
    )
    FORMAT(int, AVRO);
    INSERT INTO batched_readings_nested(_key, meter_id, nested_readings) VALUES
    (1, 1, [[100, 92], [93, 101]]),
    (2, 2, [[81], [82, 81]]),
    (3, 1, [[95, 94, 93], [96]]),
    (4, 2, [[80, 82]])
    SET defaults.topic.autocreate=true;
    
    INSERT INTO
      readings
    SELECT STREAM
      meter_id,
      reading
    FROM
      batched_readings_nested
      LATERAL nested_readings as readings
      LATERAL readings as reading
    CREATE TABLE day_night_readings(
      meter_id int,
      readings_day int[],
      readings_night int[]
    )
    FORMAT(int, AVRO);
    INSERT INTO day_night_readings(_key, meter_id, readings_day, readings_night) VALUES
    (1, 1, [100, 92], [93, 101]),
    (2, 2, [81], [82, 81]),
    (3, 1, [95, 94, 93], [96]),
    (4, 2, [80], [81])
    SET defaults.topic.autocreate=true;
    
    INSERT INTO
      readings
    SELECT STREAM
      meter_id,
      reading
    FROM
      day_night_readings
      LATERAL flatten([readings_day, readings_night]) as reading
    KEY: 1
    VALUE: { "meter_id": 1, "reading": 100 }
    -----------------------------
    KEY: 1
    VALUE: { "meter_id": 1, "reading": 92 }
    -----------------------------
    KEY: 1
    VALUE: { "meter_id": 1, "reading": 93 }
    -----------------------------
    KEY: 1
    VALUE: { "meter_id": 1, "reading": 101 }
    -----------------------------
    KEY: 2
    VALUE: { "meter_id": 2, "reading": 81 }
    ...