View the latest documentation 5.5
This document takes the reader through typical use cases for querying existing data. Each exercise first builds the topics and insert the data before proceeding to the actual SELECT statement.
SELECT
Let’s create an Apache Kafka topic which stores String values for both Key and Value and then inserts one entry. The CREATE TABLE is used in the SQL engine to create a Kafka topic.
CREATE TABLE
CREATE TABLE greetings( _key STRING , _value STRING ) FORMAT (string, STRING); INSERT INTO greetings(_key, _value) VALUES ("Hello", "World");
Go ahead and run your first SELECT query now:
SELECT * FROM greetings
To remove the topic we just created use the DROP statement below:
DROP
DROP TABLE greetings
Each Kafka message contains information related to partition, offset, timestamp, and topic. Additionally the engine adds the key and value raw byte size.
Create a topic and insert a few entries.
CREATE TABLE tutorial( _key STRING , name STRING , difficulty INT ) FORMAT (Avro, Avro); INSERT INTO tutorial(_key, name, difficulty) VALUES ("1", "Learn Lenses SQL", 3), ("2", "Learn Quantum Physics", 10), ("3", "Learn French", 5);
Now we can query for specific metadata related to the records.
To query for metadata such as the underlying Kafka topic offset, partition and timestamp prefix your desired fields with _meta.
_meta
Run the following query to see each tutorial name along with its metadata information:
SELECT name , _meta.offset , _meta.timestamp , _meta.partition , _meta.__keysize , _meta.__valsize FROM tutorial /* The output is (timestamp will be different) Learn Lenses SQL 0 1540575169198 0 7 23 Learn Quantum Physics 1 1540575169226 0 7 28 Learn French 2 1540575169247 0 7 19 */
SQL expressions allow you to perform basic arithmetic.
The common arithmetic operations +, -, *, / and % are supported.
+
-
*
/
%
Create a grocery table and insert data into it:
CREATE TABLE groceries ( _key INT , id INT , name STRING , quantity INT , price DOUBLE) FORMAT(INT, Avro); INSERT INTO groceries( _key , id , name , quantity , price) VALUES (1, 1, "Fairtrade Bananas", 1, 1.90), (2, 2, "Meridian Crunchy Peanut Butter", 1, 2.50), (3, 3, "Green & Black's organic 85% dark chocolate bar", 2, 1.99), (4, 4, "Activia fat free cherry yogurts", 3, 2.00), (5, 5, "Green & Blacks Organic Chocolate Ice Cream", 2, 4.20);
Now calculate the price of each purchase:
SELECT name , quantity * price AS total FROM groceries /*The output: Fairtrade Bananas 1.9 Meridian Crunchy Peanut Butter 2.5 Green & Black's organic 85% dark chocolate bar 3.98 Activia fat free cherry yogurts 6 Green & Blacks Organic Chocolate Ice Cream 8.4 */
Functions can be used directly. See the list of SQL functions.
For example, the ROUND function allows you to round numeric functions:
ROUND
SELECT name , ROUND(quantity * price) AS rounded_total FROM groceries /*The output: Fairtrade Bananas 2 Meridian Crunchy Peanut Butter 3 Green & Black's organic 85% dark chocolate bar 4 Activia fat free cherry yogurts 6 Green & Blacks Organic Chocolate Ice Cream 8 */
String functions allow you to perform different operations on text. For example, the CONCAT function allows combining multiple values. Let’s first create a customer table:
CONCAT
CREATE TABLE customers ( _key STRING , first_name STRING , last_name STRING ) FORMAT(string, Avro); INSERT INTO customers(_key, first_name, last_name) VALUES ("mikejones", "Mike", "Jones"), ("anasmith", "Ana", "Smith"), ("shannonelliott", "Shannon","Elliott");
Next query combines first and last names into full names:
SELECT CONCAT(first_name, " ", last_name) AS name FROM customers /* Output Mike Jones Ana Smith Shannon Elliott */
WHERE clause allows you to define a set of logical predicates the data needs to match in order to be returned. Standard comparison operators are supported (>, >=, <, <=, =, and !=) as well as calling functions.
WHERE
>
>=
<
<=
=
!=
We are going to use the groceries table created earlier. Select all items purchased where the prices are greater or equal to 2.00:
groceries
SELECT name , price FROM groceries WHERE price >= 2.0 /* Output Meridian Crunchy Peanut Butter 2.5 Activia fat free cherry yogurts 2 Green & Blacks Organic Chocolate Ice Cream 4.2 */
Select all customers whose last name length equals to 5:
SELECT * FROM customers WHERE LEN(last_name) = 5 /* Output key value.first_name value.last_name mikejones Mike Jones anasmith Ana Smith */
Search all customers containing Ana in their first name:
Ana
SELECT * FROM customers WHERE first_name LIKE '%Ana%'
Keep in mind that text search is case sensitive. To use case insensitive text search, you can write:
SELECT * FROM customers WHERE LOWERCASE(first_name) LIKE '%ana%'; -- And here is the negated version SELECT * FROM customers WHERE LOWERCASE(first_name) NOT LIKE '%ana%';
Sometimes data can contain explicit NULL values, or it can omit fields entirely. Using IS [ NOT ] NULL, or EXISTS functions allows you to check for these situations.
Exists is a keyword in Lenses SQL grammar so it needs to be escaped, the escape character is `````.
Exists
Lenses supports JSON. JSON does not enforce a schema allowing you to insert null values.
Create the following table named customers_json:
customers_json
CREATE TABLE customers_json ( _key STRING , first_name STRING , last_name STRING , middle_name STRING ) FORMAT(string, json); INSERT INTO customers_json(_key, first_name, last_name, middle_name) VALUES("mikejones", "Mike", "Jones", "Albert"); INSERT INTO customers_json(_key, first_name, last_name) VALUES("anasmith", "Ana", "Smith"); INSERT INTO customers_json(_key, first_name, last_name) VALUES("shannonelliott", "Shannon","Elliott");
Query this table for all its entries:
SELECT * FROM customers_json /* The output key value.first_name value.middle_name value.last_name mikejones Mike Albert Jones anasmith Ana Smith shannonelliott Shannon Elliott */
The middle_name is only present on the mikejones record.
middle_name
mikejones
Write a query which filters out records where middle_name is not present:
SELECT * FROM customers_json WHERE `EXISTS`(middle_name) /* The output key value.first_name value.middle_name value.last_name mikejones Mike Albert Jones */
This is can also be written as:
SELECT * FROM customers_json WHERE middle_name IS NULL
When a field is actually NULL or is missing, checking like in the above query has the same outcome.
NULL
You can use AND/OR to specify complex conditions for filtering your data.
To filter the purchased items where more than one item has been bought for a given product, and the unit price is greater than 2:
SELECT * FROM groceries WHERE quantity > 1 AND price > 2
Now try changing the AND logical operand to OR and see the differences in output.
To limit the output of the query you can use two approaches:
LIMIT
-- limit to 1 record SELECT * FROM groceries LIMIT 1 -- set the data size returned to be 1 megabyte. SET max.size="1m"; -- on the small dataset we have here 1 MB will accommodate all records added and more SELECT * FROM groceries
To sample data and discard the first rows:
SELECT * FROM groceries LIMIT 1,2
This statement instructs Lenses to skip the first record matched and then sample the next two.
To select data from a specific partition access the meta data of the topic.
In the following example, a table is created with three partitions and the message key is hashed and then the remainder from HashValue % partitions will be the table partition the record is sent to.
HashValue % partitions
-- Run CREATE TABLE customers_partitioned ( _key STRING , first_name STRING , last_name STRING ) FORMAT(string, Avro) properties(partitions = 3); INSERT INTO customers_partitioned( _key , first_name , last_name) VALUES ("mikejones", "Mike", "Jones"), ("anasmith", "Ana", "Smith"), ("shannonelliott", "Shannon","Elliott"), ("tomwood", "Tom","Wood"), ("adelewatson", "Adele","Watson"), ("mariasanchez", "Maria", "Sanchez");
Next, run the following query:
SELECT * FROM customers_partitioned /* The output offset partition timestamp key value.first_name value.last_name 0 0 1540830780401 mikejones Mike Jones 1 0 1540830780441 anasmith Ana Smith 2 0 1540830780464 shannonelliott Shannon Elliott 0 2 1540831270170 mariasanchez Maria Sanchez 0 1 1540830984698 tomwood Tom Wood 1 1 1540831183308 adelewatson Adele Watson */
As you can see from the results (your timestamps will be different) the records span over the three partitions. Now query specific partitions:
-- selects only records from partition = 0 SELECT * FROM customers_partitioned WHERE _meta.partition = 0; -- selects only records from partition 0 and 2 SELECT * FROM customers_partitioned WHERE _meta.partition = 0 OR _meta.partition = 2;
Using the COUNT aggregate function you can count the records in a table. Run the following SQL to see how many records we have on the customers_partitioned:
COUNT
customers_partitioned
SELECT COUNT(*) AS total FROM customers_partitioned
Using the SUM function you can sum records in a table.
SUM
SELECT SUM(quantity * price) AS amount FROM groceries
To group data use the GROUP BY clause:
GROUP BY
CREATE TABLE customer ( firstName STRING , lastName STRING , city STRING , country STRING , phone STRING) FORMAT(string, avro) properties(compacted=true); INSERT INTO customer ( _key ,firstName , lastName , city , country , phone) VALUES ('1','Craig', 'Smith', 'New York', 'USA', '1-01-993 2800'), ('2','William', 'Maugham','Toronto','Canada','+1-613-555-0110'), ('3','William', 'Anthony','Leeds','UK', '+44 1632 960427'), ('4','S.B.','Swaminathan','Bangalore','India','+91 7513200000'), ('5','Thomas','Morgan','Arnsberg','Germany','+49-89-636-48018'), ('6','Thomas', 'Merton','New York','USA', '+1-617-555-0147'), ('7','Piers','Gibson','London','UK', '+44 1632 960269'), ('8','Nikolai','Dewey','Atlanta','USA','+1-404-555-0178'), ('9','Marquis', 'de Ellis','Rio De Janeiro','Brazil','+55 21 5555 5555'), ('10','Joseph', 'Milton','Houston','USA','+1-202-555-0153'), ('11','John','Betjeman Hunter','Sydney','Australia','+61 1900 654 321'), ('12','Evan', 'Hayek','Vancouver','Canada','+1-613-555-0130'), ('13','E.','Howard','Adelaide','Australia','+61 491 570 157'), ('14','C. J.', 'Wilde','London','UK','+44 1632 960111'), ('15','Butler', 'Andre','Florida','USA','+1-202-555-0107');
Let’s see how many customers there are from each country. Here is the code which computes that:
SELECT COUNT(*) AS count , country FROM customer GROUP BY country /* Output 2 Canada 3 UK 2 Australia 1 India 1 Brazil 5 USA 1 Germany */
The tabled query allows you to nest queries. Let us take the query in the previous section and say we are only interested in those entries where there exist more than 1 customer per country.
SELECT * FROM ( SELECT COUNT(*) AS count , country FROM customer GROUP BY country ) WHERE count > 1
Run the query, and you will only see those entries for which there is more than one person registered per country.
To filter the entries returned from a grouping query. As with the WHERE statement, you can use HAVING syntax to achieve the same result when it comes to grouped queries.
HAVING
SELECT COUNT(*) AS count , country FROM customer GROUP BY country HAVING count > 1
Lenses allows you to combine records from two tables. A query can contain zero, one or multiple JOIN operations.
Create an orders table and insert some data INTo it:
orders
CREATE TABLE orders( _key INT , orderDate STRING , customerId STRING , amount DOUBLE ) FORMAT(int, avro); INSERT INTO orders ( _key , orderDate , customerId , amount ) VALUES (1, '2018-10-01', '1', 200.50), (2, '2018-10-11', '1', 813.00), (3, '2018-10-11', '3', 625.20), (4, '2018-10-11', '14', 730.00), (5, '2018-10-11', '10', 440.00), (6, '2018-10-11', '9', 444.80);
With these tables in place, join them to get more information about an order by combining it with the customer information found in the customer table:
customer
SELECT o._key AS orderNumber , o.amount AS totalAmount , c.firstName , c.lastName , c.city , c.country FROM orders o INNER JOIN customer c ON o.customerId = c._key /* city orderNumber country totalAmount lastName firstName New York 1 USA 200.5 Smith Craig New York 2 USA 813 Smith Craig Leeds 3 UK 625.2 Anthony William Rio De Janeiro 6 Brazil 444.8 de Ellis Marquis Houston 5 USA 440 Milton Joseph London 4 UK 730 Wilde C. J. */
With lateral joins, Lenses allows you to combine records from a table with the elements of an array expression.
We are going to see in more details what lateral joins are with an example.
Create a batched_readings table and insert some data into it:
batched_readings
CREATE TABLE batched_readings( meter_id int , readings int[] ) FORMAT(int, AVRO); INSERT INTO batched_readings( meter_id , readings ) VALUES (1, [100, 80, 95, 91]), (2, [87, 93, 100]), (1, [88, 89, 92, 94]), (2, [81])
You now can use a LATERAL join to inspect, extract and filter the single elements of the readings array, as if they were a normal field:
readings
SELECT meter_id , reading FROM batched_readings LATERAL readings AS reading WHERE reading > 90
Running that query we will get the values:
You can use multiple LATERAL joins, one inside the other, if you want to extract elements from a nested array:
LATERAL
CREATE TABLE batched_readings_nested( sensor_id int , nested_readings int[][] ) FORMAT(int, AVRO); INSERT INTO batched_readings_nested( sensor_id , nested_readings ) VALUES (1, [[100, 101], [103]]), (2, [[80, 81], [82, 83, 82]]), (1, [[100], [103, 102], [104]])
Running the following query we will obtain the same records of the previous example:
SELECT meter_id , reading FROM batched_readings LATERAL nested_readings AS readings LATERAL readings as reading WHERE reading > 90
On this page