Querying a Table or View

For the SQL engine, the term table is equivalent to Kafka topic. The two are used interchangeably.

The powerful SELECT allows the user to get the data is looking for quickly. The full syntax is described below.

A SELECT statement has a few parts to it:

  • SET - Extending the select context by setting variables which impact the execution.
  • SELECT - Defines parts of the record to return or function calls.
  • FROM - Indicates the table or view from which to retrieve rows.
  • JOIN - Indicates the type of join to execute between $Table1 and $Table2.
  • $JoinCondition - A set of conditions for which two records, one from $Table and the other one from $Table2 can be combined.
  • WHERE - A set of condition(-s) the record must satisfy.
  • GROUP BY - Indicates the records should be combined based on the sql expression which follows the syntax.
  • $GroupingExpr - A SQL expression which could use fields or even function calls. The result of this expression drives the grouping criteria.
  • HAVING - Filters on the summarized output of GROUP BY
  • $HavingExpr - A condition the summarized groups must satisfy
  • LIMIT - A restriction set on how many records to return.
[ SET max.size = 1000000;]
[ SET max.query.time = 5000;]
[ SET {any.kafka.consumer.setting}= '$value';]
SELECT select_expr [, select_expr ...]
FROM $Table1
    [JOIN $Table2 on $JoinCondition]
[WHERE $condition]
[GROUP BY $GroupingExpr]
[HAVING $HavingExpr]
[LIMIT N]

A wildcard select returns the entire row, whereas if specific fields are selected then only parts of the row data is returned. The statement is not limited to selecting fields (nested and array elements are included) but allows calling functions, aggregating values (count, sum, min, max, etc.) or even joins of two tables.

Considering the previous step where records were created for the customer and customer_avro tables, a user can run the following query to retrieve them back:

SELECT * FROM customer_avro;

SELECT * FROM customer;

SELECT *
FROM customer_avro
WHERE address.city = 'New York';

SELECT _key as id, address.line as address, address.postcode as code
FROM customer_avro
WHERE address.city = 'New York';

Examples of the many types of statements a user can run can be found under the Learn in 10 minutes chapter.

Live Aggregations

When executing an aggregated query - that is a select statement that contains an aggregate function such as count or sum - the SQL engine has the ability to preview the results in real-time. This allows partial results to be made available as the query is executing, refreshed continually as the query executes.

To enable this functionality include the set statement as follows:

SET live.aggs = true;

For example, a sum executing on a topic of 100m records will take several minutes to return as the SQL engine must perform the sum aggregation for all records. By enabling live aggregations the current value of sum will be periodically updated as the query is running.

A full example follows:

SET live.aggs = true;
SELECT sum(total), country FROM payments GROUP BY country;