Tuning

The table-based engine has several parts that can be tuned, from controlling the resources involved in running a query to tweaking how the underlying Kafka records are consumed for better performance. This chapter covers all of them. To instruct the engine what to do, a user sets a few parameters via the SET statement.

Control execution

Querying a table can take a long time if it contains many records: the underlying Kafka topic has to be read, the filter conditions applied and the projections made. Furthermore, the SELECT statement could end up returning a large amount of data to the client. To constrain the resources involved, the SQL engine allows the execution context to be customized, giving control to the user. Here is the list of context parameters that can be overwritten:

max.size
  The maximum amount of data to return. It can be expressed as bytes (1024b), kilobytes (1024k), megabytes (10m) or gigabytes (5g). Default is 20MB.
  SET max.size = '20m'; sets a maximum of 20MB to be returned.

max.query.time
  The maximum amount of time the query is allowed to run. It can be specified as milliseconds (2000ms), hours (2h), minutes (10m) or seconds (60s). Default is 1 hour.
  SET max.query.time = '60000ms'; sets a one-minute query limit.

max.idle.time
  The amount of time to wait when no more records are read from the source before the query is completed. Default is 5 seconds.
  SET max.idle.time = '5s'; completes the query after 5 seconds without new records.

LIMIT N
  The maximum number of records to return. Default is 10000.
  SELECT * FROM payments LIMIT 100;

skip.bad.records
  A flag driving how topic records are handled when their payload does not match the table storage format. Default is true, which means bad records are skipped. Set it to false to fail the query on the first record with an invalid payload.
  SET skip.bad.records = false;
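Several context parameters can be combined in the same query context. As a sketch (reusing the payments table from the examples above, with parameter values chosen purely for illustration), the following caps both the result size and the run time, and fails fast on malformed records:

```sql
-- Cap the result at 5MB and the run time at 5 minutes,
-- and fail the query on the first record that does not
-- match the table storage format.
SET max.size = '5m';
SET max.query.time = '5m';
SET skip.bad.records = false;

SELECT *
FROM payments
LIMIT 1000;
```

Whichever limit is reached first terminates the query.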

Query tuning

Lenses SQL uses the Kafka Consumer to read the data. This means an advanced user with Kafka knowledge can tweak the consumer properties to achieve better throughput; this should only be needed on rare occasions. The query context can receive Kafka consumer settings. For example, the consumer's max.poll.records can be set as:

SET max.poll.records = 100000;
SELECT *
FROM payments
LIMIT 1000000;
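Other consumer properties can be passed the same way. As a sketch, the standard Kafka consumer setting max.partition.fetch.bytes could be raised to fetch more data per partition in each poll (whether a given property takes effect depends on how the engine forwards it to the consumer):

```sql
-- Allow up to 10MB per partition per fetch request.
SET max.partition.fetch.bytes = 10485760;
SELECT *
FROM payments
LIMIT 1000000;
```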

Example

Streaming SQL operates on unbounded streams of events: a query would normally be never-ending. In order to bring query-termination semantics into Apache Kafka, we introduced four controls:

  • LIMIT 10000 - Force the query to terminate when 10,000 records are matched
  • max.bytes = 20000000 - Force the query to terminate once 20 MBytes have been retrieved
  • max.time = 60000 - Force the query to terminate after 60 seconds
  • max.zero.polls = 8 - Force the query to terminate after 8 consecutive polls are empty, indicating we have exhausted a topic

Thus when retrieving data, you can avoid setting a limit, and instead set the max bytes to 1 GByte and the max query time to 1 hour:

SET max.bytes = 1000000000;
SET max.time = 3600000;
SELECT * FROM topicA WHERE customer.id = "XXX";