Processing application logs as they arrive is quite a common scenario. Looking at the log entries as a stream of data, which can be aggregated or even joined in real-time, can add a lot of value when it comes to monitoring what your applications are doing.

The Lenses SQL engine supports the projection of an unstructured text line to a well defined structure based on a regular expression used to extract the information. This way turning your log data into searchable data can be achieved in a matter of minutes.

Each log line is published to Kafka as a message of type STRING. To be able to project from the primitive type to a structured type, in this case JSON, the following decoder has been introduced STRING_TO_JSON. When used, it wraps the text in a simple structure and the message value is referenced via the content field.

SELECT content
FROM `topic`

Imagine there is a requirement to process the logs from a set of running web application and extract the information like incoming IP address, the API call, response code, etc. Here is an example of such a log entry:, - - [08/Feb/2017:16:33:27 +0100] "GET /api/house/get_for_compare?idn=33&amp;code=99992&amp;type= HTTP/1.1<em>" 404 19636 "" "Mozilla/5.0 (Linux; Android 5.0.1; GT-I9505 Build/LRX22C) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Mobile Safari/537.36"</em>

One possible regular expression to apply to the given input can be seen below:


Based on the above the output produced should be the following:

   "timestamp":"08/Feb/2017:16:33:27 +0100",

To achieve the above, the next SQL code should be used:

WITH log_regex as
  SELECT STREAM regexp(content, '(.+?)\\s-\\s-\\s\\[(.+?)\\]\\s\\\"(.+?)\\s(.+?)\\s(.+?)\\\"\\s(.+?)\\s(.+?)\\s\\\"(.+?)\\\"\\s\\\"(.+?)\\\".*') as regx
  FROM $fromTopic

  regx[1] as ip,
  regx[2] as `timestamp`,
  regx[3] as `type`,
  regx[4] as request,
  regx[6] as status,
  regx[7] as response_size,
  regx[8] as website
FROM log_regex
WHERE regx is not null

Above, is just a stream transformation. log_regex stream can be grouped or joined further if the requirements dictate to.