Text to JSON

In this page you will learn how to convert plain text log entries into JSON records using regular expressions.

Applying regex transformations to plain text data is a very common scenario especially when working with log files in order to filter out important information. At the same time, this will convert the format of the log file data into JSON records in order to have a more structured schema to work with.

The following SQL query will process the topic generated by the File Source we created in the previous page - the name of that topic is var_log_broker. The SQL will be the following:

SET autocreate=true;

INSERT INTO log_files_as_json
WITH log_regex AS
(
    SELECT STREAM REGEXP(content, '.*\[(.*)\] (.*)') AS regx
    FROM var_log_broker
    WHERE _vtype='STRING_AS_JSON'
)

SELECT STREAM
  regx[1] AS time_date,
  regx[2] AS the_text
FROM log_regex
WHERE regx IS NOT NULL

Notice that this query runs using an SQL Processor – SQL Processors are explained in the SQL Processors page. Its a long running process.

The regular expression used, which is .*\[(.*)\] (.*), breaks each log entry into two parts. The first part is the time field of each log entry and the second part contains the text of the log entry. So, we are going to convert each existing record into a JSON record with two fields - the SELECT STREAM query gets the desired fields from the output of the regular expression and puts them into the log_files_as_json Kafka topic. Notice that the indexing in the array returned by the regular expression begins with 1, not 0.

You can find more about Regular Expressions in Lenses SQL here.