Text to JSON¶
In this page you will learn how to convert plain text log entries into JSON records using regular expressions.
Applying regex transformations to plain text data is a very common scenario especially when working with log files in order to filter out important information. At the same time, this will convert the format of the log file data into JSON records in order to have a more structured schema to work with.
The following SQL query will process the topic generated by the File Source
we created in the previous page - the name of that topic is
The SQL will be the following:
SET autocreate=true; INSERT INTO log_files_as_json WITH log_regex AS ( SELECT STREAM REGEXP(content, '.*\[(.*)\] (.*)') AS regx FROM var_log_broker WHERE _vtype='STRING_AS_JSON' ) SELECT STREAM regx AS time_date, regx AS the_text FROM log_regex WHERE regx IS NOT NULL
Notice that this query runs using an SQL Processor – SQL Processors are explained in the SQL Processors page. Its a long running process.
The regular expression used, which is
.*\[(.*)\] (.*), breaks each log entry
into two parts. The first part is the time field of each log entry and the second
part contains the text of the log entry. So, we are going to convert each existing record into
a JSON record with two fields - the
SELECT STREAM query gets the desired
fields from the output of the regular expression and puts them into the
log_files_as_json Kafka topic. Notice that the indexing in the array returned by the
regular expression begins with
You can find more about Regular Expressions in Lenses SQL here.