LogoLogo
HomeProductsDownload Community Edition
6.0
  • Lenses DevX
  • Kafka Connectors
6.0
  • Overview
  • What's New?
    • Version 6.0.5
      • Features / Improvements & Fixes
    • Version 6.0.4
      • Features / Improvements & Fixes
    • Version 6.0.3
      • Features / Improvements & Fixes
    • Version 6.0.2
    • Version 6.0.1
    • Version 6.0.0-la.2
      • Features / Improvements & Fixes
    • Version 6.0.0-la.1
      • Features / Improvements & Fixes
    • Version 6.0.0-la.0
      • Features / Improvements & Fixes
    • Version 6.0.0-alpha.20
      • Features / Improvements & Fixes
      • Helm
    • Version 6.0.0-alpha.19
      • Features / Improvements & Fixes
      • Helm
    • Version 6.0.0-alpha.18
      • Features / Improvements & Fixes
      • Helm
    • Version 6.0.0-alpha.17
      • Features / Improvements & Fixes
      • Helm
    • Version 6.0.0-alpha.16
    • Version 6.0.0-alpha.14
  • Getting Started
    • Setting Up Community Edition
      • Hands-On Walk Through of Community Edition
    • Connecting Lenses to your Kafka environment
      • Overview
      • Install
  • Deployment
    • Installation
      • Kubernetes - Helm
        • Deploying HQ
        • Deploying an Agent
      • Docker
        • Deploying HQ
        • Deploying an Agent
      • Linux
        • Deploying HQ
        • Deploying an Agent
    • Configuration
      • Authentication
        • Admin Account
        • Basic Authentication
        • SSO & SAML
          • Overview
          • Azure SSO
          • Google SSO
          • Keycloak SSO
          • Okta SSO
          • OneLogin SSO
          • Generic SSO
      • HQ
        • Configuration Reference
      • Agent
        • Overview
        • Provisioning
          • Overview
          • HQ
          • Kafka
            • Apache Kafka
            • Aiven
            • AWS MSK
            • AWS MSK Serverless
            • Azure EventHubs
            • Azure HDInsight
            • Confluent Cloud
            • Confluent Platform
            • IBM Event Streams
          • Schema Registries
            • Overview
            • AWS Glue
            • Confluent
            • Apicurio
            • IBM Event Streams Registry
          • Kafka Connect
          • Zookeeper
          • AWS
          • Alert & Audit integrations
          • Infrastructure JMX Metrics
        • Hardware & OS
        • Memory & CPU
        • Database
        • TLS
        • Kafka ACLs
        • Rate Limiting
        • JMX Metrics
        • JVM Options
        • SQL Processor Deployment
        • Logs
        • Plugins
        • Configuration Reference
  • User Guide
    • Environments
      • Create New Environment
    • Lenses Resource Names (LRNs)
    • Identity & Access Management
      • Overview
      • Users
      • Groups
      • Roles
      • Service Accounts
      • IAM Reference
      • Example Policies
    • Topics
      • Global Topic Catalogue
      • Environment Topic Catalogue
        • Finding topics & fields
        • Searching for messages
        • Inserting & deleting messages
        • Viewing topic metrics
        • Viewing topic partitions
        • Topic Settings
        • Adding metadata & tags to topics
        • Managing topic configurations
        • Approval requests
        • Downloading messages
        • Backup & Restore
    • SQL Studio
      • Concepts
      • Best practices
      • Filter by timestamp or offset
      • Creating & deleting Kafka topics
      • Filtering
      • Limit & Sampling
      • Joins
      • Inserting & deleting data
      • Aggregations
      • Metadata fields
      • Views & synonyms
      • Arrays
      • Managing queries
    • Applications
      • Connectors
        • Overview
        • Sources
        • Sinks
        • Secret Providers
      • SQL Processors
        • Concepts
        • Projections
        • Joins
        • Lateral Joins
        • Aggregations
        • Time & Windows
        • Storage format
        • Nullibility
        • Settings
      • External Applications
        • Registering via SDK
        • Registering via REST
    • Schemas
    • Monitoring & Alerting
      • Infrastructure Health
      • Alerting
        • Alert Reference
      • Integrations
      • Consumer Groups
    • Self Service & Governance
      • Data policies
      • Audits
      • Kafka ACLs
      • Kafka Quotas
    • Topology
    • Tutorials
      • SQL Processors
        • Data formats
          • Changing data formats
          • Rekeying data
          • Controlling AVRO record names and namespaces
          • Changing the shape of data
        • Filtering & Joins
          • Filtering data
          • Enriching data streams
          • Joining streams of data
          • Using multiple topics
        • Aggregations
          • Aggregating data in a table
          • Aggregating streams
          • Time window aggregations
        • Complex types
          • Unwrapping complex types
          • Working with Arrays
        • Controlling event time
      • SQL Studio
        • Querying data
        • Accessing headers
        • Deleting data from compacted topics
        • Working with JSON
    • SQL Reference
      • Expressions
      • Functions
        • Aggregate
          • AVG
          • BOTTOMK
          • COLLECT
          • COLLECT_UNIQUE
          • COUNT
          • FIRST
          • LAST
          • MAXK
          • MAXK_UNIQUE
          • MINK
          • MINK_UNIQUE
          • SUM
          • TOPK
        • Array
          • ELEMENT_OF
          • FLATTEN
          • IN_ARRAY
          • REPEAT
          • SIZEOF
          • ZIP_ALL
          • ZIP
        • Conditions
        • Conversion
        • Date & Time
          • CONVERT_DATETIME
          • DATE
          • DATETIME
          • EXTRACT_TIME
          • EXTRACT_DATE
          • FORMAT_DATE
          • FORMAT_TIME
          • FORMAT_TIMESTAMP
          • HOUR
          • MONTH_TEXT
          • MINUTE
          • MONTH
          • PARSE_DATE
          • PARSE_TIME_MILLIS
          • PARSE_TIME_MICROS
          • PARSE_TIMESTAMP_MILLIS
          • PARSE_TIMESTAMP_MICROS
          • SECOND
          • TIMESTAMP
          • TIME_MICROS
          • TIMESTAMP_MICROS
          • TIME_MILLIS
          • TIMESTAMP_MILLIS
          • TO_DATE
          • TO_DATETIME
          • TOMORROW
          • TO_TIMESTAMP
          • YEAR
          • YESTERDAY
        • Headers
          • HEADERASSTRING
          • HEADERASINT
          • HEADERASLONG
          • HEADERASDOUBLE
          • HEADERASFLOAT
          • HEADERKEYS
        • JSON
          • JSON_EXTRACT_FIRST
          • JSON_EXTRACT_ALL
        • Numeric
          • ABS
          • ACOS
          • ASIN
          • ATAN
          • CBRT
          • CEIL
          • COSH
          • COS
          • DEGREES
          • DISTANCE
          • FLOOR
          • MAX
          • MIN
          • MOD
          • NEG
          • POW
          • RADIANS
          • RANDINT
          • ROUND
          • SIGN
          • SINH
          • SIN
          • SQRT
          • TANH
          • TAN
        • Nulls
          • ISNULL
          • ISNOTNULL
          • COALESCE
          • AS_NULLABLE
          • AS_NON_NULLABLE
        • Obfuscation
          • ANONYMIZE
          • MASK
          • EMAIL
          • FIRST1
          • FIRST2
          • FIRST3
          • FIRST4
          • LAST1
          • LAST2
          • LAST3
          • LAST4
          • INITIALS
        • Offsets
        • Schema
          • TYPEOF
          • DUMP
        • String
          • ABBREVIATE
          • BASE64
          • CAPITALIZE
          • CENTER
          • CHOP
          • CONCAT
          • CONTAINS
          • DECODE64
          • DELETEWHITESPACE
          • DIGITS
          • DROPLEFT
          • DROPRIGHT
          • ENDSWITH
          • INDEXOF
          • LEN
          • LOWER
          • LPAD
          • MKSTRING
          • REGEXP
          • REGEX_MATCHES
          • REPLACE
          • REVERSE
          • RPAD
          • STARTSWITH
          • STRIPACCENTS
          • SUBSTR
          • SWAPCASE
          • TAKELEFT
          • TAKERIGHT
          • TRIM
          • TRUNCATE
          • UNCAPITALIZE
          • UPPER
          • UUID
        • User Defined Functions
        • User Defined Aggregate Functions
      • Deserializers
      • Supported data formats
        • Protobuf
  • Resources
    • Downloads
    • CLI
      • Environment Creation
    • API Reference
      • API Authentication
      • Websocket Spec
      • Lenses API Spec
        • Authentication
        • Environments
        • Users
        • Groups
        • Roles
        • Service Accounts
        • Meta
        • Settings
        • License
        • Topics
        • Applications
          • SQL Processors
          • Kafka Connectors
          • External Applications
        • Kafka ACLs & Quotas
        • Kafka Consumer Groups
        • Schema Registry
        • SQL Query Management
        • Data Policies
        • Alert Channels
        • Audit Channels
        • Provisioning State
        • Agent Metadata
        • Backup & Restore
        • As Code
Powered by GitBook
LogoLogo

Resources

  • Privacy
  • Cookies
  • Terms & Conditions
  • Community EULA

2024 © Lenses.io Ltd. Apache, Apache Kafka, Kafka and associated open source project names are trademarks of the Apache Software Foundation.

On this page
  • Setting up our example
  • Extracting fields from a JSON string
  • Filtering the results
  • Extract multiple values from a JSON string
  • Sql Processors
  • Conclusion

Was this helpful?

Export as PDF
  1. User Guide
  2. Tutorials
  3. SQL Studio

Working with JSON

This page describes a tutorial on how to work with JSON in Lenses SQL.

PreviousDeleting data from compacted topicsNextSQL Reference

Last updated 6 months ago

Was this helpful?

In this tutorial, we will see how to use Lenses SQL to process JSON strings using , a sort of XPath for JSON objects.

In Lenses SQL you can use a JSON_EXTRACT_FIRST() and JSON_EXTRACT_ALL() to navigate and transform JSON strings.

Setting up our example

We have a topic http_logs that collects the details of HTTP calls to a microservice. Basic details of the request are stored, like the URL and the HTTP method used. The payload of the requests is stored as well as a string.

We can create the topic and insert some example data through SQL Studio:

CREATE TABLE http_logs(_key string, method string, url string, content_type string, body string)
FORMAT (STRING, AVRO);

INSERT INTO http_logs(_key, method, url, content_type, body) VALUES
('event_1', 'POST', '/users', 'application/json', '{ "id": 1, "username": "juno", "contacts": [ { "type": "email", "value": "juno@example.org" }, { "type": "phone", "value": "12345" }] }'),
('event_2', 'POST', '/users', 'application/json', '{ "id": 2, "username": "max", "contacts": [ { "type": "email", "value": "max@example.org" }, { "type": "twitter", "value": "@max" }] }'),
('event_3', 'GET', '/users/1', '', ''),
('event_4', 'GET', '/users/2', '', ''),
('event_5', 'POST', '/users', 'application/json', '{ "id": 3, "username": "nic", "contacts": [ { "type": "email", "value": "nic@example.org" }, { "type": "phone", "value": "78910" }] }'),
('event_6', 'PUT', '/users/1', 'application/json', '{ "username": "juno", "contacts": [ { "type": "email", "value": "juno@domain.org" }] }'),
('event_7', 'POST', '/users', 'application/json', '{ "id": 4, "username": "john", "contacts": [ { "type": "email", "value": "john@example.org" }] }');

The HTTP method and the URL used for the request are stored in the method and url fields respectively, while the optional payload, and its content-type, are stored in the body and content_type fields.

As you can imagine the logs contained in this topic are quite generic, and different endpoints may have different content-types for their body. For this reason the best the system can do is storing the payload as a simple string, whenever that is possible.

This comes with some drawbacks: since the data is a simple string, and it is not structured, it is not possible to inspect it as we would do with a normal AVRO/JSON object.

Fortunately Lenses SQL offers a couple of handful functions that make our life easier in these kind of scenarios.

Extracting fields from a JSON string

Our first task is to find the username of users created with a call to POST /users.

JsonPath is a powerful way to traverse and extract elements from a JSON object. Explaining the full details of goes beyond the scope of this article, but in general it can be thought as a JSON version of XPath, the standard used to select elements from an XML document.

In our case, we would like to extract the name of the user just created. The simple path $.username will do it!

Let’s try to use it in a SELECT that we can run in SQL Studio:

SELECT
    JSON_EXTRACT_FIRST(body, '$.username') as username
FROM
    http_logs

That query will produce the results

{ "username": "\"juno\"" }
{ "username": "\"max\"" }
{ "username": null }
{ "username": null }
{ "username": "\"nic\"" }
{ "username": "\"juno\"" }
{ "username": "\"john\"" }

As you can see we have two entries for juno. That’s because the user was first created, and then modified later, with a PUT call.

Also, there are some null values. This is because JSON_EXTRACT_FIRST was not able to extract the username, either because the payload was not valid JSON, or because the field was not found.

We can fix this restricting our query to user creation calls:

SELECT
   JSON_EXTRACT_FIRST(body, '$.username') as username
FROM
   http_logs
WHERE
   method = "POST" AND
   url = "/users"

We have now only valid results:

{ "username": "\"juno\"" }
{ "username": "\"max\"" }
{ "username": "\"nic\"" }
{ "username": "\"john\"" }

Filtering the results

All Lenses SQL functions can be used in any part of the query. Thus JSON_EXTRACT_FIRST can be used in the projections, where, and group bys.

For example, you can run the query

SELECT
   JSON_EXTRACT_FIRST(body, '$.contacts[?(@.type=="email")].value') as email
FROM
   http_logs
WHERE
   JSON_EXTRACT_FIRST(body, '$.username') = '"max"'

to retrieve max’s e-mail:

{ "email" : "max@example.org" }

Extract multiple values from a JSON string

So far we had fun using JSON_EXTRACT_FIRST, but we have not talked yet about its bigger brother, JSON_EXTRACT_ALL.

JSON_EXTRACT_ALL(json_string, pattern) works like JSON_EXTRACT_FIRST, except that it will return all the values that match the pattern. The results will be returned in an array, and when no results are found the empty array will be returned.

Let’s make use of it, extracting all the contact types used at the moment of the creation of the user:

SELECT
   JSON_EXTRACT_FIRST(body, '$.username') as username,
   JSON_EXTRACT_ALL(body, '$.contacts[*].type') as contact_types
FROM
   http_logs
WHERE
   method = "POST" AND
   url = "/users"

Running the query above we get what we desired:

{ "username": "\"juno\"", "contact_types": [ "email", "\"phone\"" ] }'),
{ "username": "\"max\"", "contact_types": [ "email", "\"twitter\"" ] }'),
{ "username": "\"nic\"", "contact_types": [ "email", "\"phone\"" ] }'),
{ "username": "\"john\"", "contact_types": [ "\"email\""] }')

Sql Processors

JSON_EXTRACT_FIRST() and JSON_EXTRACT_ALL() are available also in the Streaming Engine, like most Lenses SQL functions.

Let’s say we want another topic continuously filled with the contact types used for user creations. We also want each record containing a single username-contact type pair. To achieve that we can take the query of the last example and adapt it a bit, using a lateral join.

SET defaults.topic.autocreate=true;

INSERT INTO contact_types
SELECT STREAM
   JSON_EXTRACT_FIRST(body, '$.username') as username,
   contact_type
FROM
   http_logs LATERAL
   JSON_EXTRACT_ALL(body, '$.contacts[*].type') as contact_type
WHERE
   method = "POST" AND
   url = "/users"

Conclusion

JSON_EXTRACT_FIRST() and JSON_EXTRACT_ALL() simplifies your life every time you have to deal with JSON that is represented as a string value of a field in your topic.

The use of JsonPath make them very powerful and even complex operations are easily representable with it.

To do that we can use JSON_EXTRACT_FIRST(json_string, pattern), one of the string functions available in Lenses SQL. The first argument of the function is the string representing the JSON we want to manipulate. The second is a string representing a .

A nice way to try and test if your JsonPaths are doing what you intended, is using the .

JsonPath
JsonPath
JsonPath online evaluator