# Working with JSON

In this tutorial, we will see how to use Lenses SQL to process JSON strings using [JsonPath](https://goessner.net/articles/JsonPath/), a sort of XPath for JSON objects.

{% hint style="success" %}
In Lenses SQL you can use a `JSON_EXTRACT_FIRST()` and `JSON_EXTRACT_ALL()` to navigate and transform JSON strings.
{% endhint %}

## Setting up our example

We have a topic `http_logs` that collects the details of HTTP calls to a microservice. Basic details of the request are stored, like the `URL` and the `HTTP` method used. The payload of the requests is stored as well as a string.

We can create the topic and insert some example data through SQL Studio:

```sql
CREATE TABLE http_logs(_key string, method string, url string, content_type string, body string)
FORMAT (STRING, AVRO);

INSERT INTO http_logs(_key, method, url, content_type, body) VALUES
('event_1', 'POST', '/users', 'application/json', '{ "id": 1, "username": "juno", "contacts": [ { "type": "email", "value": "juno@example.org" }, { "type": "phone", "value": "12345" }] }'),
('event_2', 'POST', '/users', 'application/json', '{ "id": 2, "username": "max", "contacts": [ { "type": "email", "value": "max@example.org" }, { "type": "twitter", "value": "@max" }] }'),
('event_3', 'GET', '/users/1', '', ''),
('event_4', 'GET', '/users/2', '', ''),
('event_5', 'POST', '/users', 'application/json', '{ "id": 3, "username": "nic", "contacts": [ { "type": "email", "value": "nic@example.org" }, { "type": "phone", "value": "78910" }] }'),
('event_6', 'PUT', '/users/1', 'application/json', '{ "username": "juno", "contacts": [ { "type": "email", "value": "juno@domain.org" }] }'),
('event_7', 'POST', '/users', 'application/json', '{ "id": 4, "username": "john", "contacts": [ { "type": "email", "value": "john@example.org" }] }');
```

The `HTTP` method and the `URL` used for the request are stored in the `method` and `url` fields respectively, while the optional payload, and its content-type, are stored in the `body` and `content_type` fields.

As you can imagine the logs contained in this topic are quite generic, and different endpoints may have different `content-type`s for their body. For this reason the best the system can do is storing the payload as a simple string, whenever that is possible.

This comes with some drawbacks: since the data is a simple string, and it is not structured, it is not possible to inspect it as we would do with a normal `AVRO`/`JSON` object.

Fortunately Lenses SQL offers a couple of handful functions that make our life easier in these kind of scenarios.

## Extracting fields from a JSON string <a href="#extracting-fields-from-a-json-string" id="extracting-fields-from-a-json-string"></a>

Our first task is to find the username of users created with a call to `POST /users`.

To do that we can use `JSON_EXTRACT_FIRST(json_string, pattern)`, one of the string functions available in Lenses SQL. The first argument of the function is the string representing the `JSON` we want to manipulate. The second is a string representing a [JsonPath](https://goessner.net/articles/JsonPath/).

JsonPath is a powerful way to traverse and extract elements from a `JSON` object. Explaining the full details of goes beyond the scope of this article, but in general it can be thought as a `JSON` version of `XPath`, the standard used to select elements from an XML document.

A nice way to try and test if your JsonPaths are doing what you intended, is using the [JsonPath online evaluator](https://jsonpath.com/).

In our case, we would like to extract the name of the user just created. The simple path `$.username` will do it!

Let’s try to use it in a `SELECT` that we can run in SQL Studio:

```sql
SELECT
    JSON_EXTRACT_FIRST(body, '$.username') as username
FROM
    http_logs
```

That query will produce the results

```json
{ "username": "\"juno\"" }
{ "username": "\"max\"" }
{ "username": null }
{ "username": null }
{ "username": "\"nic\"" }
{ "username": "\"juno\"" }
{ "username": "\"john\"" }
```

As you can see we have two entries for `juno`. That’s because the user was first created, and then modified later, with a `PUT` call.

Also, there are some `null` values. This is because `JSON_EXTRACT_FIRST` was not able to extract the username, either because the payload was not valid `JSON`, or because the field was not found.

We can fix this restricting our query to user creation calls:

```sql
SELECT
   JSON_EXTRACT_FIRST(body, '$.username') as username
FROM
   http_logs
WHERE
   method = "POST" AND
   url = "/users"
```

We have now only valid results:

```json
{ "username": "\"juno\"" }
{ "username": "\"max\"" }
{ "username": "\"nic\"" }
{ "username": "\"john\"" }
```

## Filtering the results <a href="#filtering-the-results" id="filtering-the-results"></a>

All Lenses SQL functions can be used in any part of the query. Thus `JSON_EXTRACT_FIRST` can be used in the projections, where, and group bys.

For example, you can run the query

```sql
SELECT
   JSON_EXTRACT_FIRST(body, '$.contacts[?(@.type=="email")].value') as email
FROM
   http_logs
WHERE
   JSON_EXTRACT_FIRST(body, '$.username') = '"max"'
```

to retrieve `max`’s e-mail:

```json
{ "email" : "max@example.org" }
```

## Extract multiple values from a JSON string <a href="#extract-multiple-values-from-a-json-string" id="extract-multiple-values-from-a-json-string"></a>

So far we had fun using `JSON_EXTRACT_FIRST`, but we have not talked yet about its bigger brother, `JSON_EXTRACT_ALL`.

`JSON_EXTRACT_ALL(json_string, pattern)` works like `JSON_EXTRACT_FIRST`, except that it will return *all* the values that match the `pattern`. The results will be returned in an array, and when no results are found the empty array will be returned.

Let’s make use of it, extracting all the contact types used at the moment of the creation of the user:

```sql
SELECT
   JSON_EXTRACT_FIRST(body, '$.username') as username,
   JSON_EXTRACT_ALL(body, '$.contacts[*].type') as contact_types
FROM
   http_logs
WHERE
   method = "POST" AND
   url = "/users"
```

Running the query above we get what we desired:

```properties
{ "username": "\"juno\"", "contact_types": [ "email", "\"phone\"" ] }'),
{ "username": "\"max\"", "contact_types": [ "email", "\"twitter\"" ] }'),
{ "username": "\"nic\"", "contact_types": [ "email", "\"phone\"" ] }'),
{ "username": "\"john\"", "contact_types": [ "\"email\""] }')
```

## Sql Processors <a href="#sql-processors" id="sql-processors"></a>

`JSON_EXTRACT_FIRST()` and `JSON_EXTRACT_ALL()` are available also in the Streaming Engine, like most Lenses SQL functions.

Let’s say we want another topic continuously filled with the contact types used for user creations. We also want each record containing a single username-contact type pair. To achieve that we can take the query of the last example and adapt it a bit, using a lateral join.

```sql
SET defaults.topic.autocreate=true;

INSERT INTO contact_types
SELECT STREAM
   JSON_EXTRACT_FIRST(body, '$.username') as username,
   contact_type
FROM
   http_logs LATERAL
   JSON_EXTRACT_ALL(body, '$.contacts[*].type') as contact_type
WHERE
   method = "POST" AND
   url = "/users"
```

## Conclusion <a href="#conclusions" id="conclusions"></a>

`JSON_EXTRACT_FIRST()` and `JSON_EXTRACT_ALL()` simplifies your life every time you have to deal with `JSON` that is represented as a string value of a field in your topic.

The use of `JsonPath` make them very powerful and even complex operations are easily representable with it.
