# Filtering

## WHERE clause

The `WHERE` clause allows you to define a set of logical predicates that records must match in order to be returned. The standard comparison operators are supported (`>`, `>=`, `<`, `<=`, `=`, and `!=`), and predicates can also call functions.

We are going to use the `groceries` table created earlier. Select all purchased items whose price is greater than or equal to 2.00:

```sql
SELECT
     name
     , price
FROM groceries
WHERE price >= 2.0

/* Output
Meridian Crunchy Peanut Butter              2.5
Activia fat free cherry yogurts             2
Green & Blacks Organic Chocolate Ice Cream  4.2
*/
```

Select all customers whose last name is exactly 5 characters long:

```sql
SELECT *
FROM customers
WHERE LEN(last_name) = 5

/* Output
key         value.first_name    value.last_name
mikejones       Mike                Jones
anasmith        Ana                 Smith
*/
```

Search all customers containing `Ana` in their first name:

```sql
SELECT *
FROM customers
WHERE first_name LIKE '%Ana%'
```

Keep in mind that text search is *case-sensitive*. For a *case-insensitive* text search, you can write:

```sql
SELECT *
FROM customers
WHERE LOWERCASE(first_name) LIKE '%ana%';

-- And here is the negated version
SELECT *
FROM customers
WHERE LOWERCASE(first_name) NOT LIKE '%ana%';
```

## Missing values <a href="#missing-values" id="missing-values"></a>

Sometimes data contains explicit `NULL` values, or it omits fields entirely. Use `IS [ NOT ] NULL` or the `EXISTS` function to check for these situations.

`EXISTS` is a keyword in the Lenses SQL grammar, so it needs to be escaped; the escape character is a backtick (`` ` ``).

Lenses supports JSON. Since JSON does not enforce a schema, records can omit fields or contain null values.

Create the following table named `customers_json`:

```sql
CREATE TABLE customers_json (
    _key STRING
    , first_name STRING
    , last_name STRING
    , middle_name STRING
) FORMAT(string, json);


INSERT INTO customers_json(_key, first_name, last_name, middle_name) VALUES("mikejones", "Mike", "Jones", "Albert");
INSERT INTO customers_json(_key, first_name, last_name) VALUES("anasmith", "Ana", "Smith");
INSERT INTO customers_json(_key, first_name, last_name) VALUES("shannonelliott", "Shannon","Elliott");
```

Query this table for all its entries:

```sql
SELECT * 
FROM customers_json

/* The output
key             value.first_name   value.middle_name   value.last_name
mikejones           Mike                Albert          Jones
anasmith            Ana                                 Smith
shannonelliott      Shannon                             Elliott
*/
```

The `middle_name` is only present on the `mikejones` record.

Write a query that returns only the records where `middle_name` is present:

```sql

SELECT *
FROM customers_json
WHERE `EXISTS`(middle_name)

/* The output
 key             value.first_name   value.middle_name   value.last_name
mikejones            Mike            Albert                Jones
*/
```

This can also be written as:

```sql
SELECT *
FROM customers_json
WHERE middle_name IS NOT NULL
```

Whether a field is explicitly `NULL` or missing entirely, both checks above produce the same outcome.
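To do the opposite and return only the records where the field is absent, negate the check. A minimal sketch against the same `customers_json` table:

```sql
-- Keeps the records that lack a middle_name
SELECT *
FROM customers_json
WHERE middle_name IS NULL
```

With the sample data above, this returns the `anasmith` and `shannonelliott` records.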

## Multiple WHERE conditions <a href="#multiple-where-conditions" id="multiple-where-conditions"></a>

You can use AND/OR to specify complex conditions for filtering your data.

To filter the purchased items where more than one unit has been bought for a given product and the unit price is greater than 2:

```sql
SELECT *
FROM groceries
WHERE quantity > 1 
AND price > 2
```

Now try changing the `AND` logical operator to `OR` and see the differences in the output.
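For reference, the `OR` version returns records matching either predicate:

```sql
SELECT *
FROM groceries
WHERE quantity > 1
   OR price > 2
```

Every row returned by the `AND` query is also returned here, along with rows that satisfy only one of the two conditions.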

## HAVING clause <a href="#filtering-grouped-data-with-having" id="filtering-grouped-data-with-having"></a>

`WHERE` filters individual records before grouping. To filter the entries returned from a grouping query, use the `HAVING` clause instead:

```sql
SELECT
    COUNT(*) AS count
    , country
FROM customer
GROUP BY country
HAVING count > 1
```

## Read a table partition only <a href="#read-a-table-partition-only" id="read-a-table-partition-only"></a>

To select data from a specific partition, access the metadata of the topic.

In the following example, a table is created with three partitions. The message key is hashed, and the remainder `hash % partitions` determines the partition the record is sent to.

```sql
-- Run
CREATE TABLE customers_partitioned (
    _key STRING
    , first_name STRING
    , last_name STRING
) 
FORMAT(string, Avro)
properties(partitions = 3);

INSERT INTO customers_partitioned(
    _key
    , first_name
    , last_name)
VALUES
("mikejones", "Mike", "Jones"),
("anasmith", "Ana", "Smith"),
("shannonelliott", "Shannon","Elliott"),
("tomwood", "Tom","Wood"),
("adelewatson", "Adele","Watson"),
("mariasanchez", "Maria", "Sanchez");
```

Next, run the following query:

```sql
SELECT *
FROM customers_partitioned

/* The output
offset  partition   timestamp       key         value.first_name    value.last_name
0       0           1540830780401   mikejones       Mike                Jones
1       0           1540830780441   anasmith        Ana                 Smith
2       0           1540830780464   shannonelliott  Shannon             Elliott
0       2           1540831270170   mariasanchez    Maria               Sanchez
0       1           1540830984698   tomwood         Tom                 Wood
1       1           1540831183308   adelewatson     Adele               Watson
*/
```

As you can see from the results (your timestamps will differ), the records span the three partitions. Now query specific partitions:

```sql
-- selects only records from partition = 0
SELECT *
FROM customers_partitioned
WHERE _meta.partition = 0;

-- selects only records from partitions 0 and 2
SELECT *
FROM customers_partitioned
WHERE _meta.partition = 0
   OR _meta.partition = 2;
```

## Query by partitions

Kafka reads are non-deterministic across multiple partitions: in one run the Snapshot engine may reach its `max.size` limit before finding your record, while in the next run it might find it.

```sql
SELECT *
FROM topicA
WHERE transaction_id=123
   AND _meta.partition = 1
```

Suppose, for the sake of example, the Kafka topic above has 50 partitions of 1GB each. If the query targets only partition 1, Lenses automatically pushes this predicate down, meaning only **1GB** needs to be scanned instead of 50GB of data.

## Query by offsets

```sql
SELECT *
FROM topicA
WHERE transaction_id=123
AND _meta.offset > 100
AND _meta.offset < 100100
AND _meta.partition = 1
```

By specifying both the offset range and the partition, only the specific range of 100K messages needs to be scanned, amounting to roughly **5MB**.

## Query by timestamp

```sql
SELECT *
FROM topicA
WHERE transaction_id=123
AND _meta.timestamp > NOW() - "1H"
```

The above will query only the data added to the topic within the last hour. Thus we would query just **10MB**.

## Time-traveling

```sql
SELECT *
FROM position_reports
WHERE _meta.timestamp > "2020-04-01" 
AND _meta.timestamp < "2020-04-02"
```

The above will query only the data that was added to the Kafka topic on a specific day. If we are storing 1,000 days of data, we would query just **50MB**.

