# Data policies

Data policies allow you to define data masking rules that redact data in Lenses based on field names. This applies to Kafka topics, Postgres tables and Elasticsearch indices.

Additionally, for each policy, Lenses identifies not only the datasets involved but also any applications, such as SQL Processors or Connectors, that use this data.

### How it works

A **Data Policy** is a rule to detect, classify and protect data with an associated redaction to mask the data.

For example, the policy below describes how Lenses should handle Credit Cards. For every dataset, across multiple connections, when a field matches the declared fields in the policy, the data will be masked with the `Last-4` redaction, which means only the last 4 digits will appear. The datasets are classified under the `Financial` category of `HIGH` severity.

![](/files/87da67d48f8d4f0803c8515546610f084243102f)

#### Matching

Lenses maintains an internal cache to identify the fields of each dataset (for example, your Kafka topics). Review data types and schemas to understand more about this topic. As a result, every time a new policy is created or a new field is added to an existing policy, the matching mechanism runs and detects which datasets are affected by the policy, and which applications known to Lenses are using them.

#### Governance

The governance is global and applies to all users. That means that there is no way to “escape” the policy even if you are an admin user. In order to retrieve the actual data, you will have to remove the policy or the respective fields.

#### Underlying data

The underlying data is not affected by Lenses policies. That means that the applications processing the affected datasets will have full access to the data itself. The policies apply to the Lenses interfaces.

#### Kafka topics

For Kafka topics, a policy is applied to both the `Key` and the `Value`, wherever either of them contains a matching field.

#### Policy properties

The Data Policy’s principal properties are:

* `Redaction`: the masking rule, which determines how the fields will be redacted
* `Category`: the category under which the policy is classified, e.g. PII
* `Impact`: the severity of the policy
* `Datasets`: the datasets the policy applies to. If a wildcard is used, it applies to all matching datasets
* `Fields`: the fields that will be masked

### Redaction Types

The redaction type is the rule used to obfuscate a field. Lenses applies data obfuscation to all data access requests and supports several data types and structures, including strings, numbers and emails, for every data format (JSON, XML, AVRO or Protobuf).

#### Common

These rules can apply regardless of the field type:

| Rules | Explanation                                    |
| ----- | ---------------------------------------------- |
| None  | Track sensitive data, but do not protect them. |
| All   | Mask the entire value.                         |

#### Special

These rules can apply only to alphanumeric fields:

| Rules | Explanation                                  |
| ----- | -------------------------------------------- |
| Email | Mask email address, showing the domain name. |
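
As an illustration of the `Email` rule, here is a minimal Python sketch that hides the local part of an address and keeps the domain. The function name `mask_email`, the `****` placeholder and the handling of values without an `@` are assumptions made for this example, not the Lenses implementation.

```python
def mask_email(value: str, mask: str = "****") -> str:
    """Hide the local part of an email address and keep the domain."""
    local, sep, domain = value.rpartition("@")
    if not sep:
        # Assumption: a value without "@" is masked entirely in this sketch.
        return mask
    return f"{mask}@{domain}"


print(mask_email("jane.doe@example.com"))  # ****@example.com
```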

#### Strings

These rules can apply only to alphanumeric fields:

| Rules    | Explanation                                  |
| -------- | -------------------------------------------- |
| Last-1   | Display the last character of the value.     |
| Last-2   | Display the last 2 characters of the value.  |
| Last-3   | Display the last 3 characters of the value.  |
| Last-4   | Display the last 4 characters of the value.  |
| First-1  | Display the first character of the value.    |
| First-2  | Display the first 2 characters of the value. |
| First-3  | Display the first 3 characters of the value. |
| First-4  | Display the first 4 characters of the value. |
| Initials | Display the first letter of each word.       |
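
The string rules above can be sketched in a few lines of Python. This is only an illustration: the mask character `*`, the choice to preserve the original length, and the function names `last_n`, `first_n` and `initials` are assumptions, and the output Lenses actually renders may differ.

```python
MASK = "*"  # assumed mask character, not taken from Lenses


def last_n(value: str, n: int) -> str:
    """Last-N: show only the last n characters, mask the rest."""
    keep = value[-n:] if n > 0 else ""
    return MASK * (len(value) - len(keep)) + keep


def first_n(value: str, n: int) -> str:
    """First-N: show only the first n characters, mask the rest."""
    keep = value[:n] if n > 0 else ""
    return keep + MASK * (len(value) - len(keep))


def initials(value: str) -> str:
    """Initials: show the first letter of each whitespace-separated word."""
    return "".join(word[0] for word in value.split())


print(last_n("4111111111111111", 4))
print(first_n("Jonathan", 2))
print(initials("Jane Mary Doe"))
```

With these assumptions, `Last-4` on a 16-digit card number leaves only the final four digits readable, which matches the Credit Card example earlier on this page.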

#### Numbers

These rules can apply to numeric fields:

| Rules                  | Explanation                          |
| ---------------------- | ------------------------------------ |
| Number-to-zero         | Replace a numeric value with 0.      |
| Number-to-negative-one | Replace a numeric value with -1.     |
| Number-to-null         | Replace a numeric value with `null`. |

Fields which are not numeric will not be affected by these Policies. Strings that contain numbers will not be affected either.
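
A minimal Python sketch of this behaviour, assuming a hypothetical helper `redact_number` that receives an already-decoded field value: numeric values are replaced, while anything else, including numeric-looking strings, passes through unchanged.

```python
# Replacement value for each numeric rule listed in the table above.
REPLACEMENTS = {
    "Number-to-zero": 0,
    "Number-to-negative-one": -1,
    "Number-to-null": None,
}


def redact_number(value, rule: str):
    """Apply a numeric rule; leave non-numeric values untouched."""
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return value
    return REPLACEMENTS[rule]


print(redact_number(42, "Number-to-zero"))    # 0
print(redact_number("42", "Number-to-zero"))  # 42 (a string, so unchanged)
```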

### Category

The category classifies the sensitivity of your data. Any value can be entered here, based on what makes sense for your organisation. Every policy belongs to one category.

**Examples:**

| Data Classification | Explanation                          |
| ------------------- | ------------------------------------ |
| PII                 | Personally Identifiable Information. |
| HIPAA               | Protected Health Information.        |

The values above are a few popular options. Find more information about [Data Classification](https://safecomputing.umich.edu/dataguide/?q=all-data).

### Impact

How important is the Data for the Business? It refers to the sensitivity level of the information to be stored and processed.

| Impact Level | Explanation                                       |
| ------------ | ------------------------------------------------- |
| HIGH         | Information such as PII (`name`, `religion`, ...) |
| MEDIUM       | Information such as Assets (`productIds`, ...)    |
| LOW          | Information such as Linkables (`Dates`, ...)      |

### Datasets

You can restrict a policy to one or more specific datasets. This is a `wildcard` option; if it is not specified, the policy applies to all datasets.

| Wildcard Rule | Explanation                                     |
| ------------- | ----------------------------------------------- |
| `*word`       | Will match all Datasets that end with `word`    |
| `word*`       | Will match all Datasets that start with `word`  |
| `*word*`      | Will match all Datasets that contain the `word` |
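
The wildcard rules in the table can be sketched in Python by translating each `*` into "any run of characters". The helper name `dataset_matches` and the case-sensitive comparison are assumptions for this example; it is not the matcher Lenses uses internally.

```python
import re
from typing import Optional


def dataset_matches(pattern: Optional[str], dataset: str) -> bool:
    """Return True when the dataset name satisfies the wildcard pattern."""
    if not pattern:
        # No pattern specified: the policy applies to all datasets.
        return True
    # Escape the literal parts and let each "*" match any characters.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, dataset) is not None


print(dataset_matches("*-payments", "eu-payments"))     # True
print(dataset_matches("payments-*", "eu-payments"))     # False
print(dataset_matches("*payments*", "eu-payments-v2"))  # True
```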

### Fields

Specifies which field(s) are targeted and obfuscated. This is also a `wildcard` option. There are a few advanced field specifications that we need to be careful with.

#### Nested Fields

In the case of nested data, it is possible to specify nested fields using the “.” character. For example, if your “customers” Dataset has a field called `information` which contains a field called `name`, it is possible to specify the field `information.name` so that only that particular field is obfuscated, instead of every field.

> Note that obfuscation is only performed on nodes without children. Continuing with the example above, `information.name` will be obfuscated, but if we attempt to apply it to `information`, it will not be affected, as it has child properties.
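
The leaf-only behaviour can be illustrated with a small Python sketch over a nested record. The `redact` helper, the `****` placeholder and the rule that a policy field matches when it equals the trailing segments of a field's path are assumptions made for this example. Only dictionaries are treated as nodes with children here.

```python
def redact(record: dict, field: str, mask: str = "****") -> dict:
    """Return a copy of record with matching leaf fields masked."""
    target = field.split(".")

    def walk(node, path):
        if isinstance(node, dict):
            # A node with children is never masked itself; recurse instead.
            return {key: walk(child, path + [key]) for key, child in node.items()}
        # Leaf node: mask it when the path ends with the policy field.
        return mask if path[-len(target):] == target else node

    return walk(record, [])


customer = {"id": 7, "information": {"name": "Jane Doe", "city": "Athens"}}
print(redact(customer, "information.name"))
print(redact(customer, "information"))  # unchanged: "information" has children
```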

#### Clashing Policies

In the event of two policies matching a given field, the more specific one will be applied. For example, if there is a policy for `name` with a redaction of `First-4` and a policy for `customers.information.name` with a redaction of `Initials`, the latter will be applied.

> Please note that `wildcards` and dataset rules do not affect this.
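
One way to picture the "more specific wins" rule is to rank matching policies by the number of path segments they declare. This Python sketch assumes that ranking, assumes the dataset name can act as the first path segment (as in `customers.information.name`), and uses a hypothetical `pick_redaction` helper; the exact ordering Lenses applies may differ.

```python
from typing import Dict, Optional


def pick_redaction(policies: Dict[str, str], dataset: str, field_path: str) -> Optional[str]:
    """Choose the redaction of the most specific policy matching the field."""
    full_path = [dataset] + field_path.split(".")
    best = None  # (number of segments, redaction)
    for policy_field, redaction in policies.items():
        segments = policy_field.split(".")
        # A policy matches when its segments are the tail of the full path.
        if full_path[-len(segments):] == segments:
            if best is None or len(segments) > best[0]:
                best = (len(segments), redaction)
    return best[1] if best else None


policies = {"name": "First-4", "customers.information.name": "Initials"}
print(pick_redaction(policies, "customers", "information.name"))  # Initials
print(pick_redaction(policies, "orders", "name"))                 # First-4
```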

#### Advanced Wildcards

It is also possible to specify wildcards using the `*` character, so that `i*n.name` will match both `information.name` and `installation.name`. Because `.` is treated as a field separator, a wildcard will not match it: `i*n.name` will match `information.name` but will not match `information.details.name`.
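
The separator rule can be sketched in Python by translating `*` into "any characters except a dot". The helper name `field_pattern_matches` is an assumption for this example and does not reflect how Lenses implements the match.

```python
import re


def field_pattern_matches(pattern: str, field_path: str) -> bool:
    """Match a field path against a pattern where "*" never crosses a "."."""
    # Escape the literal parts; each "*" matches a run of non-dot characters.
    regex = "[^.]*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, field_path) is not None


print(field_pattern_matches("i*n.name", "information.name"))          # True
print(field_pattern_matches("i*n.name", "installation.name"))         # True
print(field_pattern_matches("i*n.name", "information.details.name"))  # False
```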

### Creating and managing Data Policies

Select the `Environments & Topics` option from the left sidebar, expand the environment node you are interested in and select the `Data Policy` tab. This opens the listing of policies. Select a policy to view its details.

Selecting a policy in the grid will open the side drawer, where the context menu allows you to edit or delete policies. This can also be done inline on the listing row.

From the listing tab you can create new policies and load the default policies bundled with Lenses.

<figure><img src="/files/HpB6hJBMuM8yZboVt8ex" alt=""><figcaption></figcaption></figure>

