Data policies
This page describes how to create data policies in Lenses to identify and mask sensitive data, and to identify the applications that consume it.
Data policies allow you to define data masking rules that redact data in Lenses based on field names. This applies to Kafka topics, Postgres tables and Elasticsearch indices.
Additionally, for each policy, Lenses identifies not only the datasets involved but also any applications (e.g. SQL Processors or Connectors) that use this data.
How it works
A Data Policy is a rule to detect, classify and protect data with an associated redaction to mask the data.
For example, consider a policy that describes how Lenses should handle Credit Cards. For every dataset, across multiple connections, when a field matches one of the fields declared in the policy, the data is masked with the Last-4 redaction, which means only the last 4 digits appear. The datasets are classified under the Financial category with HIGH severity.

Matching
Lenses maintains an internal cache to identify the fields of each dataset (e.g. your Kafka topics). Review data types and schemas to understand more about this topic. As a result, every time a new policy is created or a new field is added to an existing policy, the matching mechanism runs and detects which datasets are affected by the policy and which applications known to Lenses are using them.
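The matching step can be pictured with a small Python sketch. It is only an illustration: the cache layout, the dataset and application names, and both helper functions are hypothetical and do not reflect Lenses internals.

```python
# Hypothetical field cache: dataset name -> field names known for it.
FIELD_CACHE = {
    "payments": {"card_number", "amount"},
    "customers": {"name", "email"},
    "orders": {"order_id", "card_number"},
}

# Hypothetical registry: application name -> datasets it uses.
APP_DATASETS = {
    "fraud-sql-processor": {"payments"},
    "orders-sink-connector": {"orders"},
    "newsletter-service": {"customers"},
}


def affected_datasets(policy_fields, cache):
    """Return the datasets that contain at least one policy field."""
    wanted = set(policy_fields)
    return sorted(name for name, fields in cache.items() if fields & wanted)


def affected_applications(datasets, app_datasets):
    """Return the applications that use any of the given datasets."""
    hit = set(datasets)
    return sorted(app for app, used in app_datasets.items() if used & hit)


datasets = affected_datasets(["card_number"], FIELD_CACHE)
apps = affected_applications(datasets, APP_DATASETS)
print(datasets)
print(apps)
```

Re-running both functions whenever a policy or its field list changes mirrors the behaviour described above.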
Governance
The governance is global and applies to all users. That means that there is no way to “escape” the policy even if you are an admin user. In order to retrieve the actual data, you will have to remove the policy or the respective fields.
Underlying data
The underlying data is not affected by Lenses policies. That means that the applications processing the affected datasets will have full access to the data itself. The policies apply to the Lenses interfaces.
Kafka topics
For Kafka topics, the policy is applied to both the Key and the Value, and it takes effect on each of them if it contains the corresponding field.
Policy properties
The Data Policy’s principal properties are:
Redaction
The masking policy, which determines how the fields will be redacted.
Category
The category under which the policy is classified, e.g. PII.
Impact
The severity of the policy.
Datasets
The datasets the policy applies to. If set to a wildcard, it applies to all datasets.
Fields
The fields that will be masked.
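As a rough mental model, the five properties can be written down as a small Python record. This is a sketch only: the DataPolicy class, its attribute names and the example field names are hypothetical, and Lenses does not necessarily store policies this way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataPolicy:
    """Hypothetical in-memory view of a data policy."""
    name: str
    redaction: str       # e.g. "Last-4", "All", "None"
    category: str        # free text, e.g. "PII" or "Financial"
    impact: str          # "HIGH", "MEDIUM" or "LOW"
    fields: tuple        # field names or patterns to mask
    datasets: tuple = () # empty: assumed to mean "all datasets"


# The Credit Cards example, with made-up field names.
credit_cards = DataPolicy(
    name="Credit Cards",
    redaction="Last-4",
    category="Financial",
    impact="HIGH",
    fields=("card_number", "creditCard"),
)
print(credit_cards)
```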
Redaction Types
The redaction type is the rule used to obfuscate a field. Lenses applies data obfuscation to all data access requests and supports several data types and structures, including Strings, Numbers and Emails, in every data format (JSON, XML, AVRO or Protobuf).
Common
These rules can apply regardless of the field type:
Don't want to redact? Set the type to None.
None
Track sensitive data, but do not protect it.
All
Mask the entire value.
Special
These rules can apply only to alphanumeric fields:
Mask email address, showing the domain name.
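A minimal Python sketch of what such an email rule could do is shown here. The exact output format used by Lenses is not stated on this page, so the asterisk mask, the mask_email name and the fallback for values without an "@" are all assumptions.

```python
def mask_email(value, mask_char="*"):
    """Hide the local part of an email address and keep the domain."""
    local, sep, domain = value.rpartition("@")
    if not sep or not local:
        # Assumption: a value that is not a recognisable address
        # is masked entirely.
        return mask_char * len(value)
    return mask_char * len(local) + "@" + domain


print(mask_email("jane.doe@example.com"))
```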
Strings
These rules can apply only to alphanumeric fields:
Last-1
Display the last character of the value.
Last-2
Display the last 2 characters of the value.
Last-3
Display the last 3 characters of the value.
Last-4
Display the last 4 characters of the value.
First-1
Display the first character of the value.
First-2
Display the first 2 characters of the value.
First-3
Display the first 3 characters of the value.
First-4
Display the first 4 characters of the value.
Initials
Display the first letter of each word.
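The string rules can be sketched in Python as follows. The function names, the asterisk mask character, the handling of values shorter than the requested length, and the way initials are joined are assumptions made for illustration; this page does not specify the exact output.

```python
def last_n(value, n, mask_char="*"):
    """Last-N: show only the last n characters, mask the rest."""
    shown = min(max(n, 0), len(value))
    return mask_char * (len(value) - shown) + value[len(value) - shown:]


def first_n(value, n, mask_char="*"):
    """First-N: show only the first n characters, mask the rest."""
    shown = min(max(n, 0), len(value))
    return value[:shown] + mask_char * (len(value) - shown)


def initials(value):
    """Initials: keep the first letter of each whitespace-separated word."""
    return "".join(word[0] for word in value.split())


print(last_n("4111111111111111", 4))
print(first_n("Lenses", 2))
print(initials("Jane Mary Doe"))
```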
Numbers
These rules can apply to numeric fields:
Number-to-zero
Replace a numeric value with 0.
Number-to-negative-one
Replace a numeric value with -1.
Number-to-null
Replace a numeric value with null.
Fields which are not numeric will not be affected by these Policies. Strings that contain numbers will not be affected either.
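The numeric rules, including the behaviour for non-numeric values, can be sketched in Python like this. The redact_number helper is hypothetical, and treating booleans as non-numeric is an assumption of the sketch.

```python
from numbers import Number

# Replacement value for each numeric redaction rule.
NUMBER_RULES = {
    "Number-to-zero": 0,
    "Number-to-negative-one": -1,
    "Number-to-null": None,
}


def redact_number(value, rule):
    """Apply a numeric rule; leave non-numeric values untouched."""
    if rule not in NUMBER_RULES:
        raise ValueError(f"unknown numeric rule: {rule}")
    # bool is a subclass of int in Python; this sketch assumes
    # booleans should not count as numbers.
    if isinstance(value, bool) or not isinstance(value, Number):
        return value
    return NUMBER_RULES[rule]


print(redact_number(42.5, "Number-to-zero"))
print(redact_number("12345", "Number-to-zero"))
```

Note that the string "12345" is returned unchanged, matching the statement that strings containing numbers are not affected.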
Category
The sensitivity category of your data. Any value can be entered here, based on how your organisation chooses to classify its policies. Every policy belongs to one category.
Examples:
PII
Personally Identifiable Information.
HIPAA
Protected Health Information.
Find more information about Data Classification. Also here are a few popular options.
Impact
How important is the Data for the Business? It refers to the sensitivity level of the information to be stored and processed.
HIGH
Information such as PII (name, religion, ..)
MEDIUM
Information such as Assets (productIds, ..)
LOW
Information such as Linkables (Dates, ..)
Datasets
You can restrict your Policy to one or more specific Datasets. This option accepts wildcards, and if it is not specified, the Policy applies to all Datasets.
*word
Will match all Datasets that end with word
word*
Will match all Datasets that start with word
*word*
Will match all Datasets that contain word
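The three dataset patterns behave like shell-style globs, which Python's standard fnmatch module can emulate. This is an approximation for illustration: fnmatch also gives special meaning to ? and [ ], which this page does not mention, and case-sensitive matching is an assumption. The dataset names are made up.

```python
from fnmatch import fnmatchcase


def matching_datasets(pattern, datasets):
    """Return the datasets selected by a wildcard pattern.

    An empty pattern is treated as "not specified" and selects all.
    """
    if not pattern:
        return list(datasets)
    return [name for name in datasets if fnmatchcase(name, pattern)]


datasets = ["customer_payments", "payments_eu", "eu_payments_archive", "orders"]

print(matching_datasets("*payments", datasets))   # ends with
print(matching_datasets("payments*", datasets))   # starts with
print(matching_datasets("*payments*", datasets))  # contains
```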
Fields
Specifies which field(s) are targeted and obfuscated. This option also accepts wildcards. A few advanced field specifications need extra care.
Nested Fields
In the case of nested data, it is possible to specify nested fields using the “.” character. For example, if your “customers” Dataset has a field called information which contains a field called name, you can specify the field information.name so that only that particular field is obfuscated, rather than every field called name.
Note that obfuscation is only performed on nodes without children. Continuing with the example above, information.name will be obfuscated, but a policy applied to information will have no effect, as that field has child properties.
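The leaf-only behaviour can be sketched in Python on a nested dictionary. The mask_leaf helper and the "****" mask are hypothetical, and the sketch only considers nested objects; how arrays are treated is not covered here.

```python
import copy


def mask_leaf(record, path, mask="****"):
    """Return a copy of record with the leaf at the dotted path masked.

    Nodes that have children (nested dicts) are left untouched.
    """
    result = copy.deepcopy(record)
    node = result
    *parents, leaf = path.split(".")
    for key in parents:
        node = node.get(key) if isinstance(node, dict) else None
        if node is None:
            return result  # path not present in this record
    if isinstance(node, dict) and leaf in node and not isinstance(node[leaf], dict):
        node[leaf] = mask
    return result


record = {"information": {"name": "Jane Doe", "age": 30}, "id": 1}
print(mask_leaf(record, "information.name"))
print(mask_leaf(record, "information"))  # unchanged: it has children
```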
Clashing Policies
In the event of two policies matching a given field, the more specific one will be applied. For example, if there is a policy for name with a redaction of First-4 and a policy for customers.information.name with a redaction of Initials, the latter will be applied.
Please note that wildcards and dataset rules do not affect this.
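One way to picture the resolution is the Python sketch below. How Lenses measures specificity is not spelled out on this page, so the sketch makes two assumptions: a field specification matches when its dot-separated segments equal the trailing segments of the field path, and the specification with more segments is the more specific one. The pick_redaction helper is hypothetical.

```python
def pick_redaction(field_path, policies):
    """Pick the redaction of the most specific matching policy.

    policies maps a field specification to a redaction name.
    Returns None when no policy matches.
    """
    path = field_path.split(".")
    best = None  # (number of segments, redaction)
    for spec, redaction in policies.items():
        parts = spec.split(".")
        if len(parts) <= len(path) and path[-len(parts):] == parts:
            if best is None or len(parts) > best[0]:
                best = (len(parts), redaction)
    return best[1] if best else None


policies = {"name": "First-4", "customers.information.name": "Initials"}
print(pick_redaction("customers.information.name", policies))
print(pick_redaction("orders.name", policies))
```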
Advanced Wildcards
It is also possible to specify wildcards using the * character, so that i*n.name will match both information.name and installation.name. Because . is considered a field separator, a wildcard will not match it. So i*n.name will match information.name but will not match information.details.name.
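The rule that a wildcard never crosses a field separator can be sketched in Python by translating each * into a regular expression that excludes the dot. The helper names are hypothetical, and the translation is an illustration rather than the Lenses implementation.

```python
import re


def field_pattern_to_regex(pattern):
    """Translate a field pattern so that * matches any run of non-dot characters."""
    parts = (re.escape(chunk) for chunk in pattern.split("*"))
    return re.compile("[^.]*".join(parts))


def field_matches(pattern, field_path):
    """Return True when the whole field path matches the pattern."""
    return field_pattern_to_regex(pattern).fullmatch(field_path) is not None


for candidate in ("information.name", "installation.name", "information.details.name"):
    print(candidate, field_matches("i*n.name", candidate))
```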
Viewing a policy
To view data policies, go to Environment->Workspaces->Policies.
If no policies are listed, you can load the default policies that come built into Lenses.
Creating a policy
To create a policy, click Environment->Workspaces->Policies->New Policy. Enter the details.
If you wish to have no masking but use the policies to identify datasets containing certain fields, set the Redaction to NONE.
Editing a policy
To edit a policy, select the policy and edit it from the actions menu.
Deleting a policy
To delete a policy, select the policy and delete it from the actions menu.