Data Protection Policies

Lenses lets you share data and create policies that mask data at the field level. This helps you protect sensitive data and meet the requirements of regulations such as GDPR, CCPA, or HIPAA. This guide explains how to apply and manage data policies.

Introduction 

Data policies are used to detect, classify and protect data. The best practice is to create a comprehensive data inventory which includes details about personal information, which data source holds this data and what applications are using it.

Lenses helps you automate this process by creating policies at the field level, which apply to all datasets or to specific ones. When you build a streaming platform and onboard multiple users and projects, data policies let you:

  • Automatically and dynamically detect and classify data based on the fields
  • Apply masking when users retrieve data via Lenses, without affecting the underlying data
  • Enforce your company’s data privacy policies
  • Streamline regulatory compliance

Lenses data policies are influenced by the standards of the National Institute of Standards and Technology (NIST). The governance is global, across all users and clients including API, CLI, UI, and SQL.

Required permission 

Permission | Type | Description
Data Policies / View | Admin | View the available policies and their associations to datasets and applications
Data Policies / Manage | Admin | Create, edit, delete, and load default policies

Policy permissions are under the Admin category, so they are not scoped to a namespace. This means that users granted this permission can create policies for all datasets known to Lenses.

Access Management & permissions

How it works 

A Data Policy is a rule to detect, classify and protect data with an associated redaction to mask the data.

Example

For example, a Credit Card policy describes how Lenses should handle credit card data. For every dataset, across multiple connections, when a field matches one of the fields declared in the policy, the data is masked with the Last-4 redaction, which means only the last 4 digits appear. The datasets are classified under the Financial category with HIGH severity.

Matching

Lenses maintains an internal cache of the fields in each dataset (e.g. your Kafka topics); review data types and schemas to understand more about this topic. As a result, every time a new policy is created or a new field is added to an existing policy, the matching mechanism runs and detects which datasets are affected by the policy and which applications known to Lenses use them.
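
The matching step can be pictured with a minimal Python sketch. It is an illustration only, not the Lenses implementation: the `field_cache` structure, the `affected_datasets` function, and the exact, case-sensitive name comparison are all assumptions made for this example.

```python
from typing import Dict, Set


def affected_datasets(
    field_cache: Dict[str, Set[str]], policy_fields: Set[str]
) -> Dict[str, Set[str]]:
    """Map each affected dataset to the policy fields found in it.

    field_cache is a hypothetical stand-in for the internal cache:
    dataset name -> set of field names known for that dataset.
    """
    hits: Dict[str, Set[str]] = {}
    for dataset, fields in field_cache.items():
        matched = fields & policy_fields  # assumption: exact name match
        if matched:
            hits[dataset] = matched
    return hits


cache = {
    "payments": {"id", "credit_card", "amount"},
    "orders": {"id", "product_id"},
}
print(affected_datasets(cache, {"credit_card", "creditcard"}))
# {'payments': {'credit_card'}}
```

Re-running such a function whenever a policy or its field list changes mirrors the behaviour described above.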

Governance

The governance is global and applies to all users. That means there is no way to “escape” a policy, even as an admin user. To retrieve the actual data, you have to remove the policy or remove the respective fields from it.

Underlying data

The underlying data is not affected by Lenses policies. That means that the applications processing the affected datasets still have full access to the data itself. Policies apply only to data accessed through the Lenses interfaces.

Kafka topics

For Kafka topics, the policy is evaluated against both the key and the value, and it is applied to each of them that contains a matching field.

Policy properties 

The Data Policy’s principal properties are:

  • Redaction, the masking rule which determines how the fields will be redacted
  • Category, the category under which the policy is classified, e.g. PII
  • Impact, the severity of the policy
  • Datasets, the datasets the policy applies to. If not specified, it applies to all
  • Fields, the fields that will be masked
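
As a rough illustration, these properties can be modelled as a small Python data structure. The class and attribute names below are hypothetical and do not represent the Lenses API payload; the example values are taken from the default Credit Card policy listed later in this guide.

```python
from dataclasses import dataclass
from typing import List

IMPACT_LEVELS = ("HIGH", "MEDIUM", "LOW")


@dataclass(frozen=True)
class DataPolicy:
    """Hypothetical model of a data policy, for illustration only."""

    name: str
    redaction: str      # e.g. "Last-4"
    category: str       # free text, e.g. "PII"
    impact: str         # one of IMPACT_LEVELS
    fields: List[str]   # field names to mask
    datasets: str = ""  # wildcard pattern; empty means all datasets

    def __post_init__(self) -> None:
        if self.impact not in IMPACT_LEVELS:
            raise ValueError(f"unknown impact level: {self.impact}")


credit_card = DataPolicy(
    name="Credit Card",
    redaction="Last-4",
    category="PII",
    impact="HIGH",
    fields=["credit_card", "creditcard"],
)
print(credit_card.datasets == "")  # True: applies to all datasets
```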

Redaction 

The rule used to obfuscate a field. Lenses applies data obfuscation to all data access requests. Several data types and structures are supported, including strings, numbers, and emails, for every data format (JSON, XML, Avro, or Protobuf).

Common

These rules can apply regardless of the field type:

Rule | Explanation
None | Track sensitive data, but do not protect it.
All | Mask the entire value.

Special

These rules can apply only on alphanumeric fields:

Rule | Explanation
Email | Mask the email address, showing only the domain name.

Strings

These rules can apply only on alphanumeric fields:

Rule | Explanation
Last-1 | Display only the last character of the value.
Last-2 | Display only the last 2 characters of the value.
Last-3 | Display only the last 3 characters of the value.
Last-4 | Display only the last 4 characters of the value.
First-1 | Display only the first character of the value.
First-2 | Display only the first 2 characters of the value.
First-3 | Display only the first 3 characters of the value.
First-4 | Display only the first 4 characters of the value.
Initials | Display the first letter of each word.

Numbers

These rules can apply to numeric fields:

Rule | Explanation
Number-to-zero | Replace a numeric value with 0.
Number-to-negative-one | Replace a numeric value with -1.
Number-to-null | Replace a numeric value with null.

Fields which are not numeric will not be affected by these Policies. Strings that contain numbers will not be affected either.
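
The redaction rules above can be approximated with a few Python functions. This is a sketch under stated assumptions, not the way Lenses implements masking: the mask character (`*`), the function names, the choice to keep the masked value the same length as the original, and the handling of values without an `@` are all assumptions.

```python
from typing import Any

MASK = "*"  # assumed mask character; Lenses may render masked data differently


def mask_all(value: Any) -> str:
    """All: mask the entire value."""
    return MASK * len(str(value))


def keep_last(value: str, n: int) -> str:
    """Last-N: show only the last n characters."""
    hidden = max(len(value) - n, 0)
    return MASK * hidden + value[hidden:]


def keep_first(value: str, n: int) -> str:
    """First-N: show only the first n characters."""
    return value[:n] + MASK * max(len(value) - n, 0)


def initials(value: str) -> str:
    """Initials: show the first letter of each word."""
    return " ".join(w[0] + MASK * (len(w) - 1) for w in value.split())


def mask_email(value: str) -> str:
    """Email: mask the local part, keep the domain."""
    local, at, domain = value.partition("@")
    if not at:
        return mask_all(value)  # assumption: no '@' means mask everything
    return MASK * len(local) + at + domain


def replace_number(value: Any, replacement: Any) -> Any:
    """Number-to-zero / -negative-one / -null: only numeric values change."""
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    return replacement if is_number else value


print(keep_last("4111111111111111", 4))  # ************1111
print(keep_first("Alice", 1))            # A****
print(mask_email("jane@example.com"))    # ****@example.com
print(replace_number("42", 0))           # 42 (a string, so left unchanged)
```

Note how `replace_number` leaves the string "42" untouched, matching the rule that strings containing numbers are not affected by the numeric redactions.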

Category 

The sensitivity category of your data. Any value can be entered here, based on how your organisation classifies its policies. Every policy belongs to one category.

Examples:

Data Classification | Explanation
PII | Personally Identifiable Information.
HIPAA | Protected Health Information.

Find more information about Data Classification. Also here are a few popular options.

Impact 

How important is the Data for the Business? It refers to the sensitivity level of the information to be stored and processed.

Impact Level | Explanation
HIGH | Information such as PII (name, religion, ...)
MEDIUM | Information such as assets (productIds, ...)
LOW | Information such as linkables (dates, ...)

Datasets 

You can restrict a policy to one or more specific datasets. This option supports wildcards; if it is not specified, the policy applies to all datasets.

Wildcard Rule | Explanation
*word | Matches all datasets that end with word
word* | Matches all datasets that start with word
*word* | Matches all datasets that contain word
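
A minimal Python sketch of these wildcard rules follows. The function name is hypothetical, and the case-insensitive comparison is an assumption (suggested by the later example in this guide, where a pattern ending in info affects a dataset called CustomersInfo); Lenses may behave differently.

```python
import re


def dataset_matches(pattern: str, dataset: str) -> bool:
    """Return True if the dataset name matches the wildcard pattern."""
    if not pattern:
        return True  # no pattern specified: the policy applies to all datasets
    # Escape everything except '*', which stands for any run of characters.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    # Assumption: matching ignores case.
    return re.fullmatch(regex, dataset, flags=re.IGNORECASE) is not None


print(dataset_matches("*info", "CustomersInfo"))    # True
print(dataset_matches("info*", "CustomersInfo"))    # False
print(dataset_matches("*info*", "my_info_topic"))   # True
```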

Fields 

The field or fields to target and obfuscate. This option also supports wildcards. A few advanced field specifications need extra care.

Nested Fields

In the case of nested data, it is possible to specify nested fields using the “.” character. For example, if your “customers” Dataset has a field called information which contains a field called name, it is possible to specify the field information.name, so that only that particular field is obfuscated, instead of every field.

Note that obfuscation is only performed on nodes without children. Continuing with the example above, information.name will be obfuscated, but if we attempt to apply it to information, it will not be affected, as it has child properties.
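
The leaf-only behaviour can be sketched in Python as below. The `redact_path` helper is hypothetical, records are modelled as plain dictionaries, and arrays are not handled; it only illustrates that a dotted path is followed down to a node without children.

```python
from typing import Any, Callable, Dict


def redact_path(
    record: Dict[str, Any], path: str, redact: Callable[[Any], Any]
) -> Dict[str, Any]:
    """Return a copy of record with the leaf at the dotted path redacted."""
    head, _, rest = path.partition(".")
    if head not in record:
        return record
    value = record[head]
    if rest:
        # More path segments remain: descend only into nested objects.
        if isinstance(value, dict):
            return {**record, head: redact_path(value, rest, redact)}
        return record
    if isinstance(value, dict):
        return record  # the node has children, so it is not obfuscated
    return {**record, head: redact(value)}


customer = {"id": 7, "information": {"name": "Alice", "city": "Paris"}}
print(redact_path(customer, "information.name", lambda v: "*****"))
# {'id': 7, 'information': {'name': '*****', 'city': 'Paris'}}
print(redact_path(customer, "information", lambda v: "*****") == customer)
# True
```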

Clashing Policies

In the event of two policies matching a given field, the more specific one will be applied. For example, if there is a policy for name with a redaction of First-4 and a policy for customers.information.name with a redaction of Initials, the latter will be applied.

Please note that wildcards and dataset rules do not affect this.
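
One way to picture the resolution is the Python sketch below. The guide does not define how specificity is measured, so counting dot-separated path segments is an assumption made for this example, as are the `FieldRule` and `most_specific` names. The function expects only rules that already match the field in question.

```python
from typing import NamedTuple, Optional, Sequence


class FieldRule(NamedTuple):
    field: str      # field specification, e.g. "customers.information.name"
    redaction: str  # redaction name, e.g. "Initials"


def most_specific(rules: Sequence[FieldRule]) -> Optional[FieldRule]:
    """Pick the matching rule with the most path segments (assumed metric)."""
    if not rules:
        return None
    # On a tie, max() keeps the first rule in the sequence.
    return max(rules, key=lambda rule: len(rule.field.split(".")))


matching = [
    FieldRule("name", "First-4"),
    FieldRule("customers.information.name", "Initials"),
]
print(most_specific(matching).redaction)  # Initials
```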

Advanced Wildcards

It is also possible to specify wildcards using the * character, so that i*n.name will match both information.name and installation.name. Because . is considered a field separator, a wildcard will not match it. So i*n.name will match information.name but will not match information.details.name.
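
The separator rule can be expressed as a short Python sketch. The `field_matches` function is hypothetical and compares a pattern against a full dotted path only; how a bare field name is matched against nested fields is outside its scope. Treating * as zero or more characters is also an assumption.

```python
import re


def field_matches(pattern: str, field_path: str) -> bool:
    """Match a field pattern where '*' never crosses the '.' separator."""
    # '*' becomes "any run of characters other than '.'".
    regex = "[^.]*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, field_path) is not None


print(field_matches("i*n.name", "information.name"))          # True
print(field_matches("i*n.name", "installation.name"))         # True
print(field_matches("i*n.name", "information.details.name"))  # False
```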

Create Data Policy 

To create a new policy, navigate to Data Policies and select New Policy. Let’s create a policy called Full Name, which protects PII by showing only the first letter (the First-1 redaction) of first and last names.

The policy applies to all datasets whose names end with the word info, and it obfuscates the fields firstName and lastName.

[Figure: Data Policies Create]

Once the Full Name policy is created, any data source in the data catalog (Kafka, Elasticsearch, etc.) that contains “firstName” or “lastName” is detected automatically, irrespective of the data format (Avro, JSON, XML, or Protobuf).

Apart from identifying all the sensitive data at a field level, Lenses will also protect the data for you. That means that anyone accessing data via Lenses (UI/CLI/API/SQL) can access production data while respecting the underlying data’s sensitivity.

In the image below, you can see that the fields firstName and lastName are masked, and the First-1 policy is applied, just like we wanted.

[Figure: Data Policies SQL Obfuscation]

View Policy Details 

You can now view the details of the policy. Clicking the policy's link in the listing takes you to the details page, which shows all the available information about the policy: Details, Applied Data, and Detected Flows. Use it to quickly identify whether an application (SQL Processor, Kafka Connector, or custom app) uses protected data.

[Figure: Data Policies Details]

This example shows the Policy we just created and that it affects 2 Kafka Topics and 1 Elasticsearch Index. You can see that all Datasets end with the word info, exactly what we wanted to achieve.

We can also see that we have detected some Data Flows producing/consuming from those Datasets. In our case, we see that we have 3 Applications that are consuming from CustomersInfo:

  • An SQL Processor
  • An Elasticsearch Sink Connector
  • An external Node.js microservice application

Data Policies on Explore 

The Lenses data catalog is aware of data policies. Obfuscated fields are highlighted together with their respective policy categories.

For example, a search for cu returns all the fields containing cu. Fields such as customerFirstName and currency are shown as protected by data policies, whereas the field accuracy is not.

[Figure: Data Policies on Explore Screen]

Load Default Policies 

Out of the box, Lenses provides a set of data protection policies with matchers for the most common fields. You can optionally load the default policies, and Lenses will automatically scan for datasets that have those fields in their schemas and apply the policies when data is explored. If a field in your schema is not detected, amend the policy's fields so that they match.

The default policies are all loaded at once. Here is a list of what’s included:

Policy Name | Category | Impact | Redaction | Fields
City | Address | LOW | None | city
Country | Address | LOW | None | country
Credit Card | PII | HIGH | Last-4 | credit_card, creditcard
Date of birth | Personal Linkable | LOW | None | date_of_birth, dob
Drivers License | PII | HIGH | First-2 | driver_license
Email | PII | HIGH | Email | email
Financial Account | PII | HIGH | Last-4 | account_number, sort_code, accountnumber, sortcode
First Name | Name | MEDIUM | None | name
Full Name | Name | HIGH | None | full_name, fullname
IP Address | Asset | MEDIUM | First-4 | ip_address, ipaddress
MAC Address | Asset | MEDIUM | First-4 | mac_address, macaddress
Maiden Name | Name | MEDIUM | None | maiden_name
Mother’s Name | Name | MEDIUM | None | mother_name
Nationality | Personal Linkable | LOW | None | ethnicity, nationality
Passport | PII | HIGH | First-2 | passport, passport_number, national_id
Patient ID | PII | HIGH | First-2 | patient_id, patientID
Phone number | Asset | MEDIUM | First-3 | phone_number, mobile_number, mobile_phone
Place of birth | Personal Linkable | LOW | None | place_of_birth
Post Code | Address | LOW | None | post_code, postcode, zip_code, zipcode
Religion | Personal Linkable | LOW | None | religion
Social Security Number | PII | HIGH | First-2 | ssn, social_security, social_security_number
Street Address | Address | MEDIUM | Initials | home_address, street_address, address
Surname | Name | MEDIUM | None | surname, lastname, last_name
Tax Payer ID | PII | HIGH | First-2 | tax_payer_id, taxpayerid, unique_taxpayer, nino, utr, tin, atin, itin, tax_reference
User Name | PII | LOW | None | username, user_name, login_name
Vehicle number | Asset | MEDIUM | First-2 | vehicle_registration_number, vehicle_number

Edit & Delete Policy 

You can edit and delete a data policy by clicking the actions button at the top right of the screen. You can also delete a policy from the listing page.

[Figure: Data Policies Edit and Delete]

You can also edit a default policy, add new fields to it, or delete it if it is not applicable.

API clients 

Data policies are also supported by the CLI to enable automation scenarios.

Here is an example to export and import policies to a different Lenses setup:

# Export policies
export policies --resource-name policyName

# Import policies
import policies --dir /prod-dir --ignore-errors

CLI - API