The Lenses Data Policies module protects data in motion. As a user, you can track, secure and govern your sensitive data as it flows, is shared or is analyzed via Lenses.
Protecting sensitive or classified data is not a new problem. Analyzing and acting on data insights in real time, however, requires addressing the risk to data in motion. As a Data Officer, you can set up global Data Policies and secure data via redaction modes so that sensitive data stays protected while it is being analyzed.
Lenses will automatically detect and apply rules to the relevant datasets, which in the case of Apache Kafka are Kafka topics, across all APIs and client libraries.
Protect your data in motion¶
Individuals, as well as businesses, face challenges protecting Personally Identifiable Information (PII). Individuals are responsible for exposing their own information and understanding the risks. However, businesses have greater liability for exposing sensitive customer data. Since businesses are built on top of people and processes, they are fully responsible for their employees’ actions and for how well their internal processes avoid exposing PII.
Businesses that do not protect the personally identifiable information of their customers and employees risk paying substantial fines and suffering reputational damage in the event of a data breach.
According to the Guide to Protecting the Confidentiality of Personally Identifiable Information (PII) from NIST (National Institute of Standards and Technology), organizations should identify and manage all PII residing in their environments.
Because it is in continuous flux, data in motion poses additional challenges to organizations. With data streaming technologies (like Apache Kafka) at the heart of modern systems, the demand for intelligent data protection requires new processes for the modern Data Officer.
Data protection requirements¶
When it comes to protecting personally identifiable information, there are a few fundamental requirements to fulfill:
- Identify where PII data is stored – without knowing where this information is retained, it is impossible to provide adequate protection.
- Audit and control access to your data – a key control for protecting the privacy of data is access control.
- Use Data Policies for PII data – rules that set the impact and redaction levels applied when data is accessed.
- Educate your users – everyone in your organization who handles PII should know the risks of mishandling data and their responsibilities.
The ever-evolving nature of data and schemas, together with the adoption of streaming data, makes it even more challenging to keep up with compliance and operate a certified data platform. Lenses® gives data officers and data stewards control over sensitive data.
Data Access and Control¶
Lenses protects data on read. The original data is stored in Apache Kafka, and Lenses enables governed access to its content. This comes both from user and group permissions that specify what data a user can access and from field-level protection enforced via the policies the Data Officer puts in place.
Managing data policies¶
Data protection is enabled via data policies. As a Data Steward or Data Officer, you need to know your data and protect sensitive data by adding policies via the Policies screen.
On the data policies page, add a new policy and create it by filling in the following information:
- Policy Name - A short description to say what the policy is for.
- Redaction - How to protect the data.
- Category - A logical group to better classify the data.
- Impact - Set the confidentiality impact level.
- Fields - A collection of data field names to protect by the new data policy rule. If a field named credit_card is added, for example, Lenses will make sure that the field is protected in accordance with the redaction level.
If your user has the Data Officer role, they can create, update and delete policies.
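The policy attributes listed above can be pictured as a small record type. The following is only an illustrative sketch, not the Lenses API: the `DataPolicy` and `Impact` names are hypothetical, and exact, case-sensitive matching of field names is an assumption made for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet


class Impact(Enum):
    """Confidentiality impact levels (hypothetical type for illustration)."""
    HIGH = "HIGH"
    MEDIUM = "MEDIUM"
    LOW = "LOW"


@dataclass(frozen=True)
class DataPolicy:
    """Hypothetical in-memory model of one data policy entry."""
    name: str                # short description of what the policy is for
    redaction: str           # how to protect the data, e.g. "Last4"
    category: str            # logical group, e.g. "PII"
    impact: Impact           # confidentiality impact level
    fields: FrozenSet[str]   # field names protected by this policy

    def protects(self, field_name: str) -> bool:
        # Assumption: exact, case-sensitive match on the field name.
        return field_name in self.fields


credit_card_policy = DataPolicy(
    name="Credit Card",
    redaction="Last4",
    category="PII",
    impact=Impact.HIGH,
    fields=frozenset({"credit_card", "creditcard"}),
)

print(credit_card_policy.protects("credit_card"))
print(credit_card_policy.protects("email"))
```

Keeping the field names in a set makes the "does this policy cover this field?" check a single membership test, which is all the later redaction step needs.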
Data policy details¶
Adding a new data policy will impact the results returned by the SQL engine. If the topic data contains the fields specified in the policy, those field values will be redacted on each query. Each policy might potentially impact different topics. A list of all affected topics can be seen on the policy details view.
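As a rough sketch of how the list of affected topics could be derived, the snippet below intersects a policy's field names with the field names of each topic. The `affected_topics` helper and the topic names are hypothetical, and the example assumes flat (non-nested) field names; it does not describe how Lenses itself performs the detection.

```python
def affected_topics(policy_fields, topic_schemas):
    """Map each topic that contains a policy field to its matching fields.

    policy_fields: iterable of field names protected by one policy.
    topic_schemas: dict of topic name -> list of field names in that topic.
    """
    wanted = set(policy_fields)
    hits = {}
    for topic, fields in topic_schemas.items():
        matched = sorted(wanted.intersection(fields))
        if matched:
            hits[topic] = matched
    return hits


# Hypothetical topics and schemas, for illustration only.
topic_schemas = {
    "payments": ["id", "credit_card", "amount"],
    "orders": ["id", "creditcard", "customer"],
    "telemetry": ["sensor_id", "value"],
}

result = affected_topics(["credit_card", "creditcard"], topic_schemas)
print(result)
```

Here the `telemetry` topic is left out because none of its fields are named in the policy, while both `payments` and `orders` are reported together with the field that matched.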
The policy details page highlights the risks of exposing the data. For each policy, the user can see the applications using the topics where the policy fields are present. These applications are grouped into three categories: Lenses Connectors, Lenses SQL Processors and Custom Applications. The screenshot below gives you an example of a data policy entry for a field named creditCard:
Based on the NIST specifications, Lenses provides a set of policies out of the box. This data taxonomy is configurable and can be tuned to the specific business domain and business requirements. Here is an example:
- Name: Full Name, Maiden Name, Mother’s name, Alias
- Personal Identification Information: SSN, Passport Number, Driver’s License Number, Taxpayer Identification Number, Patient Identification Number, Financial Account, Credit Card Number, Login name / Username
- Address Information: Street Address, Email Address, Zip Code, City, Country
- Asset Information: IP Address, MAC Address
- Telephone Number: Mobile, Business and Personal number
- Personally owned property: Vehicle registration Number
- Personal linkable Information: Date of birth, age, place of birth, religion, race, weight, height, activities, geographical indicators, employment information, medical info, educational info, financial info
Lenses protects the data by obfuscating parts of it. The DataOps platform comes with a predefined set of functions that define the obfuscation behavior. Here is the full list:
- Masks the matching fields. For example: ‘(123) 800 2999’ will be translated to ‘**** * **’.
- None: no obfuscation is applied. The matching field values stay as they are.
- Extracts the first character of every word. For example: ‘Lenses all the way’ will translate to ‘L a t w’.
- Masks matching fields, keeping the first character unmasked. For example: ‘Lenses’ will translate to ‘L*****’.
- Masks matching fields, keeping the first 2 characters unmasked. For example: ‘Lenses’ will translate to ‘Le****’.
- Masks matching fields, keeping the first 3 characters unmasked. For example: ‘Lenses’ will translate to ‘Len***’.
- Masks matching fields, keeping the first 4 characters unmasked. For example: ‘Lenses’ will translate to ‘Lens**’.
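To make the obfuscation behaviors concrete, here is a minimal Python sketch that reproduces the ‘Lenses’ examples above. The function names `keep_first`, `initials` and `no_obfuscation` are hypothetical and are not Lenses functions; the sketch also assumes that every character after the kept prefix, including spaces and punctuation, is replaced by an asterisk.

```python
def keep_first(value: str, n: int) -> str:
    """Keep the first n characters and mask the remainder with '*'."""
    return value[:n] + "*" * max(len(value) - n, 0)


def initials(value: str) -> str:
    """Return the first character of every whitespace-separated word."""
    return " ".join(word[0] for word in value.split())


def no_obfuscation(value: str) -> str:
    """Leave the value unchanged."""
    return value


print(keep_first("Lenses", 1))         # L*****
print(keep_first("Lenses", 2))         # Le****
print(keep_first("Lenses", 3))         # Len***
print(keep_first("Lenses", 4))         # Lens**
print(initials("Lenses all the way"))  # L a t w
```

A single `keep_first(value, n)` helper covers all four "first N characters" variants, so only the prefix length differs between them.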
Confidentiality Impact Level¶
Not all data has the same confidentiality impact level. Fields that can fully identify a person (e.g. a passport or Social Security number) have to be treated with extra care compared to information like country or postal code, which can only partially reveal the identity of a subject. Lenses provides three levels of impact: high, medium and low.
Default Data Policies¶
Policies are defined and applied at the record field level. We have identified the fields that organizations most commonly need to protect in order to comply with national standards in the US and EU. You can optionally load these default policies when you get started with Lenses policies.
|Policy Name||Category||Impact||Redaction Policy||Fields|
|Social Security Number||PII||HIGH||First2||ssn, social_security, social_security_number|
|Passport||PII||HIGH||First2||passport, passport_number, national_id|
|Tax Payer ID||PII||HIGH||First2||tax_payer_id, taxpayerid, unique_taxpayer, nino, utr, tin, atin, itin, tax_reference|
|Patient ID||PII||HIGH||First2||patient_id, patientID|
|Financial Account||PII||HIGH||Last4||account_number, sort_code, accountnumber, sortcode|
|Credit Card||PII||HIGH||Last4||credit_card, creditcard|
|User Name||PII||LOW||NoObfuscation||username, user_name, login_name|
|Full Name||Name||HIGH||NoObfuscation||full_name, fullname|
|Surname||Name||MEDIUM||NoObfuscation||surname, lastname, last_name|
|Street Address||Address||MEDIUM||Initials||home_address, street_address, address|
|Post Code||Address||LOW||NoObfuscation||post_code, postcode, zip_code, zipcode|
|IP Address||Asset||MEDIUM||First4||ip_address, ipaddress|
|MAC Address||Asset||MEDIUM||First4||mac_address, macaddress|
|Phone number||Asset||MEDIUM||First3||phone_number, mobile_number, mobile_phone|
|Vehicle number||Asset||MEDIUM||First2||vehicle_registration_number, vehicle_number|
|Date of birth||Personal Linkable||LOW||NoObfuscation||date_of_birth, dob|
|Place of birth||Personal Linkable||LOW||NoObfuscation||place_of_birth|
|Nationality||Personal Linkable||LOW||NoObfuscation||ethnicity, nationality|
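The sketch below shows how a few rows of the table could drive field-level redaction of a single record. It is an illustration under stated assumptions, not Lenses code: `redact_record` and the helper functions are hypothetical, the sample record is made up, and `Last4` is assumed to keep the last four characters unmasked, which the table implies by name but does not spell out.

```python
# A subset of the default policies: (policy name, redaction, fields).
POLICIES = [
    ("Social Security Number", "First2",
     ["ssn", "social_security", "social_security_number"]),
    ("Credit Card", "Last4", ["credit_card", "creditcard"]),
    ("Street Address", "Initials",
     ["home_address", "street_address", "address"]),
    ("User Name", "NoObfuscation", ["username", "user_name", "login_name"]),
]


def first2(value):
    """Keep the first 2 characters, mask the rest."""
    return value[:2] + "*" * max(len(value) - 2, 0)


def last4(value):
    """Assumption: keep the last 4 characters, mask everything before them."""
    return "*" * max(len(value) - 4, 0) + value[-4:]


def initials(value):
    """First character of every whitespace-separated word."""
    return " ".join(word[0] for word in value.split())


REDACTIONS = {
    "First2": first2,
    "Last4": last4,
    "Initials": initials,
    "NoObfuscation": lambda value: value,
}

# Lookup from field name to the redaction function that protects it.
FIELD_TO_REDACTION = {
    field: REDACTIONS[redaction]
    for _name, redaction, fields in POLICIES
    for field in fields
}


def redact_record(record):
    """Return a copy of the record with protected string fields redacted."""
    return {
        key: FIELD_TO_REDACTION[key](value)
        if key in FIELD_TO_REDACTION and isinstance(value, str)
        else value
        for key, value in record.items()
    }


record = {
    "username": "jdoe",
    "ssn": "123-45-6789",
    "credit_card": "4111111111111111",
    "address": "1 Main Street",
    "amount": 42,
}
redacted = redact_record(record)
print(redacted)
```

Fields that no policy names, such as `amount` here, pass through untouched, which mirrors the idea that policies are defined and applied per record field.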
Data Policy Example¶
Here we can see an example of a Data Policy applied to a topic. The data policy information screen shows all the impacted topics and applications that include the fields [credit_card, creditcard, creditCardId].
You can also see the specific topics and applications that this policy is applied to.
Now, if you move to the Topics page for one of the topics this policy is applied to, you can see that the field creditCardId is obfuscated.
You can also decide not to obfuscate data for a specific topic by clicking the disable toggle, like the one below.
Please note that to change the obfuscation for that topic, you need to have the datapolicydisable permission applied to your account.
The same operation can also be performed on the SQL page.