Data Governance

Hide sensitive fields while browsing

Parts of the data stored in Apache Kafka often need to be hidden. Credit card details and patient conditions are typical examples. With Avro records, this is achieved by annotating the relevant Avro schema field with:

"obfuscate": "true"

Any business accepting card payments is likely to have a topic retaining customer information together with credit card details. When that data is browsed, the credit card information should be masked.

Here is a sample Avro schema for customer information; the creditcard field has been annotated so that the Lenses SQL engine knows to return its data obfuscated.

{
  "type": "record",
  "name": "CreditCard",
  "namespace": "com.landoop.payments",
  "fields": [
    {
      "name": "firstname",
      "type": "string"
    },
    {
      "name": "lastname",
      "type": "string"
    },
    {
      "name": "address",
      "type": "string"
    },
    {
      "name": "creditcard",
      "type": "string",
      "obfuscate": "true"
    },
    {
      "name": "age",
      "type": "int"
    }
  ]
}

The records returned from the topic will have the credit card details masked. For example, a sample record could look like:

{
  "firstname": "Alex",
  "lastname": "Jones",
  "address": "4-5 Primrose Road, London, UK, W6",
  "creditcard": "****"
}
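The effect of the annotation can be illustrated with a short Python sketch. This is not the Lenses implementation; it only mimics the observable behaviour under the assumption that every field marked "obfuscate": "true" is replaced by the string "****". The helper names obfuscated_fields and mask_record, and the placeholder card number, are hypothetical.

```python
import json

MASK = "****"  # assumed mask value, taken from the sample output above

# The sample schema from this section, written compactly.
SCHEMA = json.loads("""
{
  "type": "record",
  "name": "CreditCard",
  "namespace": "com.landoop.payments",
  "fields": [
    {"name": "firstname", "type": "string"},
    {"name": "lastname", "type": "string"},
    {"name": "address", "type": "string"},
    {"name": "creditcard", "type": "string", "obfuscate": "true"},
    {"name": "age", "type": "int"}
  ]
}
""")


def obfuscated_fields(schema):
    """Return the names of top-level fields annotated with "obfuscate": "true"."""
    return {f["name"] for f in schema["fields"] if f.get("obfuscate") == "true"}


def mask_record(record, schema):
    """Return a copy of the record with every annotated field replaced by MASK."""
    hidden = obfuscated_fields(schema)
    return {key: (MASK if key in hidden else value) for key, value in record.items()}


record = {
    "firstname": "Alex",
    "lastname": "Jones",
    "address": "4-5 Primrose Road, London, UK, W6",
    "creditcard": "0000-0000-0000-0000",  # made-up placeholder value
}

masked = mask_record(record, SCHEMA)
print(json.dumps(masked, indent=2))
```

Only the schema drives the masking here: the record itself carries no marker, so removing the annotation from the schema would return the field unmasked.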

Sometimes the full details of a nested Avro record need to be obfuscated. Below is an example of a nested Avro record. As described above, make sure the schema of the nested field contains "obfuscate": "true".

{
  "type": "record",
  "name": "PaymentRecord",
  "namespace": "com.landoop.payments",
  "fields": [
    {
      "name": "customerId",
      "type": "string"
    },
    {
      "name": "details",
      "obfuscate": "true",
      "type": {
        "type": "record",
        "name": "PaymentDetails",
        "fields": [
          {
            "name": "amount",
            "type": {
              "type": "bytes",
              "logicalType": "decimal",
              "precision": 38,
              "scale": 18
            }
          },
          {
            "name": "timestamp",
            "type": "long"
          },
          {
            "name": "currency",
            "type": "string"
          }
        ]
      }
    }
  ]
}

The data returned by the Lenses SQL engine to be displayed on the screen will look like:

{
  "customerId": "customer01",
  "details": {
    "amount": "****",
    "timestamp": "****",
    "currency": "****"
  }
}
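The nested case can be sketched in Python as a recursive walk: when an annotated field holds a record, every leaf value beneath it is masked while the shape of the record is kept. This is an illustration of the output shown above, not the Lenses implementation; the function names mask_all and mask_record and the sample payment values are hypothetical.

```python
MASK = "****"  # assumed mask value, taken from the sample output above

# The PaymentRecord schema from this section, written compactly.
SCHEMA = {
    "type": "record",
    "name": "PaymentRecord",
    "namespace": "com.landoop.payments",
    "fields": [
        {"name": "customerId", "type": "string"},
        {
            "name": "details",
            "obfuscate": "true",
            "type": {
                "type": "record",
                "name": "PaymentDetails",
                "fields": [
                    {"name": "amount", "type": {"type": "bytes", "logicalType": "decimal",
                                                "precision": 38, "scale": 18}},
                    {"name": "timestamp", "type": "long"},
                    {"name": "currency", "type": "string"},
                ],
            },
        },
    ],
}


def mask_all(value):
    """Replace every leaf under value with MASK, keeping the nested structure."""
    if isinstance(value, dict):
        return {key: mask_all(inner) for key, inner in value.items()}
    return MASK


def mask_record(record, schema):
    """Mask annotated fields; recurse into nested records that are not annotated."""
    out = {}
    for field in schema["fields"]:
        name = field["name"]
        if name not in record:
            continue
        field_type = field["type"]
        if field.get("obfuscate") == "true":
            out[name] = mask_all(record[name])
        elif isinstance(field_type, dict) and field_type.get("type") == "record":
            out[name] = mask_record(record[name], field_type)
        else:
            out[name] = record[name]
    return out


# Made-up sample values for illustration only.
payment = {
    "customerId": "customer01",
    "details": {"amount": "12.50", "timestamp": 1500000000000, "currency": "GBP"},
}

masked = mask_record(payment, SCHEMA)
print(masked)
```

Annotating the parent field is enough in this sketch; the inner fields of PaymentDetails need no annotation of their own.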

Data anonymization

The Lenses SQL engine also provides a function for hiding sensitive data. For example, to obfuscate the credit card details, use ANONYMIZE as in the example below:

INSERT INTO `anonymized_payments`
SELECT amount, anonymize(creditcard, '****') as card, date
FROM `payments`
WHERE _ktype=INT and _vtype=JSON