5.0

Nullability

Null values are used as the way to express a value that isn’t yet known.

Null values can be found in the data present in existing sources, or they can be the product of joining data using non-inner joins.

The schema of nullable types is represented as a union of the field type and a null value:

{
  "type": "record",
  "name": "record",
  "namespace": "example",
  "doc": "A schema representing a nullable type.",
  "fields": [
    {
      "name": "property",
      "type": [
        "null",
        "double"
      ],
      "doc": "A property that can be null or a double."
    }
  ]
}

Problem: operations using nullable types 

Working with null values can create situations where it’s not clear what the outcome of an operation is.

One example of this would be the following:

null * 1 = ?
null + 1 = ?

Looking at the first two expressions, one may be tempted to solve the problem above by saying “null is 1 when multiplying and 0 when summing” meaning the following would be the evaluation result:

null * 1 = 1
null + 1 = 1

Rewriting the third expression applying the distributive property of multiplication however shows that the rule creates inconsistencies:

(null + 1) * null = (null + 1) * null <=>
null * null + 1 * null = (null + 1) * null <=>
1 * 1 + 1*1 = (0 + 1) * 1 <=>
1 + 1 = 1 * 2 <=>
1 = 2  //not valid

With the intent of avoiding scenarios like the above where a computation may have different results based on the evaluation approach taken, most operations in lenses do not allow operations to use nullable types.

COALESCE, AS_NON_NULLABLE and CASE 

Lenses provides the following tools to address nullability in a flexible way:

  • Coalesce: A function that allows specifying a list of fields to be tried until the first non null value is found.

    • Note: the coalesce function won’t verify if a non nullable field is provided so an error may still be thrown if all the provided fields are null
    • e.g:COALESCE(nullable_fieldA, nullable_fieldB, 0)
  • AS_NON_NULLABLE: a function that changes the type of a property from nullable to non nullable.

    • Note: This function is unsafe and will throw an error if a null value is passed. It should only be used if there’s a guarantee that the value won’t ever be null (for instance if used in a CASE branch where the null case has been previously handled or if the data has previously been filtered and the null values removed).
    • e.g: AS_NON_NULLABLE(nullable_field)
  • AS_NON_NULLABLE with CASE: A type checked construct equivalent to using coalesce:

    • e.g:
       CASE
           WHEN a_nullable_field IS NULL THEN 0
           ELSE AS_NON_NULLABLE(a_nullable_field)
       END
    

AS_NULLABLE 

The AS_NULLABLE function is the inverse transformation of the AS_NON_NULLABLE version. This function allows a non nullable field type to be transformed into a nullable type. It can be used to insert data into existing topics where the schema of the target field is nullable.