Nullability


Null values are used as the way to express a value that isn’t yet known.

Null values can be found in the data present in existing sources, or they can be the product of joining data using non-inner joins.

The schema of nullable types is represented as a union of the field type and a null value:

{
  "type": "record",
  "name": "record",
  "namespace": "example",
  "doc": "A schema representing a nullable type.",
  "fields": [
    {
      "name": "property",
      "type": [
        "null",
        "double"
      ],
      "doc": "A property that can be null or a double."
    }
  ]
}

Using nullable types 

Working with null values can create situations where it’s not clear what the outcome of an operation is.

One example of this would be the following:

null * 1 = ?
null + 1 = ?

Looking at the first two expressions, one may be tempted to solve the problem above by saying “null is 1 when multiplying and 0 when summing” meaning the following would be the evaluation result:

null * 1 = 1
null + 1 = 1

Rewriting the third expression applying the distributive property of multiplication however shows that the rule creates inconsistencies:

(null + 1) * null = (null + 1) * null <=>
null * null + 1 * null = (null + 1) * null <=>
1 * 1 + 1*1 = (0 + 1) * 1 <=>
1 + 1 = 1 * 2 <=>
1 = 2  //not valid

With the intent of avoiding scenarios like the above where a computation may have different results based on the evaluation approach taken, most operations in lenses do not allow operations to use nullable types.

Address nullability 

Lenses provides the following tools to address nullability in a flexible way:

COALESCE 

  • Coalesce: A function that allows specifying a list of fields to be tried until the first non null value is found.
    • Note: the coalesce function won’t verify if a non nullable field is provided so an error may still be thrown if all the provided fields are null
    • e.g:COALESCE(nullable_fieldA, nullable_fieldB, 0)

AS_NON_NULLABLE 

  • AS_NON_NULLABLE: a function that changes the type of a property from nullable to non nullable.
    • Note: This function is unsafe and will throw an error if a null value is passed. It should only be used if there’s a guarantee that the value won’t ever be null (for instance if used in a CASE branch where the null case has been previously handled or if the data has previously been filtered and the null values removed).
    • e.g: AS_NON_NULLABLE(nullable_field)

AS_NON_NULLABLE and CASE 

  • AS_NON_NULLABLE with CASE: A type checked construct equivalent to using coalesce:
    • e.g:
CASE
    WHEN a_nullable_field IS NULL THEN 0
    ELSE AS_NON_NULLABLE(a_nullable_field)
END

AS_NULLABLE 

The AS_NULLABLE function is the inverse transformation of the AS_NON_NULLABLE version. This function allows a non nullable field type to be transformed into a nullable type. It can be used to insert data into existing topics where the schema of the target field is nullable.

--
Last modified: September 26, 2024