4.1
Nullability
Null values are used as the way to express a value that isn’t yet known.
Null values can be found in the data present in existing sources, or they can be the product of joining data using non-inner joins.
The schema of nullable types is represented as a union of the field type and a null value:
{
"type": "record",
"name": "record",
"namespace": "example",
"doc": "A schema representing a nullable type.",
"fields": [
{
"name": "property",
"type": [
"null",
"double"
],
"doc": "A property that can be null or a double."
}
]
}
Problem: operations using nullable types
Working with null values can create situations where it’s not clear what the outcome of an operation is.
One example of this would be the following:
null * 1 = ?
null + 1 = ?
Looking at the first two expressions, one may be tempted to solve the problem above by saying “null is 1 when multiplying and 0 when summing” meaning the following would be the evaluation result:
null * 1 = 1
null + 1 = 1
Rewriting the third expression applying the distributive property of multiplication however shows that the rule creates inconsistencies:
(null + 1) * null = (null + 1) * null <=>
null * null + 1 * null = (null + 1) * null <=>
1 * 1 + 1*1 = (0 + 1) * 1 <=>
1 + 1 = 1 * 2 <=>
1 = 2 //not valid
With the intent of avoiding scenarios like the above where a computation may have different results based on the evaluation approach taken, most operations in lenses do not allow operations to use nullable types.
COALESCE, AS_NON_NULLABLE and CASE
Lenses provides the following tools to address nullability in a flexible way:
Coalesce: A function that allows specifying a list of fields to be tried until the first non null value is found.
- Note: the coalesce function won’t verify if a non nullable field is provided so an error may still be thrown if all the provided fields are null
- e.g:
COALESCE(nullable_fieldA, nullable_fieldB, 0)
AS_NON_NULLABLE: a function that changes the type of a property from nullable to non nullable.
- Note: This function is unsafe and will throw an error if a null value is passed. It should only be used if there’s a guarantee that the value won’t ever be null (for instance if used in a CASE branch where the null case has been previously handled or if the data has previously been filtered and the null values removed).
- e.g:
AS_NON_NULLABLE(nullable_field)
AS_NON_NULLABLE with CASE: A type checked construct equivalent to using coalesce:
- e.g:
CASE WHEN a_nullable_field IS NULL THEN 0 ELSE AS_NON_NULLABLE(a_nullable_field) END
AS_NULLABLE
The AS_NULLABLE
function is the inverse transformation of the AS_NON_NULLABLE
version. This function allows a non nullable field type to be transformed into a nullable type. It can be used to insert data into existing topics where the schema of the target field is nullable.