Storage format
The output storage format depends on the sources. For example, if the incoming data is stored as JSON, then the output will be JSON as well. The same applies when Avro is involved.
When using a custom storage format, the output will be JSON.
Sometimes the resulting Key and/or Value storage format needs to be controlled explicitly. If the input is JSON, for example, the output of the streaming computation can be set to Avro.
Another scenario involves one or more Avro sources and a result that projects the Key as a primitive type. Rather than storing the primitive in the Avro storage format, it may be preferable to use the actual primitive format.
Syntax
Controlling the storage format can be done using the following syntax:
There is no requirement to always set both the Key and the Value. Either one can be changed on its own. For example:
Consider a scenario where the input data is stored as Avro and there is an aggregation on a field which yields an INT. To use the primitive INT storage rather than the Avro INT storage, set the Key format to INT:
Here is an example of a scenario with one or more JSON input sources and an output stored as Avro:
Validation
Changing the storage format is guarded by a set of rules. The following table describes how storage formats can be converted for the output.
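| From \ To | INT | LONG | STRING | JSON | AVRO | XML | Custom |
| --- | --- | --- | --- | --- | --- | --- | --- |
| INT | = | yes | yes | no | yes | no | no |
| LONG | no | = | yes | no | yes | no | no |
| STRING | no | no | = | no | yes | no | no |
| JSON | If the JSON storage contains integer only | If the JSON storage contains integer or long only | yes | = | yes | no | no |
| AVRO | If the Avro storage contains integer only | If the Avro storage contains integer or long only | yes | yes | = | no | no |
| XML | no | no | no | yes | yes | no | no |
| Custom (includes Protobuf) | no | no | no | yes | yes | no | no |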
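The conversion rules listed above can be modelled as a small lookup, which may help when checking a conversion before writing a query. The Python sketch below is illustrative only and is not part of Lenses: the names `RULES` and `can_convert`, and the `payload` hint describing what a JSON or Avro payload contains, are assumptions made for this example. Rows are read as the source format and columns as the target format, and `=` (same format) is treated as allowed.

```python
# Illustrative model of the conversion rules; not a Lenses API.
# Column order for every row in RULES (the target format).
FORMATS = ["INT", "LONG", "STRING", "JSON", "AVRO", "XML", "CUSTOM"]

# Conditional cells: the conversion depends on what the payload contains.
INT_ONLY = "int_only"        # "contains integer only"
INT_OR_LONG = "int_or_long"  # "contains integer or long only"

# One row per source format, one entry per target format.
RULES = {
    "INT":    ["=", "yes", "yes", "no", "yes", "no", "no"],
    "LONG":   ["no", "=", "yes", "no", "yes", "no", "no"],
    "STRING": ["no", "no", "=", "no", "yes", "no", "no"],
    "JSON":   [INT_ONLY, INT_OR_LONG, "yes", "=", "yes", "no", "no"],
    "AVRO":   [INT_ONLY, INT_OR_LONG, "yes", "yes", "=", "no", "no"],
    "XML":    ["no", "no", "no", "yes", "yes", "no", "no"],
    "CUSTOM": ["no", "no", "no", "yes", "yes", "no", "no"],
}


def can_convert(source, target, payload=None):
    """Return True if `source` storage may be written as `target`.

    `payload` is a hypothetical hint for JSON/Avro sources:
    "int" (integers only), "long" (integers or longs only),
    or None for any other content.
    """
    rule = RULES[source][FORMATS.index(target)]
    if rule in ("=", "yes"):
        return True
    if rule == INT_ONLY:
        return payload == "int"
    if rule == INT_OR_LONG:
        return payload in ("int", "long")
    return False
```

For instance, `can_convert("AVRO", "JSON")` is allowed, while `can_convert("JSON", "INT", payload="long")` is rejected because the payload is not integer only.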
Time/Session window validations
Time-windowed formats follow rules similar to the ones described above, with the additional constraint that Session Windows (SW) cannot be converted into Time Windows (TW) or vice versa.
| From \ To | SW[B] | TW[B] |
| --- | --- | --- |
| SW[A] | yes if format A is compatible with format B | no |
| TW[A] | no | yes if format A is compatible with format B |
Example: Changing the storage format from TW[Avro] to TW[Json] is possible since they’re both TW formats and Avro can be converted to JSON.
Example: Changing the storage format from TW[String] to TW[Json] is not possible since, even though they’re both TW formats, String formats can’t be written as JSON.
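The window rule can be sketched as a two-step check: the window kinds must match, and then the inner formats must be compatible. The Python below is a minimal illustration and not a Lenses API. The function names, the `SW[A]`/`TW[A]` string notation used as input, and the tiny `BASE_COMPATIBLE` set (which holds only the pair needed for the two examples above) are assumptions made for this sketch.

```python
import re

# Hypothetical subset of allowed unwindowed conversions, limited to what
# the two examples need. STRING -> JSON is absent, so it is rejected.
BASE_COMPATIBLE = {("AVRO", "JSON")}


def base_ok(source, target):
    """Return True if the unwindowed formats are compatible."""
    return source == target or (source, target) in BASE_COMPATIBLE


def parse_windowed(fmt):
    """Split a string such as 'TW[Avro]' into ('TW', 'AVRO')."""
    match = re.fullmatch(r"(SW|TW)\[(\w+)\]", fmt)
    if match is None:
        raise ValueError(f"not a windowed format: {fmt!r}")
    return match.group(1), match.group(2).upper()


def can_convert_windowed(source, target):
    """Apply the window-kind check first, then the inner format check."""
    src_window, src_format = parse_windowed(source)
    tgt_window, tgt_format = parse_windowed(target)
    if src_window != tgt_window:
        # Session windows and time windows are never interchangeable.
        return False
    return base_ok(src_format, tgt_format)
```

With these assumptions, `can_convert_windowed("TW[Avro]", "TW[Json]")` is allowed, while `can_convert_windowed("TW[String]", "TW[Json]")` and `can_convert_windowed("SW[Avro]", "TW[Avro]")` are both rejected.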
XML, as well as any custom format, is only supported as an input format. By default, Lenses processes these formats by translating them to JSON and writes the output as JSON (AVRO is also supported if a store is explicitly set).