Set storage format


The output storage format (see formats) depends on the sources. For example, if the incoming data is stored as Json, the output will be Json as well. The same applies when Avro is involved.

When a custom storage format is used, the output will be Json.

At times, it is required to control the resulting Key and/or Value storage. If the input is Json, for example, the output for the streaming computation can be set to Avro.

Another scenario involves one or more Avro sources and a result which projects the Key as a primitive type. Rather than using the Avro storage format to store the primitive, it might be required to use the actual primitive format.

Syntax 

Controlling the storage format can be done using the following syntax:

INSERT INTO <target>
STORE
  KEY AS <format>
  VALUE AS <format>
  ...

There is no requirement to always set both the Key and the Value. It is possible to change only the Key or only the Value. For example:

INSERT INTO <target>
STORE KEY AS <format>
...

// or

INSERT INTO <target>
STORE VALUE AS <format>
...

Consider a scenario where the input data is stored as Avro and there is an aggregation on a field that yields an INT. To use the primitive INT storage rather than the Avro INT storage, set the Key format to INT:

INSERT INTO <target>
STORE KEY AS INT
SELECT TABLE
    SUM(amount) AS total
FROM <source>
GROUP BY CAST(merchantId AS int)

Here is an example for the scenario where the input sources are stored as Json but the output is stored as Avro:

INSERT INTO <target>
STORE
  KEY AS AVRO
  VALUE AS AVRO
SELECT STREAM
    _key.cId AS _key.cId
    , CONCAT(_key.name, "!") AS _key.name
    , pId
    , CONCAT("!", name) AS name
    , surname
    , age  
FROM <source>

Validation 

Changing the storage format is guarded by a set of rules. The following table describes how storage formats can be converted for the output. Rows are the source format, columns are the target format, and = marks the same format.

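| From \ To | INT | LONG | STRING | JSON | AVRO | XML | Custom/Protobuf |
|---|---|---|---|---|---|---|---|
| INT | = | yes | yes | no | yes | no | no |
| LONG | no | = | yes | no | yes | no | no |
| STRING | no | no | = | no | yes | no | no |
| JSON | If the Json storage contains integer only | If the Json storage contains integer or long only | yes | = | yes | no | no |
| AVRO | If the Avro storage contains integer only | If the Avro storage contains integer or long only | yes | yes | = | no | no |
| XML | no | no | no | yes | yes | no | no |
| Custom (includes Protobuf) | no | no | no | yes | yes | no | no |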
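
The conversion rules in the table can be read as a lookup from a source format to a target format. The following Python sketch is only an illustration of that reading, not part of the product: the names `FORMATS`, `RULES`, `can_convert`, and the `content_type` hint are hypothetical, and it assumes that `=` (same format) counts as an allowed conversion.

```python
# Target formats, in the same column order as the table above.
FORMATS = ["INT", "LONG", "STRING", "JSON", "AVRO", "XML", "CUSTOM"]

# One row per source format. "=" is the same format, "int" means
# "only if the storage contains integer only", "int/long" means
# "only if the storage contains integer or long only".
RULES = {
    "INT":    ["=", "yes", "yes", "no", "yes", "no", "no"],
    "LONG":   ["no", "=", "yes", "no", "yes", "no", "no"],
    "STRING": ["no", "no", "=", "no", "yes", "no", "no"],
    "JSON":   ["int", "int/long", "yes", "=", "yes", "no", "no"],
    "AVRO":   ["int", "int/long", "yes", "yes", "=", "no", "no"],
    "XML":    ["no", "no", "no", "yes", "yes", "no", "no"],
    "CUSTOM": ["no", "no", "no", "yes", "yes", "no", "no"],
}


def can_convert(source, target, content_type=None):
    """Return True if the table allows converting source to target.

    content_type is a hypothetical hint ("int" or "long") describing what
    a Json or Avro storage contains; it only matters for conditional cells.
    """
    rule = RULES[source][FORMATS.index(target)]
    if rule in ("=", "yes"):  # assumption: "=" is treated as allowed
        return True
    if rule == "int":
        return content_type == "int"
    if rule == "int/long":
        return content_type in ("int", "long")
    return False
```

For instance, `can_convert("AVRO", "JSON")` is true, while `can_convert("JSON", "INT")` is true only when called with `content_type="int"`.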

Time/Session window validations 

Time windowed formats follow similar rules to the ones described above, with the additional constraint that Session Windows (SW) cannot be converted into Time Windows (TW), nor vice versa.

| From \ To | SW[B] | TW[B] |
|---|---|---|
| SW[A] | yes if format A is compatible with format B | no |
| TW[A] | no | yes if format A is compatible with format B |

Example: Changing the storage format from TWAvro to TWJson is possible, since they’re both TW formats and Avro can be converted to Json.

Example: Changing the storage format from TWString to TWJson is not possible since, even though they’re both TW formats, String formats can’t be written as Json.
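
The window rule is a two-step check: the window kind must match, and then the wrapped formats must be compatible. A minimal Python sketch of that logic follows. The function names and the `ALLOWED_PAIRS` stub are hypothetical; the stub covers only the Avro to Json pair used in the examples above and stands in for the full conversion rules.

```python
# Stub for the base format rules: only the pair from the examples above.
ALLOWED_PAIRS = {("AVRO", "JSON")}


def base_compatible(source_format, target_format):
    """Hypothetical stand-in for the full format conversion rules."""
    return (source_format, target_format) in ALLOWED_PAIRS


def can_convert_windowed(source, target, compatible=base_compatible):
    """Check a windowed conversion.

    source and target are (window, format) pairs such as ("TW", "AVRO"),
    where window is "SW" (session window) or "TW" (time window).
    """
    source_window, source_format = source
    target_window, target_format = target
    # Session and time windows can never be converted into each other.
    if source_window != target_window:
        return False
    # Same window kind: fall back to the wrapped format rules.
    return compatible(source_format, target_format)
```

With this stub, `can_convert_windowed(("TW", "AVRO"), ("TW", "JSON"))` is true, while TWString to TWJson and any SW to TW conversion are rejected.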

--
Last modified: September 15, 2024