Storage format

The output storage format depends on the sources. For example, if the incoming data is stored as JSON, then the output will be JSON as well. The same applies when Avro is involved.

When using a custom storage format, the output will be JSON.

Sometimes the Key and/or Value storage format of the result needs to be controlled explicitly. For example, if the input is JSON, the output of the streaming computation can be set to Avro.

Another scenario involves one or more Avro sources and a result that projects the Key as a primitive type. Rather than storing the primitive in the Avro storage format, it may be preferable to use the primitive format itself.

Syntax

Controlling the storage format can be done using the following syntax:

INSERT INTO <target>
STORE
  KEY AS <format>
  VALUE AS <format>
  ...

There is no requirement to set both the Key and the Value. Either one can be changed on its own. For example:

INSERT INTO <target>
STORE KEY AS <format>
...

// or

INSERT INTO <target>
STORE VALUE AS <format>
...

Consider a scenario where the input data is stored as Avro and an aggregation is grouped by a field that yields an INT. To use the primitive INT storage rather than the Avro INT storage, set the Key format to INT:

INSERT INTO <target>
STORE KEY AS INT
SELECT TABLE
    SUM(amount) AS total
FROM <source>
GROUP BY CAST(merchantId AS int)

Here is an example with JSON input sources and an Avro-stored output:

INSERT INTO <target>
STORE
  KEY AS AVRO
  VALUE AS AVRO
SELECT STREAM
    _key.cId AS _key.cId
    , CONCAT(_key.name, "!") AS _key.name
    , pId
    , CONCAT("!", name) AS name
    , surname
    , age  
FROM <source>

Validation

Changing the storage format is guarded by a set of rules. The following table describes which output storage formats (columns) each input storage format (rows) can be converted to. An = entry marks the case where the format stays the same.

From \ To                   INT   LONG  STRING  JSON  AVRO  XML  Custom/Protobuf
INT                         =     yes   yes     no    yes   no   no
LONG                        no    =     yes     no    yes   no   no
STRING                      no    no    =       no    yes   no   no
JSON                        (1)   (2)   yes     =     yes   no   no
AVRO                        (3)   (4)   yes     yes   =     no   no
XML                         no    no    no      yes   yes   no   no
Custom (includes Protobuf)  no    no    no      yes   yes   no   no

(1) If the JSON storage contains integer only
(2) If the JSON storage contains integer or long only
(3) If the Avro storage contains integer only
(4) If the Avro storage contains integer or long only
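The conversion rules in the table can be read as a lookup from a (source, target) pair to a rule. The following Python sketch is only an illustration of that reading and is not part of Lenses: the names FORMATS, MATRIX, can_convert and the payload_kind parameter are hypothetical, and "CUSTOM" stands in for the Custom/Protobuf row and column.

```python
# Illustrative sketch only: encodes the conversion table as a lookup.
# All names here are hypothetical and are not part of any Lenses API.

# Column order of the table; "CUSTOM" stands in for Custom/Protobuf.
FORMATS = ("INT", "LONG", "STRING", "JSON", "AVRO", "XML", "CUSTOM")

SAME, YES, NO = "=", "yes", "no"
IF_INT = "if integer only"
IF_INT_OR_LONG = "if integer or long only"

# One row per source format, one entry per target format (same order as FORMATS).
MATRIX = {
    "INT":    (SAME, YES, YES, NO, YES, NO, NO),
    "LONG":   (NO, SAME, YES, NO, YES, NO, NO),
    "STRING": (NO, NO, SAME, NO, YES, NO, NO),
    "JSON":   (IF_INT, IF_INT_OR_LONG, YES, SAME, YES, NO, NO),
    "AVRO":   (IF_INT, IF_INT_OR_LONG, YES, YES, SAME, NO, NO),
    "XML":    (NO, NO, NO, YES, YES, NO, NO),
    "CUSTOM": (NO, NO, NO, YES, YES, NO, NO),
}


def can_convert(source, target, payload_kind=None):
    """Return True if the table allows storing `source` data as `target`.

    payload_kind is an assumed hint describing what the JSON/Avro storage
    contains: "int", "long", or None for anything else.
    """
    rule = MATRIX[source][FORMATS.index(target)]
    if rule in (SAME, YES):
        return True
    if rule == IF_INT:
        return payload_kind == "int"
    if rule == IF_INT_OR_LONG:
        return payload_kind in ("int", "long")
    return False
```

For instance, can_convert("AVRO", "INT", payload_kind="int") corresponds to the earlier STORE KEY AS INT example, where an Avro source yields an INT key.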

Time/Session window validations

Time-windowed formats follow rules similar to the ones described above, with the additional constraint that Session Windows (SW) cannot be converted into Time Windows (TW), or vice versa.

From \ To  SW[B]                                        TW[B]
SW[A]      yes if format A is compatible with format B  no
TW[A]      no                                           yes if format A is compatible with format B

Example: Changing the storage format from TWAvro to TWJson is possible since they’re both TW formats and Avro can be converted to JSON.

Example: Changing the storage format from TWString to TWJson is not possible since, even though they’re both TW formats, String formats can’t be written as JSON.
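The window rule amounts to two checks: the window kind must match, and the inner formats must be compatible. The Python sketch below illustrates this under stated assumptions and is not part of Lenses: can_convert_windowed, EXAMPLE_RULES and example_compatible are hypothetical names, and the stand-in compatibility function covers only the two pairs used in the examples above.

```python
# Illustrative sketch only; all names are hypothetical, not a Lenses API.

def can_convert_windowed(source, target, base_compatible):
    """source and target are (window, format) pairs, e.g. ("TW", "AVRO").

    base_compatible is any function (src_format, dst_format) -> bool that
    implements the non-windowed conversion rules.
    """
    src_window, src_format = source
    dst_window, dst_format = target
    # Session Windows and Time Windows can never be converted into each other.
    if src_window != dst_window:
        return False
    # Same window kind: fall back to the plain format compatibility rules.
    return base_compatible(src_format, dst_format)


# Stand-in rules covering only the two pairs from the examples above.
EXAMPLE_RULES = {("AVRO", "JSON"): True, ("STRING", "JSON"): False}


def example_compatible(src_format, dst_format):
    # Unknown pairs are treated as not convertible in this stand-in.
    return EXAMPLE_RULES.get((src_format, dst_format), False)


tw_avro_to_tw_json = can_convert_windowed(("TW", "AVRO"), ("TW", "JSON"), example_compatible)
tw_string_to_tw_json = can_convert_windowed(("TW", "STRING"), ("TW", "JSON"), example_compatible)
sw_avro_to_tw_json = can_convert_windowed(("SW", "AVRO"), ("TW", "JSON"), example_compatible)
```

Here the first call is allowed, the second is rejected because String cannot be written as JSON, and the third is rejected because the window kinds differ.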

XML and custom formats are supported only as input formats. By default, Lenses processes these formats by translating them to JSON and writes the output as JSON. Avro output is also supported if the storage format is set explicitly.
