TimestampConverter

An SMT that inserts a timestamp as a record header, in a format the user specifies. It also avoids the synchronization block requirement when converting the timestamp to its string representation.

This is an adapted version of the TimestampConverter SMT. It adds a few features to the original:

  • resolves nested fields (e.g. a.b.c)

  • uses a _key or _value prefix to indicate whether the field to convert is part of the record Key or Value

  • converts from one string representation to another (e.g. yyyy-MM-dd HH:mm:ss to yyyy-MM-dd)

  • converts using a rolling window boundary (e.g. every 15 minutes, or every hour)
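
The first two features can be pictured with a small sketch. This is not the SMT's code: the class FieldPath and its methods are hypothetical names, and the record is modelled as nested maps purely for illustration. It splits a path into its source (_value is implied when no prefix is given) and walks the nested fields.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not the SMT's actual code: splits a field path into
// its record source (_key, _value or _timestamp) and the nested field names.
final class FieldPath {
    final String source;       // "_key", "_value" or "_timestamp"
    final List<String> path;   // nested field names, possibly empty

    private FieldPath(String source, List<String> path) {
        this.source = source;
        this.path = path;
    }

    static FieldPath parse(String field) {
        if (field == null || field.isEmpty()) {
            // Empty path: the entire record Value is the timestamp.
            return new FieldPath("_value", Arrays.asList());
        }
        List<String> parts = Arrays.asList(field.split("\\."));
        String head = parts.get(0);
        if (head.equals("_key") || head.equals("_value") || head.equals("_timestamp")) {
            return new FieldPath(head, parts.subList(1, parts.size()));
        }
        // No prefix: the record Value is implied.
        return new FieldPath("_value", parts);
    }

    // Walks nested maps; returns null when any segment is missing.
    static Object resolve(Object root, List<String> path) {
        Object current = root;
        for (String name : path) {
            if (!(current instanceof Map)) {
                return null;
            }
            current = ((Map<?, ?>) current).get(name);
        }
        return current;
    }

    public static void main(String[] args) {
        FieldPath fp = FieldPath.parse("_value.a.b.c");
        Map<String, Object> value =
            Map.of("a", Map.of("b", Map.of("c", "2024-01-15 10:30:45")));
        // prints: _value [a, b, c] -> 2024-01-15 10:30:45
        System.out.println(fp.source + " " + fp.path + " -> " + resolve(value, fp.path));
    }
}
```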

Transform Type Class

io.lenses.connect.smt.header.TimestampConverter

Configuration

| Name | Description | Type | Default | Valid Values |
| --- | --- | --- | --- | --- |
| header.name | The name of the header to insert the timestamp into. | String | | |
| field | The field path containing the timestamp, or empty if the entire value is a timestamp. Prefix the path with the literal string _key, _value or _timestamp to specify whether the record Key, Value or Timestamp is used as the source. If no prefix is given, _value is implied. | String | | |
| target.type | Sets the desired timestamp representation. | String | | string, unix, date, time, timestamp |
| format.from.pattern | Sets the format of the timestamp when the input is a string. The format requires a Java DateTimeFormatter-compatible pattern. Multiple (fallback) patterns can be given as a comma-separated list. | String | | |
| format.to.pattern | Sets the format of the timestamp when the output is a string. The format requires a Java DateTimeFormatter-compatible pattern. | String | | |
| rolling.window.type | An optional parameter for the rolling time window type. When set, the output value is adjusted to the time window boundary. | String | none | none, hours, minutes, seconds |
| rolling.window.size | An optional positive integer parameter for the rolling time window size. Required when rolling.window.type is defined. The maximum value depends on rolling.window.type: 60 for minutes or seconds, and 24 for hours. | Int | 15 | |
| unix.precision | The desired Unix precision for the timestamp. Used to generate the output when target.type=unix, or to parse the input if the input is a Long. This SMT causes precision loss when converting from, or to, values with sub-millisecond components. | String | milliseconds | seconds, milliseconds, microseconds, nanoseconds |
| timezone | Sets the timezone. It can be any valid Java timezone. Override it only when target.type is set to date, time, or string; otherwise an exception is raised. | String | UTC | |

Example

To convert a string timestamp in the format yyyyMMddHHmmssSSS to a string in the format yyyy-MM-dd HH:mm:ss.SSS, use the following configuration:

transforms=TimestampConverter
transforms.TimestampConverter.type=io.lenses.connect.smt.header.TimestampConverter
transforms.TimestampConverter.header.name=wallclock
transforms.TimestampConverter.field=_value.ts
transforms.TimestampConverter.target.type=string
transforms.TimestampConverter.format.from.pattern=yyyyMMddHHmmssSSS
transforms.TimestampConverter.format.to.pattern=yyyy-MM-dd HH:mm:ss.SSS
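
As a rough illustration of what the two patterns above mean, the following sketch applies them with plain java.time. It is not the SMT's code; the class and method names are hypothetical, and it assumes Java 9 or later, since Java 8 is known to fail when parsing the adjacent-digit pattern yyyyMMddHHmmssSSS.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: parses a string with one DateTimeFormatter pattern
// and renders it with another, mirroring format.from.pattern and
// format.to.pattern in the configuration above.
final class ReformatTimestamp {
    static String reformat(String input, String fromPattern, String toPattern) {
        LocalDateTime parsed =
            LocalDateTime.parse(input, DateTimeFormatter.ofPattern(fromPattern));
        return parsed.format(DateTimeFormatter.ofPattern(toPattern));
    }

    public static void main(String[] args) {
        // prints: 2024-01-15 10:30:45.123
        System.out.println(
            reformat("20240115103045123", "yyyyMMddHHmmssSSS", "yyyy-MM-dd HH:mm:ss.SSS"));
    }
}
```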

To convert to and from a string representation while applying an hourly rolling window:

transforms=TimestampConverter
transforms.TimestampConverter.type=io.lenses.connect.smt.header.TimestampConverter
transforms.TimestampConverter.header.name=wallclock
transforms.TimestampConverter.field=_value.ts
transforms.TimestampConverter.target.type=string
transforms.TimestampConverter.format.from.pattern=yyyyMMddHHmmssSSS
transforms.TimestampConverter.format.to.pattern=yyyy-MM-dd-HH
transforms.TimestampConverter.rolling.window.type=hours
transforms.TimestampConverter.rolling.window.size=1

To convert to and from a string representation while applying an hourly rolling window and timezone:

transforms=TimestampConverter
transforms.TimestampConverter.type=io.lenses.connect.smt.header.TimestampConverter
transforms.TimestampConverter.header.name=wallclock
transforms.TimestampConverter.field=_value.ts
transforms.TimestampConverter.target.type=string
transforms.TimestampConverter.format.from.pattern=yyyyMMddHHmmssSSS
transforms.TimestampConverter.format.to.pattern=yyyy-MM-dd-HH
transforms.TimestampConverter.rolling.window.type=hours
transforms.TimestampConverter.rolling.window.size=1
transforms.TimestampConverter.timezone=Asia/Kolkata
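
The effect of the timezone setting on a string header can be sketched as follows. This is an illustration only, with hypothetical names, and it assumes the input is already an absolute instant; how the SMT itself interprets a zone-less input string is not covered here.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: renders one instant with a pattern in a given zone.
final class ZonedHeader {
    static String format(Instant instant, String pattern, String zone) {
        return DateTimeFormatter.ofPattern(pattern)
            .withZone(ZoneId.of(zone))
            .format(instant);
    }

    public static void main(String[] args) {
        Instant instant = Instant.parse("2024-01-14T20:45:00Z");
        // Asia/Kolkata is UTC+05:30, so the same instant falls on the next day.
        System.out.println(format(instant, "yyyy-MM-dd-HH", "UTC"));          // 2024-01-14-20
        System.out.println(format(instant, "yyyy-MM-dd-HH", "Asia/Kolkata")); // 2024-01-15-02
    }
}
```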

To convert to and from a string representation while applying a 15-minute rolling window:

transforms=TimestampConverter
transforms.TimestampConverter.type=io.lenses.connect.smt.header.TimestampConverter
transforms.TimestampConverter.header.name=wallclock
transforms.TimestampConverter.field=_value.ts
transforms.TimestampConverter.target.type=string
transforms.TimestampConverter.format.from.pattern=yyyyMMddHHmmssSSS
transforms.TimestampConverter.format.to.pattern=yyyy-MM-dd-HH-mm
transforms.TimestampConverter.rolling.window.type=minutes
transforms.TimestampConverter.rolling.window.size=15
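
One way to picture the rolling window adjustment is as rounding the timestamp down to the start of its window. The sketch below makes that assumption explicit: it floors to the window start, with windows aligned to the enclosing hour (for minutes), minute (for seconds), or day (for hours). The class and method names are hypothetical, and the SMT's exact boundary rule may differ.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;

// Hypothetical helper: floors a timestamp to the start of its rolling window.
// Assumption: the window boundary is the window start (rounding down).
final class RollingWindow {
    static LocalDateTime floor(LocalDateTime ts, String type, int size) {
        switch (type) {
            case "hours":
                return ts.truncatedTo(ChronoUnit.HOURS)
                         .withHour(ts.getHour() / size * size);
            case "minutes":
                return ts.truncatedTo(ChronoUnit.MINUTES)
                         .withMinute(ts.getMinute() / size * size);
            case "seconds":
                return ts.truncatedTo(ChronoUnit.SECONDS)
                         .withSecond(ts.getSecond() / size * size);
            default:
                // "none": leave the timestamp untouched.
                return ts;
        }
    }

    public static void main(String[] args) {
        LocalDateTime ts = LocalDateTime.of(2024, 1, 15, 10, 37, 45);
        DateTimeFormatter minutes = DateTimeFormatter.ofPattern("yyyy-MM-dd-HH-mm");
        DateTimeFormatter hours = DateTimeFormatter.ofPattern("yyyy-MM-dd-HH");
        System.out.println(floor(ts, "minutes", 15).format(minutes)); // 2024-01-15-10-30
        System.out.println(floor(ts, "hours", 1).format(hours));      // 2024-01-15-10
    }
}
```

Under this assumption, every record with a timestamp between 10:30:00 and 10:44:59 gets the same header value, which is what makes the header usable for grouping records by window.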

To convert to and from a Unix timestamp, use the following:

transforms=TimestampConverter
transforms.TimestampConverter.type=io.lenses.connect.smt.header.TimestampConverter
transforms.TimestampConverter.header.name=wallclock
transforms.TimestampConverter.field=_key.ts
transforms.TimestampConverter.target.type=unix
transforms.TimestampConverter.unix.precision=milliseconds
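
The precision loss mentioned for unix.precision can be seen with a short sketch. The helper below is hypothetical and simply maps the four valid values to java.util.concurrent.TimeUnit; it shows that a value with sub-millisecond digits does not survive a round trip through milliseconds.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper: converts an epoch value between the unix.precision units.
final class UnixPrecision {
    static TimeUnit unitOf(String precision) {
        switch (precision) {
            case "seconds":      return TimeUnit.SECONDS;
            case "milliseconds": return TimeUnit.MILLISECONDS;
            case "microseconds": return TimeUnit.MICROSECONDS;
            case "nanoseconds":  return TimeUnit.NANOSECONDS;
            default:
                throw new IllegalArgumentException("Unknown unix.precision: " + precision);
        }
    }

    static long convert(long value, String from, String to) {
        // TimeUnit.convert truncates when moving to a coarser unit.
        return unitOf(to).convert(value, unitOf(from));
    }

    public static void main(String[] args) {
        long micros = 1_700_000_000_123_456L;
        long millis = convert(micros, "microseconds", "milliseconds");
        System.out.println(millis);                                           // 1700000000123
        // The trailing 456 microseconds are gone after the round trip.
        System.out.println(convert(millis, "milliseconds", "microseconds"));  // 1700000000123000
    }
}
```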

Here is an example using the record timestamp field:

transforms=TimestampConverter
transforms.TimestampConverter.type=io.lenses.connect.smt.header.TimestampConverter
transforms.TimestampConverter.header.name=wallclock
transforms.TimestampConverter.field=_timestamp
transforms.TimestampConverter.target.type=unix
transforms.TimestampConverter.unix.precision=milliseconds

Configuration for format.from.pattern

Configuring multiple format.from.pattern items requires careful thought about ordering, and may indicate that your Kafka topics or data processing techniques are not aligned with best practices. Ideally, each topic should have a single, consistent message format to ensure data integrity and simplify processing.

Multiple Patterns Support

The format.from.pattern field supports multiple DateTimeFormatter patterns in a comma-separated list to handle various timestamp formats. Patterns containing commas should be enclosed in double quotes. For example:

format.from.pattern=yyyyMMddHHmmssSSS,"yyyy-MM-dd'T'HH:mm:ss,SSS"
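
To make the quoting rule concrete, here is a minimal sketch of how such a value could be split into individual patterns. It is an assumption about the behaviour described above, not the SMT's parser, and the class name is hypothetical: commas separate patterns unless they appear inside double quotes, and the double quotes themselves are dropped.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits a comma-separated pattern list, keeping commas
// that appear inside double quotes as part of the pattern.
final class PatternList {
    static List<String> split(String raw) {
        List<String> patterns = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (char c : raw.toCharArray()) {
            if (c == '"') {
                inQuotes = !inQuotes;   // quotes delimit a pattern; they are not part of it
            } else if (c == ',' && !inQuotes) {
                patterns.add(current.toString().trim());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        patterns.add(current.toString().trim());
        return patterns;
    }

    public static void main(String[] args) {
        // prints: [yyyyMMddHHmmssSSS, yyyy-MM-dd'T'HH:mm:ss,SSS]
        System.out.println(split("yyyyMMddHHmmssSSS,\"yyyy-MM-dd'T'HH:mm:ss,SSS\""));
    }
}
```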

Best Practices

While this flexibility can be useful, it is generally not recommended due to potential complexity and inconsistency. Ideally, a topic should have a single message format to align with Kafka best practices, ensuring consistency and simplifying data processing.

Configuration Order

The order of patterns in format.from.pattern matters. Less granular formats should follow more specific ones to avoid data loss. For example, place yyyy-MM-dd after yyyy-MM-dd'T'HH:mm:ss to ensure detailed timestamp information is preserved.
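
The ordering issue can be reproduced with plain java.time under one assumption: that a pattern is accepted as soon as it parses a prefix of the input. Whether the SMT matches patterns this way is not stated here, so treat the sketch as an illustration of the risk rather than a description of its internals. The class name is hypothetical.

```java
import java.text.ParsePosition;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.temporal.TemporalAccessor;
import java.time.temporal.TemporalQueries;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: tries patterns in order and returns the first match.
// Assumption: a pattern matches when it parses a prefix of the text.
final class FirstMatch {
    static LocalDateTime parse(String text, List<String> patterns) {
        for (String pattern : patterns) {
            try {
                // parse(text, position) does not require the whole text to be consumed.
                TemporalAccessor parsed =
                    DateTimeFormatter.ofPattern(pattern).parse(text, new ParsePosition(0));
                LocalTime time = parsed.query(TemporalQueries.localTime());
                // A date-only pattern carries no time, so midnight is used.
                return LocalDate.from(parsed).atTime(time != null ? time : LocalTime.MIDNIGHT);
            } catch (DateTimeParseException e) {
                // This pattern did not match; fall through to the next one.
            }
        }
        throw new IllegalArgumentException("No pattern matched: " + text);
    }

    public static void main(String[] args) {
        String input = "2024-01-15T10:30:45";
        // Less granular pattern first: the time of day is silently dropped.
        System.out.println(parse(input, Arrays.asList("yyyy-MM-dd", "yyyy-MM-dd'T'HH:mm:ss")));
        // prints: 2024-01-15T00:00
        // More specific pattern first: the full timestamp is preserved.
        System.out.println(parse(input, Arrays.asList("yyyy-MM-dd'T'HH:mm:ss", "yyyy-MM-dd")));
        // prints: 2024-01-15T10:30:45
    }
}
```

With the more specific pattern first, a date-only input such as 2024-01-15 still parses, because the first pattern fails and the fallback applies.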

2024 © Lenses.io Ltd. Apache, Apache Kafka, Kafka and associated open source project names are trademarks of the Apache Software Foundation.