InsertRollingFieldTimestampHeaders
Inserts the datetime as a message header from a value field and a rolling window configuration.
A Kafka Connect Single Message Transform (SMT) that inserts date, year, month,day, hour, minute and second headers using a timestamp field from the record payload and a rolling time window configuration. The timestamp field can be in various valid formats, including long integers, strings, or date objects. The timestamp field can originate from either the record Key or the record Value. When extracting from the record Key, prefix the field with _key.
; otherwise, extract from the record Value by default or explicitly using the field without prefixing. For string-formatted fields, specify a format.from.pattern
parameter to define the parsing pattern. Long integer fields are assumed to be Unix timestamps; the desired Unix precision can be specified using the unix.precision
parameter.
The headers inserted are of type STRING. By using this SMT, you can partition the data by yyyy-MM-dd/HH
or yyyy/MM/dd/HH
, for example, and only use one SMT.
The list of headers inserted are:
date
year
month
day
hour
minute
second
All headers can be prefixed with a custom prefix. For example, if the prefix is wallclock_
, then the headers will be:
wallclock_date
wallclock_year
wallclock_month
wallclock_day
wallclock_hour
wallclock_minute
wallclock_second
When used with the Lenses connectors for S3, GCS or Azure data lake, the headers can be used to partition the data. Considering the headers have been prefixed by _
, here are a few KCQL examples:
Transform Type Class
Configuration
field
The field name. If the key is part of the record Key, prefix with _key
; otherwise _value
. If _value
or _key
is not used, it defaults to the record Value to resolve the field.
String
format.from.pattern
Optional DateTimeFormatter-compatible format for the timestamp. Used to parse the input if the input is a string. Multiple (fallback) patterns can be added, comma-separated.
String
unix.precision
Optional. The desired Unix precision for the timestamp: seconds, milliseconds, microseconds, or nanoseconds. Used to parse the input if the input is a Long.
String
milliseconds
header.prefix.name
Optional header prefix.
String
date.format
Optional Java date time formatter.
String
yyyy-MM-dd
year.format
Optional Java date time formatter for the year component.
String
yyyy
month.format
Optional Java date time formatter for the month component.
String
MM
day.format
Optional Java date time formatter for the day component.
String
dd
hour.format
Optional Java date time formatter for the hour component.
String
HH
minute.format
Optional Java date time formatter for the minute component.
String
mm
second.format
Optional Java date time formatter for the second component.
String
ss
timezone
Optional. Sets the timezone. It can be any valid Java timezone.
String
UTC
locale
Optional. Sets the locale. It can be any valid Java locale.
String
en
rolling.window.type
Sets the window type. It can be fixed or rolling.
String
minutes
rolling.window.size
Sets the window size. It can be any positive integer, and depending on the window.type
it has an upper bound, 60 for seconds and minutes, and 24 for hours.
Int
15
## Example
To store the epoch value, use the following configuration:
To prefix the headers with wallclock_
, use the following:
To change the date format, use the following:
To use the timezone Asia/Kolkoata
, use the following:
To facilitate S3, GCS, or Azure Data Lake partitioning using a Hive-like partition name format, such as date=yyyy-MM-dd / hour=HH
, employ the following SMT configuration for a partition strategy.
and in the KCQL setting utilise the headers as partitioning keys:
Configuration for format.from.pattern
format.from.pattern
Configuring multiple format.from.pattern
items requires careful thought as to ordering and may indicate that your Kafka topics or data processing techniques are not aligning with best practices. Ideally, each topic should have a single, consistent message format to ensure data integrity and simplify processing.
Multiple Patterns Support
The format.from.pattern
field supports multiple DateTimeFormatter patterns in a comma-separated list to handle various timestamp formats. Patterns containing commas should be enclosed in double quotes. For example:
Best Practices
While this flexibility can be useful, it is generally not recommended due to potential complexity and inconsistency. Ideally, a topic should have a single message format to align with Kafka best practices, ensuring consistency and simplifying data processing.
Configuration Order
The order of patterns in format.from.pattern
matters. Less granular formats should follow more specific ones to avoid data loss. For example, place yyyy-MM-dd
after yyyy-MM-dd'T'HH:mm:ss
to ensure detailed timestamp information is preserved.
Last updated