A Kafka Connect Single Message Transform (SMT) that inserts date, year, month,day, hour, minute and second headers using the system clock as a message header.
The headers inserted are of type STRING. By using this SMT, you can partition the data by yyyy-MM-dd/HH
or yyyy/MM/dd/HH
, for example, and only use one SMT.
The list of headers inserted are:
date
year
month
day
hour
minute
second
All headers can be prefixed with a custom prefix. For example, if the prefix is wallclock_
, then the headers will be:
wallclock_date
wallclock_year
wallclock_month
wallclock_day
wallclock_hour
wallclock_minute
wallclock_second
When used with the Lenses connectors for S3, GCS or Azure data lake, the headers can be used to partition the data. Considering the headers have been prefixed by _
, here are a few KCQL examples:
To store the epoch value, use the following configuration:
To prefix the headers with wallclock_
, use the following:
To change the date format, use the following:
To use the timezone Asia/Kolkoata
, use the following:
To facilitate S3, GCS, or Azure Data Lake partitioning using a Hive-like partition name format, such as date=yyyy-MM-dd / hour=HH
, employ the following SMT configuration for a partition strategy.
and in the KCQL setting utilise the headers as partitioning keys: