Stream Reactor

This page contains the release notes for the Stream Reactor.

11.1.0 DataLakes Sinks

This release resolves a gap in the S3, GCS, and Azure sinks when latest schema optimization is enabled (via connect.***.latest.schema.optimization.enabled). Previously, connectors would fail if an Avro field changed from an Enum to a Union of an Enum and String. This update fixes that problem.

11.0.0

This release upgrades the connectors to Apache Kafka 4.1. All connectors have been tested and verified against Kafka versions 4.0 and 4.1.

Compatibility Notice

Important: Kafka 3.x versions are no longer supported or tested with this release. While the connectors may function with Kafka 3.x environments, they are not recommended for production use without thorough testing and validation in your specific environment.

Organizations currently running Kafka 3.x should plan to upgrade their Kafka infrastructure to version 4.0 or later before deploying these connectors in production.

Connector Retirement

Effective with version 11.0.0, the following sink connectors have been retired and are no longer available:

Redis Sink Connector
InfluxDB Sink Connector

These connectors will not receive further updates, bug fixes, or security patches. Organizations currently using these connectors should evaluate alternative solutions and plan migration strategies accordingly.

For questions regarding migration paths or alternative connector options, please consult the product documentation or contact support.

10.0.3

Datalakes sinks (AWS S3, GCS and Azure DataLake Gen2)

This release addresses a gap where the sink committed the message offset as-is, rather than incrementing it by one. That behavior made Kafka's underlying consumer appear to be always one message behind on each partition. It was not an issue under exactly-once semantics, as the state store in the data lake ensures message processing integrity by ignoring any already processed messages. Under at-least-once semantics, replaying a message is considered acceptable.
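The arithmetic behind the "one message behind" effect can be sketched as follows. This is an illustration only, not connector code; the function name and the offsets are made up for the example. Kafka reports lag as the log end offset minus the committed offset, and the committed offset is expected to be the next offset to read.

```python
def consumer_lag(log_end_offset: int, committed_offset: int) -> int:
    """Lag as reported by Kafka tooling: log end offset minus committed offset."""
    return log_end_offset - committed_offset


# Hypothetical partition holding offsets 0..41, all of them already written by the sink.
last_processed_offset = 41
log_end_offset = 42

# Committing the offset as-is leaves a permanent lag of one record.
lag_as_is = consumer_lag(log_end_offset, last_processed_offset)

# Committing the next offset to read (last processed + 1) reports no lag.
lag_next_offset = consumer_lag(log_end_offset, last_processed_offset + 1)

print(lag_as_is, lag_next_offset)
```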

10.0.2

Azure DataLake Gen 2

This patch release addresses two issues on the Azure DataLake sink:

Fixed errors caused by malformed HTTP headers in ADLS Gen2 requests.
Fixed failures when writing to nested directories in ADLS Gen2 with Hierarchical Namespace (HNS) enabled.

Thank you to the contributors for their fixes.

10.0.1

DataLakes

This patch addresses several critical bugs and regressions introduced in version 9.0.0, affecting the Datalake connectors for S3, GCS, and Azure.

Connector Restart Logic: Fixed an issue that caused an error on restart if pending operations from a previous session had not completed.
S3 Connector: Corrected a bug where delete object requests incorrectly sent an empty versionId. This resolves API exceptions when using S3-compatible storage solutions, such as Pure Storage.
Addressed a regression where the connector name was omitted from the lock path for exactly-once processing. This fix prevents conflicts when multiple connectors read from the same Kafka topics and write to the same destination bucket.

10.0.0

New Apache Cassandra sink

This new connector is compatible with any Apache Cassandra compatible database from version 3+. The previous connector is deprecated and will be removed in a future release.

Azure CosmosDB sink

Version 10.0.0 introduces breaking changes: the connector has been renamed from DocumentDB, uses the official CosmosDB SDK, and supports new bulk and key strategies.

The CosmosDB Sink Connector is a Kafka Connect sink connector designed to write data from Kafka topics into Azure CosmosDB. It was formerly known as the DocumentDB Sink Connector and has been fully renamed and refactored with the following key changes:

Renamed from DocumentDB to CosmosDB across all code and documentation.
Updated package names to reflect the new CosmosDB naming.
Replaced the legacy DocumentDB client with the official CosmosDB Java SDK.
Introduced Bulk Processing Mode to enhance write throughput.

New Features and Improvements

In non-envelope mode, DataLakes connector sinks now skip tombstone records, preventing connector failures.
HTTP Sink: Enable Import of All Headers in the Message Template

9.0.2

DataLake Sinks (S3, GCP Storage, Azure Data Lake)

The previous commit that addressed the Exactly Once semantics fix inadvertently omitted support for the connect.xxxxxxx.indexes.name configuration. As a result, custom index filenames were not being applied as expected.

This commit restores that functionality by ensuring the connect.xxxxxxx.indexes.name setting is properly respected during index file generation.

9.0.1

Google BigQuery Sink

The latest release of Stream Reactor includes a fork of Confluent's Apache 2.0 licensed BigQuery sink connector, originally sourced from WePay.

9.0.0

All Modules

Dependency upgrades.
Project now builds for Kafka 3.9.1.
Exclude Kafka dependencies and certain other things (e.g. log frameworks) from the final jar.

DataLake Sinks (S3, GCP Storage, Azure Data Lake)

Improved Exactly Once Semantics by introducing a per-task lock mechanism to prevent duplicate writes when Kafka Connect inadvertently runs multiple tasks for the same topic-partition.
- Each task now creates a lock file per topic-partition (e.g., lockfile_topic_partition.yml) upon startup.
- The lock file is used to ensure only one task (the most recent) can write to the target bucket, based on object eTag validation.

MQTT Connector

Refactored MqttWriter to improve payload handling and support dynamic topics, with JSON conversion for structured data and removal of legacy converters and projections (#232).
Leading Slashes Fixes:
- Leading slash from MQTT topic should be removed when converting to Kafka topic target.

8.1.33

DataLake Sinks (S3, GCP Storage, Azure Data Lake)

This update introduces an optimization that reduces unnecessary data flushes when writing to Avro or Parquet formats. Specifically, it leverages schema compatibility to avoid flushing data when messages with older but backward-compatible schemas are encountered.

How It Works

When the input message schema changes in a backward-compatible way, the sink tracks the latest schema and continues using it for serialization. This means that incoming records with an older schema won't automatically trigger a flush, as long as the older schema is compatible with the current one.
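A minimal sketch of the flush decision described above. It is not the sink's implementation: it assumes schemas are identified by an integer version, that every newer version is backward compatible with older ones, and that a newer schema still triggers a flush. The class and method names are hypothetical.

```python
from typing import Optional


class LatestSchemaTracker:
    """Tracks the newest schema version seen and decides whether a record forces a flush."""

    def __init__(self) -> None:
        self.latest_version: Optional[int] = None

    def should_flush(self, record_schema_version: int) -> bool:
        if self.latest_version is None:
            # First record: nothing has been written yet, so there is nothing to flush.
            self.latest_version = record_schema_version
            return False
        if record_schema_version > self.latest_version:
            # Assumption: a newer schema replaces the tracked one and rolls the file.
            self.latest_version = record_schema_version
            return True
        # Same or older (compatible) schema: keep writing with the latest schema.
        return False


tracker = LatestSchemaTracker()
decisions = [tracker.should_flush(v) for v in [1, 2, 2, 1, 3]]
print(decisions)
```

Without the optimization, the record carrying the older version 1 after version 2 would count as a schema change and force an extra flush.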

Example Flush Behavior

Consider the following sequence of messages and their associated schemas:

How to enable it

Set the following config to true:

S3: connect.s3.latest.schema.optimization.enabled
GCP: connect.gcpstorage.latest.schema.optimization.enabled
Azure DataLake: connect.datalake.latest.schema.optimization.enabled

8.1.32

GCP Storage Sink

Reduces log entries by moving the log entry confirming the object upload from INFO to DEBUG.

8.1.31

DataLake Sinks

This release introduces improvements to Avro error handling, providing better diagnostics and insights into failures during data writing.

8.1.30

DataLake Sinks (S3, GCP Storage, Azure Data Lake)

Resolved dependency issues causing sink failures with Confluent 7.6.0. Confluent 7.6.0 introduced new Schema Registry rule modules that were force-loaded, even when unused, leading to the following error:
This update ensures compatibility by adjusting dependencies, preventing unexpected failures in affected sinks.

HTTP Sink

Adjusting the following log line from INFO level to TRACE level

8.1.29

DataLake Sinks (S3, GCP Storage, Azure Data Lake)

Support has been added for writing to the same bucket from two different connectors located in different Kafka clusters, with both reading from the same topic name. To differentiate the object keys generated and prevent data loss, a new KCQL syntax has been introduced:

HTTP Sink

This feature offers fixed interval retries in addition to the Exponential option. To enable it, use the following configuration settings

A bug affecting the P99, P90, and P50 metrics has been resolved. Previously, failed HTTP requests were not included in the metrics calculation.

8.1.28

DataLake Sinks (S3, GCP Storage, Azure Data Lake)

Enable Skipping of Null Records

Added the following configuration property to prevent possible NullPointerException situations in the S3 Sink, Azure Datalake Sink, and GCP Storage Sink connectors:

S3 Sink: connect.s3.skip.null.values
GCP Storage Sink: connect.gcpstorage.skip.null.values
Azure Datalake Sink: connect.datalake.skip.null.values

When set to true, the sinks will skip null value records instead of processing them. Defaults to false.

If you expect null or tombstone records and are not using envelope mode, enabling this setting is recommended to avoid errors.
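The effect of the setting can be sketched as a simple filter. This is an illustration rather than the sinks' code; the Record type and function name are hypothetical.

```python
from typing import List, NamedTuple, Optional


class Record(NamedTuple):
    offset: int
    value: Optional[bytes]  # None models a null value (tombstone) record


def records_to_write(records: List[Record], skip_null_values: bool) -> List[Record]:
    """Drop null-value records when the skip flag is on; otherwise pass everything through."""
    if skip_null_values:
        return [r for r in records if r.value is not None]
    return list(records)


batch = [Record(0, b"a"), Record(1, None), Record(2, b"c")]
print([r.offset for r in records_to_write(batch, skip_null_values=True)])
```

With the flag left at its default of false, the null record at offset 1 would reach the writer, which is where the errors mentioned above can occur outside envelope mode.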

HTTP Sink

Fix MBean Registration Issue

Fixed an issue where MBean registration failed when multiple tasks were used, causing exceptions due to duplicate instances. The MBean name now includes the task ID to ensure uniqueness, and MBeans are properly unregistered when tasks stop.

8.1.27

Reverted: Change to LastModified Ordering with Post-Process Actions (introduced in 8.1.23)

We have reverted the change that avoided filtering to the latest result when ordering by LastModified with a post-process action. The original change aimed to prevent inconsistencies due to varying LastModified timestamps.

However, this introduced a potential issue due to Kafka Connect’s commitRecord method being called asynchronously, meaning files may not be cleaned up before the next record is read. This could result in records being processed multiple times. The previous behaviour has been restored to ensure correctness.

8.1.26

Azure Service Bus source

The source watermark is no longer stored, since it is not used when the task restarts.

GCP PubSub source

A fix was made to enable attaching the headers to the resulting Kafka message.

8.1.25

All Modules

Align Confluent lib dependency with Kafka 3.8.

AWS S3 Source

Customers have noticed that when the source deletes files on S3, the entire path disappears. This is a consequence of how S3 works, but the connector can compensate for it.

This adds a new boolean KCQL property to the S3 source: post.process.action.retain.dirs (default value: false)

If this is set to true, then before moving or deleting files during source post-processing, the connector first creates a zero-byte object to ensure that the path is still represented on S3.
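Why the path vanishes, and how a zero-byte object keeps it, can be modelled with a plain dictionary standing in for a bucket. This is a sketch, not the connector's code: the function names are hypothetical, and naming the placeholder after the prefix with a trailing slash is an assumption.

```python
from typing import Dict, Set


def visible_prefixes(bucket: Dict[str, bytes]) -> Set[str]:
    """An object store has no real directories: a 'path' exists only while a key starts with it."""
    return {key.rsplit("/", 1)[0] + "/" for key in bucket if "/" in key}


def delete_object(bucket: Dict[str, bytes], key: str, retain_dirs: bool) -> None:
    if retain_dirs and "/" in key:
        # Create the zero-byte placeholder first, so the prefix never disappears.
        prefix = key.rsplit("/", 1)[0] + "/"
        bucket.setdefault(prefix, b"")
    del bucket[key]


bucket = {"in/data/file1.json": b"{}"}
delete_object(bucket, "in/data/file1.json", retain_dirs=True)
print(visible_prefixes(bucket))
```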

8.1.24

HTTP

JMX Metrics Support

The HTTP Sink now publishes metrics to JMX, including success counts and request averages.

Extractor Fix

A bug was reported where referencing the entire record value raised an exception when the payload was a complex type (in this case, a HashMap).

The changes allow the value to be returned when the extractor has no field to extract (i.e., extract the full value).

DataLake Sinks (S3, GCP Storage, Azure Data Lake) Optimised Schema Change

This PR revamps the mechanisms governing schema change rollover, as they can lead to errors. New functionality is introduced to ensure improved compatibility.

Removed Hadoop Shaded Dependency

The hadoop-shaded-protobuf dependency has been removed as it was surplus to requirements and pulling in an old version of protobuf-java which introduced vulnerabilities to the project.

Removed Properties for Schema Change Rollover:

The properties $connectorPrefix.schema.change.rollover have been removed for the following connectors: connect.s3, connect.datalake, and connect.gcpstorage. This change eliminates potential errors and simplifies the schema change handling logic.

New Property for Schema Change Detection:

The property $connectorPrefix.schema.change.detector has been introduced for the following connectors: connect.s3, connect.datalake, and connect.gcpstorage.

This property allows users to configure the schema change detection behavior with the following possible values:
- default: Schemas are compared using object equality.
- version: Schemas are compared by their version field.
- compatibility: A more advanced mode that ensures schemas are compatible using Avro compatibility features.

Packaging Changes

The META-INF/maven directory is now built by the assembly process. This will ensure it reflects what is in the jar and that this directory is not just built from our component jars. This mirrors an approach we took for the secret-provider.

8.1.23

DataLakes (S3, GCP) source fixes

Polling Backoff

The connector incurs high costs when there is no data available in the buckets because it continuously polls the data lake in a tight loop, as controlled by Kafka Connect.

From this version, a backoff queue is used by default, introducing a standard method for backing off calls to the underlying cloud platform.
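The general idea of backing off empty polls can be sketched as below. The release note does not describe the actual policy, so the exponential growth, the class name, and the default values here are all assumptions for illustration.

```python
class PollBackoff:
    """Grows the wait between polls while the bucket is empty and resets once data arrives."""

    def __init__(self, initial_ms: int = 1000, multiplier: float = 2.0, max_ms: int = 60000) -> None:
        self.initial_ms = initial_ms
        self.multiplier = multiplier
        self.max_ms = max_ms
        self._next_ms = initial_ms

    def on_empty_poll(self) -> int:
        """Return how long to wait before the next poll, then lengthen the following wait."""
        delay = self._next_ms
        self._next_ms = min(int(self._next_ms * self.multiplier), self.max_ms)
        return delay

    def on_data(self) -> None:
        """Data was found, so go back to polling at the initial interval."""
        self._next_ms = self.initial_ms


backoff = PollBackoff(initial_ms=100, multiplier=2.0, max_ms=500)
delays = [backoff.on_empty_poll() for _ in range(4)]
print(delays)
```

Capping the delay keeps the source responsive when data eventually arrives, while the growing interval cuts the number of list calls made against an empty bucket.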

Avoid filtering by lastSeenFile where a post process action is configured

When ordering by LastModified and a post-process action is configured, avoid filtering to the latest result.

This change avoids bugs caused by inconsistent LastModified dates used for sorting. If LastModified sorting is used, ensure objects do not arrive late, or use a post-processing step to handle them.

Add a flag to populate kafka headers with the watermark partition/offset

This adds a connector property for GCP Storage and S3 Sources:

connect.s3.source.write.watermark.header
connect.gcpstorage.source.write.watermark.header

If set to true then the headers in the source record produced will include details of the source and line number of the file.

If set to false (the default) then the headers won't be set.

Currently this does not apply when using the envelope mode.

8.1.22

DataLakes (S3, GCP) source fixes

This release addresses two critical issues:

Corrupted connector state when DELETE/MOVE is used: The connector is designed to store the last processed document and its location within its state for every message sent to Kafka. This mechanism ensures that the connector can resume processing from the correct point in case of a restart. However, when the connector is configured with a post-operation to move or delete processed objects within the data lake, it stores the last processed object in its state. If the connector restarts and the referenced object has been moved or deleted externally, the state points to a non-existent object, causing the connector to fail. The current workaround requires manually cleaning the state and restarting the connector, which is inefficient and error-prone.
Incorrect Handling of Move Location Prefixes: When configuring the move location within the data lake, if the prefix ends with a forward slash (/), it results in malformed keys like a//b. Such incorrect paths can break compatibility with query engines like Athena, which may not handle double slashes properly.
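The double-slash problem in the second issue comes from joining a prefix that already ends in a slash with another separator. A minimal sketch of a safe join follows; the helper name is hypothetical and this is not the connector's code.

```python
def join_key(prefix: str, name: str) -> str:
    """Join a move-location prefix and an object name with exactly one '/' between them."""
    cleaned_prefix = prefix.rstrip("/")
    cleaned_name = name.lstrip("/")
    if not cleaned_prefix:
        return cleaned_name
    return f"{cleaned_prefix}/{cleaned_name}"


naive = "a/" + "/" + "b"      # what a plain concatenation produces
fixed = join_key("a/", "b")   # trailing slash on the prefix is tolerated
print(naive, fixed)
```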

8.1.21

Azure Service Bus source

Performance improvements in the source to handle higher throughput. The code now leverages the prefetch count and disables auto-complete. The following connector configs were added:

connect.servicebus.source.prefetch.count: The number of messages to prefetch from ServiceBus.
connect.servicebus.source.complete.retries.max: The maximum number of retries to attempt while completing a message.
connect.servicebus.source.complete.retries.min.backoff.ms: The minimum duration in milliseconds for the first backoff.

8.1.20

Datalakes extra logging

Previously, when a NullPointerException was thrown, the sinks lost the stack trace. This patch enhances the logging.

8.1.19

Prevent ElasticSearch from Skipping Records After Tombstone

Addresses a critical bug in the ElasticSearch 6 (ES6) and 7 (ES7) sinks where records following a tombstone were inadvertently skipped during insertion. The issue stemmed from an erroneous return statement that halted the processing of subsequent records.

8.1.18

🚀 New Features

Sequential Message Sending for Azure Service Bus

Introduced a new KCQL property: batch.enabled (default: true).
Users can now disable batching to send messages sequentially, addressing specific scenarios with large message sizes (e.g., >1 MB).
Why this matters: Batching improves performance but can fail for large messages. Sequential sending ensures reliability in such cases.

Post-Processing for Datalake Cloud Source Connectors

Added post-processing capabilities for AWS S3 and GCP Storage source connectors (Azure Datalake Gen 2 support coming soon).
New KCQL properties:
- post.process.action: Defines the action (DELETE or MOVE) to perform on source files after successful processing.

Example 2: Move files to an archive bucket:

🛠 Dependency Updates

Updated Azure Service Bus Dependencies

azure-core updated to version 1.54.1.
azure-messaging-servicebus updated to version 7.17.6.

These updates ensure compatibility with the latest Azure SDKs and improve stability and performance.

Upgrade Notes

Review the new KCQL properties and configurations for Azure Service Bus and Datalake connectors.
Ensure compatibility with the updated Azure Service Bus dependencies if you use custom extensions or integrations.

Thank you to all contributors! 🎉

8.1.17

Improves the ElasticSearch sinks by allowing message key fields or message headers to be used as part of the document primary key. Reference _key[.key_field_path] or _header.header_name to set the document primary key.
Fixes the data lake sinks error message on flush.interval

8.1.16

Improvements to the HTTP Sink

Queue Limiting: We've set a limit on the queue size per topic to reduce the chances of an Out-of-Memory (OOM) issue. Previously the queue was unbounded, so when HTTP calls were slow and the sink received more records than it cleared, it would eventually run out of memory.
Offering Timeout: The offering to the queue now includes a timeout. If there are records to be offered, but the timeout is exceeded, a retriable exception is thrown. Depending on the connector's retry settings, the operation will be attempted again. This helps avoid situations where the sink gets stuck processing a slow or unresponsive batch.
Duplicate Record Handling: To prevent the same records from being added to the queue multiple times, we've introduced a Map[TopicPartition, Offset] to track the last processed offset for each topic-partition. This ensures that the sink does not attempt to process the same records repeatedly.
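The three mechanisms above can be sketched together. This is a simplified model, not the sink's code: the class, the RetriableError exception, and the parameter names are hypothetical, and one instance stands in for the queue of a single topic.

```python
import queue
from typing import Dict, Tuple


class RetriableError(Exception):
    """Stands in for the retriable exception the framework would retry on."""


class BoundedRecordQueue:
    def __init__(self, max_size: int, offer_timeout_s: float) -> None:
        self._queue: "queue.Queue[str]" = queue.Queue(maxsize=max_size)  # bounded, so memory is capped
        self._offer_timeout_s = offer_timeout_s
        self._last_offset: Dict[Tuple[str, int], int] = {}  # (topic, partition) -> last offset queued

    def offer(self, topic: str, partition: int, offset: int, value: str) -> bool:
        tp = (topic, partition)
        if offset <= self._last_offset.get(tp, -1):
            return False  # already queued earlier, skip the duplicate
        try:
            self._queue.put(value, timeout=self._offer_timeout_s)
        except queue.Full:
            # The queue did not drain in time: signal the caller to retry later.
            raise RetriableError(f"queue full after {self._offer_timeout_s}s") from None
        self._last_offset[tp] = offset
        return True


q = BoundedRecordQueue(max_size=1, offer_timeout_s=0.01)
print(q.offer("orders", 0, 5, "a"))  # accepted
print(q.offer("orders", 0, 5, "a"))  # duplicate offset, ignored
```

A further offer at offset 6 would raise RetriableError here, because the single slot is still occupied and nothing is draining the queue.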

8.1.15

The HTTP sink now ignores null records. This avoids an NPE if custom SMTs send null records to the pipeline.

8.1.14

Improvements to the HTTP Sink reporter

8.1.13

8.1.12

Datalake sinks allow an optimisation configuration to avoid rolling the file on schema change. It should be used only when the schema changes are backwards compatible.

Fixes for the HTTP sink batch.

8.1.11

HTTP sink improvements

8.1.10

Dependency version upgrades.

Azure Service Bus Source

Fixed timestamp of the messages and some of the fields in the payload to correctly use UTC millis.

8.1.9

Azure Service Bus Source

Changed contentType in the message to be nullable (Optional String).

8.1.8

HTTP Sink

Possible exception fix

Store success/failure response codes in http for the reporter.

8.1.7

HTTP Sink

Fixes possible exception in Reporter code.

8.1.6

HTTP Sink

Fixes and overall stability improvements.

8.1.4

Google Cloud Platform Storage

Remove overriding transport options on GCS client.

8.1.3

HTTP Sink

Bugfix: HTTP Kafka Reporter ClientID is not unique.

8.1.2

HTTP Sink & GCS Sink

Improvements for handling HTTP sink null values & fix for NPE on GCS.

8.1.1

Removal of shading for Avro and Confluent jars.

8.1.0

HTTP Sink

Addition of HTTP reporter functionality to route HTTP success and errors to a configurable topic.

8.0.0

HTTP Sink

Breaking Configuration Change

Configuration for the HTTP Sink changes from JSON to standard Kafka Connect properties. Please study the documentation if upgrading to 8.0.0.

Removed support for the JSON configuration format. Please configure your HTTP Sink connector using Kafka Connect properties instead.
Introduced SSL Configuration.
Introduced OAuth Support.

7.4.5

ElasticSearch (6 and 7)

Support for dynamic index names in KCQL.
Configurable tombstone behaviour using the KCQL property behavior.on.null.values.
SSL Support using standard Kafka Connect properties.

Http Sink

Adjust defaultUploadSyncPeriod to make connector more performant by default.

Data Lake Sinks (AWS, Azure Datalake and GCP Storage)

Fix for issue causing sink to fail with NullPointerException due to invalid offset.

7.4.4

Azure Datalake & GCP Storage

Dependency version upgrades

Data Lake Sinks (AWS, Azure Datalake and GCP Storage)

Fixes a gap in the avro/parquet storage where enums were converted from Connect enums to string.
Adds support for explicit "no partition" specification to kcql, to enable topics to be written in the bucket and prefix without partitioning the data.
- Syntax Example: INSERT INTO foo SELECT * FROM bar NOPARTITION

7.4.3

All Connectors

Dependency version upgrades

Data Lake Sinks (AWS, Azure Datalake and GCP Storage)

This release introduces a new configuration option for three Kafka Connect Sink Connectors—S3, Data Lake, and GCP Storage—allowing users to disable exactly-once semantics. By default, exactly once is enabled, but with this update, users can choose to disable it, opting instead for Kafka Connect’s native at-least-once offset management.

S3 Sink Connector: connect.s3.exactly.once.enable
Data Lake Sink Connector: connect.datalake.exactly.once.enable
GCP Storage Sink Connector: connect.gcpstorage.exactly.once.enable

Default Value: true

Indexing is enabled by default to maintain exactly-once semantics. This involves creating an .indexes directory at the root of your storage bucket, with subdirectories dedicated to tracking offsets, ensuring that records are not duplicated.

Users can now disable indexing by setting the relevant property to false. When disabled, the connector will utilise Kafka Connect’s built-in offset management, which provides at-least-once semantics instead of exactly-once.

7.4.2

All Connectors

Dependency version upgrades

AWS S3 Source and GCP Storage Source Connectors

File Extension Filtering

Introduced new properties to allow users to filter the files to be processed based on their extensions.

For AWS S3 Source Connector:

connect.s3.source.extension.excludes: A comma-separated list of file extensions to exclude from the source file search. If this property is not configured, all files are considered.
connect.s3.source.extension.includes: A comma-separated list of file extensions to include in the source file search. If this property is not configured, all files are considered.

For GCP Storage Source Connector:

connect.gcpstorage.source.extension.excludes: A comma-separated list of file extensions to exclude from the source file search. If this property is not configured, all files are considered.
connect.gcpstorage.source.extension.includes: A comma-separated list of file extensions to include in the source file search. If this property is not configured, all files are considered.

These properties provide more control over the files that the AWS S3 Source Connector and GCP Storage Source Connector process, improving efficiency and flexibility.
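How the two lists might combine can be sketched as a filter. The release note does not say how the connectors behave when both lists are set, nor whether extensions are written with a leading dot, so this sketch assumes dot-less extensions, an include check applied first, and an exclude check applied second. The function name is hypothetical.

```python
import posixpath
from typing import Optional, Set


def is_selected(key: str, includes: Optional[Set[str]] = None, excludes: Optional[Set[str]] = None) -> bool:
    """Decide whether an object key passes the extension filters; unset lists match everything."""
    extension = posixpath.splitext(key)[1].lstrip(".")
    if includes and extension not in includes:
        return False
    if excludes and extension in excludes:
        return False
    return True


keys = ["in/a.json", "in/b.csv", "in/c.tmp"]
print([k for k in keys if is_selected(k, includes={"json", "csv"})])
print([k for k in keys if is_selected(k, excludes={"tmp"})])
```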

GCP Storage Source and Sink Connectors

Increases the default HTTP retry timeout from 250ms in total to 3 minutes. The consumer's default max.poll.interval.ms is 5 minutes, so the new timeout stays within that boundary and avoids a group rebalance.

7.4.1

GCP Storage Connector

Adding retry delay multiplier as a configurable parameter (with default value) to Google Cloud Storage Connector. Main changes revolve around RetryConfig class and its translation to gax HTTP client config.

Data Lake Sinks (AWS, Azure Datalake and GCP Storage)

Making indexes directory configurable for both source and sink.

Sinks

Use the below properties to customise the indexes root directory:

connect.datalake.indexes.name
connect.gcpstorage.indexes.name
connect.s3.indexes.name

See connector documentation for more information.

Sources:

Use the below properties to exclude custom directories from being considered by the source.
connect.datalake.source.partition.search.excludes
connect.gcpstorage.source.partition.search.excludes
connect.s3.source.partition.search.excludes

7.4.0

NEW: Azure Service Bus Sink Connector

Azure Service Bus Sink Connector.

Other Changes

Exception / Either Tweaks
Bump java dependencies (assertJ and azure-core)
Add logging on flushing
Fix: Java class java.util.Date support in cloud sink maps

7.3.2

Data Lake Sinks (AWS, Azure Datalake and GCP Storage)

Support added for writing Date, Time and Timestamp data types to AVRO.

7.3.1

GCP Storage Connector

Invalid Protocol Configuration Fix

7.3.0

NEW: Azure Service Bus Source Connector

Azure Service Bus Source Connector

NEW: GCP Pub/Sub Source Connector

GCP Pub/Sub Source Connector

Data Lake Sinks (AWS, Azure Datalake and GCP Storage)

To back the topics up the KCQL statement is

When * is used the envelope setting is ignored.

This change allows for the * to be taken into account as a default if the given message topic is not found.

HTTP Sink

Bug fix to ensure that, if specified as part of the template, the Content-Type header is correctly populated.

All Connectors:

Update of dependencies.

If you are upgrading from any version prior to 7.0.0, please see the release and upgrade notes for 7.0.0.

7.2.0

Enhancements

Automated Skip for Archived Objects:

The S3 source now seamlessly bypasses archived objects, including those stored in Glacier and Deep Archive. This enhancement improves efficiency by automatically excluding archived data from processing, which would otherwise crash the connector.

Enhanced Key Storage in Envelope Mode:

Changes have been implemented to the stored key when using envelope mode. These modifications lay the groundwork for future functionality, enabling seamless replay of Kafka data stored in data lakes (S3, GCS, Azure Data Lake) from any specified point in time.

7.1.0

Source Line-Start-End Functionality Enhancements

We’ve rolled out enhancements to tackle a common challenge faced by users of the S3 source functionality. Previously, when an external producer abruptly terminated a file without marking the end message, data loss occurred.

To address this, we’ve introduced a new feature: a property entry for KCQL to signal the handling of unterminated messages. Meet the latest addition, read.text.last.end.line.missing. When set to true, this property ensures that in-flight data is still recognized as a message even when EOF is reached but the end line marker is missing.
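The behaviour can be sketched with a small reader that groups lines between a start marker and an end marker. It is an illustration under assumptions, not the source's implementation: the function name and the marker values are made up, and whether the marker lines are kept in the emitted message is assumed here.

```python
from typing import Iterable, List, Optional


def read_messages(lines: Iterable[str], start: str, end: str, last_end_line_missing: bool) -> List[str]:
    """Group lines into messages delimited by a start line and an end line."""
    messages: List[str] = []
    current: Optional[List[str]] = None
    for line in lines:
        if current is None:
            if line == start:
                current = [line]  # a new message begins
        else:
            current.append(line)
            if line == end:
                messages.append("\n".join(current))
                current = None
    # EOF reached with a message still open: keep it only if the flag allows it.
    if current is not None and last_end_line_missing:
        messages.append("\n".join(current))
    return messages


file_lines = ["BEGIN", "a", "END", "BEGIN", "b"]  # the producer stopped before writing the last END
print(read_messages(file_lines, "BEGIN", "END", last_end_line_missing=False))
print(read_messages(file_lines, "BEGIN", "END", last_end_line_missing=True))
```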

If you are upgrading from any version prior to 7.0.0, please see the release and upgrade notes for 7.0.0.

7.0.0

Data-lakes Sink Connectors

This release brings substantial enhancements to the data-lakes sink connectors, elevating their functionality and flexibility. The focal point of these changes is the adoption of the new KCQL syntax, designed to improve usability and resolve limitations inherent in the previous syntax.

Key Changes

New KCQL Syntax: The data-lakes sink connectors now embrace the new KCQL syntax, offering users enhanced capabilities while addressing previous syntax constraints.
Data Lakes Sink Partition Name: This update ensures accurate preservation of partition names by avoiding the scraping of characters like \ and /. Consequently, SMTs can provide partition names as expected, leading to reduced configuration overhead and increased conciseness.

KCQL Keywords Replaced

Several keywords have been replaced with entries in the PROPERTIES section for improved clarity and consistency:

WITHPARTITIONER: Replaced by PROPERTIES ('partition.include.keys'=true/false). When set to true (the equivalent of the former WITHPARTITIONER KeysAndValue setting), the partition keys are included in the partition path. Otherwise, only the partition values are included.
WITH_FLUSH_SIZE: Replaced by PROPERTIES ('flush.size'=$VALUE).
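As an illustration of the replacement syntax, the helper below renders a PROPERTIES clause from a dictionary. The helper is hypothetical and not part of Stream Reactor; the flush size of 1000 and the names foo and bar are placeholder values.

```python
from typing import Dict, Union


def properties_clause(props: Dict[str, Union[bool, int, str]]) -> str:
    """Render KCQL PROPERTIES entries in the 'name'=value form used in these notes."""
    parts = []
    for name, value in props.items():
        if isinstance(value, bool):
            rendered = "true" if value else "false"  # KCQL booleans are lowercase
        else:
            rendered = str(value)
        parts.append(f"'{name}'={rendered}")
    return "PROPERTIES(" + ", ".join(parts) + ")"


clause = properties_clause({"partition.include.keys": True, "flush.size": 1000})
kcql = f"INSERT INTO foo SELECT * FROM bar {clause}"
print(kcql)
```

Because the entries live in a single PROPERTIES section, their order no longer matters, which is the misconfiguration risk the keyword-based syntax carried.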

Benefits

The adoption of the new KCQL syntax enhances the flexibility of the data-lakes sink connectors, empowering users to tailor configurations more precisely to their requirements. By transitioning keywords to entries in the PROPERTIES section, potential misconfigurations stemming from keyword order discrepancies are mitigated, ensuring configurations are applied as intended.

Upgrades

Please note that the upgrades to the data-lakes sink connectors are not backward compatible with existing configurations. Users are required to update their configurations to align with the new KCQL syntax and PROPERTIES entries. This upgrade is necessary for any instances of the sink connector (S3, Azure, GCP) set up before version 7.0.0.

To upgrade to the new version, users must follow these steps:

Stop the sink connectors.
Update the connector to version 7.0.0.
Edit the sink connectors’ KCQL setting to translate to the new syntax and PROPERTIES entries.
Restart the sink connectors.

6.3.1

This update specifically affects datalake sinks employing the JSON storage format. It serves as a remedy for users who have resorted to a less-than-ideal workaround: employing a Single Message Transform (SMT) to return a Plain Old Java Object (POJO) to the sink. In such cases, instead of utilizing the Connect JsonConverter to seamlessly translate the payload to JSON, reliance is placed solely on Jackson.

However, it’s crucial to note that this adjustment is not indicative of a broader direction for future expansions. This is because relying on such SMT practices does not ensure an agnostic solution for storage formats (such as Avro, Parquet, or JSON).

6.3.0

NEW: HTTP Sink Connector

HTTP Sink Connector (Beta)

NEW: Azure Event Hubs Source Connector

Azure Event Hubs Source Connector.

All Connectors:

Update of dependencies, CVEs addressed.

Please note that the Elasticsearch 6 connector will be deprecated in the next major release.

6.2.0

NEW: GCP Storage Source Connector

GCP Storage Source Connector (Beta)

AWS Source Connector

Important: The AWS Source Partition Search properties have changed for consistency of configuration. The properties that have changed for 6.2.0 are:
- connect.s3.partition.search.interval changes to connect.s3.source.partition.search.interval.
- connect.s3.partition.search.continuous changes to connect.s3.source.partition.search.continuous
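When upgrading an existing connector configuration, the two renames can be applied mechanically. The sketch below is a hypothetical helper, not a tool shipped with the connectors, and the property values in the example are placeholders.

```python
from typing import Dict

# Old property name -> new property name, as listed for 6.2.0.
RENAMED_PROPERTIES = {
    "connect.s3.partition.search.interval": "connect.s3.source.partition.search.interval",
    "connect.s3.partition.search.continuous": "connect.s3.source.partition.search.continuous",
}


def migrate_config(config: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of the configuration with the renamed keys replaced; other keys are untouched."""
    return {RENAMED_PROPERTIES.get(key, key): value for key, value in config.items()}


old_config = {
    "connect.s3.partition.search.interval": "300000",  # placeholder value
    "connect.s3.partition.search.continuous": "true",  # placeholder value
    "tasks.max": "1",
}
print(migrate_config(old_config))
```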

JMS Source and Sink Connector

Jakarta Dependency Migration: Switch to Jakarta EE dependencies in line with industry standards to ensure evolution under the Eclipse Foundation.

All Connectors:

There has been some small tidy up of dependencies, restructuring and removal of unused code, and a number of connectors have a slimmer file size without losing any functionality.
The configuration directory has been removed as these examples are not kept up-to-date. Please see the connector documentation instead.
Dependency upgrades.

6.1.0

All Connectors:

In this release, all connectors have been updated to address an issue related to conflicting Antlr jars that may arise in specific environments.

AWS S3 Source:

Byte Array Support: Resolved an issue where storing the Key/Value as an array of bytes caused compatibility problems due to the connector returning java.nio.ByteBuffer while the Connect framework’s ByteArrayConverter only works with byte[]. This update ensures seamless conversion to byte[] if the key/value is a ByteBuffer.

JMS Sink:

Fix for NullPointerException: Addressed an issue where the JMS sink connector encountered a NullPointerException when processing a message with a null JMSReplyTo header value.

JMS Source:

Fix for DataException: Resolved an issue where the JMS source connector encountered a DataException when processing a message with a JMSReplyTo header set to a queue.

AWS S3 Sink/GCP Storage Sink (beta)/Azure Datalake Sink (beta):

GZIP Support for JSON Writing: Added support for GZIP compression when writing JSON data to AWS S3, GCP Storage, and Azure Datalake sinks.

6.0.2

GCP Storage Sink (beta):

Improved support for handling GCP naming conventions.

6.0.1

AWS S3 / Azure Datalake (Beta) / GCP Storage (Beta)

Removed check preventing nested paths being used in the sink.
Avoid cast exception in GCP Storage connector when using Credentials mode.

6.0.0

New Connectors introduced in 6.0.0:

Azure Datalake Sink Connector (Beta)
GCP Storage Sink Connector (Beta)

Deprecated Connectors removed in 6.0.0:

Kudu
Hazelcast
Hbase
Hive

All Connectors:

Standardising package names. Connector class names and converters will need to be renamed in configuration.
Some clean up of unused dependencies.
Introducing cloud-common module to share code between cloud connectors.
Cloud sinks (AWS S3, Azure Data Lake and GCP Storage) now support BigDecimal and handle nullable keys.

5.0.1

New Connectors introduced in 5.0.1:

Consumer Group Offsets S3 Sink Connector

AWS S3 Connector - S3 Source & Sink

Enhancement: BigDecimal Support

Redis Sink Connector

Bug fix: Redis does not initialise the ErrorHandler

5.0.0

All Connectors

Test Fixes and E2E Test Clean-up: Improved testing with bug fixes and end-to-end test clean-up.
Code Optimization: Removed unused code and converted Java code and tests to Scala for enhanced stability and maintainability.
Ascii Art Loading Fix: Resolved issues related to ASCII art loading.
Build System Updates: Implemented build system updates and improvements.

AWS S3 Connector - S3 Source & Sink

The source and sink have been the focus of this release.

Full message backup. The S3 sink and source now support full message backup. Enable it by adding PROPERTIES('store.envelope'=true) to the KCQL statement.
Removed the Bytes_*** storage formats. Migration information for users of these formats is given below. To store the raw Kafka message, use the AVRO or PARQUET storage format, or JSON (less ideal).
Introduced support for BYTES, which stores a single message as raw binary. The typical use case is storing images or videos. Enable it by adding STOREAS BYTES to the KCQL statement.

Sink

Enhanced PARTITIONBY Support: expanded support for PARTITIONBY fields, now accommodating fields containing dots. For instance, you can use PARTITIONBY a, `field1.field2` for enhanced partitioning control.
Advanced Padding Strategy: a more advanced padding strategy configuration. By default, padding is now enforced, significantly improving compatibility with S3 Source.
Improved Error Messaging: Enhancements have been made to error messaging, providing clearer guidance, especially in scenarios with misaligned topic configurations (#978).

Source

Expanded Text Reader Support: new text readers to enhance data processing flexibility, including:
- Regex-Driven Text Reader: Allows parsing based on regular expressions.
- Multi-Line Text Reader: Handles multi-line data.
- Start-End Tag Text Reader: Processes data enclosed by start and end tags, suitable for XML content.

Kudu and Hive

The Kudu and Hive connectors are now deprecated and will be removed in a future release.

InfluxDB

Fixed a memory issue with the InfluxDB writer.
Upgraded to Influxdb2 client (note: doesn’t yet support Influxdb2 connections).

S3 upgrade notes

Upgrading from 5.0.0 (preview) to 5.0.0

For installations that have been using the preview version of the S3 connector and are upgrading to the release, there are a few important considerations:

Previously, starting in June, default padding was enabled for both the “offset” and “partition” values.

However, in version 5.0 the decision was made to apply default padding to the “offset” value only, leaving the “partition” value without padding. This change was made to enhance compatibility with querying in Athena.

If you have been using a build from the master branch since June, your connectors might have been configured with a different default padding setting.

To maintain consistency and ensure your existing connector configuration remains valid, you will need to use KCQL configuration properties to customize the padding fields accordingly.

Upgrading from 4.x to 5.0.0

Starting with version 5.0.0, the following configuration keys have been replaced.

Field

Old Property

New Property

Upgrading from 4.1.* and 4.2.0

In version 4.1, padding options were available but were not enabled by default. At that time, the default padding length, if not specified, was set to 8 characters.

However, starting from version 5.0, padding is now enabled by default, and the default padding length has been increased to 12 characters.

Enabling padding has a notable advantage: it ensures that the files written are fully compatible with the Lenses Stream Reactor S3 Source, enhancing interoperability and data integration.
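
To illustrate what the padding length changes, the Python sketch below left-pads an offset to 8 and to 12 characters. Zero is assumed as the pad character (the notes above only give the lengths), and `pad_offset` is a hypothetical helper rather than connector code.

```python
def pad_offset(offset, length=12):
    """Left-pad a numeric offset with zeros to a fixed width (assumed pad character)."""
    return str(offset).zfill(length)


offsets = [9, 10, 100]

# Lexicographic order of the raw strings does not match numeric order.
unpadded = sorted(str(o) for o in offsets)

# Lexicographic order of fixed-width strings matches numeric order.
padded = sorted(pad_offset(o) for o in offsets)
```

With fixed-width names, a plain lexicographic listing returns entries in offset order, which is one plausible reason padded names are easier for a reader such as the S3 Source to consume.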

Sinks created with 4.2.0 and 4.2.1 should retain their existing padding behaviour and should therefore disable padding:

If padding was enabled in 4.1, then the padding length should be specified in the KCQL statement:

Upgrading from 4.x to 5.0.0 only when `STOREAS Bytes_*` is used

The Bytes_*** storage format has been removed. If you are using this storage format, you will need to install the 5.0.0-deprecated connector and upgrade the connector instances by changing the class name:

Source Before:

Source After:

Sink Before:

Sink After:

The deprecated connector won’t be developed any further and will be removed in a future release. If you want to talk to us about a migration plan, please get in touch with us.

Upgrade a connector configuration

To migrate to the new configuration, follow these steps:

stop all running instances of the S3 connector
upgrade the connector to 5.0.0
update the configuration to use the new properties
resume the stopped connectors

4.2.0

All
- Ensure connector version is retained by connectors
- Lenses branding ASCII art updates
AWS S3 Sink Connector

4.1.0

Note: From version 4.1.0, the AWS S3 connectors use the AWS client by default. You can revert to the jClouds version by setting the connect.s3.aws.client property.

All
- Scala upgrade to 2.13.10
- Dependency upgrades
- Upgrade to Kafka 3.3.0

4.0.0

All
- Scala 2.13 Upgrade
- Gradle to SBT Migration
- Producing multiple artifacts supporting both Kafka 2.8 and Kafka 3.1.

3.0.1

All
- Replace Log4j with Logback to overcome CVE-2021-44228
- Bringing code from legacy dependencies inside of project
Cassandra Sink Connector

3.0.0

All
- Move to KCQL 2.8.9
- Change sys.errors to ConnectExceptions
- Additional testing with TestContainers

2.1.3

Move to connect-common 2.0.5 that adds complex type support to KCQL

2.1.2

AWS S3 Sink Connector
- Prevent null pointer exception in converters when maps are presented with null values
- Offset reader optimisation to reduce S3 load
- Ensuring that commit only occurs after the preconfigured time interval when using WITH_FLUSH_INTERVAL

2.1.0

AWS S3 Sink Connector
Elasticsearch 7 Support

2.0.1

Hive Source
- Rename option connect.hive.hive.metastore to connect.hive.metastore
- Rename option connect.hive.hive.metastore.uris to connect.hive.metastore.uris

2.0.0

Move to Scala 2.12
Move to Kafka 2.4.1 and Confluent 5.4
Deprecated:
- Druid Sink (not Scala 2.12 compatible)

10.0.2

Azure DataLake Gen 2

This patch release addresses two issues in the Azure DataLake sink:

Fixed errors caused by malformed HTTP headers in ADLS Gen2 requests, such as:
Fixed failures when writing to nested directories in ADLS Gen2 with Hierarchical Namespace (HNS) enabled.

Thank you to the contributors for their fixes.

10.0.1

DataLakes

This patch addresses several critical bugs and regressions introduced in version 9.0.0, affecting the Datalake connectors for S3, GCS, and Azure.

Connector Restart Logic: Fixed an issue that caused an error on restart if pending operations from a previous session had not completed.
S3 Connector: Corrected a bug where delete object requests incorrectly sent an empty versionId. This resolves API exceptions when using S3-compatible storage solutions, such as Pure Storage.
Addressed a regression where the connector name was omitted from the lock path for exactly-once processing. This fix prevents conflicts when multiple connectors read from the same Kafka topics and write to the same destination bucket.

10.0.0

New Apache Cassandra sink

This new connector is compatible with any Apache Cassandra-compatible database from version 3 onwards. The previous connector is deprecated and will be removed in a future release.

Azure CosmosDB sink

Version 10.0.0 introduces breaking changes: the connector has been renamed from DocumentDB, uses the official CosmosDB SDK, and supports new bulk and key strategies.

Renamed from DocumentDB to CosmosDB across all code and documentation.
Updated package names to reflect the new CosmosDB naming.
Replaced the legacy DocumentDB client with the official CosmosDB Java SDK.
Introduced Bulk Processing Mode to enhance write throughput.

New Features and Improvements

In non-envelope mode, DataLakes connector sinks now skip tombstone records, preventing connector failures.
HTTP Sink: Enable Import of All Headers in the Message Template

9.0.2

DataLake Sinks (S3, GCP Storage, Azure Data Lake)

This release restores the ability to customise the indexes directory by ensuring the connect.xxxxxxx.indexes.name setting is properly respected during index file generation.

9.0.1

Google BigQuery Sink

The latest release of Stream Reactor includes a fork of the Apache 2.0-licensed Confluent BigQuery sink connector, originally sourced from WePay.

9.0.0

All Modules

Dependency upgrades.
Project now builds for Kafka 3.9.1.
Kafka dependencies and certain other libraries (e.g. logging frameworks) are now excluded from the final jar.

DataLake Sinks (S3, GCP Storage, Azure Data Lake)

Improved Exactly Once Semantics by introducing a per-task lock mechanism to prevent duplicate writes when Kafka Connect inadvertently runs multiple tasks for the same topic-partition.
- Each task now creates a lock file per topic-partition (e.g., lockfile_topic_partition.yml) upon startup.
- The lock file is used to ensure only one task (the most recent) can write to the target bucket, based on object eTag validation.
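
The idea behind the lock can be sketched in a few lines of Python. This is a simplified in-memory model, not the connector's implementation: `FakeBucket`, `Task`, and the integer eTags are all assumptions made for illustration, and the real lock file contents and validation calls are not described in these notes.

```python
import itertools


class FakeBucket:
    """In-memory stand-in for an object store that assigns a new eTag on every write."""

    def __init__(self):
        self._objects = {}
        self._etags = itertools.count(1)

    def put(self, key, body):
        etag = str(next(self._etags))
        self._objects[key] = (body, etag)
        return etag

    def etag_of(self, key):
        return self._objects[key][1]


class Task:
    """Hypothetical task that claims a topic-partition by (re)writing its lock file."""

    def __init__(self, name, bucket, topic, partition):
        self.name = name
        self.bucket = bucket
        self.lock_key = f"lockfile_{topic}_{partition}.yml"
        # Writing the lock on startup makes this task the most recent owner.
        self.etag = bucket.put(self.lock_key, f"owner: {name}")

    def may_write(self):
        # Only the task whose eTag still matches the stored lock may write.
        return self.bucket.etag_of(self.lock_key) == self.etag


bucket = FakeBucket()
old_task = Task("task-0", bucket, "orders", 3)
new_task = Task("task-1", bucket, "orders", 3)  # started later for the same partition
```

In this model the older task fails its eTag check as soon as the newer one has rewritten the lock, so only the most recent task continues writing.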

MQTT Connector

Refactored MqttWriter to improve payload handling and support dynamic topics, with JSON conversion for structured data and removal of legacy converters and projections (#232).
Leading Slash Fix:
- The leading slash of an MQTT topic is now removed when converting it to the target Kafka topic.
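
A minimal Python sketch of the leading-slash handling follows. The function name is hypothetical, and how the remaining `/` separators are mapped to a Kafka topic name is not described in these notes, so the sketch leaves them untouched.

```python
def strip_leading_slash(mqtt_topic):
    """Drop a single leading '/' from an MQTT topic name, if present."""
    return mqtt_topic[1:] if mqtt_topic.startswith("/") else mqtt_topic
```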

8.1.33

DataLake Sinks (S3, GCP Storage, Azure Data Lake)

How It Works

Example Flush Behavior

Consider the following sequence of messages and their associated schemas:

How to enable it

Set the following config to true:

S3: connect.s3.latest.schema.optimization.enabled
GCP: connect.gcpstorage.latest.schema.optimization.enabled
Azure DataLake: connect.datalake.latest.schema.optimization.enabled

8.1.32

GCP Storage Sink

Reduces log volume by moving the log entry that confirms the object upload from INFO to DEBUG.

8.1.31

DataLake Sinks

This release introduces improvements to Avro error handling, providing better diagnostics and insights into failures during data writing.

8.1.30

DataLake Sinks (S3, GCP Storage, Azure Data Lake)

Resolved dependency issues causing sink failures with Confluent 7.6.0. Confluent 7.6.0 introduced new Schema Registry rule modules that were force-loaded, even when unused, leading to the following error:
This update ensures compatibility by adjusting dependencies, preventing unexpected failures in affected sinks.

HTTP Sink

Adjusting the following log line from INFO level to TRACE level

8.1.29

DataLake Sinks (S3, GCP Storage, Azure Data Lake)

HTTP Sink

The HTTP Sink now offers fixed-interval retries in addition to the exponential option. To enable them, use the following configuration settings:

A bug affecting the P99, P90, and P50 metrics has been resolved. Previously, failed HTTP requests were not included in the metrics calculation.
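
The fixed-interval and exponential retry options mentioned in this section differ only in how the wait between attempts evolves. The Python sketch below shows the two schedules side by side; the function names, the multiplier of 2, and the millisecond values are assumptions for illustration and do not reflect the connector's actual settings.

```python
def fixed_delays(interval_ms, max_retries):
    """Delay before each retry when the interval never changes."""
    return [interval_ms] * max_retries


def exponential_delays(initial_ms, max_retries, multiplier=2):
    """Delay before each retry when the interval grows by a constant factor."""
    return [initial_ms * multiplier ** attempt for attempt in range(max_retries)]
```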

8.1.28

DataLake Sinks (S3, GCP Storage, Azure Data Lake)

Enable Skipping of Null Records

Added the following configuration property to prevent possible NullPointerException situations in the S3 Sink, Azure Datalake Sink, and GCP Storage Sink connectors:

S3 Sink: connect.s3.skip.null.values
GCP Storage Sink: connect.gcpstorage.skip.null.values
Azure Datalake Sink: connect.datalake.skip.null.values

When set to true, the sinks will skip null value records instead of processing them. Defaults to false.

If you expect null or tombstone records and are not using envelope mode, enabling this setting is recommended to avoid errors.
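
The effect of the flag can be sketched in Python as follows. This is an illustrative model only: records are represented as `(key, value)` tuples, a value of `None` stands in for a tombstone, and `records_to_write` is a hypothetical name.

```python
def records_to_write(records, skip_null_values=False):
    """Yield the records a sink would process.

    Each record is a (key, value) tuple; a value of None models a tombstone.
    """
    for key, value in records:
        if value is None and skip_null_values:
            continue  # skipped instead of being passed to the writer
        yield key, value


batch = [("k1", {"id": 1}), ("k2", None), ("k3", {"id": 3})]
kept = list(records_to_write(batch, skip_null_values=True))
```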

HTTP Sink

Fix MBean Registration Issue

8.1.27

Reverted: Change to LastModified Ordering with Post-Process Actions (introduced in 8.1.23)

8.1.26

Azure Service Bus source

The source watermark is no longer stored, since it is not used when the task restarts.

GCP PubSub source

A fix was made to enable attaching the headers to the resulting Kafka message.

8.1.25

All Modules

Align Confluent lib dependency with Kafka 3.8.

AWS S3 Source

This adds a new boolean KCQL property to the S3 source: post.process.action.retain.dirs (default value: false)

If this is set to true, a zero-byte object is first created whenever files are moved or deleted during source post-processing, so that the path is still represented on S3.

8.1.24

HTTP

JMX Metrics Support

The HTTP Sink now publishes metrics to JMX, including success counts and request averages.

Extractor Fix

A bug was reported where referencing the entire record value raised an exception when the payload was a complex type, in this case a HashMap.

The fix returns the whole value when the extractor has no field to extract (i.e. it extracts the full value).

DataLake Sinks (S3, GCP Storage, Azure Data Lake) Optimised Schema Change

This release revamps the mechanisms governing schema change rollover, as the previous ones could lead to errors. New functionality is introduced to ensure improved compatibility.

Removed Hadoop Shaded Dependency

The hadoop-shaded-protobuf dependency has been removed as it was surplus to requirements and pulling in an old version of protobuf-java which introduced vulnerabilities to the project.

Removed Properties for Schema Change Rollover:

New Property for Schema Change Detection:

The property $connectorPrefix.schema.change.detector has been introduced for the following connectors: connect.s3, connect.datalake, and connect.gcpstorage.

Packaging Changes

8.1.23

DataLakes (S3, GCP) source fixes

Polling Backoff

The connector incurs high costs when there is no data available in the buckets because it continuously polls the data lake in a tight loop, as controlled by Kafka Connect.

From this version, a backoff queue is used by default, introducing a standard method for backing off calls to the underlying cloud platform.
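
The notes do not describe the backoff queue itself, so the Python sketch below only shows the general idea of backing off when polls come back empty. The class name, the doubling factor, and the default timings are all assumptions.

```python
class PollBackoff:
    """Tracks how long to wait before the next poll of the bucket."""

    def __init__(self, initial_ms=1000, max_ms=60000, multiplier=2):
        self.initial_ms = initial_ms
        self.max_ms = max_ms
        self.multiplier = multiplier
        self.current_ms = 0  # no wait until an empty poll is seen

    def record_poll(self, records_found):
        """Update and return the wait after a poll that found `records_found` records."""
        if records_found > 0:
            self.current_ms = 0  # data is flowing: poll again immediately
        elif self.current_ms == 0:
            self.current_ms = self.initial_ms
        else:
            self.current_ms = min(self.current_ms * self.multiplier, self.max_ms)
        return self.current_ms
```

Successive empty polls lengthen the wait up to the cap, and the first poll that returns data resets it, so an idle bucket is queried far less often than in a tight loop.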

Avoid filtering by lastSeenFile where a post process action is configured

When ordering by LastModified and a post-process action is configured, avoid filtering to the latest result.

Add a flag to populate kafka headers with the watermark partition/offset

This adds a connector property for GCP Storage and S3 Sources:
- connect.s3.source.write.watermark.header
- connect.gcpstorage.source.write.watermark.header

If set to true then the headers in the source record produced will include details of the source and line number of the file.

If set to false (the default) then the headers won't be set.

Currently this does not apply when using the envelope mode.

8.1.22

DataLakes (S3, GCP) source fixes

This release addresses two critical issues:

Corrupted connector state when DELETE/MOVE is used: The connector is designed to store the last processed document and its location within its state for every message sent to Kafka. This mechanism ensures that the connector can resume processing from the correct point in case of a restart. However, when the connector is configured with a post-operation to move or delete processed objects within the data lake, it stores the last processed object in its state. If the connector restarts and the referenced object has been moved or deleted externally, the state points to a non-existent object, causing the connector to fail. The current workaround requires manually cleaning the state and restarting the connector, which is inefficient and error-prone.
Incorrect Handling of Move Location Prefixes: When configuring the move location within the data lake, if the prefix ends with a forward slash (/), it results in malformed keys like a//b. Such incorrect paths can break compatibility with query engines like Athena, which may not handle double slashes properly.
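
The second issue comes down to how the prefix and the object name are joined. The Python sketch below shows one way to avoid the double slash; `move_target_key` is a hypothetical helper, not the connector's code.

```python
def move_target_key(prefix, object_name):
    """Join a move-location prefix and an object name with exactly one '/' between them."""
    prefix = prefix.rstrip("/")
    name = object_name.lstrip("/")
    return f"{prefix}/{name}" if prefix else name
```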

8.1.21

Azure Service Bus source

Performance improvements in the source to handle a higher throughput. The code now leverages prefetch count, and disables the auto complete. The following connector configs were added:

connect.servicebus.source.prefetch.count: The number of messages to prefetch from ServiceBus.
connect.servicebus.source.complete.retries.max: The maximum number of retries to attempt while completing a message.
connect.servicebus.source.complete.retries.min.backoff.ms: The minimum duration in milliseconds for the first backoff.

8.1.20

Datalakes extra logging

Previously, when a NullPointerException was thrown, the sinks lost the stack trace. This patch enhances the logging.

8.1.19

Prevent ElasticSearch from Skipping Records After Tombstone

8.1.18

🚀 New Features

Sequential Message Sending for Azure Service Bus

Introduced a new KCQL property: batch.enabled (default: true).
Users can now disable batching to send messages sequentially, addressing specific scenarios with large message sizes (e.g., >1 MB).
Why this matters: Batching improves performance but can fail for large messages. Sequential sending ensures reliability in such cases.

Post-Processing for Datalake Cloud Source Connectors

Added post-processing capabilities for AWS S3 and GCP Storage source connectors (Azure Datalake Gen 2 support coming soon).
New KCQL properties:
- post.process.action: Defines the action (DELETE or MOVE) to perform on source files after successful processing.

Example 2: Move files to an archive bucket:

🛠 Dependency Updates

Updated Azure Service Bus Dependencies

azure-core updated to version 1.54.1.
azure-messaging-servicebus updated to version 7.17.6.

These updates ensure compatibility with the latest Azure SDKs and improve stability and performance.

Upgrade Notes

Review the new KCQL properties and configurations for Azure Service Bus and Datalake connectors.
Ensure compatibility with the updated Azure Service Bus dependencies if you use custom extensions or integrations.

Thank you to all contributors! 🎉

8.1.17

Improves ElasticSearch sinks by allowing the message key fields or message headers to be used as part of the document primary key. Reference _key[.key_field_path] or _header.header_name to set the document primary key.
Fixes the data lake sinks error message on flush.interval

8.1.16

Improvements to the HTTP Sink

Queue Limiting: We've set a limit on the queue size per topic to reduce the chances of an Out-of-Memory (OOM) issue. Previously the queue was unbounded, and in a scenario where HTTP calls are slow and the sink receives more records than it clears, this would eventually lead to OOM.
Offering Timeout: The offering to the queue now includes a timeout. If there are records to be offered, but the timeout is exceeded, a retriable exception is thrown. Depending on the connector's retry settings, the operation will be attempted again. This helps avoid situations where the sink gets stuck processing a slow or unresponsive batch.
Duplicate Record Handling: To prevent the same records from being added to the queue multiple times, we've introduced a Map[TopicPartition, Offset] to track the last processed offset for each topic-partition. This ensures that the sink does not attempt to process the same records repeatedly.
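
The three changes above can be sketched together in Python. This is a simplified model under stated assumptions: a single queue stands in for the per-topic queues, `RetriableError` and `BoundedRecordQueue` are hypothetical names, and a plain dictionary plays the role of the Map[TopicPartition, Offset].

```python
import queue


class RetriableError(Exception):
    """Stand-in for a retriable connector exception (hypothetical name)."""


class BoundedRecordQueue:
    """Bounded queue that drops already-seen offsets and times out when full."""

    def __init__(self, max_size, offer_timeout_s):
        self._queue = queue.Queue(maxsize=max_size)
        self._offer_timeout_s = offer_timeout_s
        self._last_offset = {}  # (topic, partition) -> last offset accepted

    def offer(self, topic, partition, offset, payload):
        """Enqueue a record; return False if its offset was already accepted."""
        tp = (topic, partition)
        if offset <= self._last_offset.get(tp, -1):
            return False  # duplicate delivery, ignore
        try:
            self._queue.put((tp, offset, payload), timeout=self._offer_timeout_s)
        except queue.Full:
            raise RetriableError(f"queue full for {tp}, retry later") from None
        self._last_offset[tp] = offset
        return True

    def size(self):
        return self._queue.qsize()
```

The bound caps memory use, the timeout turns a stuck offer into an error that the retry settings can handle, and the offset map makes a redelivered record a no-op.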

8.1.15

The HTTP sink now ignores null records. This avoids an NPE if custom SMTs send null records to the pipeline.

8.1.14

Improvements to the HTTP Sink reporter

8.1.13

8.1.12

Datalake sinks now offer an optimisation setting that avoids rolling the file on schema change. Use it only when the schema changes are backwards compatible.

Fixes for the HTTP sink batch.

8.1.11

HTTP sink improvements

8.1.10

Dependency version upgrades.

Azure Service Bus Source

Fixed timestamp of the messages and some of the fields in the payload to correctly use UTC millis.

8.1.9

Azure Service Bus Source

Changed contentType in the message to be nullable (Optional String).

8.1.8

HTTP Sink

Possible exception fix

Store success/failure response codes in http for the reporter.

8.1.7

HTTP Sink

Fixes possible exception in Reporter code.

8.1.6

HTTP Sink

Fixes and overall stability improvements.

8.1.4

Google Cloud Platform Storage

Remove overriding transport options on GCS client.

8.1.3

HTTP Sink

Bugfix: HTTP Kafka Reporter ClientID is not unique.

8.1.2

HTTP Sink & GCS Sink

Improvements for handling HTTP sink null values and a fix for an NPE on GCS.

8.1.1

Removal of shading for Avro and Confluent jars.

8.1.0

HTTP Sink

Addition of HTTP reporter functionality to route HTTP successes and errors to a configurable topic.

8.0.0

HTTP Sink

Breaking Configuration Change

Configuration for the HTTP Sink changes from JSON to standard Kafka Connect properties. Please study the documentation if upgrading to 8.0.0.

Removed support for the JSON configuration format. Instead, configure your HTTP Sink connector using Kafka Connect properties.
Introduced SSL configuration.
Introduced OAuth support.

7.4.5

ElasticSearch (6 and 7)

Support for dynamic index names in KCQL.
Configurable tombstone behaviour using the KCQL property behavior.on.null.values.
SSL Support using standard Kafka Connect properties.

Http Sink

Adjust defaultUploadSyncPeriod to make connector more performant by default.

Data Lake Sinks (AWS, Azure Datalake and GCP Storage)

Fix for issue causing sink to fail with NullPointerException due to invalid offset.

7.4.4

Azure Datalake & GCP Storage

Dependency version upgrades

Data Lake Sinks (AWS, Azure Datalake and GCP Storage)

Fixes a gap in the Avro/Parquet storage where enums were converted from Connect enums to strings.
Adds support for explicit "no partition" specification to kcql, to enable topics to be written in the bucket and prefix without partitioning the data.
- Syntax Example: INSERT INTO foo SELECT * FROM bar NOPARTITION

7.4.3

All Connectors

Dependency version upgrades

Data Lake Sinks (AWS, Azure Datalake and GCP Storage)

S3 Sink Connector: connect.s3.exactly.once.enable
Data Lake Sink Connector: connect.datalake.exactly.once.enable
GCP Storage Sink Connector: connect.gcpstorage.exactly.once.enable

Default Value: true

7.4.2

All Connectors

Dependency version upgrades

AWS S3 Source and GCP Storage Source Connectors

File Extension Filtering

Introduced new properties to allow users to filter the files to be processed based on their extensions.

For AWS S3 Source Connector:

connect.s3.source.extension.excludes: A comma-separated list of file extensions to exclude from the source file search. If this property is not configured, all files are considered.
connect.s3.source.extension.includes: A comma-separated list of file extensions to include in the source file search. If this property is not configured, all files are considered.

For GCP Storage Source Connector:

connect.gcpstorage.source.extension.excludes: A comma-separated list of file extensions to exclude from the source file search. If this property is not configured, all files are considered.
connect.gcpstorage.source.extension.includes: A comma-separated list of file extensions to include in the source file search. If this property is not configured, all files are considered.

These properties provide more control over the files that the AWS S3 Source Connector and GCP Storage Source Connector process, improving efficiency and flexibility.
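
How the two lists might combine can be sketched in Python as follows. Several details are assumptions that the notes do not confirm: extensions are compared case-insensitively, a leading dot is optional, and an extension listed in both settings is excluded. The function names are hypothetical.

```python
import os


def parse_extensions(value):
    """Turn 'csv, .JSON' into {'csv', 'json'}; None or '' means 'not configured'."""
    if not value:
        return None
    return {ext.strip().lstrip(".").lower() for ext in value.split(",") if ext.strip()}


def is_selected(path, includes=None, excludes=None):
    """Decide whether a file is processed, given the two comma-separated settings."""
    ext = os.path.splitext(path)[1].lstrip(".").lower()
    included = parse_extensions(includes)
    excluded = parse_extensions(excludes)
    if excluded is not None and ext in excluded:
        return False
    if included is not None and ext not in included:
        return False
    return True
```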

GCP Storage Source and Sink Connectors

Increases the default HTTP retry timeout from 250 ms in total to 3 minutes. The default consumer max.poll.interval.ms is 5 minutes, so the new timeout stays within that limit and avoids a group rebalance.

7.4.1

GCP Storage Connector

Data Lake Sinks (AWS, Azure Datalake and GCP Storage)

The indexes directory is now configurable for both source and sink.

Sinks

Use the below properties to customise the indexes root directory:

connect.datalake.indexes.name
connect.gcpstorage.indexes.name
connect.s3.indexes.name

See connector documentation for more information.

Sources

Use the below properties to exclude custom directories from being considered by the source:

connect.datalake.source.partition.search.excludes
connect.gcpstorage.source.partition.search.excludes
connect.s3.source.partition.search.excludes

7.4.0

NEW: Azure Service Bus Sink Connector

Azure Service Bus Sink Connector.

Other Changes

Exception / Either Tweaks
Bump java dependencies (assertJ and azure-core)
Add logging on flushing
Fix: Java class java.util.Date support in cloud sink maps

7.3.2

Data Lake Sinks (AWS, Azure Datalake and GCP Storage)

Support added for writing Date, Time and Timestamp data types to AVRO.

7.3.1

GCP Storage Connector

Invalid Protocol Configuration Fix

7.3.0

NEW: Azure Service Bus Source Connector

Azure Service Bus Source Connector

NEW: GCP Pub/Sub Source Connector

GCP Pub/Sub Source Connector

Data Lake Sinks (AWS, Azure Datalake and GCP Storage)

To back the topics up the KCQL statement is

When * is used the envelope setting is ignored.

This change allows for the * to be taken into account as a default if the given message topic is not found.
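
The fallback described above amounts to a lookup with a default. A minimal Python sketch, in which the settings are modelled as a dictionary keyed by topic name and both the function name and the `envelope` field are hypothetical:

```python
def settings_for_topic(topic, per_topic_settings):
    """Return the settings for a topic, falling back to the '*' entry if present."""
    if topic in per_topic_settings:
        return per_topic_settings[topic]
    return per_topic_settings.get("*")


settings = {"orders": {"envelope": True}, "*": {"envelope": False}}
```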

HTTP Sink

Bug fix to ensure that, if specified as part of the template, the Content-Type header is correctly populated.

All Connectors:

Update of dependencies.

If you are upgrading from any version prior to 7.0.0, please see the release and upgrade notes for 7.0.0.

7.2.0

Enhancements

Automated Skip for Archived Objects:

The S3 source now seamlessly bypasses archived objects, including those stored in Glacier and Deep Archive. Automatically excluding archived data from processing improves efficiency and prevents the connector from crashing on such objects.
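
As an illustration of the skip, the Python sketch below filters an object listing by storage class. The listing is modelled as plain dictionaries with `Key` and `StorageClass` entries, and the exact set of storage classes the connector checks is an assumption; `readable_objects` is a hypothetical name.

```python
# S3 storage classes treated as archived in this sketch.
ARCHIVED_STORAGE_CLASSES = {"GLACIER", "DEEP_ARCHIVE"}


def readable_objects(listing):
    """Keep only objects whose storage class is not an archive tier.

    `listing` is a list of dicts with 'Key' and 'StorageClass' entries,
    mirroring the shape of an S3 object listing.
    """
    return [obj for obj in listing if obj.get("StorageClass") not in ARCHIVED_STORAGE_CLASSES]


listing = [
    {"Key": "a.json", "StorageClass": "STANDARD"},
    {"Key": "b.json", "StorageClass": "GLACIER"},
    {"Key": "c.json", "StorageClass": "DEEP_ARCHIVE"},
]
```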

Enhanced Key Storage in Envelope Mode:

Changes have been implemented to the stored key when using envelope mode. These modifications lay the groundwork for future functionality, enabling seamless replay of Kafka data stored in data lakes (S3, GCS, Azure Data Lake) from any specified point in time.

7.1.0

Source Line-Start-End Functionality Enhancements

If you are upgrading from any version prior to 7.0.0, please see the release and upgrade notes for 7.0.0.

7.0.0

Data-lakes Sink Connectors

Key Changes

New KCQL Syntax: The data-lakes sink connectors now embrace the new KCQL syntax, offering users enhanced capabilities while addressing previous syntax constraints.
Data Lakes Sink Partition Name: This update preserves partition names accurately by no longer stripping characters such as \ and /. Consequently, SMTs can provide partition names as expected, leading to reduced configuration overhead and increased conciseness.

KCQL Keywords Replaced

Several keywords have been replaced with entries in the PROPERTIES section for improved clarity and consistency:

WITHPARTITIONER: Replaced by PROPERTIES ('partition.include.keys'=true/false). When set to true (the equivalent of WITHPARTITIONER KeysAndValue), the partition keys are included in the partition path. Otherwise, only the partition values are included.
WITH_FLUSH_SIZE: Replaced by PROPERTIES ('flush.size'=$VALUE).

Benefits

Upgrades

To upgrade to the new version, users must follow these steps:

Stop the sink connectors.
Update the connector to version 7.0.0.
Edit the sink connectors’ KCQL setting to translate to the new syntax and PROPERTIES entries.
Restart the sink connectors.

6.3.1

6.3.0

NEW: HTTP Sink Connector

HTTP Sink Connector (Beta)

NEW: Azure Event Hubs Source Connector

Azure Event Hubs Source Connector.

All Connectors:

Update of dependencies, CVEs addressed.

Please note that the Elasticsearch 6 connector will be deprecated in the next major release.

6.2.0

NEW: GCP Storage Source Connector

GCP Storage Source Connector (Beta)

AWS Source Connector

Important: The AWS Source Partition Search properties have changed for consistency of configuration. The properties that have changed for 6.2.0 are:
- connect.s3.partition.search.interval changes to connect.s3.source.partition.search.interval.
- connect.s3.partition.search.continuous changes to connect.s3.source.partition.search.continuous

JMS Source and Sink Connector

Jakarta Dependency Migration: Switch to Jakarta EE dependencies in line with industry standards to ensure evolution under the Eclipse Foundation.

All Connectors:

There has been some small tidy up of dependencies, restructuring and removal of unused code, and a number of connectors have a slimmer file size without losing any functionality.
The configuration directory has been removed as these examples are not kept up-to-date. Please see the connector documentation instead.
Dependency upgrades.

6.1.0

All Connectors:

In this release, all connectors have been updated to address an issue related to conflicting Antlr jars that may arise in specific environments.

AWS S3 Source:

Byte Array Support: Resolved an issue where storing the Key/Value as an array of bytes caused compatibility problems due to the connector returning java.nio.ByteBuffer while the Connect framework’s ByteArrayConverter only works with byte[]. This update ensures seamless conversion to byte[] if the key/value is a ByteBuffer.

JMS Sink:

Fix for NullPointerException: Addressed an issue where the JMS sink connector encountered a NullPointerException when processing a message with a null JMSReplyTo header value.

JMS Source:

Fix for DataException: Resolved an issue where the JMS source connector encountered a DataException when processing a message with a JMSReplyTo header set to a queue.

AWS S3 Sink/GCP Storage Sink (beta)/Azure Datalake Sink (beta):

GZIP Support for JSON Writing: Added support for GZIP compression when writing JSON data to AWS S3, GCP Storage, and Azure Datalake sinks.

6.0.2

GCP Storage Sink (beta):

Improve suppport for handling GCP naming conventions

6.0.1

AWS S3 / Azure Datalake (Beta) / GCP Storage (Beta)

Removed check preventing nested paths being used in the sink.
Avoid cast exception in GCP Storage connector when using Credentials mode.

6.0.0

New Connectors introduced in 6.0.0:

Azure Datalake Sink Connector (Beta)
GCP Storage Sink Connector (Beta)

Deprecated Connectors removed in 6.0.0:

Kudu
Hazelcast
Hbase
Hive

All Connectors:

Standardising package names. Connector class names and converters will need to be renamed in configuration.
Some clean up of unused dependencies.
Introducing cloud-common module to share code between cloud connectors.
Cloud sinks (AWS S3, Azure Data Lake and GCP Storage) now support BigDecimal and handle nullable keys.

5.0.1

New Connectors introduced in 5.0.1:

Consumer Group Offsets S3 Sink Connector

AWS S3 Connector - S3 Source & Sink

Enhancement: BigDecimal Support

Redis Sink Connector

Bug fix: Redis does not initialise the ErrorHandler

5.0.0

All Connectors

Test Fixes and E2E Test Clean-up: Improved testing with bug fixes and end-to-end test clean-up.
Code Optimization: Removed unused code and converted Java code and tests to Scala for enhanced stability and maintainability.
ASCII Art Loading Fix: Resolved issues related to ASCII art loading.
Build System Updates: Implemented build system updates and improvements.

AWS S3 Connector - S3 Source & Sink

The source and sink have been the focus of this release.

Full message backup. The S3 sink and source now support full message backup. This is enabled by adding PROPERTIES('store.envelope'=true) to the KCQL statement.
Removed the Bytes_*** storage formats. For users relying on them, migration information is provided below. To store raw Kafka messages, the storage format should be AVRO, PARQUET, or JSON (less ideal).
Introduced support for BYTES, which stores a single message as raw binary. The typical use case is storing images or videos. This is enabled by adding STOREAS BYTES to the KCQL statement.

Sink

Enhanced PARTITIONBY Support: expanded support for PARTITIONBY fields, now accommodating fields containing dots. For instance, you can use PARTITIONBY a, `field1.field2` for enhanced partitioning control.
Advanced Padding Strategy: the padding strategy is now more configurable. By default, padding is now enforced, significantly improving compatibility with the S3 Source.
Improved Error Messaging: Enhancements have been made to error messaging, providing clearer guidance, especially in scenarios with misaligned topic configurations (#978).

Source

Expanded Text Reader Support: new text readers to enhance data processing flexibility, including:
- Regex-Driven Text Reader: Allows parsing based on regular expressions.
- Multi-Line Text Reader: Handles multi-line data.
- Start-End Tag Text Reader: Processes data enclosed by start and end tags, suitable for XML content.
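
The start-end tag approach can be sketched as follows: scan the text for a start tag, find the matching end tag, and emit everything in between (tags included) as one record. This is an illustrative stand-alone example under those assumptions; the class name, the method, and the choice to keep the tags in the record are hypothetical and do not describe the connector's implementation.

```java
import java.util.ArrayList;
import java.util.List;

public class StartEndTagReader {

    /** Returns every complete startTag...endTag span found in the text, tags included. */
    static List<String> extract(String text, String startTag, String endTag) {
        List<String> records = new ArrayList<>();
        int from = 0;
        while (true) {
            int start = text.indexOf(startTag, from);
            if (start < 0) {
                break; // no further records
            }
            int end = text.indexOf(endTag, start + startTag.length());
            if (end < 0) {
                break; // incomplete trailing record: ignored in this sketch
            }
            int stop = end + endTag.length();
            records.add(text.substring(start, stop));
            from = stop;
        }
        return records;
    }

    public static void main(String[] args) {
        String xml = "<r>a</r>\n<r>b</r><r>c";
        System.out.println(extract(xml, "<r>", "</r>")); // prints [<r>a</r>, <r>b</r>]
    }
}
```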

Kudu and Hive

The Kudu and Hive connectors are now deprecated and will be removed in a future release.

InfluxDB

Fixed a memory issue with the InfluxDB writer.
Upgraded to the InfluxDB2 client (note: InfluxDB2 connections are not yet supported).

S3 upgrade notes

Upgrading from 5.0.0 (preview) to 5.0.0

For installations that have been using the preview version of the S3 connector and are upgrading to the release, there are a few important considerations:

In the preview builds, padding has been enabled by default for both “offset” and “partition” values since June.

If you have been using a build from the master branch since June, your connectors might have been configured with a different default padding setting.

To maintain consistency and ensure your existing connector configuration remains valid, you will need to use KCQL configuration properties to customize the padding fields accordingly.

Upgrading from 4.x to 5.0.0

Starting with version 5.0.0, the following configuration keys have been replaced.

Field

Old Property

New Property

Upgrading from 4.1.* and 4.2.0

In version 4.1, padding options were available but were not enabled by default. At that time, the default padding length, if not specified, was set to 8 characters.

However, starting from version 5.0, padding is now enabled by default, and the default padding length has been increased to 12 characters.

Enabling padding has a notable advantage: it ensures that the files written are fully compatible with the Lenses Stream Reactor S3 Source, enhancing interoperability and data integration.
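
To make the effect of padding concrete, the sketch below left-pads a numeric value with zeros to a fixed width. One plausible benefit, stated here as an assumption rather than as documented connector behaviour, is that zero-padded names sort lexicographically in the same order as the numbers they encode. The class and method names are hypothetical.

```java
public class OffsetPadding {

    /** Left-pads a non-negative value with zeros to the given width. */
    static String pad(long value, int width) {
        return String.format("%0" + width + "d", value);
    }

    public static void main(String[] args) {
        // Width 12 matches the 5.0 default padding length mentioned above.
        System.out.println(pad(42, 12)); // prints 000000000042

        // Without padding, "9" sorts after "10" when compared as text.
        System.out.println("9".compareTo("10") > 0); // prints true

        // With padding, text order matches numeric order.
        System.out.println(pad(9, 12).compareTo(pad(10, 12)) < 0); // prints true
    }
}
```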

Sinks created with 4.2.0 and 4.2.1 should retain their existing padding behaviour and should therefore disable padding:

If padding was enabled in 4.1, then the padding length should be specified in the KCQL statement:

Upgrading from 4.x to 5.0.0 only when `STOREAS Bytes_*` is used

Source Before:

Source After:

Sink Before:

Sink After:

The deprecated connector won’t be developed any further and will be removed in a future release. If you want to talk to us about a migration plan, please get in touch with us.

Upgrade a connector configuration

To migrate to the new configuration, follow these steps:

stop all running instances of the S3 connector
upgrade the connector to 5.0.0
update the configuration to use the new properties
resume the stopped connectors

4.2.0

All
- Ensure connector version is retained by connectors
- Lenses branding ASCII art updates
AWS S3 Sink Connector

4.1.0

Note: From version 4.1.0, the AWS S3 connectors use the AWS client by default. You can revert to the jClouds version by setting the connect.s3.aws.client property.

All
- Scala upgrade to 2.13.10
- Dependency upgrades
- Upgrade to Kafka 3.3.0

4.0.0

All
- Scala 2.13 Upgrade
- Gradle to SBT Migration
- Producing multiple artifacts supporting both Kafka 2.8 and Kafka 3.1.

3.0.1

All
- Replace Log4j with Logback to overcome CVE-2021-44228
- Bringing code from legacy dependencies inside of project
Cassandra Sink Connector

3.0.0

All
- Move to KCQL 2.8.9
- Change sys.errors to ConnectExceptions
- Additional testing with TestContainers

2.1.3

Move to connect-common 2.0.5 that adds complex type support to KCQL

2.1.2

AWS S3 Sink Connector
- Prevent a NullPointerException in converters when maps are presented with null values
- Offset reader optimisation to reduce S3 load
- Ensuring that commit only occurs after the preconfigured time interval when using WITH_FLUSH_INTERVAL

2.1.0

AWS S3 Sink Connector
Elasticsearch 7 Support

2.0.1

Hive Source
- Rename option connect.hive.hive.metastore to connect.hive.metastore
- Rename option connect.hive.hive.metastore.uris to connect.hive.metastore.uris

2.0.0

Move to Scala 2.12
Move to Kafka 2.4.1 and Confluent 5.4
Deprecated:
- Druid Sink (not compatible with Scala 2.12)