Stream Reactor
This page contains the release notes for the Stream Reactor.
8.1.23
DataLakes (S3, GCP) source fixes
Polling Backoff
The connector incurs high costs when there is no data available in the buckets because it continuously polls the data lake in a tight loop, as controlled by Kafka Connect.
From this version, a backoff queue is used by default, introducing a standard method for backing off calls to the underlying cloud platform.
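As an illustration of the idea only (the connector's actual backoff parameters and implementation are not described here), exponential backoff on empty polls could be sketched as follows. The class name, method names, and millisecond values are all hypothetical.

```python
class Backoff:
    """Exponential backoff for empty polls (illustrative parameters only)."""

    def __init__(self, initial_ms=100, multiplier=2.0, max_ms=10_000):
        self.initial_ms = initial_ms
        self.multiplier = multiplier
        self.max_ms = max_ms
        self._next_ms = initial_ms

    def on_empty_poll(self):
        """Return the delay to wait before the next poll, then grow it."""
        delay = self._next_ms
        self._next_ms = min(int(self._next_ms * self.multiplier), self.max_ms)
        return delay

    def on_data(self):
        """Reset the delay as soon as a poll returns records."""
        self._next_ms = self.initial_ms


backoff = Backoff(initial_ms=100, multiplier=2.0, max_ms=1_000)
delays = [backoff.on_empty_poll() for _ in range(6)]
print(delays)  # [100, 200, 400, 800, 1000, 1000]
backoff.on_data()
```

The delay grows while the bucket stays empty and is capped, so an idle source issues far fewer list calls than a tight loop would.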
Avoid filtering by lastSeenFile where a post process action is configured
When ordering by LastModified and a post-process action is configured, avoid filtering to the latest result.
This change avoids bugs caused by inconsistent LastModified dates used for sorting. If LastModified sorting is used, ensure objects do not arrive late, or use a post-processing step to handle them.
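The following is a simplified model, not the connector's code, of why filtering on the last seen LastModified value can skip an object that shows up late with an earlier timestamp. File names and timestamps are made up for the example.

```python
from datetime import datetime, timedelta


def filter_after_last_seen(objects, last_seen):
    """Keep only objects modified strictly after the last seen timestamp."""
    return [name for name, modified in objects if modified > last_seen]


t0 = datetime(2025, 1, 1, 12, 0, 0)
listing = [
    ("a.json", t0),
    ("b.json", t0 + timedelta(seconds=10)),
]
processed = {"a.json", "b.json"}
last_seen = t0 + timedelta(seconds=10)  # b.json was the last object processed

# c.json appears in the listing later but carries an earlier LastModified.
listing.append(("c.json", t0 + timedelta(seconds=5)))

# Filtering on the last seen timestamp silently skips the late object.
remaining_with_filter = filter_after_last_seen(listing, last_seen)

# With a post-process action, processed objects are moved or deleted,
# so everything still listed is by definition unprocessed.
remaining_without_filter = [n for n, _ in listing if n not in processed]
```

In this model the filtered listing is empty, while the unfiltered listing still contains c.json.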
Add a flag to populate Kafka headers with the watermark partition/offset
This adds a connector property for GCP Storage and S3 Sources:
connect.s3.source.write.watermark.header
connect.gcpstorage.source.write.watermark.header
If set to true, the headers in the source record produced will include details of the source and line number of the file.
If set to false (the default), the headers won't be set.
Currently this does not apply when using the envelope mode.
8.1.22
DataLakes (S3, GCP) source fixes
This release addresses two critical issues:
Corrupted connector state when DELETE/MOVE is used: The connector is designed to store the last processed document and its location within its state for every message sent to Kafka. This mechanism ensures that the connector can resume processing from the correct point in case of a restart. However, when the connector is configured with a post-operation to move or delete processed objects within the data lake, it stores the last processed object in its state. If the connector restarts and the referenced object has been moved or deleted externally, the state points to a non-existent object, causing the connector to fail. The current workaround requires manually cleaning the state and restarting the connector, which is inefficient and error-prone.
Incorrect Handling of Move Location Prefixes: When configuring the move location within the data lake, if the prefix ends with a forward slash (/), it results in malformed keys like a//b. Such incorrect paths can break compatibility with query engines like Athena, which may not handle double slashes properly.
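A minimal sketch of the prefix problem, assuming the target key is built by joining the configured prefix and the object name with a slash. The function name is hypothetical and this is not the connector's implementation.

```python
def join_key(prefix, name):
    """Join a move prefix and an object name without doubling the slash."""
    head = prefix.rstrip("/")
    tail = name.lstrip("/")
    return f"{head}/{tail}" if head else tail


naive = "a/" + "/" + "b"     # the malformed form: "a//b"
fixed = join_key("a/", "b")  # "a/b"
```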
8.1.21
Azure Service Bus source
Performance improvements in the source to handle higher throughput. The code now leverages the prefetch count and disables auto-complete. The following connector configs were added:
connect.servicebus.source.prefetch.count: The number of messages to prefetch from ServiceBus.
connect.servicebus.source.complete.retries.max: The maximum number of retries to attempt while completing a message.
connect.servicebus.source.complete.retries.min.backoff.ms: The minimum duration in milliseconds for the first backoff.
connect.servicebus.source.sleep.on.empty.poll.ms: The duration in milliseconds to sleep when no records are returned from the poll. This avoids a tight loop in Connect.
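As a sketch, the four settings could be added to a connector configuration like this before it is submitted as JSON. The property names come from the list above; every value is an arbitrary placeholder, not a documented default or a recommendation.

```python
import json

# Placeholder values only: tune them for your own workload.
service_bus_source_tuning = {
    "connect.servicebus.source.prefetch.count": "2000",
    "connect.servicebus.source.complete.retries.max": "3",
    "connect.servicebus.source.complete.retries.min.backoff.ms": "1000",
    "connect.servicebus.source.sleep.on.empty.poll.ms": "250",
}

print(json.dumps(service_bus_source_tuning, indent=2))
```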
8.1.20
Datalakes extra logging
This patch enhances the logging for a case where a NullPointerException is thrown and the sinks lose the stack trace.
8.1.19
Prevent ElasticSearch from Skipping Records After Tombstone
Addresses a critical bug in ElasticSearch versions 6 (ES6) and 7 (ES7) where records following a tombstone are inadvertently skipped during the insertion process. The issue stemmed from an erroneous return statement that halted the processing of subsequent records.
8.1.18
🚀 New Features
Sequential Message Sending for Azure Service Bus
Introduced a new KCQL property: batch.enabled (default: true). Users can now disable batching to send messages sequentially, addressing specific scenarios with large message sizes (e.g., >1 MB).
Why this matters: Batching improves performance but can fail for large messages. Sequential sending ensures reliability in such cases.
How to use: Configure batch.enabled=false in the KCQL mapping to enable sequential sending.
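A hedged sketch of what such a mapping might look like, rendered from Python. The target and topic names are hypothetical, and the overall statement shape follows the INSERT INTO ... SELECT ... FROM ... PROPERTIES(...) form used elsewhere in these notes; check the connector documentation for the exact syntax.

```python
def service_bus_sink_kcql(target, topic, batch_enabled):
    """Render a KCQL mapping with the batch.enabled property set."""
    flag = "true" if batch_enabled else "false"
    return (
        f"INSERT INTO {target} SELECT * FROM {topic} "
        f"PROPERTIES('batch.enabled'={flag})"
    )


kcql = service_bus_sink_kcql("orders_queue", "orders", batch_enabled=False)
print(kcql)
```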
Post-Processing for Datalake Cloud Source Connectors
Added post-processing capabilities for AWS S3 and GCP Storage source connectors (Azure Datalake Gen 2 support coming soon).
New KCQL properties:
post.process.action: Defines the action (DELETE or MOVE) to perform on source files after successful processing.
post.process.action.bucket: Specifies the target bucket for the MOVE action (required for MOVE).
post.process.action.prefix: Specifies a new prefix for the file’s location when using the MOVE action (required for MOVE).
Use cases:
Free up storage space by deleting files.
Archive or organize processed files by moving them to a new location.
Example 1: Delete files:
Example 2: Move files to an archive bucket:
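A sketch of how KCQL mappings for the two use cases might be put together, using only the three properties listed above. The topic, bucket, and prefix names are hypothetical, and both the bucket:prefix source form and the quoting of property values are assumptions to verify against the connector documentation.

```python
def source_kcql(topic, bucket, prefix, properties):
    """Render a source KCQL mapping with a PROPERTIES clause."""
    rendered = ", ".join(f"'{key}'='{value}'" for key, value in properties.items())
    return f"INSERT INTO {topic} SELECT * FROM {bucket}:{prefix} PROPERTIES({rendered})"


# Delete each file after it has been processed.
delete_kcql = source_kcql(
    "orders", "landing", "incoming", {"post.process.action": "DELETE"}
)

# Move each processed file to an archive bucket under a new prefix.
move_kcql = source_kcql(
    "orders",
    "landing",
    "incoming",
    {
        "post.process.action": "MOVE",
        "post.process.action.bucket": "archive",
        "post.process.action.prefix": "processed",
    },
)

print(delete_kcql)
print(move_kcql)
```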
🛠 Dependency Updates
Updated Azure Service Bus Dependencies
azure-core updated to version 1.54.1.
azure-messaging-servicebus updated to version 7.17.6.
These updates ensure compatibility with the latest Azure SDKs and improve stability and performance.
Upgrade Notes
Review the new KCQL properties and configurations for Azure Service Bus and Datalake connectors.
Ensure compatibility with the updated Azure Service Bus dependencies if you use custom extensions or integrations.
Thank you to all contributors! 🎉
8.1.17
Improves ElasticSearch sinks by allowing the message key fields or message headers to be used as part of the document primary key. Reference _key[.key_field_path] or _header.header_name to set the document primary key.
Fixes the data lake sinks error message on flush.interval.
8.1.16
Improvements to the HTTP Sink
Queue Limiting: We've set a limit on the queue size per topic to reduce the chances of an Out-of-Memory (OOM) issue. Previously the queue was unbounded, so when HTTP calls were slow and the sink received more records than it cleared, it would eventually run out of memory.
Offering Timeout: The offering to the queue now includes a timeout. If there are records to be offered, but the timeout is exceeded, a retriable exception is thrown. Depending on the connector's retry settings, the operation will be attempted again. This helps avoid situations where the sink gets stuck processing a slow or unresponsive batch.
Duplicate Record Handling: To prevent the same records from being added to the queue multiple times, we've introduced a Map[TopicPartition, Offset] to track the last processed offset for each topic-partition. This ensures that the sink does not attempt to process the same records repeatedly.
Batch Failure Handling: The changes also address a situation where an HTTP call fails due to a specific input, but the batch is not removed from the queue. This could have led to the batch being retried indefinitely, which is now prevented.
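The queue limiting, offer timeout, and duplicate handling described above can be sketched together as below. This is a simplified model rather than the sink's implementation: the class and exception names are hypothetical, and the offset map here tracks the last offset accepted into the queue.

```python
import queue


class RetriableError(Exception):
    """Stand-in for a retriable exception surfaced to Kafka Connect."""


class BoundedRecordQueue:
    def __init__(self, max_size, offer_timeout_s):
        self._queue = queue.Queue(maxsize=max_size)  # bounded to limit memory use
        self._offer_timeout_s = offer_timeout_s
        self._last_offset = {}  # (topic, partition) -> last accepted offset

    def offer(self, topic, partition, offset, value):
        """Enqueue a record; skip duplicates; fail if the queue stays full."""
        key = (topic, partition)
        if offset <= self._last_offset.get(key, -1):
            return False  # already seen for this topic-partition
        try:
            self._queue.put(value, timeout=self._offer_timeout_s)
        except queue.Full:
            raise RetriableError("queue is full, retry later") from None
        self._last_offset[key] = offset
        return True

    def size(self):
        return self._queue.qsize()


records = BoundedRecordQueue(max_size=2, offer_timeout_s=0.01)
first = records.offer("orders", 0, 0, "r0")      # accepted
duplicate = records.offer("orders", 0, 0, "r0")  # skipped as a duplicate
second = records.offer("orders", 0, 1, "r1")     # accepted, queue is now full
try:
    records.offer("orders", 0, 2, "r2")
    timed_out = False
except RetriableError:
    timed_out = True
```

Raising a retriable error instead of blocking indefinitely lets the framework's retry settings decide when the offer is attempted again.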
8.1.15
HTTP sink ignores null records. This avoids an NPE if custom SMTs send null records to the pipeline.
8.1.14
Improvements to the HTTP Sink reporter
8.1.13
8.1.12
Datalake sinks allow an optimisation configuration to avoid rolling the file on schema change. To be used only when the schema changes are backwards compatible
Fixes for the HTTP sink batch.
8.1.11
HTTP sink improvements
8.1.10
Dependency version upgrades.
Azure Service Bus Source
Fixed timestamp of the messages and some of the fields in the payload to correctly use UTC millis.
8.1.9
Azure Service Bus Source
Changed contentType in the message to be nullable (Optional String).
8.1.8
HTTP Sink
Possible exception fix
Store success/failure response codes in http for the reporter.
8.1.7
HTTP Sink
Fixes possible exception in Reporter code.
8.1.6
HTTP Sink
Fixes and overall stability improvements.
8.1.4
Google Cloud Platform Storage
Remove overriding transport options on GCS client.
8.1.3
HTTP Sink
Bugfix: HTTP Kafka Reporter ClientID is not unique.
8.1.2
HTTP Sink & GCS Sink
Improvements for handling HTTP sink null values & fix for NPE on GCS.
8.1.1
Removal of shading for Avro and Confluent jars.
8.1.0
HTTP Sink
Addition of HTTP reporter functionality to route HTTP successes and errors to a configurable topic.
8.0.0
HTTP Sink
Breaking Configuration Change
Configuration for the HTTP Sink changes from JSON to standard Kafka Connect properties. Please study the documentation if upgrading to 8.0.0.
Removed support for the JSON configuration format. Instead, configure your HTTP Sink connector using Kafka Connect properties. See HTTP Sink Configuration.
Introduced SSL Configuration. See SSL Configuration.
Introduced OAuth Support. See OAuth Configuration.
7.4.5
ElasticSearch (6 and 7)
Support for dynamic index names in KCQL.
Configurable tombstone behaviour using the KCQL property behavior.on.null.values.
SSL support using standard Kafka Connect properties.
Http Sink
Adjust defaultUploadSyncPeriod to make the connector more performant by default.
Data Lake Sinks (AWS, Azure Datalake and GCP Storage)
Fix for issue causing sink to fail with NullPointerException due to invalid offset.
7.4.4
Azure Datalake & GCP Storage
Dependency version upgrades
Data Lake Sinks (AWS, Azure Datalake and GCP Storage)
Fixes a gap in the Avro/Parquet storage where enums were converted from Connect enums to string.
Adds support for explicit "no partition" specification to KCQL, to enable topics to be written in the bucket and prefix without partitioning the data.
Syntax Example:
INSERT INTO foo SELECT * FROM bar NOPARTITION
7.4.3
All Connectors
Dependency version upgrades
Data Lake Sinks (AWS, Azure Datalake and GCP Storage)
This release introduces a new configuration option for three Kafka Connect Sink Connectors—S3, Data Lake, and GCP Storage—allowing users to disable exactly-once semantics. By default, exactly once is enabled, but with this update, users can choose to disable it, opting instead for Kafka Connect’s native at-least-once offset management.
S3 Sink Connector: connect.s3.exactly.once.enable
Data Lake Sink Connector: connect.datalake.exactly.once.enable
GCP Storage Sink Connector: connect.gcpstorage.exactly.once.enable
Default Value: true
Indexing is enabled by default to maintain exactly-once semantics. This involves creating an .indexes directory at the root of your storage bucket, with subdirectories dedicated to tracking offsets, ensuring that records are not duplicated.
Users can now disable indexing by setting the relevant property to false. When disabled, the connector will utilise Kafka Connect’s built-in offset management, which provides at-least-once semantics instead of exactly-once.
7.4.2
All Connectors
Dependency version upgrades
AWS S3 Source and GCP Storage Source Connectors
File Extension Filtering
Introduced new properties to allow users to filter the files to be processed based on their extensions.
For AWS S3 Source Connector:
connect.s3.source.extension.excludes
: A comma-separated list of file extensions to exclude from the source file search. If this property is not configured, all files are considered.connect.s3.source.extension.includes
: A comma-separated list of file extensions to include in the source file search. If this property is not configured, all files are considered.
For GCP Storage Source Connector:
connect.gcpstorage.source.extension.excludes: A comma-separated list of file extensions to exclude from the source file search. If this property is not configured, all files are considered.
connect.gcpstorage.source.extension.includes: A comma-separated list of file extensions to include in the source file search. If this property is not configured, all files are considered.
These properties provide more control over the files that the AWS S3 Source Connector and GCP Storage Source Connector process, improving efficiency and flexibility.
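The include and exclude semantics can be pictured with the sketch below. It is an illustration only: how the connector normalises extensions (case, leading dots) and how it combines both lists when both are set are assumptions made here, and the function names are hypothetical.

```python
import posixpath


def parse_extensions(value):
    """Turn 'json, .avro' into {'json', 'avro'}; None means not configured."""
    if not value:
        return None
    return {ext.strip().lower().lstrip(".") for ext in value.split(",") if ext.strip()}


def is_selected(key, includes=None, excludes=None):
    """Decide whether an object key passes the extension filters."""
    extension = posixpath.splitext(key)[1].lstrip(".").lower()
    included = parse_extensions(includes)
    excluded = parse_extensions(excludes)
    if included is not None and extension not in included:
        return False
    if excluded is not None and extension in excluded:
        return False
    return True


print(is_selected("data/part-0001.json"))                   # no filters configured
print(is_selected("data/part-0001.tmp", excludes="tmp"))    # excluded
print(is_selected("data/part-0001.tmp", includes="json"))   # not in the include list
```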
GCP Storage Source and Sink Connectors
Increases the default HTTP retry timeout from 250ms in total to 3 minutes. The default consumer group max.poll.timeout is 5 minutes, so the new value stays within the boundaries needed to avoid a group rebalance.
7.4.1
GCP Storage Connector
Adding retry delay multiplier as a configurable parameter (with default value) to Google Cloud Storage Connector. Main changes revolve around RetryConfig class and its translation to gax HTTP client config.
Data Lake Sinks (AWS, Azure Datalake and GCP Storage)
Making indexes directory configurable for both source and sink.
Sinks
Use the below properties to customise the indexes root directory:
connect.datalake.indexes.name
connect.gcpstorage.indexes.name
connect.s3.indexes.name
See connector documentation for more information.
Sources:
Use the below properties to exclude custom directories from being considered by the source.
connect.datalake.source.partition.search.excludes
connect.gcpstorage.source.partition.search.excludes
connect.s3.source.partition.search.excludes
7.4.0
NEW: Azure Service Bus Sink Connector
Azure Service Bus Sink Connector.
Other Changes
Exception / Either Tweaks
Bump java dependencies (assertJ and azure-core)
Add logging on flushing
Fix: Java class java.util.Date support in cloud sink maps
Full Changelog: https://github.com/lensesio/stream-reactor/compare/7.3.2...7.4.0
7.3.2
Data Lake Sinks (AWS, Azure Datalake and GCP Storage)
Support added for writing Date, Time and Timestamp data types to AVRO.
7.3.1
GCP Storage Connector
Invalid Protocol Configuration Fix
7.3.0
NEW: Azure Service Bus Source Connector
Azure Service Bus Source Connector.
NEW: GCP Pub/Sub Source Connector
GCP Pub/Sub Source Connector.
Data Lake Sinks (AWS, Azure Datalake and GCP Storage)
To back the topics up the KCQL statement is
When * is used the envelope setting is ignored.
This change allows for the * to be taken into account as a default if the given message topic is not found.
HTTP Sink
Bug fix to ensure that, if specified as part of the template, the Content-Type header is correctly populated.
All Connectors:
Update of dependencies.
Upgrading from any version prior to 7.0.0, please see the release and upgrade notes for 7.0.0.
7.2.0
Enhancements
Automated Skip for Archived Objects:
The S3 source now seamlessly bypasses archived objects, including those stored in Glacier and Deep Archive. This enhancement improves efficiency by automatically excluding archived data from processing, which would otherwise crash the connector.
Enhanced Key Storage in Envelope Mode:
Changes have been implemented to the stored key when using envelope mode. These modifications lay the groundwork for future functionality, enabling seamless replay of Kafka data stored in data lakes (S3, GCS, Azure Data Lake) from any specified point in time.
Full Changelog: https://github.com/lensesio/stream-reactor/compare/7.1.0...7.2.0
7.1.0
Source Line-Start-End Functionality Enhancements
We’ve rolled out enhancements to tackle a common challenge faced by users of the S3 source functionality. Previously, when an external producer abruptly terminated a file without marking the end message, data loss occurred.
To address this, we’ve introduced a new feature: a property entry for KCQL to signal the handling of unterminated messages. Meet the latest addition, read.text.last.end.line.missing. When set to true, this property ensures that in-flight data is still recognized as a message even when EOF is reached but the end line marker is missing.
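The effect of the flag can be sketched with a toy start/end line reader. This is a simplified model, not the connector's reader: the marker strings and function name are hypothetical, and keeping the markers inside the emitted message is an assumption made for the example.

```python
def read_messages(lines, start, end, last_end_line_missing=False):
    """Group lines between start and end markers into messages."""
    messages, current = [], None
    for line in lines:
        if current is None:
            if line == start:
                current = [line]  # a new message begins
        else:
            current.append(line)
            if line == end:
                messages.append("\n".join(current))
                current = None
    # EOF reached while a message is still open: emit it only if the flag is set.
    if current is not None and last_end_line_missing:
        messages.append("\n".join(current))
    return messages


lines = ["START", "a", "END", "START", "b"]  # the producer stopped before writing END
strict = read_messages(lines, "START", "END")
lenient = read_messages(lines, "START", "END", last_end_line_missing=True)
```

With the flag off, the trailing in-flight data is dropped; with it on, the partial block is still emitted as a message.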
Upgrading from any version prior to 7.0.0, please see the release and upgrade notes for 7.0.0.
7.0.0
Data-lakes Sink Connectors
This release brings substantial enhancements to the data-lakes sink connectors, elevating their functionality and flexibility. The focal point of these changes is the adoption of the new KCQL syntax, designed to improve usability and resolve limitations inherent in the previous syntax.
Key Changes
New KCQL Syntax: The data-lakes sink connectors now embrace the new KCQL syntax, offering users enhanced capabilities while addressing previous syntax constraints.
Data Lakes Sink Partition Name: This update ensures accurate preservation of partition names by avoiding the scraping of characters like \ and /. Consequently, SMTs can provide partition names as expected, leading to reduced configuration overhead and increased conciseness.
KCQL Keywords Replaced
Several keywords have been replaced with entries in the PROPERTIES section for improved clarity and consistency:
WITHPARTITIONER: Replaced by PROPERTIES ('partition.include.keys'=true/false). When WITHPARTITIONER KeysAndValue is set to true, the partition keys are included in the partition path. Otherwise, only the partition values are included.
WITH_FLUSH_SIZE: Replaced by PROPERTIES ('flush.size'=$VALUE).
WITH_FLUSH_COUNT: Replaced by PROPERTIES ('flush.count'=$VALUE).
WITH_FLUSH_INTERVAL: Replaced by PROPERTIES ('flush.interval'=$VALUE).
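To make the flush keyword replacements concrete, here is a rough sketch that rewrites the three flush keywords into PROPERTIES entries. It assumes the old statements used the form KEYWORD = number, handles only the simple case of a statement without an existing PROPERTIES clause, and ignores WITHPARTITIONER. Treat it as an illustration of the mapping, not a migration tool.

```python
import re

FLUSH_KEYWORDS = {
    "WITH_FLUSH_SIZE": "flush.size",
    "WITH_FLUSH_COUNT": "flush.count",
    "WITH_FLUSH_INTERVAL": "flush.interval",
}
_PATTERN = re.compile(r"\s+(WITH_FLUSH_(?:SIZE|COUNT|INTERVAL))\s*=\s*(\d+)")


def migrate_flush_keywords(kcql):
    """Move WITH_FLUSH_* keywords into a trailing PROPERTIES clause."""
    props = [f"'{FLUSH_KEYWORDS[k]}'={v}" for k, v in _PATTERN.findall(kcql)]
    stripped = _PATTERN.sub("", kcql).strip()
    if not props:
        return stripped
    return f"{stripped} PROPERTIES({', '.join(props)})"


old = "INSERT INTO bucket:prefix SELECT * FROM topic WITH_FLUSH_COUNT = 5000 WITH_FLUSH_INTERVAL = 60"
new = migrate_flush_keywords(old)
print(new)
```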
Benefits
The adoption of the new KCQL syntax enhances the flexibility of the data-lakes sink connectors, empowering users to tailor configurations more precisely to their requirements. By transitioning keywords to entries in the PROPERTIES section, potential misconfigurations stemming from keyword order discrepancies are mitigated, ensuring configurations are applied as intended.
Upgrades
Please note that the upgrades to the data-lakes sink connectors are not backward compatible with existing configurations. Users are required to update their configurations to align with the new KCQL syntax and PROPERTIES entries. This upgrade is necessary for any instances of the sink connector (S3, Azure, GCP) set up before version 7.0.0.
To upgrade to the new version, users must follow these steps:
Stop the sink connectors.
Update the connector to version 7.0.0.
Edit the sink connectors’ KCQL setting to translate to the new syntax and PROPERTIES entries.
Restart the sink connectors.
6.3.1
This update specifically affects datalake sinks employing the JSON storage format. It serves as a remedy for users who have resorted to a less-than-ideal workaround: employing a Single Message Transform (SMT) to return a Plain Old Java Object (POJO) to the sink. In such cases, instead of utilizing the Connect JsonConverter to seamlessly translate the payload to JSON, reliance is placed solely on Jackson.
However, it’s crucial to note that this adjustment is not indicative of a broader direction for future expansions. This is because relying on such SMT practices does not ensure an agnostic solution for storage formats (such as Avro, Parquet, or JSON).
6.3.0
NEW: HTTP Sink Connector
HTTP Sink Connector (Beta)
NEW: Azure Event Hubs Source Connector
Azure Event Hubs Source Connector.
All Connectors:
Update of dependencies, CVEs addressed.
Please note that the Elasticsearch 6 connector will be deprecated in the next major release.
6.2.0
NEW: GCP Storage Source Connector
GCP Storage Source Connector (Beta).
AWS Source Connector
Important: The AWS Source Partition Search properties have changed for consistency of configuration. The properties that have changed for 6.2.0 are:
connect.s3.partition.search.interval changes to connect.s3.source.partition.search.interval.
connect.s3.partition.search.continuous changes to connect.s3.source.partition.search.continuous.
connect.s3.partition.search.recurse.levels changes to connect.s3.source.partition.search.recurse.levels.
If you use any of these properties, when you upgrade to the new version then your source will halt and the log will display an error message prompting you to adjust these properties. Be sure to update these properties in your configuration to enable the new version to run.
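A small sketch of applying these renames to an existing connector configuration held as a dictionary. The property names are the ones listed above; the helper name and the sample configuration values are hypothetical.

```python
RENAMED_PROPERTIES = {
    "connect.s3.partition.search.interval": "connect.s3.source.partition.search.interval",
    "connect.s3.partition.search.continuous": "connect.s3.source.partition.search.continuous",
    "connect.s3.partition.search.recurse.levels": "connect.s3.source.partition.search.recurse.levels",
}


def migrate_config(config):
    """Return a copy of the config with the old property names replaced."""
    return {RENAMED_PROPERTIES.get(key, key): value for key, value in config.items()}


old_config = {
    "name": "s3-source",
    "connect.s3.partition.search.continuous": "true",
    "connect.s3.partition.search.recurse.levels": "2",
}
new_config = migrate_config(old_config)
print(new_config)
```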
Dependency upgrade of Hadoop libraries version to mitigate against CVE-2022-25168.
JMS Source and Sink Connector
Jakarta Dependency Migration: Switch to Jakarta EE dependencies in line with industry standards to ensure evolution under the Eclipse Foundation.
All Connectors:
There has been some small tidy up of dependencies, restructuring and removal of unused code, and a number of connectors have a slimmer file size without losing any functionality.
The configuration directory has been removed as these examples are not kept up-to-date. Please see the connector documentation instead.
Dependency upgrades.
6.1.0
All Connectors:
In this release, all connectors have been updated to address an issue related to conflicting Antlr jars that may arise in specific environments.
AWS S3 Source:
Byte Array Support: Resolved an issue where storing the Key/Value as an array of bytes caused compatibility problems due to the connector returning java.nio.ByteBuffer while the Connect framework’s ByteArrayConverter only works with byte[]. This update ensures seamless conversion to byte[] if the key/value is a ByteBuffer.
JMS Sink:
Fix for NullPointerException: Addressed an issue where the JMS sink connector encountered a NullPointerException when processing a message with a null JMSReplyTo header value.
JMS Source:
Fix for DataException: Resolved an issue where the JMS source connector encountered a DataException when processing a message with a JMSReplyTo header set to a queue.
AWS S3 Sink/GCP Storage Sink (beta)/Azure Datalake Sink (beta):
GZIP Support for JSON Writing: Added support for GZIP compression when writing JSON data to AWS S3, GCP Storage, and Azure Datalake sinks.
6.0.2
GCP Storage Sink (beta):
Improve support for handling GCP naming conventions
6.0.1
AWS S3 / Azure Datalake (Beta) / GCP Storage (Beta)
Removed check preventing nested paths being used in the sink.
Avoid cast exception in GCP Storage connector when using Credentials mode.
6.0.0
New Connectors introduced in 6.0.0:
Azure Datalake Sink Connector (Beta)
GCP Storage Sink Connector (Beta)
Deprecated Connectors removed in 6.0.0:
Kudu
Hazelcast
Hbase
Hive
Pulsar
All Connectors:
Standardising package names. Connector class names and converters will need to be renamed in configuration.
Some clean up of unused dependencies.
Introducing cloud-common module to share code between cloud connectors.
Cloud sinks (AWS S3, Azure Data Lake and GCP Storage) now support BigDecimal and handle nullable keys.
5.0.1
New Connectors introduced in 5.0.1:
Consumer Group Offsets S3 Sink Connector
AWS S3 Connector - S3 Source & Sink
Enhancement: BigDecimal Support
Redis Sink Connector
Bug fix: Redis does not initialise the ErrorHandler
5.0.0
All Connectors
Test Fixes and E2E Test Clean-up: Improved testing with bug fixes and end-to-end test clean-up.
Code Optimization: Removed unused code and converted Java code and tests to Scala for enhanced stability and maintainability.
Ascii Art Loading Fix: Resolved issues related to ASCII art loading.
Build System Updates: Implemented build system updates and improvements.
Stream Reactor Integration: Integrated Kafka-connect-query-language inside of Stream Reactor for enhanced compatibility.
STOREAS Consistency: Ensured consistent handling of backticks with STOREAS.
AWS S3 Connector - S3 Source & Sink
The source and sink have been the focus of this release.
Full message backup. The S3 sink and source now support full message backup. This is enabled by adding PROPERTIES('store.envelope'=true) to the KCQL.
Removed the Bytes_*** storage format. For users relying on it, migration information is provided below. To store raw Kafka messages, the storage format should be AVRO, PARQUET, or JSON (less ideal).
Introduced support for BYTES, storing a single message as raw binary. Typical use cases are storing images or videos. This is enabled by adding STOREAS BYTES to the KCQL.
Introduced support for PROPERTIES to drive the new settings that control the connectors’ behaviour. The KCQL looks like this: INSERT INTO ... SELECT ... FROM ... PROPERTIES(property=key, ...)
Sink
Enhanced PARTITIONBY Support: expanded support for PARTITIONBY fields, now accommodating fields containing dots. For instance, you can use PARTITIONBY a, `field1.field2` for enhanced partitioning control.
Advanced Padding Strategy: a more advanced padding strategy configuration. By default, padding is now enforced, significantly improving compatibility with S3 Source.
Improved Error Messaging: Enhancements have been made to error messaging, providing clearer guidance, especially in scenarios with misaligned topic configurations (#978).
Commit Logging Refactoring: Refactored and simplified the CommitPolicy for more efficient commit logging (#964).
Comprehensive Testing: Added additional unit testing around configuration settings, removed redundancy from property names, and enhanced KCQL properties parsing to support Map structures.
Consolidated Naming Strategies: Merged naming strategies to reduce code complexity and ensure consistency. This effort ensures that both hierarchical and custom partition modes share similar code paths, addressing issues related to padding and the inclusion of keys and values within the partition name.
Optimized S3 API Calls: Switched from using deleteObjects to deleteObject for S3 API client calls (#957), enhancing performance and efficiency.
JClouds Removal: The update removes the use of JClouds, streamlining the codebase.
Legacy Offset Seek Removal: The release eliminates legacy offset seek operations, simplifying the code and enhancing overall efficiency.
Source
Expanded Text Reader Support: new text readers to enhance data processing flexibility, including:
Regex-Driven Text Reader: Allows parsing based on regular expressions.
Multi-Line Text Reader: Handles multi-line data.
Start-End Tag Text Reader: Processes data enclosed by start and end tags, suitable for XML content.
Improved Parallelization: enhancements enable parallelization based on the number of connector tasks and available data partitions, optimizing data handling.
Data Consistency: Resolved data loss and duplication issues when the connector is restarted, ensuring reliable data transfer.
Dynamic Partition Discovery: No more need to restart the connector when new partitions are added; runtime partition discovery streamlines operations.
Efficient Storage Handling: The connector now ignores the .indexes directory, allowing data storage in an S3 bucket without a prefix.
Increased Default Records per Poll: the default limit on the number of records returned per poll was changed from 1024 to 10000, improving data retrieval efficiency and throughput.
Ordered File Processing: Added the ability to process files in date order. This feature is especially useful when S3 files lack lexicographical sorting, and S3 API optimisation cannot be leveraged. Please note that it involves reading and sorting files in memory.
Parquet INT96 Compatibility: The connector now allows Parquet INT96 to be read as a fixed array, preventing runtime failures.
Kudu and Hive
The Kudu and Hive connectors are now deprecated and will be removed in a future release.
InfluxDB
Fixed a memory issue with the InfluxDB writer.
Upgraded to Influxdb2 client (note: doesn’t yet support Influxdb2 connections).
S3 upgrade notes
Upgrading from 5.0.0 (preview) to 5.0.0
For installations that have been using the preview version of the S3 connector and are upgrading to the release, there are a few important considerations:
Previously, starting in June, default padding was enabled for both the “offset” and “partition” values.
However, in version 5.0, the decision was made to apply default padding to the “offset” value only, leaving the “partition” value without padding. This change was made to enhance compatibility with querying in Athena.
If you have been using a build from the master branch since June, your connectors might have been configured with a different default padding setting.
To maintain consistency and ensure your existing connector configuration remains valid, you will need to use KCQL configuration properties to customize the padding fields accordingly.
Upgrading from 4.x to 5.0.0
Starting with version 5.0.0, the following configuration keys have been replaced.
AWS Secret Key: aws.secret.key changes to connect.s3.aws.secret.key.
Access Key: aws.access.key changes to connect.s3.aws.access.key.
Auth Mode: aws.auth.mode changes to connect.s3.aws.auth.mode.
Custom Endpoint: aws.custom.endpoint changes to connect.s3.custom.endpoint.
VHost Bucket: aws.vhost.bucket changes to connect.s3.vhost.bucket.
Upgrading from 4.1.* and 4.2.0
In version 4.1, padding options were available but were not enabled by default. At that time, the default padding length, if not specified, was set to 8 characters.
However, starting from version 5.0, padding is now enabled by default, and the default padding length has been increased to 12 characters.
Enabling padding has a notable advantage: it ensures that the files written are fully compatible with the Lenses Stream Reactor S3 Source, enhancing interoperability and data integration.
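Why padding matters for ordering can be seen in a short sketch. Object keys are compared as strings, so unpadded numbers sort out of numeric order. The helper below is illustrative only and does not reproduce the connector's file naming.

```python
def pad(offset, length=12):
    """Left-pad an offset with zeros to a fixed width."""
    return str(offset).zfill(length)


offsets = [9, 10, 100]
unpadded = sorted(str(o) for o in offsets)
padded = sorted(pad(o) for o in offsets)

print(unpadded)  # ['10', '100', '9']
print(padded)    # ['000000000009', '000000000010', '000000000100']
```

With fixed-width names, lexicographic order matches numeric order, which is what lets a reader restore files in the order they were written.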
Sinks created with 4.2.0 and 4.2.1 should retain their existing behaviour and should therefore disable padding:
If padding was enabled in 4.1, then the padding length should be specified in the KCQL statement:
Upgrading from 4.x to 5.0.0 only when STOREAS Bytes_*** is used
The Bytes_*** storage format has been removed. If you are using this storage format, you will need to install the 5.0.0-deprecated connector and upgrade the connector instances by changing the class name:
Source Before:
Source After:
Sink Before:
Sink After:
The deprecated connector won’t be developed any further and will be removed in a future release. If you want to talk to us about a migration plan, please get in touch with us at sales@lenses.io.
Upgrade a connector configuration
To migrate to the new configuration, follow these steps:
stop all running instances of the S3 connector
upgrade the connector to 5.0.0
update the configuration to use the new properties
resume the stopped connectors
4.2.0
All
Ensure connector version is retained by connectors
Lenses branding ASCII art updates
AWS S3 Sink Connector
Improves the error in case the input on BytesFormatWriter is not binary
Support for ByteBuffer which may be presented by Connect as bytes
4.1.0
Note
From version 4.1.0, AWS S3 Connectors will use the AWS Client by default. You can revert to the jClouds version by setting the connect.s3.aws.client property.
All
Scala upgrade to 2.13.10
Dependency upgrades
Upgrade to Kafka 3.3.0
SimpleJsonConverter - Fixes mismatching schema error.
AWS S3 Sink Connector
Add connection pool config
Add Short type support
Support null values
Enabling Compression Codecs for Avro and Parquet
Switch to AWS client by default
Add option to add a padding when writing files, so that files can be restored in order by the source
Enable wildcard syntax to support multiple topics without additional configuration.
AWS S3 Source Connector
Add connection pool config
Retain partitions from filename or regex
Switch to AWS client by default
MQTT Source Connector
Allow toggling the skipping of MQTT Duplicates
MQTT Sink Connector
Functionality to ensure unique MQTT Client ID is used for MQTT sink
Elastic6 & Elastic7 Sink Connectors
Fixing issue with missing null values
4.0.0
All
Scala 2.13 Upgrade
Gradle to SBT Migration
Producing multiple artifacts supporting both Kafka 2.8 and Kafka 3.1.
Upgrade to newer dependencies to reduce CVE count
Switch e2e tests from Java to Scala.
AWS S3 Sink Connector
Optimal seek algorithm
Parquet data size flushing fixes.
Adding date partitioning capability
Adding switch to use official AWS library
Add AWS STS dependency to ensure correct operation when assuming roles with a web identity token.
Provide better debugging in case of exceptions.
FTP Source Connector
Fixes to slice mode support.
Hazelcast Sink Connector
Upgrade to HazelCast 4.2.4. The configuration model has changed and now uses clusters instead of username and password configuration.
Hive Sink Connector
Update of parquet functionality to ensure operation with Parquet 1.12.2.
Support for Hive 3.1.3.
JMS Connector
Enable protobuf support.
Pulsar
Upgrade to Pulsar 2.10 and associated refactor to support new client API.
3.0.1
All
Replace Log4j with Logback to overcome CVE-2021-44228
Bringing code from legacy dependencies inside of project
Cassandra Sink Connector
Ensuring the table name is logged on encountering an InvalidQueryException
HBase Sink Connector
Alleviate possible race condition
3.0.0
All
Move to KCQL 2.8.9
Change sys.errors to ConnectExceptions
Additional testing with TestContainers
Licence scan report and status
AWS S3 Sink Connector
S3 Source Offset Fix
Fix JSON & Text newline detection when running in certain Docker images
Byte handling fixes
Partitioning of nested data
Error handling and retry logic
Handle preCommit with null currentOffsets
Remove bucket validation on startup
Enabled simpler management of default flush values.
Local write mode - build locally, then ship
Deprecating old properties, however rewriting them to the new properties to ensure backwards compatibility.
Adding the capability to specify properties in yaml configuration
Rework exception handling. Refactoring errors to use Either[X,Y] return types where possible instead of throwing exceptions.
Ensuring task can be stopped gracefully if it has not been started yet
ContextReader testing and refactor
Adding a simple state model to the S3Writer to ensure that states and transitions are kept consistent. This can be improved in time.
AWS S3 Source Connector
Change order of match to avoid scala.MatchError
S3 Source rewritten to be more efficient and use the natural ordering of S3 keys
Region is necessary when using the AWS client
Cassandra Sink & Source Connectors
Add connection and read client timeout
FTP Connector
Support for Secure File Transfer Protocol
Hive Sink Connector
Array Support
Kerberos debug flag added
Influx DB Sink
Bump influxdb-java from version 2.9 to 2.29
Added array handling support
MongoDB Sink Connector
Nested Fields Support
Redis Sink Connector
Fix Redis Pubsub Writer
Add support for json and json with schema
2.1.3
Move to connect-common 2.0.5 that adds complex type support to KCQL
2.1.2
AWS S3 Sink Connector
Prevent null pointer exception in converters when maps are presented with null values
Offset reader optimisation to reduce S3 load
Ensuring that commit only occurs after the preconfigured time interval when using WITH_FLUSH_INTERVAL
AWS S3 Source Connector (New Connector)
Cassandra Source Connector
Add Bucket Timeseries Mode
Reduction of logging noise
Proper handling of uninitialized connections on task stop()
Elasticsearch Sink Connector
Update default port
Hive Sink
Improve Orc format handling
Fixing issues with partitioning by non-string keys
Hive Source
Ensuring newly written files can be read by the hive connector by introduction of a refresh frequency configuration option.
Redis Sink
Correct Redis writer initialisation
2.1.0
AWS S3 Sink Connector
Elasticsearch 7 Support
2.0.1
Hive Source
Rename option connect.hive.hive.metastore to connect.hive.metastore
Rename option connect.hive.hive.metastore.uris to connect.hive.metastore.uris
Fix Elastic start up NPE
Fix to correct batch size extraction from KCQL on Pulsar
2.0.0
Move to Scala 2.12
Move to Kafka 2.4.1 and Confluent 5.4
Deprecated:
Druid Sink (not scala 2.12 compatible)
Elastic Sink (not scala 2.12 compatible)
Elastic5 Sink (not scala 2.12 compatible)
Redis
Add support for Redis Streams
Cassandra
Add support for setting the LoadBalancer policy on the Cassandra Sink
ReThinkDB
Use an SSL connection when Rethink initializes tables, if SSL is set
FTP Source
Respect connect.ftp.max.poll.records when reading slices
MQTT Source
Allow lookup of avro schema files with wildcard subscriptions