Concepts
Learn the core concepts and glossary of Lenses Kafka to Kafka.
The K2K app orchestrates a robust, configurable pipeline for replicating and transforming Kafka data between clusters, with strong support for exactly-once semantics, flexible mapping, schema management, and operational observability.
Stages
K2K is composed of two stages, wrapped into a pipeline:
Processing Stages:
Modular stages for header handling, schema mapping, routing, serialization, and finalization.
Sink Stage:
Writes processed records to the target Kafka cluster, supporting both at-least-once and exactly-once semantics.
Before the stages are run, the app loads the pipeline definition from the configuration, validates it, and extracts all necessary settings for source, target, and coordination.
The configuration file path is passed as an argument to the binary.
k2k start -f my-pipeline.yaml
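For orientation, the sketch below shows what a pipeline definition might contain, using the section names introduced on this page (source, target, coordination, replication). The nesting and key names are assumptions for illustration; the authoritative schema is in the Configuration reference.

# Illustrative sketch only: block nesting is assumed, see Configuration for the real schema.
source:
  consumer:
    bootstrap.servers: source-kafka:9092    # standard Kafka consumer property
target:
  producer:
    bootstrap.servers: target-kafka:9092    # standard Kafka producer property
coordination:
  consumer:
    bootstrap.servers: source-kafka:9092    # coordination runs against the source cluster
replication:
  topics:
    - orders                                # hypothetical topic name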
Kafka Clients
Naturally, K2K uses several Kafka clients, both consumers and producers.
Kafka Consumers:
Source Consumer: Reads records from the source Kafka cluster.
Control Consumer: Used for coordination and control messages (e.g., offset management) on the source cluster.
Kafka Producers:
Target Producer: Writes records to the target Kafka cluster, supporting both transactional (exactly-once) and non-transactional modes.
How do I configure the Kafka clients?
You can override the consumer and producer properties in the coordination, source, and target sections, as sketched below. See Configuration.
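A hedged illustration follows; the nesting of the consumer and producer blocks is an assumption, but the individual keys are standard Kafka client properties.

# Sketch only: nesting is assumed; the keys are standard Kafka client properties.
source:
  consumer:
    fetch.max.bytes: 52428800       # raise the per-fetch byte limit
    max.poll.records: 1000          # records returned per poll() call
target:
  producer:
    compression.type: zstd          # compress batches on the wire
    linger.ms: 20                   # wait briefly to build larger batches
coordination:
  consumer:
    session.timeout.ms: 30000       # liveness window for the control consumer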
Serialization & Deserialization
K2K provides serializers and deserializers for keys, values, and headers, covering binary and JSON formats as needed for Kafka records and control messages, and integrates with Confluent API-compatible Schema Registries.
How do I configure the serializers and deserializers?
You can configure the Schema Registry URL and authentication in the consumer and producer sections for the source, coordinator, and target by setting properties under the schema.registry prefix.
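For example, a sketch using the Confluent Schema Registry client's conventional key names under that prefix (the surrounding nesting is an assumption; check Configuration for the exact keys K2K accepts):

# Sketch only: key names follow Confluent Schema Registry client conventions.
source:
  consumer:
    schema.registry.url: https://schema-registry:8081
    schema.registry.basic.auth.credentials.source: USER_INFO
    schema.registry.basic.auth.user.info: "svc-user:changeme"   # placeholder credentials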
Partition and Topic Mapping
Partition Mapper: Determines how partitions are mapped from source to target.
Topic Mapper: Handles topic name mapping and routing logic.
How do I configure which topics are to be replicated?
The topics to be replicated from the source cluster are specified in the replication section. See the Configuration for more details.
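As a sketch, such a section might look like the following; the keys inside replication are hypothetical and shown only to illustrate topic selection and renaming:

# Sketch only: the keys inside 'replication' are hypothetical.
replication:
  topics:
    - name: orders           # hypothetical topic, replicated under the same name
    - name: payments
      target: dc2.payments   # hypothetical rename on the target cluster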
Coordination and Assignment
Coordination runs against the source Kafka cluster. To override the coordination consumer's settings, set standard Kafka consumer properties under the coordination configuration entry.
Assignment Management:
Handles partition assignment, fencing, and exclusive access for exactly-once semantics.
Offset Management:
Manages committed offsets, including optimized and non-optimized strategies.
Transaction and Lease Management
Transaction Manager: Handles Kafka transactions for exactly-once delivery.
Lease Lifecycle Manager: Manages exclusive partition leases for exactly-once processing.
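Exactly-once delivery builds on standard Kafka transactions. As an illustration, these are the standard producer properties any transactional producer requires; whether K2K expects you to set them explicitly or derives them itself is covered in Configuration.

# Standard Kafka transactional producer settings, shown for illustration only.
target:
  producer:
    enable.idempotence: true           # prerequisite for transactional writes
    transactional.id: k2k-pipeline-1   # hypothetical id; must be unique per producer
    acks: all                          # wait for the full in-sync replica set

Downstream consumers on the target cluster should read with isolation.level set to read_committed so they see only committed transactional records.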
Telemetry and Metrics
Integrates with OpenTelemetry for metrics and tracing.
Tracks partition assignment and processing metrics.
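Assuming K2K's OpenTelemetry integration honors the standard OpenTelemetry SDK environment variables (an assumption; check Configuration), the exporter could be pointed at a collector like so:

# Sketch only: assumes the standard OpenTelemetry SDK environment variables are honored.
env:
  OTEL_SERVICE_NAME: k2k-pipeline-1                         # hypothetical service name
  OTEL_EXPORTER_OTLP_ENDPOINT: http://otel-collector:4317   # OTLP collector endpoint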
Error Handling
Configurable strategies for handling commit timeouts and deserialization errors.