Spark metrics

If you have an application written using Spark Structured Streams that consumes messages from Kafka, then the topology client can publish consumer metrics using a custom Format.

You will require the following maven module:

<dependency>
    <groupId>com.landoop</groupId>
    <artifactId>lenses-topology-client-kafka-spark</artifactId>
    <version>4.0.0</version>
</dependency>

Then you will need to make two small changes to the code that creates the streaming Dataset. The first one is to use the lenses-kafka format rather than the default kafka format. This is a slightly modified version of the Kafka source that retains a handle to the KafkaConsumer and publishes metrics exposed by that consumer. The second change is to add an option that informs the metrics publisher the name of the topology and the topics we are interested in.

Dataset<Row> words = spark
        .readStream()
        // the next line switches to the modified kafka format
        .format("lenses-kafka")
        .option("kafka.bootstrap.servers", "localhost:9092")
        // the next line sets the topology name and topics so the metrics publisher knows which topics we are interested in
        .option("kafka.lenses.topology.description", topology.getDescription())
        .option("subscribe", inputTopic)
        .load();

The Dataset can now be used as you would in any other application.