Spark Structured Metrics
If you have an application written with Spark Structured Streaming that consumes messages from Kafka, the topology client can publish consumer metrics using a custom Kafka source.
You will require the following Maven module:
<dependency>
    <groupId>com.landoop</groupId>
    <artifactId>lenses-topology-client-kafka-spark</artifactId>
    <version>1.0.0</version>
</dependency>
Then you will need to make two small changes to the code that creates the streaming Dataset. The first is to use the lenses-kafka format rather than the default kafka format. This is a slightly modified version of the Kafka source that retains a handle to the underlying KafkaConsumer and publishes the metrics exposed by that consumer. The second change is to add an option that tells the metrics publisher the name of the topology and the topics we are interested in.
Dataset<Row> words = spark
    .readStream()
    // switch to the modified Kafka format
    .format("lenses-kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    // set the topology name and topics so the metrics publisher
    // knows which topics we are interested in
    .option("kafka.lenses.topology.description", topology.getDescription())
    .option("subscribe", inputTopic)
    .load();
And that’s all that’s required. The Dataset can now be used as you would in any other application.
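For example, a minimal sketch of consuming the resulting Dataset with Spark's standard console sink (the `spark` session, `words` Dataset, and `inputTopic` variable are assumed from the code above):

```java
import org.apache.spark.sql.streaming.StreamingQuery;

// Continue from the Dataset<Row> created above: write the stream to a
// console sink purely to demonstrate that it behaves like any other Dataset.
StreamingQuery query = words.writeStream()
        .outputMode("append")
        .format("console")
        .start();

// Block until the streaming query terminates.
query.awaitTermination();
```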
Note: When you create the topology instance, set the app type to AppType.SparkStreaming so that the UI renders with the correct visualization.
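How the topology instance is built depends on your version of the topology client; the snippet below is a hypothetical sketch only, where the builder method names (`start`, `withTopic`, `endNode`, `build`) and the application name are assumptions. Consult the topology client's own documentation for the exact API; the relevant point from this page is only that AppType.SparkStreaming is passed when the topology is created.

```java
// Hypothetical sketch -- the builder calls below are assumed, not the exact
// client API. The key detail is supplying AppType.SparkStreaming so the
// Lenses UI picks the correct visualization for this application.
Topology topology = TopologyBuilder
        .start(AppType.SparkStreaming, "my-spark-app")
        .withTopic(inputTopic)
        .endNode()
        .build();
```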