Kafka Monitoring Suite

The Lenses Kafka Monitoring Suite is a set of pre-defined templates that use:

  • A Time Series database (Prometheus)
  • Custom JMX exporters
  • A Data Visualization application (Grafana)
  • Built-in domain intelligence about operating Kafka with confidence in production

Lenses continuously monitors your Kafka cluster and can raise alerts when important metrics degrade, such as consumer lag and offline or under-replicated partitions. However, it does not aim to become a time series database, because well-established solutions such as Prometheus already exist. See Kafka monitoring suite setup for details.
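To make the alerting conditions above concrete, here is a minimal sketch of how they can be derived from partition metadata. It is not how Lenses computes them internally; the PartitionState type and its field names are hypothetical and exist only for this illustration. Treating a leader id of -1 as "no leader" is likewise a convention assumed for this sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PartitionState:
    """Hypothetical snapshot of one topic partition (illustration only)."""
    topic: str
    partition: int
    replicas: List[int]    # broker ids assigned to this partition
    isr: List[int]         # broker ids currently in sync with the leader
    leader: int            # assumed convention: -1 means no leader
    end_offset: int        # latest offset written to the partition
    committed_offset: int  # offset committed by the consumer group


def consumer_lag(p: PartitionState) -> int:
    """Messages written but not yet consumed by the group."""
    return max(p.end_offset - p.committed_offset, 0)


def is_under_replicated(p: PartitionState) -> bool:
    """Fewer in-sync replicas than assigned replicas."""
    return len(p.isr) < len(p.replicas)


def is_offline(p: PartitionState) -> bool:
    """No leader is available, so the partition cannot serve traffic."""
    return p.leader < 0


if __name__ == "__main__":
    p = PartitionState("orders", 0, [1, 2, 3], [1, 2], 1, 1500, 1200)
    print(consumer_lag(p), is_under_replicated(p), is_offline(p))
```

An alert would typically fire when the lag stays above a threshold for some time, or as soon as any partition is offline or under-replicated.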

Landoop’s monitoring reference setup for Apache Kafka is therefore based on Prometheus and Grafana, with a medium-term goal of bringing more dashboards from Grafana into Lenses.

A question that comes up often is whether monitoring is really needed, since Lenses can provide alerts. Ultimately, each implementation team has to answer this for itself.

Keep your alerts and key metrics to a small, tight set so that you do not get overwhelmed. This is what we want to achieve through Lenses.

Alerts and key metrics are related to monitoring but are not the same. We define monitoring as the process of collecting a large number of metrics and storing them for a period of time. Queries against this data help engineers understand the cluster better, establish baselines so they can plan for additional capacity or act on deviations, and even extract new, important key metrics for a specific use case as the team acquires more experience in the field. Furthermore, new alerts can be defined on any metric or any combination of metrics.
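As an illustration of what "establishing a baseline and acting on deviations" can mean in practice, the sketch below takes a stored history of one metric and flags recent samples that fall far from it. It is only one simple approach (mean plus or minus a number of standard deviations), not something the suite prescribes; the function names and the default threshold of 3 are assumptions made for this example.

```python
from statistics import mean, stdev
from typing import List, Tuple


def baseline(history: List[float]) -> Tuple[float, float]:
    """Return (mean, sample standard deviation) of stored samples."""
    return mean(history), stdev(history)


def deviations(history: List[float], recent: List[float],
               threshold: float = 3.0) -> List[float]:
    """Return recent samples further than `threshold` standard
    deviations from the historical mean (assumed rule, for illustration)."""
    mu, sigma = baseline(history)
    if sigma == 0:
        # A perfectly flat history: any different value is a deviation.
        return [x for x in recent if x != mu]
    return [x for x in recent if abs(x - mu) > threshold * sigma]


if __name__ == "__main__":
    # Hypothetical messages-per-second samples for one broker.
    history = [100, 102, 98, 101, 99]
    recent = [100, 103, 120]
    print(deviations(history, recent))
```

The longer the stored history, the more meaningful the baseline, which is why retaining metrics for a period of time matters.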

Demo Dashboards

Kafka Cluster Metrics

A 360-degree view of the key metrics of your Kafka cluster, curated into a single template. It lets you travel back through the past 60 days (by default) of key metrics and proactively receive alerts and notifications when your streaming platform is under pressure or when signs of partial failure appear.

../../_images/kafka-cluster-metrics-overview.png

Consumer Producer Metrics

A Kafka Consumer and Producer dashboard that includes all metrics for Kafka brokers, Zookeeper, Schema Registry, Connect Distributed, REST Proxy, Lenses and any other JVM applications that are connected to Lenses Monitoring.

../../_images/kafka-producer-consumer-metrics-UI1.png

Hard Disk Usage Metrics

A dashboard that displays approximate metrics for the size (in bytes) of your topics. It is useful for planning disk capacity and for getting an overview of each topic’s size. The “Data Stored per Broker” graph can be used to detect storage imbalances between brokers.

../../_images/kafka-hard-disk-usage-metrics.png
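The idea of a storage imbalance between brokers can be expressed numerically. The sketch below sums partition sizes per broker and compares the fullest broker with the average. The input layout, the function names and the ratio itself are assumptions chosen for illustration; the dashboard does not necessarily compute imbalance this way.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def bytes_per_broker(
    partition_sizes: Iterable[Tuple[int, str, int]]
) -> Dict[int, int]:
    """Sum sizes per broker from (broker_id, topic, size_bytes) rows."""
    totals: Dict[int, int] = defaultdict(int)
    for broker_id, _topic, size_bytes in partition_sizes:
        totals[broker_id] += size_bytes
    return dict(totals)


def imbalance_ratio(totals: Dict[int, int]) -> float:
    """Bytes on the fullest broker divided by the per-broker average.

    1.0 means perfectly even; larger values mean one broker holds
    disproportionately more data. Returns 1.0 when there is no data.
    """
    if not totals:
        return 1.0
    average = sum(totals.values()) / len(totals)
    return max(totals.values()) / average if average else 1.0


if __name__ == "__main__":
    # Hypothetical per-partition sizes in bytes.
    sizes = [
        (1, "orders", 600),
        (2, "orders", 200),
        (3, "orders", 100),
        (1, "payments", 300),
    ]
    totals = bytes_per_broker(sizes)
    print(totals, imbalance_ratio(totals))
```

A ratio that keeps growing over time suggests that partitions should be reassigned before the fullest broker runs out of disk.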

Client Application Monitoring

Operational metrics from your JVM-based Kafka applications. You can use it to monitor the performance and usage of system resources in order to detect issues at an early stage. It provides full access to how JVM apps and the Garbage Collector behave, as well as to open file descriptors and other critical aspects of your own applications.

../../_images/kafka-jvm-client-application-monitoring.png