Monitoring the health of your infrastructure.
Lenses monitors the health of your infrastructure via JMX.
Additionally, Lenses has a number of built-in alerts for these services.
By default, Lenses checks your entire streaming data platform infrastructure every 10 seconds. It has the following built-in alert rules:
| Rule | This rule fires when |
|---|---|
| Lenses License | The Lenses license is invalid |
| Kafka broker is down | A Kafka broker from the cluster is not healthy |
| Zookeeper node is down | A Zookeeper node is not healthy |
| Connect Worker is down | A Kafka Connect worker node is not healthy |
| Schema Registry is down | A Schema Registry instance is not healthy |
| Under replicated partitions | The Kafka cluster has 1 or more under-replicated partitions |
| Partitions offline | The Kafka cluster has 1 or more partitions offline (partitions without an active leader) |
| Active Controller | The Kafka cluster has 0 or more than 1 active controllers |
| Multiple Broker versions | The Kafka cluster is under a version upgrade, and not all brokers have been upgraded |
| File-open descriptors on Brokers | A Kafka broker has an alarming number of open file descriptors: the operating system is using more than 90% of the available open file descriptors |
| Average % the request handler is idle | The average fraction of time the request handler threads are idle is dangerously low. The alert is HIGH when the value is smaller than 10%, and CRITICAL when it is smaller than 2% |
| Fetch requests failure | Fetch requests are failing. If the rate of failures per second is > 10%, the alert level is set to CRITICAL; otherwise it is set to HIGH |
| Produce requests failure | Produce requests are failing. If the value is > 10%, the alert level is set to CRITICAL; otherwise it is set to HIGH |
| Broker disk usage | A Kafka broker’s disk usage is greater than the cluster average. The built-in threshold is 1 GByte |
| Leader imbalance | A Kafka broker has more leader replicas than the average broker in the cluster |

If you change your Kafka cluster size or replace an existing Kafka broker with another, Lenses raises an active alert because it detects that a broker of your Kafka cluster is no longer available. If the Kafka broker has been intentionally removed, decommission it:

1. Navigate to Environments->[Your Environment]->Workspace->Services.
2. Select the broker, open the actions in its options menu, and click the Decommission option.

For versions earlier than Lenses 6.0, omit the environment selection.
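
The severity thresholds in the alert rules above can be illustrated with a small sketch. This is not Lenses code or a Lenses API: the function names and the `Severity` enum are hypothetical, and the sketch assumes that a failure rate of zero raises no alert, which the rules imply but do not state explicitly.

```python
from enum import Enum
from typing import Optional


class Severity(Enum):
    """Hypothetical alert levels, named after the levels used in the rules."""
    HIGH = "HIGH"
    CRITICAL = "CRITICAL"


def request_handler_idle_severity(idle_percent: float) -> Optional[Severity]:
    """Average % the request handler is idle: HIGH below 10%, CRITICAL below 2%."""
    if idle_percent < 2.0:
        return Severity.CRITICAL
    if idle_percent < 10.0:
        return Severity.HIGH
    return None  # enough idle capacity, no alert


def request_failure_severity(failure_percent: float) -> Optional[Severity]:
    """Fetch or produce failures: CRITICAL above 10%, otherwise HIGH."""
    if failure_percent <= 0.0:
        return None  # assumption: no failures means no alert
    if failure_percent > 10.0:
        return Severity.CRITICAL
    return Severity.HIGH


def active_controller_alert(active_controllers: int) -> bool:
    """Active Controller: fires when there are 0 or more than 1 controllers."""
    return active_controllers != 1


def file_descriptor_alert(open_fds: int, max_fds: int) -> bool:
    """File-open descriptors: fires above 90% of the available descriptors."""
    return open_fds > 0.9 * max_fds
```

For example, under these assumptions an idle percentage of 5% maps to HIGH, and a cluster reporting two active controllers triggers the Active Controller rule.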