Alert reference

Alert	Alert Identifier	Description	Category	Instance	Severity
Kafka Broker is down	1000	Raised when the Kafka broker has not been part of the cluster for at least 1 minute. For example: host-1, host-2.	Infrastructure	brokerID	INFO, CRITICAL
Zookeeper Node is down	1001	Raised when the Zookeeper node is not reachable. This information is based on the Zookeeper JMX: if the node responds to JMX queries, it is considered to be running.	Infrastructure	service name	INFO, CRITICAL
Connect Worker is down	1002	Raised when the Kafka Connect worker has not responded to the /connectors API call for more than 1 minute.	Infrastructure	worker URL	MEDIUM
Schema Registry is down	1003	Raised when the Schema Registry node has not responded to the root API call for more than 1 minute.	Infrastructure	service URL	HIGH, INFO
Under replicated partitions	1005	Raised when there are (topic, partition) pairs that do not meet the configured replication factor.	Infrastructure	partitions	HIGH, INFO
Partitions offline	1006	Raised when there are partitions which do not have an active leader. These partitions are not writable or readable.	Infrastructure	brokers	HIGH, INFO
Active Controllers	1007	Raised when the number of active controllers is not 1. Each cluster should have exactly one controller.	Infrastructure	brokers	HIGH, INFO
Multiple Broker Versions	1008	Raised when brokers in the cluster are running different Kafka versions.	Infrastructure	brokers versions	HIGH, INFO
File-open descriptors high capacity on Brokers	1009	Raised when a broker has too many open file descriptors.	Infrastructure	brokerID	HIGH, INFO, CRITICAL
Average % the request handler is idle	1010	Raised when the average fraction of time the request handler threads are idle falls below a threshold. When the value is smaller than 0.02, the alert level is CRITICAL. When the value is smaller than 0.1, the alert level is HIGH.	Infrastructure	brokerID	HIGH, INFO, CRITICAL
Fetch requests failure	1011	Raised when the rate of failed Fetch requests (per second) is greater than a threshold. If the value is greater than 0.1, the alert level is set to CRITICAL; otherwise it is set to HIGH.	Infrastructure	brokerID	HIGH, INFO, CRITICAL
Produce requests failure	1012	Raised when the rate of failed Produce requests (per second) is greater than a threshold. If the value is greater than 0.1, the alert level is set to CRITICAL; otherwise it is set to HIGH.	Infrastructure	brokerID	HIGH, INFO, CRITICAL
Broker disk usage is greater than the cluster average	1013	Raised when the Kafka broker disk usage is greater than the cluster average. The default threshold is 1 GB of disk usage.	Infrastructure	brokerID	MEDIUM, INFO
Leader Imbalance	1014	Raised when the Kafka Broker has more leader replicas than the cluster average.	Infrastructure	brokerID	INFO
Consumer Lag exceeded	2000	Raised when the consumer lag exceeds the threshold on any partition.	Consumers	topic	HIGH, INFO
Connector deleted	3000	Raised when a connector is deleted.	Kafka Connect	connector name	INFO
Topic has been created	4000	Raised when a new topic is added.	Topics	topic	INFO
Topic has been deleted	4001	Raised when a topic is deleted.	Topics	topic	INFO
Topic data has been deleted	4002	Raised when records are deleted from a topic.	Topics	topic	INFO
Data Produced	5000	Raised when the data produced on a topic does not match the expected threshold.	Data Produced	topic	LOW, INFO
Connector Failed	6000	Raised when a connector, or any worker in a connector, is down.	Apps	connector	LOW, INFO
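
The severity thresholds described for alerts 1010, 1011 and 1012 can be sketched as follows. This is only an illustration of the documented values (0.02 and 0.1), not the product's implementation. The function names and the `raise_threshold` parameter are hypothetical, and returning `None` when no threshold is crossed is an assumption, since the table does not say what happens in that case.

```python
from typing import Optional


def idle_handler_severity(idle_fraction: float) -> Optional[str]:
    """Alert 1010: map the average request handler idle fraction to a severity."""
    if idle_fraction < 0.02:
        return "CRITICAL"
    if idle_fraction < 0.1:
        return "HIGH"
    # Assumption: nothing is raised at or above 0.1 (not stated in the table).
    return None


def failed_request_severity(
    failed_per_sec: float, raise_threshold: float = 0.0
) -> Optional[str]:
    """Alerts 1011 and 1012: map the failed request rate (per second) to a severity.

    raise_threshold is a hypothetical parameter; the table only says the alert
    is raised when the rate is "greater than a threshold".
    """
    if failed_per_sec <= raise_threshold:
        return None
    if failed_per_sec > 0.1:
        return "CRITICAL"
    return "HIGH"


if __name__ == "__main__":
    print(idle_handler_severity(0.05))
    print(failed_request_severity(0.2))
```

Keeping the two rules as separate functions reflects that the idle-fraction alert fires on low values while the failure-rate alerts fire on high values.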

Last modified: July 26, 2024