Alertmanager Integration

Lenses comes with an alerting subsystem that can be tailored to individual needs. You can find more information about the alerting subsystem in the user guide.

For an alerting system to be complete, alerts usually need to be managed and delivered as notifications. In simple terms, there has to be a way for an alert to reach the proper team within the proper time frame — without overwhelming teams with alerts that do not concern them, duplicating entries, or producing alerts that are merely byproducts of a top-level alert. Hence, Lenses integrates with the Alertmanager software, which provides alert deduplication, grouping and routing to various systems (such as email, PagerDuty, Slack), as well as silencing and inhibition.

Lenses Configuration

To configure the Alertmanager integration, only a couple of configuration entries are required. The only mandatory option is the Alertmanager endpoint setting. Alertmanager clusters (multiple endpoints) are fully supported:
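A minimal sketch of the relevant entry in lenses.conf — the key name follows the Lenses configuration reference, and the hosts and ports are placeholders to adjust for your deployment:

```
# Comma-separated list of Alertmanager endpoints; listing more than one
# enables support for an Alertmanager cluster (hosts are examples)
lenses.alert.manager.endpoints = "http://alertmanager-1:9093,http://alertmanager-2:9093"
```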


A useful option is the generator URL, where the Lenses address should be set. This way alerts will include a link to Lenses, which the recipient can use to quickly navigate to the web interface:
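A sketch of the corresponding entry, again per the configuration reference — the Lenses address shown is a hypothetical example:

```
# Base URL of the Lenses installation; it is embedded in each alert as
# the generatorURL, so recipients can jump straight to the web interface
lenses.alert.manager.generator.url = "http://lenses.example.com:9991"
```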


For the complete list of options, please visit the configuration reference.


Lenses® | Alertmanager Service Status

Alerts Attributes

Although Alertmanager's inner workings and configuration are beyond the scope of this guide, it is useful to briefly go into some of the details. This way it will be easier to get the most out of this feature.

Alerts are posted to Alertmanager as JSON objects. Each alert has a set of labels and a set of annotations. The set of labels is what uniquely identifies the alert, whilst the annotations serve as more elaborate descriptions of the event.

Alertmanager can use the set of labels in order to deduplicate, group, route, silence and inhibit alerts, whilst the annotations (and a field called generatorURL) can be sent, along with the labels, to a recipient to help quickly understand the issue.
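For illustration, an alert posted to Alertmanager's HTTP alerts API might look like the following. The field values here are hypothetical; the shape (labels, annotations, generatorURL) follows the Alertmanager alerts API:

```json
[
  {
    "labels": {
      "category": "Infrastructure",
      "instance": "PLAINTEXT://broker-1:9092",
      "severity": "HIGH"
    },
    "annotations": {
      "source": "Lenses",
      "summary": "Broker is offline"
    },
    "generatorURL": "http://lenses.example.com:9991"
  }
]
```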

Lenses alerts carry these main labels [1]:

    label name   description                                        values
    category     the category of the alert                          Infrastructure, Consumers, Kafka Connect, Topics
    instance     the URL or subsystem that triggered the event      can be the address of a broker, a description like UnderReplication, etc
    severity     the severity of the event                          INFO, MEDIUM, HIGH, CRITICAL [2]

[1] There are additional labels, but they vary by alert. Only these three are present in all alerts and can be considered stable, so the Alertmanager configuration should be built around them.
[2] There is also a LOW level, but it is currently unused.

Lenses alerts have these annotations:

    annotation name   description               values
    source            the source of the event   Lenses by default, unless configured otherwise
    summary           the summary of the event  depends on the alert

Alertmanager Example

In the example below, Alertmanager is configured with three receivers: default, urgent and emergency, backed by three notification backends: email, Slack and Pushover. The default receiver sends events only to Slack. The urgent receiver sends events to both email and Slack, whilst the emergency receiver sends to all three backends.

The routing rules send Infrastructure category events of HIGH or CRITICAL severity to the emergency receiver, so the team receives push notifications on their mobile phones and can act immediately. Events of HIGH or CRITICAL severity in any other category are sent to the urgent receiver, so the proper team member gets an email notification. Finally, all remaining events (those that didn't match any routing rule) are sent to the default receiver, which posts them to a Slack channel where a team member can review them at a convenient time.

Two inhibition rules are also set. If a CRITICAL alert is triggered, Alertmanager will not send notifications for any other events until the CRITICAL issue is resolved. This is because the main problem (for example, a broker that can no longer serve requests) will cause further problems; team members should not be flooded with notifications, but rather get one notification for the root cause. The second inhibition rule applies to events of severity HIGH: events from the same instance with lower severity are inhibited until the main alert for that instance is resolved.

For the Slack notifications, a custom text template is set which includes the summary, source and generatorURL.

global:
  smtp_auth_username: SMTP_USER
  smtp_auth_password: SMTP_PASS

route:
  # If an alert does not match any rule, it goes to the default:
  receiver: 'default'
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  group_by: [category, severity]
  routes:
  - receiver: emergency
    match_re:
      severity: HIGH|CRITICAL
    match:
      category: Infrastructure
  - receiver: urgent
    match_re:
      severity: HIGH|CRITICAL

inhibit_rules:
- source_match:
    severity: 'CRITICAL'
  target_match_re:
    severity: 'INFO|MEDIUM|HIGH'
  equal: ['source']
- source_match:
    severity: 'HIGH'
  target_match_re:
    severity: 'INFO|MEDIUM'
  equal: ['instance', 'source']

receivers:
- name: 'default'
  slack_configs:
  - channel: alerts
    send_resolved: true
    text: "{{ range .Alerts }}{{ .Labels.instance }}: {{ .Annotations.summary }}.\nVia: {{ .Annotations.source }}\nGenerator: {{ .GeneratorURL }}\n{{ end }}"

- name: 'urgent'
  email_configs:
  - to: ',,'
  slack_configs:
  - channel: alerts
    send_resolved: true
    text: "{{ range .Alerts }}{{ .Labels.instance }}: {{ .Annotations.summary }}.\nVia: {{ .Annotations.source }}\nGenerator: {{ .GeneratorURL }}\n{{ end }}"

- name: 'emergency'
  pushover_configs:
  - user_key: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    token: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    expire: 2m
  email_configs:
  - to: ',,'
  slack_configs:
  - channel: alerts
    send_resolved: true
    text: "{{ range .Alerts }}{{ .Labels.instance }}: {{ .Annotations.summary }}\nVia: {{ .Annotations.source }}\nGenerator: {{ .GeneratorURL }}\n{{ end }}"

Alerts Reference

Each alert is configured to fire on multiple criteria. Depending on the conditions, a number of tags are applied to each alert (for example, the topic name), and the description and severity of the alert are adjusted appropriately. The list of alerts Lenses can produce, along with their category, instance and severity, can be found below.

Please note that alerts of severity INFO are not sent to Alertmanager.

Alerts Table
    Alert Description                                  Category        Instance          Severity
    New topic was added                                Topics          topic             INFO
    Topic was deleted                                  Topics          topic             INFO
    Connector was deleted                              Kafka Connect   connector name    INFO
    Status of Schema Registry                          Infrastructure  service URL       INFO, HIGH
    Some partitions are under replicated               Infrastructure  partitions        INFO, HIGH
    Rate of failed requests is above threshold         Infrastructure  brokerID          INFO, HIGH
    A consumer group is falling behind                 Consumers       topic             INFO, HIGH
    A broker has a large number of leader replicas     Infrastructure  brokerID          INFO
    A broker has too many open file descriptors        Infrastructure  brokerID          INFO, HIGH, CRITICAL
    High number of active controllers                  Infrastructure  brokers           INFO, HIGH
    Connect client has gone offline                    Infrastructure  worker URL        MEDIUM
    Zookeeper node is offline                          Infrastructure  service name      INFO, CRITICAL
    Broker is almost fully utilized                    Infrastructure  brokerID          INFO, HIGH, CRITICAL
    Brokers version mismatch                           Infrastructure  brokers versions  INFO, HIGH
    Some partitions are offline                        Infrastructure  brokers           INFO, HIGH
    Rate of failed fetch requests is above threshold   Infrastructure  brokerID          INFO, HIGH, CRITICAL
    Disk usage of broker is higher than average        Infrastructure  brokerID          INFO, MEDIUM
    Broker is offline                                  Infrastructure  brokerID          INFO, CRITICAL
    License is invalid                                 Infrastructure  lenses            CRITICAL
    Records from topic were deleted                    Infrastructure  topic             INFO