What’s the use of monitoring if you can’t raise alerts? It’s half a solution at
best. Now that I have basic monitoring working, as discussed in
Prometheus experiments with docker-compose,
it felt like it was time to add AlertManager, Prometheus’ frequent
partner in crime, so I can investigate raising, handling and resolving
alerts. Unfortunately this turned out to be a lot harder than ‘just’
adding a basic exporter.
Before we delve into the issues and how I worked around them in my
implementation, let’s see the result of all the work: adding a redis alert and
forcing it to trigger. Ignoring all the implementation details for now, we need to
do four things to add AlertManager to our experiments:
- add the AlertManager container
- tell Prometheus how to contact AlertManager
- tell Prometheus where the alert rules files are located
- add an alerting rule to confirm everything is connected
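The first step can be sketched as a docker-compose service along these lines. The image tag and port mapping here are my assumptions rather than the repo's exact values; the service name matches the `alert-manager:9093` target used in the Prometheus config later on:

```yaml
# illustrative service definition for step one; the image tag is an
# assumption, not necessarily the version used in the repo
alert-manager:
  image: prom/alertmanager:v0.14.0
  ports:
    - "9093:9093"
```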
Assuming we’re in the root of the repository, we’ll run our
docker-compose command to create all the instances we need for
this experiment. You can confirm all the containers are available by running
`docker ps`.
In this screenshot you can see the Prometheus alerting page, with our
RedisDown alert against a green background, as everything is working correctly. We can also
see the RedisDown alerting rule configuration.
This rule checks the `redis_up` value returned by the redis exporter. If
redis is down the value will be 0, and if it doesn’t recover in the next minute
the rule will trigger an alert. It’s worth noting here that you can confirm your rules files are
valid using this, less scary than it looks, command:

```shell
# the left hand argument to `-v` is the local file from this repo.
docker run -it -v $(pwd)/fileof.rules:/fileof.rules \
  --entrypoint=promtool prom/prometheus:v2.1.0 check rules /fileof.rules

SUCCESS: 1 rules found
```
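For reference, the RedisDown rule itself can be written something like this. The group name and annotation layout are illustrative assumptions; the expression, the one minute wait and the summary text come from the behaviour described above:

```yaml
groups:
  - name: redis
    rules:
      - alert: RedisDown
        # redis_up is exported by the redis exporter; 0 means redis is down
        expr: redis_up == 0
        # only fire if redis stays down for a full minute
        for: 1m
        annotations:
          summary: Redis Availability alert.
```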
Everything seems to be configured correctly, so let’s break it and
confirm alerting is working. First we will kill the redis container.
This will cause the exporter to change the value of `redis_up` to 0.
```shell
# kill the container
docker kill prometheusserver_redis-server_1

# check it has exited
docker ps -a | grep prometheusserver_redis-server_1

# simplified output
library/redis:4.0.8  Exited (137) 2 minutes ago  prometheusserver_redis-server_1
```
The alert will then change to “State PENDING” on the Prometheus alerts page.
Once the minute is up it will change to “State FIRING” and, if everything is
working, appear in AlertManager too.
In addition to using the web UI you can query AlertManager directly
from the command line using the docker container:

```shell
docker exec -ti prometheusserver_alert-manager_1 amtool \
  --alertmanager.url http://127.0.0.1:9093 alert

Alertname  Starts At                Summary
RedisDown  2018-03-09 18:33:58 UTC  Redis Availability alert.
```
At this point we have a basic but working AlertManager running alongside our
local prometheus. It’s far from a complete or comprehensive configuration, and
the alerts don’t yet go anywhere, but it’s a solid base to start your own
experiments from. You can see all the code to make this work in the
accompanying repository.
Now that we’ve covered how AlertManager fits into our tests, and how to confirm it’s
working, we will delve into how it’s configured, something
that was much more work than I expected. Prometheus, by design, runs with
a single configuration file.
While this is fine for a number of use cases, my design goal of combining any
combination of docker-compose files to create a test environment doesn’t play
well with it. This became clear to me when I needed to add the alertmanager
configuration to the main config file, but only when alertmanager is included.
The config to enable AlertManager and its alerting rules is concise, and looks
something like this (the exact rules path will depend on where the fragments
are mounted):

```yaml
rule_files:
  - '/etc/prometheus/*.rules'

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alert-manager:9093']
```

The first part, `rule_files:`, accepts wild card selection of alert
rule files. Each of these files contains one or more alert rules, such as the
RedisDown example above. This globbing makes it easy to add rules
to prometheus from each included component. The second part tells
prometheus where it can find the AlertManager instance it should raise
alerts against.
In order to use these configs I had to add another step to running prometheus:
collecting all the configuration snippets and combining them into a single
file before starting the process. My first thought was to create my own Prometheus
container and preprocess the configuration before starting the daemon. I
quickly decided against this as I don’t want to be responsible for
maintaining my own fork of the Dockerfile. I was also worried about
timing issues and start up race conditions from all the other containers
adding their configs. Instead I decided to add another container.
This tiny busybox based container
runs a short shell script in a loop. The script concatenates all the
configuration fragments together, starting with the base config. If the
complete config file has changed it replaces the existing, volume
mounted, file, which prometheus then detects as changed and reloads.
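The concatenation step can be sketched in a few lines of shell. The directory layout, the `*.fragment.yml` naming convention and the target path below are my assumptions for illustration, not the repo's actual values:

```shell
#!/bin/sh
# Sketch of the config-combining script; paths and file names are assumptions.

# where the component containers drop their config fragments
CONF_DIR="${CONF_DIR:-/fragments}"
# the volume-mounted file that prometheus reads and watches
TARGET="${TARGET:-/etc/prometheus/prometheus.yml}"

merge_config() {
  # concatenate the base config first, then every other fragment
  cat "$CONF_DIR/base.yml" "$CONF_DIR"/*.fragment.yml > /tmp/combined.yml 2>/dev/null
  # only replace the live file if the result differs, so prometheus
  # isn't made to reload a config that hasn't actually changed
  if ! cmp -s /tmp/combined.yml "$TARGET"; then
    mv /tmp/combined.yml "$TARGET"
  fi
}

# the container wraps this in a loop, e.g.
# while true; do merge_config; sleep 5; done
```

Doing the comparison before the move means prometheus only sees the file change, and only reloads, when a fragment genuinely differs from the last pass.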
I have a strong suspicion I’ll be revisiting this part of the project
again and splitting the fragments further. Adding ordering will probably be
required, as some of the exporters (such as MySQL) can’t be configured as
targets via the `file_sd_configs` mechanism. However, for now it’s allowed me to
test the basic alerting functionality and continue to delve more deeply into
alerting.