How to Set Up Alertmanager
Alertmanager is a critical component of the Prometheus monitoring ecosystem, designed to handle alerts sent by Prometheus servers and route them to the appropriate notification channels. Whether you're managing cloud infrastructure, microservices, or on-premise systems, effective alerting is non-negotiable for maintaining system reliability and minimizing downtime. Alertmanager doesn't just send notifications; it consolidates, deduplicates, and silences alerts to prevent alert fatigue, ensuring that your team receives only the most relevant and actionable information.
Unlike basic alerting tools that fire off every minor anomaly, Alertmanager provides intelligent routing based on labels, grouping rules, and time-based policies. It supports integrations with email, Slack, PagerDuty, Microsoft Teams, webhooks, and more, making it highly adaptable to any operational workflow. Setting up Alertmanager correctly is not just a technical task; it's a strategic decision that impacts your team's responsiveness, system uptime, and overall operational maturity.
This guide walks you through every step of configuring Alertmanager from scratch, covering installation, configuration, integration with Prometheus, best practices, real-world examples, and troubleshooting. By the end, you'll have a fully functional, production-ready alerting system that scales with your infrastructure.
Step-by-Step Guide
Prerequisites
Before beginning the setup, ensure you have the following:
- A Linux-based server (Ubuntu 20.04/22.04 or CentOS 8/9 recommended)
- Access to the command line with sudo privileges
- Prometheus server already installed and running
- Basic understanding of YAML configuration files
- Network access to external notification services (e.g., Slack, email SMTP)
Alertmanager is designed to work alongside Prometheus, so if you haven't installed Prometheus yet, begin by following the official Prometheus installation guide. Once Prometheus is operational, proceed with Alertmanager setup.
Step 1: Download and Install Alertmanager
Alertmanager is distributed as a binary executable. Visit the official GitHub releases page to find the latest stable version. As of this writing, version 0.26.0 is recommended for production use.
Use wget to download the binary:
wget https://github.com/prometheus/alertmanager/releases/download/v0.26.0/alertmanager-0.26.0.linux-amd64.tar.gz
Extract the archive:
tar xvfz alertmanager-0.26.0.linux-amd64.tar.gz
Move the extracted files to a standard system location:
sudo mv alertmanager-0.26.0.linux-amd64/alertmanager /usr/local/bin/
sudo mv alertmanager-0.26.0.linux-amd64/amtool /usr/local/bin/
Create a dedicated system user for Alertmanager to run under for security:
sudo useradd --no-create-home --shell /bin/false alertmanager
Step 2: Create Configuration Directories
Organize your Alertmanager files in a standard structure:
sudo mkdir -p /etc/alertmanager
sudo mkdir -p /etc/alertmanager/templates
sudo mkdir -p /var/lib/alertmanager
Set ownership to the alertmanager user:
sudo chown alertmanager:alertmanager /usr/local/bin/alertmanager
sudo chown alertmanager:alertmanager /usr/local/bin/amtool
sudo chown alertmanager:alertmanager /etc/alertmanager
sudo chown alertmanager:alertmanager /var/lib/alertmanager
Step 3: Create the Alertmanager Configuration File
The core of Alertmanager is its configuration file: /etc/alertmanager/alertmanager.yml. This YAML file defines how alerts are routed, grouped, silenced, and notified.
Create the file:
sudo nano /etc/alertmanager/alertmanager.yml
Here's a minimal but functional configuration:
global:
  resolve_timeout: 5m
  smtp_smarthost: 'smtp.gmail.com:587'
  smtp_from: 'your-email@gmail.com'
  smtp_auth_username: 'your-email@gmail.com'
  smtp_auth_password: 'your-app-password'
  smtp_hello: 'localhost'
  smtp_require_tls: true

route:
  group_by: ['alertname', 'cluster', 'service']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 3h
  receiver: 'email-notifications'

receivers:
  - name: 'email-notifications'
    email_configs:
      - to: 'ops-team@example.com'
        html: '{{ template "email.default.html" . }}'
        headers:
          subject: '[Alertmanager] {{ .CommonLabels.alertname }} - {{ .CommonLabels.severity }}'

templates:
  - '/etc/alertmanager/templates/email.tmpl'
Let's break down the key sections:
- global: Defines default settings for all alerts, including SMTP server details for email notifications and timeout values.
- route: Determines how alerts are grouped and routed. group_by ensures similar alerts are bundled together, group_wait delays the initial notification so related alerts can accumulate, and repeat_interval controls how often a still-firing alert is re-notified.
- receivers: Specifies where alerts should be sent; in this case, an email receiver named email-notifications.
- templates: Points to custom email templates for richer notification formatting.
For production environments, avoid hardcoding passwords. Use file-based secrets or secret management tools like HashiCorp Vault or Kubernetes Secrets.
Step 4: Create a Custom Email Template (Optional but Recommended)
Custom templates improve readability and provide context. Create a template file:
sudo nano /etc/alertmanager/templates/email.tmpl
Add the following Go template:
{{ define "email.default.html" }}
<html>
  <head>
    <style>
      body { font-family: Arial, sans-serif; }
      .alert { background-color: #f8d7da; border-left: 4px solid #721c24; padding: 10px; margin: 10px 0; }
      .label { font-weight: bold; color: #495057; }
    </style>
  </head>
  <body>
    <h2>Alertmanager Notification</h2>
    <div class="alert">
      <p><span class="label">Alert Name:</span> {{ .CommonLabels.alertname }}</p>
      <p><span class="label">Severity:</span> {{ .CommonLabels.severity }}</p>
      <p><span class="label">Instance:</span> {{ .CommonLabels.instance }}</p>
      <p><span class="label">Description:</span> {{ .CommonAnnotations.description }}</p>
      {{ range .Alerts }}
      <p><span class="label">Start Time:</span> {{ .StartsAt }}</p>
      <p><a href="{{ .GeneratorURL }}">View in Prometheus</a></p>
      {{ end }}
    </div>
  </body>
</html>
{{ end }}
This template renders a clean, styled HTML email with key alert details and a direct link to the Prometheus UI for deeper investigation.
Step 5: Configure Prometheus to Send Alerts to Alertmanager
Alertmanager doesn't generate alerts; it receives them from Prometheus. You must configure Prometheus to forward alerts to Alertmanager.
Open your Prometheus configuration file (typically /etc/prometheus/prometheus.yml):
sudo nano /etc/prometheus/prometheus.yml
Add or update the alerting section:
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - localhost:9093
Ensure that the alerting block is at the same level as scrape_configs and not nested inside it.
Also, verify that your alert rules are defined in a separate file (e.g., /etc/prometheus/alerts.yml) and referenced in the main config:
rule_files:
  - "alerts.yml"
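Putting the pieces together, the relevant top level of prometheus.yml looks roughly like this (a sketch; the scrape job shown is illustrative and your own scrape_configs will differ):

```yaml
# Sketch of prometheus.yml top-level layout: alerting and rule_files
# sit beside scrape_configs, not nested inside it.
global:
  scrape_interval: 15s

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['localhost:9093']

rule_files:
  - "alerts.yml"

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
```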
Example alert rule (/etc/prometheus/alerts.yml):
groups:
  - name: example
    rules:
      - alert: HighRequestLatency
        expr: job:request_latency_seconds:mean5m{job="myjob"} > 0.5
        for: 10m
        labels:
          severity: page
        annotations:
          summary: "High request latency detected"
          description: "{{ $labels.instance }} has a high request latency of {{ $value }}s."
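Before deploying, promtool can also unit-test this rule against synthetic series. A minimal sketch (the test file name, instance label, and sample values are illustrative); run it with promtool test rules alerts_test.yml:

```yaml
# alerts_test.yml -- unit test for the HighRequestLatency rule above
rule_files:
  - alerts.yml

tests:
  - interval: 1m
    input_series:
      # Latency pinned at 0.7s, one sample per minute for 16 minutes
      - series: 'job:request_latency_seconds:mean5m{job="myjob", instance="host1"}'
        values: '0.7+0x15'
    alert_rule_test:
      # After 11 minutes the "for: 10m" condition has been satisfied
      - eval_time: 11m
        alertname: HighRequestLatency
        exp_alerts:
          - exp_labels:
              severity: page
              job: myjob
              instance: host1
            exp_annotations:
              summary: "High request latency detected"
              description: "host1 has a high request latency of 0.7s."
```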
Restart Prometheus after making changes:
sudo systemctl restart prometheus
Step 6: Create a Systemd Service for Alertmanager
To ensure Alertmanager starts automatically on boot and restarts on failure, create a systemd service file:
sudo nano /etc/systemd/system/alertmanager.service
Add the following content:
[Unit]
Description=Alertmanager
Wants=network-online.target
After=network-online.target
[Service]
Type=simple
User=alertmanager
Group=alertmanager
ExecStart=/usr/local/bin/alertmanager \
    --config.file=/etc/alertmanager/alertmanager.yml \
    --storage.path=/var/lib/alertmanager \
    --web.listen-address=0.0.0.0:9093
Restart=always
RestartSec=5
[Install]
WantedBy=multi-user.target
Reload systemd and enable the service:
sudo systemctl daemon-reload
sudo systemctl enable alertmanager
sudo systemctl start alertmanager
Verify the service is running:
sudo systemctl status alertmanager
You should see active (running). If not, check logs with:
journalctl -u alertmanager -f
Step 7: Access the Alertmanager Web UI
Alertmanager includes a built-in web interface for monitoring active alerts, silences, and configuration status. By default, it runs on port 9093.
Open your browser and navigate to:
http://your-server-ip:9093
You'll see a dashboard showing:
- Active alerts grouped by labels
- Silence management interface
- Configuration and runtime status
Test the setup by posting a synthetic alert directly to Alertmanager's API:
curl -X POST -H "Content-Type: application/json" -d '[{"labels": {"alertname": "TestAlert", "severity": "warning"}}]' http://localhost:9093/api/v2/alerts
Or temporarily stop one of the targets Prometheus scrapes so that an up == 0 based alert fires. Within minutes, you should receive an email notification and see the alert appear in the web UI.
Step 8: Integrate with Slack (Optional but Highly Recommended)
Slack is one of the most popular notification channels. To integrate:
- Create an incoming webhook in your Slack workspace: go to https://your-workspace.slack.com/apps, search for "Incoming Webhooks", add it to your workspace, create a new webhook, and copy the webhook URL.
- Update your alertmanager.yml to include a Slack receiver:
receivers:
  - name: 'slack-notifications'
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/YOUR/WEBHOOK/URL'
        channel: '#alerts'
        text: |
          {{ .CommonLabels.alertname }} - {{ .CommonLabels.severity }}
          {{ range .Alerts }}
          *Description:* {{ .Annotations.description }}
          *Instance:* {{ .Labels.instance }}
          *Starts At:* {{ .StartsAt.Format "2006-01-02 15:04:05" }}
          <{{ .GeneratorURL }}|View in Prometheus>
          {{ end }}
        send_resolved: true
  - name: 'email-notifications'
    email_configs:
      - to: 'ops-team@example.com'
        html: '{{ template "email.default.html" . }}'

route:
  group_by: ['alertname', 'cluster', 'service']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 3h
  receiver: 'slack-notifications'
  routes:
    - match:
        severity: 'page'
      receiver: 'slack-notifications'
    - match:
        severity: 'warning'
      receiver: 'email-notifications'
Key points:
- Use send_resolved: true to notify when an alert clears.
- Use nested routes to send critical alerts to Slack and warnings to email.
- Always test webhook integration with a dummy alert before relying on it in production.
Restart Alertmanager after changes:
sudo systemctl restart alertmanager
Best Practices
1. Use Labels and Annotations Effectively
Labels (e.g., severity, instance, job) are used for grouping and routing. Annotations (e.g., description, summary) provide human-readable context. Always define consistent labels across all alert rules. Use severity with values like info, warning, critical, and page to enable tiered alerting.
2. Avoid Alert Storms with Grouping and Suppression
Alertmanager's grouping feature reduces noise by combining similar alerts. For example, if 50 servers lose connectivity due to a network outage, Alertmanager sends one grouped alert instead of 50 individual ones. Combine this with group_wait (e.g., 30s) to allow time for multiple alerts to accumulate before notification.
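Suppression goes beyond grouping: an inhibit_rules block in alertmanager.yml can mute lower-severity alerts while a related critical alert is firing. A minimal sketch, assuming your rules carry severity and cluster labels:

```yaml
# Mutes warning-level alerts for a cluster while a critical alert
# with the same cluster label value is firing.
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['cluster']
```

Newer Alertmanager releases also accept the matcher-based source_matchers/target_matchers syntax for the same purpose.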
3. Set Appropriate Repeat Intervals
Too frequent repeats (e.g., every 5 minutes) cause alert fatigue. For critical alerts, a repeat_interval of 1 to 3 hours is usually sufficient. Use repeat_interval to prevent repetitive notifications for unresolved issues.
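Child routes can override repeat_interval per severity; a sketch, reusing the receiver names from earlier in this guide:

```yaml
route:
  receiver: 'email-notifications'
  repeat_interval: 12h            # default for low-urgency alerts
  routes:
    - match:
        severity: 'critical'
      receiver: 'slack-notifications'
      repeat_interval: 2h         # re-notify critical issues more often
```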
4. Use Silences Strategically
Silences allow you to temporarily mute alerts during maintenance windows or known outages. Always include a reason and expiration time when creating a silence. Use the web UI or amtool to manage them:
amtool silence add --alertmanager.url=http://localhost:9093 --author="admin" --comment="Maintenance" --duration="2h" alertname=HighCPUUsage
5. Separate Environments
Use different Alertmanager instances or routing rules for dev, staging, and production. For example, route all dev alerts to a #dev-alerts Slack channel and production alerts to #prod-alerts. This prevents noise from non-critical environments.
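One way to express this separation in a single Alertmanager, assuming each Prometheus server attaches an env label to its alerts (receiver names are hypothetical):

```yaml
route:
  receiver: 'prod-slack'          # default: treat unlabeled alerts as production
  routes:
    - match:
        env: 'dev'
      receiver: 'dev-slack'
    - match:
        env: 'staging'
      receiver: 'dev-slack'
```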
6. Secure Configuration Files
Never store secrets like SMTP passwords or Slack webhook URLs in plaintext. Alertmanager does not expand environment variables inside its configuration file, so use the file-based secret options available in recent releases instead:
smtp_auth_password_file: '/etc/alertmanager/smtp_password'
Then create the secret file with tight permissions and restart Alertmanager:
echo 'your_app_password' | sudo tee /etc/alertmanager/smtp_password
sudo chown alertmanager:alertmanager /etc/alertmanager/smtp_password
sudo chmod 600 /etc/alertmanager/smtp_password
sudo systemctl restart alertmanager
Alternatively, use tools like Vault, AWS Secrets Manager, or Kubernetes Secrets in containerized environments.
7. Monitor Alertmanager Itself
Alertmanager exposes metrics at /metrics. Set up a Prometheus scrape job for Alertmanager to monitor its health:
- job_name: 'alertmanager'
  static_configs:
    - targets: ['localhost:9093']
Then create an alert for when Alertmanager is down:
- alert: AlertmanagerDown
  expr: up{job="alertmanager"} == 0
  for: 5m
  labels:
    severity: critical
  annotations:
    summary: "Alertmanager is down"
    description: "Alertmanager has been unreachable for 5 minutes."
8. Test Alert Rules Before Deployment
Use Prometheus's promtool to validate alert rules:
promtool check rules /etc/prometheus/alerts.yml
Also, use amtool config routes test to check which receiver a given set of labels would be routed to before relying on the configuration in production.
Tools and Resources
Official Documentation
- Alertmanager Official Docs: The authoritative source for configuration options and behavior.
- Prometheus Alerting Rules: Learn how to write effective alert expressions.
Configuration Validators
- promtool: Command-line utility to validate Prometheus configurations and rule files (use amtool for Alertmanager configs).
- YAML Linter: Use online tools like YAMLLint to catch syntax errors before restarting services.
Template Libraries
- Default Alertmanager Templates: Reference Go templates for email, Slack, and other formats.
- Alertmanager Template Builder: Community tools like alertmanager-template-builder help generate complex templates visually.
Integration Guides
- Grafana Alerting with Alertmanager: Integrate with Grafana for unified dashboards and alerts.
- PagerDuty Integration: For enterprise-grade on-call scheduling.
- Microsoft Teams Integration: Send alerts directly into Teams channels.
Monitoring and Debugging Tools
- amtool: Command-line interface for managing silences, checking configurations, and testing routing.
- curl: Test Alertmanager's HTTP API, e.g. curl http://localhost:9093/api/v2/alerts
- Wireshark / tcpdump: For debugging webhook delivery failures.
Community and Support
- Prometheus Community: Join the mailing list or Slack channel for real-time help.
- Stack Overflow (prometheus tag): Search for common configuration issues.
- GitHub Issues: Report bugs or request features.
Real Examples
Example 1: E-Commerce Platform Alerting
Scenario: An online store uses Prometheus to monitor API latency, error rates, and database connections.
Alert Rules:
- High API Error Rate: rate(http_requests_total{status=~"5.."}[5m]) > 0.05
- Database Connection Pool Exhausted: database_connections{type="active"} / database_connections{type="max"} > 0.9
- Checkout Service Unreachable: up{job="checkout-service"} == 0
Alertmanager Routing:
- All severity: page alerts → Slack channel #prod-alerts + PagerDuty
- All severity: warning alerts → Email to dev team + Slack channel #dev-warnings
- Alerts with service=checkout → Escalate to on-call engineer after 15 minutes
Outcome: During a Black Friday sale, a spike in errors triggered a grouped alert. The team responded within 3 minutes, identified a misconfigured load balancer, and restored service before revenue loss occurred.
Example 2: Kubernetes Cluster Monitoring
Scenario: A team manages 20+ Kubernetes clusters across multiple regions.
Alert Rules:
- Kubelet Down: up{job="kubelet"} == 0
- Pod CrashLoopBackOff: sum by (namespace, pod) (kube_pod_container_status_restarts_total) > 5
- Node Memory Pressure: node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes below a chosen threshold (e.g., < 0.1)
Alertmanager Configuration:
- Group by cluster, namespace, alertname
- Send cluster-wide alerts to #k8s-alerts
- Send namespace-specific alerts to team Slack channels (e.g., #team-frontend, #team-backend)
- Use silences for scheduled maintenance windows
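A sketch of the routing this implies (receiver names are hypothetical and map onto the Slack channels mentioned):

```yaml
route:
  group_by: ['cluster', 'namespace', 'alertname']
  receiver: 'k8s-alerts'              # cluster-wide default
  routes:
    - match:
        namespace: 'frontend'
      receiver: 'team-frontend'
    - match:
        namespace: 'backend'
      receiver: 'team-backend'
```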
Outcome: During a node upgrade, 120 alerts were grouped into 12 consolidated messages. The team avoided alert fatigue and focused on the root cause.
Example 3: Hybrid Cloud Infrastructure
Scenario: A company runs workloads on AWS, Azure, and on-premises data centers.
Challenge: Different teams manage different environments with varying SLAs.
Solution:
- Use labels: cloud_provider=aws, cloud_provider=azure, cloud_provider=onprem
- Route AWS alerts to the cloud team, on-prem alerts to internal IT
- Set longer repeat intervals for on-prem alerts (4h) due to slower response cycles
- Use a custom webhook to send alerts to an internal ticketing system
Result: Reduced misrouted alerts by 85%. Each team now receives only alerts relevant to their domain.
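The custom webhook in this example maps onto Alertmanager's generic webhook receiver; a sketch with a hypothetical internal endpoint:

```yaml
receivers:
  - name: 'ticketing-webhook'
    webhook_configs:
      - url: 'http://tickets.internal.example/api/alerts'  # hypothetical endpoint
        send_resolved: true   # lets the ticketing system close tickets on resolve
```

Alertmanager POSTs a JSON payload of grouped alerts to this URL, which the ticketing system can parse to open or update tickets.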
FAQs
Q1: Can Alertmanager work without Prometheus?
No. Alertmanager is designed as a companion to Prometheus; it does not generate alerts, it only receives and routes them. Other monitoring systems (e.g., Zabbix, Datadog) have their own alerting engines.
Q2: How do I test if my Alertmanager configuration is valid?
Use the amtool command:
amtool check-config /etc/alertmanager/alertmanager.yml
It will return Success or list syntax errors. Always validate before restarting the service.
Q3: Why am I not receiving email alerts?
Common causes:
- Incorrect SMTP credentials or port
- Two-factor authentication enabled on the email account (use app-specific passwords)
- Firewall blocking outbound SMTP traffic
- Email being marked as spam
Check Alertmanager logs: journalctl -u alertmanager -f for SMTP errors.
Q4: How do I silence an alert permanently?
You cannot silence alerts permanently. Silences have a fixed duration (e.g., 1h, 1d). For long-term suppression, modify or remove the alert rule itself, or tighten its expression so the unwanted series no longer matches.
Q5: Can I use Alertmanager with multiple Prometheus servers?
Yes. Configure each Prometheus server to send alerts to the same Alertmanager instance. Use labels like prometheus_cluster to distinguish sources in routing.
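Each Prometheus server can attach a distinguishing label via external_labels in its own prometheus.yml; a sketch (the label value is illustrative):

```yaml
# In each Prometheus server's prometheus.yml: external_labels are added
# to every alert that server sends, so Alertmanager can route on them.
global:
  external_labels:
    prometheus_cluster: 'eu-west-1'
```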
Q6: What's the difference between Alertmanager and Prometheus alert rules?
Prometheus alert rules define when to trigger an alert (e.g., CPU > 90% for 5m). Alertmanager defines how to handle the alert (e.g., group by service, send to Slack, wait 30s, repeat every 3h). They work together but serve different purposes.
Q7: How do I upgrade Alertmanager?
Download the new binary, stop the service, replace the executable, validate the config, then restart. Always test in a staging environment first.
Q8: Is Alertmanager stateful? Does it store alerts?
Partially. Alertmanager persists silences and its notification log in the storage.path directory, so silences and deduplication state survive restarts. Active alerts themselves are held in memory and are repopulated when Prometheus re-sends them. Alertmanager does not keep historical alert data; for long-term alert history, forward alerts to an external system via a webhook or use Grafana.
Conclusion
Setting up Alertmanager is more than a technical configuration; it's a foundational step toward building a resilient, observable infrastructure. When properly configured, Alertmanager transforms raw metric anomalies into actionable, prioritized alerts that empower your team to respond quickly and confidently.
In this guide, we covered the full lifecycle of Alertmanager setup: from downloading and installing the binary, to writing precise routing rules, integrating with Slack and email, securing secrets, and validating configurations. We explored best practices that prevent alert fatigue, ensure scalability, and align alerting with real-world operational needs. Real-world examples demonstrated how organizations across industries, from e-commerce to Kubernetes clusters, leverage Alertmanager to reduce downtime and improve system reliability.
Remember: the goal of alerting is not to notify you of every small fluctuation, but to ensure you're alerted to the right problems at the right time. Avoid over-alerting. Prioritize ruthlessly. Test continuously. Monitor Alertmanager itself. And always keep your templates clean, your labels consistent, and your silences intentional.
As your infrastructure grows, so should your alerting strategy. Alertmanager scales gracefully with your needs, and with the practices outlined here, you're now equipped to build an alerting system that's not just functional but exceptional.