How to Send Alerts With Grafana
Grafana is one of the most powerful open-source platforms for monitoring and observability, widely adopted by DevOps teams, SREs, and infrastructure engineers around the world. While its intuitive dashboards provide real-time visualizations of metrics, logs, and traces, its true power lies in its alerting capabilities. Sending alerts with Grafana enables teams to proactively respond to anomalies, performance degradation, system failures, and security incidents before they impact end users or business operations.
Whether you're monitoring a cloud-native Kubernetes cluster, a legacy on-premise database, or a microservices architecture, Grafana's alerting system integrates seamlessly with your data sources, such as Prometheus, InfluxDB, Loki, and more, to trigger notifications via email, Slack, PagerDuty, Microsoft Teams, webhooks, and other channels. This tutorial provides a comprehensive, step-by-step guide to configuring, optimizing, and scaling alerting in Grafana, ensuring you never miss a critical event again.
By the end of this guide, you'll understand how to define meaningful alert rules, avoid alert fatigue, integrate with notification platforms, and implement enterprise-grade alerting strategies that reduce mean time to detection (MTTD) and mean time to resolution (MTTR).
Step-by-Step Guide
Prerequisites
Before configuring alerts in Grafana, ensure you have the following:
- A running Grafana instance (version 8.0 or higher recommended)
- A supported data source connected (e.g., Prometheus, InfluxDB, Loki, MySQL, etc.)
- Administrative or editor permissions in Grafana
- Access to your notification channels (e.g., Slack webhook, SMTP server, PagerDuty API key)
If you're using Grafana Cloud, these components are pre-configured. For self-hosted installations, ensure your data source is properly connected and that a dashboard panel can query it successfully.
Step 1: Create or Open a Dashboard
Alerts in Grafana are tied to individual panels within a dashboard. You cannot create an alert without first having a visualization panel that queries data from a supported data source.
To begin, navigate to the Dashboard menu in the left sidebar, then click New → Add new panel. Alternatively, open an existing dashboard you wish to monitor.
In the panel editor, select your data source from the dropdown (e.g., Prometheus). Write a query that tracks a key metric, such as HTTP error rates, CPU utilization, memory usage, or request latency. For example:
rate(http_requests_total{status_code=~"5.."}[5m]) > 0.1
This query calculates the per-second rate of HTTP 5xx responses over a 5-minute window and fires when that rate exceeds 0.1 requests per second. (To alert on 5xx responses as a percentage of all requests, divide by the total request rate; the Real Examples section later in this guide shows that form.)
Once your query returns data and the visualization looks correct, click the Alert tab at the bottom of the panel editor.
Step 2: Define Alert Conditions
The Alert tab allows you to define the conditions under which Grafana triggers an alert. There are two primary modes: Query and Expression. For most use cases, Query is preferred because it leverages your data source's native querying capabilities.
Under Alert condition, ensure When is set to Query A (or your selected query). Then choose:
- Operator: The comparison to apply (e.g., >, <, =)
- Value: The threshold number (e.g., 0.1 for 10%)
- For: The duration the condition must persist before triggering (e.g., 5m)
For example:
- Operator: >
- Value: 0.1
- For: 5m
This means: trigger an alert if the rate of 5xx errors stays above the threshold for five consecutive minutes. The For clause is critical: it prevents false positives from transient spikes. A 30-second spike in errors is normal during deployments; a sustained 5-minute spike is a real incident.
Optionally, you can set Evaluate every to control how often Grafana re-evaluates the condition (e.g., every 15s or 1m). This should align with your data source's scrape interval. For Prometheus, 15s to 1m is typical.
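The relationship between these two intervals is easy to sanity-check in a few lines. This Python sketch (the helper names are my own, not a Grafana API) parses Grafana-style duration strings and flags an evaluation interval shorter than the scrape interval, which would just re-evaluate unchanged data:

```python
import re

def parse_duration(s: str) -> int:
    """Convert a Grafana-style duration string ("15s", "1m", "2h") to seconds."""
    match = re.fullmatch(r"(\d+)([smh])", s.strip())
    if not match:
        raise ValueError(f"unrecognized duration: {s!r}")
    value, unit = int(match.group(1)), match.group(2)
    return value * {"s": 1, "m": 60, "h": 3600}[unit]

def evaluation_interval_ok(evaluate_every: str, scrape_interval: str) -> bool:
    """Evaluating more often than the data source scrapes wastes cycles:
    Grafana would re-check the rule against data that has not changed."""
    return parse_duration(evaluate_every) >= parse_duration(scrape_interval)

print(evaluation_interval_ok("1m", "15s"))   # fine: 60s >= 15s
print(evaluation_interval_ok("10s", "15s"))  # too aggressive
```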
Step 3: Configure Alert Notifications
Alerts are useless if no one receives them. Grafana uses Notification Channels to deliver alerts to external systems.
To set up a notification channel:
- Click the Alerting menu in the left sidebar.
- Select Notification channels.
- Click Add channel.
Choose your notification type:
- Email: Requires SMTP configuration in grafana.ini
- Slack: Requires a Slack webhook URL
- PagerDuty: Requires an integration key
- Microsoft Teams: Requires a webhook URL from Teams channel
- Webhook: Custom HTTP POST endpoint (e.g., for internal ticketing systems)
- Telegram: Bot token and chat ID
- VictorOps, Google Chat, SNS, and more
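For the custom Webhook channel, the receiving side is simply an HTTP endpoint that parses Grafana's JSON body. A minimal Python sketch, assuming the classic alerting payload shape (ruleName, state, evalMatches); verify the exact field names against your Grafana version:

```python
import json

def summarize_grafana_webhook(body: bytes) -> str:
    """Turn a Grafana webhook payload into a one-line summary for a
    ticketing system. Field names follow the classic alerting payload
    (ruleName, state, evalMatches); check them for your Grafana version."""
    payload = json.loads(body)
    matches = ", ".join(
        f"{m.get('metric', '?')}={m.get('value')}"
        for m in payload.get("evalMatches", [])
    )
    return f"[{payload.get('state', 'unknown')}] {payload.get('ruleName', 'unnamed rule')}: {matches}"

# A hand-built sample in the assumed format:
sample = json.dumps({
    "ruleName": "High CPU Usage",
    "state": "alerting",
    "evalMatches": [{"metric": "cpu", "value": 0.92}],
}).encode()
print(summarize_grafana_webhook(sample))  # prints: [alerting] High CPU Usage: cpu=0.92
```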
For Slack:
- Go to your Slack workspace → Apps → search for Incoming Webhooks → Add to workspace
- Create a new webhook, select a channel, and copy the URL
- Paste it into Grafana's Slack channel configuration
- Test the connection by clicking Send test notification
Once saved, the channel appears in the list. Return to your panel's Alert tab and select this channel under Alert notifications. You can select multiple channels; for example, email for on-call engineers and Slack for the entire DevOps team.
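If you want to verify the webhook independently of Grafana, a Slack incoming webhook accepts a simple JSON body with a text field. A Python sketch using only the standard library (the message format here is my own, not Grafana's):

```python
import json
import urllib.request

def slack_alert_payload(rule_name: str, state: str, value: float, rule_url: str) -> dict:
    """Build a minimal Slack incoming-webhook payload; {"text": ...} is the
    simplest shape the webhook accepts."""
    emoji = ":red_circle:" if state == "alerting" else ":large_green_circle:"
    return {"text": f"{emoji} *{rule_name}* is {state} (value: {value})\n<{rule_url}|Open in Grafana>"}

def post_to_slack(webhook_url: str, payload: dict) -> int:
    """POST the payload to a Slack incoming webhook; returns the HTTP status."""
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```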
Step 4: Customize Alert Message and Labels
Grafana allows you to personalize the alert message using templating variables. This makes alerts more actionable and context-rich.
In the Alert tab, scroll to Message. Use the following variables:
- {{ .Title }}: Alert title
- {{ .State }}: Current state (e.g., Alerting, OK)
- {{ .RuleUrl }}: Direct link to the alert rule
- {{ .Values }}: Current metric value
- {{ .Tags }}: Labels attached to the metric
Example message:
HIGH HTTP ERROR RATE DETECTED
Service: {{ .Tags.instance }}
Error Rate: {{ .Values }} (threshold: 0.1)
Duration: 5 minutes
Dashboard: {{ .RuleUrl }}
Check logs: https://loki.example.com/inspect
You can also add custom Labels to the alert rule (e.g., team=backend, severity=critical). These labels help route alerts to the right teams in external systems and are passed through to notification channels.
Step 5: Test the Alert
Before relying on your alert in production, test it. You can do this in two ways:
- Simulate the condition: Temporarily increase the metric value using a test endpoint or script. For example, if you're monitoring HTTP errors, send a few 500 responses using curl or Postman.
- Use the Test Rule button: In the Alert tab, click Test rule. Grafana evaluates the query against the current data and shows whether the condition would trigger.
If the alert triggers, check your notification channel (Slack/email/etc.) to confirm the message was received correctly.
Step 6: Enable Alerting in Grafana Settings
Ensure alerting is enabled in your Grafana configuration. For self-hosted instances, edit the grafana.ini file:
[alerting]
enabled = true
Also, if using email alerts, configure SMTP:
[smtp]
enabled = true
host = smtp.gmail.com:587
user = your-email@gmail.com
password = your-app-password
from_address = alerts@yourcompany.com
Restart Grafana after making changes to the config file.
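Before restarting Grafana, you can confirm the SMTP credentials work independently. A small Python sketch with the standard library; the host, addresses, and credentials are placeholders matching the grafana.ini example above:

```python
import smtplib
from email.message import EmailMessage

def build_alert_email(subject: str, body: str, from_addr: str, to_addrs: list) -> EmailMessage:
    """Compose the same kind of plain-text message Grafana's SMTP channel
    sends, useful for verifying SMTP settings outside Grafana."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = from_addr
    msg["To"] = ", ".join(to_addrs)
    msg.set_content(body)
    return msg

def send_test_email(host: str, port: int, user: str, password: str, msg: EmailMessage) -> None:
    """Send via STARTTLS on port 587, mirroring the grafana.ini settings above."""
    with smtplib.SMTP(host, port) as server:
        server.starttls()
        server.login(user, password)
        server.send_message(msg)
```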
Step 7: Manage and Organize Alerts
As your alerting setup grows, managing hundreds of rules becomes challenging. Use the following best practices:
- Group alerts by dashboard or service (e.g., API Gateway Alerts, Database Health)
- Use consistent naming: HighLatency-frontend-v1, DiskFull-db-prod
- Tag alerts with metadata: team, environment, severity
- Export alert rules as JSON or YAML using Grafana's API or UI (Alerting → Export)
- Version-control alert rules in Git alongside your infrastructure-as-code (IaC) files
To export all alert rules:
- Go to Alerting → Alert rules
- Click Export
- Download the JSON file
You can later import these rules into another Grafana instance using Import in the same menu.
Best Practices
1. Avoid Alert Fatigue with Smart Thresholds
Alert fatigue occurs when teams receive too many low-priority or false-positive alerts, leading to ignored notifications. To prevent this:
- Use relative thresholds instead of static values. For example, alert when CPU usage exceeds 150% of the 7-day average, not when it's above 80%.
- Apply anomaly detection using tools like Prometheus's predict_linear() function or Grafana's built-in anomaly detection (available in Grafana Cloud).
- Set For durations to at least 5 to 10 minutes for non-critical alerts, and 1 to 2 minutes for critical ones.
- Exclude known maintenance windows or scheduled jobs using alert annotations or external scheduling tools.
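The relative-threshold idea from the first bullet can be expressed in a few lines. A Python sketch (the 1.5x ratio is an arbitrary example; tune it per service):

```python
def breaches_relative_threshold(current: float, baseline_avg: float, ratio: float = 1.5) -> bool:
    """Fire only when the current value exceeds `ratio` times the baseline
    (e.g., the 7-day average), instead of a static cutoff."""
    return current > ratio * baseline_avg

# A host idling at 20% CPU is flagged at 35%, while a busy host
# averaging 60% is left alone at 80%:
assert breaches_relative_threshold(35.0, 20.0)
assert not breaches_relative_threshold(80.0, 60.0)
```

The advantage over a flat 80% threshold is that each host's normal workload becomes its own baseline, so quiet hosts still alert on genuine anomalies.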
2. Prioritize Alert Severity
Not all alerts are created equal. Classify alerts into tiers:
- Critical: System down, data loss, security breach (e.g., 99.9%+ error rate, disk full)
- High: Degraded performance, increased latency, high memory usage
- Medium: Resource utilization nearing limits, non-critical service slowdown
- Low: Informational, e.g., deployment completed, backup started
Use labels like severity=critical and route them to different channels. Critical alerts go to on-call engineers via SMS or PagerDuty; low alerts go to a general Slack channel.
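This routing policy usually lives in the receiving system (Alertmanager, PagerDuty), but the logic itself is simple enough to sketch. The routing table and channel names below are hypothetical:

```python
# Hypothetical routing table: severity label value -> notification channels.
ROUTES = {
    "critical": ["pagerduty", "sms"],
    "high": ["pagerduty", "slack:#devops"],
    "medium": ["slack:#devops"],
    "low": ["slack:#alerts-feed"],
}

def channels_for(labels: dict) -> list:
    """Pick channels from the alert's severity label; unlabeled alerts
    fall back to the low-priority feed."""
    return ROUTES.get(labels.get("severity", "low"), ROUTES["low"])

print(channels_for({"severity": "critical"}))  # ['pagerduty', 'sms']
```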
3. Use Annotations for Context
Annotations provide additional context to alerts without triggering them. Add annotations for:
- Deployment timestamps
- Incident runbooks or troubleshooting guides
- Links to dashboards or logs
- Root cause hypotheses
In the alert rule editor, under Annotations, add key-value pairs:
- runbook: https://wiki.yourcompany.com/runbooks/http-5xx
- dashboard: https://grafana.yourcompany.com/d/123/api-latency
These appear in the alert notification and help responders take action faster.
4. Integrate with Incident Management Tools
For production environments, integrate Grafana alerts with incident management platforms like PagerDuty, Opsgenie, or VictorOps. These tools provide:
- Escalation policies (if no one responds in 15 minutes, notify the next person)
- On-call scheduling
- Alert deduplication
- Incident timelines and post-mortem templates
To integrate with PagerDuty:
- Log into PagerDuty → Services → Add Service → Integration Type: Grafana
- Copy the integration key
- In Grafana, create a new notification channel → PagerDuty → paste the key
- Test and save
Now, every Grafana alert becomes a PagerDuty incident with full lifecycle tracking.
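Under the hood, this integration sends events to PagerDuty's Events API v2. If you ever need to trigger an incident from a script rather than from Grafana, the request body looks roughly like this sketch; consult the Events API reference before relying on it:

```python
import json

def pagerduty_event(routing_key: str, summary: str, source: str, severity: str = "critical") -> str:
    """Build a PagerDuty Events API v2 trigger body, to be POSTed to
    https://events.pagerduty.com/v2/enqueue. Per the Events API spec,
    severity must be one of: critical, error, warning, info."""
    return json.dumps({
        "routing_key": routing_key,
        "event_action": "trigger",
        "payload": {"summary": summary, "source": source, "severity": severity},
    })

print(pagerduty_event("your-integration-key", "CPU high on web-1", "web-1"))
```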
5. Monitor Alerting Health Itself
Alerts can fail silently. A misconfigured rule, a downed data source, or a broken webhook can render your alerting system useless.
Create a Grafana Alerting Health dashboard with panels that monitor:
- Number of active alert rules
- Number of alerts in Alerting state
- Time since last alert was triggered
- Notification channel status (e.g., webhook HTTP 500 errors)
Use the Prometheus metrics grafana_alerting_rules and grafana_alerting_evaluations_total to track alerting health.
6. Automate Alert Rule Deployment
Manually creating alerts in the UI is error-prone and not scalable. Use Grafanas API or configuration files to automate alert rule deployment.
For example, you can create alert rules programmatically through the Grafana HTTP API. The endpoint and payload schema depend on your Grafana version; with unified alerting (Grafana 9+), rules are provisioned via POST /api/v1/provisioning/alert-rules. Treat the payload below as an illustrative sketch and check the API reference for your release:
POST /api/v1/provisioning/alert-rules
Content-Type: application/json
{
  "title": "High CPU Usage",
  "ruleGroup": "infrastructure",
  "folderUID": "your-folder-uid",
  "condition": "A",
  "data": [
    {
      "refId": "A",
      "relativeTimeRange": { "from": 600, "to": 0 },
      "datasourceUid": "your-prometheus-uid",
      "model": {
        "expr": "avg_over_time(node_cpu_seconds_total{mode!=\"idle\"}[5m]) > 0.8",
        "legendFormat": "CPU Usage",
        "range": true
      }
    }
  ],
  "for": "5m",
  "noDataState": "NoData",
  "execErrState": "Alerting",
  "labels": {
    "team": "infrastructure",
    "severity": "high"
  },
  "annotations": {
    "summary": "CPU usage is above 80% for 5 minutes.",
    "runbook": "https://wiki.example.com/cpu-alert"
  }
}
Integrate this into your CI/CD pipeline using tools like Terraform, Ansible, or Helm charts to ensure alert rules are versioned and deployed alongside application code.
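A CI/CD step for this can be as small as one authenticated POST per rule file. A Python sketch, assuming Grafana 9+ unified alerting and a service-account token; the provisioning path used here is /api/v1/provisioning/alert-rules, so verify it against your version's API reference:

```python
import json
import urllib.request

def deploy_alert_rule(grafana_url: str, api_token: str, rule: dict) -> urllib.request.Request:
    """Build an authenticated request that pushes one alert rule to the
    unified-alerting provisioning API. The caller passes the result to
    urllib.request.urlopen() to actually send it."""
    return urllib.request.Request(
        f"{grafana_url}/api/v1/provisioning/alert-rules",
        data=json.dumps(rule).encode(),
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = deploy_alert_rule("https://grafana.example.com", "glsa_example_token", {"title": "High CPU Usage"})
print(req.get_method(), req.full_url)
```

Running the builder separately from the network call makes the step easy to unit-test in the pipeline before anything touches a live Grafana instance.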
7. Review and Retire Alerts Regularly
Alerts decay over time. Services are decommissioned, thresholds become outdated, and teams change.
Establish a quarterly alert review process:
- Identify alerts that haven't triggered in 90+ days
- Check if the underlying metric still matters
- Archive or delete obsolete rules
- Update documentation and runbooks
Use Grafana's alert history to analyze which alerts are most useful and which are noise.
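The "90+ days without firing" check is easy to automate once you have last-fired timestamps pulled from alert history. A Python sketch of the filtering step (the input format here is my own, not Grafana's response shape):

```python
from datetime import datetime, timedelta

def stale_alerts(last_fired: dict, now: datetime, max_age_days: int = 90) -> list:
    """Return rule names that have not fired within max_age_days.
    `last_fired` maps rule name -> datetime of the last firing (None = never)."""
    cutoff = now - timedelta(days=max_age_days)
    return sorted(
        name for name, fired in last_fired.items()
        if fired is None or fired < cutoff
    )

history = {
    "HighLatency-frontend-v1": datetime(2024, 5, 20),
    "DiskFull-db-prod": datetime(2024, 1, 1),
    "OrphanedRule": None,
}
print(stale_alerts(history, now=datetime(2024, 6, 1)))
# prints: ['DiskFull-db-prod', 'OrphanedRule']
```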
Tools and Resources
Official Grafana Documentation
The authoritative source for all things Grafana is the official documentation at grafana.com/docs.
Community Alert Rules and Templates
Start with proven alert rules from the community:
- kube-prometheus Alert Rules: Excellent for Kubernetes environments
- Grafana's built-in alert rule examples
- Grafana Dashboard Library: Many dashboards include pre-configured alert rules
Third-Party Integrations
Enhance alerting with these tools:
- PagerDuty: Incident response and escalation
- Opsgenie: Advanced alert routing and on-call scheduling
- VictorOps: DevOps-focused incident management
- Microsoft Teams: Native integration via webhooks
- Slack: Team communication with threaded alerts
- Webhook: Connect to Jira, ServiceNow, or custom ticketing systems
Monitoring Tools to Pair With Grafana
For full observability, combine Grafana with:
- Prometheus: Time-series metrics collection
- Loki: Log aggregation
- Tempo: Distributed tracing
- Node Exporter: Server-level metrics
- Blackbox Exporter: HTTP/S, TCP, and ICMP probes
Alerting Best Practice Checklists
Download or print these to ensure your alerting setup is robust:
- Google SRE: Monitoring Distributed Systems
- Site Reliability Engineering (O'Reilly)
- Datadog Monitoring Best Practices
Real Examples
Example 1: High HTTP Error Rate Alert
Scenario: Your web application serves 10M requests/day. A sudden spike in 5xx errors indicates a backend service failure.
Query:
rate(http_requests_total{job="web-app", status_code=~"5.."}[5m]) / rate(http_requests_total{job="web-app"}[5m]) > 0.05
This calculates the proportion of 5xx responses out of total requests over 5 minutes. If it exceeds 5%, trigger an alert.
For: 5 minutes
Notification: Slack + PagerDuty
Message:
CRITICAL: HTTP 5xx Error Rate > 5%
Service: web-app
Environment: production
Current Rate: {{ .Values }}
Threshold: 5%
Duration: 5m
Dashboard: {{ .RuleUrl }}
Runbook: https://wiki.example.com/5xx-troubleshooting
Result: The on-call engineer is notified within 5 minutes, investigates the failing microservice, and rolls back a bad deployment within 12 minutes.
Example 2: Disk Space Exhaustion Alert
Scenario: A database server is running out of disk space. This could cause data loss or downtime.
Query (Prometheus + Node Exporter):
100 * (1 - node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) > 90
This calculates the percentage of disk space used on the root filesystem. If it exceeds 90%, alert.
For: 10 minutes (to allow for temporary log rotation or cache cleanup)
Notification: Email + Teams
Message:
HIGH DISK USAGE ON {{ .Tags.instance }}
Filesystem: {{ .Tags.mountpoint }}
Used: {{ .Values }}%
Threshold: 90%
Runbook: https://wiki.example.com/disk-space-clear
Commands: df -h | grep /; du -sh /var/log/*; journalctl --vacuum-size=100M
Result: The system administrator clears old logs and increases disk size before the server crashes.
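The arithmetic in that disk-usage query is worth double-checking when choosing a threshold. The same calculation in Python (a sketch for verification, not part of the alert itself):

```python
def disk_used_percent(avail_bytes: float, size_bytes: float) -> float:
    """Mirrors the PromQL above: 100 * (1 - avail / size)."""
    return 100.0 * (1.0 - avail_bytes / size_bytes)

# 5 GiB free on a 100 GiB volume is 95% used, above the 90% threshold:
assert disk_used_percent(5 * 2**30, 100 * 2**30) > 90
```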
Example 3: Latency Spike in API Gateway
Scenario: Your API gateway's p99 latency spikes above 2 seconds, impacting user experience.
Query:
histogram_quantile(0.99, sum(rate(api_request_duration_seconds_bucket{service="gateway"}[5m])) by (le)) > 2
This uses a histogram to calculate the 99th percentile latency. If it exceeds 2 seconds, trigger an alert.
For: 3 minutes
Notification: Slack channel #api-alerts + PagerDuty
Message:
API LATENCY SPIKE: p99 > 2s
Service: gateway
Current p99: {{ .Values }}s
Threshold: 2s
Duration: 3m
Dashboard: {{ .RuleUrl }}
Check: https://grafana.example.com/d/456/api-latency
Trace: https://tempo.example.com/search?service=gateway
Result: The team identifies a misconfigured caching layer and fixes it before users report issues.
FAQs
Can Grafana send alerts without a data source?
No. Grafana requires a connected data source (Prometheus, InfluxDB, Loki, etc.) to evaluate metrics and trigger alerts. You cannot create alerts based on static values or manual inputs.
How often does Grafana evaluate alert rules?
By default, Grafana evaluates alert rules every 15 seconds. You can change this in the alert rule settings under Evaluate every. Ensure this aligns with your data source's scrape interval (e.g., Prometheus scrapes every 15s to 60s).
Can I silence alerts during maintenance windows?
Yes. Use Grafana's Alert Silences feature. Go to Alerting → Silences → Create Silence. Set a time range and match rules by name, label, or tag. This temporarily disables matching alerts without deleting them.
Do alerts work if Grafana is down?
No. Grafana must be running to evaluate alert conditions and send notifications. For high availability, deploy Grafana in a clustered or replicated setup. Alternatively, use Prometheus Alertmanager for alert routing; it can continue to send alerts even if Grafana is temporarily unavailable.
Can I create alerts based on logs?
Yes, if you're using Loki as your log data source. You can create alerts based on log line counts, error patterns, or specific message content using LogQL queries. For example:
count_over_time({job="api"} |= "ERROR" [5m]) > 10
This triggers an alert if more than 10 ERROR lines appear in 5 minutes.
Whats the difference between Grafana alerts and Prometheus Alertmanager?
Grafana alerts are evaluated and triggered within Grafana using its UI and API. Prometheus Alertmanager is a separate component that receives alerts from Prometheus servers and handles routing, silencing, and deduplication. Grafana can send alerts to Alertmanager, but Alertmanager is more robust for large-scale, enterprise alerting. Use Grafana alerts for simplicity and dashboards; use Alertmanager for scale and reliability.
Can I send alerts to mobile apps?
Yes. Use notification channels like PagerDuty, Pushover, or Telegram, which have mobile apps. Configure the channel in Grafana, and alerts will appear as push notifications on your phone.
Is there a limit to the number of alerts I can create?
Grafana doesn't enforce a hard limit, but performance degrades with thousands of alert rules. For large deployments (more than 500 rules), consider using Prometheus Alertmanager or Grafana Cloud's managed alerting, which scales better.
Conclusion
Alerting is not a one-time setup; it's an ongoing discipline that requires careful design, continuous refinement, and active maintenance. Sending alerts with Grafana gives you the power to detect issues before they become incidents, reduce downtime, and improve system reliability across your entire infrastructure.
By following the step-by-step guide in this tutorial, you've learned how to create meaningful alert rules, integrate with modern notification platforms, avoid alert fatigue, and automate alert management at scale. You've seen real-world examples of how alerts prevent outages and how best practices turn reactive monitoring into proactive resilience.
Remember: the goal of alerting isn't to notify you of every minor fluctuation; it's to notify you of the right things, at the right time, with the right context. Invest time in tuning your alerts. Review them quarterly. Automate their deployment. Link them to runbooks. And always ask: if this alert fires, will I know exactly what to do?
With Grafana's powerful alerting system and the strategies outlined here, you're no longer just watching metrics; you're defending your systems. And in today's digital world, that's not just an advantage. It's a necessity.