How to Backup Elasticsearch Data
Elasticsearch is a powerful, distributed search and analytics engine used by organizations worldwide to store, search, and analyze vast volumes of data in real time. From log aggregation and monitoring systems to e-commerce product catalogs and security information and event management (SIEM) platforms, Elasticsearch powers mission-critical applications. Yet, despite its robust architecture and high availability features, Elasticsearch is not immune to data loss. Hardware failures, misconfigurations, accidental deletions, software bugs, or even cyberattacks can lead to irreversible data loss. This is why implementing a reliable and automated backup strategy for Elasticsearch data is not optional; it is essential.
Backing up Elasticsearch data ensures business continuity, enables recovery from catastrophic failures, supports compliance with data retention policies, and provides a safety net during upgrades or migrations. Unlike traditional databases, Elasticsearch's distributed nature and dynamic indexing model require specialized backup approaches. A simple file copy is insufficient. You must understand snapshots, repositories, cluster state, and recovery procedures to safeguard your data effectively.
This comprehensive guide walks you through every aspect of backing up Elasticsearch data, from foundational concepts to advanced automation techniques. Whether you're managing a small development cluster or a large-scale production environment, this tutorial will equip you with the knowledge and tools to implement a resilient backup strategy that protects your data and minimizes downtime.
Step-by-Step Guide
Understanding Elasticsearch Snapshots
At the core of Elasticsearch's backup mechanism lies the concept of snapshots. A snapshot is a point-in-time copy of one or more indices, along with the cluster state, stored in a shared repository. Snapshots are incremental by design: only changes since the last snapshot are saved, making subsequent backups faster and more storage-efficient.
Unlike full database dumps used in relational systems, Elasticsearch snapshots preserve the exact structure and metadata of your indices, including mappings, settings, and even aliases. This ensures that when you restore a snapshot, your data returns to its original state without requiring manual reconfiguration.
Before you begin, ensure your Elasticsearch cluster is healthy. Use the _cluster/health endpoint to verify the status:
GET /_cluster/health?pretty
The response should show a green status. If it's yellow or red, resolve the underlying issues (e.g., unassigned shards) before proceeding with backups.
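If the status is yellow or red, the _cat APIs help locate the problem. A quick check from the shell, assuming Elasticsearch is reachable at localhost:9200 without security (add credentials and HTTPS as needed):
# Overall health: green, yellow, or red
curl -s 'http://localhost:9200/_cluster/health?pretty'
# Show shards that are not started, with the reason they are unassigned
curl -s 'http://localhost:9200/_cat/shards?v&h=index,shard,prirep,state,unassigned.reason' | grep -v STARTED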
Step 1: Choose a Repository Type
Elasticsearch supports multiple repository types for storing snapshots. The most common are:
- Shared file system: ideal for on-premises deployments where a shared network drive (NFS, SMB) is accessible by all nodes.
- Amazon S3: best for cloud-native environments on AWS.
- Azure Blob Storage: for Azure-based deployments.
- Google Cloud Storage: for GCP environments.
- HDFS: for organizations using the Hadoop Distributed File System.
For this guide, we'll focus on the shared file system and Amazon S3 repositories, as they cover the majority of use cases.
Step 2: Configure a File System Repository
To use a shared file system, you must first define a location accessible to all Elasticsearch nodes. This requires modifying the elasticsearch.yml configuration file on each node.
Add the following line to specify the snapshot directory:
path.repo: /mnt/elasticsearch/snapshots
Ensure the directory exists and has proper read/write permissions for the Elasticsearch process. On Linux, you can create and set permissions with:
sudo mkdir -p /mnt/elasticsearch/snapshots
sudo chown -R elasticsearch:elasticsearch /mnt/elasticsearch/snapshots
sudo chmod -R 755 /mnt/elasticsearch/snapshots
Restart Elasticsearch on all nodes after making this change:
sudo systemctl restart elasticsearch
Once the cluster is back online, register the repository using the REST API:
PUT /_snapshot/my_filesystem_repo
{
"type": "fs",
"settings": {
"location": "/mnt/elasticsearch/snapshots",
"compress": true,
"max_snapshot_bytes_per_sec": "50mb",
"max_restore_bytes_per_sec": "50mb"
}
}
The compress setting compresses the snapshot's metadata files (index mappings and settings); index data itself is already stored in compressed form. The max_snapshot_bytes_per_sec and max_restore_bytes_per_sec settings throttle network and disk I/O to prevent performance degradation during backup or restore operations.
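After registering, confirm that every node can actually read and write the repository location using the verification API. A quick check via curl (same localhost assumption as above):
# Show the registered repository and its settings
curl -s 'http://localhost:9200/_snapshot/my_filesystem_repo?pretty'
# Ask each node to write and read a test file in the repository
curl -s -X POST 'http://localhost:9200/_snapshot/my_filesystem_repo/_verify?pretty'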
Step 3: Configure an S3 Repository
If you're using AWS, the S3 repository is the preferred choice. On Elasticsearch 7.x and earlier, first install the S3 repository plugin on each node (on 8.x and later the S3 repository type ships with the default distribution, so no plugin is needed):
bin/elasticsearch-plugin install repository-s3
Restart Elasticsearch after installation.
Next, configure AWS credentials. You can use one of the following methods:
- Environment variables: AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
- Instance profile (recommended for EC2): assign an IAM role to your EC2 instance with S3 read/write permissions
- Explicit credentials in the repository configuration
For explicit credentials in the repository settings (older versions only; use only in secure environments):
PUT /_snapshot/my_s3_repo
{
"type": "s3",
"settings": {
"bucket": "my-elasticsearch-backups",
"region": "us-west-2",
"access_key": "YOUR_ACCESS_KEY",
"secret_key": "YOUR_SECRET_KEY",
"compress": true,
"base_path": "snapshots/",
"max_snapshot_bytes_per_sec": "50mb",
"max_restore_bytes_per_sec": "50mb"
}
}
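Note that recent Elasticsearch versions (7.x and later) reject inline access_key/secret_key repository settings; credentials belong in the secure keystore instead. Run the following on every node, then register the repository without the key settings:
# Store S3 credentials for the default S3 client in the secure keystore
bin/elasticsearch-keystore add s3.client.default.access_key
bin/elasticsearch-keystore add s3.client.default.secret_key
# Reload secure settings without a full restart (Elasticsearch 6.4+)
curl -s -X POST 'http://localhost:9200/_nodes/reload_secure_settings'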
Ensure your S3 bucket exists and has a policy allowing Elasticsearch to list the bucket and read, write, and delete objects (the repository needs s3:ListBucket on the bucket itself to verify and clean up). Example bucket policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::123456789012:root"
      },
      "Action": [
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Resource": "arn:aws:s3:::my-elasticsearch-backups"
    },
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::123456789012:root"
      },
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:DeleteObject"
      ],
      "Resource": "arn:aws:s3:::my-elasticsearch-backups/*"
    }
  ]
}
Step 4: Create Your First Snapshot
Now that your repository is registered, you can create your first snapshot. To back up all indices:
PUT /_snapshot/my_filesystem_repo/snapshot_1?wait_for_completion=true
The wait_for_completion=true parameter makes the request block until the snapshot is complete. For large clusters, this may take minutes or hours. For production environments, omit this parameter and monitor progress separately.
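A minimal polling sketch in the shell, assuming jq is installed, that waits until the snapshot's state field leaves IN_PROGRESS:
# Poll the snapshot state every 30 seconds until it completes
until state=$(curl -s 'http://localhost:9200/_snapshot/my_filesystem_repo/snapshot_1' | jq -r '.snapshots[0].state') && [ "$state" != "IN_PROGRESS" ]; do
  echo "snapshot_1 state: $state, waiting..."
  sleep 30
done
echo "snapshot_1 finished with state: $state"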
To back up only specific indices:
PUT /_snapshot/my_filesystem_repo/snapshot_2
{
"indices": "logstash-2024.04.01,logstash-2024.04.02",
"ignore_unavailable": true,
"include_global_state": false
}
The ignore_unavailable setting prevents the snapshot from failing if one or more of the specified indices don't exist. The include_global_state setting determines whether cluster-wide metadata (e.g., templates, ingest pipelines, security roles) is included. Keep it true for full-cluster backups to ensure full recoverability; for narrow, index-level snapshots like this one, false avoids bundling cluster-wide settings with the data.
Step 5: Monitor Snapshot Status
To check the status of all snapshots in a repository:
GET /_snapshot/my_filesystem_repo/_all
To view details of a specific snapshot:
GET /_snapshot/my_filesystem_repo/snapshot_1
To monitor ongoing snapshot operations:
GET /_snapshot/_status
This returns shard-level progress for each running snapshot, such as the number of files processed and the bytes transferred.
Step 6: Restore from a Snapshot
Restoring data is just as straightforward. To restore all indices from a snapshot:
POST /_snapshot/my_filesystem_repo/snapshot_1/_restore
To restore only specific indices and rename them during restore:
POST /_snapshot/my_filesystem_repo/snapshot_1/_restore
{
"indices": "logstash-2024.04.01",
"rename_pattern": "logstash-(.+)",
"rename_replacement": "restored_logstash-$1"
}
During restoration, Elasticsearch will create new indices with the specified names. A restore fails if a target index already exists and is open, so close or delete existing indices first, or use the rename options shown above. On restore, include_global_state defaults to false; set it to true only if you deliberately want to overwrite existing cluster-wide settings.
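For example, to restore logstash-2024.04.01 in place, close the live index first. A sketch using curl against localhost:9200:
# Close the live index so the restore can replace it
curl -s -X POST 'http://localhost:9200/logstash-2024.04.01/_close'
# Restore just that index from the snapshot
curl -s -X POST 'http://localhost:9200/_snapshot/my_filesystem_repo/snapshot_1/_restore' -H 'Content-Type: application/json' -d '{"indices": "logstash-2024.04.01"}'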
Step 7: Automate Snapshots with Curator or Watcher
Manual snapshots are impractical for production environments. Automate the process using Elasticsearch Curator or Watcher.
Using Elasticsearch Curator (CLI tool):
Install Curator:
pip install elasticsearch-curator
Create a configuration file curator.yml:
client:
  hosts:
    - 127.0.0.1
  port: 9200
  url_prefix:
  use_ssl: false
  certificate:
  client_cert:
  client_key:
  ssl_no_validate: false
  http_auth:
  timeout: 30
  master_only: false

logging:
  loglevel: INFO
  logfile:
  logformat: default
  blacklist: ['elasticsearch', 'urllib3']
Create an action file snapshot-action.yml:
actions:
  1:
    action: snapshot
    description: "Create daily snapshot of all indices"
    options:
      repository: my_filesystem_repo
      name: 'snapshot-%Y.%m.%d-%H.%M.%S'
      ignore_unavailable: false
      include_global_state: true
      partial: false
      wait_for_completion: true
      skip_repo_fs_check: false
    filters:
    - filtertype: none
Run the snapshot daily via cron:
0 2 * * * /usr/bin/curator --config /etc/curator/curator.yml /etc/curator/snapshot-action.yml
This creates a snapshot every day at 2 AM.
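Cron alone will not tell you when Curator fails, so consider a small wrapper. A minimal sketch; ALERT_URL is a placeholder for your own alerting webhook, and the log path is illustrative:
#!/usr/bin/env bash
# Run the Curator snapshot action and alert on failure.
set -uo pipefail
CONFIG=/etc/curator/curator.yml
ACTION=/etc/curator/snapshot-action.yml
LOG=/var/log/curator-snapshot.log
if ! /usr/bin/curator --config "$CONFIG" "$ACTION" >>"$LOG" 2>&1; then
  # Post a failure notice to a generic JSON webhook (placeholder URL).
  curl -s -X POST "$ALERT_URL" -H 'Content-Type: application/json' \
    -d "{\"text\": \"Elasticsearch snapshot via Curator FAILED, see $LOG\"}"
  exit 1
fi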
Using Elasticsearch Watcher (for licensed clusters):
Watcher lets you trigger snapshots on a schedule. Watcher has no dedicated snapshot action type, so the watch calls the snapshot API through a webhook action. Example watch, assuming Elasticsearch listens on localhost:9200; the path is the URL-encoded date-math name <daily-snap-{now/d}>, and if security is enabled you must add credentials under the webhook's auth setting:
PUT _watcher/watch/daily_snapshot
{
  "trigger": {
    "schedule": {
      "cron": "0 0 2 * * ?"
    }
  },
  "input": {
    "simple": {}
  },
  "actions": {
    "create_snapshot": {
      "webhook": {
        "scheme": "http",
        "host": "localhost",
        "port": 9200,
        "method": "PUT",
        "path": "/_snapshot/my_filesystem_repo/%3Cdaily-snap-%7Bnow%2Fd%7D%3E",
        "body": "{\"ignore_unavailable\": true, \"include_global_state\": true}"
      }
    }
  }
}
Best Practices
1. Schedule Regular Snapshots
Establish a consistent backup schedule based on your data volatility and recovery point objective (RPO). For high-traffic systems, daily snapshots are standard. For mission-critical systems with real-time data ingestion, consider hourly snapshots. Avoid backing up too frequently; this can strain disk I/O and network bandwidth.
2. Retain Multiple Versions
Don't overwrite snapshots. Retain at least 7 daily snapshots, 4 weekly, and 12 monthly. Use Curator's delete_snapshots action to automatically prune old snapshots:
actions:
  1:
    action: delete_snapshots
    description: "Delete snapshots older than 30 days"
    options:
      repository: my_filesystem_repo
      retry_interval: 120
      retry_count: 3
    filters:
    - filtertype: age
      source: creation_date
      direction: older
      unit: days
      unit_count: 30
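Assuming the action file above is saved as /etc/curator/delete-action.yml (a path of your choosing), schedule the pruning an hour after the snapshot job:
0 3 * * * /usr/bin/curator --config /etc/curator/curator.yml /etc/curator/delete-action.yml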
3. Test Restores Regularly
A backup is only as good as its restore. Schedule quarterly restore drills in a non-production environment. Verify that:
- All indices are restored correctly
- Mappings and settings match the original
- Queries return expected results
- Aliases and ingest pipelines are functional
Document the restore procedure and train at least two team members to execute it under pressure.
4. Use Separate Repositories for Different Purposes
Separate snapshots by use case:
- One repository for daily operational backups
- Another for pre-upgrade snapshots
- A third for compliance/archival snapshots (long-term retention)
This improves organization, access control, and retention policy enforcement.
5. Enable Compression and Throttling
Enable compress: true in your repository settings; note that it compresses snapshot metadata files, while index data is already stored in compressed form by Lucene, so actual savings depend on your data. Use max_snapshot_bytes_per_sec and max_restore_bytes_per_sec to limit bandwidth usage during backups and restores, especially in shared infrastructure environments.
6. Monitor Snapshot Health
Set up alerts for failed snapshots. Use Elasticsearch's monitoring features or integrate with Prometheus and Grafana to track:
- Number of successful vs. failed snapshots
- Snapshot duration
- Storage consumption trends
- Repository availability
Failure to detect a failed snapshot can lead to false confidence in data protection.
7. Avoid Snapshots During High Load
Snapshot operations consume CPU, memory, and I/O. Schedule them during off-peak hours. If necessary, use cluster-level shard allocation filtering (the cluster.routing.allocation.exclude settings, applied via the _cluster/settings API) to temporarily move shards away from nodes undergoing backup.
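For example, to drain shards from a node named node-1 before a heavy snapshot window (reset afterwards by setting the value to null):
curl -s -X PUT 'http://localhost:9200/_cluster/settings' \
  -H 'Content-Type: application/json' \
  -d '{"transient": {"cluster.routing.allocation.exclude._name": "node-1"}}'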
8. Secure Your Repositories
Repository locations must be protected. For file systems, restrict access using OS-level permissions. For cloud storage, use IAM policies, bucket encryption, and versioning. Never store snapshots in publicly accessible locations.
9. Document Your Backup Strategy
Create and maintain a runbook that includes:
- Repository configurations
- Schedule and automation scripts
- Restore procedures
- Contact list for escalation
- Known issues and workarounds
Store this documentation in a version-controlled repository (e.g., Git) and update it after every change.
10. Plan for Cross-Cluster Recovery
Test restoring snapshots into a different cluster version. Elasticsearch can restore a snapshot into a cluster of the same or a newer version (generally at most one major version newer, e.g., 7.x to 8.x), but never into an older one. Always verify compatibility before upgrading.
Tools and Resources
Elasticsearch Built-in Tools
- Snapshot and Restore API: the native interface for creating, listing, and restoring snapshots.
- Cluster Health API: monitor cluster state before and after backup operations.
- Snapshot Status API: track progress and errors during snapshot creation or restoration.
- Watcher: built-in automation tool for licensed users (requires a paid Elastic subscription).
Third-Party Tools
- Elasticsearch Curator: a Python-based CLI tool for managing indices and snapshots. Ideal for automation and scheduling. Available on GitHub: https://github.com/elastic/curator
- OpenSearch Dashboards (for OpenSearch users): if you've migrated to OpenSearch, use its snapshot management UI.
- Velero (Kubernetes): for clusters running on Kubernetes, Velero can back up persistent volumes that store Elasticsearch data. Works with S3, Azure, and GCS.
- Portworx / Rook: storage orchestration tools for Kubernetes that offer snapshot capabilities at the volume level.
- Logstash + Filebeat + S3: for log data, consider archiving raw logs to S3 using Filebeat and then applying S3 lifecycle policies. This complements but does not replace Elasticsearch snapshots.
Monitoring and Alerting
- Prometheus + Elasticsearch Exporter: export snapshot metrics such as elasticsearch_snapshot_count and elasticsearch_snapshot_duration_seconds (exact metric names vary by exporter version).
- Grafana: visualize snapshot trends and set up dashboards for operational visibility.
- PagerDuty / Opsgenie: integrate alerts for failed snapshots or repository unavailability.
- Elastic Observability: if you're on the Elastic Stack, use the built-in monitoring UI to track snapshot health.
Documentation and Community
- Official Elasticsearch Snapshot Documentation
- Elastic Community Forum Ask questions and share experiences with other users.
- Curator GitHub Repository Source code, issues, and examples.
- Elastic Blog: Snapshot and Restore Deep Dive
Real Examples
Example 1: E-Commerce Platform with Daily Product Catalog Backups
A global e-commerce company runs Elasticsearch to power its product search and filtering engine. The product catalog updates hourly with new items, prices, and availability. The company requires a 24-hour RPO and 4-hour RTO.
Implementation:
- Repository: Amazon S3 bucket named prod-ecommerce-backups
- Snapshot frequency: hourly (12 AM to 11 PM)
- Indices backed up: products-*, inventory-*
- Retention: 30 days
- Automation: Curator scheduled via cron on a dedicated backup node
- Restore test: Quarterly, using a cloned staging cluster
Result: After a misconfiguration caused 8 hours of catalog data loss, the team restored from the most recent snapshot. Full service was restored in 22 minutes, meeting RTO. No customer-facing downtime occurred.
Example 2: Log Aggregation for Financial Services
A bank uses Elasticsearch to centralize application and security logs. Logs are ingested via Filebeat from hundreds of servers. Due to regulatory requirements, logs must be retained for 7 years.
Implementation:
- Repository: Shared NFS mounted on all Elasticsearch nodes
- Snapshot frequency: Daily at 2 AM
- Indices: monthly rolling indices (e.g., logs-2024-04)
- Retention: 365 days for active snapshots, then archived to cold storage
- Archive process: After 1 year, snapshots are copied to AWS Glacier using S3 lifecycle policies
- Monitoring: Prometheus alerts if snapshot fails 2 consecutive times
Result: During a forensic investigation into a data breach, analysts restored logs from a snapshot taken 11 months prior. The evidence was critical in identifying the attack vector and meeting compliance audit requirements.
Example 3: IoT Sensor Data with High Ingest Rate
An industrial IoT provider collects sensor data from 50,000 devices every 10 seconds. The data is stored in time-series indices with a 30-day retention. The cluster runs on-premises with 12 nodes.
Implementation:
- Repository: Local SSD array with RAID 10, mirrored to a remote data center via rsync
- Snapshot frequency: Every 6 hours
- Indices: sensors-YYYY.MM.DD-HH
- Compression: enabled
- Throttling: max_snapshot_bytes_per_sec set to 100mb to avoid impacting ingestion
- Rolling deletion: delete snapshots older than 30 days using Curator
Result: A power outage corrupted the primary data directory. The team restored from the most recent 6-hour snapshot. Data loss was limited to 6 hours instead of potentially days. The system resumed normal operations within 40 minutes.
FAQs
Can I back up Elasticsearch while it's running?
Yes. Elasticsearch snapshots are designed to be taken while the cluster is active. The process is non-disruptive and does not require downtime. However, heavy snapshot activity during peak ingestion periods may impact performance. Schedule snapshots during low-traffic windows.
Do snapshots include all data in the cluster?
By default, snapshots include the cluster state and all indices. You can limit them to specific indices using the indices parameter. The cluster state includes templates, ingest pipelines, and security roles; if you want to preserve these, keep include_global_state: true.
Can I restore a snapshot to a different Elasticsearch version?
You can restore snapshots from older versions into newer ones (e.g., 7.10 to 8.10), but not the reverse; as a rule, a snapshot is restorable at most one major version forward. Always check the official compatibility matrix before upgrading. For major version upgrades, take a snapshot immediately before the upgrade.
Are snapshots stored in the same location as the data?
No. Snapshots are stored in a separate repository, either a shared file system or cloud storage. This ensures that even if your primary data directory is corrupted, your backups remain intact.
How much storage do snapshots require?
Snapshots are incremental. The first snapshot of an index is a full copy; subsequent snapshots store only segments created since the previous one. Repository compression trims metadata overhead further. As a rule of thumb, expect 20 to 40% of your total index size for daily snapshots retained over 30 days, though this varies widely with update patterns.
What happens if a snapshot fails?
If a snapshot fails, Elasticsearch marks it as FAILED. You can retry it. Failed snapshots do not corrupt existing data or other snapshots. Use the _snapshot/_status API to identify which shards failed and investigate node-level issues (e.g., disk full, network timeout).
Can I back up only the cluster state, without indices?
Yes. Set "indices" to a pattern that matches nothing (recent versions accept "none" or "-*") while keeping "include_global_state": true, which leaves essentially only the cluster state in the snapshot. Alternatively, use the GET /_cluster/state API to export cluster metadata manually. However, this is not a substitute for index snapshots and should be used only for configuration backup.
Is it safe to delete old snapshots?
Yes, but only after confirming you have a working restore. Use the Curator tool or the DELETE API to remove snapshots. Elasticsearch automatically cleans up orphaned files in the repository. Never delete snapshot files manually from the filesystem or S3 bucket.
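For example, deleting a snapshot through the API also removes any repository files no longer referenced by other snapshots:
curl -s -X DELETE 'http://localhost:9200/_snapshot/my_filesystem_repo/snapshot_1'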
Do I need to back up the entire cluster or just the indices?
For full recoverability, back up both. Indices contain your data; the cluster state contains mappings, templates, and security policies. If you restore indices without the cluster state, you may need to manually recreate settings and roles.
How do I know if my backup is working?
Run a test restore at least quarterly. Check that (a quick verification sketch follows this list):
- All indices appear
- Document counts match
- Queries return correct results
- Aliases and pipelines function
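A minimal doc-count comparison, assuming the original and restored clusters are reachable at the hypothetical hostnames prod-es:9200 and staging-es:9200:
# Compare document counts per index between source and restored clusters
for idx in logstash-2024.04.01 logstash-2024.04.02; do
  src=$(curl -s "http://prod-es:9200/_cat/count/$idx?h=count" | tr -d ' ')
  dst=$(curl -s "http://staging-es:9200/_cat/count/$idx?h=count" | tr -d ' ')
  [ "$src" = "$dst" ] && echo "$idx OK ($src docs)" || echo "$idx MISMATCH: $src vs $dst"
done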
Also monitor snapshot success rates in your alerting system. A 100% success rate over 30 days is a good indicator.
Conclusion
Backing up Elasticsearch data is not a one-time task; it's an ongoing discipline that requires planning, automation, and regular validation. The stakes are high: without reliable snapshots, you risk losing critical business data, violating compliance mandates, or suffering extended downtime during outages.
This guide has provided a complete roadmap: from choosing the right repository type and configuring secure storage to automating backups with Curator and testing restores under realistic conditions. You now understand how snapshots work, why they're superior to ad hoc file copies, and how to implement them at scale.
Remember: the best backup strategy is the one you've tested. Don't wait for disaster to strike. Start small: create a single snapshot today. Then automate it. Then test the restore. Repeat monthly. Over time, you'll build a resilient, trustworthy data protection system that gives your organization peace of mind.
Elasticsearch is powerful, but like any tool, its reliability depends on how well you maintain it. With the practices outlined here, you're not just backing up data. You're safeguarding your business continuity.