How to Backup Elasticsearch Data

Elasticsearch is a powerful, distributed search and analytics engine used by organizations worldwide to store, search, and analyze vast volumes of data in real time. From log aggregation and monitoring systems to e-commerce product catalogs and security information and event management (SIEM) platforms, Elasticsearch powers mission-critical applications. Yet, despite its robust architecture and high availability features, Elasticsearch is not immune to data loss. Hardware failures, misconfigurations, accidental deletions, software bugs, or even cyberattacks can lead to irreversible data loss. This is why implementing a reliable and automated backup strategy for Elasticsearch data is not optional; it is essential.

Backing up Elasticsearch data ensures business continuity, enables recovery from catastrophic failures, supports compliance with data retention policies, and provides a safety net during upgrades or migrations. Unlike traditional databases, Elasticsearch's distributed nature and dynamic indexing model require specialized backup approaches. A simple file copy is insufficient. You must understand snapshots, repositories, cluster state, and recovery procedures to safeguard your data effectively.

This comprehensive guide walks you through every aspect of backing up Elasticsearch data, from foundational concepts to advanced automation techniques. Whether you're managing a small development cluster or a large-scale production environment, this tutorial will equip you with the knowledge and tools to implement a resilient backup strategy that protects your data and minimizes downtime.

Step-by-Step Guide

Understanding Elasticsearch Snapshots

At the core of Elasticsearch's backup mechanism lies the concept of snapshots. A snapshot is a point-in-time copy of one or more indices, along with the cluster state, stored in a shared repository. Snapshots are incremental by design: only changes since the last snapshot are saved, making subsequent backups faster and more storage-efficient.

Unlike full database dumps used in relational systems, Elasticsearch snapshots preserve the exact structure and metadata of your indices, including mappings, settings, and even aliases. This ensures that when you restore a snapshot, your data returns to its original state without requiring manual reconfiguration.

Before you begin, ensure your Elasticsearch cluster is healthy. Use the _cluster/health endpoint to verify the status:

GET /_cluster/health?pretty

The response should show a green status. If it's yellow or red, resolve the underlying issues (e.g., unassigned shards) before proceeding with backups.
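
A healthy cluster returns a response along these lines (abbreviated; exact fields vary by version):

{
  "cluster_name": "my-cluster",
  "status": "green",
  "number_of_nodes": 3,
  "active_shards_percent_as_number": 100.0
}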

Step 1: Choose a Repository Type

Elasticsearch supports multiple repository types for storing snapshots. The most common are:

  • Shared File System: Ideal for on-premises deployments where a shared network drive (NFS, SMB) is accessible by all nodes.
  • Amazon S3: Best for cloud-native environments using AWS.
  • Azure Blob Storage: For Azure-based deployments.
  • Google Cloud Storage: For GCP environments.
  • HDFS: For organizations using the Hadoop Distributed File System.

For this guide, we'll focus on the shared file system and Amazon S3 repositories, as they cover the majority of use cases.

Step 2: Configure a File System Repository

To use a shared file system, you must first define a location accessible to all Elasticsearch nodes. This requires modifying the elasticsearch.yml configuration file on each node.

Add the following line to specify the snapshot directory:

path.repo: /mnt/elasticsearch/snapshots

Ensure the directory exists and has proper read/write permissions for the Elasticsearch process. On Linux, you can create and set permissions with:

sudo mkdir -p /mnt/elasticsearch/snapshots

sudo chown -R elasticsearch:elasticsearch /mnt/elasticsearch/snapshots

sudo chmod -R 755 /mnt/elasticsearch/snapshots

Restart Elasticsearch on all nodes after making this change:

sudo systemctl restart elasticsearch

Once the cluster is back online, register the repository using the REST API:

PUT /_snapshot/my_filesystem_repo
{
  "type": "fs",
  "settings": {
    "location": "/mnt/elasticsearch/snapshots",
    "compress": true,
    "max_snapshot_bytes_per_sec": "50mb",
    "max_restore_bytes_per_sec": "50mb"
  }
}

The compress setting compresses snapshot metadata files, such as index mappings and settings (index data files are already compressed by Lucene and are stored as-is). The max_snapshot_bytes_per_sec and max_restore_bytes_per_sec settings throttle network and disk I/O to prevent performance degradation during backup or restore operations.
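
After registering a repository, it's worth confirming that every node can actually read and write to it. The built-in verify API does exactly that:

POST /_snapshot/my_filesystem_repo/_verify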

Step 3: Configure an S3 Repository

If you're using AWS, the S3 repository is the preferred choice. On Elasticsearch 7.x and earlier, first install the S3 repository plugin on each node (in 8.x and later, S3 support ships as a built-in module, so no plugin is needed):

bin/elasticsearch-plugin install repository-s3

Restart Elasticsearch after installation.

Next, configure AWS credentials. You can use one of the following methods:

  • Instance profile (recommended for EC2): Assign an IAM role to your EC2 instance with S3 read/write permissions
  • Elasticsearch keystore: Store the access key and secret key as the secure settings s3.client.default.access_key and s3.client.default.secret_key
  • Explicit credentials in the repository configuration: Deprecated, and removed in recent versions

For explicit credentials on older versions (use only in secure environments):

PUT /_snapshot/my_s3_repo
{
  "type": "s3",
  "settings": {
    "bucket": "my-elasticsearch-backups",
    "region": "us-west-2",
    "access_key": "YOUR_ACCESS_KEY",
    "secret_key": "YOUR_SECRET_KEY",
    "compress": true,
    "base_path": "snapshots/",
    "max_snapshot_bytes_per_sec": "50mb",
    "max_restore_bytes_per_sec": "50mb"
  }
}
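
On Elasticsearch 7.x and later, the inline access_key and secret_key repository settings are no longer accepted. Store the credentials in the Elasticsearch keystore on each node instead, and omit them from the repository settings:

bin/elasticsearch-keystore add s3.client.default.access_key

bin/elasticsearch-keystore add s3.client.default.secret_key

These are reloadable secure settings; apply them without a restart via POST /_nodes/reload_secure_settings.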

Ensure your S3 bucket exists and has a policy allowing Elasticsearch to read and write objects (in practice, s3:ListBucket on the bucket itself is also required). Example bucket policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::123456789012:root"
      },
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:DeleteObject"
      ],
      "Resource": "arn:aws:s3:::my-elasticsearch-backups/*"
    }
  ]
}

Step 4: Create Your First Snapshot

Now that your repository is registered, you can create your first snapshot. To back up all indices:

PUT /_snapshot/my_filesystem_repo/snapshot_1?wait_for_completion=true

The wait_for_completion=true parameter makes the request block until the snapshot is complete. For large clusters, this may take minutes or hours. For production environments, omit this parameter and monitor progress separately.

To back up only specific indices:

PUT /_snapshot/my_filesystem_repo/snapshot_2
{
  "indices": "logstash-2024.04.01,logstash-2024.04.02",
  "ignore_unavailable": true,
  "include_global_state": false
}

The ignore_unavailable setting prevents the snapshot from failing if one or more of the specified indices don't exist. The include_global_state setting determines whether cluster-wide metadata (e.g., templates, ingest pipelines, security roles) is included; it is false here because only index data is needed, but for full-cluster backups keep it at its default of true to ensure full recoverability.

Step 5: Monitor Snapshot Status

To check the status of all snapshots in a repository:

GET /_snapshot/my_filesystem_repo/_all

To view details of a specific snapshot:

GET /_snapshot/my_filesystem_repo/snapshot_1

To monitor ongoing snapshot operations:

GET /_snapshot/_status

This returns information such as the number of files processed, bytes transferred, and time elapsed for each shard.
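
An abbreviated response for a running snapshot looks roughly like this (field names vary slightly across versions):

{
  "snapshots": [
    {
      "snapshot": "snapshot_1",
      "repository": "my_filesystem_repo",
      "state": "STARTED",
      "stats": {
        "processed": { "file_count": 120, "size_in_bytes": 419430400 },
        "total": { "file_count": 312, "size_in_bytes": 1073741824 },
        "time_in_millis": 93000
      }
    }
  ]
}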

Step 6: Restore from a Snapshot

Restoring data is just as straightforward. To restore all indices from a snapshot:

POST /_snapshot/my_filesystem_repo/snapshot_1/_restore

To restore only specific indices and rename them during restore:

POST /_snapshot/my_filesystem_repo/snapshot_1/_restore
{
  "indices": "logstash-2024.04.01",
  "rename_pattern": "logstash-(.+)",
  "rename_replacement": "restored_logstash-$1"
}

During restoration, Elasticsearch creates new indices with the specified names. Open indices with those names must not already exist (rename them as shown above, or close or delete them first), and set include_global_state to false if you want to avoid overwriting existing cluster settings.
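
If you need to restore over an existing index of the same name, one option is to close it first; restoring into a closed index with a matching shard count is supported, and closing (unlike deleting) is reversible:

POST /logstash-2024.04.01/_close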

Step 7: Automate Snapshots with Curator or Watcher

Manual snapshots are impractical for production environments. Automate the process using Elasticsearch Curator or Watcher; on Elasticsearch 7.4 and later, the built-in snapshot lifecycle management (SLM) feature shown at the end of this step is another option.

Using Elasticsearch Curator (CLI tool):

Install Curator:

pip install elasticsearch-curator

Create a configuration file curator.yml:

client:
  hosts:
    - 127.0.0.1
  port: 9200
  url_prefix:
  use_ssl: false
  certificate:
  client_cert:
  client_key:
  ssl_no_validate: false
  http_auth:
  timeout: 30
  master_only: false

logging:
  loglevel: INFO
  logfile:
  logformat: default
  blacklist: ['elasticsearch', 'urllib3']

Create an action file snapshot-action.yml:


actions:
  1:
    action: snapshot
    description: "Create daily snapshot of all indices"
    options:
      repository: my_filesystem_repo
      name: 'snapshot-%Y.%m.%d-%H.%M.%S'
      ignore_unavailable: false
      include_global_state: true
      partial: false
      wait_for_completion: true
      skip_repo_fs_check: false
    filters:
      - filtertype: none

Run the snapshot daily via cron:

0 2 * * * /usr/bin/curator --config /etc/curator/curator.yml /etc/curator/snapshot-action.yml

This creates a snapshot every day at 2 AM.

Using Elasticsearch Watcher (for licensed clusters):

Watcher can trigger snapshots on a schedule, but it has no dedicated snapshot action; instead, a watch calls the snapshot API through a webhook action. Here is a sketch, assuming Watcher can reach the cluster on localhost:9200 (the URL-encoded path uses date math to produce names like daily-snapshot-2024.04.01):

PUT _watcher/watch/daily_snapshot
{
  "trigger": {
    "schedule": {
      "cron": "0 0 2 * * ?"
    }
  },
  "input": {
    "simple": {}
  },
  "actions": {
    "create_snapshot": {
      "webhook": {
        "scheme": "http",
        "host": "localhost",
        "port": 9200,
        "method": "put",
        "path": "/_snapshot/my_filesystem_repo/%3Cdaily-snapshot-%7Bnow%2Fd%7D%3E",
        "body": "{\"ignore_unavailable\": true, \"include_global_state\": true}"
      }
    }
  }
}
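
On Elasticsearch 7.4 and later, the built-in snapshot lifecycle management (SLM) API is the simplest option and needs no external tooling. A minimal daily policy with retention, reusing the repository from this guide, might look like this:

PUT /_slm/policy/daily-snapshots
{
  "schedule": "0 0 2 * * ?",
  "name": "<daily-snapshot-{now/d}>",
  "repository": "my_filesystem_repo",
  "config": {
    "indices": "*",
    "include_global_state": true
  },
  "retention": {
    "expire_after": "30d",
    "min_count": 7,
    "max_count": 90
  }
}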

Best Practices

1. Schedule Regular Snapshots

Establish a consistent backup schedule based on your data volatility and recovery point objective (RPO). For high-traffic systems, daily snapshots are standard. For mission-critical systems with real-time data ingestion, consider hourly snapshots. Avoid backing up too frequently; this can strain disk I/O and network bandwidth.

2. Retain Multiple Versions

Don't overwrite snapshots. Retain at least 7 daily snapshots, 4 weekly, and 12 monthly. Use Curator's delete_snapshots action to automatically prune old snapshots:


actions:
  1:
    action: delete_snapshots
    description: "Delete snapshots older than 30 days"
    options:
      repository: my_filesystem_repo
      retry_interval: 120
      retry_count: 3
    filters:
      - filtertype: age
        source: creation_date
        direction: older
        unit: days
        unit_count: 30
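
Schedule the prune job alongside the snapshot job, for example an hour later (the action-file path here is illustrative):

0 3 * * * /usr/bin/curator --config /etc/curator/curator.yml /etc/curator/delete-snapshots-action.yml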

3. Test Restores Regularly

A backup is only as good as its restore. Schedule quarterly restore drills in a non-production environment. Verify that:

  • All indices are restored correctly
  • Mappings and settings match the original
  • Queries return expected results
  • Aliases and ingest pipelines are functional

Document the restore procedure and train at least two team members to execute it under pressure.

4. Use Separate Repositories for Different Purposes

Separate snapshots by use case:

  • One repository for daily operational backups
  • Another for pre-upgrade snapshots
  • A third for compliance/archival snapshots (long-term retention)

This improves organization, access control, and retention policy enforcement.

5. Enable Compression and Throttling

Always enable compress: true in your repository settings. It compresses snapshot metadata; index data files are already compressed by Lucene, so overall savings depend on your data. Use max_snapshot_bytes_per_sec and max_restore_bytes_per_sec to limit bandwidth usage during backups and restores, especially in shared infrastructure environments.

6. Monitor Snapshot Health

Set up alerts for failed snapshots. Use Elasticsearch's monitoring features or integrate with Prometheus and Grafana to track:

  • Number of successful vs. failed snapshots
  • Snapshot duration
  • Storage consumption trends
  • Repository availability

Failure to detect a failed snapshot can lead to false confidence in data protection.

7. Avoid Snapshots During High Load

Snapshot operations consume CPU, memory, and I/O. Schedule them during off-peak hours. If necessary, use the cluster.routing.allocation.exclude settings (applied through the cluster settings API) to temporarily move shards away from nodes under heavy backup load.
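
For example, to drain shards away from a node named node-1 for the duration of a backup window (remove the setting afterwards by setting it back to null):

PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.allocation.exclude._name": "node-1"
  }
}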

8. Secure Your Repositories

Repository locations must be protected. For file systems, restrict access using OS-level permissions. For cloud storage, use IAM policies, bucket encryption, and versioning. Never store snapshots in publicly accessible locations.
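
For S3 repositories, server-side encryption of newly written snapshot files can be enabled with the documented server_side_encryption setting when registering the repository, for example:

PUT /_snapshot/my_s3_repo
{
  "type": "s3",
  "settings": {
    "bucket": "my-elasticsearch-backups",
    "server_side_encryption": true
  }
}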

9. Document Your Backup Strategy

Create and maintain a runbook that includes:

  • Repository configurations
  • Schedule and automation scripts
  • Restore procedures
  • Contact list for escalation
  • Known issues and workarounds

Store this documentation in a version-controlled repository (e.g., Git) and update it after every change.

10. Plan for Cross-Cluster Recovery

Test restoring snapshots into a different cluster version. Elasticsearch supports restoring snapshots taken in the previous major version into the next one (e.g., 7.x into 8.x), but not the reverse, and not across more than one major version. Always verify compatibility before upgrading.

Tools and Resources

Elasticsearch Built-in Tools

  • Snapshot and Restore API: The native interface for creating, listing, and restoring snapshots.
  • Cluster Health API: Monitor cluster state before and after backup operations.
  • Snapshot Status API: Track progress and errors during snapshot creation or restoration.
  • Watcher: Built-in automation tool for licensed users (requires a paid Elastic license).

Third-Party Tools

  • Elasticsearch Curator: A Python-based CLI tool for managing indices and snapshots. Ideal for automation and scheduling. Available on GitHub: https://github.com/elastic/curator
  • OpenSearch Dashboards (for OpenSearch users): If you've migrated to OpenSearch, use its snapshot management UI.
  • Velero (Kubernetes): For clusters running on Kubernetes, Velero can back up the persistent volumes that store Elasticsearch data. Works with S3, Azure, and GCS.
  • Portworx / Rook: Storage orchestration tools for Kubernetes that offer snapshot capabilities at the volume level.
  • Logstash + Filebeat + S3: For log data, consider archiving raw logs to S3 using Filebeat and then applying S3 lifecycle policies. This complements but does not replace Elasticsearch snapshots.

Monitoring and Alerting

  • Prometheus + Elasticsearch Exporter: Export snapshot-related metrics such as snapshot counts and durations (exact metric names vary by exporter).
  • Grafana: Visualize snapshot trends and set up dashboards for operational visibility.
  • PagerDuty / Opsgenie: Integrate alerts for failed snapshots or repository unavailability.
  • Elastic Observability: If you're on the Elastic Stack, use the built-in monitoring UI to track snapshot health.

Real Examples

Example 1: E-Commerce Platform with Daily Product Catalog Backups

A global e-commerce company runs Elasticsearch to power its product search and filtering engine. The product catalog updates hourly with new items, prices, and availability. The company requires a 24-hour RPO and 4-hour RTO.

Implementation:

  • Repository: Amazon S3 bucket named prod-ecommerce-backups
  • Snapshot frequency: Hourly (12 AM to 11 PM)
  • Indices backed up: products-*, inventory-*
  • Retention: 30 days
  • Automation: Curator scheduled via cron on a dedicated backup node
  • Restore test: Quarterly, using a cloned staging cluster

Result: After a misconfiguration caused 8 hours of catalog data loss, the team restored from the most recent snapshot. Full service was restored in 22 minutes, meeting RTO. No customer-facing downtime occurred.

Example 2: Log Aggregation for Financial Services

A bank uses Elasticsearch to centralize application and security logs. Logs are ingested via Filebeat from hundreds of servers. Due to regulatory requirements, logs must be retained for 7 years.

Implementation:

  • Repository: Shared NFS mounted on all Elasticsearch nodes
  • Snapshot frequency: Daily at 2 AM
  • Indices: Monthly rolling indices (e.g., logs-2024-04)
  • Retention: 365 days for active snapshots, then archived to cold storage
  • Archive process: After 1 year, snapshots are copied to Amazon S3 and transitioned to Glacier via S3 lifecycle policies
  • Monitoring: Prometheus alerts if snapshot fails 2 consecutive times

Result: During a forensic investigation into a data breach, analysts restored logs from a snapshot taken 11 months prior. The evidence was critical in identifying the attack vector and meeting compliance audit requirements.

Example 3: IoT Sensor Data with High Ingest Rate

An industrial IoT provider collects sensor data from 50,000 devices every 10 seconds. The data is stored in time-series indices with a 30-day retention. The cluster runs on-premises with 12 nodes.

Implementation:

  • Repository: Local SSD array with RAID 10, mirrored to a remote data center via rsync
  • Snapshot frequency: Every 6 hours
  • Indices: sensors-YYYY.MM.DD-HH
  • Compression: Enabled
  • Throttling: max_snapshot_bytes_per_sec: 100mb to avoid impacting ingestion
  • Rolling deletion: Delete snapshots older than 30 days using Curator

Result: A power outage corrupted the primary data directory. The team restored from the most recent 6-hour snapshot. Data loss was limited to 6 hours instead of potentially days. The system resumed normal operations within 40 minutes.

FAQs

Can I back up Elasticsearch while it's running?

Yes. Elasticsearch snapshots are designed to be taken while the cluster is active. The process is non-disruptive and does not require downtime. However, heavy snapshot activity during peak ingestion periods may impact performance. Schedule snapshots during low-traffic windows.

Do snapshots include all data in the cluster?

By default, snapshots include the cluster state and all indices. You can limit them to specific indices using the indices parameter. The cluster state includes templates, ingest pipelines, and security roles; if you want to preserve these, keep include_global_state: true.

Can I restore a snapshot to a different Elasticsearch version?

You can restore snapshots taken in the previous major version into the next one (e.g., 7.10 into 8.10), but not the reverse. Always check the official compatibility matrix before upgrading. For major version upgrades, take a snapshot immediately before the upgrade.

Are snapshots stored in the same location as the data?

No. Snapshots are stored in a separate repository, either a shared file system or cloud storage. This ensures that even if your primary data directory is corrupted, your backups remain intact.

How much storage do snapshots require?

Snapshots are incremental. The first snapshot of an index is a full copy; subsequent snapshots store only the changes. As a rule of thumb, expect 20 to 40% of your total index size for daily snapshots retained over 30 days; for example, a 1 TB cluster would need roughly 200 to 400 GB of snapshot storage.

What happens if a snapshot fails?

If a snapshot fails, Elasticsearch marks it as FAILED. You can retry it. Failed snapshots do not corrupt existing data or other snapshots. Use the _snapshot/_status API to identify which shards failed and investigate node-level issues (e.g., disk full, network timeout).

Can I back up only the cluster state without indices?

Yes, on recent versions: create a snapshot with "include_global_state": true while excluding all indices (the documentation uses "indices": "-*" for this), which captures only cluster metadata such as templates and pipelines. Alternatively, use the GET /_cluster/state API to export cluster metadata manually. Neither is a substitute for index snapshots; use this only for configuration backup.

Is it safe to delete old snapshots?

Yes, but only after confirming you have a working restore. Use the Curator tool or the DELETE API to remove snapshots. Elasticsearch automatically cleans up orphaned files in the repository. Never delete snapshot files manually from the filesystem or S3 bucket.
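
Deleting through the API, for example:

DELETE /_snapshot/my_filesystem_repo/snapshot_1

lets Elasticsearch remove only the files that no remaining snapshot references.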

Do I need to back up the entire cluster or just indices?

For full recoverability, back up both. Indices contain your data; the cluster state contains mappings, templates, and security policies. If you restore indices without the cluster state, you may need to manually recreate settings and roles.

How do I know if my backup is working?

Run a test restore at least quarterly. Check that:

  • All indices appear
  • Document counts match
  • Queries return correct results
  • Aliases and pipelines function

Also monitor snapshot success rates in your alerting system. A 100% success rate over 30 days is a good indicator.

Conclusion

Backing up Elasticsearch data is not a one-time task; it's an ongoing discipline that requires planning, automation, and regular validation. The stakes are high: without reliable snapshots, you risk losing critical business data, violating compliance mandates, or suffering extended downtime during outages.

This guide has provided a complete roadmap: from choosing the right repository type and configuring secure storage to automating backups with Curator and testing restores under realistic conditions. You now understand how snapshots work, why they're better suited to Elasticsearch than traditional file copies, and how to implement them at scale.

Remember: the best backup strategy is the one you've tested. Don't wait for disaster to strike. Start small: create a single snapshot today. Then automate it. Then test the restore. Repeat monthly. Over time, you'll build a resilient, trustworthy data protection system that gives your organization peace of mind.

Elasticsearch is powerful, but like any tool, its reliability depends on how well you maintain it. With the practices outlined here, you're not just backing up data. You're safeguarding your business continuity.