How to Scale Elasticsearch Nodes
Elasticsearch is a powerful, distributed search and analytics engine built on Apache Lucene. Its ability to handle massive volumes of data in near real-time makes it a cornerstone of modern search applications, logging systems, observability platforms, and business intelligence tools. However, as data volume, query complexity, and user demand grow, a single-node or small cluster can quickly become a bottleneck. Scaling Elasticsearch nodes effectively is not merely about adding more hardware; it's a strategic process involving architecture design, resource allocation, data distribution, and performance tuning. Without proper scaling, you risk degraded search latency, node failures, inefficient resource utilization, and even data loss. This guide provides a comprehensive, step-by-step roadmap to scaling Elasticsearch nodes, grounded in real-world best practices, tooling insights, and operational experience.
Step-by-Step Guide
Assess Your Current Cluster Health and Bottlenecks
Before adding more nodes, you must understand why scaling is necessary. Blindly increasing node count without diagnosing the root cause can lead to wasted resources and even degraded performance. Use Elasticsearch's built-in monitoring APIs to evaluate your cluster's health.
Start by checking the cluster health:
GET _cluster/health
Look for status values: green (all shards allocated), yellow (primary shards allocated, replicas not), or red (some primary shards unallocated). A persistent yellow or red status indicates underlying issues like insufficient disk space, memory pressure, or network instability.
Next, analyze node-level metrics:
GET _nodes/stats
Focus on:
- CPU usage: Sustained usage above 70-80% suggests compute bottlenecks.
- Heap memory usage: Elasticsearch nodes should operate below 50% heap usage. Exceeding this increases garbage collection pressure and the risk of OutOfMemoryError.
- Thread pools: Check for rejected tasks in the search, index, or bulk thread pools. Rejections indicate the cluster cannot keep up with demand.
- Indexing and search latency: High latency in write or read operations signals scaling needs.
- Disk I/O and usage: If disk utilization exceeds 85%, you risk node failure due to disk-full errors.
Use the _cat/nodes API for a quick summary:
GET _cat/nodes?v&h=name,heap.percent,ram.percent,cpu,load_1m,store.size,ip
Identify whether the bottleneck is CPU-bound, memory-bound, I/O-bound, or network-bound. This determines your scaling strategy: whether to add more data nodes, split out dedicated master nodes, or upgrade hardware.
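The triage above can be sketched as a small helper. This is a minimal illustration, not an official API: the flat metric names and the thresholds below are assumptions that mirror the guidelines in this section, and in practice you would extract the values from the GET _nodes/stats response yourself.

```python
# Sketch: classify a node's dominant bottleneck from metrics extracted out of
# GET _nodes/stats. The dict shape is illustrative, not the raw API response.

def classify_bottleneck(node):
    # Heap pressure is checked first: it degrades everything else via GC.
    if node["heap_used_percent"] > 50:
        return "memory-bound"
    if node["cpu_percent"] > 80:
        return "cpu-bound"
    if node["disk_used_percent"] > 85:
        return "io-bound"
    if node["search_rejections"] > 0 or node["write_rejections"] > 0:
        return "thread-pool saturated"
    return "healthy"

node = {
    "cpu_percent": 45,
    "heap_used_percent": 72,
    "disk_used_percent": 60,
    "search_rejections": 0,
    "write_rejections": 0,
}
print(classify_bottleneck(node))  # memory-bound
```

The check order encodes a judgment call: a node over its heap budget is treated as memory-bound even if CPU is also elevated, because GC pressure usually inflates the other symptoms.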
Define Your Scaling Goals
Scaling must align with business objectives. Common goals include:
- Increased throughput: Handle more concurrent searches or higher indexing rates.
- Lower latency: Reduce P95 or P99 search response times.
- High availability: Eliminate single points of failure by ensuring replica shards are distributed.
- Storage expansion: Accommodate growing data volume without performance degradation.
- Geographic distribution: Deploy nodes closer to users for lower network latency.
For example, if your application serves users across North America and Europe, consider deploying Elasticsearch clusters in multiple regions with cross-cluster replication (CCR) to serve local queries without cross-continent latency.
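P95 and P99 targets are easy to compute from raw latency samples, so you can verify a scaling goal against real measurements. A minimal nearest-rank sketch, assuming latencies are collected in milliseconds:

```python
import math

def percentile(samples_ms, p):
    # Nearest-rank percentile: sort, then take the ceil(p/100 * n)-th sample.
    ordered = sorted(samples_ms)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

# Ten illustrative search latencies; one slow outlier dominates the tail.
latencies = [120, 95, 110, 480, 130, 105, 90, 2400, 115, 100]
print(percentile(latencies, 50))  # 110
print(percentile(latencies, 95))  # 2400
```

Note how a single outlier drives P95 far above the median, which is why tail percentiles, not averages, should define latency goals.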
Plan Your Node Roles
Elasticsearch 7.0+ introduced dedicated node roles to improve stability and scalability. Each node can be assigned one or more roles:
- Master-eligible: Participates in cluster state management and leader election. Only 3-5 nodes should be master-eligible to avoid split-brain scenarios.
- Data: Stores shards and handles data-related operations (search, indexing, aggregation). These are the most commonly scaled nodes.
- Ingest: Processes documents before indexing (e.g., parsing, enrichment). Useful for heavy preprocessing workloads.
- Coordinating: Routes requests to data nodes and aggregates results. Can be deployed as dedicated nodes to offload coordination load from data nodes.
- ML (Machine Learning): Runs anomaly detection jobs. Requires significant memory and CPU.
Best practice: Use dedicated node roles in production. For example:
- 3 dedicated master-eligible nodes (small instance size)
- 6-12 dedicated data nodes (large instance size, high I/O)
- 2-4 dedicated ingest nodes (moderate CPU, sufficient RAM)
- 2-3 dedicated coordinating nodes (if search load is high)
Configure roles in elasticsearch.yml:
node.roles: [ master, data, ingest ]
Or for dedicated roles:
# Master node
node.roles: [ master ]
# Data node
node.roles: [ data ]
# Ingest node
node.roles: [ ingest ]
# Coordinating-only node
node.roles: []
Dedicated roles improve fault isolation and allow independent scaling of each function.
Choose the Right Instance Type
Cloud providers (AWS, Azure, GCP) and on-premises hardware offer varied instance types. Select based on your bottleneck:
- Memory-intensive workloads (e.g., aggregations, large result sets): Choose instances with high RAM-to-CPU ratios (e.g., AWS r6i.xlarge, Azure Standard_D8s_v5).
- Indexing-heavy workloads (e.g., log ingestion): Prioritize high I/O throughput. Use NVMe SSD instances (e.g., AWS i3.large, Azure Ls_v2).
- Search-heavy workloads (e.g., e-commerce product search): Optimize for CPU and RAM. Use balanced instances like AWS m6i.xlarge.
- Large datasets: Ensure sufficient local storage. Avoid EBS volumes for data nodes unless using provisioned IOPS; local SSDs offer superior performance.
Never use burstable instances (e.g., AWS t3) in production; they cannot sustain performance under load.
Scale Data Nodes Horizontally
Horizontal scaling (adding more data nodes) is the most effective way to increase capacity. Elasticsearch automatically rebalances shards across nodes when new ones join the cluster.
Before adding nodes:
- Ensure your index has enough primary shards. A single-shard index cannot scale beyond one node.
- Verify replica count is at least 1 for high availability.
Example: If you have 10 data nodes and a 5-shard index with 1 replica, you have 10 shards total (5 primary + 5 replica). Adding an 11th node triggers shard relocation, distributing the load.
Use the _cat/shards API to monitor shard distribution:
GET _cat/shards?v&h=index,shard,prirep,state,docs,store,node
After adding a node, monitor the cluster for relocation status:
GET _cluster/allocation/explain
This reveals why shards are or arent being moved. Common reasons include disk usage thresholds, shard allocation awareness, or insufficient disk space.
Optimize Shard Allocation
Shards are the basic unit of distribution in Elasticsearch. Too many small shards increase overhead; too few large shards limit scalability.
General guidelines:
- Keep shard size between 10-50 GB.
- Avoid shards larger than 100 GB.
- Limit total shards per node to 20-25 per GB of heap (e.g., a node with a 30GB heap should hold no more than roughly 600 shards).
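The shards-per-heap guideline above reduces to simple arithmetic. A sketch, using the conservative 20-shards-per-GB end of the range as the default:

```python
def max_shards_for_heap(heap_gb, shards_per_gb=20):
    # Rule of thumb: roughly 20-25 shards per GB of JVM heap per node.
    return heap_gb * shards_per_gb

def is_over_limit(current_shards, heap_gb):
    # True when a node carries more shards than the heuristic allows.
    return current_shards > max_shards_for_heap(heap_gb)

print(max_shards_for_heap(30))     # 600
print(is_over_limit(900, 30))      # True
```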
Use index lifecycle management (ILM) to automatically roll over indices when they reach size or age thresholds:
PUT _ilm/policy/my_policy
{
"policy": {
"phases": {
"hot": {
"actions": {
"rollover": {
"max_size": "50GB",
"max_age": "30d"
}
}
},
"warm": {
"actions": {
"allocate": {
"number_of_replicas": 1
}
}
},
"cold": {
"actions": {
"freeze": {}
}
}
}
}
}
Apply this policy to your index template:
PUT _index_template/my_template
{
"index_patterns": ["logs-*"],
"template": {
"settings": {
"number_of_shards": 5,
"number_of_replicas": 1,
"index.lifecycle.name": "my_policy"
}
}
}
ILM ensures indices roll over at predictable sizes, preventing shard explosion and enabling efficient scaling.
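With the policy above, an index rolls over when either the size or the age condition is met, whichever comes first. A quick sketch of that interaction, useful for capacity planning (the ingest rate is an input you would measure yourself):

```python
import math

def days_until_rollover(ingest_gb_per_day, max_size_gb=50, max_age_days=30):
    # Whichever ILM rollover condition (max_size or max_age) is hit first wins.
    if ingest_gb_per_day <= 0:
        return max_age_days
    days_to_size = math.ceil(max_size_gb / ingest_gb_per_day)
    return min(max_age_days, days_to_size)

print(days_until_rollover(10))  # 5  (size condition fires first)
print(days_until_rollover(1))   # 30 (age condition fires first)
```

If the size condition consistently fires within a day or two, consider raising max_size or adding primary shards so rollover frequency stays manageable.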
Adjust JVM Heap and OS Settings
Each node's performance is heavily influenced by JVM configuration. The heap should be set to no more than 50% of available RAM, with a maximum of 31 GB (so the JVM can keep using compressed object pointers).
Set heap size in jvm.options:
-Xms16g
-Xmx16g
For nodes with 64GB RAM, use a 31GB heap. For 128GB RAM, still use 31GB; don't exceed it.
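The sizing rule condenses to one line: half the RAM, capped at 31 GB. A sketch:

```python
def recommended_heap_gb(ram_gb):
    # Heap <= 50% of RAM, capped at 31 GB so compressed object pointers
    # stay enabled; the remaining RAM goes to the OS filesystem cache,
    # which Lucene relies on heavily.
    return min(ram_gb // 2, 31)

for ram in (16, 64, 128):
    print(f"{ram}GB RAM -> {recommended_heap_gb(ram)}GB heap")
```

The leftover RAM is not wasted: Lucene's performance depends on the filesystem cache, which is why heap never takes the whole machine.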
Also configure OS settings:
- Set vm.max_map_count to at least 262144:
sysctl -w vm.max_map_count=262144
- Disable swap entirely:
swapoff -a
Then remove or comment out any swap entries in /etc/fstab so it stays disabled after a reboot.
- Use the none or mq-deadline I/O scheduler for SSDs (exposed as noop and deadline on older kernels):
echo none > /sys/block/nvme0n1/queue/scheduler
Apply these settings using configuration management tools (Ansible, Puppet, Terraform) to ensure consistency across all nodes.
Scale Coordinating and Ingest Nodes Separately
As search volume increases, data nodes can become overwhelmed with request aggregation. Offload this to dedicated coordinating nodes.
Deploy 2-3 coordinating nodes with moderate CPU and RAM (e.g., 8-16GB). Do not assign them data or master roles.
Configure load balancers (e.g., HAProxy, NGINX, or cloud load balancers) to distribute client traffic to coordinating nodes, which then forward requests to data nodes.
Similarly, if you perform heavy document transformation (e.g., parsing JSON, enriching with external data), deploy dedicated ingest nodes. These should have sufficient CPU and memory to handle pipeline processing without blocking indexing.
Implement Cross-Cluster Replication (CCR) for Multi-Region Scaling
For global applications, deploying a single cluster across regions introduces latency and network fragility. Instead, use Cross-Cluster Replication (CCR) to replicate indices from a primary cluster to one or more follower clusters in different regions.
Example: Primary cluster in us-east-1, follower cluster in eu-west-1.
Steps:
- Register the leader cluster as a remote cluster on the follower:
PUT _cluster/settings
{
"persistent": {
"cluster.remote.cluster_one.seeds": ["10.0.1.10:9300"]
}
}
- Create a follower index:
PUT /my_follower_index/_ccr/follow
{
  "remote_cluster": "cluster_one",
  "leader_index": "my_leader_index"
}
- Optionally, create an auto-follow pattern so new leader indices matching a pattern are followed automatically:
PUT /_ccr/auto_follow/logs_pattern
{
  "remote_cluster": "cluster_one",
  "leader_index_patterns": ["logs-*"]
}
CCR enables low-latency local reads while maintaining data consistency. Use it for read-heavy, globally distributed applications.
Monitor Scaling Impact
After scaling, monitor key metrics for 24-72 hours:
- Search latency (P50, P95, P99)
- Indexing throughput (docs/sec)
- Shard relocation speed
- Heap usage trend
- GC frequency and duration
Use Elasticsearch's built-in monitoring (via Kibana) or integrate with Prometheus and Grafana:
- Export metrics via elasticsearch_exporter
- Visualize with dashboards for node health, shard distribution, and thread pool utilization
Set up alerts for:
- Heap usage > 75%
- Cluster status = red
- Search latency > 2s for 5 minutes
- Shard relocation stalled for > 1 hour
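The alert rules above are simple threshold checks, which makes them easy to encode and unit test before wiring them into a real alerting system. A sketch (the metric field names are illustrative assumptions, not an exporter's actual schema):

```python
def triggered_alerts(metrics):
    # Mirrors the four alert conditions listed above.
    rules = [
        ("heap usage > 75%", metrics["heap_percent"] > 75),
        ("cluster status red", metrics["status"] == "red"),
        ("search latency > 2s for 5 minutes",
         metrics["p99_latency_s"] > 2 and metrics["latency_breach_minutes"] >= 5),
        ("shard relocation stalled > 1 hour",
         metrics["relocation_stalled_minutes"] > 60),
    ]
    return [name for name, fired in rules if fired]

sample = {"heap_percent": 82, "status": "yellow", "p99_latency_s": 1.2,
          "latency_breach_minutes": 0, "relocation_stalled_minutes": 0}
print(triggered_alerts(sample))  # ['heap usage > 75%']
```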
Best Practices
Never Scale Without a Backup Strategy
Before scaling, ensure snapshots are configured and tested. Register a snapshot repository to back up critical indices to S3, Azure Blob, or HDFS:
PUT _snapshot/my_backup
{
"type": "s3",
"settings": {
"bucket": "my-es-backups",
"region": "us-east-1"
}
}
Test restore procedures regularly. Scaling operations can fail; having a known-good snapshot ensures recovery.
Use Index Templates for Consistent Configuration
Manually configuring indices leads to inconsistency. Use index templates to enforce shard count, replica count, mappings, and ILM policies:
PUT _index_template/logs_template
{
"index_patterns": ["logs-*"],
"template": {
"settings": {
"number_of_shards": 5,
"number_of_replicas": 1,
"index.lifecycle.name": "logs_policy"
},
"mappings": {
"properties": {
"timestamp": { "type": "date" },
"message": { "type": "text" }
}
}
}
}
This ensures every new log index is created with optimal settings for scaling.
Avoid Over-Sharding
Many teams create 100+ shards per index assuming more is better. This is false. Each shard consumes memory, file handles, and CPU cycles. A cluster with 10,000 shards may appear healthy but will suffer from:
- Slow cluster state updates
- Increased memory pressure
- Longer recovery times after restarts
Stick to 10-50 GB per shard. Use ILM to auto-rollover and avoid monolithic indices.
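Choosing a primary shard count for a known data volume is just division against the target shard size. A sketch, assuming a 30 GB midpoint target within the 10-50 GB band:

```python
import math

def target_primary_shards(expected_index_gb, target_shard_gb=30):
    # Divide expected index size by the per-shard target; never below 1.
    return max(1, math.ceil(expected_index_gb / target_shard_gb))

print(target_primary_shards(300))  # 10
print(target_primary_shards(12))   # 1
```

A 300 GB index lands at 10 primaries of ~30 GB each, while a 12 GB index needs only a single primary; a 100-shard layout for either would be pure overhead.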
Use Allocation Awareness for Fault Tolerance
Ensure shards are distributed across availability zones (AZs) or physical racks. Configure allocation awareness in elasticsearch.yml:
cluster.routing.allocation.awareness.attributes: az
node.attr.az: us-east-1a
Then set cluster-level awareness:
PUT _cluster/settings
{
"persistent": {
"cluster.routing.allocation.awareness.force.az.values": ["us-east-1a","us-east-1b","us-east-1c"],
"cluster.routing.allocation.awareness.attr": "az"
}
}
This prevents all replicas from being allocated in the same AZ, protecting against zone failures.
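You can audit the resulting layout yourself from shard-to-node and node-to-zone mappings (as reported by _cat/shards and node attributes). A minimal sketch with hypothetical node names:

```python
def single_zone_shards(shard_copies, node_az):
    # shard_copies: {shard_id: [node, ...]} listing primary + replica locations.
    # node_az: {node_name: availability_zone}.
    # Returns shard ids whose copies all sit in one zone (no zone-failure cover).
    at_risk = []
    for shard, nodes in shard_copies.items():
        zones = {node_az[n] for n in nodes}
        if len(nodes) > 1 and len(zones) < 2:
            at_risk.append(shard)
    return at_risk

node_az = {"node-1": "us-east-1a", "node-2": "us-east-1b", "node-3": "us-east-1a"}
copies = {0: ["node-1", "node-2"], 1: ["node-1", "node-3"]}
print(single_zone_shards(copies, node_az))  # [1]
```

Shard 1 is flagged because both its copies live in us-east-1a; losing that zone would take out the primary and the replica together.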
Regularly Rebalance and Optimize
Over time, shard distribution becomes uneven due to node additions, failures, or disk pressure. Use the _cluster/reroute API to manually rebalance if needed:
POST _cluster/reroute
{
"commands": [
{
"move": {
"index": "logs-2024-01",
"shard": 2,
"from_node": "node-1",
"to_node": "node-7"
}
}
]
}
Also, periodically run _forcemerge on read-only indices to reduce segment count and improve search performance:
POST /logs-2023-12/_forcemerge?max_num_segments=1
Run it only on indices that no longer receive writes; force merging is resource-intensive, and merging an actively written index wastes the work and can leave oversized segments.
Scale During Off-Peak Hours
Shard relocation consumes network bandwidth and disk I/O. Schedule node additions and major reconfigurations during low-traffic windows to minimize impact on end users.
Test Scaling in Staging First
Replicate your production topology in a staging environment. Load test with realistic data volumes and query patterns before applying changes to production.
Tools and Resources
Elasticsearch Built-in Tools
- Kibana Monitoring: Real-time cluster metrics, alerts, and visualization.
- Elasticsearch Head: Browser-based cluster explorer (community plugin).
- _cat APIs: Lightweight, human-readable output for diagnostics.
- Index Lifecycle Management (ILM): Automate index rollover, deletion, and tiering.
- Cluster Allocation Explain API: Diagnose why shards aren't allocated.
Third-Party Monitoring Tools
- Prometheus + Elasticsearch Exporter: Collect and scrape metrics for alerting and dashboards.
- Grafana: Visualize Elasticsearch metrics with customizable dashboards.
- Datadog: End-to-end observability with Elasticsearch integration.
- New Relic: Application performance monitoring with Elasticsearch tracing.
- ELK Stack (Elasticsearch, Logstash, Kibana): Full-stack logging and monitoring.
Automation and Infrastructure Tools
- Terraform: Provision Elasticsearch clusters on AWS, Azure, or GCP.
- Ansible: Configure OS and Elasticsearch settings across nodes.
- Docker + Kubernetes: Deploy Elasticsearch in containers using Helm charts (e.g., Elastic's official Helm chart).
- Curator: Automate index management (rollover, deletion, optimization).
Official Documentation and Communities
- Elasticsearch Official Documentation
- Elastic Community Forum
- GitHub Repository
- Elastic Blog: Regular updates on scaling, performance, and new features.
Real Examples
Example 1: E-Commerce Platform Scaling from 1M to 10M Daily Searches
A retail company experienced slow product search during peak hours. Their cluster had 3 nodes, each with 16GB RAM and 5 shards per index.
Diagnosis: Heap usage peaked at 85%, search latency exceeded 3s, and thread pools were rejecting requests.
Actions Taken:
- Added 6 dedicated data nodes (32GB RAM, NVMe SSD).
- Increased primary shards from 5 to 12 per index.
- Deployed 2 dedicated coordinating nodes.
- Implemented ILM to roll over indices at 30GB.
- Set shard size target to 25GB.
Result: Search latency dropped to 450ms, heap usage stabilized at 40%, and no more task rejections. Throughput increased 5x.
Example 2: Log Aggregation for 500+ Microservices
A fintech firm ingested logs from 500+ services, generating 2TB/day. Their cluster had 8 nodes, but shard count exceeded 8,000, causing instability.
Diagnosis: Cluster state updates took 10+ seconds. Master nodes were overloaded.
Actions Taken:
- Reduced shard count from 20 to 5 per daily log index.
- Implemented ILM to delete indices older than 90 days.
- Added 3 dedicated master nodes (8GB RAM, no data).
- Used ingest nodes to parse and enrich logs before indexing.
- Enabled index aliases for seamless rollover.
Result: Cluster state updates dropped to under 500ms. Stability improved dramatically. Storage costs reduced by 40% due to automated deletion.
Example 3: Global SaaS Application with Multi-Region Deployment
A SaaS company serving users in North America, Europe, and Asia had one cluster in us-west-2. Users in Asia experienced 2.5s search latency.
Actions Taken:
- Deployed a follower cluster in ap-northeast-1 (Tokyo).
- Configured CCR to replicate critical indices from us-west-2.
- Used DNS routing to direct Asian users to the Tokyo cluster.
- Set up cross-cluster search (CCS) for global queries when needed.
Result: Asian users saw sub-300ms search times. Global availability improved, and the primary cluster's load decreased by 30%.
FAQs
How many nodes do I need to scale Elasticsearch?
There's no fixed number. Start with 3 master-eligible nodes for stability. Add data nodes based on your data volume and query load. A typical medium-sized cluster has 6-12 data nodes. Large enterprises may run 50+ nodes.
Can I scale Elasticsearch vertically instead of horizontally?
You can, but horizontal scaling is preferred. Vertical scaling (upgrading node size) has limits: hardware maxes out, and losing one large node takes a bigger share of your capacity with it. Horizontal scaling offers better fault tolerance and incremental growth.
What happens if I add too many shards?
Too many shards increase memory usage, slow cluster state updates, and degrade performance. Elasticsearch recommends keeping total shards under 1,000 per node. Aim for 10-50 GB per shard.
How do I know if my cluster is over-sharded?
Signs include slow cluster health checks, high JVM heap usage despite low data volume, and frequent shard relocation. Use _cat/shards and count shards per node. If the average is over 100-150 per node, you're likely over-sharded.
Should I use SSDs or HDDs for Elasticsearch nodes?
Always use SSDs in production. Elasticsearch is I/O-intensive. HDDs cause severe latency spikes, especially during merges and searches. NVMe SSDs are ideal for high-throughput environments.
Can I scale Elasticsearch without downtime?
Yes. Adding data nodes, adjusting replicas, and enabling ILM can be done live. Avoid modifying master-eligible nodes or changing shard count on active indices without planning. Always test in staging first.
How does replication affect scaling?
Replicas increase storage requirements and indexing overhead, but they improve search performance and availability. For every 1 primary shard, you need additional storage for each replica. A 5-shard index with 1 replica uses 10 shards worth of storage. Balance replica count with your availability needs: 1 replica is standard; 2 is for critical systems.
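The storage arithmetic in that answer generalizes directly. A sketch:

```python
def replica_footprint(primaries, replicas, avg_shard_gb):
    # Every replica is a full copy of its primary shard, so total shard
    # count and storage both scale by (1 + replicas).
    total_shards = primaries * (1 + replicas)
    return total_shards, total_shards * avg_shard_gb

print(replica_footprint(5, 1, 25))  # (10, 250)
print(replica_footprint(5, 2, 25))  # (15, 375)
```

Going from 1 replica to 2 on a 5-primary index of 25 GB shards adds 125 GB; that cost is what you weigh against the extra availability.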
Whats the difference between cross-cluster search and cross-cluster replication?
Cross-cluster search (CCS) allows querying multiple remote clusters as if they were one. Data stays in place. Useful for querying historical data in separate clusters.
Cross-cluster replication (CCR) copies data from a leader cluster to a follower cluster. Data is physically replicated. Used for disaster recovery and low-latency regional reads.
How often should I monitor my Elasticsearch cluster?
Continuous monitoring is essential. Set up real-time alerts for critical metrics (heap, disk, latency). Review dashboards daily. Perform weekly capacity planning and monthly shard optimization.
Is Elasticsearch autoscaling possible?
Not natively. But you can automate scaling using infrastructure-as-code (Terraform) and monitoring triggers. For example, if heap usage exceeds 80% for 10 minutes, trigger a script to add a node via cloud APIs. Elastic Cloud, Elastic's own managed service, offers built-in autoscaling for some workloads.
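The sustained-threshold trigger described in that answer can be sketched as follows. This is an illustrative decision function only; in a real setup the samples would come from your monitoring system and the action would call your cloud provider's API:

```python
def should_add_node(heap_history, threshold=80, window=10):
    # heap_history: one heap-usage sample per minute, most recent last.
    # Fires only when every sample in the trailing window exceeds the
    # threshold, so a single spike never triggers a scale-out.
    if len(heap_history) < window:
        return False
    return all(sample > threshold for sample in heap_history[-window:])

print(should_add_node([85] * 10))        # True
print(should_add_node([85] * 9 + [70]))  # False (one sample recovered)
```

Requiring the whole window to breach is a deliberate debounce: shard relocations and GC spikes briefly inflate heap usage, and scaling on a single sample would thrash the cluster.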
Conclusion
Scaling Elasticsearch nodes is not a one-time task; it's an ongoing operational discipline. Success comes from understanding your workload, designing for resilience, and applying incremental, data-driven improvements. Start by diagnosing bottlenecks, then adopt dedicated node roles, optimize shard allocation, and leverage automation tools like ILM and CCR. Always prioritize stability over raw performance: a cluster that's slow but reliable is far better than one that's fast but crashes under load.
By following the practices outlined in this guide (monitoring rigorously, testing changes in staging, avoiding over-sharding, and using appropriate hardware) you'll build an Elasticsearch infrastructure that scales seamlessly with your business. Whether you're handling millions of search queries daily or ingesting terabytes of logs, the principles remain the same: plan, measure, adjust, and repeat.
Remember: The goal isn't just to add more nodes; it's to create a system that grows intelligently, remains stable under pressure, and delivers consistent performance to your users, no matter how large your data becomes.