How to Index Logs Into Elasticsearch
Indexing logs into Elasticsearch is a foundational practice for modern observability, security monitoring, and operational intelligence. As systems grow in complexity, spanning microservices, cloud infrastructure, containers, and distributed applications, managing and analyzing log data becomes critical for detecting anomalies, troubleshooting failures, and ensuring performance. Elasticsearch, part of the Elastic Stack (formerly the ELK Stack), is a powerful, scalable, real-time search and analytics engine designed for handling large volumes of semi-structured data like logs. When properly configured, Elasticsearch enables organizations to centralize, search, visualize, and alert on log events with speed and precision.
This tutorial provides a comprehensive, step-by-step guide to indexing logs into Elasticsearch. Whether you're managing application logs from Node.js, system logs from Linux servers, or container logs from Docker and Kubernetes, this guide covers the full lifecycle: from log collection and transformation to ingestion, mapping, and optimization. You'll learn best practices for structuring your data, selecting the right tools, avoiding common pitfalls, and scaling your logging infrastructure. By the end, you'll have a production-ready pipeline capable of handling thousands of log events per second with minimal latency and maximum reliability.
Step-by-Step Guide
1. Understand Your Log Sources and Formats
Before you begin indexing, identify all sources of log data. Common sources include:
- Application logs (e.g., JSON logs from Node.js, Python, Java)
- System logs (e.g., /var/log/syslog, /var/log/messages on Linux)
- Web server logs (e.g., Nginx, Apache access and error logs)
- Container logs (e.g., Docker, Kubernetes)
- Cloud service logs (e.g., AWS CloudTrail, Azure Monitor, GCP Logging)
Each source generates logs in a different format: plain text, JSON, CSV, or a proprietary format. For Elasticsearch to effectively index and query logs, they must be structured. Unstructured logs (e.g., free-form text) are harder to analyze and require additional parsing. JSON is the preferred format because it maps natively to Elasticsearch's document model. If your logs are not in JSON, you'll need to transform them during ingestion.
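If your logs arrive as plain text, a small transformation step can restructure them before (or during) ingestion. The sketch below assumes a simplified Nginx combined-style line format and uses field names matching the mapping used later in this guide; adjust the pattern to your real logs:

```python
import json
import re

# Simplified Nginx "combined"-style pattern; adjust to your real format.
NGINX_PATTERN = re.compile(
    r'(?P<client_ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<request_method>\S+) (?P<url>\S+) \S+" '
    r'(?P<status_code>\d{3}) (?P<bytes_sent>\d+)'
)

def to_json(line: str) -> str:
    """Convert one plain-text access-log line into a JSON document."""
    match = NGINX_PATTERN.match(line)
    if match is None:
        raise ValueError(f"unparseable line: {line!r}")
    doc = match.groupdict()
    doc["status_code"] = int(doc["status_code"])  # numeric, not string
    doc["bytes_sent"] = int(doc["bytes_sent"])
    return json.dumps(doc)

line = '203.0.113.7 - - [15/Jun/2024:10:23:45 +0000] "GET /index.html HTTP/1.1" 200 1534'
print(to_json(line))
```

In practice, this kind of parsing usually runs inside your shipper (grok in Logstash, dissect in Filebeat), but the principle is the same: emit structured fields, not free text.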
2. Choose a Log Collection Agent
To transport logs from your sources to Elasticsearch, you need a log collector. The most widely used tools are Filebeat, Fluentd, and Logstash. Each has strengths depending on your environment:
- Filebeat: Lightweight, written in Go, ideal for collecting logs from files on servers. Minimal resource usage. Best for simple, high-volume log shipping.
- Logstash: Feature-rich, written in Ruby. Supports complex filtering, parsing, and enrichment. Higher memory footprint. Best for transformation-heavy pipelines.
- Fluentd: Extensible, plugin-based, widely used in Kubernetes environments. Strong integration with cloud-native tools.
For most use cases, we recommend starting with Filebeat due to its simplicity and efficiency. It's developed by Elastic and integrates natively with Elasticsearch.
3. Install and Configure Filebeat
Install Filebeat on each machine generating logs. On Ubuntu/Debian:
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo gpg --dearmor -o /usr/share/keyrings/elasticsearch-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/elasticsearch-keyring.gpg] https://artifacts.elastic.co/packages/8.x/apt stable main" | sudo tee /etc/apt/sources.list.d/elastic-8.x.list
sudo apt update
sudo apt install filebeat
On CentOS/RHEL:
sudo rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch
sudo tee /etc/yum.repos.d/elastic-8.x.repo << 'EOF'
[elastic-8.x]
name=Elastic repository for 8.x packages
baseurl=https://artifacts.elastic.co/packages/8.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md
EOF
sudo yum install filebeat
After installation, configure Filebeat by editing /etc/filebeat/filebeat.yml. Below is a basic configuration to collect Nginx access logs:
filebeat.inputs:
  - type: filestream
    id: nginx-access
    enabled: true
    paths:
      - /var/log/nginx/access.log

output.elasticsearch:
  hosts: ["http://your-elasticsearch-host:9200"]
  username: "filebeat_system"
  password: "your-secure-password"

setup.template.enabled: true
setup.template.name: "nginx-logs"
setup.template.pattern: "nginx-logs-*"
setup.ilm.enabled: true
setup.ilm.rollover_alias: "nginx-logs"
setup.ilm.pattern: "{now/d}-000001"
setup.ilm.overwrite: true
Key configuration notes:
- filestream: Newer input type (Filebeat 7.9+) replacing the older log input, with better handling of file rotation and multi-line logs.
- paths: Specify the exact log file path. Use wildcards like /var/log/nginx/*.log for multiple files.
- output.elasticsearch: Point to your Elasticsearch cluster. Use HTTPS in production with TLS enabled.
- setup.ilm: Enables Index Lifecycle Management (ILM), which automates index rollover and deletion. Essential for production.
4. Configure Elasticsearch Index Templates
Elasticsearch uses index templates to define mappings, settings, and lifecycle policies for new indices. Without a template, Elasticsearch auto-detects field types, which can lead to incorrect mappings (e.g., treating a numeric field as a string).
Create a template named nginx-logs-template using the Elasticsearch REST API:
PUT _index_template/nginx-logs-template
{
  "index_patterns": ["nginx-logs-*"],
  "template": {
    "settings": {
      "number_of_shards": 3,
      "number_of_replicas": 1,
      "refresh_interval": "5s",
      "index.lifecycle.name": "nginx-logs-policy",
      "index.lifecycle.rollover_alias": "nginx-logs"
    },
    "mappings": {
      "properties": {
        "timestamp": {
          "type": "date",
          "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd'T'HH:mm:ss.SSSZ"
        },
        "client_ip": { "type": "ip" },
        "status_code": { "type": "short" },
        "request_method": { "type": "keyword" },
        "url": {
          "type": "text",
          "fields": {
            "keyword": { "type": "keyword", "ignore_above": 256 }
          }
        },
        "user_agent": {
          "type": "text",
          "fields": {
            "keyword": { "type": "keyword", "ignore_above": 256 }
          }
        },
        "bytes_sent": { "type": "long" },
        "response_time": { "type": "float" }
      }
    }
  },
  "priority": 500,
  "version": 1
}
This template ensures:
- client_ip is mapped as ip, enabling IP range and geolocation queries.
- status_code uses short to save storage.
- url and user_agent are both text (for full-text search) and keyword (for aggregations).
- The ILM policy is attached to automate index rollover.
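As an optional safeguard, you can mirror the template's field types in your own tooling and flag malformed events before they reach Elasticsearch and pollute dynamic mappings. This is a client-side sketch, not part of any Elasticsearch API; the field names come from the template in this step:

```python
# Client-side sanity check mirroring the index template's mappings:
# catch type drift (e.g. status_code arriving as the string "200")
# before indexing. Field names match the template in this guide.
EXPECTED_TYPES = {
    "client_ip": str,
    "status_code": int,
    "request_method": str,
    "bytes_sent": int,
    "response_time": (int, float),
}

def type_errors(doc: dict) -> list:
    """Return the fields whose Python types contradict the mapping."""
    errors = []
    for field, expected in EXPECTED_TYPES.items():
        if field in doc and not isinstance(doc[field], expected):
            errors.append(field)
    return errors

print(type_errors({"status_code": "200", "bytes_sent": 1534}))  # ['status_code']
```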
5. Set Up Index Lifecycle Management (ILM)
ILM automates the management of indices over time. It prevents your cluster from filling up with old logs and ensures performance by moving data to cheaper storage tiers.
Create an ILM policy named nginx-logs-policy:
PUT _ilm/policy/nginx-logs-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_size": "50GB",
            "max_age": "1d"
          }
        }
      },
      "warm": {
        "min_age": "7d",
        "actions": {
          "allocate": {
            "number_of_replicas": 0
          }
        }
      },
      "cold": {
        "min_age": "30d",
        "actions": {
          "readonly": {}
        }
      },
      "delete": {
        "min_age": "90d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}
This policy:
- Rolls over the index when it reaches 50GB or 1 day old.
- After 7 days, removes replicas to save space (warm phase).
- After 30 days, marks the index read-only (the freeze action used in older versions was removed in 8.x).
- After 90 days, deletes the index entirely.
Apply this policy to your index template as shown in Step 4.
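To sanity-check retention math, the phase transitions in this policy can be expressed as a small helper. This is a sketch based on the min_age values above; real ILM measures age from rollover, not index creation:

```python
def ilm_phase(age_days: float) -> str:
    """Return the lifecycle phase an index is in under nginx-logs-policy,
    given its age in days since rollover."""
    if age_days >= 90:
        return "delete"
    if age_days >= 30:
        return "cold"
    if age_days >= 7:
        return "warm"
    return "hot"

print(ilm_phase(3))   # hot
print(ilm_phase(45))  # cold
```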
6. Start and Test Filebeat
After configuration, start Filebeat:
sudo systemctl enable filebeat
sudo systemctl start filebeat
Verify it's running:
sudo systemctl status filebeat
Check Filebeat logs for errors (on systemd hosts, Filebeat logs to the journal):
sudo journalctl -u filebeat -f
To confirm logs are being indexed, query Elasticsearch:
GET nginx-logs-*/_search
{
  "size": 1,
  "sort": [
    { "timestamp": { "order": "desc" } }
  ]
}
If you receive a document with your log fields, indexing is successful.
7. Use Kibana for Visualization and Monitoring
Install Kibana on the same or a separate server:
sudo apt install kibana
sudo systemctl enable kibana
sudo systemctl start kibana
Access Kibana at http://your-server:5601. Navigate to Stack Management > Data Views (called Index Patterns in older Kibana versions) and create a data view matching nginx-logs-*. Select timestamp as the time field.
Now go to Discover to explore your logs in real time. Create visualizations (bar charts, line graphs, heatmaps) and dashboards to monitor request rates, error codes, and client geography.
8. Secure Your Pipeline
In production, never expose Elasticsearch to the public internet. Use:
- Authentication: Enable Elasticsearch's built-in security (X-Pack) and create users with least-privilege roles.
- Encryption: Enable TLS between Filebeat and Elasticsearch using certificates.
- Firewall rules: Restrict access to port 9200 to trusted IP ranges.
- Filebeat SSL settings: Add to filebeat.yml:
output.elasticsearch:
  hosts: ["https://your-elasticsearch-host:9200"]
  username: "filebeat_writer"
  password: "secure-password-here"
  ssl.certificate_authorities: ["/etc/filebeat/certs/ca.crt"]
  ssl.certificate: "/etc/filebeat/certs/filebeat.crt"
  ssl.key: "/etc/filebeat/certs/filebeat.key"
Generate certificates using OpenSSL or a certificate authority like Let's Encrypt.
Best Practices
1. Structure Logs as JSON Whenever Possible
JSON logs are self-describing and eliminate the need for complex grok patterns in Logstash or Filebeat. Most modern frameworks (e.g., Winston for Node.js, Log4j2 for Java) support JSON output natively. Example:
{
"timestamp": "2024-06-15T10:23:45.123Z",
"level": "error",
"message": "Database connection timeout",
"service": "user-auth",
"trace_id": "a1b2c3d4",
"user_id": 12345,
"ip": "192.168.1.10"
}
With JSON logs, Filebeat can decode fields directly into the Elasticsearch document, reducing parsing overhead: use the ndjson parser on the filestream input, or json.keys_under_root: true on the older log input.
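If your framework lacks a built-in JSON output, a formatter is straightforward to write by hand. This is a minimal stdlib-only Python sketch; the service field is a hypothetical attribute passed via extra:

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line (NDJSON)."""
    def format(self, record):
        doc = {
            "timestamp": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname.lower(),
            "message": record.getMessage(),
            "service": getattr(record, "service", None),  # hypothetical extra field
        }
        return json.dumps(doc)

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("demo")
logger.addHandler(handler)
logger.error("Database connection timeout", extra={"service": "user-auth"})
```

Each record comes out as one JSON object per line, which Filebeat can ship without any grok parsing.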
2. Avoid Over-Mapping
Dont map every possible field upfront. Start with essential fields needed for querying and visualization. Add fields as needed. Over-mapping increases memory usage and slows down indexing.
3. Use Keywords for Aggregations, Text for Search
Always use keyword for fields youll use in filters, terms aggregations, or sorting (e.g., status_code, service_name). Use text only for full-text search (e.g., message, stack_trace). Use multi-fields to support both:
"message": {
  "type": "text",
  "fields": {
    "keyword": {
      "type": "keyword"
    }
  }
}
4. Implement Log Rotation and Filebeat's File State Tracking
Filebeat tracks which lines it has read using a registry (/var/lib/filebeat/registry). This prevents duplicate indexing when logs are rotated or the agent restarts. Filebeat handles rename-based rotation (the logrotate default) natively; be cautious with the copytruncate option, as lines written between the copy and the truncate can be lost.
5. Monitor Filebeat and Elasticsearch Health
Use Prometheus and Grafana to monitor:
- Filebeat: Events sent, events failed, backlog size
- Elasticsearch: Cluster health, indexing rate, JVM heap usage, segment count
Enable Filebeat's internal monitoring:
monitoring.enabled: true
monitoring.elasticsearch:
  hosts: ["http://your-elasticsearch:9200"]
Then visualize metrics in Kibana under Monitoring.
6. Dont Index Everything
Not all logs are equally valuable. Filter out noisy logs (e.g., health checks, debug messages) before ingestion. Use Filebeat's processors to drop events:
processors:
  - drop_event:
      when:
        contains:
          message: "GET /health"
Or use Logstash's if conditions for complex logic.
7. Scale with Multiple Nodes and Shards
For high-volume logging (10K+ events/sec), deploy Elasticsearch as a cluster with dedicated master, data, and ingest nodes. As a starting point, aim for one to two primary shards per data node, and avoid shards larger than 50GB. Too many small shards hurt performance; too few limit scalability.
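A rough way to turn these rules of thumb into numbers (a sketch; the 50GB ceiling is the guideline above, and the daily volume is an assumption you must measure for your own cluster):

```python
import math

def shard_plan(daily_gb: float, retention_days: int, max_shard_gb: int = 50) -> dict:
    """Rough primary-shard count that keeps each shard under max_shard_gb."""
    total_gb = daily_gb * retention_days
    primaries = max(1, math.ceil(total_gb / max_shard_gb))
    return {"total_gb": total_gb, "primary_shards": primaries}

print(shard_plan(daily_gb=20, retention_days=90))  # 1800 GB -> 36 primaries
```

With daily ILM rollover, those primaries are spread across many time-based indices rather than configured on a single index.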
8. Regularly Audit and Clean Up Indices
Use the ILM policy to automate deletion. Monitor index growth with:
GET _cat/indices?v&h=index,docs.count,store.size,pri.store.size
Manually delete orphaned or misnamed indices with:
DELETE /old-logs-2023-*
Tools and Resources
Core Tools
- Elasticsearch: The search and analytics engine. Download at elastic.co/downloads/elasticsearch
- Filebeat: Lightweight log shipper. elastic.co/beats/filebeat
- Kibana: Visualization and management UI. elastic.co/downloads/kibana
- Logstash: For advanced log transformation. elastic.co/downloads/logstash
- Fluent Bit: Lightweight alternative to Fluentd for Kubernetes. fluentbit.io
Helper Tools
- Grok Debugger: Online tool to build and test grok patterns for log formats: grokdebug.herokuapp.com
- Elasticsearch ingest attachment processor: For extracting text from binary documents (e.g., PDFs) at ingest time; it replaces the deprecated mapper attachments plugin.
- OpenSearch: Open-source fork of Elasticsearch with its own dashboards (OpenSearch Dashboards). Note that recent Beats releases ship only to Elasticsearch; OSS Beats 7.12 and earlier work with OpenSearch. opensearch.org
- Vector.dev: High-performance, Rust-based log processor. vector.dev
Documentation and Learning
- Elasticsearch Reference: elastic.co/guide/en/elasticsearch/reference/current
- Filebeat Modules: Pre-built configurations for common logs (Nginx, Apache, MySQL, etc.): elastic.co/guide/en/beats/filebeat/current/filebeat-modules.html
- ELK Stack Best Practices Whitepaper: Available from Elastics resources portal.
- YouTube Channels: Elastics official channel and DevOps with Alex offer practical tutorials.
Community and Support
- Elastic Discuss Forum: discuss.elastic.co
- Stack Overflow: Tag questions with elasticsearch and filebeat.
- GitHub Repositories: Search for "elasticsearch log pipeline" for open-source examples.
Real Examples
Example 1: Indexing Docker Container Logs
When running containers with Docker, logs are stored at /var/lib/docker/containers/*/*-json.log. Filebeat can monitor this directory:
filebeat.inputs:
  - type: filestream
    id: docker-logs
    enabled: true
    paths:
      - /var/lib/docker/containers/*/*-json.log
    parsers:
      - ndjson:
          target: ""
          add_error_key: true
          message_key: log

processors:
  - add_docker_metadata:
      host: "unix:///var/run/docker.sock"
      match_fields: ["container.id"]

output.elasticsearch:
  hosts: ["https://es-cluster:9200"]
  index: "docker-logs-%{+yyyy.MM.dd}"
  username: "filebeat"
  password: "secret"
  ssl.certificate_authorities: ["/etc/filebeat/certs/ca.crt"]
The add_docker_metadata processor enriches logs with container name, image, labels, and network info. This enables queries like:
GET docker-logs-*/_search
{
  "query": {
    "match": {
      "container.name": "nginx-proxy"
    }
  }
}
Example 2: Kubernetes Logs with Fluent Bit
In Kubernetes, Fluent Bit can be deployed as a DaemonSet to collect logs from all nodes:
[INPUT]
Name tail
Tag kube.*
Path /var/log/containers/*.log
Parser docker
DB /var/log/flb_kube.db
Mem_Buf_Limit 5MB
Skip_Long_Lines On
Refresh_Interval 10
[PARSER]
Name docker
Format json
Time_Key time
Time_Format %Y-%m-%dT%H:%M:%S.%L
Time_Keep On
Decode_Field_As escaped log
[OUTPUT]
Name es
Match *
Host elasticsearch.default.svc.cluster.local
Port 9200
Logstash_Format On
Logstash_Prefix kube-logs
Retry_Limit False
tls On
tls.verify Off
This sends logs to Elasticsearch with index names like kube-logs-2024.06.15. Combine with Kubernetes labels to filter logs by namespace or pod.
Example 3: Application Logs from Node.js with Winston
In a Node.js app, configure Winston to output structured logs:
const { createLogger, format, transports } = require('winston');
const { combine, timestamp, json } = format;

const logger = createLogger({
  level: 'info',
  format: combine(
    timestamp(),
    json()
  ),
  transports: [
    new transports.File({ filename: 'app.log' })
  ]
});

logger.info('User logged in', { userId: 123, ip: '192.168.1.1' });
Then configure Filebeat to read app.log, decoding each line with the ndjson parser. The resulting Elasticsearch document will have top-level fields: timestamp, level, message, userId, and ip.
Example 4: Centralized Logging for Microservices
For 50+ microservices, use a centralized Filebeat deployment with dynamic configuration:
- Each service writes logs to a dedicated file: /var/log/services/service-a.log
- Use Filebeat's filestream input with glob patterns: /var/log/services/*.log
- Use a processor to add a service_name field based on the filename:
processors:
  - add_fields:
      target: ''
      fields:
        service_name: "service-a"
      when:
        contains:
          log.file.path: "service-a.log"
  - add_fields:
      target: ''
      fields:
        service_name: "service-b"
      when:
        contains:
          log.file.path: "service-b.log"
Then create a single index template for all services and use Kibana dashboards grouped by service_name.
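One add_fields block per service grows linearly with your service count. The naming logic itself is trivial, as this Python sketch shows; inside Filebeat, a dissect processor on the file path could achieve the same effect without per-service configuration:

```python
from pathlib import Path

def service_name(log_path: str) -> str:
    """Derive a service_name from a log file path:
    /var/log/services/service-a.log -> service-a."""
    return Path(log_path).stem

print(service_name("/var/log/services/service-a.log"))  # service-a
```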
FAQs
Can I index logs into Elasticsearch without using Filebeat or Logstash?
Yes. You can write logs directly to Elasticsearch via HTTP POST requests using any programming language. For example, in Python:
import requests
import json
log_entry = {
"timestamp": "2024-06-15T10:23:45Z",
"message": "Application started",
"service": "auth-service"
}
response = requests.post(
"http://elasticsearch:9200/app-logs/_doc",
data=json.dumps(log_entry),
headers={"Content-Type": "application/json"}
)
However, this approach lacks reliability (no retry, no backpressure), file rotation handling, or security. Use only for testing or small-scale scripts.
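If you do write directly, prefer the _bulk API over per-document POSTs: it amortizes HTTP overhead and is what Filebeat itself uses. The sketch below only builds the newline-delimited payload (the index name is illustrative); you would POST it to /_bulk with Content-Type: application/x-ndjson:

```python
import json

def bulk_body(index: str, docs: list) -> str:
    """Build an NDJSON body for the Elasticsearch _bulk API: one action
    line followed by one document line per event, newline-terminated."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"

body = bulk_body("app-logs", [
    {"timestamp": "2024-06-15T10:23:45Z", "message": "Application started"},
    {"timestamp": "2024-06-15T10:23:46Z", "message": "Listening on :8080"},
])
print(body)
```

Even with bulk requests, you still lack the retry, backpressure, and file-state handling that a shipper provides.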
How do I handle multi-line logs (e.g., Java stack traces)?
Use the multiline parser on Filebeat's filestream input:
filebeat.inputs:
  - type: filestream
    paths:
      - /var/log/myapp/*.log
    parsers:
      - multiline:
          type: pattern
          pattern: '^[[:space:]]'
          match: after
          negate: false
          max_lines: 500
This combines lines starting with whitespace (e.g., stack trace lines) into a single event. Alternatively, use Logstash's multiline codec.
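To see what that multiline rule does, the grouping behavior can be simulated in a few lines of Python (a sketch of the behavior, not Filebeat's actual implementation):

```python
import re

CONTINUATION = re.compile(r'^\s')  # continuation lines start with whitespace

def group_events(lines):
    """Merge indented continuation lines into the preceding event,
    mimicking pattern '^[[:space:]]' with negate: false, match: after."""
    events = []
    for line in lines:
        if CONTINUATION.match(line) and events:
            events[-1] += "\n" + line
        else:
            events.append(line)
    return events

raw = [
    "ERROR Something failed",
    "    at com.example.Foo(Foo.java:42)",
    "    at com.example.Bar(Bar.java:7)",
    "INFO Next event",
]
print(len(group_events(raw)))  # 2
```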
How much disk space do logs consume in Elasticsearch?
It depends on your log volume and compression. On average, JSON logs compress to 10-30% of their original size, so a 1GB raw log file may occupy roughly 150-300MB in Elasticsearch. Use ILM to delete old logs and consider hot-warm-cold architectures to reduce costs.
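A back-of-the-envelope estimator makes these numbers concrete (a sketch; the compression ratio and replica count are assumptions to replace with your own measurements):

```python
def es_storage_gb(raw_gb_per_day: float, retention_days: int,
                  compression: float = 0.2, replicas: int = 1) -> float:
    """Rough on-disk estimate: compressed primary data plus replica copies.
    The 20% default sits inside the 10-30% range quoted above."""
    primary_gb = raw_gb_per_day * retention_days * compression
    return primary_gb * (1 + replicas)

print(round(es_storage_gb(1.0, 90), 1))  # 36.0 GB at 20% compression, 1 replica
```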
Can I use Elasticsearch to index logs from Windows servers?
Yes. Install Filebeat on Windows and configure it to monitor C:\ProgramData\MyApp\logs\*.log using the same configuration syntax. For Windows event logs, use Winlogbeat, the dedicated event-log shipper.
Whats the difference between an index and an index pattern in Kibana?
An index is a physical data structure in Elasticsearch (e.g., nginx-logs-000001). An index pattern is a Kibana configuration that defines which indices to include in a view (e.g., nginx-logs-*). You use index patterns to build dashboards and queries across multiple indices.
Why is my Elasticsearch cluster slow when querying logs?
Common causes:
- Too many shards per index
- Large shards (>50GB)
- Missing keyword fields for aggregations
- High JVM heap usage
- Insufficient RAM or CPU on data nodes
Check the Elasticsearch cluster health API and use the Search Profiler in Kibana's Dev Tools to analyze slow queries.
Do I need to restart Filebeat after changing the config?
Yes. After modifying filebeat.yml, restart the service:
sudo systemctl restart filebeat
Use filebeat test config to validate syntax before restarting.
Conclusion
Indexing logs into Elasticsearch is not just a technical task; it's a strategic investment in your system's visibility, resilience, and performance. By following the steps outlined in this guide, you've built a scalable, secure, and automated log pipeline that transforms raw log data into actionable intelligence. From selecting the right collector to enforcing strict mappings and lifecycle policies, each decision impacts how effectively you can detect issues, respond to incidents, and optimize infrastructure.
Remember: the goal is not to collect every log, but to collect the right logs, in the right format, at the right time. Start simple, validate your pipeline with real data, and iterate. Use templates and automation to eliminate manual configuration. Monitor your pipeline relentlessly: logs are only useful if they're available, accurate, and searchable.
As your infrastructure evolves, whether moving to serverless, expanding to multi-cloud, or adopting AI-driven anomaly detection, your logging architecture must scale with it. Elasticsearch, paired with Filebeat and Kibana, provides the foundation for that evolution. Keep refining your templates, update your ILM policies, and embrace structured logging as a core engineering practice.
With this knowledge, you're no longer just collecting logs; you're building the nervous system of your digital operations. And that's the hallmark of a truly observability-driven organization.