How to Index Logs Into Elasticsearch
Indexing logs into Elasticsearch is a foundational practice for modern observability, security monitoring, and operational intelligence. As systems grow in complexity, spanning microservices, cloud infrastructure, containers, and distributed applications, managing and analyzing log data becomes critical for detecting anomalies, troubleshooting failures, and ensuring performance. Elasticsearch, part of the Elastic Stack (formerly the ELK Stack), is a powerful, scalable, real-time search and analytics engine designed for handling large volumes of semi-structured data like logs. When properly configured, Elasticsearch enables organizations to centralize, search, visualize, and alert on log events with speed and precision.
This tutorial provides a comprehensive, step-by-step guide to indexing logs into Elasticsearch. Whether you're managing application logs from Node.js, system logs from Linux servers, or container logs from Docker and Kubernetes, this guide covers the full lifecycle: from log collection and transformation to ingestion, mapping, and optimization. You'll learn best practices for structuring your data, selecting the right tools, avoiding common pitfalls, and scaling your logging infrastructure. By the end, you'll have a production-ready pipeline capable of handling thousands of log events per second with minimal latency and maximum reliability.
Step-by-Step Guide
1. Understand Your Log Sources and Formats
Before you begin indexing, identify all sources of log data. Common sources include:
- Application logs (e.g., JSON logs from Node.js, Python, Java)
- System logs (e.g., /var/log/syslog, /var/log/messages on Linux)
- Web server logs (e.g., Nginx, Apache access and error logs)
- Container logs (e.g., Docker, Kubernetes)
- Cloud service logs (e.g., AWS CloudTrail, Azure Monitor, GCP Logging)
Each source generates logs in a different format: plain text, JSON, CSV, or a proprietary format. For Elasticsearch to effectively index and query logs, they must be structured. Unstructured logs (e.g., free-form text) are harder to analyze and require additional parsing. JSON is the preferred format because it maps natively to Elasticsearch's document model. If your logs are not in JSON, you'll need to transform them during ingestion.
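If your logs arrive as plain text, a small transformation step can restructure them before (or during) ingestion. The sketch below assumes a simplified Nginx combined-style line format and uses field names matching the mapping used later in this guide; adjust the pattern to your real logs:

```python
import json
import re

# Simplified Nginx "combined"-style pattern; adjust to your real format.
NGINX_PATTERN = re.compile(
    r'(?P<client_ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<request_method>\S+) (?P<url>\S+) \S+" '
    r'(?P<status_code>\d{3}) (?P<bytes_sent>\d+)'
)

def to_json(line: str) -> str:
    """Convert one plain-text access-log line into a JSON document."""
    match = NGINX_PATTERN.match(line)
    if match is None:
        raise ValueError(f"unparseable line: {line!r}")
    doc = match.groupdict()
    doc["status_code"] = int(doc["status_code"])  # numeric, not string
    doc["bytes_sent"] = int(doc["bytes_sent"])
    return json.dumps(doc)

line = '203.0.113.7 - - [15/Jun/2024:10:23:45 +0000] "GET /index.html HTTP/1.1" 200 1534'
print(to_json(line))
```

In practice, this kind of parsing usually runs inside your shipper (grok in Logstash, dissect in Filebeat), but the principle is the same: emit structured fields, not free text.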
2. Choose a Log Collection Agent
To transport logs from your sources to Elasticsearch, you need a log collector. The most widely used tools are Filebeat, Fluentd, and Logstash. Each has strengths depending on your environment:
- Filebeat: Lightweight, written in Go, ideal for collecting logs from files on servers. Minimal resource usage. Best for simple, high-volume log shipping.
- Logstash: Feature-rich, written in Ruby. Supports complex filtering, parsing, and enrichment. Higher memory footprint. Best for transformation-heavy pipelines.
- Fluentd: Extensible, plugin-based, widely used in Kubernetes environments. Strong integration with cloud-native tools.
For most use cases, we recommend starting with Filebeat due to its simplicity and efficiency. It's developed by Elastic and integrates natively with Elasticsearch.
3. Install and Configure Filebeat
Install Filebeat on each machine generating logs. On Ubuntu/Debian:
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo gpg --dearmor -o /usr/share/keyrings/elasticsearch-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/elasticsearch-keyring.gpg] https://artifacts.elastic.co/packages/8.x/apt stable main" | sudo tee /etc/apt/sources.list.d/elastic-8.x.list
sudo apt update
sudo apt install filebeat
On CentOS/RHEL:
sudo rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch
sudo tee /etc/yum.repos.d/elastic-8.x.repo << 'EOF'
[elastic-8.x]
name=Elastic repository for 8.x packages
baseurl=https://artifacts.elastic.co/packages/8.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md
EOF
sudo yum install filebeat
After installation, configure Filebeat by editing /etc/filebeat/filebeat.yml. Below is a basic configuration to collect Nginx access logs:
filebeat.inputs:
  - type: filestream
    id: nginx-access
    enabled: true
    paths:
      - /var/log/nginx/access.log

output.elasticsearch:
  hosts: ["http://your-elasticsearch-host:9200"]
  username: "filebeat_system"
  password: "your-secure-password"

setup.template.enabled: true
setup.template.name: "nginx-logs"
setup.template.pattern: "nginx-logs-*"
setup.ilm.enabled: true
setup.ilm.rollover_alias: "nginx-logs"
setup.ilm.pattern: "{now/d}-000001"
setup.ilm.overwrite: true
Key configuration notes:
- filestream: Newer input type (Filebeat 7.9+) replacing the older log input, with better handling of file rotation and multi-line logs.
- paths: Specify the exact log file path. Use wildcards like /var/log/nginx/*.log for multiple files.
- output.elasticsearch: Point to your Elasticsearch cluster. Use HTTPS in production with TLS enabled.
- setup.ilm: Enables Index Lifecycle Management (ILM), which automates index rollover and deletion. Essential for production.
4. Configure Elasticsearch Index Templates
Elasticsearch uses index templates to define mappings, settings, and lifecycle policies for new indices. Without a template, Elasticsearch auto-detects field types, which can lead to incorrect mappings (e.g., treating a numeric field as a string).
Create a template named nginx-logs-template using the Elasticsearch REST API:
PUT _index_template/nginx-logs-template
{
  "index_patterns": ["nginx-logs-*"],
  "template": {
    "settings": {
      "number_of_shards": 3,
      "number_of_replicas": 1,
      "refresh_interval": "5s",
      "index.lifecycle.name": "nginx-logs-policy",
      "index.lifecycle.rollover_alias": "nginx-logs"
    },
    "mappings": {
      "properties": {
        "timestamp": {
          "type": "date",
          "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd'T'HH:mm:ss.SSSZ"
        },
        "client_ip": { "type": "ip" },
        "status_code": { "type": "short" },
        "request_method": { "type": "keyword" },
        "url": {
          "type": "text",
          "fields": {
            "keyword": { "type": "keyword", "ignore_above": 256 }
          }
        },
        "user_agent": {
          "type": "text",
          "fields": {
            "keyword": { "type": "keyword", "ignore_above": 256 }
          }
        },
        "bytes_sent": { "type": "long" },
        "response_time": { "type": "float" }
      }
    }
  },
  "priority": 500,
  "version": 1
}
This template ensures:
- client_ip is mapped as ip, enabling IP range and geolocation queries.
- status_code uses short to save storage.
- url and user_agent are both text (for full-text search) and keyword (for aggregations).
- The ILM policy is attached to automate index rollover.
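As an optional safeguard, you can mirror the template's field types in your own tooling and flag malformed events before they reach Elasticsearch and pollute dynamic mappings. This is a client-side sketch, not part of any Elasticsearch API; the field names come from the template in this step:

```python
# Client-side sanity check mirroring the index template's mappings:
# catch type drift (e.g. status_code arriving as the string "200")
# before indexing. Field names match the template in this guide.
EXPECTED_TYPES = {
    "client_ip": str,
    "status_code": int,
    "request_method": str,
    "bytes_sent": int,
    "response_time": (int, float),
}

def type_errors(doc: dict) -> list:
    """Return the fields whose Python types contradict the mapping."""
    errors = []
    for field, expected in EXPECTED_TYPES.items():
        if field in doc and not isinstance(doc[field], expected):
            errors.append(field)
    return errors

print(type_errors({"status_code": "200", "bytes_sent": 1534}))  # ['status_code']
```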
5. Set Up Index Lifecycle Management (ILM)
ILM automates the management of indices over time. It prevents your cluster from filling up with old logs and ensures performance by moving data to cheaper storage tiers.
Create an ILM policy named nginx-logs-policy:
PUT _ilm/policy/nginx-logs-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_size": "50GB",
            "max_age": "1d"
          }
        }
      },
      "warm": {
        "min_age": "7d",
        "actions": {
          "allocate": {
            "number_of_replicas": 0
          }
        }
      },
      "cold": {
        "min_age": "30d",
        "actions": {
          "readonly": {}
        }
      },
      "delete": {
        "min_age": "90d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}
This policy:
- Rolls over the index when it reaches 50GB or 1 day old.
- After 7 days, removes replicas to save space (warm phase).
- After 30 days, marks the index read-only (the freeze action used in older versions was removed in 8.x).
- After 90 days, deletes the index entirely.
Apply this policy to your index template as shown in Step 4.
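To sanity-check retention math, the phase transitions in this policy can be expressed as a small helper. This is a sketch based on the min_age values above; real ILM measures age from rollover, not index creation:

```python
def ilm_phase(age_days: float) -> str:
    """Return the lifecycle phase an index is in under nginx-logs-policy,
    given its age in days since rollover."""
    if age_days >= 90:
        return "delete"
    if age_days >= 30:
        return "cold"
    if age_days >= 7:
        return "warm"
    return "hot"

print(ilm_phase(3))   # hot
print(ilm_phase(45))  # cold
```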
6. Start and Test Filebeat
After configuration, start Filebeat:
sudo systemctl enable filebeat
sudo systemctl start filebeat
Verify it's running:
sudo systemctl status filebeat
Check Filebeat logs for errors (on systemd hosts, Filebeat logs to the journal):
sudo journalctl -u filebeat -f
To confirm logs are being indexed, query Elasticsearch:
GET nginx-logs-*/_search
{
  "size": 1,
  "sort": [
    { "timestamp": { "order": "desc" } }
  ]
}
If you receive a document with your log fields, indexing is successful.
7. Use Kibana for Visualization and Monitoring
Install Kibana on the same or a separate server:
sudo apt install kibana
sudo systemctl enable kibana
sudo systemctl start kibana
Access Kibana at http://your-server:5601. Navigate to Stack Management > Data Views (called Index Patterns in older Kibana versions) and create a data view matching nginx-logs-*. Select timestamp as the time field.
Now go to Discover to explore your logs in real time. Create visualizations (bar charts, line graphs, heatmaps) and dashboards to monitor request rates, error codes, and client geography.
8. Secure Your Pipeline
In production, never expose Elasticsearch to the public internet. Use:
- Authentication: Enable Elasticsearch's built-in security (X-Pack) and create users with least-privilege roles.
- Encryption: Enable TLS between Filebeat and Elasticsearch using certificates.
- Firewall rules: Restrict access to port 9200 to trusted IP ranges.
- Filebeat SSL settings: Add to filebeat.yml:
output.elasticsearch:
  hosts: ["https://your-elasticsearch-host:9200"]
  username: "filebeat_writer"
  password: "secure-password-here"
  ssl.certificate_authorities: ["/etc/filebeat/certs/ca.crt"]
  ssl.certificate: "/etc/filebeat/certs/filebeat.crt"
  ssl.key: "/etc/filebeat/certs/filebeat.key"
Generate certificates using OpenSSL or a certificate authority like Let's Encrypt.
Best Practices
1. Structure Logs as JSON Whenever Possible
JSON logs are self-describing and eliminate the need for complex grok patterns in Logstash or Filebeat. Most modern frameworks (e.g., Winston for Node.js, Log4j2 for Java) support JSON output natively. Example:
{
"timestamp": "2024-06-15T10:23:45.123Z",
"level": "error",
"message": "Database connection timeout",
"service": "user-auth",
"trace_id": "a1b2c3d4",
"user_id": 12345,
"ip": "192.168.1.10"
}
With JSON logs, Filebeat can decode fields directly into the Elasticsearch document, reducing parsing overhead: use the ndjson parser on the filestream input, or json.keys_under_root: true on the older log input.
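If your framework lacks a built-in JSON output, a formatter is straightforward to write by hand. This is a minimal stdlib-only Python sketch; the service field is a hypothetical attribute passed via extra:

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line (NDJSON)."""
    def format(self, record):
        doc = {
            "timestamp": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname.lower(),
            "message": record.getMessage(),
            "service": getattr(record, "service", None),  # hypothetical extra field
        }
        return json.dumps(doc)

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("demo")
logger.addHandler(handler)
logger.error("Database connection timeout", extra={"service": "user-auth"})
```

Each record comes out as one JSON object per line, which Filebeat can ship without any grok parsing.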
2. Avoid Over-Mapping
Dont map every possible field upfront. Start with essential fields needed for querying and visualization. Add fields as needed. Over-mapping increases memory usage and slows down indexing.
3. Use Keywords for Aggregations, Text for Search
Always use keyword for fields youll use in filters, terms aggregations, or sorting (e.g., status_code, service_name). Use text only for full-text search (e.g., message, stack_trace). Use multi-fields to support both:
"message": {
  "type": "text",
  "fields": {
    "keyword": {
      "type": "keyword"
    }
  }
}
4. Implement Log Rotation and Filebeat's File State Tracking
Filebeat tracks which lines it has read using a registry (/var/lib/filebeat/registry). This prevents duplicate indexing when logs are rotated or the agent restarts. Filebeat handles rename-based rotation (the logrotate default) natively; be cautious with the copytruncate option, as lines written between the copy and the truncate can be lost.
5. Monitor Filebeat and Elasticsearch Health
Use Prometheus and Grafana to monitor:
- Filebeat: Events sent, events failed, backlog size
- Elasticsearch: Cluster health, indexing rate, JVM heap usage, segment count
Enable Filebeat's internal monitoring:
monitoring.enabled: true
monitoring.elasticsearch:
  hosts: ["http://your-elasticsearch:9200"]
Then visualize metrics in Kibana under Monitoring.
6. Dont Index Everything
Not all logs are equally valuable. Filter out noisy logs (e.g., health checks, debug messages) before ingestion. Use Filebeat's processors to drop events:
processors:
  - drop_event:
      when:
        contains:
          message: "GET /health"
Or use Logstash's if conditions for complex logic.
7. Scale with Multiple Nodes and Shards
For high-volume logging (10K+ events/sec), deploy Elasticsearch as a cluster with dedicated master, data, and ingest nodes. As a starting point, aim for one to two primary shards per data node, and avoid shards larger than 50GB. Too many small shards hurt performance; too few limit scalability.
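A rough way to turn these rules of thumb into numbers (a sketch; the 50GB ceiling is the guideline above, and the daily volume is an assumption you must measure for your own cluster):

```python
import math

def shard_plan(daily_gb: float, retention_days: int, max_shard_gb: int = 50) -> dict:
    """Rough primary-shard count that keeps each shard under max_shard_gb."""
    total_gb = daily_gb * retention_days
    primaries = max(1, math.ceil(total_gb / max_shard_gb))
    return {"total_gb": total_gb, "primary_shards": primaries}

print(shard_plan(daily_gb=20, retention_days=90))  # 1800 GB -> 36 primaries
```

With daily ILM rollover, those primaries are spread across many time-based indices rather than configured on a single index.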
8. Regularly Audit and Clean Up Indices
Use the ILM policy to automate deletion. Monitor index growth with:
GET _cat/indices?v&h=index,docs.count,store.size,pri.store.size
Manually delete orphaned or misnamed indices with:
DELETE /old-logs-2023-*
Tools and Resources
Core Tools
- Elasticsearch: The search and analytics engine. Download at elastic.co/downloads/elasticsearch
- Filebeat: Lightweight log shipper. elastic.co/beats/filebeat
- Kibana: Visualization and management UI. elastic.co/downloads/kibana
- Logstash: For advanced log transformation. elastic.co/downloads/logstash
- Fluent Bit: Lightweight alternative to Fluentd for Kubernetes. fluentbit.io
Helper Tools
- Grok Debugger: Online tool to build and test grok patterns for log formats: grokdebug.herokuapp.com
- Elasticsearch ingest attachment processor: For extracting text from binary documents (e.g., PDFs) at ingest time; it replaces the deprecated mapper attachments plugin.
- OpenSearch: Open-source fork of Elasticsearch with its own dashboards (OpenSearch Dashboards). Note that recent Beats releases ship only to Elasticsearch; OSS Beats 7.12 and earlier work with OpenSearch. opensearch.org
- Vector.dev: High-performance, Rust-based log processor. vector.dev
Documentation and Learning
- Elasticsearch Reference: elastic.co/guide/en/elasticsearch/reference/current
- Filebeat Modules: Pre-built configurations for common logs (Nginx, Apache, MySQL, etc.): elastic.co/guide/en/beats/filebeat/current/filebeat-modules.html
- ELK Stack Best Practices Whitepaper: Available from Elastics resources portal.
- YouTube Channels: Elastics official channel and DevOps with Alex offer practical tutorials.
Community and Support
- Elastic Discuss Forum: discuss.elastic.co
- Stack Overflow: Tag questions with elasticsearch and filebeat.
- GitHub Repositories: Search for "elasticsearch log pipeline" for open-source examples.
Real Examples
Example 1: Indexing Docker Container Logs
When running containers with Docker, logs are stored at /var/lib/docker/containers/*/*-json.log. Filebeat can monitor this directory:
filebeat.inputs:
  - type: filestream
    id: docker-logs
    enabled: true
    paths:
      - /var/lib/docker/containers/*/*-json.log
    parsers:
      - ndjson:
          target: ""
          add_error_key: true
          message_key: log

processors:
  - add_docker_metadata:
      host: "unix:///var/run/docker.sock"
      match_fields: ["container.id"]

output.elasticsearch:
  hosts: ["https://es-cluster:9200"]
  index: "docker-logs-%{+yyyy.MM.dd}"
  username: "filebeat"
  password: "secret"
  ssl.certificate_authorities: ["/etc/filebeat/certs/ca.crt"]
The add_docker_metadata processor enriches logs with container name, image, labels, and network info. This enables queries like:
GET docker-logs-*/_search
{
  "query": {
    "match": {
      "container.name": "nginx-proxy"
    }
  }
}
Example 2: Kubernetes Logs with Fluent Bit
In Kubernetes, Fluent Bit can be deployed as a DaemonSet to collect logs from all nodes:
[INPUT]
Name tail
Tag kube.*
Path /var/log/containers/*.log
Parser docker
DB /var/log/flb_kube.db
Mem_Buf_Limit 5MB
Skip_Long_Lines On
Refresh_Interval 10
[PARSER]
Name docker
Format json
Time_Key time
Time_Format %Y-%m-%dT%H:%M:%S.%L
Time_Keep On
Decode_Field_As escaped log
[OUTPUT]
Name es
Match *
Host elasticsearch.default.svc.cluster.local
Port 9200
Logstash_Format On
Logstash_Prefix kube-logs
Retry_Limit False
tls On
tls.verify Off
This sends logs to Elasticsearch with index names like kube-logs-2024.06.15. Combine with Kubernetes labels to filter logs by namespace or pod.
Example 3: Application Logs from Node.js with Winston
In a Node.js app, configure Winston to output structured logs:
const { createLogger, format, transports } = require('winston');
const { combine, timestamp, json } = format;

const logger = createLogger({
  level: 'info',
  format: combine(
    timestamp(),
    json()
  ),
  transports: [
    new transports.File({ filename: 'app.log' })
  ]
});

logger.info('User logged in', { userId: 123, ip: '192.168.1.1' });
Then configure Filebeat to read app.log, decoding each line with the ndjson parser. The resulting Elasticsearch document will have top-level fields: timestamp, level, message, userId, and ip.
Example 4: Centralized Logging for Microservices
For 50+ microservices, use a centralized Filebeat deployment with dynamic configuration:
- Each service writes logs to a dedicated file: /var/log/services/service-a.log
- Use Filebeat's filestream input with glob patterns: /var/log/services/*.log
- Use a processor to add a service_name field based on the filename:
processors:
  - add_fields:
      target: ''
      fields:
        service_name: "service-a"
      when:
        contains:
          log.file.path: "service-a.log"
  - add_fields:
      target: ''
      fields:
        service_name: "service-b"
      when:
        contains:
          log.file.path: "service-b.log"
Then create a single index template for all services and use Kibana dashboards grouped by service_name.
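One add_fields block per service grows linearly with your service count. The naming logic itself is trivial, as this Python sketch shows; inside Filebeat, a dissect processor on the file path could achieve the same effect without per-service configuration:

```python
from pathlib import Path

def service_name(log_path: str) -> str:
    """Derive a service_name from a log file path:
    /var/log/services/service-a.log -> service-a."""
    return Path(log_path).stem

print(service_name("/var/log/services/service-a.log"))  # service-a
```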
FAQs
Can I index logs into Elasticsearch without using Filebeat or Logstash?
Yes. You can write logs directly to Elasticsearch via HTTP POST requests using any programming language. For example, in Python:
import requests
import json
log_entry = {
"timestamp": "2024-06-15T10:23:45Z",
"message": "Application started",
"service": "auth-service"
}
response = requests.post(
"http://elasticsearch:9200/app-logs/_doc",
data=json.dumps(log_entry),
headers={"Content-Type": "application/json"}
)
However, this approach lacks reliability (no retry, no backpressure), file rotation handling, or security. Use only for testing or small-scale scripts.
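If you do write directly, prefer the _bulk API over per-document POSTs: it amortizes HTTP overhead and is what Filebeat itself uses. The sketch below only builds the newline-delimited payload (the index name is illustrative); you would POST it to /_bulk with Content-Type: application/x-ndjson:

```python
import json

def bulk_body(index: str, docs: list) -> str:
    """Build an NDJSON body for the Elasticsearch _bulk API: one action
    line followed by one document line per event, newline-terminated."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"

body = bulk_body("app-logs", [
    {"timestamp": "2024-06-15T10:23:45Z", "message": "Application started"},
    {"timestamp": "2024-06-15T10:23:46Z", "message": "Listening on :8080"},
])
print(body)
```

Even with bulk requests, you still lack the retry, backpressure, and file-state handling that a shipper provides.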
How do I handle multi-line logs (e.g., Java stack traces)?
Use the multiline parser on Filebeat's filestream input:
filebeat.inputs:
  - type: filestream
    paths:
      - /var/log/myapp/*.log
    parsers:
      - multiline:
          type: pattern
          pattern: '^[[:space:]]'
          match: after
          negate: false
          max_lines: 500
This combines lines starting with whitespace (e.g., stack trace lines) into a single event. Alternatively, use Logstash's multiline codec.
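To see what that multiline rule does, the grouping behavior can be simulated in a few lines of Python (a sketch of the behavior, not Filebeat's actual implementation):

```python
import re

CONTINUATION = re.compile(r'^\s')  # continuation lines start with whitespace

def group_events(lines):
    """Merge indented continuation lines into the preceding event,
    mimicking pattern '^[[:space:]]' with negate: false, match: after."""
    events = []
    for line in lines:
        if CONTINUATION.match(line) and events:
            events[-1] += "\n" + line
        else:
            events.append(line)
    return events

raw = [
    "ERROR Something failed",
    "    at com.example.Foo(Foo.java:42)",
    "    at com.example.Bar(Bar.java:7)",
    "INFO Next event",
]
print(len(group_events(raw)))  # 2
```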
How much disk space do logs consume in Elasticsearch?
It depends on your log volume and compression. On average, JSON logs compress to 10-30% of their original size, so a 1GB raw log file may occupy roughly 150-300MB in Elasticsearch. Use ILM to delete old logs and consider hot-warm-cold architectures to reduce costs.
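A back-of-the-envelope estimator makes these numbers concrete (a sketch; the compression ratio and replica count are assumptions to replace with your own measurements):

```python
def es_storage_gb(raw_gb_per_day: float, retention_days: int,
                  compression: float = 0.2, replicas: int = 1) -> float:
    """Rough on-disk estimate: compressed primary data plus replica copies.
    The 20% default sits inside the 10-30% range quoted above."""
    primary_gb = raw_gb_per_day * retention_days * compression
    return primary_gb * (1 + replicas)

print(round(es_storage_gb(1.0, 90), 1))  # 36.0 GB at 20% compression, 1 replica
```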
Can I use Elasticsearch to index logs from Windows servers?
Yes. Install Filebeat on Windows and configure it to monitor C:\ProgramData\MyApp\logs\*.log using the same configuration syntax. For Windows event logs, use Winlogbeat, the dedicated event-log shipper.
Whats the difference between an index and an index pattern in Kibana?
An index is a physical data structure in Elasticsearch (e.g., nginx-logs-000001). An index pattern is a Kibana configuration that defines which indices to include in a view (e.g., nginx-logs-*). You use index patterns to build dashboards and queries across multiple indices.
Why is my Elasticsearch cluster slow when querying logs?
Common causes:
- Too many shards per index
- Large shards (>50GB)
- Missing keyword fields for aggregations
- High JVM heap usage
- Insufficient RAM or CPU on data nodes
Check the Elasticsearch cluster health API and use the Search Profiler in Kibana's Dev Tools to analyze slow queries.
Do I need to restart Filebeat after changing the config?
Yes. After modifying filebeat.yml, restart the service:
sudo systemctl restart filebeat
Use filebeat test config to validate syntax before restarting.
Conclusion
Indexing logs into Elasticsearch is not just a technical task; it's a strategic investment in your system's visibility, resilience, and performance. By following the steps outlined in this guide, you've built a scalable, secure, and automated log pipeline that transforms raw log data into actionable intelligence. From selecting the right collector to enforcing strict mappings and lifecycle policies, each decision impacts how effectively you can detect issues, respond to incidents, and optimize infrastructure.
Remember: the goal is not to collect every log, but to collect the right logs, in the right format, at the right time. Start simple, validate your pipeline with real data, and iterate. Use templates and automation to eliminate manual configuration. Monitor your pipeline relentlessly: logs are only useful if they're available, accurate, and searchable.
As your infrastructure evolves, whether moving to serverless, expanding to multi-cloud, or adopting AI-driven anomaly detection, your logging architecture must scale with it. Elasticsearch, paired with Filebeat and Kibana, provides the foundation for that evolution. Keep refining your templates, update your ILM policies, and embrace structured logging as a core engineering practice.
With this knowledge, you're no longer just collecting logs; you're building the nervous system of your digital operations. And that's the hallmark of a truly observability-driven organization.