How to Index Logs Into Elasticsearch


Nov 10, 2025 - 12:16


Indexing logs into Elasticsearch is a foundational practice for modern observability, security monitoring, and operational intelligence. As systems grow in complexity, spanning microservices, cloud infrastructure, containers, and distributed applications, managing and analyzing log data becomes critical for detecting anomalies, troubleshooting failures, and ensuring performance. Elasticsearch, part of the Elastic Stack (formerly the ELK Stack), is a powerful, scalable, real-time search and analytics engine designed for handling large volumes of semi-structured data like logs. When properly configured, Elasticsearch enables organizations to centralize, search, visualize, and alert on log events with speed and precision.

This tutorial provides a comprehensive, step-by-step guide to indexing logs into Elasticsearch. Whether you're managing application logs from Node.js, system logs from Linux servers, or container logs from Docker and Kubernetes, this guide covers the full lifecycle: from log collection and transformation to ingestion, mapping, and optimization. You'll learn best practices for structuring your data, selecting the right tools, avoiding common pitfalls, and scaling your logging infrastructure. By the end, you'll have a production-ready pipeline capable of handling thousands of log events per second with minimal latency and maximum reliability.

Step-by-Step Guide

1. Understand Your Log Sources and Formats

Before you begin indexing, identify all sources of log data. Common sources include:

  • Application logs (e.g., JSON logs from Node.js, Python, Java)
  • System logs (e.g., /var/log/syslog, /var/log/messages on Linux)
  • Web server logs (e.g., Nginx, Apache access and error logs)
  • Container logs (e.g., Docker, Kubernetes)
  • Cloud service logs (e.g., AWS CloudTrail, Azure Monitor, GCP Logging)

Each source generates logs in different formats: plain text, JSON, CSV, or proprietary formats. For Elasticsearch to effectively index and query logs, they must be structured. Unstructured logs (e.g., free-form text) are harder to analyze and require additional parsing. JSON is the preferred format because it natively maps to Elasticsearch's document model. If your logs are not in JSON, you'll need to transform them during ingestion.
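As a sketch of what that transformation involves, the snippet below parses one line of Nginx's combined log format into a JSON document. The regex covers only the common fields and is an illustrative assumption, not a complete Nginx log grammar:

```python
import json
import re

# Combined log format: ip - user [time] "method url proto" status bytes ...
LINE = re.compile(
    r'(?P<client_ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<request_method>\S+) (?P<url>\S+) [^"]*" '
    r'(?P<status_code>\d{3}) (?P<bytes_sent>\d+)'
)

def to_json(line: str) -> str:
    """Convert one Nginx access-log line into a JSON document string."""
    m = LINE.match(line)
    if m is None:
        raise ValueError(f"unparseable line: {line!r}")
    doc = m.groupdict()
    doc["status_code"] = int(doc["status_code"])  # numeric, not string
    doc["bytes_sent"] = int(doc["bytes_sent"])
    return json.dumps(doc)

raw = '203.0.113.7 - - [15/Jun/2024:10:23:45 +0000] "GET /index.html HTTP/1.1" 200 5123'
print(to_json(raw))
```

In practice this parsing happens inside the pipeline (Filebeat modules, Logstash grok, or an ingest pipeline) rather than in application code, but the output shape is the same.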

2. Choose a Log Collection Agent

To transport logs from your sources to Elasticsearch, you need a log collector. The most widely used tools are Filebeat, Fluentd, and Logstash. Each has strengths depending on your environment:

  • Filebeat: Lightweight, written in Go, ideal for collecting logs from files on servers. Minimal resource usage. Best for simple, high-volume log shipping.
  • Logstash: Feature-rich, written in Ruby. Supports complex filtering, parsing, and enrichment. Higher memory footprint. Best for transformation-heavy pipelines.
  • Fluentd: Extensible, plugin-based, widely used in Kubernetes environments. Strong integration with cloud-native tools.

For most use cases, we recommend starting with Filebeat due to its simplicity and efficiency. It's developed by Elastic and integrates natively with Elasticsearch.

3. Install and Configure Filebeat

Install Filebeat on each machine generating logs. On Ubuntu/Debian:

wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
echo "deb https://artifacts.elastic.co/packages/8.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elastic-8.x.list
sudo apt update
sudo apt install filebeat

On CentOS/RHEL:

sudo rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch

sudo tee /etc/yum.repos.d/elastic-8.x.repo <<'EOF'
[elastic-8.x]
name=Elastic repository for 8.x packages
baseurl=https://artifacts.elastic.co/packages/8.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md
EOF

sudo yum install filebeat

After installation, configure Filebeat by editing /etc/filebeat/filebeat.yml. Below is a basic configuration to collect Nginx access logs:

filebeat.inputs:
  - type: filestream
    enabled: true
    paths:
      - /var/log/nginx/access.log

output.elasticsearch:
  hosts: ["http://your-elasticsearch-host:9200"]
  username: "filebeat_system"
  password: "your-secure-password"

setup.template.enabled: true
setup.template.name: "nginx-logs"
setup.template.pattern: "nginx-logs-*"

setup.ilm.enabled: true
setup.ilm.rollover_alias: "nginx-logs"
setup.ilm.pattern: "{now/d}-000001"
setup.ilm.overwrite: true

Key configuration notes:

  • filestream: Newer input type (Filebeat 7.9+) replacing log. Better handling of file rotation and multi-line logs.
  • paths: Specify the exact log file path. Use wildcards like /var/log/nginx/*.log for multiple files.
  • output.elasticsearch: Point to your Elasticsearch cluster. Use HTTPS in production with TLS enabled.
  • setup.ilm: Enables Index Lifecycle Management (ILM), which automates index rollover and deletion. Essential for production.

4. Configure Elasticsearch Index Templates

Elasticsearch uses index templates to define mappings, settings, and lifecycle policies for new indices. Without a template, Elasticsearch auto-detects field types from the first document it sees, which can lead to incorrect mappings (e.g., treating a numeric field as a string).
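To see why this matters, here is a rough, illustrative sketch (not Elasticsearch's actual algorithm) of how dynamic mapping fixes a field's type based on the first value it encounters:

```python
def guess_mapping(doc: dict) -> dict:
    """Sketch of dynamic mapping: the first value seen for a field fixes
    its type for the whole index (simplified, not Elasticsearch's exact rules)."""
    mapping = {}
    for field, value in doc.items():
        if isinstance(value, bool):   # bool before int: True is an int in Python
            mapping[field] = "boolean"
        elif isinstance(value, int):
            mapping[field] = "long"
        elif isinstance(value, float):
            mapping[field] = "float"
        else:
            mapping[field] = "text"   # JSON strings default to text (+ .keyword)
    return mapping

# The same logical field maps differently depending on serialization:
print(guess_mapping({"status_code": 200}))    # {'status_code': 'long'}
print(guess_mapping({"status_code": "200"}))  # {'status_code': 'text'}
```

Once `status_code` is mapped as text, numeric range queries on it misbehave, which is exactly what an explicit template prevents.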

Create a template named nginx-logs-template using the Elasticsearch REST API:

PUT _index_template/nginx-logs-template
{
  "index_patterns": ["nginx-logs-*"],
  "template": {
    "settings": {
      "number_of_shards": 3,
      "number_of_replicas": 1,
      "refresh_interval": "5s",
      "index.lifecycle.name": "nginx-logs-policy",
      "index.lifecycle.rollover_alias": "nginx-logs"
    },
    "mappings": {
      "properties": {
        "timestamp": {
          "type": "date",
          "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd'T'HH:mm:ss.SSSZ"
        },
        "client_ip": { "type": "ip" },
        "status_code": { "type": "short" },
        "request_method": { "type": "keyword" },
        "url": {
          "type": "text",
          "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } }
        },
        "user_agent": {
          "type": "text",
          "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } }
        },
        "bytes_sent": { "type": "long" },
        "response_time": { "type": "float" }
      }
    }
  },
  "priority": 500,
  "version": 1
}

This template ensures:

  • Fields like client_ip are mapped as ip, enabling CIDR and range queries (and GeoIP enrichment via an ingest pipeline).
  • status_code uses short to save storage.
  • url and user_agent are both text (for full-text search) and keyword (for aggregations).
  • ILM policy is attached to automate index rollover.

5. Set Up Index Lifecycle Management (ILM)

ILM automates the management of indices over time. It prevents your cluster from filling up with old logs and ensures performance by moving data to cheaper storage tiers.

Create an ILM policy named nginx-logs-policy:

PUT _ilm/policy/nginx-logs-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": { "max_size": "50GB", "max_age": "1d" }
        }
      },
      "warm": {
        "min_age": "7d",
        "actions": {
          "allocate": { "number_of_replicas": 0 }
        }
      },
      "cold": {
        "min_age": "30d",
        "actions": { "freeze": {} }
      },
      "delete": {
        "min_age": "90d",
        "actions": { "delete": {} }
      }
    }
  }
}

This policy:

  • Rolls over the index when it reaches 50GB or 1 day old.
  • After 7 days, removes replicas to save space (warm phase).
  • After 30 days, freezes the index to reduce memory usage (note: the freeze action is deprecated in recent Elasticsearch versions).
  • After 90 days, deletes the index entirely.

Apply this policy to your index template as shown in Step 4.
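The phase transitions above amount to a simple age lookup, sketched below. This is illustrative only: real ILM measures min_age from the rollover time of each index, not from creation:

```python
from datetime import timedelta

# Phase entry thresholds from the nginx-logs-policy above (min_age per phase).
PHASES = [
    ("delete", timedelta(days=90)),
    ("cold", timedelta(days=30)),
    ("warm", timedelta(days=7)),
    ("hot", timedelta(0)),
]

def phase_for(age: timedelta) -> str:
    """Return the ILM phase an index of the given age falls into."""
    for name, min_age in PHASES:  # ordered oldest phase first
        if age >= min_age:
            return name
    return "hot"

print(phase_for(timedelta(days=3)))    # hot
print(phase_for(timedelta(days=10)))   # warm
print(phase_for(timedelta(days=45)))   # cold
print(phase_for(timedelta(days=120)))  # delete
```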

6. Start and Test Filebeat

After configuration, start Filebeat:

sudo systemctl enable filebeat
sudo systemctl start filebeat

Verify it's running:

sudo systemctl status filebeat

Check Filebeat logs for errors:

sudo tail -f /var/log/filebeat/filebeat

To confirm logs are being indexed, query Elasticsearch:

GET nginx-logs-*/_search
{
  "size": 1,
  "sort": [
    { "timestamp": { "order": "desc" } }
  ]
}

If you receive a document with your log fields, indexing is successful.

7. Use Kibana for Visualization and Monitoring

Install Kibana on the same or a separate server:

sudo apt install kibana
sudo systemctl enable kibana
sudo systemctl start kibana

Access Kibana at http://your-server:5601. Navigate to Stack Management > Data Views (called Index Patterns in older versions) and create one matching nginx-logs-*. Select timestamp as the time field.

Now go to Discover to explore your logs in real time. Create visualizations (bar charts, line graphs, heatmaps) and dashboards to monitor request rates, error codes, and client geography.

8. Secure Your Pipeline

In production, never expose Elasticsearch to the public internet. Use:

  • Authentication: Enable Elasticsearch's built-in security (X-Pack) and create users with least-privilege roles.
  • Encryption: Enable TLS between Filebeat and Elasticsearch using certificates.
  • Firewall rules: Restrict access to port 9200 to trusted IP ranges.
  • Filebeat SSL settings: Add to filebeat.yml:
output.elasticsearch:
  hosts: ["https://your-elasticsearch-host:9200"]
  username: "filebeat_writer"
  password: "secure-password-here"
  ssl.certificate_authorities: ["/etc/filebeat/certs/ca.crt"]
  ssl.certificate: "/etc/filebeat/certs/filebeat.crt"
  ssl.key: "/etc/filebeat/certs/filebeat.key"

Generate certificates using OpenSSL or a certificate authority like Let's Encrypt.

Best Practices

1. Structure Logs as JSON Whenever Possible

JSON logs are self-describing and eliminate the need for complex grok patterns in Logstash or Filebeat. Most modern frameworks (e.g., Winston for Node.js, Log4j2 for Java) support JSON output natively. Example:

{
  "timestamp": "2024-06-15T10:23:45.123Z",
  "level": "error",
  "message": "Database connection timeout",
  "service": "user-auth",
  "trace_id": "a1b2c3d4",
  "user_id": 12345,
  "ip": "192.168.1.10"
}

With JSON logs, Filebeat can flatten fields directly into the Elasticsearch document (json.keys_under_root: true on the log input, or the ndjson parser on the filestream input), reducing parsing overhead.
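A minimal sketch of that flattening behavior, including the error field added on parse failure (decode_json_log is a hypothetical helper for illustration, not a Filebeat API):

```python
import json

def decode_json_log(line: str) -> dict:
    """Mimic keys-under-root JSON decoding with an error key:
    parsed keys land at the top level of the event; on failure the
    raw line is kept and an error field is attached."""
    try:
        parsed = json.loads(line)
    except json.JSONDecodeError as exc:
        return {"message": line, "error.message": f"decoding error: {exc}"}
    if isinstance(parsed, dict):
        return parsed  # keys merged at the document root
    return {"message": line, "error.message": "decoded JSON is not an object"}

print(decode_json_log('{"level": "error", "service": "user-auth"}'))
print(decode_json_log('not json at all'))
```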

2. Avoid Over-Mapping

Don't map every possible field upfront. Start with the essential fields needed for querying and visualization, and add fields as needed. Over-mapping increases memory usage and slows down indexing.

3. Use Keywords for Aggregations, Text for Search

Always use keyword for fields you'll use in filters, terms aggregations, or sorting (e.g., status_code, service_name). Use text only for full-text search (e.g., message, stack_trace). Use multi-fields to support both:

"message": {
  "type": "text",
  "fields": {
    "keyword": { "type": "keyword" }
  }
}

4. Implement Log Rotation and Filebeat's File State Tracking

Filebeat tracks how far it has read each file in a registry (under /var/lib/filebeat/registry), which prevents duplicate indexing when logs are rotated or the agent restarts. Rename-based rotation (logrotate's default) works well because Filebeat follows files by inode; reserve the copytruncate option for applications that cannot reopen their log file, and be aware that lines written during the truncation window can be lost.

5. Monitor Filebeat and Elasticsearch Health

Use Prometheus and Grafana to monitor:

  • Filebeat: Events sent, events failed, backlog size
  • Elasticsearch: Cluster health, indexing rate, JVM heap usage, segment count

Enable Filebeat's internal metrics:

monitoring.enabled: true
monitoring.elasticsearch:
  hosts: ["http://your-elasticsearch:9200"]

Then visualize metrics in Kibana under Monitoring.

6. Don't Index Everything

Not all logs are equally valuable. Filter out noisy logs (e.g., health checks, debug messages) before ingestion. Use Filebeat's processors to drop events:

processors:
  - drop_event:
      when:
        contains:
          message: "GET /health"

Or use Logstash's if conditionals for complex logic.

7. Scale with Multiple Nodes and Shards

For high-volume logging (10K+ events/sec), deploy Elasticsearch as a cluster with dedicated master, data, and ingest nodes. As a rule of thumb, size each index to roughly one to two primary shards per data node, and avoid shards larger than 50GB. Too many small shards hurt performance; too few limit scalability.
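That rule of thumb can be sketched as a small calculator. The function name, its defaults, and the caps are illustrative assumptions, not an official Elastic formula:

```python
import math

def recommended_shards(daily_gb: float, rollover_days: int,
                       max_shard_gb: float = 50.0, data_nodes: int = 3) -> int:
    """Rough primary-shard count for one index generation: keep each shard
    under ~50GB while spreading work across data nodes."""
    index_gb = daily_gb * rollover_days
    by_size = math.ceil(index_gb / max_shard_gb)     # enough shards to stay <50GB each
    return max(1, min(by_size, data_nodes * 2))      # cap at ~2 shards per data node

print(recommended_shards(daily_gb=40, rollover_days=1))   # 1
print(recommended_shards(daily_gb=120, rollover_days=1))  # 3
```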

8. Regularly Audit and Clean Up Indices

Use the ILM policy to automate deletion. Monitor index growth with:

GET _cat/indices?v&h=index,docs.count,store.size,pri.store.size

Manually delete orphaned or misnamed indices with:

DELETE /old-logs-2023-*

Tools and Resources

Helper Tools

  • Logstash Config Generator: Online tool to generate grok patterns for log formats: grokdebug.herokuapp.com
  • Elasticsearch Ingest Attachment Processor: successor to the deprecated mapper attachments plugin; extracts text from binary files (e.g., PDFs) for indexing.
  • OpenSearch: Open-source fork of Elasticsearch. Compatible with Filebeat and Kibana. opensearch.org
  • Vector.dev: High-performance, Rust-based log processor. vector.dev

Community and Support

  • Elastic Discuss Forum: discuss.elastic.co
  • Stack Overflow: Tag questions with elasticsearch and filebeat.
  • GitHub Repositories: Search for elasticsearch log pipeline for open-source examples.

Real Examples

Example 1: Indexing Docker Container Logs

When running containers with Docker, logs are stored at /var/lib/docker/containers/*/*-json.log. Filebeat can monitor this directory:

filebeat.inputs:
  - type: filestream
    enabled: true
    paths:
      - /var/lib/docker/containers/*/*-json.log
    parsers:
      - ndjson:
          target: ""
          add_error_key: true
          message_key: log

processors:
  - add_docker_metadata:
      host: "unix:///var/run/docker.sock"
      match_fields: ["container.id"]
      match_sources: ["log"]

output.elasticsearch:
  hosts: ["https://es-cluster:9200"]
  index: "docker-logs-%{+yyyy.MM.dd}"
  username: "filebeat"
  password: "secret"
  ssl.certificate_authorities: ["/etc/filebeat/certs/ca.crt"]

The add_docker_metadata processor enriches logs with container name, image, labels, and network info. This enables queries like:

GET docker-logs-*/_search
{
  "query": {
    "match": { "docker.container.name": "nginx-proxy" }
  }
}

Example 2: Kubernetes Logs with Fluent Bit

In Kubernetes, Fluent Bit can be deployed as a DaemonSet to collect logs from all nodes:

[INPUT]
    Name              tail
    Tag               kube.*
    Path              /var/log/containers/*.log
    Parser            docker
    DB                /var/log/flb_kube.db
    Mem_Buf_Limit     5MB
    Skip_Long_Lines   On
    Refresh_Interval  10

[PARSER]
    Name         docker
    Format       json
    Time_Key     time
    Time_Format  %Y-%m-%dT%H:%M:%S.%L
    Time_Keep    On
    Decode_Field_As  escaped log

[OUTPUT]
    Name             es
    Match            *
    Host             elasticsearch.default.svc.cluster.local
    Port             9200
    Logstash_Format  On
    Logstash_Prefix  kube-logs
    Retry_Limit      False
    tls              On
    tls.verify       Off

This sends logs to Elasticsearch with index names like kube-logs-2024.06.15. Combine with Kubernetes labels to filter logs by namespace or pod.

Example 3: Application Logs from Node.js with Winston

In a Node.js app, configure Winston to output structured logs:

const { createLogger, format, transports } = require('winston');
const { combine, timestamp, json } = format;

const logger = createLogger({
  level: 'info',
  format: combine(timestamp(), json()),
  transports: [
    new transports.File({ filename: 'app.log' })
  ]
});

logger.info('User logged in', { userId: 123, ip: '192.168.1.1' });

Then configure Filebeat to read app.log with JSON decoding enabled (json.keys_under_root: true on the log input, or the ndjson parser on the filestream input). The resulting Elasticsearch document will have top-level fields: timestamp, level, message, userId, and ip.

Example 4: Centralized Logging for Microservices

For 50+ microservices, use a centralized Filebeat deployment with dynamic configuration:

  • Each service writes logs to a dedicated file: /var/log/services/service-a.log
  • Use Filebeat's filestream input with glob patterns: /var/log/services/*.log
  • Use a processor to add a service_name field based on filename:
processors:
  - add_fields:
      target: ''
      fields:
        service_name: "service-a"
      when:
        contains:
          log.file.path: "service-a.log"
  - add_fields:
      target: ''
      fields:
        service_name: "service-b"
      when:
        contains:
          log.file.path: "service-b.log"

Then create a single index template for all services and use Kibana dashboards grouped by service_name.
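The naming convention behind those processor blocks is just "file stem equals service name", sketched below (service_name_from_path is a hypothetical helper mirroring that logic, not a Filebeat feature):

```python
from pathlib import Path

def service_name_from_path(log_path: str) -> str:
    """Derive the service_name field from the log file name, so adding a
    51st service needs no extra configuration under this convention."""
    return Path(log_path).stem  # strips directory and .log extension

for p in ["/var/log/services/service-a.log", "/var/log/services/service-b.log"]:
    print(p, "->", service_name_from_path(p))
```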

FAQs

Can I index logs into Elasticsearch without using Filebeat or Logstash?

Yes. You can write logs directly to Elasticsearch via HTTP POST requests using any programming language. For example, in Python:

import requests

import json

log_entry = {

"timestamp": "2024-06-15T10:23:45Z",

"message": "Application started",

"service": "auth-service"

}

response = requests.post(

"http://elasticsearch:9200/app-logs/_doc",

data=json.dumps(log_entry),

headers={"Content-Type": "application/json"}

)

However, this approach lacks reliability (no retry, no backpressure), file rotation handling, or security. Use only for testing or small-scale scripts.
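If you do go direct, prefer the _bulk API over per-document POSTs. Below is a sketch of building the newline-delimited bulk body; it constructs the payload only, and production use still needs retries and backpressure (e.g., via the official client's bulk helpers):

```python
import json

def bulk_payload(index: str, docs: list) -> str:
    """Build the NDJSON body for Elasticsearch's _bulk API: one action
    line followed by one source line per document, newline-terminated."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))  # action line
        lines.append(json.dumps(doc))                            # source line
    return "\n".join(lines) + "\n"

body = bulk_payload("app-logs", [
    {"timestamp": "2024-06-15T10:23:45Z", "message": "Application started"},
    {"timestamp": "2024-06-15T10:23:46Z", "message": "Listening on :8080"},
])
# Send with: POST http://elasticsearch:9200/_bulk
# using header Content-Type: application/x-ndjson
print(body)
```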

How do I handle multi-line logs (e.g., Java stack traces)?

Use Filebeat's multiline settings (configured as a parser on the filestream input):

filebeat.inputs:
  - type: filestream
    paths:
      - /var/log/myapp/*.log
    parsers:
      - multiline:
          type: pattern
          pattern: '^[[:space:]]'
          negate: false
          match: after
          max_lines: 500

This combines lines starting with whitespace (e.g., stack trace lines) into a single event. Alternatively, use Logstash's multiline codec.

How much disk space do logs consume in Elasticsearch?

It depends on your log volume and compression. On average, JSON logs compress to 10–30% of their original size, so a 1GB raw log file may become 150–300MB in Elasticsearch. Use ILM to delete old logs and consider hot-warm-cold architectures to reduce costs.
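A back-of-envelope footprint estimate follows directly from those numbers. The 0.2 compression ratio and the defaults here are assumptions taken from the ranges above:

```python
def storage_estimate_gb(raw_gb_per_day: float, retention_days: int,
                        compression: float = 0.2, replicas: int = 1) -> float:
    """Rough cluster footprint: raw volume x on-disk ratio x copies
    (primary plus replicas)."""
    return raw_gb_per_day * retention_days * compression * (1 + replicas)

# 10GB/day of raw logs, 90-day retention, one replica:
print(round(storage_estimate_gb(10, 90), 1))  # 360.0
```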

Can I use Elasticsearch to index logs from Windows servers?

Yes. Install Filebeat on Windows and configure it to monitor C:\ProgramData\MyApp\logs\*.log using the same configuration syntax. For Windows event logs, use Winlogbeat, Filebeat's sibling agent dedicated to the Windows event log.

What's the difference between an index and an index pattern in Kibana?

An index is a physical data structure in Elasticsearch (e.g., nginx-logs-000001). An index pattern is a Kibana configuration that defines which indices to include in a view (e.g., nginx-logs-*). You use index patterns to build dashboards and queries across multiple indices.

Why is my Elasticsearch cluster slow when querying logs?

Common causes:

  • Too many shards per index
  • Large shards (>50GB)
  • Missing keyword fields for aggregations
  • High JVM heap usage
  • Insufficient RAM or CPU on data nodes

Check the Elasticsearch cluster health API and use the Profiler in Kibana's Dev Tools to analyze slow queries.

Do I need to restart Filebeat after changing the config?

Yes. After modifying filebeat.yml, restart the service:

sudo systemctl restart filebeat

Use filebeat test config to validate syntax before restarting.

Conclusion

Indexing logs into Elasticsearch is not just a technical task: it's a strategic investment in your system's visibility, resilience, and performance. By following the steps outlined in this guide, you've built a scalable, secure, and automated log pipeline that transforms raw log data into actionable intelligence. From selecting the right collector to enforcing strict mappings and lifecycle policies, each decision impacts how effectively you can detect issues, respond to incidents, and optimize infrastructure.

Remember: the goal is not to collect every log, but to collect the right logs, in the right format, at the right time. Start simple, validate your pipeline with real data, and iterate. Use templates and automation to eliminate manual configuration. Monitor your pipeline relentlessly: logs are only useful if they're available, accurate, and searchable.

As your infrastructure evolves, whether moving to serverless, expanding to multi-cloud, or adopting AI-driven anomaly detection, your logging architecture must scale with it. Elasticsearch, paired with Filebeat and Kibana, provides the foundation for that evolution. Keep refining your templates, update your ILM policies, and embrace structured logging as a core engineering practice.

With this knowledge, you're no longer just collecting logs; you're building the nervous system of your digital operations. That's the hallmark of a truly observability-driven organization.