How to Forward Logs to Elasticsearch
Log data is the silent witness to every system operation, application behavior, and security event within modern digital infrastructures. From web servers and databases to containerized microservices and cloud-native platforms, logs generate vast volumes of structured and unstructured information that, when properly collected and analyzed, become invaluable for troubleshooting, performance optimization, compliance, and threat detection. However, raw log files scattered across hundreds of servers are nearly impossible to manage manually. This is where Elasticsearch comes in.
Elasticsearch, part of the Elastic Stack (formerly known as the ELK Stack), is a powerful, distributed search and analytics engine designed to store, index, and retrieve massive datasets in near real time. When paired with log forwarders like Filebeat, Fluentd, or Logstash, Elasticsearch becomes the central nervous system of your observability strategy. Forwarding logs to Elasticsearch enables centralized logging, powerful querying, visual dashboards, and automated alerting, transforming chaotic log streams into actionable intelligence.
This guide provides a comprehensive, step-by-step walkthrough on how to forward logs to Elasticsearch. Whether you're managing a small on-premises environment or a large-scale Kubernetes cluster, this tutorial covers the core concepts, practical configurations, industry best practices, essential tools, real-world examples, and answers to frequently asked questions, all designed to help you implement a robust, scalable, and secure log forwarding pipeline.
Step-by-Step Guide
1. Understand the Log Forwarding Architecture
Before configuring any tool, it's critical to understand the typical architecture of log forwarding to Elasticsearch. A standard pipeline consists of three components:
- Log Source: Applications, servers, containers, or network devices generating logs (e.g., Apache access logs, systemd journal, Docker containers).
- Log Forwarder/Collector: A lightweight agent that tails log files, reads from system streams, or receives logs via network protocols and ships them to Elasticsearch.
- Elasticsearch Cluster: The centralized storage and indexing engine that receives, processes, and stores log data for search and analysis.
Often, a middle component called Logstash is inserted between the forwarder and Elasticsearch for parsing, filtering, and enriching logs. However, modern deployments increasingly favor lightweight forwarders like Filebeat or Fluentd that can send data directly to Elasticsearch, reducing complexity and resource overhead.
2. Install and Configure Elasticsearch
Before forwarding logs, ensure Elasticsearch is properly installed and accessible. You can deploy Elasticsearch on-premises, in a private cloud, or use a managed service like Elastic Cloud.
Option A: Self-Hosted Elasticsearch (Linux)
Download and install Elasticsearch from the official repository:
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-8.12.0-linux-x86_64.tar.gz
tar -xzf elasticsearch-8.12.0-linux-x86_64.tar.gz
cd elasticsearch-8.12.0
Edit the configuration file config/elasticsearch.yml:
cluster.name: my-logging-cluster
node.name: node-1
network.host: 0.0.0.0
http.port: 9200
discovery.type: single-node
xpack.security.enabled: true
xpack.security.transport.ssl.enabled: true
Generate certificates for secure communication:
bin/elasticsearch-certutil ca
bin/elasticsearch-certutil cert --ca elastic-stack-ca.p12
Move the generated certificates to the config/certs directory and update elasticsearch.yml with:
xpack.security.transport.ssl.certificate: certs/node-1.crt
xpack.security.transport.ssl.key: certs/node-1.key
xpack.security.transport.ssl.certificate_authorities: [ "certs/ca.crt" ]
Start Elasticsearch:
bin/elasticsearch
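Once the node starts, a quick health check should respond with the cluster status (the password comes from your security setup; if you have also enabled TLS on the HTTP layer, switch to https and pass --cacert):

```shell
curl -u elastic:your-password "http://localhost:9200/_cluster/health?pretty"
```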
Option B: Elastic Cloud (Managed)
If using Elastic Cloud, create a deployment via the web interface. Once deployed, note the following details from the deployment dashboard:
- Elasticsearch endpoint (e.g., https://your-deployment-id.us-central1.gcp.cloud.es.io:9243)
- Username and password (or API key)
- CA certificate (download as PEM file)
3. Choose and Install a Log Forwarder
There are several tools to forward logs to Elasticsearch. The most popular are Filebeat, Fluentd, and Logstash. Each has strengths depending on your use case.
Filebeat: Lightweight and Ideal for File-Based Logs
Filebeat is a lightweight, Go-based log shipper developed by Elastic. It's perfect for reading log files from disk and sending them directly to Elasticsearch or Logstash.
Install Filebeat on your log source server:
wget https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-8.12.0-linux-x86_64.tar.gz
tar -xzf filebeat-8.12.0-linux-x86_64.tar.gz
cd filebeat-8.12.0
Edit filebeat.yml:
filebeat.inputs:
  - type: filestream
    enabled: true
    paths:
      - /var/log/nginx/access.log
      - /var/log/nginx/error.log
      - /var/log/syslog

output.elasticsearch:
  hosts: ["https://your-elasticsearch-host:9243"]
  username: "filebeat_system"
  password: "your-secure-password"
  ssl.certificate_authorities: ["/etc/filebeat/certs/ca.crt"]
  ssl.verification_mode: "full"
Copy the Elasticsearch CA certificate to /etc/filebeat/certs/ca.crt.
Test the configuration:
./filebeat test config
./filebeat test output
Start Filebeat:
sudo ./filebeat -e
For systemd-based systems, install Filebeat from the Elastic APT or YUM repository instead of the tarball so a service unit is provided, then enable it:
sudo systemctl enable filebeat
sudo systemctl start filebeat
Fluentd: Flexible and Extensible for Complex Environments
Fluentd is a popular open-source data collector with a rich plugin ecosystem. It's ideal for environments requiring advanced parsing, filtering, and routing of logs from multiple sources (e.g., Docker, Kubernetes, systemd).
Install Fluentd via RubyGems or package manager:
curl -L https://toolbelt.treasuredata.com/sh/install-debian-bullseye-td-agent4.sh | sh
Configure /etc/td-agent/td-agent.conf:
<source>
  @type tail
  path /var/log/nginx/access.log
  pos_file /var/log/td-agent/nginx-access.log.pos
  tag nginx.access
  <parse>
    @type nginx
  </parse>
</source>

<match **>
  @type elasticsearch
  host your-elasticsearch-host
  port 9243
  scheme https
  ssl_verify true
  ca_file /etc/td-agent/certs/ca.crt
  user filebeat_system
  password your-secure-password
  logstash_format true
  logstash_prefix nginx-logs
  <buffer>
    flush_interval 10s
  </buffer>
</match>
Install the Elasticsearch plugin if needed:
td-agent-gem install fluent-plugin-elasticsearch
Restart Fluentd:
sudo systemctl restart td-agent
Logstash: Advanced Processing Layer
Logstash is a server-side data processing pipeline that ingests logs from multiple sources, transforms them, and sends them to Elasticsearch. It's powerful but resource-intensive, and best used when complex parsing (e.g., grok patterns, GeoIP enrichment) is required.
Install Logstash:
wget https://artifacts.elastic.co/downloads/logstash/logstash-8.12.0-linux-x86_64.tar.gz
tar -xzf logstash-8.12.0-linux-x86_64.tar.gz
cd logstash-8.12.0
Create a configuration file at config/logstash.conf:
input {
  file {
    path => "/var/log/nginx/access.log"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}

filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
  geoip {
    source => "clientip"
  }
}

output {
  elasticsearch {
    hosts => ["https://your-elasticsearch-host:9243"]
    user => "logstash_writer"
    password => "your-secure-password"
    ssl_certificate_verification => true
    cacert => "/etc/logstash/certs/ca.crt"
    index => "nginx-logs-%{+YYYY.MM.dd}"
  }
}
Run Logstash:
bin/logstash -f config/logstash.conf
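Before running the pipeline for real, it is worth validating the configuration syntax; Logstash ships a test-and-exit flag for this:

```shell
bin/logstash -f config/logstash.conf --config.test_and_exit
```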
4. Create an Index Template in Elasticsearch
When logs are first indexed, Elasticsearch automatically creates an index. However, for consistent performance and querying, define an index template to control mapping, settings, and lifecycle policies.
Use the Elasticsearch API to create a template:
curl -X PUT "https://your-elasticsearch-host:9243/_index_template/nginx_logs_template" \
-H "Content-Type: application/json" \
-u "elastic:your-password" \
--cacert /etc/filebeat/certs/ca.crt \
-d '{
"index_patterns": ["nginx-logs-*"],
"template": {
"settings": {
"number_of_shards": 3,
"number_of_replicas": 1,
"refresh_interval": "5s"
},
"mappings": {
"properties": {
"timestamp": { "type": "date" },
"clientip": { "type": "ip" },
"bytes": { "type": "long" },
"method": { "type": "keyword" },
"url": { "type": "text", "analyzer": "standard" },
"user_agent": { "type": "text", "analyzer": "keyword" }
}
}
},
"priority": 500
}'
This ensures all future indices matching nginx-logs-* inherit the same structure, improving search efficiency and reducing mapping conflicts.
5. Verify Log Ingestion
Once the forwarder is running, verify logs are reaching Elasticsearch:
curl -X GET "https://your-elasticsearch-host:9243/_cat/indices?v" \
-u "elastic:your-password" \
--cacert /etc/filebeat/certs/ca.crt
You should see indices like nginx-logs-2024.06.15 with a non-zero document count.
To view sample logs:
curl -X GET "https://your-elasticsearch-host:9243/nginx-logs-*/_search?size=5" \
-u "elastic:your-password" \
--cacert /etc/filebeat/certs/ca.crt
If logs appear, your pipeline is working. If not, check the forwarder logs (/var/log/filebeat/filebeat or /var/log/td-agent/td-agent.log) for errors.
6. Secure the Pipeline
Never expose Elasticsearch to the public internet. Use the following security measures:
- Enable TLS/SSL encryption between forwarders and Elasticsearch.
- Use Elasticsearch's built-in role-based access control (RBAC): create dedicated users with minimal privileges (e.g., a beats_writer role).
- Use API keys instead of passwords where possible.
- Restrict network access using firewalls or VPCs.
- Regularly rotate certificates and credentials.
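As a sketch of the API-key approach (the key name and role below are illustrative), create a key scoped to the log indices and reference it from Filebeat instead of a password:

```shell
# Create an API key that may only write to nginx-logs-* (illustrative role)
curl -X POST "https://your-elasticsearch-host:9243/_security/api_key" \
  -H "Content-Type: application/json" \
  -u "elastic:your-password" \
  --cacert /etc/filebeat/certs/ca.crt \
  -d '{
    "name": "filebeat-ingest",
    "role_descriptors": {
      "logs_writer": {
        "indices": [
          { "names": ["nginx-logs-*"], "privileges": ["create_doc", "auto_configure"] }
        ]
      }
    }
  }'
```

In filebeat.yml, the output then uses the returned id and api_key joined with a colon:

```yaml
output.elasticsearch:
  api_key: "returned-id:returned-api-key"
```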
Best Practices
1. Use Lightweight Forwarders Where Possible
Filebeat and Fluentd consume far less memory and CPU than Logstash. Use them for edge servers, containers, or resource-constrained environments. Reserve Logstash for centralized processing hubs where complex transformations are needed.
2. Avoid Indexing Unnecessary Fields
Every field indexed increases storage and slows queries. Use the drop_fields processor in Filebeat or record_transformer in Fluentd to remove irrelevant data like internal server IDs or debug flags.
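In Filebeat that might look like the following (the field names are placeholders for whatever your logs carry):

```yaml
processors:
  - drop_fields:
      fields: ["agent.ephemeral_id", "log.offset", "debug_flag"]
      ignore_missing: true
```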
3. Implement Index Lifecycle Management (ILM)
Log data grows rapidly. Configure ILM policies to automatically roll over indices, delete old data, and move warm data to cheaper storage:
PUT _ilm/policy/nginx_policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_size": "50gb",
            "max_age": "30d"
          }
        }
      },
      "warm": {
        "min_age": "30d",
        "actions": {
          "allocate": {
            "number_of_replicas": 0
          }
        }
      },
      "delete": {
        "min_age": "365d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}
Apply the policy to your index template:
"index_patterns": ["nginx-logs-*"],
"settings": {
"index.lifecycle.name": "nginx_policy",
"index.lifecycle.rollover_alias": "nginx-logs"
}
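One caveat: ILM rollover writes through an alias, so bootstrap the first index with the write alias before any forwarder sends data (host and credentials are placeholders):

```shell
curl -X PUT "https://your-elasticsearch-host:9243/nginx-logs-000001" \
  -H "Content-Type: application/json" \
  -u "elastic:your-password" \
  --cacert /etc/filebeat/certs/ca.crt \
  -d '{ "aliases": { "nginx-logs": { "is_write_index": true } } }'
```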
4. Use Consistent Timestamps and Time Zones
Ensure all log sources use UTC or a consistent time zone. Elasticsearch stores timestamps in UTC. Mismatched time zones cause confusion in dashboards and alerting. Use the date filter in Logstash or timestamp processor in Filebeat to normalize timestamps.
5. Monitor Forwarder Health
Forwarders can fail silently. Monitor their status using:
- Filebeat's built-in HTTP metrics endpoint (e.g., http://localhost:5066)
- Fluentd's monitor_agent plugin
- System-level monitoring (CPU, memory, disk I/O)
Integrate metrics into Grafana or Kibana for real-time dashboards.
6. Avoid Log Bombing
High-frequency applications (e.g., microservices logging every request) can overwhelm Elasticsearch. Use sampling, batching, or rate limiting:
- Tune bulk_max_size in Filebeat (it counts events per bulk request, not bytes)
- Raise flush_interval in Fluentd to batch writes less often
- Apply rate limiting in Fluentd (e.g., via a throttle plugin)
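In Filebeat these knobs live in the queue and output settings; the values below are starting points to tune against your own traffic, not recommendations:

```yaml
queue.mem:
  events: 4096              # in-memory buffer size, in events
  flush.min_events: 2048    # wait for a reasonable batch before flushing
  flush.timeout: 5s         # ...but never wait longer than this

output.elasticsearch:
  bulk_max_size: 1600       # events per bulk request (not bytes)
```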
7. Separate Log Types into Different Indices
Don't mix application logs, system logs, and security logs into one index. Use distinct index patterns (app-logs-*, syslog-*, audit-*) to improve search performance and enable granular retention policies.
Tools and Resources
Core Tools
- Filebeat: Official lightweight log shipper from Elastic. Ideal for file-based logs.
- Fluentd: Highly extensible data collector with 1000+ plugins. Great for Kubernetes and hybrid environments.
- Logstash: Full-featured pipeline for complex parsing and enrichment. Best for centralized processing.
- Elasticsearch: The indexing and search engine at the core of the pipeline.
- Kibana: Visualization and dashboarding tool for Elasticsearch. Essential for log analysis.
- Vector: Modern, high-performance log collector written in Rust. Emerging alternative to Filebeat and Fluentd.
- Fluent Bit: Lightweight version of Fluentd, designed for containers and edge devices.
Useful Resources
- Elastic Documentation: Comprehensive guides for all Elastic Stack components.
- Fluentd Official Docs: Plugin reference and configuration examples.
- Filebeat GitHub Repo: Source code, issues, and community contributions.
- How to Choose the Right Elastic Stack Component: Official comparison guide.
- Fluent Bit GitHub: Lightweight alternative for containerized environments.
- Elastic Cloud: Fully managed Elasticsearch and Kibana service.
Community and Support
Engage with active communities:
- Elastic Discuss Forum
- Fluentd Slack Channel
- Stack Overflow (tag: elasticsearch, filebeat, fluentd)
- GitHub Issues for tool-specific bugs
Real Examples
Example 1: Forwarding Nginx Access Logs to Elasticsearch
Scenario: You run a web application on Ubuntu with Nginx. You want to centralize access logs for traffic analysis and anomaly detection.
Steps:
- Install Filebeat on the Nginx server.
- Configure filebeat.yml to read /var/log/nginx/access.log.
- Enable the Nginx module: sudo filebeat modules enable nginx
- Apply default parsing and dashboards: sudo filebeat setup
- Start Filebeat and verify indices appear in Kibana.
Result: In Kibana, you can create a dashboard showing top clients, HTTP status codes, response times, and geographic distribution of traffic, all from raw Nginx logs.
Example 2: Kubernetes Container Logs via Fluent Bit
Scenario: You run a Kubernetes cluster and need to collect logs from all pods.
Steps:
- Deploy Fluent Bit as a DaemonSet using the official Helm chart:
helm repo add fluent https://fluent.github.io/helm-charts
helm install fluent-bit fluent/fluent-bit
- Fluent Bit automatically tails /var/log/containers/*.log from each node.
- Configure output to send to Elasticsearch:
[OUTPUT]
Name es
Match *
Host your-elasticsearch-host
Port 9243
TLS On
TLS.Verify Off
Logstash_Format On
Logstash_Prefix k8s-logs
Replace_Dots On
User filebeat_system
Password your-password
- Use Kibana's Kubernetes app to visualize pod logs, resource usage, and errors.
Result: You can search logs from any pod by name, namespace, or container ID, and correlate them with cluster events.
Example 3: Centralized Syslog Aggregation with Logstash
Scenario: You have 50 Linux servers sending syslog data. You want to parse and enrich them before storage.
Steps:
- Configure rsyslog on all servers to forward to a central Logstash server on UDP port 5140.
*.* @central-logserver:5140
- On the Logstash server, create an input for syslog:
input {
udp {
port => 5140
type => "syslog"
}
}
- Use grok patterns to parse RFC3164 or RFC5424 syslog messages:
filter {
  if [type] == "syslog" {
    grok {
      match => { "message" => "%{SYSLOGTIMESTAMP:timestamp} %{SYSLOGHOST:hostname} %{DATA:program}(?:\[%{POSINT:pid}\])?: %{GREEDYDATA:logmessage}" }
    }
    date {
      match => [ "timestamp", "MMM  d HH:mm:ss", "MMM dd HH:mm:ss" ]
    }
    geoip {
      # geoip needs an IP address; "host" holds the sender address added by the udp input
      source => "host"
      target => "geoip"
    }
  }
}
- Output to Elasticsearch with index naming by date.
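A minimal output block for that last step might look like this (host and credentials are placeholders):

```
output {
  elasticsearch {
    hosts => ["https://your-elasticsearch-host:9243"]
    user => "logstash_writer"
    password => "your-secure-password"
    index => "syslog-%{+YYYY.MM.dd}"
  }
}
```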
Result: All syslog entries are parsed, enriched with geo-location data, and stored in structured fields, enabling alerts for failed SSH logins or unusual root activity.
FAQs
Can I forward logs to Elasticsearch without installing agents on every server?
Yes, but with limitations. You can use syslog forwarding (UDP/TCP) or network aggregators like Vector or Fluent Bit that receive logs over the network. However, for reliability, security, and detailed metadata, installing lightweight agents (Filebeat, Fluent Bit) on each host is strongly recommended.
How much disk space does Elasticsearch need for logs?
It depends on log volume and retention. As a rule of thumb, 1,000 logs/sec at ~500 bytes each works out to roughly 43GB/day; with 30-day retention, expect ~1.3TB. Always provision 20-30% extra for overhead and indexing.
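That rule of thumb is easy to check with shell arithmetic (the rate and event size here are the example figures above, not measurements):

```shell
RATE=1000                            # log events per second
SIZE=500                             # average event size in bytes
PER_DAY=$((RATE * SIZE * 86400))     # bytes ingested per day
echo "Per day: $((PER_DAY / 1000000000)) GB"                 # ~43 GB
echo "30-day retention: $((PER_DAY * 30 / 1000000000)) GB"   # ~1296 GB, i.e. ~1.3 TB
```

Indexed size on disk differs from raw size depending on mappings, replicas, and compression, so treat this as a rough lower bound.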
Is it safe to send logs over the public internet?
No. Always encrypt logs in transit using TLS and restrict access via firewalls or private networks. Never expose Elasticsearch directly to the internet. Use VPNs, private endpoints, or Elastic Cloud's secure connectivity options.
Can I forward logs from Windows servers?
Yes. Use Winlogbeat, Elastic's dedicated Beat for Windows Event Logs, to collect the Application, Security, and System channels. It uses the same Elasticsearch output configuration as Filebeat on Linux.
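A minimal winlogbeat.yml sketch (the host and API key are placeholders):

```yaml
winlogbeat.event_logs:
  - name: Application
  - name: Security
  - name: System

output.elasticsearch:
  hosts: ["https://your-elasticsearch-host:9243"]
  api_key: "your-api-key-id:your-api-key-secret"
```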
Whats the difference between Filebeat and Logstash?
Filebeat is a lightweight log shipper designed to collect and forward logs with minimal overhead. Logstash is a full-featured pipeline that can parse, filter, transform, and enrich logs but requires more memory and CPU. Use Filebeat for edge collection; use Logstash for centralized processing.
How do I handle log rotation?
Filebeat and Fluentd handle log rotation automatically. Filebeat tracks file positions in its registry directory, and Fluentd uses pos_file files. Ensure these are stored on persistent storage and not deleted during container restarts.
Can I use Elasticsearch for real-time alerting on logs?
Yes. Use Kibana's Alerting and Watcher features to create rules based on log patterns (e.g., more than ten HTTP 500 errors in 5 minutes). Alerts can trigger email, Slack, or webhook notifications.
Do I need Kibana to use Elasticsearch for logs?
No; Elasticsearch can be queried directly via its API. However, Kibana provides intuitive dashboards, visualizations, and alerting tools that make log analysis practical and scalable. It's highly recommended for production use.
Conclusion
Forwarding logs to Elasticsearch is not just a technical task; it's a foundational practice for modern observability. By centralizing your log data, you transform scattered, unstructured text into a powerful resource for debugging, performance tuning, security monitoring, and business intelligence. The pipeline outlined in this guide, from selecting the right forwarder to securing the transport and optimizing indexing, provides a robust, scalable, and maintainable foundation for any environment.
Remember: the goal is not to collect more logs, but to collect the right logs, in the right format, at the right time. Prioritize security, consistency, and efficiency. Start small with one application or server and expand iteratively. Use index templates, lifecycle policies, and monitoring to keep your system healthy as it grows.
As your infrastructure scales, whether into the cloud, containers, or serverless architectures, a well-designed log forwarding pipeline will remain your most reliable source of truth. Invest time in building it correctly. The insights you gain will pay dividends in reduced downtime, faster incident resolution, and greater operational confidence.
Now that you understand how to forward logs to Elasticsearch, the next step is to integrate this pipeline into your CI/CD workflows, automate deployment with Terraform or Ansible, and connect it to your alerting systems. The power of observability is in your hands; use it wisely.