How to Set Up the ELK Stack
The ELK Stack, comprising Elasticsearch, Logstash, and Kibana, is one of the most powerful and widely adopted open-source solutions for log management, real-time analytics, and observability. Originally developed by Elastic, the ELK Stack enables organizations to collect, process, store, and visualize massive volumes of structured and unstructured data from servers, applications, networks, and cloud services. Whether you're monitoring application performance, troubleshooting system errors, or detecting security threats, the ELK Stack provides an end-to-end pipeline that transforms raw logs into actionable insights.
As digital infrastructure becomes increasingly complex, with microservices, containers, hybrid clouds, and distributed systems, the need for centralized, scalable, and real-time log analysis has never been greater. The ELK Stack addresses this need by offering a flexible, modular architecture that can scale from a single server to enterprise-grade deployments spanning thousands of nodes. This tutorial provides a comprehensive, step-by-step guide to setting up the ELK Stack from scratch, covering installation, configuration, optimization, and real-world use cases. By the end, you'll have a fully functional ELK environment ready for production-grade monitoring.
Step-by-Step Guide
Prerequisites
Before beginning the setup, ensure your system meets the following requirements:
- A Linux-based server (Ubuntu 22.04 LTS or CentOS 8/9 recommended)
- At least 4 GB of RAM (8 GB or more recommended for production)
- Minimum 2 CPU cores
- At least 20 GB of free disk space (scalable based on log volume)
- Java 11 or Java 17 installed (Elasticsearch requires Java)
- Root or sudo access
- Internet connectivity for package downloads
It is strongly advised to use a dedicated server or virtual machine for the ELK Stack to avoid resource contention with other services. For testing purposes, you may use cloud providers like AWS, Google Cloud, or Azure, or a local VM using VirtualBox or VMware.
Step 1: Install Java
Elasticsearch, the core search and analytics engine of the ELK Stack, runs on the Java Virtual Machine (JVM). As of Elasticsearch 8.x, Java 17 is the recommended version. Java 11 is also supported but may be deprecated in future releases.
On Ubuntu, run the following commands:
sudo apt update
sudo apt install openjdk-17-jdk -y
Verify the installation:
java -version
You should see output similar to:
openjdk version "17.0.10"
OpenJDK Runtime Environment (build 17.0.10+7-Ubuntu-1ubuntu1~22.04.1)
OpenJDK 64-Bit Server VM (build 17.0.10+7-Ubuntu-1ubuntu1~22.04.1, mixed mode, sharing)
On CentOS/RHEL, use:
sudo dnf install java-17-openjdk-devel -y
Set the JAVA_HOME environment variable by editing /etc/environment:
sudo nano /etc/environment
Add this line:
JAVA_HOME="/usr/lib/jvm/java-17-openjdk-amd64"
Save and reload:
source /etc/environment
echo $JAVA_HOME
Step 2: Install Elasticsearch
Elasticsearch is a distributed, RESTful search and analytics engine capable of handling large volumes of data in near real-time. It stores, searches, and indexes data, making it the backbone of the ELK Stack.
First, import the Elastic GPG key:
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo gpg --dearmor -o /usr/share/keyrings/elastic-keyring.gpg
Add the Elasticsearch repository:
echo "deb [signed-by=/usr/share/keyrings/elastic-keyring.gpg] https://artifacts.elastic.co/packages/8.x/apt stable main" | sudo tee /etc/apt/sources.list.d/elastic-8.x.list
Update the package list and install Elasticsearch:
sudo apt update
sudo apt install elasticsearch -y
Configure Elasticsearch by editing its main configuration file:
sudo nano /etc/elasticsearch/elasticsearch.yml
Modify the following key settings:
- cluster.name: Set a unique name for your cluster (e.g., my-elk-cluster)
- node.name: Assign a descriptive name to this node (e.g., node-1)
- network.host: Set to 0.0.0.0 to allow external connections (for testing) or to your server's private IP for production
- http.port: Leave as default (9200) unless you need a custom port
- discovery.type: Set to single-node for standalone setups
Example configuration:
cluster.name: my-elk-cluster
node.name: node-1
network.host: 0.0.0.0
http.port: 9200
discovery.type: single-node
Save and exit. Then enable and start the Elasticsearch service:
sudo systemctl enable elasticsearch
sudo systemctl start elasticsearch
Verify Elasticsearch is running:
curl -X GET "localhost:9200"
You should receive a JSON response with cluster details, including version, name, and cluster UUID. If you see an error, check the logs with:
sudo journalctl -u elasticsearch -f
Step 3: Install Kibana
Kibana is the visualization layer of the ELK Stack. It provides a web interface to explore data stored in Elasticsearch, create dashboards, and monitor system health.
Install Kibana using the same repository:
sudo apt install kibana -y
Configure Kibana by editing its configuration file:
sudo nano /etc/kibana/kibana.yml
Set the following values:
- server.host: Set to "0.0.0.0" to allow external access
- server.port: Default is 5601
- elasticsearch.hosts: Point to your Elasticsearch instance (e.g., ["http://localhost:9200"])
- kibana.index: Optional; defaults to .kibana
Example configuration:
server.host: "0.0.0.0"
server.port: 5601
elasticsearch.hosts: ["http://localhost:9200"]
kibana.index: ".kibana"
Enable and start Kibana:
sudo systemctl enable kibana
sudo systemctl start kibana
Check the service status:
sudo systemctl status kibana
Wait 30-60 seconds for Kibana to initialize. Then access it via your browser at http://your-server-ip:5601. You should see the Kibana welcome screen.
Step 4: Install Logstash
Logstash is the data processing pipeline that ingests data from multiple sources, transforms it, and sends it to Elasticsearch. It supports hundreds of input, filter, and output plugins.
Install Logstash:
sudo apt install logstash -y
Logstash configuration files are stored in /etc/logstash/conf.d/. Create a new configuration file:
sudo nano /etc/logstash/conf.d/01-input.conf
Add a basic input configuration to collect system logs:
input {
  file {
    path => "/var/log/syslog"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
Create a filter configuration to parse syslog data:
sudo nano /etc/logstash/conf.d/02-filter.conf
Add the following:
filter {
  if [path] =~ "syslog" {
    grok {
      match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}" }
    }
    date {
      match => [ "syslog_timestamp", "MMM d HH:mm:ss", "MMM dd HH:mm:ss" ]
    }
  }
}
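To see what fields the grok pattern above extracts, here is a rough Python approximation of the same parse. This is an illustrative sketch only; grok's SYSLOGTIMESTAMP and SYSLOGHOST patterns are more permissive than these simplified regular expressions.

```python
import re

# Simplified stand-in for the grok pattern above (illustrative, not the grok engine).
SYSLOG_RE = re.compile(
    r'(?P<syslog_timestamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) '
    r'(?P<syslog_hostname>\S+) '
    r'(?P<syslog_program>[^\[\s:]+)(?:\[(?P<syslog_pid>\d+)\])?: '
    r'(?P<syslog_message>.*)'
)

def parse_syslog(line):
    """Return the extracted fields as a dict, or None if the line does not match."""
    m = SYSLOG_RE.match(line)
    return m.groupdict() if m else None
```

For example, a line such as "Mar  7 12:34:56 web01 sshd[1234]: Failed password for root" yields syslog_hostname web01, syslog_program sshd, and syslog_pid 1234, which is exactly the field set the Logstash filter indexes.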
Create an output configuration to send data to Elasticsearch:
sudo nano /etc/logstash/conf.d/03-output.conf
Add:
output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "syslog-%{+YYYY.MM.dd}"
  }
}
Note: the older document_type option has been removed; mapping types were dropped in Elasticsearch 7 and the setting is no longer valid.
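The index => "syslog-%{+YYYY.MM.dd}" setting expands each event's timestamp into a daily index name. A minimal Python sketch of the same naming scheme (the function name is illustrative, not part of any Elastic API):

```python
from datetime import datetime, timezone

def daily_index(prefix="syslog", when=None):
    """Mimic Logstash's syslog-%{+YYYY.MM.dd} daily index naming."""
    when = when or datetime.now(timezone.utc)
    return f"{prefix}-{when.strftime('%Y.%m.%d')}"
```

Daily indices make retention trivial: deleting a day of logs is just deleting one index.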
Test your Logstash configuration for syntax errors:
sudo /usr/share/logstash/bin/logstash --path.settings /etc/logstash -t
If the test passes, restart Logstash:
sudo systemctl restart logstash
sudo systemctl enable logstash
Step 5: Verify Data Flow
Once all components are running, verify that logs are being ingested and indexed.
First, check if indices are being created in Elasticsearch:
curl -X GET "localhost:9200/_cat/indices?v"
You should see an index named syslog-YYYY.MM.dd.
Next, open Kibana in your browser at http://your-server-ip:5601. Click on Explore on my own or Get started with sample data if prompted.
Go to Stack Management → Index Patterns → Create index pattern. Enter syslog* as the pattern and select @timestamp as the time field. Click Create index pattern.
Now go to Discover and select your new index pattern. You should see raw log entries from your system's syslog file. If logs appear, your ELK Stack is successfully collecting, processing, and visualizing data.
Step 6: Secure Your ELK Stack (Optional but Recommended)
By default, the ELK Stack runs without authentication. In production, securing your stack is critical.
Elasticsearch 8.x includes built-in security features. Enable them by editing /etc/elasticsearch/elasticsearch.yml:
xpack.security.enabled: true
xpack.security.transport.ssl.enabled: true
Set or reset passwords for the built-in users. In 8.x, the elasticsearch-reset-password tool replaces the older elasticsearch-setup-passwords command:
sudo /usr/share/elasticsearch/bin/elasticsearch-reset-password -u elastic
sudo /usr/share/elasticsearch/bin/elasticsearch-reset-password -u kibana_system
Record the generated passwords. Then, update Kibana's configuration to authenticate:
sudo nano /etc/kibana/kibana.yml
Add:
elasticsearch.username: "kibana_system"
elasticsearch.password: "your-generated-password"
Restart Kibana and Elasticsearch:
sudo systemctl restart elasticsearch kibana
Access Kibana again. You will now be prompted to log in. Sign in with the elastic superuser and its password; the kibana_system account is used only by Kibana itself to connect to Elasticsearch and cannot log in to the UI.
For Logstash, update the output section in 03-output.conf:
output {
elasticsearch {
hosts => ["http://localhost:9200"]
index => "syslog-%{+YYYY.MM.dd}"
user => "logstash_writer"
password => "your-logstash-password"
}
}
Create a dedicated user for Logstash using Elasticsearch's security API. The logstash_writer role referenced below must exist before you assign it; create it first via Kibana's Roles UI or the _security/role endpoint, granting create_index and write privileges on the syslog-* indices. Because security is now enabled, authenticate the request:
curl -u elastic -X POST "localhost:9200/_security/user/logstash_writer" -H 'Content-Type: application/json' -d'
{
"password" : "your-logstash-password",
"roles" : [ "logstash_writer" ]
}'
Restart Logstash after making changes.
Best Practices
1. Use Dedicated Hardware or VMs
ELK components are resource-intensive. Elasticsearch, in particular, requires significant memory and fast I/O. Avoid running ELK on shared infrastructure. Use separate servers or VMs for each component, especially in production environments. If deploying on a single machine, ensure it has at least 8 GB RAM and SSD storage.
2. Optimize Elasticsearch Memory Allocation
By default, older Elasticsearch versions allocate 1 GB of heap memory; recent 8.x releases size the heap automatically based on system RAM. For production, set it explicitly to about 50% of your system RAM, but never exceed ~32 GB (the compressed-oops threshold). Edit /etc/elasticsearch/jvm.options (or, preferably, add a file under /etc/elasticsearch/jvm.options.d/):
-Xms4g
-Xmx4g
Always set both -Xms and -Xmx to the same value to avoid heap resizing overhead.
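The sizing rule above can be expressed as a quick helper. This is a sketch of the rule of thumb only; the exact safe ceiling depends on your JVM's compressed-oops threshold.

```python
def heap_size_gb(total_ram_gb):
    """Half of system RAM, capped just below the ~32 GB compressed-oops limit."""
    return min(total_ram_gb // 2, 31)
```

For an 8 GB server this gives 4 GB, matching the -Xms4g/-Xmx4g example above; on a 128 GB machine it caps at 31 GB rather than allocating 64 GB.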
3. Use Index Lifecycle Management (ILM)
Log data grows rapidly. Without proper retention policies, disk usage can become unmanageable. Use Elasticsearch's Index Lifecycle Management to automate rollover, deletion, and cold storage.
Create an ILM policy via Kibana's Stack Management → Index Lifecycle Policies. Define phases: hot (active), warm (infrequent access), cold (archival), and delete. Apply the policy to your index patterns to automate cleanup.
4. Enable Monitoring and Alerts
Use Kibana's Uptime and Observability features to monitor the health of your ELK Stack. Set up alerts for high CPU usage, low disk space, or failed Logstash pipelines. You can also integrate with external tools like Prometheus and Grafana for advanced metrics.
5. Use Filebeat Instead of Logstash for Simple Log Collection
While Logstash is powerful, it's heavy for simple log shipping. For lightweight, high-performance log collection from agents, use Filebeat (part of the Elastic Beats family). Filebeat is designed to ship logs efficiently with minimal resource usage. Replace Logstash input with Filebeat on client machines and send data directly to Elasticsearch or via a central Logstash instance.
6. Avoid Large Document Sizes
Elasticsearch performs best with documents under 10 KB. If you're ingesting large JSON payloads or binary logs, consider compressing or splitting them. To reduce index size, strip unnecessary fields with Logstash's mutate filter (remove_field) or the prune filter; note that the drop filter discards entire events, not individual fields.
7. Regular Backups
Use Elasticsearch's snapshot and restore feature to back up indices. Configure a repository (e.g., S3, NFS, or shared filesystem) and schedule periodic snapshots. This ensures data recovery in case of hardware failure or misconfiguration.
8. Network Security
Restrict access to Elasticsearch and Kibana using firewalls. Only allow traffic from trusted IPs. Use reverse proxies like Nginx or Apache to add SSL/TLS termination and authentication layers. Never expose Kibana or Elasticsearch directly to the public internet without authentication and encryption.
9. Monitor Logstash Pipeline Performance
Use Logstash's built-in metrics to track throughput, event processing time, and backpressure. Enable the monitoring plugin and view metrics in Kibana under Stack Monitoring → Logstash.
10. Use Templates for Consistent Index Mapping
Define custom index templates to enforce consistent field types (e.g., string vs. keyword, date formats). This prevents mapping conflicts and improves search performance. Create templates in Kibana or via the Elasticsearch API before data ingestion begins.
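As an illustration, a minimal composable index template for the syslog indices created earlier might look like the following (Dev Tools console syntax; the template name and field mappings are examples matching the grok filter from Step 4, so adjust them to your own schema):

```json
PUT _index_template/syslog-template
{
  "index_patterns": ["syslog-*"],
  "template": {
    "mappings": {
      "properties": {
        "@timestamp":      { "type": "date" },
        "syslog_hostname": { "type": "keyword" },
        "syslog_program":  { "type": "keyword" },
        "syslog_message":  { "type": "text" }
      }
    }
  }
}
```

Mapping hostname and program as keyword (rather than analyzed text) keeps aggregations and dashboard filters fast and avoids mapping conflicts across daily indices.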
Tools and Resources
Official Documentation
- Elastic Documentation – Comprehensive guides for all ELK components
- Elastic Downloads – Latest versions of Elasticsearch, Kibana, Logstash, and Beats
- Elastic Observability – Advanced monitoring and alerting features
Community and Support
- Elastic Discuss Forum – Active community for troubleshooting and best practices
- Elastic GitHub Repositories – Source code, examples, and issue tracking
- Elastic Blog – Tutorials, case studies, and feature announcements
Third-Party Tools
- Filebeat – Lightweight log shipper for agents
- Metricbeat – Collects system and service metrics
- Prometheus + Grafana – For advanced metrics visualization alongside ELK
- Docker Compose – For quick local testing using pre-built images
- Ansible – For automated, repeatable ELK deployments across multiple servers
Sample Configurations and Templates
GitHub hosts numerous open-source ELK configuration repositories:
- Elastic Examples – Official templates for common use cases
- Docker ELK Stack – Docker Compose setup for local development
- Ansible Playbooks – Automated deployment scripts
Learning Resources
- Elastic Learning Platform – Free and paid courses on ELK Stack fundamentals
- Udemy: ELK Stack: Elasticsearch, Logstash, Kibana – Hands-on video tutorials
- YouTube: Elastic Channel – Official tutorials and webinars
Real Examples
Example 1: Centralized Web Server Log Monitoring
A company runs 50 Nginx web servers across multiple regions. Each server generates over 10 GB of access logs daily. Using the ELK Stack, they deploy Filebeat on each server to ship logs to a central Logstash instance. Logstash parses the Nginx logs using a custom Grok pattern, extracts fields like client_ip, status_code, request_time, and user_agent.
These fields are indexed into Elasticsearch with daily indices. In Kibana, they create dashboards showing:
- Top 10 most visited URLs
- HTTP status code distribution (4xx/5xx errors)
- Response time percentiles
- Geolocation map of traffic sources
Alerts are configured to trigger when 5xx errors exceed 5% in a 5-minute window. This enables their DevOps team to respond to outages before users report them.
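The alert condition described above, 5xx errors exceeding 5% of requests in a rolling 5-minute window, can be sketched outside Kibana to make the logic concrete. The class and method names here are illustrative, not part of any Elastic API.

```python
import time
from collections import deque

class ErrorRateMonitor:
    """Rolling-window alert: fire when the 5xx share of requests exceeds a threshold."""

    def __init__(self, window_seconds=300, threshold=0.05):
        self.window = window_seconds
        self.threshold = threshold
        self.events = deque()  # (timestamp, is_error) pairs inside the window

    def record(self, status_code, now=None):
        """Record one response and expire events older than the window."""
        now = time.time() if now is None else now
        self.events.append((now, status_code >= 500))
        cutoff = now - self.window
        while self.events and self.events[0][0] < cutoff:
            self.events.popleft()

    def should_alert(self):
        """True when the error fraction inside the window exceeds the threshold."""
        if not self.events:
            return False
        errors = sum(1 for _, is_error in self.events if is_error)
        return errors / len(self.events) > self.threshold
```

In Kibana the same condition is expressed declaratively as an alert rule over the indexed status_code field; this sketch just shows the arithmetic behind it.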
Example 2: Security Incident Detection
A financial institution uses the ELK Stack to monitor SSH login attempts across its Linux servers. They configure Filebeat to collect /var/log/auth.log and send it to Logstash. Logstash filters for failed login attempts and flags repeated failures from the same IP.
A Kibana dashboard displays:
- Number of failed SSH attempts per hour
- Top 10 source IPs with failed logins
- Geolocation of attack sources
An alert rule triggers when an IP attempts more than 10 failed logins in 2 minutes. The system automatically blocks the IP via a script that updates the firewall (via iptables or ufw). This automated detection reduces the risk of brute-force attacks significantly.
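The threshold rule in this example, more than 10 failed logins from one IP within 2 minutes, amounts to a per-IP sliding window. A minimal sketch of that detection logic (class name and return convention are assumptions for illustration):

```python
from collections import defaultdict, deque

class BruteForceDetector:
    """Flags an IP once it exceeds max_failures failed logins inside the window."""

    def __init__(self, max_failures=10, window_seconds=120):
        self.max_failures = max_failures
        self.window = window_seconds
        self.failures = defaultdict(deque)  # ip -> timestamps of recent failures

    def record_failure(self, ip, timestamp):
        """Record one failed login; return True if the IP should be blocked."""
        q = self.failures[ip]
        q.append(timestamp)
        cutoff = timestamp - self.window
        while q and q[0] < cutoff:
            q.popleft()  # forget failures that fell out of the window
        return len(q) > self.max_failures
```

In the production setup this decision lives in a Kibana alert rule, and the blocking action is a script updating iptables or ufw; the sketch only shows the counting logic.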
Example 3: Container Log Aggregation with Docker and Kubernetes
A DevOps team runs microservices in Docker containers and Kubernetes clusters. Each container outputs logs to stdout. They deploy Filebeat as a DaemonSet in Kubernetes, allowing it to collect logs from all nodes.
Filebeat uses autodiscover to dynamically detect containers and apply specific log parsing rules based on container labels. For example, logs from a Node.js app are parsed with a JSON filter, while Python app logs use a multiline pattern to handle stack traces.
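Filebeat's multiline settings perform this folding at the shipper, but the core idea, merging indented continuation lines (typical of Python stack traces) into the preceding event, can be sketched in a few lines. This is a simplified rule for illustration, not Filebeat's actual implementation.

```python
def merge_multiline(lines):
    """Fold lines starting with whitespace or a 'Traceback' header into
    the preceding log event (a simplified multiline rule)."""
    events, current = [], []
    for line in lines:
        continuation = line.startswith((" ", "\t", "Traceback"))
        if current and continuation:
            current.append(line)       # continuation of the previous event
        else:
            if current:
                events.append("\n".join(current))
            current = [line]           # start a new event
    if current:
        events.append("\n".join(current))
    return events
```

Without such folding, each frame of a stack trace would be indexed as a separate document, making error counts and searches misleading.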
Logs are sent to Elasticsearch and visualized in Kibana with dashboards for:
- Application error rates by service
- Latency trends across microservices
- Resource consumption correlation (CPU/memory vs. log volume)
This setup provides full observability into their microservices architecture, enabling rapid debugging and performance optimization.
Example 4: IoT Sensor Data Ingestion
An industrial IoT platform collects temperature, humidity, and pressure readings from 10,000 sensors every 10 seconds. Data is sent via MQTT to a central broker, which forwards it to Logstash via the MQTT input plugin.
Logstash converts the JSON payload into structured fields and adds timestamps and sensor IDs. Data is indexed into Elasticsearch with hourly indices. Kibana visualizes real-time trends, detects anomalies (e.g., sudden temperature spikes), and triggers alerts to maintenance teams.
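The transformation Logstash applies to each MQTT payload can be sketched as a small function. The payload field names here are assumptions about the sensor schema, not a fixed format.

```python
import json
from datetime import datetime, timezone

def to_document(payload, sensor_id):
    """Turn a raw JSON sensor payload (bytes) into a structured document
    ready for indexing into Elasticsearch."""
    reading = json.loads(payload)
    return {
        "@timestamp": datetime.now(timezone.utc).isoformat(),
        "sensor_id": sensor_id,
        "temperature": float(reading["temperature"]),
        "humidity": float(reading["humidity"]),
        "pressure": float(reading["pressure"]),
    }
```

Coercing readings to floats up front keeps the Elasticsearch field mappings consistent, so a sensor that occasionally sends "21" instead of 21.0 does not cause a mapping conflict.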
Using ILM, data older than 30 days is moved to a low-cost storage tier, and data older than 1 year is automatically deleted to manage costs.
FAQs
Q1: Can I run ELK Stack on Windows?
Yes, Elasticsearch, Kibana, and Logstash are available for Windows. Download the .zip files from the official Elastic website and run them as services. However, Linux is preferred for production due to better performance, stability, and community support.
Q2: How much disk space does ELK Stack require?
There's no fixed amount. It depends on your log volume, retention period, and compression. As a rule of thumb, expect 1-5 GB per day per server for moderate log levels. For 100 servers generating 2 GB/day each, you'll need 200 GB/day. With 30-day retention, that's 6 TB. Always plan for 20-30% overhead for indexing and replicas.
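The sizing arithmetic above is easy to script for your own numbers. A sketch, using the rough planning figures from this answer (they are estimates, not guarantees):

```python
def required_storage_gb(servers, gb_per_day, retention_days, overhead=0.30):
    """Raw log volume over the retention window, plus indexing/replica overhead."""
    raw = servers * gb_per_day * retention_days
    return raw * (1 + overhead)
```

For the example in this answer, 100 servers at 2 GB/day with 30-day retention is 6,000 GB raw; with 30% overhead, plan for roughly 7.8 TB.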
Q3: Is ELK Stack free to use?
Yes, the core ELK Stack (Elasticsearch, Logstash, Kibana) is free to use. Since version 7.11 it has been distributed under the SSPL and the Elastic License, which are source-available rather than OSI-approved open-source licenses. Advanced features like machine learning, alerting, and SAML authentication require a paid subscription (Elastic Platinum or Enterprise license).
Q4: Can I use Elasticsearch without Logstash?
Absolutely. Many users send data directly to Elasticsearch using Beats (Filebeat, Metricbeat), HTTP APIs, or custom scripts. Logstash is optional and used only when complex data transformation is needed.
Q5: Why is my Kibana dashboard empty?
Common causes include: incorrect index pattern, misconfigured Logstash filters, firewall blocking connections, or Elasticsearch not running. Check Elasticsearch indices with curl localhost:9200/_cat/indices and verify Logstash logs in /var/log/logstash/logstash-plain.log.
Q6: How do I upgrade the ELK Stack?
Always follow Elastics official upgrade guide. Perform a rolling upgrade: update one node at a time, ensure cluster health is green, and verify data integrity. Never skip versions. Back up your data before upgrading.
Q7: What's the difference between ELK and EFK?
ELK = Elasticsearch, Logstash, Kibana. EFK = Elasticsearch, Fluentd, Kibana. Fluentd is an alternative to Logstash, often preferred in Kubernetes environments for its lightweight design and plugin ecosystem. Both serve the same purpose: log collection and transformation.
Q8: How do I scale ELK Stack for high availability?
Deploy multiple Elasticsearch nodes in a cluster with replication. Use dedicated master nodes, data nodes, and coordinating nodes. Run Kibana behind a load balancer. Use Filebeat with failover to multiple Logstash instances. Configure Elasticsearch with at least 3 master-eligible nodes and 2 replicas per index.
Q9: Can I use ELK Stack with cloud providers?
Yes. Elastic offers a fully managed service called Elastic Cloud on AWS, GCP, and Azure. Alternatively, you can install ELK manually on cloud VMs. Managed services reduce operational overhead but come with subscription costs.
Q10: How do I troubleshoot slow searches in Kibana?
Check Elasticsearch logs for slow queries. Use the Dev Tools console to run GET _search with "profile": true to analyze performance. Optimize by: reducing the number of fields returned, using filters instead of queries, avoiding wildcards, and increasing shard size (avoid too many small shards).
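For example, in Kibana's Dev Tools console a profiled query looks like the following (the index pattern and match clause are placeholders; substitute your own):

```json
GET syslog-*/_search
{
  "profile": true,
  "query": {
    "match": { "syslog_program": "sshd" }
  }
}
```

The response includes a per-shard breakdown of query and collector timings, which usually makes it obvious whether the cost comes from the query itself or from too many small shards.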
Conclusion
Setting up the ELK Stack is a powerful step toward gaining full visibility into your digital infrastructure. From monitoring application errors to detecting security threats and optimizing performance, the ELK Stack transforms chaotic log data into structured, searchable, and actionable insights. This guide has walked you through the complete process: from installing Java and configuring each component to securing your deployment and applying real-world best practices.
Remember, the key to success with ELK is not just installation, but ongoing maintenance: monitoring performance, tuning indexing strategies, managing storage, and automating alerts. As your environment grows, consider scaling with Filebeat, implementing ILM, and integrating with Kubernetes or cloud-native tools.
Whether you're a DevOps engineer, a system administrator, or a security analyst, mastering the ELK Stack empowers you to proactively manage systems, reduce downtime, and make data-driven decisions. Start small with a single server, validate your pipeline, and expand gradually. With the right configuration and practices, your ELK Stack will become the central nervous system of your observability strategy.
Now that you've successfully set up the ELK Stack, the next step is to explore advanced use cases: anomaly detection, machine learning with Elasticsearch, custom Kibana visualizations, and integrating with CI/CD pipelines. The possibilities are limitless, and the insights you uncover could transform how you operate your technology stack.