How to Set Up the ELK Stack
The ELK Stack, comprising Elasticsearch, Logstash, and Kibana, is one of the most powerful and widely adopted open-source solutions for log management, real-time analytics, and observability. Originally developed by Elastic, the ELK Stack enables organizations to collect, process, store, and visualize massive volumes of structured and unstructured data from servers, applications, networks, and cloud services. Whether you're monitoring application performance, troubleshooting system errors, or detecting security threats, the ELK Stack provides an end-to-end pipeline that transforms raw logs into actionable insights.
As digital infrastructure becomes increasingly complex, with microservices, containers, hybrid clouds, and distributed systems, the need for centralized, scalable, and real-time log analysis has never been greater. The ELK Stack addresses this need by offering a flexible, modular architecture that can scale from a single server to enterprise-grade deployments spanning thousands of nodes. This tutorial provides a comprehensive, step-by-step guide to setting up the ELK Stack from scratch, covering installation, configuration, optimization, and real-world use cases. By the end, you'll have a fully functional ELK environment ready for production-grade monitoring.
Step-by-Step Guide
Prerequisites
Before beginning the setup, ensure your system meets the following requirements:
- A Linux-based server (Ubuntu 22.04 LTS or CentOS 8/9 recommended)
- At least 4 GB of RAM (8 GB or more recommended for production)
- Minimum 2 CPU cores
- At least 20 GB of free disk space (scalable based on log volume)
- Java 11 or Java 17 installed (Elasticsearch requires Java)
- Root or sudo access
- Internet connectivity for package downloads
It is strongly advised to use a dedicated server or virtual machine for the ELK Stack to avoid resource contention with other services. For testing purposes, you may use cloud providers like AWS, Google Cloud, or Azure, or a local VM using VirtualBox or VMware.
Step 1: Install Java
Elasticsearch, the core search and analytics engine of the ELK Stack, runs on the Java Virtual Machine (JVM). As of Elasticsearch 8.x, Java 17 is the recommended version. Java 11 is also supported but may be deprecated in future releases.
On Ubuntu, run the following commands:
sudo apt update
sudo apt install openjdk-17-jdk -y
Verify the installation:
java -version
You should see output similar to:
openjdk version "17.0.10"
OpenJDK Runtime Environment (build 17.0.10+7-Ubuntu-1ubuntu1~22.04.1)
OpenJDK 64-Bit Server VM (build 17.0.10+7-Ubuntu-1ubuntu1~22.04.1, mixed mode, sharing)
On CentOS/RHEL, use:
sudo dnf install java-17-openjdk-devel -y
Set the JAVA_HOME environment variable by editing /etc/environment:
sudo nano /etc/environment
Add this line:
JAVA_HOME="/usr/lib/jvm/java-17-openjdk-amd64"
Save and reload:
source /etc/environment
echo $JAVA_HOME
Step 2: Install Elasticsearch
Elasticsearch is a distributed, RESTful search and analytics engine capable of handling large volumes of data in near real-time. It stores, searches, and indexes data, making it the backbone of the ELK Stack.
First, import the Elastic GPG key:
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo gpg --dearmor -o /usr/share/keyrings/elastic-keyring.gpg
Add the Elasticsearch repository:
echo "deb [signed-by=/usr/share/keyrings/elastic-keyring.gpg] https://artifacts.elastic.co/packages/8.x/apt stable main" | sudo tee /etc/apt/sources.list.d/elastic-8.x.list
Update the package list and install Elasticsearch:
sudo apt update
sudo apt install elasticsearch -y
Configure Elasticsearch by editing its main configuration file:
sudo nano /etc/elasticsearch/elasticsearch.yml
Modify the following key settings:
- cluster.name: Set a unique name for your cluster (e.g., my-elk-cluster)
- node.name: Assign a descriptive name to this node (e.g., node-1)
- network.host: Set to 0.0.0.0 to allow external connections (for testing) or to your server's private IP for production
- http.port: Leave as default (9200) unless you need a custom port
- discovery.type: Set to single-node for standalone setups
Example configuration:
cluster.name: my-elk-cluster
node.name: node-1
network.host: 0.0.0.0
http.port: 9200
discovery.type: single-node
Save and exit. Then enable and start the Elasticsearch service:
sudo systemctl enable elasticsearch
sudo systemctl start elasticsearch
Verify Elasticsearch is running:
curl -X GET "localhost:9200"
You should receive a JSON response with cluster details, including version, name, and cluster UUID. If you see an error, check the logs with:
sudo journalctl -u elasticsearch -f
Step 3: Install Kibana
Kibana is the visualization layer of the ELK Stack. It provides a web interface to explore data stored in Elasticsearch, create dashboards, and monitor system health.
Install Kibana using the same repository:
sudo apt install kibana -y
Configure Kibana by editing its configuration file:
sudo nano /etc/kibana/kibana.yml
Set the following values:
- server.host: Set to "0.0.0.0" to allow external access
- server.port: Default is 5601
- elasticsearch.hosts: Point to your Elasticsearch instance (e.g., ["http://localhost:9200"])
- kibana.index: Optional; defaults to .kibana
Example configuration:
server.host: "0.0.0.0"
server.port: 5601
elasticsearch.hosts: ["http://localhost:9200"]
kibana.index: ".kibana"
Enable and start Kibana:
sudo systemctl enable kibana
sudo systemctl start kibana
Check the service status:
sudo systemctl status kibana
Wait 30-60 seconds for Kibana to initialize. Then access it via your browser at http://your-server-ip:5601. You should see the Kibana welcome screen.
Step 4: Install Logstash
Logstash is the data processing pipeline that ingests data from multiple sources, transforms it, and sends it to Elasticsearch. It supports hundreds of input, filter, and output plugins.
Install Logstash:
sudo apt install logstash -y
Logstash configuration files are stored in /etc/logstash/conf.d/. Create a new configuration file:
sudo nano /etc/logstash/conf.d/01-input.conf
Add a basic input configuration to collect system logs:
input {
  file {
    path => "/var/log/syslog"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
Create a filter configuration to parse syslog data:
sudo nano /etc/logstash/conf.d/02-filter.conf
Add the following:
filter {
  if [path] =~ "syslog" {
    grok {
      match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}" }
    }
    date {
      match => [ "syslog_timestamp", "MMM d HH:mm:ss", "MMM dd HH:mm:ss" ]
    }
  }
}
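To see what fields the grok pattern above extracts, here is a rough Python approximation of the same parse. This is an illustrative sketch only; grok's SYSLOGTIMESTAMP and SYSLOGHOST patterns are more permissive than these simplified regular expressions.

```python
import re

# Simplified stand-in for the grok pattern above (illustrative, not the grok engine).
SYSLOG_RE = re.compile(
    r'(?P<syslog_timestamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) '
    r'(?P<syslog_hostname>\S+) '
    r'(?P<syslog_program>[^\[\s:]+)(?:\[(?P<syslog_pid>\d+)\])?: '
    r'(?P<syslog_message>.*)'
)

def parse_syslog(line):
    """Return the extracted fields as a dict, or None if the line does not match."""
    m = SYSLOG_RE.match(line)
    return m.groupdict() if m else None
```

For example, a line such as "Mar  7 12:34:56 web01 sshd[1234]: Failed password for root" yields syslog_hostname web01, syslog_program sshd, and syslog_pid 1234, which is exactly the field set the Logstash filter indexes.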
Create an output configuration to send data to Elasticsearch:
sudo nano /etc/logstash/conf.d/03-output.conf
Add:
output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "syslog-%{+YYYY.MM.dd}"
  }
}
Note: the older document_type option has been removed; mapping types were dropped in Elasticsearch 7 and the setting is no longer valid.
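The index => "syslog-%{+YYYY.MM.dd}" setting expands each event's timestamp into a daily index name. A minimal Python sketch of the same naming scheme (the function name is illustrative, not part of any Elastic API):

```python
from datetime import datetime, timezone

def daily_index(prefix="syslog", when=None):
    """Mimic Logstash's syslog-%{+YYYY.MM.dd} daily index naming."""
    when = when or datetime.now(timezone.utc)
    return f"{prefix}-{when.strftime('%Y.%m.%d')}"
```

Daily indices make retention trivial: deleting a day of logs is just deleting one index.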
Test your Logstash configuration for syntax errors:
sudo /usr/share/logstash/bin/logstash --path.settings /etc/logstash -t
If the test passes, restart Logstash:
sudo systemctl restart logstash
sudo systemctl enable logstash
Step 5: Verify Data Flow
Once all components are running, verify that logs are being ingested and indexed.
First, check if indices are being created in Elasticsearch:
curl -X GET "localhost:9200/_cat/indices?v"
You should see an index named syslog-YYYY.MM.dd.
Next, open Kibana in your browser at http://your-server-ip:5601. Click on Explore on my own or Get started with sample data if prompted.
Go to Stack Management → Index Patterns → Create index pattern. Enter syslog* as the pattern and select @timestamp as the time field. Click Create index pattern.
Now go to Discover and select your new index pattern. You should see raw log entries from your system's syslog file. If logs appear, your ELK Stack is successfully collecting, processing, and visualizing data.
Step 6: Secure Your ELK Stack (Optional but Recommended)
By default, the ELK Stack runs without authentication. In production, securing your stack is critical.
Elasticsearch 8.x includes built-in security features. Enable them by editing /etc/elasticsearch/elasticsearch.yml:
xpack.security.enabled: true
xpack.security.transport.ssl.enabled: true
Set or reset passwords for the built-in users. In 8.x, the elasticsearch-reset-password tool replaces the older elasticsearch-setup-passwords command:
sudo /usr/share/elasticsearch/bin/elasticsearch-reset-password -u elastic
sudo /usr/share/elasticsearch/bin/elasticsearch-reset-password -u kibana_system
Record the generated passwords. Then, update Kibana's configuration to authenticate:
sudo nano /etc/kibana/kibana.yml
Add:
elasticsearch.username: "kibana_system"
elasticsearch.password: "your-generated-password"
Restart Kibana and Elasticsearch:
sudo systemctl restart elasticsearch kibana
Access Kibana again. You will now be prompted to log in. Sign in with the elastic superuser and its password; the kibana_system account is used only by Kibana itself to connect to Elasticsearch and cannot log in to the UI.
For Logstash, update the output section in 03-output.conf:
output {
elasticsearch {
hosts => ["http://localhost:9200"]
index => "syslog-%{+YYYY.MM.dd}"
user => "logstash_writer"
password => "your-logstash-password"
}
}
Create a dedicated user for Logstash using Elasticsearch's security API. The logstash_writer role referenced below must exist before you assign it; create it first via Kibana's Roles UI or the _security/role endpoint, granting create_index and write privileges on the syslog-* indices. Because security is now enabled, authenticate the request:
curl -u elastic -X POST "localhost:9200/_security/user/logstash_writer" -H 'Content-Type: application/json' -d'
{
"password" : "your-logstash-password",
"roles" : [ "logstash_writer" ]
}'
Restart Logstash after making changes.
Best Practices
1. Use Dedicated Hardware or VMs
ELK components are resource-intensive. Elasticsearch, in particular, requires significant memory and fast I/O. Avoid running ELK on shared infrastructure. Use separate servers or VMs for each component, especially in production environments. If deploying on a single machine, ensure it has at least 8 GB RAM and SSD storage.
2. Optimize Elasticsearch Memory Allocation
By default, older Elasticsearch versions allocate 1 GB of heap memory; recent 8.x releases size the heap automatically based on system RAM. For production, set it explicitly to about 50% of your system RAM, but never exceed ~32 GB (the compressed-oops threshold). Edit /etc/elasticsearch/jvm.options (or, preferably, add a file under /etc/elasticsearch/jvm.options.d/):
-Xms4g
-Xmx4g
Always set both -Xms and -Xmx to the same value to avoid heap resizing overhead.
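The sizing rule above can be expressed as a quick helper. This is a sketch of the rule of thumb only; the exact safe ceiling depends on your JVM's compressed-oops threshold.

```python
def heap_size_gb(total_ram_gb):
    """Half of system RAM, capped just below the ~32 GB compressed-oops limit."""
    return min(total_ram_gb // 2, 31)
```

For an 8 GB server this gives 4 GB, matching the -Xms4g/-Xmx4g example above; on a 128 GB machine it caps at 31 GB rather than allocating 64 GB.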
3. Use Index Lifecycle Management (ILM)
Log data grows rapidly. Without proper retention policies, disk usage can become unmanageable. Use Elasticsearch's Index Lifecycle Management to automate rollover, deletion, and cold storage.
Create an ILM policy via Kibana's Stack Management → Index Lifecycle Policies. Define phases: hot (active), warm (infrequent access), cold (archival), and delete. Apply the policy to your index patterns to automate cleanup.
4. Enable Monitoring and Alerts
Use Kibana's Uptime and Observability features to monitor the health of your ELK Stack. Set up alerts for high CPU usage, low disk space, or failed Logstash pipelines. You can also integrate with external tools like Prometheus and Grafana for advanced metrics.
5. Use Filebeat Instead of Logstash for Simple Log Collection
While Logstash is powerful, it's heavy for simple log shipping. For lightweight, high-performance log collection from agents, use Filebeat (part of the Elastic Beats family). Filebeat is designed to ship logs efficiently with minimal resource usage. Replace Logstash input with Filebeat on client machines and send data directly to Elasticsearch or via a central Logstash instance.
6. Avoid Large Document Sizes
Elasticsearch performs best with documents under 10 KB. If you're ingesting large JSON payloads or binary logs, consider compressing or splitting them. To reduce index size, strip unnecessary fields with Logstash's mutate filter (remove_field) or the prune filter; note that the drop filter discards entire events, not individual fields.
7. Regular Backups
Use Elasticsearch's snapshot and restore feature to back up indices. Configure a repository (e.g., S3, NFS, or shared filesystem) and schedule periodic snapshots. This ensures data recovery in case of hardware failure or misconfiguration.
8. Network Security
Restrict access to Elasticsearch and Kibana using firewalls. Only allow traffic from trusted IPs. Use reverse proxies like Nginx or Apache to add SSL/TLS termination and authentication layers. Never expose Kibana or Elasticsearch directly to the public internet without authentication and encryption.
9. Monitor Logstash Pipeline Performance
Use Logstash's built-in metrics to track throughput, event processing time, and backpressure. Enable the monitoring plugin and view metrics in Kibana under Stack Monitoring → Logstash.
10. Use Templates for Consistent Index Mapping
Define custom index templates to enforce consistent field types (e.g., string vs. keyword, date formats). This prevents mapping conflicts and improves search performance. Create templates in Kibana or via the Elasticsearch API before data ingestion begins.
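As an illustration, a minimal composable index template for the syslog indices created earlier might look like the following (Dev Tools console syntax; the template name and field mappings are examples matching the grok filter from Step 4, so adjust them to your own schema):

```json
PUT _index_template/syslog-template
{
  "index_patterns": ["syslog-*"],
  "template": {
    "mappings": {
      "properties": {
        "@timestamp":      { "type": "date" },
        "syslog_hostname": { "type": "keyword" },
        "syslog_program":  { "type": "keyword" },
        "syslog_message":  { "type": "text" }
      }
    }
  }
}
```

Mapping hostname and program as keyword (rather than analyzed text) keeps aggregations and dashboard filters fast and avoids mapping conflicts across daily indices.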
Tools and Resources
Official Documentation
- Elastic Documentation – Comprehensive guides for all ELK components
- Elastic Downloads – Latest versions of Elasticsearch, Kibana, Logstash, and Beats
- Elastic Observability – Advanced monitoring and alerting features
Community and Support
- Elastic Discuss Forum – Active community for troubleshooting and best practices
- Elastic GitHub Repositories – Source code, examples, and issue tracking
- Elastic Blog – Tutorials, case studies, and feature announcements
Third-Party Tools
- Filebeat – Lightweight log shipper for agents
- Metricbeat – Collects system and service metrics
- Prometheus + Grafana – For advanced metrics visualization alongside ELK
- Docker Compose – For quick local testing using pre-built images
- Ansible – For automated, repeatable ELK deployments across multiple servers
Sample Configurations and Templates
GitHub hosts numerous open-source ELK configuration repositories:
- Elastic Examples – Official templates for common use cases
- Docker ELK Stack – Docker Compose setup for local development
- Ansible Playbooks – Automated deployment scripts
Learning Resources
- Elastic Learning Platform – Free and paid courses on ELK Stack fundamentals
- Udemy: ELK Stack: Elasticsearch, Logstash, Kibana – Hands-on video tutorials
- YouTube: Elastic Channel – Official tutorials and webinars
Real Examples
Example 1: Centralized Web Server Log Monitoring
A company runs 50 Nginx web servers across multiple regions. Each server generates over 10 GB of access logs daily. Using the ELK Stack, they deploy Filebeat on each server to ship logs to a central Logstash instance. Logstash parses the Nginx logs using a custom Grok pattern, extracts fields like client_ip, status_code, request_time, and user_agent.
These fields are indexed into Elasticsearch with daily indices. In Kibana, they create dashboards showing:
- Top 10 most visited URLs
- HTTP status code distribution (4xx/5xx errors)
- Response time percentiles
- Geolocation map of traffic sources
Alerts are configured to trigger when 5xx errors exceed 5% in a 5-minute window. This enables their DevOps team to respond to outages before users report them.
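The alert condition described above, 5xx errors exceeding 5% of requests in a rolling 5-minute window, can be sketched outside Kibana to make the logic concrete. The class and method names here are illustrative, not part of any Elastic API.

```python
import time
from collections import deque

class ErrorRateMonitor:
    """Rolling-window alert: fire when the 5xx share of requests exceeds a threshold."""

    def __init__(self, window_seconds=300, threshold=0.05):
        self.window = window_seconds
        self.threshold = threshold
        self.events = deque()  # (timestamp, is_error) pairs inside the window

    def record(self, status_code, now=None):
        """Record one response and expire events older than the window."""
        now = time.time() if now is None else now
        self.events.append((now, status_code >= 500))
        cutoff = now - self.window
        while self.events and self.events[0][0] < cutoff:
            self.events.popleft()

    def should_alert(self):
        """True when the error fraction inside the window exceeds the threshold."""
        if not self.events:
            return False
        errors = sum(1 for _, is_error in self.events if is_error)
        return errors / len(self.events) > self.threshold
```

In Kibana the same condition is expressed declaratively as an alert rule over the indexed status_code field; this sketch just shows the arithmetic behind it.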
Example 2: Security Incident Detection
A financial institution uses the ELK Stack to monitor SSH login attempts across its Linux servers. They configure Filebeat to collect /var/log/auth.log and send it to Logstash. Logstash filters for failed login attempts and flags repeated failures from the same IP.
A Kibana dashboard displays:
- Number of failed SSH attempts per hour
- Top 10 source IPs with failed logins
- Geolocation of attack sources
An alert rule triggers when an IP attempts more than 10 failed logins in 2 minutes. The system automatically blocks the IP via a script that updates the firewall (via iptables or ufw). This automated detection reduces the risk of brute-force attacks significantly.
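The threshold rule in this example, more than 10 failed logins from one IP within 2 minutes, amounts to a per-IP sliding window. A minimal sketch of that detection logic (class name and return convention are assumptions for illustration):

```python
from collections import defaultdict, deque

class BruteForceDetector:
    """Flags an IP once it exceeds max_failures failed logins inside the window."""

    def __init__(self, max_failures=10, window_seconds=120):
        self.max_failures = max_failures
        self.window = window_seconds
        self.failures = defaultdict(deque)  # ip -> timestamps of recent failures

    def record_failure(self, ip, timestamp):
        """Record one failed login; return True if the IP should be blocked."""
        q = self.failures[ip]
        q.append(timestamp)
        cutoff = timestamp - self.window
        while q and q[0] < cutoff:
            q.popleft()  # forget failures that fell out of the window
        return len(q) > self.max_failures
```

In the production setup this decision lives in a Kibana alert rule, and the blocking action is a script updating iptables or ufw; the sketch only shows the counting logic.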
Example 3: Container Log Aggregation with Docker and Kubernetes
A DevOps team runs microservices in Docker containers and Kubernetes clusters. Each container outputs logs to stdout. They deploy Filebeat as a DaemonSet in Kubernetes, allowing it to collect logs from all nodes.
Filebeat uses autodiscover to dynamically detect containers and apply specific log parsing rules based on container labels. For example, logs from a Node.js app are parsed with a JSON filter, while Python app logs use a multiline pattern to handle stack traces.
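Filebeat's multiline settings perform this folding at the shipper, but the core idea, merging indented continuation lines (typical of Python stack traces) into the preceding event, can be sketched in a few lines. This is a simplified rule for illustration, not Filebeat's actual implementation.

```python
def merge_multiline(lines):
    """Fold lines starting with whitespace or a 'Traceback' header into
    the preceding log event (a simplified multiline rule)."""
    events, current = [], []
    for line in lines:
        continuation = line.startswith((" ", "\t", "Traceback"))
        if current and continuation:
            current.append(line)       # continuation of the previous event
        else:
            if current:
                events.append("\n".join(current))
            current = [line]           # start a new event
    if current:
        events.append("\n".join(current))
    return events
```

Without such folding, each frame of a stack trace would be indexed as a separate document, making error counts and searches misleading.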
Logs are sent to Elasticsearch and visualized in Kibana with dashboards for:
- Application error rates by service
- Latency trends across microservices
- Resource consumption correlation (CPU/memory vs. log volume)
This setup provides full observability into their microservices architecture, enabling rapid debugging and performance optimization.
Example 4: IoT Sensor Data Ingestion
An industrial IoT platform collects temperature, humidity, and pressure readings from 10,000 sensors every 10 seconds. Data is sent via MQTT to a central broker, which forwards it to Logstash via the MQTT input plugin.
Logstash converts the JSON payload into structured fields and adds timestamps and sensor IDs. Data is indexed into Elasticsearch with hourly indices. Kibana visualizes real-time trends, detects anomalies (e.g., sudden temperature spikes), and triggers alerts to maintenance teams.
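The transformation Logstash applies to each MQTT payload can be sketched as a small function. The payload field names here are assumptions about the sensor schema, not a fixed format.

```python
import json
from datetime import datetime, timezone

def to_document(payload, sensor_id):
    """Turn a raw JSON sensor payload (bytes) into a structured document
    ready for indexing into Elasticsearch."""
    reading = json.loads(payload)
    return {
        "@timestamp": datetime.now(timezone.utc).isoformat(),
        "sensor_id": sensor_id,
        "temperature": float(reading["temperature"]),
        "humidity": float(reading["humidity"]),
        "pressure": float(reading["pressure"]),
    }
```

Coercing readings to floats up front keeps the Elasticsearch field mappings consistent, so a sensor that occasionally sends "21" instead of 21.0 does not cause a mapping conflict.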
Using ILM, data older than 30 days is moved to a low-cost storage tier, and data older than 1 year is automatically deleted to manage costs.
FAQs
Q1: Can I run ELK Stack on Windows?
Yes, Elasticsearch, Kibana, and Logstash are available for Windows. Download the .zip files from the official Elastic website and run them as services. However, Linux is preferred for production due to better performance, stability, and community support.
Q2: How much disk space does ELK Stack require?
There's no fixed amount. It depends on your log volume, retention period, and compression. As a rule of thumb, expect 1-5 GB per day per server for moderate log levels. For 100 servers generating 2 GB/day each, you'll need 200 GB/day. With 30-day retention, that's 6 TB. Always plan for 20-30% overhead for indexing and replicas.
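The sizing arithmetic above is easy to script for your own numbers. A sketch, using the rough planning figures from this answer (they are estimates, not guarantees):

```python
def required_storage_gb(servers, gb_per_day, retention_days, overhead=0.30):
    """Raw log volume over the retention window, plus indexing/replica overhead."""
    raw = servers * gb_per_day * retention_days
    return raw * (1 + overhead)
```

For the example in this answer, 100 servers at 2 GB/day with 30-day retention is 6,000 GB raw; with 30% overhead, plan for roughly 7.8 TB.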
Q3: Is ELK Stack free to use?
Yes, the core ELK Stack (Elasticsearch, Logstash, Kibana) is free to use. Since version 7.11 it has been distributed under the SSPL and the Elastic License, which are source-available rather than OSI-approved open-source licenses. Advanced features like machine learning, alerting, and SAML authentication require a paid subscription (Elastic Platinum or Enterprise license).
Q4: Can I use Elasticsearch without Logstash?
Absolutely. Many users send data directly to Elasticsearch using Beats (Filebeat, Metricbeat), HTTP APIs, or custom scripts. Logstash is optional and used only when complex data transformation is needed.
Q5: Why is my Kibana dashboard empty?
Common causes include: incorrect index pattern, misconfigured Logstash filters, firewall blocking connections, or Elasticsearch not running. Check Elasticsearch indices with curl localhost:9200/_cat/indices and verify Logstash logs in /var/log/logstash/logstash-plain.log.
Q6: How do I upgrade the ELK Stack?
Always follow Elastics official upgrade guide. Perform a rolling upgrade: update one node at a time, ensure cluster health is green, and verify data integrity. Never skip versions. Back up your data before upgrading.
Q7: What's the difference between ELK and EFK?
ELK = Elasticsearch, Logstash, Kibana. EFK = Elasticsearch, Fluentd, Kibana. Fluentd is an alternative to Logstash, often preferred in Kubernetes environments for its lightweight design and plugin ecosystem. Both serve the same purpose: log collection and transformation.
Q8: How do I scale ELK Stack for high availability?
Deploy multiple Elasticsearch nodes in a cluster with replication. Use dedicated master nodes, data nodes, and coordinating nodes. Run Kibana behind a load balancer. Use Filebeat with failover to multiple Logstash instances. Configure Elasticsearch with at least 3 master-eligible nodes and 2 replicas per index.
Q9: Can I use ELK Stack with cloud providers?
Yes. Elastic offers a fully managed service called Elastic Cloud on AWS, GCP, and Azure. Alternatively, you can install ELK manually on cloud VMs. Managed services reduce operational overhead but come with subscription costs.
Q10: How do I troubleshoot slow searches in Kibana?
Check Elasticsearch logs for slow queries. Use the Dev Tools console to run GET _search with "profile": true to analyze performance. Optimize by: reducing the number of fields returned, using filters instead of queries, avoiding wildcards, and increasing shard size (avoid too many small shards).
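For example, in Kibana's Dev Tools console a profiled query looks like the following (the index pattern and match clause are placeholders; substitute your own):

```json
GET syslog-*/_search
{
  "profile": true,
  "query": {
    "match": { "syslog_program": "sshd" }
  }
}
```

The response includes a per-shard breakdown of query and collector timings, which usually makes it obvious whether the cost comes from the query itself or from too many small shards.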
Conclusion
Setting up the ELK Stack is a powerful step toward gaining full visibility into your digital infrastructure. From monitoring application errors to detecting security threats and optimizing performance, the ELK Stack transforms chaotic log data into structured, searchable, and actionable insights. This guide has walked you through the complete process: from installing Java and configuring each component to securing your deployment and applying real-world best practices.
Remember, the key to success with ELK is not just installation, but ongoing maintenance: monitoring performance, tuning indexing strategies, managing storage, and automating alerts. As your environment grows, consider scaling with Filebeat, implementing ILM, and integrating with Kubernetes or cloud-native tools.
Whether you're a DevOps engineer, a system administrator, or a security analyst, mastering the ELK Stack empowers you to proactively manage systems, reduce downtime, and make data-driven decisions. Start small with a single server, validate your pipeline, and expand gradually. With the right configuration and practices, your ELK Stack will become the central nervous system of your observability strategy.
Now that you've successfully set up the ELK Stack, the next step is to explore advanced use cases: anomaly detection, machine learning with Elasticsearch, custom Kibana visualizations, and integrating with CI/CD pipelines. The possibilities are limitless, and the insights you uncover could transform how you operate your technology stack.