How to Configure Fluentd
Fluentd is an open-source data collector designed to unify logging and data ingestion across diverse systems. It serves as a powerful, flexible, and scalable solution for aggregating logs from servers, containers, applications, and cloud services before forwarding them to centralized storage or analytics platforms such as Elasticsearch, Amazon S3, Google Cloud Storage, or Splunk. With its plugin-based architecture, Fluentd supports over 700 plugins, making it one of the most extensible log collection tools in modern DevOps and observability stacks.
Configuring Fluentd correctly is critical for ensuring reliable, high-performance log pipelines. Poorly configured instances can lead to data loss, delayed ingestion, excessive resource consumption, or even system instability. Whether you're managing a small application stack or a large Kubernetes cluster, mastering Fluentd configuration ensures that your monitoring, troubleshooting, and compliance workflows remain robust and efficient.
This guide provides a comprehensive, step-by-step walkthrough of how to configure Fluentd from scratch. You'll learn practical setup methods, industry best practices, real-world examples, essential tools, and answers to common challenges. By the end of this tutorial, you'll be equipped to deploy Fluentd confidently in production environments, optimized for performance, security, and maintainability.
Step-by-Step Guide
Prerequisites
Before configuring Fluentd, ensure your system meets the following requirements:
- A Linux-based operating system (Ubuntu 20.04+, CentOS 8+, or Debian 11+)
- Root or sudo access
- Network connectivity to your destination systems (e.g., Elasticsearch, S3, etc.)
- Basic familiarity with command-line interfaces and YAML configuration files
- Optional: Docker or Kubernetes if deploying in containerized environments
Fluentd also requires Ruby or a precompiled binary. While it's possible to install from source, using package managers or official installers is recommended for production stability.
Step 1: Install Fluentd
Fluentd offers multiple installation methods depending on your platform. Below are the most common approaches.
On Ubuntu/Debian
Use the official td-agent package, which bundles Fluentd with dependencies and is maintained by Treasure Data:
wget https://toolbelt.treasuredata.com/sh/install-ubuntu-focal-td-agent4.sh
sudo sh install-ubuntu-focal-td-agent4.sh
For older Ubuntu versions, replace focal with your release codename (e.g., bionic for 18.04).
Start and enable the service:
sudo systemctl start td-agent
sudo systemctl enable td-agent
On CentOS/RHEL
Install using the YUM repository:
curl -L https://toolbelt.treasuredata.com/sh/install-redhat-8-td-agent4.sh | sh
sudo systemctl start td-agent
sudo systemctl enable td-agent
Using Docker
For containerized deployments, use the official Fluentd image:
docker run -d --name fluentd -p 24224:24224 -v $(pwd)/fluentd.conf:/etc/fluent/fluent.conf fluent/fluentd:latest
This command runs Fluentd in detached mode, maps port 24224 (the default TCP input port), and mounts a local configuration file.
Using Helm (Kubernetes)
If you're running Fluentd in Kubernetes, use the official Helm chart:
helm repo add fluent https://fluent.github.io/helm-charts
helm install fluentd fluent/fluentd --namespace logging --create-namespace
Ensure you customize the values.yaml to match your logging destination and resource requirements.
Step 2: Locate and Understand the Configuration File
Fluentd's main configuration file is typically located at:
- /etc/td-agent/td-agent.conf (for td-agent on Ubuntu/CentOS)
- /etc/fluent/fluent.conf (for standalone Fluentd installations)
The configuration file uses a simple yet powerful syntax based on blocks and directives. Each block defines a component of the data pipeline: input, filter, output, or match.
A minimal configuration looks like this:
<source>
@type tail
path /var/log/*.log
pos_file /var/log/td-agent/tail-containers.pos
tag docker.*
read_from_head true
</source>
<match *>
@type stdout
</match>
Let's break this down:
- <source> defines where Fluentd collects data from; in this case, log files using the tail plugin.
- path specifies the log file pattern to monitor.
- pos_file tracks read positions to avoid duplicate logs after restarts.
- tag assigns a label to the data stream for routing.
- <match *> routes all tagged data to the output; here, stdout, which prints logs to the console.
Step 3: Configure Input Sources
Inputs tell Fluentd where to collect data from. Common input plugins include:
- tail: Reads log files (most common for application logs)
- forward: Accepts data from other Fluentd instances (used in distributed setups)
- syslog: Listens for system syslog messages
- http: Receives logs via HTTP POST requests
Docker container logs are also commonly collected by pointing tail at the Docker daemon's JSON log files (see the example below).
Example: Tail Application Logs
To monitor Nginx access logs:
<source>
@type tail
path /var/log/nginx/access.log
pos_file /var/log/td-agent/nginx-access.pos
tag nginx.access
format nginx
read_from_head true
</source>
The format nginx directive automatically parses Nginx's default log format into structured fields like remote, method, path, status, and size.
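The same parsing can be prototyped in Python. The regex below approximates the built-in nginx format; the sample log line is invented for illustration:

```python
import re

# Approximation of Fluentd's built-in "nginx" log format
NGINX_RE = re.compile(
    r'^(?P<remote>[^ ]*) [^ ]* (?P<user>[^ ]*) \[(?P<time>[^\]]*)\] '
    r'"(?P<method>\S+) (?P<path>[^ ]*) \S*" '
    r'(?P<status>[^ ]*) (?P<size>[^ ]*)'
)

line = '192.168.1.10 - alice [15/May/2024:10:30:00 +0000] "GET /index.html HTTP/1.1" 200 1024'
record = NGINX_RE.match(line).groupdict()
print(record["method"], record["path"], record["status"])  # GET /index.html 200
```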
Example: Collect Docker Container Logs
To ingest logs from all running containers, tail Docker's JSON log files (core Fluentd has no dedicated docker input plugin):
<source>
@type tail
path /var/lib/docker/containers/*/*-json.log
pos_file /var/log/td-agent/docker.pos
tag docker.*
read_from_head true
<parse>
@type json
</parse>
</source>
Docker's json-file logging driver writes each container's output as one JSON object per line under /var/lib/docker/containers/.
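Each line in those files is a small JSON object; a quick sketch of what the JSON parsing yields (the sample line is invented):

```python
import json

# One line from a Docker json-file log (invented sample)
raw = '{"log":"GET /health 200\\n","stream":"stdout","time":"2024-05-15T10:30:00.123Z"}'

entry = json.loads(raw)
print(entry["stream"])        # stdout
print(entry["log"].rstrip())  # GET /health 200
```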
Example: Accept Logs via HTTP
For applications that push logs via REST API:
<source>
@type http
port 9880
bind 0.0.0.0
</source>
You can now send logs using curl:
curl -X POST -d 'json={"message":"Hello Fluentd"}' http://localhost:9880/app.log
Step 4: Apply Filters for Data Enrichment
Filters modify or enrich log data before it reaches the output. Common use cases include:
- Adding timestamps or hostnames
- Removing sensitive fields (PII)
- Parsing nested JSON
- Renaming or restructuring fields
Example: Add Hostname and Timestamp
<filter docker.*>
@type record_transformer
<record>
hostname ${HOSTNAME}
env production
</record>
</filter>
This adds two static fields to every log entry from Docker containers.
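In Python terms, record_transformer performs a simple merge of extra fields into each event record; a sketch (the field values here are illustrative):

```python
import socket

def add_static_fields(record, extra):
    """Rough equivalent of record_transformer's <record> section:
    merge extra fields into a copy of the event record."""
    merged = dict(record)
    merged.update(extra)
    return merged

event = {"log": "container started"}
enriched = add_static_fields(event, {"hostname": socket.gethostname(), "env": "production"})
print(enriched["env"])  # production
```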
Example: Parse JSON Log Lines
Some applications output logs as JSON strings. Use the parser filter to extract them:
<filter app.log>
@type parser
key_name log
reserve_data true
<parse>
@type json
</parse>
</filter>
This assumes your log contains a field called log with a JSON string value. The parsed fields are merged into the main record.
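What the parser filter does with reserve_data true can be sketched as follows (a simplified model, not the plugin's actual implementation):

```python
import json

def parse_log_field(record, key_name="log", reserve_data=True):
    """Parse the JSON string stored in record[key_name] and merge the
    resulting fields into the record; reserve_data keeps the original
    fields alongside the parsed ones."""
    try:
        parsed = json.loads(record[key_name])
    except (KeyError, ValueError):
        return record  # unparsable records pass through unchanged
    merged = dict(record) if reserve_data else {}
    merged.update(parsed)
    return merged

rec = {"log": '{"level":"error","msg":"db timeout"}', "container": "api-1"}
print(parse_log_field(rec)["level"])  # error
```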
Example: Remove Sensitive Data
To redact passwords or tokens:
<filter *.log>
@type grep
<exclude>
key message
pattern (password|token|secret)
</exclude>
</filter>
This removes any log entry containing those keywords. For redaction instead of deletion, use the record_transformer plugin to overwrite values.
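The exclude behavior is easy to model: any record whose target field matches the pattern is dropped from the stream. A minimal sketch:

```python
import re

PATTERN = re.compile(r"(password|token|secret)")

def grep_exclude(records, key="message"):
    """Rough equivalent of the grep filter's <exclude> block: drop any
    record whose `key` field matches the pattern."""
    return [r for r in records if not PATTERN.search(str(r.get(key, "")))]

events = [
    {"message": "user login ok"},
    {"message": "password=hunter2"},   # dropped
    {"message": "issued new token"},   # dropped
]
print(grep_exclude(events))  # [{'message': 'user login ok'}]
```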
Step 5: Configure Output Destinations
Outputs define where Fluentd sends processed logs. Choose based on your storage or analytics platform.
Output to Elasticsearch
Install the plugin:
sudo td-agent-gem install fluent-plugin-elasticsearch
Configure the output:
<match docker.*>
@type elasticsearch
host elasticsearch.example.com
port 9200
logstash_format true
logstash_prefix fluentd
index_name fluentd-${tag}
type_name _doc
flush_interval 10s
request_timeout 30s
<buffer>
@type file
path /var/log/td-agent/buffer/elasticsearch
chunk_limit_size 2MB
queue_limit_length 32
retry_max_interval 30
retry_forever true
</buffer>
</match>
Key parameters:
- logstash_format true: Uses Logstash-style index naming (e.g., fluentd-2024.05.15)
- flush_interval: Controls how often data is sent (a balance between latency and throughput)
- <buffer>: Critical for resilience; disk-based buffering prevents data loss during outages.
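The logstash-style index name is derived from each event's timestamp, so every day's events land in their own index. A small sketch of the naming scheme:

```python
from datetime import datetime, timezone

def logstash_index(prefix, epoch, utc=True):
    """Sketch of logstash_format naming: <logstash_prefix>-YYYY.MM.DD."""
    tz = timezone.utc if utc else None
    return f"{prefix}-{datetime.fromtimestamp(epoch, tz=tz).strftime('%Y.%m.%d')}"

# An event from 2024-05-15 10:30:00 UTC
print(logstash_index("fluentd", 1715769000))  # fluentd-2024.05.15
```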
Output to Amazon S3
Install the plugin:
sudo td-agent-gem install fluent-plugin-s3
Configure:
<match app.log>
@type s3
aws_key_id YOUR_AWS_KEY
aws_sec_key YOUR_AWS_SECRET
s3_bucket your-logging-bucket
s3_region us-east-1
path logs/
<format>
@type json
</format>
<buffer time>
@type file
path /var/log/td-agent/buffer/s3
timekey 3600
timekey_wait 10m
timekey_use_utc true
chunk_limit_size 256m
</buffer>
</match>
This uploads logs hourly to S3 in structured JSON format, organized by date/time.
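The exact object key is controlled by the plugin's s3_object_key_format parameter; the sketch below only illustrates how a path prefix plus an hourly time slice lays objects out by date and hour (the _0.json.gz suffix is an assumed example, not the plugin's fixed default):

```python
import time

def s3_key(path, time_slice_format, epoch, index=0, ext="json.gz"):
    """Illustrative layout of time-sliced S3 object keys: the time slice
    comes from the chunk's timekey, formatted in UTC."""
    time_slice = time.strftime(time_slice_format, time.gmtime(epoch))
    return f"{path}{time_slice}_{index}.{ext}"

# An event from 2024-05-15 10:30:00 UTC falls in the 10:00 hourly slice
print(s3_key("logs/", "%Y/%m/%d/%H", 1715769000))  # logs/2024/05/15/10_0.json.gz
```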
Output to Google Cloud Logging
Install:
sudo td-agent-gem install fluent-plugin-google-cloud
Configure:
<match *.log>
@type google_cloud
<buffer>
@type file
path /var/log/td-agent/buffer/google-cloud
chunk_limit_size 10m
flush_interval 30s
</buffer>
</match>
This plugin forwards logs to Google Cloud Logging and picks up credentials from the environment (Application Default Credentials), so it needs little configuration when running on GCP. To archive logs in a Cloud Storage bucket instead, use the fluent-plugin-gcs output.
Output to Multiple Destinations
Use the copy plugin to send data to multiple outputs:
<match docker.*>
@type copy
<store>
@type elasticsearch
host elasticsearch.example.com
port 9200
logstash_format true
</store>
<store>
@type s3
aws_key_id YOUR_AWS_KEY
aws_sec_key YOUR_AWS_SECRET
s3_bucket your-logging-bucket
path logs/
format json
</store>
</match>
This sends logs to both Elasticsearch (for real-time search) and S3 (for long-term archiving).
Step 6: Test and Validate Configuration
Always validate your configuration before restarting Fluentd:
sudo td-agent --dry-run -c /etc/td-agent/td-agent.conf
If the configuration is valid, the command prints the parsed configuration and exits cleanly; syntax errors are reported with the offending line.
Then restart Fluentd:
sudo systemctl restart td-agent
Monitor logs for errors:
sudo journalctl -u td-agent -f
To verify data flow, check the output destination (e.g., Elasticsearch Kibana, S3 bucket, or console output).
Step 7: Monitor Fluentd Performance
Enable Fluentd's metrics endpoint by installing the Prometheus plugin:
sudo td-agent-gem install fluent-plugin-prometheus
Then expose the scrape endpoint and per-output metrics via source blocks:
<source>
@type prometheus
bind 0.0.0.0
port 24231
metrics_path /metrics
</source>
<source>
@type prometheus_output_monitor
interval 10
</source>
The prometheus_output_monitor input publishes output-plugin metrics such as emit counts, buffer queue length, and retry counts.
Access metrics at http://localhost:24231/metrics and integrate with Prometheus for alerting and dashboards.
Best Practices
Use Buffering to Prevent Data Loss
Never rely on in-memory buffering in production. Always configure disk-backed buffers using the @type file directive. This ensures logs are persisted to disk during network outages, service restarts, or destination downtime.
Key buffer parameters:
- chunk_limit_size: Limit chunk size to avoid memory spikes (recommended: 2-10MB)
- queue_limit_length: Control backlog size (e.g., 32-128)
- retry_max_interval: Cap the exponential backoff (e.g., 30s)
- retry_forever true: Prevents data loss during extended outages
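The backoff schedule these parameters describe can be sketched as follows (Fluentd also randomizes each wait slightly by default; that jitter is omitted here):

```python
def retry_intervals(retry_wait=1.0, retry_max_interval=30.0, attempts=8):
    """Exponential backoff: the wait doubles on every failed flush,
    capped at retry_max_interval."""
    return [min(retry_wait * 2 ** n, retry_max_interval) for n in range(attempts)]

print(retry_intervals())  # [1.0, 2.0, 4.0, 8.0, 16.0, 30.0, 30.0, 30.0]
```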
Tag Logs Strategically
Use meaningful tags to enable routing and filtering. Avoid generic tags like all. Instead, use hierarchical naming:
- app.web.access
- app.api.error
- infra.kubernetes.node
This allows precise control over which logs go where, improving performance and reducing noise.
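Match patterns against these hierarchical tags follow two wildcard rules: * matches exactly one dot-separated part, while ** matches any number of parts. A simplified sketch of that matching (real Fluentd supports a few more pattern forms, such as {a,b} lists):

```python
import re

def match_pattern(pattern, tag):
    """Simplified Fluentd <match> pattern check: '*' matches a single
    tag part, '**' matches any number of parts."""
    regex = (re.escape(pattern)
             .replace(r"\*\*", ".*")      # '**' spans dots
             .replace(r"\*", "[^.]+"))    # '*' stops at the next dot
    return re.fullmatch(regex, tag) is not None

print(match_pattern("app.*", "app.web"))          # True
print(match_pattern("app.*", "app.web.access"))   # False: '*' is one part
print(match_pattern("app.**", "app.web.access"))  # True
```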
Secure Your Configuration
Never hardcode credentials in configuration files. Use environment variables or secret managers:
<match s3.*>
@type s3
aws_key_id ${AWS_ACCESS_KEY_ID}
aws_sec_key ${AWS_SECRET_ACCESS_KEY}
...
</match>
Set environment variables in systemd or Docker:
Environment=AWS_ACCESS_KEY_ID=your-key
Environment=AWS_SECRET_ACCESS_KEY=your-secret
For Kubernetes, use Secrets and mount them as environment variables.
Optimize for Resource Usage
Fluentd can consume significant CPU and memory under high load. Monitor usage and tune accordingly:
- Limit the number of concurrent threads using num_threads in input/output plugins
- Reduce flush_interval only if latency is critical; otherwise, use 30s-60s for efficiency
- Use compress gzip in S3 or HTTP outputs to reduce bandwidth
- Run Fluentd on dedicated nodes or containers to avoid resource contention
Implement Log Rotation
Fluentd's tail plugin works best with log files rotated by tools like logrotate. The tail input detects the default rename-style rotation automatically, so logs keep flowing across rotations; if you rely on the copytruncate option instead, be aware that lines written between the copy and the truncate can be lost.
Example /etc/logrotate.d/nginx:
/var/log/nginx/*.log {
daily
missingok
rotate 14
compress
delaycompress
notifempty
create 0640 www-data adm
sharedscripts
postrotate
/usr/sbin/invoke-rc.d nginx rotate >/dev/null
endscript
}
Enable Monitoring and Alerting
Integrate Fluentd with monitoring tools:
- Use Prometheus + Grafana to track buffer size, retry counts, and throughput
- Set alerts for buffer overflow (>90% full), high retry rates, or output failures
- Log Fluentd's own errors to a separate file for easier debugging
Version Control Your Configurations
Treat Fluentd configurations as code. Store them in Git repositories with clear commit messages and change logs. Use CI/CD pipelines to validate and deploy configurations across environments.
Tools and Resources
Official Documentation
Fluentd's official documentation is the most authoritative source:
- https://docs.fluentd.org/ - Comprehensive guides, plugin references, and architecture details
- https://github.com/fluent/fluentd - Source code, issues, and release notes
Plugin Registry
Explore the Fluentd plugin ecosystem:
- https://rubygems.org/search?query=fluent-plugin - Search all Fluentd plugins
- https://github.com/fluent/fluentd/wiki/Plugin-List - Community-maintained list
Configuration Validators
Use tools to validate syntax before deployment:
- td-agent --dry-run - Built-in config validator
- https://github.com/fluent/fluentd-config-validator - Automated validation script
Monitoring and Observability
- Prometheus + Grafana - For metrics collection and visualization
- Fluent Bit - Lightweight alternative for edge deployments (use alongside Fluentd for aggregation)
- ELK Stack (Elasticsearch, Logstash, Kibana) - Popular destination for Fluentd logs
- OpenTelemetry - Fluentd can ingest OTLP data via plugins for unified telemetry
Community Support
- Stack Overflow - Active community for troubleshooting
- GitHub Discussions - Official forum for questions
- Fluentd Slack Channel - Real-time help from developers
Sample Configuration Repositories
Study real-world configurations:
- https://github.com/fluent/fluentd-kubernetes-daemonset - Official Kubernetes setup
- https://github.com/SumoLogic/fluentd-kubernetes-sumologic - Sumo Logic integration
- https://github.com/GoogleCloudPlatform/fluent-plugin-google-cloud - GCP logging examples
Real Examples
Example 1: Logging a Microservices Architecture
Scenario: You have 10 microservices running in Kubernetes, each outputting JSON logs to stdout. You need to collect, enrich, and send logs to Elasticsearch and S3.
Configuration:
<source>
@type tail
path /var/log/containers/*.log
pos_file /var/log/fluentd-containers.log.pos
tag kube.*
read_from_head true
<parse>
@type json
</parse>
</source>
<filter kube.**>
@type kubernetes_metadata
</filter>
<filter kube.**>
@type record_transformer
enable_ruby true
<record>
service_name ${record["kubernetes"]["labels"]["app"]}
namespace ${record["kubernetes"]["namespace_name"]}
</record>
</filter>
<match kube.**>
@type copy
<store>
@type elasticsearch
host elasticsearch.logging.svc.cluster.local
port 9200
logstash_format true
logstash_prefix k8s-logs
<buffer>
@type file
path /var/log/fluentd-buffers/kubernetes
chunk_limit_size 5MB
retry_max_interval 60
retry_forever true
</buffer>
</store>
<store>
@type s3
aws_key_id ${AWS_ACCESS_KEY_ID}
aws_sec_key ${AWS_SECRET_ACCESS_KEY}
s3_bucket my-logs-bucket
s3_region us-west-2
path logs/k8s/
time_slice_format %Y/%m/%d/%H
time_slice_wait 10m
format json
<buffer time>
@type file
path /var/log/fluentd-buffers/s3
timekey 3600
timekey_wait 10m
</buffer>
</store>
</match>
This setup:
- Tails container log files and enriches them with the kubernetes_metadata filter (from fluent-plugin-kubernetes_metadata_filter)
- Extracts app name and namespace from Kubernetes metadata
- Sends logs to Elasticsearch for real-time analysis
- Archives logs in S3 for compliance and backup
Example 2: Centralized Syslog Aggregation
Scenario: You have 50 Linux servers sending syslog messages. You want to centralize them, filter out noise, and store in a time-series database.
Configuration:
<source>
@type syslog
port 5140
bind 0.0.0.0
protocol_type tcp
tag system.syslog
<parse>
@type syslog
</parse>
</source>
<filter system.syslog>
@type grep
<exclude>
key message
pattern (CRON|systemd-logind)
</exclude>
</filter>
<match system.syslog>
@type tdlog
apikey YOUR_TD_API_KEY
endpoint https://in.treasuredata.com
flush_interval 30s
<buffer>
@type file
path /var/log/td-agent/buffer/td
chunk_limit_size 10MB
queue_limit_length 128
retry_max_interval 60
retry_forever true
</buffer>
</match>
This configuration:
- Accepts TCP syslog from remote hosts
- Filters out non-critical system messages
- Forwards to Treasure Data (or another analytics platform)
Example 3: Redacting PII in Real-Time
Scenario: Your application logs include email addresses and credit card numbers. You must comply with GDPR and PCI-DSS.
Configuration:
<filter app.log>
@type record_transformer
enable_ruby true
<record>
message ${record["message"].gsub(/[\w\.-]+@[\w\.-]+\.\w+/, "[REDACTED_EMAIL]")}
</record>
</filter>
<filter app.log>
@type record_transformer
enable_ruby true
<record>
message ${record["message"].gsub(/\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b/, "[REDACTED_CC]")}
</record>
</filter>
This uses Ruby string substitution (enabled via enable_ruby true) to mask sensitive patterns before logs are sent out.
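The same substitutions can be verified outside Fluentd with Python's re module, using the patterns copied from the config above:

```python
import re

EMAIL_RE = re.compile(r"[\w\.-]+@[\w\.-]+\.\w+")
CC_RE = re.compile(r"\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b")

def redact(message):
    """Apply the same masks as the two record_transformer filters."""
    message = EMAIL_RE.sub("[REDACTED_EMAIL]", message)
    return CC_RE.sub("[REDACTED_CC]", message)

print(redact("contact alice@example.com, card 4111-1111-1111-1111"))
# contact [REDACTED_EMAIL], card [REDACTED_CC]
```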
FAQs
What is the difference between Fluentd and Fluent Bit?
Fluentd is a full-featured, Ruby-based log collector with rich plugin support, ideal for centralized aggregation. Fluent Bit is a lightweight, C-based alternative designed for edge devices and resource-constrained environments. Many teams use Fluent Bit for collection and Fluentd for aggregation.
Can Fluentd handle high-throughput logging?
Yes. With proper buffer tuning, disk I/O optimization, and multi-threading, Fluentd can handle tens of thousands of events per second. For extreme scale, use multiple Fluentd instances behind a load balancer or combine with Kafka for buffering.
How do I troubleshoot missing logs?
Check the following:
- Is the log file being tailed? Verify file permissions and path.
- Is the tag matching the filter/output? Use stdout temporarily to debug.
- Are buffers full? Check /var/log/td-agent/buffer/ for backlogs.
- Is the output destination reachable? Test connectivity with telnet or curl.
- Review Fluentd's logs: journalctl -u td-agent -f
How often should I restart Fluentd?
Never restart unless necessary. Fluentd supports hot-reloading via sudo systemctl reload td-agent, which applies configuration changes without dropping logs.
Can Fluentd parse non-JSON logs?
Yes. Fluentd supports regex, Apache, Nginx, Syslog, CSV, and custom parsers. Use the parser filter with @type regexp to define custom patterns.
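A custom pattern is easiest to prototype with named capture groups, which is exactly what the regexp parser expects; the log line and field names below are hypothetical:

```python
import re

# Hypothetical app log line: "<timestamp> <level> <message>"
APP_RE = re.compile(
    r"^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<level>\w+) (?P<message>.*)$"
)

fields = APP_RE.match("2024-05-15 10:30:00 ERROR payment failed").groupdict()
print(fields["level"], "-", fields["message"])  # ERROR - payment failed
```

The same expression, wrapped in /.../, goes into the expression parameter of a <parse> block with @type regexp.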
Is Fluentd secure?
Fluentd itself is secure when configured properly. Use TLS for input/output connections, restrict network access via firewalls, avoid hardcoding secrets, and run Fluentd under a non-root user.
Does Fluentd support Kubernetes out of the box?
Yes. The official Kubernetes daemonset tails container log files and uses the kubernetes_metadata filter plugin to attach pod names, namespaces, labels, and annotations to each record, without requiring sidecars.
What happens if the output destination is down?
Fluentd uses its buffer system to store logs locally until the destination recovers. With retry_forever true and disk-backed buffers, no data is lost, only delayed.
Conclusion
Configuring Fluentd is not merely a technical task; it's a strategic decision that impacts the reliability, scalability, and security of your entire observability infrastructure. By following the step-by-step guide above, you've learned how to install Fluentd, define inputs and outputs, enrich data with filters, apply buffers for resilience, and deploy in real-world scenarios.
Remember: Fluentd's power lies in its flexibility. Whether you're logging a single server or a thousand microservices, the same core principles apply: tag wisely, buffer aggressively, secure rigorously, and monitor continuously.
As your infrastructure evolves, so should your Fluentd configuration. Regularly review logs, audit plugin usage, update to newer versions, and integrate with modern toolchains like OpenTelemetry and Prometheus. Fluentd is not a set-and-forget tool; it's a living component of your data pipeline that demands attention, tuning, and care.
With the knowledge in this guide, you're now equipped to deploy Fluentd confidently in any environment. Start small, validate thoroughly, and scale deliberately. Your logs are your system's memory; protect them well.