How to Configure Fluentd


Nov 10, 2025 - 12:06

Fluentd is an open-source data collector designed to unify logging and data ingestion across diverse systems. It serves as a powerful, flexible, and scalable solution for aggregating logs from servers, containers, applications, and cloud services before forwarding them to centralized storage or analytics platforms such as Elasticsearch, Amazon S3, Google Cloud Storage, or Splunk. With its plugin-based architecture and an ecosystem of more than 500 plugins, Fluentd is one of the most extensible log collection tools in modern DevOps and observability stacks.

Configuring Fluentd correctly is critical for ensuring reliable, high-performance log pipelines. Poorly configured instances can lead to data loss, delayed ingestion, excessive resource consumption, or even system instability. Whether you're managing a small application stack or a large Kubernetes cluster, mastering Fluentd configuration ensures that your monitoring, troubleshooting, and compliance workflows remain robust and efficient.

This guide provides a comprehensive, step-by-step walkthrough of how to configure Fluentd from scratch. You'll learn practical setup methods, industry best practices, real-world examples, essential tools, and answers to common challenges. By the end of this tutorial, you'll be equipped to deploy Fluentd confidently in production environments, optimized for performance, security, and maintainability.

Step-by-Step Guide

Prerequisites

Before configuring Fluentd, ensure your system meets the following requirements:

  • A Linux-based operating system (Ubuntu 20.04+, CentOS 8+, or Debian 11+)
  • Root or sudo access
  • Network connectivity to your destination systems (e.g., Elasticsearch, S3, etc.)
  • Basic familiarity with command-line interfaces and Fluentd's configuration file syntax
  • Optional: Docker or Kubernetes if deploying in containerized environments

Fluentd also requires Ruby or a precompiled binary. While it's possible to install from source, using package managers or official installers is recommended for production stability.

Step 1: Install Fluentd

Fluentd offers multiple installation methods depending on your platform. Below are the most common approaches.

On Ubuntu/Debian

Use the official td-agent package, which bundles Fluentd with dependencies and is maintained by Treasure Data:

wget https://toolbelt.treasuredata.com/sh/install-ubuntu-focal-td-agent4.sh

sudo sh install-ubuntu-focal-td-agent4.sh

For older Ubuntu versions, replace focal with your release codename (e.g., bionic for 18.04).

Start and enable the service:

sudo systemctl start td-agent

sudo systemctl enable td-agent

On CentOS/RHEL

Install using the YUM repository:

curl -L https://toolbelt.treasuredata.com/sh/install-redhat-8-td-agent4.sh | sh

sudo systemctl start td-agent

sudo systemctl enable td-agent

Using Docker

For containerized deployments, use the official Fluentd image:

docker run -d --name fluentd -p 24224:24224 -v $(pwd)/fluent.conf:/fluentd/etc/fluent.conf fluent/fluentd:latest

This command runs Fluentd in detached mode, maps port 24224 (the default forward input port), and mounts a local configuration file at the image's default config path (/fluentd/etc/fluent.conf).

Using Helm (Kubernetes)

If you're running Fluentd in Kubernetes, use the official Helm chart:

helm repo add fluent https://fluent.github.io/helm-charts

helm install fluentd fluent/fluentd --namespace logging --create-namespace

Ensure you customize the values.yaml to match your logging destination and resource requirements.

Step 2: Locate and Understand the Configuration File

Fluentd's main configuration file is typically located at:

  • /etc/td-agent/td-agent.conf (for td-agent on Ubuntu/CentOS)
  • /etc/fluent/fluent.conf (for standalone Fluentd installations)

The configuration file uses a simple yet powerful syntax based on blocks and directives. Each block defines a stage of the data pipeline: <source> (input), <filter> (processing), and <match> (output routing).

A minimal configuration looks like this:

<source>
  @type tail
  path /var/log/*.log
  pos_file /var/log/td-agent/tail-containers.pos
  tag docker.*
  read_from_head true
</source>

<match **>
  @type stdout
</match>

Let's break this down:

  • <source> defines where Fluentd collects data from; here, log files via the tail plugin.
  • path specifies the log file pattern to monitor.
  • pos_file tracks read positions so logs are not re-read after restarts.
  • tag assigns a label to the data stream for routing.
  • <match **> routes all tagged data to the output; here, stdout, which prints logs to the console. (A single * matches only one dot-separated tag part, so <match *> would miss multi-part tags.)
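Tag routing is the heart of a Fluentd pipeline, so the match rules are worth internalizing: `*` matches exactly one dot-separated tag part, while `**` matches zero or more parts. The following Python sketch models that behavior (simplified; real Fluentd patterns also support `{a,b}` alternatives and other forms):

```python
def tag_match(pattern: str, tag: str) -> bool:
    """Simplified Fluentd-style tag matching:
    '*' matches exactly one dot-separated part, '**' matches zero or more."""
    def match(pp, tp):
        if not pp:
            return not tp
        head, rest = pp[0], pp[1:]
        if head == "**":
            # '**' may consume anywhere from 0 to all remaining tag parts
            return any(match(rest, tp[i:]) for i in range(len(tp) + 1))
        if not tp:
            return False
        if head == "*" or head == tp[0]:
            return match(rest, tp[1:])
        return False
    return match(pattern.split("."), tag.split("."))

# '*' does not cross dots, which is why <match *> misses multi-part tags
print(tag_match("docker.*", "docker.web"))         # True
print(tag_match("docker.*", "docker.web.access"))  # False
print(tag_match("kube.**", "kube.web.access"))     # True
print(tag_match("*", "docker.web"))                # False
```

Running a quick mental check like this before deploying a config is a cheap way to catch routing mistakes.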

Step 3: Configure Input Sources

Inputs tell Fluentd where to collect data from. Common input plugins include:

  • tail: Reads log files (the most common choice for application logs)
  • forward: Accepts data from other Fluentd or Fluent Bit instances (used in distributed setups)
  • syslog: Listens for system syslog messages
  • http: Receives logs via HTTP POST requests
  • Docker container logs: Usually collected with tail against the daemon's JSON log files

Example: Tail Application Logs

To monitor Nginx access logs:

<source>
  @type tail
  path /var/log/nginx/access.log
  pos_file /var/log/td-agent/nginx-access.pos
  tag nginx.access
  read_from_head true
  <parse>
    @type nginx
  </parse>
</source>

The nginx parser (the modern <parse> form of the legacy format nginx directive) automatically splits Nginx's default log format into structured fields such as remote, method, path, code, and size.
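Under the hood, the nginx parser is essentially a named-group regular expression. A rough Python equivalent (simplified sketch; the real parser's regex handles more edge cases, and the exact field names here are illustrative):

```python
import re

# Simplified version of an nginx/combined access-log pattern
NGINX_RE = re.compile(
    r'^(?P<remote>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]+" (?P<code>\d{3}) (?P<size>\d+|-)'
)

def parse_access_line(line: str):
    """Return a dict of named fields, or None when the line does not match."""
    m = NGINX_RE.match(line)
    return m.groupdict() if m else None

rec = parse_access_line(
    '203.0.113.7 - alice [10/Oct/2024:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326'
)
print(rec["method"], rec["code"])  # GET 200
```

Testing your sample log lines against such a regex locally is faster than iterating through Fluentd restarts.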

Example: Collect Docker Container Logs

To ingest logs from all running containers, tail the JSON log files that the Docker daemon writes (there is no docker input plugin in core Fluentd):

<source>
  @type tail
  path /var/lib/docker/containers/*/*-json.log
  pos_file /var/log/td-agent/docker.pos
  tag docker.*
  read_from_head true
  <parse>
    @type json
  </parse>
</source>

Docker stores each container's stdout/stderr as JSON log files under /var/lib/docker/containers/, which the tail plugin reads directly.

Example: Accept Logs via HTTP

For applications that push logs via REST API:

<source>
  @type http
  port 9880
  bind 0.0.0.0
</source>

You can now send logs using curl:

curl -X POST -d 'json={"message":"Hello Fluentd"}' http://localhost:9880/app.log

Step 4: Apply Filters for Data Enrichment

Filters modify or enrich log data before it reaches the output. Common use cases include:

  • Adding timestamps or hostnames
  • Removing sensitive fields (PII)
  • Parsing nested JSON
  • Renaming or restructuring fields

Example: Add Hostname and Timestamp

<filter docker.*>
  @type record_transformer
  <record>
    hostname "#{Socket.gethostname}"
    env production
  </record>
</filter>

This adds two static fields to every log entry from Docker containers.

Example: Parse JSON Log Lines

Some applications output logs as JSON strings. Use the parser filter to extract them:

<filter app.log>
  @type parser
  key_name log
  reserve_data true
  <parse>
    @type json
  </parse>
</filter>

This assumes your log contains a field called log with a JSON string value. The parsed fields are merged into the main record.
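The merge semantics are easy to model outside Fluentd. A Python sketch of what this filter does to each record, assuming the field really holds a JSON object (key and option names mirror the config above):

```python
import json

def apply_parser_filter(record: dict, key_name: str = "log",
                        reserve_data: bool = True) -> dict:
    """Parse record[key_name] as JSON and merge the result into the record,
    mimicking the parser filter with reserve_data true."""
    try:
        parsed = json.loads(record.get(key_name, ""))
    except json.JSONDecodeError:
        return record  # unparsable lines pass through unchanged in this sketch
    if not isinstance(parsed, dict):
        return record
    out = dict(record) if reserve_data else {}
    out.update(parsed)
    return out

rec = {"log": '{"level": "info", "msg": "started"}', "container": "web-1"}
print(apply_parser_filter(rec))
```

With reserve_data false, the original fields (here `container`) would be dropped and only the parsed fields kept.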

Example: Remove Sensitive Data

To redact passwords or tokens:

<filter *.log>
  @type grep
  <exclude>
    key message
    pattern /(password|token|secret)/
  </exclude>
</filter>

This removes any log entry containing those keywords. For redaction instead of deletion, use the record_transformer plugin to overwrite values.
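The exclude rule drops every record whose field matches the regular expression. A Python sketch of that behavior over a batch of records (simplified to plain dicts):

```python
import re

def grep_exclude(records, key, pattern):
    """Drop records whose `key` field matches `pattern` (regex search),
    mirroring the grep filter's <exclude> rule."""
    rx = re.compile(pattern)
    return [r for r in records if not rx.search(str(r.get(key, "")))]

records = [
    {"message": "user login ok"},
    {"message": "password=hunter2 leaked"},
    {"message": "rotating token for job"},
]
print(grep_exclude(records, "message", r"(password|token|secret)"))
# [{'message': 'user login ok'}]
```

Note this is whole-record deletion; per-field redaction is shown in Example 3 later in this guide.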

Step 5: Configure Output Destinations

Outputs define where Fluentd sends processed logs. Choose based on your storage or analytics platform.

Output to Elasticsearch

Install the plugin:

sudo td-agent-gem install fluent-plugin-elasticsearch

Configure the output:

<match docker.*>
  @type elasticsearch
  host elasticsearch.example.com
  port 9200
  logstash_format true
  logstash_prefix fluentd
  request_timeout 30s
  <buffer>
    @type file
    path /var/log/td-agent/buffer/elasticsearch
    flush_interval 10s
    chunk_limit_size 2MB
    total_limit_size 512MB
    retry_max_interval 30
    retry_forever true
  </buffer>
</match>

Key parameters:

  • logstash_format true: Uses Logstash-style index naming (e.g., fluentd-2024.05.15); any index_name setting is ignored when this is enabled
  • flush_interval: Controls how often data is sent (a trade-off between latency and throughput)
  • <buffer>: Critical for resilience; file-backed buffering prevents data loss during outages
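The retry parameters interact: between failed flushes, Fluentd waits an exponentially growing interval, capped by retry_max_interval. A sketch of the schedule (simplified; the real implementation also randomizes the wait unless retry_randomize is disabled):

```python
def retry_wait(attempt: int, base: float = 1.0, max_interval: float = 30.0) -> float:
    """Exponential backoff: the wait doubles each attempt, capped at max_interval."""
    return min(base * (2 ** attempt), max_interval)

print([retry_wait(n) for n in range(7)])
# [1.0, 2.0, 4.0, 8.0, 16.0, 30.0, 30.0]
```

With retry_forever true this schedule repeats at the cap indefinitely, which is why the disk buffer's total size limit matters: it is the only thing bounding backlog growth during a long outage.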

Output to Amazon S3

Install the plugin:

sudo td-agent-gem install fluent-plugin-s3

Configure:

<match app.log>
  @type s3
  aws_key_id YOUR_AWS_KEY
  aws_sec_key YOUR_AWS_SECRET
  s3_bucket your-logging-bucket
  s3_region us-east-1
  path logs/
  time_slice_format %Y/%m/%d/%H
  <format>
    @type json
  </format>
  <buffer time>
    @type file
    path /var/log/td-agent/buffer/s3
    timekey 3600
    timekey_wait 10m
    timekey_use_utc true
    chunk_limit_size 256m
  </buffer>
</match>

This uploads logs hourly to S3 in structured JSON format, organized by date/time.
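The hourly grouping comes from timekey: each event's timestamp is truncated to the start of its timekey window, and one chunk (hence one S3 object) is produced per window. The arithmetic is simply:

```python
from datetime import datetime, timezone

def timekey_bucket(epoch: int, timekey: int = 3600) -> int:
    """Truncate a Unix timestamp to the start of its timekey window."""
    return epoch - epoch % timekey

def slice_path(epoch: int, fmt: str = "%Y/%m/%d/%H") -> str:
    """Render the window start with a time_slice_format-style pattern."""
    start = datetime.fromtimestamp(timekey_bucket(epoch), tz=timezone.utc)
    return start.strftime(fmt)

print(slice_path(1_700_000_000))  # 2023/11/14/22
```

timekey_wait then delays the flush past the window's end (10 minutes above) so late-arriving events for that hour still land in the right object.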

Output to Google Cloud Logging

Install:

sudo td-agent-gem install fluent-plugin-google-cloud

Configure (despite its name, this plugin writes to Cloud Logging; credentials come from Application Default Credentials, e.g. the GOOGLE_APPLICATION_CREDENTIALS environment variable):

<match *.log>
  @type google_cloud
  <buffer>
    @type file
    path /var/log/td-agent/buffer/gcloud
    chunk_limit_size 10m
    flush_interval 30s
  </buffer>
</match>

For uploads to Google Cloud Storage buckets, use the community fluent-plugin-gcs output instead.

Output to Multiple Destinations

Use the copy plugin to send data to multiple outputs:

<match docker.*>
  @type copy
  <store>
    @type elasticsearch
    host elasticsearch.example.com
    port 9200
    logstash_format true
  </store>
  <store>
    @type s3
    aws_key_id YOUR_AWS_KEY
    aws_sec_key YOUR_AWS_SECRET
    s3_bucket your-logging-bucket
    path logs/
    format json
  </store>
</match>

This sends logs to both Elasticsearch (for real-time search) and S3 (for long-term archiving).

Step 6: Test and Validate Configuration

Always validate your configuration before restarting Fluentd:

sudo td-agent --dry-run -c /etc/td-agent/td-agent.conf

If the configuration is valid, the command exits cleanly; if not, Fluentd prints the file and line number of the offending directive.

Then restart Fluentd:

sudo systemctl restart td-agent

Monitor logs for errors:

sudo journalctl -u td-agent -f

To verify data flow, check the output destination (e.g., Elasticsearch Kibana, S3 bucket, or console output).

Step 7: Monitor Fluentd Performance

Enable Fluentd's built-in metrics by installing fluent-plugin-prometheus and exposing a metrics endpoint:

sudo td-agent-gem install fluent-plugin-prometheus

<source>
  @type prometheus
  bind 0.0.0.0
  port 24231
</source>

<source>
  @type prometheus_monitor
</source>

<source>
  @type prometheus_output_monitor
</source>

The prometheus_monitor and prometheus_output_monitor sources export internal metrics such as buffer queue length, emit counts, and retry counts.

Access metrics at http://localhost:24231/metrics and integrate with Prometheus for alerting and dashboards.

Best Practices

Use Buffering to Prevent Data Loss

Never rely on in-memory buffering in production. Always configure disk-backed buffers using the @type file directive. This ensures logs are persisted to disk during network outages, service restarts, or destination downtime.

Key buffer parameters:

  • chunk_limit_size: Limits chunk size to avoid memory spikes (recommended: 2-10 MB)
  • queue_limit_length: Controls backlog size (e.g., 32-128 chunks)
  • retry_max_interval: Caps the exponential backoff (e.g., 30s)
  • retry_forever true: Prevents data loss during extended outages

Tag Logs Strategically

Use meaningful tags to enable routing and filtering. Avoid generic tags like all. Instead, use hierarchical naming:

  • app.web.access
  • app.api.error
  • infra.kubernetes.node

This allows precise control over which logs go where, improving performance and reducing noise.

Secure Your Configuration

Never hardcode credentials in configuration files. Use environment variables or secret managers:

<match s3.*>
  @type s3
  aws_key_id "#{ENV['AWS_ACCESS_KEY_ID']}"
  aws_sec_key "#{ENV['AWS_SECRET_ACCESS_KEY']}"
  ...
</match>

The "#{ENV[...]}" form is embedded Ruby, evaluated when the configuration is loaded; plain ${VAR} placeholders are not expanded in plugin parameters.

Set environment variables in systemd or Docker:

Environment=AWS_ACCESS_KEY_ID=your-key

Environment=AWS_SECRET_ACCESS_KEY=your-secret

For Kubernetes, use Secrets and mount them as environment variables.

Optimize for Resource Usage

Fluentd can consume significant CPU and memory under high load. Monitor usage and tune accordingly:

  • Limit concurrent flush threads with flush_thread_count (formerly num_threads) in output buffer sections
  • Reduce flush_interval only if latency is critical; otherwise, 30s-60s is efficient
  • Enable compression (e.g., store_as gzip in the S3 output) to reduce bandwidth and storage
  • Run Fluentd on dedicated nodes or containers to avoid resource contention

Implement Log Rotation

Fluentd's tail plugin works best with log files rotated by tools like logrotate. The plugin detects rotation when logrotate creates a fresh file (the create option below); prefer that over copytruncate, which can silently drop lines written between the copy and the truncate.

Example /etc/logrotate.d/nginx:

/var/log/nginx/*.log {
    daily
    missingok
    rotate 14
    compress
    delaycompress
    notifempty
    create 0640 www-data adm
    sharedscripts
    postrotate
        /usr/sbin/invoke-rc.d nginx rotate >/dev/null 2>&1
    endscript
}

Enable Monitoring and Alerting

Integrate Fluentd with monitoring tools:

  • Use Prometheus + Grafana to track buffer size, retry counts, and throughput
  • Set alerts for buffer overflow (>90% full), high retry rates, or output failures
  • Log Fluentd's own errors to a separate file for easier debugging

Version Control Your Configurations

Treat Fluentd configurations as code. Store them in Git repositories with clear commit messages and change logs. Use CI/CD pipelines to validate and deploy configurations across environments.

Tools and Resources

Official Documentation

Fluentd's official documentation is the most authoritative source: https://docs.fluentd.org

Plugin Registry

Explore the Fluentd plugin ecosystem: https://www.fluentd.org/plugins

Configuration Validators

Validate syntax before deployment with the built-in dry run: fluentd --dry-run -c <config-file> (or td-agent --dry-run for td-agent installs).

Monitoring and Observability

  • Prometheus + Grafana: For metrics collection and visualization
  • Fluent Bit: Lightweight alternative for edge deployments (use alongside Fluentd for aggregation)
  • ELK Stack (Elasticsearch, Logstash, Kibana): Popular destination for Fluentd logs
  • OpenTelemetry: Fluentd can ingest OTLP data via plugins for unified telemetry

Community Support

Ask questions in the Fluentd community Slack or on the project's GitHub discussions and issue tracker.

Sample Configuration Repositories

Study real-world configurations, for example the fluent/fluentd-kubernetes-daemonset repository on GitHub.

Real Examples

Example 1: Logging a Microservices Architecture

Scenario: You have 10 microservices running in Kubernetes, each outputting JSON logs to stdout. You need to collect, enrich, and send logs to Elasticsearch and S3.

Configuration:

<source>
  @type tail
  path /var/log/containers/*.log
  pos_file /var/log/fluentd-containers.pos
  tag kube.*
  read_from_head true
  <parse>
    @type json
  </parse>
</source>

<filter kube.**>
  @type kubernetes_metadata
</filter>

<filter kube.**>
  @type record_transformer
  enable_ruby true
  <record>
    service_name ${record.dig("kubernetes", "labels", "app")}
    namespace ${record.dig("kubernetes", "namespace_name")}
  </record>
</filter>

<match kube.**>
  @type copy
  <store>
    @type elasticsearch
    host elasticsearch.logging.svc.cluster.local
    port 9200
    logstash_format true
    logstash_prefix k8s-logs
    <buffer>
      @type file
      path /var/log/fluentd-buffers/kubernetes
      chunk_limit_size 5MB
      retry_max_interval 60
      retry_forever true
    </buffer>
  </store>
  <store>
    @type s3
    aws_key_id "#{ENV['AWS_ACCESS_KEY_ID']}"
    aws_sec_key "#{ENV['AWS_SECRET_ACCESS_KEY']}"
    s3_bucket my-logs-bucket
    s3_region us-west-2
    path logs/k8s/
    time_slice_format %Y/%m/%d/%H
    <format>
      @type json
    </format>
    <buffer time>
      @type file
      path /var/log/fluentd-buffers/s3
      timekey 3600
      timekey_wait 10m
    </buffer>
  </store>
</match>

This setup:

  • Tails container log files under /var/log/containers and parses each line as JSON
  • Enriches records with pod metadata via the kubernetes_metadata filter (fluent-plugin-kubernetes_metadata_filter), then copies the app label and namespace into top-level fields
  • Sends logs to Elasticsearch for real-time analysis
  • Archives logs in S3 for compliance and backup

Example 2: Centralized Syslog Aggregation

Scenario: You have 50 Linux servers sending syslog messages. You want to centralize them, filter out noise, and store in a time-series database.

Configuration:

<source>
  @type syslog
  port 5140
  bind 0.0.0.0
  protocol_type tcp
  tag system.syslog
</source>

<filter system.syslog.**>
  @type grep
  <exclude>
    key message
    pattern /(CRON|systemd-logind)/
  </exclude>
</filter>

<match system.syslog.**>
  @type tdlog
  apikey YOUR_TD_API_KEY
  endpoint https://in.treasuredata.com
  <buffer>
    @type file
    path /var/log/td-agent/buffer/td
    flush_interval 30s
    chunk_limit_size 10MB
    retry_max_interval 60
    retry_forever true
  </buffer>
</match>

Note that the syslog input appends the facility and priority to the tag (e.g., system.syslog.daemon.info), so the filter and match patterns use ** rather than the bare tag.

This configuration:

  • Accepts TCP syslog from remote hosts
  • Filters out non-critical system messages
  • Forwards to Treasure Data (or another analytics platform)

Example 3: Redacting PII in Real-Time

Scenario: Your application logs include email addresses and credit card numbers. You must comply with GDPR and PCI-DSS.

Configuration:

<filter app.log>
  @type record_transformer
  enable_ruby true
  <record>
    message ${record["message"].gsub(/[\w.\-]+@[\w.\-]+\.\w+/, "[REDACTED_EMAIL]")}
  </record>
</filter>

<filter app.log>
  @type record_transformer
  enable_ruby true
  <record>
    message ${record["message"].gsub(/\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b/, "[REDACTED_CC]")}
  </record>
</filter>

This uses Ruby string substitution (which requires enable_ruby true) to mask sensitive patterns before logs are sent out.
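The same substitutions are easy to unit-test outside Fluentd. A Python equivalent of the two gsub calls (patterns copied from the filters above; note that a bare 16-digit regex will also match some non-card numbers):

```python
import re

EMAIL_RE = re.compile(r'[\w.\-]+@[\w.\-]+\.\w+')
CARD_RE = re.compile(r'\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b')

def redact(message: str) -> str:
    """Mask email addresses and card-like numbers, as in the filters above."""
    message = EMAIL_RE.sub("[REDACTED_EMAIL]", message)
    return CARD_RE.sub("[REDACTED_CC]", message)

print(redact("contact alice@example.com, card 4111 1111 1111 1111"))
# contact [REDACTED_EMAIL], card [REDACTED_CC]
```

Keeping the patterns in a small tested module like this, and generating the Fluentd config from it, prevents the two copies from drifting apart.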

FAQs

What is the difference between Fluentd and Fluent Bit?

Fluentd is a full-featured, Ruby-based log collector with rich plugin support, ideal for centralized aggregation. Fluent Bit is a lightweight, C-based alternative designed for edge devices and resource-constrained environments. Many teams use Fluent Bit for collection and Fluentd for aggregation.

Can Fluentd handle high-throughput logging?

Yes. With proper buffer tuning, disk I/O optimization, and multi-threading, Fluentd can handle tens of thousands of events per second. For extreme scale, use multiple Fluentd instances behind a load balancer or combine with Kafka for buffering.

How do I troubleshoot missing logs?

Check the following:

  • Is the log file being tailed? Verify file permissions and path.
  • Is the tag matching the filter/output? Use stdout temporarily to debug.
  • Are buffers full? Check /var/log/td-agent/buffer/ for backlogs.
  • Is the output destination reachable? Test connectivity with telnet or curl.
  • Review Fluentd logs: journalctl -u td-agent -f

How often should I restart Fluentd?

Rarely. Fluentd supports reloading configuration via sudo systemctl reload td-agent (which sends SIGHUP), applying changes without dropping buffered logs; reserve full restarts for version upgrades and system-level changes.

Can Fluentd parse non-JSON logs?

Yes. Fluentd supports regex, Apache, Nginx, Syslog, CSV, and custom parsers. Use the parser filter with @type regexp to define custom patterns.

Is Fluentd secure?

Fluentd itself is secure when configured properly. Use TLS for input/output connections, restrict network access via firewalls, avoid hardcoding secrets, and run Fluentd under a non-root user.

Does Fluentd support Kubernetes out of the box?

Not via a dedicated input plugin, but the standard pattern is well supported: run Fluentd as a DaemonSet that tails /var/log/containers/*.log and enriches records with the kubernetes_metadata filter plugin, which adds pod names, namespaces, labels, and annotations without requiring sidecars.

What happens if the output destination is down?

Fluentd uses its buffer system to store logs locally until the destination recovers. With retry_forever true and disk-backed buffers, no data is lost, only delayed.

Conclusion

Configuring Fluentd is not merely a technical task; it is a strategic decision that impacts the reliability, scalability, and security of your entire observability infrastructure. By following the step-by-step guide above, you've learned how to install Fluentd, define inputs and outputs, enrich data with filters, apply buffers for resilience, and deploy in real-world scenarios.

Remember: Fluentd's power lies in its flexibility. Whether you're logging a single server or a thousand microservices, the same core principles apply: tag wisely, buffer aggressively, secure rigorously, and monitor continuously.

As your infrastructure evolves, so should your Fluentd configuration. Regularly review logs, audit plugin usage, update to newer versions, and integrate with modern toolchains like OpenTelemetry and Prometheus. Fluentd is not a "set and forget" tool; it is a living component of your data pipeline that demands attention, tuning, and care.

With the knowledge in this guide, you're now equipped to deploy Fluentd confidently in any environment. Start small, validate thoroughly, and scale deliberately. Your logs are your system's memory; protect them well.