How to Configure Fluentd


Nov 10, 2025 - 12:06

Fluentd is an open-source data collector designed to unify logging and data ingestion across diverse systems. It serves as a powerful, flexible, and scalable solution for aggregating logs from servers, containers, applications, and cloud services before forwarding them to centralized storage or analytics platforms such as Elasticsearch, Amazon S3, Google Cloud Storage, or Splunk. With its plugin-based architecture and an ecosystem of more than 500 plugins, Fluentd is one of the most extensible log collection tools in modern DevOps and observability stacks.

Configuring Fluentd correctly is critical for ensuring reliable, high-performance log pipelines. Poorly configured instances can lead to data loss, delayed ingestion, excessive resource consumption, or even system instability. Whether you're managing a small application stack or a large Kubernetes cluster, mastering Fluentd configuration ensures that your monitoring, troubleshooting, and compliance workflows remain robust and efficient.

This guide provides a comprehensive, step-by-step walkthrough of how to configure Fluentd from scratch. You'll learn practical setup methods, industry best practices, real-world examples, essential tools, and answers to common challenges. By the end of this tutorial, you'll be equipped to deploy Fluentd confidently in production environments, optimized for performance, security, and maintainability.

Step-by-Step Guide

Prerequisites

Before configuring Fluentd, ensure your system meets the following requirements:

  • A Linux-based operating system (Ubuntu 20.04+, CentOS 8+, or Debian 11+)
  • Root or sudo access
  • Network connectivity to your destination systems (e.g., Elasticsearch, S3, etc.)
  • Basic familiarity with command-line interfaces and Fluentd's configuration file syntax
  • Optional: Docker or Kubernetes if deploying in containerized environments

Fluentd also requires Ruby or a precompiled binary. While it's possible to install from source, using package managers or official installers is recommended for production stability.

Step 1: Install Fluentd

Fluentd offers multiple installation methods depending on your platform. Below are the most common approaches.

On Ubuntu/Debian

Use the official td-agent package, which bundles Fluentd with dependencies and is maintained by Treasure Data:

wget https://toolbelt.treasuredata.com/sh/install-ubuntu-focal-td-agent4.sh

sudo sh install-ubuntu-focal-td-agent4.sh

For older Ubuntu versions, replace focal with your release codename (e.g., bionic for 18.04).

Start and enable the service:

sudo systemctl start td-agent

sudo systemctl enable td-agent

On CentOS/RHEL

Install using the YUM repository:

curl -L https://toolbelt.treasuredata.com/sh/install-redhat-8-td-agent4.sh | sh

sudo systemctl start td-agent

sudo systemctl enable td-agent

Using Docker

For containerized deployments, use the official Fluentd image:

docker run -d --name fluentd -p 24224:24224 -v $(pwd)/fluent.conf:/fluentd/etc/fluent.conf fluent/fluentd:latest

This command runs Fluentd in detached mode, maps port 24224 (the default forward input port), and mounts a local configuration file at the image's default config path (/fluentd/etc/fluent.conf).

Using Helm (Kubernetes)

If you're running Fluentd in Kubernetes, use the official Helm chart:

helm repo add fluent https://fluent.github.io/helm-charts

helm install fluentd fluent/fluentd --namespace logging --create-namespace

Ensure you customize the values.yaml to match your logging destination and resource requirements.

Step 2: Locate and Understand the Configuration File

Fluentd's main configuration file is typically located at:

  • /etc/td-agent/td-agent.conf (for td-agent on Ubuntu/CentOS)
  • /etc/fluent/fluent.conf (for standalone Fluentd installations)

The configuration file uses a simple yet powerful syntax based on blocks and directives. Each block defines a stage of the data pipeline: <source> (input), <filter> (processing), and <match> (output routing).

A minimal configuration looks like this:

<source>
  @type tail
  path /var/log/*.log
  pos_file /var/log/td-agent/tail-containers.pos
  tag docker.*
  read_from_head true
</source>

<match **>
  @type stdout
</match>

Let's break this down:

  • <source> defines where Fluentd collects data from; here, log files via the tail plugin.
  • path specifies the log file pattern to monitor.
  • pos_file tracks read positions so logs are not re-read after restarts.
  • tag assigns a label to the data stream for routing.
  • <match **> routes all tagged data to the output; here, stdout, which prints logs to the console. (A single * matches only one dot-separated tag part, so <match *> would miss multi-part tags.)
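Tag routing is the heart of a Fluentd pipeline, so the match rules are worth internalizing: `*` matches exactly one dot-separated tag part, while `**` matches zero or more parts. The following Python sketch models that behavior (simplified; real Fluentd patterns also support `{a,b}` alternatives and other forms):

```python
def tag_match(pattern: str, tag: str) -> bool:
    """Simplified Fluentd-style tag matching:
    '*' matches exactly one dot-separated part, '**' matches zero or more."""
    def match(pp, tp):
        if not pp:
            return not tp
        head, rest = pp[0], pp[1:]
        if head == "**":
            # '**' may consume anywhere from 0 to all remaining tag parts
            return any(match(rest, tp[i:]) for i in range(len(tp) + 1))
        if not tp:
            return False
        if head == "*" or head == tp[0]:
            return match(rest, tp[1:])
        return False
    return match(pattern.split("."), tag.split("."))

# '*' does not cross dots, which is why <match *> misses multi-part tags
print(tag_match("docker.*", "docker.web"))         # True
print(tag_match("docker.*", "docker.web.access"))  # False
print(tag_match("kube.**", "kube.web.access"))     # True
print(tag_match("*", "docker.web"))                # False
```

Running a quick mental check like this before deploying a config is a cheap way to catch routing mistakes.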

Step 3: Configure Input Sources

Inputs tell Fluentd where to collect data from. Common input plugins include:

  • tail: Reads log files (the most common choice for application logs)
  • forward: Accepts data from other Fluentd or Fluent Bit instances (used in distributed setups)
  • syslog: Listens for system syslog messages
  • http: Receives logs via HTTP POST requests
  • Docker container logs: Usually collected with tail against the daemon's JSON log files

Example: Tail Application Logs

To monitor Nginx access logs:

<source>
  @type tail
  path /var/log/nginx/access.log
  pos_file /var/log/td-agent/nginx-access.pos
  tag nginx.access
  read_from_head true
  <parse>
    @type nginx
  </parse>
</source>

The nginx parser (the modern <parse> form of the legacy format nginx directive) automatically splits Nginx's default log format into structured fields such as remote, method, path, code, and size.
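Under the hood, the nginx parser is essentially a named-group regular expression. A rough Python equivalent (simplified sketch; the real parser's regex handles more edge cases, and the exact field names here are illustrative):

```python
import re

# Simplified version of an nginx/combined access-log pattern
NGINX_RE = re.compile(
    r'^(?P<remote>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]+" (?P<code>\d{3}) (?P<size>\d+|-)'
)

def parse_access_line(line: str):
    """Return a dict of named fields, or None when the line does not match."""
    m = NGINX_RE.match(line)
    return m.groupdict() if m else None

rec = parse_access_line(
    '203.0.113.7 - alice [10/Oct/2024:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326'
)
print(rec["method"], rec["code"])  # GET 200
```

Testing your sample log lines against such a regex locally is faster than iterating through Fluentd restarts.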

Example: Collect Docker Container Logs

To ingest logs from all running containers, tail the JSON log files that the Docker daemon writes (there is no docker input plugin in core Fluentd):

<source>
  @type tail
  path /var/lib/docker/containers/*/*-json.log
  pos_file /var/log/td-agent/docker.pos
  tag docker.*
  read_from_head true
  <parse>
    @type json
  </parse>
</source>

Docker stores each container's stdout/stderr as JSON log files under /var/lib/docker/containers/, which the tail plugin reads directly.

Example: Accept Logs via HTTP

For applications that push logs via REST API:

<source>
  @type http
  port 9880
  bind 0.0.0.0
</source>

You can now send logs using curl:

curl -X POST -d 'json={"message":"Hello Fluentd"}' http://localhost:9880/app.log

Step 4: Apply Filters for Data Enrichment

Filters modify or enrich log data before it reaches the output. Common use cases include:

  • Adding timestamps or hostnames
  • Removing sensitive fields (PII)
  • Parsing nested JSON
  • Renaming or restructuring fields

Example: Add Hostname and Timestamp

<filter docker.*>
  @type record_transformer
  <record>
    hostname "#{Socket.gethostname}"
    env production
  </record>
</filter>

This adds two static fields to every log entry from Docker containers.

Example: Parse JSON Log Lines

Some applications output logs as JSON strings. Use the parser filter to extract them:

<filter app.log>
  @type parser
  key_name log
  reserve_data true
  <parse>
    @type json
  </parse>
</filter>

This assumes your log contains a field called log with a JSON string value. The parsed fields are merged into the main record.
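The merge semantics are easy to model outside Fluentd. A Python sketch of what this filter does to each record, assuming the field really holds a JSON object (key and option names mirror the config above):

```python
import json

def apply_parser_filter(record: dict, key_name: str = "log",
                        reserve_data: bool = True) -> dict:
    """Parse record[key_name] as JSON and merge the result into the record,
    mimicking the parser filter with reserve_data true."""
    try:
        parsed = json.loads(record.get(key_name, ""))
    except json.JSONDecodeError:
        return record  # unparsable lines pass through unchanged in this sketch
    if not isinstance(parsed, dict):
        return record
    out = dict(record) if reserve_data else {}
    out.update(parsed)
    return out

rec = {"log": '{"level": "info", "msg": "started"}', "container": "web-1"}
print(apply_parser_filter(rec))
```

With reserve_data false, the original fields (here `container`) would be dropped and only the parsed fields kept.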

Example: Remove Sensitive Data

To redact passwords or tokens:

<filter *.log>
  @type grep
  <exclude>
    key message
    pattern /(password|token|secret)/
  </exclude>
</filter>

This removes any log entry containing those keywords. For redaction instead of deletion, use the record_transformer plugin to overwrite values.
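The exclude rule drops every record whose field matches the regular expression. A Python sketch of that behavior over a batch of records (simplified to plain dicts):

```python
import re

def grep_exclude(records, key, pattern):
    """Drop records whose `key` field matches `pattern` (regex search),
    mirroring the grep filter's <exclude> rule."""
    rx = re.compile(pattern)
    return [r for r in records if not rx.search(str(r.get(key, "")))]

records = [
    {"message": "user login ok"},
    {"message": "password=hunter2 leaked"},
    {"message": "rotating token for job"},
]
print(grep_exclude(records, "message", r"(password|token|secret)"))
# [{'message': 'user login ok'}]
```

Note this is whole-record deletion; per-field redaction is shown in Example 3 later in this guide.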

Step 5: Configure Output Destinations

Outputs define where Fluentd sends processed logs. Choose based on your storage or analytics platform.

Output to Elasticsearch

Install the plugin:

sudo td-agent-gem install fluent-plugin-elasticsearch

Configure the output:

<match docker.*>
  @type elasticsearch
  host elasticsearch.example.com
  port 9200
  logstash_format true
  logstash_prefix fluentd
  request_timeout 30s
  <buffer>
    @type file
    path /var/log/td-agent/buffer/elasticsearch
    flush_interval 10s
    chunk_limit_size 2MB
    total_limit_size 512MB
    retry_max_interval 30
    retry_forever true
  </buffer>
</match>

Key parameters:

  • logstash_format true: Uses Logstash-style index naming (e.g., fluentd-2024.05.15); any index_name setting is ignored when this is enabled
  • flush_interval: Controls how often data is sent (a trade-off between latency and throughput)
  • <buffer>: Critical for resilience; file-backed buffering prevents data loss during outages
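The retry parameters interact: between failed flushes, Fluentd waits an exponentially growing interval, capped by retry_max_interval. A sketch of the schedule (simplified; the real implementation also randomizes the wait unless retry_randomize is disabled):

```python
def retry_wait(attempt: int, base: float = 1.0, max_interval: float = 30.0) -> float:
    """Exponential backoff: the wait doubles each attempt, capped at max_interval."""
    return min(base * (2 ** attempt), max_interval)

print([retry_wait(n) for n in range(7)])
# [1.0, 2.0, 4.0, 8.0, 16.0, 30.0, 30.0]
```

With retry_forever true this schedule repeats at the cap indefinitely, which is why the disk buffer's total size limit matters: it is the only thing bounding backlog growth during a long outage.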

Output to Amazon S3

Install the plugin:

sudo td-agent-gem install fluent-plugin-s3

Configure:

<match app.log>
  @type s3
  aws_key_id YOUR_AWS_KEY
  aws_sec_key YOUR_AWS_SECRET
  s3_bucket your-logging-bucket
  s3_region us-east-1
  path logs/
  time_slice_format %Y/%m/%d/%H
  <format>
    @type json
  </format>
  <buffer time>
    @type file
    path /var/log/td-agent/buffer/s3
    timekey 3600
    timekey_wait 10m
    timekey_use_utc true
    chunk_limit_size 256m
  </buffer>
</match>

This uploads logs hourly to S3 in structured JSON format, organized by date/time.
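The hourly grouping comes from timekey: each event's timestamp is truncated to the start of its timekey window, and one chunk (hence one S3 object) is produced per window. The arithmetic is simply:

```python
from datetime import datetime, timezone

def timekey_bucket(epoch: int, timekey: int = 3600) -> int:
    """Truncate a Unix timestamp to the start of its timekey window."""
    return epoch - epoch % timekey

def slice_path(epoch: int, fmt: str = "%Y/%m/%d/%H") -> str:
    """Render the window start with a time_slice_format-style pattern."""
    start = datetime.fromtimestamp(timekey_bucket(epoch), tz=timezone.utc)
    return start.strftime(fmt)

print(slice_path(1_700_000_000))  # 2023/11/14/22
```

timekey_wait then delays the flush past the window's end (10 minutes above) so late-arriving events for that hour still land in the right object.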

Output to Google Cloud Logging

Install:

sudo td-agent-gem install fluent-plugin-google-cloud

Configure (despite its name, this plugin writes to Cloud Logging; credentials come from Application Default Credentials, e.g. the GOOGLE_APPLICATION_CREDENTIALS environment variable):

<match *.log>
  @type google_cloud
  <buffer>
    @type file
    path /var/log/td-agent/buffer/gcloud
    chunk_limit_size 10m
    flush_interval 30s
  </buffer>
</match>

For uploads to Google Cloud Storage buckets, use the community fluent-plugin-gcs output instead.

Output to Multiple Destinations

Use the copy plugin to send data to multiple outputs:

<match docker.*>
  @type copy
  <store>
    @type elasticsearch
    host elasticsearch.example.com
    port 9200
    logstash_format true
  </store>
  <store>
    @type s3
    aws_key_id YOUR_AWS_KEY
    aws_sec_key YOUR_AWS_SECRET
    s3_bucket your-logging-bucket
    path logs/
    format json
  </store>
</match>

This sends logs to both Elasticsearch (for real-time search) and S3 (for long-term archiving).

Step 6: Test and Validate Configuration

Always validate your configuration before restarting Fluentd:

sudo td-agent --dry-run -c /etc/td-agent/td-agent.conf

If the configuration is valid, the command exits cleanly; if not, Fluentd prints the file and line number of the offending directive.

Then restart Fluentd:

sudo systemctl restart td-agent

Monitor logs for errors:

sudo journalctl -u td-agent -f

To verify data flow, check the output destination (e.g., Elasticsearch Kibana, S3 bucket, or console output).

Step 7: Monitor Fluentd Performance

Enable Fluentd's built-in metrics by installing fluent-plugin-prometheus and exposing a metrics endpoint:

sudo td-agent-gem install fluent-plugin-prometheus

<source>
  @type prometheus
  bind 0.0.0.0
  port 24231
</source>

<source>
  @type prometheus_monitor
</source>

<source>
  @type prometheus_output_monitor
</source>

The prometheus_monitor and prometheus_output_monitor sources export internal metrics such as buffer queue length, emit counts, and retry counts.

Access metrics at http://localhost:24231/metrics and integrate with Prometheus for alerting and dashboards.

Best Practices

Use Buffering to Prevent Data Loss

Never rely on in-memory buffering in production. Always configure disk-backed buffers using the @type file directive. This ensures logs are persisted to disk during network outages, service restarts, or destination downtime.

Key buffer parameters:

  • chunk_limit_size: Limits chunk size to avoid memory spikes (recommended: 2-10 MB)
  • queue_limit_length: Controls backlog size (e.g., 32-128 chunks)
  • retry_max_interval: Caps the exponential backoff (e.g., 30s)
  • retry_forever true: Prevents data loss during extended outages

Tag Logs Strategically

Use meaningful tags to enable routing and filtering. Avoid generic tags like all. Instead, use hierarchical naming:

  • app.web.access
  • app.api.error
  • infra.kubernetes.node

This allows precise control over which logs go where, improving performance and reducing noise.

Secure Your Configuration

Never hardcode credentials in configuration files. Use environment variables or secret managers:

<match s3.*>
  @type s3
  aws_key_id "#{ENV['AWS_ACCESS_KEY_ID']}"
  aws_sec_key "#{ENV['AWS_SECRET_ACCESS_KEY']}"
  ...
</match>

The "#{ENV[...]}" form is embedded Ruby, evaluated when the configuration is loaded; plain ${VAR} placeholders are not expanded in plugin parameters.

Set environment variables in systemd or Docker:

Environment=AWS_ACCESS_KEY_ID=your-key

Environment=AWS_SECRET_ACCESS_KEY=your-secret

For Kubernetes, use Secrets and mount them as environment variables.

Optimize for Resource Usage

Fluentd can consume significant CPU and memory under high load. Monitor usage and tune accordingly:

  • Limit concurrent flush threads with flush_thread_count (formerly num_threads) in output buffer sections
  • Reduce flush_interval only if latency is critical; otherwise, 30s-60s is efficient
  • Enable compression (e.g., store_as gzip in the S3 output) to reduce bandwidth and storage
  • Run Fluentd on dedicated nodes or containers to avoid resource contention

Implement Log Rotation

Fluentd's tail plugin works best with log files rotated by tools like logrotate. The plugin detects rotation when logrotate creates a fresh file (the create option below); prefer that over copytruncate, which can silently drop lines written between the copy and the truncate.

Example /etc/logrotate.d/nginx:

/var/log/nginx/*.log {
    daily
    missingok
    rotate 14
    compress
    delaycompress
    notifempty
    create 0640 www-data adm
    sharedscripts
    postrotate
        /usr/sbin/invoke-rc.d nginx rotate >/dev/null 2>&1
    endscript
}

Enable Monitoring and Alerting

Integrate Fluentd with monitoring tools:

  • Use Prometheus + Grafana to track buffer size, retry counts, and throughput
  • Set alerts for buffer overflow (>90% full), high retry rates, or output failures
  • Log Fluentd's own errors to a separate file for easier debugging

Version Control Your Configurations

Treat Fluentd configurations as code. Store them in Git repositories with clear commit messages and change logs. Use CI/CD pipelines to validate and deploy configurations across environments.

Tools and Resources

Official Documentation

Fluentd's official documentation is the most authoritative source: https://docs.fluentd.org

Plugin Registry

Explore the Fluentd plugin ecosystem: https://www.fluentd.org/plugins

Configuration Validators

Validate syntax before deployment with the built-in dry run: fluentd --dry-run -c <config-file> (or td-agent --dry-run for td-agent installs).

Monitoring and Observability

  • Prometheus + Grafana: For metrics collection and visualization
  • Fluent Bit: Lightweight alternative for edge deployments (use alongside Fluentd for aggregation)
  • ELK Stack (Elasticsearch, Logstash, Kibana): Popular destination for Fluentd logs
  • OpenTelemetry: Fluentd can ingest OTLP data via plugins for unified telemetry

Community Support

Ask questions in the Fluentd community Slack or on the project's GitHub discussions and issue tracker.

Sample Configuration Repositories

Study real-world configurations, for example the fluent/fluentd-kubernetes-daemonset repository on GitHub.

Real Examples

Example 1: Logging a Microservices Architecture

Scenario: You have 10 microservices running in Kubernetes, each outputting JSON logs to stdout. You need to collect, enrich, and send logs to Elasticsearch and S3.

Configuration:

<source>
  @type tail
  path /var/log/containers/*.log
  pos_file /var/log/fluentd-containers.pos
  tag kube.*
  read_from_head true
  <parse>
    @type json
  </parse>
</source>

<filter kube.**>
  @type kubernetes_metadata
</filter>

<filter kube.**>
  @type record_transformer
  enable_ruby true
  <record>
    service_name ${record.dig("kubernetes", "labels", "app")}
    namespace ${record.dig("kubernetes", "namespace_name")}
  </record>
</filter>

<match kube.**>
  @type copy
  <store>
    @type elasticsearch
    host elasticsearch.logging.svc.cluster.local
    port 9200
    logstash_format true
    logstash_prefix k8s-logs
    <buffer>
      @type file
      path /var/log/fluentd-buffers/kubernetes
      chunk_limit_size 5MB
      retry_max_interval 60
      retry_forever true
    </buffer>
  </store>
  <store>
    @type s3
    aws_key_id "#{ENV['AWS_ACCESS_KEY_ID']}"
    aws_sec_key "#{ENV['AWS_SECRET_ACCESS_KEY']}"
    s3_bucket my-logs-bucket
    s3_region us-west-2
    path logs/k8s/
    time_slice_format %Y/%m/%d/%H
    <format>
      @type json
    </format>
    <buffer time>
      @type file
      path /var/log/fluentd-buffers/s3
      timekey 3600
      timekey_wait 10m
    </buffer>
  </store>
</match>

This setup:

  • Tails container log files under /var/log/containers and parses each line as JSON
  • Enriches records with pod metadata via the kubernetes_metadata filter (fluent-plugin-kubernetes_metadata_filter), then copies the app label and namespace into top-level fields
  • Sends logs to Elasticsearch for real-time analysis
  • Archives logs in S3 for compliance and backup

Example 2: Centralized Syslog Aggregation

Scenario: You have 50 Linux servers sending syslog messages. You want to centralize them, filter out noise, and store in a time-series database.

Configuration:

<source>
  @type syslog
  port 5140
  bind 0.0.0.0
  protocol_type tcp
  tag system.syslog
</source>

<filter system.syslog.**>
  @type grep
  <exclude>
    key message
    pattern /(CRON|systemd-logind)/
  </exclude>
</filter>

<match system.syslog.**>
  @type tdlog
  apikey YOUR_TD_API_KEY
  endpoint https://in.treasuredata.com
  <buffer>
    @type file
    path /var/log/td-agent/buffer/td
    flush_interval 30s
    chunk_limit_size 10MB
    retry_max_interval 60
    retry_forever true
  </buffer>
</match>

Note that the syslog input appends the facility and priority to the tag (e.g., system.syslog.daemon.info), so the filter and match patterns use ** rather than the bare tag.

This configuration:

  • Accepts TCP syslog from remote hosts
  • Filters out non-critical system messages
  • Forwards to Treasure Data (or another analytics platform)

Example 3: Redacting PII in Real-Time

Scenario: Your application logs include email addresses and credit card numbers. You must comply with GDPR and PCI-DSS.

Configuration:

<filter app.log>
  @type record_transformer
  enable_ruby true
  <record>
    message ${record["message"].gsub(/[\w.\-]+@[\w.\-]+\.\w+/, "[REDACTED_EMAIL]")}
  </record>
</filter>

<filter app.log>
  @type record_transformer
  enable_ruby true
  <record>
    message ${record["message"].gsub(/\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b/, "[REDACTED_CC]")}
  </record>
</filter>

This uses Ruby string substitution (which requires enable_ruby true) to mask sensitive patterns before logs are sent out.
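The same substitutions are easy to unit-test outside Fluentd. A Python equivalent of the two gsub calls (patterns copied from the filters above; note that a bare 16-digit regex will also match some non-card numbers):

```python
import re

EMAIL_RE = re.compile(r'[\w.\-]+@[\w.\-]+\.\w+')
CARD_RE = re.compile(r'\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b')

def redact(message: str) -> str:
    """Mask email addresses and card-like numbers, as in the filters above."""
    message = EMAIL_RE.sub("[REDACTED_EMAIL]", message)
    return CARD_RE.sub("[REDACTED_CC]", message)

print(redact("contact alice@example.com, card 4111 1111 1111 1111"))
# contact [REDACTED_EMAIL], card [REDACTED_CC]
```

Keeping the patterns in a small tested module like this, and generating the Fluentd config from it, prevents the two copies from drifting apart.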

FAQs

What is the difference between Fluentd and Fluent Bit?

Fluentd is a full-featured, Ruby-based log collector with rich plugin support, ideal for centralized aggregation. Fluent Bit is a lightweight, C-based alternative designed for edge devices and resource-constrained environments. Many teams use Fluent Bit for collection and Fluentd for aggregation.

Can Fluentd handle high-throughput logging?

Yes. With proper buffer tuning, disk I/O optimization, and multi-threading, Fluentd can handle tens of thousands of events per second. For extreme scale, use multiple Fluentd instances behind a load balancer or combine with Kafka for buffering.

How do I troubleshoot missing logs?

Check the following:

  • Is the log file being tailed? Verify file permissions and path.
  • Is the tag matching the filter/output? Use stdout temporarily to debug.
  • Are buffers full? Check /var/log/td-agent/buffer/ for backlogs.
  • Is the output destination reachable? Test connectivity with telnet or curl.
  • Review Fluentd logs: journalctl -u td-agent -f

How often should I restart Fluentd?

Rarely. Fluentd supports reloading configuration via sudo systemctl reload td-agent (which sends SIGHUP), applying changes without dropping buffered logs; reserve full restarts for version upgrades and system-level changes.

Can Fluentd parse non-JSON logs?

Yes. Fluentd supports regex, Apache, Nginx, Syslog, CSV, and custom parsers. Use the parser filter with @type regexp to define custom patterns.

Is Fluentd secure?

Fluentd itself is secure when configured properly. Use TLS for input/output connections, restrict network access via firewalls, avoid hardcoding secrets, and run Fluentd under a non-root user.

Does Fluentd support Kubernetes out of the box?

Not via a dedicated input plugin, but the standard pattern is well supported: run Fluentd as a DaemonSet that tails /var/log/containers/*.log and enriches records with the kubernetes_metadata filter plugin, which adds pod names, namespaces, labels, and annotations without requiring sidecars.

What happens if the output destination is down?

Fluentd uses its buffer system to store logs locally until the destination recovers. With retry_forever true and disk-backed buffers, no data is lost, only delayed.

Conclusion

Configuring Fluentd is not merely a technical task; it is a strategic decision that impacts the reliability, scalability, and security of your entire observability infrastructure. By following the step-by-step guide above, you've learned how to install Fluentd, define inputs and outputs, enrich data with filters, apply buffers for resilience, and deploy in real-world scenarios.

Remember: Fluentd's power lies in its flexibility. Whether you're logging a single server or a thousand microservices, the same core principles apply: tag wisely, buffer aggressively, secure rigorously, and monitor continuously.

As your infrastructure evolves, so should your Fluentd configuration. Regularly review logs, audit plugin usage, update to newer versions, and integrate with modern toolchains like OpenTelemetry and Prometheus. Fluentd is not a "set and forget" tool; it is a living component of your data pipeline that demands attention, tuning, and care.

With the knowledge in this guide, you're now equipped to deploy Fluentd confidently in any environment. Start small, validate thoroughly, and scale deliberately. Your logs are your system's memory; protect them well.