Yet Another Article About Docker Logging With Fluentd¶
Motivation¶
I have three hosts with Docker + Portainer:
- Two VPS servers with public IP addresses
- A home server behind NAT
I want to see all the container logs from these hosts in one place.
I already have a Kubernetes cluster at home, with Kibana and Elasticsearch deployed for cluster logging. The obvious choice is to reuse this existing logging stack to collect the logs from the Docker hosts.
Install Fluentd On Docker Hosts¶
Official Documentation: https://docs.fluentd.org/installation/install-by-deb
Example installation on Debian bullseye:
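A minimal sketch of the install, following the official install-by-deb guide linked above (double-check the script name for your td-agent version and distribution):

# Official install script for td-agent 4 on Debian bullseye (from the install-by-deb guide)
curl -fsSL https://toolbelt.treasuredata.com/sh/install-debian-bullseye-td-agent4.sh | sh
# Enable and start the service
sudo systemctl enable --now td-agent
sudo systemctl status td-agent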
Configure Td-Agent¶
Default config file location: /etc/td-agent/td-agent.conf
Input For Docker Daemon¶
Documentation: https://docs.fluentd.org/input/forward
This section is responsible for receiving the logs from the Docker daemon.
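A minimal sketch of this input, assuming the forward plugin on its default port 24224 (the Docker daemon examples below send to localhost:24224):

<source>
  @type forward
  # Docker sends to localhost:24224, so binding to loopback is enough
  port 24224
  bind 127.0.0.1
</source>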
Output Configuration¶
As I mentioned in the Motivation section, I want to store the logs in my Elasticsearch cluster.
Documentation: https://docs.fluentd.org/output/elasticsearch
Configuration example:
<match {syslog.**,dockerdaemon.**}>
  @type elasticsearch
  suppress_type_name true
  host "10.8.0.30"
  scheme http
  path ""
  port 32367
  include_tag_key true
  reload_connections false
  reconnect_on_error true
  reload_on_failure false
  logstash_format true
  logstash_prefix "vps10"
  <buffer>
    @type file
    path /var/log/td-agent/buffer
    flush_thread_count 8
    flush_interval 5s
    chunk_limit_size 2M
    queue_limit_length 32
    retry_max_interval 30
    retry_forever true
  </buffer>
</match>
Some important settings and their explanations:
- suppress_type_name
  With Elasticsearch 7.x, the cluster logs the following types removal deprecation warning unless this is set to true:
  { "type": "deprecation", "timestamp": "2020-07-03T08:02:20,830Z", "level": "WARN", "component": "o.e.d.a.b.BulkRequestParser", "cluster.name": "docker-cluster", "node.name": "70dd5c6b94c3", "message": "[types removal] Specifying types in bulk requests is deprecated.", "cluster.uuid": "NoJJmtzfTtSzSMv0peG8Wg", "node.id": "VQ-PteHmTVam2Pnbg7xWHw" }
- host "10.8.0.30"
  Elasticsearch hostname or IP address. This VPS connects to Elasticsearch over a WireGuard VPN.
- port 32367
  My Elasticsearch runs in my Kubernetes cluster, and I'm using a NodePort service to reach it.
- logstash_format
  This makes the Elasticsearch index names compatible with what Logstash calls them, so you can take advantage of Kibana. See logstash_prefix and logstash_dateformat to customize the index name pattern. The index name will be #{logstash_prefix}-#{formatted_date}.
- reload_connections false
  Tunes how the elasticsearch-transport host reloading feature works. By default it reloads the host list from the server every 10,000th request to spread the load. This can be an issue if your Elasticsearch cluster is behind a reverse proxy, as the Fluentd process may not have direct network access to the Elasticsearch nodes.
- reconnect_on_error
  Indicates that the plugin should reset the connection on any error (and reconnect on the next send). By default it reconnects only on "host unreachable" exceptions. Setting this to true is recommended when Elasticsearch Shield is in use.
- reload_on_failure
  Indicates that elasticsearch-transport will try to reload the node addresses if a request fails, which is useful to quickly remove a dead node from the list of addresses.
The reload_connections, reconnect_on_error and reload_on_failure settings are needed because my Elasticsearch cluster has only one node and Fluentd connects to it over the VPN and a NodePort.
Syslog Input¶
<source>
  @type tail
  path /var/log/syslog,/var/log/messages
  pos_file /var/log/td-agent/syslog.pos
  tag syslog.*
  <parse>
    @type syslog
  </parse>
</source>
Parser documentation: https://docs.fluentd.org/parser/syslog
Syslog Filter¶
<filter syslog.**>
  @type record_transformer
  <record>
    hostname "#{Socket.gethostname}"
    tag ${tag}
  </record>
</filter>
What does this filter do?
- Adds the Fluentd tag to the JSON message (the tag ${tag} record entry). This can be very useful for debugging as well.
- Adds a hostname field to the JSON message (the hostname "#{Socket.gethostname}" record entry).
Example:
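An illustrative record after this filter, assuming a systemd message read from /var/log/syslog (values are made up; the syslog parser provides host, ident, pid and message, and the tail input expands the * in the tag to the file path):

{
  "host": "vps10",
  "ident": "systemd",
  "pid": "1",
  "message": "Started Daily apt upgrade and clean activities.",
  "hostname": "vps10",
  "tag": "syslog.var.log.syslog"
}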
Docker Filter¶
This is similar to the previous syslog filter.
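A sketch of that filter, assuming the Docker daemon tags its logs with the dockerdaemon prefix that the output match pattern above expects:

<filter dockerdaemon.**>
  @type record_transformer
  <record>
    hostname "#{Socket.gethostname}"
    tag ${tag}
  </record>
</filter>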
Docker Daemon Config¶
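A sketch of /etc/docker/daemon.json matching the Fluentd config above; the dockerdaemon.{{.Name}} tag is an assumption based on the dockerdaemon.** match pattern:

{
  "log-driver": "fluentd",
  "log-opts": {
    "fluentd-address": "localhost:24224",
    "fluentd-async": "true",
    "tag": "dockerdaemon.{{.Name}}"
  }
}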
- fluentd-address
  Send logs to Fluentd. Related Fluentd config: Input For Docker Daemon.
- fluentd-async
  Docker connects to Fluentd in the background. Messages are buffered until the connection is established.
  Doc: https://docs.docker.com/config/containers/logging/fluentd/#fluentd-async
- tag
  Related Fluentd config: Docker Filter.
Example JSON:
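An illustrative record for a container log line, assuming the daemon.json sketch above; the Docker fluentd log driver provides container_id, container_name, source and log, and the filter adds hostname and tag (values are made up):

{
  "container_id": "d0379e25ca01",
  "container_name": "/apache",
  "source": "stdout",
  "log": "AH00558: httpd: Could not reliably determine the server's fully qualified domain name",
  "hostname": "vps10",
  "tag": "dockerdaemon.apache"
}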
Warning
It's not enough to restart the Docker daemon for the logging settings to take effect on containers. Every container has to be recreated, not just restarted.
Benefits Of Using Proper Tags¶
In the examples above, all of your containers are tagged with their name. This is useful when you want to parse a specific container's log format.
Another /etc/docker/daemon.json example:
{
  "log-driver": "fluentd",
  "log-opts": {
    "fluentd-address": "localhost:24224",
    "fluentd-async": "true",
    "tag": "docker.{{.Name}}.{{.ID}}"
  }
}
On this host I have an apache web server container:
docker ps --filter name=apache
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
d0379e25ca01 httpd "httpd-foreground -d…" 2 weeks ago Up 2 days 0.0.0.0:80->80/tcp, 0.0.0.0:443->443/tcp apache
I want to parse the logs from the apache container. Thanks to the proper tag settings, I can write a filter like this:
<filter docker.apache.**>
  @type parser
  key_name log
  reserve_data true
  emit_invalid_record_to_error false
  <parse>
    @type apache
    expression /^(?<vhost>[^ ]*) (?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>(?:[^\"]|\\.)*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*) (?:"(?<referer>(?:[^\"]|\\.)*)" "(?<agent>(?:[^\"]|\\.)*)")?$/
  </parse>
</filter>
Matching apache log format:
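The expression above lines up with Apache's stock vhost_combined format, i.e. something like this in the httpd config (the CustomLog target is an assumption, writing to stdout so Docker picks the lines up):

LogFormat "%v:%p %h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" vhost_combined
CustomLog /proc/self/fd/1 vhost_combined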
Example JSON log message:
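A parsed record would look roughly like this (values are made up; with reserve_data true the original Docker fields, including log, are kept alongside the parsed ones):

{
  "vhost": "example.com:443",
  "host": "203.0.113.10",
  "user": "-",
  "method": "GET",
  "path": "/index.html",
  "code": "200",
  "size": "3041",
  "referer": "-",
  "agent": "Mozilla/5.0",
  "container_id": "d0379e25ca01",
  "container_name": "/apache",
  "source": "stdout",
  "log": "example.com:443 203.0.113.10 - - [01/Jan/2023:12:00:00 +0000] \"GET /index.html HTTP/1.1\" 200 3041 \"-\" \"Mozilla/5.0\"",
  "hostname": "vps10",
  "tag": "docker.apache.d0379e25ca01"
}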
Complete Fluentd Config¶
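A sketch of the full /etc/td-agent/td-agent.conf assembled from the snippets above. The forward input, the dockerdaemon filter and the widened match pattern (extended with docker.** so the docker.{{.Name}}.{{.ID}} tags from the last example also reach Elasticsearch) are assumptions, not the author's exact file:

# Input for the Docker daemon (fluentd log driver)
<source>
  @type forward
  port 24224
  bind 127.0.0.1
</source>

# Syslog input
<source>
  @type tail
  path /var/log/syslog,/var/log/messages
  pos_file /var/log/td-agent/syslog.pos
  tag syslog.*
  <parse>
    @type syslog
  </parse>
</source>

# Syslog filter: add hostname and tag fields
<filter syslog.**>
  @type record_transformer
  <record>
    hostname "#{Socket.gethostname}"
    tag ${tag}
  </record>
</filter>

# Docker filter: same enrichment for container logs
<filter {docker.**,dockerdaemon.**}>
  @type record_transformer
  <record>
    hostname "#{Socket.gethostname}"
    tag ${tag}
  </record>
</filter>

# Parse the apache container's access logs
<filter docker.apache.**>
  @type parser
  key_name log
  reserve_data true
  emit_invalid_record_to_error false
  <parse>
    @type apache
    expression /^(?<vhost>[^ ]*) (?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>(?:[^\"]|\\.)*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*) (?:"(?<referer>(?:[^\"]|\\.)*)" "(?<agent>(?:[^\"]|\\.)*)")?$/
  </parse>
</filter>

# Ship everything to Elasticsearch over the VPN / NodePort
<match {syslog.**,docker.**,dockerdaemon.**}>
  @type elasticsearch
  suppress_type_name true
  host "10.8.0.30"
  scheme http
  path ""
  port 32367
  include_tag_key true
  reload_connections false
  reconnect_on_error true
  reload_on_failure false
  logstash_format true
  logstash_prefix "vps10"
  <buffer>
    @type file
    path /var/log/td-agent/buffer
    flush_thread_count 8
    flush_interval 5s
    chunk_limit_size 2M
    queue_limit_length 32
    retry_max_interval 30
    retry_forever true
  </buffer>
</match>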