Complete filebeat documentation

Commit baf3049c authored Jun 27, 2023 by Andrea Chimenti
--- a/Beats/Filebeat/README.md
+++ b/Beats/Filebeat/README.md
 # Filebeat
+Filebeat is a lightweight tool used primarily to collect logs on client machines and ship them to Logstash. Its advantages are guaranteed (at-least-once) delivery of messages even after an outage of the network, Logstash, etc., and simple configuration. Filebeat runs as a service under systemd. A more detailed description of how Filebeat works is available in the [vendor documentation](https://www.elastic.co/guide/en/beats/filebeat/8.6/how-filebeat-works.html#_how_does_filebeat_keep_the_state_of_files). In short: it remembers its position in each file and tracks incremental changes.
+Collection is configured by defining paths to log files in `filebeat.yml`, in the `filebeat.inputs` section, using inputs of type `filestream` (see [filestream input](https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-input-filestream.html)). Parsing is handled by Logstash. Each input should be labelled with a `service` field, which is used to select the parsing filters in Logstash and to build the index name. Tags can also be added to describe the input more precisely. We recommend not omitting these fields: without them the data cannot be identified on the Logstash side and the storage turns into a data swamp.
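+As an illustration of the labelling described above, a single input might look like the following sketch. The input ID, path, service name, and tag are hypothetical placeholders, not values used by the project; agree on the real `service` value with the Logstash administrator.
+```yaml
+filebeat.inputs:
+  - type: filestream
+    id: fs_nginx_access            # hypothetical, must be unique among inputs
+    paths:
+      - /var/log/nginx/access.log  # hypothetical path
+    fields:
+      service: "nginx"             # hypothetical; selects Logstash filters and the index name
+    tags: ["access"]               # optional, hypothetical tag
+```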
+## Installation
+The repository includes two configuration files:
+- `filebeat.yml`: Our template for simplified deployment. It documents every field we use.
+- `filebeat.reference.yml`: The reference file from the official documentation. It contains all settings with descriptions.
+### Steps
+1. Install the package via DPKG, RPM, APT, or YUM (prefer APT or YUM). See step 1 in the [documentation](https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-installation-configuration.html#installation).
+2. Put Beats upgrades on hold, e.g. with `sudo apt-mark hold filebeat`. Clients should be upgraded together with the other ELK components.
+3. Set up the configuration file `/etc/filebeat/filebeat.yml`. We recommend starting from our template.
+    1. The `paths` option of each entry under `filebeat.inputs` determines which files are watched.
+    2. Set `fields.group` to the name of your working group, i.e. who collects the data. It is used for identification.
+    3. Set `fields.os` to `linux` or `windows`. (By default Filebeat does not send information about the OS, and it can also be deployed on Windows.)
+    4. Change the path to the certificate if needed.
+4. Upload the certificate to the server/workstation; the template expects it to be named `http_ca.crt`.
+5. Verify that port 5044 (or the port configured in `output.logstash.hosts`, if different) is open and not blocked by a firewall or similar.
+6. Run `sudo systemctl enable filebeat` so that the Filebeat service starts automatically after a system reboot. Alternatively, start it once with `sudo systemctl start filebeat`.
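+The settings from step 3 that live outside the inputs can be sketched as follows. This is a minimal fragment with placeholder values (the group name and the Logstash host are hypothetical); the certificate path assumes the file name from step 4.
+```yaml
+# Global fields attached to every event sent from this host.
+fields:
+  group: "my_group"   # hypothetical working-group name
+  os: "linux"         # or "windows"
+output.logstash:
+  hosts: ["logstash.example.org:5044"]   # hypothetical host; use your Logstash endpoint
+  ssl.enabled: true
+  ssl.certificate_authorities: ["/etc/filebeat/http_ca.crt"]
+```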
+**Note on log rotation**: Please check that the `copytruncate` strategy is not used for log rotation. See the note in the [documentation](https://www.elastic.co/guide/en/beats/filebeat/current/file-log-rotation.html).
+### Troubleshooting
+To check the configuration file and the configured output, use the following commands:
+```text
+sudo filebeat test config
+sudo filebeat test output
+```
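+If both tests pass but no data arrives, temporarily raising the log level may help. A minimal sketch, using the `logging.level` and `logging.selectors` options described in the reference configuration; the chosen selector is only an example, and the fragment is meant to replace the existing `logging` block in `/etc/filebeat/filebeat.yml`:
+```yaml
+# Temporary troubleshooting settings; switch back to `info` afterwards,
+# because debug output is verbose.
+logging:
+  level: debug
+  # Limit debug output to selected components; ["*"] enables all of them.
+  selectors: ["publisher"]
+```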
--- a/Beats/Filebeat/filebeat.reference.yml
+++ b/Beats/Filebeat/filebeat.reference.yml
+###################### Filebeat Configuration Example #########################
+# This file is an example configuration file highlighting only the most common
+# options. The filebeat.reference.yml file from the same directory contains all the
+# supported options with more comments. You can use it as a reference.
+#
+# You can find the full configuration reference here:
+# https://www.elastic.co/guide/en/beats/filebeat/index.html
+# For more available modules and options, please see the filebeat.reference.yml sample
+# configuration file.
+# ============================== Filebeat inputs ===============================
+filebeat.inputs:
+  # Each - is an input. Most options can be set at the input level, so
+  # you can use different inputs for various configurations.
+  # Below are the input-specific configurations.
+  # filestream is an input for collecting log messages from files.
+  - type: filestream
+    # Unique ID among all inputs, an ID is required.
+    id: my-filestream-id
+    # Change to true to enable this input configuration.
+    enabled: false
+    # Paths that should be crawled and fetched. Glob based paths.
+    paths:
+      - /var/log/*.log
+      #- c:\programdata\elasticsearch\logs\*
+    # Exclude lines. A list of regular expressions to match. It drops the lines that are
+    # matching any regular expression from the list.
+    # Line filtering happens after the parsers pipeline. If you would like to filter lines
+    # before parsers, use include_message parser.
+    #exclude_lines: ['^DBG']
+    # Include lines. A list of regular expressions to match. It exports the lines that are
+    # matching any regular expression from the list.
+    # Line filtering happens after the parsers pipeline. If you would like to filter lines
+    # before parsers, use include_message parser.
+    #include_lines: ['^ERR', '^WARN']
+    # Exclude files. A list of regular expressions to match. Filebeat drops the files that
+    # are matching any regular expression from the list. By default, no files are dropped.
+    #prospector.scanner.exclude_files: ['.gz$']
+    # Optional additional fields. These fields can be freely picked
+    # to add additional information to the crawled log files for filtering
+    #fields:
+    #  level: debug
+    #  review: 1
+# ============================== Filebeat modules ==============================
+filebeat.config.modules:
+  # Glob pattern for configuration loading
+  path: ${path.config}/modules.d/*.yml
+  # Set to true to enable config reloading
+  reload.enabled: false
+  # Period on which files under path should be checked for changes
+  #reload.period: 10s
+# ======================= Elasticsearch template setting =======================
+setup.template.settings:
+  index.number_of_shards: 1
+  #index.codec: best_compression
+  #_source.enabled: false
+# ================================== General ===================================
+# The name of the shipper that publishes the network data. It can be used to group
+# all the transactions sent by a single shipper in the web interface.
+#name:
+# The tags of the shipper are included in their field with each
+# transaction published.
+#tags: ["service-X", "web-tier"]
+# Optional fields that you can specify to add additional information to the
+# output.
+#fields:
+#  env: staging
+# ================================= Dashboards =================================
+# These settings control loading the sample dashboards to the Kibana index. Loading
+# the dashboards is disabled by default and can be enabled either by setting the
+# options here or by using the `setup` command.
+#setup.dashboards.enabled: false
+# The URL from where to download the dashboard archive. By default, this URL
+# has a value that is computed based on the Beat name and version. For released
+# versions, this URL points to the dashboard archive on the artifacts.elastic.co
+# website.
+#setup.dashboards.url:
+# =================================== Kibana ===================================
+# Starting with Beats version 6.0.0, the dashboards are loaded via the Kibana API.
+# This requires a Kibana endpoint configuration.
+setup.kibana:
+  # Kibana Host
+  # Scheme and port can be left out and will be set to the default (http and 5601)
+  # In case you specify an additional path, the scheme is required: http://localhost:5601/path
+  # IPv6 addresses should always be defined as: https://[2001:db8::1]:5601
+  #host: "localhost:5601"
+  # Kibana Space ID
+  # ID of the Kibana Space into which the dashboards should be loaded. By default,
+  # the Default Space will be used.
+  #space.id:
+# =============================== Elastic Cloud ================================
+# These settings simplify using Filebeat with the Elastic Cloud (https://cloud.elastic.co/).
+# The cloud.id setting overwrites the `output.elasticsearch.hosts` and
+# `setup.kibana.host` options.
+# You can find the `cloud.id` in the Elastic Cloud web UI.
+#cloud.id:
+# The cloud.auth setting overwrites the `output.elasticsearch.username` and
+# `output.elasticsearch.password` settings. The format is `<user>:<pass>`.
+#cloud.auth:
+# ================================== Outputs ===================================
+# Configure what output to use when sending the data collected by the beat.
+# ---------------------------- Elasticsearch Output ----------------------------
+output.elasticsearch:
+  # Array of hosts to connect to.
+  hosts: ["localhost:9200"]
+  # Protocol - either `http` (default) or `https`.
+  #protocol: "https"
+  # Authentication credentials - either API key or username/password.
+  #api_key: "id:api_key"
+  #username: "elastic"
+  #password: "changeme"
+# ------------------------------ Logstash Output -------------------------------
+#output.logstash:
+  # The Logstash hosts
+  #hosts: ["localhost:5044"]
+  # Optional SSL. By default is off.
+  # List of root certificates for HTTPS server verifications
+  #ssl.certificate_authorities: ["/etc/pki/root/ca.pem"]
+  # Certificate for SSL client authentication
+  #ssl.certificate: "/etc/pki/client/cert.pem"
+  # Client Certificate Key
+  #ssl.key: "/etc/pki/client/cert.key"
+# ================================= Processors =================================
+processors:
+  - add_host_metadata:
+      when.not.contains.tags: forwarded
+  - add_cloud_metadata: ~
+  - add_docker_metadata: ~
+  - add_kubernetes_metadata: ~
+# ================================== Logging ===================================
+# Sets log level. The default log level is info.
+# Available log levels are: error, warning, info, debug
+#logging.level: debug
+# At debug level, you can selectively enable logging only for some components.
+# To enable all selectors, use ["*"]. Examples of other selectors are "beat",
+# "publisher", "service".
+#logging.selectors: ["*"]
+# ============================= X-Pack Monitoring ==============================
+# Filebeat can export internal metrics to a central Elasticsearch monitoring
+# cluster.  This requires xpack monitoring to be enabled in Elasticsearch.  The
+# reporting is disabled by default.
+# Set to true to enable the monitoring reporter.
+#monitoring.enabled: false
+# Sets the UUID of the Elasticsearch cluster under which monitoring data for this
+# Filebeat instance will appear in the Stack Monitoring UI. If output.elasticsearch
+# is enabled, the UUID is derived from the Elasticsearch cluster referenced by output.elasticsearch.
+#monitoring.cluster_uuid:
+# Uncomment to send the metrics to Elasticsearch. Most settings from the
+# Elasticsearch outputs are accepted here as well.
+# Note that the settings should point to your Elasticsearch *monitoring* cluster.
+# Any setting that is not set is automatically inherited from the Elasticsearch
+# output configuration, so if you have the Elasticsearch output configured such
+# that it is pointing to your Elasticsearch monitoring cluster, you can simply
+# uncomment the following line.
+#monitoring.elasticsearch:
+# ============================== Instrumentation ===============================
+# Instrumentation support for the filebeat.
+#instrumentation:
+  # Set to true to enable instrumentation of filebeat.
+  #enabled: false
+  # Environment in which filebeat is running on (eg: staging, production, etc.)
+  #environment: ""
+  # APM Server hosts to report instrumentation results to.
+  #hosts:
+  #  - http://localhost:8200
+  # API Key for the APM Server(s).
+  # If api_key is set then secret_token will be ignored.
+  #api_key:
+  # Secret token for the APM Server(s).
+  #secret_token:
+# ================================= Migration ==================================
+# This allows to enable 6.7 migration aliases
+#migration.6_to_7.enabled: true
--- a/Beats/Filebeat/filebeat.yml
+++ b/Beats/Filebeat/filebeat.yml
+filebeat.inputs:
+  # EXAMPLE: collecting syslog messages
+  - type: filestream
+    id: fs_syslog
+    paths:
+      - /var/log/syslog*
+    fields:
+      service: "syslog"
+    prospector.scanner.exclude_files: ['\.gz$', '\.zst$']
+    parsers:
+      - multiline:
+          type: pattern
+          pattern: "^[[:space:]]+"
+          match: after
+    processors:
+      - add_locale: ~
+  # TEMPLATE: how to define your own inputs
+  - type: filestream
+    id: fs_custom_logs
+    # set the paths to files you want to harvest, you can use * to match anything
+    paths:
+      - /var/log/file_you_want_to_log.log
+      - /var/log/my_folder/*.log
+    # exclude compressed files, use only if you use the * wildcard in defined paths
+    prospector.scanner.exclude_files: ['\.gz$', '\.zst$']
+    # use these settings if your file contains multiline messages (continuation lines start with whitespace); otherwise remove them
+    parsers:
+      - multiline:
+          type: pattern
+          pattern: "^[[:space:]]+"
+          match: after
+    # add information about timezone
+    processors:
+      - add_locale: ~
+    # service name will be used for parsing data in logstash and will be part of the name of the index
+    # consult with administrator
+    fields:
+      service: "service_name"
+    # possible additional tags for further specification of input
+    tags: ["tag1", "tag2"]
+# OTHER SETTINGS:
+# you can change the level of logging of the filebeat service, default is info
+logging:
+  level: info
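+# if you have a lot of data, consider shipping only the last N hours.
+# this matters most when Filebeat runs for the first time.
+# NOTE: ignore_older is documented as an input-level option, so set it under
+# each filestream input above (e.g. `ignore_older: 24h`) rather than at the top level.
+#ignore_older: 24h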
+# change the group name and os for better data sorting
+fields:
+  group: "name_of_my_group"
+  os: "linux" # or "windows"
+output.logstash:
+  hosts: ["log.ucn.muni.cz:5045"]
+  ssl.enabled: true
+  ssl.certificate_authorities: ["/etc/filebeat/http_ca.crt"]
--- a/Beats/Filebeat/http_ca.crt
+++ b/Beats/Filebeat/http_ca.crt
+-----BEGIN CERTIFICATE-----
+DUMMY_CERT
+-----END CERTIFICATE-----
--- a/Beats/Winlogbeat/README.md
+++ b/Beats/Winlogbeat/README.md
@@ -26,7 +26,6 @@ Winlogbeat je open-source nástroj vyvinutý společností Elastic, který slou
 ![List of Windows services with Winlogbeat](../../img/services-windows.png)
 ### Bulk installation
 Simultaneous installation on multiple devices can be performed via the SCCM console.

--- a/README.md
+++ b/README.md
@@ -3,9 +3,11 @@
 ## Basic information
 ### Project name
 Creation of a data lake for storing operational data of Masaryk University
 ### Investigators
 - Ing. Jindřich Zechmeister (MU)
 - RNDr. Daniel Tovarňák, Ph.D. (MU)
 - Mgr. Martin Kotlík (MU)
@@ -13,16 +15,18 @@ Vytvoření datového jezera pro potřeby uchovávání provozních dat Masaryko
 - Bc. Andrea Chimenti (VUT)
 ### Period
 29\. 6\. 2022 - 29\. 6\. 2023
 ### URL
-https://fondrozvoje.cesnet.cz/projekt.aspx?ID=690
+<https://fondrozvoje.cesnet.cz/projekt.aspx?ID=690>
 ## Repository contents
 This repository contains the output artifacts of the project. The individual technologies are organized into folders with the following structure:
-```
+```text
 ├── Beats
 │   ├── Filebeat
 │   └── Winlogbeat