Verified Commit af7dc067 authored by Jiří Prokop

feat: run_probes - add timeout for each check

BREAKING CHANGE: config structure change + new required option
parent 1e10e173
1 merge request: !74 feat: run_probes - add timeout for each check
Pipeline #474347 passed
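Because the commit is marked as a breaking change, existing configuration files need to be adapted. The following is a minimal sketch of that migration, assuming the old layout is a flat mapping of check groups and the new layout wraps it in `checks` next to a required `default_timeout`. The helper name `migrate_config` and the fallback value of 30 seconds are illustrative assumptions, not part of the commit.

```python
from typing import Any, Dict


def migrate_config(old: Dict[str, Any], default_timeout: int = 30) -> Dict[str, Any]:
    """Wrap an old-style config in the new layout (hypothetical helper)."""
    if "checks" in old and "default_timeout" in old:
        # Already in the new layout, nothing to do.
        return old
    # Old layout: every top-level key is a check group with "module" and "runs".
    return {"default_timeout": default_timeout, "checks": dict(old)}


# Old-style config, as it would look after loading the YAML file.
old_cfg = {
    "check_rpc_status": {
        "module": "perun.proxy.utils.nagios.check_rpc_status",
        "runs": {
            "check_rpc_status": {"u": "username", "p": "password", "d": "domain", "i": 1}
        },
    }
}
new_cfg = migrate_config(old_cfg)
```

The sketch works on already-parsed dictionaries, so it does not depend on any particular YAML library.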
@@ -25,6 +25,8 @@ There are several extras which are required only for some scripts:
 - script designed to execute multiple monitoring probes
 - output is compatible with CheckMK
 - it is required to put configuration file to `/etc/run_probes_cfg.yaml`
+- default timeout in seconds for all checks is set by `default_timeout` in the config, and each check can optionally have its own `timeout` setting overriding the default one.
 For usage instructions, run:
......
check_mongodb: default_timeout: 30 # in seconds
# module with checks checks:
module: perun.proxy.utils.nagios.check_mongodb check_mongodb:
check_mongodb_shared: &check_mongodb_shared # module with checks
host: "hostname" module: perun.proxy.utils.nagios.check_mongodb
u: "username" check_mongodb_shared: &check_mongodb_shared
p: "password" host: "hostname"
tls: true
tls-ca-file: "/etc/ssl/chain.crt"
tls-cert-key-file: "/etc/ssl/certificate_and_key.pem"
runs:
# check with parameter
check_mongodb_connect:
<<: *check_mongodb_shared
A: connect
W: 2
C: 4
check_mongodb_connections:
<<: *check_mongodb_shared
A: connections
W: 70
C: 80
check_mongodb_replication_lag:
<<: *check_mongodb_shared
A: replication_lag
W: 15
C: 30
check_mongodb_replset_state:
<<: *check_mongodb_shared
A: replset_state
W: 0
C: 0
check_rpc_status:
module: perun.proxy.utils.nagios.check_rpc_status
runs:
check_rpc_status:
u: "username" u: "username"
p: "password" p: "password"
d: "domain" tls: true
i: 1 tls-ca-file: "/etc/ssl/chain.crt"
tls-cert-key-file: "/etc/ssl/certificate_and_key.pem"
runs:
# check with parameter
check_mongodb_connect:
<<: *check_mongodb_shared
A: connect
W: 2
C: 4
check_mongodb_connections:
<<: *check_mongodb_shared
A: connections
W: 70
C: 80
timeout: 60
check_mongodb_replication_lag:
<<: *check_mongodb_shared
A: replication_lag
W: 15
C: 30
check_mongodb_replset_state:
<<: *check_mongodb_shared
A: replset_state
W: 0
C: 0
check_rpc_status:
module: perun.proxy.utils.nagios.check_rpc_status
runs:
check_rpc_status:
u: "username"
p: "password"
d: "domain"
i: 1
check_syncrepl: check_syncrepl:
module: perun.proxy.utils.nagios.check_ldap_syncrepl module: perun.proxy.utils.nagios.check_ldap_syncrepl
runs: runs:
check_ldap_syncrepl: check_ldap_syncrepl:
p: "ldaps://ldapmaster.foo:636" p: "ldaps://ldapmaster.foo:636"
c: "ldaps://ldapslave.foo:636" c: "ldaps://ldapslave.foo:636"
b: "o=example" b: "o=example"
D: "uid=nagios,ou=sysaccounts,o=example" D: "uid=nagios,ou=sysaccounts,o=example"
P: "bind_password" P: "bind_password"
n: n:
only-check-contextCSN: only-check-contextCSN:
W: 900 W: 900
C: 3600 C: 3600
check_exabgp_propagation: check_exabgp_propagation:
module: perun.proxy.utils.nagios.check_exabgp_propagation module: perun.proxy.utils.nagios.check_exabgp_propagation
runs: runs:
check_exabgp_propagation: check_exabgp_propagation:
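The override rule described in the README change (a per-run `timeout` wins over `default_timeout` and is not passed to the probe as an argument) can be sketched as below. The function name `build_command` is hypothetical, and the `--` prefix for multi-character option names is an assumption, since that branch is elided in the diff.

```python
from typing import Any, Dict, List, Optional, Tuple


def build_command(
    module: str, args: Optional[Dict[str, Any]], default_timeout: float
) -> Tuple[List[str], float]:
    """Return the probe command line and the timeout that applies to it."""
    command = ["python3", "-m", module]
    timeout = default_timeout
    for arg_name, arg_val in (args or {}).items():
        if arg_name == "timeout":
            # Consumed by the runner itself, never forwarded to the probe.
            timeout = arg_val
            continue
        # Assumption: one-letter names are short options, the rest long options.
        prefix = "-" if len(arg_name) == 1 else "--"
        command.append(prefix + arg_name)
        if arg_val is not None:
            # Keys without a value act as bare flags.
            command.append(str(arg_val))
    return command, timeout


config = {
    "default_timeout": 30,
    "checks": {
        "check_mongodb": {
            "module": "perun.proxy.utils.nagios.check_mongodb",
            "runs": {
                "check_mongodb_connect": {"A": "connect", "W": 2, "C": 4},
                "check_mongodb_connections": {"A": "connections", "W": 70, "C": 80, "timeout": 60},
            },
        }
    },
}

resolved = {
    name: build_command(group["module"], run_args, config["default_timeout"])
    for group in config["checks"].values()
    for name, run_args in group["runs"].items()
}
```

With this config, `check_mongodb_connect` falls back to the 30-second default while `check_mongodb_connections` uses its own 60 seconds.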
@@ -41,7 +41,7 @@ def get_metrics_and_new_output(output):
     return None, output
 
 
-def run_probe(probe_name, command):
+def run_probe(probe_name, command, timeout):
     """
     Runs nagios monitoring probe and prints output in following formats:
     1) return_code probe_name metrics output
@@ -50,9 +50,17 @@ def run_probe(probe_name, command):
     metrics output format:
     metric1=val;|metric2=val2|metric3=val3;val3;;;|metric4=val4
     """
-    result = subprocess.run(
-        command, text=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT
-    )
+    try:
+        result = subprocess.run(
+            command,
+            text=True,
+            stdout=subprocess.PIPE,
+            stderr=subprocess.STDOUT,
+            timeout=timeout,
+        )
+    except subprocess.TimeoutExpired:
+        print(f"3 {probe_name} - probe TIMED OUT after {timeout}s")
+        return 3
     output = re.sub("[ \t\n]+", " ", result.stdout)
     search = re.search(r" - .*", output)
     if search:
@@ -71,12 +79,17 @@ def main():
     if not config:
         return
 
-    for _, options in config.items():
+    global_timeout = config["default_timeout"]
+    for _, options in config["checks"].items():
         module = options["module"]
         for name, args in options.get("runs").items():
             command = ["python3", "-m", module]
+            timeout = global_timeout
             if args is not None:
                 for arg_name, arg_val in args.items():
+                    if arg_name == "timeout":
+                        timeout = arg_val
+                        continue
                     if len(arg_name) == 1:
                         arg_name = "-" + arg_name
                     else:
@@ -88,7 +101,7 @@ def main():
                     command.append(arg_name)
                     if arg_val is not None:
                         command.append(str(arg_val))
-            Thread(target=run_probe, args=[name, command]).start()
+            Thread(target=run_probe, args=[name, command, timeout]).start()
 
 
 if __name__ == "__main__":
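The timeout handling added to `run_probe` can be tried in isolation with the sketch below. It is a simplified stand-in, not the project's function: `run_with_timeout` and the two sample commands are hypothetical, and the metrics parsing from the real code is left out. Status 3 is the Nagios plugin code for UNKNOWN, which is what the commit reports for a timed-out probe.

```python
import subprocess
import sys
from typing import List


def run_with_timeout(probe_name: str, command: List[str], timeout: float) -> int:
    """Run a command and report a timeout as status 3 (UNKNOWN)."""
    try:
        result = subprocess.run(
            command,
            text=True,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising this exception.
        print(f"3 {probe_name} - probe TIMED OUT after {timeout}s")
        return 3
    print(f"{result.returncode} {probe_name} - {result.stdout.strip()}")
    return result.returncode


# Stand-ins for real probes: one finishes immediately, one sleeps too long.
fast_probe = [sys.executable, "-c", "print('OK')"]
slow_probe = [sys.executable, "-c", "import time; time.sleep(5)"]
```

Calling `run_with_timeout("slow", slow_probe, 0.5)` exercises the `TimeoutExpired` branch, while `run_with_timeout("fast", fast_probe, 10)` takes the normal path. Since each probe runs in its own thread in the original script, one slow probe only delays its own result line.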