alerts: Add alerts for missed client backups

When BURP fails to even *start* a backup, it does not trigger a
notification at all.  As a result, I may not notice for a few days when
backups are not happening.  That was the case this week, when clients'
backups were failing immediately, because of a file permissions issue on
the server.  To hopefully avoid missing backups for too long in the
future, I've added two new alerts:

* The *no recent backups* alert fires if there have not been *any* BURP
  backups recently.  This may also fire, for example, if the BURP
  exporter is not working, or if there is something wrong with the BURP
  data volume.
* The *missed client backup* alert fires if an active BURP client (i.e.
  one that has had at least one backup in the past 90 days) has not been
  backed up in the last 24 hours.
step-ssh
Dustin 2023-05-10 07:48:00 -05:00
parent a2bcd5ccbb
commit 877dcc3879
1 changed files with 20 additions and 1 deletions

View File

@ -47,8 +47,27 @@ vmalert_rules:
- alert: mdraid failed disk
expr: collectd_md_md_disks{type="failed"} != 0
- name: BURP RAID
- name: BURP
rules:
- alert: no recent backups
expr: absent(burp_client_last_backup_timestamp)
for: 8h
annotations:
summary: No clients have been backed up recently
description: >-
This alert indicates that NO clients have been backed up within the
last day. There is likely a problem with the BURP server.
- alert: missed client backup
expr:
time() - (burp_client_last_backup_timestamp > now() - 86400 * 90) > 86400 * 2
for: 3h
annotations:
summary: A client has not backed up today
description: >-
A client has not been backed up for more than a day. This may be
because the client is offline, or because the backup process has
failed. Clients that have not been backed up for more than 90 days
will not trigger this alert.
- alert: disks need swapped
expr:
time() - tlast_change_over_time(