alerts: Add alerts for missed client backups
When BURP fails to even *start* a backup, it does not trigger a notification at all. As a result, I may not notice for a few days when backups are not happening. That was the case this week, when clients' backups were failing immediately, because of a file permissions issue on the server.

To hopefully avoid missing backups for too long in the future, I've added two new alerts:

* The *no recent backups* alert fires if there have not been *any* BURP backups recently. This may also fire, for example, if the BURP exporter is not working, or if there is something wrong with the BURP data volume.
* The *missed client backup* alert fires if an active BURP client (i.e. one that has had at least one backup in the past 90 days) has not been backed up in the last 24 hours.
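The selection logic behind the two alerts can be sketched in plain Python. This is only an illustration, not the rule engine itself: the function names, the client names, and the sample timestamps are hypothetical, and the default thresholds mirror the constants in the rule expression in the diff (`86400 * 90` and `86400 * 2`). The `for:` hold-off periods are not modelled.

```python
DAY = 86400  # seconds per day, as used in the rule expression


def missed_clients(last_backup, now, active_window=90 * DAY, max_age=2 * DAY):
    """Return the clients that are still active but overdue for a backup.

    last_backup maps a (hypothetical) client name to the Unix timestamp
    of its most recent backup.
    """
    overdue = []
    for client, ts in sorted(last_backup.items()):
        # Inner filter: keep only clients backed up within the active window.
        is_active = ts > now - active_window
        # Outer comparison: the last backup is older than the allowed age.
        is_overdue = now - ts > max_age
        if is_active and is_overdue:
            overdue.append(client)
    return overdue


def no_recent_backups(last_backup):
    """Mimic absent(): true only when no client series exists at all."""
    return len(last_backup) == 0


now = 1_700_000_000  # arbitrary fixed timestamp for the example
samples = {
    "laptop": now - 3 * 3600,    # backed up 3 hours ago
    "desktop": now - 3 * DAY,    # backed up 3 days ago
    "retired": now - 120 * DAY,  # outside the 90-day active window
}

print(missed_clients(samples, now))  # ['desktop']
print(no_recent_backups(samples))    # False
```

Only `desktop` is reported: `laptop` is recent enough, and `retired` is dropped by the active-window filter before its age is ever compared.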
parent a2bcd5ccbb
commit 877dcc3879
@@ -47,8 +47,27 @@ vmalert_rules:
     - alert: mdraid failed disk
       expr: collectd_md_md_disks{type="failed"} != 0
 
-  - name: BURP RAID
+  - name: BURP
     rules:
+    - alert: no recent backups
+      expr: absent(burp_client_last_backup_timestamp)
+      for: 8h
+      annotations:
+        summary: No clients have been backed up recently
+        description: >-
+          This alert indicates that NO clients have been backed up within the
+          last day. There is likely a problem with the BURP server.
+    - alert: missed client backup
+      expr:
+        time() - (burp_client_last_backup_timestamp > now() - 86400 * 90) > 86400 * 2
+      for: 3h
+      annotations:
+        summary: A client has not backed up today
+        description: >-
+          A client has not been backed up for more than a day. This may be
+          because the client is offline, or because the backup process has
+          failed. Clients that have not been backed up for more than 90 days
+          will not trigger this alert.
     - alert: disks need swapped
       expr:
         time() - tlast_change_over_time(