v-m: Remove BURP metrics, alerts
BURP is officially decommissioned, replaced by Restic.
pull/22/head
parent 4adb9cd243
commit f182479d34
@@ -41,58 +41,6 @@ groups:
       - alert: mdraid failed disk
         expr: collectd_md_md_disks{type="failed"} != 0
 
-  - name: BURP
-    rules:
-      - alert: no recent backups
-        expr: absent(burp_client_last_backup_timestamp)
-        for: 8h
-        annotations:
-          summary: No clients have been backed up recently
-          description: >-
-            This alert indicates that NO clients have been backed up within the
-            last day. There is likely a problem with the BURP server.
-      - alert: missed client backup
-        expr:
-          time() - (burp_client_last_backup_timestamp > now() - 86400 * 90) > 86400 * 2
-        for: 3h
-        annotations:
-          summary: A client has not backed up today
-          description: >-
-            A client has not been backed up for more than a day. This may be
-            because the client is offline, or because the backup process has
-            failed. Clients that have not been backed up for more than 90 days
-            will not trigger this alert.
-      - alert: disks need swapped
-        expr:
-          time() - tlast_change_over_time(
-            (
-              collectd_md_md_disks{instance="burp1.pyrocufflink.blue", type="active"}
-              or last_over_time(collectd_md_md_disks{instance="burp1.pyrocufflink.blue", type="active"})[1d]
-            )[90d]
-          ) > 86400 * 30
-        annotations:
-          summary: The disks in the BURP array need swapped
-          description: >-
-            The disks in the BURP RAID-1 (mirror) array should be swapped
-            periodically. One disk should be online and mounted while the other
-            is stored in the fireproof safe. Switching them ensures that even if
-            something happens to the active disk, such as hardware failure, power
-            surge, fire, or accidental `rm -rf`, the offline disk is only out of
-            date by a few weeks.
-      - alert: disk needs archived
-        expr:
-          sum(
-            collectd_md_md_disks{instance="burp1.pyrocufflink.blue", type=~"missing|spare"}
-          ) < 1
-        annotations:
-          summary: One of the disks in the BURP array should be archived
-          description: >-
-            The disks in the BURP RAID-1 (mirror) array should be swapped
-            periodically. One disk should be online and mounted while the other
-            is stored in the fireproof safe. All of the disks are currently
-            online; one needs to be disconnected and moved to the safe as soon as
-            possible.
-
   - name: certificates
     rules:
       - alert: certificate will expire soon
@@ -218,20 +218,6 @@ scrape_configs:
       - targets:
           - jenkins.pyrocufflink.blue
 
-  - job_name: burp
-    scrape_interval: 270s
-    scrape_timeout: 30s
-    static_configs:
-      - targets:
-          - burp.pyrocufflink.blue:9645
-
-  - job_name: minio-backups
-    metrics_path: /minio/v2/metrics/cluster
-    scheme: https
-    static_configs:
-      - targets:
-          - burp.pyrocufflink.blue:9000
-
   - job_name: kubernetes
     scheme: https
     tls_config:
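The removed `missed client backup` expression only matches clients whose last backup is more than two days old but still within the last 90 days. A minimal Python sketch of that window check, assuming Unix-second timestamps; the function name `missed_backup` is hypothetical and does not come from the repository:

```python
DAY = 86400  # seconds per day, as in the removed expression


def missed_backup(last_backup_ts: float, now: float) -> bool:
    """Mirror the removed rule's window (hypothetical helper).

    The inner filter keeps series with a backup newer than 90 days ago;
    the outer comparison requires the backup to be older than 2 days.
    """
    age = now - last_backup_ts
    return 2 * DAY < age < 90 * DAY
```

This only models the comparison itself; the removed rule additionally required the condition to hold for 3 hours (`for: 3h`) before firing.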