1 Commits (9f78f01f14ec4feffb41f51601a4ab636da102ad)

Author SHA1 Message Date
Dustin 8ecee4133f v-m/alerts: Rework free disk space alert
Fedora CoreOS fills `/boot` beyond the 75% alert threshold under normal
circumstances on aarch64 machines.  This is not a problem, because it
cleans up old files on its own, so we do not need to alert on it.
Unfortunately, the _DiskUsage_ alert is already quite complex, and
adding in exclusions for these devices would make it even worse.

To simplify the logic, we can use a recording rule to precompute the
used/free space ratio.  By using `sum(...) without (type)` instead of
`sum(...) by (df, instance)`, we keep the other labels, which we can
then use to identify the metrics coming from machines we don't care to
monitor.
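
As a rough illustration of the recording rule described above, a rule
file might look like the sketch below.  The metric name
`disk_space_bytes`, the rule name `disk_used:ratio`, and the group name
are hypothetical placeholders, not taken from this commit; only the
`type`, `df`, and `instance` labels come from the message itself.

```yaml
# Sketch of a Prometheus rule file; names are assumptions, see above.
groups:
  - name: disk-space            # hypothetical group name
    rules:
      - record: disk_used:ratio # hypothetical recorded series name
        expr: |
          # Used bytes per volume.  "without (type)" drops only the
          # type label, so every other label on the series survives.
          sum(disk_space_bytes{type="used"}) without (type)
          /
          # Total bytes per volume: the sum over all values of type.
          sum(disk_space_bytes) without (type)
```

Because both sides of the division are aggregated the same way, they
end up with identical label sets and match one-to-one.  Any label that
identifies a machine or volume remains available for later filtering.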

Instead of having different thresholds for different volumes
encoded in the same expression, we can use multiple alerts to alert on
"low" vs "very low" thresholds.  Since this will of course cause
duplicate alerts for most volumes, we can use Alertmanager inhibition
rules to disable the "low" alert once the metric crosses the "very low"
threshold.
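
The two-alert arrangement could be sketched roughly as follows,
assuming a precomputed series named `disk_used:ratio`.  That name, the
alert names, the `for` durations, and the 90% "very low" threshold are
all hypothetical; only the 75% threshold is mentioned in this message.

```yaml
# Prometheus rule file (sketch; names and the 0.90 value are assumptions)
groups:
  - name: disk-space-alerts
    rules:
      - alert: DiskSpaceLow
        expr: disk_used:ratio > 0.75
        for: 15m
      - alert: DiskSpaceVeryLow
        expr: disk_used:ratio > 0.90
        for: 15m
---
# Alertmanager configuration fragment (sketch)
inhibit_rules:
  # While the "very low" alert fires for a volume, mute the "low"
  # alert for that same volume.
  - source_matchers:
      - alertname = DiskSpaceVeryLow
    target_matchers:
      - alertname = DiskSpaceLow
    # Inhibit only when both alerts carry the same instance and df.
    equal: [instance, df]
```

The `equal` list limits the inhibition to a matching instance and
volume, so a nearly full disk on one machine does not hide "low"
alerts for other machines.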
2024-11-02 09:38:02 -05:00