kubernetes

infra

Author	SHA1	Message	Date
bot	0db830a670	zigbee2mqtt: Update to 1.41.0	2024-11-09 12:32:08 +00:00
bot	6d137af6dc	home-assistant: Update to 2024.11.1	2024-11-09 12:32:08 +00:00
Dustin	3d40424cf7	fleetlock: Use patched server from GitHub PR The _fleetlock_ server drains all pods from a node before allocating the reboot lock to that node. Unfortunately, it doesn't actually wait for those pods to be completely evicted. If some pods take too long to shut down, they may get stuck in `Terminating` state once the machine starts rebooting. This makes it so those pods cannot be replaced on another node while the original one is offline, which pretty much defeats the purpose of using Fleetlock in the first place. It seems upstream has abandoned this project, as there is an open [Pull Request][0] to fix this issue that has so far been ignored. Fortunately, building a new container image containing the patch is easy enough, so we can run our own patched build. [0]: https://github.com/poseidon/fleetlock/pull/271	2024-11-05 07:05:55 -06:00
Dustin	ac62a77c96	Merge branch '20125'	2024-11-05 07:05:19 -06:00
Dustin	e1d9833e83	cert-manager: Add cert for apps.du5t1n.xyz	2024-11-05 07:04:27 -06:00
Dustin	4ad5518f18	cert-manager: Migrate config to configMapGenerator	2024-11-05 07:04:09 -06:00
Dustin	9f287d0f71	v-m/alerts: Add alerts for backup RAID array Just like I did with the RAID-1 array in the old BURP server, I will keep one member active and one in the fireproof safe, swapping them each month. We can use the same metrics queries to alert on when the swap should happen that we used with the BURP server.	2024-11-04 20:46:03 -06:00
Dustin	2380468658	v-m/scrape: Collect Jellyfin metrics	2024-11-04 20:38:25 -06:00
Dustin	db7c07ee55	v-m/scrape: Ignore cloud Kubernetes nodes The ephemeral Jenkins worker nodes that run in AWS don't have collectd, promtail, or Zincati. We don't need to get three alerts every time a worker starts up to handle an ARM build job, so we drop these discovered targets for these scrape jobs.	2024-11-04 20:35:17 -06:00
Dustin	d76a1360c8	v-m/alerts: Ignore Paperless consume_file task Paperless-ngx uses a Celery task to process uploaded files, converting them to PDF, running OCR, etc. This task can be marked as "failed" for various reasons, most of which are more about the document itself than the health of the application. The GUI displays the results of failed tasks when they occur. It doesn't really make sense to have an alert about this scenario, especially since there's nothing to do to directly clear the alert anyway.	2024-11-04 20:28:11 -06:00
Dustin	71b52e4c6f	20125: Deploy Status server https://20125.home/ is the URL the Status Android application loads in its main WebView. This site is powered by a server that generates a custom page showing the status of our self-hosted applications, based on alerts retrieved from the AlertManager API. Android WebView does not allow cleartext HTTP connections. It does, however, allow connecting to an HTTPS server and ignoring the certificate it presents, which is effectively the same thing. Thus, we generate a self-signed certificate for the Ingress for this site.	2024-11-02 19:51:53 -05:00
Dustin	8ecee4133f	v-m/alerts: Rework free disk space alert Fedora CoreOS fills `/boot` beyond the 75% alert threshold under normal circumstances on aarch64 machines. This is not a problem, because it cleans up old files on its own, so we do not need to alert on it. Unfortunately, the _DiskUsage_ alert is already quite complex, and adding in exclusions for these devices would make it even worse. To simplify the logic, we can use a recording rule to precompute the used/free space ratio. By using `sum(...) without (type)` instead of `sum(...) on (df, instance)`, we keep the other labels, which we can then use to identify the metrics coming from machines we don't care to monitor. Instead of having different thresholds for different volumes encoded in the same expression, we can use multiple alerts to alert on "low" vs "very low" thresholds. Since this will of course cause duplicate alerts for most volumes, we can use AlertManager inhibition rules to disable the "low" alert once the metric crosses the "very low" threshold.	2024-11-02 09:38:02 -05:00
Dustin	4cef41688f	v-m/alerts: Add Zigbee+ZWave network alerts	2024-11-01 18:14:56 -05:00
Dustin	6cf11f9f61	v-m: Scrape HAProxy	2024-11-01 18:14:37 -05:00
Dustin	7a768cbb76	v-m: Update jobs for new Loki server loki1.pyrocufflink.blue is a regular Fedora machine, a member of the AD domain, and managed by Ansible. Thus, it does not need to be explicitly listed as a scrape target. For scraping metrics from Loki itself, I've changed the job to use DNS-SD because it seems like `vmagent` does _not_ re-resolve host names from static configuration.	2024-11-01 18:07:34 -05:00
Dustin	0101040634	v-m/alerts: Add Paperless-ngx email task alert This alert should fire if the background task to fetch e-mail messages and import them into Paperless-ngx has not run for a while.	2024-11-01 18:04:06 -05:00
Dustin	3f9601dc94	v-m/alerts: Improve Paperless-ngx Celery task alert The `flower_events_total` metric is a counter, so its value only ever increases (discounting restarts of the server process). As such, nonzero values do not necessarily indicate a _current_ problem, but rather that there was one at some point in the past. To identify current issues, we need to use the `increase` function, and then apply the `max_over_time` function so that the alert doesn't immediately reset itself.	2024-11-01 18:00:50 -05:00
Dustin	d12e66f58a	v-m: Scrape Frigate exporter	2024-11-01 17:47:51 -05:00
Dustin	045eea89a9	Merge remote-tracking branch 'refs/remotes/origin/master'	2024-10-19 09:49:59 -05:00
Dustin	8ff45a8c01	paperless-ngx/gotenberg: Run as correct user The Gotenberg container image uses UID 1001 for the _gotenberg_ user. Using any other UID number, even when the home directory is set and owned by that UID, results in random issues, especially when using LibreOffice conversions.	2024-10-19 09:46:15 -05:00
giteadmin	d3e00680c0	Merge pull request 'home-assistant: Update to 2024.10.3' (#29) from updatebot/home-assistant into master Reviewed-on: #29	2024-10-19 13:13:12 +00:00
bot	c5daf23f71	mosquitto: Update to 2.0.20	2024-10-19 11:32:16 +00:00
bot	6e2c8d1a25	zwavejs2mqtt: Update to 9.24.0	2024-10-19 11:32:16 +00:00
bot	0e3f719e32	whisper: Update to 2.2.0	2024-10-19 11:32:16 +00:00
bot	94e10207d2	home-assistant: Update to 2024.10.3	2024-10-19 11:32:15 +00:00
Dustin	99c8f7694c	paperless-ngx: Split resources into separate files The Paperless-ngx ecosystem consists of several services. Defining the resources for each service in separate manifest files will make maintenance a little bit easier.	2024-10-17 07:27:33 -05:00
Dustin	e19e8f50ab	v-m/alerts: Add alerts for Paperless-ngx	2024-10-17 07:18:23 -05:00
Dustin	78651eb5f8	v-m/alerts: Add alerts for PostgreSQL WAL archiver	2024-10-17 07:18:09 -05:00
Dustin	ee3e078b20	v-m/alerts: Add alerts for Restic backups	2024-10-17 06:58:48 -05:00
Dustin	ea89e0cde4	v-m/scrape: Remove synapse job The Synapse server is now completely decommissioned.	2024-10-17 06:50:27 -05:00
Dustin	e581957f9d	Merge remote-tracking branch 'refs/remotes/origin/master'	2024-10-15 07:59:42 -05:00
Dustin	b01300f8cc	Merge pull request 'zwavejs2mqtt: Update to 9.20.0' (#26) from updatebot/home-assistant into master Reviewed-on: #26	2024-10-15 12:43:28 +00:00
bot	55ae979a1d	mosquitto: Update to 2.0.19	2024-10-15 12:42:36 +00:00
bot	1de05f2ccc	zwavejs2mqtt: Update to 9.23.0	2024-10-15 12:42:36 +00:00
bot	58f7f9e2cc	zigbee2mqtt: Update to 1.40.2	2024-10-15 12:42:35 +00:00
bot	390eacf209	home-assistant: Update to 2024.10.2	2024-10-15 12:42:35 +00:00
Dustin	145fa6286e	storage: Add Longhorn backup target secret Longhorn uses a special Secret resource to configure the backup target. This secret includes the credentials and CA certificate for accessing the MinIO S3 service. Longhorn must be configured to use this Secret by setting the `backup-target-credential-secret` setting to `minio-backups-credentials`.	2024-10-13 14:03:49 -05:00
Dustin	1b4bb234c8	Merge pull request 'gotenberg: Update to 8.10.0' (#25) from updatebot/paperless-ngx into master Reviewed-on: #25	2024-10-12 20:44:58 +00:00
Dustin	7e2512c261	Merge pull request 'authelia: Update to 4.38.12' (#28) from updatebot/authelia into master Reviewed-on: #28	2024-10-12 20:44:41 +00:00
bot	281ec623c4	authelia: Update to 4.38.16	2024-10-12 11:33:03 +00:00
bot	51fe6f39af	gotenberg: Update to 8.12.0	2024-10-12 11:33:00 +00:00
Dustin	2ccbcd494c	firefly-iii: Update to 6.1.21 Notably, this version fixes the ~4s delay when creating/editing transactions.	2024-10-02 09:08:58 -05:00
Dustin	e9bfc63a74	Merge remote-tracking branch 'refs/remotes/origin/master'	2024-10-02 09:08:31 -05:00
Dustin	32171cc76e	Merge pull request 'firefly-iii: Update to 6.1.20' (#27) from updatebot/firefly-iii into master Reviewed-on: #27	2024-09-29 21:09:41 +00:00
bot	71f091fa05	firefly-iii: Update to 6.1.20	2024-09-28 11:32:18 +00:00
Dustin	df50decba1	argocd: apps/authelia: Enable auto-sync This way, merging PRs from updatebot will automatically trigger updating Paperless-ngx et al.	2024-09-24 07:16:45 -05:00
Dustin	0022171616	argocd: apps/ntfy: Enable auto-sync This way, merging PRs from updatebot will automatically trigger updating Paperless-ngx et al.	2024-09-24 07:16:34 -05:00
Dustin	a149bc8761	updatebot: Manage Authelia	2024-09-24 07:15:41 -05:00
Dustin	76588c3e20	updatebot: Manage Mosquitto	2024-09-24 07:08:56 -05:00
Dustin	bdc24e1778	updatebot: Manage ntfy	2024-09-24 07:05:37 -05:00
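
The reasoning in commit 3f9601dc94 (a counter never decreases, so alerting on its raw value never clears) can be illustrated with a small simulation. This is only a rough sketch: the helper functions below are simplified stand-ins written for illustration and are not the real PromQL `increase` or `max_over_time` implementations (the real `increase` also extrapolates to the window boundaries), and the sample values and window sizes are made up.

```python
from typing import List, Sequence


def increase(samples: Sequence[float]) -> float:
    """Simplified counter increase across a window of samples."""
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        # A drop means the process restarted and the counter began again at zero.
        total += cur - prev if cur >= prev else cur
    return total


def windowed_increase(series: Sequence[float], window: int) -> List[float]:
    """Evaluate increase() at every step over the trailing `window` samples."""
    return [
        increase(series[max(0, i - window + 1): i + 1])
        for i in range(len(series))
    ]


def max_over_time(values: Sequence[float], window: int) -> List[float]:
    """Largest value seen in the trailing `window` samples at every step."""
    return [
        max(values[max(0, i - window + 1): i + 1])
        for i in range(len(values))
    ]


# Hypothetical failure counter: three old failures, then one new failure at step 3.
counter = [3, 3, 3, 4, 4, 4, 4, 4, 4, 4]

# The raw value is nonzero at every step, so "counter > 0" would never clear.
recent = windowed_increase(counter, window=3)

# The increase alone drops back to zero quickly; max_over_time holds it longer.
held = max_over_time(recent, window=4)

print(recent)
print(held)
```

In this sketch the increase is nonzero for only two steps after the new failure, while the `max_over_time` wrapper keeps the signal raised for several more steps before it clears on its own.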

657 Commits (69f8c1a27d3e3f54a5b807ef34cd51aab3e24997)