Now that we have `keepalived` managing the "virtual" IP address for the
ingress controller, we can change _ingress-nginx_ to run as a Deployment
rather than a DaemonSet. It no longer needs to use the host network
namespace, as `kube-proxy` will route all traffic sent to the configured
external IP address to the controller pods. Using the _Local_ external
traffic policy disables NAT, so nginx sees incoming traffic unmodified.
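
For reference, the relevant part of the Service might look roughly like
this; the address and selector labels are illustrative:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: ingress-nginx-controller
  namespace: ingress-nginx
spec:
  externalIPs:
    - 172.30.0.169              # illustrative: the VIP managed by keepalived
  externalTrafficPolicy: Local  # preserve the original client address
  selector:
    app.kubernetes.io/name: ingress-nginx
  ports:
    - name: http
      port: 80
      targetPort: http
    - name: https
      port: 443
      targetPort: https
```
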
Running `keepalived` as a DaemonSet will allow managing floating
"virtual" IP addresses for Kubernetes services with configured external
IP addresses. The main services we want to expose outside the cluster
are _ingress-nginx_, Mosquitto, and RabbitMQ. The `keepalived` cluster
will negotiate using the VRRP protocol to determine which node should
have each external address. Using the process tracking feature of
`keepalived`, we can steer traffic directly to the node where the target
service is running.
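
A rough sketch of the keepalived configuration the DaemonSet could mount
from a ConfigMap; the instance name, interface, router ID, and address
are all illustrative:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: keepalived
data:
  keepalived.conf: |
    # Raise this node's priority only while an nginx process is running,
    # so the VIP follows the ingress controller pods.
    vrrp_track_process ingress_nginx {
        process nginx
        weight 50
    }

    vrrp_instance ingress {
        state BACKUP
        interface eth0
        virtual_router_id 51
        priority 100
        virtual_ipaddress {
            172.30.0.169
        }
        track_process {
            ingress_nginx
        }
    }
```
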
I've created new worker nodes that are dedicated to running Longhorn
replicas. These nodes are tainted with the
`node-role.kubernetes.io/longhorn` taint, so no regular pods will be
scheduled there by default. Longhorn pods thus need to be configured
to tolerate that taint, and to be scheduled on nodes with the
similarly-named label.
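
The relevant pod spec settings would look roughly like this (the taint
effect and the empty label value are assumptions; Longhorn's own
components get them via its settings and Helm values):

```yaml
spec:
  tolerations:
    - key: node-role.kubernetes.io/longhorn
      operator: Exists
      effect: NoSchedule                 # assumed taint effect
  nodeSelector:
    node-role.kubernetes.io/longhorn: ""
```
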
This will make it easier to "blow away" the RabbitMQ data volume on the
occasions when it gets into a weird state. Simply scale the StatefulSet
down to 0 replicas, delete the PVC, then scale back up. Kubernetes will
handle creating a new PVC automatically.
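
This works because the data volume comes from a `volumeClaimTemplate`:
once the old claim is gone and the StatefulSet is scaled back up, the
controller creates a fresh PVC for the pod. A trimmed-down sketch
(image, mount path, and size are illustrative):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: rabbitmq
spec:
  serviceName: rabbitmq
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: rabbitmq
  template:
    metadata:
      labels:
        app.kubernetes.io/name: rabbitmq
    spec:
      containers:
        - name: rabbitmq
          image: docker.io/library/rabbitmq:3   # illustrative
          volumeMounts:
            - name: data
              mountPath: /var/lib/rabbitmq
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: [ReadWriteOnce]
        resources:
          requests:
            storage: 8Gi
```
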
The _fleetlock_ server drains all pods from a node before allocating the
reboot lock to that node. Unfortunately, it doesn't actually wait for
those pods to be completely evicted. If some pods take too long to shut
down, they may get stuck in `Terminating` state once the machine starts
rebooting. Those pods then cannot be replaced on another node while the
original one is offline, which pretty much defeats the purpose of using
Fleetlock in the first place.
It seems upstream has abandoned this project, as there is an open [Pull
Request][0] to fix this issue that has so far been ignored.
Fortunately, building a new container image containing the patch is easy
enough, so we can run our own patched build.
[0]: https://github.com/poseidon/fleetlock/pull/271
Just like I did with the RAID-1 array in the old BURP server, I will
keep one member active and one in the fireproof safe, swapping them each
month. We can use the same metrics queries we used with the BURP
server to alert when it is time to swap.
The ephemeral Jenkins worker nodes that run in AWS don't have collectd,
promtail, or Zincati. We don't need to get three alerts every time a
worker starts up to handle an ARM build job, so we drop the discovered
targets for these machines from those scrape jobs.
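
A sketch of the kind of relabeling rule involved; the job name,
discovery configuration, and address pattern are all hypothetical:

```yaml
scrape_configs:
  - job_name: collectd
    # ...service discovery omitted...
    relabel_configs:
      # Drop any discovered target that looks like an ephemeral Jenkins
      # worker in AWS.
      - action: drop
        source_labels: [__address__]
        regex: 'jenkins-worker-.*'
```
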
Paperless-ngx uses a Celery task to process uploaded files, converting
them to PDF, running OCR, etc. This task can be marked as "failed" for
various reasons, most of which are more about the document itself than
the health of the application. The GUI displays the results of failed
tasks when they occur. It doesn't really make sense to have an alert
about this scenario, especially since there's nothing that can be done
to clear the alert directly anyway.
https://20125.home/ is the URL the Status Android application loads in
its main WebView. This site is powered by a server that generates a
custom page showing the status of our self-hosted applications, based on
alerts retrieved from the AlertManager API.
Android WebView does not allow cleartext HTTP connections. It does,
however, allow connecting to an HTTPS server and ignoring the certificate
it presents, which is effectively the same thing. Thus, we generate a
self-signed certificate for the Ingress for this site.
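
One way to generate that certificate, assuming cert-manager is running
in the cluster (the namespace and resource names are hypothetical):

```yaml
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: selfsigned
  namespace: status
spec:
  selfSigned: {}
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: status-home
  namespace: status
spec:
  secretName: status-home-tls   # referenced by the Ingress `tls` section
  dnsNames:
    - "20125.home"
  issuerRef:
    name: selfsigned
    kind: Issuer
```
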
Fedora CoreOS fills `/boot` beyond the 75% alert threshold under normal
circumstances on aarch64 machines. This is not a problem, because it
cleans up old files on its own, so we do not need to alert on it.
Unfortunately, the _DiskUsage_ alert is already quite complex, and
adding in exclusions for these devices would make it even worse.
To simplify the logic, we can use a recording rule to precompute the
used/free space ratio. By using `sum(...) without (type)` instead of
`sum(...) on (df, instance)`, we keep the other labels, which we can
then use to identify the metrics coming from machines we don't care to
monitor.
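
A sketch of the recording rule, assuming the disk metrics come from
collectd's `df` plugin (the rule name is hypothetical):

```yaml
groups:
  - name: disk
    rules:
      - record: 'df:used:ratio'   # hypothetical name
        expr: |
          collectd_df_df_complex{type="used"}
            / ignoring (type)
          sum without (type) (collectd_df_df_complex)
```
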
Instead of encoding different thresholds for different volumes in the
same expression, we can use separate alerts for "low" and "very low"
thresholds. Since this will of course cause
duplicate alerts for most volumes, we can use AlertManager inhibition
rules to disable the "low" alert once the metric crosses the "very low"
threshold.
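
The inhibition rule could look something like this; the alert names and
`equal` labels are hypothetical:

```yaml
inhibit_rules:
  # Once the "very low" alert fires for a filesystem, suppress the
  # corresponding "low" alert for the same filesystem on the same host.
  - source_matchers:
      - 'alertname="DiskSpaceVeryLow"'
    target_matchers:
      - 'alertname="DiskSpaceLow"'
    equal:
      - instance
      - df
```
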
*loki1.pyrocufflink.blue* is a regular Fedora machine, a member of the
AD domain, and managed by Ansible. Thus, it does not need to be
explicitly listed as a scrape target.
For scraping metrics from Loki itself, I've changed the job to use
DNS-SD because it seems like `vmagent` does _not_ re-resolve host names
from static configuration.
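
The scrape job now looks roughly like this; the job name and port are
assumptions (3100 is Loki's default HTTP listener):

```yaml
scrape_configs:
  - job_name: loki
    dns_sd_configs:
      - names:
          - loki1.pyrocufflink.blue
        type: A
        port: 3100
```
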
The `flower_events_total` metric is a counter, so its value only ever
increases (discounting restarts of the server process). As such,
nonzero values do not necessarily indicate a _current_ problem, but
rather that there was one at some point in the past. To identify
current issues, we need to use the `increase` function, and then apply
the `max_over_time` function so that the alert doesn't immediately reset
itself.
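
A sketch of the resulting alert rule; the alert name, label selector,
and time windows are illustrative:

```yaml
groups:
  - name: celery
    rules:
      - alert: CeleryTaskFailed   # hypothetical name
        expr: |
          max_over_time(
            increase(flower_events_total{type="task-failed"}[5m])[1h:]
          ) > 0
```

The inner `increase()` turns the counter into "failures during the last
five minutes", and `max_over_time()` keeps the alert firing for an hour
after the last failure instead of resetting as soon as the window slides
past it.
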
The Gotenberg container image uses UID 1001 for the _gotenberg_ user.
Using any other UID number, even when the home directory is set and
owned by that UID, results in random issues, especially when using
LibreOffice conversions.
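
In the pod spec, that means pinning the security context to UID 1001
(the image tag is illustrative):

```yaml
spec:
  securityContext:
    runAsUser: 1001
    runAsGroup: 1001
  containers:
    - name: gotenberg
      image: docker.io/gotenberg/gotenberg:8   # illustrative tag
```
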
The Paperless-ngx ecosystem consists of several services. Defining the
resources for each service in separate manifest files will make
maintenance a little bit easier.
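
For example, a Kustomization could tie the per-service manifests
together (the file names are hypothetical):

```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - paperless.yaml
  - redis.yaml
  - gotenberg.yaml
  - tika.yaml
```
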