*collectd* is now running on *k8s-aarch64-n0.pyrocufflink.blue*,
exposing system metrics. As it is not a member of the AD domain, it has
to be explicitly listed in the `scrape_collectd_extra_targets` variable.
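For reference, the entry looks something like this in the inventory (assuming the variable is a simple list of host names; the file location is illustrative):

```yaml
# group_vars/victoriametrics.yml (location illustrative)
scrape_collectd_extra_targets:
  - k8s-aarch64-n0.pyrocufflink.blue
```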
*nvr1.pyrocufflink.blue* has been migrated to Fedora CoreOS. As such,
it is no longer managed by Ansible; its configuration is done via
Butane/Ignition. It is no longer a member of the Active Directory
domain, but it does still run *collectd* and export Prometheus metrics.
Since Ubiquiti only publishes Debian packages for the Unifi Network
controller software, running it on Fedora has historically been nigh
impossible. Fortunately, a modern solution is available: containers.
The *linuxserver.io* project publishes a container image for the
controller software, making it fairly easy to deploy on any host with an
OCI runtime. I briefly considered creating my own image, since theirs
must be run as root, but I decided the maintenance burden would not be
worth it. Using Podman's user namespace functionality, I was able to
work around this requirement anyway.
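As a rough sketch of the approach (the image reference, ports, and volume path here are illustrative, not the actual deployment): with `--userns=auto`, Podman maps root inside the container to an unprivileged UID range on the host.

```sh
podman run -d --name unifi \
    --userns=auto \
    -p 8443:8443 -p 3478:3478/udp \
    -v /var/lib/unifi:/config \
    docker.io/linuxserver/unifi-controller:latest
```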
Sometimes I need to connect to a machine when there is an AD issue (e.g.
domain controllers are down, clocks are out of sync, etc.) but I can't
do it from my desktop.
When the RAID array is being resynchronized after the archived disk has
been reconnected, md changes the disk status from "missing" to "spare."
Once the synchronization is complete, it changes from "spare" to
"active." We only want to trigger the "disk needs archived" alert once
the synchronization process is complete; otherwise, both the "disks need
swapped" and "disk needs archived" alerts would be active at the same
time, which makes no sense. By adjusting the query for the "disk needs
archived" alert to consider disks in both "missing" and "spare" status,
we can delay firing that alert until the proper time.
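A hedged sketch of the adjusted rule; the metric and label names are illustrative (they assume collectd's md plugin exposed in Prometheus format) and may not match the real rule:

```yaml
# Illustrative only: fire once no disk is "missing" or "spare",
# i.e. resynchronization has finished and both disks are active
- alert: DiskNeedsArchived
  expr: sum(collectd_md_disks{type=~"missing|spare"}) == 0
  for: 15m
```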
The Frigate server has a RAID array that it uses to store video
recordings. Since there have been a few occasions where the array has
suddenly stopped functioning, probably because of the cheap SATA
controller, it will be nice to get an alert as soon as the kernel
detects the problem, so as to minimize data loss.
Most of the Synapse server's state is in its SQLite database. It also
has a `media_store` directory that needs to be backed up, though.
In order to back up the SQLite database while the server is running, the
database must be in "WAL mode." Synapse leaves the database in the
default "rollback journal" mode, which prevents other processes from
accessing the database, even for read-only operations.
To change the journal mode:
```sh
sudo systemctl stop synapse
sudo -u synapse sqlite3 /var/lib/synapse/homeserver.db 'PRAGMA journal_mode=WAL;'
sudo systemctl start synapse
```
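To confirm the change took effect, the same `PRAGMA` can be queried without an argument; it should print `wal`:

```sh
sudo -u synapse sqlite3 /var/lib/synapse/homeserver.db 'PRAGMA journal_mode;'
```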
Kubernetes exports a *lot* of metrics in Prometheus format. I am not
sure what all is there, yet, but apparently several thousand time series
were added.
To allow anonymous access to the metrics, I added this ClusterRole:
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups:
  - ""
  resources:
  - nodes/metrics
  verbs:
  - get
- nonResourceURLs:
  - /metrics
  verbs:
  - get
```
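The ClusterRole then has to be bound to the anonymous identity. Something like this ClusterRoleBinding works (the subject here assumes the built-in `system:anonymous` user, which may differ from what I actually used):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus-anonymous
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: User
  name: system:anonymous
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus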
MinIO exposes metrics in Prometheus exposition format. By default, it
requires an authentication token to access the metrics, but I was unable
to get this to work. Fortunately, it can be configured to allow
anonymous access to the metrics, which is fine, in my opinion.
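The setting that controls this is the `MINIO_PROMETHEUS_AUTH_TYPE` environment variable (shown here as a plain shell export; how it actually gets set depends on how the server is launched):

```sh
# Allow unauthenticated scraping of MinIO's Prometheus metrics endpoints
export MINIO_PROMETHEUS_AUTH_TYPE=public
```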
The `journal2ntfy.py` script follows the systemd journal by spawning
`journalctl` as a child process and reading from its standard output
stream. Any command-line arguments passed to `journal2ntfy` are passed
to `journalctl`, which allows the caller to specify message filters.
For any matching journal message, `journal2ntfy` sends a message via
the *ntfy* web service.
For the BURP server, we're going to use `journal2ntfy` to generate
alerts about the RAID array. When I reconnect the disk that was in the
fireproof safe, the kernel will log a message from the *md* subsystem
indicating that the resynchronization process has begun. Then, when
the disks are again in sync, it will log another message, which will
let me know it is safe to archive the other disk.
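Since the arguments are passed straight through to `journalctl`, the invocation might look roughly like this (the filter pattern is illustrative, and however the ntfy topic/URL get configured is not shown here):

```sh
# Watch kernel messages from the md subsystem and forward them to ntfy
./journal2ntfy.py --dmesg --grep 'md[0-9]*:'
```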
This alert will fire once the MD RAID resynchronization process has
completed and both disks in the array are online. It will clear when
one disk is disconnected and moved to the safe.
When BURP fails to even *start* a backup, it does not trigger a
notification at all. As a result, I may not notice for a few days when
backups are not happening. That was the case this week, when clients'
backups were failing immediately, because of a file permissions issue on
the server. To hopefully avoid missing backups for too long in the
future, I've added two new alerts:
* The *no recent backups* alert fires if there have not been *any* BURP
backups recently. This may also fire, for example, if the BURP
exporter is not working, or if there is something wrong with the BURP
data volume.
* The *missed client backup* alert fires if an active BURP client (i.e.
one that has had at least one backup in the past 90 days) has not been
backed up in the last 24 hours.
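Hedged sketches of the two rules; the metric name is purely illustrative (whatever the BURP exporter actually exposes will differ), but it shows the shape of the queries:

```yaml
# Illustrative only: assumes a per-client "last successful backup" timestamp
- alert: NoRecentBackups
  expr: (time() - max(burp_last_backup_timestamp)) > 86400
- alert: MissedClientBackup
  expr: >
    (time() - burp_last_backup_timestamp) > 86400
    and
    (time() - burp_last_backup_timestamp) < 90 * 86400
```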
Using a 30-day window for the `tlast_change_over_time` function
effectively "caps out" the value at 30 days. Thus, the alert reminding
me to swap the BURP backup volume will never fire, since the value will
never be greater than the 30-day threshold. Using a wider window
resolves that issue (though the query will still produce inaccurate
results beyond the window).
The `tls cafile` setting in `smb.conf` is not necessary. It is used for
verifying peer certificates for mutual TLS authentication, not to
specify the intermediate certificate authority chain like I thought.
The setting cannot simply be left out, though. If it is not specified,
Samba will attempt to load a file from a built-in default path, which
will fail, causing the server to crash. This is avoided by setting the
value to the empty string.
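In `smb.conf`, that looks like:

```
[global]
    tls cafile =
```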
The `tlast_change_over_time` function needs an interval wide enough to
consider the range of time we are interested in. In this case, we want
to see if the BURP volume has been swapped in the last thirty days, so
the interval needs to be `30d`.
*dc0.p.b* has been gone for a while now. All the current domain
controllers use LDAPS certificates signed by Let's Encrypt and include
the *pyrocufflink.blue* name, so we can now use the apex domain A record
to connect to the directory.
This alert counts how long it has been since the number of "active" disks
in the RAID array on the BURP server changed. The assumption is that the
number will typically be `1`, but it will be `2` once the second disk has
synchronized, before the swap occurs.
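A hedged MetricsQL sketch of the idea (the metric name is illustrative again, and the lookbehind window is deliberately wider than the 30-day threshold, per the note above):

```yaml
# Illustrative only: time since the count of "active" disks last changed
- alert: SwapBackupDisk
  expr: >
    (time() - tlast_change_over_time(
      collectd_md_disks{type="active"}[90d]
    )) > 30 * 86400
```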
1. Grafana 8 changed the format of the query string parameters for the
Explore page.
2. vmalert no longer needs the `-http.pathPrefix` argument when behind a
reverse proxy; instead, it uses the request path, like the other
Victoria Metrics components.
The way I am handling swapping out the BURP disk now is by using the
Linux MD RAID driver to manage a RAID 1 mirror array. The array
normally operates with one disk missing, as it is in the fireproof safe.
When it is time to swap the disks, I reattach the offline disk, let the
array resync, then disconnect and store the other disk.
This works considerably better than the previous method, as it does not
require BURP or the NFS server to be offline during the synchronization.
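Roughly, the swap then looks like this (device and array names are illustrative):

```sh
# Reattach the archived disk and let the mirror resynchronize
mdadm /dev/md0 --add /dev/sdb1
cat /proc/mdstat    # wait for the resync to finish

# Once in sync, fail and remove the other disk so it can go to the safe
mdadm /dev/md0 --fail /dev/sda1 --remove /dev/sda1
```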
The rule is "if it is accessible on the Internet, its name ends in .net."
Although Vaultwarden can be accessed by either name, the one specified
in the Domain URL setting is the only one that works for WebAuthn.
Domain controllers only allow users in the *Domain Admins* AD group to
use `sudo` by default. *dustin* and *jenkins* need to be able to apply
configuration policy to these machines, but they are not members of said
group.
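The configuration policy now drops in something along these lines (contents illustrative):

```
# /etc/sudoers.d/ansible (illustrative)
dustin  ALL=(ALL) ALL
jenkins ALL=(ALL) NOPASSWD: ALL
```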
I changed the naming convention for domain controller machines. They
are no longer "numbered," since the plan is to rotate through them
quickly. For each release of Fedora, we'll create two new domain
controllers, replacing the existing ones. Their names are now randomly
generated and contain letters and numbers, so the Blackbox Exporter
check for DNS records needs to account for this.
Gitea package names (e.g. OCI images, etc.) can contain `/` characters.
These are encoded as `%2F` in request paths. Apache needs to forward
these sequences to the Gitea server without decoding them.
Unfortunately, the `AllowEncodedSlashes` setting, which controls this
behavior, is a per-virtualhost setting that is *not* inherited from the
main server configuration, and therefore must be explicitly set inside
the `VirtualHost` block. This means Gitea needs its own virtual host
definition, and cannot rely on the default virtual host.
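A sketch of the relevant bits of the dedicated virtual host (the server name and backend address are illustrative):

```apache
<VirtualHost *:443>
    ServerName git.example.org
    # Pass %2F through to Gitea without decoding it
    AllowEncodedSlashes NoDecode
    # nocanon keeps Apache from re-encoding or normalizing the path
    ProxyPass / http://127.0.0.1:3000/ nocanon
    ProxyPassReverse / http://127.0.0.1:3000/
</VirtualHost>
```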
When I added the *systemd-networkd* configuration for the Kubernetes
network interface on the VM hosts, I only added the `.netdev`
configuration and forgot the `.network` part. Without the latter,
*systemd-networkd* creates the interface, but does not configure or
activate it, so it is not able to handle traffic for the VMs attached to
the bridge.
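For reference, a minimal `.network` unit is enough to get *systemd-networkd* to activate the bridge; something like this (the interface name is illustrative):

```
# /etc/systemd/network/kube.network (name illustrative)
[Match]
Name=kube0

[Network]
# The host does not need an address on this bridge;
# it only carries traffic for the attached VMs.
LinkLocalAddressing=no
```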
The *netboot/jenkins-agent* Ansible role configures three NBD exports:
* A single, shared, read-only export containing the Jenkins agent root
filesystem, as a SquashFS filesystem
* For each defined agent host, a writable data volume for Jenkins
workspaces
* For each defined agent host, a writable data volume for Docker
Agent hosts must have some kind of unique value to identify their
persistent data volumes. Raspberry Pi devices, for example, can use the
SoC serial number.
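The resulting `nbd-server` configuration might look roughly like this (export names, paths, and the serial-number suffix are all illustrative):

```
[generic]

# Shared, read-only root filesystem image
[jenkins-agent]
exportname = /srv/netboot/jenkins-agent.squashfs
readonly = true

# Per-host writable volumes, keyed by the Raspberry Pi serial number
[workspace-10000000deadbeef]
exportname = /srv/netboot/workspace-10000000deadbeef.img

[docker-10000000deadbeef]
exportname = /srv/netboot/docker-10000000deadbeef.img
```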
The `-external.url` and `-external.alert.source` command line arguments
and their corresponding environment variables can be used to configure
the "Source" links associated with alerts created by `vmalert`.
The firewall hardware is too slow to run the *prometheus_speedtest*
program. It always showed *way* lower speeds than were actually
available. I've moved the service to the Kubernetes cluster and it
works a lot better there.
*mtrcs0.pyrocufflink.red* is a Raspberry Pi CM4 on a Waveshare
CM4-IO-BASE-B carrier board with an NVMe SSD. It runs a custom OS built
using Buildroot, and is not a member of the *pyrocufflink.blue* AD
domain.
*mtrcs0.p.r* hosts Victoria Metrics/`vmagent`, `vmalert`, AlertManager,
and Grafana. I've created a unique group and playbook for it,
*metricspi*, to manage all these applications together.