configpolicy

dustin

Author	SHA1	Message	Date
Dustin	a399591f16	hosts: Decommission node-refrain.k.p.b I did something stupid to this machine trying to clear up its `/var/lib/containers/storage` volume and now it won't start any new pods. Killing it and replacing.	2025-06-21 17:51:06 -05:00
Dustin	025f2ddd8c	hosts: Remove VM hosts from AD domain Having the VM hosts as members of the domain has been troublesome since the very beginning. In full shutdown events, it's often difficult or impossible to log in to the VM hosts while the domain controller VMs are down or still coming up, even with winbind caching. Now that we have the `users.yml` playbook, the SSH certificate authority, and `doas`+pam_ssh_agent_auth, we really don't need the AD domain for centralized authentication.	2025-06-08 09:04:27 -05:00
Dustin	d4d3f0ef81	r/victoria-logs: Deploy VictoriaLogs I've become rather frustrated with Grafana Loki lately. It has several bugs that affect my usage, including issues with counting and aggregation, completely broken retention and cleanup, spamming itself with bogus error log messages, and more. Now that VictoriaLogs has first-class support in Grafana and support for alerts, it seems like a good time to try it out. It's under very active development, with bugs getting fixed extremely quickly, and new features added constantly. Indeed, as I was experimenting with it, I thought, "it would be nice if the web UI could decode ANSI escapes for terminal colors," and just a few days later, that feature was added! Native support for syslog is also a huge benefit, as it will allow me to collect logs directly from network devices, without first collecting them into a file on the Unifi controller. This new role deploys VictoriaLogs in a manner very similar to how I have Loki set up, as a systemd-managed Podman container. As it has no built-in authentication or authorization, we rely on Caddy to handle that. As with Loki, mTLS is used to prevent anonymous access to querying the logs; however, authentication via Authelia is also an option for human+browser usage. I'm re-using the same certificate authority as with Loki to simplify Grafana configuration. Eventually, I would like to have a more robust PKI, probably using OpenBao, at which point I will (hopefully) have decided which log database I will be using, and can use a proper CA for it.	2025-05-30 21:19:05 -05:00
Dustin	6df0cc39da	unifi: Back up with Restic The Unifi Network data will now be backed up by Restic.	2025-03-29 09:36:37 -05:00
Dustin	78d70af574	hosts: Add Unifi controllers to needproxy group Since the network device management network does not have access to the Internet, the Unifi controller machines must access it via the proxy.	2025-03-19 07:50:52 -05:00
Dustin	db54b03aa8	r/unifi: Switching to custom container image The _linuxserver.io_ image for UniFi Network is deprecated. It sucked anyway. I've created a simple image based on Debian that installs the _unifi_ package from the upstream apt repository. This image doesn't require running anything as _root_, so it doesn't need a user namespace.	2025-03-16 16:40:57 -05:00
Dustin	c300dc1b6c	chrony: Add role/PB for chrony I continually struggle with machines' (physical and virtual, even the Roku devices!) clocks getting out of sync. I have been putting off fixing this because I wanted to set up a Windows-compatible NTP server (i.e. on the domain controllers, with Kerberos signing), but there's really no reason to wait for that to fix the clocks on all the non-Windows machines, especially since there are exactly 0 Windows machines on the network right now. The chrony role and corresponding `chrony.yml` playbook are generic, configured via the `chrony_pools`, `chrony_servers`, and `chrony_allow` variables. The values for these variables will configure the firewall to act as an NTP server, synchronizing with the NTP pool on the Internet, while all other machines will synchronize with it. This allows machines on networks without Internet access to keep their clocks in sync.	2025-03-16 16:37:19 -05:00
Dustin	5f4b1627db	hosts: Add nut1.p.b to pyrocufflink group nut1.pyrocufflink.blue is a member of the pyrocufflink.blue AD domain. I'm not sure how it got to be so without belonging to the _pyrocufflink_ Ansible group...	2025-02-25 21:03:14 -06:00
Dustin	f705e98fab	hosts: Add k8s-iot-net-ctrl group The k8s-iot-net-ctrl group is for the Raspberry Pi that has the Zigbee and Z-Wave controllers connected to it. This node runs the Zigbee2MQTT and ZWaveJS2MQTT servers as Kubernetes pods.	2025-01-31 19:49:51 -06:00
Dustin	b1c29fc12a	hosts: Remove hostvds group Since the _hostvds_ group is not defined in the static inventory but by the OpenStack inventory plugin via `hostvds.openstack.yml`, when the static inventory is used by itself, Ansible fails to load it with an error: > Section [vps:children] includes undefined group: hostvds To fix this, we could explicitly define an empty _hostvds_ group in the static inventory, but since we aren't currently running any HostVDS instances, we might as well just get rid of it.	2025-01-31 19:45:58 -06:00
Dustin	ec4fa25bd8	Merge remote-tracking branch 'refs/remotes/origin/master'	2025-01-30 21:15:40 -06:00
Dustin	c00d6f49de	hosts: Add OVH VPS It turns out, $0.99/mo might be _too_ cheap for a cloud server. Running the Blackbox Exporter+vmagent on the HostVDS instance worked for a few days, but then it started having frequent timeouts when probing the websites. I tried redeploying the instance, switching to a larger instance, and moving it to different networks. Unfortunately, none of this seemed to help. Switching over to a VPS running in OVH cloud. OVH VPS servers are managed statically, as opposed to via API, so we can't use Pulumi to create them. This one was created for me when I signed up for an OVH account.	2025-01-26 13:08:59 -06:00
Dustin	33f315334e	users: Configure sudo on some machines `doas` is not available on Alma Linux, so we still have to use `sudo` on the VPS.	2025-01-26 13:08:59 -06:00
Dustin	ad0bd7d4a5	remote-blackbox: Add group The _remote-blackbox_ group defines a system that runs _blackbox-exporter_ and _vmagent_ in a remote (cloud) location. This system will monitor our public web sites. This will give a better idea of their availability from the perspective of a user on the Internet, which can be affected by factors that are not necessarily visible from within the network.	2025-01-26 13:08:59 -06:00
Dustin	f5bee79bac	hosts: Decommission bw0.p.b Vaultwarden is now hosted in Kubernetes.	2025-01-10 20:09:53 -06:00
Dustin	d993d59bee	Deploy new Kubernetes nodes The stor- nodes are dedicated to Longhorn replicas. The other nodes handle general workloads.	2024-11-24 10:33:21 -06:00
Dustin	0f600b9e6e	kubernetes: Manage worker nodes So far, I have been managing Kubernetes worker nodes with Fedora CoreOS Ignition, but I have decided to move everything back to Fedora and Ansible. I like the idea of an immutable operating system, but the FCOS implementation is not really what I want. I like the automated updates, but that can be accomplished with _dnf-automatic_. I do _not_ like giving up control of when to upgrade to the next Fedora release. Mostly, I never did come up with a good way to manage application-level configuration on FCOS machines. None of my experiments (Cue+tmpl, KCL+etcd+Luci) were successful, which mostly resulted in my manually managing configuration on nodes individually. Managing OS-level configuration is also rather cumbersome, since it requires redeploying the machine entirely. Altogether, I just don't think FCOS fits with my model of managing systems. This commit introduces a new playbook, `kubernetes.yml`, and a handful of new roles to manage Kubernetes worker nodes running Fedora Linux. It also adds two new deploy scripts, `k8s-worker.sh` and `k8s-longhorn.sh`, which fully automate the process of bringing up worker nodes.	2024-11-24 10:33:21 -06:00
Dustin	a82700a257	chromie: Configure serial terminal server	2024-11-10 13:15:08 -06:00
Dustin	010f652060	hosts: Add loki1.p.b _loki1.pyrocufflink.blue_ replaces _loki0.pyrocufflink.blue_. The former runs Fedora Linux and is managed by Ansible, while the latter ran Fedora CoreOS and was managed by Ignition and _cfg_.	2024-11-05 06:54:27 -06:00
Dustin	4cd983d5f4	loki: Add role+playbook for Grafana Loki The current Grafana Loki server, loki0.pyrocufflink.blue, runs Fedora CoreOS and is managed by Ignition and cfg. Since I have declared cfg a failed experiment, I'm going to re-deploy Loki on a new VM running Fedora Linux and managed by Ansible. The loki role installs Podman and defines a systemd-managed container to run Grafana Loki.	2024-10-20 12:10:55 -05:00
Dustin	ceaef3f816	hosts: Decommission burp1.p.b Everything has finally been moved to Chromie.	2024-10-13 17:52:48 -05:00
Dustin	5ced24f2be	hosts: Decommission matrix0.p.b The Synapse server hasn't been working for a while, but we don't use it for anything any more anyway.	2024-10-13 12:53:49 -05:00
Dustin	621f82c88d	hosts: Migrate remaining hosts to Restic Gitea and Vaultwarden both have SQLite databases. We'll need to add some logic to ensure these are in a consistent state before beginning the backup. Fortunately, neither of them are very busy databases, so the likelihood of an issue is pretty low. It's definitely more important to get backups going again sooner, and we can deal with that later.	2024-09-07 20:45:24 -05:00
Dustin	c2c283c431	nextcloud: Back up Nextcloud with Restic Now that the database is hosted externally, we don't have to worry about backing it up specifically. Restic only backs up the data on the filesystem.	2024-09-04 17:41:42 -05:00
Dustin	0f4dea9007	restic: Add role+playbook for Restic backups The `restic.yml` playbook applies the _restic_ role to hosts in the _restic_ group. The _restic_ role installs `restic` and creates a systemd timer and service unit to run `restic backup` every day. Restic doesn't really have a configuration file; all its settings are controlled either by environment variables or command-line options. Some options, such as the list of files to include in or exclude from backups, take paths to files containing the values. We can make use of these to provide some configurability via Ansible variables. The `restic_env` variable is a map of environment variables and values to set for `restic`. The `restic_include` and `restic_exclude` variables are lists of paths/patterns to include and exclude, respectively. Finally, the `restic_password` variable contains the password to decrypt the repository contents. The password is written to a file and exposed to the _restic-backup.service_ unit using [systemd credentials][0]. When using S3 or a compatible service for repository storage, Restic of course needs authentication credentials. These can be set using the `restic_aws_credentials` variable. If this variable is defined, it should be a map containing the `aws_access_key_id` and `aws_secret_access_key` keys, which will be written to an AWS shared credentials file. This file is then exposed to the _restic-backup.service_ unit using [systemd credentials][0]. [0]: https://systemd.io/CREDENTIALS/	2024-09-04 09:40:29 -05:00
Dustin	708bcbc87e	Merge remote-tracking branch 'refs/remotes/origin/master'	2024-09-03 17:18:18 -05:00
Dustin	a0378feda8	nextcloud: Move database to db0 Moving the Nextcloud database to the central PostgreSQL server will allow it to take advantage of the monitoring and backups in place there. For backups specifically, this will make it easier to switch from BURP to Restic, since now only the contents of the filesystem need backed up. The PostgreSQL server on _db0_ requires certificate authentication for all clients. The certificate for Nextcloud is stored in a Secret in Kubernetes, so we need to use the _nextcloud-db-cert_ role to install the script to fetch it. Nextcloud configuration doesn't expose the parameters for selecting the certificate and private key files, but fortunately, they can be encoded in the value provided to the `host` parameter, though it makes for a rather cumbersome value.	2024-09-02 21:03:33 -05:00
Dustin	d3a09a2e88	hosts: Add chromie, nvr2 to nut-monitor group Deploy `nut-monitor` on these physical machines so they will shut down safely in the event of a power outage.	2024-09-01 18:52:33 -05:00
Dustin	db74e9ac3f	btop: Install btop and run it on the console `btop` is so much better than `top`. It makes a really nice status indicator for machine health, so I like running it on tty1.	2024-09-01 09:24:53 -05:00
Dustin	fbf587414a	hosts: Add chromie.p.b chromie.pyrocufflink.blue will replace burp1.pyrocufflink.blue as the backup server. It is running on the hardware that was originally nvr1.pyrocufflink.blue: a 1U Jetway server with an Intel Celeron N3160 CPU and 4 GB of RAM.	2024-09-01 09:01:04 -05:00
Dustin	9d60ae1a61	minio-backups: Deploy MinIO for backups This playbook uses the minio-nginx and minio-backups-cert roles to deploy MinIO with nginx. The S3 API server is s3.backups.pyrocufflink.blue, and buckets can be accessed as subdomains of this name. The Admin Console is minio.backups.pyrocufflink.blue. Certificates are issued by DCH CA via ACME using `certbot`.	2024-09-01 08:59:28 -05:00
Dustin	2a110d7aba	hosts: Deploy haproxy0 _haproxy0.pyrocufflink.blue_ is a Fedora Linux VM that runs HAProxy to provide a reverse proxy, exposing web sites and applications to the Internet. It has a static MAC address because it will need a static IP address, at least initially, in order for DNAT to work.	2024-08-24 11:46:40 -05:00
Dustin	aab581e859	hosts: Move VM hosts from hosts.offline Originally, the VM hosts were in a separate inventory so they would not be managed with the rest of the servers. It used to be that one server was running all the VMs, while the other was asleep. That's no longer the case; both are always running and each has about half of the VMs. Since they're both always online, they can be managed normally now.	2024-08-23 09:33:29 -05:00
Dustin	6e5e12f8b6	hosts: Add nvr2.p.b to collectd-sensors group To enable collecting temperature et al. sensor data.	2024-08-14 20:26:11 -05:00
Dustin	d2b3b1f7b3	hosts: Deploy production Frigate on nvr2.p.b nvr2.pyrocufflink.blue originally ran Fedora CoreOS. Since I'm tired of the tedium and difficulty involved in making configuration changes to FCOS machines, I am migrating it to Fedora Linux, managed by Ansible.	2024-08-12 22:22:50 -05:00
Dustin	7b61a7da7e	r/useproxy: Configure system-wide proxy The useproxy role configures the `http_proxy` et al. environment variables for systemd services and interactive shells. Additionally, it configures Yum repositories to use a single mirror via the `baseurl` setting, rather than a list of mirrors via `metalink`, since a) the proxy only allows access to _dl.fedoraproject.org_ and b) the proxy caches RPM files, but this is only effective if all clients use the same mirror all the time. The `useproxy.yml` playbook applies this role to servers in the needproxy group.	2024-08-12 18:47:04 -05:00
Dustin	2ce211b5ea	hosts: Add db0.p.b db0.pyrocufflink.blue will be the primary server in the new PostgreSQL database cluster. We're starting with Fedora 39 so we can have PostgreSQL 15, to match the version managed by the Postgres Operator in the Kubernetes cluster right now.	2024-07-02 20:44:29 -05:00
Dustin	208fadd2ba	postgresql: Configure for dedicated DB servers I am going to use the postgresql group for the dedicated database servers. The configuration for those machines will be quite a bit different than for the one existing machine that is a member of that group already: the Nextcloud server. Rather than undefine/override all the group-level settings at the host level, I have removed the Nextcloud server from the postgresql group, and updated the `nextcloud.yml` playbook to apply the postgresql-server role itself. Eventually, I want to move the Nextcloud database to the central database servers. At that point, I will remove the postgresql-server role from the `nextcloud.yml` playbook.	2024-07-02 20:44:29 -05:00
Dustin	332ef18600	hosts: Decommission old Kubernetes workers k8s-amd64-n0.pyrocufflink.blue, k8s-amd64-n1.pyrocufflink.blue, and k8s-amd64-n2.pyrocufflink.blue, which ran Fedora Linux, have been replaced by k8s-amd64-n4.pyrocufflink.blue, k8s-amd64-n5.pyrocufflink.blue, and k8s-amd64-n6.pyrocufflink.blue, respectively. The new machines run Fedora CoreOS, and are thus not managed by the Ansible configuration policy.	2024-06-23 10:43:15 -05:00
Dustin	afcd2f2f05	hosts: Replace domain controllers New AD DC servers run Fedora 40. Their LDAP server certificates are issued by step-ca via ACME, signed by dch-ca r2. I've changed the naming convention for domain controllers again. I found the random sequence of characters to be too difficult to remember and identify. Using a short random word (chosen from the EFF word list used by Diceware) should be a lot nicer. These names are chosen by the `create-dc.sh` script.	2024-06-12 19:01:37 -05:00
Dustin	5a9b8b178a	hosts: Decommission unifi1 unifi1.pyrocufflink.blue is being replaced with unifi2.pyrocufflink.blue. The new server runs Fedora CoreOS.	2024-05-26 10:50:32 -05:00
Dustin	226a9e05fa	nut: Drop group NUT is managed by _cfg.git_ now.	2024-02-22 10:24:16 -06:00
Dustin	493663e77f	frigate: Drop group Frigate is no longer managed by Ansible. Dropping the group so the file encrypted with Ansible Vault can go away.	2024-02-22 10:23:19 -06:00
Dustin	fdc59fe73b	pyrocufflink-dns: Drop group The internal DNS server for the pyrocufflink.blue et al. domains runs on the firewall now, and is thus no longer managed by Ansible. Dropping the group variables so the file encrypted with Ansible Vault can go away.	2024-02-22 10:23:19 -06:00
Dustin	f9f8d5aa29	Remove grafana, metricspi groups With the Metrics Pi decommissioned and Victoria Metrics and Grafana running in Kubernetes now, these groups are no longer needed.	2024-02-22 10:23:19 -06:00
Dustin	13e6433fff	hosts: Remove logs0.p.b Decommissioning Graylog	2024-02-13 16:12:20 -06:00
Dustin	2e77502a2f	hosts: Decommission serial0.p.b serial0.pyrocufflink.blue has been replaced by serial1.pyrocufflink.blue. The latter runs Fedora CoreOS and is managed by the CUE-based configuration policy in cfg.git.	2024-01-25 20:22:00 -06:00
Dustin	423951bac1	{burp1, gw1}: Configure upsmon	2024-01-19 21:55:36 -06:00
Dustin	f31018f514	hosts: Remove serial0 from nut group nut0.pyrocufflink.blue is the new NUT server. It's not managed by this configuration policy.	2024-01-16 17:41:50 -06:00
Dustin	1226f1f005	hosts: Decommission mtrcs0.p.b The Metrics Pi has bit the dust. The NVMe disk has never been particularly reliable, but now it's gotten to the point where it's a real issue. The Pi needs rebooted at least once a day. I've moved the Victoria Metrics/Grafana ecosystem to Kubernetes.	2023-12-31 19:15:55 -06:00


200 Commits (6d1442faf0b4968845bd279f9a1632ff2a71ad0f)