configpolicy

Author	SHA1	Message	Date
Dustin C. Hatch	7b61a7da7e	r/useproxy: Configure system-wide proxy The useproxy role configures the `http_proxy` et al. environment variables for systemd services and interactive shells. Additionally, it configures Yum repositories to use a single mirror via the `baseurl` setting, rather than a list of mirrors via `metalink`, since a) the proxy only allows access to _dl.fedoraproject.org_ and b) the proxy caches RPM files, but this is only effective if all clients use the same mirror all the time. The `useproxy.yml` playbook applies this role to servers in the needproxy group.	2024-08-12 18:47:04 -05:00
Dustin C. Hatch	96bc8c2c09	vm-hosts: Update autostart list k8s-amd64-n0, k8s-amd64-n1, and k8s-amd64-n2 have been replaced by k8s-amd64-n4, k8s-amd64-n5, k8s-amd64-n6, respectively. db0 is the new database server, which needs to be up before anything in Kubernetes starts, since a lot of applications running there depend on it.	2024-07-03 08:52:15 -05:00
Dustin C. Hatch	4f202c55e4	r/postgres-exporter: Deploy postgres-exporter The [postgres-exporter][0] exposes PostgreSQL server statistics to Prometheus. It connects to a specified PostgreSQL server (in this case, a server on the local machine via UNIX socket) and collects data from the `pg_stat_activity`, et al. views. It needs the `pg_monitor` role in order to be allowed to read the relevant metrics. Since we're setting up the exporter to connect via UNIX socket, it needs a dedicated OS user to match the PostgreSQL user in order to authenticate via the _peer_ method. [0]: https://github.com/prometheus-community/postgres_exporter/	2024-07-02 20:44:29 -05:00
Dustin C. Hatch	3f5550ee6c	postgresql: wal-g: Set PGHOST By default, WAL-G tries to connect to the PostgreSQL server via TCP socket on the loopback interface. Our HBA configuration requires certificate authentication for TCP sockets, so we need to configure WAL-G to use the UNIX socket.	2024-07-02 20:44:29 -05:00
Dustin C. Hatch	6caf28259e	hosts: db0: Promote to primary All data have been migrated from the PostgreSQL server in Kubernetes and the three applications that used it (Firefly-III, Authelia, and Home Assistant) have been updated to point to the new server. To avoid comingling the backups from the old server with those from the new server, we're reconfiguring WAL-G to push and pull from a new S3 prefix.	2024-07-02 20:44:29 -05:00
Dustin C. Hatch	208fadd2ba	postgresql: Configure for dedicated DB servers I am going to use the postgresql group for the dedicated database servers. The configuration for those machines will be quite a bit different than for the one existing machine that is a member of that group already: the Nextcloud server. Rather than undefine/override all the group-level settings at the host level, I have removed the Nextcloud server from the postgresql group, and updated the `nextcloud.yml` playbook to apply the postgresql-server role itself. Eventually, I want to move the Nextcloud database to the central database servers. At that point, I will remove the postgresql-server role from the `nextcloud.yml` playbook.	2024-07-02 20:44:29 -05:00
Dustin C. Hatch	7201f7ed5c	vm-hosts: Expose storage VLAN to VMs To improve the performance of persistent volumes accessed directly from the Synology by Kubernetes pods, I've decided to expose the storage network to the Kubernetes worker node VMs. This way, iSCSI traffic does not have to go through the firewall. I chose not to use the physical interfaces that are already directly connected to the storage network for this for two reasons: 1) I like the physical separation of concerns and 2) it would add complexity to the setup by introducing a bridge on top of the existing bond.	2024-06-23 10:43:15 -05:00
Dustin C. Hatch	6520b86958	k8s-controller: Do not reboot after auto-updates I don't want the Kubernetes control plane servers rebooting themselves randomly; I need to coordinate that with other goings-on on the network.	2024-06-23 10:43:15 -05:00
Dustin C. Hatch	f0445ebe53	nextcloud: Do not auto-update Nextcloud Nextcloud usually (always?) wants the `occ upgrade` command to be run after an update. If the nextcloud package gets updated along with the rest of the OS, Nextcloud will be down until I manually run that command hours/days later.	2024-06-23 10:43:15 -05:00
Dustin C. Hatch	24bf145a34	all: Do not auto-update on weekends I don't want machines updating themselves, rebooting, and potentially breaking stuff over the weekend.	2024-06-21 22:08:03 -05:00
Dustin C. Hatch	88c45e22b6	vm-hosts: Update VM autostart for new DCs	2024-06-20 18:49:04 -05:00
Dustin C. Hatch	292ab4585c	all: promtail: Update trusted CA certificate Loki uses a certificate signed by dch-ca r2 now (actually has for quite some time...)	2024-06-12 18:57:01 -05:00
Dustin C. Hatch	ffe972d79b	r/samba-cert: Obtain LDAP/TLS cert via ACME The samba-cert role configures `lego` and HAProxy to obtain an X.509 certificate via the ACME HTTP-01 challenge. HAProxy is necessary because LDAP server certificates need to have the apex domain in their SAN field, and the ACME server may contact any domain controller server with an A record for that name. HAProxy will forward the challenge request on to the first available host on port 5000, where `lego` is listening to provide validation. Issuing certificates this way has a couple of advantages: 1. No need for the wildcard certificate for the pyrocufflink.blue domain any more 2. Renewals are automatic and handled by the server itself rather than Ansible via scheduled Jenkins job Item (2) is particularly interesting because it avoids the bi-monthly issue where replacing the LDAP server certificate and restarting Samba causes the Jenkins job to fail. Naturally, for this to work correctly, all LDAP client applications need to trust the certificates issued by the ACME server, in this case DCH Root CA R2.	2024-06-12 18:33:24 -05:00
Dustin C. Hatch	58972cf188	auto-updates: Install and configure dnf-automatic dnf-automatic is an add-on for `dnf` that performs scheduled, automatic updates. It works pretty much how I would want it to: triggered by a systemd timer, sends email reports upon completion, and only reboots for kernel et al. updates. In its default configuration, `dnf-automatic.timer` fires every day. I want machines to update weekly, but I want them to update on different days (so as to avoid issues if all the machines reboot at once). Thus, the _dnf-automatic_ role uses a systemd unit extension to change the schedule. The day-of-the-week is chosen pseudo-randomly based on the host name of the managed system.	2024-06-12 06:25:17 -05:00
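The hostname-based schedule described in the dnf-automatic commit above can be illustrated with a minimal Python sketch. This is not the role's actual implementation: the hash function, the `update_day` and `timer_dropin` names, and the 06:00 start time are assumptions made for illustration. Only the idea of deriving a stable weekday from the host name and overriding `OnCalendar` through a systemd drop-in comes from the commit message.

```python
import hashlib

# All seven weekdays in systemd calendar notation.
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def update_day(hostname, days=DAYS):
    """Pick a weekday that is stable for a given host name.

    Hypothetical helper: hashes the host name and maps the digest
    onto the list of candidate days, so different hosts tend to land
    on different days while each host always gets the same one.
    """
    digest = hashlib.sha256(hostname.encode("utf-8")).digest()
    return days[int.from_bytes(digest[:4], "big") % len(days)]


def timer_dropin(hostname):
    """Render a drop-in for dnf-automatic.timer (start time is assumed)."""
    day = update_day(hostname)
    # An empty OnCalendar= clears the schedule shipped with the unit
    # before the new weekly schedule is added.
    return "[Timer]\nOnCalendar=\nOnCalendar={} 06:00\n".format(day)
```

Passing a shorter `days` list (for example, Monday through Friday only) would restrict the choice to those days without changing the rest of the sketch.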
Dustin C. Hatch	1f86fa27b6	vm-hosts: Auto-start unifi2	2024-05-26 10:51:16 -05:00
Dustin C. Hatch	5a9b8b178a	hosts: Decommission unifi1 unifi1.pyrocufflink.blue is being replaced with unifi2.pyrocufflink.blue. The new server runs Fedora CoreOS.	2024-05-26 10:50:32 -05:00
Dustin C. Hatch	06b399994e	public-web: Add Tabitha's new SSH key We got Nicepage to work on Tabitha's Fedora Thinkpad, so now she'll do most of her website work on that machine.	2024-03-15 10:29:03 -05:00
Dustin C. Hatch	0578736596	unifi: Scrape logs from UniFi and device syslog The UniFi controller can act as a syslog server, receiving log messages from managed devices and writing them to files in the `logs/remote` directory under the application data directory. We can scrape these logs, in addition to the logs created by the UniFi server itself, with Promtail to get more information about what's happening on the network.	2024-02-28 19:04:30 -06:00
Dustin C. Hatch	19009bde1a	promtail: Role/Playbook to deploy Promtail Promtail is the log sending client for Grafana Loki. For traditional Linux systems, an RPM package is available from upstream, making installation fairly simple. Configuration is stored in a YAML file, so again, it's straightforward to configure via Ansible variables. Really, the only interesting step is adding the _promtail_ user, which is created by the RPM package, to the _systemd-journal_ group, so that Promtail can read the systemd journal files.	2024-02-22 19:23:31 -06:00
Dustin C. Hatch	226a9e05fa	nut: Drop group NUT is managed by _cfg.git_ now.	2024-02-22 10:24:16 -06:00
Dustin C. Hatch	493663e77f	frigate: Drop group Frigate is no longer managed by Ansible. Dropping the group so the file encrypted with Ansible Vault can go away.	2024-02-22 10:23:19 -06:00
Dustin C. Hatch	fdc59fe73b	pyrocufflink-dns: Drop group The internal DNS server for the pyrocufflink.blue et al. domains runs on the firewall now, and is thus no longer managed by Ansible. Dropping the group variables so the file encrypted with Ansible Vault can go away.	2024-02-22 10:23:19 -06:00
Dustin C. Hatch	19d833cc76	websites/d&t.com: drop obsolete formsubmit config The dustinandtabitha.com website no longer uses formsubmit (the time for RSVP has long passed). Removing the configuration so the file encrypted with Ansible Vault can go away.	2024-02-22 10:23:19 -06:00
Dustin C. Hatch	f9f8d5aa29	Remove grafana, metricspi groups With the Metrics Pi decommissioned and Victoria Metrics and Grafana running in Kubernetes now, these groups are no longer needed.	2024-02-22 10:23:19 -06:00
Dustin C. Hatch	f83cea50e9	r/ssu-user-ca: Configure sshd TrustedUserCAKeys The `TrustedUserCAKeys` setting for sshd(8) tells the server to accept any certificates signed by keys listed in the specified file. The authenticating username has to match one of the principals listed in the certificate, of course. This role is applied to all machines, via the `base.yml` playbook. Certificates issued by the user CA managed by SSHCA will therefore be trusted everywhere. This brings us one step closer to eliminating the dependency on Active Directory/Samba.	2024-02-01 18:46:40 -06:00
Dustin C. Hatch	0d30e54fd5	r/fileserver: Restrict non-administrators to SFTP Normal users do not need shell access to the file server, and certainly should not be allowed to e.g. forward ports through it. Using a `Match` block, we can apply restrictions to users who do not need administrative functionality. In this case, we restrict everyone who is not a member of the Server Admins group in the PYROCUFFLINK AD domain.	2024-02-01 10:29:32 -06:00
Dustin C. Hatch	4b8b5fa90b	pyrocufflink: Enable pam_ssh_agent_auth for sudo By default, `sudo` requires users to authenticate with their passwords before granting them elevated privileges. It can be configured to allow (some) users access to (some) privileged commands without prompting for a password (i.e. `NOPASSWD`), however this has a real security implication. Disabling the password requirement would effectively grant any program root privileges. Prompting for a password prevents malicious software from running privileged commands without the user knowing. Unfortunately, handling `sudo` authentication for Ansible is quite cumbersome. For interactive use, the `--ask-become-pass`/`-K` argument is useful, though entering the password for each invocation of `ansible-playbook` while iterating on configuration policy development is a bit tedious. For non-interactive use, though, the password of course needs to be stored somewhere. Encrypting it with Ansible Vault is one way to protect it, but it still ends up stored on disk somewhere and needs to be handled carefully. pam_ssh_agent_auth provides an acceptable solution to both issues. It is better than disabling `sudo` authentication entirely, but a lot more convenient than dealing with passwords. It uses the calling user's SSH agent to assert that the user has access to a private key corresponding to one of the authorized public keys. Using SSH agent forwarding, that private key can even exist on a remote machine. If the user does not have a corresponding private key, `sudo` will fall back to normal password-based authentication. The security of this solution is highly dependent on the client to store keys appropriately. FIDO2 keys are supported, though when used with Ansible, it is quite annoying to have to touch the token for _every task_ on _every machine_. Thus, I have created new FIDO2 keys for both my laptop and my desktop that have the `no-touch-required` option enabled. 
This means that in order to use `sudo` remotely, I still need to have my token plugged in to my computer, but I do not have to tap it every time it's used. For Jenkins, a hardware token is obviously impossible, but using a dedicated key stored as a Jenkins credential is probably sufficient.	2024-01-28 12:16:35 -06:00
Dustin C. Hatch	7b54bc4400	nut-monitor: Require both UPS to be online Unfortunately, the automatic transfer switch does not seem to work correctly. When the standby source is a UPS running on battery, it does not switch sources if the primary fails. In other words, when the power is out and both UPS are running on battery, when the first one dies, it will NOT switch to the second one. It has no trouble switching when the second source is mains power, though, which is very strange. I have tried messing with all the settings including nominal input voltage, sensitivity, and frequency tolerance, but none seem to have any effect. Since it is more important for the machines to shut down safely than it is to have an extra 10-15 minutes of runtime during an outage, the best solution for now is to configure the hosts to shut down as soon as the first UPS battery gets low. This is largely a waste of the second UPS, but at least it will help prevent data loss.	2024-01-25 21:22:04 -06:00
Dustin C. Hatch	236e6dced6	r/web/hlc: Add formsubmit config for summer signup And of course, Tabitha lost her SSH key so she had to get another one.	2024-01-23 22:04:29 -06:00
Dustin C. Hatch	07f84e7fdc	vm-hosts: Increase VM start delay after K8s Increasing the delay after starting the Kubernetes cluster to hopefully allow things to "settle down" enough that starting services on follow up VMs doesn't time out.	2024-01-22 08:35:40 -06:00
Dustin C. Hatch	6f4fb70baa	vm-hosts: Clean up vm-autostart list Start Kubernetes earlier. Start Synapse later (it takes a long time to start up and often times out when the VM hosts are under heavy load). Start SMTP relay later as it's not really needed.	2024-01-21 18:42:28 -06:00
Dustin C. Hatch	b4fcbb8095	unifi: Deploy unifi_exporter `unifi_exporter` provides Prometheus metrics for UniFi controller.	2024-01-21 16:12:29 -06:00
Dustin C. Hatch	6f5b400f4a	vm-hosts: Fix test network device name The network device for the test/pyrocufflink.red network is named `br1`. This needs to match in the systemd-networkd configuration or libvirt will not be able to attach virtual machines to the bridge.	2024-01-21 15:55:37 -06:00
Dustin C. Hatch	fb445224a0	vm-hosts: Add k8s-amd64-n3 to autostart list	2024-01-21 15:55:23 -06:00
Dustin C. Hatch	525f2b2a04	nut-monitor: Configure upsmon `upsmon` is the component of [NUT] that monitors (local or remote) UPS devices and reacts to changes in their state. Notably, it is responsible for powering down the system when there is insufficient power to the system.	2024-01-19 20:50:03 -06:00
Dustin C. Hatch	ab30fa13ca	file-servers: Set Apache ServerName Since file0.pyrocufflink.blue now hosts a couple of VirtualHosts, accessing its HTTP server by the files.pyrocufflink.blue alias no longer works, as Apache routes unknown hostnames to the first VirtualHost, rather than the global configuration. To resolve this, we must set `ServerName` to the alias.	2023-12-29 10:46:13 -06:00
Dustin C. Hatch	dfd828af08	r/ssh-host-certs: Manage SSH host certificates The ssh-host-certs role, which is now applied as part of the `base.yml` playbook and therefore applies to all managed nodes, is responsible for installing the sshca-cli package and using it to request signed SSH host certificates. The sshca-cli-systemd sub-package includes systemd units that automate the process of requesting and renewing host certificates. These units need to be enabled and provided the URL of the SSHCA service. Additionally, the SSH daemon needs to be configured to load the host certificates.	2023-11-07 21:27:02 -06:00
Dustin C. Hatch	c6f0ea9720	r/repohost: Configure Yum package repo host So it turns out Gitea's RPM package repository feature is less than stellar. Since each organization/user can only have a single repository, separating packages by OS would be extremely cumbersome. Presumably, the feature was designed for projects that only build a single RPM for each version, but most of my packages need multiple builds, as they tend to link to system libraries. Further, only the repository owner can publish to user-scoped repositories, so e.g. Jenkins cannot publish anything to a repository under my dustin account. This means I would ultimately have to create an Organization for every OS/version I need to support, and make Jenkins a member of it. That sounds tedious and annoying, so I decided against using that feature for internal packages. Instead, I decided to return to the old ways, publishing packages with `rsync` and serving them with Apache. It's fairly straightforward to set this up: just need a directory with the appropriate permissions for users to upload packages, and configure Apache to serve from it. One advantage Gitea's feature had over a plain directory is its automatic management of repository metadata. Publishers only have to upload the RPMs they want to serve, and Gitea handles generating the index, database, etc. files necessary to make the packages available to Yum/dnf. With a plain file host, the publisher would need to use `createrepo` to generate the repository metadata and upload that as well. For repositories with multiple packages, the publisher would need a copy of every RPM file locally in order for them to be included in the repository metadata. This, too, seems like it would be too much trouble to be tenable, so I created a simple automatic metadata manager for the file-based repo host. Using `inotifywatch`, the `repohost-createrepo` script watches for file modifications in the repository base directory. 
Whenever a file is added or changed, the directory containing it is added to a queue. Every thirty seconds, the queue is processed; for each unique directory in the queue, repository metadata are generated. This implementation combines the flexibility of a plain file host, supporting an effectively unlimited number of repositories with fully-configurable permissions, and the ease of publishing of a simple file upload.	2023-11-07 20:51:10 -06:00
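The queue-and-batch behaviour described for `repohost-createrepo` above can be sketched roughly as follows. This is an illustration under assumptions, not the actual script: the `queue_event` and `drain_queue` names and the example paths are hypothetical, and the `regenerate` callback stands in for whatever command the real script runs to rebuild repository metadata.

```python
import posixpath


def queue_event(queue, changed_path):
    """Record the directory containing a file that was added or changed.

    Using a set means a directory that receives many uploads in one
    interval is still queued only once.
    """
    queue.add(posixpath.dirname(changed_path))


def drain_queue(queue, regenerate):
    """Process every unique queued directory once, then empty the queue.

    `regenerate` is a caller-supplied function (hypothetical here) that
    rebuilds the repository metadata for one directory.
    """
    pending = sorted(queue)
    queue.clear()
    for directory in pending:
        regenerate(directory)
    return pending
```

In a long-running service, file events would feed `queue_event` and a timer would call `drain_queue` every thirty seconds, so a burst of uploads to one repository triggers a single metadata rebuild.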
Dustin C. Hatch	6955c4e7ad	hosts: Decommission dc-4k6s8e.p.b Replaced by dc-nrtxms.pyrocufflink.blue	2023-10-28 16:07:56 -05:00
Dustin C. Hatch	420764d795	hosts: Add dc-nrtxms.p.b New Fedora 38 Active Directory Domain Controller	2023-10-28 16:07:39 -05:00
Dustin C. Hatch	a8c184d68c	hosts: Decommission dc-ag62kz.p.b Replaced by dc-qi85ia.pyrocufflink.blue	2023-10-28 16:07:08 -05:00
Dustin C. Hatch	686817571e	smtp-relay: Switch to Fastmail AWS is going to begin charging extra for routable IPv4 addresses soon. There's really no point in having a relay in the cloud anymore anyway, since a) all outbound messages are sent via the local relay and b) no messages are sent to anyone except me.	2023-10-24 17:27:21 -05:00
Dustin C. Hatch	1b9543b88f	metricspi: alerts: Increase Frigate disk threshold We want the Frigate recording volume to be basically full at all times, to ensure we are keeping as much recording as possible.	2023-10-15 09:52:12 -05:00
Dustin C. Hatch	2f554dda72	metricspi: Scrape k8s-aarch64-n1 I've added a new Kubernetes worker node, k8s-aarch64-n1.pyrocufflink.blue. This machine is a Raspberry Pi CM4 mounted on a Waveshare CM4-IO-Base A and clipped onto the DIN rail. It's got 8 GB of RAM and 32 GB of eMMC storage. I intend to use it to build container images locally, instead of bringing up cloud instances.	2023-10-05 14:32:19 -05:00
Dustin C. Hatch	a74113d95f	metricspi: Scrape Zincati metrics from CoreOS hosts Zincati is the automatic update manager on Fedora CoreOS. It exposes Prometheus metrics for host/update statistics, which are useful to track the progress of automatic updates and identify update issues. Zincati actually exposes its metrics via a Unix socket on the filesystem. Another process, [local_exporter], is required to expose the metrics from this socket via HTTP so Prometheus can scrape them. [local_exporter]: https://github.com/lucab/local_exporter	2023-10-03 10:29:12 -05:00
Dustin C. Hatch	d7f778b01c	metricspi: Scrape metrics from k8s-aarch64-n0 collectd is now running on k8s-aarch64-n0.pyrocufflink.blue, exposing system metrics. As it is not a member of the AD domain, it has to be explicitly listed in the `scrape_collectd_extra_targets` variable.	2023-10-03 10:29:11 -05:00
Dustin C. Hatch	50f4b565f8	hosts: Remove nvr1.p.b as managed system nvr1.pyrocufflink.blue has been migrated to Fedora CoreOS. As such, it is no longer managed by Ansible; its configuration is done via Butane/Ignition. It is no longer a member of the Active Directory domain, but it does still run collectd and export Prometheus metrics.	2023-09-27 20:24:47 -05:00
Dustin C. Hatch	7a9c678ff3	burp-server: Keep more backups New retention policy: * 7 daily backups * 4 weekly backups * 12 ~monthly backups * 5 ~yearly backups	2023-07-17 16:36:37 -05:00
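The "~monthly" and "~yearly" wording in the retention policy above is consistent with tiers that are multiples of one another rather than calendar months and years. The sketch below works through that arithmetic. It assumes one backup per day and that each tier keeps every Nth backup of the tier before it; neither assumption is stated in the commit message, and the `tier_intervals` name is hypothetical.

```python
def tier_intervals(keep):
    """Return the spacing, in backups, between retained backups per tier.

    Assumes each tier keeps every Nth backup of the previous tier,
    where N is the previous tier's keep count.
    """
    intervals = []
    step = 1
    for count in keep:
        intervals.append(step)
        step *= count
    return intervals


# 7 daily, 4 weekly, 12 ~monthly, 5 ~yearly
intervals = tier_intervals([7, 4, 12, 5])
```

Under these assumptions the tiers are spaced 1, 7, 28 and 336 days apart, which would explain why the last two are only approximately monthly and yearly.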
Dustin C. Hatch	06782b03bb	vm-hosts: Update VM autostart list * dc2 has been gone for a long time, replaced by two new domain controllers * unifi0 was recently replaced by unifi1	2023-07-07 10:05:22 -05:00
Dustin C. Hatch	71a43ccf07	unifi: Deploy Unifi Network controller Since Ubiquiti only publishes Debian packages for the Unifi Network controller software, running it on Fedora has historically been nigh impossible. Fortunately, a modern solution is available: containers. The linuxserver.io project publishes a container image for the controller software, making it fairly easy to deploy on any host with an OCI runtime. I briefly considered creating my own image, since theirs must be run as root, but I decided the maintenance burden would not be worth it. Using Podman's user namespace functionality, I was able to work around this requirement anyway.	2023-07-07 10:05:01 -05:00


242 Commits