configpolicy

dustin

Author	SHA1	Message	Date
Dustin	c1dc52ac29	Merge branch 'loki'	2024-11-05 07:01:13 -06:00
Dustin	39d9985fbd	r/loki-caddy: Caddy reverse proxy for Loki Caddy handles TLS termination for Loki, automatically requesting and renewing its certificate via ACME.	2024-11-05 06:54:27 -06:00
Dustin	010f652060	hosts: Add loki1.p.b _loki1.pyrocufflink.blue_ replaces _loki0.pyrocufflink.blue_. The former runs Fedora Linux and is managed by Ansible, while the latter ran Fedora CoreOS and was managed by Ignition and _cfg_.	2024-11-05 06:54:27 -06:00
Dustin	168bfee911	r/websites: Add apps.du5t1n.xyz F-Droid repo I want to publish the _20125_ Status application to an F-Droid repository to make it easy for Tabitha to install and update. F-Droid repositories are similar to other package repositories: a collection of packages and some metadata files. Although there is a fully-fledged server-side software package that can manage F-Droid repositories, it's not required: the metadata files can be pre-generated and then hosted by a static web server just fine. This commit adds configuration for the web server and reverse proxy to host the F-Droid repository at _apps.du5t1n.xyz_.	2024-11-05 06:47:02 -06:00
Dustin	7e8aee072e	r/bitwarden_rs: Redirect to canonical host name Bitwarden has not worked correctly for clients using the non-canonical domain name (i.e. _bitwarden.pyrocufflink.blue_) for quite some time. This still trips me up occasionally, though, so hopefully adding a server-side redirect will help. Eventually, I'll probably remove the non-canonical name entirely.	2024-11-05 06:37:03 -06:00
Dustin	370a1df7ac	dch-proxy: Proxy for dynk8s-provisioner The reverse proxy needs to handle traffic for the _dynk8s-provisioner_ in order for the ephemeral Jenkins worker nodes in the cloud to work properly.	2024-11-05 06:30:02 -06:00
Dustin	4cd983d5f4	loki: Add role+playbook for Grafana Loki The current Grafana Loki server, loki0.pyrocufflink.blue, runs Fedora CoreOS and is managed by Ignition and cfg. Since I have declared cfg a failed experiment, I'm going to re-deploy Loki on a new VM running Fedora Linux and managed by Ansible. The loki role installs Podman and defines a systemd-managed container to run Grafana Loki.	2024-10-20 12:10:55 -05:00
Dustin	4ac79ba18d	minio-backups: No syslog for nginx access logs MinIO/S3 clients generate a _lot_ of requests. It's also not particularly useful to have these stored in Loki anyway. As such, we'll stop routing them to syslog/journal. Having access logs is somewhat useful for troubleshooting, but really for only live requests (i.e. what's happening right now). We therefore keep the access logs around in a file, but only for one day, so as not to fill up the filesystem with logs we'll never see.	2024-10-20 12:10:17 -05:00
Dustin	4ae25192d0	vm-hosts: Fix domain label The `__path__` label is automatically changed to `filename` before the processing pipeline begins.	2024-10-14 12:32:25 -05:00
Dustin	36145cb2ee	minio-backups: Disable nginx log files We don't need local log files when messages are already stored locally in the journal and remotely in Loki.	2024-10-14 12:00:19 -05:00
Dustin	a0c5ffc869	postgresql: Collect Wal-G metrics with statsd_exporter _wal-g_ can send StatsD metrics when it completes an upload/backup/etc. task. Using the `statsd_exporter`, we can capture these metrics and make them available to Victoria Metrics.	2024-10-13 20:01:19 -05:00
Dustin	221d3a2be9	vm-hosts: Scrape libvirt logs with Promtail Collecting logs from VM serial consoles and QEMU monitor.	2024-10-13 18:33:25 -05:00
Dustin	9bea8e1ce7	nextcloud: Scrape logs with Promtail Nextcloud writes JSON-structured logs to `/var/lib/nextcloud/data/nextcloud.log`. These logs contain errors, etc. from the Nextcloud server, which are useful for troubleshooting. Having them in Loki will allow us to view them in Grafana as well as generate alerts for certain events.	2024-10-13 18:05:50 -05:00
Dustin	5ced24f2be	hosts: Decommission matrix0.p.b The Synapse server hasn't been working for a while, but we don't use it for anything any more anyway.	2024-10-13 12:53:49 -05:00
Dustin	dfdddd551f	minio-backups: Keep nginx logs for 3 days _WAL-G_ and _restic_ both generate a lot of HTTP traffic, which fills up the log volume pretty quickly. Let's reduce the number of days logs are kept on the file system. Logs are shipped to Loki anyway, so there's not much need to have them local very long.	2024-09-29 11:21:24 -05:00
Dustin	0353360360	dch-proxy: Allow Internet access to IN Invoice Ninja needs to be accessible from the Internet in order to receive webhooks from Stripe. Additionally, Apple Pay requires contacting Invoice Ninja for domain verification.	2024-09-10 12:01:00 -05:00
Dustin	621f82c88d	hosts: Migrate remaining hosts to Restic Gitea and Vaultwarden both have SQLite databases. We'll need to add some logic to ensure these are in a consistent state before beginning the backup. Fortunately, neither of them are very busy databases, so the likelihood of an issue is pretty low. It's definitely more important to get backups going again sooner, and we can deal with that later.	2024-09-07 20:45:24 -05:00
Dustin	c2c283c431	nextcloud: Back up Nextcloud with Restic Now that the database is hosted externally, we don't have to worry about backing it up specifically. Restic only backs up the data on the filesystem.	2024-09-04 17:41:42 -05:00
Dustin	0f4dea9007	restic: Add role+playbook for Restic backups The `restic.yml` playbook applies the _restic_ role to hosts in the _restic_ group. The _restic_ role installs `restic` and creates a systemd timer and service unit to run `restic backup` every day. Restic doesn't really have a configuration file; all its settings are controlled either by environment variables or command-line options. Some options, such as the list of files to include in or exclude from backups, take paths to files containing the values. We can make use of these to provide some configurability via Ansible variables. The `restic_env` variable is a map of environment variables and values to set for `restic`. The `restic_include` and `restic_exclude` variables are lists of paths/patterns to include and exclude, respectively. Finally, the `restic_password` variable contains the password to decrypt the repository contents. The password is written to a file and exposed to the _restic-backup.service_ unit using [systemd credentials][0]. When using S3 or a compatible service for repository storage, Restic of course needs authentication credentials. These can be set using the `restic_aws_credentials` variable. If this variable is defined, it should be a map containing the `aws_access_key_id` and `aws_secret_access_key` keys, which will be written to an AWS shared credentials file. This file is then exposed to the _restic-backup.service_ unit using [systemd credentials][0]. [0]: https://systemd.io/CREDENTIALS/	2024-09-04 09:40:29 -05:00
Dustin	72936b3868	postgresql: Allow access by IPv6 Since LAN clients have IPv6 addresses now, some may try to connect to the database over IPv6, so we need to allow this in the host-based authentication rules.	2024-09-02 21:20:26 -05:00
Dustin	a0378feda8	nextcloud: Move database to db0 Moving the Nextcloud database to the central PostgreSQL server will allow it to take advantage of the monitoring and backups in place there. For backups specifically, this will make it easier to switch from BURP to Restic, since now only the contents of the filesystem need to be backed up. The PostgreSQL server on _db0_ requires certificate authentication for all clients. The certificate for Nextcloud is stored in a Secret in Kubernetes, so we need to use the _nextcloud-db-cert_ role to install the script to fetch it. Nextcloud configuration doesn't expose the parameters for selecting the certificate and private key files, but fortunately, they can be encoded in the value provided to the `host` parameter, though it makes for a rather cumbersome value.	2024-09-02 21:03:33 -05:00
Dustin	7f599e9058	dch-proxy: Proxy Jellyfin Allow access to Jellyfin from the Internet via the reverse proxy. The Jellyfin backend server has a separate port that supports the PROXY protocol.	2024-09-01 12:42:07 -05:00
Dustin	e323324c54	postgresql: Switch wal-g to use new MinIO server Switching to the MinIO server on _chromie.pyrocufflink.blue_ as _burp1.pyrocufflink.blue_ is being decommissioned.	2024-09-01 09:01:04 -05:00
Dustin	fbf587414a	hosts: Add chromie.p.b chromie.pyrocufflink.blue will replace burp1.pyrocufflink.blue as the backup server. It is running on the hardware that was originally nvr1.pyrocufflink.blue: a 1U Jetway server with an Intel Celeron N3160 CPU and 4 GB of RAM.	2024-09-01 09:01:04 -05:00
Dustin	9d60ae1a61	minio-backups: Deploy MinIO for backups This playbook uses the minio-nginx and minio-backups-cert role to deploy MinIO with nginx. The S3 API server is s3.backups.pyrocufflink.blue, and buckets can be accessed as subdomains of this name. The Admin Console is minio.backups.pyrocufflink.blue. Certificates are issued by DCH CA via ACME using `certbot`.	2024-09-01 08:59:28 -05:00
Dustin	3511176c31	r/gitea: Configure SMTP mailer Gitea needs SMTP configuration in order to send e-mail notifications about e.g. pull requests. The `gitea_smtp` variable can be defined to enable this feature.	2024-08-25 08:46:37 -05:00
Dustin	85da487cb8	r/dch-proxy: Define sites declaratively I've already made a couple of mistakes keeping the HTTP and HTTPS rules in sync. Let's define the sites declaratively and derive the HAProxy rules from the data, rather than manually type the rules.	2024-08-24 11:48:45 -05:00
Dustin	2fa28dfa5f	r/dch-proxy: Update and clean up The dch-proxy role has not been used for quite some time. The web server has been handling the reverse proxy functionality, in addition to hosting websites. The drawback to using Apache as the reverse proxy, though, is that it operates in TLS-terminating mode, so it needs to have the correct certificate for every site and application it proxies for. This is becoming cumbersome, especially now that there are several sites that do not use the _pyrocufflink.net_ wildcard certificate. Notably, Tabitha's _hatchlearningcenter.org_ is problematic because although the main site is hosted by the web server, the Invoice Ninja client portal is hosted in Kubernetes. Switching back to HAProxy to provide the reverse proxy functionality will eliminate the need to have the server certificate both on the backend and on the reverse proxy, as it can operate in TLS-passthrough mode. The main reason I stopped using HAProxy in the first place was because when using TLS-passthrough mode, the original source IP address is lost. Fortunately, HAProxy and Apache can both be configured to use the PROXY protocol, which provides a mechanism for communicating the original IP address while still passing through the TLS connection unmodified. This is particularly important for Nextcloud because of its built-in intrusion prevention; without knowing the actual source IP address, it blocks _everyone_, since all connections appear to come from the reverse proxy's IP address. Combining TLS-passthrough mode with the PROXY protocol resolves both the certificate management issue and the source IP address issue. I've cleaned up the _dch-proxy_ role quite a bit in this commit. Notably, I consolidated all the backend and frontend definitions into a single file; it didn't really make sense to have them all separate, since they were managed by the same role and referred to each other. Of course, I had to update the backends to match the currently-deployed applications as well.	2024-08-24 11:46:28 -05:00
Dustin	153b210a73	vm-hosts: Do not reboot after auto updates For obvious reasons, the VM hosts cannot automatically reboot themselves.	2024-08-23 09:33:29 -05:00
Dustin	c546f09335	smtp-relay: Rewrite dustin@hatch.name Sometimes, the mail server for hatch.name is extremely slow. While there isn't much I can do about it for external senders, I can at least ensure that email messages sent by internal services like Authelia are always delivered quickly by rewriting the recipient address to my actual email address, bypassing the hatch.name exchange entirely.	2024-08-22 16:17:00 -05:00
Dustin	a2cf78f3f5	vm-hosts: Update vm-autostart logs0.pyrocufflink.blue was replaced by loki0.pyrocufflink.blue ages ago, so I'm not sure how I hadn't updated the autostart list with it yet. unifi3.pyrocufflink.blue replaced unifi2.p.b recently, when I was testing Luci/etcd.	2024-08-14 20:26:11 -05:00
Dustin	6d65e0594f	frigate: Configure HTTPS proxy with creds Only the _frigate_ user is allowed to access the Github API via the proxy.	2024-08-14 20:26:11 -05:00
Dustin	d2b3b1f7b3	hosts: Deploy production Frigate on nvr2.p.b nvr2.pyrocufflink.blue originally ran Fedora CoreOS. Since I'm tired of the tedium and difficulty involved in making configuration changes to FCOS machines, I am migrating it to Fedora Linux, managed by Ansible.	2024-08-12 22:22:50 -05:00
Dustin	6c71d96f81	r/frigate-caddy: Deploy Caddy in front of Frigate Deploying Caddy as a reverse proxy for Frigate enables HTTPS with a certificate issued by the internal CA (via ACME) and authentication via Authelia. Separating the installation and base configuration of Caddy into its own role will allow us to reuse that part for other applications that use Caddy for similar reasons.	2024-08-12 18:47:04 -05:00
Dustin	7b61a7da7e	r/useproxy: Configure system-wide proxy The useproxy role configures the `http_proxy` et al. environment variables for systemd services and interactive shells. Additionally, it configures Yum repositories to use a single mirror via the `baseurl` setting, rather than a list of mirrors via `metalink`, since a) the proxy only allows access to _dl.fedoraproject.org_ and b) the proxy caches RPM files, but this is only effective if all clients use the same mirror all the time. The `useproxy.yml` playbook applies this role to servers in the needproxy group.	2024-08-12 18:47:04 -05:00
Dustin	96bc8c2c09	vm-hosts: Update autostart list k8s-amd64-n0, k8s-amd64-n1, and k8s-amd64-n2 have been replaced by k8s-amd64-n4, k8s-amd64-n5, k8s-amd64-n6, respectively. db0 is the new database server, which needs to be up before anything in Kubernetes starts, since a lot of applications running there depend on it.	2024-07-03 08:52:15 -05:00
Dustin	4f202c55e4	r/postgres-exporter: Deploy postgres-exporter The [postgres-exporter][0] exposes PostgreSQL server statistics to Prometheus. It connects to a specified PostgreSQL server (in this case, a server on the local machine via UNIX socket) and collects data from the `pg_stat_activity`, et al. views. It needs the `pg_monitor` role in order to be allowed to read the relevant metrics. Since we're setting up the exporter to connect via UNIX socket, it needs a dedicated OS user to match the PostgreSQL user in order to authenticate via the _peer_ method. [0]: https://github.com/prometheus-community/postgres_exporter/	2024-07-02 20:44:29 -05:00
Dustin	3f5550ee6c	postgresql: wal-g: Set PGHOST By default, WAL-G tries to connect to the PostgreSQL server via TCP socket on the loopback interface. Our HBA configuration requires certificate authentication for TCP sockets, so we need to configure WAL-G to use the UNIX socket.	2024-07-02 20:44:29 -05:00
Dustin	6caf28259e	hosts: db0: Promote to primary All data have been migrated from the PostgreSQL server in Kubernetes and the three applications that used it (Firefly-III, Authelia, and Home Assistant) have been updated to point to the new server. To avoid comingling the backups from the old server with those from the new server, we're reconfiguring WAL-G to push and pull from a new S3 prefix.	2024-07-02 20:44:29 -05:00
Dustin	208fadd2ba	postgresql: Configure for dedicated DB servers I am going to use the postgresql group for the dedicated database servers. The configuration for those machines will be quite a bit different than for the one existing machine that is a member of that group already: the Nextcloud server. Rather than undefine/override all the group-level settings at the host level, I have removed the Nextcloud server from the postgresql group, and updated the `nextcloud.yml` playbook to apply the postgresql-server role itself. Eventually, I want to move the Nextcloud database to the central database servers. At that point, I will remove the postgresql-server role from the `nextcloud.yml` playbook.	2024-07-02 20:44:29 -05:00
Dustin	7201f7ed5c	vm-hosts: Expose storage VLAN to VMs To improve the performance of persistent volumes accessed directly from the Synology by Kubernetes pods, I've decided to expose the storage network to the Kubernetes worker node VMs. This way, iSCSI traffic does not have to go through the firewall. I chose not to use the physical interfaces that are already directly connected to the storage network for this for two reasons: 1) I like the physical separation of concerns and 2) it would add complexity to the setup by introducing a bridge on top of the existing bond.	2024-06-23 10:43:15 -05:00
Dustin	6520b86958	k8s-controller: Do not reboot after auto-updates I don't want the Kubernetes control plane servers rebooting themselves randomly; I need to coordinate that with other goings-on on the network.	2024-06-23 10:43:15 -05:00
Dustin	f0445ebe53	nextcloud: Do not auto-update Nextcloud Nextcloud usually (always?) wants the `occ upgrade` command to be run after an update. If the nextcloud package gets updated along with the rest of the OS, Nextcloud will be down until I manually run that command hours/days later.	2024-06-23 10:43:15 -05:00
Dustin	24bf145a34	all: Do not auto-update on weekends I don't want machines updating themselves, rebooting, and potentially breaking stuff over the weekend.	2024-06-21 22:08:03 -05:00
Dustin	88c45e22b6	vm-hosts: Update VM autostart for new DCs	2024-06-20 18:49:04 -05:00
Dustin	292ab4585c	all: promtail: Update trusted CA certificate Loki uses a certificate signed by dch-ca r2 now (actually has for quite some time...)	2024-06-12 18:57:01 -05:00
Dustin	ffe972d79b	r/samba-cert: Obtain LDAP/TLS cert via ACME The samba-cert role configures `lego` and HAProxy to obtain an X.509 certificate via the ACME HTTP-01 challenge. HAProxy is necessary because LDAP server certificates need to have the apex domain in their SAN field, and the ACME server may contact any domain controller server with an A record for that name. HAProxy will forward the challenge request on to the first available host on port 5000, where `lego` is listening to provide validation. Issuing certificates this way has a couple of advantages: 1. No need for the wildcard certificate for the pyrocufflink.blue domain any more 2. Renewals are automatic and handled by the server itself rather than Ansible via scheduled Jenkins job Item (2) is particularly interesting because it avoids the bi-monthly issue where replacing the LDAP server certificate and restarting Samba causes the Jenkins job to fail. Naturally, for this to work correctly, all LDAP client applications need to trust the certificates issued by the ACME server, in this case DCH Root CA R2.	2024-06-12 18:33:24 -05:00
Dustin	58972cf188	auto-updates: Install and configure dnf-automatic dnf-automatic is an add-on for `dnf` that performs scheduled, automatic updates. It works pretty much how I would want it to: triggered by a systemd timer, sends email reports upon completion, and only reboots for kernel et al. updates. In its default configuration, `dnf-automatic.timer` fires every day. I want machines to update weekly, but I want them to update on different days (so as to avoid issues if all the machines reboot at once). Thus, the _dnf-automatic_ role uses a systemd unit extension to change the schedule. The day-of-the-week is chosen pseudo-randomly based on the host name of the managed system.	2024-06-12 06:25:17 -05:00
Dustin	1f86fa27b6	vm-hosts: Auto-start unifi2	2024-05-26 10:51:16 -05:00
Dustin	5a9b8b178a	hosts: Decommission unifi1 unifi1.pyrocufflink.blue is being replaced with unifi2.pyrocufflink.blue. The new server runs Fedora CoreOS.	2024-05-26 10:50:32 -05:00
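
The auto-updates commit (58972cf188) says the update day is chosen pseudo-randomly from the host name, and 24bf145a34 rules out weekends. The role's actual method is not shown in this log, so the following is only a minimal sketch of one way to do it: hash the host name and map the result onto a weekday. The SHA-256 hash, the function names, and the `04:00` start time are all assumptions made for illustration.

```python
import hashlib

# Weekdays only, since a later commit disables auto-updates on weekends.
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri"]


def update_day(hostname: str) -> str:
    """Pick a stable weekday for a host (hypothetical helper)."""
    # A cryptographic hash is used only because it is stable across runs;
    # Python's built-in hash() is randomized per process for strings.
    digest = hashlib.sha256(hostname.encode("utf-8")).digest()
    return DAYS[int.from_bytes(digest[:4], "big") % len(DAYS)]


def on_calendar(hostname: str, time: str = "04:00") -> str:
    """Build a systemd OnCalendar= expression for a weekly update.

    The default time of day is an assumption, not a value from the role.
    """
    return f"{update_day(hostname)} *-*-* {time}"


if __name__ == "__main__":
    for host in ("db0.pyrocufflink.blue", "loki1.pyrocufflink.blue"):
        print(host, "->", on_calendar(host))
```

Because the day depends only on the host name, repeated runs give the same schedule for a given machine, while different machines tend to land on different days.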


276 Commits (2d5f9e66c1c1c2b1db970b564ff0176273a1a727)