Since Ubiquiti only publishes Debian packages for the Unifi Network
controller software, running it on Fedora has historically been neigh
impossible. Fortunately, a modern solution is available: containers.
The *linuxserver.io* project publishes a container image for the
controller software, making it fairly easy to deploy on any host with an
OCI runtime. I briefly considered creating my own image, since theirs
must be run as root, but I decided the maintenance burden would not be
worth it. Using Podman's user namespace functionality, I was able to
work around this requirement anyway.
The Frigate server has a RAID array that it uses to store video
recordings. Since there have been a few occasions where the array has
suddenly stopped functioning, probably because of the cheap SATA
controller, it will be nice to get an alert as soon as the kernel
detects the problem, so as to minimize data loss.
Most of the Synapse server's state is in its SQLite database. It also
has a `media_store` directory that needs to be backed up, though.
In order to back up the SQLite database while the server is running, the
database must be in "WAL mode." By default, Synapse leaves the database
in the default "rollback journal mode," which disallows multiple
processes from accessing the database, even for read-only operations.
To change the journal mode:
```sh
sudo systemctl stop synapse
sudo -u synapse sqlite3 /var/lib/synapse/homeserver.db 'PRAGMA journal_mode=WAL;'
sudo systemctl start synapse
```
The `journal2ntfy.py` script follows the systemd journal by spawning
`journalctl` as a child process and reading from its standard output
stream. Any command-line arguments passed to `journal2ntfy` are passed
to `journalctl`, which allows the caller to specify message filters.
For any matching journal message, `journal2ntfy` sends a message via
the *ntfy* web service.
For the BURP server, we're going to use `journal2ntfy` to generate
alerts about the RAID array. When I reconnect the disk that was in the
fireproof safe, the kernel will log a message from the *md* subsystem
indicating that the resynchronization process has begun. Then, when
the disks are again in sync, it will log another message, which will
let me know it is safe to archive the other disk.
We're going to run MinIO on the BURP server to provide a backup target
for the [Postgres Operator][0]/[WAL-E][1]. Although the Postgres
Operator also supports backups via [WAL-G][2], which supports more
backup targets like SFTP, the operator does not support restoring from
those targets. As such, the best way to get fully-featured backups for
the Postgres Operator, including environment cloning, etc., is to use
S3. Since I absolutely do not want to store my backups "in the cloud,"
using MinIO seems a decent alternative. Running it on the BURP server
allows the backups to be stored and rotated along with regular system
backups.
[0]: https://github.com/zalando/postgres-operator/
[1]: https://github.com/wal-e/wal-e
[2]: https://github.com/wal-g/wal-g
I moved the metrics Pi from the red network to the blue network. I
started to get uncormfortable with the firewall changes that were
required to host a service on the red network. I think it makes the
most sense to define the red network as egress only.
I've moved handling of DNS to the border firewall instead of a dedicated
virtual machine. Originally, the VM was necessary because the UniFi
Security Gateway sucked and could not (easily) handle the complex
configuration I wanted to use. Since moving to the new firewall, this
is no longer a problem.
Having DNS on a VM is problematic when full-network outages occur, like
the one that happened on 16 August 2022. When everything starts back
up, DNS is unavailable. libvirt VM autostart does not work for machines
that have been migrated between hosts (the auto-start flag is not
migrated, and libvirt "forgets" that the VM was supposed to autostart if
it is migrated away and back). I plan to script a solution for this at
some point, but I still think it makes more sense for the firewall to
handle it. It will certainly make it come up quicker regardless.
The *pxe* role configures the TFTP and NBD stages of PXE network
booting. The TFTP server provides the files used for the boot stage,
which may either be a kernel and initramfs, or another bootloader like
SYSLINUX/PXELINUX or GRUB. The NBD server provides the root filesystem,
typically mounted by code in early userspace/initramfs.
The *pxe* role also creates a user group called *pxeadmins*. Users in
this group can publish content via TFTP; they have write-access to the
`/var/lib/tftpboot` directory.
*mtrcs0.pyrocufflink.red* is a Raspberry Pi CM4 on a Waveshare
CM4-IO-BASE-B carrier board with a NVMe SSD. It runs a custom OS built
using Buildroot, and is not a member of the *pyrocufflink.blue* AD
domain.
*mtrcs0.p.r* hosts Victoria Metrics/`vmagent`, `vmalert`, AlertManager,
and Grafana. I've created a unique group and playbook for it,
*metricspi*, to manage all these applications together.
The *victoria-metrics* role deploys a single-server instance of the
Victoria Metrics time series database server. It installs the selected
version by downloading the binary release from Github and copying it to
`/usr/local/sbin` on the managed node. Scrape configuration is optional
and can be specified with the `scrape_configs` variable.
There is no specific playbook or role for Kubernetes. All OS
configuration is done at install time via kickstart scripts, and
deploying Kubernetes itself is done (manually) using `kubeadm init` and
`kubeadm join`.
*nvr1.pyrocufflink.blue* is the new video recording server. It is a
1U rack-mounted physical machine based on the [Jetway
JBC150F596-3160-B][0] barebone system. It replaces
*nvr0.pyrocufflink.blue* in this role.
[0]: https://www.jetwaycomputer.com/JBC150F596.html
The *sensors* plugin for collectd reads temperature information from the
I²C/SMBus using *lm_sensors*. Naturally, it is only useful on physical
machines, so it is not installed or enabled by default.
Transitioning from push-based to pull-based monitoring with
Prometheus/collectd. The *write_prometheus* plugin will be installed on
all hosts, and Prometheus will be configured to scrape them directly.
Fedora 34 does not include the *ntp* package, as it has been "obsoleted
by ntpsec." Until I can create a role for *ntpsec*,
*dc2.pyrocufflink.blue* cannot be an NTP server.
*hass1.pyrocufflink.blue* and *hassdb0.pyrocufflink.blue* were part of
the old Home Assistant deployment. Everything has been migrated to
*hass2.pyrocufflink.blue*, so these machines can be decommissioned now.
*nvr0.pyrocufflink.blue* hosts Frigate. It is deployed on a separate
subnet, for two reasons:
* To avoid streaming video from the cameras through the firewall
* To prevent any hosts on the LAN except Home Assistant from
communicating with Frigate, since it does not have any kind of
authentication or access control
Frigate is an NVR that uses machine learning to detect objects on camera
in real time. It integrates with Home Assistant to expose sensors which
can be used for automation, etc.
The only official way to deploy Frigate is with a container, so we use
Podman and systemd to manage it.
*hass2.pyrocufflink.blue* is a Raspberry Pi Compute Module 4-based
system, currently mounted in a WaveShare CM4 Mini Base Board (A). With
an NVMe SSD for primary storage, it runs significantly faster than a
standard Raspberry Pi 4, and blows the old Raspberry Pi 3-based Home
Assistant deployment out of the water. It has a Zooz 700 series Z-Wave
Plus S2 USB stick and a ConBee II Zigbee USB stick attached to its USB
2.0 ports. It runs a customized Fedora Minimal distribution.
Although configuration policy is not yet available for Prometheus
itself, the `collectd.yml` playbook also uses the *prometheus* host
group. Specifically, hosts in this group are configured to receive
collectd data from other hosts and expose those data through the
`write_prometheus` plugin.
This commit introduces the *grafana* role and the corresponding
`grafana.yml` playbook. The role installs Grafana using the system
package manager, and configures the server (including LDAP
authentication).
The *synapse* role and the corresponding `synapse.yml` playbook deploy
Synapse, the reference Matrix homeserver implementation.
Deploying Synapse itself is fairly straightforward: it is packaged by
Fedora and therefore can simply be installed via `dnf` and started by
`systemd`. Making the service available on the Internet, however, is
more involved. The Matrix protocol mostly works over HTTPS on the
standard port (443), so a typical reverse proxy deployment is mostly
sufficient. Some parts of the Matrix protocol, however, involve
communication over an alternate port (8448). This could be handled by a
reverse proxy as well, but since it is a fairly unique port, it could
also be handled by NAT/port forwarding. In order to support both
deployment scenarios (as well as the hypothetical scenario wherein the
Synapse machine is directly accessible from the Internet), the *synapse*
role supports specifying an optional `matrix_tls_cert` variable. If
this variable is set, it should contain the path to a certificate file
on the Ansible control machine that will be used for the "direct"
connections (i.e. on port 8448). If it is not set, the default Apache
certificate will be used for both virtual hosts.
Synapse has a pretty extensive configuration schema, but most of the
options are set to their default values by the *synapse* role. Other
than substituting secret keys, the only exposed configuration option is
the LDAP authentication provider.
I doubt I will be using Koji much if at all any more. In preparation
for decommissioning it, I am moving the Koji inventory to hosts.offline,
to prevent Jenkins jobs from failing.