The internal DNS server for the *pyrocufflink.blue* et al. domains runs
on the firewall now, and is thus no longer managed by Ansible. Dropping
the group variables so the file encrypted with Ansible Vault can go
away.
The *dustinandtabitha.com* website no longer uses *formsubmit* (the time
for RSVP has **long** passed). Removing the configuration so the
file encrypted with Ansible Vault can go away.
The `TrustedUserCAKeys` setting for *sshd(8)* tells the server to accept
any certificates signed by keys listed in the specified file.
The authenticating username has to match one of the principals listed in
the certificate, of course.
This role is applied to all machines, via the `base.yml` playbook.
Certificates issued by the user CA managed by SSHCA will therefore be
trusted everywhere. This brings us one step closer to eliminating the
dependency on Active Directory/Samba.
Normal users do not need shell access to the file server, and certainly
should not be allowed to e.g. forward ports through it. Using a `Match`
block, we can apply restrictions to users who do not need administrative
functionality. In this case, we restrict everyone who is not a member
of the *Server Admins* group in the PYROCUFFLINK AD domain.
By default, `sudo` requires users to authenticate with their passwords
before granting them elevated privileges. It can be configured to
allow (some) users access to (some) privileged commands without
prompting for a password (i.e. `NOPASSWD`), however this has a real
security implication. Disabling the password requirement would
effectively grant *any* program root privileges. Prompting for a
password prevents malicious software from running privileged commands
without the user knowing.
Unfortunately, handling `sudo` authentication for Ansible is quite
cumbersome. For interactive use, the `--ask-become-pass`/`-K` argument
is useful, though entering the password for each invocation of
`ansible-playbook` while iterating on configuration policy development
is a bit tedious. For non-interactive use, though, the password of
course needs to be stored somewhere. Encrypting it with Ansible Vault
is one way to protect it, but it still ends up stored on disk somewhere
and needs to be handled carefully.
*pam_ssh_agent_auth* provides an acceptable solution to both issues. It
is better than disabling `sudo` authentication entirely, but a lot more
convenient than dealing with passwords. It uses the calling user's SSH
agent to assert that the user has access to a private key corresponding
to one of the authorized public keys. Using SSH agent forwarding, that
private key can even exist on a remote machine. If the user does not
have a corresponding private key, `sudo` will fall back to normal
password-based authentication.
The security of this solution is highly dependent on the client to store
keys appropriately. FIDO2 keys are supported, though when used with
Ansible, it is quite annoying to have to touch the token for _every
task_ on _every machine_. Thus, I have created new FIDO2 keys for both
my laptop and my desktop that have the `no-touch-required` option
enabled. This means that in order to use `sudo` remotely, I still need
to have my token plugged in to my computer, but I do not have to tap it
every time it's used.
For Jenkins, a hardware token is obviously impossible, but using a
dedicated key stored as a Jenkins credential is probably sufficient.
Unfortunately, the automatic transfer switch does not seem to work
correctly. When the standby source is a UPS running on battery, it does
*not* switch sources if the primary fails. In other words, when the
power is out and both UPS are running on battery, when the first one
dies, it will NOT switch to the second one. It has no trouble switching
when the second source is mains power, though, which is very strange.
I have tried messing with all the settings including nominal input
voltage, sensitivity, and frequency tolerence, but none seem to have any
effect.
Since it is more important for the machines to shut down safely than it
is to have an extra 10-15 minutes of runtime during an outage, the best
solution for now is to configure the hosts to shut down as soon as the
first UPS battery gets low. This is largely a waste of the second UPS,
but at least it will help prevent data loss.
Increasing the delay after starting the Kubernetes cluster to hopefully
allow things to "settle down" enough that starting services on follow up
VMs doesn't time out.
Start Kubernetes earlier. Start Synapse later (it takes a long time to
start up and often times out when the VM hosts are under heavy load).
Start SMTP relay later as it's not really needed.
The network device for the test/*pyrocufflink.red* network is named
`br1`. This needs to match in the systemd-networkd configuration or
libvirt will not be able to attach virtual machines to the bridge.
`upsmon` is the component of [NUT] that monitors (local or remote) UPS
devices and reacts to changes in their state. Notably, it is
responsible for powering down the system when there is insufficient
power to the system.
Since *file0.pyrocufflink.blue* now hosts a couple of VirtualHosts,
accessing its HTTP server by the *files.pyrocufflink.blue* alias no
longer works, as Apache routes unknown hostnames to the first
VirtualHost, rather than the global configuration. To resolve this, we
must set `ServerName` to the alias.
The *ssh-host-certs* role, which is now applied as part of the
`base.yml` playbook and therefore applies to all managed nodes, is
responsible for installing the *sshca-cli* package and using it to
request signed SSH host certificates. The *sshca-cli-systemd*
sub-package includes systemd units that automate the process of
requesting and renewing host certificates. These units need to be
enabled and provided the URL of the SSHCA service. Additionally, the
SSH daemon needs to be configured to load the host certificates.
So it turns out Gitea's RPM package repository feature is less than
stellar. Since each organization/user can only have a single
repository, separating packages by OS would be extremely cumbersome.
Presumably, the feature was designed for projects that only build a
single PRM for each version, but most of my packages need multiple
builds, as they tend to link to system libraries. Further, only the
repository owner can publish to user-scoped repositories, so e.g.
Jenkins cannot publish anything to a repository under my *dustin*
account. This means I would ultimately have to create an Organization
for every OS/version I need to support, and make Jenkins a member of it.
That sounds tedious and annoying, so I decided against using that
feature for internal packages.
Instead, I decided to return to the old ways, publishing packages with
`rsync` and serving them with Apache. It's fairly straightforward to
set this up: just need a directory with the appropriate permissions for
users to upload packages, and configure Apache to serve from it.
One advantage Gitea's feature had over a plain directory is its
automatic management of repository metadata. Publishers only have to
upload the RPMs they want to serve, and Gitea handles generating the
index, database, etc. files necessary to make the packages available to
Yum/dnf. With a plain file host, the publisher would need to use
`createrepo` to generate the repository metadata and upload that as
well. For repositories with multiple packages, the publisher would need
a copy of every RPM file locally in order for them to be included in the
repository metadata. This, too, seems like it would be too much trouble
to be tenable, so I created a simple automatic metadata manager for the
file-based repo host. Using `inotifywatch`, the `repohost-createrepo`
script watches for file modifications in the repository base directory.
Whenever a file is added or changed, the directory containing it is
added to a queue. Every thirty seconds, the queue is processed; for
each unique directory in the queue, repository metadata are generated.
This implementation combines the flexibility of a plain file host,
supporting an effectively unlimited number of repositories with
fully-configurable permissions, and the ease of publishing of a simple
file upload.
AWS is going to begin charging extra for routable IPv4 addresses soon.
There's really no point in having a relay in the cloud anymore anyway,
since a) all outbound messages are sent via the local relay and b) no
messages are sent to anyone except me.
I've added a new Kubernetes worker node,
*k8s-aarch64-n1.pyrocufflink.blue*. This machine is a Raspberry Pi CM4
mounted on a Waveshare CM4-IO-Base A and clipped onto the DIN rail.
It's got 8 GB of RAM and 32 GB of eMMC storage. I intend to use it to
build container images locally, instead of bringing up cloud instances.
Zincati is the automatic update manager on Fedora CoreOS. It exposes
Prometheus metrics for host/update statistics, which are useful to track
the progress of automatic updates and identify update issues.
Zinciti actually exposes its metrics via a Unix socket on the
filesystem. Another process, [local_exporter], is required to expose
the metrics from this socket via HTTP so Prometheus can scrape them.
[local_exporter]: https://github.com/lucab/local_exporter
*collectd* is now running on *k8s-aarch64-n0.pyrocufflink.blue*,
exposing system metrics. As it is not a member of the AD domain, it has
to be explicitly listed in the `scrape_collectd_extra_targets` variable.
*nvr1.pyrocufflink.blue* has been migrated to Fedora CoreOS. As such,
it is no longer managed by Ansible; its configuration is done via
Butane/Ignition. It is no longer a member of the Active Directory
domain, but it does still run *collectd* and export Prometheus metrics.
Since Ubiquiti only publishes Debian packages for the Unifi Network
controller software, running it on Fedora has historically been neigh
impossible. Fortunately, a modern solution is available: containers.
The *linuxserver.io* project publishes a container image for the
controller software, making it fairly easy to deploy on any host with an
OCI runtime. I briefly considered creating my own image, since theirs
must be run as root, but I decided the maintenance burden would not be
worth it. Using Podman's user namespace functionality, I was able to
work around this requirement anyway.
Sometimes I need to connect to a machine when there is an AD issue (e.g.
domain controllers are down, clocks are out of sync, etc.) but I can't
do it from my desktop.
When the RAID array is being resynchronized after the archived disk has
been reconnected, md changes the disk status from "missing" to "spare."
Once the synchronization is complete, it changes from "spare" to
"active." We only want to trigger the "disk needs archived" alert once
the synchronization process is complete; otherwise, both the "disks need
swapped" and "disk needs archived" alerts would be active at the same
time, which makes no sense. By adjusting the query for the "disk needs
archived" alert to consider disks in both "missing" and "spare" status,
we can delay firing that alert until the proper time.
The Frigate server has a RAID array that it uses to store video
recordings. Since there have been a few occasions where the array has
suddenly stopped functioning, probably because of the cheap SATA
controller, it will be nice to get an alert as soon as the kernel
detects the problem, so as to minimize data loss.
Most of the Synapse server's state is in its SQLite database. It also
has a `media_store` directory that needs to be backed up, though.
In order to back up the SQLite database while the server is running, the
database must be in "WAL mode." By default, Synapse leaves the database
in the default "rollback journal mode," which disallows multiple
processes from accessing the database, even for read-only operations.
To change the journal mode:
```sh
sudo systemctl stop synapse
sudo -u synapse sqlite3 /var/lib/synapse/homeserver.db 'PRAGMA journal_mode=WAL;'
sudo systemctl start synapse
```
Kubernetes exports a *lot* of metrics in Prometheus format. I am not
sure what all is there, yet, but apparently several thousand time series
were added.
To allow anonymous access to the metrics, I added this RoleBinding:
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: prometheus
rules:
- apiGroups:
- ""
resources:
- nodes/metrics
verbs:
- get
- nonResourceURLs:
- /metrics
verbs:
- get
```
MinIO exposes metrics in Prometheus exposition format. By default, it
requires an authentication token to access the metrics, but I was unable
to get this to work. Fortunately, it can be configured to allow
anonymous access to the metrics, which is fine, in my opinion.
The `journal2ntfy.py` script follows the systemd journal by spawning
`journalctl` as a child process and reading from its standard output
stream. Any command-line arguments passed to `journal2ntfy` are passed
to `journalctl`, which allows the caller to specify message filters.
For any matching journal message, `journal2ntfy` sends a message via
the *ntfy* web service.
For the BURP server, we're going to use `journal2ntfy` to generate
alerts about the RAID array. When I reconnect the disk that was in the
fireproof safe, the kernel will log a message from the *md* subsystem
indicating that the resynchronization process has begun. Then, when
the disks are again in sync, it will log another message, which will
let me know it is safe to archive the other disk.
This alert will fire once the MD RAID resynchronization process has
completed and both disks in the array are online. It will clear when
one disk is disconnected and moved to the safe.
When BURP fails to even *start* a backup, it does not trigger a
notification at all. As a result, I may not notice for a few days when
backups are not happening. That was the case this week, when clients'
backups were failing immediately, because of a file permissions issue on
the server. To hopefully avoid missing backups for too long in the
future, I've added two new alerts:
* The *no recent backups* alert fires if there have not been *any* BURP
backups recently. This may also fire, for example, if the BURP
exporter is not working, or if there is something wrong with the BURP
data volume.
* The *missed client backup* alert fires if an active BURP client (i.e.
one that has had at least one backup in the past 90 days) has not been
backed up in the last 24 hours.
Using a 30-day window for the `tlast_change_over_time` function
effectively "caps out" the value at 30 days. Thus, the alert reminding
me to swap the BURP backup volume will never fire, since the value will
never be greater than the 30-day threshold. Using a wider window
resolves that issue (though the query will still produce inaccurate
results beyond the window).
The `tls cafile` setting in `smb.conf` is not necessary. It is used for
verifying peer certificates for mutual TLS authentication, not to
specify the intermediate certificate authority chain like I thought.
The setting cannot simply be left out, though. If it is not specified,
Samba will attempt to load a file from a built-in default path, which
will fail, causing the server to crash. This is avoided by setting the
value to the empty string.
The `tlast_change_over_time` function needs an interval wide enough to
consider the range of time we are intrested in. In this case, we want
to see if the BURP volume has been swapped in the last thirty days, so
the interval needs to be `30d`.
*dc0.p.b* has been gone for a while now. All the current domain
controllers use LDAPS certificates signed by Let's Encrypt and include
the *pyrocufflink.blue* name, so we can now use the apex domain A record
to connect to the directory.
This alert counts how long its been since the number of "active" disks
in the RAID array on the BURP server has changed. The assumption is
that the number will typically be `1`, but it will be `2` when the
second disk synchronized before the swap occurs.
1. Grafana 8 changed the format of the query string parameters for the
Explore page.
2. vmalert no longer needs the http.pathPrefix argument when behind a
reverse proxy, rather it uses the request path like the other
Victoria Metrics components.
The way I am handling swapping out the BURP disk now is by using the
Linux MD RAID driver to manage a RAID 1 mirror array. The array
normally operates with one disk missing, as it is in the fireproof safe.
When it is time to swap the disks, I reattach the offline disk, let the
array resync, then disconnect and store the other disk.
This works considerably better than the previous method, as it does not
require BURP or the NFS server to be offline during the synchronization.
The rule is "if it is accessible on the Internet, its name ends in .net"
Although Vaultwarden can be accessed by either name, the one specified
in the Domain URL setting is the only one that works for WebAuthn.
Domain controllers only allow users in the *Domain Admins* AD group to
use `sudo` by default. *dustin* and *jenkins* need to be able to apply
configuration policy to these machines, but they are not members of said
group.
I changed the naming convention for domain controller machines. They
are no longer "numbered," since the plan is to rotate through them
quickly. For each release of Fedora, we'll create two new domain
controllers, replacing the existing ones. Their names are now randomly
generated and contain letters and numbers, so the Blackbox Exporter
check for DNS records needs to account for this.
Gitea package names (e.g. OCI images, etc.) can contain `/` charactres.
These are encoded as %2F in request paths. Apache needs to forward
these sequences to the Gitea server without decoding them.
Unfortunately, the `AllowEncodedSlashes` setting, which controls this
behavior, is a per-virtualhost setting that is *not* inherited from the
main server configuration, and therefore must be explicitly set inside
the `VirtualHost` block. This means Gitea needs its own virtual host
definition, and cannot rely on the default virtual host.
When I added the *systemd-networkd* configuration for the Kubernetes
network interface on the VM hosts, I only added the `.netdev`
configuration and forgot the `.network` part. Without the latter,
*systemd-networkd* creates the interface, but does not configure or
activate it, so it is not able to handle traffic for the VMs attached to
the bridge.
The *netboot/jenkins-agent* Ansible role configures three NBD exports:
* A single, shared, read-only export containing the Jenkins agent root
filesystem, as a SquashFS filesystem
* For each defined agent host, a writable data volume for Jenkins
workspaces
* For each defined agent host, a writable data volume for Docker
Agent hosts must have some kind of unique value to identify their
persistent data volumes. Raspberry Pi devices, for example, can use the
SoC serial number.
The `-external.url` and `-external.alert.source` command line arguments
and their corresponding environment variables can be used to configure
the "Source" links associated with alerts created by `vmalert`.
The firewall hardware is too slow to run the *prometheus_speedtest*
program. It always showed *way* lower speeds than were actually
available. I've moved the service to the Kubernetes cluster and it
works a lot better there.
*mtrcs0.pyrocufflink.red* is a Raspberry Pi CM4 on a Waveshare
CM4-IO-BASE-B carrier board with a NVMe SSD. It runs a custom OS built
using Buildroot, and is not a member of the *pyrocufflink.blue* AD
domain.
*mtrcs0.p.r* hosts Victoria Metrics/`vmagent`, `vmalert`, AlertManager,
and Grafana. I've created a unique group and playbook for it,
*metricspi*, to manage all these applications together.
In addition to ignoring particular types of filesystems, e.g. OverlayFS,
we can also ignore filesystems by their mount point. This could be
useful, for example, for bind-mounted directories, such as those used on
Kubernetes nodes.
There is no specific playbook or role for Kubernetes. All OS
configuration is done at install time via kickstart scripts, and
deploying Kubernetes itself is done (manually) using `kubeadm init` and
`kubeadm join`.
Adding a second camera to the back yard, on the North side of the porch,
to try and figure out how the possums keep getting under the porch even
with the chicken wire around it!
We're trying to discover how the possums are getting into and out of the
house. Let's enable continuous video recording from the back yard
camera so we can observe them and come up with a plan to get rid of
them.
To handle the RSVP form on *dustinandtabitha.com*, we are going to use
*formsubmit*. It runs on the same machine that hosts the website, so
there's no dealing with CORS. The */submit/rsvp* path, which is proxied
to the backend, is the RSVP form's target.
The state history database is entirely too big. It takes over an hour
to create a backup of it, which usually causes BURP to time out. The
data it stores isn't particularly interesting anyway. Instead of trying
to back it up and ultimately not getting any backup at all, we'll just
skip it altogether to ensure we have a consistent backup of everything
else that is actually important.
If Nextcloud does not have the Internet-facing reverse proxy listed in
its "trusted proxies" setting, it will mark all traffic as being from
the proxy itself. This breaks brute force detection, etc.
Transitioning from push-based to pull-based monitoring with
Prometheus/collectd. The *write_prometheus* plugin will be installed on
all hosts, and Prometheus will be configured to scrape them directly.