30 Commits (b5455e519a3c716783087a82e3a8d290af901029)

Author SHA1 Message Date
Dustin b5455e519a Revert "collectd: Run collectd in privileged container"
Unfortunately, running *collectd* in a container is not going to work.
Although containers can be configured to share some of the host's
namespaces, one notable exception is the mount namespace.  Naturally,
containers must have their own mount namespace, which prevents them from
seeing filesystems that are actually mounted on the host.  For
*collectd*, this effectively makes the `df` plugin useless, which
ultimately prevents us from monitoring disk space.

This reverts commit 4048e5cc0a.
2023-10-04 20:50:30 -05:00
Dustin 5862ff4cc2 local_exporter: Remove After=zincati dependency
For some reason, the *zincati.service* unit has an `After=` dependency
on *multi-user.target*.  This creates a dependency loop between
*local_exporter.service* and *zincati.service* if the former has an
`After=` dependency on the latter and an (implicit) `Before=` dependency
on *multi-user.target*.  systemd will resolve this loop by removing one
or the other unit from the bootup sequence, so either Zincati or the
local exporter will not start at boot.

We can avoid this dependency loop by removing the `After=` dependency
from *local_exporter.service*.  This may cause requests for Zincati
metrics to fail if they happen to come in after the local exporter starts
but before Zincati does, but this is unlikely to actually be an issue.
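
As a rough illustration of the loop described above, the ordering
constraints can be modeled as a small graph and searched for a cycle.
This is only a toy sketch: the edge list is a simplified reading of the
message, and `find_cycle` and `without_after` are hypothetical helpers,
not how systemd itself detects or breaks ordering cycles.

```python
# Toy model of systemd ordering: an entry A -> [B] means "A starts after B".
# Illustration only; systemd's own cycle handling is more involved.
ORDERING = {
    "local_exporter.service": ["zincati.service"],    # After=zincati.service
    "zincati.service": ["multi-user.target"],         # After=multi-user.target
    "multi-user.target": ["local_exporter.service"],  # implicit Before= on the target
}


def find_cycle(graph):
    """Return one ordering cycle as a list of unit names, or None."""
    visiting, done = [], set()

    def visit(unit):
        if unit in visiting:
            # Reached a unit that is already on the current path: a loop.
            return visiting[visiting.index(unit):] + [unit]
        if unit in done:
            return None
        visiting.append(unit)
        for dep in graph.get(unit, []):
            cycle = visit(dep)
            if cycle:
                return cycle
        visiting.pop()
        done.add(unit)
        return None

    for unit in graph:
        cycle = visit(unit)
        if cycle:
            return cycle
    return None


def without_after(graph, unit, dep):
    """Return a copy of the graph with one After= edge removed."""
    return {u: [d for d in deps if (u, d) != (unit, dep)]
            for u, deps in graph.items()}
```

With the `After=zincati.service` edge dropped via `without_after`, the
same search no longer finds a loop, which mirrors the fix in this commit.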
2023-10-04 20:50:30 -05:00
Dustin dd3be7a24a collectd: Restart service automatically
The *collectd.service* unit may fail for various reasons.  Notably, if
the container image is not present, it may fail to start if it is
activated before the network is fully available.  Using systemd's
automatic restart mechanism will help ensure *collectd* is running
whenever possible.
2023-10-04 20:50:30 -05:00
Dustin 40bde4df26 flash: Clean up/add support for RPi 3
Although the official Fedora CoreOS documentation only provides
instructions for running CoreOS on a Raspberry Pi 4, it does actually
work on older boards as well.  `coreos-installer` creates a GPT disk
label, which the older devices do not support, but this can be worked
around using a hybrid MBR label.

Unfortunately, after I put all the effort into refactoring this script
and adding support for the older devices, I realized that it was rather
pointless as those boards simply do not have enough memory to be useful
Kubernetes nodes.  I was hoping to move the Zigbee and ZWave controllers
to a Raspberry Pi 3, but these processes take way too much memory for
that.
2023-10-04 20:50:30 -05:00
Dustin 364f4fed50 common: Add config shared by all hosts
The `common.yaml` Butane configuration file merges in all the other
various Butane configuration files that we want to share amongst all
CoreOS machines.  These include the authorized SSH keys list, collectd
deployment, SSH host certificate configuration, etc.
2023-10-03 20:07:29 -05:00
Dustin 859deb0664 sshkeys: Trust certificates issued by the CA
Now that we have an internal SSH certificate authority, instead of
explicitly listing all M×N keys for each user and client machine, we can
list only the CA certificate in the SSH authorized keys file for the
*core* user.  This will allow any user who presents a valid, signed SSH
certificate for the *core* principal to log in.
2023-10-03 20:06:37 -05:00
Dustin 88f165363d step-ssh: Automatically issue/renew SSH host certs
The `ssh-bootstrap` script, which is run by the *ssh-bootstrap.service*
systemd unit, requests SSH host certificates for each of the existing
SSH host keys.  The certificates are issued by the *POST /sshkeys/sign*
operation of the *dch-webhooks* web service.

The *step-ssh-renew* timer/service runs `step ssh renew`, in a
container, on a weekly basis to renew the SSH host certificate.  A host
certificate must already exist, and its private key is used to
authenticate to the CA server.

Since `step ssh renew` can only operate on one certificate/key file at a
time, the `step-ssh-renew@.container` defines a template unit.  The
template instance specifies the key type (i.e. `rsa`, `ecdsa`, or
`ed25519`), which in turn defines which certificate and private key file
to use.  The timer unit activates a target unit, which depends on the
concrete service units.  Note that the target unit must have
`StopWhenUnneeded=yes` so that it can be restarted again the next time
the timer fires.
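
The mapping from template instance to certificate and key file could
look roughly like the sketch below.  The file locations follow the usual
OpenSSH naming under `/etc/ssh`, which is an assumption here since the
message does not spell out the paths, and `renew_command` is a
hypothetical helper rather than part of the actual units.

```python
# Key types named in the commit message; each one is a template instance.
KEY_TYPES = ("rsa", "ecdsa", "ed25519")


def renew_command(key_type, ssh_dir="/etc/ssh"):
    """Build the `step ssh renew` argument list for one host key type."""
    if key_type not in KEY_TYPES:
        raise ValueError(f"unsupported key type: {key_type}")
    # Assumed layout: ssh_host_<type>_key and ssh_host_<type>_key-cert.pub
    key = f"{ssh_dir}/ssh_host_{key_type}_key"
    cert = f"{key}-cert.pub"
    return ["step", "ssh", "renew", cert, key]
```

One command per key type is what makes a template unit a natural fit:
the instance name is the only thing that varies between invocations.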
2023-10-03 20:06:37 -05:00
Dustin 4048e5cc0a collectd: Run collectd in privileged container
Installing packages with `rpm-ostree` is somewhat problematic.  Notably,
if a new package needs an update of an already-installed package (e.g.
shared library), the new package cannot be installed until a new version
of CoreOS is published with the updated dependency.

In order for collectd to be effective, the container it runs in has to
have most isolation features disabled.  Most importantly, the PID, UTS,
and network namespaces need to be shared with the host, so that
*collectd* can "see" the actual values.  Additionally, the default
SELinux policy for containerized processes denies practically all of the
instrumentation syscalls *collectd* needs, so it needs to run in the
unconfined `spc_t` domain.  Finally, the `/run` directory needs to be
shared with the host, so *collectd* can communicate with various daemons
via UNIX sockets.
2023-10-03 20:03:21 -05:00
Dustin ebdf587de1 local_exporter: Exporter for Zincati metrics
Zincati provides Prometheus metrics via a Unix socket.  In order for
these to be scraped by `vmagent`, they need to be exposed over HTTP.
The `local_exporter` is designed to do specifically this.

Unfortunately, the Zincati metrics socket is only accessible by the
*zincati* user, so the `local_exporter` also needs to run as that user.
Hopefully, the user ID will remain consistent in future versions of
CoreOS.
2023-10-03 15:29:58 -05:00
Dustin 517151f2c8 sshkeys: Add Luma's SSH public key 2023-09-21 22:34:14 -05:00
Dustin cb282f0bce nvr1: Deploy notify-shutdown service 2023-09-21 22:34:14 -05:00
Dustin 11cd8ce8e9 notify-shutdown: Send a message on shutdown
Since Fedora CoreOS machines tend to reboot at seemingly random times
to apply updates, it would be nice to get a notification when they go
down.
2023-09-21 22:34:14 -05:00
Dustin 8828bb3069 nvr1: Deploy nginx
Deploying nginx on the NVR server to proxy for Frigate.
2023-09-21 22:34:14 -05:00
Dustin 9fd3aa0cd3 frigate: Configure nginx reverse proxy
Using nginx, we can expose the Frigate web server via HTTPS.  Since
Frigate has no built-in authentication, we need to use Authelia via the
nginx proxy auth feature.
2023-09-21 22:32:59 -05:00
Dustin d907b47db1 fetchcert: Add script to fetch certs from K8s
Since Fedora CoreOS machines are not managed by Ansible, we need another
way to keep the HTTPS certificate up-to-date.  To that end, I've added
the `fetchcert.sh` script, along with a corresponding systemd service
and timer unit, that will fetch the latest certificate from the Secret
resource managed by the Kubernetes API.  The script authenticates with
a long-lived bearer token associated with a particular Kubernetes
service account and downloads the current Secret to a local file.  If
the certificate in the Secret is different than the one already in
place, the certificate and key files are updated and nginx is reloaded.
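
The compare-and-update step can be sketched as follows.  This is a
Python illustration of the logic only, not the contents of
`fetchcert.sh`; it assumes the Secret is of type `kubernetes.io/tls`
(data keys `tls.crt` and `tls.key`) and has already been fetched and
parsed, and `update_certificate` is a hypothetical name.

```python
import base64
from pathlib import Path


def update_certificate(secret, cert_path, key_path):
    """Write the cert/key from a parsed Secret; return True if they changed."""
    # Assumption: a kubernetes.io/tls Secret with base64-encoded data values.
    cert = base64.b64decode(secret["data"]["tls.crt"])
    key = base64.b64decode(secret["data"]["tls.key"])
    cert_file, key_file = Path(cert_path), Path(key_path)
    if cert_file.exists() and cert_file.read_bytes() == cert:
        return False  # certificate unchanged; nothing to reload
    key_file.write_bytes(key)
    key_file.chmod(0o600)  # keep the private key unreadable to other users
    cert_file.write_bytes(cert)
    return True  # the caller would reload nginx at this point
```

Returning a flag instead of reloading directly keeps the comparison
separate from the side effect on nginx.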
2023-09-21 22:30:23 -05:00
Dustin 222f40426a nginx: Deploy nginx in a container 2023-09-21 22:29:51 -05:00
Dustin a32e6676eb nvr1: Install collectd
Also enabling the `md` plugin, which is disabled by default, to monitor
the software RAID array where Frigate recordings are stored.
2023-09-21 22:29:51 -05:00
Dustin d22a65c1bd collectd: Install and configure collectd
The `collectd.yaml` Butane configuration fragment configures the machine
to install *collectd* and its various plugin packages directly on the
host using `rpm-ostree` (via *install-packages.service*).
2023-09-21 22:29:51 -05:00
Dustin 2048713452 packages: Add framework for installing packages
Some machines may need to install multiple packages for separate use
cases.  Requiring each use case to define a systemd unit that runs
`rpm-ostree install` directly would be cumbersome and also quite slow,
as each one would have to run in turn.  Instead, now there is a single
*install-packages.service* which installs all of the packages listed in
files in `/etc/ignition/packages.d`.  On first boot, all files in that
directory are read and all the packages they list will be installed in a
single `rpm-ostree install` invocation.
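
A minimal sketch of the collection step, in Python for illustration
only.  The file format (one package name per line, `#` for comments) is
an assumption, and `build_install_command` is a hypothetical helper, not
the code behind *install-packages.service*.

```python
from pathlib import Path


def build_install_command(packages_dir):
    """Collect package names from every file in packages_dir into one command."""
    packages = []
    for path in sorted(Path(packages_dir).iterdir()):
        if not path.is_file():
            continue
        for line in path.read_text().splitlines():
            name = line.split("#", 1)[0].strip()  # assumed: '#' starts a comment
            if name and name not in packages:     # skip blanks and duplicates
                packages.append(name)
    if not packages:
        return None  # nothing listed, so nothing to install
    # A single invocation covers every use case's packages at once.
    return ["rpm-ostree", "install", *packages]
```

For `/etc/ignition/packages.d`, the result would be one argument list
covering all listed packages, which is the point of the framework.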
2023-09-21 22:29:51 -05:00
Dustin 22c085b35d frigate: Disable systemd filesystem isolation
When `ProtectSystem` is enabled, systemd sets up a separate mount
namespace for the service.  Unfortunately, this appears to interfere
with Podman and prevents it from cleaning up containers on shutdown.
2023-09-21 22:29:51 -05:00
Dustin dffa17410f frigate: Enable Frigate+ integration
To keep the API key a secret, we're encrypting the environment file in
the repository with GnuPG.  The decrypted copy only lives in the work
tree and is never committed. Changes have to be re-encrypted and
committed.
2023-09-21 22:29:51 -05:00
Dustin b80bee461a frigate: Pass DRI device for hardware acceleration
Enabling hardware acceleration using VA-API dramatically reduces
`ffmpeg` CPU usage.  For this to work, the Frigate container needs
access to the DRI device node.
2023-09-19 10:46:52 -05:00
Dustin ddd137a2e9 frigate: Manage state dir with tmpfiles.d
Since *frigate.service* runs as root, the directories created by
`StateDirectory` are owned by root.  The processes inside the container,
therefore, cannot access them.  Thus, we have to use `systemd-tmpfiles`
to create the state directories with the appropriate permissions.
2023-09-19 10:44:34 -05:00
Dustin 2a0b23c9a8 meta: Add Makefile
When developing Butane/Ignition files, I frequently forget to update the
parent files after making a change to an included file.  This causes a
lot of wasted time re-provisioning, only to discover that my change
did not take effect.  To alleviate this, we'll use `make` with some
macro magic to scan the Butane files for their dependencies, and let it
generate whatever Ignition files need updating any time a dependent file
changes.
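
The idea behind the dependency scan can be sketched in Python.  This is
not the Makefile's macro logic; it assumes included files are referenced
through Butane's `local:` key, and both `local_references` and
`needs_rebuild` are hypothetical helpers.

```python
import re

# Assumption: dependencies appear as `local: <path>` entries, optionally
# as list items (`- local: <path>`), one per line.
LOCAL_REF = re.compile(
    r"^[ \t]*(?:-[ \t]*)?local:[ \t]*(\S+)[ \t]*$", re.MULTILINE
)


def local_references(butane_text):
    """List the paths a Butane document references through `local:` keys."""
    return LOCAL_REF.findall(butane_text)


def needs_rebuild(target_mtime, dependency_mtimes):
    """make-style check: rebuild if any dependency is newer than the target."""
    return any(mtime > target_mtime for mtime in dependency_mtimes)
```

`make` performs the second check itself once the references are turned
into prerequisites; the scan only has to supply the list.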

I've also added a "publish" step to the Makefile, since I also
frequently forget to upload the regenerated Ignition files to the
server, causing the same headaches.
2023-09-16 08:15:08 -05:00
Dustin 2efce551ba zram: Configure swap-on-zram
CoreOS does not enable swap-on-zram by default.
2023-09-16 08:15:08 -05:00
Dustin 1a60688cc1 nvr1: Deploy Frigate on the nvr1.p.b 2023-09-16 08:13:03 -05:00
Dustin 533cdc2c09 frigate: Run Frigate in a container
The *frigate* container must run as root, so we use a custom user
namespace to map root in the container to an unprivileged user on the
host.

For some reason, Podman (on CoreOS anyway) fails to stop a container
that uses a separate network namespace.  It reports "invalid argument"
when attempting to unmount the `netns` file, which then causes the
container to get "stuck" in `Storage` state.  Rebooting the host is
apparently the only way to get the container to start again correctly.
Fortunately, there's no particular reason to use an alternate network
namespace for Frigate, so it can use the host's network and avoid this
problem.
2023-09-16 08:06:07 -05:00
Dustin 1d71f874cf gasket-driver: Install Coral EdgeTPU driver
The *gasket-driver* container installs the `gasket` and `apex` kernel
modules, which provide the driver for the Google Coral EdgeTPU AI
accelerator module.  The container image must be built ahead of time,
of course, and contains modules built for a specific Fedora kernel
version.

The udev rule has two purposes: to set the permissions on the device
node so that any user on the system can access it, and to "tag" the
device so that systemd will generate a `.device` unit for it.  The
latter allows other units (e.g. Frigate) to express a `Requires=` and
`After=` dependency on the device unit, so that they do not start until
the driver is loaded.
2023-09-16 07:58:48 -05:00
Dustin afadd7dcf5 Add flash.sh
This simple script helps automate the process of flashing Fedora CoreOS
onto an SD card for a Raspberry Pi.
2023-08-04 15:01:18 -05:00
Dustin 9dc46e2eff Initial commit
The first host running Fedora CoreOS (FCOS) is
*k8s-aarch64-n0.pyrocufflink.blue*.  This is a Raspberry Pi 4 that is a
specialized member of the Kubernetes cluster.  It hosts the Zigbee2MQTT
and ZWaveJS2MQTT containers, and has the Zigbee and ZWave controller USB
devices attached.
2023-07-17 15:16:01 -05:00