The `upsrw` command, which is used to set individual UPS configuration
parameters like low battery level, etc., needs a username and password
to authenticate to `upsd`.
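A minimal `upsd.users` sketch (the user name and password here are made up):

```ini
# /etc/nut/upsd.users
[admin]
	password = s3cret
	actions = SET
```

With that in place, something like `upsrw -s battery.charge.low=20 -u admin -p s3cret myups@localhost` should work.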
Setting `AutoUpdate=registry` will tell Podman to automatically fetch
an updated container image from its corresponding registry and restart
the container. The `podman-auto-update.timer` systemd unit needs to be
active for this to happen on a schedule.
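In the quadlet file, that looks something like this (the image name is illustrative):

```ini
# nut-server.container
[Container]
Image=registry.example.com/nut-server:latest
AutoUpdate=registry
```

The timer itself is enabled with `systemctl enable --now podman-auto-update.timer`.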
Since the "primary" `upsmon` is always (for our purposes) running on the
same host as `upsd`, there's no reason to specify both values.
All systems need a shutdown command; one is not set by default.
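In `upsmon.conf`, that's the `SHUTDOWNCMD` directive; a typical value (the exact command is up to you):

```
SHUTDOWNCMD "/usr/sbin/shutdown -h +0"
```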
The primary system is the only one that should send notifications.
`dest` is not a valid option for the `--mount` argument to `podman`. To
specify the target path, only `target`, `destination`, and `dst` are
valid.
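For example (the paths are illustrative):

```sh
podman run --mount type=bind,source=/srv/nut,target=/var/lib/nut ...
```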
`upsmon` is the component of NUT that tracks the status of UPSs and
reacts to changes in their state by sending notifications and/or
shutting down the system. It is a networked application that can run on
any system;
it can run on a different system than `upsd`, and indeed can run on
multiple systems simultaneously.
Each system that runs `upsmon` will need a username and password for
each UPS it will monitor. Using the CUE [function pattern][0], I've
made it pretty simple to declare the necessary values under
`nut.monitor`.
[0]: https://cuetorials.com/patterns/functions/
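Roughly, the pattern looks like this (the definition and field names are my own sketch, not the actual schema):

```cue
#MonitorLine: {
	IN: {ups: string, user: string, password: string}
	out: "MONITOR \(IN.ups) 1 \(IN.user) \(IN.password) primary"
}

line: (#MonitorLine & {IN: {
	ups:      "myups@localhost"
	user:     "monuser"
	password: "s3cret"
}}).out
```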
*collectd* logs to syslog, so its output is lost when it's running in a
container. We can capture messages from it by mounting the journald
syslog socket into the container.
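journald's `/dev/log` socket actually lives at `/run/systemd/journal/dev-log`, so a bind mount like this (quadlet syntax) should capture collectd's syslog output:

```ini
[Container]
Volume=/run/systemd/journal/dev-log:/dev/log
```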
The `/run/udev/rules.d` directory may not always exist, especially at
boot. We need to ensure that it does before we try to copy rules
exported by containers into it, or the unit will fail.
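A minimal sketch, assuming the copy happens in a systemd service unit's `ExecStartPre=`:

```ini
[Service]
ExecStartPre=/usr/bin/mkdir -p /run/udev/rules.d
```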
Even with *collectd* configured to report filesystem usage by device, it
still only reports filesystems that are mounted (in its namespace).
Thus, in order for it to report filesystems like `/boot`, these need to
be mounted in the container.
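For example, a read-only bind mount is enough for `/boot` (quadlet syntax):

```ini
[Container]
Volume=/boot:/boot:ro
```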
I keep going back-and-forth on whether or not collectd should run in a
container on Fedora CoreOS machines. On the one hand, running it
directly on the host allows it to monitor filesystem usage by mount
point, which is consistent with how non-FCOS machines are monitored.
On the other hand, installing packages on FCOS with `rpm-ostree` is a
nightmare. It's _incredibly_ slow. There are also occasional issues
installing packages if the base layer has not been updated in a while
and the new packages require an existing package to be updated.
For the NUT server specifically, I have changed my mind again: the
*collectd-nut* package depends on *nut-client*, which in turn depends on
Python. I definitely want to avoid installing Python on the host, but I
do not want to lose the ability to monitor the UPSs via collectd. Using
a container, I can strip out the unnecessary bits of *nut-client* and
avoid installing Python at all. I think that's worth the trade-off of
having to monitor filesystem usage by device instead of by mount point.
Without the `...` prefix, CUE interprets a type enclosed in square
brackets as a list of exactly one of that type. The ellipsis changes it
to mean a list of any number of that type.
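To illustrate:

```cue
one:  [string]     // a list of exactly one string
many: [...string]  // a list of any number of strings
```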
I don't want Jenkins to build a new runtime container every time I make
a change to the configuration policy. As such, I've moved the container
image definition and corresponding CI pipeline script to their own
repository.
A bunch of stuff that wasn't schema definitions ended up in the `schema`
package. Rather than split values up across a bunch of top-level packages,
I think it would be better to have a package-per-app model.
Although KCL is unquestionably a more powerful language, and maps more
closely to my mental model of how host/environment/application
configuration is defined, the fact that it doesn't work on ARM (issue
982) makes it a non-starter. It's also quite slow (owing to how it
compiles a program to evaluate the code) and cumbersome to distribute.
Fortunately, `tmpl` doesn't care how the values it uses were computed,
so we can freely change configuration languages, so long as whatever we use
generates JSON/YAML.
CUE is probably a lot more popular than KCL, and is quite a bit simpler.
It's more restrictive (values cannot be overridden once defined), but
still expressive enough for what I am trying to do (so far).
`tmpl` takes a long time to compile on a Raspberry Pi, so I've created a
CI pipeline to build it separately.
`kcl` seems to have a [bug][0] that causes it to include the x86_64
builds of `kclvm_cli` and `libkclvm_cli_cdylib.so` on aarch64. This
naturally doesn't work, so we need to fetch the correct builds
ourselves.
[0]: https://github.com/kcl-lang/cli/issues/31
The only privilege NUT needs is access to the USB device nodes. Using a
device CGroup rule to allow this is significantly better than disabling
all restrictions, especially since I discovered that `--privileged`
implies `--security-opt label=disable`, effectively disabling SELinux
confinement of the container.
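USB device nodes use character-device major number 189, so a rule like this (CLI form; a quadlet can pass the same flag via `PodmanArgs=`) is all that's needed:

```sh
podman run \
  --device-cgroup-rule='c 189:* rwm' \
  --mount type=bind,source=/dev/bus/usb,target=/dev/bus/usb \
  ...
```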
NUT needs some udev rules in order to set the proper permissions on USB
and similar devices so it can run as an otherwise unprivileged user. Since
udev rules can only be processed on the host, these rules need to be
copied out of the container and evaluated before the NUT server starts.
To enable this, the *nut-server* container image copies the rules it
contains to `/etc/udev/rules.d` if that directory is a mount point. By
bind mounting a directory on the host at that path, we can get a copy of
the rules files outside the container. Then, using a systemd path unit,
we can tell the udev daemon to reload and reevaluate its rules.
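A sketch of the mechanism (the unit names and watched path are placeholders, not the actual configuration):

```ini
# nut-udev-rules.path
[Path]
PathChanged=/etc/udev/rules.d

# nut-udev-rules.service, activated by the path unit
[Service]
Type=oneshot
ExecStart=/usr/bin/udevadm control --reload
ExecStart=/usr/bin/udevadm trigger
```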
SELinux prevents processes in containers from writing to
`/etc/udev/rules.d` directly, so we have to use an intermediate location
and then copy the rules files to their final destination.
We need to run `systemctl daemon-reload` after creating or modifying the
`nut-server.container` unit file, so that the corresponding service unit
will be generated.
When `tmpl` runs `systemd-sysusers` after generating the `sysusers.d`
file for NUT, the `/etc/passwd` and `/etc/group` files on the host are
created anew and replaced, which "breaks" the bind mount. Since new
files are put in their place, the container and the host no longer see
the same files. We can work around this by using a symbolic link for
each file, pointing to the respective file in the `/host` directory
(which is the host's `/` directory bind mounted into the container's
namespace). Since the symlinks follow the file by name rather than
inode, the container's view is always in sync with the host's.
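Concretely, inside the container (a sketch; `/host` is the bind mount described above):

```sh
ln -sf /host/etc/passwd /etc/passwd
ln -sf /host/etc/group  /etc/group
```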
As it turns out, KCL literally *compiles* a program from the KCL
sources. The program it creates needs to link with its runtime library,
`libkclvm_cli_cdylib.so`. The `kcl` command extracts this library,
along with a helper utility `kclvm_cli`, which performs the actual
compilation and linking. In a container, `/root/go` is probably mounted
read-only, so we need to extract these files ahead of time and put them
in another location, so the `kcl` command does not have to do it each
time it runs.
When `tmpl` substitutes the path of the generated file for `%s` in hook
commands, it uses the full path including the `destdir` prefix. Since
we're running `tmpl` inside a container, but `systemd-sysusers` outside
it (via `nsenter -t 1`), that path is not correct. Thus, we need to
explicitly pass the path as `systemd-sysusers` will see it.
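So the hook ends up looking something like this (the `sysusers.d` file name is illustrative):

```sh
# Enter PID 1's mount namespace and pass the path as the host sees it,
# not the destdir-prefixed path tmpl would substitute for %s.
nsenter -t 1 -m systemd-sysusers /etc/sysusers.d/nut.conf
```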