So it turns out Gitea's RPM package repository feature is less than
stellar. Since each organization/user can only have a single
repository, separating packages by OS would be extremely cumbersome.
Presumably, the feature was designed for projects that only build a
single RPM for each version, but most of my packages need multiple
builds, as they tend to link to system libraries. Further, only the
repository owner can publish to user-scoped repositories, so e.g.
Jenkins cannot publish anything to a repository under my *dustin*
account. This means I would ultimately have to create an Organization
for every OS/version I need to support, and make Jenkins a member of it.
That sounds tedious and annoying, so I decided against using that
feature for internal packages.
Instead, I decided to return to the old ways, publishing packages with
`rsync` and serving them with Apache. It's fairly straightforward to
set this up: all it takes is a directory with the appropriate
permissions for users to upload packages, plus an Apache configuration
that serves from it.
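Something like the following captures the setup; the host name and paths are placeholders, not the actual ones:

```sh
# Publish a build (host and paths are placeholders)
rsync -av x86_64/*.rpm repohost.example.com:/srv/repohost/fedora/39/x86_64/

# Serve the tree with Apache
cat > /etc/httpd/conf.d/repohost.conf <<'EOF'
Alias /repo /srv/repohost
<Directory /srv/repohost>
    Options +Indexes
    Require all granted
</Directory>
EOF
```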
One advantage Gitea's feature had over a plain directory is its
automatic management of repository metadata. Publishers only have to
upload the RPMs they want to serve, and Gitea handles generating the
index, database, etc. files necessary to make the packages available to
Yum/dnf. With a plain file host, the publisher would need to use
`createrepo` to generate the repository metadata and upload that as
well. For repositories with multiple packages, the publisher would need
a copy of every RPM file locally in order for them to be included in the
repository metadata. This, too, seems like it would be too much trouble
to be tenable, so I created a simple automatic metadata manager for the
file-based repo host. Using `inotifywatch`, the `repohost-createrepo`
script watches for file modifications in the repository base directory.
Whenever a file is added or changed, the directory containing it is
added to a queue. Every thirty seconds, the queue is processed; for
each unique directory in the queue, repository metadata are generated.
This implementation combines the flexibility of a plain file host,
supporting an effectively unlimited number of repositories with
fully-configurable permissions, and the ease of publishing of a simple
file upload.
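A minimal sketch of the approach (written here with `inotifywait` for per-event output; the paths and details are assumptions, not the actual script):

```sh
# Collect the parent directory of every added/changed file, then regenerate
# metadata for each unique directory in 30-second batches
REPO_ROOT=/srv/repohost
QUEUE=$(mktemp)

inotifywait -m -r -e close_write -e moved_to --format '%w' "$REPO_ROOT" >> "$QUEUE" &
WATCHER=$!
trap 'kill "$WATCHER"' EXIT

while sleep 30; do
    sort -u "$QUEUE" | while read -r dir; do
        createrepo_c --update "$dir"    # plain createrepo on older systems
    done
    : > "$QUEUE"    # clear the queue; races with the watcher are ignored here
done
```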
The `net cache flush` command does not seem to always work to clear the
identity mapping cache used by winbind. Explicitly moving the file
does, though.
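For reference, a heavily-assumed sketch of the manual workaround; the exact cache file is uncertain (`winbindd_cache.tdb` or `gencache.tdb` under `/var/lib/samba` are the likely candidates), and the unit name varies by role:

```sh
# Move the stale cache aside and let winbind rebuild it (file name assumed)
systemctl stop winbind
mv /var/lib/samba/winbindd_cache.tdb /var/lib/samba/winbindd_cache.tdb.old
systemctl start winbind
```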
On a new DC, the `idmap.ldb` file does not yet exist the first time
`sysvolsync` runs. This causes a syntax error in the condition that
checks the modification timestamp of the file.
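The fix presumably amounts to guarding the comparison with an existence test, something like this (variable and function names are hypothetical):

```sh
# Treat a missing idmap.ldb as "needs restore" instead of erroring out
idmap=/var/lib/samba/private/idmap.ldb
if [ ! -e "$idmap" ] || [ "$copied_backup" -nt "$idmap" ]; then
    restore_idmap_from_backup
fi
```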
Forcing the PDC lookup to use localhost as the DNS server does not work
when first adding a new domain controller, as the `sysvolsync` script
runs before Samba starts. There isn't much advantage to using the local
DNS server over the system-defined server anyway.
The `winbind offline logon` setting seems to cause issues when one of
the domain controllers is offline. Rather than try the other DC,
winbind seems to just "give up" and return `NT_STATUS_NO_SUCH_USER` for
all authentication requests until the offline cache is flushed. There's
not really any reason to use this setting on servers anyway, since they
are always connected to the LAN, as opposed to laptops that may
occasionally disconnect. Let's disable this option in the hopes that it
makes logins more resilient to DC downtime. After all, there's not much
point in having multiple DCs if they all have to be available in order
to log in.
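After setting `winbind offline logon = no` in `smb.conf`, something like this verifies and applies the change (a sketch, not the exact procedure used):

```sh
# Verify the effective value and tell winbind to re-read its configuration
testparm -s --parameter-name='winbind offline logon'   # should print "No"
smbcontrol winbindd reload-config
```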
AWS is going to begin charging extra for routable IPv4 addresses soon.
There's really no point in having a relay in the cloud anymore anyway,
since a) all outbound messages are sent via the local relay and b) no
messages are sent to anyone except me.
The *authconfig* package has been gone from Fedora for ages. There's
no reason to have this no-op step any more, especially since it has the
side-effect of making a network request to refresh the dnf cache.
*nvr1.pyrocufflink.blue* has been migrated to Fedora CoreOS. As such,
it is no longer managed by Ansible; its configuration is done via
Butane/Ignition. It is no longer a member of the Active Directory
domain, but it does still run *collectd* and export Prometheus metrics.
The `scrape_collectd_extra_targets` variable can be used to specify a
list of additional targets to scrape, in addition to the hosts in the
*collectd-prometheus* group. This will allow us to scrape hosts that
are not managed by the configuration policy, but still expose Prometheus
metrics via collectd.
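A hypothetical example of the variable (the file path is an assumption; 9103 is collectd's *write_prometheus* default port):

```sh
# Add a scrape target outside the collectd-prometheus group
cat >> group_vars/all/monitoring.yml <<'EOF'
scrape_collectd_extra_targets:
  - nvr1.pyrocufflink.blue:9103
EOF
```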
MinIO is supposed to automatically reload itself when the certificate
changes, but this does not appear to happen in all cases. To ensure the
updated certificate gets used, we need to send SIGHUP to the MinIO
server process.
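Assuming the container is named *minio*, the signal can be delivered with Podman:

```sh
# Deliver SIGHUP to the container's main process
podman kill --signal HUP minio
```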
Since Jellyfin is running on the file server, which also hosts a few
other websites that do not define virtual hosts, the HTTP-to-HTTPS
redirect was applied to *all* requests. To avoid this, we simply add a
rewrite condition so that the redirect only applies to requests for
Jellyfin.
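A hypothetical excerpt of the rewrite rules (the host name pattern is assumed):

```sh
# Only redirect plain-HTTP requests whose Host header is the Jellyfin name
cat > /etc/httpd/conf.d/jellyfin-redirect.conf <<'EOF'
RewriteEngine On
RewriteCond %{HTTPS} off
RewriteCond %{HTTP_HOST} ^jellyfin\. [NC]
RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]
EOF
```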
Jellyfin is a multimedia library manager. Clients can browse and stream
music, movies, and TV shows from the server and play them locally
(including in the browser).
Since Ubiquiti only publishes Debian packages for the Unifi Network
controller software, running it on Fedora has historically been nigh
impossible. Fortunately, a modern solution is available: containers.
The *linuxserver.io* project publishes a container image for the
controller software, making it fairly easy to deploy on any host with an
OCI runtime. I briefly considered creating my own image, since theirs
must be run as root, but I decided the maintenance burden would not be
worth it. Using Podman's user namespace functionality, I was able to
work around this requirement anyway.
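A rough sketch of such an invocation (image name, ports, and paths are assumptions):

```sh
# Run the controller as a "rootful" Podman container, but remap the container's
# root user into an unprivileged UID range on the host
podman run -d --name unifi \
  --userns=auto \
  -p 8443:8443 -p 8080:8080 -p 3478:3478/udp \
  -v /var/lib/unifi:/config:Z \
  docker.io/linuxserver/unifi-network-application:latest
```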
When the Home Assistant container restarts, Podman relabels the entire
`/var/lib/homeassistant` directory as `container_file_t`. Since the
*homeassistant* user's home directory is `/var/lib/homeassistant`, its
`~/.ssh` directory is thus also relabeled, preventing the SSH daemon
from accessing it. Since Home Assistant itself does not need access to
this path, we can tell systemd to mount an empty tmpfs filesystem there
in the service unit's mount namespace. This way, when Podman relabels
the directory, it will change the label of the tmpfs mount point instead
of the actual directory.
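A hypothetical drop-in that accomplishes this (the unit name is assumed):

```sh
mkdir -p /etc/systemd/system/homeassistant.service.d
cat > /etc/systemd/system/homeassistant.service.d/hide-ssh.conf <<'EOF'
[Service]
# Mount an empty tmpfs over ~/.ssh in this unit's mount namespace only
TemporaryFileSystem=/var/lib/homeassistant/.ssh
EOF
systemctl daemon-reload
```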
Most of the Synapse server's state is in its SQLite database. It also
has a `media_store` directory that needs to be backed up, though.
In order to back up the SQLite database while the server is running, the
database must be in "WAL mode." Synapse leaves the database in the
default "rollback journal" mode, which prevents other processes from
accessing the database, even for read-only operations.
To change the journal mode:
```sh
sudo systemctl stop synapse
sudo -u synapse sqlite3 /var/lib/synapse/homeserver.db 'PRAGMA journal_mode=WAL;'
sudo systemctl start synapse
```
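An optional check that the change persisted (should print `wal`):

```sh
sudo -u synapse sqlite3 /var/lib/synapse/homeserver.db 'PRAGMA journal_mode;'
```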
The Samba KDC log file seems to grow rather quickly sometimes, outpacing
the monthly rotation policy. Let's rotate it weekly and keep 4
historical versions.
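A hypothetical logrotate policy to that effect; the log file path is an assumption and may differ on these hosts:

```sh
cat > /etc/logrotate.d/samba-kdc <<'EOF'
/var/log/samba/*.kdc.log {
    weekly
    rotate 4
    compress
    missingok
    notifempty
}
EOF
```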
Kubernetes exports a *lot* of metrics in Prometheus format. I am not
sure what all is there, yet, but apparently several thousand time series
were added.
To allow anonymous access to the metrics, I added this ClusterRole:
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups:
  - ""
  resources:
  - nodes/metrics
  verbs:
  - get
- nonResourceURLs:
  - /metrics
  verbs:
  - get
```
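The original note mentions a RoleBinding; a binding along these lines (names assumed) would grant the ClusterRole above to unauthenticated requests:

```sh
kubectl apply -f - <<'EOF'
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus-anonymous
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: User
  name: system:anonymous
EOF
```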
MinIO exposes metrics in Prometheus exposition format. By default, it
requires an authentication token to access the metrics, but I was unable
to get this to work. Fortunately, it can be configured to allow
anonymous access to the metrics, which is fine, in my opinion.
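The anonymous access is controlled by the `MINIO_PROMETHEUS_AUTH_TYPE` setting; exactly where it is set here is an assumption:

```sh
# Allow unauthenticated scrapes; the environment file path is an assumption
echo 'MINIO_PROMETHEUS_AUTH_TYPE=public' >> /etc/minio/minio.env
systemctl restart minio.service
```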
The `journal2ntfy.py` script follows the systemd journal by spawning
`journalctl` as a child process and reading from its standard output
stream. Any command-line arguments passed to `journal2ntfy` are passed
to `journalctl`, which allows the caller to specify message filters.
For any matching journal message, `journal2ntfy` sends a message via
the *ntfy* web service.
For the BURP server, we're going to use `journal2ntfy` to generate
alerts about the RAID array. When I reconnect the disk that was in the
fireproof safe, the kernel will log a message from the *md* subsystem
indicating that the resynchronization process has begun. Then, when
the disks are again in sync, it will log another message, which will
let me know it is safe to archive the other disk.
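A hypothetical invocation; the actual filters may differ, but they are just `journalctl` arguments:

```sh
# -k limits matches to kernel messages; --grep is passed through to journalctl
journal2ntfy.py -k --grep='md.*(resync|recovery)'
```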
The `tls cafile` setting in `smb.conf` is not necessary. It is used for
verifying peer certificates for mutual TLS authentication, not to
specify the intermediate certificate authority chain like I thought.
The setting cannot simply be left out, though. If it is not specified,
Samba will attempt to load a file from a built-in default path, which
will fail, causing the server to crash. This is avoided by setting the
value to the empty string.
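For reference, the relevant `smb.conf` excerpt and a quick check of the effective value:

```sh
# smb.conf should contain, under [global], a deliberately empty value:
#     tls cafile =
# Check the effective setting:
testparm -s --parameter-name='tls cafile'
```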
[MinIO][0] is an S3-compatible object storage server. It is designed to
provide storage for cloud-native applications in on-premises
deployments.
MinIO has not been packaged for Fedora (yet?). As such, the best way to
deploy it is using its official container image. Here, we are using
`podman-systemd-generator` (Quadlet) to generate a systemd service
unit to manage the container process.
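A hypothetical Quadlet unit along those lines (image tag, ports, and paths assumed); Quadlet turns the file into *minio.service*:

```sh
cat > /etc/containers/systemd/minio.container <<'EOF'
[Container]
Image=quay.io/minio/minio:latest
Exec=server /data --console-address :9001
Volume=/var/lib/minio:/data:Z
PublishPort=9000:9000
PublishPort=9001:9001

[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl start minio.service
```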
The HTTP->HTTPS redirect for chmod777.sh was only working by
coincidence. It needs its own virtual host to ensure it works
irrespective of how other websites are configured.
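A hypothetical dedicated virtual host for the redirect:

```sh
cat > /etc/httpd/conf.d/chmod777.sh.conf <<'EOF'
<VirtualHost *:80>
    ServerName chmod777.sh
    Redirect permanent / https://chmod777.sh/
</VirtualHost>
EOF
```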
Tabitha's Hatch Learning Center site has two user submission forms: one
for signing in/out students for class, and another for parents to
register new students for the program. These are handled by
*formsubmit* and store data in CSV spreadsheets.
If the Python bindings for SELinux policy management are not installed
when Ansible gathers host facts, no SELinux-related facts will be set.
Thus, any tasks that are conditional based on these facts will not run.
Typically, such tasks are required for SELinux-enabled hosts, but must
not be performed for non-SELinux hosts. If they are not run when they
should be, the deployment may fail or applications may experience issues at
runtime.
To avoid these potential issues, the *base* role now forces Ansible to
gather facts again if it installed the Python SELinux bindings.
Note: one might suggest using `meta: clear_facts` instead of `setup` and
letting Ansible decide if and when to gather facts again. Unfortunately,
for some reason this doesn't work; the `clear_facts` meta task just
causes Ansible to crash with a "shared connection to {host} closed" error.
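Roughly, the pattern in the role looks like this (package name and task details are assumptions, not the actual tasks):

```sh
cat > roles/base/tasks/selinux.yml <<'EOF'
- name: install SELinux Python bindings
  package:
    name: python3-libselinux
    state: present
  register: selinux_bindings

- name: gather facts again if the bindings were just installed
  setup:
  when: selinux_bindings is changed
EOF
```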
The *dch-selinux* package contains customized SELinux policy modules.
I haven't worked out exactly how to build and publish it through a
continuous integration pipeline yet, so for now it's just hosted in my
user `public_html` folder on the main file server.
Samba AD DC does not implement [DFS-R for replication of the SYSVOL][0]
contents. This does not make much of a difference to me, since
the SYSVOL is really only used for Group Policy. Windows machines may
log an error if they cannot access the (basically empty) GPO files, but
that's pretty much the only effect if the SYSVOL is out of sync between
domain controllers.
Unfortunately, there is one side-effect of the missing DFS-R
functionality that does matter. On domain controllers, all user,
computer, and group accounts need to have Unix UID/GID numbers mapped.
This is different from regular member machines, which only need UID/GID
numbers for the users that are allowed to log into them. LDAP entries
only have ID numbers mapped for the latter class of users, which does
not include machine accounts. As a result, Samba falls back to
generating local ID numbers for the rest of the accounts. Those ID
numbers are stored in a local database file,
`/var/lib/samba/private/idmap.ldb`. It would seem that it wouldn't
actually matter if accounts have different ID numbers on different
domain controllers, but there are evidently [situations][1] where DCs
refuse to allocate ID numbers at all, which can cause authentication to
fail. As such, the `idmap.ldb` file needs to be kept in sync.
If we're going to go through the effort of synchronizing `idmap.ldb`, we
might as well keep the SYSVOL in sync as well. To that end, I've
written a script to synchronize both the SYSVOL contents and the
`idmap.ldb` file. It performs a simple one-way synchronization using
`rsync` from the DC with the PDC emulator role, as discovered using DNS
SRV records. To ensure the `idmap.ldb` file is in a consistent state,
it only copies the most recent backup file. If the copied file differs
from the local one, the script stops Samba and restores the local
database from the backup. It then flushes Samba's caches and restarts
the service. Finally, it fixes the NT ACLs on the contents of the
SYSVOL.
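A rough sketch of the flow (paths and file names assumed; this is not the actual script):

```sh
# Find the PDC emulator via the system DNS server
domain=$(hostname -d)
pdc=$(dig +short SRV "_ldap._tcp.pdc._msdcs.${domain}" | awk '{print $4}' | sed 's/\.$//')
# (the real script presumably skips all of this on the PDC emulator itself)

# One-way copy of the SYSVOL and the most recent idmap.ldb backup
rsync -a --delete "${pdc}:/var/lib/samba/sysvol/" /var/lib/samba/sysvol/
rsync -a "${pdc}:/var/lib/samba/private/idmap.ldb.bak" /var/lib/samba/private/idmap.ldb.bak

# Restore the local database only if the copied backup differs from it
if ! cmp -s /var/lib/samba/private/idmap.ldb.bak /var/lib/samba/private/idmap.ldb; then
    systemctl stop samba
    cp /var/lib/samba/private/idmap.ldb.bak /var/lib/samba/private/idmap.ldb
    net cache flush
    systemctl start samba
fi

# Re-apply the NT ACLs to the SYSVOL contents
samba-tool ntacl sysvolreset
```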
Since the contents of the SYSVOL are owned by root, naturally the
synchronization process has to run as root as well. To attempt to limit
the scope of control this would give the process, we use as much of
systemd's sandboxing functionality as possible. Further, the SSH key pairs
the DCs use to authenticate to one another are restricted to only
running rsync. As such, the `sysvolsync` script itself cannot run
`tdbbackup` to back up `idmap.ldb`. To handle that, I've created a
systemd service and corresponding timer unit to run `tdbbackup`
periodically.
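Hypothetical units for that backup job (names and interval assumed):

```sh
cat > /etc/systemd/system/idmap-backup.service <<'EOF'
[Service]
Type=oneshot
ExecStart=/usr/bin/tdbbackup /var/lib/samba/private/idmap.ldb
EOF

cat > /etc/systemd/system/idmap-backup.timer <<'EOF'
[Timer]
OnCalendar=hourly
Persistent=true

[Install]
WantedBy=timers.target
EOF

systemctl enable --now idmap-backup.timer
```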
I considered for a long time how to best implement this process, and
although I chose this naïve implementation, I am not exactly happy with
it. Since I do not fully understand *why* keeping
the `idmap.ldb` file in sync is necessary, there are undoubtedly cases
where blindly copying it from the PDC emulator is not correct. There
are definitely cases where the contents of the SYSVOL can be updated on
a DC besides the PDC emulator, but again, we should not run into them
because we don't really use the SYSVOL at all. In the end, I think this
solution is good enough for our needs without being overly complicated.
[0]: https://wiki.samba.org/index.php?title=SysVol_replication_(DFS-R)&oldid=18120
[1]: https://lists.samba.org/archive/samba/2021-November/238370.html
Zigbee2MQTT now has a web GUI, which makes it *way* easier to manage the
Zigbee network. Now that I've got all the Philips Hue bulbs controlled
by Zigbee2MQTT instead of the Hue Hub, having access to the GUI is
awesome.
Gitea package names (e.g. OCI images, etc.) can contain `/` characters.
These are encoded as `%2F` in request paths. Apache needs to forward
these sequences to the Gitea server without decoding them.
Unfortunately, the `AllowEncodedSlashes` setting, which controls this
behavior, is a per-virtualhost setting that is *not* inherited from the
main server configuration, and therefore must be explicitly set inside
the `VirtualHost` block. This means Gitea needs its own virtual host
definition, and cannot rely on the default virtual host.
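A hypothetical excerpt of such a virtual host (host name and backend port assumed; TLS directives omitted):

```sh
cat > /etc/httpd/conf.d/gitea.conf <<'EOF'
<VirtualHost *:443>
    ServerName git.example.com
    AllowEncodedSlashes NoDecode
    ProxyPass / http://127.0.0.1:3000/ nocanon
    ProxyPassReverse / http://127.0.0.1:3000/
</VirtualHost>
EOF
```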
I moved the metrics Pi from the red network to the blue network. I
started to get uncomfortable with the firewall changes that were
required to host a service on the red network. I think it makes the
most sense to define the red network as egress only.
The only major change that affects the configuration policy is the
introduction of the `webhook.ALLOWED_HOST_LIST` setting. For some dumb
reason, the default value of this setting *denies* access to machines on
the local network. This makes no sense; why do they expect you to host
your CI or whatever on a *public* network? Of course, the only reason
given is "for security reasons."
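One way to set it (the file path is an assumption; editing `app.ini` directly works just as well):

```sh
crudini --set /etc/gitea/app.ini webhook ALLOWED_HOST_LIST private
systemctl restart gitea
```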
This work-around is no longer necessary as the default Fedora policy now
covers the Samba DC daemon. It never really worked correctly, anyway,
because Samba doesn't start `winbindd` fast enough for the
`/run/samba/winbindd` directory to be created before systemd spawns the
`restorecon` process, so it would usually fail to start the service the
first time after a reboot.
Sometimes, Frigate crashes in situations that should be recoverable or
temporary. For example, it will fail to start if the MQTT server is
unreachable initially, and does not attempt to connect more than once.
To avoid having to manually restart the service once the MQTT server is
ready, we can configure the systemd unit to enable automatic restarts.
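A hypothetical drop-in enabling the restarts (unit name assumed):

```sh
mkdir -p /etc/systemd/system/frigate.service.d
cat > /etc/systemd/system/frigate.service.d/restart.conf <<'EOF'
[Service]
Restart=on-failure
RestartSec=30
EOF
systemctl daemon-reload
```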
If the *vaultwarden* service terminates unexpectedly, e.g. due to a
power loss, `podman` may not successfully remove the container. We
therefore need to try to delete it before starting it again, or `podman`
will exit with an error because the container already exists.
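A sketch of the idea (unit and container names assumed); the leading `-` tells systemd to ignore the exit status if there was nothing to remove:

```sh
mkdir -p /etc/systemd/system/vaultwarden.service.d
cat > /etc/systemd/system/vaultwarden.service.d/cleanup.conf <<'EOF'
[Service]
ExecStartPre=-/usr/bin/podman rm -f vaultwarden
EOF
systemctl daemon-reload
```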