Custom builds of Aimee OS can now specify additional paths under `/etc`
that should be writable. This is accomplished by populating a file
named `/etc/aimee-os/writable-etc` with a list of paths. Each line must
indicate the type of file (regular file: `f`, directory: `d`) and the
*relative* path under `/etc`.
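To illustrate the `writable-etc` format, here is a minimal parser sketched in Python. The image itself ships no Python runtime, so this only models the logic. It makes two assumptions that the format description above does not confirm: the type and path are separated by whitespace, and blank lines and `#` comments are allowed. The function name `parse_writable_etc` is hypothetical.

```python
from pathlib import PurePosixPath
from typing import List, Tuple

# Hypothetical parser for /etc/aimee-os/writable-etc (illustration only).
VALID_TYPES = {"f": "regular file", "d": "directory"}


def parse_writable_etc(text: str) -> List[Tuple[str, str]]:
    """Return (type, path relative to /etc) pairs, validating each line."""
    entries = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        # Assumption: blank lines and '#' comments are permitted.
        if not line or line.startswith("#"):
            continue
        try:
            kind, path = line.split(maxsplit=1)
        except ValueError:
            raise ValueError(f"line {lineno}: expected '<type> <path>'") from None
        if kind not in VALID_TYPES:
            raise ValueError(f"line {lineno}: unknown type {kind!r}")
        p = PurePosixPath(path)
        # Paths must stay inside /etc: no absolute paths, no '..' components.
        if p.is_absolute() or ".." in p.parts:
            raise ValueError(f"line {lineno}: path must be relative to /etc")
        entries.append((kind, str(p)))
    return entries
```

Rejecting absolute paths and `..` components keeps every entry confined to `/etc`, which matches the requirement that the paths be relative.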
Rather than hard-code the GPT partition label into the `init-storage`
and `factory-reset` scripts, these now determine the block device by
reading `/etc/fstab` and using the device specified for `/var`.
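A minimal Python sketch of the device lookup follows. It assumes that the `/var` entry in `/etc/fstab` names its device either directly or with a tag such as `PARTLABEL=`, and it maps tags to the `/dev/disk/by-*` symlinks maintained by udev. The helper name `var_device_from_fstab` and the sample fstab are hypothetical; the real scripts are not written in Python.

```python
# Hypothetical helper: find the device for the fstab entry mounted at /var.
TAG_DIRS = {
    "UUID": "/dev/disk/by-uuid",
    "LABEL": "/dev/disk/by-label",
    "PARTUUID": "/dev/disk/by-partuuid",
    "PARTLABEL": "/dev/disk/by-partlabel",
}


def var_device_from_fstab(text: str, mountpoint: str = "/var") -> str:
    for raw in text.splitlines():
        line = raw.strip()
        # fstab(5): blank lines and lines starting with '#' are ignored.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 2 or fields[1] != mountpoint:
            continue
        spec = fields[0]
        tag, sep, value = spec.partition("=")
        if sep and tag in TAG_DIRS:
            # Resolve TAG=value through the udev symlink directory.
            return f"{TAG_DIRS[tag]}/{value}"
        return spec
    raise LookupError(f"no fstab entry mounted at {mountpoint}")


# Hypothetical fstab contents for illustration.
fstab = """\
# <device>       <mount> <type> <options> <dump> <pass>
/dev/mmcblk0p2   /       ext4   ro        0      1
PARTLABEL=data   /var    btrfs  defaults  0      0
"""
print(var_device_from_fstab(fstab))  # /dev/disk/by-partlabel/data
```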
It turns out that we cannot use `systemd-tmpfiles` to create our Btrfs
subvolumes. Since the directories we are interested in, specifically
`/var/log` and `/var/tmp`, already exist in the rootfs image and are
therefore copied into the mutable filesystem, `systemd-tmpfiles` ignores
them.
To avoid having to explicitly specify the SELinux context for each
subvolume created on the persistent filesystem, `init-storage` now
executes `setfiles` to set the appropriate labels.
The `set-root-password` command sets up an alternate mount namespace
with a writable `/etc` directory and then runs `passwd` in it. This
allows `passwd` to create its lock files and backup files without
requiring the real `/etc` to be mutable. After `passwd` finishes
and has updated its private copy of `/etc/shadow`, the script rewrites
the real one with its contents.
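The final step, copying the private `/etc/shadow` back over the real one, should write into the existing file rather than create a new file and rename it. The real `/etc` is read-only, so no temporary file can be created there. In addition, `/etc/shadow` is bind-mounted from the data volume, and on Linux renaming over a mount point fails with `EBUSY`. Below is a Python sketch of such an in-place copy. The function name and the roles of its arguments are hypothetical, and the actual command is not written in Python.

```python
import os
import shutil


def rewrite_in_place(src: str, dst: str) -> None:
    """Overwrite dst with the bytes of src, keeping dst's inode.

    Hypothetical helper: dst stands for the real (bind-mounted) /etc/shadow,
    src for the copy that passwd updated inside the private mount namespace.
    """
    with open(src, "rb") as fsrc, open(dst, "r+b") as fdst:
        shutil.copyfileobj(fsrc, fdst)  # writes starting at offset 0
        fdst.truncate()                 # drop leftovers if the new file is shorter
        fdst.flush()
        os.fsync(fdst.fileno())         # push the data out to the data volume
```

Because the inode is reused, the file keeps its owner, mode and SELinux label, so no relabelling is needed afterwards.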
In order for users to be able to log in locally or via SSH without an
authorized key, they will need to have passwords set in `/etc/shadow`.
We do not really want to make all of `/etc` writable, so we will store
the actual `shadow` file on the persistent data volume, in a separate
Btrfs subvolume, and then bind-mount it at `/etc/shadow`.
While this makes `/etc/shadow` mutable, it does not actually let the
`passwd` program modify it. This is because `passwd` creates lock files
and backup files in `/etc`. We will ultimately need a wrapper to
"trick" `passwd` into modifying `/etc/shadow`, without making the whole
`/etc` directory mutable.
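One way to express the bind mount is a systemd mount unit. Its name must be derived from the escaped `Where=` path, so `/etc/shadow` becomes `etc-shadow.mount`. The Python sketch below renders such a unit. The source path on the data volume is a hypothetical placeholder, the escaping is a simplified version of `systemd-escape --path` that only handles already-normalized absolute paths, and nothing here is taken from the actual Aimee OS build.

```python
from typing import Tuple


def escape_path(path: str) -> str:
    """Simplified `systemd-escape --path` for a normalized absolute path."""
    trimmed = path.strip("/")
    if not trimmed:
        return "-"
    out = []
    for i, ch in enumerate(trimmed):
        if ch == "/":
            out.append("-")
        elif ch.isascii() and (ch.isalnum() or ch in ":_.") and not (i == 0 and ch == "."):
            out.append(ch)
        else:
            # Everything else, including '-', becomes \xNN per UTF-8 byte.
            out.extend(f"\\x{b:02x}" for b in ch.encode())
    return "".join(out)


def bind_mount_unit(what: str, where: str) -> Tuple[str, str]:
    """Return (unit name, unit text) for a bind mount of `what` onto `where`."""
    name = escape_path(where) + ".mount"
    text = (
        "[Unit]\n"
        f"Description=Bind mount {where} from persistent storage\n"
        f"RequiresMountsFor={what}\n"
        "\n"
        "[Mount]\n"
        f"What={what}\n"
        f"Where={where}\n"
        "Type=none\n"
        "Options=bind\n"
    )
    return name, text


# Hypothetical location of the shadow subvolume on the data volume.
name, unit = bind_mount_unit("/var/lib/aimee-os/shadow/shadow", "/etc/shadow")
print(name)  # etc-shadow.mount
```

`RequiresMountsFor=` makes the unit wait for the filesystem that holds the source file, so the bind mount is not attempted before the data volume is mounted.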
Apparently, BusyBox's `cp` does NOT copy SELinux contexts when the `-a`
argument is specified. This differs from GNU coreutils's `cp`, and
explains why the files copied from the rootfs image to the persistent
storage volume were not being labelled correctly. The `-c` argument is
required.
Now that files are labelled correctly when they are copied, the step to
run `restorecon` is no longer necessary.
We're going to want the ability for processes to have unique categories,
to enforce separation of container processes. Gentoo's SELinux policy
supports both Multi-Category Security and Multi-Level Security modes,
although the latter does not seem to work out of the box.
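A common way to give each container its own categories, in the style of sVirt, is to reserve a random, unused pair of categories and use it as the process's MCS level. The Python sketch below assumes the usual `c0` to `c1023` category range and the sensitivity `s0`. The function name is hypothetical and is not part of the Gentoo policy or Aimee OS.

```python
import random
from typing import Optional, Set

CATEGORY_COUNT = 1024  # assumption: the policy defines categories c0..c1023


def allocate_level(in_use: Set[str], rng: Optional[random.Random] = None,
                   sensitivity: str = "s0", attempts: int = 10_000) -> str:
    """Reserve a unique two-category MCS level such as 's0:c12,c345'."""
    rng = rng or random.Random()
    for _ in range(attempts):
        # Two distinct categories, written in ascending order.
        low, high = sorted(rng.sample(range(CATEGORY_COUNT), 2))
        level = f"{sensitivity}:c{low},c{high}"
        if level not in in_use:
            in_use.add(level)
            return level
    raise RuntimeError("no free category pair found")
```

Under the usual MCS constraints, a process at `s0:c12,c345` can only access objects whose categories are a subset of `{c12, c345}`. Two containers with different pairs therefore cannot touch each other's files, even though they run with the same type.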
*systemd-tmpfiles* can create Btrfs subvolumes with the `v` entry type.
Using this mechanism instead of the `init-storage` script will allow for
greater flexibility when adding other subvolumes later.
Unfortunately, the default configuration for *systemd-tmpfiles* already
includes an entry for `/var/log` with the `d` (directory) type. Since
individual entries cannot be overridden, we need to modify this entry.
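Per tmpfiles.d(5), a file in `/etc/tmpfiles.d` can only replace a same-named file from `/usr/lib/tmpfiles.d` as a whole. Changing the `/var/log` entry therefore means editing or replacing the file that contains it. The following Python sketch shows that edit. It assumes the entry looks like `d /var/log 0755 - - -`, and the helper name is hypothetical.

```python
def retype_tmpfiles_entry(text: str, path: str, new_type: str = "v") -> str:
    """Change the type letter of the tmpfiles.d line for `path`.

    Modifier characters after the type letter (e.g. '!' or '-') are kept,
    as are all remaining columns. Comment lines are left untouched.
    """
    out = []
    changed = False
    for line in text.splitlines(keepends=True):
        fields = line.split()
        if len(fields) >= 2 and not fields[0].startswith("#") and fields[1] == path:
            indent = line[: len(line) - len(line.lstrip())]
            rest = line.lstrip()[len(fields[0]):]
            out.append(indent + new_type + fields[0][1:] + rest)
            changed = True
        else:
            out.append(line)
    if not changed:
        raise LookupError(f"no tmpfiles.d entry for {path}")
    return "".join(out)
```

Matching on the exact path column ensures that entries for paths below `/var/log`, such as a `README` symlink, are left alone.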
The `factory-reset` command provides a way to completely wipe the data
partition, thus erasing any local configuration and state. The command
itself simply enables a special systemd service unit that is activated
during the shutdown process. This unit runs a script after all
filesystems except the rootfs have been unmounted. The script erases the
signature of the filesystem on the data partition, so it will appear
blank the next time the system boots. This triggers the
`init-storage` process, which creates a new filesystem on the partition.
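Erasing a filesystem signature only requires clobbering the magic bytes that probing tools such as `blkid` look for. The data itself is left in place. In practice a tool like `wipefs` from util-linux does this for any filesystem type. For Btrfs specifically, the magic `_BHRfS_M` sits 0x40 bytes into each superblock copy, and the copies are at 64 KiB, 64 MiB and 256 GiB. The Python sketch below zeroes those magics on a device or image file. It is an illustration of the idea, not the script Aimee OS actually runs.

```python
import os

BTRFS_MAGIC = b"_BHRfS_M"
# Superblock copies live at 64 KiB, 64 MiB and 256 GiB from the start.
SUPERBLOCK_OFFSETS = (64 * 1024, 64 * 1024**2, 256 * 1024**3)
MAGIC_OFFSET = 0x40  # magic field offset inside each superblock


def erase_btrfs_signature(device: str) -> int:
    """Zero every Btrfs superblock magic found; return how many were erased."""
    erased = 0
    with open(device, "r+b") as dev:
        size = dev.seek(0, os.SEEK_END)
        for sb in SUPERBLOCK_OFFSETS:
            pos = sb + MAGIC_OFFSET
            if pos + len(BTRFS_MAGIC) > size:
                break  # the device is too small for this (and later) copies
            dev.seek(pos)
            if dev.read(len(BTRFS_MAGIC)) == BTRFS_MAGIC:
                dev.seek(pos)
                dev.write(b"\0" * len(BTRFS_MAGIC))
                erased += 1
        dev.flush()
        os.fsync(dev.fileno())
    return erased
```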
There's no particular reason why the directory used as the temporary
mount point for the data volume needs to be random. Using a static
name, on the other hand, makes it easier for the SELinux policy to
apply the correct type transition and ensure the directory is labelled
correctly.
For some reason, when OverlayFS is mounted at `/etc/ssh`, SELinux
denies both `sshd` and `ssh-keygen` access to the files there.
The AVC denials indicate that (some part of) the process is running in
the `mount_t` domain, which is not allowed to read or write `sshd_key_t`
files.
To work around this issue, without granting `mount_t` overly-permissive
access, we now configure the SSH daemon to read host keys from the
persistent data volume directly, instead of "tricking" it with
OverlayFS. The `ssh-keygen` tool does not read the `HostKey` options
from `sshd_config`, though, so it has to be explicitly instructed to
create keys in this alternate location. By using a systemd template
unit with `ConditionPathExists`, we avoid regenerating the keys on every
boot, since the `ssh-keygen` command is only run if the file does not
already exist.
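The per-key-type logic of the template unit can be modelled as follows. For each key type, which corresponds to the template instance `%i`, `ssh-keygen` runs only if the key file is missing. This is what a `ConditionPathExists=!` line followed by the key path expresses in the unit. The key directory, the list of key types and the function names below are hypothetical. The same paths would also need to appear as `HostKey` lines in `sshd_config`.

```python
import os
from typing import Iterable, List

# Assumption: key types to generate; keep in sync with sshd_config.
KEY_TYPES = ("ecdsa", "ed25519", "rsa")


def keygen_commands(key_dir: str, key_types: Iterable[str] = KEY_TYPES) -> List[List[str]]:
    """Return ssh-keygen invocations for host keys that do not exist yet."""
    commands = []
    for kind in key_types:
        key_path = os.path.join(key_dir, f"ssh_host_{kind}_key")
        # Equivalent of ConditionPathExists=!<key_path>: skip existing keys.
        if os.path.exists(key_path):
            continue
        # -q: quiet, -N "": empty passphrase, -f: explicit output file,
        # because ssh-keygen does not read HostKey from sshd_config.
        commands.append(["ssh-keygen", "-q", "-t", kind, "-N", "", "-f", key_path])
    return commands


def hostkey_lines(key_dir: str, key_types: Iterable[str] = KEY_TYPES) -> List[str]:
    """sshd_config lines pointing sshd at the same persistent key files."""
    return [f"HostKey {os.path.join(key_dir, f'ssh_host_{kind}_key')}" for kind in key_types]
```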
Enabling SELinux on the target system requires build-time and run-time
configuration changes for the kernel and userspace. Additionally,
SELinux requires a policy that defines allowed operations. Gentoo
provides a reasonable baseline for all of these changes, but some
modifications are required.
First and foremost, the Gentoo SELinux policy is missing several
necessary rules for systemd-based systems. Notably, services that use
alternate namespaces will fail to start because the base policy does not
grant systemd components the necessary privileges, so these rules have
to be added. Similarly, `systemd-journald` needs additional privileges
in order to be able to capture all metadata for processes generating
syslog messages. Finally, additional rules are necessary in order to
allow systemd to create files and directories prior to launching
services.
Besides patching the policy, we also do some hackery to avoid shipping
the Python runtime in SELinux-enabled builds. Several SELinux-related
packages, including *libselinux* and *policycoreutils*, have dependencies
on Python modules for some of their functionality. Unfortunately, the
Python build system does NOT properly cross-compile native extension
modules, so this functionality is not available on the target system.
Fortunately, none of the features provided by these modules are actually
needed at runtime, so we can safely ignore them and thus omit the entire
Python runtime and all Python programs from the final image.
Note that it is generally not possible to build an
SELinux-enabled image on a host that is itself SELinux-enabled.
Operations such as changing file labels are checked against the SELinux
policy in the running kernel, and may be denied if the target policy
differs significantly from the running policy. The `setfiles` command
fails, for example, when run on a Fedora host. As such, building an
SELinux-enabled system should be done in a virtual machine using a
kernel that does not have a loaded SELinux policy. The `ocivm` script
can be used to create a suitable runtime from a container image.
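A rough pre-flight check for the build environment can look for a mounted `selinuxfs` in `/proc/mounts`. The Python sketch below is a heuristic and not part of the Aimee OS tooling. A mounted selinuxfs strongly suggests an SELinux-enabled host, but it does not by itself prove that a policy is loaded.

```python
def selinuxfs_mounted(mounts_text: str) -> bool:
    """True if any /proc/mounts entry has the filesystem type 'selinuxfs'."""
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fstype, options, dump, pass
        if len(fields) >= 3 and fields[2] == "selinuxfs":
            return True
    return False
```

A build script could read `/proc/mounts` and refuse to continue when this returns `True`, pointing the user at a virtual machine instead.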
There's really no sense in creating a writable copy of the whole `/etc`
hierarchy at `/run/etc/rw`. Instead, let's just mount overlays at the
paths we want to make writable (which for now is only `/etc/ssh`).
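Mounting an overlay at a path such as `/etc/ssh` uses the original directory as the read-only lower layer and a writable upper layer stored elsewhere. The Python sketch below builds the corresponding `mount` command. The `/run/etc-overlay` state directory is a hypothetical choice. Before mounting, the upper and work directories must exist and be on the same filesystem, and the work directory must be empty.

```python
import os
from typing import List, Tuple

STATE_ROOT = "/run/etc-overlay"  # hypothetical location for upper/work dirs


def overlay_mount(target: str, state_root: str = STATE_ROOT) -> Tuple[List[str], List[str]]:
    """Return (directories to create, mount argv) for a writable overlay at target."""
    name = target.strip("/").replace("/", "-")
    upper = os.path.join(state_root, name, "upper")
    work = os.path.join(state_root, name, "work")
    # lowerdir is the original read-only directory; the overlay is mounted
    # on top of that same path, so the path itself becomes writable.
    options = f"lowerdir={target},upperdir={upper},workdir={work}"
    argv = ["mount", "-t", "overlay", "overlay", "-o", options, target]
    return [upper, work], argv
```

Keeping the upper layer under `/run` means changes vanish at reboot. Pointing `state_root` at the persistent data volume would make them survive.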