Gitea package names (e.g. OCI images) can contain `/` characters.
These are encoded as `%2F` in request paths. Apache needs to forward
these sequences to the Gitea server without decoding them.
Unfortunately, the `AllowEncodedSlashes` setting, which controls this
behavior, is a per-virtualhost setting that is *not* inherited from the
main server configuration, and therefore must be explicitly set inside
the `VirtualHost` block. This means Gitea needs its own virtual host
definition, and cannot rely on the default virtual host.
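A minimal sketch of the relevant part of such a virtual host (the server
name, the Gitea backend address/port, and the omission of TLS directives
are all illustrative):

```
<VirtualHost *:443>
    ServerName git.example.com
    # Not inherited from the main server config; must be set per vhost
    AllowEncodedSlashes NoDecode
    # nocanon keeps mod_proxy from re-encoding the request path
    ProxyPass / http://127.0.0.1:3000/ nocanon
    ProxyPassReverse / http://127.0.0.1:3000/
</VirtualHost>
```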
When I added the *systemd-networkd* configuration for the Kubernetes
network interface on the VM hosts, I only added the `.netdev`
configuration and forgot the `.network` part. Without the latter,
*systemd-networkd* creates the interface, but does not configure or
activate it, so it is not able to handle traffic for the VMs attached to
the bridge.
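A minimal `.network` unit that would have activated the bridge might
look like this (the interface name *br-kube* is an assumption):

```
# /etc/systemd/network/br-kube.network
[Match]
Name=br-kube

[Network]
# The host needs no address on this bridge; just bring the link up so
# the attached VM interfaces can pass traffic
LinkLocalAddressing=no
ConfigureWithoutCarrier=yes
```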
The *netboot/jenkins-agent* Ansible role configures three NBD exports:
* A single, shared, read-only export containing the Jenkins agent root
filesystem, as a SquashFS filesystem
* For each defined agent host, a writable data volume for Jenkins
workspaces
* For each defined agent host, a writable data volume for Docker
Each agent host needs a unique identifier to distinguish its persistent
data volumes. Raspberry Pi devices, for example, can use the SoC serial
number.
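A sketch of the resulting nbd-server configuration, using a Raspberry Pi
serial number as the identifier (export names, paths, and the serial
number are all illustrative):

```
# /etc/nbd-server/config
[generic]
user = nbd
group = nbd

# Shared, read-only root filesystem image
[jenkins-agent]
exportname = /srv/netboot/jenkins-agent.squashfs
readonly = true

# Per-host writable volumes, keyed by the SoC serial number
[workspace-10000000abcdef01]
exportname = /srv/netboot/workspace-10000000abcdef01.img

[docker-10000000abcdef01]
exportname = /srv/netboot/docker-10000000abcdef01.img
```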
The `-external.url` and `-external.alert.source` command line arguments
and their corresponding environment variables can be used to configure
the "Source" links associated with alerts created by `vmalert`.
The firewall hardware is too slow to run the *prometheus_speedtest*
program. It always showed *way* lower speeds than were actually
available. I've moved the service to the Kubernetes cluster and it
works a lot better there.
*mtrcs0.pyrocufflink.red* is a Raspberry Pi CM4 on a Waveshare
CM4-IO-BASE-B carrier board with an NVMe SSD. It runs a custom OS built
using Buildroot, and is not a member of the *pyrocufflink.blue* AD
domain.
*mtrcs0.p.r* hosts VictoriaMetrics/`vmagent`, `vmalert`, AlertManager,
and Grafana. I've created a dedicated group and playbook for it,
*metricspi*, to manage all these applications together.
In addition to ignoring particular types of filesystems, e.g. OverlayFS,
we can also ignore filesystems by their mount point. This could be
useful, for example, for bind-mounted directories, such as those used on
Kubernetes nodes.
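Assuming this refers to collectd's *df* plugin, the configuration might
look roughly like this (the mount point and filesystem types are
illustrative):

```
<Plugin df>
  # Skip uninteresting filesystem types
  FSType "overlay"
  FSType "squashfs"
  # Skip a bind-mounted directory by its mount point
  MountPoint "/var/lib/kubelet"
  IgnoreSelected true
</Plugin>
```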
There is no specific playbook or role for Kubernetes. All OS
configuration is done at install time via kickstart scripts, and
deploying Kubernetes itself is done (manually) using `kubeadm init` and
`kubeadm join`.
Adding a second camera to the back yard, on the North side of the porch,
to try and figure out how the possums keep getting under the porch even
with the chicken wire around it!
We're trying to discover how the possums are getting into and out of the
house. Let's enable continuous video recording from the back yard
camera so we can observe them and come up with a plan to get rid of
them.
To handle the RSVP form on *dustinandtabitha.com*, we are going to use
*formsubmit*. It runs on the same machine that hosts the website, so
there's no dealing with CORS. The */submit/rsvp* path, which is proxied
to the backend, is the RSVP form's target.
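The proxy configuration is probably as simple as this (the backend
address and port for *formsubmit* are assumptions):

```
ProxyPass        /submit/rsvp http://127.0.0.1:8080/submit/rsvp
ProxyPassReverse /submit/rsvp http://127.0.0.1:8080/submit/rsvp
```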
The state history database is entirely too big. It takes over an hour
to create a backup of it, which usually causes BURP to time out. The
data it stores isn't particularly interesting anyway. Instead of trying
to back it up and ultimately not getting any backup at all, we'll just
skip it altogether to ensure we have a consistent backup of everything
else that is actually important.
If Nextcloud does not have the Internet-facing reverse proxy listed in
its "trusted proxies" setting, it will mark all traffic as being from
the proxy itself. This breaks brute force detection, etc.
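The fix amounts to listing the proxy in `config.php` (the address shown
is illustrative):

```
'trusted_proxies' => [
    '172.30.0.6',
],
```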
Transitioning from push-based to pull-based monitoring with
Prometheus/collectd. The *write_prometheus* plugin will be installed on
all hosts, and Prometheus will be configured to scrape them directly.
Having this option enabled dramatically improves the reliability of
collectd multicast traffic from physical machines and VMs on a separate
VM host from the receiving machine.
1. Set a password for *root* on all machines (useful for logging in via
serial console if network is down)
2. Set an authorized SSH key for root on all machines:
* For Fedora 34, use my FIDO2 security token key
* For all other hosts, use my ED25519 key
Filesystems like NFS and CIFS require "helper" utilities (i.e.
`mount.nfs` and `mount.cifs`, respectively). These need to be installed
in order for a system to be able to mount those filesystems.
The current shared storage system uses NFSv4, and as such, the
*nfs-utils* package needs to be installed on the VM hosts.
Originally, the network configuration for the VM networks and the
storage network was configured using the *netifaces* role. This has
effectively stopped working in recent versions of Fedora, as it relied
on `dhcpcd`, which has not been maintained in Fedora for some time
and no longer behaves correctly. After evaluating *NetworkManager* as a
replacement, I decided that *systemd-networkd* is a more appropriate
solution.
There are effectively two "layers" of network configuration needed for
the VM hosts: the host-specific settings, and the common settings. The
host-specific settings include such properties as the IP address of the
management interface and the names of the physical ports that make up
the bonded interfaces. The common settings are the bonded interfaces,
the VLAN interfaces created on top of the bond, and the bridges that
provide access to VMs.
To configure the host-specific settings, each host simply needs the
appropriate `networkd_*` variables in its `host_vars` file. For the
common settings, we apply the *systemd-networkd* role again in the
`vmhost.yml` playbook with different values for these variables. Thus,
effectively, `systemd-networkd.yml` manages the host-specific settings,
while `vmhost.yml` manages the common settings.
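Roughly, that second application looks like the following; the exact
variable schema belongs to the *systemd-networkd* role, so the names
prefixed with `vmhost_` are assumptions:

```
# vmhost.yml (sketch)
- hosts: vm-hosts
  roles:
    - role: systemd-networkd
      vars:
        networkd_netdevs: "{{ vmhost_networkd_netdevs }}"
        networkd_networks: "{{ vmhost_networkd_networks }}"
```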
I couldn't get RTMP to work on the Back Yard camera because the `ffmpeg`
process kept crashing:
```
ffmpeg.back_yard.clips_rtmp ERROR : av_interleaved_write_frame(): Connection reset by peer
ffmpeg.back_yard.clips_rtmp ERROR : [flv @ 0x5562090c8ec0] Failed to update header with correct duration.
ffmpeg.back_yard.clips_rtmp ERROR : [flv @ 0x5562090c8ec0] Failed to update header with correct filesize.
ffmpeg.back_yard.clips_rtmp ERROR : Error writing trailer of rtmp://127.0.0.1/live/back_yard: Connection reset by peer
watchdog.back_yard INFO : Terminating the existing ffmpeg process...
watchdog.back_yard INFO : Waiting for ffmpeg to exit gracefully...
```
I thought increasing the value of the `--shm-size` argument for `podman`
would help, but even going as high as 1024 mebibytes did not resolve the
problem.
Ultimately, I decided that it is not really necessary to view the full
4k stream in real time. The back yard camera supports three streams, so
I set them all up for different roles. I briefly considered using a
single 1080p stream for both object detection and RTMP streaming, but
this consumed considerable CPU time, so I decided against it for now. I
may re-evaluate that option if I decide to purchase a TPU.
*nvr0.pyrocufflink.blue* hosts Frigate. It is deployed on a separate
subnet, for two reasons:
* To avoid streaming video from the cameras through the firewall
* To prevent any hosts on the LAN except Home Assistant from
communicating with Frigate, since it does not have any kind of
authentication or access control
For hosts that cannot send metrics via multicast (e.g. because they are
on a different subnet), *collectd* needs to listen on the all-hosts
unicast address.
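In collectd's *network* plugin, that amounts to adding a unicast
`Listen` statement alongside the multicast one (the multicast group and
port shown are the collectd defaults; listening on `0.0.0.0` is one way
to accept unicast traffic):

```
<Plugin network>
  # Multicast group used by hosts on the local subnet
  Listen "239.192.74.66" "25826"
  # Unicast, for hosts on other subnets
  Listen "0.0.0.0" "25826"
</Plugin>
```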
ZwaveJS2Mqtt includes a very powerful web-based UI for configuring and
controlling the Z-Wave network. This functionality is no longer
available within Home Assistant itself, so being able to access the
ZwaveJS2Mqtt UI is crucial to operating the network.
I wanted to make the UI available at */zwave/*, which requires using
*mod_rewrite* to conditionally proxy requests based on the `Connection`
HTTP header, since the UI passes both HTTP and WebSocket requests to the
same paths. *mod_rewrite* configuration is not inherited from the main
server configuration to virtual hosts, so the
`RewriteRule`/`RewriteCond` directives have to be specified within the
`<VirtualHost>` block. This means that the Home Assistant proxy
configuration has to be within its own virtual host, and the
ZwaveJS2Mqtt configuration has to be there as well.
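The conditional proxy looks roughly like this inside the
`<VirtualHost>` block (the `/zwave/` prefix comes from above; the
backend port 8091 is ZwaveJS2Mqtt's default and is an assumption here):

```
RewriteEngine On
RewriteCond %{HTTP:Connection} upgrade [NC]
RewriteCond %{HTTP:Upgrade} websocket [NC]
RewriteRule ^/zwave/(.*)$ ws://127.0.0.1:8091/$1 [P,L]
ProxyPass        /zwave/ http://127.0.0.1:8091/
ProxyPassReverse /zwave/ http://127.0.0.1:8091/
```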
Mosquitto 2.x introduced two significant changes from 1.6:
* There is no longer a "default" listener; all listeners are configured
in the same way
* The daemon drops privileges *before* reading TLS certificates and
private keys
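A minimal `mosquitto.conf` reflecting both changes (paths and ports are
illustrative):

```
# No implicit default listener in 2.x; every listener is declared
# explicitly
listener 1883

listener 8883
# Privileges are dropped before these files are read, so they must be
# readable by the mosquitto user
certfile /etc/mosquitto/certs/server.crt
keyfile  /etc/mosquitto/certs/server.key
cafile   /etc/mosquitto/certs/ca.crt
```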
Although configuration policy is not yet available for Prometheus
itself, the `collectd.yml` playbook also uses the *prometheus* host
group. Specifically, hosts in this group are configured to receive
collectd data from other hosts and expose those data through the
`write_prometheus` plugin.
This commit introduces the *grafana* role and the corresponding
`grafana.yml` playbook. The role installs Grafana using the system
package manager, and configures the server (including LDAP
authentication).
Since the Nextcloud configuration file is managed by the configuration
policy, all of the settings configurable through the web UI need to be
templated. One important group of settings is the outbound email
configuration. This can now be configured using the `nextcloud_smtp`
Ansible variable.
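The shape of the variable is roughly as follows; the exact keys are
defined by the role's template, so treat these as placeholders:

```
nextcloud_smtp:
  host: smtp.example.com
  port: 587
  secure: tls
  from_address: nextcloud
  domain: example.com
```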
Fedora now includes a packaged version of Nextcloud. This will be
_much_ easier to maintain than the tarball-based distribution method.
There are some minor differences in how the Fedora package works,
compared to the upstream tarball. Notably, it puts the configuration
file in `/etc/` and makes it read-only, and it stores persistent data
separate from the application. These differences require modifications
to the Apache and PHP-FPM configuration, but the package also includes
examples to make this easier. Since the `config.php` is read-only now,
it has to be managed by the configuration policy; it cannot be modified
by the Administration web UI.
*Mosquitto* implements an MQTT server. It is the recommended
implementation for using MQTT with Home Assistant.
I have added this role to deploy Mosquitto on the Home Assistant server.
It will be used to send data from custom sensors, such as the
temperature/pressure/humidity sensor connected to the living room wall
display.
When there is a network issue that prevents DNS names from being
resolved, it can be difficult to troubleshoot. For example, last night,
the Samba domain controller crashed, so *pyrocufflink.blue* names were
unavailable. Furthermore, the domain controller VM was apparently
locked up, so I could not SSH into it directly, and it needed to be
rebooted. Since the VM host's name did not resolve, I could not find
its address to log into it and reboot the VM. I resorted to scanning
the SSH keys of every IP address on the network until I found the one
that matched the cached key in `~/.ssh/known_hosts`. This was cumbersome
and annoying.
Assigning DHCP reservations to the VM hosts will ensure that when a
situation like this arises again, I can quickly connect to the correct
VM host and manage its virtual machines, as its address is recorded in
the configuration policy.
The *synapse* role and the corresponding `synapse.yml` playbook deploy
Synapse, the reference Matrix homeserver implementation.
Deploying Synapse itself is fairly straightforward: it is packaged by
Fedora and therefore can simply be installed via `dnf` and started by
`systemd`. Making the service available on the Internet, however, is
more involved. The Matrix protocol mostly works over HTTPS on the
standard port (443), so a typical reverse proxy deployment is mostly
sufficient. Some parts of the Matrix protocol, however, involve
communication over an alternate port (8448). This could be handled by a
reverse proxy as well, but since no other service uses that port, it
could also be handled by NAT/port forwarding. In order to support both
deployment scenarios (as well as the hypothetical scenario wherein the
Synapse machine is directly accessible from the Internet), the *synapse*
role supports specifying an optional `matrix_tls_cert` variable. If
this variable is set, it should contain the path to a certificate file
on the Ansible control machine that will be used for the "direct"
connections (i.e. on port 8448). If it is not set, the default Apache
certificate will be used for both virtual hosts.
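For example, the host variables for the Synapse machine might include
the following (the path, relative to the Ansible control machine, is
illustrative):

```
matrix_tls_cert: certs/matrix.example.com.crt
```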
Synapse has a pretty extensive configuration schema, but most of the
options are set to their default values by the *synapse* role. Other
than substituting secret keys, the only exposed configuration option is
the LDAP authentication provider.
Since DNS is only allowed to be sent over the VPN, it is not possible to
resolve the VPN server name unless the VPN is already connected. This
naturally creates a chicken-and-egg scenario, which we can resolve by
manually providing the IP address of the server we want to connect to.
This commit adds a new playbook, `protonvpn.yml`, and its supporting
roles *strongswan-swanctl* and *protonvpn*. This playbook configures
strongSwan to connect to ProtonVPN using IPsec/IKEv2.
With this playbook, we configure the name servers on the Pyrocufflink
network to route all DNS requests through the Cloudflare public DNS
recursive servers at 1.1.1.1/1.0.0.1 over ProtonVPN. Using this setup,
we have the benefit of the speed of using a public DNS server (which is
*significantly* faster than running our own recursive server, usually by
1-2 seconds per request), and the benefit of anonymity from ProtonVPN.
Using the public DNS server alone is great for performance, but allows
the server operator (in this case Cloudflare) to track and analyze usage
patterns. Using ProtonVPN gives us anonymity (assuming we trust
ProtonVPN not to do the very same tracking), but can have a negative
performance impact if it's used for all Internet traffic. By combining
these solutions, we can get the benefits of both!
This commit adds two new variables to the *named* role:
`named_queries_syslog` and `named_rpz_syslog`. These variables control
whether BIND will send query and RPZ log messages to the local syslog
daemon, respectively.
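For example (assuming the variables are simple booleans):

```
named_queries_syslog: true
named_rpz_syslog: true
```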
BIND response policy zones (RPZ) support provides a mechanism for
overriding the responses to DNS queries based on a wide range of
criteria. In the simplest form, a response policy zone can be used to
provide different responses to different clients, or "block" some DNS
names.
For the Pyrocufflink and related networks, I plan to use an RPZ to
implement ad/tracker blocking. The goal will be to generate an RPZ
definition from a collection of host lists (e.g. those used by uBlock
Origin) periodically.
This commit introduces basic support for RPZ configuration in the
*named* role. It can be activated by providing a list of "response
policy" definitions (e.g. `zone "name"`) in the `named_response_policy`
variable, and defining the corresponding zones in `named_zones`.
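An illustrative configuration might look like this; the zone-object
fields follow whatever schema the *named* role already uses, so they are
assumptions here:

```
named_response_policy:
  - zone "adblock.rpz"
named_zones:
  - name: adblock.rpz
    type: master
    file: adblock.rpz.zone
```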
The TCP ports 80 and 443 are NAT-forwarded by the gateway to 172.30.0.6.
This address was originally occupied by *rprx0.pyrocufflink.blue*, which
operated a reverse proxy for Bitwarden, Gitea, Jenkins, Nextcloud,
OpenVPN, and the public web sites. Now, *web0.pyrocufflink.blue* is
configured to proxy for those services that it does not directly host
itself, effectively making it a web server, a reverse proxy, and a
forward proxy (for OpenVPN only). Thus, it must take over this address
in order to receive forwarded traffic from the Internet.
This commit updates the configuration for *pyrocufflink.net* to use the
wildcard certificate managed by *lego* instead of a unique certificate
managed by *certbot*.
*chmod777.sh* is a simple static website, generated by Hugo. It is
built and published from a Jenkins pipeline, which runs automatically
when new commits are pushed to Gitea.
The HTTPS certificate for this site is signed by Let's Encrypt and
managed by `lego` in the `certs` submodule.
The *nextcloud* role installs Nextcloud from the specified release
archive, downloading it to the control machine first if necessary, and
configures Apache and PHP-FPM to serve it.
The `nextcloud.yml` playbook uses the *cert* role to install the X.509
certificate for the Nextcloud server, sets up Apache HTTPD with the
*apache* role, and installs Nextcloud using the *nextcloud* role.
The host *cloud0.pyrocufflink.blue* is the Nextcloud server for
Pyrocufflink.
*burp1.pyrocufflink.blue* will replace *burp0.pyrocufflink.blue* as the
BURP server for Pyrocufflink. It is a physical machine (Fitlet), making
it simpler to manage the USB drives. The old virtual machine will be
decommissioned soon.
Using the generic *burp.pyrocufflink.blue* name will allow easier
transition to a new BURP server. However, since this is not the actual
name, it cannot be used for task delegation, so a separate variable is
required to store the real name of the BURP server. This is only used
during client deployment, and not by BURP itself.
In order to allow Jenkins to connect to the Docker daemon socket, the
socket must be owned by the *docker* group, and the *jenkins* user must
be a member of it.
*serial0.pyrocufflink.blue* has a manually-configured IP address now, to
ensure it always has an address, even if the DHCP server is
unavailable. Recording it here to ensure the address does not
accidentally get reused.
This commit configures *bw0.pyrocufflink.blue* as a BURP client, so that
the Bitwarden data can be backed up. A pre-backup script is used to
take a consistent snapshot of the SQLite database before copying it to
the BURP server.
*dns1.pyrocufflink.blue* has been decommissioned. Having a second DNS
server never really worked correctly for some reason, and the
maintenance overhead of the Raspberry Pi is just not worth it right now.
The DHCP service has been moved to *dns0.pyrocufflink.blue*.
The Zabbix server resolves *localhost* to `::1`, but Postfix resolves it
to `127.0.0.1`. This causes Postfix to reject incoming mail from Zabbix
with "Relay access denied." Explicitly setting the `mynetworks` setting
to include both the IPv4 and IPv6 loopback addresses will ensure that no
mail is rejected from local processes, regardless of how name resolution
happens.
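In `main.cf`, for example:

```
mynetworks = 127.0.0.0/8, [::1]/128
```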
*hass0.pyrocufflink.blue* is a virtual machine that runs Home Assistant.
It is dual-homed on the *pyrocufflink.blue* network and the isolated IoT
network.
The `smtp_mynetworks` variable expects a list. Setting it to a string
resulted in each character in the string being interpreted as an item in
the list.
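For example (addresses are illustrative):

```
# Wrong: a string is iterated character by character
#smtp_mynetworks: "127.0.0.0/8 [::1]/128"

# Right: a list
smtp_mynetworks:
  - 127.0.0.0/8
  - "[::1]/128"
```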
This commit sets the `apache_userdir` variable, which enables the
per-user directories feature. This allows users to serve content via
HTTP by placing it in the `public_html` directory within their home
directories.
Although Apache is already installed on the file server in order to
serve the Aria2 web UI, it is not explicitly included in the
`fileserver.yml` playbook.
FirewallD cannot be configured to allow traffic to be routed through the
system without NAT. This makes it unsuitable for running on a VPN
concentrator. Thus, any role that would configure FirewallD needs to be
informed that this machine does not use it.
The Zabbix server also runs an SMTP relay, to minimize reliance on
external services when sending notifications. Since it inherits the
relay configuration from the general *smtp-relay* group, it ends up
allowing all hosts to relay mail through it. To avoid this, we set
`smtp_rmynetworks` at the *zabbix-server* group level to only allow the
local machine to relay messages.
The VPN capability of the UniFi Security Gateway is extremely limited.
It does not support road-warrior IPsec/IKEv2 configuration, and its
OpenVPN configuration is inflexible. As with DHCP, the best solution is
to simply move service to another machine.
To that end, I created a new VM, *vpn0.pyrocufflink.blue*, to host both
strongSwan and OpenVPN. For this to work, the necessary TCP/UDP ports
need to be forwarded, of course, and all of the remote subnets need
static routes on the gateway, specifying this machine as the next hop.
Additionally, ICMP redirects need to be disabled, to prevent confusing
the routing tables of devices on the same subnet as the VPN gateway.
The DHCP server on the UniFi Security Gateway is pretty limited; it
cannot manage static leases (reservations), and does not offer any way
to build dynamic values for e.g. hostname or boot filename. Rather than
give up these features, I decided to just move the DHCP server to one of
the Raspberry Pis; the DNS server made the most sense.
To facilitate this move, I created the *pyrocufflink-dhcp* host group,
and moved the DHCP configuration variables there. Thus, it was a simple
matter of adding *dns1.pyrocufflink.blue* to this group to relocate the
service.
Of course, to serve clients on the other subnets, the gateway needs to
have DHCP relay enabled and pointing to the new server.
This commit updates the list of FireMon networks to include the Caverns
Production (172.16.0.0/24) and Caverns Admin (172.24.16.0/20) networks.
This is necessary to ensure OpenVPN routes are created for these
networks.
The *aria2* role installs the *aria2* download manager and sets it up to
run as a system service with RPC enabled. It also sets up the web UI,
though that must be installed manually from an archive, for now.
In order to support adding a second DNS server, the BIND zone
configuration needs to be partially modularized. While the forwarder
definitions for *pyrocufflink.blue*, etc. will remain the same, the
*pyrocufflink.red* zone will be different, as it will be a slave zone on
the second server. This commit breaks up the definition of the
`named_zones` variable into two parts:
* `pyrocufflink_red_zones`: This is a list of zone objects for
*pyrocufflink.red* and its corresponding reverse zone. On
*dns1.pyrocufflink.blue*, these are master zones. On the new server,
these will be slaves.
* `pyrocufflink_common_zones`: This is a list of zone objects for the
zones that are the same on both servers, since they are all forwarding
zones.
Similarly, the `named_keys` variable only needs to be defined on the
master, since DHCP will only send updates there.
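Sketched out, the split looks something like this; the zone-object
fields are placeholders that follow the *named* role's schema:

```
pyrocufflink_red_zones:
  - name: pyrocufflink.red
    type: master        # slave on the second server
  - name: 0.30.172.in-addr.arpa
    type: master
pyrocufflink_common_zones:
  - name: pyrocufflink.blue
    type: forward
named_zones: "{{ pyrocufflink_red_zones + pyrocufflink_common_zones }}"
```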
*file0.pyrocufflink.blue* hosts Syncthing. Forwarding the transport is
not strictly required, as Syncthing can use relays to encapsulate
traffic in HTTPS, but allowing direct access improves performance.
The `burp-client.yml` and `burp-server.yml` playbooks apply the
*burp-client* and *burp-server* roles to BURP clients and servers,
respectively. The server playbook also applies the *postfix* role to
ensure that SMTP is configured and backup notifications can be sent.
Usually, the *samba* role is deployed as a dependency of the *winbind*
role, which explicitly sets `samba_security` to `ads`. The new
*fileserver* role also depends on the *samba* role, but it does NOT set
that variable. This can cause `smb.conf` to be rewritten with a
different value whenever one or the other role is applied.
Explicitly setting the `samba_security` variable at the group level
ensures that the value is consistent no matter how the *samba* role is
applied. Since all domain member machines need the same value,
regardless of what function they perform, this is safe.
The UniFi controller has been moved to a Raspberry Pi on the Management
network. This machine needs a static address to use in the "inform URL"
it sends to managed devices.
The Management network (VLAN 10, 172.30.0.240/28) will be used for
communication with and configuration of network devices including
switches and access points. This keeps configuration separate from
normal traffic, and allows complete isolation of infrastructure devices.
Converting the *pyrocufflink* group variables definition from a file to
a directory will allow Jenkins jobs to place a Vault-encrypted file
within it that defines the `ansible_become_password` variable. In this
way, a different password can be used for machines that are members of
the *pyrocufflink.blue* domain than for other hosts. The existing
mechanism of specifying the path to the Vault-encrypted file that
defines the variable allows only a single password to be defined, so it
does not work when multiple machines in the same play have different
passwords.
Since Gitea servers may be exposed directly to the Internet, it is
important to prevent SSH tunneling, lest the server become an ingress
point into the network.
Additionally, the *gitea* user should not be allowed to use password
authentication, as this would only work if the user actually has a
password (which it does not) and would result in shell access instead of
Gitea.
This commit adjusts the firewall and networking configuration on dc0 to
host the Pyrocufflink remote access IPsec VPN locally instead of
forwarding it to the internal VPN server.
The host *zbx0.pyrocufflink.blue* (a Raspberry Pi) runs the Zabbix
server and web UI. It has a reserved IPv4 address to simplify reverse
DNS management for now, since Samba's dynamic DNS client does not
register PTR records.
These IPv6 reverse DNS zones are managed by the Samba AD DCs for the
*pyrocufflink.blue* domain. These zones correspond to the IPv6 prefixes
used by the "blue" network.
Connection Tracking does not work for DHCP messages, since many are
broadcast. As such, the firewall must explicitly allow datagrams
destined for the DHCP client port.
For internal services, particularly DNS, it is easier to use a ULA
prefix than rely exclusively on routed addresses, since these can change
relatively frequently.
Instead of listing the addresses for DNS and NTP servers again in the
DHCP server configuration, these are now taken from the canonical
definitions in the `dch_networks` variable.
The *filter* table is responsible for deciding which packets will be
accepted and which will be rejected. It has three chains, which classify
packets according to whether they are destined for the local machine
(input), passing through this machine (forward) or originating from the
local machine (output).
The *dch-gw* role now configures all three chains in this table. For
now, it defines basic rules, mostly based on TCP/UDP destination port:
* Traffic destined for a service hosted by the local machine (DNS, DHCP,
SSH) is allowed if it does not come from the Internet
* Traffic passing through the machine is allowed if:
* It is passing between internal networks
* It is destined for a host on the FireMon network (VPN)
* It was NATed to an internal host (marked 323)
* It is destined for the Internet
* Only DHCP, HTTP, and DNS are allowed to originate from the local
machine
This configuration requires an `internet_iface` variable, which
indicates the name of the network interface connected to the Internet
directly.
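A rough sketch of the rules in nftables syntax (whether the role
actually uses nftables, and the exact ports, are assumptions):

```
define internet_iface = "eth0"

table inet filter {
    chain input {
        type filter hook input priority 0; policy drop;
        ct state established,related accept
        iifname != $internet_iface tcp dport { 22, 53 } accept
        iifname != $internet_iface udp dport { 53, 67 } accept
    }
    chain forward {
        type filter hook forward priority 0; policy drop;
        ct state established,related accept
        # Between internal networks, and out to the Internet
        iifname != $internet_iface oifname != $internet_iface accept
        iifname != $internet_iface oifname $internet_iface accept
    }
    chain output {
        type filter hook output priority 0; policy drop;
        ct state established,related accept
        udp dport { 53, 67, 68 } accept
        tcp dport { 53, 80, 443 } accept
    }
}
```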
The *samba-dc* role now configures `winbindd` on domain controllers to
support identity mapping on the local machine. This will allow domain
users to log into the domain controller itself, e.g. via SSH.
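On a Samba AD DC this typically comes down to a few `smb.conf` settings;
the exact values used by the role are assumptions:

```
[global]
    # Use RFC 2307 (uidNumber/gidNumber) attributes from AD for ID mapping
    idmap_ldb:use rfc2307 = yes
    template shell = /bin/bash
    template homedir = /home/%U
```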
The Fedora packaging of *samba4* still has some warts. Specifically, it
does not have a proper SELinux policy, so some work-arounds need to be
put into place in order for confined processes to communicate with
winbind.
The BIND9_DLZ plugin turned out to be pretty flaky. It craps out
whenever `named` is reloaded, which seems to happen occasionally for
reasons I cannot identify. Combined with the weird SELinux issues and
the fact that upstream recommends against it anyway, I decided to just
use the built-in DNS server in Samba.