The *dch-storage-net* role configures a machine to connect to the storage
network and mount shared folders from the storage appliance.
The `wait-global-address.sh` script and corresponding
*wait-global-address@.service* systemd unit template are necessary to
ensure that the storage network is actually available before attempting
to mount the shared volumes. This is particularly important at boot,
since `dhcpcd` does not implement any kind of signaling that can be used
by *network-online.target*, so the network is considered "online" as
soon as the `dhcpcd` process has started. This typically results in
"network unreachable" errors when the mounts are attempted too early.
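
The waiting logic can be sketched roughly as follows. This is not the
contents of `wait-global-address.sh`, which is a shell script; it is a
Python illustration that assumes the check is "the interface has a
non-tentative address with global scope", as reported by `ip -o addr show`.
The function names and the 60-second timeout are hypothetical.

```python
import subprocess
import time


def has_global_address(ip_output: str) -> bool:
    """Return True if any line of `ip -o addr show` output describes a
    non-tentative address with global scope."""
    for line in ip_output.splitlines():
        tokens = line.split()
        # Look for the adjacent token pair "scope global" on the line.
        is_global = any(
            a == "scope" and b == "global"
            for a, b in zip(tokens, tokens[1:])
        )
        if is_global and "tentative" not in tokens:
            return True
    return False


def wait_for_global_address(iface: str, timeout: float = 60.0) -> bool:
    """Poll `ip` once per second until `iface` has a global address.

    Returns False if `timeout` seconds pass without one appearing.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = subprocess.run(
            ["ip", "-o", "addr", "show", "dev", iface],
            capture_output=True, text=True, check=False,
        )
        if has_global_address(result.stdout):
            return True
        time.sleep(1)
    return False
```

A unit instantiated from the template would presumably run the equivalent
of `wait_for_global_address("<interface>")` and be ordered before the
mount units, so that a mount is only attempted once the address exists.
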
The *net-ifaces* role manages a script that creates virtual network
interfaces, such as bridges, bonds, and VLANs, that `dhcpcd`/`dhclient`
alone cannot create. This provides a lightweight alternative to
*systemd-networkd* and *NetworkManager*.
Though the default for the `fqdn` value is listed as `both` in
*dhcpcd.conf(5)*, the current behavior of `dhcpcd` suggests that it may
actually be `none`. Without explicitly setting `fqdn both`, the value of
the kernel node name is sent as-is in the *hostname* option (12). If the
node name is set to the FQDN, then dynamic DNS gets broken, since the
DHCP server always appends its domain name to the provided hostname.
Setting `fqdn both` causes `dhcpcd` to send the FQDN in the *FQDN*
option (81), which the DHCP server interprets correctly.
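
The failure mode can be illustrated with a toy model. This is only a
sketch of the behavior described above, not actual DHCP server logic; the
function name and the host name `host1` are made up for the example.

```python
def name_registered_by_server(client_name: str, server_domain: str,
                              via_fqdn_option: bool) -> str:
    """Model the DNS name a DHCP server would register for a client."""
    if via_fqdn_option:
        # FQDN option (81): the name is already fully qualified,
        # so the server uses it as-is.
        return client_name
    # hostname option (12): the server always appends its own domain.
    return f"{client_name}.{server_domain}"


# Node name set to the FQDN, sent in option 12: the domain is doubled.
broken = name_registered_by_server(
    "host1.pyrocufflink.blue", "pyrocufflink.blue", via_fqdn_option=False)

# Same node name sent in option 81 (`fqdn both`): registered correctly.
fixed = name_registered_by_server(
    "host1.pyrocufflink.blue", "pyrocufflink.blue", via_fqdn_option=True)
```
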
Using a list to specify the values for the `allowinterfaces` and
`denyinterfaces` parameters in `dhcpcd.conf` makes the configuration
policy cleaner and more type-safe.
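
As a rough illustration of why a list is safer than a preformatted
string, the rendering step might look like the sketch below. The role
itself presumably does this in a template; the function name and the
validation rules here are assumptions. The only `dhcpcd` fact relied on is
that these options accept a space-separated list of patterns.

```python
from typing import Iterable


def render_interface_option(option: str, patterns: Iterable[str]) -> str:
    """Render an `allowinterfaces`/`denyinterfaces` line for dhcpcd.conf.

    Returns an empty string when the list is empty, so that no directive
    is emitted at all.
    """
    if option not in ("allowinterfaces", "denyinterfaces"):
        raise ValueError(f"unsupported option: {option!r}")
    if isinstance(patterns, str):
        # A bare string would otherwise be iterated character by character.
        raise TypeError("expected a list of patterns, not a single string")
    items = list(patterns)
    for pattern in items:
        if not isinstance(pattern, str) or not pattern:
            raise ValueError(f"invalid pattern: {pattern!r}")
        if any(ch.isspace() for ch in pattern):
            raise ValueError(f"pattern contains whitespace: {pattern!r}")
    if not items:
        return ""
    return f"{option} {' '.join(items)}"
```

For example, `render_interface_option("allowinterfaces", ["eth0", "br0"])`
yields the line `allowinterfaces eth0 br0`, while passing the string
`"eth0 br0"` is rejected instead of silently producing a malformed value.
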
Today I realized that `dhcpcd` has been logging several hundred thousand
of these messages every second:

    libudev: received NULL device

This was causing both `dhcpcd` and `systemd-journald` to consume 100%
CPU.
I am not entirely sure what a "device management" module is in the
context of `dhcpcd`, but it does not seem to be required. Setting the
`nodev` option in `dhcpcd.conf` suppresses the messages, and seems to
have no effect on the operation of the daemon.
Traffic from the management network is not allowed except for specific
services. NTP is required, of course, for time synchronization with the
pyrocufflink.blue domain controllers. RADIUS is necessary for WiFi
authentication, which is also handled by the DCs.
The UniFi controller has been moved to a Raspberry Pi on the Management
network. This machine needs a static address to use in the "inform URL"
it sends to managed devices.
The Management network (VLAN 10, 172.30.0.240/28) will be used for
communication with and configuration of network devices including
switches and access points. This keeps configuration separate from
normal traffic, and allows complete isolation of infrastructure devices.
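
For reference, the size and bounds of the Management network follow
directly from the /28 prefix. The short sketch below uses Python's
standard `ipaddress` module to work them out; the helper function name is
hypothetical and only the network itself comes from the text above.

```python
import ipaddress

# The Management network (VLAN 10).
MGMT_NET = ipaddress.ip_network("172.30.0.240/28")

# A /28 holds 16 addresses; the first is the network address and the
# last is the broadcast address, leaving 14 usable host addresses.
hosts = list(MGMT_NET.hosts())
first_host = hosts[0]
last_host = hosts[-1]


def is_management_address(addr: str) -> bool:
    """Return True if `addr` falls inside the Management network."""
    return ipaddress.ip_address(addr) in MGMT_NET
```

This gives usable addresses from 172.30.0.241 through 172.30.0.254, which
bounds how many switches, access points, and controllers the network can
hold.
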
The `ifconfig` global directive specifies the IP address added to the
tunnel interface device, not the network. The `push route` directives
need to include this address to correctly send route information to
clients.
The *dch-openvpn-server* role installs and configures OpenVPN and
stunnel to provide both native OpenVPN service and OpenVPN-over-TLS.
The latter uses stunnel, listening on TCP port 9876, to allow better
firewall traversal and TCP port sharing via reverse proxy.
The `apache_server_tokens` variable can now be set, which controls the
value of the `ServerTokens` directive. If the variable is set, the
`ServerTokens` directive will be added to the `00-servername.conf` file.
The `samba_interfaces` variable can now be defined to populate the
`interfaces` global configuration parameter in `smb.conf`. This
parameter controls the interfaces or addresses to which the Samba server
binds, and also the IP addresses that are registered in DNS.
The *certbot* role now supports copying the data for an existing Let's
Encrypt account to the managed node using an archive. If an archive
named for the inventory hostname (typically the FQDN) of the managed
node is found in the `accounts` directory under the `files` directory of
the *certbot* role, it will be copied to the managed node and extracted
at `/var/lib/letsencrypt/accounts`. This takes the place of running
`certbot register` to sign up for a new account.
The *install* tag is applied to any task that installs a package.
The *user* tag is applied to any task that creates an OS user or group.
The *group* tag is applied to any task that creates an OS user group.
Since the host *gw0* is not a member of the *pyrocufflink.blue* domain,
GSSAPI authentication does not work. As such, the SSH private key has to
be made available to the `ansible-playbook` process for authentication
to that host.
The `zabbix.yml` playbook applies to hosts that are not members of the
*pyrocufflink.blue* domain, and thus have different passwords for
`sudo`. Using the `-e` argument to `ansible-playbook` and specifying a
single Vault-encrypted file that defines the `ansible_become_password`
variable effectively forces Ansible to try to use that password on every
host. This is because variables defined on the command line, or read
from a file specified on the command line, have the highest precedence.
To use different passwords on different hosts, the normal variable
scoping rules have to be used. To that end, one `sudo-pass` file is
created in the `group_vars/pyrocufflink` directory, so it will apply to
all machines that are members of the *pyrocufflink.blue* domain.
Additionally, another `sudo-pass` file is created in the `host_vars/gw0`
directory; it will only apply to the gateway device.
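
The scoping argument can be made concrete with a deliberately simplified
model of variable resolution. Real Ansible has many more precedence
levels; this sketch keeps only the three that matter here (group
variables, then host variables, then command-line variables). The host
name `dc0` and the password strings are placeholders.

```python
from typing import Dict, List, Optional

KEY = "ansible_become_password"


def resolve_become_password(
    host: str,
    host_groups: Dict[str, List[str]],
    group_vars: Dict[str, Dict[str, str]],
    host_vars: Dict[str, Dict[str, str]],
    extra_vars: Optional[Dict[str, str]] = None,
) -> Optional[str]:
    """Resolve the become password for `host`, lowest precedence first."""
    value = None
    # Group variables apply only to hosts that belong to the group.
    for group in host_groups.get(host, []):
        if KEY in group_vars.get(group, {}):
            value = group_vars[group][KEY]
    # Host variables override group variables.
    if KEY in host_vars.get(host, {}):
        value = host_vars[host][KEY]
    # Command-line variables (-e) override everything, for every host.
    if extra_vars and KEY in extra_vars:
        value = extra_vars[KEY]
    return value


host_groups = {"dc0": ["pyrocufflink"], "gw0": []}
group_vars = {"pyrocufflink": {KEY: "domain-secret"}}
host_vars = {"gw0": {KEY: "gateway-secret"}}
```

With this data, `dc0` resolves to the domain password and `gw0` to the
gateway password, but supplying `extra_vars` forces the same value onto
both hosts, which is the problem that the two `sudo-pass` files avoid.
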
Converting the *pyrocufflink* group variables definition from a file to
a directory will allow Jenkins jobs to place a Vault-encrypted file
within it that defines the `ansible_become_password` variable. In this
way, a different password can be used for machines that are members of
the *pyrocufflink.blue* domain than for other hosts. The existing
mechanism of specifying the path to the Vault-encrypted file that
defines the variable allows only a single password to be defined, so it
does not work when multiple machines in the same play have different
passwords.
The gateway device is now monitored by Zabbix. Adding it to the *zabbix*
group ensures that the Zabbix agent is installed and configured
correctly.
Because the *zabbix-agent* role has a task to configure FirewallD, the
`host_uses_firewalld` variable needs to be set to `false` for *gw0*,
since it does not use FirewallD.
For machines that do not use firewalld, the *zabbix-agent* role will now
skip attempting to open the Zabbix agent port using the `firewalld`
module. The `host_uses_firewalld` variable controls this behavior.
The `gitea_root_url` variable is used to configure the root URL for
Gitea, which is in turn used to generate HTTP/HTTPS "clone" links for
Git repositories. If this value is not set, the default is used, which
does not work since the application is behind a reverse proxy.
The *certbot* role installs and configures the `certbot` ACME client. It
adjusts the default configuration to allow the tool to run as an
unprivileged user, and then configures Apache to work with the *webroot*
plugin. It registers for an account and requests a certificate for the
domains specified by the `certbot_domains` Ansible variable. Finally, it
enables the *certbot-renew.timer* systemd unit to schedule automatic
renewal of all Let's Encrypt certificates.