Commit Graph

971 Commits (845911dcbd176b8f8767718b7a12bdf3927c1065)

Author SHA1 Message Date
Dustin 845911dcbd r/nginx: Make logging to files optional
If _nginx_ is configured to send error/access log messages to syslog, it
may not make sense to _also_ send messages to log files.  The
`nginx_error_log_file` and `nginx_access_log_file` variables are now
available to control whether/where to send log messages.  Setting either
of these to a falsy value will disable logging to a file.  A non-empty
string value is interpreted as the path to a log file.  By default, the
existing behavior of logging to `/var/log/nginx/error.log` and
`/var/log/nginx/access.log` is preserved.
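
The falsy/path/default semantics can be sketched in Python.  This is only an illustration: the helper name `resolve_log_file` is hypothetical and not part of the role; the variable names and default paths are the ones given above.

```python
# Default paths, as stated in the commit message.
DEFAULTS = {
    "nginx_error_log_file": "/var/log/nginx/error.log",
    "nginx_access_log_file": "/var/log/nginx/access.log",
}


def resolve_log_file(variables, name):
    """Return the log file path for `name`, or None if file logging is off.

    `variables` stands in for the host's Ansible variables (an assumption
    made for this sketch).
    """
    value = variables.get(name, DEFAULTS[name])
    # Any falsy value (False, None, "") disables logging to a file;
    # a non-empty string is taken as the path.
    return value if value else None
```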
2024-10-14 12:00:19 -05:00
Dustin a0c5ffc869 postgresql: Collect Wal-G metrics with statsd_exporter
_wal-g_ can send StatsD metrics when it completes an upload/backup/etc.
task.  Using the `statsd_exporter`, we can capture these metrics and
make them available to Victoria Metrics.
2024-10-13 20:01:19 -05:00
Dustin 87b9014721 r/statsd-exporter: Deploy statsd exporter
The *statsd exporter* is a Prometheus exporter that converts statistics
from StatsD format into Prometheus metrics.  It is generally useful as a
bridge between processes that emit event-based statistics, turning them
into Prometheus counters and gauges.
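
For reference, a minimal Python sketch of what a StatsD client emits.  The metric names used with it would be made up, and the port (9125, which I believe is the exporter's default UDP listener) is an assumption about the deployment, not something this commit states.

```python
import socket


def format_statsd(name, value, kind):
    """Format one metric in the StatsD line protocol: <name>:<value>|<type>.

    `kind` is the StatsD type code, e.g. "c" (counter) or "g" (gauge).
    """
    return f"{name}:{value}|{kind}"


def send_statsd(line, host="127.0.0.1", port=9125):
    """Send a single StatsD line over UDP (fire and forget)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))
```

The exporter then exposes whatever it receives as Prometheus counters and gauges for scraping.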
2024-10-13 19:59:52 -05:00
Dustin a22c8aa0d2 r/nextcloud: Configure trashbin retention
Setting `trashbin_retention_obligation` to `auto, 30` is supposed to
delete files in users' trash bins after 30 days.
2024-10-13 18:38:12 -05:00
Dustin 265aa074aa r/nextcloud: Configure Memories app
The [Memories] app for Nextcloud provides a better user interface and
more features than the built-in Photos app.  The latter seems to be
somewhat broken recently (timeline stops in June 2024, even though there
are more recent photos available), so we're trying out Memories (and
Recognize for facial recognition).

[Memories]: https://memories.gallery
2024-10-13 18:36:25 -05:00
Dustin 5ab0bcd5bf r/nextcloud: Update rewrite config for .mjs files
Nextcloud 28+ uses JavaScript modules (`.mjs` files).  These need to be
served from the filesystem like other static files, so the *mod_rewrite*
configuration needs to be updated accordingly.
2024-10-13 18:35:01 -05:00
Dustin 221d3a2be9 vm-hosts: Scrape libvirt logs with Promtail
Collecting logs from VM serial consoles and QEMU monitor.
2024-10-13 18:33:25 -05:00
Dustin 1e6ab546bc r/vmhost: Create directory for console logs
Need a directory where _libvirt_ can write logs from VM serial console
output.
2024-10-13 18:30:04 -05:00
Dustin 75a146e19e newvm: Configure serial console log file
When a VM uses a serial port for its default console, kernel messages
(e.g. panics) are lost if no console client is connected at the time.
This is a major disadvantage when compared to a graphical console, which
usually at least keeps a "screenshot" of the console when the kernel
crashes.

While researching the available console device types to determine how
best to implement a tool that would log the output from the serial
console at all times while still allowing interactive connections to
it, I discovered that _libvirt_ already has this exact functionality
built in:

https://libvirt.org/formatdomain.html#consoles-serial-parallel-channel-devices
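
A rough Python sketch of the domain XML change, assuming the feature in question is the `<log>` sub-element of a character device described on that page.  The helper name and the log path are hypothetical; this is not how `newvm` itself does it.

```python
import xml.etree.ElementTree as ET


def add_console_log(serial_xml, log_path):
    """Attach a <log> child element to a libvirt character device."""
    device = ET.fromstring(serial_xml)
    # append="on" asks libvirt to keep the existing file contents when
    # the domain restarts instead of truncating the file.
    ET.SubElement(device, "log", {"file": log_path, "append": "on"})
    return ET.tostring(device, encoding="unicode")


# Hypothetical device definition and log location.
serial = "<serial type='pty'><target port='0'/></serial>"
print(add_console_log(serial, "/var/log/libvirt/console/example.log"))
```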
2024-10-13 18:12:46 -05:00
Dustin 9bea8e1ce7 nextcloud: Scrape logs with Promtail
Nextcloud writes JSON-structured logs to
`/var/lib/nextcloud/data/nextcloud.log`.  These logs contain errors,
etc. from the Nextcloud server, which are useful for troubleshooting.
Having them in Loki will allow us to view them in Grafana as well as
generate alerts for certain events.
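
To illustrate the kind of filtering this enables, here is a small Python sketch that picks error entries out of the JSON log.  It assumes each line is a JSON object with an integer `level` plus `app` and `message` fields; it is not part of the Promtail configuration.

```python
import json

# Nextcloud encodes the severity as an integer.
LEVELS = {0: "debug", 1: "info", 2: "warning", 3: "error", 4: "fatal"}


def errors_only(lines, min_level=3):
    """Yield (level name, app, message) for entries at or above min_level."""
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or non-JSON lines
        if not isinstance(entry, dict):
            continue  # valid JSON, but not a log record
        level = entry.get("level", 0)
        if level >= min_level:
            yield LEVELS.get(level, str(level)), entry.get("app"), entry.get("message")
```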
2024-10-13 18:05:50 -05:00
Dustin ceaef3f816 hosts: Decommission burp1.p.b
Everything has finally been moved to Chromie.
2024-10-13 17:52:48 -05:00
Dustin 808a912630 websites: Remove proxy roles
Reverse proxy for web sites and applications accessible to the Internet
is now handled by HAProxy.
2024-10-13 12:54:50 -05:00
Dustin 5ced24f2be hosts: Decommission matrix0.p.b
The Synapse server hasn't been working for a while, but we don't use it
for anything any more anyway.
2024-10-13 12:53:49 -05:00
Dustin 219fe75424 r/nginx: logrotate: do not delay compressing
_nginx_ access logs are typically either very small or very large.  For
small log files, it's fast enough to decompress them on the fly if
necessary.  For large files, they may take up so much space in
uncompressed form that the log volume fills too quickly.  In either
case, compressing the files as soon as they are rotated is a good
option, especially since their contents should already be sent to Loki.
2024-09-30 12:43:25 -05:00
Dustin dfdddd551f minio-backups: Keep nginx logs for 3 days
_WAL-G_ and _restic_ both generate a lot of HTTP traffic, which fills up
the log volume pretty quickly.  Let's reduce the number of days logs are
kept on the file system.  Logs are shipped to Loki anyway, so there's
not much need to have them local very long.
2024-09-29 11:21:24 -05:00
Dustin 829c04332d r/nginx: Configure logrotate
The default `logrotate` configuration for _nginx_ may not be appropriate
for high-volume servers.  The `nginx_keep_num_logs` variable is now
available to control how many days of logs are kept.
2024-09-29 11:20:29 -05:00
Dustin 0353360360 dch-proxy: Allow Internet access to IN
Invoice Ninja needs to be accessible from the Internet in order to
receive webhooks from Stripe.  Additionally, Apple Pay requires
contacting Invoice Ninja for domain verification.
2024-09-10 12:01:00 -05:00
Dustin 9e610eaf11 r/minio-backups-cert: Enable/start certbot timer
Forgot to ensure the _certbot-renew.timer_ unit was enabled and started,
so the MinIO certificate did not get renewed the first time.
2024-09-08 09:15:36 -05:00
Dustin 621f82c88d hosts: Migrate remaining hosts to Restic
Gitea and Vaultwarden both have SQLite databases.  We'll need to add
some logic to ensure these are in a consistent state before beginning
the backup.  Fortunately, neither of them is a very busy database, so
the likelihood of an issue is pretty low.  It's definitely more
important to get backups going again sooner, and we can deal with that
later.
2024-09-07 20:45:24 -05:00
Dustin 7d93ba836e r/restic: Enhance restic-backup security sandbox
Since `restic` needs to run as root in order to back up files regardless
of their permissions, we need to restrict it to doing only that.  Using
systemd sandbox features, especially the capability bounding set, we can
remove all of _root_'s powers except the ability to read all files.
2024-09-04 17:43:24 -05:00
Dustin c2c283c431 nextcloud: Back up Nextcloud with Restic
Now that the database is hosted externally, we don't have to worry about
backing it up specifically.  Restic only backs up the data on the
filesystem.
2024-09-04 17:41:42 -05:00
Dustin 0f4dea9007 restic: Add role+playbook for Restic backups
The `restic.yml` playbook applies the _restic_ role to hosts in the
_restic_ group.  The _restic_ role installs `restic` and creates a
systemd timer and service unit to run `restic backup` every day.

Restic doesn't really have a configuration file; all its settings are
controlled either by environment variables or command-line options. Some
options, such as the list of files to include in or exclude from
backups, take paths to files containing the values.  We can make use of
these to provide some configurability via Ansible variables.  The
`restic_env` variable is a map of environment variables and values to
set for `restic`.  The `restic_include` and `restic_exclude` variables
are lists of paths/patterns to include and exclude, respectively.
Finally, the `restic_password` variable contains the password to decrypt
the repository contents.  The password is written to a file and exposed
to the _restic-backup.service_ unit using [systemd credentials][0].

When using S3 or a compatible service for repository storage, Restic of
course needs authentication credentials.  These can be set using the
`restic_aws_credentials` variable.  If this variable is defined, it
should be a map containing the `aws_access_key_id` and
`aws_secret_access_key` keys, which will be written to an AWS shared
credentials file.  This file is then exposed to the
_restic-backup.service_ unit using [systemd credentials][0].
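
As a sketch of the file this produces, the following Python renders a map like `restic_aws_credentials` into the usual AWS shared credentials INI layout.  The role presumably uses a template for this; the function name and the `default` profile are assumptions made for illustration.

```python
import configparser
import io


def render_aws_credentials(creds, profile="default"):
    """Render an AWS shared credentials file from a credentials map."""
    # Disable interpolation so a '%' in a secret is written verbatim.
    cfg = configparser.ConfigParser(interpolation=None)
    cfg[profile] = {
        "aws_access_key_id": creds["aws_access_key_id"],
        "aws_secret_access_key": creds["aws_secret_access_key"],
    }
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()
```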

[0]: https://systemd.io/CREDENTIALS/
2024-09-04 09:40:29 -05:00
Dustin 708bcbc87e Merge remote-tracking branch 'refs/remotes/origin/master' 2024-09-03 17:18:18 -05:00
Dustin dce7908a94 chromie: Set MinIO root password 2024-09-02 21:24:59 -05:00
Dustin 72936b3868 postgresql: Allow access by IPv6
Since LAN clients have IPv6 addresses now, some may try to connect to
the database over IPv6, so we need to allow this in the host-based
authentication rules.
2024-09-02 21:20:26 -05:00
Dustin 6f9cd7e4af r/postgres-exporter: Do not connect to template1 DB
It turns out, having the exporter connect to the _template1_ database is
not a great idea.  PostgreSQL does not allow creating a new database if
the template database is currently being accessed by any clients.  Since
_template1_ is the default choice, the `createdb` command will probably
fail.

It doesn't specifically matter which database the exporter connects to,
since it reads most (all?) of its data from the PostgreSQL catalog,
which isn't database-specific.
2024-09-02 21:15:23 -05:00
Dustin a0378feda8 nextcloud: Move database to db0
Moving the Nextcloud database to the central PostgreSQL server will
allow it to take advantage of the monitoring and backups in place there.
For backups specifically, this will make it easier to switch from BURP
to Restic, since now only the contents of the filesystem need to be backed up.

The PostgreSQL server on _db0_ requires certificate authentication for
all clients.  The certificate for Nextcloud is stored in a Secret in
Kubernetes, so we need to use the _nextcloud-db-cert_ role to install
the script to fetch it.  Nextcloud configuration doesn't expose the
parameters for selecting the certificate and private key files, but
fortunately, they can be encoded in the value provided to the `host`
parameter, though it makes for a rather cumbersome value.
2024-09-02 21:03:33 -05:00
Dustin 22dbc3ebc1 r/nextcloud-db-cert: Fetch client cert from k8s
Currently, the certificate authority that issues certificates for
PostgreSQL clients is hosted in Kubernetes and managed by
_cert-manager_.  Certificates it issues are stored in Kubernetes Secret
resources, making them easy to consume by applications running in the
cluster, but not for anything outside.  Since Nextcloud runs on its own
VM, we need a way to get the certificate out of the Secret and into a
file on that machine.  To that end, I've written the
`nextcloud-fetch-cert.py` script.  This script uses a Kubernetes Service
Account token to authenticate to the Kubernetes API and download the
contents of the Secret.  It runs periodically, triggered by a systemd
timer unit, to ensure the certificate is always up-to-date.
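
The general approach looks roughly like the Python below.  This is a simplified sketch, not the actual `nextcloud-fetch-cert.py`: the function names are made up, and the API URL, namespace, Secret name, and token are left to the caller.

```python
import base64
import json
import ssl
import urllib.request


def fetch_secret(api_url, namespace, name, token, cafile=None):
    """GET a Secret from the Kubernetes API using a bearer token."""
    url = f"{api_url}/api/v1/namespaces/{namespace}/secrets/{name}"
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    # cafile is the CA bundle used to verify the API server certificate.
    ctx = ssl.create_default_context(cafile=cafile)
    with urllib.request.urlopen(req, context=ctx) as resp:
        return json.load(resp)


def decode_secret_data(secret):
    """Secret values are base64-encoded; return them as raw bytes."""
    return {k: base64.b64decode(v) for k, v in secret.get("data", {}).items()}
```

For a TLS Secret, the decoded map would typically contain `tls.crt` and `tls.key` entries, which can then be written to files on the VM.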

The obvious drawback to this approach is the requirement for a static
token.  Since there's not really a way to "renew" Service Account
tokens, it needs to be issued with a fairly long duration, to mitigate
the risk of being unable to fetch a new certificate once it has expired
because the token has also expired.  This somewhat negates the advantage
of using certificates for authentication, since now the machine needs a
static, pre-defined secret.

At some point, I may deploy another instance of _step-ca_ to manage the
PostgreSQL client CA.  Clients can then use e.g. `certbot` or `step ca
certificate` to obtain their certificates.  I chose not to implement
this yet, though for a couple of reasons.  First, I need to move the
Nextcloud database very soon, so we switch to using `restic` for backups
without having to deal with the database.  Second, I am still
considering moving Nextcloud into Kubernetes eventually, where it will
be able to get the Secret directly; since Nextcloud is the only client
outside the cluster, it may not be worth setting up _step-ca_ in that
case.
2024-09-02 20:35:32 -05:00
Dustin 924107abbe nextcloud: Support remote database server
The _nextcloud_ role originally handled setting up the PostgreSQL
database and assumed that it was running on the same server as Nextcloud
itself.  I have factored out those tasks into their own role,
_nextcloud-db_, which can be applied to a separate host.

I have also introduced some new variables (`nextcloud_db_host`,
`nextcloud_db_name`, `nextcloud_db_user`, and `nextcloud_db_password`),
which can be used to specify how to connect to the database, if it is
hosted remotely.  Since these variables are used by both the _nextcloud_
and _nextcloud-db_ roles, they are actually defined in a separate role,
_nextcloud-base_, upon which both depend.
2024-09-02 20:29:51 -05:00
Dustin d3a09a2e88 hosts: Add chromie, nvr2 to nut-monitor group
Deploy `nut-monitor` on these physical machines so they will shut down
safely in the event of a power outage.
2024-09-01 18:52:33 -05:00
Dustin 226232414f r/jellyfin: Fix HAProxy vhost
Without including the settings from `ssl.include`, the virtual host
bound to port 8443 expects to handle plain HTTP traffic, rather than
HTTPS.
2024-09-01 17:33:22 -05:00
Dustin e4766e54ac r/dch-proxy: Use separate sockets for IPv4/IPv6
When HAProxy binds to the IPv6 socket, it can handle both IPv6 and IPv4
clients.  IPv4 clients are handled as IPv4-mapped IPv6 addresses, which
some backends (e.g. Apache) cannot support.  To avoid this, we configure
HAProxy to bind to the IPv4 and IPv6 sockets separately, so that IPv4
addresses are handled as IPv4 addresses.
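
To show what an IPv4-mapped address looks like, here is a short Python sketch using the standard `ipaddress` module.  The helper name is hypothetical and the addresses used with it would come from the documentation ranges; it only illustrates the address form, not anything HAProxy does.

```python
import ipaddress


def normalize_client_address(addr):
    """Return the plain IPv4 form of an IPv4-mapped IPv6 address.

    Any other address is returned unchanged (in canonical string form).
    """
    ip = ipaddress.ip_address(addr)
    # Only IPv6Address objects carry the ipv4_mapped attribute.
    mapped = getattr(ip, "ipv4_mapped", None)
    return str(mapped) if mapped is not None else str(ip)
```

With a dual-stack socket, a client at 192.0.2.10 shows up as `::ffff:192.0.2.10`; with separate sockets, it shows up as 192.0.2.10 directly.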
2024-09-01 12:43:22 -05:00
Dustin 7f599e9058 dch-proxy: Proxy Jellyfin
Allow access to Jellyfin from the Internet via the reverse proxy.  The
Jellyfin backend server has a separate port that supports the PROXY
protocol.
2024-09-01 12:42:07 -05:00
Dustin 921a12cf1f r/jellyfin: Add virtual host for HAProxy
Expose a virtual host on a separate TCP port that uses the PROXY
protocol.  This way, HAProxy can pass the original client IP address to
Jellyfin without terminating the TLS connection.
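
For illustration, this is what the human-readable (version 1) PROXY protocol header looks like, sketched in Python.  Whether this setup uses version 1 or the binary version 2 is not stated here, so treat v1 as an assumption; the function name is made up.

```python
def proxy_v1_header(src_ip, dst_ip, src_port, dst_port):
    """Build a PROXY protocol v1 header line.

    The proxy sends this line first on the backend connection, before any
    client bytes, so the backend learns the original client address.
    """
    family = "TCP6" if ":" in src_ip else "TCP4"
    line = f"PROXY {family} {src_ip} {dst_ip} {src_port} {dst_port}\r\n"
    return line.encode("ascii")
```

Because the header travels ahead of the TLS handshake, the proxy can forward the encrypted stream untouched.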
2024-09-01 12:40:20 -05:00
Dustin 2864a4185c r/jellyfin: Mount LDAP CA certificate in container
In order to enable authentication using LDAP over TLS in Jellyfin, we
need to expose the CA certificate that issues the LDAP server
certificates to the container.
2024-09-01 12:39:14 -05:00
Dustin db74e9ac3f btop: Install btop and run it on the console
`btop` is so much better than `top`.  It makes a really nice status
indicator for machine health, so I like running it on tty1.
2024-09-01 09:24:53 -05:00
Dustin e323324c54 postgresql: Switch wal-g to use new MinIO server
Switching to the MinIO server on _chromie.pyrocufflink.blue_ as
_burp1.pyrocufflink.blue_ is being decommissioned.
2024-09-01 09:01:04 -05:00
Dustin fbf587414a hosts: Add chromie.p.b
*chromie.pyrocufflink.blue* will replace *burp1.pyrocufflink.blue* as
the backup server.  It is running on the hardware that was originally
*nvr1.pyrocufflink.blue*: a 1U Jetway server with an Intel Celeron N3160
CPU and 4 GB of RAM.
2024-09-01 09:01:04 -05:00
Dustin 459d58bfb6 raid-array: Add PB to create md arrays
The `raid-array.yml` playbook can create Linux *md* software RAID arrays
using the `mdadm` command.  Two variables are required: `md_name` and
`raid_disks`.  The former is a string name for the array.  The latter is
an array of paths of block devices to add to the array.
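
Roughly, the two variables map onto an `mdadm --create` invocation like the one built by this Python sketch.  The RAID level is not mentioned above, so `level=1` is purely an assumption, as are the function name and the `/dev/md/<name>` device path.

```python
def mdadm_create_command(md_name, raid_disks, level=1):
    """Build the argument list for creating an md array.

    md_name:    string name for the array
    raid_disks: list of block device paths to add to the array
    level:      RAID level (assumed; not specified by the playbook text)
    """
    return [
        "mdadm",
        "--create",
        f"/dev/md/{md_name}",
        f"--level={level}",
        f"--raid-devices={len(raid_disks)}",
        *raid_disks,
    ]
```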
2024-09-01 08:59:28 -05:00
Dustin b6cc83ad82 datavol: Support creating btrfs subvolumes
Set the `btrfs_subvolumes` variable to an array of objects with `name`
and `device` properties to create btrfs subvolumes.
2024-09-01 08:59:28 -05:00
Dustin 9d60ae1a61 minio-backups: Deploy MinIO for backups
This playbook uses the *minio-nginx* and *minio-backups-cert* roles to
deploy MinIO with nginx.

The S3 API server is *s3.backups.pyrocufflink.blue*, and buckets can be
accessed as subdomains of this name.

The Admin Console is *minio.backups.pyrocufflink.blue*.

Certificates are issued by DCH CA via ACME using `certbot`.
2024-09-01 08:59:28 -05:00
Dustin 77ce7aa5e7 r/minio-backups-cert: Certbot for MinIO+nginx
The MinIO server for backups has special requirements for HTTPS.  I want
to use subdomains for bucket names, so the certificate must have a
wildcard name, which requires using the DNS-01 challenge.  Fortunately,
it is actually pretty easy to use `nsupdate` with GSS-TSIG
authentication to automate DNS record creation, and by default, all
domain-member machines can create any records.  Thus, using the `manual`
auth plugin for `certbot` and a script to run `nsupdate`, obtaining the
wildcard certificate is fairly straightforward.

The biggest issue I encountered while developing this feature was
caching of NXDOMAIN responses.  There doesn't seem to be a way to change
the TTL of the SOA record of the Active Directory DNS domain, which
defaults to 3600, meaning NXDOMAIN responses are always cached for an
hour.  When adding a record using `nsupdate -g`, the tool always
performs a SOA lookup of the new name to find the target zone for it.  Since
the name does not exist yet, the domain controller responds with
NXDOMAIN, which gets cached by the main DNS server.  Thus, even after
adding the record, the ACME server will not be able to resolve the
name for up to an hour.  We can avoid this by explicitly setting the
target zone.  That would not work in a multi-domain forest, but
fortunately, we do not have to worry about that.
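
A sketch of the idea in Python: build the `nsupdate` input with an explicit `zone` statement so the tool does not need to look up the SOA for the new name.  The function name, TTL, and example values are assumptions; the real hook script may differ.

```python
def nsupdate_script(zone, record, validation, ttl=60):
    """Return nsupdate commands that add an ACME DNS-01 TXT record.

    Naming the zone explicitly skips the SOA lookup for the new name,
    which would otherwise produce a cacheable NXDOMAIN response.
    """
    return "\n".join(
        [
            f"zone {zone}",
            f'update add {record} {ttl} TXT "{validation}"',
            "send",
            "",
        ]
    )
```

The resulting text would be fed to `nsupdate -g` on standard input, with the record name and validation string taken from the values `certbot` passes to its manual auth hook.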

This role borrows some logic from the *postgresql-cert* role.
Eventually, I probably want to combine some of the steps from both of
these roles, possibly replacing the old *certbot* role.
2024-09-01 08:59:28 -05:00
Dustin 7854a729b7 r/minio: Add option to disable firewall rules
If MinIO is behind a reverse proxy, we do not want to expose it directly
to the network.
2024-09-01 08:59:28 -05:00
Dustin 3c907d0a16 r/minio-nginx: Reverse proxy for MinIO
The *minio-nginx* role configures nginx to proxy for MinIO.  It uses the
"subdomain" pattern, as described in [Configure NGINX Proxy for MinIO
Server][0]; the S3 API and the console UI are accessible through
different domain names.

[0]: https://min.io/docs/minio/linux/integrations/setup-nginx-proxy-with-minio.html
2024-09-01 08:59:28 -05:00
Dustin 7ec7cad26a r/minio: Update container unit for Podman 5
Modern versions of Podman use Netavark, which needs to write various
files on the host file system (even when the container uses the
host's network namespace).
2024-09-01 08:59:28 -05:00
Dustin 623f652e0d r/minio: Add additional configuration options
If the `minio_address` variable is specified, it will be passed with the
`--address` argument to `minio server`.  This allows controlling the
socket the server binds to and listens on.

The `minio_browser_redirect_url` can be specified to populate the
similarly-named environment variable, which configures how MinIO serves
the web UI.

The `minio_domain` variable sets the `MINIO_DOMAIN` environment
variable, which enables DNS names (subdomains) for buckets, i.e.
`{bucket_name}.{MINIO_DOMAIN}`.
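
The naming scheme can be illustrated with a few lines of Python that recover the bucket name from a request's `Host` header.  This only demonstrates the `{bucket_name}.{MINIO_DOMAIN}` pattern; it is not MinIO's implementation, and the function name is made up.

```python
def bucket_from_host(host, minio_domain):
    """Return the bucket name encoded in a Host header, or None.

    None means the request addresses the bare domain (path-style access)
    or some unrelated host.
    """
    host = host.split(":", 1)[0].lower()  # drop any port
    suffix = "." + minio_domain.lower()
    if host.endswith(suffix) and len(host) > len(suffix):
        return host[: -len(suffix)]
    return None
```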
2024-09-01 08:59:28 -05:00
Dustin 2e37fce4f6 r/wal-g-pg: Run wal-g backup as postgres
`wal-g` needs to connect to the PostgreSQL database system, so it should
run as the _postgres_ user, who has permission to connect, rather than
_root_, who does not.
2024-08-30 09:44:43 -05:00
Dustin ab5da58175 r/frigate: Add Frigate RTSP port to firewall
Home Assistant streams camera videos via RTSP now.
2024-08-28 09:50:36 -05:00
Dustin 3511176c31 r/gitea: Configure SMTP mailer
Gitea needs SMTP configuration in order to send e-mail notifications
about e.g. pull requests.  The `gitea_smtp` variable can be defined to
enable this feature.
2024-08-25 08:46:37 -05:00
Dustin 1ab0dd3457 r/gitea: Set WORK_DIR in config
Gitea complains if the `WORK_DIR` setting is not set.  It tries to set
it itself, but fails because the configuration is read-only.  The value
it uses is incorrect anyway (`/usr/local/bin`, since that's where the
`gitea` executable is).
2024-08-25 08:45:29 -05:00