infra/cfg

Commit Graph

Author	SHA1	Message	Date
Dustin	45c35c065a	promtail: Deploy Loki Promtail Agent [Promtail][0] is the log collection agent for Grafana Loki. It reads logs from various locations, including local files and the _systemd_ journal, and sends them to Loki via HTTP. Loki configuration is a highly structured YAML document. Thus, instead of using Tera template syntax for loops, conditionals, etc., we can use the full power of CUE to construct the configuration. Using the `Marshal` function from the built-in `encoding/yaml` package, we serialize the final configuration structure as a string and write it verbatim to the configuration file. I have modeled most of the Promtail configuration schema in the `du5t1n.me/cfg/app/promtail/schema` package. Having the schema modeled will ensure the generated configuration is valid during development (i.e. `cue export` will fail if it is not), which will save time pushing changes to machines and having Loki complain. The `#promtail` "function" in `du5t1n.me/cfg/env/prod` makes it easy to build our desired configuration. It accepts an optional `#scrape` field, which can be used to provide specific log scraping definitions. If it is unspecified, the default configuration is to scrape the systemd journal. Hosts with additional needs can supply their own list, probably including the `promtail.scrape.journal` object in it to get the default journal scrape job. [0]: https://grafana.com/docs/loki/latest/send-data/promtail/	2024-02-18 11:35:13 -06:00
Dustin	29afcae52e	fetchcert: Deploy tool to get cert from k8s Secret The `fetchcert` tool is a short shell script that fetches an X.509 certificate and corresponding private key from a Kubernetes Secret, using the Kubernetes API. I originally wrote it for the Frigate server so it could fetch the _pyrocufflink.blue_ wildcard certificate, which is managed by _cert-manager_. Since then, I have adapted it to be more generic, so it will be useful to fetch the _loki.pyrocufflink.blue_ certificate for Grafana Loki. Although the script is rather simple, it does have several required configuration parameters. It needs to know the URL of the Kubernetes API server and have the certificate for the CA that signs the server certificate, as well as an authorization token. It also needs to know the namespace and name of the Secret from which it will fetch the certificate and private key. Finally, it needs to know the paths to the files where the fetched data will be written. Generally, after certificates are updated, some action needs to be performed in order to make use of them. This typically involves restarting or reloading a daemon. Since the `fetchcert` tool runs in a container, it can't directly perform those actions, so it simply indicates via a special exit code that the certificate has been updated and some further action may be needed. The `/etc/fetchcert/postupdate.sh` script is executed by _systemd_ after `fetchcert` finishes. If the `EXIT_STATUS` environment variable (which is set by _systemd_ to the return code of the main service process) matches the expected code, the configured post-update actions will be executed.	2024-02-18 10:48:01 -06:00
Dustin	ffe450cd30	loki: Run Grafana Loki in a container Deploying Loki is pretty straightforward. It just needs a container unit file and a basic YAML configuration file.	2024-02-13 19:54:48 -06:00
Dustin	b7f5d4a910	app/ssh: Configure sshd trusted user CA keys Configuring the system-wide trusted user CA key list for sshd(8).	2024-02-03 11:16:52 -06:00
Dustin	f886a1bd8a	sudo: Configure pam_ssh_agent_auth I do not like how Fedora CoreOS configures `sudo` to allow the core user to run privileged processes without authentication. Rather than assign the user a password, which would then have to be stored somewhere, we'll install pam_ssh_agent_auth and configure `sudo` to use it for authentication. This way, only users with the private key corresponding to one of the configured public keys can run `sudo`. Naturally, pam_ssh_agent_auth has to be installed on the host system. We achieve this by executing `rpm-ostree` via `nsenter` to escape the container. Once it is installed, we configure the PAM stack for `sudo` to use it and populate the authorized keys database. We also need to configure `sudo` to keep the `SSH_AUTH_SOCK` environment variable, so pam_ssh_agent_auth knows where to look for the private keys. Finally, we disable the default NOPASSWD rule for `sudo`, if and only if the new configuration was installed.	2024-01-29 09:10:42 -06:00
Dustin	bb3705939e	nut: Fix upsmon reload hook `upsmon.conf` is used by nut-monitor (`upsmon`) rather than nut-server (`upsd`).	2024-01-19 18:01:42 -06:00
Dustin	caccffcb65	nut: split out template for sysusers.d config Hosts that run `upsmon` but not `upsd` still need the nut user.	2024-01-19 17:21:23 -06:00
Dustin	fb74f0e81c	nut: Configure upsmon `upsmon` is the component of NUT that tracks the status of UPSs and reacts to their changing by sending notifications and/or shutting down the system. It is a networked application that can run on any system; it can run on a different system than `upsd`, and indeed can run on multiple systems simultaneously. Each system that runs `upsmon` will need a username and password for each UPS it will monitor. Using the CUE [function pattern][0], I've made it pretty simple to declare the necessary values under `nut.monitor`. [0]: https://cuetorials.com/patterns/functions/	2024-01-19 08:52:14 -06:00
Dustin	51aaccc861	collectd: Deploy collectd in a container I keep going back-and-forth on whether or not collectd should run in a container on Fedora CoreOS machines. On the one hand, running it directly on the host allows it to monitor filesystem usage by mount point, which is consistent with how non-FCOS machines are monitored. On the other hand, installing packages on FCOS with `rpm-ostree` is a nightmare. It's _incredibly_ slow. There are also occasional issues installing packages if the base layer has not been updated in a while and the new packages require an existing package to be updated. For the NUT server specifically, I have changed my mind again: the collectd-nut package depends on nut-client, which in turn depends on Python. I definitely want to avoid installing Python on the host, but I do not want to lose the ability to monitor the UPSs via collectd. Using a container, I can strip out the unnecessary bits of nut-client and avoid installing Python at all. I think that's worth having to monitor filesystem usage by device instead of by mount point.	2024-01-17 17:35:21 -06:00
Dustin	41e9fa85d2	Restructure CUE packages A bunch of stuff that wasn't schema definitions ended up in the `schema` package. Rather than split values up in a bunch of top-level packages, I think it would be better to have a package-per-app model.	2024-01-17 17:35:18 -06:00
Dustin	11f9957c11	Switch from KCL to CUE Although KCL is unquestionably a more powerful language, and maps more closely to my mental model of how host/environment/application configuration is defined, the fact that it doesn't work on ARM (issue 982) makes it a non-starter. It's also quite slow (owing to how it compiles a program to evaluate the code) and cumbersome to distribute. Fortunately, `tmpl` doesn't care how the values it uses were computed, so we can freely change configuration languages, so long as whatever we use generates JSON/YAML. CUE is probably a lot more popular than KCL, and is quite a bit simpler. It's more restrictive (values cannot be overridden once defined), but still expressive enough for what I am trying to do (so far).	2024-01-15 11:40:58 -06:00
Dustin	778c6d440d	Initial commit	2024-01-14 19:24:55 -06:00

12 Commits (5e10f2c1e7f731e22ef8ae1682da458103a62f0d)
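
The exit-code handshake described in the `fetchcert` commit (29afcae52e) could look roughly like the sketch below. This is not the script from the repository: the function names `install_if_changed` and `post_update_needed` and the status value 100 are hypothetical, chosen only for illustration, since the commit message does not state which exit code is used.

```shell
#!/bin/sh
# Hypothetical exit status meaning "the certificate was updated".
# The commit message does not give the real value; 100 is an assumption.
UPDATED_STATUS=${UPDATED_STATUS:-100}

# install_if_changed NEW_FILE DEST_FILE
# Copy NEW_FILE over DEST_FILE only when their contents differ.
# Returns 0 if DEST_FILE was already current, UPDATED_STATUS if it was
# replaced, and 1 if the copy failed.
install_if_changed() {
    if [ -f "$2" ] && cmp -s "$1" "$2"; then
        return 0
    fi
    cp "$1" "$2" || return 1
    return "$UPDATED_STATUS"
}

# post_update_needed STATUS
# Succeeds only when STATUS equals UPDATED_STATUS. In a post-update hook,
# STATUS would be the EXIT_STATUS variable that systemd sets to the exit
# code of the main service process.
post_update_needed() {
    [ "${1:-}" = "$UPDATED_STATUS" ]
}
```

In a hook such as `/etc/fetchcert/postupdate.sh`, the reload or restart commands would be guarded by `post_update_needed "${EXIT_STATUS:-}"`. This assumes the hook is wired up through something like `ExecStopPost=`, which is one of the places systemd exports `EXIT_STATUS`, and that the special code is listed in `SuccessExitStatus=` so the unit is not marked as failed. Neither detail is confirmed by the commit message.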