--- title: Prometheus and Aegis date: 2021-04-05 tags: - prometheus - o11y --- [*Last time in the christine dot website cinematic universe:*](https://xeiaso.net/blog/unix-domain-sockets-2021-04-01) *Unix sockets started to be used to grace the cluster. Things were at peace. Then, a realization came through:* [What about Prometheus? Doesn't it need a direct line of fire to the service to scrape metrics?](conversation://Mara/hmm?smol) *This could not do! Without observability the people of the Discord wouldn't have a livefeed of the infrastructure falling over! This cannot stand! Look, our hero takes action!* [It will soon!](conversation://Cadey/percussive-maintenance?smol) In order to help keep an eye on all of the services I run, I use [Prometheus](https://prometheus.io/) for collecting metrics. For an example of the kind of metrics I collect, see [here (1)](/metrics). In the configuration that I have, Prometheus runs on a server in my apartment and reaches out to my other machines to scrape metrics over the network. This worked great when I had my major services listen over TCP, I could just point Prometheus at the backend port over my tunnel. When I started using Unix sockets for hosting my services, this stopped working. It became very clear very quickly that I needed some kind of shim. This shim needed to do the following things: - Listen over the network as a HTTP server - Connect to the unix sockets for relevant services based on the path (eg. `/xesite` should get the metrics from `/srv/within/run/xesite.sock`) - Do nothing else The Go standard library has a tool for doing reverse proxying in the standard library: [`net/http/httputil#ReverseProxy`](https://pkg.go.dev/net/http/httputil#ReverseProxy). Maybe we could build something with this? [The documentation seems to imply it will use the network by default. Wait, what's this `Transport` field?](conversation://Mara/hmm?smol) ```go type ReverseProxy struct { // ... // The transport used to perform proxy requests. // If nil, http.DefaultTransport is used. Transport http.RoundTripper // ... } ``` [So a transport is a <a href="https://pkg.go.dev/net/http#RoundTripper">`RoundTripper`</a>, which is a function that takes a request and returns a response somehow. It uses `http.DefaultTransport` by default, which reads from the network. So at a minimum we're gonna need: <ul><li>a `ReverseProxy`</li><li>a `Transport`</li><li>a dialing function</li><ul>Right?](conversation://Mara/hmm?smol) Yep! Unix sockets can be used like normal sockets, so all you need is something like this: ```go func proxyToUnixSocket(w http.ResponseWriter, r *http.Request) { name := path.Base(r.URL.Path) fname := filepath.Join(*sockdir, name+".sock") _, err := os.Stat(fname) if os.IsNotExist(err) { http.NotFound(w, r) return } ts := &http.Transport{ Dial: func(_, _ string) (net.Conn, error) { return net.Dial("unix", fname) }, DisableKeepAlives: true, } rp := httputil.ReverseProxy{ Director: func(req *http.Request) { req.URL.Scheme = "http" req.URL.Host = "aegis" req.URL.Path = "/metrics" req.URL.RawPath = "/metrics" }, Transport: ts, } rp.ServeHTTP(w, r) } ``` [So in this handler:](conversation://Mara/hmm?smol) ```go name := path.Base(r.URL.Path) fname := filepath.Join(*sockdir, name+".sock") _, err := os.Stat(fname) if os.IsNotExist(err) { http.NotFound(w, r) return } ts := &http.Transport{ Dial: func(_, _ string) (net.Conn, error) { return net.Dial("unix", fname) }, DisableKeepAlives: true, } ``` [You have the socket path built from the URL path, and then you return connections to that path ignoring what the HTTP stack thinks it should point to?](conversation://Mara/hmm?smol) Yep. Then the rest is really just boilerplate: ```go package main import ( "flag" "log" "net" "net/http" "net/http/httputil" "os" "path" "path/filepath" ) var ( hostport = flag.String("hostport", "[::]:31337", "TCP host:port to listen on") sockdir = flag.String("sockdir", "./run", "directory full of unix sockets to monitor") ) func main() { flag.Parse() log.SetFlags(0) log.Printf("%s -> %s", *hostport, *sockdir) http.DefaultServeMux.HandleFunc("/", proxyToUnixSocket) log.Fatal(http.ListenAndServe(*hostport, nil)) } ``` Now all that's needed is to build a NixOS service out of this: ```nix { config, lib, pkgs, ... }: let cfg = config.within.services.aegis; in with lib; { # Mara\ this describes all of the configuration options for Aegis. options.within.services.aegis = { enable = mkEnableOption "Activates Aegis (unix socket prometheus proxy)"; # Mara\ This is the IPv6 host:port that the service should listen on. # It's IPv6 because this is $CURRENT_YEAR. hostport = mkOption { type = types.str; default = "[::1]:31337"; description = "The host:port that aegis should listen for traffic on"; }; # Mara\ This is the folder full of unix sockets. In the previous post we # mentioned that the sockets should go somewhere like /tmp, however this # may be a poor life decision: # https://lobste.rs/s/fqqsct/unix_domain_sockets_for_serving_http#c_g4ljpf sockdir = mkOption { type = types.str; default = "/srv/within/run"; example = "/srv/within/run"; description = "The folder that aegis will read from"; }; }; # Mara\ The configuration that will arise from this module if it's enabled config = mkIf cfg.enable { # Mara\ Aegis has its own user account to keep things tidy. It doesn't need # root to run so we don't give it root. users.users.aegis = { createHome = true; description = "tulpa.dev/cadey/aegis"; isSystemUser = true; group = "within"; home = "/srv/within/aegis"; }; # Mara\ The systemd service that actually runs Aegis. systemd.services.aegis = { wantedBy = [ "multi-user.target" ]; # Mara\ These correlate to the [Service] block in the systemd unit. serviceConfig = { User = "aegis"; Group = "within"; Restart = "on-failure"; WorkingDirectory = "/srv/within/aegis"; RestartSec = "30s"; }; # Mara\ When the service starts up, run this script. script = let aegis = pkgs.tulpa.dev.cadey.aegis; in '' exec ${aegis}/bin/aegis -sockdir="${cfg.sockdir}" -hostport="${cfg.hostport}" ''; }; }; } ``` [Then I just flicked it on for a server of mine:](conversation://Cadey/enby?smol) ```nix within.services.aegis = { enable = true; hostport = "[fda2:d982:1da2:180d:b7a4:9c5c:989b:ba02]:43705"; sockdir = "/srv/within/run"; }; ``` [And then test it with `curl`:](conversation://Cadey/enby?smol) ```console $ curl http://[fda2:d982:1da2:180d:b7a4:9c5c:989b:ba02]:43705/printerfacts # HELP printerfacts_hits Number of hits to various pages # TYPE printerfacts_hits counter printerfacts_hits{page="fact"} 15 printerfacts_hits{page="index"} 23 printerfacts_hits{page="not_found"} 17 # HELP process_cpu_seconds_total Total user and system CPU time spent in seconds. # TYPE process_cpu_seconds_total counter process_cpu_seconds_total 0.06 # HELP process_max_fds Maximum number of open file descriptors. # TYPE process_max_fds gauge process_max_fds 1024 # HELP process_open_fds Number of open file descriptors. # TYPE process_open_fds gauge process_open_fds 12 # HELP process_resident_memory_bytes Resident memory size in bytes. # TYPE process_resident_memory_bytes gauge process_resident_memory_bytes 5296128 # HELP process_start_time_seconds Start time of the process since unix epoch in seconds. # TYPE process_start_time_seconds gauge process_start_time_seconds 1617458164.36 # HELP process_virtual_memory_bytes Virtual memory size in bytes. # TYPE process_virtual_memory_bytes gauge process_virtual_memory_bytes 911777792 ``` [And there you go! Now we can make Prometheus point to this and we can save Christmas!](conversation://Cadey/aha?smol) [:D](conversation://Mara/happy?smol) --- This is another experiment in writing these kinds of posts in more of a Socratic method. I'm trying to strike a balance with a [limited pool of stickers](https://tulpa.dev/cadey/kadis-layouts/src/branch/master/moonlander/leader.c#L68-L84) while I wait for more stickers/emoji to come in. [Feedback](/contact) is always welcome. (1): These metrics are not perfect because of the level of caching that Cloudflare does for me.