iconia: A Service Gateway

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

Abstract

Operating TCP services in production can be troublesome. At small scale it is easy to have a single server terminate a single service. As a deployment grows, that arrangement becomes harder and harder to maintain because the complexity of the setup increases.

We need a better option.

Iconia is a service gateway that integrates into Caddy to make operating complicated applications at scale easier. The ultimate goal of Iconia is to remove the need for a direct TCP line of sight from the load balancer to the backend services. Instead, the Iconia agent would connect to the load balancer and relay traffic to the backend, much like proxies do in Envoy-based service meshes such as Istio.

Name Origin

The name of Iconia is a reference to the Iconian gateway from the Star Trek franchise. It is a gateway that allows for instant travel to distant points in the galaxy in ways that bypass shields and other forms of security. The name is used here because Iconia is a gateway to distant services that bypasses NAT and other ways that ports get blocked.

Components

Iconia is made up of several components:

  • Caddy as the ingress/TLS terminating component
  • The Iconia agent running alongside the application being exposed to the world
  • The application being exposed via Iconia

Caddy Plugin

The Caddy plugin for Iconia MUST perform the following operations:

  • Gather metrics about the performance of each backend:
    • Roundtrip time
    • Time to first byte
    • Number of connections
    • Time it takes to do healthchecks
    • A healthcheck metric defined by the backend application
  • Gather metrics about the gateway as a whole:
    • Number of connected backends
    • Number of authentication failures
  • Have some method to intelligently select backends based on the following criteria:
    • Load of the backend service instance in question
    • Client remote IP address and port number
    • Backend health status (i.e., it MUST NOT route to backends that are marked as unhealthy)
  • Expose a protocol or method for backend services to connect to the gateway
    • Evaluate smux, SSH and QUIC
    • Ensure that only authorized agents are allowed to register as backends
  • Expose an API for controlling Iconia operations
    • List active backends
    • See details about a given backend by connection ID
    • Kill an arbitrary backend by connection ID
  • Log messages to the standard Caddy logging sink
  • Route requests and responses to and from the discovered backend efficiently

The Caddy plugin for Iconia MUST allow backends for multiple hosts to connect via the same TCP/UDP port.
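
One possible reading of the backend selection criteria is sketched below: skip unhealthy backends, prefer the lowest reported load, and use a hash of the client address to break ties. This is an illustration under assumptions rather than the specified algorithm; the `Backend` type, the `SelectBackend` function, and the choice of FNV hashing are all hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"hash/fnv"
)

// Backend is a hypothetical view of one registered backend connection.
type Backend struct {
	ConnID  string  // connection ID assigned by the gateway
	Healthy bool    // result of the most recent healthcheck
	Load    float64 // site-defined load metric; lower means less loaded
}

// ErrNoHealthyBackend is returned when every backend is marked unhealthy.
var ErrNoHealthyBackend = errors.New("no healthy backend available")

// SelectBackend picks the healthy backend with the lowest load. When several
// backends tie for the lowest load, a hash of the client address chooses among
// them, so the same client tends to land on the same backend.
func SelectBackend(backends []Backend, clientAddr string) (Backend, error) {
	var best []Backend
	for _, b := range backends {
		if !b.Healthy {
			continue // never route to unhealthy backends
		}
		switch {
		case len(best) == 0 || b.Load < best[0].Load:
			best = []Backend{b}
		case b.Load == best[0].Load:
			best = append(best, b)
		}
	}
	if len(best) == 0 {
		return Backend{}, ErrNoHealthyBackend
	}
	h := fnv.New32a()
	h.Write([]byte(clientAddr))
	return best[int(h.Sum32()%uint32(len(best)))], nil
}

func main() {
	backends := []Backend{
		{ConnID: "a", Healthy: true, Load: 0.7},
		{ConnID: "b", Healthy: false, Load: 0.1},
		{ConnID: "c", Healthy: true, Load: 0.2},
	}
	b, err := SelectBackend(backends, "203.0.113.7:51324")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("selected backend", b.ConnID) // selected backend c
}
```

Backend `b` reports the lowest load but is unhealthy, so it is never considered; `c` wins among the healthy ones.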

Iconia Agent

The Iconia agent MUST perform the following operations:

  • Discover/gather configuration information from the environment and filesystem
  • Connect to the gateway server in a durable and fault-tolerant manner
  • Authenticate to the gateway server
  • Listen for incoming TCP sessions from the gateway server and route them to the backend service
  • Utilize the PROXY protocol to ensure that the backend service has accurate information about client IP addresses

Backend Service

Backend services for Iconia MUST have the following properties:

  • Understand the PROXY protocol to ensure that the backend service has accurate information about client IP addresses
  • Expose a healthcheck route:
    • On the HTTP host iconia-healthcheck
    • With the path /health
    • That MAY return 200 if everything is healthy with the body OK
    • And also MAY return 500 if everything is NOT healthy with the body containing a site-defined error message explaining what the issue is
    • This healthcheck MAY include the response header X-Iconia-Load to send the gateway a site-defined load metric to help the gateway choose the backend with the least load
  • Accept connections over TCP on a known port

Caveats

Routing traffic through Iconia will undoubtedly add a slight amount of latency to standard HTTP operations. Latency-sensitive applications should probably use traditional exposure methods instead.

The Go HTTP/2 stack doesn't currently support connection hijacking, which will be needed in order to have the most efficient routing possible.