Consul and etcd in the Autopilot Pattern

August 02, 2016 - by Tim Gross

Applications developed with the Autopilot Pattern are self-operating and self-configuring but use an external service catalog like Consul or etcd to store and coordinate global state. ContainerPilot sends the service catalog heartbeats to record that an instance of an application is still up and running. Both Consul and etcd have interesting assumptions about their topology that end users deploying on Triton should be aware of.

Consul and etcd both use the Raft consensus algorithm to keep state consistent across their server nodes. But Raft doesn't scale well beyond a handful of nodes, so both services have a mechanism to scale out to serve a large number of clients.

In Consul, this mechanism is the Consul agent. An agent is an instance of Consul that does not participate in the raft but does participate in the LAN gossip protocol. This means that it receives all the data that's written to the raft but proxies all writes it receives to the raft. Additionally, Consul expects every health check to be associated with a particular agent. You'll send TTL heartbeats (or other health checks) to the agent rather than to the raft, and the agent will update the raft only if the state of an instance changes. If the agent fails, then all application instances associated with that agent are marked as failed too. This allows state changes to propagate through the raft efficiently without loading down the LAN gossip pool with what might be thousands of concurrent heartbeats in a large deployment.
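As a rough illustration of heartbeating against the agent and not the raft, the sketch below marks a TTL check as passing through the local agent's HTTP API. The `CONSUL_AGENT_ADDR` variable and the check ID `nginx-1` are assumptions made for this example; ContainerPilot normally sends these heartbeats for you.

```shell
#!/bin/bash
# Sketch: send a TTL heartbeat to the local Consul agent.
# CONSUL_AGENT_ADDR is a hypothetical variable name used only in this example.

# Build the agent endpoint that marks a TTL check as passing.
check_pass_url() {
    printf 'http://%s/v1/agent/check/pass/%s\n' \
        "${CONSUL_AGENT_ADDR:-localhost:8500}" "$1"
}

# Send one heartbeat; the agent resets the TTL timer for this check and
# only writes to the raft if the check's state actually changes.
send_heartbeat() {
    curl --silent --fail --request PUT "$(check_pass_url "$1")"
}

# Example (hypothetical check ID):
#   send_heartbeat nginx-1
```

If the heartbeats stop, the check's TTL expires on the agent and the instance is reported as unhealthy.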

Etcd has a similar arrangement with the etcd proxy, but only in order to ease discovery of the cluster. Prior to v3, etcd's proxy was transparent and forwarded TTL heartbeats to the raft. This meant that we didn't care which specific etcd node received TTL heartbeats (unlike Consul), which made configuration of clients much easier. But ensuring that TTLs were propagated throughout the raft created scalability problems when there were a large number of health checks, so in v3 etcd moved to a model where the keepalive is used to determine node health in a similar way to Consul. Instead of a key being associated with the local agent, the TTL lease is maintained by the current etcd primary.
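The v3 lease model can be sketched with `etcdctl` using the v3 API. This is only an illustration of the lease and keepalive flow, not how ContainerPilot talks to etcd. The key, value, and TTL in the usage comment are hypothetical.

```shell
#!/bin/bash
# Sketch of the etcd v3 lease model using etcdctl with the v3 API.
export ETCDCTL_API=3

# Extract the lease ID from `etcdctl lease grant` output, which looks like:
#   lease 694d5765fc71500b granted with TTL(10s)
parse_lease_id() {
    awk '$1 == "lease" { print $2 }'
}

# Grant a lease, attach a key to it, then keep the lease alive.
register_with_lease() {
    key="$1"; value="$2"; ttl="$3"
    lease_id="$(etcdctl lease grant "$ttl" | parse_lease_id)"
    etcdctl put --lease="$lease_id" "$key" "$value"
    # Blocks and refreshes the lease until interrupted; once keepalives
    # stop and the TTL runs out, etcd removes the attached key.
    etcdctl lease keep-alive "$lease_id"
}

# Example (hypothetical key and value):
#   register_with_lease /containerpilot/nginx-1 '{"address":"10.0.0.5"}' 10
```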

The agent/proxy model that both Consul and etcd expect has baked-in assumptions about your infrastructure. It works for container deployments in VMs or even single-tenant bare metal, but those kinds of deployments assume a single tenant per "machine". That makes it difficult for multiple teams to share the same infrastructure, or even one team to use the same infrastructure pool to do multiple deployments, such as for staging and prod.

Triton's strong isolation between containers makes it possible for every container to get bare-metal performance while sharing hardware with other containers. This means developers and customers can have as many isolated environments as they need to build, test, and deliver their applications.

Eliminating VMs makes it faster and easier to do all that, but creates a challenge given the topology assumptions of Consul and etcd; this same challenge faces so-called serverless environments and PaaS offerings as well. There's no particularly good way to differentiate between Consul nodes so that a container always uses a particular agent. And even if we could make this work, if a given Consul agent halts (it crashes or its underlying host is rebooted), then all containers associated with it become unhealthy, even though they are on different hosts.

So what are our options?

Etcd Without a Proxy

Although an etcd proxy helps the scalability of etcd, the cluster will behave correctly without one. A heartbeat can be sent to any node in the cluster. Bootstrapping etcd can be a little painful; unless we have an SRV record or want to use an external service, we need an existing etcd cluster to store the initial membership list. The autopilotpattern/etcd GitHub repo has an example of using a temporary single-node cluster to bootstrap our etcd cluster. If we give the etcd cluster a Triton CNS name, then applications using ContainerPilot can use the CNS record to find the service catalog. We can inject this CNS name via an environment variable. The JSON configuration for ContainerPilot in this case looks like this:

{
    "etcd": {
        "endpoints": [
            "http://{{ .ETCD }}:4001"
        ],
        "prefix": "/containerpilot"
    }
}

Consul Agent in Multiprocess Containers

A multiprocess container can be an infrastructure container or a Docker container that has a lightweight supervisor (such as s6 or runit) as PID 1. The supervisor will run a Consul agent as well as our application wrapped by ContainerPilot. (We could run the etcd proxy this way too.) Of course, we'll want to run Consul as an HA raft and we'll need a service identifier for it. We can use the Consul cluster setup in autopilotpattern/consul to bootstrap the cluster using Triton CNS.

This arrangement complicates the container image quite a bit. An example of this setup can be found in the multiprocess branch of autopilotpattern/nginx. The ContainerPilot configuration for reaching Consul remains trivial:

{
  "consul": "localhost:8500"
}

We'll then also include runit service files for Nginx and Consul. Because runit clears the environment variables before forking applications, we're forced to include a script that writes out the environment to a file, and then source that file in our service file as shown below:

#!/bin/bash
# runit file for starting Consul

# load in environment that runit blows away
set -a
source /etc/envvars
set +a

set -e
exec 2>&1
umask 022

exec /usr/local/bin/consul agent \
  -data-dir=/data \
  -config-dir=/config \
  -rejoin \
  -retry-join "${CONSUL}" \
  -retry-max 10 \
  -retry-interval 10s
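The script that writes out the environment is not shown above. A minimal sketch of one way to do it follows. The function name `write_envvars` is hypothetical, and the default path simply matches the `source /etc/envvars` line in the service file.

```shell
#!/bin/bash
# Sketch: save the container's environment so runit services can reload it.
# write_envvars is a hypothetical helper name used only in this example.

write_envvars() {
    target="${1:-/etc/envvars}"
    # `export -p` emits every exported variable as a quoted statement that
    # can be sourced again, so values with spaces or quotes survive.
    export -p > "$target"
    chmod 600 "$target"   # the environment may contain secrets
}

# Example: call this from the container's entrypoint before runit starts.
#   write_envvars
```

The file has to be written before runit starts its services, since each service file sources it at startup.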

Needless to say, this is more complicated. We also lose the signal handling features provided by ContainerPilot; an external scheduler can't send SIGHUP, SIGUSR1, or SIGTERM signals to ContainerPilot without shelling into the container because ContainerPilot is no longer PID 1. This makes it harder to trigger ContainerPilot reconfiguration and preStop behaviors.

Consul Agent as Coprocess

Our last and perhaps best option is to use ContainerPilot's new coprocess feature with a Consul agent. This feature allows ContainerPilot to spin up a secondary process that will be restarted if it fails rather than causing the container to exit. We'll tell ContainerPilot that the service catalog is local (the Consul agent), and use the CNS name that we assigned in autopilotpattern/consul to tell the Consul agent inside the container how to reach Consul. Our ContainerPilot configuration might look like this:

{
  "consul": "localhost:8500",
  "coprocesses": [
    {
      "command": [
          "/usr/local/bin/consul", "agent",
          "-data-dir=/data",
          "-config-dir=/config",
          "-rejoin",
          "-retry-join", "{{ .CONSUL }}",
          "-retry-max", "10",
          "-retry-interval", "10s"],
      "restarts": "unlimited"
    }
  ]
}

In the latest version of autopilotpattern/nginx (and most of the autopilotpattern/wordpress components) we're using this pattern and we've included an environment variable flag CONSUL_AGENT that, if set, updates the ContainerPilot configuration as shown above using the ContainerPilot config templating feature.
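The effect of the flag can be sketched in shell. The repositories make this choice with ContainerPilot's config templating, not with a script. The hypothetical function below only mirrors the decision, and it assumes `CONSUL` holds the CNS name of the Consul cluster.

```shell
#!/bin/bash
# Sketch: mirror the CONSUL_AGENT toggle in plain shell.
# consul_address is a hypothetical helper, not part of ContainerPilot.

consul_address() {
    if [ -n "${CONSUL_AGENT:-}" ]; then
        # A Consul agent runs inside the container as a coprocess,
        # so ContainerPilot talks to it locally.
        echo "localhost:8500"
    else
        # No local agent: talk to the Consul cluster by its CNS name.
        echo "${CONSUL}:8500"
    fi
}
```

With `CONSUL_AGENT` set, the CNS name is still needed, but only by the agent's `-retry-join` argument.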

The assumptions that Consul and etcd both bring to their deployments aren't unique to these services. Many existing applications have hidden assumptions about a topology of VMs. Fortunately we can work around these problems with ContainerPilot, and gain the performance of bare metal and the developer productivity of Docker containers by deploying our applications to Triton.