Configuration Services

Learn about configurations and their problems.

What are configurations?

Configuration services like ZooKeeper and etcd are distributed databases that applications can use to coordinate their configuration [9, 10]. Configuration in this sense is more than just the static parameters that an instance would keep in .properties files. It does include simple settings such as hostnames, resource pool sizes, and timeouts. But “configuration” also includes the arrangement of instances among themselves. These configuration databases can be used for orchestration, leader election (in the case of a cluster with a master node), or quorum-based consensus.

However, these services are built with code and not magic. Because they are themselves distributed databases, they are still bound by the constraints of the CAP theorem and sub-light-speed communications.

These services are scalable but not elastic. That means we can add and remove nodes, but response time will degrade as the nodes rebalance their data. It often requires an admin action to get the cluster to accept a new member or to indicate that an old member is gone for good.

Configuration problems

Keep in mind that the configuration service suffers the same network trauma that every other application does. There will be times when clients can’t reach the configuration service. Worse, there will be times when the nodes of the configuration service can’t reach each other but clients can still reach some of the nodes. In that case, it has to be safe for the clients to run with slightly outdated configurations. Otherwise, we have no choice but to shut down applications whenever the configuration service is partitioned.
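One way to make slightly outdated configuration safe is to have each client keep the last values it fetched successfully and keep serving them when a fetch fails. The following is a minimal sketch of that idea, not the API of ZooKeeper or etcd. The class name CachedConfig and the fetch callable are hypothetical; fetch stands in for whatever client call actually reads from the configuration service.

```python
import time
from typing import Callable, Dict, Optional


class CachedConfig:
    """Serves the last configuration that was fetched successfully."""

    def __init__(self, fetch: Callable[[], Dict[str, str]],
                 defaults: Dict[str, str]) -> None:
        self._fetch = fetch                # hypothetical call to the config service
        self._values = dict(defaults)      # used until the first successful fetch
        self._fetched_at: Optional[float] = None

    def refresh(self) -> bool:
        """Try to fetch new values; on failure keep the previous ones."""
        try:
            values = self._fetch()
        except (ConnectionError, TimeoutError):
            return False                   # service unreachable: stay on stale values
        self._values = dict(values)
        self._fetched_at = time.monotonic()
        return True

    def get(self, key: str) -> str:
        return self._values[key]

    def age_seconds(self) -> Optional[float]:
        """Seconds since the last successful fetch, or None if there was none."""
        if self._fetched_at is None:
            return None
        return time.monotonic() - self._fetched_at
```

The age_seconds method lets the application or its monitoring decide how stale is too stale, instead of having the client stop working the moment the service becomes unreachable.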

Information doesn’t only need to flow from the service to client instances, either. Instances can report back with their version numbers (or commit SHAs) and node identifiers. That means we can write a program or script to reconcile the actual state of the system with the expected state after a deployment. Be careful with the write volume this creates, though: the configuration services can sustain high read volume but have to go through some consensus mechanism for every write. It’s fine to use these services for relatively slowly changing configuration data, but they definitely don’t stand in for a log collection system.
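The reconciliation step could look something like the sketch below. It assumes the reported versions have already been read out of the configuration service into a plain dictionary keyed by node identifier; the function name reconcile and the node names in the usage example are made up for illustration.

```python
from typing import Dict, List


def reconcile(expected: Dict[str, str], reported: Dict[str, str]) -> List[str]:
    """Compare the expected version per node with what instances reported."""
    problems: List[str] = []
    for node, version in sorted(expected.items()):
        actual = reported.get(node)
        if actual is None:
            problems.append(f"{node}: no report (expected {version})")
        elif actual != version:
            problems.append(f"{node}: running {actual}, expected {version}")
    # Instances that reported in but are not part of the deployment plan.
    for node in sorted(set(reported) - set(expected)):
        problems.append(f"{node}: unexpected instance running {reported[node]}")
    return problems


if __name__ == "__main__":
    expected = {"app-1": "abc123", "app-2": "abc123"}
    reported = {"app-1": "abc123", "app-2": "9f00aa", "app-3": "abc123"}
    for line in reconcile(expected, reported):
        print(line)
```

Each instance writes its report once per deployment or restart, which keeps the write rate low enough for a consensus-backed store.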

A few pointers about configuration services:

  • Make sure your instances can start without the configuration service.
  • Make sure your instances don’t stop working when the configuration service is unreachable.
  • Make sure that a partitioned configuration node doesn’t have the ability to shut down the world.
  • Replicate across geographic regions.
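The first pointer, starting without the configuration service, can be approached by saving a local snapshot after every successful fetch and falling back to it at startup. The sketch below assumes a JSON snapshot file and a hypothetical fetch callable that raises ConnectionError or TimeoutError when the service can't be reached; neither comes from a real client library.

```python
import json
from pathlib import Path
from typing import Callable, Dict


def load_startup_config(fetch: Callable[[], Dict[str, str]],
                        snapshot_path: Path,
                        defaults: Dict[str, str]) -> Dict[str, str]:
    """Prefer the live service, then the last local snapshot, then defaults."""
    try:
        config = fetch()                   # hypothetical call to the config service
    except (ConnectionError, TimeoutError):
        if snapshot_path.exists():
            # Service unreachable: start from the last configuration we saw.
            return json.loads(snapshot_path.read_text(encoding="utf-8"))
        return dict(defaults)              # very first start, nothing saved yet
    # Save what we got so that the next start can survive an outage.
    snapshot_path.write_text(json.dumps(config), encoding="utf-8")
    return config
```

Whether built-in defaults are safe enough to run with is an application-level decision; some systems may prefer to start in a degraded mode instead.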
