Discovering Services

Learn what service discovery is, what its two parts are, and which distributed data stores it can be built on.

What is service discovery?

There are two cases where service discovery is important. First, our organization may have too many services for DNS management to be practical. Second, we may be in a highly dynamic environment. Container-based environments usually meet both of these criteria, but they are not the only environments that do.

Parts of service discovery

Service discovery really has two parts. First, it’s a way for instances of a service to announce themselves so they can begin receiving load. This replaces statically configured load balancer pools with dynamic pools. Any kind of load balancer, whether implemented in hardware or software, can do this. It doesn’t require a special “cloud aware” load balancer.
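The announcement side can be sketched as a small in-memory registry. This is only an illustration under stated assumptions: the `ServiceRegistry` class, its method names, and the heartbeat-with-timeout scheme are hypothetical and do not correspond to any particular load balancer or discovery product.

```python
import time


class ServiceRegistry:
    """In-memory stand-in for a dynamic load balancer pool (hypothetical)."""

    def __init__(self, ttl_seconds=10.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        # service name -> {address: time of the most recent announcement}
        self._pools = {}

    def announce(self, service, address):
        """Called by an instance at startup, then periodically as a heartbeat."""
        self._pools.setdefault(service, {})[address] = self._clock()

    def withdraw(self, service, address):
        """Called by an instance that is shutting down cleanly."""
        self._pools.get(service, {}).pop(address, None)

    def pool(self, service):
        """Return the addresses whose last announcement is within the TTL."""
        now = self._clock()
        members = self._pools.get(service, {})
        return sorted(a for a, seen in members.items() if now - seen <= self._ttl)
```

An instance that stops announcing itself drops out of the pool once its last heartbeat is older than the TTL, so the pool changes without anyone editing a static configuration.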

The second part is lookup. A caller needs to know at least one IP address to contact for a particular service. The lookup process can appear to be a simple DNS resolution for the caller, even if some super-dynamic service-aware server is supplying the DNS service. Service discovery is itself another service. It can fail or get overloaded. It’s a good idea for clients to cache results for a short time.
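Short-lived client-side caching might look like the following sketch. The `CachingLookup` class and the `resolve` callable are assumptions made for illustration; `resolve` stands in for whatever actually queries the discovery service, and the five-second TTL is an arbitrary choice.

```python
import time


class CachingLookup:
    """Caches lookup results briefly to spare the discovery service (hypothetical)."""

    def __init__(self, resolve, ttl_seconds=5.0, clock=time.monotonic):
        # resolve is an assumed callable: service name -> list of addresses
        self._resolve = resolve
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache = {}  # service name -> (addresses, time fetched)

    def lookup(self, service):
        now = self._clock()
        cached = self._cache.get(service)
        if cached is not None and now - cached[1] < self._ttl:
            return cached[0]  # still fresh, so skip the network call
        addresses = self._resolve(service)
        self._cache[service] = (addresses, now)
        return addresses
```

The TTL is a trade-off: a longer value shields the discovery service from more traffic, but callers take longer to notice instances that have come or gone.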

Building service discovery

It’s best not to roll our own service discovery. Like connection pools and crypto libraries, there’s a world of difference between writing one that works and writing one that always works.

We can build a service discovery mechanism on top of a distributed data store such as Apache ZooKeeper or etcd [8], [9]. In these cases, we’ll wrap the low-level access with a library to make it both easier and more reliable to use these databases. Just as an example, in the terminology of the CAP theorem [10], ZooKeeper is a CP system. That means when there’s a network partition (and there will be a network partition), some nodes won’t answer queries or accept writes. Since clients need to be available, they must have a fallback to use other nodes or previously cached results. It’s not reasonable to expect every client to implement this behavior. Pinterest published a good experience report about using ZooKeeper for service discovery [11].
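The fallback behavior such a wrapper library would provide can be sketched as follows. Nothing here uses a real ZooKeeper or etcd API: each node is modeled as a hypothetical callable that either returns a list of addresses or raises `ConnectionError`, and the class and exception names are invented for the example.

```python
class DiscoveryUnavailable(Exception):
    """Raised when no node answers and nothing has been cached."""


class FallbackDiscoveryClient:
    """Tries each node in turn, then falls back to the last known result."""

    def __init__(self, nodes):
        # Each node is an assumed callable: service name -> list of addresses.
        # It raises ConnectionError when the node is unreachable or will not answer.
        self._nodes = list(nodes)
        self._last_known = {}  # service name -> last successful result

    def lookup(self, service):
        for query_node in self._nodes:
            try:
                addresses = query_node(service)
            except ConnectionError:
                continue  # this node is cut off; try the next one
            self._last_known[service] = addresses
            return addresses
        # No node answered, so serve possibly stale data rather than fail.
        if service in self._last_known:
            return self._last_known[service]
        raise DiscoveryUnavailable(service)
```

Putting this logic in one shared library means individual clients call `lookup` and never deal with partitions directly, which is the point of wrapping the low-level access.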

HashiCorp’s Consul resembles ZooKeeper in that it operates as a distributed database [12]. However, Consul’s architecture places it in the “AP” arena, so it prefers to remain available and risk stale information when a partition occurs. In addition to service discovery, it also handles health checks.

Some other service discovery tools integrate directly with the control plane of PaaS platforms. For example, when Docker Swarm starts containers to run service instances, it automatically registers them with the swarm’s dynamic DNS and load-balancing mechanism.

This is a rapidly evolving space. As we can see, each of these tools makes different trade-offs. They cover different scopes and behave differently in failure cases. In fact, each one could occupy its own chapter, complete with cautions about sharp edges and detailed discussion about the boundary between the tool’s features and our applications’ responsibilities. Such chapters would probably be outdated by the time this course is published. There’s no plug-and-play replaceability. Choosing one is not a simple matter, and replacing one will have wide-reaching consequences. The only real answer here is to do our homework and commit to solving the implementation challenges of whichever tool we choose.
