Here are some more topics you can explore to enhance your app.

Multi-stage builds

Large Docker images are slower to push and pull. As you become more experienced with Docker, you will want to find ways to make your images smaller. Since version 17.05, Docker has had a feature called multi-stage builds. This lets you use multiple FROM statements in a single Dockerfile. Each new FROM is considered a new stage and starts as a fresh image. However, the COPY instruction has been enhanced to let you copy files from earlier stages.

The most obvious use case is where you need a lot of development tools that produce a final artifact. Think of a static site generator like Jekyll or Middleman. You need various tools to develop and generate the site, but once the static files are generated, they are the only thing needed to run the site. Multi-stage builds let you create an initial stage that generates the site, and a separate, final stage that copies those files into a clean web server image. The same goes for compiled languages like Go where, typically, the only thing you need to include in your final image is the compiled binary.

In the case of our Rails app, a quick win could be copying the precompiled assets into a final image, avoiding the need for all the JavaScript dependencies. There are other ways to save space if you think creatively and see what other people are doing.
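As a rough illustration, here is a minimal sketch of what such a Dockerfile might look like for a Rails app. The base images, paths, and stage name are assumptions rather than drop-in values, and in practice the builder stage would also need Node.js and Yarn installed to compile the assets:

# Stage 1: build environment with all the asset tooling (illustrative)
FROM ruby:2.6 AS builder
WORKDIR /usr/src/app
COPY Gemfile Gemfile.lock ./
RUN bundle install
COPY . .
# Precompile the assets; the JavaScript toolchain is needed in this stage only
RUN RAILS_ENV=production bin/rails assets:precompile

# Stage 2: slimmer final image without the build tooling
FROM ruby:2.6-slim
WORKDIR /usr/src/app
COPY Gemfile Gemfile.lock ./
RUN bundle install --without development test
COPY . .
# Pull in just the compiled assets from the earlier stage
COPY --from=builder /usr/src/app/public/assets ./public/assets
CMD ["bin/rails", "server", "-b", "0.0.0.0"]

Only the final FROM contributes to the image you ship; everything installed in the builder stage is discarded apart from what you explicitly COPY across.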

Docker stats

Often, especially in production, it is useful to have a quick way to find out metrics about the resources being used. The Docker docs provide some useful information on various metrics that you can check out.

One of the simplest and most useful is the docker stats command. This provides various metrics, including CPU, memory usage, and network IO, which can be helpful for monitoring or debugging containers in production. Here is an example from Docker’s docs:

$ docker stats redis1 redis2
CONTAINER  CPU %  MEM USAGE / LIMIT  MEM %  NET I/O           BLOCK I/O
redis1     0.07%  796 KB / 64 MB     1.21%  788 B / 648 B     ...
redis2     0.07%  2.746 MB / 64 MB   4.29%  1.266 KB / 648 B  ...

Sharing config between Compose files

We are currently maintaining two files in Compose format: docker-compose.yml and docker-compose.prod.yml. As you develop your app, you may notice quite a lot of duplication between the Compose files for different environments.

Compose provides a mechanism that lets you extract the commonalities. It does this by allowing you to specify multiple Compose files:

docker-compose -f <file1> -f <file2> ... -f <fileN> up -d

Compose merges the config from the specified files, with config in later files taking precedence over config in earlier files.
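For example, you might extract the shared service definitions into a common base file, leaving only the per-environment differences in the others. The filenames and values below are illustrative rather than a drop-in for our actual setup:

# docker-compose.common.yml: the shared base (illustrative)
version: '3'
services:
  web:
    build: .
    depends_on:
      - database
  database:
    image: postgres

# docker-compose.prod.yml: production-only differences (illustrative)
version: '3'
services:
  web:
    environment:
      - RAILS_ENV=production

# Merge them; config in later files takes precedence:
$ docker-compose -f docker-compose.common.yml -f docker-compose.prod.yml up -d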

As always, both keeping and eliminating duplication have trade-offs. On the plus side, extracting the duplication into a common file makes the differences between your environments clearer: these are the parts you have to specify for an environment beyond the common base. It also (potentially) makes it (marginally) quicker to update the config for both sets of services.

On the downside, you have to piece together the definitions from multiple files to understand your app as a whole. As you can probably tell, this is an instance where the benefits of keeping the duplication outweigh our programmer instinct to keep things DRY.

However, it is worth knowing you have this option at your disposal should you need it. For example, it can also be put to use to keep common, one-off container admin tasks in a separate Compose file, rather than in the same one as the application.

Database resiliency

Backing up our production database regularly is critical to ensure we can recover in case of error. It is possible to back up a database by running the normal database dump command inside a container, as shown in the sketch below. However, what’s slightly trickier is making this happen automatically in production. There are a number of different ways to handle this, described in the sections that follow.
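As a rough sketch, assuming Postgres runs as a Compose service called database (the service name, user, and database name here are placeholders), a one-off dump could look like this:

$ docker-compose exec -T database pg_dump -U postgres myapp_production > backup.sql

The -T flag disables pseudo-terminal allocation, which keeps terminal control characters out of the dump streamed back to the host.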

Platform-specific

Some container platforms allow you to schedule containers (for example, Amazon ECS scheduled tasks). Using these schedulers, you can run containers to back up the database at regular intervals. Additionally, platforms may offer backup capabilities. For example, Amazon Elastic Block Store (Amazon EBS) volumes provide automated incremental snapshotting capabilities. This can be a low-hassle, reliable approach to maintaining backups.

Cron running on the Docker host

There is nothing stopping you from setting up cron or a similar scheduler on your Docker host that triggers a container (or noncontainerized script) to back up the database. Some people like this approach precisely because of its simplicity. However, the downside is the risk that your Docker host becomes a special snowflake that is harder to maintain. Your database backup mechanism lives outside your containerization, so you lose all the benefits that brings.
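Concretely, such a crontab entry might look something like this; the paths, service name, and credentials are placeholders, and note that % must be escaped in crontabs:

# Dump the database at 2 a.m. every night (illustrative values)
0 2 * * * docker-compose -f /srv/myapp/docker-compose.prod.yml exec -T database pg_dump -U postgres myapp_production > /backups/myapp-$(date +\%F).sql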

Use third-party tools

You may also use third-party tools, for example, Barman for Postgres.

Containers on autopilot

There is a broader approach that is beginning to emerge known as the autopilot pattern. This involves baking standard operational tasks (such as scaling and resiliency) directly into your containerized services.

Rather than spreading this operational logic externally across schedulers and separate task-based containers, your app containers have the smarts to perform their own life-cycle management. For example, imagine launching a Postgres container configured to check whether its database is populated. If it is not, the container fetches and restores the latest backup. Done well, maintenance and resiliency become automatic.
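As a very rough sketch of the idea (not a production implementation), a custom entrypoint could perform that check before handing control to the normal Postgres startup. The database name and the restore_latest_backup helper are hypothetical:

#!/bin/sh
# Illustrative autopilot-style entrypoint for a Postgres container.

# Start Postgres via the official image's entrypoint, in the background
docker-entrypoint.sh postgres &

# Wait until the server is accepting connections
until pg_isready -U postgres; do sleep 1; done

# If the application database is missing, fetch and restore the latest backup
if ! psql -U postgres -lqt | cut -d '|' -f 1 | grep -qw myapp_production; then
  restore_latest_backup   # hypothetical helper: download a dump and restore it
fi

# Keep the container attached to the Postgres process
wait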

Joyent has been championing this approach, with several compelling articles on the subject. It also provides an open source tool called ContainerPilot to help with the coordination of life-cycle events. Alternatively, you can roll your own solution. I suspect we will see more in this space over time.

Database replication and high availability

To replicate your database, you typically need to rely on the built-in capabilities of your database (rather than a super naive approach of trying to use a shared filesystem).

Postgres offers many different options for clustering. Rather than reinventing the wheel, however, you can leverage the work that others have done in this area, for example:

  • Patroni
  • Barman
  • Crunchy

You will find that similar work has been done for clustering and replicating other databases too.
