Build time secrets with docker containers

Check out first part of this blog post if you are looking to provide secrets during container runtime.
Part 1: Runtime Secrets with Docker Containers

So far we have talked about run time secrets. But you also need secrets during the build time of docker images. For ex: During your build process, you need:

  • To hit some private repo to pull dependency.
  • Need private keys for remote SSH connection to pull stuff.
  • Build environment is behind PROXY than deploy envt.
  • Tons of other use cases.

There isn’t a single solution that addresses all the use-cases I listed above. For some there are features available in the latest docker version (1.9), for some there are third party solutions and for the others, there are custom hacks. I will list all of them depending upon their use-cases.

Solution 1: Dockerfile ENV variable

You can use ENV directive in Dockerfile to define variables and use it with other primitives in Dockerfile. According to me, this just helps to make your Dockerfile pretty and it shouldn’t be use for passing secrets or any sensitive information during the build process. Here’s why:

Issues:

  • Env variables that are passed are preserved in the final image as well as all the intermediate layers of the image. Anyone can do a “docker inspect” and can see the values of these variables. So, if you use this approach to pass secrets for your buildtime dependencies, you fear exposing them to anyone who pulls down the image.
  • Your Dockerfile is static with this approach. You cannot override ENV. values if you are building the image from various hosts that require different proxies.
  • Be mindful that the variables(secrets) are written to the disk at every intermediate and final layer of the image.

Solution 2: Docker build time variables

To overcome the issue about the variable persisting in the intermediate and final image, Docker introduced -build-arg variable in their 1.9 release. This flag allows you to pass the build-time variables that are accessed like regular environment variables in the RUN directive. Also, these values don’t persist in the intermediate or final images like ENV values do.

A good example is http_proxy or source versions for pulling intermediate files. The ARG instruction lets Dockerfile authors define values that users can set at build-time using the –build-arg flag:

Example: docker build –build-arg HTTP_PROXY=http://10.20.30.2:1234 .

Issues:

mapuri mentions it shouldn’t be used when the image caching is turned on.

Solution 3: Flattening Images

For the fear that your variables are visible in the intermediate layers of the image, people have started flatting their images. This also helps in reducing the overall size of the image and can save on your image upload/deploy time.

There are quite a few projects that lets you do that:

  • docker-squash from jwilder
  • Or you can do it yourself with exporting and importing the image:
    ID=$(docker run -d image-name /bin/bash)
    docker export $ID | docker import – flat-image-name

Issues:

  • You have to add flatteing(hacking) in your build and deployment workflow.
  • It will remove the secrets from the intermediate layers, but it may still end up in the build cache.
  • It unnecessary adds complexity when you are automatically building images using CI/CD pipeline.
  • Be mindful that the variables(secrets) are written to the disk at every intermediate and flattened final layer of the image.

Solution 4: Hosting secrets on a server

To serve secrets or ssh keys to the build process, there are various tools available like vault from dockito, that runs in its own container to serve the key over the HTTP.During the build, it’s invoked from the Dockerfile, using a special RUN directive, where you just pass it the command that requires the secret. It will fetch the key from the server and executes the command.
Vault from Dockito

docker-ssh-exec is a bit more advanced that vault from dockito. In that, it fetches the key over the network from the server container, writes it to disk, executes the desired command, and then removes the key. Since all this occurs within the scope of a single RUN directive, the key data is never written into the resulting filesystem layer.

Conclusion

There isn’t a single solution that satisfies every use case for the build time secrets. People have gotten restless about the lack of features from docker. So, they have went ahead and added their own custom hacks.

As mentioned by thaJeztah, there are lots of PR(Pull Request) at Docker still pending:

  • Add private files support #5836
  • Add secret store #6075
  • Continuation of the docker secret storage feature #6697
  • The Docker Vault” #10310
  • Provide roadmap / design for officially handling secrets. Make injecting secrets pluggable, so that they use existing offerings in this area, for example: Vault, Keywhiz, Sneaker

Runtime secrets with docker containers

We, at YP are using docker containers for quite some time now. Onboarding onto docker wasn’t always that easy. There are lots of things to account for before running a docker container in production. One of the thing to address is how to deal with secrets during runtime.

We have done significant work on that front. I will be discussing in multiple blog posts about the problem and the potential solution with regards to injecting secrets to the docker container. In this post, I will be talking about how do people use secrets with docker containers and their issues.

Why are secrets important?

Secrets are important for every application. Some of the application secrets that you may need are:

    • database credentials
    • api tokens
    • ssh keys
    • TLS certificates
    • GPG keys etc.

Traditionally, we have been storing these secrets under some packages that are encrypted or storing it in “secrets store” or just putting it as a part of the source code. Well that was all ok and good. But we cannot use the similar solutions with docker images. Then how do we use it with the docker containers?

Solution 1: Baking it in the image

Well this is straight forward: you will just put it as part of the image. This is the first thing you will do when you are onboarding your app onto docker. Maybe you will put under some dot file, chown it to root and think that everything is fine. This is the most prevalent anti-pattern in security.

Issues:

  • When it is published to any registry, anyone can pull that image and the secrets would be at their disposal.
  • None of Github or Dockerhub or your repository is designed to securely store or distribute secrets.
  • Updating secrets is a tedious job. Updating all the images.
  • This could still be ok if you have few number of images, but consider you tie in CI/CI pipeline to your image build process. Now you are managing tons of images.
  • Accounting for certificate expiration becomes difficult.
  • Old, EOL/EOS or decommissioned hardware can cause secrets leak.

Solution 2: Put it under ENV variables

This is the most common way to pass secrets to the applications (more than 90% of people do it). It is widely used because 12 factor app guidelines recommend apps to be delivered and consumed as a service.
Example:  docker run –it –e “DBUSER=dbuser” –e “DBPASSWD=dbpasswd” myimage /bin/bash

Issues:

thaJeztah and  diogomonica have captured in detail about the best practices about using secrets. However, I am just summarizing the issues with this solution here:

  • Kept in intermediate layers of image and can be easily viewed using “docker inspect”.
  • Accessible by all the processes in the image. Can be easily leaked.
  • Shared with any linked container.
  • Incredibly common having the app grab the whole envt., print it out or even send it part of error report or pager duty.
  • Env. variables are passed down to child processes. Imagine that you call third party tool to perform some action, all of a sudden that third party has access to  your environment.
  • Very common for the apps that crashes to store env. variables in log files for debugging.

Solution 3: Volume Mounts

This is again as straight-forward as passing ENV variables. You put your secrets in some directory structure on docker hosts. That directory structure can be on local file system, NFS or DFS like CEPH. You then mount the right directory inside the container for that particular app.
Example: docker run –i –t –v /mnt/app1/secrets:/secrets myimage /bin/bash

Issues:

  • Bad design putting all the secrets for all the images on a single machine.
  • Secrets are unencrypted, in plain text.

Solution 4: Secrets encryption

Some people are paranoid about keeping their secrets in plain text. And they are even more paranoid about putting image with plaintext secrets to some private/public docker registries. So, they encrypt the secrets using public key and elliptic curve cryptography using tools like “ejson” from Shopify and others. To decrypt, private keys are hosted on the docker hosts and those production machines are locked down. At least, with this way your image is safe from snooping.

Issues:

  • To update secrets, you need to create new images.
  • Solution is fairly static.
  • You can still see which private keys are used to decrypt using “docker inspect”.

Solution 5: Secrets store

There are secrets management and distribution services like: HashiCorp’s Vault, Square’s Keywhiz and Sneaker (for AWS). They help you generate and distribute secrets for services. Main benefit of this approach is that secrets are centrally managed in a secure manner. And there is also an auditability with the secret access. Almost all these solutions are API based and are mostly reliable.

There is already an integration of Keywhiz secrets store with docker as a volume-driver plugin. This solution is the robust of all and already integrated with docker. However, if docker (or docker swarm) is the only way you manage and run your containers. This plugin doesn’t extend well if you are using orchestration tools like Mesos or Kubernetes to manage/run your containers.

If you orchestrate containers through Mesos or Kubernetes, watch out my next series of post regarding the solution.

Persisting solutions for the ephemeral containers

With the advent of Docker, container scene has exploded and it is taking everyone by storm. It has gotten everyone excited: from Developers to QA engineers, from Deploy managers to System Administrators. Everyone wants to adopt it and start incorporating it in their workflow.

However, there are inherent challenges to manage, run and operate such systems at scale. First challenge begins  with trying to run ephemeral containers in a static world of hardwares. Most of us are trying to retrofit existing solutions and our mindsets into the new way of doing things. Others are focused on building orchestration tools like Kubernetes, Mesos or Cloud Foundry’s Lattice. I call these orchestration tools as a kernel of the data center operating system. There is a very little focus given to building tools around the kernel to make a truly distributed data-center specific GNU like operating system. Things like centralized logging, monitoring and alerting, metrics collection, persistent storage, service discovery etc. are the things, which needs solidification for the container ecosystem.

To compare, we just need to go back few decades to see how GNU operating system got evolved. We need to put our GNU hat on and see how they made it possible with the collection of applications, libraries, developer tools and even games with a solid Linux based kernel.

We, at YP, have devised some of the solutions around these problems. You can check out some of the work that we have made opensource.

Sysdig has also compiled “The Container Ecosystem Project”. Please check them out. There are some really interesting technologies mentioned there.

As I mentioned, there is a lot of movement and everybody is trying to get a head start in developing their own technologies that works for them. I feel the GNU god has to come down once again: to show us the right way to consolidate all these disjointed systems to work as a true data center operating system.