IT shouldn’t be a cost-center with Docker

IT is critical for every organization, big or small. It touches all the business processes and data in every company. Every year, organizations allocate budget to keep the “lights on”. The #1 priority of IT is to make sure that the business runs as usual, which reinforces the view of IT as a cost-center. Every aspect of IT is focused on minimizing downtime rather than providing value to the organization. That is how things are, and that is how companies are run. The “C-suite” bosses are happy to cut the fat checks and are content with it. There is nothing wrong with that, except that you can’t improve on it and can’t convert that cost-center into a profit-center.

If you are running a data center of your own, outsourcing it to cloud services, or running a hybrid cloud, you need a whole ensemble of people to manage your systems, networks, databases, infrastructure, etc. They will babysit your infrastructure, perform changes and do maintenance. Any change to the state of production requires a long maintenance window involving all the stakeholders, tons of overtime pay, increased frustration and no family life. It is not only the people and processes: you will also accumulate computing power along the way and keep piling up racks in your data centers. You will have tons of wasted resources and will hardly be using even 20% of the computing power that you have available. Unfortunately, this is the sorry state of affairs for most IT departments in the majority of companies.

At last! The IT gods have finally smiled upon us. Things don’t have to be the same as before, thanks to technologies like Docker. Docker and its ecosystem have unleashed a myriad of technologies that let you put your systems and infrastructure on autopilot. The Docker ecosystem provides you with tools and technologies that allow your IT to add value to your business. I am not claiming we will be able to solve all the issues with old-school IT, but there are already tools at our disposal which, when combined with Docker, can help us solve the really complex IT problems that we have been dreading all these years. Much has been said about Docker being a developer’s friend, but it can truly help alleviate IT Ops problems and help us run a really lean DevOps IT organization.

Having used Docker successfully to manage and run part of our IT operations at YellowPages, we have combined its capabilities with technologies like push-based metrics, service discovery in ephemeral environments, orchestration tools, build pipelines and message queues to solve real-world IT problems. We overcame lots of new challenges along the way. There is also a lot of work that still needs to be done for Docker to become an IT department’s best friend. In short, things are looking up, and we can do quite a few things with Docker that will help us evolve the ecosystem and make IT a true profit-center.

Runtime secrets with Docker containers

We at YP have been using Docker containers for quite some time now. Onboarding onto Docker wasn’t always easy. There are lots of things to account for before running a Docker container in production. One of them is how to deal with secrets at runtime.

We have done significant work on that front. Over multiple blog posts, I will discuss the problem and potential solutions for injecting secrets into Docker containers. In this post, I will talk about how people use secrets with Docker containers today and the issues with each approach.

Why are secrets important?

Secrets are important for every application. Some of the application secrets that you may need are:

    • database credentials
    • API tokens
    • SSH keys
    • TLS certificates
    • GPG keys, etc.

Traditionally, we have stored these secrets in encrypted packages, in a “secrets store”, or simply as part of the source code. That was all well and good, but we cannot use the same solutions with Docker images. So how do we use secrets with Docker containers?

Solution 1: Baking it in the image

This is straightforward: you just put the secret in the image. It is the first thing you will do when you are onboarding your app onto Docker. Maybe you will put it in some dot file, chown it to root and think that everything is fine. This is the most prevalent security anti-pattern.

Issues:

  • When the image is published to any registry, anyone who can pull that image has the secrets at their disposal.
  • Neither GitHub, Docker Hub, nor your own repository is designed to securely store or distribute secrets.
  • Updating secrets is tedious: you have to update every image.
  • This could still be OK if you have a small number of images, but consider tying a CI/CD pipeline into your image build process. Now you are managing tons of images.
  • Accounting for certificate expiration becomes difficult.
  • Old, EOL/EOS or decommissioned hardware can leak secrets.
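
One way to catch this anti-pattern before an image is ever pushed is to scan the Dockerfile for instructions that look like they bake a secret into a layer. The following is only a minimal sketch in Python: the function name `find_baked_secrets` and the name patterns in `SECRET_HINT` are hypothetical and would need tuning to your own naming conventions. It will miss secrets with innocent-looking names.

```python
import re

# Hypothetical name hints; adjust them to your own naming conventions.
SECRET_HINT = re.compile(
    r"(passw(or)?d|secret|token|api[_-]?key|id_rsa|\.pem)", re.IGNORECASE
)

# Dockerfile instructions that can place a value or a file into the image.
BAKING_INSTRUCTIONS = ("ENV", "ARG", "COPY", "ADD")


def find_baked_secrets(dockerfile_text):
    """Return (line_number, line) pairs that look like they bake a secret in."""
    findings = []
    for number, raw in enumerate(dockerfile_text.splitlines(), start=1):
        line = raw.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        instruction = line.split(None, 1)[0].upper()
        if instruction in BAKING_INSTRUCTIONS and SECRET_HINT.search(line):
            findings.append((number, line))
    return findings
```

Calling `find_baked_secrets(open("Dockerfile").read())` in a build pipeline and failing the build on a non-empty result is one possible use; it is a cheap guard, not a replacement for keeping secrets out of images in the first place.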

Solution 2: Put it under ENV variables

This is the most common way to pass secrets to applications (more than 90% of people do it). It is widely used because the twelve-factor app guidelines, written for apps that are delivered and consumed as a service, recommend storing configuration in environment variables.
Example: docker run -it -e "DBUSER=dbuser" -e "DBPASSWD=dbpasswd" myimage /bin/bash

Issues:

thaJeztah and diogomonica have written in detail about best practices for using secrets. Here I am just summarizing the issues with this solution:

  • Variables set at build time are kept in the intermediate layers of the image, and variables passed at run time can be easily viewed using “docker inspect”.
  • Accessible by all the processes in the container, so they can be easily leaked.
  • Shared with any linked container.
  • It is incredibly common for an app to grab the whole environment and print it out, or even send it as part of an error report or to PagerDuty.
  • Environment variables are passed down to child processes. Imagine that you call a third-party tool to perform some action: all of a sudden, that third party has access to your environment.
  • It is very common for apps that crash to store environment variables in log files for debugging.
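
The child-process issue is easy to demonstrate. Here is a minimal sketch in Python, assuming a Linux host: `child_sees` starts a child process and reports what it sees for a given variable, and `scrubbed_env` builds a copy of the environment without variables whose names look sensitive. Both function names and the `SENSITIVE_HINTS` list are hypothetical, chosen only for illustration.

```python
import os
import subprocess
import sys

# Hypothetical hints for variable names that should not reach child processes.
SENSITIVE_HINTS = ("PASSWD", "PASSWORD", "SECRET", "TOKEN", "KEY")


def scrubbed_env(env=None):
    """Return a copy of the environment without sensitive-looking variables."""
    source = os.environ if env is None else env
    return {
        name: value
        for name, value in source.items()
        if not any(hint in name.upper() for hint in SENSITIVE_HINTS)
    }


def child_sees(name, env=None):
    """Run a child Python process and return the value it sees for `name`.

    With env=None the child inherits the parent's environment, which is the
    default behaviour described in the list above. Returns '' if unset.
    """
    code = "import os, sys; sys.stdout.write(os.environ.get(sys.argv[1], ''))"
    result = subprocess.run(
        [sys.executable, "-c", code, name],
        env=env,
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout.decode()
```

If the parent was started with `-e "DBPASSWD=dbpasswd"`, then `child_sees("DBPASSWD")` returns the password, while `child_sees("DBPASSWD", scrubbed_env())` returns an empty string. Scrubbing by name is only a mitigation: it depends on every caller remembering to do it.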

Solution 3: Volume Mounts

This is as straightforward as passing ENV variables. You put your secrets in some directory structure on the Docker hosts. That directory structure can be on a local file system, NFS, or a distributed file system like Ceph. You then mount the right directory inside the container for that particular app.
Example: docker run -i -t -v /mnt/app1/secrets:/secrets myimage /bin/bash

Issues:

  • It is bad design to put all the secrets for all the images on a single machine.
  • Secrets are unencrypted, in plain text.
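
On the application side, consuming a mounted secret is just a file read. A minimal sketch in Python, assuming the `/secrets` mount point from the example above and one file per secret (the one-file-per-secret layout and the `read_secret` helper are assumptions for illustration):

```python
import os


def read_secret(name, secrets_dir="/secrets"):
    """Read one secret from a file under the mounted secrets directory."""
    # Reject names that could escape the directory, e.g. "../etc/passwd".
    if os.path.basename(name) != name or name in ("", ".", ".."):
        raise ValueError("invalid secret name: %r" % name)
    path = os.path.join(secrets_dir, name)
    with open(path, "r") as handle:
        # Strip the trailing newline that editors and `echo` usually add.
        return handle.read().rstrip("\n")
```

An app would call something like `read_secret("dbpasswd")` at startup. Note that this does nothing about the issues listed above: the file is still plain text on the host.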

Solution 4: Secrets encryption

Some people are paranoid about keeping their secrets in plain text, and even more paranoid about pushing an image with plaintext secrets to a private or public Docker registry. So they encrypt the secrets using public-key, elliptic-curve cryptography with tools like “ejson” from Shopify. To decrypt, the private keys are hosted on the Docker hosts, and those production machines are locked down. This way, at least your image is safe from snooping.

Issues:

  • To update secrets, you need to create new images.
  • Solution is fairly static.
  • You can still see which private keys are used to decrypt using “docker inspect”.

Solution 5: Secrets store

There are secrets management and distribution services such as HashiCorp’s Vault, Square’s Keywhiz and Sneaker (for AWS). They help you generate and distribute secrets for services. The main benefit of this approach is that secrets are centrally managed in a secure manner, and access to secrets is auditable. Almost all of these solutions are API-based and are mostly reliable.
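
To give a feel for what “API-based” means here, the following is a minimal sketch in Python of reading a secret from Vault over HTTP. It assumes Vault’s key/value secrets engine version 2 mounted at `secret` and a token you have already obtained; the helper names, the address and the secret path are hypothetical, and how the container gets its token in the first place is exactly the hard part this sketch leaves out.

```python
import json
import urllib.request


def build_vault_request(vault_addr, token, secret_path, mount="secret"):
    """Build the HTTP request for reading a KV version 2 secret from Vault."""
    url = "%s/v1/%s/data/%s" % (
        vault_addr.rstrip("/"),
        mount,
        secret_path.strip("/"),
    )
    # Vault authenticates API calls with the X-Vault-Token header.
    return urllib.request.Request(url, headers={"X-Vault-Token": token})


def parse_kv2_response(body):
    """Extract the key/value pairs from a KV version 2 read response."""
    # KV v2 wraps the secret as {"data": {"data": {...}, "metadata": {...}}}.
    return json.loads(body)["data"]["data"]


def fetch_secret(vault_addr, token, secret_path):
    """Fetch a secret at runtime; needs network access to a Vault server."""
    request = build_vault_request(vault_addr, token, secret_path)
    with urllib.request.urlopen(request, timeout=5) as response:
        return parse_kv2_response(response.read().decode("utf-8"))
```

Because the secret is fetched at runtime, rotating it does not require rebuilding any image, which is what the earlier solutions could not offer.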

Keywhiz is already integrated with Docker as a volume-driver plugin. This is the most robust solution of all, but only if Docker (or Docker Swarm) is the only way you manage and run your containers. The plugin doesn’t extend well if you use orchestration tools like Mesos or Kubernetes to manage and run your containers.

If you orchestrate containers through Mesos or Kubernetes, watch out for my next series of posts about the solution.

Persisting solutions for the ephemeral containers

With the advent of Docker, the container scene has exploded and is taking everyone by storm. It has gotten everyone excited, from developers to QA engineers, from deploy managers to system administrators. Everyone wants to adopt it and start incorporating it into their workflow.

However, there are inherent challenges in managing, running and operating such systems at scale. The first challenge is trying to run ephemeral containers in a static world of hardware. Most of us are trying to retrofit existing solutions and our mindsets into the new way of doing things. Others are focused on building orchestration tools like Kubernetes, Mesos or Cloud Foundry’s Lattice. I call these orchestration tools the kernel of the data center operating system. Very little focus is given to building the tools around that kernel that would make a truly distributed, data-center-specific, GNU-like operating system. Things like centralized logging, monitoring and alerting, metrics collection, persistent storage and service discovery all need solidification in the container ecosystem.

To compare, we just need to go back a few decades to see how the GNU operating system evolved. We need to put our GNU hat on and see how they made it possible with a collection of applications, libraries, developer tools and even games, combined with a solid Linux kernel.

We at YP have devised solutions to some of these problems. You can check out some of the work that we have made open source.

Sysdig has also compiled “The Container Ecosystem Project”. Please check it out. There are some really interesting technologies mentioned there.

As I mentioned, there is a lot of movement, and everybody is trying to get a head start by developing technologies that work for them. I feel the GNU god has to come down once again to show us the right way to consolidate all these disjointed systems into a true data center operating system.