
In this post we look at using buildah to generate container images that only contain what we want, no extra fluff. We show how this can let us generate truly small images that will load faster and be more secure, and do this without the need for the Docker daemon to be running.

Nearly all container images are Docker images built from a Dockerfile using docker build. In fact you may think this is the only way to do so. This is not the case. There are several ways of building images (see here for a guide). This post focuses on a relatively new tool called buildah that has some key benefits over using the Docker tools:

  • No need for a running daemon process (the Docker daemon)
  • Packs the image from the outside not the inside

We’ll explain both of these as we go. First let’s establish some baselines. Many images use Centos 7 or Debian as their base image, so let’s look at the standard images that you’ll find on Docker Hub.

$ docker images
REPOSITORY             TAG           IMAGE ID            CREATED             SIZE
docker.io/centos       7             3fa822599e10        6 months ago        204 MB
docker.io/debian       buster        ebdc13caae1e        2 months ago        106 MB

You’ll notice that the Centos image is almost twice the size of the Debian image. This is why we have generally used Debian based images. But why the difference, and can something be done about it?

Part of the reason must certainly be that Centos uses Yum as its package manager, and Yum is written in Python, so the Centos image comes with a full Python installation, and Python is hardly a lightweight! In contrast the apt package manager in Debian does not need Python.

This hides another problem with both these images. They contain a package manager that is needed because of the way the docker build process works with its Dockerfile. Typically the first thing you do in a Dockerfile is to install the packages you need using the yum or apt package manager, but once this is done the package manager serves no purpose, yet it remains part of the resulting image. This means the final image is larger and presents a larger attack surface for hackers. For instance, if you want an image that runs nginx you really want that image to contain just nginx, not all the things that were needed to put nginx there in the first place.

This is where buildah is different. It installs packages from the outside, so the resulting image does not include the package manager or anything else that was needed to build or install your tools. We’ll see this in action shortly.

Let’s get started. We’ll create a clean environment to work in. For Centos based images let’s do this in a Centos Docker image. We’ll fire up the container, update the packages and install buildah:

$ docker run -it -v $PWD:$PWD:Z -v /var/lib/containers:/var/lib/containers --privileged -w $PWD centos:7 bash
[root@441275f3a875 buildah]# yum update -y
...
[root@441275f3a875 buildah]# yum install -y buildah
...
[root@441275f3a875 buildah]#
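
For reference, here is an annotated sketch of the docker run command used above. The start_sandbox function and the DOCKER variable are hypothetical names introduced here only so that the command can be dry-run (for example with DOCKER=echo); the flags themselves are the ones shown above.

```shell
#!/bin/bash
# Annotated version of the "docker run" command used above.
# start_sandbox and DOCKER are hypothetical helpers, not part of Docker or buildah.
start_sandbox() {
  local workdir=${1:-$PWD}
  local args=(
    run -it
    # share the working directory; :Z relabels it so SELinux allows access
    -v "$workdir:$workdir:Z"
    # keep buildah's image storage on the host so it outlives the container
    -v /var/lib/containers:/var/lib/containers
    # in this setup buildah needs extra privileges to mount filesystems
    --privileged
    # start in the shared working directory
    -w "$workdir"
    centos:7 bash
  )
  "${DOCKER:-docker}" "${args[@]}"
}

# Usage: start_sandbox            (uses the current directory)
#        DOCKER=echo start_sandbox (prints the command instead of running it)
```

Sharing /var/lib/containers with the host is what later lets us push the finished image from outside the container.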

Now we’re ready to go. At first you might be worried. What about all my existing Dockerfiles? I don’t want to switch to some other process for building my Docker images. Don’t worry! Your Dockerfiles can still work with buildah. As an example let’s use this simple Dockerfile for creating an nginx container:

FROM centos:7

RUN yum install -y epel-release && \
  yum update -y && \
  yum -y install nginx --setopt install_weak_deps=false && \
  yum -y clean all

EXPOSE 80

CMD ["nginx", "-g", "daemon off;"]

Now let’s build this with buildah:

[root@417a3c92073d buildah]# buildah bud .
...
...
STEP 3: EXPOSE 80
STEP 4: CMD ["nginx", "-g", "daemon off;"]
STEP 5: COMMIT containers-storage:[overlay@/var/lib/containers/storage+/var/run/containers/storage:overlay.override_kernel_check=true]@7540f22d0d568655fb1d02a20b250911cdf1564b6fe84754f83528df6f082da1
Getting image source signatures
Skipping fetch of repeat blob sha256:43e653f84b79ba52711b0f726ff5a7fd1162ae9df4be76ca1de8370b8bbf9bb0
Copying blob sha256:0f499735cf675b07294a2add64622d105769a4aab08c96f5f63a886225d59088
 74.35 MiB / 74.35 MiB [====================================================] 2s
Copying config sha256:40f2bd546f05497db4ce9544b5072064421a1f192cbd2df97a3a77c142a44029
 1.17 KiB / 1.17 KiB [======================================================] 0s
Writing manifest to image destination
Storing signatures
7540f22d0d568655fb1d02a20b250911cdf1564b6fe84754f83528df6f082da1
[root@417a3c92073d buildah]# buildah images
IMAGE ID             IMAGE NAME                      CREATED AT             SIZE
7540f22d0d56         <none>                          May 31, 2018 09:41     419.3 MB
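
The image is listed as <none> because we did not give it a name. As a minimal sketch, the -t option of buildah bud names the image at build time. The build_tagged wrapper and the name nginx-centos are hypothetical and used only for illustration.

```shell
#!/bin/bash
# Build the Dockerfile in a context directory and give the result a name.
# build_tagged is a hypothetical wrapper; "buildah bud -t NAME DIR" is the real command.
build_tagged() {
  local name=$1
  local context=${2:-.}
  buildah bud -t "$name" "$context"
}

# Usage (inside the container where buildah is installed):
#   build_tagged nginx-centos .
#   buildah images
```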

So we’ve built an image from the Dockerfile. What’s notable here is that there is no Docker daemon running inside the Centos container where we are working. We’ve built a Docker image without Docker.

So now let’s move on to the more interesting aspect of buildah, the ability to pack images from the outside. What do we mean by this? Well, we should be familiar with the concept of using a package manager to install packages in a Dockerfile. A line like this installs nginx:

RUN yum -y install nginx

The problem with this, as we’ve stated above, is that the package manager (and in this case also Python) becomes part of the image that is built, but neither yum nor Python is needed to run nginx. With buildah you still use a package manager, but it runs on the host machine and installs the packages into the file system that will become the image. The package manager is not part of that image. More info on using buildah can be found here. It’s well worth a read and we won’t repeat things here.

So, armed with buildah, I thought it would be simple to solve the problem with Centos images that was mentioned earlier: create a base Centos image that doesn’t include yum and Python, and we’ll get a nice small image. Well, the first attempt didn’t go well, and my initial attempts did not create very small images. I raised this as an issue on the Project Atomic issue tracker, which stimulated a lot of discussion that was summarised in this blog post by Tom Sweeney. Interestingly, some of the ‘fixes’ that were needed then do not seem to be needed now (maybe yum has got smarter?), but I keep them in the file as they are good practice anyway.

The build script looks like this:

#!/bin/bash

set -x

# build a minimal image: start from an empty container and mount its filesystem
newcontainer=$(buildah from scratch)
scratchmnt=$(buildah mount "$newcontainer")

# install the packages into the mounted filesystem using the host's yum
yum install bash coreutils --installroot "$scratchmnt" --releasever 7 \
  --setopt=install_weak_deps=false --setopt=tsflags=nodocs \
  --setopt=override_install_langs=en_US.utf8 -y
# remove the yum caches that were created inside the image filesystem
yum clean all -y --installroot "$scratchmnt" --releasever 7
rm -rf "$scratchmnt/var/cache/yum"

# set some config info
buildah config --label name=centos-base "$newcontainer"

# commit the image
buildah unmount "$newcontainer"
buildah commit "$newcontainer" tdudgeon/centos-base

In this case we’re using buildah in the way it was meant to be used, not from a Dockerfile but as a set of commands executed as a bash script. We create a new, empty image (a container always uses the Linux kernel of the host machine, so the image itself starts with nothing in it) and then install bash and coreutils using the yum package manager. But the yum that is being used is the one on the host machine, and it’s installing those packages into the filesystem that is mounted as scratchmnt. The coreutils package that is installed contains a set of standard Linux tools. Strictly speaking many of these are not needed, but without them you’d have a hard time debugging your container if you needed to run a shell inside it. If you were obsessive about the number of installed packages then you could install a subset of these.
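
One way to convince yourself that the package manager really was left out is to look inside the image’s filesystem. The following is a minimal sketch: check_rootfs is a hypothetical helper (not part of buildah) that reports whether named programs exist in the usual binary directories under a given root directory.

```shell
#!/bin/bash
# Report which of the named programs exist under a root filesystem.
# check_rootfs is a hypothetical helper written for this post.
check_rootfs() {
  local root=$1
  local name dir found
  shift
  for name in "$@"; do
    found=absent
    for dir in usr/bin usr/sbin bin sbin; do
      # -L also catches symlinks whose targets do not resolve from the host
      if [ -e "$root/$dir/$name" ] || [ -L "$root/$dir/$name" ]; then
        found=present
        break
      fi
    done
    echo "$name: $found"
  done
}

# Possible usage against the image built above (where buildah is installed):
#   ctr=$(buildah from tdudgeon/centos-base)
#   mnt=$(buildah mount "$ctr")
#   check_rootfs "$mnt" bash yum python
#   buildah unmount "$ctr" && buildah rm "$ctr"
```

If the build worked as described, bash should be reported as present and yum and python as absent.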

So what is the result?

$ buildah images
IMAGE ID             IMAGE NAME                                               CREATED AT             SIZE
06c90c079f3a         docker.io/tdudgeon/centos-base:latest                    May 31, 2018 10:48     56.96 MB

A final image size of about 57 MB. Pretty impressive, as the equivalent from Docker Hub was 204 MB and the Debian image was 106 MB.

As an aside, if you also include yum in the packages that are installed, the image size increases to 120 MB, supporting the idea that much of the extra size of the Centos image from Docker Hub compared to the Debian image is due to yum and Python being present.
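
To put those numbers side by side, here is a small sketch of the arithmetic using the sizes reported above, rounded to whole megabytes. The percent_of function is a hypothetical helper that does integer rounding only.

```shell
#!/bin/bash
# Integer percentage of a relative to b, rounded to the nearest whole number.
percent_of() {
  local a=$1
  local b=$2
  echo $(( (a * 100 + b / 2) / b ))
}

base=57       # our buildah-built image in MB (rounded from 56.96)
with_yum=120  # the same image with yum also installed
centos=204    # centos:7 from Docker Hub
debian=106    # debian:buster from Docker Hub

echo "size relative to centos:7:      $(percent_of "$base" "$centos")%"
echo "size relative to debian:buster: $(percent_of "$base" "$debian")%"
echo "added by yum and Python:        $(( with_yum - base )) MB"
echo "centos to debian gap:           $(( centos - debian )) MB"
```

So yum and its dependencies add about 63 MB, a large part of the 98 MB gap between the two Docker Hub images.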

So we now have a nice small buildah image, but can we use this with Docker? Yes, but we need to do a little extra work. The image needs to be copied from /var/lib/containers to where Docker expects it. For this we need the Docker daemon, so we move back outside the Docker container where we were working and do this:

$ sudo buildah push tdudgeon/centos-base docker-daemon:tdudgeon/centos-base:latest
...
$ docker run -it --rm tdudgeon/centos-base bash
bash-4.2#

So now we have a base Centos image that’s little more than a quarter of the size of the one from Docker Hub. Nice! You can find it on Docker Hub here.

In the next post in this series we’ll see how we can apply this to our RDKit containers and see if we can make them even smaller than we did in the previous post.