The key to optimizing Docker builds is to use the right base images and to keep your application images small with multi-stage builds. Multi-stage builds use standard Dockerfile syntax, with multiple stages separated by FROM instructions. They give you a repeatable build with minimal dependencies.
Multi-stage builds are a great way to centralize your toolset. Developers and build servers just need Docker and the source code; all the tools come packaged in Docker images, so everyone uses the same versions.
Sample Dockerfiles
Multi-stage builds use the standard docker build command. The Dockerfile uses multiple FROM instructions; the pattern is the same for all languages, but the details are specific to each one.
There are samples for the major languages.
There are two build engines in Docker: the original builder and BuildKit. Both produce compatible images, but BuildKit is more optimized and is the default in recent Docker installations.
We'll start with the original build engine so it's clear what's happening in the build. Later we'll switch to BuildKit, which has better performance:
# on macOS or Linux:
export DOCKER_BUILDKIT=0
# OR with PowerShell:
$env:DOCKER_BUILDKIT=0
Here's a simple multi-stage Dockerfile:
- the base stage uses Alpine and simulates adding some dependencies
- the build stage builds on the base and simulates an app build
- the test stage starts from the base, copies in the build output and simulates automated testing
- the final stage starts from the base and copies in the build output
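To make the four stages concrete, here is a minimal sketch of how such a Dockerfile could be written. It is a hypothetical reconstruction from the description above, not the actual contents of labs/docker/simple. The alpine:3.13 tag, the echo commands and the base.txt file name are assumptions. The build.txt name matches the file read from the test image later in this lab.

```dockerfile
# HYPOTHETICAL sketch - not the real labs/docker/simple/Dockerfile
# base stage: Alpine plus simulated dependencies
FROM alpine:3.13 AS base
RUN echo "base dependencies" > /base.txt

# build stage: starts from base and simulates an app build
FROM base AS build
RUN echo "app build output" > /build.txt

# test stage: starts from base, copies in the build output, simulates tests
FROM base AS test
COPY --from=build /build.txt /build.txt
RUN echo "tests passed" >> /build.txt

# final stage: starts from base and copies in only the build output
FROM base
COPY --from=build /build.txt /build.txt
CMD cat /base.txt /build.txt
```

The AS names are the stage names that the --target flag refers to. Nothing copies from the test stage, so its changes never reach the final image.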
📋 Build an image called simple from the labs/docker/simple Dockerfile.
Not sure how?
# just a normal build:
docker build -t simple ./labs/docker/simple/
All the stages run, but the final app image only has content explicitly added from earlier stages.
Run a container from the image and it prints content from the base and build stages:
docker run simple
The final image doesn't have the additional content from the test stage.
BuildKit is an alternative build engine in Docker. It's heavily optimized for multi-stage builds: it runs independent stages in parallel and skips stages whose output isn't used.
Switch to BuildKit by setting an environment variable:
# on macOS or Linux:
export DOCKER_BUILDKIT=1
# OR with PowerShell:
$env:DOCKER_BUILDKIT=1
Now repeat the build for the simple Dockerfile - this time Docker will use BuildKit:
docker build -t simple:buildkit ./labs/docker/simple/
You'll see output from different stages at the same time - and if you look closely you'll see the test stage is skipped.
📋 Run a container from the new image. Is the output the same? Compare the image details.
Not sure how?
# run a container - the output is the same:
docker run simple:buildkit
# list images - they're the same size but not the same image:
docker image ls simple
BuildKit skips the test stage because none of its output is used in later stages. You can explicitly build an image up to a specific stage with the --target flag:
docker build -t simple:test --target test ./labs/docker/simple/
--target names the stage to build. Docker builds that stage and the earlier stages it depends on, then stops.
This image is the output of the test stage, not the final stage.
📋 Run a container from the test build, printing the contents of the build.txt file.
Not sure how?
# no output here - the test stage has no CMD instruction
docker run simple:test
# run the cat command to see the output
docker run simple:test cat /build.txt
The output is from the build stage plus the test stage.
Real multi-stage builds use an SDK (Software Development Kit) image to compile the app in the build stage and a smaller runtime image (with no build tools) to package the compiled app.
The images you use and the commands you run are different for each language, but you'll find official images on Docker Hub for all the major platforms, including:
- maven and gradle to build Java apps - using openjdk for the runtime image
- python for Python apps - it has pip installed for dependencies
- node for Node.js apps - this has NPM so you can install packages in the build stage
- golang for Go apps - they don't need a runtime so the final image can start from scratch
- dotnet/sdk for .NET Core/5 apps, using dotnet/runtime or dotnet/aspnet for the final app image
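Here is one illustration of the SDK/runtime split for another platform from that list: a Node.js multi-stage build might look like the sketch below. The image tags, the package files and the server.js entry point are all assumptions. The slim runtime image is assumed here because, like the full node image, it is Debian-based, so any native modules compiled in the first stage should still load.

```dockerfile
# HYPOTHETICAL sketch - image tags, file names and server.js are assumptions
# build stage: the full node image has npm plus the tools to compile native modules
FROM node:14 AS builder
WORKDIR /src
# copy the package lists first so the install layer is cached until they change
COPY package.json package-lock.json ./
RUN npm ci --only=production
COPY . .

# final stage: a smaller Debian-based Node.js image to run the app
FROM node:14-slim
WORKDIR /app
COPY --from=builder /src/ .
CMD ["node", "server.js"]
```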
We won't cover different languages in detail. The whoami Dockerfile shows how the pattern works, using a Go application:
- the builder stage starts from the Go SDK image
- it installs the OS packages needed to build the app
- then it copies the library list and runs go mod download to install the app's dependencies
- next it copies the source code and compiles the app
- the final app image sets up the container environment
- then it copies in the compiled output from the builder
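The steps in that list map onto Dockerfile instructions roughly as shown below. This is a sketch, not the real labs/docker/whoami Dockerfile. The git package, the presence of a go.sum file, the scratch base, port 80 and the /app paths are all assumptions. golang:1.16.4-alpine is the SDK tag pulled later in this lab.

```dockerfile
# HYPOTHETICAL sketch of the whoami pattern - not the lab's actual Dockerfile
# builder stage: starts from the Go SDK image
FROM golang:1.16.4-alpine AS builder
# OS packages needed for the build (git is an assumed example)
RUN apk add --no-cache git
WORKDIR /src
# copy the library list first, so the download layer is cached until it changes
COPY go.mod go.sum ./
RUN go mod download
# then copy the source code and compile a static binary
COPY . .
RUN CGO_ENABLED=0 go build -o /src/whoami .

# final stage: set up the container environment (port 80 is an assumption)
FROM scratch
EXPOSE 80
WORKDIR /app
ENTRYPOINT ["/app/whoami"]
# copy in the compiled output from the builder
COPY --from=builder /src/whoami /app/whoami
```

Setting CGO_ENABLED=0 produces a statically linked binary. That is what lets this sketch's final stage start from scratch, with no OS libraries at all.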
📋 Build an image called whoami from the folder labs/docker/whoami.
Not sure how?
docker build -t whoami ./labs/docker/whoami/
You'll see all the stage output from BuildKit.
SDK images are typically very large because they contain the whole build toolset. You don't want an SDK image in your final stage, or all those tools end up in your app image.
📋 Compare the sizes of the whoami and golang images.
Not sure how?
docker pull golang:1.16.4-alpine
docker image ls -f reference=whoami -f reference=golang
Woah! The SDK image is over 300MB; the app image is under 10MB.
The app is a simple web server. Run a container that publishes to a random port, then find which port was used:
docker run -d -P --name whoami1 whoami
docker port whoami1
The EXPOSE instruction in the Dockerfile tells Docker the target port for the container. When you use the -P flag, Docker publishes all exposed ports to random ports on the host.
Now you can use the app:
curl http://localhost:<port>
The server just prints some details about the environment and the request.
Apps need special Linux permissions to listen on the standard HTTP ports - even inside a container.
The whoami app supports an option to configure the port it listens on, so you can use a non-standard port and potentially run with tighter security.
Your goal for this lab is to run the whoami app in a container, using the -port application argument to listen on a specific port. What happens when you run a container with the -P (--publish-all) option? Does Docker map the new port correctly?
What do you need to do to run a working container?
Clean up by removing all the whoami containers, including any that have exited:
docker rm -f $(docker ps -aq --filter="name=whoami")