Docker_multistaging

Multistaging with Docker to reduce complexity, size, and build time

1. A brief intro on Docker and Image Size

- A Docker image is the "base" of a Docker container.

- A Docker image is built from a Dockerfile, a set of instructions; instructions such as RUN, COPY, and ADD each add a layer to the image's multi-layered filesystem.

- When Docker runs the image, it produces a container; the same image can be run as one or many containers.

 

Formula for Docker Image Size

Image Size = Base Image + Essential Files + Cruft (a.k.a. random, unneeded files)

For more detailed info on Docker, check out: https://github.com/sdl002/R_Docker_Intro  

Why you should avoid large Docker images:

  1. First and foremost: it's best practice to maintain a small image size (see Docker Docs)
  2. Large images increase image build time (detrimental to continuous integration and continuous delivery/deployment)
  3. Large images also increase push/pull time (detrimental to continuous integration and continuous delivery/deployment)
  4. Reducing unnecessary dependencies will decrease both the complexity and the chances of vulnerability in your application

 
 

Real world example of a painfully large Docker Image:

This is an incredibly large image; one should aim to keep all images under 1 GB.

But WHY is it so large?

1. R - a Rocker-based image is large (I used rocker/verse, which preps for PDF generation, and is HUGE: 3.69 GB)

2. LaTeX - the texlive-full package is HUGE (4.3 GB of disk space)

3. Did not clean up while building

4. Many "FROM" commands

5. Single stage (did not remove any unneeded artifacts)

 
 

2. Ways to reduce the size of a docker image


1. Use a smaller base image

 

2. Use fewer layers (combine commands, reduce the number of "RUN" instructions)


3. Use .dockerignore


4. Add rm -rf /var/lib/apt/lists/* at the end of the RUN instruction that calls apt-get install -y (removes the package manager cache)

5. Avoid unnecessary dependencies with the --no-install-recommends flag

6. Some tools, like dive, can help find heavy layers and show what files are being added in each layer

7. Something I recently started using: multi-stage builds, described in more detail below
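
Several of the tips above (a smaller base image, fewer layers, cleaning the apt lists, and skipping recommended packages) can be combined in one place. This is a minimal sketch, assuming a Debian-based image; the base image tag and the package names are illustrative and are not taken from my actual Dockerfile:

```dockerfile
# Illustrative slim base image (assumption; choose what your application needs)
FROM debian:bookworm-slim

# A single RUN instruction produces a single layer. Updating, installing,
# and cleaning up in the same instruction means the apt package lists
# are never stored in a committed layer.
RUN apt-get update \
    && apt-get install -y --no-install-recommends \
        ca-certificates \
        curl \
    && rm -rf /var/lib/apt/lists/*
```

If the cleanup ran in a separate RUN instruction, the files would still exist in the earlier layer and the image would not get smaller.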

3. Docker multi-staging to the rescue

The FROM statement:

When building a Docker image, each stage begins with a "FROM" instruction. Your Dockerfile can have multiple "FROM" statements, and you can choose which artifacts pass through to the next stage (removing potential cruft).

When using multi-stage builds, you can include multiple stages in the same Dockerfile (the Docker Docs show an example of this).
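
The Docker Docs example is not reproduced here. As a rough sketch of the same idea, the Dockerfile below assumes a Go project (with a go.mod file) in the build context; the stage name build and the path /bin/app are placeholders chosen for illustration:

```dockerfile
# Stage 1: full toolchain image, used only to compile
FROM golang:1.22 AS build
WORKDIR /src
COPY . .
# CGO is disabled so the binary is statically linked and can run on scratch
RUN CGO_ENABLED=0 go build -o /bin/app .

# Stage 2: empty base image; only the compiled binary is carried over
FROM scratch
COPY --from=build /bin/app /bin/app
CMD ["/bin/app"]
```

Everything in the first stage (compiler, source code, build cache) is left behind; the final image contains only what the last stage copies in.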

You can also use external images and copy files from them into your production image with COPY --from.
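
A small sketch of copying from an external image, assuming debian:bookworm-slim as an illustrative production base and the public nginx:latest image as the source:

```dockerfile
# Illustrative production base (assumption)
FROM debian:bookworm-slim

# Copy one file out of a published image. Nothing else from nginx:latest
# ends up in this image.
COPY --from=nginx:latest /etc/nginx/nginx.conf /nginx.conf
```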

4. A note on AWS and its lack of cache (by default)

By default, AWS does not use a cache when building Docker images. There are ways to turn it on (ask Josh :) ), but I have found that, for a large build, multistaging can also be useful. See some options for using cache with AWS (and also probably still check with Josh):
Reference: https://docs.aws.amazon.com/codebuild/latest/userguide/build-caching.html  

I was able to somewhat circumvent giant image issues by utilizing multi-staging and the other methods listed above, although my image could definitely be optimized further and could use some more fine-tuning. That being said, my build time and final image size were reduced by >50%, saving ~25 minutes in build time, which is very helpful when testing an application.

Current Dockerfile: