- A Docker image is built from a Dockerfile: a set of instructions, each of which adds a layer to the image's filesystem.
For more detailed info on Docker, check out: https://github.com/sdl002/R_Docker_Intro
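As a minimal sketch of that layering (the base image tag, package, and file names below are placeholders, not from a real project), each instruction contributes its own layer:

```dockerfile
# Base layer: a minimal R installation (tag is illustrative)
FROM rocker/r-ver:4.3.1

# One layer: system dependencies needed by the app
RUN apt-get update \
    && apt-get install -y --no-install-recommends libcurl4-openssl-dev \
    && rm -rf /var/lib/apt/lists/*

# One layer: application code
COPY app.R /app/app.R

# CMD only records metadata; it adds no filesystem layer
CMD ["Rscript", "/app/app.R"]
```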
- First and foremost: it’s best practice to maintain a small image size (see Docker Docs)
- Large images increase build time (detrimental to continuous integration and continuous delivery/deployment)
- Large images also increase push and pull times (likewise detrimental to CI/CD)
- Reducing unnecessary dependencies decreases both the complexity of your application and its exposure to vulnerabilities
Real-world example of a painfully large Docker image:
1. R: Rocker-based images are large (I used rocker/verse, which is set up for PDF generation and is HUGE: 3.69 GB)
2. Tools like dive can help find heavy layers and show which files each layer adds (see the example below)
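For example, dive can be pointed at any locally available image, or it can wrap the build itself (the image name below is hypothetical):

```sh
# Browse layer sizes and the files each layer adds
dive my-r-app:latest

# Build and immediately analyze the resulting image
dive build -t my-r-app:latest .
```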
When building a Docker image, each stage begins with a `FROM` instruction. Your Dockerfile can have multiple `FROM` statements, and you can choose which artifacts are carried forward to the next stage (leaving potential cruft behind).
When using multi-stage builds, all of the stages live in the same Dockerfile (the Docker Docs show this pattern; a sketch modeled on it follows below).
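The sketch below adapts that multi-stage pattern to the R/Rocker setup described above; it is not the exact example from the Docker Docs, and the image tags, file names, and render command are assumptions:

```dockerfile
# --- Stage 1: heavy "builder" image with the full LaTeX/rmarkdown toolchain ---
# (image tags and file names are illustrative)
FROM rocker/verse:4.3.1 AS builder
WORKDIR /build
COPY report.Rmd .
# Render the PDF inside the large image that ships LaTeX
RUN Rscript -e "rmarkdown::render('report.Rmd', output_format = 'pdf_document')"

# --- Stage 2: small final image; only the rendered artifact is carried forward ---
FROM rocker/r-ver:4.3.1
WORKDIR /app
COPY --from=builder /build/report.pdf .
CMD ["ls", "-lh", "/app"]
```

Only what is explicitly copied with `COPY --from=builder` ends up in the final image; the LaTeX toolchain and any intermediate files stay behind in the builder stage.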
By default, AWS CodeBuild does not use a cache when building Docker images. There are ways to turn it on (ask Josh :) ), but I have found that for a large build, multi-staging is also useful.
See some options for using a cache with AWS CodeBuild (and probably still check with Josh):
Reference: https://docs.aws.amazon.com/codebuild/latest/userguide/build-caching.html
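One common workaround is to pull the previously pushed image and hand it to `docker build` as a cache source; this sketch assumes the image lives in ECR and that `ECR_REPO` is a hypothetical environment variable holding the repository URI:

```sh
# Seed the local layer cache with the last pushed image (ignore failure on the first build)
docker pull "$ECR_REPO:latest" || true

# Reuse matching layers from the pulled image instead of rebuilding them
docker build --cache-from "$ECR_REPO:latest" -t "$ECR_REPO:latest" .

docker push "$ECR_REPO:latest"
```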
I was able to largely work around the giant-image issues by using multi-staging and the other methods listed above, although my image could definitely still be optimized further and could use more fine-tuning. That said, my build time and final image size were both reduced by more than 50%, saving roughly 25 minutes per build, which is very helpful when testing an application.