-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dockerfile staging and speed improvements #141
Conversation
In the infra repo, there is a workflow that scans the release image for any vulnerabilities. These files are the configuration files for recasting or skipping vulnerabilities and why
- Created root directory Makefile for infra workflows to call. These commands are the base ones needed for that logic - The updates to app/Makefile are for building the release image for ECS
This commit adds multi-stage build into Dockerfile so that we have a dev image that has the testing packages vs a release image that is built on scratch. Scratch is used as a base image so that it has only what is needed for the application and reduce any available methods for attacks
- Poetry's latest versions uses groups for specifying delineations between applications. This commit splits base dependencies from the app and dev dependencies. Base is used for scripts that can be ran from outside the app, such as the CSV command. App is for the application itself, and is the packages required for the release image. Dev is for local only images and includes the testing packages for the testing and linting suite - The other change is to swap from using package version ranges to using a specified version. In the NJ app, we were seeing long build times in the poetry install step, and found out that it is because for each package range, poetry needs to resolve any issues with the other packages. By reducing the range, or pinning to a specific version as we did here, build times can be sped up greatly. We were able to reduce build times by 4 minutes - The poetry wheel requires you specify any includes that would be needed, so you have to specify that the openapi.yml file is required
Commands were a holdover from tested changes that are no longer needed
Docker did not like the spaces in the declaration with --env-file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leaving an initial review for now, since we probably need to figure out what to do with the root level Makefile. One of the challenges of splitting up the templates across separate repos.
The root Makefile is copied over from the infra repo, so having this one would cause issues down the line
There is a PR out to fix this, so changing it here might cause issues downstream
Added comments to explain what some of the code does and the reasoning behind it
@@ -0,0 +1,26 @@ | |||
# List of vulnerabilities to ignore for the anchore scan |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this file looks to be the same as the one on template-infra so i think we can remove it, since it'll be copied over from template-infra when the project installs template-infra
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually need to tweak the infra version since it has a python specific recast in it, the certifi
line. Should I remove this file and leave it in infra, or keep it and tweak the infra version?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh i see. so you're suggesting that we move the python specific thing from the infra repo to this repo? that makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, as we continue to build in this repo, there will probably be other things we need to recast or safelist that are just for this repo, so I think we should keep the config files here since they'd fix those issues
@@ -0,0 +1,9 @@ | |||
# List of vulnerabilities to ignore for the trivy scan |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm wondering if we can get rid of this file too since it looks to be the same as on template-infra
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we continue to build in this repo, there will probably be other things we need to recast or safelist that are just for this repo, so I think we should keep the config files here since they'd fix those issues
app/Dockerfile
Outdated
COPY pyproject.toml poetry.lock ./ | ||
RUN poetry install --no-root --with app --without dev |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a comment for why we're specifically copying pyproject.toml
and poetry.lock
even though right after this we copy the entire .
directory with COPY . .
. My understanding is that functionally it's redundant, but that this way we can separate out the dependencies as an initial container layer to improve the cache performance of builds since dependencies don't change as often as the source code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Swapped to just using the COPY . /app
since it covered everything
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would there a performance gain to having it separate though? My understanding is that with the docker build cache, if the dependencies in pyproject.toml and poetry.lock don't change, Docker could reuse that layer, and only need to rebuild the layer with the source files?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There might be a slight gain in having them separate, but I think it would be on the local version only since I'm not sure how much GitHub caches between builds. I know there is a Docker cache action for workflows, but I think it is still in beta right now
## Ticket Issue brought up from navapbc/template-application-flask#141 (comment) ## Changes - The removed lines are specific to a `Flask` finding, and shouldn't have been merged into the infra repo since our other repos might not be using `Flask` ## Context for reviewers Removes a safelist for the `Anchore` scanner for `Flask` specific applications. The pertinent safelist with any required configurations will be the in the application repo themselves, this is more of a template of what the files should look like ## Testing No testing needed since it is just removing a config
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great thanks for being patient with me. I have a bunch of nits. Apologies for the nitpicks, but since most people using the template aren't going to be infra experts, I want to make sure the comments and stuff are as clear and precise (and educational) as possible.
The `openapi` file is generated by the code, so we don't need to specify it being here
btw i updated the PR description ticket section. if you put "Resolves #[ticket number]" it will automatically move the issue to the "Done" column in Github Projects |
Thanks! I wasn't sure how to do that, so that helps. I updated the description section a bit to reflect the changes since the PR was opened |
Co-authored-by: Loren Yu <[email protected]>
Zip needs to have a strict param set in any calls for flake8, however this isn't removing the error locally. Pushing to test if this is a local issue or a flake8 issue
Last commit opened more issues with SQLAlchemy than it solved
@lorenyu @chouinar Just a heads up, there are some things that will be broken in future module versions. I rolled back to the versions to something that works, but future versions are going to have issues. Keeping this as a running list
Still kicking this around while I am juggling other things |
@Nava-JoshLong are you referring to future versions of SQLAlchemy or other packages? If you're referring to SQLAlchemy, could you add those notes to this ticket: #82? |
@Nava-JoshLong you still working on this? |
@lorenyu I am, but I had to put it on the back burner since I have other work that popped up. I ran into an issue where somehow the workflows have errors, and I can't reproduce them locally. I rolled some changes back locally, and still can't see where the issues are coming from. My plan is to try and split the overall changes up into multiple PRs in one of the NJ repos, and see if the same thing happens. If that works, I was going to close this PR, and do the same thing where I split it up and see if that works. The Freehold repo doesn't have these issues at all, so I'm not sure where or why these issues are happening. The only thing I can think of is that repo wasn't originally built off of this one, so there is some module that is causing all of my headaches |
sounds good thanks! |
Closing this to split into smaller future PRs to address errors we were seeing |
Ticket
Resolves #137
Resolves #37
Resolves #23
Vulnerability Scanner Configuration Files - From changes in the infra repo
Changes
There are three main types of changes in this PR
Vulnerability Scanner Configuration Files
scratch
imageTweaked Makefile
app/Makefile
to have the required command and logic to build the release imageMulti-stage Dockerfile
Dockerfile
had the local and release image as the same image, which included the testing packages which aren't needed for the release imagebase
is the base stage that the other images are based off of. This has the basic configuration for the appdev
is the local version of the image stage. This image is used for testing and linting before pushing the image to ECR and uses the poetry groupsapp
anddev
release-build
is the base for the release image. This image does not use thedev
poetry group (the test suite packages), then builds the poetry wheel to use in the release imagerelease
is the image used in ECS, it is based onscratch
to reduce the image size as well as any areas that could be used for vulnerabilities. The necessary files are explicitly copied over fromrelease-build
, so any new files will have to be copied over using the same methoddocker-compose
file so that it knows to build thedev
image locally instead of everythingPoetry Changes
openapi.yml
)Context for reviewers
Testing the
dev
version of the image uses the same commands as before,make init
. To test the release image, run the following commandmake release-build; sleep 15; docker run --env-file ./app/local.env --name template -p 8081:8080 template-application-flask-app:latest
. This will fail since the app doesn't have a database to hitTo test the vulnerability configs locally, you will need to copy the file from the infra repo into the workflows directory. Then, you will need to install
act
and run it withact push
which will simulate a push event in GitHub. The jobs run at the same time, so the output will step on each other, but you should see successes at the end of each job. Note that it might run into API limits, so you might need to run each job by usingact -j {JOB_NAME}
changing{JOB_NAME}
for the name of the job in the workflowTesting
The image scan writes over it's output since each job is running at once, so it isn't clean to paste the output here. As for testing the release image, we would need to create a new env file that has the connection info to a local DB since we cannot use localhost as the container does not have the database built in