Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dockerfile staging and speed improvements #141

Closed
wants to merge 31 commits into from

Conversation

Nava-JoshLong
Copy link
Contributor

@Nava-JoshLong Nava-JoshLong commented Feb 15, 2023

Ticket

Resolves #137
Resolves #37
Resolves #23
Vulnerability Scanner Configuration Files - From changes in the infra repo

Changes

There are three main types of changes in this PR

Vulnerability Scanner Configuration Files

  • In the infra repo, we added a vulnerability scanner workflow that is called for PRs and merges to main to ensure that release images are as secure as we can make them
  • We cannot use the ECR image scan in all cases, like when an image is based off of the scratch image
  • The ECR image scan occurs after the image is pushed, so unless the image is removed if there are findings, it could still be used in production
  • Some findings from the scanners need to be recasted as a lower level, or ignored because of one reason or another
  • Each file specifies what is being changed, and why

Tweaked Makefile

  • Updated app/Makefile to have the required command and logic to build the release image

Multi-stage Dockerfile

  • The original Dockerfile had the local and release image as the same image, which included the testing packages which aren't needed for the release image
  • The commit adds multi-stage builds to better separate out the different images and their configurations
  • base is the base stage that the other images are based off of. This has the basic configuration for the app
  • dev is the local version of the image stage. This image is used for testing and linting before pushing the image to ECR and uses the poetry groups app and dev
  • release-build is the base for the release image. This image does not use the dev poetry group (the test suite packages), then builds the poetry wheel to use in the release image
  • release is the image used in ECS, it is based on scratch to reduce the image size as well as any areas that could be used for vulnerabilities. The necessary files are explicitly copied over from release-build, so any new files will have to be copied over using the same method
  • Updated the docker-compose file so that it knows to build the dev image locally instead of everything

Poetry Changes

  • Added comment to explain how to copy over non-python files (ie with a code-first swagger spec, you'd need to add openapi.yml)
  • Poetry's latest versions uses groups to differentiate the required packages for different applications or images. There was a change to have the groups for this app
  • The groups are split into three groups
    • Base -> These packages are the base for all groups, it is the base required packages required are for the application only, and will be in the release image
    • Dev -> These packages are for local only images, they include the linting and testing packages
  • In the NJ project, we noticed that the poetry build times were slowing down the entire process.
    • When we looked into it, we found out that when a version range is specified in poetry, poetry will need to resolve any issues or dependencies for all the package ranges
    • If we reduced the version ranges, or pinned to a specific version like in this PR, it would reduce the amount of the resolutions that poetry would need to perform
    • By pinning versions, we were able to the overall build time by ~4 minutes
    • Testing in the example app was able to reduce by 10 seconds, but other applications should see a greater reduction in time

Context for reviewers

Testing the dev version of the image uses the same commands as before, make init. To test the release image, run the following command make release-build; sleep 15; docker run --env-file ./app/local.env --name template -p 8081:8080 template-application-flask-app:latest. This will fail since the app doesn't have a database to hit

To test the vulnerability configs locally, you will need to copy the file from the infra repo into the workflows directory. Then, you will need to install act and run it with act push which will simulate a push event in GitHub. The jobs run at the same time, so the output will step on each other, but you should see successes at the end of each job. Note that it might run into API limits, so you might need to run each job by using act -j {JOB_NAME} changing {JOB_NAME} for the name of the job in the workflow

Testing

The image scan writes over it's output since each job is running at once, so it isn't clean to paste the output here. As for testing the release image, we would need to create a new env file that has the connection info to a local DB since we cannot use localhost as the container does not have the database built in

In the infra repo, there is a workflow that scans the release image for any vulnerabilities. These files are the configuration files for recasting or skipping vulnerabilities and why
 - Created root directory Makefile for infra workflows to call. These commands are the base ones needed for that logic
 - The updates to app/Makefile are for building the release image for ECS
This commit adds multi-stage build into Dockerfile so that we have a dev image that has the testing packages vs a release image that is built on scratch. Scratch is used as a base image so that it has only what is needed for the application and reduce any available methods for attacks
- Poetry's latest versions uses groups for specifying delineations between applications. This commit splits base dependencies from the app and dev dependencies. Base is used for scripts that can be ran from outside the app, such as the CSV command. App is for the application itself, and is the packages required for the release image. Dev is for local only images and includes the testing packages for the testing and linting suite
- The other change is to swap from using package version ranges to using a specified version. In the NJ app, we were seeing long build times in the poetry install step, and found out that it is because for each package range, poetry needs to resolve any issues with the other packages. By reducing the range, or pinning to a specific version as we did here, build times can be sped up greatly. We were able to reduce build times by 4 minutes
- The poetry wheel requires you specify any includes that would be needed, so you have to specify that the openapi.yml file is required
Commands were a holdover from tested changes that are no longer needed
Docker did not like the spaces in the declaration with --env-file
Copy link
Contributor

@lorenyu lorenyu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leaving an initial review for now, since we probably need to figure out what to do with the root level Makefile. One of the challenges of splitting up the templates across separate repos.

Makefile Outdated Show resolved Hide resolved
app/Dockerfile Show resolved Hide resolved
app/Dockerfile Outdated Show resolved Hide resolved
app/Dockerfile Outdated Show resolved Hide resolved
app/pyproject.toml Outdated Show resolved Hide resolved
app/Dockerfile Outdated Show resolved Hide resolved
app/Dockerfile Outdated Show resolved Hide resolved
app/pyproject.toml Outdated Show resolved Hide resolved
The root Makefile is copied over from the infra repo, so having this one would cause issues down the line
There is a PR out to fix this, so changing it here might cause issues downstream
Added comments to explain what some of the code does and the reasoning behind it
@@ -0,0 +1,26 @@
# List of vulnerabilities to ignore for the anchore scan
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this file looks to be the same as the one on template-infra so i think we can remove it, since it'll be copied over from template-infra when the project installs template-infra

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually need to tweak the infra version since it has a python specific recast in it, the certifi line. Should I remove this file and leave it in infra, or keep it and tweak the infra version?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh i see. so you're suggesting that we move the python specific thing from the infra repo to this repo? that makes sense.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, as we continue to build in this repo, there will probably be other things we need to recast or safelist that are just for this repo, so I think we should keep the config files here since they'd fix those issues

.dockleconfig Show resolved Hide resolved
@@ -0,0 +1,9 @@
# List of vulnerabilities to ignore for the trivy scan
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm wondering if we can get rid of this file too since it looks to be the same as on template-infra

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we continue to build in this repo, there will probably be other things we need to recast or safelist that are just for this repo, so I think we should keep the config files here since they'd fix those issues

app/Dockerfile Outdated Show resolved Hide resolved
app/Dockerfile Outdated Show resolved Hide resolved
app/Dockerfile Outdated
Comment on lines 91 to 92
COPY pyproject.toml poetry.lock ./
RUN poetry install --no-root --with app --without dev
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a comment for why we're specifically copying pyproject.toml and poetry.lock even though right after this we copy the entire . directory with COPY . .. My understanding is that functionally it's redundant, but that this way we can separate out the dependencies as an initial container layer to improve the cache performance of builds since dependencies don't change as often as the source code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Swapped to just using the COPY . /app since it covered everything

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would there a performance gain to having it separate though? My understanding is that with the docker build cache, if the dependencies in pyproject.toml and poetry.lock don't change, Docker could reuse that layer, and only need to rebuild the layer with the source files?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There might be a slight gain in having them separate, but I think it would be on the local version only since I'm not sure how much GitHub caches between builds. I know there is a Docker cache action for workflows, but I think it is still in beta right now

app/Dockerfile Outdated Show resolved Hide resolved
app/Dockerfile Outdated Show resolved Hide resolved
Nava-JoshLong added a commit to navapbc/template-infra that referenced this pull request Feb 16, 2023
## Ticket

Issue brought up from
navapbc/template-application-flask#141 (comment)

## Changes
- The removed lines are specific to a `Flask` finding, and shouldn't
have been merged into the infra repo since our other repos might not be
using `Flask`

## Context for reviewers
Removes a safelist for the `Anchore` scanner for `Flask` specific
applications. The pertinent safelist with any required configurations
will be the in the application repo themselves, this is more of a
template of what the files should look like

## Testing
No testing needed since it is just removing a config
.hadolint.yaml Outdated Show resolved Hide resolved
Copy link
Contributor

@lorenyu lorenyu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great thanks for being patient with me. I have a bunch of nits. Apologies for the nitpicks, but since most people using the template aren't going to be infra experts, I want to make sure the comments and stuff are as clear and precise (and educational) as possible.

app/Dockerfile Outdated Show resolved Hide resolved
app/Dockerfile Show resolved Hide resolved
app/Dockerfile Outdated Show resolved Hide resolved
app/Dockerfile Outdated Show resolved Hide resolved
app/Dockerfile Outdated Show resolved Hide resolved
app/Makefile Show resolved Hide resolved
app/Makefile Show resolved Hide resolved
app/Dockerfile Outdated Show resolved Hide resolved
app/Dockerfile Outdated Show resolved Hide resolved
app/Dockerfile Outdated Show resolved Hide resolved
The `openapi` file is generated by the code, so we don't need to specify it being here
@lorenyu
Copy link
Contributor

lorenyu commented Feb 16, 2023

btw i updated the PR description ticket section. if you put "Resolves #[ticket number]" it will automatically move the issue to the "Done" column in Github Projects

@Nava-JoshLong
Copy link
Contributor Author

btw i updated the PR description ticket section. if you put "Resolves #[ticket number]" it will automatically move the issue to the "Done" column in Github Projects

Thanks! I wasn't sure how to do that, so that helps. I updated the description section a bit to reflect the changes since the PR was opened

app/Dockerfile Outdated Show resolved Hide resolved
Nava-JoshLong and others added 7 commits March 7, 2023 09:40
Co-authored-by: Loren Yu <[email protected]>
Zip needs to have a strict param set in any calls for flake8, however this isn't removing the error locally. Pushing to test if this is a local issue or a flake8 issue
Last commit opened more issues with SQLAlchemy than it solved
@Nava-JoshLong
Copy link
Contributor Author

@lorenyu @chouinar Just a heads up, there are some things that will be broken in future module versions. I rolled back to the versions to something that works, but future versions are going to have issues. Keeping this as a running list

  • The newer versions of SQLAlchemy will break this line. Looks like the module changes where dbapi_connection is located
  • alembic has some issues with the run.py file in db/migrations, looks to be the function changed what it is expecting
  • APIFlask isn't able to write the openapi file in the directory in the newer versions, not too sure why, it looks to be a permissions error, but I tried different levels of permissions in the directory and couldn't get it to write. Looks to still be an issue, even though I rolled back to the version in main?

Still kicking this around while I am juggling other things

@lorenyu
Copy link
Contributor

lorenyu commented Mar 14, 2023

@Nava-JoshLong are you referring to future versions of SQLAlchemy or other packages? If you're referring to SQLAlchemy, could you add those notes to this ticket: #82?

@lorenyu
Copy link
Contributor

lorenyu commented Apr 12, 2023

@Nava-JoshLong you still working on this?

@Nava-JoshLong
Copy link
Contributor Author

@lorenyu I am, but I had to put it on the back burner since I have other work that popped up. I ran into an issue where somehow the workflows have errors, and I can't reproduce them locally. I rolled some changes back locally, and still can't see where the issues are coming from.

My plan is to try and split the overall changes up into multiple PRs in one of the NJ repos, and see if the same thing happens. If that works, I was going to close this PR, and do the same thing where I split it up and see if that works. The Freehold repo doesn't have these issues at all, so I'm not sure where or why these issues are happening. The only thing I can think of is that repo wasn't originally built off of this one, so there is some module that is causing all of my headaches

@lorenyu
Copy link
Contributor

lorenyu commented Apr 12, 2023

sounds good thanks!

@Nava-JoshLong
Copy link
Contributor Author

Closing this to split into smaller future PRs to address errors we were seeing

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants