-
Notifications
You must be signed in to change notification settings - Fork 178
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make our Dangerzone image reproducible #1049
base: main
Are you sure you want to change the base?
Conversation
Move container-only build context - currently just the entrypoint script - from `dangerzone/gvisor_wrapper` to `dangerzone/container`. Update the rest of the scripts to use this location as well.
Remove the need to copy the Dangerzone container image (used by the inner container) within a wrapper gVisor image (used by the outer container). Instead, use the root of the container filesystem for both containers. We can do this safely because we don't mount any secrets to the container, and because gVisor offers a read-only view of the underlying filesystem Fixes #1048
Download and copy the following artifacts that will be used for building a Debian-based Dangerzone container image in the subsequent commits: * The APT key for the gVisor repo [1] * A helper script for building reproducible Debian images [2] [1] https://gvisor.dev/archive.key [2] https://github.com/reproducible-containers/repro-sources-list.sh/blob/d15cf12b26395b857b24fba223b108aff1c91b26/repro-sources-list.sh
Remove all the scaffolding in our `build-image.py` script for using the `poetry.lock` file, now that we install PyMuPDF from the Debian repos.
Remove our suggestions for not using the container cache, which stemmed from the fact that our Dangerzone image was not reproducible. Now that we have switched to Debian Stable and the Dockerfile is all we need to reproducibly build the exact same container image, we can just use the cache to speed up builds.
Add jinja2-cli as a package dependency, since it will be used to create the Dockerfile from some user parameters and a template.
Allow updating the Dockerfile from a template and some envs, so that it's easier to bump the dates in it.
Allow setting a tag for the container image, when building it with the `build-image.py` script. This should be used for development purposes only, since the proper image name should be dictated by the script.
Add a dev script for Linux platforms that verifies that a source image can be reproducibly built from the current Git commit. The reproducibility check is enforced by the `diffoci` tool, which is downloaded as part of running the script.
Add a CI job that uses the `reproduce.py` dev script to enforce image reproducibility, for every PR that we send to the repo. Fixes #1047
5cea5e1
to
279322b
Compare
c1f5d75
to
e02dbfd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work on this!
I added a few comments here and there, and will probably come back to it once I've run it on my laptop. It's pretty neat!
|
||
FROM debian:bookworm-${DEBIAN_IMAGE_DATE}-slim | ||
|
||
ARG GVISOR_ARCHIVE_DATE=20250106 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding links here could allow us to follow them when we want to update the versions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, should we get these directly from the Dockerfile.env
file instead of defining them twice?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very good point. So, I've created an issue for automating the bump of these values, since I wanted to pursue it in this PR, but I didn't have time (check #1050). As for the URLs, I've added them to Dockerfile.env
as comments (see 96ab442, which also fixes a bug). As for the values in Dockerfile
, they are actually generated from the following command:
poetry run jinja2 Dockerfile.in Dockerfile.env > Dockerfile
So, they are practically defined once. There's also a CI job that makes sure that the Dockerfile
does not deviate from the template and committed params. Are you ok with that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I didn't realize Dockerfile
was an artifact now then.
Should we then just check-in Dockerfile.in
in our Git tree and remove Dockerfile
from it (and add it to .gitignore
)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it doesn't hurt committing it, and it's easier for viewers of the repo to see at a glance what are the contents of the image.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the "cons" side, it means that all our changes to the containers will be duplicated in these two files each time we do a change. I would prefer otherwise but can live with it if you have strong feeling on this.
If we decide to go down that road, we might want to have a CI task ensuring that Docker
is then the output of Docker.in
processed so the two don't end up out of sync.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we decide to go down that road, we might want to have a CI task ensuring that Docker is then the output of Docker.in processed so the two don't end up out of sync.
Yeap, I agree. As I mentioned in #1049 (comment), there's one already in place:
dangerzone/.github/workflows/ci.yml
Lines 490 to 493 in b8755f2
- name: Verify that the Dockerfile matches the commited template and params | |
run: |- | |
poetry run jinja2 Dockerfile.in Dockerfile.env > out | |
diff Dockerfile out |
Dockerfile
Outdated
|
||
# Create an unprivileged user both for gVisor and for running Dangerzone. | ||
RUN mkdir -p /opt/dangerzone/dangerzone && \ | ||
touch /opt/dangerzone/dangerzone/__init__.py && \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to create this __init__.py
file here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whoops, fixed in cbeb103
conversion/common.py \ | ||
conversion/errors.py \ | ||
conversion/__init__.py \ | ||
/opt/dangerzone/dangerzone/conversion |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems you're copying files from the dangerzone.conversion
python package. Should we copy all of the ones from this folder instead? Also, I'm starting to wonder if it could make sense to have a separate package for this, if we start to copy files manually. Just a thought, but I'm curious about what you'd think of it 💭
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm copying the .py
files specifically for two reasons: a) to avoid picking up the .pyc
files, and b) to make the list of copied files more explicit for reproducibility reasons. As for having a separate package, I think it will add a bit more complexity than strictly necessary for this invocation. I'd like to know if you have something particular in mind though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find it a bit brittle to copy these manually, as it can break when we start adding dependencies to these files for instance. Manually choosing what to copy adds some logic here that could be removed by grouping the items we want to copy together and copying the folder.
I get what you mean about the .pyc
files, I suppose it's possible to remove them after copying.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, I think a middle ground is to copy conversion/*.py
, which would future-proof us against that scenario you mention. What do you think?
--ignore-timestamps --ignore-image-name --verbose | ||
``` | ||
|
||
### Updating the image |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel we're missing a howto check reproducibility for old images. Maybe rephrasing a bit this section could help in this regard (as I believe it would be updating the version numbers / dates and rebuilding the image, then diffing)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I already have a "Reproducing the image" section in the design doc, not sure if you've seen it. If it's not clear what the user should do, here's an update in the wording: aa710e8
install/common/build-image.py
Outdated
# 3. If the contents of the Git repo are dirty, we will append a unique identifier | ||
# for this run, something like `0.8.0-31-g6bdaa7a-fdcb` or `0.8.0-fdcb`. | ||
dirty_ident = secrets.token_hex(2) | ||
tag = ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That could go in its own function to ease readability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implemented in 92d8a4c
@@ -28,7 +28,7 @@ def main(): | |||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reading this, I wonder if we should rename the vendor-pymupdf
to something specifically targeting debian-based distributions.
Co-authored-by: Alexis Métaireau <[email protected]>
Co-authored-by: Alexis Métaireau <[email protected]>
Co-authored-by: Alexis Métaireau <[email protected]>
This PR makes our container image reproducible, and enforces it with a CI job on every commit. To do so, we also change our base image from Alpine Linux to Debian Stable, and we use some prior art and tooling from the work that @AkihiroSuda has done on reproducible containers.
Fixes #1046
Fixes #1047
Fixes #1048