Avoid caching images that are retrieved with pull #13

Open
axwalker opened this issue Aug 3, 2020 · 10 comments
Labels
enhancement New feature or request

Comments

@axwalker

axwalker commented Aug 3, 2020

Is your feature request related to a problem? Please describe.
The Post Run satackey/[email protected] step is quite slow for our builds because it uploads a bunch of images that we already fetch with pull.

Describe the solution you'd like
Since we pull them anyway, it would be good not to cache them at all. Is there a way to cache only those images that are not retrieved through pull?

Describe alternatives you've considered
The alternative at the moment is to not use the caching at all, because the cache upload takes longer than the original build normally takes.

@axwalker axwalker added the enhancement New feature or request label Aug 3, 2020
@charleskorn

I'm not sure that it's possible to know whether an image was pulled or built locally - could this perhaps be implemented as an option where you can restrict which images are cached based on their name?

@axwalker
Author

axwalker commented Aug 6, 2020

Having something like an ignorePattern where you can give a regex for images to ignore would potentially solve our issue.
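As a rough sketch of how such an option might behave: `ignore_pattern` and `filter_images` below are hypothetical names used only for illustration, not options the action is known to provide, and the image names are invented.

```python
import re


def filter_images(images, ignore_pattern=None):
    """Return the image names that should still be cached.

    `ignore_pattern` is a hypothetical option: a regular expression
    searched for in each image name. Matching images are skipped.
    """
    if not ignore_pattern:
        # No pattern configured: keep every image.
        return list(images)
    pattern = re.compile(ignore_pattern)
    return [name for name in images if not pattern.search(name)]


# Example image names are invented for illustration.
images = ["postgres:12", "redis:6", "myorg/app:latest"]
to_cache = filter_images(images, r"^(postgres|redis):")
print(to_cache)
```

With this pattern, only the locally built image would remain in the list to cache; the pulled postgres and redis images would be skipped.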

@satackey
Owner

satackey commented Aug 8, 2020

In its main run, this action records the list of images that already exist at that point.
Those images are excluded from the cache written in the post run. (Because of this, the container images that already exist on the hosted runner are not cached.)

    steps:
    # not cached steps

    - uses: satackey/[email protected]

    # cached steps

I think pulling before uses: satackey/[email protected] would solve this problem, but are there any special situations where it would not?
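The exclusion described above can be pictured as a simple difference between two image listings. This is only a sketch under assumptions: `images_to_cache` is a hypothetical helper, the image names are invented, and the real action may compare image IDs or layers rather than names.

```python
def images_to_cache(present_at_main_run, present_at_post_run):
    """Return images that appeared after the action's main run.

    Anything already present when the main run took its snapshot
    is treated as "not ours" and left out of the cache.
    """
    already_there = set(present_at_main_run)
    return [img for img in present_at_post_run if img not in already_there]


# Invented image names: one pre-installed, one pulled, one built.
at_post_run = ["ubuntu:20.04", "postgres:12", "myorg/app:latest"]

# Pull happens AFTER the caching step: the pulled image gets cached.
pull_after = images_to_cache(["ubuntu:20.04"], at_post_run)

# Pull happens BEFORE the caching step: the pulled image is excluded.
pull_before = images_to_cache(["ubuntu:20.04", "postgres:12"], at_post_run)

print(pull_after)
print(pull_before)
```

In this model, moving the pull ahead of the caching step is what takes the pulled image out of the post run upload.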

@ForbesLindesay

It's also currently re-pushing all the layers that it fetched from the cache whenever any layer changes. I think this is the main reason it ends up being so slow even for builds that are 90% cache hits.

@rcowsill
Contributor

rcowsill commented Dec 5, 2020

> I think pulling before uses: satackey/[email protected] would solve this problem, but any special situations?

One downside to that is that it forces you to pull before building. That might not be ideal if you're pulling a large image that's used to test the image you're building, but the build fails. You'd wait for the slow pull to complete before finding out that the build failed so the pulled image is not needed any more.

> It's also currently re-pushing all the layers that it fetched from the cache whenever any layer changes. I think this is the main reason it ends up being so slow even for builds that are 90% cache hits.

#98 is a suggestion for how to avoid this problem. It will avoid reuploading the layer content for cache hits, as well as allowing safe sharing of cached layers between workflows.
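To illustrate the general idea of skipping uploads for cache hits (this is not taken from #98, whose actual design may differ), here is a minimal sketch that assumes layers are identified by a content digest. `layers_to_upload`, the digests, and the sizes are all hypothetical.

```python
def layers_to_upload(local_layers, cached_digests):
    """Pick the layers whose content is not in the cache yet.

    `local_layers` maps a layer digest to its size in bytes;
    `cached_digests` is the set of digests restored from the cache.
    """
    return {
        digest: size
        for digest, size in local_layers.items()
        if digest not in cached_digests
    }


# Invented digests and sizes: two layers were cache hits, one is new.
local_layers = {"sha256:aaa": 900, "sha256:bbb": 50, "sha256:ccc": 50}
cached_digests = {"sha256:aaa", "sha256:bbb"}

new_layers = layers_to_upload(local_layers, cached_digests)
print(new_layers)
print(sum(new_layers.values()), "of", sum(local_layers.values()), "bytes uploaded")
```

Under that assumption, a build that is mostly cache hits would upload only the changed layer instead of every layer again.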

@andy-maier

Not everyone builds images. For example, we are pulling images from DockerHub because we use them. We are not building them at all.

So not caching pulled images makes this cache completely useless for anyone with that use case.

@MostefaKamalLala

MostefaKamalLala commented Jun 4, 2021

What about the images that are already present on the runners? For instance, I'm working with a Windows runner. Is it possible to avoid caching pre-installed images?
image

Also, one of my Docker image builds takes about 5 minutes.

image

As you can see, the caching action's post run takes 15 minutes. When I re-ran the same workflow, it took almost 13 minutes to download the cache, while the build itself was blazing fast at 2 seconds. However, that is 13 minutes plus the post run action, which is still running at the moment and is already at 7+ minutes. That is far more than the original 5 minutes without caching.
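Putting the reported figures together as rough arithmetic (the values are approximations taken from the comment above, and the post run figure is only a lower bound since it was still running):

```python
# Figures reported above, treated as rough approximations (in seconds).
uncached_build = 5 * 60    # original build without caching
cache_restore = 13 * 60    # "almost 13 mins" to download the cache
cached_build = 2           # build once the layers are restored
post_run_so_far = 7 * 60   # post run was at 7+ mins and still going

cached_total = cache_restore + cached_build + post_run_so_far
overhead = cached_total - uncached_build

print(f"cached run: at least about {cached_total / 60:.1f} min")
print(f"slower than the uncached build by about {overhead / 60:.1f} min")
```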

image

I'm testing it now with another workflow that takes much more time to see if I gain time.

Am I doing anything wrong?
I used the action as the second step of the job, after the checkout.

EDIT:
just noticed this is related to this

@rcowsill
Contributor

rcowsill commented Jun 4, 2021

@MostefaKamalLala The pre-installed images are automatically skipped when writing the cache; your first screenshot shows the action detecting those images before your docker build step.

I think something else is causing the cache "Post run" to be so slow, but don't know what it could be without seeing the debug logs for that part of the run. Can you share a link to your run logs?

@MostefaKamalLala

> @MostefaKamalLala The pre-installed images are automatically skipped when writing the cache; your first screenshot shows the action detecting those images before your docker build step.
>
> I think something else is causing the cache "Post run" to be so slow, but don't know what it could be without seeing the debug logs for that part of the run. Can you share a link to your run logs?

Yes of course, here is the log.
logs_6468.zip

@rcowsill
Contributor

rcowsill commented Jun 4, 2021

OK, your Dockerfile starts with FROM mcr.microsoft.com/dotnet/framework/wcf:4.7.2-windowsservercore-ltsc2019, and that image isn't pre-installed. That means it gets pulled on the first build, and also gets cached as a result. It looks like it's a pretty big image too; I pulled the closest version compatible with my machine and it's about 14 GB unpacked.

This can be avoided by pulling that image before the Run satackey/[email protected] step.

7 participants