-
Notifications
You must be signed in to change notification settings - Fork 42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bug 1937801 - Implement mechanism to use caches for common tools in run transforms #623
Conversation
4e22fba
to
fe7762e
Compare
0fdd5eb
to
f876c2b
Compare
0efe39a
to
129e34e
Compare
This PR was tested in a try push in Gecko: It should be ready for review |
129e34e
to
271e6af
Compare
271e6af
to
2bd5762
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Requesting changes due to the failing tests (which look to be silly things, not indicative of major issues); this looks fine though.
if not task["run"].get("use-caches", True): | ||
worker = task["worker"] | ||
if worker["implementation"] not in ("docker-worker", "generic-worker"): | ||
# caches support not implemented |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems a bit odd to me that add_cache
silently does nothing in certain cases. This is clearly existing behaviour though, and I imagine it would break a lot of things if it changed, so it's certainly not a blocker here.
"""Add caches for common tools.""" | ||
worker = task["worker"] | ||
workdir = task["run"].get("workdir") | ||
base_cache_dir = ".task-cache" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's probably fine to put these caches outside of their default place, and obviously it worked fine with all your testing with Gecko. I anticipate that some edge case tasks or issues will come up though (eg: scripts or other things that are trying to either pull things from a cache, or inject them, and not paying attention to the env vars when doing so). That's not something that ought to stop this work, but it may cause unexpected bustage when this is picked up in various places.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call out!
The reason I couldn't use .cache
is that when setting e.g VOLUME /builds/worker/.cache/pip
in the the Dockerfile, image-builder
actually creates that directory with root. The run-task
script then chown's all VOLUMEs to the worker user, but it doesn't handle parent directories, so ~/.cache/pip
ends up owned by worker, but ~/.cache
ends up owned by root. This prevented Gecko from writing to ~/.cache
and broke some tests in a subtle way ><. We could fix run-task
to chown the whole path to a VOLUME, but I thought it would be cleanest to just use an entirely separate directory.
Another issue is that the default place is different for every platform. So we would need logic in Taskgraph that detects which platform the task is running and tries to set the default place accordingly per tool. This is further complicated by the fact that under Windows the default cache dir is under %LOCALAPPDATA%
, which I'm not sure generic-worker
would be capable of resolving? We could probably implement this, but being explicit with the envs seemed simpler.
|
||
Where caches can be any of ``cargo``, ``pip``, ``uv`` or ``npm``. If the task | ||
was setting up ``.cache`` for another tool, a mount will need to be created | ||
for it manually. If ``use-caches`` was previously set to ``false``, omit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Including a block (here here or in a linked doc) about exactly what to add to a task to set up a new cache would be useful (I don't think it's obvious to all readers).
This also makes me wonder if it would be beneficial to allow additional caches to be added in the project-wide or task configuration (so that the mounts would be set up automatically). This is firmly in the category of "future enhancement", though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think that makes sense. It should be possible by monkeypatching the CACHES dict, but we might want some kind of registry / decorator thing like we have for other things.
Oops, this is from that top commit I just pushed. It worked locally >< |
867356c
to
2e1b540
Compare
Hm, clearly that test I added results in a different hash on Python 3.13 for some reason (I must have been using 3.13 when I wrote it, but now I'm using an earlier one) |
5c7f61d
to
59f97e0
Compare
This allows tasks to opt into specific caches rather than all or nothing.
We shouldn't set up a cache for every conceivable tool that's going to be added to the `CACHES` config. Instead tasks should list out the ones they need. The checkout cache is going to be pretty standard and is a reasonable default.
59f97e0
to
7045aa5
Compare
Sorry for all the churn.. I reverted the problematic test and I'm going to re-land in a follow-up PR. |
This is an overhaul of how we set up caches in the run transforms. At a high level, this is:
cache-dotcache
keyI recommend reviewing commit by commit.
Making this a draft PR for now so I have a chance to test it out with real tasks.