Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug 1937801 - Implement mechanism to use caches for common tools in run transforms #623

Merged
merged 6 commits into from
Jan 23, 2025

Conversation

ahal
Copy link
Collaborator

@ahal ahal commented Jan 7, 2025

This is an overhaul of how we set up caches in the run transforms. At a high level, this is:

  1. Creating a new mechanism to register caches for common tools (pip, uv, npm, cargo)
  2. Refactoring the existing checkouts cache to use this mechanism.
  3. Providing task and project wide configs to include or exclude caches
  4. Removing the cache-dotcache key

I recommend reviewing commit by commit.

Making this a draft PR for now so I have a chance to test it out with real tasks.

@ahal ahal self-assigned this Jan 7, 2025
@ahal ahal force-pushed the ahal/push-rwqssntoyqlt branch from 4e22fba to fe7762e Compare January 7, 2025 19:40
@ahal ahal added the BREAKING CHANGE Backwards incompatible request that will require major version bump label Jan 7, 2025
@ahal ahal force-pushed the ahal/push-rwqssntoyqlt branch 10 times, most recently from 0fdd5eb to f876c2b Compare January 13, 2025 20:30
@ahal ahal force-pushed the ahal/push-rwqssntoyqlt branch 5 times, most recently from 0efe39a to 129e34e Compare January 20, 2025 19:13
@ahal
Copy link
Collaborator Author

ahal commented Jan 20, 2025

This PR was tested in a try push in Gecko:
https://treeherder.mozilla.org/jobs?repo=try&revision=dcbe341f010d75c14a2b8f8983545ece6fbf2049

It should be ready for review

@ahal ahal force-pushed the ahal/push-rwqssntoyqlt branch from 129e34e to 271e6af Compare January 20, 2025 19:46
@ahal ahal marked this pull request as ready for review January 20, 2025 20:01
@ahal ahal requested review from a team and bhearsum January 20, 2025 20:01
@ahal ahal force-pushed the ahal/push-rwqssntoyqlt branch from 271e6af to 2bd5762 Compare January 20, 2025 21:19
Copy link
Contributor

@bhearsum bhearsum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Requesting changes due to the failing tests (which look to be silly things, not indicative of major issues); this looks fine though.

if not task["run"].get("use-caches", True):
worker = task["worker"]
if worker["implementation"] not in ("docker-worker", "generic-worker"):
# caches support not implemented
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems a bit odd to me that add_cache silently does nothing in certain cases. This is clearly existing behaviour though, and I imagine it would break a lot of things if it changed, so it's certainly not a blocker here.

"""Add caches for common tools."""
worker = task["worker"]
workdir = task["run"].get("workdir")
base_cache_dir = ".task-cache"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's probably fine to put these caches outside of their default place, and obviously it worked fine with all your testing with Gecko. I anticipate that some edge case tasks or issues will come up though (eg: scripts or other things that are trying to either pull things from a cache, or inject them, and not paying attention to the env vars when doing so). That's not something that ought to stop this work, but it may cause unexpected bustage when this is picked up in various places.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call out!

The reason I couldn't use .cache is that when setting e.g VOLUME /builds/worker/.cache/pip in the the Dockerfile, image-builder actually creates that directory with root. The run-task script then chown's all VOLUMEs to the worker user, but it doesn't handle parent directories, so ~/.cache/pip ends up owned by worker, but ~/.cache ends up owned by root. This prevented Gecko from writing to ~/.cache and broke some tests in a subtle way ><. We could fix run-task to chown the whole path to a VOLUME, but I thought it would be cleanest to just use an entirely separate directory.

Another issue is that the default place is different for every platform. So we would need logic in Taskgraph that detects which platform the task is running and tries to set the default place accordingly per tool. This is further complicated by the fact that under Windows the default cache dir is under %LOCALAPPDATA%, which I'm not sure generic-worker would be capable of resolving? We could probably implement this, but being explicit with the envs seemed simpler.


Where caches can be any of ``cargo``, ``pip``, ``uv`` or ``npm``. If the task
was setting up ``.cache`` for another tool, a mount will need to be created
for it manually. If ``use-caches`` was previously set to ``false``, omit
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Including a block (here here or in a linked doc) about exactly what to add to a task to set up a new cache would be useful (I don't think it's obvious to all readers).

This also makes me wonder if it would be beneficial to allow additional caches to be added in the project-wide or task configuration (so that the mounts would be set up automatically). This is firmly in the category of "future enhancement", though.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think that makes sense. It should be possible by monkeypatching the CACHES dict, but we might want some kind of registry / decorator thing like we have for other things.

@ahal
Copy link
Collaborator Author

ahal commented Jan 23, 2025

Requesting changes due to the failing tests (which look to be silly things, not indicative of major issues); this looks fine though.

Oops, this is from that top commit I just pushed. It worked locally ><

@ahal ahal force-pushed the ahal/push-rwqssntoyqlt branch from 867356c to 2e1b540 Compare January 23, 2025 15:04
@ahal
Copy link
Collaborator Author

ahal commented Jan 23, 2025

Hm, clearly that test I added results in a different hash on Python 3.13 for some reason (I must have been using 3.13 when I wrote it, but now I'm using an earlier one)

@ahal ahal force-pushed the ahal/push-rwqssntoyqlt branch 2 times, most recently from 5c7f61d to 59f97e0 Compare January 23, 2025 15:49
ahal added 6 commits January 23, 2025 10:57
This allows tasks to opt into specific caches rather than all or
nothing.
We shouldn't set up a cache for every conceivable tool that's going to
be added to the `CACHES` config. Instead tasks should list out the ones
they need.

The checkout cache is going to be pretty standard and is a reasonable
default.
@ahal ahal force-pushed the ahal/push-rwqssntoyqlt branch from 59f97e0 to 7045aa5 Compare January 23, 2025 15:59
@ahal
Copy link
Collaborator Author

ahal commented Jan 23, 2025

Sorry for all the churn.. I reverted the problematic test and I'm going to re-land in a follow-up PR.

@ahal ahal requested a review from bhearsum January 23, 2025 16:10
@ahal ahal merged commit 1cae7e6 into taskcluster:main Jan 23, 2025
16 of 17 checks passed
@ahal ahal deleted the ahal/push-rwqssntoyqlt branch January 23, 2025 21:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
BREAKING CHANGE Backwards incompatible request that will require major version bump
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants