[FR]: Use bazel worker for copy_file_action #1046

Open

Andrius-B opened this issue Feb 8, 2025 · 2 comments

@Andrius-B

Hi, I prototyped an idea of using a Bazel worker, and the naive implementation seems to copy files in roughly half the time of spawning a cp process per file. I wanted to get some feedback on whether this kind of feature would be acceptable before spending more time on it (it's by no means ready for review yet).
This repository already has a tool written in Go that copies directories, copy_directory, so the toolchains and the release process are all in place. I would like to add a similar tool with related toolchains, copy_file, and have copy_file_action call it instead of cp from coreutils.
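For concreteness, here is a minimal sketch of what the worker loop could look like, assuming the standard Bazel persistent-worker protocol (length-delimited WorkRequest/WorkResponse protobufs over stdin/stdout) and a hypothetical generated Go package `workerpb` for worker_protocol.proto. This is an illustration of the technique, not the prototype's actual code:

```go
package main

import (
	"bufio"
	"errors"
	"io"
	"os"

	"google.golang.org/protobuf/encoding/protodelim"

	// Hypothetical generated package for Bazel's worker_protocol.proto.
	workerpb "example.com/copy_file/workerpb"
)

func main() {
	in := bufio.NewReader(os.Stdin)
	for {
		// Each request arrives as a length-delimited WorkRequest on stdin.
		req := &workerpb.WorkRequest{}
		if err := protodelim.UnmarshalFrom(in, req); err != nil {
			if errors.Is(err, io.EOF) {
				return // Bazel closed stdin; shut down cleanly.
			}
			os.Exit(1)
		}

		// Assumption: the per-action argument file holds exactly [src, dst].
		resp := &workerpb.WorkResponse{RequestId: req.GetRequestId()}
		args := req.GetArguments()
		if len(args) != 2 {
			resp.ExitCode = 1
			resp.Output = "expected exactly two arguments: src dst"
		} else if err := copyFile(args[0], args[1]); err != nil {
			resp.ExitCode = 1
			resp.Output = err.Error()
		}

		// Each response is written as a length-delimited WorkResponse on stdout.
		if _, err := protodelim.MarshalTo(os.Stdout, resp); err != nil {
			os.Exit(1)
		}
	}
}

// copyFile duplicates one regular file; a real tool would also
// preserve permissions and handle symlinks.
func copyFile(src, dst string) error {
	s, err := os.Open(src)
	if err != nil {
		return err
	}
	defer s.Close()
	d, err := os.Create(dst)
	if err != nil {
		return err
	}
	defer d.Close()
	_, err = io.Copy(d, s)
	return err
}
```

Handling one request at a time like this matches the singleplex mode mentioned below; a multiplex worker would dispatch each request to a goroutine instead.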

The context for this change is that copying these files takes a while when there are a lot of source files. As an example, I created a small reproduction that generates 10,000 src/*.js source files and then builds a js_binary from all of them, in turn calling copy_file_action on each. The whole build consists of nothing but copying files, so it's not a fair evaluation of how this feature would affect real builds, but it gives some insight into the overhead of spawning so many processes.

With the latest [email protected], which uses cp, the build takes around 80 seconds on my M1 MacBook:

$ python ./src/generate.py; time bazel build example
[..]
INFO: Elapsed time: 83.830s, Critical Path: 3.58s
INFO: 10004 processes: 3 action cache hit, 3 internal, 10001 local.
INFO: Build completed successfully, 10004 total actions
bazel build example  0.06s user 0.06s system 0% cpu 1:25.07 total
$ python ./src/generate.py; time bazel build example
[..]
INFO: Elapsed time: 83.391s, Critical Path: 3.64s
INFO: 10004 processes: 3 action cache hit, 3 internal, 10001 local.
INFO: Build completed successfully, 10004 total actions
bazel build example  0.06s user 0.08s system 0% cpu 1:25.63 total
$ python ./src/generate.py; time bazel build example
[..]
INFO: Elapsed time: 82.971s, Critical Path: 3.68s
INFO: 10004 processes: 3 action cache hit, 3 internal, 10001 local.
INFO: Build completed successfully, 10004 total actions
bazel build example  0.06s user 0.07s system 0% cpu 1:24.75 total

Whereas with 4 worker processes the action count is inflated 2x (each copy action gets an additional WriteFile action for its argument file), the build nevertheless runs roughly twice as fast using a singleplex proto worker written in Go. There are a few extra actions in this output because building the copy_file toolchain is cached as part of the build too:

$ python ./src/generate.py; time bazel build example
[..]
INFO: Elapsed time: 40.807s, Critical Path: 4.89s
INFO: 20005 processes: 37 action cache hit, 10004 internal, 10001 worker.
INFO: Build completed successfully, 20005 total actions
bazel build example  0.05s user 0.06s system 0% cpu 43.757 total
$ python ./src/generate.py; time bazel build example
[..]
INFO: Elapsed time: 37.610s, Critical Path: 4.96s
INFO: 20004 processes: 38 action cache hit, 10003 internal, 10001 worker.
INFO: Build completed successfully, 20004 total actions
bazel build example  0.04s user 0.05s system 0% cpu 39.256 total
$ python ./src/generate.py; time bazel build example
[..]
INFO: Elapsed time: 36.599s, Critical Path: 6.42s
INFO: 20004 processes: 4 action cache hit, 10003 internal, 10001 worker.
INFO: Build completed successfully, 20004 total actions
bazel build example  0.04s user 0.05s system 0% cpu 37.736 total
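For context on the 2x action count: under Bazel's persistent-worker convention, each action's arguments are written to a params file (that's the extra WriteFile action per copy) and passed to the worker via an @flagfile argument. A hypothetical flagfile for a single copy would just be the two paths, one per line; the exact layout is a detail of the prototype and is not shown in this issue:

```
src/0001.js
bazel-out/darwin_arm64-fastbuild/bin/src/0001.js
```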

I have yet to test this on other platforms, but it looks promising. Let me know what you think.

@thesayyn
Collaborator

The idea sounds interesting. I have dealt with workers for a long time, and it gets nasty real quick if you are not careful. Obviously some of this overhead is simply from Bazel spawning actions. We could probably design a new copy_file_bulk API that copies all the files at once, or at least tries to. That idea is more interesting than hacking around the fact that we spawn a process per copy.
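One hypothetical shape for the tool behind such a bulk API: a single process that reads a manifest of source/destination pairs and copies them all. The tab-separated manifest format here is invented for illustration, not an existing bazel-lib contract:

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"os"
	"strings"
)

func main() {
	// The manifest path is the single CLI argument.
	f, err := os.Open(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer f.Close()

	sc := bufio.NewScanner(f)
	for sc.Scan() {
		// One "src<TAB>dst" pair per manifest line.
		src, dst, ok := strings.Cut(sc.Text(), "\t")
		if !ok {
			fmt.Fprintf(os.Stderr, "malformed manifest line: %q\n", sc.Text())
			os.Exit(1)
		}
		if err := copyOne(src, dst); err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
	}
	if err := sc.Err(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}

// copyOne duplicates a single regular file.
func copyOne(src, dst string) error {
	in, err := os.Open(src)
	if err != nil {
		return err
	}
	defer in.Close()
	out, err := os.Create(dst)
	if err != nil {
		return err
	}
	defer out.Close()
	_, err = io.Copy(out, in)
	return err
}
```

One process then amortizes the spawn cost over all copies, which is the overhead the benchmarks above are measuring.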

@plobsing
Contributor

We could probably design a new copy_file_bulk API that copies all the files at once, or at least tries to.

There's already a bulk API that rules_js and friends make use of: copy_files_to_bin_actions. And we already have a toolchain in this repo that can move files around the way we need: bsdtar xf MTREE --options=mtree:checkfs.

So it wasn't too much work to throw together a prototype that does this bulk copy action: plobsing@ffcfdff
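For readers unfamiliar with mtree: bsdtar can read an mtree manifest as if it were an archive, and on extraction, entries carrying a contents= keyword are materialized by copying from the referenced file, which is what makes it usable as a bulk copier. A hypothetical two-entry manifest (paths invented) might look like:

```
#mtree
bin/src/a.js type=file mode=0644 contents=src/a.js
bin/src/b.js type=file mode=0644 contents=src/b.js
```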

That idea is more interesting than hacking around the fact that we spawn a process per copy.

When testing my prototype, I ran into an interesting property that can probably only be had with action-per-copy: action deduplication. If the same copy gets declared multiple times, that's not an error, because the copy_file_action looks exactly the same no matter who declares it. A bulk action is going to have a hard time doing the same.

There's even a unit test for this behaviour, so someone somewhere found this important at some point:

# Case: two different targets copying same file to bin
