Hi, I prototyped an idea about using a Bazel worker, and the naive implementation seems to copy the files over about 50% faster than using the cp process. I wanted to get some feedback on whether this kind of feature would be acceptable before spending more time on it (it's by no means ready for review yet).
This repository already has a tool written in Go that copies directories, copy_directory, so the toolchains and the release process are all in place. I would like to add a similar tool and related toolchains for copy_file. Then, in copy_file_action, this tool would be called instead of cp from coreutils.
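For illustration, here is a minimal sketch of what the non-worker core of such a copy_file tool could look like in Go; the CLI shape and names are my assumptions, not the actual bazel-lib code:

```go
// copy_file sketch: a tiny Go binary standing in for `cp SRC DST`.
// Illustrative only; not the bazel-lib implementation.
package main

import (
	"fmt"
	"io"
	"os"
)

// copyFile streams src into dst, creating or truncating dst.
func copyFile(src, dst string) error {
	in, err := os.Open(src)
	if err != nil {
		return err
	}
	defer in.Close()

	out, err := os.Create(dst)
	if err != nil {
		return err
	}
	if _, err := io.Copy(out, in); err != nil {
		out.Close()
		return err
	}
	return out.Close()
}

func main() {
	if len(os.Args) != 3 {
		fmt.Fprintln(os.Stderr, "usage: copy_file SRC DST")
		os.Exit(1)
	}
	if err := copyFile(os.Args[1], os.Args[2]); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```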
The context for this change is that copying these files takes a while when there are a lot of source files. As an example of this case I created a small reproduction which generates 10_000 source src/*.js files and then builds a js_binary using all of those files, in turn calling copy_file_action on all of them. In this case the whole build really consists of nothing but copying files, so it's not a fair evaluation of how this feature would affect real builds, but it gives some insight into the overhead of spawning so many processes.
With the latest [email protected], which uses cp, the build takes around 80 seconds on my M1 MacBook:
$ python ./src/generate.py; time bazel build example
[..]
INFO: Elapsed time: 83.830s, Critical Path: 3.58s
INFO: 10004 processes: 3 action cache hit, 3 internal, 10001 local.
INFO: Build completed successfully, 10004 total actions
bazel build example 0.06s user 0.06s system 0% cpu 1:25.07 total
$ python ./src/generate.py; time bazel build example
[..]
INFO: Elapsed time: 83.391s, Critical Path: 3.64s
INFO: 10004 processes: 3 action cache hit, 3 internal, 10001 local.
INFO: Build completed successfully, 10004 total actions
bazel build example 0.06s user 0.08s system 0% cpu 1:25.63 total
$ python ./src/generate.py; time bazel build example
[..]
INFO: Elapsed time: 82.971s, Critical Path: 3.68s
INFO: 10004 processes: 3 action cache hit, 3 internal, 10001 local.
INFO: Build completed successfully, 10004 total actions
bazel build example 0.06s user 0.07s system 0% cpu 1:24.75 total
Whereas using 4 worker processes inflates the action count 2x (as each copy action gets an additional WriteFile action for the argument file), the build runs around twice as fast using a singleplex proto worker written in Go. There are a few extra actions in this output since the build of the copy_file toolchain is cached as part of the build too:
$ python ./src/generate.py; time bazel build example
[..]
INFO: Elapsed time: 40.807s, Critical Path: 4.89s
INFO: 20005 processes: 37 action cache hit, 10004 internal, 10001 worker.
INFO: Build completed successfully, 20005 total actions
bazel build example 0.05s user 0.06s system 0% cpu 43.757 total
$ python ./src/generate.py; time bazel build example
[..]
INFO: Elapsed time: 37.610s, Critical Path: 4.96s
INFO: 20004 processes: 38 action cache hit, 10003 internal, 10001 worker.
INFO: Build completed successfully, 20004 total actions
bazel build example 0.04s user 0.05s system 0% cpu 39.256 total
$ python ./src/generate.py; time bazel build example
[..]
INFO: Elapsed time: 36.599s, Critical Path: 6.42s
INFO: 20004 processes: 4 action cache hit, 10003 internal, 10001 worker.
INFO: Build completed successfully, 20004 total actions
bazel build example 0.04s user 0.05s system 0% cpu 37.736 total
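For reference, here is a rough sketch of what the per-request work of such a singleplex worker could look like once a WorkRequest has been decoded. The params-file layout (source path on line 1, destination on line 2) and the function names are assumptions for illustration only, and the proto plumbing (reading length-delimited WorkRequest messages from stdin, writing WorkResponse messages to stdout) is omitted:

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// handleRequest performs the copy described by one params file (the file
// produced by the extra WriteFile action). The layout assumed here is
// illustrative; the real prototype may differ.
func handleRequest(paramsFile string) error {
	raw, err := os.ReadFile(paramsFile)
	if err != nil {
		return err
	}
	lines := strings.Split(strings.TrimSpace(string(raw)), "\n")
	if len(lines) != 2 {
		return fmt.Errorf("expected 2 lines in %s, got %d", paramsFile, len(lines))
	}
	src := strings.TrimSpace(lines[0])
	dst := strings.TrimSpace(lines[1])

	// Whole-file read/write keeps the sketch short; a streaming copy with
	// io.Copy would be preferable for large files.
	data, err := os.ReadFile(src)
	if err != nil {
		return err
	}
	// 0o666 is narrowed by the umask, similar to what cp would create.
	return os.WriteFile(dst, data, 0o666)
}

// main stands in for the worker loop: the real worker would decode
// WorkRequest protos from stdin and reply with WorkResponse protos instead
// of taking a single CLI argument.
func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: worker_sketch PARAMS_FILE")
		os.Exit(1)
	}
	if err := handleRequest(os.Args[1]); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```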
I have yet to test this on other platforms, but it looks promising. Let me know what you think.
The idea sounds interesting. I have dealt with workers for a long time; it gets nasty real quick if you are not careful. Obviously some of this overhead is simply from Bazel spawning actions. We could probably design a new copy_file_bulk API that copies many files at once, or at least tries to. That idea is more interesting than hacking around the fact that we spawn a process per copy.
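To make the copy_file_bulk idea concrete, here is a rough sketch of what the tool side could look like, assuming a hypothetical manifest format with one "src dst" pair per line; this is illustrative only, not a proposed API:

```go
// copy_file_bulk sketch: one invocation copies every pair listed in a
// manifest, so N copies cost one process spawn instead of N. Illustrative
// only; manifest format and flags are assumptions.
package main

import (
	"bufio"
	"fmt"
	"io"
	"os"
	"strings"
)

// copyOne streams a single src into dst.
func copyOne(src, dst string) error {
	in, err := os.Open(src)
	if err != nil {
		return err
	}
	defer in.Close()
	out, err := os.Create(dst)
	if err != nil {
		return err
	}
	if _, err := io.Copy(out, in); err != nil {
		out.Close()
		return err
	}
	return out.Close()
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: copy_file_bulk MANIFEST")
		os.Exit(1)
	}
	f, err := os.Open(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer f.Close()

	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		if len(fields) != 2 {
			continue // skip blank or malformed lines
		}
		if err := copyOne(fields[0], fields[1]); err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
	}
	if err := scanner.Err(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

The copies here are sequential to keep the sketch short; a bounded pool of goroutines could parallelize them within the single spawn.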
> We could probably design a new copy_file_bulk API that copies many files at once, or at least tries to.
There's already a bulk API that rules_js and friends make use of: copy_files_to_bin_actions. And we already have a toolchain in this repo that can move files around the way we need: bsdtar xf MTREE --options=mtree:checkfs.
So it wasn't too much work to throw together a prototype that does this bulk copy action. plobsing@ffcfdff
> That idea is more interesting than hacking around the fact that we spawn a process per copy.
When testing my prototype, I ran into an interesting property that can probably only be had from action-per-copy: action deduplication. If the same copy gets declared multiple times, that's not an error, because the copy_file_action looks exactly the same no matter who declares it, so Bazel can deduplicate the actions. A bulk action is going to have a hard time doing the same.
There's even a unit test for this behaviour, so someone somewhere found this important at some point: