How to extract cache hits and misses along with spawn type for every action? #20347

alfredomusumeci · 2023-11-28T18:50:27Z

alfredomusumeci
Nov 28, 2023

I am trying to find out which actions are cache hits or misses during a Bazel build, and also how each one was run ("remote cache hit", "local", "remote", and so on). In the Build Event Protocol (BEP), I can see a summary of all actions in the buildMetrics.runnerCount section, e.g. (dummy data below):

"runnerCount":[
  {"name": "total", "count":263406},
  {"name": "remote cache hit", "count":158034},
  {"name": "internal", "count":104827},
  {"name": "local", "count":40},
  {"name": "remote", "count":2}
]

However, I'd like to know exactly which actions constitute a remote cache hit or a local cache hit, for example. The Build Event Protocol doesn't seem to provide this information.
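For reference, the aggregate counts quoted above can be pulled out of a BEP JSON file with a few lines of Python. This is a minimal sketch, assuming the file was written with --build_event_json_file and contains one JSON-encoded build event per line. The nesting of runnerCount under an actionSummary object is an assumption based on the BEP proto; the code also falls back to the top-level location named in the question. The function name runner_counts is made up for this example.

```python
import json
from typing import Dict, Iterable


def runner_counts(bep_lines: Iterable[str]) -> Dict[str, int]:
    """Return {runner name: count} from the build metrics event, if present."""
    counts: Dict[str, int] = {}
    for line in bep_lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        metrics = event.get("buildMetrics")
        if not metrics:
            continue
        # Assumption: runnerCount lives under actionSummary. Fall back to the
        # top-level location quoted in the question if it is not found there.
        entries = (
            metrics.get("actionSummary", {}).get("runnerCount")
            or metrics.get("runnerCount", [])
        )
        for entry in entries:
            # Convert defensively in case a count is serialized as a string.
            counts[entry["name"]] = int(entry.get("count", 0))
    return counts
```

It could be used as `with open("bep.json") as f: print(runner_counts(f))`. As the question notes, this only yields totals per runner, not a per-action breakdown.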

The execution log produced with --execution_log_json_file, by contrast, reports the 'runner' for every action, so it contains exactly the information I need. The problem is that for big builds the execution log can reach 100+ GB and slows the build down considerably, which defeats the purpose: the very reason I want to find cache hits and misses is to speed up debugging and build time.
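To illustrate what the execution log offers, here is a rough sketch that groups actions by runner. It assumes the JSON log is a concatenation of JSON objects (one per spawn, possibly spanning several lines) and that each object carries runner, targetLabel, and mnemonic keys, which is my reading of spawn.proto rather than something confirmed in this thread. The helper names are made up for this example. It loads the whole text into memory, so it is only suitable for small logs and does nothing about the size problem described above.

```python
import json
from collections import defaultdict
from typing import Dict, Iterator, List


def iter_json_objects(text: str) -> Iterator[dict]:
    """Yield each top-level JSON object from a string of concatenated objects."""
    decoder = json.JSONDecoder()
    pos = 0
    end = len(text)
    while pos < end:
        # raw_decode does not skip leading whitespace, so do it by hand.
        while pos < end and text[pos].isspace():
            pos += 1
        if pos >= end:
            break
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


def actions_by_runner(text: str) -> Dict[str, List[str]]:
    """Group 'label [mnemonic]' strings by the runner reported for each spawn."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for spawn in iter_json_objects(text):
        # Assumed field names; missing keys get a placeholder.
        label = spawn.get("targetLabel", "<unknown target>")
        mnemonic = spawn.get("mnemonic", "<unknown mnemonic>")
        runner = spawn.get("runner", "<unknown runner>")
        grouped[runner].append(f"{label} [{mnemonic}]")
    return dict(grouped)
```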

I know of tools such as BuildBuddy which provide information about cache hits or misses just using the build event stream. I am steering away from using these tools at the moment. Hence, I am assuming there must be a way to retrieve this information without having to use the execution log?

My questions are:

1. Is it possible to retrieve whether each action is a hit or a miss, and what its spawn type is, using ONLY the build event protocol?
2. If not, is the only alternative to use --execution_log_[binary/json]_file instead? How do tools such as BuildBuddy retrieve this data then?
3. If the execution log is the only way, is it possible to stream it in a similar way to how build events are streamed using bes_backend?

Answered by tjgq

tjgq · 2023-11-28T21:33:09Z

tjgq
Nov 28, 2023
Collaborator

I don't believe the BEP has enough information to successfully debug cache misses. Even if it contained the cache hit/miss status for every spawn (which I don't believe it currently does), debugging the reason for a miss would in general require a list of inputs and their digests. This is too big to include in the BEP, which is why the execution log exists.

I acknowledge that the execution log is too bloated and too slow to produce (for basically any build of significant size). I'm currently designing an alternative format which should be much smaller and cheaper to produce. You can expect a prototype to be available at head within the next few weeks, although it might take longer until it can be relied upon as a stable feature. You can follow along at #18643.

I can't provide an answer regarding BuildBuddy specifically, but it seems to me that a remote caching/execution service should be able to infer cache hit/miss information by keeping track of the requests it receives from the client.

1 reply

alfredomusumeci Dec 1, 2023
Author

By the way, what is the expectation for this new format?

tjgq · 2023-11-28T21:39:39Z

tjgq
Nov 28, 2023
Collaborator

Also, a tip for using the currently available execution log: prefer the binary format (it's smaller and cheaper to produce than JSON) and consider setting --noexecution_log_sort if you don't need to compare logs.

3 replies

alfredomusumeci Dec 1, 2023
Author

Thanks for the suggestion.
Do you know how I can parse the execution log in binary format? For the build event stream I could use BuildEventStreamProtos.BuildEvent.parseDelimitedFrom(), but I don’t see an equivalent for the execution log in the source code.

tjgq Dec 1, 2023
Collaborator

They're delimited protos of this type: https://cs.opensource.google/bazel/bazel/+/master:src/main/protobuf/spawn.proto;l=116;drc=3e68f74f6f96715ce9bccf138ad67434e8595e99
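As a rough illustration of what "delimited" means here: each record is a varint length prefix followed by that many bytes of serialized message. The sketch below splits such a stream into raw payloads using only the standard library. The function names are made up for this example, and decoding each payload would still require classes generated from spawn.proto (for instance a hypothetical spawn_pb2.SpawnExec.FromString(payload) in Python). In Java, generated message classes normally expose parseDelimitedFrom(), so the generated SpawnExec class should offer the same entry point as BuildEvent.

```python
import io
from typing import BinaryIO, Iterator


def read_varint(stream: BinaryIO) -> int:
    """Read one base-128 varint; return -1 on a clean end of stream."""
    result = 0
    shift = 0
    while True:
        byte = stream.read(1)
        if not byte:
            if shift == 0:
                return -1  # no more records
            raise EOFError("stream ended inside a varint")
        value = byte[0]
        result |= (value & 0x7F) << shift
        if not value & 0x80:  # high bit clear marks the last varint byte
            return result
        shift += 7


def iter_delimited(stream: BinaryIO) -> Iterator[bytes]:
    """Yield the raw bytes of each length-delimited record in the stream."""
    while True:
        size = read_varint(stream)
        if size < 0:
            return
        payload = stream.read(size)
        if len(payload) != size:
            raise EOFError("stream ended inside a message")
        yield payload
```

With a log opened in binary mode, `for payload in iter_delimited(f): ...` visits one serialized record at a time without loading the whole file.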

alfredomusumeci Dec 1, 2023
Author

Thanks

sluongng · 2023-11-29T11:07:45Z

sluongng
Nov 29, 2023

To elaborate a bit on how BuildBuddy manages to do this:

We record all of the traffic going in and out of our Remote Cache.
We then separate the requests by AC/CAS, correlate them with the target taken from each request's metadata,
and sort them by arrival order, request duration, or file size (as given in the digest).

We also provide a high-level overview of all these statistics at the top.
Finally we let our users correlate between the Bazel timing profile, cache requests, and remotely executed actions using the build target.
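The server-side bookkeeping described above could look roughly like the sketch below. This is not BuildBuddy's implementation: the CacheRequest record, its fields, and the summarize function are all hypothetical, and it only shows the idea of grouping observed cache requests by target and by cache type (AC or CAS).

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class CacheRequest:
    """Hypothetical record of one request observed by the cache server."""
    target: str      # target label taken from the request metadata
    cache_type: str  # "AC" (action cache) or "CAS" (content-addressable storage)
    hit: bool        # whether the server found the requested entry
    size_bytes: int  # size from the digest, 0 when nothing was transferred


def summarize(requests: Iterable[CacheRequest]) -> Dict[str, Dict[str, Dict[str, int]]]:
    """Return {target: {cache_type: {"hits", "misses", "bytes"}}}."""
    summary = defaultdict(
        lambda: defaultdict(lambda: {"hits": 0, "misses": 0, "bytes": 0})
    )
    for req in requests:
        bucket = summary[req.target][req.cache_type]
        bucket["hits" if req.hit else "misses"] += 1
        bucket["bytes"] += req.size_bytes
    # Convert the nested defaultdicts to plain dicts for easier printing.
    return {
        target: {ctype: dict(stats) for ctype, stats in by_type.items()}
        for target, by_type in summary.items()
    }
```

Grouping by target only works when the request metadata actually names the target, which is why requests carrying a different value in that field cannot be attributed this way.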

Side note: @tjgq this is why we brought up the prefetcher issue during the BazelCon RBE BoF session.

For cache requests that set prefetcher instead of the build target in the metadata, our users would not be able to correlate these requests with the target that produced the CAS entry as output, or with the targets that consumed the CAS entry as input. Some work toward improving the metadata here would let us correlate these requests much better.

0 replies
