How to extract cache hits and misses along with spawn type for every action? #20347
-
I am trying to gain insight into which actions are cache hits or misses during a Bazel build, and also into how each one ran: "remote cache hit", "local", "remote" and so on. In the Build Event Protocol (BEP), I can see a summary across all actions in the section buildMetrics.runnerCount, e.g. (some dummy data below)

However, I'd like to know exactly which actions were, for example, a remote cache hit or a local cache hit. The BEP doesn't seem to provide this information. The execution log produced with --execution_log_json_file, on the other hand, reports the 'runner' for every action, so it contains exactly the info I need. The problem is that for big builds the execution log can amount to 100+ GB and makes the build very slow. That defeats the purpose, since the very reason I want to find out about cache hits and misses is to speed up debugging and build time.

I know of tools such as BuildBuddy which provide information about cache hits and misses using just the build event stream. I am steering away from these tools at the moment, so I am assuming there must be a way to retrieve this information without using the execution log.

My questions are:
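For readers following along, here is a minimal sketch of pulling the per-runner summary mentioned in the question out of a BEP file written with `--build_event_json_file`. It assumes the file holds one JSON event per line and that the counts appear as a `runnerCount` list of `{name, count}` entries, either directly under `buildMetrics` (as the question describes) or under `buildMetrics.actionSummary`. The sample event and its numbers are made up for illustration. Note that this only yields totals per runner, not a per-action breakdown.

```python
import json


def runner_counts(bep_lines):
    """Return {runner name: count} from the last buildMetrics event seen.

    Assumption: each line is one JSON-encoded build event, and the counts
    live in a "runnerCount" list under buildMetrics or its actionSummary.
    """
    counts = {}
    for line in bep_lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        metrics = event.get("buildMetrics")
        if not metrics:
            continue
        summary = metrics.get("actionSummary", {})
        entries = summary.get("runnerCount") or metrics.get("runnerCount") or []
        counts = {e.get("name", "?"): int(e.get("count", 0)) for e in entries}
    return counts


# Hypothetical event with dummy numbers, for illustration only.
sample_bep = [
    '{"id": {"buildMetrics": {}}, "buildMetrics": {"actionSummary": '
    '{"runnerCount": [{"name": "total", "count": 10}, '
    '{"name": "remote cache hit", "count": 7}, '
    '{"name": "linux-sandbox", "count": 3}]}}}',
]
print(runner_counts(sample_bep))
```

With a real build, the lines would come from `open("bep.json")` instead of the inline sample; the file name is a placeholder.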
Replies: 3 comments 4 replies
-
I don't believe the BEP has enough information to successfully debug cache misses. Even if it contained the cache hit/miss status for every spawn (which I don't believe it currently does), debugging the reason for a miss would in general require a list of inputs and their digests. This is too big to include in the BEP, which is why the execution log exists.

I acknowledge that the execution log is too bloated and too slow to produce (for basically any build of significant size). I'm currently designing an alternative format which should be much smaller and cheaper to produce. You can expect a prototype to be available at head within the next few weeks, although it might take longer until it can be relied upon as a stable feature. You can follow along at #18643.

I can't provide an answer regarding BuildBuddy specifically, but it seems to me that a remote caching/execution service should be able to infer cache hit/miss information by keeping track of the requests it receives from the client.
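Until a lighter format is available, the per-action runner can be tallied from the existing JSON execution log. The sketch below is one possible approach, not an official tool. It assumes the file written by `--execution_log_json_file` is a sequence of JSON objects concatenated back to back, and that each entry carries `runner`, `mnemonic` and `targetLabel` fields; check these names against the log your Bazel version produces. The sample entries are invented. Input is consumed in chunks so a large log never has to fit in memory at once.

```python
import json
from collections import Counter


def iter_json_objects(chunks):
    """Yield each top-level JSON object from an iterable of text chunks."""
    decoder = json.JSONDecoder()
    buf = ""
    for chunk in chunks:
        buf += chunk
        while True:
            buf = buf.lstrip()
            if not buf:
                break
            try:
                obj, end = decoder.raw_decode(buf)
            except json.JSONDecodeError:
                break  # object is incomplete; wait for the next chunk
            yield obj
            buf = buf[end:]


def read_chunks(path, size=1 << 20):
    """Read a text file lazily in fixed-size chunks."""
    with open(path, encoding="utf-8") as f:
        while chunk := f.read(size):
            yield chunk


def runners_by_action(entries):
    """Return (Counter of runners, list of (target label, mnemonic, runner))."""
    tally = Counter()
    per_action = []
    for entry in entries:
        runner = entry.get("runner", "unknown")  # assumed field name
        tally[runner] += 1
        per_action.append(
            (entry.get("targetLabel", ""), entry.get("mnemonic", ""), runner)
        )
    return tally, per_action


# Invented entries, deliberately split mid-object to exercise the chunking.
sample_log = [
    '{"mnemonic": "CppCompile", "runner": "remote cache hit", '
    '"targetLabel": "//a:a"}\n{"mnemonic": "Genrule", "run',
    'ner": "linux-sandbox", "targetLabel": "//b:b"}\n',
]
tally, per_action = runners_by_action(iter_json_objects(sample_log))
print(tally)
print(per_action)
```

For a real log, pass `iter_json_objects(read_chunks("exec_log.json"))` instead of the inline sample; the file name is a placeholder. This does not address the cost of producing the log in the first place, which is the concern raised above.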
-
Also, a tip for using the currently available execution log: prefer the binary format (it's smaller and cheaper to produce than JSON) and consider setting |
-
To elaborate a bit on how BuildBuddy manages to do this: we record all of the traffic going in and out of our Remote Cache. We also provide a high-level overview of these statistics at the top. Side note: @tjgq this is why we brought up the For cache requests that are setting |