A way to support repositories that need Bazel for being fetched #20464

lberki · 2023-12-07T07:54:09Z

lberki
Dec 7, 2023
Maintainer

Every once in a while we get a request of the sort "I want to generate a BUILD file from a Bazel action" or "I want to run Bazel as part of fetching an external repository" (see the most recent such proposal here).

The ideological objection to this is that fetching repositories should be about collecting source code that Bazel then builds, not about building things. However, there are some issues with this dogma; the most salient I know of is that fetching source code (especially if one wants to do it hermetically) sometimes requires unusual tools, and currently the only two options Bazel offers for building these tools are either to make them build-less (e.g. a shell script) or to reimplement building them in a hacky way inside Bazel, whose whole business is building software.

The architectural objection to this is that making BUILD files depend on the result of actions would alter how Bazel works very significantly. For example, the semantics of query, cquery and aquery would become much more complicated: query in particular offers the guarantee that it can return all the files needed to load and analyze a given top-level target. If reading BUILD files required running actions, query would also have to return all the input files of those actions. These commands would also need to run actions, talk to RBE, have the pertinent command line options, etc.

At the very least, the above makes implementing generating BUILD files from Bazel actions very difficult. That in practice translates to calendar time and bugs.

The straightforward solution to this is to just invoke Bazel in a subprocess during repository fetching. This can be done today and it works, but it is clunky, because one needs to manually make sure that various caches are shared by the outer and inner Bazel instances, and slow, because repository rules prohibit starting a long-lived Bazel server (#20447).
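As a rough illustration of the bookkeeping this workaround involves, the sketch below assembles a command line for an inner Bazel that gets its own output base and is pointed at caches supplied by the caller. It is written in Python purely for illustration (a real repository rule would do this in Starlark); the helper name `nested_bazel_argv` and all paths are hypothetical, while `--output_base`, `--repository_cache` and `--disk_cache` are existing Bazel options.

```python
from typing import List, Optional, Sequence


def nested_bazel_argv(
    bazel: str,
    output_base: str,
    targets: Sequence[str],
    repository_cache: Optional[str] = None,
    disk_cache: Optional[str] = None,
) -> List[str]:
    """Build the argv for an inner `bazel build` (hypothetical helper)."""
    if not targets:
        raise ValueError("at least one target is required")
    # Startup options such as --output_base must come before the command name.
    argv = [bazel, f"--output_base={output_base}", "build"]
    # Command options that point the inner instance at the shared caches.
    if repository_cache:
        argv.append(f"--repository_cache={repository_cache}")
    if disk_cache:
        argv.append(f"--disk_cache={disk_cache}")
    # "--" separates options from the target patterns.
    argv.append("--")
    argv.extend(targets)
    return argv
```

Every cache that should be shared has to be threaded through by hand like this, which is the clunkiness described above.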

I don't particularly like the idea of directly fixing #20447 because it pokes a hole in the nice property that repositories clean up after themselves after they fetch things and detecting whether a subprocess is Bazel sounds like too much DWIM.

However, we could teach Bazel to invoke Bazel as part of fetching a repository not as a free-form command line that happens to invoke Bazel, but explicitly. This would have a number of advantages:

The implementation can be very simple; all one needs to take care of is that the inner Bazel instance has a reasonable output base so that it doesn't litter files all over the place and that it dies when the outer one dies. It can even be thought of as a standalone feature request (although it's pushing the boundaries...)
No DWIM required to recognize inner Bazel invocations, therefore, repository rules can keep the guarantee that no processes are left running once they are done.
It can be gradually extended to make sharing "things" between the outer and inner Bazel instances easy (repo cache, disk cache, even action cache, command line options, etc.). Eventually, we could even merge the two instances if the semantics are clear enough and we are confident enough that it works and the path towards that is incremental.
The versions of the two Bazel instances are in sync by default.
Cleanup after bazel clean, bazel shutdown and the like on the outer instance can be arranged for.

All we need to do is to make sure that the interface is such that it's amenable to closer integration. Off the top of my head, I could imagine an interface that says "run Bazel in this repository, build those top-level targets, put their files from DefaultInfo in these given locations, make that a new repository". (This is pretty much what the proposal by @matts1 does.)
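To make the shape of such an interface a little more concrete, here is a minimal Python sketch of the data it would carry and of the "put their files in these given locations" step. Everything in it is an assumption for illustration: `NestedBuildRequest` and `materialize_repository` are hypothetical names, the build step itself is left out, and nothing here reflects an actual Bazel API.

```python
import shutil
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, List


@dataclass(frozen=True)
class NestedBuildRequest:
    """Hypothetical description of one 'build, then make a repo' request."""
    workspace: Path             # where the inner Bazel would run
    targets: List[str]          # top-level targets to build
    placements: Dict[str, str]  # built file (relative to workspace) -> path in the new repo


def materialize_repository(request: NestedBuildRequest, repo_root: Path) -> List[Path]:
    """Copy already-built outputs into the new repository directory.

    Running the build is out of scope; this only lays the outputs out.
    """
    created = []
    for built, dest in sorted(request.placements.items()):
        source = request.workspace / built
        target = repo_root / dest
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(source, target)
        created.append(target)
    return created
```

Keeping the request declarative (targets plus placements, no free-form command line) is what would leave room for the closer integration mentioned above.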

@matts1 @Wyverald @meteorcloudy WDYT?

cc @lizkammer

matts1 · 2023-12-07T08:25:55Z

matts1
Dec 7, 2023

Something like this definitely seems cleaner than the current nested bazel invocation workaround that we're using. However, my biggest concern is that while my proposal feels like a solution (although admittedly one that's potentially very difficult to get implemented), this proposal feels like it's still a hack, just less of one. I'm not necessarily opposed to that, mind you, but my main issue is that if this were implemented, I'd want a migration plan to a proper solution, ideally in a backwards compatible way. I really don't want to have something like this implemented, find that it mostly works, but that we need something like my proposal to get a proper clean solution, and find out that doing that wouldn't be backwards compatible.

I guess my other concern is that when you're talking about sharing things between the outer and inner bazel invocation, it seems like it'll start off as less work than my proposal, but as you share more and more things, it seems like it'd become harder and harder to maintain. My proposal, while more work, kind of just gives you everything at once for free.

In terms of the viability of the idea, it seems mostly good, but the biggest issue I can think of off the top of my head is that knowing when you need to rerun a repo rule is nontrivial (though maybe that's not the case if you have access to the bazel internals). Given that bazel's motto is fast, correct - choose two, I don't think we should release it as an explicit feature in a state where it imperfectly detects when it needs to rerun (even if it is mostly accurate). Consider that we need to rerun on the following:

Source files change
Build files change
.bzl files loaded by build files / other bzl files change (this one is pretty hard to detect)
System / home / workspace bazelrc changes
Toolchains change
Other repo rules it transitively depends on change
Non-visible source files change (hub-and-spoke bzlmod repos, where the spoke is invisible, but the alias is visible)

Currently, my implementation uses cquery to write a manifest of the files the target transitively depends on, then writes that to a .bzl file, and the repo rule depends on those files. It's mostly accurate, but it's definitely not 100%.
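The general idea of manifest-based rerun detection can be sketched as follows. This is not the implementation described above, only a minimal Python illustration under the assumption that the manifest is a plain list of file paths; `manifest_digest` and `needs_rerun` are hypothetical names.

```python
import hashlib
from pathlib import Path
from typing import Iterable


def manifest_digest(files: Iterable[Path]) -> str:
    """Hash the paths and contents of every file in the manifest."""
    digest = hashlib.sha256()
    for path in sorted(files, key=str):
        name = str(path).encode()
        data = path.read_bytes()
        # Length-prefix each field so adjacent entries cannot run together.
        digest.update(len(name).to_bytes(8, "big") + name)
        digest.update(len(data).to_bytes(8, "big") + data)
    return digest.hexdigest()


def needs_rerun(files: Iterable[Path], previous_digest: str) -> bool:
    """True if any tracked file changed since `previous_digest` was taken."""
    return manifest_digest(files) != previous_digest
```

A check like this only notices changes to files that made it into the manifest, which is why inputs such as bazelrc files, toolchains or non-visible sources from the list above are easy to miss.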

In conclusion, I'm not explicitly opposed to the idea, but I'd want to be very careful about it.

0 replies

sluongng · 2023-12-07T11:00:18Z

sluongng
Dec 7, 2023

I could be misreading it (and please do correct me if I'm wrong), but I think the proposal could be generalized to a simple design problem:

At bottom, a Bazel build is a DAG comprising artifact nodes and action edges. The ask is whether we could support a self-expanding DAG, as in being able to feed some of the artifact nodes (i.e. Starlark rules and BUILD files) in the initial analysis of the DAG to generate more nodes and edges.

Being able to run Bazel-in-Bazel is one way to solve it, which is essentially using 2 (or more) DAGs with the result of the upstream ones being fed to one final downstream DAG. However, as @matts1 mentioned, it does feel like a short-term, bloated solution. Realistically that could create 2 (or more) Bazel Java servers on the same client machine, which might significantly worsen the user experience.

Self-expanding DAG is a challenge indeed. As the comments in the proposal mentioned, Bazel is currently operating under some constraints, such as bazel query should be able to find all targets without having to execute actions, and self-expanding DAG might break those traditional constraints. I think if we could potentially "limit" how the DAG expansion could work, for example, rules could only access the "dynamically generated" targets through the parent targets, then we could still have a sound graph in the end.
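One way to picture the "only through the parent" restriction is the toy model below. It is a Python sketch under strong simplifying assumptions (a single expansion round, generated targets with no dependencies of their own); `expand_graph` and the `parent%child` naming scheme are made up for illustration and are not Bazel concepts.

```python
from typing import Callable, Dict, List

# A generator maps a parent target to the names of the targets it produces.
Generator = Callable[[str], List[str]]


def expand_graph(
    static_deps: Dict[str, List[str]],
    generators: Dict[str, Generator],
) -> Dict[str, List[str]]:
    """Return a new graph where each generating target gains generated children.

    Generated targets are namespaced under their parent ("parent%child") and
    are only added as edges from that parent, so no other target refers to them.
    """
    graph = {node: list(deps) for node, deps in static_deps.items()}
    for parent, generate in generators.items():
        if parent not in graph:
            raise KeyError(f"unknown parent target: {parent}")
        for child in generate(parent):
            generated = f"{parent}%{child}"
            graph[generated] = []  # generated leaves have no deps in this model
            graph[parent].append(generated)
    return graph
```

Because the statically known part of the graph is unchanged by expansion, anything that only walks the static edges still sees a complete, sound graph.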

16 replies

lberki Dec 13, 2023
Maintainer Author

@fmeum , do you think nested Bazel instances would help in this case?

I realize that then we'd have to guarantee that the toolchain resolution in the "inner" Bazel works in the same way as in the "outer" Bazel, but that's a problem even if we do it within a single Bazel invocation because toolchain resolution depends on the set of all registered toolchains, which is not known until we know every repository, which in your plan would depend on the set of available toolchains.

fmeum Dec 13, 2023
Collaborator

@lberki

I realize that then we'd have to guarantee that the toolchain resolution in the "inner" Bazel works in the same way as in the "outer" Bazel

This is the main reason why I am skeptical that nested instances will help. Debugging toolchain resolution in a nested Bazel invocation in a repo rule sounds so hopeless that I would much rather avoid that. I'm also worried that the nested instance handling may not work as well on Windows. The current "evil genius" version of the hack also requires users to maintain a tools/bazel, which makes it hard to use this pattern in rulesets.

but that's a problem even if we do it within a single Bazel invocation because toolchain resolution depends on the set of all registered toolchains, which is not known until we know every repository, which in your plan would depend on the set of available toolchains.

I don't think that's true anymore with Bzlmod: All toolchain registrations happen declaratively in MODULE.bazel files, which have been evaluated by the time any custom repo rule could run. (I am lying a bit here: the native Android repo rules have special capabilities and some Bazel-provided repo rules, such as http_archive, are used during module resolution, but those wouldn't want to use build rules anyway).

Wyverald Dec 13, 2023
Maintainer

but that's a problem even if we do it within a single Bazel invocation because toolchain resolution depends on the set of all registered toolchains, which is not known until we know every repository, which in your plan would depend on the set of available toolchains.

I don't think that's true anymore with Bzlmod: All toolchain registrations happen declaratively in MODULE.bazel files, which have been evaluated by the time any custom repo rule could run. (I am lying a bit here: the native Android repo rules have special capabilities and some Bazel-provided repo rules, such as http_archive, are used during module resolution, but those wouldn't want to use build rules anyway).

Isn't the problem that we need to actually resolve the toolchains, not that we just need to know which toolchains are registered? If we go down the "regular rules can generate repos" route, then there's still a chicken-and-egg problem if you're not careful. To fetch a repo @@foo, you might need to build //:my_tool, which might need a toolchain. Suddenly you need to fetch any repo that contains a registered toolchain, which means that you're hosed if anything in @@foo is registered.
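The chicken-and-egg situation is an ordinary dependency cycle, which the following Python sketch makes explicit. The node names are hypothetical labels for the steps described above, and `find_cycle` is a generic depth-first search, not anything Bazel-specific.

```python
from typing import Dict, List, Optional


def find_cycle(deps: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one dependency cycle as a list of nodes, or None if acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {node: WHITE for node in deps}
    stack: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GREY
        stack.append(node)
        for dep in deps.get(node, []):
            state = color.get(dep, WHITE)
            if state == GREY:
                # Back edge: the cycle is the stack from `dep` onwards.
                return stack[stack.index(dep):] + [dep]
            if state == WHITE:
                found = visit(dep)
                if found:
                    return found
        stack.pop()
        color[node] = BLACK
        return None

    for node in list(deps):
        if color[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None


# The scenario from the comment, with hypothetical node names.
deps = {
    "fetch @@foo": ["build //:my_tool"],
    "build //:my_tool": ["resolve toolchains"],
    "resolve toolchains": ["fetch @@foo"],  # @@foo registers a toolchain
}
```

Dropping the last edge (that is, @@foo not registering any toolchain) makes the graph acyclic again.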

fmeum Dec 13, 2023
Collaborator

That's true, but it's true in the same way as for any other instance of "no cycles", which is at least a well-known and well-understood restriction. I don't think that this will result in confusion as external repos with toolchain definitions typically don't have many dependencies.

In the concrete cases I have in mind, @@foo would be the @maven hub repo, the maven_install module extension or a Go module repo with BUILD files generated by gazelle, neither of which would carry a toolchain.

matts1 Dec 16, 2023

The current "evil genius" version of the hack also requires users to maintain a tools/bazel, which makes it hard to use this pattern in rulesets.

+1 to this. The way I see it is that you're never going to get this in a ruleset, because the rule author wouldn't be able to do so unless it was considered the standard way of doing things. I think it's reasonable to have nested bazel invocations as a workaround to make stuff work, but I think that it's always going to be too much of a workaround to make it the way to do things.

lberki · 2023-12-13T12:56:03Z

lberki
Dec 13, 2023
Maintainer Author

@lberki

I realize that then we'd have to guarantee that the toolchain resolution in the "inner" Bazel works in the same way as in the "outer" Bazel

This is the main reason why I am skeptical that nested instances will help. Debugging toolchain resolution in a nested Bazel invocation in a repo rule sounds so hopeless that I would much rather avoid that. I'm also worried that the nested instance handling may not work as well on Windows. The current "evil genius" version of the hack also requires users to maintain a tools/bazel, which makes it hard to use this pattern in rulesets.

re: debugging nested Bazel invocations, yeah, it's not my idea of a fun time either. But then again, doing the same within a single Bazel invocation would also increase the state space of Bazel, so the question is not whether we need to add debugging tools, but what kind of debugging tools we'd need to add, and I can't tell which case would make debugging simpler, assuming we provide good enough tools.

re: Windows, I think it's extra work, but somewhat surprisingly, process handling on Windows is actually nicer than on Linux so I'm not worried that we'd not be able to do what we want, but it may be extra work indeed.

but that's a problem even if we do it within a single Bazel invocation because toolchain resolution depends on the set of all registered toolchains, which is not known until we know every repository, which in your plan would depend on the set of available toolchains.

I don't think that's true anymore with Bzlmod: All toolchain registrations happen declaratively in MODULE.bazel files, which have been evaluated by the time any custom repo rule could run. (I am lying a bit here: the native Android repo rules have special capabilities and some Bazel-provided repo rules, such as http_archive, are used during module resolution, but those wouldn't want to use build rules anyway).

Touché! @Wyverald , WDYT?

4 replies

matts1 Dec 14, 2023

so the question is not whether we need to add debugging tools, but what kind of debugging tools we'd need to add, and I can't tell which case would make debugging simpler, assuming we provide good enough tools.

IIUC (which I'm not sure I do), I don't agree here. Regular rules in a single bazel invocation should be trivial to debug, and we already have the tools for it. It's the same way you debug any other bazel action. For example:

To repro actions, you can use -s, --sandbox_debug, --verbose_failures
To ensure you don't run into problems in the first place, you can just write tests that depend upon those actions
To debug the module extension itself, you can print out the providers for a rule.
To debug toolchains, I think you should be able to just use whatever mechanism you currently use for debugging toolchains, though you'd know more about this than me.

Could you clarify in what ways you believe debugging a single bazel invocation would be hard?

lberki Dec 15, 2023
Maintainer Author

You're right that regular rules are easy to debug (not quite trivial, but it's not too painful), but this proposed feature would make things more complicated to understand. For example, right now, you can use bazel [ca]?query to tell why a change to a particular file invalidated your target. If one were able to run actions before fetching repositories, that would stop being true (or require extra work and new bells and whistles on these commands).

Again, this is not an absolute contraindication, "only" more work that's required.

fmeum Dec 15, 2023
Collaborator

For example, right now, you can use bazel [ca]?query to tell why a change to a particular file invalidated your target. If one was able to run actions before fetching repositories, that would stop being true (or require extra work and new bells and whistles on these commands)

@lberki It would help me understand your concern better if we could make it concrete on an example. Let's say we have a target //:foo that depends on @some_repo//:dep, which is generated by a repository rule that depends on the source file //:magic_file.txt via a label attr. Then //:foo will be invalidated whenever //:magic_file.txt is changed (modulo change pruning).

I don't think that it's possible to understand why with any of query, cquery or aquery today; you would instead have to dig into the repo rule's source or use the workspace log feature. The particular way in which the repository rule generates @some_repo//:dep, be it via direct repository_ctx.execute, a nested Bazel invocation or a new "depend on action outputs from repo rules" feature, doesn't seem to affect this (lack of) debuggability.

Wouldn't we just be replacing direct process execution with repository_ctx.execute(...), which is as hard to debug as it gets, with a build rule, which is at least inspectable with aquery, even if the link between the repo rule and the action isn't discoverable without extra work?

tbaing Dec 15, 2023

For example, right now, you can use bazel [ca]?query to tell why a change to a particular file invalidated your target. If one was able to run actions before fetching repositories, that would stop being true (or require extra work and new bells and whistles on these commands)

Can we run only the actions that are required to generate the repo(s) to which the [ac]?query will apply, but go no further? In which case, 1) this disadvantage applies only if you choose to put it into your build so you have control over whether this will affect you, and 2) if this was work you were doing today via repo rules, you'd still be doing these things in order to generate the repos so you can query over them. At 30,000 feet, this doesn't seem like it would stop us from using [ac]?query, only that it would mean we might have to wait longer than if we had a build that didn't use this feature.

lberki · 2023-12-18T11:39:33Z

lberki
Dec 18, 2023
Maintainer Author

Summarizing the above discussions, it looks like we haven't identified any fundamental issues, the long list of semantic problems we'll need to work through seems to be shorter than I initially anticipated, and it seems like the complications this entails would not be much more hairy than the complications any workaround would entail.

Which means that the main obstacle is engineering time (aka. opportunity cost)

@meteorcloudy would you be fine with scheduling a prototype (without commitment) sometime in the first half of 2024?

4 replies

Wyverald Dec 19, 2023
Maintainer

IMO the value proposition doesn't quite cut it. Let me try to lay down my understanding:

The problem we're having is essentially that we want some build rules to run before repo rules. (Plenty of equivalent alternative formulations exist.)

rctx.run_bazel could solve the immediate pain (in an arguably somewhat painful way itself), and takes less time to implement (my guess is ~a few months). It's unlikely to be adopted by rulesets, whereas people who want to adopt this will have to write code around this new API, and they already have hacky workarounds anyway ("tools/bazel").

"Build rules generating BUILD files" ("BrgB"?) seems an eventual desirable state (save for the [ac]?query question), but will likely take multiple years to implement. If we get there, though, it looks like it'll get a lot of buy-in (people have been asking about remote-able repo rules for years), from corp users and ruleset authors alike.

The problem of doing rctx.run_bazel, then, is that 1) it pushes out the eventual desirable state even further, 2) adds more migration cost on top of it (switching APIs), and 3) will likely have low adoption rate.

So I'd honestly like to explore BrgB instead, if we have the resources.

Incidentally, as I was discussing with Fabian earlier today -- the "true repo cache" feature might also seem redundant if we're going for BrgB. But my take is that "true repo cache" 1) takes much less time to implement (~weeks), 2) is basically transparent to the user (no migration cost), and 3) is (again) basically transparent to the user (high adoption rate). So it brings much more near-term value to be worthwhile.

sluongng Dec 19, 2023

@Wyverald well articulated. I was trying to describe BrgB in my comment but could not express it as well.

Personally, I think BrgB could be divided into smaller, achievable goals, with some that could satisfy even the [ac]?query constraint. I think we should be able to satisfy the current use case for "run_bazel" earlier that way.

matts1 Dec 19, 2023

Maybe I misunderstood, but my interpretation of @lberki's comment was:
"it looks like we haven't identified any fundamental issues [with BrgB]"
"than the complications any workaround [nested bazel] would entail"
"scheduling a prototype [of BrgB]"

lberki Dec 20, 2023
Maintainer Author

I think the misunderstanding is that "we haven't identified any fundamental issues" is not the same as "there are no fundamental issues"; it's a complex area and I could very well imagine that we run into things while building a prototype that we have never thought about.

If @Wyverald thinks BrgB is a better goal to shoot for, I'm game. But it'll bear user-visible fruit much later, especially since it competes for engineering time with the migration to bzlmod and the latter has way higher priority: the BrgB plane is still on the ground where it's safe, but bzlmod is already very much in flight and the longer it takes, the longer we need to support the dual WORKSPACE/MODULE.bazel world, which is in no one's interest.

In addition, commencing BrgB after --noenable_workspace is viable probably makes its implementation much easier.
