[chore] [WIP] Add script that runs go mod tidy in topological order
#36723
Context
In a repo with multiple Go modules, some of which depend on each other, and where internal dependencies are managed with replace statements, an update to the go.mod of one module can require running go mod tidy on the modules that depend on it.
We have the make gotidy command available, which runs go mod tidy on every module in the repo. But it does so in alphabetical order, so if alpha depends on bravo, which depends on charlie, and charlie's go.mod is modified, bravo will get an update but alpha will not, because it gets tidied before the other two.
As a result, go mod tidy sometimes needs to be run multiple times before things converge. This caused an issue in the latest release process in core, where the recently modified (link to relevant PR) check-contrib check failed because running make gotidy only once caused an error when running make generate (link to the relevant issue).
Description
This PR attempts to solve this issue by introducing a make topotidy command, which runs go mod tidy commands in topological order, i.e. in an order such that dependencies always come before their dependents. This ensures that changes to a go.mod are always fully propagated through the dependency graph.
Unfortunately, topological order alone doesn't quite cut it, as there are loops in the dependency graph (at the moment, between the two Datadog components, and between the three OTel-Arrow modules). It's not entirely clear whether convergence is even possible in this context, but at the very least, we may need to run go mod tidy on a module twice to guarantee that all changes have had a chance to propagate to every other module.
To solve this, I used Tarjan's algorithm, which sorts the graph topologically while isolating its strongly connected components (SCCs, parts where no topological order is possible), and applied a very naive algorithm to each SCC (which will hopefully remain few and small).
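For readers unfamiliar with the approach, here is a minimal sketch of the idea, not the script from this PR. It assumes the dependency graph has already been extracted into a dict mapping each module to the modules it depends on. The function names (tidy_order, tidy_plan) and the delta/echo modules are hypothetical, and tidying every member of a cycle twice is only one guess at what a naive per-SCC strategy could look like.

```python
from typing import Dict, List


def tidy_order(deps: Dict[str, List[str]]) -> List[List[str]]:
    """Group modules into SCCs using Tarjan's algorithm.

    `deps` maps each module to the modules it depends on. Because edges
    point from dependent to dependency, Tarjan's algorithm emits each
    component only after every component it depends on.
    """
    index: Dict[str, int] = {}
    lowlink: Dict[str, int] = {}
    stack: List[str] = []
    on_stack = set()
    components: List[List[str]] = []
    counter = 0

    def visit(node: str) -> None:
        nonlocal counter
        index[node] = lowlink[node] = counter
        counter += 1
        stack.append(node)
        on_stack.add(node)
        for dep in deps.get(node, []):
            if dep not in index:
                visit(dep)
                lowlink[node] = min(lowlink[node], lowlink[dep])
            elif dep in on_stack:
                # Back edge: dep belongs to the component being built.
                lowlink[node] = min(lowlink[node], index[dep])
        if lowlink[node] == index[node]:
            # node is the root of a component: pop its members off the stack.
            component = []
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.append(member)
                if member == node:
                    break
            components.append(sorted(component))

    for module in sorted(deps):
        if module not in index:
            visit(module)
    return components


def tidy_plan(deps: Dict[str, List[str]]) -> List[str]:
    """Flatten the components into a list of modules to tidy, in order."""
    plan: List[str] = []
    for component in tidy_order(deps):
        if len(component) == 1:
            plan.extend(component)
        else:
            # Assumed naive strategy for cycles: tidy every member twice.
            plan.extend(component * 2)
    return plan


# alpha -> bravo -> charlie is the chain from the Context section;
# delta <-> echo is a made-up cycle.
example = {
    "alpha": ["bravo"],
    "bravo": ["charlie"],
    "charlie": [],
    "delta": ["echo", "charlie"],
    "echo": ["delta"],
}
print(tidy_plan(example))
```

With this input, charlie is tidied before bravo and bravo before alpha, so a change to charlie's go.mod reaches alpha in a single pass. The recursive visit is fine for a graph the size of a repo's module list, but would need an explicit stack for very deep graphs.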
I implemented all this as a Python script in the internal/buildscripts directory. I believe this is the first Python script in this repo, so please reach out if there are concerns about this.
Link to tracking issue
Updates this issue on core. Unless we decide to replace make gotidy entirely with this script, we will need a PR on core to use make topotidy inside make check-contrib.
Testing
I locally replicated the latest release process, and checked that all "updates to go.mod needed" were eliminated when using make topotidy instead of make gotidy inside make check-contrib.
Documentation
I added a comment at the top of the script explaining its rationale and how it works.