feat(matching): use the Hungarian Algorithm for unordered matching #50
Our current approach to unordered node matching relies on a naive assumption: that all nodes possess an identifier. While this holds true for most nodes we've encountered thus far, such as method and property declarations within a Java class, it proves insufficient when attempting to match nodes lacking a label, like static blocks in Java. In such cases, calculations for matchings may yield incorrect results, consequently leading to erroneous merges.
This pull request matches unordered nodes by modeling the matching as an Assignment Problem and solving it with the Hungarian Algorithm. This approach mirrors the one used in jDime.
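To make the Assignment Problem concrete, here is a minimal sketch that finds a maximum-weight one-to-one assignment by exhaustive search. It is deliberately not the Hungarian Algorithm, it only shows what is being optimized. The function name `best_assignment` and the use of `Vec<Vec<i64>>` for the weights are assumptions made for this illustration and are not part of the pull request.

```rust
/// Try every one-to-one assignment of rows to columns of a square weight
/// matrix and return (best total weight, column chosen for each row).
/// This brute force is O(n!) and is for illustration only; the Hungarian
/// Algorithm solves the same problem in polynomial time.
fn best_assignment(weights: &[Vec<i64>]) -> (i64, Vec<usize>) {
    fn search(
        weights: &[Vec<i64>],
        row: usize,
        used: &mut [bool],
        current: &mut Vec<usize>,
        total: i64,
        best: &mut Option<(i64, Vec<usize>)>,
    ) {
        if row == weights.len() {
            // Every row has a column: keep this assignment if it beats the best so far.
            if best.as_ref().map_or(true, |(b, _)| total > *b) {
                *best = Some((total, current.clone()));
            }
            return;
        }
        for col in 0..weights.len() {
            if !used[col] {
                used[col] = true;
                current.push(col);
                search(weights, row + 1, used, current, total + weights[row][col], best);
                current.pop();
                used[col] = false;
            }
        }
    }

    let mut best = None;
    let mut used = vec![false; weights.len()];
    let mut current = Vec::new();
    search(weights, 0, &mut used, &mut current, 0, &mut best);
    best.unwrap_or((0, Vec::new()))
}
```

With weights `[[1, 5], [4, 3]]`, where entry (i, j) scores how well child i on one side matches child j on the other, the best assignment pairs row 0 with column 1 and row 1 with column 0, for a total of 9.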
Because the Hungarian Algorithm is well known, we rely on the implementation provided by the pathfinding crate rather than writing our own. This keeps our side of the implementation small: we only need to build the weight matrix and extract the matching information from the solution.
A workaround was needed because pathfinding expects the input weight matrix to have the same number of rows and columns, which does not always hold in our case: the two nodes being matched can have different numbers of children. The workaround pads the matrix to a square by filling the missing rows or columns with 0.
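The padding workaround described above can be sketched as follows. This is a standalone illustration using only the standard library, not the code from this pull request: the helper names `pad_to_square` and `real_matches`, the `i64` weight type, and the convention that `assignment[row]` holds the column chosen for that row are all assumptions.

```rust
/// Copy a rectangular weight matrix into a square one, filling the extra
/// rows or columns with zero weights.
fn pad_to_square(weights: &[Vec<i64>]) -> Vec<Vec<i64>> {
    let rows = weights.len();
    // Use the longest row so that ragged input cannot index out of bounds.
    let cols = weights.iter().map(Vec::len).max().unwrap_or(0);
    let size = rows.max(cols);

    let mut square = vec![vec![0; size]; size];
    for (i, row) in weights.iter().enumerate() {
        for (j, &w) in row.iter().enumerate() {
            square[i][j] = w;
        }
    }
    square
}

/// Given a solution over the padded matrix, keep only the (row, column)
/// pairs that refer to real children on both sides. Pairs that land in a
/// padded row or column do not correspond to any node and are dropped.
fn real_matches(assignment: &[usize], rows: usize, cols: usize) -> Vec<(usize, usize)> {
    assignment
        .iter()
        .copied()
        .enumerate()
        .filter(|&(row, col)| row < rows && col < cols)
        .collect()
}
```

For a 2x3 input, `pad_to_square` appends one all-zero row. If the solver then returns the assignment `[0, 2, 1]`, `real_matches(&[0, 2, 1], 2, 3)` keeps `(0, 0)` and `(1, 2)` and discards the pair that belongs to the padded third row.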
For now, our naive label-based implementation is bypassed entirely. In a follow-up pull request, the idea is to fall back to the Hungarian Algorithm only when the nodes are unlabeled, as it is significantly more expensive than simply matching identifiers.