
Add mergiraf to the evaluated tools #380

Merged (30 commits) on Jan 27, 2025

Conversation

@wetneb (Contributor) commented on Nov 13, 2024

Thank you so much for this huge evaluation effort. I haven't managed to run it entirely myself because of some mismatching dependencies on my end, but I have made a little script which should (hopefully) add mergiraf to the evaluated tools. I have tested the script interactively and it seems to respect the contract, but I am not sure if I have done everything that is needed to evaluate it alongside the other tools.
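In outline, the script amounts to something like the following sketch (the argument contract and fallback behavior shown here are illustrative assumptions, not necessarily the exact contract used by the evaluation):

#!/usr/bin/env bash
# Sketch of a mergiraf merge-tool wrapper. The <clone_dir> <branch1>
# <branch2> contract is assumed to match the other merge-tool scripts.
set -euo pipefail
clone_dir="$1"; branch1="$2"; branch2="$3"
cd "$clone_dir"
# mergiraf is assumed to be registered as a git merge driver in the
# repository's configuration, so git falls back to its own merge for
# file types mergiraf does not handle.
git checkout "$branch1" --force
git merge --no-edit "$branch2"   # non-zero exit status signals conflicts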

@mernst (Collaborator) left a comment

Thank you for this patch. I just learned of mergiraf a week ago and had put "add mergiraf" on our to-do list. I appreciate your help in doing that.

You are right that the full pipeline is challenging to run. I'm sorry about that.

This patch looks great overall. I have a few comments for you.

(Two review comments on src/scripts/merge_tools/mergiraf.sh, since resolved.)
@mernst (Collaborator) commented on Nov 14, 2024

I have fixed a few things. Now, a new "expected" value for the merge results must be committed. (Tests are currently failing because there is no "expected" value for Mergiraf results.)

@benedikt-schesch self-requested a review on November 15, 2024.
@benedikt-schesch (Owner) commented

The results have been posted to this branch and are visible in results/combined/tables/all/table_summary.tex. It looks very promising: unhandled merges drop from 51% to 37% compared to git, but the number of incorrect merges rises from 3% to 6%. The tool is already much better than Spork and IntelliMerge, but it would be great to look into results/combined/result_adjusted.csv for merges where the tool gives an incorrect result but git does not. With src/python/replay_merge.py you can replay each merge to see the differences.
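For example, something along these lines (the column positions and result labels are guesses; check the CSV header first):

# Inspect the header to learn the real column names and positions:
head -n 1 results/combined/result_adjusted.csv
# Hypothetical layout: column 1 = merge index, column 7 = git result,
# column 9 = mergiraf result. List merges failing tests only with mergiraf:
awk -F, '$9 == "Failed_tests" && $7 != "Failed_tests" { print $1 }' \
    results/combined/result_adjusted.csv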

@benedikt-schesch (Owner) commented

The small test is still failing, which is a bit concerning. I updated the results; on my local machine the test passes, but in the GitHub CI/CD it fails. It fails only because the hash of the commit is different, which makes me suspect that your tool might not be fully platform-agnostic, i.e. its output might differ slightly across platforms (for example, via bash sorting functions). Do you think that could be the case?
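For example, plain sort output depends on the locale, so any hash computed over sorted output can differ between machines:

# Byte order vs. locale collation: these two invocations can disagree.
printf 'a\nB\n' | LC_ALL=C sort             # "B" first (plain byte order)
printf 'a\nB\n' | LC_ALL=en_US.UTF-8 sort   # "a" first on many systems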

@wetneb (Contributor, Author) commented on Nov 15, 2024

Thanks a lot for this preliminary report! That's a good question; I don't know Rust well enough to say for sure. It could be that there are differences in hashing, or in line endings (CRLF vs. LF). The Git version is also likely to make a difference. I don't know how feasible it would be to extract an example merge scenario that behaves differently on the two architectures; probably not that straightforward?
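Line endings, at least, are easy to rule out, since git can report what is actually stored and checked out for each tracked file:

# Index vs. working-tree line endings for tracked files; a file reported
# as crlf on one platform and lf on another would change the merged
# output byte-for-byte, and with it the hash.
git ls-files --eol | head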

@benedikt-schesch (Owner) commented

I think git handles this fine; it would be very concerning if not. I will try to isolate which of the merges is affected by this.

@benedikt-schesch (Owner) commented

It seems that all the merges in the pedrovgs/Algorithms repository (as seen in the CI/CD pipeline) have a hash different from the expected one. This discrepancy occurs for all merges that start with index 2 in that repository. This makes me suspect that some character or other element in the repository is not being handled properly by mergiraf.

To reproduce the issue, you could either:

  1. Find a machine where the test fails (if you have a MacBook, it often shows different behavior), or
  2. Use act to locally run the GitHub Actions CI/CD pipeline and investigate why the merge produces a different hash.

I’ve encountered this problem before, and here’s the general debugging workflow I followed:

Steps to Diagnose:

  1. Keep the Merge Workspaces
    Set DELETE_WORKDIRS to False in this file. (Ideally, this should be a flag, not a variable.) This will ensure the merges are not cleaned up after testing.

  2. Run the Small Test
    Execute make small-test to reproduce the behavior.

  3. Test on Different Machines or CI/CD with act
    Run the test on:

    • A passing machine and a failing machine, or
    • Your local machine using act or directly through CI/CD.
  4. Analyze the Hash Mismatch
    On the problematic merge, run the following command:

    sha256sum <(export LC_ALL=C; export LC_COLLATE=C; cd \
    <local_repo_path>; \
    find . -type f -not -path '*/\.git*' -exec sha256sum {} \; | sort)

    Replace <local_repo_path> with the actual path to the local repository.

    Compare the hash values between a passing and a failing machine:

    • Use diff to find differences between the two sets of merge outputs.
    • Alternatively, identify which specific file is causing the hash mismatch by applying the same command to individual files.

Typically, the source of the issue becomes apparent from the diff. For reference, the tool I use for running GitHub Actions locally is nektos/act.
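Concretely, the per-file listing from step 4 can be saved on each machine and compared directly; the differing lines name the culprit file (the output file names below are illustrative):

# On both the passing and the failing machine, dump per-file hashes:
(export LC_ALL=C LC_COLLATE=C; cd <local_repo_path>; \
 find . -type f -not -path '*/\.git*' -exec sha256sum {} \; | sort) \
 > hashes_$(hostname).txt
# Then diff the two listings to spot the file whose hash differs:
diff hashes_passing.txt hashes_failing.txt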

@monperrus commented

Hi all, how can we make progress here? Thanks!

@wetneb (Contributor, Author) commented on Jan 15, 2025

I would also like to make progress on this, but it is difficult for me to get access to a macOS machine, and the process above looks quite complicated, so it's sadly not something I am counting on doing soon.

@monperrus commented on Jan 15, 2025 via email

@mernst (Collaborator) commented on Jan 15, 2025

I, too, would like to see this resolved. The problem is that Mergiraf seems to be nondeterministic, or at least it gives different results in different environments. We took a quick look and there was nothing obvious to explain the problem. Benedikt explained a way to proceed with debugging.

@benedikt-schesch (Owner) commented

@monperrus @wetneb Running this pipeline isn't straightforward, since we are testing multiple repositories. While a macOS machine can be useful, it isn't strictly necessary: you can reproduce the failing hashes identified in the CI/CD pipeline using act, as sketched below. Give me a week to investigate the non-deterministic outputs from Mergiraf, and I'll get back to you.
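For instance, with act installed and Docker running, something like this reproduces the CI jobs locally (the job name is hypothetical; pick the real one from the listing):

# List the jobs defined under .github/workflows, then run the failing
# one in a Linux container:
act -l
act -j small-test   # hypothetical job name; use one reported by act -l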

@benedikt-schesch (Owner) commented

@wetneb I get the following outputs for mergiraf on the same merge.
On my machine:

<<<<<<< left
      while (array[left] < 0 && left < right) {
        left++;
      }
||||||| base
=======
      while (array[left] < 0 && left < right)
          left++;
>>>>>>> right
<<<<<<< left
      while (array[right] >= 0 && left < right) {
        right--;
      }
||||||| base
=======
      while (array[right] >= 0 && left < right)
          right--;
>>>>>>> right

In CI/CD:

<<<<<<< HEAD
      while (array[left] < 0 && left < right) {
        left++;
      }
||||||| 60d87e9
=======
      while (array[left] < 0 && left < right)
          left++;
>>>>>>> ___MERGE_TESTER_RIGHT
<<<<<<< HEAD
      while (array[right] >= 0 && left < right) {
        right--;
      }
||||||| 60d87e9
=======
      while (array[right] >= 0 && left < right)
          right--;
>>>>>>> ___MERGE_TESTER_RIGHT

Do you know why it might use the left/right naming on my computer but not in CI/CD? Is there a way to control this in mergiraf?

@benedikt-schesch (Owner) commented

@monperrus @wetneb @mernst I reran our evaluation with the latest version of Mergiraf (0.4.0), and overall the results have not changed much since 0.3.0. The tool is pretty good and quite aggressive in merging, but this aggressiveness leads to failing tests.
Here is the cost plot (the cost of an incorrect merge relative to raising a merge conflict; a low cost means a failing test costs little) together with the effort reduction score (higher is better): https://github.com/wetneb/AST-Merging-Evaluation/blob/add_mergiraf/results/combined/plots/tools/cost_with_manual.pdf
This means that for a small cost (here, <4) Mergiraf is the best tool out there, which is very good; for higher costs, git and Plumelib merging take the lead because they are less aggressive.
Full table with all the results: https://github.com/wetneb/AST-Merging-Evaluation/blob/add_mergiraf/results/combined/tables/all/table_summary.tex

The CI/CD is failing because Mergiraf's conflict markers are inconsistent across platforms. I would really love for this to be fixed in Mergiraf, because then I would be happy to merge this branch into main. I am 95% confident that the results are correct despite the failing CI/CD.

@wetneb (Contributor, Author) commented on Jan 24, 2025

Thank you so much again!

The differences in output you are getting are interesting. Yes, the names of the base, left, and right revisions are supplied to Mergiraf via its -s, -x, and -y arguments. One likely reason they appear in CI but not on your computer is a different git version: git is only able to supply those revision names to the merge driver from v2.44.0 onward (see the corresponding PR).
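For reference, a driver registration along these lines (adapted from mergiraf's documented setup; the exact flag set may vary by version) only receives real revision names when git expands the %S/%X/%Y placeholders, which it does from v2.44.0 on:

# Register mergiraf as a git merge driver. %S/%X/%Y carry the revision
# names that end up in the conflict markers; with older git these names
# are not supplied, and mergiraf falls back to generic labels such as
# "left", "base", and "right".
git config merge.mergiraf.name mergiraf
git config merge.mergiraf.driver \
  'mergiraf merge --git %O %A %B -s %S -x %X -y %Y'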

@mernst (Collaborator) commented on Jan 25, 2025

In CI, git is currently "git version 2.48.1".

@benedikt-schesch (Owner) commented

We have 2.43.5 on our machine, so that makes sense. We need to upgrade to 2.44+ to make things consistent. I'll also add a check for this, along the lines sketched below.
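A version gate along these lines would do it (a sketch; assumes a sort that supports -V):

# Fail fast unless git is at least 2.44.0, the first version that passes
# revision names (%S/%X/%Y) to merge drivers.
required="2.44.0"
current=$(git version | awk '{print $3}')
lowest=$(printf '%s\n%s\n' "$required" "$current" | sort -V | head -n 1)
if [ "$lowest" != "$required" ]; then
  echo "git >= $required required, found $current" >&2
  exit 1
fi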

@monperrus commented

Great progress towards merging, thanks a lot @benedikt-schesch @wetneb!

@benedikt-schesch (Owner) commented

Thank you, I hope to close this by tomorrow. The small test is now fixed, and git 2.44+ is now a requirement. I just need to rerun mergiraf on the dataset, because the results were computed with the old git version; that will only change the merge hashes and won't affect the results we previously discussed.

@benedikt-schesch (Owner) commented

Done. @wetneb, thank you very much for making this tool and helping us integrate it. As said before, the tool tends to produce a higher number of test failures than git. I would recommend looking into https://github.com/benedikt-schesch/AST-Merging-Evaluation/blob/main/results/combined/result_adjusted.csv for merges that lead to a test failure with mergiraf but not with git (git might either raise a conflict or get the merge right). To analyze them manually, I recommend running https://github.com/benedikt-schesch/AST-Merging-Evaluation/blob/main/src/python/replay_merge.py as follows:
python3 src/python/replay_merge.py --idx INDEX
and then comparing the different merge outputs. If you need any other help, let me know. If you do analyze some merges with it, it would be very nice to share your thoughts and analysis on the examples you look at.

@benedikt-schesch merged commit 9238a73 into benedikt-schesch:main on Jan 27, 2025 (4 checks passed).
@monperrus commented

🚀 that's great.
