
Add mergiraf to the evaluated tools #380

Merged (30 commits) on Jan 27, 2025

Conversation

@wetneb (Contributor) commented on Nov 13, 2024

Thank you so much for this huge evaluation effort. I haven't managed to run it entirely myself because of some mismatching dependencies on my end, but I have made a little script which should (hopefully) add mergiraf to the evaluated tools. I have tested the script interactively and it seems to respect the contract, but I am not sure if I have done everything that is needed to evaluate it alongside the other tools.
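In outline, the script amounts to something like the following sketch (the argument contract and fallback behavior shown here are illustrative assumptions, not necessarily the exact contract used by the evaluation):

#!/usr/bin/env bash
# Sketch of a mergiraf merge-tool wrapper. The <clone_dir> <branch1>
# <branch2> contract is assumed to match the other merge-tool scripts.
set -euo pipefail
clone_dir="$1"; branch1="$2"; branch2="$3"
cd "$clone_dir"
# mergiraf is assumed to be registered as a git merge driver in the
# repository's configuration, so git falls back to its own merge for
# file types mergiraf does not handle.
git checkout "$branch1" --force
git merge --no-edit "$branch2"   # non-zero exit status signals conflicts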

@mernst (Collaborator) left a comment

Thank you for this patch. I just learned of mergiraf a week ago and had put "add mergiraf" on our to-do list. I appreciate your help in doing that.

You are right that the full pipeline is challenging to run. I'm sorry about that.

This patch looks great overall. I have a few comments for you.

(Two review comments on src/scripts/merge_tools/mergiraf.sh, since resolved.)
@mernst (Collaborator) commented on Nov 14, 2024

I have fixed a few things. Now, a new "expected" value for the merge results must be committed. (Tests are currently failing because there is no "expected" value for Mergiraf results.)

@benedikt-schesch self-requested a review on November 15, 2024.
@benedikt-schesch (Owner) commented

The results have been posted to this branch and are visible in results/combined/tables/all/table_summary.tex. It looks very promising: unhandled merges drop from 51% to 37% compared to git, but the number of incorrect merges rises from 3% to 6%. The tool is already much better than Spork and IntelliMerge, but it would be great to look into results/combined/result_adjusted.csv for merges where the tool gives an incorrect result but git does not. With src/python/replay_merge.py you can replay each merge to see the differences.
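For example, something along these lines (the column positions and result labels are guesses; check the CSV header first):

# Inspect the header to learn the real column names and positions:
head -n 1 results/combined/result_adjusted.csv
# Hypothetical layout: column 1 = merge index, column 7 = git result,
# column 9 = mergiraf result. List merges failing tests only with mergiraf:
awk -F, '$9 == "Failed_tests" && $7 != "Failed_tests" { print $1 }' \
    results/combined/result_adjusted.csv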

@benedikt-schesch (Owner) commented

The small test is still failing, which is a bit concerning. I updated the results; on my local machine the test passes, but in the GitHub CI/CD it fails. It fails only because the hash of the commit is different, which makes me suspect that your tool might not be fully platform-agnostic, i.e. its output might differ slightly across platforms (for example, via bash sorting functions). Do you think that could be the case?
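For example, plain sort output depends on the locale, so any hash computed over sorted output can differ between machines:

# Byte order vs. locale collation: these two invocations can disagree.
printf 'a\nB\n' | LC_ALL=C sort             # "B" first (plain byte order)
printf 'a\nB\n' | LC_ALL=en_US.UTF-8 sort   # "a" first on many systems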

@wetneb (Contributor, Author) commented on Nov 15, 2024

Thanks a lot for this preliminary report! That's a good question; I don't know Rust well enough to say for sure. It could be that there are differences in hashing, or in line endings (CRLF vs. LF). The Git version is also likely to make a difference. I don't know how feasible it would be to extract an example merge scenario that behaves differently on the two architectures; probably not that straightforward?
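Line endings, at least, are easy to rule out, since git can report what is actually stored and checked out for each tracked file:

# Index vs. working-tree line endings for tracked files; a file reported
# as crlf on one platform and lf on another would change the merged
# output byte-for-byte, and with it the hash.
git ls-files --eol | head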

@benedikt-schesch (Owner) commented

I think git handles this fine; it would be very concerning if not. I will try to isolate which of the merges is affected by this.

@benedikt-schesch (Owner) commented

It seems that all the merges in the pedrovgs/Algorithms repository (as seen in the CI/CD pipeline) have a hash different from the expected one. This discrepancy occurs for all merges that start with index 2 in that repository. This makes me suspect that some character or other element in the repository is not being handled properly by mergiraf.

To reproduce the issue, you could either:

  1. Find a machine where the test fails (if you have a MacBook, it often shows different behavior), or
  2. Use act to locally run the GitHub Actions CI/CD pipeline and investigate why the merge produces a different hash.

I’ve encountered this problem before, and here’s the general debugging workflow I followed:

Steps to Diagnose:

  1. Keep the Merge Workspaces
    Set DELETE_WORKDIRS to False in this file. (Ideally, this should be a flag, not a variable.) This will ensure the merges are not cleaned up after testing.

  2. Run the Small Test
    Execute make small-test to reproduce the behavior.

  3. Test on Different Machines or CI/CD with act
    Run the test on:

    • A passing machine and a failing machine, or
    • Your local machine using act or directly through CI/CD.
  4. Analyze the Hash Mismatch
    On the problematic merge, run the following command:

    sha256sum <(export LC_ALL=C; export LC_COLLATE=C; cd \
    <local_repo_path>; \
    find . -type f -not -path '*/\.git*' -exec sha256sum {} \; | sort)

    Replace <local_repo_path> with the actual path to the local repository.

    Compare the hash values between a passing and a failing machine:

    • Use diff to find differences between the two sets of merge outputs.
    • Alternatively, identify which specific file is causing the hash mismatch by applying the same command to individual files.

Typically, the source of the issue becomes apparent from the diff. For reference, the tool I use for running GitHub Actions locally is nektos/act.
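Concretely, the per-file listing from step 4 can be saved on each machine and compared directly; the differing lines name the culprit file (the output file names below are illustrative):

# On both the passing and the failing machine, dump per-file hashes:
(export LC_ALL=C LC_COLLATE=C; cd <local_repo_path>; \
 find . -type f -not -path '*/\.git*' -exec sha256sum {} \; | sort) \
 > hashes_$(hostname).txt
# Then diff the two listings to spot the file whose hash differs:
diff hashes_passing.txt hashes_failing.txt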

@monperrus commented

Hi all, how can we make progress here? Thanks!

@wetneb (Contributor, Author) commented on Jan 15, 2025

I would also like to make progress on this, but it is difficult for me to get access to a macOS machine, and the process above looks quite complicated, so it's sadly not something I am counting on doing soon.

@monperrus commented on Jan 15, 2025 via email

@mernst (Collaborator) commented on Jan 15, 2025

I, too, would like to see this resolved. The problem is that Mergiraf seems to be nondeterministic, or at least it gives different results in different environments. We took a quick look and there was nothing obvious to explain the problem. Benedikt explained a way to proceed with debugging.

@benedikt-schesch (Owner) commented

@monperrus @wetneb Running this pipeline isn't straightforward, since we are testing multiple repositories. While a macOS machine can be useful, it isn't strictly necessary: you can reproduce the failing hashes identified in the CI/CD pipeline using act, as sketched below. Give me a week to investigate the non-deterministic outputs from Mergiraf, and I'll get back to you.
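For instance, with act installed and Docker running, something like this reproduces the CI jobs locally (the job name is hypothetical; pick the real one from the listing):

# List the jobs defined under .github/workflows, then run the failing
# one in a Linux container:
act -l
act -j small-test   # hypothetical job name; use one reported by act -l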

@benedikt-schesch (Owner) commented

@wetneb I get the following outputs for mergiraf on the same merge.
On my machine:

<<<<<<< left
      while (array[left] < 0 && left < right) {
        left++;
      }
||||||| base
=======
      while (array[left] < 0 && left < right)
          left++;
>>>>>>> right
<<<<<<< left
      while (array[right] >= 0 && left < right) {
        right--;
      }
||||||| base
=======
      while (array[right] >= 0 && left < right)
          right--;
>>>>>>> right

In CI/CD:

<<<<<<< HEAD
      while (array[left] < 0 && left < right) {
        left++;
      }
||||||| 60d87e9
=======
      while (array[left] < 0 && left < right)
          left++;
>>>>>>> ___MERGE_TESTER_RIGHT
<<<<<<< HEAD
      while (array[right] >= 0 && left < right) {
        right--;
      }
||||||| 60d87e9
=======
      while (array[right] >= 0 && left < right)
          right--;
>>>>>>> ___MERGE_TESTER_RIGHT

Do you know why it might use the left/right naming on my computer but not in CI/CD? Is there a way to control this in mergiraf?

@benedikt-schesch (Owner) commented

@monperrus @wetneb @mernst I reran our evaluation with the latest version of Mergiraf (0.4.0), and overall the results have not changed much since 0.3.0. The tool is pretty good and quite aggressive in merging, but this aggressiveness leads to failing tests.
Here is the cost plot (the cost of an incorrect merge relative to raising a merge conflict; a low cost means a failing test costs little) together with the effort reduction score (higher is better): https://github.com/wetneb/AST-Merging-Evaluation/blob/add_mergiraf/results/combined/plots/tools/cost_with_manual.pdf
This means that for a small cost (here, <4) Mergiraf is the best tool out there, which is very good; for higher costs, git and Plumelib merging take the lead because they are less aggressive.
Full table with all the results: https://github.com/wetneb/AST-Merging-Evaluation/blob/add_mergiraf/results/combined/tables/all/table_summary.tex

The CI/CD is failing because Mergiraf's conflict markers are inconsistent across platforms. I would really love for this to be fixed in Mergiraf, because then I would be happy to merge this branch into main. I am 95% confident that the results are correct despite the failing CI/CD.

@wetneb (Contributor, Author) commented on Jan 24, 2025

Thank you so much again!

The differences in output you are getting are interesting. Yes, the names of the base, left, and right revisions are supplied to Mergiraf via its -s, -x, and -y arguments. One likely reason they appear in CI but not on your computer is a different git version: git is only able to supply those revision names to the merge driver from v2.44.0 onward (see the corresponding PR).
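For reference, a driver registration along these lines (adapted from mergiraf's documented setup; the exact flag set may vary by version) only receives real revision names when git expands the %S/%X/%Y placeholders, which it does from v2.44.0 on:

# Register mergiraf as a git merge driver. %S/%X/%Y carry the revision
# names that end up in the conflict markers; with older git these names
# are not supplied, and mergiraf falls back to generic labels such as
# "left", "base", and "right".
git config merge.mergiraf.name mergiraf
git config merge.mergiraf.driver \
  'mergiraf merge --git %O %A %B -s %S -x %X -y %Y'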

@mernst (Collaborator) commented on Jan 25, 2025

In CI, git is currently "git version 2.48.1".

@benedikt-schesch (Owner) commented

We have 2.43.5 on our machine, so that makes sense. We need to upgrade to 2.44+ to make things consistent. I'll also add a check for this, along the lines sketched below.
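A version gate along these lines would do it (a sketch; assumes a sort that supports -V):

# Fail fast unless git is at least 2.44.0, the first version that passes
# revision names (%S/%X/%Y) to merge drivers.
required="2.44.0"
current=$(git version | awk '{print $3}')
lowest=$(printf '%s\n%s\n' "$required" "$current" | sort -V | head -n 1)
if [ "$lowest" != "$required" ]; then
  echo "git >= $required required, found $current" >&2
  exit 1
fi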

@monperrus commented

Great progress towards merging, thanks a lot @benedikt-schesch @wetneb!

@benedikt-schesch (Owner) commented

Thank you, I hope to close this by tomorrow. The small test is now fixed, and git 2.44+ is now a requirement. I just need to rerun mergiraf on the dataset, because the results were computed with the old git version; that will only change the merge hashes and won't affect the results we previously discussed.

@benedikt-schesch (Owner) commented

Done. @wetneb, thank you very much for making this tool and helping us integrate it. As said before, the tool tends to produce a higher number of test failures than git. I would recommend looking into https://github.com/benedikt-schesch/AST-Merging-Evaluation/blob/main/results/combined/result_adjusted.csv for merges that lead to a test failure with mergiraf but not with git (git might either raise a conflict or get the merge right). To analyze them manually, I recommend running https://github.com/benedikt-schesch/AST-Merging-Evaluation/blob/main/src/python/replay_merge.py as follows:
python3 src/python/replay_merge.py --idx INDEX
and then comparing the different merge outputs. If you need any other help, let me know. If you do analyze some merges with it, it would be very nice to share your thoughts and analysis on the examples you look at.

@benedikt-schesch merged commit 9238a73 into benedikt-schesch:main on Jan 27, 2025 (4 checks passed).
@monperrus commented

🚀 that's great.
