Add mergiraf to the evaluated tools #380
Conversation
Thank you for this patch. I just learned of mergiraf a week ago, and I had put "add mergiraf" on our to-do list. I appreciate your help in doing that.
You are right that the full pipeline is challenging to run. I'm sorry about that.
This patch looks great overall. I have a few comments for you.
I have fixed a few things. Now, a new "expected" value for the merge results must be committed. (Tests are currently failing because there is no "expected" value for Mergiraf results.)
The results have been posted to this branch and are visible in
The small test is still failing, which is a bit concerning. I updated the results and it passes on my local machine, but it seems to fail in the GitHub CI/CD. It fails only because the hash of the commit is different, which makes me suspect that your tool might not be fully platform-agnostic, i.e., the output of the tool might differ slightly across platforms (like bash sorting functions, for example). Do you think that could be the case?
Thanks a lot for this preliminary report! That's a good question, I don't know Rust well enough to know for sure. It could be that there are some differences in hashing, or line endings (CRLF/LF). The Git version is also likely to make a difference. I don't know how doable it would be to extract an example merge scenario which would behave differently on the two architectures - probably not that straightforward?
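To illustrate how one of the suspected causes could surface, here is a minimal sketch (with hypothetical merged text, not actual mergiraf output) showing that LF versus CRLF line endings alone are enough to change a content hash:

```python
import hashlib

# Hypothetical merged output; the same text once with LF and once with CRLF
# line endings (LF is the Unix convention, CRLF the Windows one).
lf_output = "merged line 1\nmerged line 2\n"
crlf_output = lf_output.replace("\n", "\r\n")

lf_hash = hashlib.sha1(lf_output.encode()).hexdigest()
crlf_hash = hashlib.sha1(crlf_output.encode()).hexdigest()

# One extra byte per line is enough to produce a completely different digest.
print(lf_hash == crlf_hash)  # False
```

If the merge hashes recorded in the "expected" results are computed over file contents like this, any environment-dependent byte difference would explain the CI mismatch.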
I think Git is fine here; it would be very concerning if not. I will try to isolate which of the merges is affected by this.
It seems that all the merges in the To reproduce the issue, you could either:
I've encountered this problem before; here's the general debugging workflow I followed. Steps to diagnose:
Typically, the source of the issue becomes apparent from the diff. For reference, the repository I use for running GitHub Actions locally is nektos/act.
Hi all, how can we make progress here? Thanks!
I would also like to make progress on this, but it is difficult for me to get access to a macOS machine, and the process above looks quite complicated, so it's sadly not something I am counting on doing soon.
All CI jobs run on Linux; I don't understand why we must go through macOS to debug, fix, and merge. @benedikt-schesch WDYT?
I, too, would like to see this resolved. The problem is that Mergiraf seems to be nondeterministic, or at least it gives different results in different environments. We took a quick look and there was nothing obvious to explain the problem. Benedikt explained a way to proceed with debugging.
@monperrus @wetneb Running this pipeline isn't straightforward since we are testing multiple repositories. While a macOS machine can be useful, it isn't strictly necessary. You can reproduce the failing hashes identified in the CI/CD pipeline using act. Give me a week to investigate the non-deterministic outputs from Mergiraf, and I'll get back to you.
@wetneb I get the following outputs for mergiraf on the same merge.
In CI/CD:
Do you know why it might be using left/right naming on my computer but not in CI/CD? Is there a way to control this in mergiraf?
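For context on why this matters for the hashes: even a conflict with identical content produces a different result hash when only the marker labels change. A minimal sketch, using made-up labels rather than mergiraf's actual naming scheme:

```python
import hashlib

def conflict(left_label, right_label):
    # Build a minimal git-style conflict hunk with the given revision labels.
    return (f"<<<<<<< {left_label}\n"
            "foo();\n"
            "=======\n"
            "bar();\n"
            f">>>>>>> {right_label}\n")

local_hunk = conflict("left", "right")    # labels seen on my machine
ci_hunk = conflict("HEAD", "branch-a")    # made-up labels standing in for CI

# Same conflict content, different labels: the result hashes cannot match.
print(hashlib.sha1(local_hunk.encode()).hexdigest()
      != hashlib.sha1(ci_hunk.encode()).hexdigest())  # True
```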
@monperrus @wetneb @mernst So I reran our evaluation with the latest version of Mergiraf (0.4.0), and overall the results have not changed much since 0.3.0. The tool is pretty good and quite aggressive in merging, but this leads to failing tests. The CI/CD is failing because Mergiraf generates inconsistent conflict markers across platforms; I would really love for this to be fixed in Mergiraf, because then I would be happy to merge this branch into main. I am 95% confident that the results are correct despite the failing CI/CD.
Thank you so much again! The differences in output you are getting are interesting. Yes, the names of the base, left and right revisions are supplied using the
In CI, |
We have |
Great progress towards merging, thanks a lot @benedikt-schesch @wetneb.
Thank you, I hope to close this by tomorrow. Small-test is now fixed, and Git > 2.44 is now a requirement. I just need to rerun Mergiraf on the dataset, because the results were computed with the old Git version we had; that will only affect the merge hashes and won't change the results we previously discussed.
Done. @wetneb thank you very much for making this tool and helping us integrate it. As said before, the tool tends to lead to a higher number of test failures than Git. I would recommend looking into https://github.com/benedikt-schesch/AST-Merging-Evaluation/blob/main/results/combined/result_adjusted.csv for merges that lead to test failures with Mergiraf but not with Git (Git might either raise a conflict or merge correctly). To analyze them manually, I recommend running https://github.com/benedikt-schesch/AST-Merging-Evaluation/blob/main/src/python/replay_merge.py as follows:
🚀 that's great. |
Thank you so much for this huge evaluation effort. I haven't managed to run it entirely myself because of some mismatching dependencies on my end, but I have made a little script which should (hopefully) add mergiraf to the evaluated tools. I have tested the script interactively and it seems to respect the contract, but I am not sure if I have done everything that is needed to evaluate it alongside the other tools.