Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ben Scripts and Qualitative Analysis Code #279

Closed
wants to merge 16 commits into from

Conversation

cactusbranch01
Copy link
Collaborator

@cactusbranch01 cactusbranch01 commented Apr 30, 2024

Changes:

  • diff3 analysis script
  • README
  • style fixes
  • repo.py cones repo to a specified path. Alternative to the clone_repo method since it takes a string for the path and returns a different git repo obj

cactusbranch01 and others added 5 commits April 29, 2024 21:59
fixed style
fixed changes from old branch
why do we keep changing everything in repo.py every month???



- Python 3.x installed on your system.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why doesn't this use the same setup as the rest of the project -- namely, the use of conda or mamba to set up an environment?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the setup is the same as the rest of the project, we still are using a conda or mamba environment. I think I will just delete this line since there is already another README that more thoroughly explains setup. Thanks!

@@ -0,0 +1,259 @@
"""
Recreates a merge and outputs the diff files for two algorithms for comparison on a given conflict.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the merge recreation, why not use the existing src/python/replay_merge.py script? Then this script can be simplified to having a single purpose -- outputting the diff files. If that is not possible, please document why.

Copy link
Collaborator Author

@cactusbranch01 cactusbranch01 May 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, just added a few points on this in the README:

  1. It uses diff3 to show the base, conflicting branches, and the programmer merge which is useful to see the context around the conflict.
  2. It automates everything making the workflow faster. You just give it a conflict, and it outputs the differences between a pair of merge algorithms in a .txt file.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has not answered the question. Why can't this script either call replay_merge.py internally and then do the diff? That would make this script much shorter and simpler.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should be able to do that, it allows you to delete all the setup code since replay_merge does everything. I would suggest maybe just remove this script and adding a flag -create_diffs to replay_merge.py that computes the diff as you do it here when this flag is used (by default set it to false though).


Ex:

python3 diff3_analysis.py "gitmerge_ort" "spork" 32 "./mixed_results_spork"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't indexes contain a hyphen?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in Python, CLI arguments are usually passed in as strings and its up to the script to interpret them as numbers.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant that indexes are now (for example) "38-3" rather than a number.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Now my scripts also use the idx column instead of the index in the spreadsheet. I updated the README and docs as well.

@@ -93,6 +93,36 @@ def clone_repo(repo_slug: str, repo_dir: Path) -> git.repo.Repo:
) from None


# Alternative clone repo method that returns git repo object for diff3 scripts
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is it different? The documentation is identical.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@cactusbranch01 cactusbranch01 changed the title ben-main new patch Ben Scripts and Qualitative Analysis Code May 13, 2024
Copy link
Collaborator

@mernst mernst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Ben Scripts and Qualitative Analysis Code" will not be a good commit message for the commit that results from this pull request. Can you please make it more descriptive?

It seems that the "style fixes", "diff3 analysis script", and "repo.py" mentioned in the pull request description can be broken into separate pull requests. Is there a reason that is impossible or undesirable?

@@ -0,0 +1,53 @@
# Python Scripts for Merge Conflict Analysis


Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a lot of extraneous whitespace in this README file.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is standard. I think is designed to be seen from an end-users perspective like in the browser it looks normal.

https://github.com/cactusbranch01/AST-Merging-Ben-Analysis/blob/ben-main/src/python/README.md

If you think I should change this, though, I would be happy to.


- It is different than the src/python/replay_merge, as it performs a 3 way diff between the base, conflicting branches, and the programmer merge.

- Outputs the differences between a pair of merge algorithms in a .txt file.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does one interpret those differences?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added more details. Thanks!


- `diff3_analysis.py`: This script analyzes merge conflicts for two merge tools on a given conflict.

- It is different than the src/python/replay_merge, as it performs a 3 way diff between the base, conflicting branches, and the programmer merge.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could it first call replay_merge.py, and then do its own unique work? That could make it much smaller and simpler. If that is impossible or undesirable, why?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the readme.

It might be possible to use replay_merge.py for the underlying process of recreating the merge. However, this still wouldn't have the programmer merge. Moreover, the diff3_analysis method keeps the conflict and repos to diff3 everything and outputs a .txt file. Also, both methods use different systems for organizing the files (mine makes its own cloned repo folders), which would take some time to change.

Honestly, I think the functionality is different enough that I would prefer to leave both as separate methods. It would probably be a significant time investment refactoring, and looking at how different the actual code is, I'm sure where to begin. Both methods work successfully, and I don't this is worth the extra effort.




- Necessary Python packages installed inside conda or mamba environment(`pandas`, `GitPython`).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Explain exactly how to do the setup, or cross-reference to other instructions, or leave it out entirely under the assumption that the user has already done the setup.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Thanks!

@@ -0,0 +1,259 @@
"""
Recreates a merge and outputs the diff files for two algorithms for comparison on a given conflict.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has not answered the question. Why can't this script either call replay_merge.py internally and then do the diff? That would make this script much shorter and simpler.

@timeout(10 * 60)
def clone_repo_to_path(repo_name: str, path: str) -> git.repo.Repo:
"""Clones a repository to a specified path. Alternative to the clone_repo
method since it takes a string path and returns a git repo Repo obj.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clone_repo also returns a git repo Repo object, according to its signature on line 61. So, is the only difference taking a string versus a Path? If so, then can this method be a two-liner that creates a Path from a string and then calls clone_repo?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes just wrap the string you use into Path(str) that will give you the Path object directly and you can call the original function with that.




- `diff3_analysis.py`: This script analyzes merge conflicts for two merge tools on a given conflict.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The scripts in this directory are already described in the top-level README. Why have you chosen to create a new one rather than locating this information with the existing information?

@benedikt-schesch
Copy link
Owner

I will close this PR for now

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants