Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Explaining diffs of Merge<T> (structured conflicts part 1) #4259

Draft
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

ilyagr
Copy link
Contributor

@ilyagr ilyagr commented Aug 13, 2024

This part of #4251 is more cooked than the rest, and contains the fundamental data model. There are still some TODOs for some parts of #4251, but I'd appreciate other people's input for which of them are important to fix immediately and what approach people prefer.

See #4251 for more discussion on the design.

This PR is that it does not add any new features, only new plumbing.

Checklist

If applicable:

  • (n/a) I have updated CHANGELOG.md
  • (n/a) I have updated the documentation (README.md, docs/, demos/)
  • (n/a) I have updated the config schema (cli/src/config-schema.json)
  • I have added tests to cover my changes

@ilyagr ilyagr marked this pull request as ready for review August 13, 2024 05:55
@ilyagr ilyagr changed the title Explaining diffs of Merge<T>, part one of #4251 Explaining diffs of Merge<T> (structured conflicts part 1) Aug 13, 2024
assert_eq!(c(&[0, 1], &[2, 3, 1]).simplify(), c(&[0], &[2, 3]));
assert_eq!(c(&[0, 1], &[2, 3, 2]).simplify(), c(&[0, 1], &[2, 3, 2]));
assert_eq!(c(&[0, 1], &[2, 3, 3]).simplify(), c(&[0, 1], &[2, 3, 3]));
assert_eq!(c(&[0, 1], &[2, 3, 4]).simplify(), c(&[0, 1], &[2, 3, 4]));
assert_eq!(
c(&[0, 1, 2], &[3, 4, 5, 0]).simplify(),
c(&[1, 2], &[3, 5, 4])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commit dee5dff changed the order to be what it is here. I'm actually thinking more and more that we should try to interpret conflicts as a base (the first positive term) and a series of diffs/patches. Copy tracing is one reason for that, because the copies are associated with a diff.

Copy link
Contributor Author

@ilyagr ilyagr Aug 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the pointer. I'll think a bit on it. Technically, I can split the simplification function machinery into two similar function (in other words, leave the Merge simplification machinery alone); I'm not sure whether that's the best solution.

we should try to interpret conflicts as a base (the first positive term) and a series of diffs/patches

Firstly, I'd call the first positive term a "side" :)


I once convinced myself that our options are either:

  • Allow simplification cancel any add of a conflict against any remove, and be completely flexible about the order the adds & removes come in. This destroys the identity of diffs in the conflict whenever a simplification happens.
  • Be quite strict about the order and only allow cancelling out neighboring terms. This would allow each "diff" in the conflict to mostly preserve its identity. However, this would mean that if you have X -> Y -> Z and reorder it as X -> Z' -> Y', Y' will usually be conflicted.

So, I've been trying to get all we can get from the first of these options.

I need to look back on my notes/thoughts and see whether I still think that the intermediate options aren't very viable. It's possible that I attached some weight to an idea I no longer believe, namely that a merge of X and Y should be considered the same as the merge of Y and X.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • However, this would mean that if you have X -> Y -> Z and reorder it as X -> Z' -> Y', Y' will usually be conflicted.

That will give you this:

Z'=X+(Z-Y)
Y'=Z'+(Y-X)=X+(Z-Y)+(Y-X)=X+(Z-X)=Z

Simplifying Y' to Z only requires chaining diffs and treating the first term as a diff from the missing state. Right? To clarify, I don't object to chaining diffs (making A->B plus B->C into A->C). My problem is with converting e.g. A->B plus C->D into A->D plus C->B.

Btw, when diffing e.g. A+(B-C) to D+(E-F), I would prefer to display the diffs D-A, E-F, and C-B (reverse of B-C.

Copy link
Contributor Author

@ilyagr ilyagr Aug 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I'm not addressing the reordering issue, still need to think or chat about that)

Btw, when diffing e.g. A+(B-C) to D+(E-F), I would prefer to display the diffs D-A, E-F, and C-B (reverse of B-C).

This is how what I called DiffOfMerge treats it, more or less (and I am very much considering reworking this PR to make it treat it that way, exactly). I was first intending to present that to the user more or less directly, but then decided it would be too confusing. So, I built all the DiffExplanationAtom machinery on top of that.

For the question of whether it's better to show (C-B) or -(B-C), the main example I have in mind is as follows. Suppose we resolve a conflict A+(B-C) to D where my made-up conflict is:

>>>>>
The blue heron
===============
- The blue pig
+ The flying pig
<<<<<

and the resolution is "The flying heron". If we show the diff D - (A + (B-C)) as (D-A) + (C-B), as you suggested, it'd be:

>>>>>
===== Part 1 diff (D-A)
- The blue heron
+ The flying heron
===== Part 2 diff (C-B)
+ The blue pig
- The flying pig
<<<<<

I find it hard to look at this and see what is going on, even though this is perfectly correct from the perspective of conflict theory.1

If I use my "DiffExplanation" presentation, I'm hoping jj will present this diff as follows:

>>>>>
===== Part 1: Change conflict side
- The blue heron
+ The flying heron
===== Part 2: Removed conflict diff (B-C)
-| - The blue pig
-| + The flying pig
<<<<<

Here, there is a way to immediately see that the conflict was resolved "correctly": I can quickly see that the diff in part 1 means the same thing as the removed diff in part 2. This would be relatively easy to teach to users.


I think there were a couple more examples I had in mind, but I need to remember them and write them down (I am kicking myself a little for not having them written down already). For example, strictly speaking, we could omit any "UnchangedSide"-s from the presentation of the conflict diff, since they are not different, but I believe that could get quite confusing.

Footnotes

  1. Less importantly, I think users who are less familiar with conflict theory will also be confused to see + The blue pig where this was not present anywhere else these conflicts were presented, and because this promotes a conflict base (that people tend to ignore) to a place of great visibility.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to wrap up a reply,

Simplifying Y' to Z only requires chaining diffs and treating the first term as a diff from the missing state. Right?

It think you are correct; it seems like my example wasn't actually an example. Z' being equivalent to Z seems to rely only on associativity and cancelling out of neighboring terms, no reordering needed.

Copy link
Contributor Author

@ilyagr ilyagr Aug 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think you meant ... = B + (A - C) here (as you also say further down).

Yes, thanks, fixed.

And then the same chaining of patches applies,

(This part of the comment is probably less relevant, unless we want to consider quite weird theories as in my footnote above)

I think this chaining is different. If we don't allow commuting, we could say that (A - B) + (B - C) = (A-C) but that (A - C) + (B - A) might differ from (B - C). So, we don't have to say that A + (B - A) = A.

However, as long as we want jj restore --from @- --to X; jj rebase -r @ -d X to never create a conflict, we probably want to say that A + (B-A) and B are equivalent. This has some implications, at the very least that A + (B - C) = C + (B - A) (that is, 2-sided conflicts commute), as I explained above.

Update/attempt at clarification that could be itself confusing: As I said above and say again below below, I'm not sure whether there are conflict theories where (A - C) + (B - A) differs from (B - C) , but A + (B - A) = B (and therefore A + (B - C) = B + (A -C)). If such a theory can be imagined, it'd be a weird one IMO.


OTOH, let's say we assume that (A - C) + (B - A) = (B - C) as you did in your comment (and which seems like a reasonable thing to want). As you pointed out, this definitely means that A + (B - A) = B, which means that A + (B - C) = B + (A - C) as I explained in my previous comment. In other words, your assumption is stronger than mine was (to be precise, it's at least as strong as far as I know at this moment).

I think that this stronger assumption also implies that we can also swap top/bottom terms around at will. For example, (A - C) + (B - D) = (B - C) + (A - D) because:

(A - C) + (B - D) = (A - C) + ( (B - A) + (A - B) ) + (B - D) =
  = ( (A - C) + (B - A) ) + ( (A - B) + (B - D) ) =[simplify 2nd summand]=
  = ( (A - C) + (B - A) ) + ( A - D )  =[by assumption]= 
  = (B - C) + (A - D)

I'm pretty sure the same can be done for the bottom terms, and once you can swap two terms on the top, you can reorder at will.

So, if there's a conflict theory that doesn't allow arbitrary reordering, it would have to consider (A - C) + (B - A) and (B - C) different, at least in some cases (or not be associative, or not allow the more natural simplifications I used above).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That all makes sense. Thanks for writing it down. I don't think it's theoretically incorrect to swap bases and sides between diffs. I just think we should avoid it when we can so we don't present the user with a diff that they have never created themselves. OTOH, we do already do that in conflicts markers, but maybe it's more natural there because we show either the diff from the rebased commit or the diff from upstream (in the common case). Maybe we should change the code to only consider doing that swapping with the "upstream" (first positive term in the conflict - that might actually be a decent name for the first term, btw).

Copy link
Contributor

@yuja yuja Aug 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Update from Ilya: I moved this comment to #4259 (comment), since I just put a long post on a different topic in this thread.]


FWIW, I guess it would be easier to follow if conflicts aren't reordered per hunk.

For example, from: A, to: B+(D-C) could be rendered as follows:

# line numbers
#         contents
A B: C D: .. context line ..
A B:    : .. hunk of [A, B] ..   # conflict
   : C D: .. hunk of [C, D] ..
A B: C D: .. context line ..
A B:    : .. hunk of [A, B] ..   # non-conflicting change
    ...
A B: C D: .. context line ..
A B:    : .. hunk of [A, B] ..   # another conflict (in the same order)
   : C D: .. hunk of [C, D] ..
A B: C D: .. context line ..
...

Copy link
Contributor Author

@ilyagr ilyagr Aug 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... I think there was a subtle but important mistake in my logic.

TLDR (written after the fact)

Fortunately (amusingly?), my mistakes cancel out, in a way 😅, with similar problems with one of Martin's arguments above. TLDR, what I now again believe is what I said without justification before:

If we are strict about the ordering of diffs in a conflict, then if you have commits X -> Y -> Z and reorder it as X -> Z' -> Y', Y' will usually be conflicted. In other words, if we require Y'=Z in this example, it follows that conflicts with the same set of adds/removes (in any order) must be equivalent.

What still seems true

The part that continues to hold is that if you assume that

(A - B) + (B - C) = A - C = (B - C) + (A - B)       (*)

then you can reorder the adds or removes arbitrarily. I repeat that proof in a better notation below.

In particular, if we don't want to allow arbitrary reordering, we have to forbid reordering patches in a conflict (which would imply (*)).

Mistakes

There are two things that I now believe were mistaken, for similar reasons:

  1. My argument that A + (B - A) = B by itself implies that A + (B - C) = B + (A-C). I must have mistakenly used both equalities from (*) in my "proof", and got away with it because the notation I used was subtly confusing when "+" is not commutative.
  2. Martin's argument from Explaining diffs of Merge<T> (structured conflicts part 1) #4259 (comment), if you do not assume both sides of (*)

Notation problem

One way to describe the problem is to say that the notation A + (B - C) is bad because it doesn't properly alternate signs. A + (-C + B) or (B - C) + A is better1. Both of these are a bit awkward to consistently type, so I suggest going back to notation I used a long time ago: [A, C -> B] instead of A + (-C + B). Here, , is really a composition operation, perhaps A∘(C->B) would be better, but it's harder to type. (For typing, the Unicode symbol is un-intuitively called the "ring operator", U+2218). Or should we settle on A*(C->B)?

With this notation, the "obvious" simplification equality is [C -> B, B -> A] = [C -> A]. Here, the diffs happen from left to right: turning C into B, and then turning B into A is the same as turning C into A. In the old notation ordered correctly, this would be (-C + B) + (-B + A) = (-C + A), which is actually the more awkward-looking right equality in (*). The add of the first diff cancels out with the remove of the second diff.

Mistake 1

With the better notation, both [A, A->B] = B and [B, A->A] = B look "obvious". The first one is the natural direction of simplification, while the second one is the separate axiom that A->A is a noop. The first one is the equality that used to be written as A + (B-A) = A, and which I previously thought was an extra assumption we needed to make.

I no longer think that these by themselves should imply that [A, B ->C] = [C, B->A] or anything else about reordering of terms.

Mistake 2

Let's now go over Martin's argument with a better notation. As before, we start with the commits X -> Y -> Z and reorder them via rebase to X -> Z' -> Y', where rebase of Z from Y onto X is defined to result in the conflict [X, Y -> Z]. So,

Z' = [X, Y-> Z]
Y' = [Z', X->Y] = [X, Y->Z, X->Y]

Without the "less natural" direction of (*) and without reordering of terms, there is no way to simplify Y'.

As I said before, and will restate below, if we allowed the simplification [Y->Z, X->Y]=[X->Z], it would follow that any reordering of top/bottom terms is possible.

Restating the "part that still holds" in the new notation

For completeness, let's restate the "part that still hold" from the beginning of the post in the better notation.

We always assume that [C -> A, A -> B] = [C -> B]. The claim is that if we also assume the opposite "less natural" direction, [A -> B, C -> A] = [C -> B], then we can prove that [A -> B, C -> D] = [C -> B, A -> D]. The proof goes as:

[A -> B, C -> D] = [A -> B, (C->A, A->C), C->D] =
  = [(A -> B, C -> A), (A -> C, C -> D)] =[simplify right parentheses]=
  = [(A -> B, C -> A), A -> D] =[less natural simplification we assumed]=
  = [C -> B, A -> D]

As before,

  • this implies that the "adds" (Update: "removes") of a conflict can be reordered arbitrarily (though "adds"/"removes" are not good names in this non-commutative case). Update: We could show the same thing for "adds" by inserting (B -> D, D -> B) after the first equality instead of (C->A, A->C).
  • the "less natural simplification" and therefore arbitrary reordering would follow immediately if we allowed [A->B, C->D] = [C->D, A->B].
  • this proof uses the associativity of patch composition, which I represented using parentheses. This conclusion wouldn't hold if we compromised on that, probably, but that would be confusing in its own way.

Footnotes

  1. These two correspond to different ways of thinking about which way patches compose, though! I'll describe the first ordering in detail below, but if we wanted the left equality of (*) to be the "natural" one (as I was mostly assuming in my arguments in previous posts), we'd have to use the second ordering or, equivalently, something like [B <- C; A]. We'd think of diffs as happening right-to-left.

Copy link
Contributor Author

@ilyagr ilyagr Sep 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was chatting about this, and I think it's worthwhile to write down a simpler argument that's slightly less powerful and uses less precise notation but is shorter and easier to understand.

The claim is that if you allow reordering of diffs (B - C) + (D - E) = (D - E) + (B - C) then it follows that (B - C) + (D - E) = (D - C) + (B - E). If you believe this argument, analogous arguments imply that you can reorder adds and removes of any conflict arbitrarily.

The argument is:

(B - C) + (D - E) = (B - C) + ( (D - B) + (B - D) ) + (D - E) =
=reordering_diffs=  ((D - B) + (B - C)) + ((B - D) + (D - E))=
(D - C) + (B - E)

If you write this down in more precise notation, you'll notice that in this shorter argument I'm using the "backwards" cancellation rule. However, since we assumed that the order of diffs doesn't matter, that's fine.

@yuja
Copy link
Contributor

yuja commented Aug 13, 2024

I don't follow the idea of #4251, but does it make sense to diff from/to file values all at once, and post-process each hunk? I think the output would look similar to materialize_merge_result().

For example, diff(from = A+(C-B), to = D) will first calculate Diff::for_tokenizer([A, D, C, B]). Then, for each hunk, hunk_a, hunk_d pair is rendered normally (corresponding to "base" in materialized conflict), and the other pairs are rendered as conflicts.

Just an idea. I don't know how the rendering result would look.

if self.values.len() == 1 {
return self;
}
self.values = simplify_internal(self.values.to_vec()).into();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: you can use .into_vec() to avoid allocation, but maybe we can make simplify_internal() not consume the input vec instead.

@martinvonz
Copy link
Member

Btw, also note that it makes sense to display diffs involving conflicts as just repeated diffs. That includes the --git output. Actually, I just checked that you can apply a patch like the following using git apply :)

diff --git a/file b/file
index 7898192..6178079 100644
--- a/file
+++ b/file
@@ -1 +1 @@
-a
+b
diff --git a/file b/file
index 7898192..6178079 100644
--- a/file
+++ b/file
@@ -1 +1 @@
-b
+c

@ilyagr
Copy link
Contributor Author

ilyagr commented Aug 13, 2024

I don't follow the idea of #4251, but does it make sense to diff from/to file values all at once, and post-process each hunk? I think the output would look similar to materialize_merge_result().

For example, diff(from = A+(C-B), to = D) will first calculate Diff::for_tokenizer([A, D, C, B]). Then, for each hunk, hunk_a, hunk_d pair is rendered normally (corresponding to "base" in materialized conflict), and the other pairs are rendered as conflicts.

Just an idea. I don't know how the rendering result would look.

I'm not sure I'm following exactly, but what you are saying sounds close to what I'm doing. In the next commit of #4251 (outside this PR, called "lib merge: Refactor merge of file hunks"), there is a split_merge_into_hunk that takes in files in a conflict and uses Diff::by_line to split that into hunks. The hunks then become the strings that are T in this PR, and it's the hunk conflicts that are explained to the user.

In particular, I didn't understand what you mean by "diffing file values all at once", whether you favor that or whether you thought that's what I was doing wrongly (I don't believe I'm doing that currently).

@ilyagr
Copy link
Contributor Author

ilyagr commented Aug 13, 2024

Btw, also note that it makes sense to display diffs involving conflicts as just repeated diffs. That includes the --git output. Actually, I just checked that you can apply a patch like the following using git apply :)

This does make sense, but it becomes very confusing to separate the diffs from Added diff making the conflict more complicated and diffs from Changed a Side of a conflict, which mean quite different things. For the user (and even more so for blame), I think both types of messages can be helpful if that results in a shorter explanation.

Also, if the sequence of diffs applies cleanly, I think that implies that the conflict could be simplified to unconflicted text to begin with.

@yuja
Copy link
Contributor

yuja commented Aug 14, 2024

In particular, I didn't understand what you mean by "diffing file values all at once",

files::merge(single_hunk) diffs contents from all Merge<_> values, and materialize_merge_result() shows them inline, I just thought we could process conflict diffs in a similar way. I assumed this PR is to pre-process conflicts. My idea is doing that later (at hunk level, without reordering based on distance?) I think it's closer to Martin's idea.

BTW, I don't know if we prefer file -> hunk -> conflict -> diff_lines order over file -> conflict -> hunk -> diff_lines when rendering diffs.

@ilyagr
Copy link
Contributor Author

ilyagr commented Aug 14, 2024

files::merge(single_hunk) diffs contents from all Merge<_> values, and materialize_merge_result() shows them inline, I just thought we could process conflict diffs in a similar way.

I think that's what I'm doing. If not, I didn't understand what you mean.

I assumed this PR is to preprocess conflicts.

Maybe the ordering of commits is confusing, this PR actually mostly post-processes conflicts of hunks (after the file conflicts are turned into a sequence of hunk conflicts, for each hunk, though I'm no longer sure we mean the same thing by the word "hunk"). The distance is taken into account when processing these hunks, very similarly to how materialize_merge_conflict takes the distance into account when materializing conflicts.

Aside: The reason for the confusing order is that I wanted to look over the commits that split files into conflicted hunks again (what I called "Part 2" in the outline in #4251; I just edited it to try to make it clearer what that is). I tried to de-duplicate the functionality between conflict diffs and conflict materialization, but as a result the former will be less optimized.

BTW, I don't know if we prefer file -> hunk -> conflict -> diff_lines order over file -> conflict -> hunk -> diff_lines when rendering diffs.

What's the difference? In my terminology, a "conflict" that this code deals with is essentially a Merge<hunk> (and the hunk is unconflicted if the Merge is resolved).

Perhaps what you are suggesting that my conflict hunks can get pretty long, and that one of them could be split into several hunks. In my language, this would replace one conflicted hunk with a sequence of two (or more) conflicted hunks that could then be simplified and presented independently. Some of these pieces perhaps would even no longer be conflicted after that simplification.

If so, I think this would be great, and in my mind the place for this is inside the algorithm that splits the original single-hunk conflict (the conflict where each side is as long as the entire file) into a sequence of conflicted hunks. Currently, that happens in later commits outside this PR (again, what I called "Part 2"). I do this very naively (it's really a placeholder algorithm for a proof of concept), but I do do it. I was hoping to postpone thinking about improving the algorithm (which could involve some non-trivial computer science, but there could also be some low-effort improvements) until later PRs.

without accounting for distance

Taking distance into account seems significant for blame/absorb: that's how you tell, in a commit that resolves a conflict, which side of the conflict a line in the resolution comes from. Maybe there is another way of doing things, but it's not clear to me yet what that would be.

@yuja
Copy link
Contributor

yuja commented Aug 14, 2024

Maybe the ordering of commits is confusing, this PR actually mostly post-processes conflicts of hunks (after the file conflicts are turned into a sequence of hunk conflicts, for each hunk,

Maybe I misunderstood that. Thanks for the clarification. I thought the added function would process Merge<TreeValue> or something.

So the difference is whether or not reorder (and classify?) hunks, I think.

BTW, I don't know if we prefer file -> hunk -> conflict -> diff_lines order over file -> conflict -> hunk -> diff_lines when rendering diffs.

What's the difference?

I mean the former is

// file -> hunk -> conflict -> diff_lines
for file in tree_diff {
    show_file_header(file);
    for hunk in Diff::by_line([from, to, from, to, ...]) {
        match hunk {
            Different([from, to, from, to, ...]) => {
                for (from, to) in [...].chunks(2) {
                    show_color_words_hunk(from, to)
                }
            }
        }
    }
}

and the latter is

// file -> conflict -> hunk -> diff_lines
for file in tree_diff {
    for (from, to) in [from, to, from, to, ...].chunks(2) {
        show_file_header(file);
        for hunk in Diff::by_line([from, to]) {
            match hunk {
                for (from, to) in [...] {
                    show_color_words_hunk(from, to)
                }
            }
        }
    }
}

Implementation-wise, the latter is simple, but we'll probably want the former?

without accounting for distance

Taking distance into account seems significant for blame/absorb: that's how you tell, in a commit that resolves a conflict, which side of the conflict a line in the resolution comes from. Maybe there is another way of doing things, but it's not clear to me yet what that would be.

I have no idea about absorb. For blame, I might want a predictable result that doesn't shuffle diff pairs, but I'm not sure. Maybe I wouldn't care much about the quality of annotation at conflicted commit so long as the originating commits are reachable from the context commit.

@martinvonz
Copy link
Member

Implementation-wise, the latter is simple, but we'll probably want the former?

Yes, my example from #4259 (comment) was to illustrate that it's always correct (but not necessarily intuitive) to display diffs involving conflicts as repeated diffs. For --git format, that's perhaps the best we can do, but it depends on how much we want to abuse the format (in different ways either way). For --color-words, we can probably do something more intuitive. (And, again, maybe we can for --git too.)

Taking distance into account seems significant for blame/absorb: that's how you tell, in a commit that resolves a conflict, which side of the conflict a line in the resolution comes from. Maybe there is another way of doing things, but it's not clear to me yet what that would be.

I think the algorithm in #4176 does not require measuring edit distances. It basically just keeps track of where each line came from by walking backwards through history until the line doesn't exist in the parents. So I think that handles the case where you resolve a conflict in a merge commit.

There's also the case of calculating blame across linear commits where some commits have conflicts. I'm not sure what makes most sense to do there. Maybe we consider a line as coming from a parent with a conflict if it appears in any side (positive term) of the conflict.

@ilyagr
Copy link
Contributor Author

ilyagr commented Aug 18, 2024

(Re reordering)

I don't have a final conclusion, but I just figured out a theoretical limitation on forbidding reordering. It seems to forbid too much, see #4259 (comment).

In addition, I'll mention more explicitly the main example that floats in my mind about reordering:

Let's say we have a commit X with conflict A + (B-C) and the user resolves it to just B. So, our commit tree is ... -> X -> B. Then,

  • The diff between B and X is annoying to have to present as (B-A) and -(B-C), as opposed to just -(A-C). Here, the notation is from Explaining diffs of Merge<T> (structured conflicts part 1) #4259 (comment).
  • For blame, it's nice if we could attribute the lines from B to the old B side of the conflict. Then, for example, if X were a merge, we could blame the correct parent for the line.
  • This still seems to apply if the user's resolution is B' that's really similar to but not quite equal to B. This is where something like the diff distance becomes relevant.

Copy link
Contributor Author

@ilyagr ilyagr Aug 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Creating a thread for @yuja 's comment #4259 (comment), copied below, so that it doesn't get lost:


FWIW, I guess it would be easier to follow if conflicts aren't reordered per hunk.

For example, from: A, to: B+(D-C) could be rendered as follows:

# line numbers
#         contents
A B: C D: .. context line ..
A B:    : .. hunk of [A, B] ..   # conflict
   : C D: .. hunk of [C, D] ..
A B: C D: .. context line ..
A B:    : .. hunk of [A, B] ..   # non-conflicting change
    ...
A B: C D: .. context line ..
A B:    : .. hunk of [A, B] ..   # another conflict (in the same order)
   : C D: .. hunk of [C, D] ..
A B: C D: .. context line ..
...

@ilyagr ilyagr force-pushed the expldiff branch 2 times, most recently from 96ca089 to b360d62 Compare August 23, 2024 01:38
@ilyagr ilyagr marked this pull request as draft September 2, 2024 01:53
@ilyagr ilyagr force-pushed the expldiff branch 3 times, most recently from c276599 to eddc8ee Compare September 5, 2024 22:19
@ilyagr ilyagr force-pushed the expldiff branch 2 times, most recently from 576b347 to 58d9498 Compare September 28, 2024 04:08
@ilyagr ilyagr force-pushed the expldiff branch 3 times, most recently from 877d6a0 to 9ce8a45 Compare October 14, 2024 03:18
@ilyagr ilyagr force-pushed the expldiff branch 2 times, most recently from caf1d84 to a545479 Compare November 16, 2024 02:00
These will be used for diffs of Merge-s.

This also refactors one of the less important tests. I did it mostly automatically with `ast-grep` (which I don't necessarily recommend), but we could also just delete this test I think.
This reorders the result slightly and makes it easier to generalize this to work on diffs of merges (where there is the same number of adds and removes).
This represents a diff of two merges, up to reordering and cancelling
out of terms. It can also be thought of as a stack of `n` diffs.

This type is not very suitable for presenting to the user (since it
loses information they might want to see), but is very natural from the
perspective of conflict theory.
…tion

For textual conflicts, the distance function will be the size of the diff between two chunks of text.
…ture

This turns two Merge-s into an object that represents a way to present their difference to the user, in the form of `Vec<DiffExplanationAtom>`.

`DiffExplanationAtom` is the key structure for the data model of differences between diffs.

The algorithm is inefficient and far from perfect, for now my focus is on the data model.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants