Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

log: --remerge-diff needs to keep around commit parents #1825

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

dscho
Copy link
Member

@dscho dscho commented Nov 8, 2024

This fixes a bug that is pretty much as old as the remerge-diff machinery itself. I noticed it while writing my reply to Hannes Sixt's concerns about my range-diff --diff-merges=<mode> patch (https://lore.kernel.org/git/[email protected]/).

Changes since v1:

  • Amended the code comment of the affected conditional code block in harmony with the newly-added condition.

Cc: Johannes Sixt [email protected]
Cc: Elijah Newren [email protected]

@dscho dscho force-pushed the log-remerge-diff-needs-commit-parents branch from bd9d036 to 6823831 Compare November 8, 2024 11:37
@dscho
Copy link
Member Author

dscho commented Nov 8, 2024

/submit

Copy link

gitgitgadget bot commented Nov 8, 2024

Submitted as [email protected]

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-1825/dscho/log-remerge-diff-needs-commit-parents-v1

To fetch this version to local tag pr-1825/dscho/log-remerge-diff-needs-commit-parents-v1:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-1825/dscho/log-remerge-diff-needs-commit-parents-v1

Copy link

gitgitgadget bot commented Nov 8, 2024

On the Git mailing list, Elijah Newren wrote (reply to this):

On Fri, Nov 8, 2024 at 5:43 AM Johannes Schindelin via GitGitGadget
<[email protected]> wrote:
>
> From: Johannes Schindelin <[email protected]>
>
> To show a remerge diff, the merge needs to be recreated. For that to
> work, the merge base(s) need to be found, which means that the commits'
> parents have to be traversed until common ancestors are found (if any).
>
> However, one optimization that hails all the way back to
> cb115748ec0d (Some more memory leak avoidance, 2006-06-17) is to release
> the commit's list of parents immediately after showing it. This can break
> the merge base computation.
>
> Note that it matters more clearly when traversing the commits in
> reverse: In that instance, if a parent of a merge commit has been shown
> as part of the `git log` command, by the time the merge commit's diff
> needs to be computed, that parent commit's list of parent commits will
> have been set to `NULL` and as a result no merge base will be found.
>
> Let's fix this by special-casing the `remerge_diff` mode, similar to
> what we did with reflogs in f35650dff6a4 (log: do not free parents when
> walking reflog, 2017-07-07).
>
> Signed-off-by: Johannes Schindelin <[email protected]>
> ---
>     log: --remerge-diff needs to keep around commit parents
>
>     This fixes a bug that is pretty much as old as the remerge-diff
>     machinery itself. I noticed it while writing my reply to Hannes Sixt's
>     concerns about my range-diff --diff-merges=<mode> patch
>     (https://lore.kernel.org/git/[email protected]/).
>
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1825%2Fdscho%2Flog-remerge-diff-needs-commit-parents-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1825/dscho/log-remerge-diff-needs-commit-parents-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/1825
>
>  builtin/log.c           | 2 +-
>  t/t4069-remerge-diff.sh | 7 +++++++
>  2 files changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/builtin/log.c b/builtin/log.c
> index c0a8bb95e98..a297c6caf59 100644
> --- a/builtin/log.c
> +++ b/builtin/log.c
> @@ -522,7 +522,7 @@ static int cmd_log_walk_no_free(struct rev_info *rev)
>                          * but we didn't actually show the commit.
>                          */
>                         rev->max_count++;
> -               if (!rev->reflog_info) {
> +               if (!rev->reflog_info && !rev->remerge_diff) {
>                         /*
>                          * We may show a given commit multiple times when
>                          * walking the reflogs.

Should this comment be updated to reflect the expanded rationale for
this block's purpose?

> diff --git a/t/t4069-remerge-diff.sh b/t/t4069-remerge-diff.sh
> index 07323ebafe0..a68c6bfa036 100755
> --- a/t/t4069-remerge-diff.sh
> +++ b/t/t4069-remerge-diff.sh
> @@ -317,4 +317,11 @@ test_expect_success 'remerge-diff turns off history simplification' '
>         test_cmp expect actual
>  '
>
> +test_expect_success 'remerge-diff with --reverse' '
> +       git log -1 --remerge-diff --oneline ab_resolution^ >expect &&
> +       git log -1 --remerge-diff --oneline ab_resolution >>expect &&
> +       git log -2 --remerge-diff --oneline ab_resolution --reverse >actual &&
> +       test_cmp expect actual
> +'
> +
>  test_done
>
> base-commit: bea9ecd24b0c3bf06cab4a851694fe09e7e51408
> --
> gitgitgadget

Wow, nice find, and I appreciate the nice simple testcase
demonstrating what had been the problem.

Copy link

gitgitgadget bot commented Nov 11, 2024

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Johannes Schindelin via GitGitGadget" <[email protected]>
writes:

> From: Johannes Schindelin <[email protected]>
>
> To show a remerge diff, the merge needs to be recreated. For that to
> work, the merge base(s) need to be found, which means that the commits'
> parents have to be traversed until common ancestors are found (if any).
>
> However, one optimization that hails all the way back to
> cb115748ec0d (Some more memory leak avoidance, 2006-06-17) is to release
> the commit's list of parents immediately after showing it. This can break
> the merge base computation.

> Note that it matters more clearly when traversing the commits in
> reverse: In that instance, if a parent of a merge commit has been shown
> as part of the `git log` command, by the time the merge commit's diff
> needs to be computed, that parent commit's list of parent commits will
> have been set to `NULL` and as a result no merge base will be found.

Ouch.

I am curious about "more clearly" in the above, though.  Do we also
lose parents before a base can be computed during a forward
traversal?  Unlike the reflog walk, we won't show the same commit
twice, so unless we are traversing some of the history of our
parents *and* showing these ancestors found there _before_ we need
to show the merge in question, I do not think the issue would
surface during a forward traversal.  So the problematic case is when
we are not doing topological display, and traversing down to a
commit with skewed timestamp causes it to be shown (and lose its
parents due to memory optimization) because it appears much newer
than our merge, and there is an ancestry chain leading to it that
passes commits other than our merge, or something?

> Let's fix this by special-casing the `remerge_diff` mode, similar to
> what we did with reflogs in f35650dff6a4 (log: do not free parents when
> walking reflog, 2017-07-07).

OK.

> diff --git a/builtin/log.c b/builtin/log.c
> index c0a8bb95e98..a297c6caf59 100644
> --- a/builtin/log.c
> +++ b/builtin/log.c
> @@ -522,7 +522,7 @@ static int cmd_log_walk_no_free(struct rev_info *rev)
>  			 * but we didn't actually show the commit.
>  			 */
>  			rev->max_count++;
> -		if (!rev->reflog_info) {
> +		if (!rev->reflog_info && !rev->remerge_diff) {
>  			/*
>  			 * We may show a given commit multiple times when
>  			 * walking the reflogs.

OK, that's trivial ;-)

> diff --git a/t/t4069-remerge-diff.sh b/t/t4069-remerge-diff.sh
> index 07323ebafe0..a68c6bfa036 100755
> --- a/t/t4069-remerge-diff.sh
> +++ b/t/t4069-remerge-diff.sh
> @@ -317,4 +317,11 @@ test_expect_success 'remerge-diff turns off history simplification' '
>  	test_cmp expect actual
>  '
>  
> +test_expect_success 'remerge-diff with --reverse' '
> +	git log -1 --remerge-diff --oneline ab_resolution^ >expect &&
> +	git log -1 --remerge-diff --oneline ab_resolution >>expect &&
> +	git log -2 --remerge-diff --oneline ab_resolution --reverse >actual &&
> +	test_cmp expect actual
> +'

This is the trivially reproducible "show in reverse" case.  Nice
finding.

Will queue.

Copy link

gitgitgadget bot commented Nov 11, 2024

On the Git mailing list, Junio C Hamano wrote (reply to this):

Elijah Newren <[email protected]> writes:

>> -               if (!rev->reflog_info) {
>> +               if (!rev->reflog_info && !rev->remerge_diff) {
>>                         /*
>>                          * We may show a given commit multiple times when
>>                          * walking the reflogs.
>
> Should this comment be updated to reflect the expanded rationale for
> this block's purpose?

Ah, we probably should.  Especially if the newly discovered failure
mode is not due to having to show the same commit multiple times.

    During reflog walk, we may show the same commit more than once,
    so do not discard the parents.  There may be a merge commit that
    is descendent from the current commit, and in order to show it with
    remerge-diff enabled, we need its ancestry chain, including our
    parents, to compute the merge base.

or something?  It is still not clear to me in what scenario a merge
commit, in a forward traversal, is shown _after_ one of its ancestor
commit is shown (and due to the memory optimization, loses its
parents).

Thanks, both.

Copy link

gitgitgadget bot commented Nov 11, 2024

This patch series was integrated into seen via git@cd7e6ff.

@gitgitgadget gitgitgadget bot added the seen label Nov 11, 2024
To show a remerge diff, the merge needs to be recreated. For that to
work, the merge base(s) need to be found, which means that the commits'
parents have to be traversed until common ancestors are found (if any).

However, one optimization that hails all the way back to
cb11574 (Some more memory leak avoidance, 2006-06-17) is to release
the commit's list of parents immediately after showing it. This can break
the merge base computation.

Note that it matters more clearly when traversing the commits in
reverse: In that instance, if a parent of a merge commit has been shown
as part of the `git log` command, by the time the merge commit's diff
needs to be computed, that parent commit's list of parent commits will
have been set to `NULL` and as a result no merge base will be found.

Let's fix this by special-casing the `remerge_diff` mode, similar to
what we did with reflogs in f35650d (log: do not free parents when
walking reflog, 2017-07-07).

Signed-off-by: Johannes Schindelin <[email protected]>
@dscho dscho force-pushed the log-remerge-diff-needs-commit-parents branch from 6823831 to afff27f Compare November 11, 2024 17:24
@dscho
Copy link
Member Author

dscho commented Nov 11, 2024

/submit

Copy link

gitgitgadget bot commented Nov 11, 2024

Submitted as [email protected]

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-1825/dscho/log-remerge-diff-needs-commit-parents-v2

To fetch this version to local tag pr-1825/dscho/log-remerge-diff-needs-commit-parents-v2:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-1825/dscho/log-remerge-diff-needs-commit-parents-v2

Copy link

gitgitgadget bot commented Nov 11, 2024

On the Git mailing list, Johannes Schindelin wrote (reply to this):

Hi Junio,

On Mon, 11 Nov 2024, Junio C Hamano wrote:

> "Johannes Schindelin via GitGitGadget" <[email protected]>
> writes:
>
> > From: Johannes Schindelin <[email protected]>
> >
> > To show a remerge diff, the merge needs to be recreated. For that to
> > work, the merge base(s) need to be found, which means that the commits'
> > parents have to be traversed until common ancestors are found (if any).
> >
> > However, one optimization that hails all the way back to
> > cb115748ec0d (Some more memory leak avoidance, 2006-06-17) is to release
> > the commit's list of parents immediately after showing it. This can break
> > the merge base computation.
>
> > Note that it matters more clearly when traversing the commits in
> > reverse: In that instance, if a parent of a merge commit has been shown
> > as part of the `git log` command, by the time the merge commit's diff
> > needs to be computed, that parent commit's list of parent commits will
> > have been set to `NULL` and as a result no merge base will be found.
>
> Ouch.
>
> I am curious about "more clearly" in the above, though.

I consider these examples less clear, but they are still affected:

	git show --remerge-diff v2.45.2^0
	vs
	git show --remerge-diff v2.44.2^0 v2.45.2^0

	git show --remerge-diff v2.45.1~1
	vs
	git log --topo-order --first-parent --remerge-diff v2.44.2 v2.45.1

Concretely, these diffs should be empty, but are not:

	git diff --no-index \
		<(git show --remerge-diff v2.45.2^0 | sed 1d) \
		<(git show --remerge-diff v2.44.2^0 v2.45.2^0 | sed 1,/^commit/d)

	and

	git diff --no-index \
		<(git show --remerge-diff v2.45.1~1 | grep -v ^commit) \
		<(git log --topo-order --first-parent -6 --remerge-diff v2.44.2 v2.45.1 |
		  sed '1,/^commit 1c00f92eb5ee4a48ab615eefa41f2dd6024d43bc/d;/^commit/,$d')

No `--reverse` required, not even clock skew.

Ciao,
Johannes

Copy link

gitgitgadget bot commented Nov 11, 2024

On the Git mailing list, Junio C Hamano wrote (reply to this):

Johannes Schindelin <[email protected]> writes:

> I consider these examples less clear, but they are still affected:
>
> 	git show --remerge-diff v2.45.2^0
> 	vs
> 	git show --remerge-diff v2.44.2^0 v2.45.2^0
>
> 	git show --remerge-diff v2.45.1~1
> 	vs
> 	git log --topo-order --first-parent --remerge-diff v2.44.2 v2.45.1
>
> Concretely, these diffs should be empty, but are not:
>
> 	git diff --no-index \
> 		<(git show --remerge-diff v2.45.2^0 | sed 1d) \
> 		<(git show --remerge-diff v2.44.2^0 v2.45.2^0 | sed 1,/^commit/d)
>
> 	and
>
> 	git diff --no-index \
> 		<(git show --remerge-diff v2.45.1~1 | grep -v ^commit) \
> 		<(git log --topo-order --first-parent -6 --remerge-diff v2.44.2 v2.45.1 |
> 		  sed '1,/^commit 1c00f92eb5ee4a48ab615eefa41f2dd6024d43bc/d;/^commit/,$d')
>
> No `--reverse` required, not even clock skew.

As we often tell contributors, questions in reviewer comments should
be also answered in an updated patch, so that future readers of "git
log", who cannot ask direct questions to the author of the patch, do
not have to ask the same questions.  Can we add an explanation how
this affects forward traversal in a three-line paragraph?

Thanks.

Copy link

gitgitgadget bot commented Nov 12, 2024

This patch series was integrated into seen via git@70e0e2f.

Copy link

gitgitgadget bot commented Nov 13, 2024

This patch series was integrated into seen via git@0e49e1e.

Copy link

gitgitgadget bot commented Nov 15, 2024

This patch series was integrated into seen via git@e00edba.

Copy link

gitgitgadget bot commented Nov 16, 2024

This patch series was integrated into seen via git@ee5a48d.

Copy link

gitgitgadget bot commented Nov 18, 2024

This patch series was integrated into seen via git@7ca4a92.

Copy link

gitgitgadget bot commented Nov 19, 2024

This patch series was integrated into seen via git@9fb6743.

Copy link

gitgitgadget bot commented Nov 20, 2024

This patch series was integrated into seen via git@43be7c6.

Copy link

gitgitgadget bot commented Nov 21, 2024

This patch series was integrated into seen via git@d68487a.

Copy link

gitgitgadget bot commented Nov 22, 2024

This patch series was integrated into seen via git@7b178ed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant