pseudo-merge: implement support for selecting pseudo-merge commits
Teach the new pseudo-merge machinery how to select non-bitmapped commits
for inclusion in different pseudo-merge group(s) based on a handful of
criteria.

Note that the selected pseudo-merge commits aren't actually used or
written anywhere yet. This will be done in the following commit.

Signed-off-by: Taylor Blau <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
ttaylorr authored and gitster committed May 24, 2024
1 parent 5831f8a commit faf558b
Showing 7 changed files with 747 additions and 0 deletions.
2 changes: 2 additions & 0 deletions Documentation/config.txt
@@ -383,6 +383,8 @@ include::config/apply.txt[]

include::config/attr.txt[]

include::config/bitmap-pseudo-merge.txt[]

include::config/blame.txt[]

include::config/branch.txt[]
91 changes: 91 additions & 0 deletions Documentation/config/bitmap-pseudo-merge.txt
@@ -0,0 +1,91 @@
NOTE: The configuration options in `bitmapPseudoMerge.*` are considered
EXPERIMENTAL and may be subject to change or be removed entirely in the
future. For more information about the pseudo-merge bitmap feature, see
the "Pseudo-merge bitmaps" section of linkgit:gitpacking[7].

bitmapPseudoMerge.<name>.pattern::
Regular expression used to match reference names. Commits
pointed to by references matching this pattern (and meeting
the below criteria, like `bitmapPseudoMerge.<name>.sampleRate`
and `bitmapPseudoMerge.<name>.threshold`) will be considered
for inclusion in a pseudo-merge bitmap.
+
Commits are grouped into pseudo-merge groups based on whether or not
any reference(s) that point at a given commit match the pattern, which
is an extended regular expression.
+
Within a pseudo-merge group, commits may be further grouped into
sub-groups based on the capture groups in the pattern. Each sub-group
is identified by concatenating the text matched by every capture group
in the regular expression, with a '-' dash in between.
+
For example, if the pattern is `refs/tags/`, then all tags (provided
they meet the below criteria) will be considered candidates for the
same pseudo-merge group. However, if the pattern is instead
`refs/remotes/([0-9]+)/tags/`, then tags from different remotes will
be grouped into separate pseudo-merge groups, based on the remote
number.

bitmapPseudoMerge.<name>.decay::
Determines the rate at which consecutive pseudo-merge bitmap
groups decrease in size. Must be non-negative. This parameter
can be thought of as `k` in the function `f(n) = C * n^-k`,
where `f(n)` is the size of the `n`th group.
+
Setting the decay rate equal to `0` will cause all groups to be the
same size. Setting the decay rate equal to `1` will cause the `n`th
group to be `1/n` the size of the initial group. Higher values of the
decay rate cause consecutive groups to shrink at an increasing rate.
The default is `1`.
+
If all groups are the same size, groups containing newer commits may be
usable less often than earlier groups, since references pointing at
newer commits are more likely to be updated than references pointing at
old commits.
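
To make the decay formula concrete, here is a minimal sketch in C of `f(n) = C * n^-k`. The function name `pseudo_merge_group_size` is hypothetical and is not Git's implementation. The sketch also assumes a whole-number `k` so that it needs no math library, whereas the option itself is only documented as non-negative.

```c
#include <assert.h>

/*
 * Hypothetical helper (not part of Git): size of the n-th group under
 * f(n) = C * n^-k, where n is 1-based and k is restricted to whole
 * numbers for the sake of this sketch.
 */
double pseudo_merge_group_size(double c, unsigned int n, unsigned int k)
{
	double denom = 1.0;
	unsigned int i;

	assert(n >= 1);
	for (i = 0; i < k; i++)
		denom *= n;	/* accumulates n^k */
	return c / denom;
}
```

With `C = 120`, a decay of `0` keeps every group at 120 commits, a decay of `1` makes the third group 40 commits, and a decay of `2` makes the second group 30 commits.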

bitmapPseudoMerge.<name>.sampleRate::
Determines the proportion of non-bitmapped commits (among
reference tips) which are selected for inclusion in an
unstable pseudo-merge bitmap. Must be between `0` and `1`
(inclusive). The default is `1`.

bitmapPseudoMerge.<name>.threshold::
Determines the minimum age of non-bitmapped commits (among
reference tips, as above) which are candidates for inclusion
in an unstable pseudo-merge bitmap. The default is
`1.week.ago`.

bitmapPseudoMerge.<name>.maxMerges::
Determines the maximum number of pseudo-merge commits among
which commits may be distributed.
+
For pseudo-merge groups whose pattern does not contain any capture
groups, this setting is applied for all commits matching the regular
expression. For patterns that have one or more capture groups, this
setting is applied for each distinct capture group.
+
For example, if your pattern is `refs/tags/`, then this setting
will distribute all tags into a maximum of `maxMerges` pseudo-merge
commits. However, if your pattern is, say,
`refs/remotes/([0-9]+)/tags/`, then this setting will be applied to
each remote's set of tags individually.
+
Must be non-negative. The default value is 64.

bitmapPseudoMerge.<name>.stableThreshold::
Determines the minimum age of commits (among reference tips,
as above; however, stable commits are still considered
candidates even when they have been covered by a bitmap) which
are candidates for a stable pseudo-merge bitmap. The default
is `1.month.ago`.
+
Setting this threshold to a smaller value (e.g., 1.week.ago) will cause
more stable groups to be generated (which impose a one-time generation
cost) but those groups will likely become stale over time. Using a
larger value incurs the opposite penalty (fewer stable groups which are
more useful).

bitmapPseudoMerge.<name>.stableSize::
Determines the size (in number of commits) of a stable
pseudo-merge bitmap. The default is `512`.
83 changes: 83 additions & 0 deletions Documentation/gitpacking.txt
@@ -96,6 +96,89 @@ can take advantage of the fact that we only care about the union of
objects reachable from all of those tags, and answer the query much
faster.

=== Configuration

Reference tips are grouped into different pseudo-merge groups according
to two criteria: which of the defined pseudo-merge patterns a reference
name matches, and, optionally, the capture groups within that pattern,
which further partition the group.

Within a group, commits may be considered "stable" or "unstable"
depending on their age. The corresponding age cutoffs are adjusted by
setting the `bitmapPseudoMerge.<name>.stableThreshold` and
`bitmapPseudoMerge.<name>.threshold` configuration values, respectively.

All stable commits are grouped into pseudo-merges of equal size
(`bitmapPseudoMerge.<name>.stableSize`). If the `stableSize`
configuration is set to, say, 100, then the first 100 commits (ordered
by committer date) which are older than the `stableThreshold` value will
form one group, the next 100 commits will form another group, and so on.
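
The stable/unstable split and the fixed-size chunking described above can be sketched as follows. This is an illustration under stated assumptions, not Git's code. The names `classify_tip` and `stable_chunk_index` are hypothetical, both thresholds are assumed to have been resolved to cutoff timestamps already, and the inclusive comparisons are a guess. The rule that unstable candidates must lack a bitmap while stable ones need not is taken from the option descriptions.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

enum tip_kind { TIP_SKIPPED, TIP_UNSTABLE, TIP_STABLE };

/*
 * Hypothetical classifier: "older than a threshold" is modeled as
 * commit_date <= cutoff (inclusive comparison is an assumption).
 */
enum tip_kind classify_tip(time_t commit_date, int has_bitmap,
			   time_t threshold, time_t stable_threshold)
{
	if (commit_date <= stable_threshold)
		return TIP_STABLE;	/* eligible even when already bitmapped */
	if (commit_date <= threshold && !has_bitmap)
		return TIP_UNSTABLE;	/* old enough, and not yet bitmapped */
	return TIP_SKIPPED;
}

/*
 * Hypothetical helper: index of the stable group that receives the
 * commit at position pos (0-based, in committer-date order).
 */
size_t stable_chunk_index(size_t pos, size_t stable_size)
{
	assert(stable_size > 0);
	return pos / stable_size;
}
```

With `stableSize` set to 100, positions 0 through 99 land in chunk 0 and position 100 starts chunk 1. How a final, partially filled chunk is handled is not specified by the text above, so the sketch leaves that open.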

Among unstable commits, the pseudo-merge machinery will attempt to
combine older commits into large groups as opposed to newer commits
which will appear in smaller groups. This is based on the heuristic that
references whose tip commit is older are less likely to be modified to
point at a different commit than a reference whose tip commit is newer.

The size of groups is determined by a power-law decay function, and the
decay parameter corresponds to "k" in `f(n) = C * n^-k`, where `f(n)`
describes the size of the `n`-th pseudo-merge group. The sample rate
controls what proportion of eligible commits are considered as
candidates. The threshold parameter indicates the minimum age (so as to
avoid including too-recent commits in a pseudo-merge group, making it
less likely to be valid). The "maxMerges" parameter sets an upper bound
on the number of pseudo-merge commits an individual group may contain.

The "stable"-related parameters control "stable" pseudo-merge groups,
comprised of a fixed number of commits which are older than the
configured "stable threshold" value and may be grouped together in
chunks of "stableSize" in order of age.

The exact configuration for pseudo-merges is as follows:

include::config/bitmap-pseudo-merge.txt[]

=== Examples

Suppose that you have a repository with a large number of references,
and you want a bare-bones configuration of pseudo-merge bitmaps that
will enhance bitmap coverage of the `refs/` namespace. You may start
with a configuration like so:

[bitmapPseudoMerge "all"]
	pattern = "refs/"
	threshold = now
	stableThreshold = never
	sampleRate = 1
	maxMerges = 64

This will create pseudo-merge bitmaps for all references, regardless of
their age, and group them into at most 64 pseudo-merge commits.

If you wanted to separate tags from branches when generating
pseudo-merge commits, you would instead define the pattern with a
capture group, like so:

[bitmapPseudoMerge "all"]
	pattern = "refs/(heads|tags)/"

Suppose instead that you are working in a fork-network repository, with
each fork specified by some numeric ID, and whose refs reside in
`refs/virtual/NNN/` (where `NNN` is the numeric ID corresponding to some
fork) in the network. In this instance, you may instead write something
like:

[bitmapPseudoMerge "all"]
	pattern = "refs/virtual/([0-9]+)/(heads|tags)/"
	threshold = now
	stableThreshold = never
	sampleRate = 1
	maxMerges = 64

This would generate pseudo-merge group identifiers like "1234-heads"
and "5678-tags" (for branches in fork "1234" and tags in fork "5678",
respectively).
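
As a rough illustration of how such identifiers could be derived, the sketch below matches a reference name against an extended regular expression with POSIX `regcomp()`/`regexec()` and joins the captured text with '-'. The function `subgroup_key` and the `MAX_CAPTURES` limit are hypothetical. Whether Git anchors the match or builds the key exactly this way is an assumption, not something this commit's documentation states.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>
#include <string.h>

#define MAX_CAPTURES 8	/* arbitrary cap chosen for this sketch */

/*
 * Hypothetical helper: returns 1 and writes the '-'-joined capture
 * groups to out when refname matches pattern, and returns 0 otherwise.
 */
int subgroup_key(const char *pattern, const char *refname,
		 char *out, size_t outlen)
{
	regex_t re;
	regmatch_t m[MAX_CAPTURES + 1];	/* m[0] is the whole match */
	size_t used = 0, i;
	int matched;

	assert(pattern && refname && out && outlen > 0);
	out[0] = '\0';
	if (regcomp(&re, pattern, REG_EXTENDED))
		return 0;

	matched = !regexec(&re, refname, MAX_CAPTURES + 1, m, 0);
	for (i = 1; matched && i <= re.re_nsub && i <= MAX_CAPTURES; i++) {
		int len, n;

		if (m[i].rm_so < 0)
			continue;	/* this group took no part in the match */
		len = (int)(m[i].rm_eo - m[i].rm_so);
		n = snprintf(out + used, outlen - used, "%s%.*s",
			     used ? "-" : "", len, refname + m[i].rm_so);
		if (n < 0 || (size_t)n >= outlen - used)
			matched = 0;	/* key did not fit in the buffer */
		else
			used += (size_t)n;
	}
	regfree(&re);
	return matched;
}
```

Called with the fork-network pattern above and `refs/virtual/1234/heads/main`, the sketch yields the key `1234-heads`. A pattern without capture groups, such as `refs/tags/`, yields an empty key, so every match falls into a single group.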

SEE ALSO
--------
linkgit:git-pack-objects[1]
21 changes: 21 additions & 0 deletions pack-bitmap-write.c
@@ -17,6 +17,7 @@
#include "trace2.h"
#include "tree.h"
#include "tree-walk.h"
#include "pseudo-merge.h"

struct bitmapped_commit {
	struct commit *commit;
@@ -39,11 +40,25 @@ void bitmap_writer_init(struct bitmap_writer *writer, struct repository *r)
	if (writer->bitmaps)
		BUG("bitmap writer already initialized");
	writer->bitmaps = kh_init_oid_map();
	writer->pseudo_merge_commits = kh_init_oid_map();

	string_list_init_dup(&writer->pseudo_merge_groups);

	load_pseudo_merges_from_config(&writer->pseudo_merge_groups);
}

static void free_pseudo_merge_commit_idx(struct pseudo_merge_commit_idx *idx)
{
	if (!idx)
		return;
	free(idx->pseudo_merge);
	free(idx);
}

void bitmap_writer_free(struct bitmap_writer *writer)
{
	uint32_t i;
	struct pseudo_merge_commit_idx *idx;

	if (!writer)
		return;
@@ -55,6 +70,10 @@ void bitmap_writer_free(struct bitmap_writer *writer)

	kh_destroy_oid_map(writer->bitmaps);

	kh_foreach_value(writer->pseudo_merge_commits, idx,
			 free_pseudo_merge_commit_idx(idx));
	kh_destroy_oid_map(writer->pseudo_merge_commits);

	for (i = 0; i < writer->selected_nr; i++) {
		struct bitmapped_commit *bc = &writer->selected[i];
		if (bc->write_as != bc->bitmap)
@@ -703,6 +722,8 @@ void bitmap_writer_select_commits(struct bitmap_writer *writer,
	}

	stop_progress(&writer->progress);

	select_pseudo_merges(writer, indexed_commits, indexed_commits_nr);
}


2 changes: 2 additions & 0 deletions pack-bitmap.h
@@ -110,6 +110,8 @@ struct bitmap_writer {
	struct bitmapped_commit *selected;
	unsigned int selected_nr, selected_alloc;

	struct string_list pseudo_merge_groups;
	kh_oid_map_t *pseudo_merge_commits; /* oid -> pseudo merge(s) */
	uint32_t pseudo_merges_nr;

	struct progress *progress;