Merge branch 'newren:main' into patch-1
dalito authored Aug 17, 2024
2 parents e268767 + ac50405 commit 42d9976
Showing 21 changed files with 802 additions and 297 deletions.
7 changes: 7 additions & 0 deletions .github/dependabot.yml
@@ -0,0 +1,7 @@
+ ---
+ version: 2
+ updates:
+ - package-ecosystem: "github-actions"
+ directory: "/"
+ schedule:
+ interval: "monthly"
6 changes: 3 additions & 3 deletions .github/workflows/test.yml
@@ -10,9 +10,9 @@ jobs:
  fail-fast: false
  runs-on: ${{ matrix.os }}-latest
  steps:
- - uses: actions/checkout@v2
+ - uses: actions/checkout@v4
  - name: Setup python
- uses: actions/setup-python@v2
+ uses: actions/setup-python@v5
  with:
  python-version: 3.x
  - name: test
@@ -33,7 +33,7 @@ jobs:
  fi
  - name: upload failed tests' directories
  if: failure()
- uses: actions/upload-artifact@v1
+ uses: actions/upload-artifact@v4
  with:
  name: failed-${{ matrix.os }}
  path: failed
2 changes: 1 addition & 1 deletion Documentation/Contributing.md
@@ -34,7 +34,7 @@ with a few exceptions:
2019 and should not be used. (Commit 4d0264ab723c
("filter-repo: workaround python<2.7.9 exec bug", 2019-04-30)
was the last version of filter-repo that worked with python2).
- * You can depend on anything in python 3.5 or earlier. I may bump
+ * You can depend on anything in python 3.6 or earlier. I may bump
this minimum version over time, but do want to generally work
with the python3 version found in current enterprise Linux
distributions.
11 changes: 3 additions & 8 deletions Documentation/converting-from-filter-branch.md
@@ -165,16 +165,11 @@ such, filter-repo just uses the same mechanism:

```shell
git replace --graft $commit-id $graft-id
- git filter-repo --force
+ git filter-repo --proceed
```

- NOTE: --force should usually be avoided unless you have taken care to
- make sure you have a backup (or are running on a fresh clone of) your
- repo. It is needed in this case because filter-repo errors out when
- no arguments are specified, and because it usually first checks
- whether you are in a fresh clone before irrecoverably rewriting your
- repository (git-replace created a new graft and thus added something
- to your previously fresh clone).
+ NOTE: --proceed is needed here because filter-repo errors out if no
+ arguments are specified (doing so is usually an error).

### Removing commits by a certain author

198 changes: 130 additions & 68 deletions Documentation/git-filter-repo.txt
@@ -54,13 +54,16 @@ can be overridden, but they are all on by default):
* pruning commits which become empty due to the above filters (also
handles edge cases like pruning of merge commits which become
degenerate and empty)
+ * stripping of original history to avoid mixing old and new history
+ * repacking the repository post-rewrite to shrink the repo for the
+ user
+
+ And additional facilities are available via a config option
+
* creating replace-refs (see linkgit:git-replace[1]) for old commit
hashes, which if manually pushed and fetched will allow users to
continue to refer to new commits using (unabbreviated) old commit
IDs
- * stripping of original history to avoid mixing old and new history
- * repacking the repository post-rewrite to shrink the repo for the
- user

Also, it's worth noting that there is an important safety mechanism:

@@ -218,17 +221,26 @@ Filtering of names & emails (see also --name-callback and --email-callback)
Parent rewriting
~~~~~~~~~~~~~~~~

- --replace-refs {delete-no-add, delete-and-add, update-no-add, update-or-add, update-and-add}::
- Replace refs (see linkgit:git-replace[1]) are used to rewrite
- parents (unless turned off by the usual git mechanism); this
- flag specifies what do do with those refs afterward. Replace
- refs can either be deleted or updated to point at new commit
- hashes. Also, new replace refs can be added for each commit
- rewrite. With 'update-or-add', new replace refs are only
- added for commit rewrites that aren't used to update an
- existing replace ref. default is 'update-and-add' if
- $GIT_DIR/filter-repo/already_ran does not exist;
- 'update-or-add' otherwise.
+ --replace-refs {delete-no-add, delete-and-add, update-no-add, update-or-add, update-and-add, old-default}::
+ How to handle replace refs (see git-replace(1)). Replace refs
+ can be added during the history rewrite as a way to allow
+ users to pass old commit IDs (from before git-filter-repo was
+ run) to git commands and have git know how to translate those
+ old commit IDs to the new (post-rewrite) commit IDs. Also,
+ replace refs that existed before the rewrite can either be
+ deleted or updated. The choices to pass to --replace-refs
+ thus need to specify both what to do with existing refs and
+ what to do with commit rewrites. Thus 'update-and-add' means
+ to update existing replace refs, and for any commit rewrite
+ (even if already pointed at by a replace ref) add a new
+ refs/replace/ reference to map from the old commit ID to the
+ new commit ID. The default is update-no-add, meaning update
+ existing replace refs but do not add any new ones. There is
+ also a special 'old-default' option for picking the default
+ used in versions prior to git-filter-repo-2.45, namely
+ 'update-and-add' upon the first run of git-filter-repo in a
+ repository and 'update-or-add' if running git-filter-repo
+ again on a repository.
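
Editor's illustration (not part of the commit): each `--replace-refs` choice combines two decisions, one for replace refs that already exist and one for commits that get rewritten. The shell sketch below tabulates that reading of the option text. `describe_replace_refs` is a hypothetical helper, not something git-filter-repo provides, and the `existing=`/`new=` wording is an assumption made for the illustration.

```shell
#!/bin/sh
# Hypothetical helper (NOT part of git-filter-repo): summarize what each
# --replace-refs choice does, following the option description above.
#   existing = what happens to replace refs present before the rewrite
#   new      = which rewritten commits get a fresh refs/replace/ entry
describe_replace_refs() {
    case "$1" in
        delete-no-add)  echo "existing=delete new=none" ;;
        delete-and-add) echo "existing=delete new=all" ;;
        update-no-add)  echo "existing=update new=none" ;;   # new default
        update-or-add)  echo "existing=update new=only-if-not-already-covered" ;;
        update-and-add) echo "existing=update new=all" ;;
        old-default)    echo "update-and-add on first run, update-or-add on later runs" ;;
        *)              echo "unknown choice: $1" >&2; return 1 ;;
    esac
}

# Show the behaviour of the new default.
describe_replace_refs update-no-add
```

To get the pre-2.45 behaviour back on a real rewrite, one would pass `--replace-refs old-default` alongside whatever filtering options are in use.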

--prune-empty {always, auto, never}::
Whether to prune empty commits. 'auto' (the default) means
@@ -288,10 +300,10 @@ Generic callback code snippets
Location to filter from/to
~~~~~~~~~~~~~~~~~~~~~~~~~~

- NOTE: Specifying alternate source or target locations implies --partial
- except that the normal default for --replace-refs is used. However, unlike
- normal uses of --partial, this doesn't risk mixing old and new history
- since the old and new histories are in different repositories.
+ NOTE: Specifying alternate source or target locations implies
+ --partial. However, unlike normal uses of --partial, this doesn't
+ risk mixing old and new history since the old and new histories are in
+ different repositories.

--source <source>::
Git repository to read from
@@ -317,8 +329,7 @@ Miscellaneous options

--partial::
Do a partial history rewrite, resulting in the mixture of old and
- new history. This implies a default of update-no-add for
- --replace-refs, disables rewriting refs/remotes/origin/* to
+ new history. This disables rewriting refs/remotes/origin/* to
refs/heads/*, disables removing of the 'origin' remote, disables
removing unexported refs, disables expiring the reflog, and
disables the automatic post-filter gc. Also, this modifies
@@ -527,11 +538,13 @@ history rewrite are roughly as follows:
they have to clone a new URL.

* Rewriting history will rewrite tags; those who have already
- downloaded tags will not get the updated tags by default (see the
- "On Re-tagging" section of linkgit:git-tag[1]). Every user
- trying to use an existing clone will have to forcibly delete all
- tags and re-fetch them; it may be easier for them to just
- re-clone, which they are more likely to do with a new clone URL.
+ downloaded tags will not get the updated tags by default.
+ Further, they won't get the updated tags even if they specify
+ `--tags` to `git fetch` or `git pull` (see the "On Re-tagging"
+ section of linkgit:git-tag[1]). Every user trying to use an
+ existing clone will have to forcibly delete all tags _before_
+ re-fetching them; it may be easier for them to just re-clone,
+ which they are more likely to do with a new clone URL.

* Rewriting history may delete some refs (e.g. branches that only
had files that you wanted excised from history); unless you run
@@ -544,61 +557,63 @@ history rewrite are roughly as follows:
`--prune` option as well. Simply re-cloning from a new URL is
easier.

- * The server may not allow you to force push over some refs.
- For example, code review systems may have special ref
- namespaces (e.g. refs/changes/, refs/pull/,
- refs/merge-requests/) that they have locked down.
+ * The server may not allow you to force push over some refs. For
+ example, code review systems may have special ref namespaces
+ (e.g. refs/changes/, refs/pull/, refs/merge-requests/) that they
+ have locked down, and you'll need to somehow prevent users from
+ merging those locked-down (and thus not cleaned up) histories
+ with your cleaned-up history. Every software code review system
+ handles this differently (see below for some links).
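
Editor's illustration (not part of the commit): the tag bullet in this list says users of an existing clone must delete their local tags before re-fetching. A minimal sketch of that order of operations follows. `plan_tag_refresh` is a hypothetical helper that only prints the `git tag -d` and `git fetch --tags` commands and runs nothing, and the remote name `origin` is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: read tag names on stdin and PRINT (not run) the
# commands that delete each local tag and then re-fetch tags from a remote.
# Deleting first matters because, per the text above, `git fetch --tags`
# does not replace tags that already exist locally.
plan_tag_refresh() {
    remote=${1:-origin}          # assumed remote name; override as needed
    while IFS= read -r tag; do
        if [ -n "$tag" ]; then
            printf 'git tag -d %s\n' "$tag"
        fi
    done
    printf 'git fetch --tags %s\n' "$remote"
}

# In a real clone the input would come from `git tag -l`; sample input here.
printf 'v1.0\nv1.1\n' | plan_tag_refresh origin
```

Printing the plan rather than executing it lets a user review the deletions first; piping the output to `sh` inside the affected clone would then carry them out.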

5. If you still want to push your rewritten history back to the
original url despite my warnings above, you'll have to manage it
very carefully:

* git-filter-repo deletes the "origin" remote to help avoid people
accidentally repushing to the same repository, so you'll need to
- remind git what origin's url was. You'll have to look up the
- command for that.
+ remind git what origin's url was.

* You'll need to carefully synchronize with *everyone* who has
- cloned the repository, and will also need to carefully
- synchronize with *everything* (e.g. CI systems) that has cloned
- it. Every single clone will either need to be thrown away and
- re-cloned, or need to take all the steps outlined in item 4 as
- well as follow the necessary steps from "RECOVERING FROM UPSTREAM
- REBASE" section of linkgit:git-rebase[1]. If you miss fixing any
- clones, you'll risk mixing old and new history and end up with an
- even worse mess to clean up.
+ cloned the repository (including forks on various software forges
+ and clones thereof), and will also need to carefully synchronize
+ with *everything* (e.g. CI systems) that has cloned it. Every
+ single clone will either need to be thrown away and re-cloned, or
+ need to take all the steps outlined in item 4 as well as follow
+ the necessary steps from "RECOVERING FROM UPSTREAM REBASE"
+ section of linkgit:git-rebase[1]. If you miss fixing any clones,
+ you'll risk mixing old and new history and end up with an even
+ worse mess to clean up.

* Finally, you'll need to consult any documentation from your
hosting provider about how to remove any server-side references
to the old commits (example:
https://docs.gitlab.com/ee/user/project/repository/reducing_the_repo_size_using_git.html[GitLab's
- excellent docs on reducing repository size], or just the warning
- box that references "GitHub support" from
- https://docs.github.com/en/github/authenticating-to-github/removing-sensitive-data-from-a-repository[GitHub's
- otherwise dangerously out-of-date docs on removing sensitive
- data]).
+ docs on reducing repository size], or
+ https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/removing-sensitive-data-from-a-repository#fully-removing-the-data-from-github[the
+ first and second steps under "Fully removing the data from
+ GitHub"]).

6. (Optional) Some additional considerations

- * filter-repo by default creates replace refs (see
- linkgit:git-replace[1]) for each rewritten commit ID, allowing
- you to use old (unabbreviated) commit hashes in the git command
- line to refer to the newly rewritten commits. If you want to use
- these replace refs, manually push them to the relevant clone URL
- and tell users to manually fetch them (e.g. by adjusting their
- fetch refspec, `git config --add remote.origin.fetch
- +refs/replace/*:refs/replace/*`). Sadly, replace refs are not
- yet widely understood; projects like jgit and libgit2 do not
- support them and existing repository managers (e.g. Gerrit,
- GitHub, GitLab) do not yet understand replace refs. Thus one
- can't use old commit hashes within the UI of these other systems.
- This may change in the future, but replace refs at least help
- users locally within the git command line interface. Also, be
- aware that commit-graphs are excessively cautious around replace
- refs and just turn off entirely if any are present, so after
- enough time has passed that old commit IDs become less relevant,
- users may want to locally delete the replace refs to regain the
- speedups from commit-graphs.
+ * filter-repo has a --replace-refs option to allow creating replace
+ refs (see linkgit:git-replace[1]) for each rewritten commit ID,
+ allowing you to use old (unabbreviated) commit hashes in the git
+ command line to refer to the newly rewritten commits. If you
+ want to use these replace refs, manually push them to the
+ relevant clone URL and tell users to manually fetch them (e.g. by
+ adjusting their fetch refspec, `git config --add
+ remote.origin.fetch +refs/replace/*:refs/replace/*`). Sadly,
+ replace refs are not yet widely understood; projects like jgit
+ and libgit2 do not support them and existing repository managers
+ (e.g. Gerrit, GitHub, GitLab) do not yet understand replace refs.
+ Thus one can't use old commit hashes within the UI of these other
+ systems. This may change in the future, but replace refs at
+ least help users locally within the git command line interface.
+ Also, be aware that commit-graphs are excessively cautious around
+ replace refs and just turn off entirely if any are present, so
+ after enough time has passed that old commit IDs become less
+ relevant, users may want to locally delete the replace refs to
+ regain the speedups from commit-graphs.

* If you have a central repo, you may want to prevent people
from pushing old commit IDs, in order to avoid mixing old
@@ -607,6 +622,51 @@ history rewrite are roughly as follows:
(e.g. https://gerrit-review.googlesource.com/Documentation/cmd-ban-commit.html),
others require you to write hooks.
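
Editor's illustration (not part of the commit): the replace-refs bullet in this list stresses that only unabbreviated old commit IDs can be used. A replace ref is stored under `refs/replace/` and named after the full hex ID of the object it replaces. The sketch below makes that naming concrete. `replace_ref_for` is a hypothetical helper, and accepting 40 or 64 hex digits (SHA-1 or SHA-256 repositories) is an assumption of the illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the replace ref name for an OLD commit ID,
# rejecting anything that is not a full lowercase hex object ID.
replace_ref_for() {
    old_id=$1
    case "$old_id" in
        ""|*[!0-9a-f]*)
            echo "not a lowercase hex object ID: $old_id" >&2
            return 1 ;;
    esac
    # 40 hex digits for SHA-1 repositories, 64 for SHA-256 ones.
    case "${#old_id}" in
        40|64) printf 'refs/replace/%s\n' "$old_id" ;;
        *)     echo "abbreviated ID, need the full hash: $old_id" >&2
               return 1 ;;
    esac
}

# A made-up full-length ID, used only to show the output format.
replace_ref_for 0123456789abcdef0123456789abcdef01234567
```

The `+refs/replace/*:refs/replace/*` fetch refspec quoted in the bullet is what lets other clones receive refs with these names.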

+ Why is my origin removed?
+ ~~~~~~~~~~~~~~~~~~~~~~~~~
+
+ When you rewrite history, all commit IDs (starting with the first one
+ where changes are made) are modified. Even if you think you didn't
+ change an intermediate commit, the fact that you changed any of its
+ ancestors is also a change that counts and will cause a commit's ID to
+ change as well. It is unfortunately all-too-easy for yourself or
+ someone else to accidentally merge the old ugly history you were
+ trying to rewrite with the new history, resulting in not only the old
+ ugly history returning but getting you "two copies" of each commit
+ (both an original commit and a cleaned-up alternative), and thus
+ doubling the number of commits in your repository. In short, you end
+ up with an even bigger mess to clean up than you started with.
+
+ This happens frequently to people using `git filter-branch` or `BFG
+ repo cleaner`, and can happen to folks using `git filter-repo` if they
+ insist on pushing back to the original repo. Example ways you can get
+ such an even uglier history include:
+
+ * at the command line (of another clone of the same repo from before the
+ cleanup): "git pull && git push"
+ * in a software forge: "reopen old Pull-Request/Merge-Request/Code-Review
+ and hit the merge/submit button"
+
+ Removing the `origin` remote and suggesting people push to a new repo
+ (and ensuring they tell others to clone the new repo) is usually a
+ good forcing function to avoid these problems. But, if people really
+ want to push to the original repository despite these warnings, it is
+ trivial to do so; simply run:
+
+ * `git remote add origin $ORIGINAL_CLONE_URL`
+
+ and then you can push (e.g. `git push --force --branches --tags
+ --prune`). Since removing the origin url is such a cheap way to
+ potentially prevent big messes, and it's so easy to work around for
+ those that really do want to push back over the original history,
+ removing the origin url is a great safety measure that I employ.
+
+ One final warning if you really want to push back to the original
+ repo: there are more details about the kinds of messes that pushing to
+ the original repo can lead to (and what you'd need to do to avoid
+ those messes) in items #4 and #5 earlier in this DISCUSSION section.
+ Please read those first.
+
[[EXAMPLES]]
EXAMPLES
--------
@@ -981,7 +1041,7 @@ rewrite history to make it permanent:

--------------------------------------------------
git replace $commit_A $commit_B
- git filter-repo --force
+ git filter-repo --proceed
--------------------------------------------------

To create a new commit with the same contents as $commit_A except with
@@ -990,12 +1050,14 @@ and rewrite history to make it permanent:

--------------------------------------------------
git replace --graft $commit_A $new_parent_or_parents
- git filter-repo --force
+ git filter-repo --proceed
--------------------------------------------------

- The reason to specify --force is two-fold: filter-repo will error out
- if no arguments are specified, and the new graft commit would
- otherwise trigger the not-a-fresh-clone check.
+ The `--proceed` option is needed to avoid failing the "no arguments
+ specified" check. Note that older versions of git-filter-repo
+ required `--force` to be passed after creating a graft to avoid
+ triggering the not-a-fresh-clone check; that check has been modified
+ to remove this overuse of `--force`.

Partial history rewrites
~~~~~~~~~~~~~~~~~~~~~~~~