Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New git invocation changes the behaviour of source-repository-package #10605

Open
Kleidukos opened this issue Nov 28, 2024 · 27 comments
Open

New git invocation changes the behaviour of source-repository-package #10605

Kleidukos opened this issue Nov 28, 2024 · 27 comments

Comments

@Kleidukos
Copy link
Member

Building https://github.com/Kleidukos/get-tested/ used to work perfectly, with a source-repository-package that reads:

source-repository-package
  type: git
  location: https://github.com/goodlyrottenapple/tasty-test-reporter 
  tag: b704130

b704130.

However, with the latest cabal-head, I get this result:

❯ cabal build                          
Warning: this is a debug build of cabal-install with assertions enabled.
Created semaphore called cabal_semaphore_0 with 2 slots.
Cloning into '/home/hecate/Projects/get-tested/dist-newstyle/src/tasty-tes_-b16926c1aaf24d94e61a2dd23f73b43c646945927c2a595c777e30f60e9d2040'...
remote: Enumerating objects: 24, done.
remote: Counting objects: 100% (24/24), done.
remote: Compressing objects: 100% (18/18), done.
remote: Total 24 (delta 0), reused 17 (delta 0), pack-reused 0 (from 0)
Receiving objects: 100% (24/24), 230.02 KiB | 4.60 MiB/s, done.
fatal: couldn't find remote ref b704130

And with the long hash:

Warning: this is a debug build of cabal-install with assertions enabled.
Created semaphore called cabal_semaphore_0 with 2 slots.
From https://github.com/goodlyrottenapple/tasty-test-reporter
 * branch            b704130545aa3925a8487bd3e92f1dd5ce0512e2 -> FETCH_HEAD
warning: refname 'b704130545aa3925a8487bd3e92f1dd5ce0512e2' is ambiguous.
Git normally never creates a ref that ends with 40 hex characters
because it will be ignored when you just specify 40-hex. These refs
may be created by mistake. For example,

  git switch -c $br $(git rev-parse ...)

where "$br" is somehow empty and a 40-hex ref is created. Please
examine these refs and maybe delete them. Turn this message off by
running "git config advice.objectNameWarning false"
HEAD is now at b704130 add cabal file
Configuration is affected by the following files:
- cabal.project
Resolving dependencies...
Build profile: -w ghc-9.10.1 -O2
In order, the following will be built (use -v for more details):
 - ansi-terminal-1.1.2 (lib) (requires build)
 - cabal-doctest-1.0.11 (lib) (requires download & build)

@alt-romes I believe you're the last person to touch how cabal-install calls git. Could you please investigate this?

@alt-romes
Copy link
Collaborator

Indeed, this was unfortunately not caught by the testsuite. @9999years is fixing this in #10586

@Kleidukos
Copy link
Member Author

Excellent!

@9999years
Copy link
Collaborator

@Kleidukos Thank you, this is a useful test case!

@philderbeast
Copy link
Collaborator

@Kleidukos, I'm seeing this with https://github.com/Kleidukos/print-api too, and #10259, the API checking, relies on having that tool installed. @geekosaur I was looking to maybe have the makefile also install print-api. We don't yet have something like cabal run https://github.com/Kleidukos/print-api do we?

@9999years
Copy link
Collaborator

My bisect points at #10254 introducing the problem.

Output comparison

Before #10254, Cabal ran:

Running: git clone --no-checkout 'https://github.com/goodlyrottenapple/tasty-test-reporter' dist-newstyle/src/tasty-tes_-b16926c1aaf24d94e61a2dd23f73b43c646945927c2a595c777e30f60e9d2040
Cloning into 'dist-newstyle/src/tasty-tes_-b16926c1aaf24d94e61a2dd23f73b43c646945927c2a595c777e30f60e9d2040'...
remote: Enumerating objects: 143, done.
remote: Counting objects: 100% (62/62), done.
remote: Compressing objects: 100% (24/24), done.
remote: Total 143 (delta 43), reused 44 (delta 35), pack-reused 81 (from 1)
Receiving objects: 100% (143/143), 243.63 KiB | 9.37 MiB/s, done.
Resolving deltas: 100% (49/49), done.
Running: git submodule deinit --force --all
Running: git reset --hard b704130 --
HEAD is now at b704130 add cabal file
Running: git submodule sync --recursive
Running: git submodule update --force --init --recursive
Running: git submodule foreach --recursive 'git clean -ffxdq'
Running: git clean -ffxdq

After, Cabal ran:

Running: git clone '--depth=1' --no-checkout 'https://github.com/goodlyrottenapple/tasty-test-reporter' dist-newstyle/src/tasty-tes_-b16926c1aaf24d94e61a2dd23f73b43c646945927c2a595c777e30f60e9d2040
Cloning into 'dist-newstyle/src/tasty-tes_-b16926c1aaf24d94e61a2dd23f73b43c646945927c2a595c777e30f60e9d2040'...
remote: Enumerating objects: 24, done.
remote: Counting objects: 100% (24/24), done.
remote: Compressing objects: 100% (18/18), done.
remote: Total 24 (delta 0), reused 17 (delta 0), pack-reused 0 (from 0)
Receiving objects: 100% (24/24), 230.02 KiB | 1.37 MiB/s, done.
Running: git submodule deinit --force --all
Running: git fetch origin b704130
fatal: couldn't find remote ref b704130

I think what's happening here is:

  • The git clone --depth=1 means that the b704130 "tag" (not a tag!) may not be fetched, so
  • We have to run git fetch origin b704130 to fetch it, but
  • b704130 is not a valid refspec, so git fetch origin b704130 fails.

According to man git-fetch:

<src> is typically a ref, but it can also be a fully spelled hex object name.

Abbreviated refs are not allowed.

I have a couple thoughts on this:

  1. Personally, I don't think tag: b704130 should be accepted at all. Abbreviated commit hashes are not guaranteed to be unique, so this can shift out from under you! (Being able to specify a branch is also weird to me.)
  2. This is why systems like Nix separate the ref (which is git fetched) from the rev (which we git reset --hard to).
  3. We could have Cabal ignore if a git fetch ... fails, as long as it can git reset to the specified branch/tag, but that seems a bit sketchy.

@9999years
Copy link
Collaborator

I'll note that my PR #10586 doesn't fix this because it doesn't change the underlying git fetch command.

@Kleidukos
Copy link
Member Author

I too am partial to forbidding abbreviated commit hashes, but I'd like this to undergo a proper deprecation cycle.

If abbreviated hashes are too coupled to shallow clones, then let's revert the shallow cloning bits, and broadcast that in 3 major releases, we'll deprecate the use of abbreviated commit hashes in order to pull more efficiently from git sources.

@geekosaur
Copy link
Collaborator

I was surprised we actually allowed abbreviated hashes in the first place, because as mentioned previously they aren't necessarily unique. Our own repo is up to requiring 9 digits to be unique, per git describe (which shows a minimum of 7, hence the common use of 7-digit short hashes).

@9999years
Copy link
Collaborator

I am once again wishing for something like Rust's crater to let us attempt to build/test everything on Hackage with cabal-head.

@9999years
Copy link
Collaborator

If abbreviated hashes are too coupled to shallow clones, then let's revert the shallow cloning bits ...

I haven't checked, but it might be non-trivial to revert #10254 because of the work I've been doing to improve the Git support since then. We could attempt a roll-forward fix (reimplementing the non-shallow logic without reverting the specific commit), which is a little riskier but lets us keep the other improvements.

What do you think?

@mpickering
Copy link
Collaborator

I think #10254 fixed an important performance bug, it took a very long time to clone big git repos before this change, so please at least leave the option of good performance if there is some kind of revert.

3 releases seems like a very slow timeline to migrate, I think if there is a deprecation period it should be at most one release cycle. I do appreciate that things randomly breaking is annoying.

  • Either a good error message which explains you are no longer allowed to use branch names or references. This isn't documented as possible AFAIK, and as pointed out, not a good idea anyway as there references are not stable.
  • Or a deprecation over a single release cycle.

@9999years You can use head.hackage infrastructure to test building packages with cabal-head. That is what we did here when migrating packages to use the Hooks build type. (https://gitlab.haskell.org/mpickering/hooks-setup-testing/)

@Kleidukos
Copy link
Member Author

What do you think?

During yesterday's meeting we agreed to proceed forward with a fix that lets us keep the parallel cloning, and avoid the headache of reverting.

@Kleidukos
Copy link
Member Author

3 releases seems like a very slow timeline to migrate, I think if there is a deprecation period it should be at most one release cycle. I do appreciate that things randomly breaking is annoying.

Your input is heard, but three releases is the perfect amount of time to let projects who are not actively following the development of cabal to migrate without trouble. A huge criticism of the Haskell ecosystem is that it's hard to come to a project a year later with an updated toolchain, because so many things have to be changed. We have a draft process for deprecation (https://github.com/haskell/cabal/wiki/Deprecation-Process).

That being said, nothing prevents us to release experimental features behind options in the config file. This would allow us to release new features that provoke breaking changes in a way that is opt-in. @ffaf1 is studying how other ecosystems to do such a thing and will submit a draft proposal.

@philderbeast
Copy link
Collaborator

I'm with @mpickering on this. I sufferred long waits cloning https://github.com/brendanhay/amazonka. As for abbreviated hashes, why would you do that? Copying the whole hash is easier anyway, isn't it?

Tip

The navigation on GitHub at least makes it very easy to copy the last segment of the URL with the hash. They also provide a button to copy the whole hash when looking at the history.

https://github.com/brendanhay/amazonka/commit/30b200d487bccd2568de1257a812a464270d0096

@Kleidukos
Copy link
Member Author

Kleidukos commented Dec 6, 2024

@philderbeast As said above, the consensus is not to revert the parallel cloning but to fix the handling of hashes instead #10605 (comment)

To answer your question:

As for abbreviated hashes, why would you do that?

Because we allow it.

@philderbeast
Copy link
Collaborator

philderbeast commented Dec 6, 2024

To put numbers on the wait time for cloning amazonka, 78s versus 15s. This is a great improvement ❤️.

$ git diff
diff --git a/cabal.project b/cabal.project
index e368c280c..0973733ce 100644
--- a/cabal.project
+++ b/cabal.project
@@ -4,3 +4,11 @@ import: project-cabal/pkgs.config
 import: project-cabal/constraints.config
 
 tests: True
+
+source-repository-package
+    type: git
+    location: [email protected]:brendanhay/amazonka.git
+    tag: 30b200d487bccd2568de1257a812a464270d0096
+    subdir:
+        lib/amazonka-core
+        lib/services/amazonka-s3
$ ~/.ghcup/bin/cabal --numeric-version
3.12.1.0

$ time ~/.ghcup/bin/cabal build all --dry-run
Cloning into '/.../cabal/dist-newstyle/src/amazonka-c0ef6475e6367285'...
remote: Enumerating objects: 1013248, done.
remote: Counting objects: 100% (103137/103137), done.
remote: Compressing objects: 100% (7723/7723), done.
remote: Total 1013248 (delta 95696), reused 96960 (delta 95052), pack-reused 910111 (from 1)
Receiving objects: 100% (1013248/1013248), 1.25 GiB | 25.16 MiB/s, done.
Resolving deltas: 100% (853396/853396), done.
Updating files: 100% (62943/62943), done.
HEAD is now at 30b200d487 Merge pull request #1007 from kfigiela/patch-1
...
________________________________________________________
Executed in   77.76 secs    fish           external
   usr time  108.44 secs  411.00 micros  108.44 secs
   sys time    6.10 secs  140.00 micros    6.10 secs
$ git rev-parse HEAD
1f5296323f4c1d9869dcda9d418299f71eeb1258

$ cabal install cabal-install:exe:cabal --overwrite-policy=always

$ cabal --numeric-version
3.15.0.0

$ time cabal build all --dry-run
Warning: this is a debug build of cabal-install with assertions enabled.
Cloning into '/.../cabal/dist-newstyle/src/amazonka-cbab8c6b7ecbf67447929ec5b22d2f18f755d16032163c46ff74c158ef71a508'...
remote: Enumerating objects: 41238, done.
remote: Counting objects: 100% (41238/41238), done.
remote: Compressing objects: 100% (16150/16150), done.
Receiving objects: 100% (41238/41238), 44.26 MiB | 18.98 MiB/s, done.
remote: Total 41238 (delta 24838), reused 29966 (delta 23522), pack-reused 0 (from 0)
Resolving deltas: 100% (24838/24838), done.
From github.com:brendanhay/amazonka
 * branch              30b200d487bccd2568de1257a812a464270d0096 -> FETCH_HEAD
warning: refname '30b200d487bccd2568de1257a812a464270d0096' is ambiguous.
Git normally never creates a ref that ends with 40 hex characters
because it will be ignored when you just specify 40-hex. These refs
may be created by mistake. For example,

  git switch -c $br $(git rev-parse ...)

where "$br" is somehow empty and a 40-hex ref is created. Please
examine these refs and maybe delete them. Turn this message off by
running "git config advice.objectNameWarning false"
Updating files: 100% (62943/62943), done.
HEAD is now at 30b200d4 Merge pull request #1007 from kfigiela/patch-1

...
____________________________________________________
Executed in   14.89 secs    fish           external
   usr time    5.78 secs    0.00 micros    5.78 secs
   sys time    1.29 secs  579.00 micros    1.29 secs

@Kleidukos
Copy link
Member Author

@philderbeast fully agree with you, this is a very good thing.

@9999years
Copy link
Collaborator

During yesterday's meeting we agreed to proceed forward with a fix that lets us keep the parallel cloning, and avoid the headache of reverting.

I thought the issue here was with shallow clones, not parallelism. Did I misunderstand?

@geekosaur
Copy link
Collaborator

geekosaur commented Dec 6, 2024

The same PR addresses both issues, which is part of why reverting it isn't a good idea. In retrospect, it should probably have been split.

@Kleidukos
Copy link
Member Author

Yes sorry, without context it's of course confusing. Hence why proceeding forward with a fix without having to revert the whole thing looks preferable.

@Bodigrim
Copy link
Collaborator

Bodigrim commented Dec 6, 2024

I am once again wishing for something like Rust's crater to let us attempt to build/test everything on Hackage with cabal-head.

You can use https://github.com/haskell/clc-stackage

@geekosaur
Copy link
Collaborator

I'll note also that "everything on Hackage" includes a lot of ancient stuff which won't build with modern (or even recentish) ghc.

@9999years
Copy link
Collaborator

Two important differences between crater and the proposed approaches:

  1. You can leave a comment with @craterbot check on a PR and have everything run automatically. A job will be created, scheduled, run, and the report will be uploaded automatically.
  2. Crater keeps track of failures so that it only notifies you of new regressions. Knowing how to exclude the "ancient stuff which won't build with modern ghc" is a critical part of this problem and it's a big part of the reason head.hackage isn't very useful in its current form.

Here's a recent report from crater: https://crater-reports.s3.amazonaws.com/beta-1.74-1/index.html

Note a few things about the output:

  • Errors that also errored in the previous run are in a separate section.
  • Errors are grouped by error code, so you can see that e.g. that run introduced regressions which added 5 new errors with code E0599.
  • Test suites are run for each package.
  • The report can distinguish between build failures in dependencies and build failures in a package itself.

9999years added a commit to MercuryTechnologies/cabal that referenced this issue Dec 6, 2024
9999years added a commit to 9999years/cabal that referenced this issue Dec 9, 2024
9999years added a commit to 9999years/cabal that referenced this issue Dec 9, 2024
@alt-romes
Copy link
Collaborator

The same PR addresses both issues, which is part of why reverting it isn't a good idea. In retrospect, it should probably have been split.

FWIW, the two changes are done cleanly in two separate commits, even if they were merged in the same PR.

I don't think we should give up on shallow clones altogether. Why not simply:

  • Do a shallow clone when the full hash is used
  • Do a full clone only otherwise

This allows us to fix the regression without pessimising cloning for the majority using full hashes?

alt-romes pushed a commit to alt-romes/cabal that referenced this issue Dec 13, 2024
When the `tag` of a `source-repository-stanza` is not a full hash,
we cannot do a shallow clone and fetch it separately.

Therefore, in those cases, we fall back to doing a full clone.

Fixes haskell#10605

Includes a regression test by @9999years.

Co-authored-by: Rebecca Turner <[email protected]>
alt-romes pushed a commit to alt-romes/cabal that referenced this issue Dec 13, 2024
When the `tag` of a `source-repository-stanza` is not a full hash,
we cannot do a shallow clone and fetch it separately.

Therefore, in those cases, we fall back to doing a full clone.

Fixes haskell#10605

Includes a regression test by @9999years.

Co-authored-by: Rebecca Turner <[email protected]>
alt-romes pushed a commit to alt-romes/cabal that referenced this issue Dec 13, 2024
When the `tag` of a `source-repository-stanza` is not a full hash,
we cannot do a shallow clone and fetch it separately.

Therefore, in those cases, we fall back to doing a full clone.

Fixes haskell#10605

Includes a regression test by @9999years.

Co-authored-by: Rebecca Turner <[email protected]>
@telser
Copy link

telser commented Dec 15, 2024

3 releases seems like a very slow timeline to migrate, I think if there is a deprecation period it should be at most one release cycle. I do appreciate that things randomly breaking is annoying.

Not everyone upgrades to every release. A single release with a deprecation does not give any warning for those users who might jump a few versions. This kind of things suddenly don't work is worse than annoying for some users, it drives them out of the ecosystem entirely.

@philderbeast
Copy link
Collaborator

  • Do a shallow clone when the full hash is used

I'm a full hash guy myself and would be very happy to have the 5x speed improvement.

@Kleidukos
Copy link
Member Author

And the advertised speed improvements will certainly motivate people to migrate to full hashes! :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants