Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Irreproducibility: Code rebuilds only the first time I switch resolvers #5381

Closed
nh2 opened this issue Sep 8, 2020 · 17 comments
Closed

Irreproducibility: Code rebuilds only the first time I switch resolvers #5381

nh2 opened this issue Sep 8, 2020 · 17 comments

Comments

@nh2
Copy link
Collaborator

nh2 commented Sep 8, 2020

When I switch between lts-14.27 and lts-13.26 repeatedly, stack rebuilds my code only on the first such switch.

This is illegitmate because my executable depends on (at least) conduit-extra, and that one's version switches between those LTSs.

Background / how I found this

I'm debugging an issue with my library:

The project for reproing (on Ubuntu 18.04): https://github.com/nh2/lz4-frame-conduit/tree/4616ec594426de2defda6432eec9cff04692bf7c

Command to run:

stack test && while (.stack-work/dist/x86_64-linux/Cabal-2.4.0.1/build/lz4-frame-conduit-test/lz4-frame-conduit-test); do sleep 0.01; done

When I test after stack clean, my bug (a segfault or GHC panic) occurs with

resolver: lts-13.26

and not with

resolver: lts-14.27

The actual Stack issue

But when I switch between the 2 and run stack test without previous stack clean, most of the time stack doesn't rebuild the code, so then I cannot observe the difference when switching resolvers.

I get this strange behaviour:

stack clean
stack test # builds from scratch
stack test # naturally builds nothing

# I switch resolvers, lts-13.26 -> lts-14.27

stack test
# Rebuilds code; outputs:
#     lz4-frame-conduit-0.1.0.0: unregistering (local file changes: README.md app/Main.hs lz4-frame-conduit.cabal lz4/lib/lz4.c lz4/lib/lz4.h lz4/lib/lz4frame.c lz4/...)
# Why is that? None of these files changed.

# I switch resolvers back, lts-14.27 -> lts-13.26

stack test # Does not rebuild! No further resolver switching rebuilds.

So it rebuilds only the first time I switch resolvers (but for apparently the wrong reason), and then never again.

@nh2
Copy link
Collaborator Author

nh2 commented Sep 8, 2020

stack test -v output leading to the local file changes: README.md ...:

[debug] Run process: /sbin/ldconfig -p
[debug] Process finished in 2ms: /sbin/ldconfig -p
[debug] Found shared library libtinfo.so.5 in 'ldconfig -p' output
[debug] Did not find shared library libtinfo.so.6
[debug] Did not find shared library libncursesw.so.6
[debug] Found shared library libgmp.so.10 in 'ldconfig -p' output
[debug] Did not find shared library libgmp.so.3
[debug] Potential GHC builds: standard
[debug] Found already installed GHC builds: standard
[debug] SELECT "id","actual_version","arch","ghc_path","ghc_size","ghc_modified","ghc_pkg_path","runghc_path","haddock_path","cabal_version","global_db","global_db_cache_size","global_db_cache_modified","info","global_dump" FROM "compiler_cache" WHERE "ghc_path"=?; [PersistText "/raid/stack/programs/x86_64-linux/ghc-8.6.5/bin/ghc-8.6.5"]
[debug] Loaded compiler information from cache
[debug] Asking for a supported GHC version
[debug] Resolving package entries
[debug] Parsing the targets
[debug] Checking flags
[debug] SourceMap constructed
[debug] Starting to execute command inside EnvConfig
[debug] Finding out which packages are already installed
[debug] Run process: /raid/stack/programs/x86_64-linux/ghc-8.6.5/bin/ghc-pkg-8.6.5 --global --no-user-package-db dump --expand-pkgroot
[debug] Process finished in 27ms: /raid/stack/programs/x86_64-linux/ghc-8.6.5/bin/ghc-pkg-8.6.5 --global --no-user-package-db dump --expand-pkgroot
[debug] Ignoring package Cabal due to wanting version 2.4.1.0 instead of 2.4.0.1
[debug] Ignoring package haskeline due to wanting version 0.7.5.0 instead of 0.7.4.3
[debug] Run process: /raid/stack/programs/x86_64-linux/ghc-8.6.5/bin/ghc-pkg-8.6.5 --user --no-user-package-db --package-db /raid/stack/snapshots/x86_64-linux/37319c6575a7d854bf30c6a8fcfdf1546086e900241daeda1f669d54761755ae/8.6.5/pkgdb dump --expand-pkgroot
[debug] Process finished in 81ms: /raid/stack/programs/x86_64-linux/ghc-8.6.5/bin/ghc-pkg-8.6.5 --user --no-user-package-db --package-db /raid/stack/snapshots/x86_64-linux/37319c6575a7d854bf30c6a8fcfdf1546086e900241daeda1f669d54761755ae/8.6.5/pkgdb dump --expand-pkgroot
[debug] Run process: /raid/stack/programs/x86_64-linux/ghc-8.6.5/bin/ghc-pkg-8.6.5 --user --no-user-package-db --package-db /home/niklas/src/haskell/lz4-frame-conduit/.stack-work/install/x86_64-linux/37319c6575a7d854bf30c6a8fcfdf1546086e900241daeda1f669d54761755ae/8.6.5/pkgdb dump --expand-pkgroot
[debug] Process finished in 19ms: /raid/stack/programs/x86_64-linux/ghc-8.6.5/bin/ghc-pkg-8.6.5 --user --no-user-package-db --package-db /home/niklas/src/haskell/lz4-frame-conduit/.stack-work/install/x86_64-linux/37319c6575a7d854bf30c6a8fcfdf1546086e900241daeda1f669d54761755ae/8.6.5/pkgdb dump --expand-pkgroot
[debug] Constructing the build plan
[debug] SELECT "id","directory","type","pkg_src","active","path_env_var","haddock" FROM "config_cache" WHERE "directory"=? AND "type"=?; [PersistText "/home/niklas/src/haskell/lz4-frame-conduit/.stack-work/install/x86_64-linux/37319c6575a7d854bf30c6a8fcfdf1546086e900241daeda1f669d54761755ae/8.6.5/",PersistText "lib:lz4-frame-conduit-0.1.0.0-8RAA6BcdNm21R7aXlL5jl8"]
[debug] SELECT "id", "config_cache_id", "index", "option" FROM "config_cache_dir_option" WHERE ("config_cache_id"=?) ORDER BY "index"; [PersistInt64 2035]
[debug] SELECT "id", "config_cache_id", "index", "option" FROM "config_cache_no_dir_option" WHERE ("config_cache_id"=?) ORDER BY "index"; [PersistInt64 2035]
[debug] SELECT "id", "config_cache_id", "ghc_pkg_id" FROM "config_cache_dep" WHERE ("config_cache_id"=?); [PersistInt64 2035]
[debug] SELECT "id", "config_cache_id", "component" FROM "config_cache_component" WHERE ("config_cache_id"=?); [PersistInt64 2035]
[debug] Start: getPackageFiles /home/niklas/src/haskell/lz4-frame-conduit/lz4-frame-conduit.cabal
[debug] Finished in 2ms: getPackageFiles /home/niklas/src/haskell/lz4-frame-conduit/lz4-frame-conduit.cabal
[debug] Checking if we are going to build multiple executables with the same name
[debug] Executing the build plan
[info] lz4-frame-conduit-0.1.0.0: unregistering (local file changes: README.md app/Main.hs lz4-frame-conduit.cabal lz4/lib/lz4.c lz4/lib/lz4.h lz4/lib/lz4frame.c lz4/...)

@ardamose123
Copy link
Contributor

ardamose123 commented Sep 15, 2020

I'll try to repro, but as a first hunch, I guess it's a build cache.

If you look at the .stack-work folder in your project after building it (e.g. by using stack test), there are a lot of files that result from past compilations, sorted by Cabal version in the dist folder and GHC version in the install folder.

Running stack clean wipes out the dist folder. The install folder remains with its wacky hashes.

This is the output I get from building one of my pet projects:

Linking .stack-work/dist/x86_64-linux/Cabal-2.4.0.1/build/flattendir-test/flattendir-test ...
Preprocessing executable 'flattendir-exe' for flattendir-0.1.0.0..
Building executable 'flattendir-exe' for flattendir-0.1.0.0..
[1 of 2] Compiling Main
[2 of 2] Compiling Paths_flattendir
Linking .stack-work/dist/x86_64-linux/Cabal-2.4.0.1/build/flattendir-exe/flattendir-exe ...
flattendir> copy/register
Installing library in [...]/flattendir/.stack-work/install/x86_64-linux/9570ddfa5c53856cf854aff6bc38a117a2b2ccd9c89d93289e368b54d0c78fa9/8.6.5/lib/x86_64-linux-ghc-8.6.5/flattendir-0.1.0.0-HGHxHxXKkpZLfvbfb4LpUM
Installing executable flattendir-exe in [...]/flattendir/.stack-work/install/x86_64-linux/9570ddfa5c53856cf854aff6bc38a117a2b2ccd9c89d93289e368b54d0c78fa9/8.6.5/bin
Registering library for flattendir-0.1.0.0..

The last lines indicate the compilation results are kept in the install folder, followed by the arch, followed by a hash of some sort, followed by the GHC version, followed by other stuff. This way, it may keep build results generated through both resolvers. Apparently, stack is reusing the cached stuff instead of building anew, which is desirable.

Now, the issue with your tests is interesting. It should not cache a failed build... maybe? Assuming that's the issue. I'll try to check what's going on.

@qrilka
Copy link
Contributor

qrilka commented Sep 15, 2020

The thing that the project doesn't get rebuilt is explained by the fact that 2 LTSes give different implicit snapshots and no config change make it possible for them to coexist even for project packages. But probably it's not something intuitive as e.g. switch between a normal build and a test build invalidates project packages.
The one thing which clearly needs an investigation is this thing:

#     lz4-frame-conduit-0.1.0.0: unregistering (local file changes: README.md app/Main.hs lz4-frame-conduit.cabal lz4/lib/lz4.c lz4/lib/lz4.h lz4/lib/lz4frame.c lz4/...)
# Why is that? None of these files changed.

I started looking into that but had no time yet.

@ardamose123
Copy link
Contributor

I cloned the repo. Noticed it already had lts-13.26 as resolver.

After running stack test and switching back and forth between the resolvers, I noticed no difference between the test outputs. I couldn't observe the GHC panic/segfault. I didn't see the unregistering thing in any of the runs. The tests run with both resolvers with pretty much the same output.

The project relies on C stuff and a *.cabal file instead of a package.yaml.

I'm using Ubuntu 18.04.4, stack 2.3.3, and gcc 7.5.0 if that's of some use.

Minor finding: when switching resolvers, the test compresses 1MB ByteString fails with a hClose: resource vanished (Broken pipe) message, but when I ran the test again without switching resolvers, the test fails with an expectation mismatch between a large string of * and an empty string.

@nh2
Copy link
Collaborator Author

nh2 commented Sep 15, 2020

@ardamose123 Thanks for letting us know! That's interesting that it didn't unregister for you.

(On the point of the project's crash, which is unrelated of the stack issue: Yes, resource vanished is one of the ways that the underlying memory corruption can manifest -- I learned by now that it doesn't necessarily always segfault, see https://gitlab.haskell.org/ghc/ghc/-/issues/18680 for details on that.)

@ardamose123
Copy link
Contributor

Correction: I just saw the unregistering line by running stack test -v. I was not running tests with the -v flag before. Only happens when switching resolvers. This line doesn't appear if you run the tests again without changing resolvers.

However, as I mentioned before, I can build the project alright and run the tests while switching between resolvers, so it might not be related to the original issue.

@nh2, can you please describe your development environment w/ versions?

@nh2
Copy link
Collaborator Author

nh2 commented Sep 15, 2020

I just saw the unregistering line by running stack test -v

@ardamose123 Do you also see local file changes: README.md app/Main.hs lz4-frame-conduit.cabal lz4/lib/lz4.c lz4/lib/lz4.h lz4/lib/lz4frame.c lz4/... in that case?

Also, when you switch resolvers, does it unregister each time, or only the first time (as it does for me)?

(For the purpose of this bug, I don't worry much about the test output, as that's either a bug in my library's code or a GHC bug. I do worry about stack's rebuilding/unregistering behaviour though.)

@ardamose123
Copy link
Contributor

Only first time, just after switching resolvers.

@nh2
Copy link
Collaborator Author

nh2 commented Sep 15, 2020

Only first time, just after switching resolvers.

just for clarity, I mean when you switch resolvers forward and backward repeatedly.

@ardamose123
Copy link
Contributor

Just ran the tests a few more times. I only noticed the unregister after using stack clean:

  1. Ran stack clean
  2. Use resolver: lts-13.26
  3. Ran stack test -v » unregister appears.
  4. Ran stack test -v » logs are normal.
  5. Changed to resolver: lts-14.27
  6. Ran stack test -v » unregister appears.
  7. Ran stack test -v » logs are normal.
  8. Changed to resolver: lts-13.26
  9. Ran stack test -v » logs are normal.
  10. Ran stack test -v » logs are normal.
  11. Changed to resolver: lts-14.27
  12. Ran stack test -v » logs are normal.
  13. Ran stack test -v » logs are normal.
  14. Ran stack clean
  15. Ran stack test -v » unregister appears.

I thought I was getting the unregistering on every resolver switch, but I probably did a a stack clean in between my tests earlier.

This behavior LGTM, but I don't know if @nh2 is getting some other behavior.

@qrilka
Copy link
Contributor

qrilka commented Sep 15, 2020

@nh2 I think I see the reason behind local file changes: README.md app/Main.hs... - after stack clean there is no build cache yet and thus Stack reports every package file as dirty, I think we just need to handle this case better and produce some more meaningful message. With that fixed I think there is nothing more to explore in this ticket. Let me know if you see something else unresolved.

@nh2
Copy link
Collaborator Author

nh2 commented Sep 16, 2020

@qrilka The main problem was this:

# I switch resolvers back, lts-14.27 -> lts-13.26

stack test # Does not rebuild! No further resolver switching rebuilds.

where the expectation was that it should rebuild because the dependent libraries change.

I don't think we've found a solution/explanation for that yet (or did I miss it?).

@ardamose123 When you say "logs are normal", does it rebuild + relink the executable after switching resolvers?

@qrilka
Copy link
Contributor

qrilka commented Sep 16, 2020

@nh2 see my answer in #5381 (comment) - if you don't do stack clean you'll have 2 versions of your package built for 2 different implicit packages (one for each LTS).

@nh2
Copy link
Collaborator Author

nh2 commented Sep 16, 2020

@qrilka Ah sorry, I missed that. I don't quite follow it though. I also don't quite understand @ardamose123's preceding comment about

results are kept in the install folder

because I'm doing nothing with install/, I'm using the contents of the build/ directory.

I still do observe that switching resolvers leaves outdated contents in the build directory:

Click to expand full log output
% git checkout 4616ec594426de2defda6432eec9cff04692bf7c

% stack test

lz4-frame-conduit-0.1.0.0: unregistering (local file changes: README.md app/Main.hs lz4-frame-conduit.cabal lz4/lib/lz4.c lz4/lib/lz4.h lz4/lib/lz4frame.c lz4/...)
Building all executables for `lz4-frame-conduit' once. After a successful build of all of them, only specified executables will be rebuilt.
lz4-frame-conduit> configure (lib + exe + test)
Configuring lz4-frame-conduit-0.1.0.0...
lz4-frame-conduit> build (lib + exe + test)
Preprocessing library for lz4-frame-conduit-0.1.0.0..
Building library for lz4-frame-conduit-0.1.0.0..
[1 of 2] Compiling Codec.Compression.LZ4.CTypes
[2 of 2] Compiling Codec.Compression.LZ4.Conduit
Preprocessing executable 'haskell-lz4c' for lz4-frame-conduit-0.1.0.0..
Building executable 'haskell-lz4c' for lz4-frame-conduit-0.1.0.0..
[1 of 1] Compiling Main
Linking .stack-work/dist/x86_64-linux/Cabal-2.4.0.1/build/haskell-lz4c/haskell-lz4c ...
Preprocessing test suite 'lz4-frame-conduit-test' for lz4-frame-conduit-0.1.0.0..
Building test suite 'lz4-frame-conduit-test' for lz4-frame-conduit-0.1.0.0..
[1 of 1] Compiling Main
Linking .stack-work/dist/x86_64-linux/Cabal-2.4.0.1/build/lz4-frame-conduit-test/lz4-frame-conduit-test ...
            
lz4-frame-conduit> copy/register
Installing library in /home/niklas/src/haskell/lz4-frame-conduit/.stack-work/install/x86_64-linux/37319c6575a7d854bf30c6a8fcfdf1546086e900241daeda1f669d54761755ae/8.6.5/lib/x86_64-linux-ghc-8.6.5/lz4-frame-conduit-0.1.0.0-8RAA6BcdNm21R7aXlL5jl8
Installing executable haskell-lz4c in /home/niklas/src/haskell/lz4-frame-conduit/.stack-work/install/x86_64-linux/37319c6575a7d854bf30c6a8fcfdf1546086e900241daeda1f669d54761755ae/8.6.5/bin
Registering library for lz4-frame-conduit-0.1.0.0..
lz4-frame-conduit> test (suite: lz4-frame-conduit-test)
                               
[...]
Finished in 0.0168 seconds
2 examples, 0 failures

lz4-frame-conduit> Test suite lz4-frame-conduit-test passed
Completed 2 action(s).
stack test  13.83s user 1.23s system 49% cpu 30.128 total

% sha1sum .stack-work/dist/x86_64-linux/Cabal-2.4.0.1/build/lz4-frame-conduit-test/lz4-frame-conduit-test
9d23a8dd39ccfe86c164d52c03d59f103a904eec  .stack-work/dist/x86_64-linux/Cabal-2.4.0.1/build/lz4-frame-conduit-test/lz4-frame-conduit-test

% # switch to lts-13.27

% stack test
lz4-frame-conduit-0.1.0.0: unregistering (old configure information not found)
lz4-frame-conduit> configure (lib + exe + test)
Configuring lz4-frame-conduit-0.1.0.0...
lz4-frame-conduit> build (lib + exe + test)
Preprocessing library for lz4-frame-conduit-0.1.0.0..
Building library for lz4-frame-conduit-0.1.0.0..
[1 of 2] Compiling Codec.Compression.LZ4.CTypes
[2 of 2] Compiling Codec.Compression.LZ4.Conduit
Preprocessing test suite 'lz4-frame-conduit-test' for lz4-frame-conduit-0.1.0.0..
Building test suite 'lz4-frame-conduit-test' for lz4-frame-conduit-0.1.0.0..
[1 of 1] Compiling Main [Data.Conduit.Process changed]
Linking .stack-work/dist/x86_64-linux/Cabal-2.4.0.1/build/lz4-frame-conduit-test/lz4-frame-conduit-test ...
Preprocessing executable 'haskell-lz4c' for lz4-frame-conduit-0.1.0.0..
Building executable 'haskell-lz4c' for lz4-frame-conduit-0.1.0.0..
[1 of 1] Compiling Main [Data.Conduit.Binary changed]
Linking .stack-work/dist/x86_64-linux/Cabal-2.4.0.1/build/haskell-lz4c/haskell-lz4c ...
            
lz4-frame-conduit> copy/register
Installing library in /home/niklas/src/haskell/lz4-frame-conduit/.stack-work/install/x86_64-linux/7d6d968be153d4dcc0edab19f77cd81934e5dbcbcb69dad0082c318db701ea3b/8.6.5/lib/x86_64-linux-ghc-8.6.5/lz4-frame-conduit-0.1.0.0-3OL8VoGxo9YCAWPxV870O2
Installing executable haskell-lz4c in /home/niklas/src/haskell/lz4-frame-conduit/.stack-work/install/x86_64-linux/7d6d968be153d4dcc0edab19f77cd81934e5dbcbcb69dad0082c318db701ea3b/8.6.5/bin
Registering library for lz4-frame-conduit-0.1.0.0..
lz4-frame-conduit> test (suite: lz4-frame-conduit-test)
                               
[...]
Finished in 0.0483 seconds
2 examples, 0 failures

lz4-frame-conduit> Test suite lz4-frame-conduit-test passed
Completed 2 action(s).
stack test  7.79s user 0.91s system 97% cpu 8.894 total

% sha1sum .stack-work/dist/x86_64-linux/Cabal-2.4.0.1/build/lz4-frame-conduit-test/lz4-frame-conduit-test
c0d644d1d1f9d90e241ba5191bbbf2ca15c3ce67  .stack-work/dist/x86_64-linux/Cabal-2.4.0.1/build/lz4-frame-conduit-test/lz4-frame-conduit-test

% # switch to lts-13.26

% stack test
lz4-frame-conduit> test (suite: lz4-frame-conduit-test)

[...]
Finished in 0.0543 seconds
2 examples, 0 failures

lz4-frame-conduit> Test suite lz4-frame-conduit-test passed

% sha1sum .stack-work/dist/x86_64-linux/Cabal-2.4.0.1/build/lz4-frame-conduit-test/lz4-frame-conduit-test
c0d644d1d1f9d90e241ba5191bbbf2ca15c3ce67  .stack-work/dist/x86_64-linux/Cabal-2.4.0.1/build/lz4-frame-conduit-test/lz4-frame-conduit-test

Summarised output of the above:

% stack test
Linking .stack-work/dist/x86_64-linux/Cabal-2.4.0.1/build/lz4-frame-conduit-test/lz4-frame-conduit-test ...
2 examples, 0 failures

% sha1sum .stack-work/dist/x86_64-linux/Cabal-2.4.0.1/build/lz4-frame-conduit-test/lz4-frame-conduit-test
9d23a8dd39ccfe86c164d52c03d59f103a904eec  .stack-work/dist/x86_64-linux/Cabal-2.4.0.1/build/lz4-frame-conduit-test/lz4-frame-conduit-test

% # switch to lts-13.27

% stack test
Linking .stack-work/dist/x86_64-linux/Cabal-2.4.0.1/build/lz4-frame-conduit-test/lz4-frame-conduit-test ...
2 examples, 0 failures

% sha1sum .stack-work/dist/x86_64-linux/Cabal-2.4.0.1/build/lz4-frame-conduit-test/lz4-frame-conduit-test
c0d644d1d1f9d90e241ba5191bbbf2ca15c3ce67  .stack-work/dist/x86_64-linux/Cabal-2.4.0.1/build/lz4-frame-conduit-test/lz4-frame-conduit-test

% # switch to lts-13.26

% stack test
2 examples, 0 failures

% sha1sum .stack-work/dist/x86_64-linux/Cabal-2.4.0.1/build/lz4-frame-conduit-test/lz4-frame-conduit-test
c0d644d1d1f9d90e241ba5191bbbf2ca15c3ce67  .stack-work/dist/x86_64-linux/Cabal-2.4.0.1/build/lz4-frame-conduit-test/lz4-frame-conduit-test

As shown, I switch resolvers, and then the binary at .stack-work/dist/x86_64-linux/Cabal-2.4.0.1/build/lz4-frame-conduit-test/lz4-frame-conduit-test is not updated by stack test (it's the same binary as from before I switched resolvers).

@qrilka
Copy link
Contributor

qrilka commented Sep 16, 2020

Ouch, I didn't take into account that test executables don't go into snapshot-specific directory...
I guess this is just another case of ghc-pkg(?) treating executables and packages differently and Stack got it wrong again...
And maybe component-based builds could help here as well? :)
Stack sees your library in .stack-work/dist/x86_64-linux/XXX and thus it assumes that nothing needs to be rebuilt. Ok, I will take a look into some way out of this (hopefully without going into component-based builds).

@nh2 nh2 changed the title Irreproducibility: Code rebuiltds only the first time I switch resolvers Irreproducibility: Code rebuilds only the first time I switch resolvers Sep 17, 2020
@qrilka
Copy link
Contributor

qrilka commented Nov 1, 2020

After trying to fix this I came to conclusion that this doesn't look to be easily resolvable with the way stack builds work and the way to fix it properly is to do proper component-based builds as in #4745
The problem with an attempt to fix this seem to break the current way Stack excludes some rebuilds.

@mpilgrem
Copy link
Member

I am going to close this issue in favour of #6356, as it appears to be an instance of component-based builds.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants