Build stackage2nix in NixOS sandbox #41

Open
4e6 opened this issue Dec 7, 2017 · 15 comments

Comments

@4e6
Contributor

4e6 commented Dec 7, 2017

Unable to build nix/stackage2nix on NixOS with nix.useSandbox enabled.

nix.useSandbox
If set, Nix will perform builds in a sandboxed environment that it will set up automatically for each build. This prevents impurities in builds by disallowing access to dependencies outside of the Nix store. This isn't enabled by default for performance. It doesn't affect derivation hashes, so changing this option will not trigger a rebuild of packages.

@4e6
Contributor Author

4e6 commented Dec 7, 2017

related #40

@4e6
Contributor Author

4e6 commented Dec 12, 2017

Currently, I see no way of sandboxing the stackage2nix wrapper. See the issues below.

The stackage2nix wrapper requires the following dependencies to be fetched (see nix/lib.nix).

To satisfy the sandbox requirements, all of these dependencies should be prefetched before the build with the standard nix-prefetch-scripts.

Stackage config files

Only the files are needed, so fetchgit can be used to fetch the fpco/lts-haskell and fpco/stackage-nightly dependencies. The only drawback of this approach is that updates become less convenient: the revision and hash have to be updated for both repos, instead of bumping a single cacheVersion parameter.

all-cabal-hashes

To build an exact copy of the Stackage package set, stackage2nix looks up project definitions in all-cabal-hashes by the hash defined in the Stackage config (a single version of a package may have several revisions). To do so, all-cabal-hashes must be fetched together with its git metadata. Due to NixOS/nixpkgs #8567, there is no reliable way to do this with fetchgit.
One solution might be to fetch a zip archive of a particular version. AFAIK, GitHub can create such links, but only for archives containing the project files, without metadata.

hackage-db

The issue with hackage-db is that its URL doesn't include a particular version that could be put in a fetchurl script. I'm assuming that hackage-db could be recreated from the all-cabal-hashes repo, but I'm not sure how. Another solution would be to fetch a versioned db from some other place.

@kirelagin

@4e6
Contributor Author

4e6 commented Jun 9, 2018

Regarding the non-determinism of all-cabal-hashes.

I've found this old comment on the original issue thread. The idea is to unpack the git objects and store them uncompressed: bendlas/nixpkgs@4b9c24a
We should be able to do this unpacking as a postUnpack build step.

Downsides:

  • will lead to increased size of git repository

Upsides:

  • deterministic fetchgit
  • (should be checked) we can access those objects through the libgit interface (no changes are needed for stackage2nix itself)

binarin added a commit to binarin/stackage2nix that referenced this issue Jun 9, 2018
Part of typeable#41

Requires typeable/nixpkgs-stackage#26

Maybe the old .git code path should also be supported, so that both a
manual checkout and a nix-built all-cabal-hashes can be used.

@zimbatm
Contributor

zimbatm commented Jun 9, 2018

Do you really care about the git history or is it because the tool wants to query the current reference of the checkout?

For the latter, it could make sense to re-build a fake .git database with only the following files:

.git/HEAD -> ref: refs/heads/master
.git/refs/heads/master -> e843a2271a972b8cb6401e67f25d22c8f6fa68cb
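
A minimal sketch of the two-file layout suggested above, assuming the tool only needs to resolve the current reference of the checkout. The helper names `write_fake_git_dir` and `resolve_head` are hypothetical, and a directory holding only these two files is probably not something git itself would accept as a repository:

```python
import tempfile
from pathlib import Path


def write_fake_git_dir(checkout: Path, commit: str, branch: str = "master") -> Path:
    """Create .git/HEAD and .git/refs/heads/<branch> under `checkout`."""
    git_dir = checkout / ".git"
    ref_path = git_dir / "refs" / "heads" / branch
    ref_path.parent.mkdir(parents=True, exist_ok=True)
    # HEAD is a symbolic ref pointing at the branch file.
    (git_dir / "HEAD").write_text(f"ref: refs/heads/{branch}\n")
    # The branch file holds the commit id.
    ref_path.write_text(commit + "\n")
    return git_dir


def resolve_head(git_dir: Path) -> str:
    """Follow HEAD to the commit id, the way a tool reading these files might."""
    head = (git_dir / "HEAD").read_text().strip()
    if head.startswith("ref: "):
        return (git_dir / head[len("ref: "):]).read_text().strip()
    return head  # detached HEAD: the file holds the commit id directly


with tempfile.TemporaryDirectory() as tmp:
    fake = write_fake_git_dir(Path(tmp), "e843a2271a972b8cb6401e67f25d22c8f6fa68cb")
    print(resolve_head(fake))
```

This covers only the "current reference" case; it contains no objects, so it cannot answer lookups by blob hash.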

@binarin
Contributor

binarin commented Jun 9, 2018

@zimbatm It's the mapping from sha1 to a file content that is needed.
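
To illustrate what such a mapping looks like: git identifies a blob by the SHA-1 of a short header (`blob <size>` followed by a NUL byte) plus the file content. The sketch below computes that id and builds an in-memory lookup table. The two `.cabal` contents are made-up examples, not real Hackage data, and `git_blob_sha1` is a hypothetical helper name:

```python
import hashlib


def git_blob_sha1(content: bytes) -> str:
    """Return the id git assigns to a blob with this content."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


# Two hypothetical revisions of the same package version: same path in a
# checkout, different content, therefore different blob ids.
rev0 = b"name: example\nversion: 1.0\n"
rev1 = b"name: example\nversion: 1.0\nx-revision: 1\n"

# The lookup that is needed: blob id -> file content.
blobs_by_sha = {git_blob_sha1(c): c for c in (rev0, rev1)}

wanted = git_blob_sha1(rev0)
print(blobs_by_sha[wanted] == rev0)
```

A plain checkout only contains one of the two revisions at a given path, which is why the object database, not the working tree, is the source for these lookups.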

@zimbatm
Contributor

zimbatm commented Jun 9, 2018

So the tool is not looking at the checked-out content but querying the git database directly instead?

If you go down the fetchgit + unpacked blobs route, maybe you can make it smaller by using a shallow copy of the database.

Given the level of effort involved, it could make sense to patch upstream as well.

@binarin
Contributor

binarin commented Jun 9, 2018

@zimbatm The full history is still needed, as we need all blobs reachable from the required commit.

I've discussed this with @4e6, and I think I'll just make a small tool that creates a canonical representation of a git .pack file. If everything else (branches, tags) is properly pruned before that, the result will be a working git checkout that is also reproducible. I'll experiment with this approach here. If it works out, I'll try to do the same in fetchgit itself.

@4e6
Contributor Author

4e6 commented Jun 10, 2018

I tried the approach referenced in my previous comment, unpacking the git objects: bendlas/nixpkgs@4b9c24a

This increased the all-cabal-hashes checkout size from 1.6 GB to 16 GB, which is not acceptable.

binarin added a commit to binarin/stackage2nix that referenced this issue Dec 7, 2018
This allows a bare checkout of `all-cabal-hashes`, which saves some
space and removes a lot of separate files.

But it also means that we'll be able to represent the whole
`all-cabal-hashes` repo as a single git .pack-file.

Making that .pack-file reproducible will be the final step to
achieve typeable#41.
4e6 pushed a commit that referenced this issue Dec 24, 2018
This allows a bare checkout of `all-cabal-hashes`, which saves some
space and removes a lot of separate files.

But it also means that we'll be able to represent the whole
`all-cabal-hashes` repo as a single git .pack-file.

Making that .pack-file reproducible will be the final step to
achieve #41.
@yorickvP

Maybe we can use the github zip archive? It should allow fast random reads.

@binarin
Contributor

binarin commented Jan 17, 2019

> Maybe we can use the github zip archive? It should allow fast random reads.

Filenames are used only as a fallback; the primary addressing method is by GitSHA1. So a full .git repo is needed.

@yorickvP

As I understand it, the bare git repo is only used because it is more compact than a repo checkout. However, there is no good way to get an up-to-date one within a nix sandbox. I had to revert 86f11b8 while working on updating nixpkgs-stackage.
Getting the latest .zip is trivial (builtins.fetchurl), way faster (20s vs 1m20s for git clone) and way smaller (189MB vs 366MB). Zip allows random access for decompression, so grabbing individual files out of it should be fast.

@binarin
Contributor

binarin commented Jan 18, 2019

@yorickvP To make the latest .zip usable, you need to calculate the GitSHA1 of every file inside it and cache this info somewhere. That's doable, except for hackage revisions (just grep for x-revision): only the latest revision will be available, with no way to fetch older ones. And that is the problem solved by having a .git folder.

The proper solution is to create some canonical representation of a .git repo that is reproducible. Maybe that will require writing a custom git .pack file generator.
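
The zip-indexing step mentioned in this comment could look roughly like the sketch below. It is an illustration under assumptions: the archive is built in memory with a made-up path and content, and `index_zip_by_git_sha1` is a hypothetical helper, not part of stackage2nix:

```python
import hashlib
import io
import zipfile
from typing import Dict


def git_blob_sha1(content: bytes) -> str:
    """Blob id as git computes it: SHA-1 over 'blob <size>\\0' + content."""
    return hashlib.sha1(b"blob %d\x00" % len(content) + content).hexdigest()


def index_zip_by_git_sha1(archive: zipfile.ZipFile) -> Dict[str, str]:
    """Map the GitSHA1 of every file in the archive to its member name."""
    index = {}
    for info in archive.infolist():
        if info.is_dir():
            continue
        index[git_blob_sha1(archive.read(info))] = info.filename
    return index


# Stand-in for a downloaded archive: it holds only the latest revision.
latest = b"name: example\nversion: 1.0\nx-revision: 1\n"
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("example/1.0/example.cabal", latest)

with zipfile.ZipFile(buf) as zf:
    index = index_zip_by_git_sha1(zf)

# The id of an older revision cannot be resolved from the archive.
older = git_blob_sha1(b"name: example\nversion: 1.0\n")
print(git_blob_sha1(latest) in index, older in index)
```

The index has to be recomputed or cached for every new archive, and the lookup for the older revision fails, which is the limitation described above.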

@yorickvP

Does stack even expose the used cabal file revision? The intractability of the problem does not seem worth any of the potential savings of using older cabal files sometimes, assuming cabal files are rarely updated and do not break anything.

@binarin
Contributor

binarin commented Jan 21, 2019

@yorickvP Yes, it's exposed - e.g. search for GitSHA1 in https://raw.githubusercontent.com/commercialhaskell/lts-haskell/master/lts-12.16.yaml

If non-breaking updates are OK, why have you enabled sandboxing? =)
