
Would it make sense to use an existing package management system like nix? #9

Open
aidanheerdegen opened this issue Nov 18, 2015 · 20 comments

Comments

@aidanheerdegen

https://nixos.org/nix/

It is built to support multiple environments, which seems a very good fit for compiler/lib combinations, I'm thinking especially of compiler/MPI library combinations.

@szaghi
Member

szaghi commented Nov 19, 2015

Hi @aidanheerdegen ,
Thank you for your interest.

Nix is very interesting. How difficult is it to use? How does it integrate (if at all) with git/GitHub?

One of the features I really love about @cmacmackin's project is its simplicity. I am a poor Fortran man who does not like to struggle with complicated environment managers :-)

Thank you for help.

@aidanheerdegen
Author

I'm in the same (poor Fortran) boat as you, so I can't bring any expert knowledge of nix. Just wondering if those sorts of options had been explored.

I've also come across spack:

https://github.com/scalability-llnl/spack

and EasyBuild:

https://github.com/hpcugent/easybuild

So many tools!

I've now learnt about FoBiS because of your reply, so I'm very happy.

@szaghi
Member

szaghi commented Nov 19, 2015

@aidanheerdegen thank you! Very nice hints!

If I can suggest some nice Fortran projects, take a look at @cmacmackin's FORD and the many others by our group members at https://github.com/Fortran-FOSS-Programmers (mostly personal projects; our group is young and has few projects of its own).

If you would like to join us, let me know. Our group is very open and likes to welcome new Fortran enthusiasts!

See you soon.

@cmacmackin
Contributor

Hi, thanks for your interest. I was not aware that such package-management tools existed. They are certainly worth considering. The issue I potentially have with nix is that it seems to require the package maintainers to learn a whole new language. Not only that, but it's a functional language. The other two you mention only require maintainers to write some Python, which people are much more likely to be familiar with. However, I'll have to take a more detailed look at these before I can say anything decisive. Hopefully I'll get to that over the weekend.

@cmacmackin
Contributor

Looking at these I can see pros and cons of all of them.

nix

Pros:

  • very powerful
  • explicit concept of adding extra "channels"; FLATPack could simply become a new one of these

Cons:

  • Packaging requires you to learn a whole new language, and a functional one at that
  • Potentially confusing in how complicated it is

Spack

Pros:

  • clear, easy-to-understand interface
  • packages written in Python and look very easy to build

Cons:

  • packages not separate from package manager, so difficult to submit new ones
  • packages only updated with new release

Easybuild

Pros:

  • very powerful
  • concept of dependencies and build-dependencies
  • lots of existing build toolchains which can be easily used

Cons:

  • packages are explicitly under their control (would they be interested in dealing with every Dick, Harry, and John's JSON parser, ODE solver, and plotting package?)
  • somewhat confusing system
  • requires a module system (such as GNU Modules) to be installed
  • seems entirely focused on HPC, not clear if it would be useful on a personal computer

To be honest, my inclination right now would be to fork spack, altering it so that package files are no longer part of the installation per se, but are in a separate git repository. We'd also probably want to add a command to pull down any updates to said repository. These changes would be relatively minor and it would be fairly easy to pull in updates made by the spack team themselves. In addition to the FLATPack repository, we'd also set up a FLATPack-db repository containing the package files. Preferably we could also add some way for people to add their own repositories as sources of package files (similar to adding PPAs in Ubuntu). However, it does seem a shame to have to fork a project which is so nearly where we want to be.

@aidanheerdegen
Author

Thanks for taking the time to compare these existing projects. I had a vague idea of what the differences might be, but I didn't have the time to go into them in detail.

I think you're right that they have different strengths and use cases. While I agree that Easybuild requires modules, I'm not sure this is entirely a "con". It allows for supporting libraries (and compilers) to be installed rather seamlessly. This also helps a lot with testing and reproducibility. Whilst some scientific coders might not value these things in the beginning, you just know they'll thank you if they ever have a support request like "uhhh .. it doesn't work on my system".

Is it worth explicitly asking the developers of Easybuild and Spack to comment by @'ing them into the conversation? The Easybuild developer might have an opinion about package repos.

@szaghi
Member

szaghi commented Nov 25, 2015

@cmacmackin @aidanheerdegen

Hi all,
I agree with Chris on the general considerations. Moreover, I also dislike modulefiles: I used them for years, and they are not user-friendly and are somewhat cumbersome to maintain. Without doubt they are very useful (I have many environments with different compilers/libraries) and are the standard in HPC, but I would like FLATPack to be friendlier. In this regard, I have recently moved from modulefiles to desk, a very simple and powerful environment manager. Here I explain how to use it as a substitute for modulefiles.

I agree with @aidanheerdegen : maybe the developers of Easybuild and Spack can give us valuable help.

See you soon.

@aidanheerdegen
Author

I'm no fan of modules either @szaghi, but have you looked into Lmod? It seems to solve a lot of the problems of the original modules, and has a lot of features, including a hierarchical naming scheme.

http://www.admin-magazine.com/HPC/Articles/Lmod-Alternative-Environment-Modules

I particularly like the save/restore, where you can save your current module setup. Even do named saves:

http://hpcugent.github.io/easybuild/files/sllides_mclay_20140617_Lmod.pdf

I plan to test out Lmod, but haven't had the time to do so up to this point.

Desk does look neat, but the support and wide user base of modules (Lmod) is very useful in my opinion.

@tgamblin

Hi all: I'm one of the Spack developers. I'm actually integrating a PR for external package repositories right now, so you will soon be able to maintain packages outside the main Spack distro and point Spack at them. Your package will still be able to depend on the built-in packages; you'll just have to get your users to add a URL to a list of package repositories (kind of like yum, RPM, etc.).

As for modules, Spack actually does generate module files, but you only need them to get things into your PATH. Spack's compiler wrappers will add RPATHs for you so that your executables find their deps with or without modules, but the modules are still there for interactive stuff like PATH and MANPATH. There is also a PR for Lmod support which is on the list for integrating into the mainline.

We've had some adoption by solver folks... see this SC15 poster for details about INRIA Bordeaux's MORSE stack. If you're interested in more details there is also our paper and some slides, and I'm happy to answer questions.

@szaghi
Member

szaghi commented Nov 25, 2015

@aidanheerdegen Lmod is very interesting, thank you for pointing it out! I agree that modulefiles are the de facto standard, so they are important (especially in HPC environments), but for a single-user workstation I would like something easier to maintain (Lmod seems very promising in this regard).

@tgamblin Thank you very much for your help!

I'm actually integrating a PR for external package repositories right now, so you will soon be able to maintain packages outside the main Spack distro and point Spack at them.

Nice, so one of the main issues identified by Chris is being fixed, good news!

Your package will still be able to depend on the built-in packages; you'll just have to get your users to add a URL to a list of package repositories (kind of like yum, RPM, etc.).

Is the dependency limited to the built-in packages, or will a package also be allowed to depend on the external package repositories that you are integrating? This could matter for us.

Spack's compiler wrappers will add RPATHs for you so that your executables find their deps with or without modules, but the modules are still there for interactive stuff like PATH and MANPATH.

Can you give us more details? I am not sure I understand the RPATH mechanism. What is the workflow of Spack's compiler wrappers? Spack will compile the sources by means of its wrappers, but the reference to RPATHs is not clear to me.

Thank you for the poster/paper/slides; I will read them soon. Thank you again for your kind help.

See you soon.

@tgamblin

@szaghi: Sure thing. Answers below.

Your package will still be able to depend on the built-in packages; you'll just have to get your users to add a URL to a list of package repositories (kind of like yum, RPM, etc.).

Is the dependency limited to the built-in packages, or will a package also be allowed to depend on the external package repositories that you are integrating? This could matter for us.

The repositories would be implemented as overlays, so you could do either. e.g., in a YAML config file:

repos:
  - /path/to/local/packages
  - $spack/var/spack/packages

If your package depends_on('foo'), Spack would first search the local packages for foo, then search the builtins. You could always flip the precedence by changing the order, as well. $spack in the config file is shorthand for the spack install directory.
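The overlay lookup described above (repositories searched in the listed order, first match wins, builtins last) can be sketched in a few lines of Python. This is not Spack's implementation: the directory layout `<repo>/<name>/package.py` and the function `find_package` are assumptions made for illustration.

```python
from pathlib import Path
from typing import Optional, Sequence


def find_package(name: str, repos: Sequence[Path]) -> Optional[Path]:
    """Return the recipe for `name` from the first repo that has one.

    Assumed (hypothetical) layout: each repo holds one directory per package
    containing a `package.py` recipe. Repos earlier in the list shadow later
    ones, which is what "overlay" means here.
    """
    for repo in repos:
        candidate = repo / name / "package.py"
        if candidate.is_file():
            return candidate
    return None  # no repository provides this package
```

Flipping the precedence is just a matter of reordering the list, e.g. `find_package("foo", [builtin, local])` instead of `[local, builtin]`.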

Spack's compiler wrappers will add RPATHs for you so that your executables find their deps with or without modules, but the modules are still there for interactive stuff like PATH and MANPATH.

Can you give us more details? I am not sure I understand the RPATH mechanism. What is the workflow of Spack's compiler wrappers? Spack will compile the sources by means of its wrappers, but the reference to RPATHs is not clear to me.

This is explained in more detail in the paper and the slides, but in short RPATH is like LD_LIBRARY_PATH but it's set at compile time and embedded in an executable or library. LD_LIBRARY_PATH is set by the user at runtime, and can be quite brittle. With RPATH, your executables and libraries are built so that they already know where to find their dependencies. This means you do not have to alter environment settings in order to successfully run a binary. If you still want more after looking at the paper section, there is more on this here.

There's a tiny bit more on how the compiler wrappers set this in the packaging guide.
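To make the wrapper idea concrete, here is a hypothetical sketch of what such a wrapper might do before calling the real compiler: for each dependency's library directory it appends `-L<dir>` (where the linker looks at link time) and `-Wl,-rpath,<dir>` (a path embedded in the binary for the runtime loader, using the GNU-style syntax that gcc and gfortran accept). The function names and the assumption that dependency directories arrive as a plain list are illustrative; this is not Spack's wrapper code.

```python
from typing import List, Sequence


def wrap_link_args(args: Sequence[str], dep_lib_dirs: Sequence[str]) -> List[str]:
    """Append link-time (-L) and runtime (RPATH) search paths for each dependency."""
    extra: List[str] = []
    for d in dep_lib_dirs:
        # -L: linker search path; -Wl,-rpath,DIR: ask the linker to embed DIR
        extra += [f"-L{d}", f"-Wl,-rpath,{d}"]
    return list(args) + extra


def wrapped_command(compiler: str, args: Sequence[str],
                    dep_lib_dirs: Sequence[str]) -> List[str]:
    """Build (but do not run) the full command the wrapper would execute."""
    return [compiler] + wrap_link_args(args, dep_lib_dirs)
```

After linking, `readelf -d main` lists the embedded path. Note that some toolchains record it as RUNPATH rather than RPATH; the loader consults RUNPATH only after LD_LIBRARY_PATH, whereas RPATH is consulted before it.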

@szaghi
Member

szaghi commented Nov 25, 2015

@tgamblin

in short RPATH is like LD_LIBRARY_PATH but it's set at compile time and embedded in an executable or library. LD_LIBRARY_PATH is set by the user at runtime, and can be quite brittle. With RPATH, your executables and libraries are built so that they already know where to find their dependencies. This means you do not have to alter environment settings in order to successfully run a binary.

Wow, very interesting, I was completely unaware of the RPATH mechanism. I will read your paper soon. Thank you very much!

If I am not bothering you too much... can you point me to the list of compilers that Spack currently wraps? GNU gfortran, which I am almost sure you support, is very important for us, but others could also be important (Intel Fortran, IBM XL Fortran, NAG, etc.).

Thank you very much again for your help!

@cmacmackin
Contributor

@tgamblin
Great to hear that Spack is adding support for external repositories. When can we expect a release of Spack which includes that? I see that v0.9 was supposed to be out last summer but you're still only on v0.8.17. I see that there are commits being added on a daily basis, so that is encouraging :). Additionally, are there any plans to put Spack in PyPI or a distribution's package manager at any point? It's always nice to have that option for installation.

@szaghi
Member

szaghi commented Nov 25, 2015

@tgamblin Forget my last question, I found here that it should be easy for Spack to find any local compilers, great job!

I also vote for a PyPI install like Chris does :-)

@aidanheerdegen
Author

Thanks very much to @tgamblin for popping in and sharing such great info. I was looking into the differences between spack and EasyBuild, when I came across this thread for hashdist:

https://groups.google.com/forum/#!topic/hashdist/CcsjgeaR7Zo

which features a lot of Todd's comments. So there is yet another great tool. From a scientific perspective, for me, reproducibility of builds is paramount. I want to be able to unambiguously identify what executable was used for a simulation, how the model code was built, and if necessary rebuild the executable to re-run the simulation.

Seems like hashdist is a better fit for those requirements? Is it worth bothering @certick for his opinion?

Edit: seems that @ didn't work ...

@tgamblin
Copy link

@szaghi:

If I am not bothering you too much... can you point me to the list of compilers that Spack currently wraps?

Support for each new compiler (compiler family, really) goes here. If you look at the files there, it's only 20-30 lines to add a new one -- all Spack needs to know is possible names, suffixes, how to add an RPATH, and how to query the version. We're currently adding support for things like detecting compilers in environments like Cray's, where you have to load a module to run the compiler.
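The four pieces of information listed above (possible names, suffixes, how to add an RPATH, how to query the version) fit naturally into a small record. The sketch below is a hypothetical illustration, not Spack's actual compiler class: the field names, the `CompilerFamily` type and the example suffixes are assumptions, while `-dumpversion` and `-Wl,-rpath,` are real gfortran options.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class CompilerFamily:
    """Hypothetical description of a compiler family (not Spack's real API)."""
    names: Tuple[str, ...]      # executable names to look for on PATH
    suffixes: Tuple[str, ...]   # optional version suffixes, e.g. "-5"
    rpath_flag: str             # prefix used to embed a runtime search path
    version_arg: str            # argument that makes the compiler print its version
    version_regex: str          # pattern extracting the version from that output

    def candidate_names(self) -> List[str]:
        # Try each bare name first, then each suffixed variant.
        return [n + s for n in self.names for s in ("", *self.suffixes)]

    def rpath_arg(self, path: str) -> str:
        return self.rpath_flag + path

    def parse_version(self, output: str) -> Optional[str]:
        m = re.search(self.version_regex, output)
        return m.group(1) if m else None


# Example entry for GNU Fortran; the suffix list is illustrative only.
GNU = CompilerFamily(
    names=("gfortran",),
    suffixes=("-4.9", "-5"),
    rpath_flag="-Wl,-rpath,",
    version_arg="-dumpversion",
    version_regex=r"(\d+(?:\.\d+)*)",
)
```

Detection would then run each candidate name with `version_arg` and feed its output to `parse_version`; adding another family means writing one more entry like `GNU`.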

@cmacmackin:

Great to hear that Spack is adding support for external repositories. When can we expect a release of Spack which includes that?

This week or next, hopefully. develop is probably the best branch to use, currently.

I see that v0.9 was supposed to be out last summer but you're still only on v0.8.17.

We've fallen behind on actually tagging the release (and updating the roadmap, apparently) but all the stuff slated for 0.9 on the roadmap you saw (variants, optional dependencies) is in develop and actually has been since summer. I've updated the roadmap a bit. I could probably do better with putting relevant details there.

@cmacmackin, @szaghi:

Additionally, are there any plans to put Spack in PyPI or a distribution's package manager at any point? It's always nice to have that option for installation.

I also vote for a PyPI install like Chris does :-)

We could probably rework Spack to include PyPI support, although it's currently pretty easy to use. You just clone it (or grab a tarball) and you're ready to go: you can run it right out of bin/spack, and it requires no installation; installers that require installation are ironic to me :). It does assume you have Python 2.6 and curl, but that's all.

If people are used to PyPI, I suppose putting it there would increase exposure, which is a benefit, but is there a technical advantage? I would have to think about how best to rework the directory structure to fit a standard python install, and this might interfere with the current scheme, which is designed to fit unobtrusively into a standard Linux filesystem hierarchy.

@aidanheerdegen:

I don't see too much difference between hashdist and Spack from a reproducibility perspective. Neither uses a completely isolated environment (like Nix), but then again neither requires root access for chroot, either (I think that's a bonus). With Nix & hashdist, you get cryptographic hash versioning, but you don't get parameterized, query-able provenance like you do with Spack (see the slides above: compiler, options, version, etc. are first-class build parameters in Spack).

Spack queries your compilers to ensure they're what they say they are and that they're at a particular version, whereas with hashdist you have to set some env vars to swap compilers (as of the last time I talked to Ondrej), so that aspect would be more cumbersome. hashdist does hash the actual build file, which is nice, while Spack only hashes the build DAG for a version. However, I'm planning to add build file hashing, along with perhaps some customizability for how detailed you want your version hash to be.
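The difference between hashing a build DAG and hashing a build file can be illustrated with a toy provenance hash. In the sketch below, each node's hash covers its name, version, compiler and variants plus the hashes of its dependencies, so a change anywhere below a package (say, a dependency rebuilt with another compiler) changes that package's hash too. The dictionary schema and `spec_hash` are hypothetical; this is not Spack's or hashdist's hash format.

```python
import hashlib
import json
from typing import Any, Dict


def spec_hash(spec: Dict[str, Any]) -> str:
    """SHA-256 over a node's build parameters and its dependencies' hashes."""
    node = {
        "name": spec["name"],
        "version": spec["version"],
        "compiler": spec["compiler"],
        "variants": spec.get("variants", {}),
        # Hash children first (recursively); sorting makes the result
        # independent of the order in which dependencies are listed.
        "deps": sorted(spec_hash(d) for d in spec.get("deps", [])),
    }
    # Canonical JSON: stable key order and no incidental whitespace.
    canonical = json.dumps(node, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Hashing the build file as well, as hashdist does, would amount to adding a digest of the recipe text as one more field of `node`.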

Feel-wise, I think hashdist has more of a meta-build-system feel to it, while Spack feels more like a package manager like homebrew.

This is all just my opinion, though -- a lot of great work has gone into hashdist so I'd ask them for more details, as well.

@aidanheerdegen
Author

Thanks again for the information @tgamblin. There is so much that is similar between the different packages that finding the critical differences can be hard for those of us who aren't very familiar with either (or both) packages.

Have you, the EB guys or hashdist done a comparison at all? Or use cases? Sounds like, if you wanted to test your code against a bunch of compiler/library combinations, spack would be a better choice than hashdist, at the present time.

As a user, I have to say PyPI isn't a big pull for me. Apart from the general weirdness of the inception-like installer recursion PyPI entails, I think it is a bonus that spack will install without dependencies.

@aidanheerdegen
Author

I have a slightly provocative question: given the maturity and capabilities of spack (or hashdist) what is the raison d'être of FLATPack?

Is it a curated collection of Fortran specific libraries? Or does it make sense for more focussed curated collections in specific subject areas? In which case, is FLATPack more of an information source, HOW-TOs, suggestions for best practice, and maybe a point of aggregation for search and collaboration?

@cmacmackin
Contributor

@aidanheerdegen
Prior to my being aware of the existence of these other package managers like Spack, my intention was essentially to create something along those lines (although I was only thinking about using it with Fortran code). I do vaguely remember Spack being mentioned in Issue #8, but for some reason I hadn't spared much of a thought for it. I think it was because the issue was raised after the initial burst of activity on FLATPack was over and I had gotten distracted by other projects.

I think the intention now would be to make it a curated collection of Fortran libraries and software. I don't think that it makes sense to have subject-specific collections, because there are many things in Fortran which are useful across a variety of fields. For example: ODE solvers, JSON parsers, plotting packages, vector libraries, implementations of data structures, etc. These are the sorts of things which a code may depend on whether it deals with radiative transfer, ocean currents, plasma physics, or economic transactions. If we hope to have some libraries become de facto standards for these things (and such a hope was part of the reason why I initiated this project) then a centralized collection makes the most sense. As such, I see major disadvantages of having subject-specific collections and no real advantage.

@tgamblin
I do realize that Spack is quite easy to install. However, perhaps due to inexperience, I always feel uncomfortable when I have to choose the location in which to install something and even more uncomfortable if I have to manually place things in the system paths. Anyway, it's not a huge problem if Spack isn't on PyPI, I just think it would make installation that little bit easier. I guess it also comes from my gut expectation that any open source Python software should be placed on PyPI, because that is where you're supposed to put it. One of my little irrationalities.

@tgamblin

@aidanheerdegen:

Have you, the EB guys or hashdist done a comparison at all? Or use cases?

I talk to the EB guys quite a bit, and I'm on the hashdist mailing list, though not much has happened there lately. We don't have a formal feature comparison anywhere, though I think the EB guys and I agree that there are different use cases and target audiences for Spack and EB.

Spack has seen quite a bit of adoption with app teams and users, and more recently some HPC centers. EB seems to target sysadmins (though we'd like to target sysadmins too), and has been around for longer. In general, EB likes to be very specific about versions, and the config files are the DAG. With Spack, the package recipes are templates, and the tool has pretty robust DAG management (virtual deps, version constraints, normalization, concretization -- see the paper). So, you can compose things more easily but you might also build things that perhaps no one ever tested, so it might break :). The idea is to make it easy to build a one-off configuration outside the vetted stack.
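As a rough illustration of what "concretization" means when recipes are templates, the toy resolver below turns an abstract request such as "some `mpi`, any version" into a concrete (package, version) pair: a virtual name is mapped to a provider (honouring an optional preference list), then the newest version satisfying a minimum-version constraint is chosen. The package index, version numbers and function names are all hypothetical, and Spack's real concretizer also handles variants, compilers and whole DAGs; this only shows the basic idea.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical package index: available versions, and providers of virtual names.
AVAILABLE: Dict[str, List[str]] = {
    "openmpi": ["1.8.8", "1.10.1"],
    "mpich": ["3.1.4", "3.2"],
}
PROVIDERS: Dict[str, List[str]] = {"mpi": ["openmpi", "mpich"]}


def parse_version(v: str) -> Tuple[int, ...]:
    # Numeric comparison: "1.10.1" > "1.8.8", unlike plain string comparison.
    return tuple(int(p) for p in v.split("."))


def resolve_provider(name: str, preferences: Sequence[str] = ()) -> str:
    providers = PROVIDERS.get(name)
    if providers is None:
        return name  # not virtual: the package provides itself
    for p in preferences:
        if p in providers:
            return p
    return providers[0]  # default provider


def pick_version(pkg: str, min_version: Optional[str] = None) -> str:
    ok = [v for v in AVAILABLE[pkg]
          if min_version is None or parse_version(v) >= parse_version(min_version)]
    if not ok:
        raise ValueError(f"no version of {pkg} satisfies >= {min_version}")
    return max(ok, key=parse_version)


def concretize(name: str, min_version: Optional[str] = None,
               preferences: Sequence[str] = ()) -> Tuple[str, str]:
    pkg = resolve_provider(name, preferences)
    return pkg, pick_version(pkg, min_version)
```

Because any provider/version pair that satisfies the constraints is acceptable, this flexibility is exactly what lets users build combinations "that perhaps no one ever tested".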

There has been some talk about maybe sharing Spack's DAG library and capabilities with EB, to get some commonality between the tools.

Sounds like, if you wanted to test your code against a bunch of compiler/library combinations, spack would be a better choice than hashdist, at the present time.

I think that is accurate, though I believe hashdist does let you test lots of lib combinations. You might have to do more editing to get everything done, though. With Spack you can do it in a few commands, because it has a syntax for this.
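Spack's spec syntax uses `%` for the compiler, `^` for a dependency and `@` for a version, so a compiler/MPI test matrix (the use case raised at the start of this thread) reduces to generating one `spack install` line per combination. In the sketch below, the package name `mypkg` and the compiler versions are hypothetical; the commands are only printed, not executed.

```python
from itertools import product
from typing import List, Sequence


def build_matrix(package: str, compilers: Sequence[str],
                 mpis: Sequence[str]) -> List[str]:
    """One `spack install` command per (compiler, MPI) combination."""
    return [f"spack install {package} %{cc} ^{mpi}"
            for cc, mpi in product(compilers, mpis)]


if __name__ == "__main__":
    # Hypothetical package and versions, for illustration only.
    for cmd in build_matrix("mypkg", ["gcc@5.2.0", "intel@15.0.3"],
                            ["openmpi", "mpich"]):
        print(cmd)
```

A `^mpich` or `^openmpi` constraint only makes sense if `mypkg` actually depends on `mpi`; otherwise Spack would have nothing to attach it to.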

@cmacmackin: I'll keep PyPI in mind; it would be nice to install from there. I have a lot of things on my plate at the moment as we approach v1.0, but I think it's worth revisiting after that.

4 participants