document contribution policy for adding software to EESSI #108
Hm. Considering their potential for abuse, should we require that when using |
Yes, that should be a requirement, indeed. The extra benefit of that is that it can put some pressure on actually getting PRs merged on the EasyBuild side (easybuilders/easybuild-easyblocks#2248 comes to mind here). |
|
||
Only **open source software** can be added to the EESSI repository. | ||
|
||
Make sure that you are aware of software license, and that redistribution is allowed. |
"aware of relevant software licenses" ??
Requiring merging can be harsh, but for 99% of cases, it should be fine. But easybuilders/easybuild-easyblocks#2248 also suggests that this can stall stuff. We can start off with a "must be merged" policy and reevaluate with a possible migration to "should be merged" at a later date, if needed? (We should probably also use some auto-labelling of PRs and set "uses-from-pr" or a similar tag if it is seen in the diff, just to help maintainers?)
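To make the auto-labelling idea a bit more concrete, here is a minimal sketch of how a bot could decide on such a label. It assumes the text to look for is EasyBuild's `from-pr` option (either as `--from-pr` on a command line or as a `from-pr:` key in a YAML file); the function name `labels_for_diff` is hypothetical and not part of any existing bot.

```python
import re

# Assumption: the option of interest is EasyBuild's `from-pr`,
# written either as `--from-pr` or as a `from-pr:` key.
FROM_PR_PATTERN = re.compile(r"\bfrom-pr\b")


def labels_for_diff(diff_text):
    """Return the labels a (hypothetical) bot would set for a unified diff."""
    labels = set()
    for line in diff_text.splitlines():
        # Only look at added lines; skip the '+++ b/<file>' header line.
        if line.startswith("+") and not line.startswith("+++"):
            if FROM_PR_PATTERN.search(line):
                labels.add("uses-from-pr")
    return sorted(labels)
```

Only added lines are scanned, so removing a `from-pr` usage from a file would not trigger the label.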
## Software versions & toolchains | ||
|
||
Recent software versions and toolchains *should* be preferred, | ||
although the installation of older versions of use of older toolchains is allowed if sufficiently motivated. |
"versions of use of older toolchains" this is rather unclear, what is really meant by this?
reworded, should be clearer now
Recent software versions and toolchains *should* be preferred, | ||
although the installation of older versions of use of older toolchains is allowed if sufficiently motivated. |
I know we've only started to discuss this, and the machinery to help enable this doesn't yet exist, but we should maintain that once a toolchain is connected to a particular compat layer we don't include that toolchain in future compat layers. If we allow people to create PRs for older toolchains to newer compat layers we will bring a lot of baggage.
Hmm, that's perhaps a bit too restrictive, there can be very valid reasons to have a recent toolchain like `foss/2023a` both in the current EESSI version, but also have it in a future version?
What type of reason? I can't think of a need for this. You can have the later toolchain, just with a different compat layer. If we select compat layer via a module, rather than sourcing a script (which is entirely possible) they can happily live side by side using a (reduced) hierarchical view.
Especially when we already know there are cases where it can't work, for example the whole OpenSSH thing we ran into with 2023.04
Good start. I think it's too early to make this public (and thereby a requirement). We should be much more restrictive across the types of requirements and then have the machinery in place for enforcing them.
Internally, we can start with something like the policy as described and try to ensure to meet all requirements.
@@ -0,0 +1,80 @@ | |||
# Contribution policy | |||
|
|||
When [openining a pull request to add software to EESSI](adding_software.md), the following requirements must |
I'd start with a brief description of what this policy is for. For example,
The purpose of the contribution policy is to provide guidelines for adding software to the shared EESSI repository. It informs about what requirements a software to be added must meet. ...
Small typo: `openining` should be `opening`. I'd rephrase this, however, to leave out the technical aspect (opening a pull request). E.g.,
Any software to be added to the EESSI repository must meet the following requirements:
1. Open Source Software only (see section X for details)
2. ...
3. ...
For more information about a specific license, see the [SPDX license list](https://spdx.org/licenses/). | ||
|
||
!!! note | ||
|
||
We intend to automatically verify that this requirement is met, | ||
by requiring that the [SPDX license identifier](https://spdx.dev/ids/) is provided for all software. |
Specifying the SPDX license identifier is probably not enough. Verifying it (that the identifier reflects the license) automatically is likely difficult.
I'd suggest to be rather restrictive at the start:
- Every software to be added must provide license information covering the full sources of the software package.
- For all dependencies of the software, license information covering the full sources must be provided too.
- License information must be given as SPDX license identifier.
- At the start (policy version 0.1) only the following license identifiers are accepted: list SPDX license identifiers
I agree that we want to start doing this, but this shouldn't be required in the first version of the policy.
We need to set up a mechanism for this first, for example a `licenses.yaml` file that maps software names to SPDX identifiers, and ideally also a way to detect that one or more entries are missing...
So, for now, I would keep it like it is now, and then work towards requiring that SPDX license identifiers can be provided somehow, and then update the policy accordingly once that is in place.
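As a rough illustration of the "detect missing entries" part, the sketch below checks a list of software names against a name-to-SPDX mapping. The dictionary stands in for the parsed contents of a `licenses.yaml` file; the package names, the helper name `missing_license_entries`, and the file layout are all assumptions made for this example.

```python
# Hypothetical stand-in for a parsed licenses.yaml file:
# software name -> SPDX license identifier.
LICENSES = {
    "example-app": "MIT",
    "example-lib": "GPL-3.0-or-later",
    "example-tool": "",  # entry present, but no identifier filled in yet
}


def missing_license_entries(software_names, licenses):
    """Return (sorted) the software names that have no usable license entry."""
    # A name counts as missing if it is absent or maps to an empty value.
    return sorted(name for name in software_names if not licenses.get(name))


if __name__ == "__main__":
    installed = ["example-app", "example-tool", "another-app"]
    print(missing_license_entries(installed, LICENSES))
```

This only verifies that an identifier is recorded, not that it matches the actual license of the software, which is the harder problem raised earlier in this thread.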
@@ -0,0 +1,80 @@ | |||
# Contribution policy |
Should we add a version to the policy?
!!! note | ||
|
||
This restriction may be relaxed later to also allow adding software that is not supported yet in the latest | ||
EasyBuild release, or to allow for installing software with other tools. |
I'd remove this. Raises expectations we may not (want to) meet.
I added this mostly to counter feedback like "why are you only using EasyBuild", but perhaps the contribution policy is not the place for that, so will remove.
A [compiler toolchain](https://docs.easybuild.io/terminology/#toolchains) that is still supported by the latest | ||
EasyBuild must be used for building the software. | ||
|
||
More information on deprecated toolchains in EasyBuild is available | ||
[here](https://docs.easybuild.io/deprecated-easyconfigs/#deprecated_easyconfigs_toolchains). |
Should we be more restrictive here? E.g.,
Only toolchains already available in the EESSI repository may be used.
I'd say that any toolchain still supported in EasyBuild is eligible to be included in EESSI, at least in the initial version of the policy.
Going back to older toolchains is likely going to be significantly more painful, if only to get even the installation of GCC to work, so this is sort of self-regulating...
Also, we can't really use a statement like "Only toolchains already available in the EESSI repository may be used.", because then adding toolchains (however recent) would be against the contribution policy? :)
I think it should be up to the EESSI maintainers what toolchains are and are not supported by a particular EESSI release, this can be handled via our hook, see for example https://github.com/easybuilders/JSC/blob/2024/Custom_Hooks/eb_hooks.py#L570-L640
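For readers unfamiliar with this approach, a simplified sketch of the kind of check such a hook could perform is shown below. It is not taken from the linked hook file: the allowed set and the function name `toolchain_allowed` are hypothetical, and the wiring into an EasyBuild hook (for instance `parse_hook`) is left out. The only thing assumed about EasyBuild is that an easyconfig's toolchain is a dict with `name` and `version` keys.

```python
# Hypothetical policy data: toolchains accepted for one EESSI version.
ALLOWED_TOOLCHAINS = {
    ("foss", "2023a"),
    ("GCC", "12.3.0"),
}


def toolchain_allowed(toolchain, allowed=ALLOWED_TOOLCHAINS):
    """Check a toolchain dict ({'name': ..., 'version': ...}) against the allowed set."""
    # Assumption for this sketch: the 'system' toolchain is always accepted.
    if toolchain["name"] == "system":
        return True
    return (toolchain["name"], toolchain["version"]) in allowed


if __name__ == "__main__":
    for tc in ({"name": "foss", "version": "2023a"}, {"name": "foss", "version": "2019b"}):
        status = "allowed" if toolchain_allowed(tc) else "not allowed"
        print(f"{tc['name']}/{tc['version']}: {status}")
```

Keeping the allowed set as plain data means maintainers can change what a given EESSI release supports without touching the policy text.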
Allowing someone to use a new toolchain is a big step, as obviously this is extremely likely to trigger a massive number of new software packages (and maintainer effort).
## Testing | ||
|
||
We should be able to test the software installations via the [EESSI test suite](https://github.com/EESSI/test-suite) | ||
being developed. | ||
|
||
Ideally one or more tests are available that verify that the software is functionally correct, | ||
and performs well. | ||
|
||
It should be possible to run a minimal *smoke test*, for example using EasyBuild's `--sanity-check-only` feature. | ||
|
||
!!! note | ||
|
||
The [EESSI test suite](https://github.com/EESSI/test-suite) is still in active development, | ||
and currently only has a minimal set of tests available. | ||
|
||
When the test suite is more mature, this requirement will be enforced more strictly. |
Reads very vague, more like a goal/aim. I'd rather require that a software must be tested for functional correctness, single-core performance and multi-core scalability. Since the machinery (and tests) do not exist yet, the request to add a software should detail how the software can be tested.
That's way more restrictive than what we're doing currently, for many installations (dependencies) we don't really test at all...
So we need to keep this relatively loose for now.
Eventually we can hopefully require that for example `--sanity-check-only` works on all installations, but that needs more work, and that the latest release of the EESSI test suite passes with certain tags (like `CI`).
I think the wording is OK for now.
One of the plans in the EESSI test suite is to add a "smoke test" that basically just runs `eb --sanity-check-only` on all installations.
Once that is in place, the minimal requirement for this part could be that this test must pass for all software installations being added, but then we should first fix some known problems in EasyBuild like easybuilders/easybuild-easyblocks#2986
Thanks! Almost done. A few small suggestions for rephrasing. One wish for the testing section.
I wonder if/how we could encourage contributors to help us define meaningful tests. For example,
- for package `X/y.z`, obtain test dataset from `https://x.org/testdata_vX.Y.tgz`
- unpack test data
- run `X -arg 1 -arg 2 testdata.csv`
- expected output files are: `testout.jpg` and `testout.log`, where `testout.log` includes 1 line matching `SUCCESS` and no line matching `ERROR`
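The last step of such a contributed test description could be checked mechanically. Below is a minimal sketch, assuming the output file names from the example above (`testout.jpg`, `testout.log`); the helper names `log_passes` and `check_outputs` are hypothetical and not part of the EESSI test suite.

```python
import re
from pathlib import Path


def log_passes(log_text):
    """True if exactly one line matches SUCCESS and no line matches ERROR."""
    lines = log_text.splitlines()
    n_success = sum(1 for line in lines if re.search("SUCCESS", line))
    n_error = sum(1 for line in lines if re.search("ERROR", line))
    return n_success == 1 and n_error == 0


def check_outputs(workdir):
    """Check that both expected output files exist and that the log passes."""
    workdir = Path(workdir)
    image = workdir / "testout.jpg"
    log = workdir / "testout.log"
    # Both files must exist before the log contents are inspected.
    if not (image.is_file() and log.is_file()):
        return False
    return log_passes(log.read_text())
```

A description in this form (inputs, command, expected files, patterns to match) gives maintainers enough to turn it into a proper test later, without requiring contributors to write one themselves.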
docs/contribution_policy.md
Outdated
|
||
Only **open source software** can be added to the EESSI repository. | ||
|
||
Make sure that you are aware of relevant software license, and that redistribution is allowed. |
Small typo?
Make sure that you are aware of relevant software license, and that redistribution is allowed. | |
Make sure that you are aware of relevant software licenses, and that redistribution is allowed. |
or even?
Make sure that you are aware of relevant software license, and that redistribution is allowed. | |
Make sure that the software uses an open source license, and that redistribution is allowed. |
Nit-picking but the original formulation is better, we do allow binary distribution as long as the redistribution is permitted by the licence (CUDA for example falls into this category, we distribute the runtime)
@ocaisa Then this shouldn't refer to "open source" software at all?
That is going to make the wording here a lot less clear though...
docs/contribution_policy.md
Outdated
### e) CPU targets { #cpu_targets } | ||
|
||
The software *should* work on all [CPU targets supported by EESSI](software_layer/cpu_targets.md). | ||
|
||
Exceptions to this requirement are allowed if technical problems that can not be resolved with reasonable effort | ||
prevent the installation of the software for specific CPU targets. |
Should we use `architecture` instead of `targets`?
### e) CPU targets { #cpu_targets } | |
The software *should* work on all [CPU targets supported by EESSI](software_layer/cpu_targets.md). | |
Exceptions to this requirement are allowed if technical problems that can not be resolved with reasonable effort | |
prevent the installation of the software for specific CPU targets. | |
### e) CPU architectures { #cpu_targets } | |
The software *should* work on all [CPU architectures supported by EESSI](software_layer/cpu_targets.md). | |
Exceptions to this requirement are allowed if technical problems that can not be resolved with reasonable effort | |
prevent the installation of the software for specific CPU architectures. |
@trz42 We use "targets" in https://www.eessi.io/docs/software_layer/cpu_targets, so sticking to targets is more consistent?
I guess we could use "target CPU architectures"? (it should really be microarchitectures, though)
docs/contribution_policy.md
Outdated
We should be able to test the software installations via the [EESSI test suite](../test-suite). | ||
|
||
Ideally one or more tests are available that verify that the software is functionally correct, | ||
and performs well. | ||
|
||
It should be possible to run a minimal *smoke test*, for example using EasyBuild's `--sanity-check-only` feature. |
Question is, should we ask contributors to help us define a meaningful test?
Such tests could be added to the EESSI test suite. "Ideally" sounds like a very loose requirement ... and missed opportunity to me.
Making this a requirement for someone who is really just an end-user is a high bar. I agree that for marquee applications we should have such tests, but it's not realistic to expect that for all cases. In general, it's probably a much more pro-active effort on our part where we'd need to ensure we are reaching out to the actual developers (or very experienced users).
This indeed necessarily needs to be a rather loose requirement, or we will be way too restrictive, and won't get any contributions because of that.
It's definitely unreasonable to require that a test must be available in the EESSI test suite, since adding something there is far from trivial (requires knowing Python, ReFrame, how to write a portable test, etc.).
We can probably provide some documentation with guidelines on how to do that eventually, but then it will still be quite an elaborate task.
We should phrase this such that it's clear that we prefer being able to test the software, in one way or another, without imposing a huge amount of effort on contributors, at least initially.
Maybe we can mention options like providing a test case, referring to an example or documentation of a basic run, etc.?
docs/contribution_policy.md
Outdated
### f) Versions & toolchains { #versions_toolchains } | ||
|
||
Recent software versions and toolchains *should* be preferred, | ||
although the installation of older software versions and toolchains is allowed if sufficiently motivated. |
I'm not keen on this, we open the door to a lot of pain by looking backwards. In general, I do believe we should decide what toolchains to support on a specific compat and stick to it, and probably take the opportunity once a year to update the compat layer
That feels like swinging over to the other side a bit too much (by "locking" a toolchain version to a particular EESSI version, and hence compat layer).
Keeping compat layers updated is gradually going to become quite a bit of effort, and so we may want to install an existing toolchain version in a newer compat layer at some point, since that's likely to be a lot less painful than updating a previous compat layer version.
I guess we could stick to "a toolchain version is only installed on top of a single compat layer version" initially, and then see if that makes sense going forward, and then adjust accordingly if needed.
But should this be part of the contribution policy?
I'm on the fence on this one. On the one hand, for a good user / contributor experience, discouraging or forbidding the deployment of old toolchains will probably lead to fewer issues. On the other hand: if someone needs an older toolchain, and if it works, I see little reason not to accept a PR like that.
I had a chat yesterday with someone from a university HPC support staff who asked 'if my user asks for something from a 2019b toolchain, because he needs the particular software version in that toolchain, can I deploy that through EESSI?'. Even though he fully agreed we should give pushback to users requesting this stuff, in the end he's also realistic and just wants to help the researcher. If it were a local software stack, he would have definitely given it a try. If our policy 'forbids' that, he would be pushed to a local solution. I'm a bit concerned if it wouldn't turn HPC support staff away from using EESSI if we are too restrictive on these things.
> I guess we could stick to "a toolchain version is only installed on top of a single compat layer version" initially, and then see if that makes sense going forward, and then adjust accordingly if needed.
What do you mean here? If foss-2022a went into 2023.06, it cannot also go into 2023.12? I wouldn't do that. I think Alan's suggestion was more like: we can decide for 2023.06 to support toolchains 2021a onwards, for 2023.12 we do 2021b onwards, etc. So then there is overlap (2022a would still be on top of both versions of the compat layer), but it prevents deployment of very old toolchains on new compat layers (which probably won't work anyway).
The more I think about it, the more I am not in favour of limiting this initially in the policy. I think people will find it's hard to get older compilers to build, and older MPIs to function properly. In practice, I think they might still be blocked from contributing such toolchains, simply because it's impossible to get them to work (on new compat layers, at least). I wouldn't forbid that, just let the technical limitation be the blocker, not our 'rule'. And, I agree we could still mention this (i.e. old stuff is discouraged, likely to cause issues, and the community probably won't be very motivated to help you out) clearly in the policy.
Another advantage of not limiting it initially is that it allows us to build experience, which we can later use to make an informed decision to forbid it after all (if needed).
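To illustrate the "2021a onwards for 2023.06, 2021b onwards for 2023.12" idea from this comment, here is a small sketch of how such a sliding window could be expressed. The table only contains the two examples mentioned above; the names `OLDEST_TOOLCHAIN` and `generation_allowed` are hypothetical, and nothing like this exists in the policy or tooling yet.

```python
# Hypothetical table: oldest common toolchain generation accepted per EESSI
# version, using the two examples from the comment above.
OLDEST_TOOLCHAIN = {
    "2023.06": "2021a",
    "2023.12": "2021b",
}


def generation_allowed(eessi_version, toolchain_version):
    """Check whether a toolchain generation falls inside the window for an EESSI version."""
    # Generations such as '2021a' or '2022b' compare correctly as plain strings:
    # the year comes first, then 'a' sorts before 'b'. This does not apply to
    # compiler-only toolchain versions like '12.3.0'.
    return toolchain_version >= OLDEST_TOOLCHAIN[eessi_version]


if __name__ == "__main__":
    # 2022a falls inside both windows, so there is overlap between EESSI versions.
    print(generation_allowed("2023.06", "2022a"), generation_allowed("2023.12", "2022a"))
```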
@casparvl We discussed this aspect during the bot sync meeting earlier today, and I've made some changes to the policy in 86f99c3 based on that.
There are now separate entries for recent toolchains and recent software versions, where the former mentions that contributors should either use already installed toolchains, or submit a motivated support request to get an additional toolchain installed, which puts the ball in our camp, and leaves the door open for contributors.
There was a consensus during the meeting that this makes sense for the initial version of the policy, and we can revise that later should the need arise.
In practice, it's very likely anyway that it will be EESSI admins who will add toolchains.
We also briefly discussed that we should only install the latest N toolchains in the latest version of EESSI at any time, which allows for minimal overlap across EESSI versions (but that's not part of the policy, that's more of an internal thing we should try and stick to).
In practice this will probably mostly be governed by the compatibility of the glibc version in the compat layer and the GCC version in the toolchains...
In EESSI/software-layer#371 I've included the ability for both the site and the user to extend EESSI as they see fit. So even if we don't support it (most likely because of the architecture situation), that doesn't mean that they can't do that locally if it "just works".
Note that now that #117 has been merged, there is a placeholder for the contribution policy at https://www.eessi.io/docs/contributing_sw/contribution_policy/
That'll require a rebase or merge into this feature branch.
LGTM
Co-authored-by: ocaisa <[email protected]>
Great work @boegel !
Just a tiny suggestion.
Co-authored-by: Thomas Röblitz <[email protected]>
Lgtm 👍🏻👍🏻 |
Hi @casparvl, I think your requested change was implemented and your concerns about the toolchain versions were addressed. So, opted to dismiss your review. Hopefully ok with you. Cheers Thomas
Great work @boegel ! I think the changes you made after my review make sense. And as you mentioned: let's build experience, and fine tune if needed :) |
I have marked this as draft (work-in-progress), because this policy is only a proposal at this point, and needs to be discussed among project partners a bit more before adding it to the EESSI documentation.
All feedback is welcome!