2017 09 20
Wesley Bland edited this page Sep 22, 2017
- Went over the slides again for the newcomers in the room.
- Confirmed that we still like the current proposal and cleared up confusion around it.
- When presenting to the forum, we should demonstrate plenty of use cases.
  - Resource exhaustion is an easy use case (and perhaps a sufficient one), but what other use cases are there?
- inquiry.tex:524
  - Implies that returning `MPI_IS_OK` means we'll never have an error.
  - Frame this as stating what happened in the past, not as a guarantee about the future.
- Update issue description for reading.
- Reasons Reinit can't live in ULFM:
  - A failure detector, process recovery, etc. can be implemented in SLURM faster than in MPI.
  - It uses the PMPI interface.
  - It fails faster because it skips agreement and revocation.
- Ignacio: Let's have both models and add an API function to pick which model you want.
  - We probably can't have both in the same application, but it might be possible if you can make strong guarantees about your application and its libraries.
- Aurelien: Could we let you pick the model with error handlers?
  - Set `MPI_ERRORS_REINIT` on your communicator if you want Reinit.
  - Set
- The concern about overlapping communicators comes from overlapping shrinks, not revokes.
  - We think it is enough to improve our advice on the safest way to do MPI recovery: all MPI recovery should live in one place rather than happen at multiple layers (unless you are sure that is safe).
The reading did not succeed because of concerns about backward incompatibility. A backward-incompatible change is acceptable, but we need to add a new chapter that lists backward-incompatible changes and points this one out.
Other notes are on the pull request itself: https://github.com/mpi-forum/mpi-standard/pull/1#pullrequestreview-64681897