2015 10 27

Wesley Bland edited this page Oct 27, 2015 · 1 revision

Attendees:

  • Intel - Wesley, Jeff Hammond
  • UTK - Aurelien
  • ORNL - Geoffroy, Christian

mpi-forum/mpi-issues#4

  • Removes text about implementing MPI_COMM_FREE as local
  • Aurelien thinks that the text is still helpful to remind implementors that it doesn’t have to be synchronizing.
    • If that’s the point of the text, there’s a better way to say it.
  • We’ll probably just drop this issue, since the existing text seems to have some merit.

mpi-forum/mpi-issues#7

  • Make it invalid to use communicators after freeing them.
    • We may get rid of this part, since an implementation can keep juggling the reference around internally.
  • Remove advice about refcounting communicators.
    • This part is fine.

ULFM: Failure reporting in MPI_WIN_FLUSH

  • mpiwg-ft/ft-issues#2 & mpiwg-ft/mpi-standard#1
  • The current version of the pull request says that only epoch-closing operations will be required to report errors.
  • Jeff argues that we should require flush to report errors by default because it is semantically difficult to rationalize otherwise.
    • Returning errors for gets is easy; for puts it is more expensive.
    • We should make the RMA text reflect the same kind of semantics as send/recv, where puts/accumulates are sends and gets are receives.
      • As long as the user buffer is ok, then the operation can return MPI_SUCCESS, otherwise it should return an error.
      • We should treat WIN_FLUSH with gets as required to return errors.
      • We should treat WIN_FLUSH with puts/accumulates like SSEND, which means it should return errors.
    • Wait, RMA cannot make guarantees about whether a process is alive or not. All we can tell you is whether the data is accessible/correct or not.
      • You may be able to get to a process’s memory through the hardware even though the process has failed.
      • So errors that we return should only be designed to say whether the data is valid or not.
      • We would probably want to create a new error class that describes this situation.
      • It’s still valid to return MPI_ERR_PROC_FAILED in the scenario where the process has actually failed, but that’s not the only error code that could be returned.
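
The reporting rule sketched in the last few bullets can be written down as a small decision function. This is only a toy model of one reading of the discussion, not text from the pull request or any MPI implementation: the enum values, the `target_state` struct, and `classify_flush` are all hypothetical names, and `FLUSH_ERR_DATA_INVALID` merely stands in for the new error class that has not been defined yet.

```c
#include <stdbool.h>

/* Hypothetical result classes for this sketch; these are NOT real MPI constants. */
typedef enum {
    FLUSH_SUCCESS,          /* stands in for MPI_SUCCESS */
    FLUSH_ERR_PROC_FAILED,  /* stands in for MPI_ERR_PROC_FAILED */
    FLUSH_ERR_DATA_INVALID  /* stands in for the proposed, not yet defined, error class */
} flush_result;

/* Assumed knowledge an implementation has about one target at flush time. */
typedef struct {
    bool data_valid;   /* the target data is accessible and correct */
    bool proc_failed;  /* the implementation has detected that the process failed */
} target_state;

/* Errors describe data validity only, not process liveness. */
flush_result classify_flush(target_state t)
{
    /* Valid data means success, even if the process has failed but its
       memory is still reachable through the hardware. */
    if (t.data_valid) {
        return FLUSH_SUCCESS;
    }
    /* Data is not valid and a process failure is known to be the cause. */
    if (t.proc_failed) {
        return FLUSH_ERR_PROC_FAILED;
    }
    /* Data is not valid for some other reason. */
    return FLUSH_ERR_DATA_INVALID;
}
```

Under this reading, a flush after a get and a flush after a put/accumulate would both go through the same check; the difference raised in the discussion is only how expensive it is for the implementation to learn `data_valid` in the put case.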