-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Stream-ordered wait{_any,_all}, fix wait_any implementation #125
base: develop
Are you sure you want to change the base?
Stream-ordered wait{_any,_all}, fix wait_any implementation #125
Conversation
0066522
to
66a68e1
Compare
src/KokkosComm/mpi/req.hpp
Outdated
|
||
template <KokkosExecutionSpace ExecSpace> | ||
void wait_all(const ExecSpace &space, std::vector<Req<Mpi>> &reqs) { | ||
space.fence(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder whether this fence is necessary. If the semantics of KokkosExecutionSpace
is such that enqueued operations are ordered then it is sufficient to block the enqueuing thread until the communication is complete. Subsequent submissions are necessarily ordered with the (then-completed) MPI operations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we do not need that fence there. There are two more occurrences of superfluous fences.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider moving the wait_* functions to a separate file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Semantically we're trying to order this "wait" after anything else already in the execution space instance. There's no way to actually enqueue an arbitrary host function in a Kokkos execution space instance, so the only way to ensure everything already in the instance is done is to fence it. Then we can call the MPI functions to complete the communication.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The difference is not observable. The operations have been started so they have seen the execution space at the time the operation was started. Subsequent kernels should not interfere with these operations, nor do operations submitted after the wait
(because they will themselves synchronize the execution space before starting the operation). And I don't think wait_all
should guarantee synchronization of the space because that is more than strictly needed.
66a68e1
to
4bc3385
Compare
This adds
wait
,wait_any
,wait_all
functions likeSemantically, this inserts the "wait" into the provided execution space instance - i.e., existing work in the instance is allowed to complete, and future work must wait for the communication to complete.
The 1-argument version without an execution space just calls the two-argument version with
Kokkos::DefaultExecutionSpace
.