Adds distributed row gatherer #1589
base: index-map-pgm
One issue that I have is the constructor. It takes a
If I can't come up with anything better, I guess I will use that.
Do we need to have the |
@pratikvn, @yhmtsai I've removed the row gatherer from the LinOp hierarchy for now; it derives directly from PolymorphicObject instead. Because of that change, you might want to give it a second look.
Signed-off-by: Marcel Koch <[email protected]>
- only allocate if necessary
- synchronize correct executor

Co-authored-by: Pratik Nayak <[email protected]>

- split tests into core and backend part
- fix formatting
- fix openmpi pre 4.1.x macro

Co-authored-by: Pratik Nayak <[email protected]>
Co-authored-by: Yu-Hsiang M. Tsai <[email protected]>
Signed-off-by: Marcel Koch <[email protected]>

- add copy/move tests
- undo using MPI_Init_thread
- add extra host_recv_buffer_
- create row-gatherer as unique_ptr

Co-authored-by: Yu-Hsiang M. Tsai <[email protected]>
The `LinOp::apply` function creates temporary clones to match the operator's executor, but this leads to wrong behavior if MPI doesn't support GPU buffers.
Right now the row gatherer doesn't support (blocking) apply, so it doesn't make much sense to keep it as a LinOp.
This PR adds a distributed row gatherer. This operator essentially provides the communication required in our matrix apply.
Besides the normal apply (which is blocking), it also provides two asynchronous calls. One version takes an additional `workspace` parameter, which is used as the send buffer. This version can be called multiple times without restrictions, as long as a different workspace is used for each call. The other version has no workspace parameter and uses an internal buffer instead. As a consequence, it can only be called a second time once the request of the previous call has been waited on; otherwise, it throws.

This is the second part of splitting up #1546.
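The difference between the two asynchronous variants can be sketched with a small, MPI-free toy class. This is only an illustration of the calling rules described above, not the interface added in this PR: `ToyRowGatherer`, `Request`, `apply_async`, and `demo` are hypothetical names, and the "communication" is simulated by copying the selected rows into the send buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical stand-in for a row gatherer; NOT the actual Ginkgo API.
class ToyRowGatherer {
public:
    // Handle for one in-flight gather; wait() stands in for waiting on an
    // MPI request.
    class Request {
    public:
        Request(const std::vector<double>& buffer, bool* busy_flag)
            : buffer_(&buffer), busy_flag_(busy_flag)
        {}

        // Completes the simulated communication and, for the internal-buffer
        // variant, marks the internal buffer as free again.
        const std::vector<double>& wait()
        {
            if (busy_flag_ != nullptr) {
                *busy_flag_ = false;
            }
            return *buffer_;
        }

    private:
        const std::vector<double>* buffer_;
        bool* busy_flag_;
    };

    explicit ToyRowGatherer(std::vector<std::size_t> rows)
        : rows_(std::move(rows))
    {}

    // Variant with a caller-provided workspace used as send buffer. It can be
    // called repeatedly as long as each call gets its own workspace.
    Request apply_async(const std::vector<double>& b,
                        std::vector<double>& workspace) const
    {
        pack(b, workspace);
        return Request(workspace, nullptr);
    }

    // Variant without workspace: uses the internal buffer and throws if the
    // previous request has not been waited on yet.
    Request apply_async(const std::vector<double>& b)
    {
        if (internal_busy_) {
            throw std::runtime_error("previous request has not been waited on");
        }
        internal_busy_ = true;
        pack(b, internal_buffer_);
        return Request(internal_buffer_, &internal_busy_);
    }

private:
    // Copies the requested rows of b into the send buffer.
    void pack(const std::vector<double>& b, std::vector<double>& out) const
    {
        out.resize(rows_.size());
        for (std::size_t i = 0; i < rows_.size(); ++i) {
            out[i] = b.at(rows_[i]);
        }
    }

    std::vector<std::size_t> rows_;
    std::vector<double> internal_buffer_;
    bool internal_busy_ = false;
};

// Usage example: returns true if a second workspace-free call was rejected
// while the first request was still pending.
bool demo()
{
    ToyRowGatherer rg(std::vector<std::size_t>{2, 0});
    std::vector<double> b{1.0, 2.0, 3.0};
    auto first = rg.apply_async(b);
    bool threw = false;
    try {
        rg.apply_async(b);  // first request not waited on yet
    } catch (const std::runtime_error&) {
        threw = true;
    }
    first.wait();
    rg.apply_async(b).wait();  // allowed again after wait()
    return threw;
}
```

In this sketch the workspace variant leaves buffer ownership with the caller, which is why overlapping calls are unproblematic, while the workspace-free variant has to guard its single internal buffer with a flag.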
It also introduces some intermediate changes, which could be extracted out beforehand:
- a type-erased `DenseCache`
- making `detail::run` easier to use (now part of Use index_map in distributed::matrix #1544)

PR Stack: