-
Notifications
You must be signed in to change notification settings - Fork 91
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use index_map in distributed::matrix #1544
Conversation
There is some interface breaking stuff in there, but it's minor and I can easily add the old version, if necessary. |
c1ecb3c
to
c49b4b7
Compare
58c6462
to
11f9ba3
Compare
11f9ba3
to
12f8ab3
Compare
12f8ab3
to
c12bac1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! The issue of interface needs to be resolved. It is in the experimental
namespace, but I think we said we would still not break interface in experimental
?
*/ | ||
void read_distributed( | ||
index_map<local_index_type, global_index_type> read_distributed( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this breaks interface
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we made no promises about interface-stability in experimental
beyond "we'll try to keep changes to a minimum"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TBH, I don't consider that really an interface break. This has an effect only on very serious template meta programming, which I don't expect users to do. Other than that, current code will compile, it just discards the return value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is actually an interface break here, but not due to the return value. Now this function expects a shared_ptr<Partition>
instead of a raw pointer. But this is done in #1543 if you want to discuss that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Somehow the old comments I made dont appear anymore, so it is a bit annoying
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But this is indeed an unfortunate interface break. I think the version with shared_ptr
should be okay, but I think users actually depend on read_distributed
, so it will break existing code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the index map needs to store the partition. I've looked again into the map_to_local
kernels, and there I actually need all arrays a partition stores. So it wouldn't make sense to only copy the needed data out of the partition and into the index-map.
c12bac1
to
a1d45ac
Compare
8d97aef
to
2abe709
Compare
a1d45ac
to
07f6d02
Compare
8206bc3
to
c700b1c
Compare
07f6d02
to
b42ab92
Compare
c700b1c
to
989967d
Compare
b42ab92
to
8f104fd
Compare
8f104fd
to
a0824a8
Compare
989967d
to
cc997ff
Compare
80e2802
to
33d28fb
Compare
2072219
to
20e3d96
Compare
33d28fb
to
cdb0447
Compare
20e3d96
to
2812123
Compare
Quality Gate passedIssues Measures |
cdb0447
to
abadebc
Compare
2812123
to
52dcfc8
Compare
abadebc
to
8fb9d44
Compare
1c0bd5f
to
355e776
Compare
8fb9d44
to
fdb3ce2
Compare
355e776
to
204d7e8
Compare
fdb3ce2
to
dff0aa8
Compare
204d7e8
to
3be79ab
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, minor gripe with the memory allocation tracking that is incomplete in index_map, maybe a fix could be included here?
@@ -5,7 +5,7 @@ | |||
"comm_pattern": "stencil", | |||
"spmv": { | |||
"csr-csr": { | |||
"storage": 6564, | |||
"storage": 6420, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe not directly related to this, but I went back to check, there are some places left over in #1543 that still use std::vector instead of gko::vector or gko::array.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've only checked (reference|omp)/distributed
for std::vector
and found only 4 instances. Did you find more? (I'm updating them btw)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, I checked the patch, that should be all. The rest happens inside tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I switched to gko::vector
and it didn't change the benchmark output (because the missing changes were only in omp/
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know, this only reports the resting storage size, not temporary memory. But it did remind me to check for missing updates :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Maybe also providing the maximal storage might be interesting to add in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, good point!
core/distributed/matrix.cpp
Outdated
for (size_type i = 0; i < host_recv_targets->get_size(); ++i) { | ||
recv_sizes_[host_recv_targets->get_const_data()[i]] = | ||
host_offsets->get_const_data()[i + 1] - | ||
host_offsets->get_const_data()[i]; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels pedantic, but I think having a raw loop next to a bunch of algorithm calls is a mismatch of abstraction levels - this might be better extracted into a separate function, similar to Sean Parent's "no raw loops" suggestion (https://www.youtube.com/watch?v=W2tWOdzgXHA)
Maybe we can call it compute_recv_sizes
? Then the high-level control flow is easy to follow, and if you need to know how those recv_sizes get determined, you look at the function call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO the issue here is that arrays can't be used with STL. If we had begin, and end for them, this would just be std::adjacent_difference
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use pointers here , but adjacent_difference
is not 100% sufficient, since it leaves the first element unchanged and doesn't scatter the results. I just meant extract these 5 lines into a function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, adjacent_difference
is weird, same as partial_sum
. I would still argue that the issue here is just that our arrays are not easy to access. If it would read
for (size_type i = 0; i < ecv_targets.size(); ++i) {
recv_sizes_[recv_targets[i]] = offsets[i + 1] - offsets[i];
}
I don't think you would have the same issue. Only because we have to clutter it with host_
prefixes, and use very verbose function names it becomes an issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My comment is mostly not about the complexity of the code (.get_const_data()[]
vs []
) but about mixing low-level code with high-level code. I do agree that it would be nice to have a cleaner access to arrays. If your extracted function takes pointers, you would arrive at that code though.
Maybe we could add a host_array
type that is derived from an array
and a custom constructor that always takes get_master()
, with additional operator[]
and begin/end()
functions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've changed this, but maybe not exactly how you imagined it. I'm using a lambda for this, since I don't want to put it outside the function, due to it's limited use.
{std::make_tuple(gko::dim<2>{2, 1}, I<git>{1}, I<git>{0}, I<vt>{6}), | ||
std::make_tuple(gko::dim<2>{2, 3}, I<git>{0, 0, 1}, I<git>{2, 1, 0}, | ||
{std::make_tuple(gko::dim<2>{2, 1}, I<git>{1}, I<git>{1}, I<vt>{6}), | ||
std::make_tuple(gko::dim<2>{2, 3}, I<git>{0, 0, 1}, I<git>{1, 3, 2}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These changes are a bit hard to review - can you add a comment to the function explaining what the tuple members represent?
auto imap = index_map<local_index_type, global_index_type>( | ||
exec, col_partition, comm.rank(), global_non_local_col_idxs); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm quite confident that it will become necessary to store this map in the matrix class in the future. I'm not doing that for now, because with the distributed PGM, there are now ways to create a non-empty matrix without this read function. I don't know how to create the mapping in the PGM case, so there would be non-empty matrices, but with empty mappings.
I didn't think this is suitable, so I don't store the mapping. But making it a member in the future would either require unnecessary copies or an interface break.
Any thoughts on this @upsj @pratikvn?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's a fundamental choice - do we want to consider the matrix storing the implicitly block-reordered matrix (every rank has a consecutive range of indices), or the original matrix? I'm fine with the latter, but it may complicate things as mappings global <--> local are more costly. But at the same time, I'm not sure that would ever be a performance bottleneck.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you elaborate on the interface break?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
interface break:
auto imap = mat->read_distributed(...);
will result in an error if read_distributed
returns again void. The return type could be changed to const index_map&
but this would result in copies.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without the original matrix, we can't do anything that involves more than the local block. So, the distributed PGM would not work, also RAS would not be possible. Also I think the mapping is necessary to provide users an option to use pre-existing data, since it allows mapping global indices, which the users know, to the ginkgo non-local indices, which the user doesn't know.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's not entirely true - you can operate on the block-reordered indexing just fine, imagine computing an index_map after remapping each rank's indices to be consecutive. But that is still doing 90% of the work towards having a full global <--> local index map, so it's probably to better to do it fully.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO, just from a properties perspective, I think it makes sense for the matrix class to have an index map. Any function that creates a distributed matrix, should then also be required to update its index_map. I understand that we dont do that for PGM, right now, but I think the interface choices should not be dependent on that.
So, I think it should be okay to store an index_map in the matrix class.
TBH, I am not sure why read_distributed
should return an index_map. I think we should just store it inside the class and have a public getter to access it where necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the reordering question is a bit separate from the index-map question. As you said, you could still have an index map after reordering. In general, I think reordering can become quite complicated, see the fun image in petsc's user guide: https://petsc.org/release/manual/mat/#block-matrices
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think for now I will just remove the return type. The index map is still helpful, since it simplifies the matrix kernels, while the information necessary for PGM is still available. Thus we could add the index map as a member at a later point if we want to.
3be79ab
to
b519e56
Compare
- comment tests - replace std::vectors in backend - refactoring Co-authored-by: Tobias Ribizel <[email protected]>
b519e56
to
0d3b710
Compare
Application of #1543. Now the
index_map
is used to create the numbering for the non-local columns. Theindex_map
is created during thedistributed::matrix::read_distributed
and also returned from that function. This can allow users to provide already separated local and non-local matrices for theread_distributed
further down the line.Also, it makes the mapping
non-local -> global
more easily available.This PR also introduces:
run
easier to userun
for distributed matrices -> removes need for changes to distributed basePR Stack: