Overview
The general strategy for implementing large-count operations is to use datatypes. In some cases, this is straightforward, but it appears to be a very poor solution in the case of v-collectives. In order to use the datatype solution for v-collectives, one has to map (counts[],type) to (newcounts[],newtypes[]), which then requires the w-collective, since only it takes a vector of types.
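For concreteness, here is a minimal sketch of that mapping (illustrative code, not from this ticket; a real implementation such as BigMPI is more careful about chunking counts above INT_MAX and about type extents):

```c
#include <limits.h>
#include <mpi.h>

/* Map (counts[], type) to (newcounts[], newtypes[]): every newcount becomes 1
 * and the per-rank size is carried by the datatype, which forces the use of
 * the w-collective, since only it accepts a vector of types. */
void map_counts_to_types(int nproc, const MPI_Count counts[], MPI_Datatype type,
                         int newcounts[], MPI_Datatype newtypes[])
{
    for (int i = 0; i < nproc; i++) {
        if (counts[i] <= INT_MAX) {
            /* small count: a contiguous type suffices */
            MPI_Type_contiguous((int)counts[i], type, &newtypes[i]);
        } else {
            /* large count: INT_MAX-sized chunks plus a remainder */
            MPI_Aint lb, extent, displs[2];
            MPI_Datatype chunks, rem, pair[2];
            int blocklens[2] = { (int)(counts[i] / INT_MAX), 1 };
            MPI_Type_get_extent(type, &lb, &extent);
            MPI_Type_contiguous(INT_MAX, type, &chunks);
            MPI_Type_contiguous((int)(counts[i] % INT_MAX), type, &rem);
            pair[0] = chunks;
            pair[1] = rem;
            displs[0] = 0;
            displs[1] = (counts[i] / INT_MAX) * INT_MAX * extent;
            MPI_Type_create_struct(2, blocklens, displs, pair, &newtypes[i]);
            MPI_Type_free(&chunks);
            MPI_Type_free(&rem);
        }
        MPI_Type_commit(&newtypes[i]);
        newcounts[i] = 1;
    }
}
```

Even after this mapping, however, the w-collective's int byte displacements remain a problem, which is the issue discussed below.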
In fact, we are in the large-count case even if all of the counts are less than INT_MAX, because of the limitations of the offset vector. If the sum of counts[i] up to any i < comm_size exceeds INT_MAX, then displs[i] will overflow. This means that one cannot use any of the v-collectives even for relatively small data sets, e.g. 3 billion floats, which is only 12 GB per process. This is likely to be limiting when implementing 3D FFTs, matrix transposes and I/O aggregation, all of which are likely to use v-collectives.
The displacement issue is exacerbated when falling back to the w-collective, because there the displacements are interpreted in bytes rather than in units of the datatype extent, so there is no way to index beyond 2 GB of data, irrespective of the datatype and the counts.
Below is an example of the displacement problem. Clearly, *in this specific case*, we could use MPI_SCATTER instead, but homogeneous counts were chosen only to make the example simple and readable. A number of trivial modifications would cause this example to require MPI_SCATTERV.
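A minimal sketch of such an example (illustrative; bignum, the homogeneous counts, and the use of MPI_CHAR are choices made here for readability): all of the counts fit comfortably in an int, yet the int displacement vector overflows.

```c
#include <limits.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    size_t bignum = (size_t)INT_MAX + 1;   /* just past 2 GB worth of chars */
    size_t chunk  = bignum / (size_t)size; /* per-process share */

    char *recvbuf = malloc(chunk);
    char *sendbuf = (rank == 0) ? malloc(chunk * (size_t)size) : NULL;

    int *counts = malloc(size * sizeof(int));
    int *displs = malloc(size * sizeof(int));
    for (int i = 0; i < size; i++) {
        counts[i] = (int)chunk;               /* well below INT_MAX */
        displs[i] = (int)((size_t)i * chunk); /* overflows once i*chunk > INT_MAX */
    }

    MPI_Scatterv(sendbuf, counts, displs, MPI_CHAR,
                 recvbuf, (int)chunk, MPI_CHAR, 0, MPI_COMM_WORLD);

    MPI_Finalize();
    return 0;
}
```

The root's memory footprint here is roughly (1+1./size)*bignum bytes, which is the allocation constraint mentioned in the note below.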
Note that the example above will fail even on a 32-bit system (provided the number of processes is large enough that (1+1./size)*bignum bytes can be allocated).
Using the w-collective for large-count v-collectives has these issues:
Calling the w-collectives requires the allocation and assignment of O(Nproc) vectors, which is tedious but certainly not a memory issue if one is in the large-count regime.
One cannot deallocate the argument vectors until the operation completes, which means that the nonblocking case cannot be implemented this way: there is no opportunity to deallocate the temporary vectors in the wait call, and any solution involving generalized requests is almost certainly untenable for most users.
Because MPI_ALLTOALLW takes displacements of type int and interprets them as byte offsets, irrespective of the extent of the datatype (see page 173 of MPI-3), it is hard to index more than 2 GB of data *using any datatype*. There is a workaround using datatypes that encode the offset internally (e.g. via MPI_Type_create_struct), sketched below, but it is far from user-friendly.
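A sketch of that workaround (the helper make_offset_type is hypothetical, not an MPI or BigMPI function):

```c
#include <mpi.h>

/* Fold a byte offset into the datatype itself via MPI_Type_create_struct, so
 * that the displacement actually passed to MPI_Alltoallw can simply be 0 even
 * when the data lives more than 2 GB into the buffer. */
static void make_offset_type(MPI_Aint byte_offset, int count,
                             MPI_Datatype basetype, MPI_Datatype *newtype)
{
    int blocklen = count;
    MPI_Aint displ = byte_offset;   /* MPI_Aint: may exceed INT_MAX, unlike displs[] */
    MPI_Datatype type = basetype;

    MPI_Type_create_struct(1, &blocklen, &displ, &type, newtype);
    MPI_Type_commit(newtype);
}
```

With such types, one would pass counts of 1 and displacements of 0 to MPI_ALLTOALLW, letting each datatype carry the true offset.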
In the absence of proper support in the MPI standard, the most reasonable implementation of large-count v-collectives uses point-to-point, which means that users must make relatively nontrivial changes to their code to support large counts, or else use something like BigMPI, which already implements these functions (vcollectives_x.c). An RMA-based implementation is also possible, but users are unlikely to accept this suggestion.
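For illustration, a point-to-point large-displacement scatterv might look roughly like the sketch below (this is not BigMPI's actual vcollectives_x.c code; counts are assumed to fit in an int here, so only the displacements are widened to MPI_Aint):

```c
#include <stdlib.h>
#include <mpi.h>

int my_scatterv_x(const void *sendbuf, const int counts[], const MPI_Aint displs[],
                  MPI_Datatype type, void *recvbuf, int recvcount,
                  int root, MPI_Comm comm)
{
    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);

    /* every rank, including the root, posts its receive */
    MPI_Request rreq;
    MPI_Irecv(recvbuf, recvcount, type, root, 0, comm, &rreq);

    if (rank == root) {
        MPI_Aint lb, extent;
        MPI_Type_get_extent(type, &lb, &extent);
        MPI_Request *sreqs = malloc(size * sizeof(MPI_Request));
        for (int i = 0; i < size; i++) {
            /* displs[] is in elements, as in MPI_Scatterv, but the byte
             * offset is computed in MPI_Aint and so does not overflow */
            const char *ptr = (const char *)sendbuf + displs[i] * extent;
            MPI_Isend(ptr, counts[i], type, i, 0, comm, &sreqs[i]);
        }
        MPI_Waitall(size, sreqs, MPI_STATUSES_IGNORE);
        free(sreqs);
    }

    MPI_Wait(&rreq, MPI_STATUS_IGNORE);
    return MPI_SUCCESS;
}
```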
One can also map the v-collectives to MPI_Neighbor_alltoallw, but only in a far-from-efficient manner, and this is not particularly useful for the nonblocking case because MPI_Dist_graph_create_adjacent is blocking.
This ticket proposes two possible solutions to the large-count v-collective problem.
Solution: New Function Prototypes
Adding _x versions of the v-collectives and w-collectives that take counts of type MPI_Count and displacement vectors of type MPI_Aint[] is the most direct solution, and it spares users from having to allocate and fill O(Nproc) vectors in the course of mapping to the most general collective available (e.g. MPI_NEIGHBOR_ALLTOALLW). The C bindings for the proposed new functions are given below; these changes have been made in https://github.com/mpiwg-large-count/mpi-standard/tree/large-count-vector-collectives.
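The exact bindings live in that branch; the following is only a sketch of their general shape, with argument names and ordering copied from the existing MPI_Scatterv and MPI_Alltoallv signatures:

```c
int MPI_Scatterv_x(const void *sendbuf, const MPI_Count sendcounts[],
                   const MPI_Aint displs[], MPI_Datatype sendtype,
                   void *recvbuf, MPI_Count recvcount, MPI_Datatype recvtype,
                   int root, MPI_Comm comm);

int MPI_Alltoallv_x(const void *sendbuf, const MPI_Count sendcounts[],
                    const MPI_Aint sdispls[], MPI_Datatype sendtype,
                    void *recvbuf, const MPI_Count recvcounts[],
                    const MPI_Aint rdispls[], MPI_Datatype recvtype,
                    MPI_Comm comm);
```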
Implementation
I do not think that the implementation of these functions within MPICH will be particularly difficult, but I have not yet started working on it. BigMPI implements many of them already.