Clarify alignment requirements #3
Comments
We should make sure to fix |
What is the status of this issue? The missing alignment guarantee is somewhat of an annoyance for us and we support the idea of @jeffhammond to request an alignment using info keys (in case this hasn't been discussed and down-voted already). |
Since I had some time to spare over the weekend I thought I'd give this a try. I'm not sure how to properly format such a proposal so I am simply marking my proposed changes in the text:

8.2 Memory Allocation
In some systems, message-passing and remote-memory-access (RMA) operations run faster [...] The info argument can be used to provide directives that control the desired location

11.2.3 Window That Allocates Shared Memory
The allocated memory is contiguous across process ranks unless the info key

A.1.5 Info Keys
Add the key

Note that there are no changes required to the chapter 11.2.2 (Window That Allocates Memory) since it is included through the references of Section 8.2. The implementation of the

The description of the default alignment matches the description taken from

With these changes the user will a) be able to rely on proper memory alignment (which is not the case currently), and b) will be able to control memory alignment in windows, which can be beneficial for vectorized computation on that memory. Doing the latter manually would require additional management of offset tables by the user. |
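To illustrate the manual bookkeeping mentioned at the end of this comment, here is a minimal sketch in plain C (no MPI calls) of rounding a buffer address up to a requested alignment and computing the offset the user would have to record. The names `align_up` and `alignment_offset` are hypothetical helpers, not part of MPI, and the sketch assumes the alignment is a power of two.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round addr up to the next multiple of alignment.
 * Assumption: alignment is a nonzero power of two. */
uintptr_t align_up(uintptr_t addr, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (addr + (alignment - 1)) & ~((uintptr_t)alignment - 1);
}

/* Number of bytes to skip from base to reach the first aligned address.
 * A user would have to store one such offset per buffer. */
size_t alignment_offset(uintptr_t base, size_t alignment)
{
    return (size_t)(align_up(base, alignment) - base);
}
```

With a buffer that happens to start at address 13 and a requested alignment of 8, the first usable address is 16 and the recorded offset is 3. An allocation call that honors an alignment request would make this table unnecessary.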
@devreal Do you have access to https://github.com/mpi-forum/mpi-standard? If so, then you can fork that and create a pull request for your proposed changes. If you do not have access or have other reasons for not wanting to create such a pull request, then somebody else should be able to do it for you (at a cost of latency). |
@jeffhammond Thanks for the quick reply. I have access to the MPI-Forum github repo and started writing into a fork. Should I create an issue at https://github.com/mpi-forum/mpi-issues/issues/ first that can be linked in the PR (and changelog) or is a PR sufficient? So far this topic only seems to have come up in the RMA WG. |
@devreal I think this is a solid idea but we may need to think about the naming of the info key as this should be an assertion not a hint. |
See comm info keys in the draft standard. Ex: |
@hjelmn Interesting, I was not aware of these new keys. I'm not sure I fully understand the difference between assertions (as used for communicators) and hints (as used in the RMA chapter) though. I would assume that a key like The assertions for communicators all sound similar to |
In this case I see these as assertions to the implementation. If it can not fulfill the request it should just return an error (or abort depending on the error handler). I think cb_block_size to me is more of a hint that can be used to tweak the implementation. I could be wrong though. The idea of info assertions is fairly new. We will need to iterate on this to figure out what the correct naming for the info key is. We have some time as this missed the two-week deadline for the Dec meeting so will be discussed in March. |
Ahh I see, so the meaning of assertion is different than what I thought. Thanks for the clarification. There is no need to rush then and some discussion during one of the next phone calls is probably a good idea. I won't file the PR before we have settled on a good name. |
Actually, I don't think this is an assert. @devreal's original interpretation was what my understanding of assert hints is too. So |
@pavanbalaji In this case the user wants to enforce a specific alignment. If it can't be done it should produce an error. Shouldn't we have a special naming for these since it really is not a hint? |
I want both a hint and an assert version of this, by the way. |
@jeffhammond That's certainly possible. Maybe |
I want to follow the existing naming convention:
|
@jeffhammond Sounds reasonable to me. |
@hjelmn and @jeffhammond I don't think asserts are defined that way. Let me clarify a few things:
With that explanation, I don't think the two info keys are any different. |
As @pavanbalaji has pointed out, it is not possible to have the two different info keys with the current state of affairs so we are left with the best-effort user request. It is left to the user to check the provided alignment if more than natural alignment is required and error out or even retry with a smaller alignment if feasible. However, I am not sure why implementations would impose an upper limit on the supported alignment (which would be the only use-case I see for having two keys). What is the way forward from here? Can we find a consensus on a name? @hjelmn proposed |
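The user-side check described in this comment can be sketched in plain C. This is only an illustration under stated assumptions: `provided_alignment` and `meets_alignment` are hypothetical helper names, alignments are powers of two, and the address would in practice be the base pointer returned by the allocation call.

```c
#include <stddef.h>
#include <stdint.h>

/* Largest power-of-two alignment that addr satisfies.
 * Returns 0 for a null address. */
size_t provided_alignment(uintptr_t addr)
{
    return (size_t)(addr & (~addr + 1)); /* isolate the lowest set bit */
}

/* Returns 1 if addr is a multiple of requested.
 * Assumption: requested is a nonzero power of two. */
int meets_alignment(uintptr_t addr, size_t requested)
{
    return (addr & ((uintptr_t)requested - 1)) == 0;
}
```

A caller that asked for 64-byte alignment and received an address such as 48 could detect that only 16 bytes are provided, then either error out or retry with a smaller request.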
@jdinan The window buffer alignment requirements will be in MPI 4. If I understand your initial post correctly, you are also looking at whether the target address of RMA operations should be naturally aligned. AFAICS, there is no language in the standard mandating alignment of either the origin or the target address of RMA operations. There is some language in an AtoU stating that "the alignment of the communication buffers may also impact performance" (page 419 of MPI 3.1) but that of course is not normative. Should a mandate for natural alignment of origin and target addresses be added as part of the cleanup 4.1 release? It seems like a non-backwards-compatible change though. I guess implementations can take a slow path if either the origin address or the target address is misaligned (allocating a temporary aligned origin buffer or avoiding RDMA operations for misaligned target addresses). |
AMOs in particular may not work if they aren't naturally aligned. I think the risk of breaking existing applications with this clarification is relatively low. |
Could this be treated as an erratum? (I'm not sure about what can be treated as an erratum and what not) Otherwise this will have to be moved to MPI 4.1... |
I would consider this to be an erratum to MPI 3.0, but I'm not sure whether others will feel the same. |
I can bring this up at the phone call on Wednesday. Is there a consensus within the WG to add language that requires a) origin buffers and/or b) target memory addresses to be naturally aligned? As mentioned above, implementations could fall back to a slow path to deal with misaligned origin/target addresses. So this change would mainly remove some complexity from implementations. |
I don't think the RMA WG has convened in a while. Perhaps @jeffhammond @pavanbalaji @wgropp @rsth can share their thoughts on requiring origin/target buffers to be naturally aligned for the datatype. This should be a requirement for accumulate operations, and I would consider that part to be an erratum. If buffers in accumulate operations are allowed to cross cache lines, then many implementations will not be able to use hardware (including processor) atomics. For put/get it seems more like a performance advice to users (which we sort of already have in window allocation). |
I'm fine with requiring aligned data, as long as the behavior with unaligned data is undefined, not erroneous. That allows implementations to permit operations on unaligned data if they can, but standard-conforming, portable programs need to ensure alignment. |
I think the alignment requirements being proposed are a bit overspecified. Most (all?) networks only require the part of the data that is atomic to be aligned. For example, only the target buffer in an accumulate operation. For I do see the concern for |
@pavanbalaji Some processors require alignment for all operands. Of course, the implementation can check for this and copy operands to aligned temporary buffers. But, this feels to me like we would be taking on overhead to support an uncommon usage model. As Bill suggested, we can make unaligned usage undefined, which puts the burden of portability on the application rather than the MPI implementation. |
True, for processor atomics (I was thinking of network atomics in my previous message). I want to point out that this would be a backward incompatible change. That does not mean that we cannot do it. It just means that any decision about this should consider that fact. We could argue that "we always meant that; we just didn't clearly specify it", but that argument is somewhat thin in this context. If we want to make this change, why is this only for RMA operations? This would certainly be true for reduction collectives too, which have similar semantics (e.g., two processes on shared memory could reduce into the same buffer). But even more broadly, some MPI implementations (e.g., MPICH) "assume" natural alignment while packing/unpacking noncontiguous datatypes in some cases. For instance, in some cases, we use assignment operations instead of memcpy for performance reasons. This assumption would break on platforms that require strict alignment, such as SPARC, which raises a SIGBUS error when the data is not naturally aligned; but would be fine on platforms such as x86, where the architecture is more forgiving. So, perhaps this is required for many other MPI operations? |
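The difference between assignment and memcpy mentioned here can be sketched as follows. This is not MPICH code, just a minimal illustration with hypothetical helper names: copying through `memcpy` is well defined for any address, whereas dereferencing a misaligned `double *` is undefined behavior in C and may trap on strict-alignment platforms.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Read a double from an address with no alignment guarantee. */
double load_double_any_alignment(const void *src)
{
    double value;
    memcpy(&value, src, sizeof value);
    return value;
}

/* Write a double to an address with no alignment guarantee. */
void store_double_any_alignment(void *dst, double value)
{
    memcpy(dst, &value, sizeof value);
}

/* Store and reload a double at a byte offset inside a local buffer;
 * offsets that are not multiples of the double alignment are misaligned. */
double roundtrip_at_offset(size_t offset, double value)
{
    unsigned char buffer[2 * sizeof(double)];
    assert(offset <= sizeof(double));
    store_double_any_alignment(buffer + offset, value);
    return load_double_any_alignment(buffer + offset);
}
```

The assignment-based fast path is only safe when the library may assume aligned user buffers, which is the assumption under discussion.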
I think it is reasonable to allow MPI implementations to expect memory specified by the user (either directly through pointers or indirectly through RMA target offsets) to be naturally aligned for the provided datatype argument. At least in C (and I'm sure in Fortran as well), directly loading or storing an object from a misaligned address is undefined behavior so by extension the user should never pass a misaligned address to MPI. We would simply pass on the rules the currently supported base languages impose. Given that this would affect a large part of the standard it might be hard to convince the Forum to accept it at the last minute though. Maybe this will have to be punted to 4.1... |
@devreal Good points on the base language requirements. This is an important issue and I would propose it for 4.0. |
A quick summary from the discussion today:
I have some preliminary text that I cobbled together today, which was the basis for the discussion:
One argument was that there is no corresponding datatype for derived datatypes in MPI. The definition thus should be recursive starting from the predefined datatypes. I will try to come up with something along those lines. |
You could say something like "... then the address corresponding to each basic datatype element in the provided MPI datatype must be naturally aligned for that basic datatype. For example, on an architecture with byte-addressable memory, a naturally aligned address for an object with basic datatype D must be an integer multiple of the number of bytes in D." |
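As a rough sketch of the wording suggested above, the per-element rule can be checked for a simple strided layout. The function name and the layout model (a fixed byte stride between elements of one basic type) are assumptions made for illustration; real derived datatypes can be more general.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if each of count elements of elem_size bytes, placed
 * stride bytes apart starting at base, sits at an address that is
 * an integer multiple of elem_size; returns 0 otherwise. */
int elements_naturally_aligned(uintptr_t base, size_t count,
                               size_t stride, size_t elem_size)
{
    if (elem_size == 0)
        return 0;
    for (size_t i = 0; i < count; i++) {
        if ((base + i * stride) % elem_size != 0)
            return 0;
    }
    return 1;
}
```

For 8-byte elements starting at address 16, a stride of 8 keeps every element aligned, while a stride of 12 puts the second element at address 28, which is not a multiple of 8.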
I don't understand item 5. The x86 is one example of a "weird" architecture that allows unaligned load/store. It is still undefined behavior in the C language specification, even if it works on that architecture. We are proposing to do the same in MPI, make it undefined in MPI but allow implementations to make it work. |
Here is another shot at it:
I'm a bit worried that the first sentence is too complex but right now I cannot think of a way to break it up without losing essential information. |
I don't think this is correct. On some platforms, |
You're right, forgot about that (32bit x86 is one such case IIRC). Maybe we shouldn't use the type size after all but require proper alignment for load/store? Or simply to the requirements of the base language? |
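The gap between type size and required alignment can be queried directly in C11. The sketch below uses `double` as the example type; the helper names are hypothetical, and the values returned depend on the platform ABI.

```c
#include <stddef.h>
#include <stdint.h>

/* Size of double in bytes on this platform. */
size_t double_size(void)
{
    return sizeof(double);
}

/* Alignment the C implementation requires for double; this may be
 * smaller than sizeof(double) on some ABIs. */
size_t double_alignment(void)
{
    return _Alignof(double);
}

/* Returns 1 if addr satisfies the language-level alignment of double. */
int aligned_for_double(uintptr_t addr)
{
    return addr % _Alignof(double) == 0;
}
```

A rule phrased in terms of `_Alignof` follows the base language, whereas a rule phrased in terms of `sizeof` can be stricter than what the language requires.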
Nobody will go along with it, but I'd be fine with a minimum RMA alignment of 128 bytes, which is the PowerPC cache line size, because all reasonable use cases are fine with that. |
I think we are rediscovering why people say "naturally aligned" without trying to define it. |
@jeffhammond I'm afraid you're right on that :D it would definitely break any backwards compatibility and be a waste on the ubiquitous embedded systems... @jdinan I believe "type size rounded up to the next power-of-two" should be OK. |
@devreal I don't think so. Following my same example as earlier, some platforms require only an 8-byte alignment for |
That's fine with me. We would need to limit to the officially supported language bindings though, right? Otherwise a hypothetical language with 4B alignment for any type would leave implementations written in C stranded... |
It would be the responsibility of each language binding to do the right thing for that language. For example, if Fortran had different alignment requirements than C, then it would need to add additional code to either make sure the buffer is "C aligned" or copy data to match the alignment as needed before calling the C versions of those functions. The users would not have to worry about any of this. They would simply use the alignment requirements of the language they are using. |
@devreal Requiring greater alignment does not break backwards compatibility of MPI libraries, because no code can be broken by that. It's true that code written to assume greater alignment won't work with older MPI libraries, but neither will my MPI-3 RMA code work with LAM/MPI. |
RMA needs to support the natural alignment of |
@jeffhammond Are you suggesting that every MPI buffer be aligned on a 128B boundary? How will you send/recv/rma elements in an int (4B) or double (8B) array? |
I am suggesting that every buffer returned by MPI_Win_allocate(_shared) be aligned to 128B. I have made no statement on the alignment requirements for any communication operations. I am certainly not saying that MPI_Put requires a 128B-aligned input. Obviously, more than 32B is overkill for correctness but cache-line alignment has a very positive impact on performance. If I allocate two 8B windows and MPI allocates them consecutively in memory, and I then bang on them with MPI_Fetch_and_op, the performance will be crap in almost all multiprocessing environments based on cache-coherent processors (I get that these aren't your thing anymore :-P). |
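The false-sharing concern can be sketched without MPI. The example assumes a 128-byte cache line, the figure used in this thread; `ASSUMED_CACHE_LINE`, `padded_counter`, and the helper functions are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: 128-byte cache lines, as quoted in the discussion. */
#define ASSUMED_CACHE_LINE 128

/* An 8-byte counter forced onto its own cache line. The alignment
 * specifier also pads the struct size up to a full line. */
typedef struct {
    _Alignas(ASSUMED_CACHE_LINE) int64_t value;
} padded_counter;

/* Size of one padded counter in bytes. */
size_t padded_counter_size(void)
{
    return sizeof(padded_counter);
}

/* Returns 1 if two addresses fall into the same assumed cache line. */
int counters_share_line(uintptr_t a, uintptr_t b)
{
    return a / ASSUMED_CACHE_LINE == b / ASSUMED_CACHE_LINE;
}
```

Two 8-byte objects allocated back to back share a line, so atomic updates to one invalidate the other. Objects placed 128 bytes apart do not.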
I think we had been talking here about the alignment requirements of buffers passed to MPI routines. It does look like we also failed to specify alignment requirements for MPI_Win_allocate(_shared). I think the right thing to do here is to copy malloc -- "The allocated memory is aligned such that it can be used for any +predefined MPI+ data type." -- and allow (but not require) implementations to further pad alignments for the reasons you mentioned. |
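The malloc guarantee referred to here can be observed with `max_align_t` from C11. This sketch only demonstrates the C library behavior that the proposed wording mirrors; it says nothing about what a particular MPI implementation returns, and `malloc_is_max_aligned` is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Allocates nbytes with malloc and reports whether the result is
 * aligned for max_align_t, the strictest fundamental alignment.
 * Returns 1 if aligned, 0 if not, -1 if the allocation failed.
 * Assumption: nbytes is at least sizeof(max_align_t), since the
 * guarantee for smaller requests differs between C revisions. */
int malloc_is_max_aligned(size_t nbytes)
{
    void *p = malloc(nbytes);
    if (p == NULL)
        return -1;
    int ok = ((uintptr_t)p % _Alignof(max_align_t)) == 0;
    free(p);
    return ok;
}
```

An implementation remains free to return stricter alignment, for example a full cache line, on top of this minimum.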
That works for me. I confess to not reading the entire thread in spite of losing my eidetic memory for MPI Forum discussions. |
This has been added to MPI 4 as part of mpi-forum/mpi-issues#121. The language is "at least the alignment required for load/store accesses of any datatype corresponding to a predefined MPI datatype." That is specified for |
Looking at the definitions in the C/C++ standard drafts, the alignment requirements are indeed implementation-defined. So yes, natural alignment is off the table. Here is a definition in terms of the language implementation used to call the MPI procedure:
This is now explicitly limited to objects in local memory because the situation for RMA seems a bit trickier. Consider a heterogeneous system where the target has different alignment requirement from that of the origin (e.g., different architecture, different language bindings). At the target, the application may store objects in window memory with less strict alignment. It would be impossible for the origin to provide a target offset for which the operation is guaranteed to be well-defined according to the rules above. The RMA chapter needs an additional sentence such as the following:
Here is another caveat: it might be that third-party bindings for languages not officially supported by the MPI implementation that have less strict alignment requirements may not be able to use MPI RMA because they cannot meet these requirements (e.g., the binding for language X can use temporary buffers to pass data aligned for use with the C language implementation to |
From the discussion at the WG meeting 03/01/2022:
|
I was sure I had seen language requiring a mapping between specified datatypes and objects in the buffer but couldn't find the place in the standard during the discussion at the meeting today. Here it is: Section 3.3.1 says
and then
That mapping is then specified in tables 3.1 and 3.2. Section 5.1.11 then extends this to general datatypes. Long story short: the P2P chapter specifies that the specified type has to match the types of the objects in the input and output buffers. We will need to extend that to RMA and require that the type of each variable in the origin and target window buffer of an RMA operation has to match the type specified for that operation. It seems natural that the rules governing datatype usage in P2P and collectives also apply to RMA; it just hasn't been specified (as far as I can see). |
Alignment requirements on window buffers and RMA operations are not currently clear.
My current understanding is that window buffers have no alignment requirement, but that the effective target address (base + offset*disp_unit) for RMA operations must be naturally aligned for the datatype used in the operation.
I can't seem to locate text for this semantic. This issue is a placeholder to track down the semantic and add a clarification somewhere in the RMA chapter.
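The understanding stated above, that base + offset*disp_unit should be naturally aligned for the datatype, can be written as a small check. This is a sketch in plain C with hypothetical names; it uses `size_t` and `uintptr_t` rather than the MPI types, and it takes "naturally aligned" to mean a multiple of the type size, which later comments in the thread call into question.

```c
#include <stddef.h>
#include <stdint.h>

/* Effective target address of an RMA operation:
 * window base plus displacement scaled by the displacement unit. */
uintptr_t effective_target_address(uintptr_t base, size_t target_disp,
                                   size_t disp_unit)
{
    return base + (uintptr_t)target_disp * disp_unit;
}

/* Returns 1 if the effective target address is a multiple of type_size. */
int target_is_naturally_aligned(uintptr_t base, size_t target_disp,
                                size_t disp_unit, size_t type_size)
{
    return type_size != 0 &&
           effective_target_address(base, target_disp, disp_unit) % type_size == 0;
}
```

With a window base of 4096 and a displacement of 3, a displacement unit of 8 yields address 4120, which is aligned for an 8-byte type. A displacement unit of 4 yields 4108, which is aligned for a 4-byte type but not for an 8-byte one.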