Fix core dump when performing misaligned atomics. #8574

awlauria · 2021-03-09T15:46:04Z

c_reqops MTT test has a weird alignment when creating the window,
which is perfectly valid. This will cause a segv
when performing atomic operations. While this was found on
Power, it probably is not limited to it.

Detect a misalignment, and fallback to a lock + copy method.

Signed-off-by: Austen Lauria [email protected]

awlauria · 2021-03-09T15:49:08Z

@hjelmn if you have ideas on how to do this better let me know.

hjelmn · 2021-03-09T15:56:18Z

opal/mca/btl/base/btl_base_am_rdma.c

-    default:
-        return OPAL_ERR_BAD_PARAM;
+    if(OPAL_LIKELY(0 == ((((intptr_t) addr) & 0x7)))) { 
+        switch (op) {


Not sure I am comfortable with this change. It is my understanding that MPI_Accumulate, etc the MPI implementation is allowed to require (or is required to) only operated on naturally aligned types. @bosilca Is this correct or is the user allowed to assume they can do atomics on byte-aligned data?

The problem with this change is that you essentially would need to disable the memory atomics interoperability because this lock is only in the BTL. That would break a lot of assumptions.

For what it's worth this test passes with MPICH and Spectrum MPI (IBM mpi).

It's my understanding that the window alignment of 1, like the test is doing is valid.

The accumulate call itself is not doing anything funny with offsets.

My understanding is that the MPI standard currently does not mandate any alignment requirements for user buffers: mpiwg-rma/rma-issues#3

Well, a window alignment of 1 is fine. The problem is working on a datatype with natural alignment 8 on an off-alignment. That is not safe on many architectures.

@devreal Maybe look at the datatype chapter. There is some language about alignment in that chapter. Not sure if any of it is relevant. Thats why I would like @bosilca to comment.

@hjelmn I agree, but the optics are bad that OMPI segv's where other implementations don't.

Even if we don't want to support this, maybe we can add a check at a higher level to not segv? Not sure off the top of my head where that can be prevented.

Yeah. This is really messy. What I am worried about is silent wrong answers. Crashing is better there. Let me think about the whole stack that leads to this. Maybe this will be ok.

Few things:

If we accept to use a lock approach if the data is misaligned, then on the same window we cannot use both the atomic and the lock based version, because we will lose the atomicity on the data. This seems like a big issue to me.

The standard stresses on the fact that all atomic operations apply on predefined datatype, which as @hjelmn suggested have specific alignment requirements. Unfortunately, I can't find anything in the standard that explicitly talks about this.

"Also, on some systems, performance is improved when window boundaries are aligned at “natural” boundaries (word, double-word, cache line, page frame, etc.)", this talks about performance impact not correctness.

I don't think we should provide more atomicity than available on the system. Thus, if the windows alignment is 1, the safest approach is to force the lock approach, while for matching alignments we can use the atomic implementation, and error out if the user-provided pointer is misaligned.

I don't think we should provide more atomicity than available on the system. Thus, if the windows alignment is 1, the safest approach is to force the lock approach, while for matching alignments we can use the atomic implementation, and error out if the user-provided pointer is misaligned.

That sounds good to me. If the user creates the window with matching alignment they would have to try extra hard to misalign the pointer, so letting it segfault seems appropriate.

awlauria · 2021-03-10T02:09:26Z

@bosilca @hjelmn I added a patch to detect the bad alignment and fall back to locks.

awlauria · 2021-03-10T02:18:18Z

bot:aws:retest

devreal

A few nitpicks, looks good to me otherwise.

devreal · 2021-03-10T14:01:22Z

ompi/mca/osc/rdma/osc_rdma_accumulate.c

+    if(OPAL_UNLIKELY((module -> acc_use_lock_fallback == true) || (module -> disp_unit % extent))) {
+        flags |= MCA_BTL_ATOMIC_FLAG_USE_LOCK_FALLBACK;
+        module -> acc_use_lock_fallback = true;
+    }


In the interest of readability:

Suggested change

if(OPAL_UNLIKELY((module -> acc_use_lock_fallback == true) || (module -> disp_unit % extent))) {

flags |= MCA_BTL_ATOMIC_FLAG_USE_LOCK_FALLBACK;

module -> acc_use_lock_fallback = true;

}

if(OPAL_UNLIKELY((module->acc_use_lock_fallback == true) || (module->disp_unit % extent))) {

flags |= MCA_BTL_ATOMIC_FLAG_USE_LOCK_FALLBACK;

module->acc_use_lock_fallback = true;

}

Yeah I let my own preference sneak in. Thanks. Fixed.

devreal · 2021-03-10T14:01:53Z

ompi/mca/osc/rdma/osc_rdma_accumulate.c

@@ -221,7 +240,7 @@ static int ompi_osc_rdma_acc_single_atomic (ompi_osc_rdma_sync_t *sync, const vo
    ompi_osc_rdma_module_t *module = sync->module;
    mca_btl_base_module_t *selected_btl = ompi_osc_rdma_selected_btl (module, peer->data_btl_index);
    int32_t atomic_flags = selected_btl->btl_atomic_flags;
-    int btl_op, flags;
+    int btl_op, flags = 0;


This change seems unnecessary

Right, fixed.

devreal · 2021-03-10T14:06:41Z

opal/mca/btl/base/btl_base_am_rdma.c

-        break;
-    default:
-        return OPAL_ERR_BAD_PARAM;
+        case MCA_BTL_ATOMIC_ADD:


I randomly picked two places where switch statements are used in opal and both do not indent the case labels. These white-space changes seem unnecessary.

My own preference sneaking in again. Fixed

good that we now have the clang-format to keep all this in a consistent state ;)

awlauria · 2021-03-10T17:13:07Z

@devreal thanks for the review - changes addressed. I also squashed down and added the copyright stuffs.

devreal · 2021-03-10T17:16:09Z

opal/mca/btl/sm/btl_sm_module.c

@@ -178,6 +179,7 @@ static int sm_btl_first_time_init(mca_btl_sm_t *sm_btl, int n)
            return rc;
        }
    }
+    OBJ_CONSTRUCT(&sm_btl->super.btl_atomic_fallback_lock, opal_mutex_t);


Is that a double-init with https://github.com/open-mpi/ompi/pull/8574/files#diff-600bdda1e90d484b3451565f25a31b08d7cae44139e005c485d4441f03749525R589 ?

To clarify: I tried to understand the mutex construction/destruction and couldn't find a matching destructor call for this construction here

This is a good catch, it might be. Will check

I moved the lock to base, I think it makes more sense to have it there, I think the btls should all use the same lock to preserve atomicity. That look good? If so I can squish it down.

@bosilca ^^

Shouldn't the lock be per window instead of global ?

The answer is not simple, it depends on the memory model. For the two you just mentionned, SM and TCP, it would work because they both do atomic operations in the target memory. Take SM and IB and you would have a totally different story but a less clear outcome.

In any case, the global lock is providing a stronger consistency model than required by MPI, with certain performance implications as well. A lock per window (not per BTL) will at least alleviate the cost of fighting for locks in a highly contentious case.

I agree that it should be per-window, not per-btl.

Hacking at the code though, I'm not sure of the best way to make this work. The btl has no knowledge of windows, and that's where the actual accumulate operation happens on the target - and where the lock needs to happen. I suspect we would need a largish refactor to make this work, but perhaps not and I am missing something?

I'm happy to take a crack at implementing this, but at this point I'm not sure how to proceed. It doesn't look to me like the current window has its own btl module pointer, it shares it from a global btl pointer with other windows. I guess the easiest thing to do would be to give each window it's own copy and stuff the lock in there, but I don't know if there's other ramifications to that.

That said, this is such a small edge case, I wonder if it's worth the effort. Users who mis-align their windows shouldn't expect great performance, anyway.

@hjelmn and @devreal do you have any thoughts on this?

This locking issue seems to be a bit thorny.

I'm OK with a global lock. People using misaligned atomics should have no expectation of performance whatsoever and ripping the module structure apart for it isn't worth it. We should push the alignment discussion within the MPI forum to eventually disallow it.

Yeah, I am thinking along the same lines.

One way we can do this on a per-window basis would be to give the btl a callback function into the rdma layer with window info to look it up on the target side.

If the guts is already present to do something like (I don't think it is, but not 100% sure) that wouldn't be too bad. But if we need to rip things apart to get that working, it probably isn't worth it.

bosilca · 2021-03-10T17:19:39Z

opal/mca/btl/base/btl_base_am_rdma.c

+    if(lock) {
+        OPAL_THREAD_LOCK(lock);
+    }
+


Can you please move the result = *addr; outside the switch case.

bosilca · 2021-03-10T17:20:30Z

opal/mca/btl/base/btl_base_am_rdma.c

+        }
+        break;
+    default: {
+        if(lock) {


This is the internal version of the op, I am almost certain we can't hit this code with an incorrect opcode (all the checks should have been done at the upper level).

Understood, I just added it for completeness. I can remove it if you want

bosilca · 2021-03-10T17:20:45Z

opal/mca/btl/base/btl_base_am_rdma.c

+    }
+
+    switch (op) {
+    case MCA_BTL_ATOMIC_ADD:


same comments as above.

bosilca · 2021-03-10T17:22:36Z

ompi/mca/osc/rdma/osc_rdma_accumulate.c

+
+    }
+
+    if(OPAL_UNLIKELY((module->acc_use_lock_fallback == true) || (module->disp_unit % extent))) {


We try to put the constant first (as you did it in opal/mca/btl/base/btl_base_am_rdma.c)

awlauria · 2021-03-11T20:26:49Z

@bosilca @devreal I think I addressed all change requests, let me know if I missed anything. If it looks good I can rebase.

I'm still open to moving/reworking the lock but I am unsure how to proceed on that.

awlauria · 2021-03-11T22:32:39Z

bot:aws:retest

c_reqops MTT test has a weird alignment when creating the window, which is perfectly valid. This will cause a segv when performing atomic operations. While this was found on Power, it probably is not limited to it. Detect a misalignment, and fallback to a lock + copy method. If it's detected that the window alignment doesn't match the type, abandon the atomics and use a lock implementation for all atomic operations in that window. Signed-off-by: Austen Lauria <[email protected]>

hjelmn · 2021-03-12T17:16:19Z

Thinking about this. Couldn't we just force the read-modify-write path if the address is not 64-bit or 32-bit aligned? Just a top level check in osc/rdma when deciding whether to use network atomics, CAS, or read-modify-write.

awlauria · 2021-03-12T19:28:00Z

@hjelmn I think that's a good point - and probably the better way to go. I checked the v4.1 branch, and the c_reqops test where I found this passes because it does take the rmw path. I opened #8599 to replace this.

awlauria · 2021-03-15T02:18:09Z

Closing as #8599 was merged.

awlauria requested a review from hjelmn March 9, 2021 15:49

hjelmn reviewed Mar 9, 2021

View reviewed changes

awlauria force-pushed the misaligned_atomics branch 3 times, most recently from 26995cf to 9f6e870 Compare March 10, 2021 02:08

awlauria force-pushed the misaligned_atomics branch from 9f6e870 to 89ee72d Compare March 10, 2021 13:53

devreal requested changes Mar 10, 2021

View reviewed changes

awlauria force-pushed the misaligned_atomics branch 2 times, most recently from c7abf64 to a7e6426 Compare March 10, 2021 17:10

devreal reviewed Mar 10, 2021

View reviewed changes

bosilca requested changes Mar 10, 2021

View reviewed changes

awlauria force-pushed the misaligned_atomics branch 3 times, most recently from 0110f52 to 25949f4 Compare March 11, 2021 20:13

bosilca approved these changes Mar 11, 2021

View reviewed changes

awlauria force-pushed the misaligned_atomics branch from 25949f4 to 596b8f6 Compare March 11, 2021 23:57

awlauria force-pushed the misaligned_atomics branch from 596b8f6 to 587f4a9 Compare March 11, 2021 23:59

awlauria mentioned this pull request Mar 12, 2021

Fix coredump for unaligned atomics. #8599

Merged

awlauria closed this Mar 15, 2021

awlauria deleted the misaligned_atomics branch March 16, 2021 19:57


		}

		if(OPAL_UNLIKELY((module->acc_use_lock_fallback == true) \|\| (module->disp_unit % extent))) {

Fix core dump when performing misaligned atomics. #8574

Fix core dump when performing misaligned atomics. #8574

Conversation

awlauria commented Mar 9, 2021

awlauria commented Mar 9, 2021

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

awlauria commented Mar 10, 2021

awlauria commented Mar 10, 2021

devreal left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

awlauria commented Mar 10, 2021

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

awlauria Mar 11, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

awlauria Mar 11, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

awlauria commented Mar 11, 2021

awlauria commented Mar 11, 2021

hjelmn commented Mar 12, 2021

awlauria commented Mar 12, 2021

awlauria commented Mar 15, 2021

awlauria Mar 11, 2021 •

edited

Loading

awlauria Mar 11, 2021 •

edited

Loading