Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UCP/WIREUP: Implement common copying of lanes function for futher re-use #5610

Closed
wants to merge 17 commits into from

Conversation

dmitrygx
Copy link
Member

What

Implement common copying of lanes function for futher re-use.

Why ?

To shorten #5582 and simplify maintaining the main EP's and TMP EP's lanes that may point to the same WIREUP EP.
Using different WIRUEP EPs that point to the same UCT EPs (with different is_owner flag - depends on the state of WIREUP) makes it possible.

How ?

  1. Accept is_owner flag in ucp_wireup_ep_set_next_ep() to mark WIREUP EPs that are owner/not owner for UCT EP that's under the hood.
  2. Repalce ucp_cm_client_restore_ep() by ucp_cm_copy_ep_lanes() that copies lanes? creates new WIREUP EPs for lanes and changes ownership of UCT EPs if needed.

@dmitrygx
Copy link
Member Author

@evgeny-leksikov @alinask @brminich could you review pls?

continue;
}

uct_ep = ucp_wireup_extract_lane(from_ep, lane_idx);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how can we know if from_ep lane was the owner of uct_ep?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't need to know whether it's owner or not
if we are changing ownership, to_ep will be owner, otherwise - from_ep is owner

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but if "from_ep" was NOT the owner, it means the "to_ep" should not be the owner as well, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's add assertion that from_ep is owner for the UCT EP - it is expected behavior

@dmitrygx
Copy link
Member Author

@brminich @evgeny-leksikov could you review pls?

w_ep->super.ucp_ep = ucp_ep;
}
if (change_ownership) {
to_is_owner = 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: can do shorter:

to_is_owner = change_ownership;
from_is_owner = !change_ownership;


Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

src/ucp/wireup/wireup_cm.c Show resolved Hide resolved
@@ -251,24 +251,56 @@ ucp_cm_client_connect_prog_arg_free(ucp_cm_client_connect_progress_arg_t *arg)
ucs_free(arg);
}

static void ucp_cm_client_restore_ep(ucp_wireup_ep_t *wireup_cm_ep,
ucp_ep_h ucp_ep)
static void ucp_cm_copy_ep_lanes(ucp_ep_h to_ep, ucp_ep_h from_ep,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls document all the parameters. How the EPs has to be created and initialized before the call? if to_ep is not initialized, then maybe make it [out] parameter of ucp_cm_ep_dup?

Copy link
Member Author

@dmitrygx dmitrygx Aug 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, added the description
to_ep has to be initialized, so, no need to duplicate it here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so in this case here should be 2 iterators and to_ep should be reconfigured since to_ep can have some lanes initialized.

Copy link
Member Author

@dmitrygx dmitrygx Aug 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we have assert that to_ep doesn't have UCT_EPs for the lanes

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which is removed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which is removed?

assert() doesn't work, since configurations are not equal:

  • TMP EP has two lanes: CM lane and transport lane
  • UCP EP has only one lane: CM lane

After we copy UCT EPs from TMP EP to UCP EP, it will do init_lanes() where new config will be selected
The new config could be equal to what we have in TMP EP, then UCT EPs copied to UCP EP will be re-used in the new config. In the current code, it has to be the same.


uct_ep = ucp_wireup_extract_lane(from_ep, lane_idx);
if (uct_ep == NULL) {
continue;
}

status = ucp_wireup_ep_create(to_ep, &to_ep->uct_eps[lane_idx]);
if (status != UCS_OK) {
/* coverity[leaked_storage] */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we failed to create wireup_ep, what happens to the lane we extracted?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added ucs_fatal() instead of continue;

@dmitrygx
Copy link
Member Author

@brminich @evgeny-leksikov @yosefe your comments were addressed, could you review pls?
the CI was restarted, since the PR has small merge conflict with #5618 that was merged today morning

@brminich
Copy link
Contributor

errors look relevant:

[     INFO ] server listening on fe80::268a:7ff:fe8d:af27:48488
[swx-rdmz-ucx-new-01:22258:0:22258]   wireup_ep.c:42   Assertion `!(wireup_ep->flags & UCP_WIREUP_EP_FLAG_LOCAL_CONNECTED)' failed

/scrap/azure/agent-06/AZP_WORKSPACE/2/s/contrib/../src/ucp/wireup/wireup_ep.c: [ ucp_wireup_ep_connect_to_ep() ]
      ...
       39     ucp_wireup_ep_t *wireup_ep = ucp_wireup_ep(uct_ep);
       40 
       41     ucs_assert(!(wireup_ep->flags & UCP_WIREUP_EP_FLAG_LOCAL_CONNECTED));
==>    42     wireup_ep->flags |= UCP_WIREUP_EP_FLAG_LOCAL_CONNECTED;
       43     return uct_ep_connect_to_ep(wireup_ep->super.uct_ep, dev_addr, ep_addr);
       44 }
       45 

@dmitrygx
Copy link
Member Author

errors look relevant:

@brminich fixed

@@ -439,7 +439,7 @@ ucs_status_t ucp_wireup_ep_connect(uct_ep_h uct_ep, unsigned ep_init_flags,
goto err;
}

ucp_proxy_ep_set_uct_ep(&wireup_ep->super, next_ep, 1);
ucp_wireup_ep_set_next_ep(uct_ep, next_ep, 1, 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why needed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ucp_wireup_ep_set_next_ep() does additional checks

Comment on lines 657 to 659
if (is_local_connected) {
wireup_ep->flags |= UCP_WIREUP_EP_FLAG_LOCAL_CONNECTED;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why need this? can just call ucp_proxy_ep_set_uct_ep instead like it is done now

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to use ucp_wireup_ep_set_next_ep() akways instead of:

if (local_conencted) {
    ucp_wireup_ep_set_next_ep();
} else {
    ucp_proxy_ep_set_uct_ep();
}

(from_wireup_ep->flags &
UCP_WIREUP_EP_FLAG_LOCAL_CONNECTED);

status = ucp_wireup_ep_create(to_ep, &to_ep->uct_eps[lane_idx]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we just take existing wireup ep, like it was done before?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, since we will need this to set who is the owner of the UCT EP- TMP EP or user's UCP EP

@dmitrygx
Copy link
Member Author

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@dmitrygx
Copy link
Member Author

dmitrygx commented Sep 1, 2020

@brminich @evgeny-leksikov @yosefe could you review pls?

to_is_owner, is_local_connected);

if (from_wireup_ep == NULL) {
/* from_ep must be the owner of the UCT EP in this case */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you please elaborate? why?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines 294 to 296
is_local_connected = (from_wireup_ep == NULL) ||
(from_wireup_ep->flags &
UCP_WIREUP_EP_FLAG_LOCAL_CONNECTED);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should not we inherit other wireup flags? if yes - maybe set them outside of ucp_wireup_ep_set_next_ep or pass instead of is_local_connected?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@dmitrygx
Copy link
Member Author

dmitrygx commented Sep 2, 2020

@brminich @evgeny-leksikov @yosefe could you review pls?


if (from_wireup_ep == NULL) {
/* wireup EPs couldn't exist for the from_ep lanes if this is
* called on the server side of the conenction to copy the lanes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does it mean? This function is not invoked on the server side

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, but it will be called

@dmitrygx dmitrygx closed this Sep 15, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants