core: Support multiple auth keys per EP #9319

iziemba · 2023-09-12T03:48:14Z

EPs bound to an AV configure with FI_AUTH_KEY can support 1+ auth keys per EPs.

All eligible auth keys must be pre-inserted into the AV via fi_av_insert_auth_key(). Acceptable flags are the following:

FI_TRANSMIT: Restrict the auth key to outbound data transfers. This includes send message, RMA, and atomic operations.
FI_RECV: Restrict the auth key to inbound data transfers. This includes received messages and target MRs of RMA and atomic operations.

The FI_AUTH_KEY_MATCH_ALL fi_av_attr flag can be used to pre-configure the AV for all possible auth keys with a FI_TRANSMIT | FI_RECV configuration.

Once the AV is bound to an EP and the EP is successfully enabled, the EP will be configured to support all auth keys in the AV at that point in time. Later fi_av_insert_auth_key() will not propagate to already enabled EPs.

For AVs configured with FI_AUTH_KEY,
fi_av_insert_auth_key_{addr, svc, sym} must be used to generate fi_addr_t's for a specific <EP addr, auth key> tuple.
fi_av_insert_{addr, svc, sym} will not be supported. The resulting fi_addr_t's can be used for outgoing data transfers. If the endpoint is configured with FI_DIRECTED_RECV, the resulting fi_addr_t's can be used to restrict a receive buffer to a specific <EP addr, auth key> tuple. FI_ADDR_UNSEPC can be used to match any <EP addr, auth key> combination. An insert can be done with FI_AUTH_KEY_MATCH_ALL to generate an fi_addr_t to match on all EP addr and a specific auth key.

All inserted auth_keys must have buffer size equal to fi_info::domain_attr::auth_key_size reported by the provider. Failure to ensure this may lead to memory corruption.

For FI_EADDRNOTAVAIL CQ errors, the FI_SOURCE_ERR flag in fi_cq_err_entry::flags can be used to distinguish if fi_cq_err_entry::src_err_data or fi_cq_err_entry::err_data is valid. If set, fi_cq_err_entry::src_err_data is valid.

For EPs configured with FI_SOURCE_ERR and bound to an AV with FI_AUTH_KEY, all FI_EADDRNOTAVAIL CQ errors need to be generated with the FI_SOURCE_ERR flag set and fi_cq_err_entry::src_err_data filled in accordingly. fi_cq_err_source_err_data::addr and
fi_cq_err_source_err_data::auth_key will be set to valid pointers which can be used for fi_av_insert_auth_key_addr().

jswaro · 2023-09-12T16:22:20Z

What would be useful to this request is man page changes describing each API, and what it intends to accomplish from the application writer's and provider's perspectives.

include/rdma/fi_domain.h

include/rdma/fi_eq.h

include/rdma/fi_domain.h

shefty · 2023-09-12T18:25:01Z

Thanks - seeing the API makes it easier to discuss and identify potential changes.

iziemba · 2023-09-13T17:00:10Z

Version 2 of the API is ready.

    core: Support multiple auth keys per EP

    FI_MULTI_AUTH_KEY is a new, domain-level, secondary capability flag used
    to denote if a provider supports multiple authorization keys per
    connectionless EP (flag does not apply to connected EPs). The number of
    authorization keys which can be supported will be reported via
    fi_ep_attr::auth_key_cnt.

    Applications which do not set FI_MULTI_AUTH_KEY will not see a change to
    their current usage of authorization keys.

    Applications which set FI_MULTI_AUTH_KEY must associate authorization
    keys with the AV being bound to the EP.
    fi_domain_attr::auth_key and fi_ep_attr::auth_key will be ignored.

    All eligible auth keys must be pre-inserted into the AV via
    fi_av_insert_auth_key(). Acceptable flags are the following:
    - FI_TRANSMIT: Restrict the auth key to outbound data transfers. This
      includes send message, RMA, and atomic operations.
    - FI_RECV: Restrict the auth key to inbound data transfers. This
      includes received messages and target MRs of RMA and atomic operations.

    fi_av_insert_auth_key() output is an fi_addr_t handle specific to this
    authorization key. If the EP is configured with FI_DIRECTED_RECV, this
    fi_addr_t can be used to match all EP addrs associated with this
    authorization key. Calling fi_av_remove() with this fi_addr_t will
    delete the authorization key. -FI_EBUSY will be returned from
    fi_av_remove() should this key still be used by en EP. In other words,
    all EPs using this authorization key need to be closed for
    fi_av_remove() to succeed.

    The FI_AUTH_KEY_MATCH_ALL fi_av_attr flag can be used to pre-configure
    the AV for all possible auth keys with a FI_TRANSMIT | FI_RECV
    configuration.

    Once the AV is bound to an EP and the EP is successfully enabled, the
    EP will be configured to support all auth keys in the AV at that point
    in time.

    If the domain is configured with FI_MULTI_AUTH_KEY, users must provide
    an authorization key fi_addr_t with fi_av_insert_{addr, svc, sym}. This
    is done by using the fi_addr as input. For fi_av_insert_{addr, sym},
    since fi_addr may be an array, only index 0 will be looked at for the
    authorization key fi_addr_t. That is only a single authorization key
    fi_addr_t will be supported for these functions. The output of
    fi_av_insert_{addr, svc, sym} is an fi_addr_t mapping to a specific
    <EP addr, auth_key> tuple.

    For FI_EADDRNOTAVAIL CQ errors, users must specify
    fi_cq_err_entry::auth_key and fi_cq_err_entry::auth_key_size to retrieve
    the authorization key. fi_cq_err_entry::auth_key and
    fi_cq_err_entry::auth_key_size behavior similar to
    fi_cq_err_entry::err_data and fi_cq_err_entry::err_data_size. That is
    users need to provide the buffer for providers to memcpy the
    authorization key into. fi_cq_err_entry::auth_key, combined with the
    existing behavior of fi_cq_err_entry::err_data, enables users to
    generate a fi_addr_t mapping to the specific <EP addr, auth_key> tuple
    which triggered the FI_EADDRNOTAVAIL event.

include/rdma/fabric.h

include/rdma/fi_domain.h

iziemba · 2023-09-14T15:42:57Z

include/rdma/fi_eq.h

@@ -239,6 +239,8 @@ struct fi_cq_err_entry {
 	/* err_data is available until the next time the CQ is read */
 	void			*err_data;
 	size_t			err_data_size;
+	void			*auth_key;
+	size_t			auth_key_size;


If FI_AUTH_KEY_MATCH_ALL is removed (thus ensuring a fi_addr_t exists for each auth key), I am not sure if we need to have fi_cq_err_entry::auth_key and fi_cq_err_entry::auth_key_size as currently defined. Seems like we can just defined fi_addr_t fi_cq_err_entry::auth_key and return the fi_addr_t from insert. Thoughts?

I think you're right, and you're also exposing a gap in the current API. We can't report the src_addr with any error completion, which may be useful. For example, if we report a truncated receive completion at the target, we can't report the source of the message.

So... I like the change you're proposing, but I'd rename the 'auth_key' field to 'src_addr' to make it more generic. The app would need to use the error code to know to interpret the src_addr as an auth_key only address.

Sounds good. Do you want me to open a new issue for truncated recvs not reporting fi_addr_t?

You can if you want; it seems like a gap that should be addressed, which is easier to handle with your proposal.

include/rdma/fi_eq.h

iziemba · 2023-09-15T02:58:51Z

Version 3 of the API is ready. Since it seems like we are converging on something, I have added in markdown documentation along with ABI compat.

shefty · 2023-09-15T16:51:16Z

Please separate the ABI breaking changes (modification to fi_domain_attr) from the other API change (fi_cq_err_entry) into their own commits.

shefty

Some comments to fill your time. :) Thanks!

include/rdma/fi_domain.h

include/rdma/fi_eq.h

man/fi_av.3.md

man/fi_cq.3.md

man/fi_domain.3.md

iziemba · 2023-09-19T16:16:54Z

Assuming these API changes pan out, can these API changes target libfabric 1.20?

shefty · 2023-09-19T17:25:07Z

Targeting v1.20 looks doable.

man/fi_av.3.md

shefty

See comments. I need to think about the comments posted to the last patch.

man/fi_cq.3.md

prov/bgq/src/fi_bgq_cq.c

prov/opx/include/rdma/opx/fi_opx_cq_ops_table.h

prov/psm2/src/psmx2_cq.c

src/fabric.c

man/fi_av.3.md

iziemba · 2023-09-25T18:46:56Z

Another round is update. Only comment not address is how to handle FI_AV_USER_ID.

man/fi_mr.3.md

man/fi_av.3.md

shefty

I think this is in decent shape. Thanks!

include/rdma/fabric.h

src/abi_1_0.c

man/fi_mr.3.md

iziemba · 2023-10-10T04:51:13Z

FI_AUTH_KEY flag added.

man/fi_domain.3.md

man/fi_endpoint.3.md

iziemba · 2023-10-12T13:24:40Z

include/rdma/fi_eq.h

@@ -403,6 +404,8 @@ fi_cq_readfrom(struct fid_cq *cq, void *buf, size_t count, fi_addr_t *src_addr)
 static inline ssize_t
 fi_cq_readerr(struct fid_cq *cq, struct fi_cq_err_entry *buf, uint64_t flags)
 {
+	/* For compatibility with older providers. */
+	buf->src_addr = FI_ADDR_NOTAVAIL;


This is tripping a CXI provider test failure where buf is NULL. I am going to update to:

if (buf) buf->src_addr = FI_ADDR_NOTAVAIL

iziemba · 2023-10-12T13:46:09Z

As HPE works through implementing this API, I would expect other minor changes, like setting domain_attr::auth_key to NULL, to happen.

iziemba · 2023-10-12T16:08:12Z

man/fi_domain.3.md

+## Max Authorization Keys per Endpoint (max_ep_auth_key)
+
+: The maximum number of authorization keys which can be supported per connectionless
+  endpoint.


What is the correct behavior here if users specify zero? Following the precedence of similar fields, seems like if this value is zero, providers can return their default value?

I think this depends on what a provider should do with this value if passed into an open domain/ep call. Most apps will pass this value directly through.

man/fi_av.3.md

iziemba · 2023-10-13T15:50:00Z

I think the fi_cq_err_entry, max_ep_auth_key, and optional_caps commits are ready to be merged. Thus, I think I am going to open up new PRs with intent of lands for each of these functional changes.

Add a fi_addr_t source error field to the CQ error event. This can be used by various CQ error events to return source information. Signed-off-by: Ian Ziemba <[email protected]>

fi_cq_err_entry::src_addr can be used by CQ errors to report a source address. How this field will be used to CQ error event specific. Signed-off-by: Ian Ziemba <[email protected]>

Update various providers to only reference fi_cq_err_entry::src_addr if API version is 1.20. Signed-off-by: Ian Ziemba <[email protected]>

max_ep_auth_key is used by providers to report the number of authorization keys supported per connectionless endpoint. This is required to support FI_AV_AUTH_KEY in future commits. Signed-off-by: Ian Ziemba <[email protected]>

This is used by providers to report the number of authorization keys per connectionless endpoint. Signed-off-by: Ian Ziemba <[email protected]>

Optional capabilities may optionally be requested by an application. If requested, providers are not required to support the capability. If providers do not support the capability, the capability will be cleared in the corresponding fi_info caps fields returned from fi_getinfo. Else, providers will set the capability in the corresponding fi_info caps. All capabilities, both primary and secondary, are eligible as being optional. Signed-off-by: Ian Ziemba <[email protected]>

Signed-off-by: Ian Ziemba <[email protected]>

ABI version is updated to 1.7 to accommodate fi_domain_attr::max_ep_auth_key_cnt and optional_caps. Signed-off-by: Ian Ziemba <[email protected]>

ABI 1.7 adds fi_domain_attr::max_ep_auth_key and optional_caps. Signed-off-by: Ian Ziemba <[email protected]>

fi_domain_attr::max_ep_auth_key is used to reported the number of authorization keys supported by an endpoint. If this value is non-zero, connectionless endpoints must implement FI_AV_AUTH_KEY. FI_AV_AUTH_KEY is set by libfabric users via fi_domain_attr:::auth_key_size to denoted if MR and EP authorization keys from the AV instead of MR and EP attrs. When set, providers will ignore fi_ep_attr::auth_key during endpoint enable. From MRs, fi_mr_regattr() must be used with fi_mr_attr::auth_key pointing to a struct fi_mr_auth_key and fi_mr_attr:auth_key_size equal to sizeof(struct fi_mr_auth_key). fi_mr_auth_key::av should point to the AV the MR authorization keys should come from. If the domain is configured with FI_DIRECTED_RECV, fi_mr_auth_key::src_addr is used to restrict the MR to a specific fi_addr_t including authorization key fi_addr_t's. fi_av_insert_auth_key() output is an fi_addr_t handle specific to this authorization key. All operations, including AV operations data transfer operations, which may accept an auth_key fi_addr_t are required to pass in the FI_AUTH_KEY flag. If the EP is configured with FI_DIRECTED_RECV, this auth_key fi_addr_t can be used to match all EP addrs associated with this authorization key. Calling fi_av_remove() with this fi_addr_t will delete the authorization key. -FI_EBUSY will be returned from fi_av_remove() should this key still be used by en EP. In other words, all EPs using this authorization key need to be closed for fi_av_remove() to succeed. Once the AV is bound to an EP and the EP is successfully enabled, the EP will be configured to support all auth keys in the AV at that point in time. Users must provide an authorization key fi_addr_t with fi_av_insert_{addr, svc, sym}. This is done by using the fi_addr as input and setting the FI_AUTH_KEY flag. For fi_av_insert_{addr, sym}, since fi_addr may be an array, authorization key fi_addr_t's need to be specified for each index. The output of fi_av_insert_{addr, svc, sym} is an fi_addr_t mapping to a specific <EP addr, auth_key> tuple. For FI_EADDRNOTAVAIL CQ errors, fi_cq_err_entry::src_addr will return the authorization key handle associated with the incoming data transfer. This, combined with the existing behavior of fi_cq_err_entry::err_data enables users to generate a fi_addr_t mapping to the specific <EP addr, auth_key> tuple which triggered the FI_EADDRNOTAVAIL event. Signed-off-by: Ian Ziemba <[email protected]>

FI_AV_AUTH_KEY is used to enable multiple auth keys per connectionless endpoint. Signed-off-by: Ian Ziemba <[email protected]>

fi_av_set_user_id() is used to set the user id when the AV is opened with FI_AV_USER_ID. Signed-off-by: Ian Ziemba <[email protected]>

Document FI_AV_USER_ID as a primary cap. In addition, define FI_AV_USER_ID as a new domain primary cap. This enables AVs to be opened with FI_AV_USER_ID. Define AV opened with FI_AV_USER_ID behavior. In addition, document how the existing FI_AV_USER_ID behavior can be used if FI_AV_USER_ID is not requested as a capability. Signed-off-by: Ian Ziemba <[email protected]>

man/fi_av.3.md

iziemba added ⚠️ Do not merge discussion labels Sep 12, 2023

iziemba force-pushed the auth_keys branch from 8deb386 to 3791524 Compare September 12, 2023 14:46

iziemba marked this pull request as draft September 12, 2023 14:47

iziemba requested a review from shefty September 12, 2023 14:49

iziemba force-pushed the auth_keys branch from 3791524 to f6ac5fa Compare September 12, 2023 14:53

shefty reviewed Sep 12, 2023

View reviewed changes

iziemba force-pushed the auth_keys branch from f6ac5fa to bcf1e0f Compare September 13, 2023 16:58

iziemba requested a review from shefty September 13, 2023 16:59

shefty reviewed Sep 13, 2023

View reviewed changes

include/rdma/fabric.h Outdated Show resolved Hide resolved

include/rdma/fabric.h Outdated Show resolved Hide resolved

include/rdma/fi_domain.h Outdated Show resolved Hide resolved

iziemba commented Sep 14, 2023

View reviewed changes

include/rdma/fi_eq.h Outdated Show resolved Hide resolved

iziemba force-pushed the auth_keys branch from bcf1e0f to 3ffc31b Compare September 15, 2023 02:56

iziemba requested a review from shefty September 15, 2023 02:59

shefty reviewed Sep 15, 2023

View reviewed changes

iziemba force-pushed the auth_keys branch from 3ffc31b to 65c6b5e Compare September 19, 2023 17:28

iziemba commented Sep 19, 2023

View reviewed changes

man/fi_av.3.md Show resolved Hide resolved

shefty reviewed Sep 19, 2023

View reviewed changes

iziemba force-pushed the auth_keys branch from 65c6b5e to a09d829 Compare September 25, 2023 18:45

iziemba commented Sep 25, 2023

View reviewed changes

man/fi_mr.3.md Outdated Show resolved Hide resolved

iziemba commented Sep 25, 2023

View reviewed changes

man/fi_av.3.md Outdated Show resolved Hide resolved

shefty reviewed Sep 25, 2023

View reviewed changes

include/rdma/fabric.h Outdated Show resolved Hide resolved

src/abi_1_0.c Outdated Show resolved Hide resolved

man/fi_mr.3.md Outdated Show resolved Hide resolved

iziemba force-pushed the auth_keys branch from e117ef4 to 13410bc Compare October 10, 2023 04:50

iziemba requested a review from shefty October 10, 2023 04:50

iziemba commented Oct 11, 2023

View reviewed changes

man/fi_domain.3.md Outdated Show resolved Hide resolved

iziemba commented Oct 11, 2023

View reviewed changes

man/fi_endpoint.3.md Outdated Show resolved Hide resolved

iziemba commented Oct 11, 2023

View reviewed changes

man/fi_endpoint.3.md Outdated Show resolved Hide resolved

iziemba commented Oct 12, 2023

View reviewed changes

iziemba force-pushed the auth_keys branch from 13410bc to 212d6b4 Compare October 12, 2023 13:41

iziemba commented Oct 12, 2023

View reviewed changes

man/fi_av.3.md Show resolved Hide resolved

iziemba added 13 commits October 16, 2023 09:56

core: Define fi_cq_err_entry::src_addr

ee39df3

Add a fi_addr_t source error field to the CQ error event. This can be used by various CQ error events to return source information. Signed-off-by: Ian Ziemba <[email protected]>

man: Doc fi_cq_err_entry::src_addr

24c585e

fi_cq_err_entry::src_addr can be used by CQ errors to report a source address. How this field will be used to CQ error event specific. Signed-off-by: Ian Ziemba <[email protected]>

core: ABI compat for fi_cq_err_entry::src_addr

2f3e8ef

Update various providers to only reference fi_cq_err_entry::src_addr if API version is 1.20. Signed-off-by: Ian Ziemba <[email protected]>

core: Define fi_domain_attr::max_ep_auth_key

3760f9b

max_ep_auth_key is used by providers to report the number of authorization keys supported per connectionless endpoint. This is required to support FI_AV_AUTH_KEY in future commits. Signed-off-by: Ian Ziemba <[email protected]>

man: Doc max_ep_auth_key

704466b

This is used by providers to report the number of authorization keys per connectionless endpoint. Signed-off-by: Ian Ziemba <[email protected]>

man: Doc optional_caps

51cf820

Signed-off-by: Ian Ziemba <[email protected]>

core: 1.7 ABI compat

dd05600

ABI version is updated to 1.7 to accommodate fi_domain_attr::max_ep_auth_key_cnt and optional_caps. Signed-off-by: Ian Ziemba <[email protected]>

man: Doc ABI 1.7

7365927

ABI 1.7 adds fi_domain_attr::max_ep_auth_key and optional_caps. Signed-off-by: Ian Ziemba <[email protected]>

man: Doc FI_AV_AUTH_KEY

c28c8db

FI_AV_AUTH_KEY is used to enable multiple auth keys per connectionless endpoint. Signed-off-by: Ian Ziemba <[email protected]>

core: Define fi_av_set_user_id

3220c45

fi_av_set_user_id() is used to set the user id when the AV is opened with FI_AV_USER_ID. Signed-off-by: Ian Ziemba <[email protected]>

iziemba force-pushed the auth_keys branch from 212d6b4 to 7ebea6f Compare October 16, 2023 14:58

iziemba mentioned this pull request Oct 26, 2023

Support for max_ep_auth_key #9440

Merged

iziemba commented Oct 26, 2023

View reviewed changes

man/fi_av.3.md Show resolved Hide resolved

iziemba mentioned this pull request Oct 26, 2023

Support Multiple Authorization Keys per Endpoint #9476

Merged

iziemba closed this Oct 27, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

core: Support multiple auth keys per EP #9319

core: Support multiple auth keys per EP #9319

iziemba commented Sep 12, 2023 •

edited

Loading

jswaro commented Sep 12, 2023

shefty commented Sep 12, 2023

iziemba commented Sep 13, 2023

iziemba Sep 14, 2023

shefty Sep 14, 2023

iziemba Sep 14, 2023

shefty Sep 14, 2023

iziemba commented Sep 15, 2023

shefty commented Sep 15, 2023

shefty left a comment

iziemba commented Sep 19, 2023

shefty commented Sep 19, 2023

shefty left a comment

iziemba commented Sep 25, 2023

shefty left a comment

iziemba commented Oct 10, 2023

iziemba Oct 12, 2023

iziemba commented Oct 12, 2023

iziemba Oct 12, 2023

shefty Oct 12, 2023

iziemba commented Oct 13, 2023

core: Support multiple auth keys per EP #9319

core: Support multiple auth keys per EP #9319

Conversation

iziemba commented Sep 12, 2023 • edited Loading

jswaro commented Sep 12, 2023

shefty commented Sep 12, 2023

iziemba commented Sep 13, 2023

iziemba Sep 14, 2023

Choose a reason for hiding this comment

shefty Sep 14, 2023

Choose a reason for hiding this comment

iziemba Sep 14, 2023

Choose a reason for hiding this comment

shefty Sep 14, 2023

Choose a reason for hiding this comment

iziemba commented Sep 15, 2023

shefty commented Sep 15, 2023

shefty left a comment

Choose a reason for hiding this comment

iziemba commented Sep 19, 2023

shefty commented Sep 19, 2023

shefty left a comment

Choose a reason for hiding this comment

iziemba commented Sep 25, 2023

shefty left a comment

Choose a reason for hiding this comment

iziemba commented Oct 10, 2023

iziemba Oct 12, 2023

Choose a reason for hiding this comment

iziemba commented Oct 12, 2023

iziemba Oct 12, 2023

Choose a reason for hiding this comment

shefty Oct 12, 2023

Choose a reason for hiding this comment

iziemba commented Oct 13, 2023

iziemba commented Sep 12, 2023 •

edited

Loading