core: Support multiple auth keys per EP #9319
What would be useful for this request are man page changes describing each API and what it is intended to accomplish from both the application writer's and the provider's perspectives.
Thanks - seeing the API makes it easier to discuss and identify potential changes.
Version 2 of the API is ready.
include/rdma/fi_eq.h
@@ -239,6 +239,8 @@ struct fi_cq_err_entry {
/* err_data is available until the next time the CQ is read */
void *err_data;
size_t err_data_size;
void *auth_key;
size_t auth_key_size;
If FI_AUTH_KEY_MATCH_ALL is removed (thus ensuring an fi_addr_t exists for each auth key), I am not sure we need fi_cq_err_entry::auth_key and fi_cq_err_entry::auth_key_size as currently defined. It seems like we can just define fi_addr_t fi_cq_err_entry::auth_key and return the fi_addr_t from insert. Thoughts?
I think you're right, and you're also exposing a gap in the current API. We can't report the src_addr with any error completion, which may be useful. For example, if we report a truncated receive completion at the target, we can't report the source of the message.
So... I like the change you're proposing, but I'd rename the 'auth_key' field to 'src_addr' to make it more generic. The app would need to use the error code to know to interpret the src_addr as an auth_key only address.
Sounds good. Do you want me to open a new issue for truncated recvs not reporting fi_addr_t?
You can if you want; it seems like a gap that should be addressed, which is easier to handle with your proposal.
Version 3 of the API is ready. Since it seems like we are converging on something, I have added Markdown documentation along with ABI compatibility.
Please separate the ABI-breaking changes (modification to fi_domain_attr) from the other API change (fi_cq_err_entry) into their own commits.
Some comments to fill your time. :) Thanks!
Assuming these changes pan out, can they target libfabric 1.20?
Targeting v1.20 looks doable.
See comments. I need to think about the comments posted to the last patch.
Another round of updates is ready. The only comment not yet addressed is how to handle FI_AV_USER_ID.
I think this is in decent shape. Thanks!
FI_AUTH_KEY flag added.
include/rdma/fi_eq.h
@@ -403,6 +404,8 @@ fi_cq_readfrom(struct fid_cq *cq, void *buf, size_t count, fi_addr_t *src_addr)
static inline ssize_t
fi_cq_readerr(struct fid_cq *cq, struct fi_cq_err_entry *buf, uint64_t flags)
{
/* For compatibility with older providers. */
buf->src_addr = FI_ADDR_NOTAVAIL;
This is tripping a CXI provider test failure where buf is NULL. I am going to update it to:
if (buf)
	buf->src_addr = FI_ADDR_NOTAVAIL;
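A self-contained sketch of the guarded wrapper proposed above. Everything prefixed `sketch_`/`SKETCH_` and the `provider_readerr` stub are hypothetical stand-ins for `struct fi_cq_err_entry`, `FI_ADDR_NOTAVAIL`, and a provider's readerr op, so the snippet compiles without libfabric headers. It shows the default being written only when `buf` is non-NULL, with the NULL case still forwarded to the provider.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

/* Hypothetical stand-ins; the real definitions live in rdma/fabric.h and rdma/fi_eq.h. */
typedef uint64_t sketch_fi_addr_t;
#define SKETCH_ADDR_NOTAVAIL ((sketch_fi_addr_t) -1)

struct sketch_cq_err_entry {
	int err;
	sketch_fi_addr_t src_addr;
};

/* Hypothetical provider readerr op that predates src_addr: it never writes that field. */
static ssize_t provider_readerr(struct sketch_cq_err_entry *buf, uint64_t flags)
{
	(void) flags;
	if (!buf)
		return -1;      /* this stub simply rejects a NULL buffer */
	buf->err = 5;           /* arbitrary error code for the sketch */
	return 1;
}

/* Mirrors the proposed inline wrapper: default src_addr only when buf is non-NULL. */
static inline ssize_t sketch_cq_readerr(struct sketch_cq_err_entry *buf, uint64_t flags)
{
	if (buf)
		buf->src_addr = SKETCH_ADDR_NOTAVAIL;
	return provider_readerr(buf, flags);
}
```

An older provider that never touches `src_addr` still leaves the application with a well-defined "not available" value rather than stale memory.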
As HPE works through implementing this API, I expect other minor changes to happen, such as setting domain_attr::auth_key to NULL.
man/fi_domain.3.md
## Max Authorization Keys per Endpoint (max_ep_auth_key)

: The maximum number of authorization keys which can be supported per connectionless
  endpoint.
What is the correct behavior here if users specify zero? Following the precedent of similar fields, it seems like providers can return their default value if this value is zero?
I think this depends on what a provider should do with this value if passed into an open domain/ep call. Most apps will pass this value directly through.
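A minimal sketch of the "zero means provider default" convention discussed above, as a provider might apply it to a requested max_ep_auth_key. The limits, the function name, and the choice to reject (rather than clamp) an oversized request are all assumptions for illustration, not behavior defined by this PR.

```c
#include <stddef.h>

/* Hypothetical provider limits; not taken from any real provider. */
#define SKETCH_DEFAULT_EP_AUTH_KEY 1
#define SKETCH_MAX_EP_AUTH_KEY     16

/*
 * Resolve an application's requested max_ep_auth_key.
 * Returns the value to report back, or 0 if the request cannot be met.
 */
static size_t sketch_resolve_max_ep_auth_key(size_t requested)
{
	if (requested == 0)
		return SKETCH_DEFAULT_EP_AUTH_KEY;   /* zero: use the provider default */
	if (requested > SKETCH_MAX_EP_AUTH_KEY)
		return 0;                            /* unsatisfiable: caller drops this fi_info */
	return requested;                            /* honor the request as-is */
}
```

Because most applications pass the returned value straight back into the open calls, reporting a concrete non-zero number from this step means the open path never has to reinterpret zero.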
I think the fi_cq_err_entry, max_ep_auth_key, and optional_caps commits are ready to be merged, so I am going to open new PRs, each intended to land one of these functional changes.
Add a fi_addr_t source error field to the CQ error event. This can be used by various CQ error events to return source information. Signed-off-by: Ian Ziemba <[email protected]>
fi_cq_err_entry::src_addr can be used by CQ errors to report a source address. How this field is used is specific to each CQ error event. Signed-off-by: Ian Ziemba <[email protected]>
Update various providers to only reference fi_cq_err_entry::src_addr if the API version is 1.20. Signed-off-by: Ian Ziemba <[email protected]>
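A sketch of why the provider-side gate matters: an application built against an older API hands the provider a smaller `fi_cq_err_entry`, so writing `src_addr` would scribble past its buffer. The version macro below is a local stand-in that packs major into the high 16 bits (modeled on, but not copied from, libfabric's `FI_VERSION`), and the check is assumed to be "requested version >= 1.20".

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in for version encoding: major in the high 16 bits, minor in the low 16. */
#define SKETCH_VERSION(major, minor) (((uint32_t) (major) << 16) | (uint32_t) (minor))

/* Hypothetical entry; in practice the pre-1.20 layout would not contain src_addr at all. */
struct sketch_cq_err_entry {
	int err;
	uint64_t src_addr;
};

/* Write src_addr only for applications that requested API 1.20 or newer. */
static bool sketch_fill_src_addr(uint32_t app_version, struct sketch_cq_err_entry *entry,
				 uint64_t src_addr)
{
	if (app_version < SKETCH_VERSION(1, 20))
		return false;   /* older app: buffer may be the shorter layout, leave it alone */
	entry->src_addr = src_addr;
	return true;
}
```

Packing the major version above the minor lets a single integer comparison order versions correctly as long as minor stays below 65536.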
max_ep_auth_key is used by providers to report the number of authorization keys supported per connectionless endpoint. This is required to support FI_AV_AUTH_KEY in future commits. Signed-off-by: Ian Ziemba <[email protected]>
This is used by providers to report the number of authorization keys per connectionless endpoint. Signed-off-by: Ian Ziemba <[email protected]>
Optional capabilities may be requested by an application. Providers are not required to support a requested optional capability. If a provider does not support the capability, the capability will be cleared in the corresponding fi_info caps fields returned from fi_getinfo. Otherwise, the provider will set the capability in the corresponding fi_info caps. All capabilities, both primary and secondary, are eligible to be optional. Signed-off-by: Ian Ziemba <[email protected]>
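A bitmask sketch of the optional-caps rule described in this commit message. It assumes required and optional capabilities arrive as separate masks; the capability bits and the function name are hypothetical, not libfabric's FI_* values. A missing required cap means no match, while a missing optional cap is simply cleared from the returned caps.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical capability bits standing in for FI_* capability flags. */
#define CAP_MSG      (1ULL << 0)
#define CAP_TAGGED   (1ULL << 1)
#define CAP_DIRECTED (1ULL << 2)

/*
 * Compute the caps a provider would report for one fi_info.
 * Returns false if any required capability is unsupported.
 */
static bool sketch_resolve_caps(uint64_t required, uint64_t optional,
				uint64_t supported, uint64_t *out_caps)
{
	if ((required & supported) != required)
		return false;                          /* required cap missing: no match */
	*out_caps = required | (optional & supported); /* unsupported optional caps cleared */
	return true;
}
```

For example, requiring CAP_MSG with CAP_TAGGED and CAP_DIRECTED as optional against a provider supporting CAP_MSG and CAP_TAGGED yields CAP_MSG | CAP_TAGGED.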
Signed-off-by: Ian Ziemba <[email protected]>
ABI version is updated to 1.7 to accommodate fi_domain_attr::max_ep_auth_key and optional_caps. Signed-off-by: Ian Ziemba <[email protected]>
ABI 1.7 adds fi_domain_attr::max_ep_auth_key and optional_caps. Signed-off-by: Ian Ziemba <[email protected]>
fi_domain_attr::max_ep_auth_key is used to report the number of authorization keys supported by an endpoint. If this value is non-zero, connectionless endpoints must implement FI_AV_AUTH_KEY. Libfabric users set fi_domain_attr::auth_key_size to FI_AV_AUTH_KEY to denote that MR and EP authorization keys come from the AV instead of the MR and EP attributes. When it is set, providers will ignore fi_ep_attr::auth_key during endpoint enable.

For MRs, fi_mr_regattr() must be used with fi_mr_attr::auth_key pointing to a struct fi_mr_auth_key and fi_mr_attr::auth_key_size equal to sizeof(struct fi_mr_auth_key). fi_mr_auth_key::av should point to the AV the MR authorization keys should come from. If the domain is configured with FI_DIRECTED_RECV, fi_mr_auth_key::src_addr is used to restrict the MR to a specific fi_addr_t, including authorization key fi_addr_t's.

The output of fi_av_insert_auth_key() is an fi_addr_t handle specific to that authorization key. All operations that may accept an auth_key fi_addr_t, including AV operations and data transfer operations, are required to pass the FI_AUTH_KEY flag. If the EP is configured with FI_DIRECTED_RECV, this auth_key fi_addr_t can be used to match all EP addresses associated with the authorization key. Calling fi_av_remove() with this fi_addr_t will delete the authorization key. fi_av_remove() returns -FI_EBUSY if the key is still used by an EP; in other words, all EPs using this authorization key must be closed for fi_av_remove() to succeed. Once the AV is bound to an EP and the EP is successfully enabled, the EP will be configured to support all auth keys in the AV at that point in time.

Users must provide an authorization key fi_addr_t with fi_av_insert_{addr, svc, sym}. This is done by passing the authorization key fi_addr_t as the fi_addr input and setting the FI_AUTH_KEY flag. For fi_av_insert_{addr, sym}, since fi_addr may be an array, an authorization key fi_addr_t must be specified for each index.
The output of fi_av_insert_{addr, svc, sym} is an fi_addr_t mapping to a specific <EP addr, auth_key> tuple. For FI_EADDRNOTAVAIL CQ errors, fi_cq_err_entry::src_addr will return the authorization key handle associated with the incoming data transfer. This, combined with the existing behavior of fi_cq_err_entry::err_data, enables users to generate an fi_addr_t mapping to the specific <EP addr, auth_key> tuple which triggered the FI_EADDRNOTAVAIL event. Signed-off-by: Ian Ziemba <[email protected]>
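A sketch of the fi_av_remove() rule from the commit message above: an auth key handle can only be removed once no enabled EP still references it. The per-key reference count, the fixed-size table, and all names are hypothetical provider internals, and plain errno values stand in for -FI_EBUSY/-FI_EINVAL.

```c
#include <errno.h>
#include <stdint.h>

#define SKETCH_MAX_KEYS 8

/* Hypothetical AV-side record for one inserted authorization key. */
struct sketch_auth_key_entry {
	int in_use;        /* slot holds a live key */
	unsigned ep_refs;  /* enabled EPs configured with this key */
};

struct sketch_av {
	struct sketch_auth_key_entry keys[SKETCH_MAX_KEYS];
};

/* Removing an auth key handle fails with -EBUSY while any EP still uses it. */
static int sketch_av_remove_auth_key(struct sketch_av *av, uint64_t handle)
{
	if (handle >= SKETCH_MAX_KEYS || !av->keys[handle].in_use)
		return -EINVAL;
	if (av->keys[handle].ep_refs > 0)
		return -EBUSY;     /* libfabric would report -FI_EBUSY */
	av->keys[handle].in_use = 0;
	return 0;
}

/* Closing an EP drops its reference on the key. */
static void sketch_ep_close(struct sketch_av *av, uint64_t handle)
{
	if (handle < SKETCH_MAX_KEYS && av->keys[handle].ep_refs > 0)
		av->keys[handle].ep_refs--;
}
```

The typical sequence is: remove fails with -EBUSY, the last EP using the key is closed, and the retry succeeds.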
FI_AV_AUTH_KEY is used to enable multiple auth keys per connectionless endpoint. Signed-off-by: Ian Ziemba <[email protected]>
fi_av_set_user_id() is used to set the user id when the AV is opened with FI_AV_USER_ID. Signed-off-by: Ian Ziemba <[email protected]>
Document FI_AV_USER_ID as a primary cap. In addition, define FI_AV_USER_ID as a new domain primary cap. This enables AVs to be opened with FI_AV_USER_ID. Define the behavior of an AV opened with FI_AV_USER_ID. In addition, document how the existing FI_AV_USER_ID behavior can be used if FI_AV_USER_ID is not requested as a capability. Signed-off-by: Ian Ziemba <[email protected]>
EPs bound to an AV configured with FI_AUTH_KEY can support one or more auth keys per EP.
All eligible auth keys must be pre-inserted into the AV via fi_av_insert_auth_key(). Acceptable flags are the following:
The FI_AUTH_KEY_MATCH_ALL fi_av_attr flag can be used to pre-configure the AV for all possible auth keys with a FI_TRANSMIT | FI_RECV configuration.
Once the AV is bound to an EP and the EP is successfully enabled, the EP will be configured to support all auth keys in the AV at that point in time. Later fi_av_insert_auth_key() will not propagate to already enabled EPs.
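A sketch of the snapshot semantics described above: enabling an EP captures the AV's auth keys at that moment, and later inserts grow the AV without reaching the already-enabled EP. Tracking only a key count, and every name here, are simplifying assumptions for illustration.

```c
#include <stddef.h>

#define SKETCH_MAX_KEYS 8

/* Hypothetical AV holding the auth keys inserted so far. */
struct sketch_av {
	size_t key_count;
};

/* Hypothetical EP: records how many AV keys it was configured with at enable. */
struct sketch_ep {
	const struct sketch_av *av;
	size_t enabled_key_count;
	int enabled;
};

static void sketch_av_insert_auth_key(struct sketch_av *av)
{
	if (av->key_count < SKETCH_MAX_KEYS)
		av->key_count++;
}

/* Enabling the EP snapshots the AV's keys; later inserts are not propagated. */
static void sketch_ep_enable(struct sketch_ep *ep)
{
	ep->enabled_key_count = ep->av->key_count;
	ep->enabled = 1;
}
```

An application that needs a newly inserted key on an existing EP would therefore have to open and enable a new EP after the insert.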
For AVs configured with FI_AUTH_KEY, fi_av_insert_auth_key_{addr, svc, sym} must be used to generate fi_addr_t's for a specific <EP addr, auth key> tuple. fi_av_insert_{addr, svc, sym} will not be supported. The resulting fi_addr_t's can be used for outgoing data transfers. If the endpoint is configured with FI_DIRECTED_RECV, the resulting fi_addr_t's can be used to restrict a receive buffer to a specific <EP addr, auth key> tuple. FI_ADDR_UNSPEC can be used to match any <EP addr, auth key> combination. An insert can be done with FI_AUTH_KEY_MATCH_ALL to generate an fi_addr_t that matches all EP addresses for a specific auth key.
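A sketch of the three directed-receive matching cases above: FI_ADDR_UNSPEC (modeled as a NULL posted entry), an exact <EP addr, auth key> tuple, and a FI_AUTH_KEY_MATCH_ALL entry that pins only the auth key. The resolved-entry struct and function name are hypothetical; a real provider would look these up from the fi_addr_t in its AV.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical resolved form of an fi_addr_t in an FI_AUTH_KEY AV. */
struct sketch_av_entry {
	uint32_t ep_addr;    /* index of the peer EP address */
	uint32_t auth_key;   /* index of the authorization key */
	bool match_all;      /* inserted with FI_AUTH_KEY_MATCH_ALL: any EP addr */
};

/*
 * Does an incoming message from <ep_addr, auth_key> match a posted receive?
 * 'posted' is NULL when the receive was posted with FI_ADDR_UNSPEC.
 */
static bool sketch_recv_matches(const struct sketch_av_entry *posted,
				uint32_t ep_addr, uint32_t auth_key)
{
	if (!posted)
		return true;                              /* FI_ADDR_UNSPEC: any tuple */
	if (posted->auth_key != auth_key)
		return false;                             /* auth key must always agree */
	return posted->match_all || posted->ep_addr == ep_addr;
}
```

Checking the auth key first keeps the MATCH_ALL case to a single extra flag test instead of a separate code path.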
All inserted auth_keys must have buffer size equal to fi_info::domain_attr::auth_key_size reported by the provider. Failure to ensure this may lead to memory corruption.
For FI_EADDRNOTAVAIL CQ errors, the FI_SOURCE_ERR flag in fi_cq_err_entry::flags can be used to determine whether fi_cq_err_entry::src_err_data or fi_cq_err_entry::err_data is valid. If the flag is set, fi_cq_err_entry::src_err_data is valid.
For EPs configured with FI_SOURCE_ERR and bound to an AV with FI_AUTH_KEY, all FI_EADDRNOTAVAIL CQ errors need to be generated with the FI_SOURCE_ERR flag set and fi_cq_err_entry::src_err_data filled in accordingly. fi_cq_err_source_err_data::addr and fi_cq_err_source_err_data::auth_key will be set to valid pointers which can be used for fi_av_insert_auth_key_addr().
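An application-side sketch of the FI_SOURCE_ERR dispatch described above: when the flag is set, the raw address and auth key come from src_err_data (and could then be handed to an auth-key-aware insert); otherwise only err_data is available. The struct layouts, flag value, and names are hypothetical stand-ins for the proposed fi_cq_err_entry and fi_cq_err_source_err_data, not the final ABI.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the proposed flag and structures. */
#define SKETCH_SOURCE_ERR (1ULL << 0)

struct sketch_src_err_data {
	const void *addr;       /* raw EP address of the unknown peer */
	const void *auth_key;   /* raw auth key the message arrived with */
};

struct sketch_cq_err_entry {
	uint64_t flags;
	void *err_data;
	struct sketch_src_err_data src_err_data;
};

enum sketch_err_source { SKETCH_FROM_ERR_DATA, SKETCH_FROM_SRC_ERR_DATA };

/* Pick which field describes the unknown source of an FI_EADDRNOTAVAIL error. */
static enum sketch_err_source
sketch_addrnotavail_source(const struct sketch_cq_err_entry *e,
			   const void **addr, const void **auth_key)
{
	if (e->flags & SKETCH_SOURCE_ERR) {
		*addr = e->src_err_data.addr;
		*auth_key = e->src_err_data.auth_key;
		return SKETCH_FROM_SRC_ERR_DATA;
	}
	*addr = e->err_data;    /* legacy path: err_data holds the source address */
	*auth_key = NULL;       /* no auth key reported */
	return SKETCH_FROM_ERR_DATA;
}
```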