From 80576e9a89f22ddca2f93ede7b32086ef96485ed Mon Sep 17 00:00:00 2001
From: Sean Hefty
Date: Mon, 2 Oct 2023 16:23:41 -0700
Subject: [PATCH] core: Define new tag formats

Allow specifying precise tag formatting options.

The mem_tag_format takes as input a set of bit fields. In practice,
this ends up being unusable to implement, resulting in the entire tag
simply being masked with ignore bits.

When the mem_tag_format value only has the lower bits set (< 256),
interpret the format as specific options. Two new options are defined,
one aligned with MPI and the other with CCLs. This information can be
used by providers to optimize for the separate use cases.

Signed-off-by: Sean Hefty
---
 include/rdma/fabric.h    |   7 ++
 include/rdma/fi_tagged.h |  10 +++
 man/fi_endpoint.3.md     | 138 +++++++++++++++++++++++++++++----------
 3 files changed, 121 insertions(+), 34 deletions(-)

diff --git a/include/rdma/fabric.h b/include/rdma/fabric.h
index a54e8a66547..35a5269e9c1 100644
--- a/include/rdma/fabric.h
+++ b/include/rdma/fabric.h
@@ -336,6 +336,13 @@ enum {
 	FI_PROTO_CXI_RNR,
 };
 
+enum {
+	FI_TAG_BITS,
+	FI_TAG_HPC,
+	FI_TAG_AI,
+	FI_TAG_MAX_FORMAT = (1ULL << 16),
+};
+
 enum {
 	FI_TC_UNSPEC = 0,
 	FI_TC_DSCP = 0x100,
diff --git a/include/rdma/fi_tagged.h b/include/rdma/fi_tagged.h
index 61eba4e860a..a8525162d82 100644
--- a/include/rdma/fi_tagged.h
+++ b/include/rdma/fi_tagged.h
@@ -42,6 +42,16 @@ extern "C" {
 #endif
 
+#define FI_HPC_IGNORE_TAG	((uint64_t) UINT32_MAX)
+#define FI_HPC_IGNORE_PAYLOAD	(((uint64_t) UINT8_MAX) << 32)
+
+
+static inline uint64_t
+fi_tag_hpc(int tag, uint8_t payload_id)
+{
+	return (((uint64_t) payload_id) << 32) | ((uint64_t) (uint32_t) tag);
+}
+
 struct fi_msg_tagged {
 	const struct iovec	*msg_iov;
 	void			**desc;
diff --git a/man/fi_endpoint.3.md b/man/fi_endpoint.3.md
index 101a9880943..d1091e44e33 100644
--- a/man/fi_endpoint.3.md
+++ b/man/fi_endpoint.3.md
@@ -782,40 +782,110 @@ A value of -1 guarantees ordering for any data size.
 
 ## mem_tag_format - Memory Tag Format
 
-The memory tag format is a bit array used to convey the number of
-tagged bits supported by a provider. Additionally, it may be used to
-divide the bit array into separate fields. The mem_tag_format
-optionally begins with a series of bits set to 0, to signify bits
-which are ignored by the provider. Following the initial prefix of
-ignored bits, the array will consist of alternating groups of bits set
-to all 1's or all 0's. Each group of bits corresponds to a tagged
-field. The implication of defining a tagged field is that when a mask
-is applied to the tagged bit array, all bits belonging to a single
-field will either be set to 1 or 0, collectively.
-
-For example, a mem_tag_format of 0x30FF indicates support for 14
-tagged bits, separated into 3 fields. The first field consists of
-2-bits, the second field 4-bits, and the final field 8-bits. Valid
-masks for such a tagged field would be a bitwise OR'ing of zero or
-more of the following values: 0x3000, 0x0F00, and 0x00FF. The provider
-may not validate the mask provided by the application for performance
-reasons.
-
-By identifying fields within a tag, a provider may be able to optimize
-their search routines. An application which requests tag fields must
-provide tag masks that either set all mask bits corresponding to a
-field to all 0 or all 1. When negotiating tag fields, an application
-can request a specific number of fields of a given size. A provider
-must return a tag format that supports the requested number of fields,
-with each field being at least the size requested, or fail the
-request. A provider may increase the size of the fields. When reporting
-completions (see FI_CQ_FORMAT_TAGGED), it is not guaranteed that the
-provider would clear out any unsupported tag bits in the tag field of
-the completion entry.
-
-It is recommended that field sizes be ordered from smallest to
-largest. A generic, unstructured tag and mask can be achieved by
-requesting a bit array consisting of alternating 1's and 0's.
+The memory tag format field is used to convey information on
+the use of the tag and ignore parameters in the fi_tagged API calls,
+as well as matching criteria. This information is used by the
+provider to optimize tag matching support, including alignment with
+wire protocols. The following tag formats are defined:
+
+*FI_TAG_BITS*
+
+: If specified on input to fi_getinfo, this indicates that tags
+  contain up to 64-bits of data, and the receiver must apply ignore_bits
+  to tags when matching receive buffers with sends. The output of
+  fi_getinfo will set 0 or more upper bits of mem_tag_format to 0 to
+  indicate those tag bits which are ignored or reserved by the provider.
+  Applications must check the number of upper bits which are 0 and
+  set them to 0 on all tag and ignore bits.
+
+  The value of FI_TAG_BITS is 0, making this the default behavior if
+  the hints are left uninitialized after being allocated by fi_allocinfo().
+  This format provides the most flexibility to applications, but limits
+  provider optimization options. FI_TAG_BITS aligns with the behavior
+  defined for libfabric versions 1.x.
+
+*FI_TAG_HPC*
+
+: FI_TAG_HPC is a constrained usage of FI_TAG_BITS. When selected, applications
+  treat the tag as fields of data, rather than bits, with the ability to
+  wildcard each field. The HPC tag format specifically targets MPI based
+  implementations and applications. An HPC formatted tag consists of 2 fields:
+  a message tag and a payload identifier. The message tag is a 32-bit searchable
+  tag. Matching on a message tag requires searching through a list of posted
+  buffers at the receiver, which we refer to as a searchable tag.
+  The integer tag in MPI point-to-point messages can map directly to
+  the libfabric message tag field.
+
+  The second field is an identifier that corresponds to the operation or
+  data being carried in the message payload. For example, this field may
+  be used to identify the type of collective operation associated with a
+  message payload. Note that only the size and behavior for
+  the HPC tag formats are defined. The described uses of the fields are only
+  suggestions.
+
+  Applications that use the HPC format should initialize their tags using
+  the fi_tag_hpc() function. Ignore bits should be specified as
+  FI_HPC_IGNORE_TAG, FI_HPC_IGNORE_PAYLOAD, or their bitwise OR'ing.
+
+*FI_TAG_AI*
+
+: The FI_TAG_AI format further restricts the FI_TAG_HPC format. When used,
+  only a single tag field may be set, which must match exactly at the target.
+  The field may not be wildcarded. The AI tag format targets collective
+  communication libraries and applications. The AI format consists of a single
+  field: a payload identifier. The identifier corresponds to the operation or
+  data being carried in the message payload. For example, this field may be
+  used to identify whether a message is for point-to-point communication or
+  part of a collective operation, and in the latter case, the type of
+  collective operation.
+
+  The AI tag format does not require searching for matching receive
+  buffers, only directing the message to the correct virtual message queue
+  based on the payload identifier.
+
+  Applications that use the AI format pass in the payload identifier
+  directly as the tag and set ignore bits to 0.
+
+*FI_TAG_MAX_FORMAT*
+: If the value of mem_tag_format is >= FI_TAG_MAX_FORMAT, the tag format
+  is treated as a set of bit fields. The behavior is functionally the same
+  as FI_TAG_BITS. The following description is for backwards compatibility
+  and describes how the provider may interpret the mem_tag_format field
+  if the value is >= FI_TAG_MAX_FORMAT.
+
+  The memory tag format may be used to
+  divide the bit array into separate fields. The mem_tag_format
+  optionally begins with a series of bits set to 0, to signify bits
+  which are ignored by the provider. Following the initial prefix of
+  ignored bits, the array will consist of alternating groups of bits set
+  to all 1's or all 0's. Each group of bits corresponds to a tagged
+  field. The implication of defining a tagged field is that when a mask
+  is applied to the tagged bit array, all bits belonging to a single
+  field will either be set to 1 or 0, collectively.
+
+  For example, a mem_tag_format of 0x30FF indicates support for 14
+  tagged bits, separated into 3 fields. The first field consists of
+  2-bits, the second field 4-bits, and the final field 8-bits. Valid
+  masks for such a tagged field would be a bitwise OR'ing of zero or
+  more of the following values: 0x3000, 0x0F00, and 0x00FF. The provider
+  may not validate the mask provided by the application for performance
+  reasons.
+
+  By identifying fields within a tag, a provider may be able to optimize
+  their search routines. An application which requests tag fields must
+  provide tag masks that either set all mask bits corresponding to a
+  field to all 0 or all 1. When negotiating tag fields, an application
+  can request a specific number of fields of a given size. A provider
+  must return a tag format that supports the requested number of fields,
+  with each field being at least the size requested, or fail the
+  request. A provider may increase the size of the fields. When reporting
+  completions (see FI_CQ_FORMAT_TAGGED), it is not guaranteed that the
+  provider would clear out any unsupported tag bits in the tag field of
+  the completion entry.
+
+  It is recommended that field sizes be ordered from smallest to
+  largest. A generic, unstructured tag and mask can be achieved by
+  requesting a bit array consisting of alternating 1's and 0's.
 
 ## tx_ctx_cnt - Transmit Context Count
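
The following is a minimal sketch of how the HPC tag layout and the legacy bit-field masks described in this patch could be exercised. The two macros and `fi_tag_hpc()` are copied from the `fi_tagged.h` hunk so the snippet builds without libfabric. `tag_matches()` and `mask_respects_fields()` are hypothetical helpers that do not exist in libfabric; the matching rule they encode (bits set in the ignore mask are excluded from the comparison) is an assumption about how a receiver would apply ignore bits, not a description of any provider's implementation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Copied from the patch to include/rdma/fi_tagged.h so that this sketch
 * is self-contained. */
#define FI_HPC_IGNORE_TAG	((uint64_t) UINT32_MAX)
#define FI_HPC_IGNORE_PAYLOAD	(((uint64_t) UINT8_MAX) << 32)

static inline uint64_t
fi_tag_hpc(int tag, uint8_t payload_id)
{
	return (((uint64_t) payload_id) << 32) | ((uint64_t) (uint32_t) tag);
}

/* Hypothetical helper (not a libfabric call): returns true when a sent tag
 * matches a posted receive tag once the bits set in 'ignore' are excluded.
 * Assumed semantics: a set ignore bit means "do not compare this bit". */
static inline bool
tag_matches(uint64_t sent, uint64_t posted, uint64_t ignore)
{
	return ((sent ^ posted) & ~ignore) == 0;
}

/* Hypothetical helper (not a libfabric call) for the legacy bit-field
 * format: a mask is acceptable when, for every field, it selects either
 * all of the field's bits or none of them, and sets no bit outside the
 * fields. */
static inline bool
mask_respects_fields(uint64_t mask, const uint64_t *fields, size_t n)
{
	uint64_t all = 0;

	for (size_t i = 0; i < n; i++) {
		uint64_t part = mask & fields[i];

		if (part != 0 && part != fields[i])
			return false;	/* field only partially covered */
		all |= fields[i];
	}
	return (mask & ~all) == 0;
}
```

Under these assumptions, a receive posted with `fi_tag_hpc(0, 3)` and `FI_HPC_IGNORE_TAG` would accept any message tag carrying payload identifier 3, while passing `FI_HPC_IGNORE_PAYLOAD` instead would wildcard the payload identifier and require the 32-bit message tag to match. For the 0x30FF example, calling `mask_respects_fields()` with the fields 0x3000, 0x0F00 and 0x00FF accepts a mask such as 0x30FF and rejects 0x1000, which covers only half of the 2-bit field.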