uring enablement #8

ooststep · 2024-08-09T19:10:29Z

No description provided.

Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 4.4.0 to 4.4.1. - [Release notes](https://github.com/actions/upload-artifact/releases) - [Commits](actions/upload-artifact@5076954...604373d) --- updated-dependencies: - dependency-name: actions/upload-artifact dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]>

Bumps [github/codeql-action](https://github.com/github/codeql-action) from 3.26.10 to 3.26.11. - [Release notes](https://github.com/github/codeql-action/releases) - [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md) - [Commits](github/codeql-action@e2b3eaf...6db8d63) --- updated-dependencies: - dependency-name: github/codeql-action dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]>

When zcpy rx is on, both inject_msg_size and inject_rma_size should be reported as inline buf size. Signed-off-by: Shi Jin <[email protected]>

A provider may update the memory region being added, for instance to align it to a larger page boundary. In such cases, the MR cache info used to search the cache should use the updated region. This allows the provider to avoid walking /proc/pid/smaps when the underlying kernel component can determine the backing page size more efficiently. Signed-off-by: Steve Welch <[email protected]> Signed-off-by: Ian Ziemba <[email protected]>
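
A minimal sketch of why the lookup key matters, assuming a provider that rounds each registration out to a 2 MiB page. `struct mr_region`, `provider_align_region` and `cache_entry_covers` are hypothetical names, not the libfabric MR cache API. Once the key is built from the updated region, two small registrations inside the same huge page resolve to the same cached entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of a registered region. */
struct mr_region {
	uintptr_t base;
	size_t len;
};

/* Hypothetical provider hook: round the region out to a power-of-two
 * page size (for example a 2 MiB huge page reported by the kernel). */
static struct mr_region provider_align_region(struct mr_region r, size_t page_size)
{
	uintptr_t mask = (uintptr_t)(page_size - 1);
	uintptr_t start = r.base & ~mask;
	uintptr_t end = (r.base + r.len + mask) & ~mask;
	struct mr_region out = { start, (size_t)(end - start) };

	return out;
}

/* A cached entry satisfies a lookup if it fully contains the key. */
static bool cache_entry_covers(struct mr_region entry, struct mr_region key)
{
	return key.base >= entry.base &&
	       key.base + key.len <= entry.base + entry.len;
}
```

Searching with the raw, unaligned regions would miss the cached entry even though both buffers share the same backing page.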

Applications should use the FI_MR_DMABUF API to pass the dmabuf fd and offset so that Libfabric registers the MR via dmabuf. The only exception is synapseai, because dmabuf is the only way to register a Gaudi device buffer and that path was implemented before the FI_MR_DMABUF API. Keep this behavior unchanged for compatibility. Signed-off-by: Shi Jin <[email protected]>

psm3 is failing onecclgpu because of a missing package. Disable it until the package dependency is resolved. Signed-off-by: Zach Dworkin <[email protected]>

Remove deprecated FI_MR_BASIC flag Signed-off-by: Tadeusz Struk <[email protected]>

This patch adds the missing inband sync in ft_fabric_init_cm to handle the case where rx buffers are not pre-posted by the application. The default behaviour in fabtests is to pre-post an rx buffer. This change enables fabtests using ft_fabric_init_cm to consume the posted receive with an inband sync by setting the test option FT_OPT_NO_PRE_POSTED_RX. Similar changes were made to ft_init_fabric in ofiwg#10394. Signed-off-by: Nikhil Nanal <[email protected]>

efa_mr_hmem_setup previously always called ofi_hmem_dev_register on all FI_HMEM_CUDA calls, regardless of the presence of FI_MR_DMABUF in flags. When gdrcopy is enabled, this means deconstructing the fi_mr_dmabuf into a struct iovec from its {base, offset, len} 3-tuple, then passing the resulting iovec to gdr_pin followed by gdr_map. A dmabuf cannot be exported by the nvidia module without an implicit promise that the address space is already reserved and mapped in the current pid, with appropriate size and alignment, and that all pages/ranges backing it can be made available to an importer. All of these requirements are enforced by the CUDA APIs used to acquire one. At best, calls to libgdrcopy here are unnecessary for dmabufs; at worst, the pgprots set by gdrdrv differ enough from the ones set up by CUDA proper to cause issues, or the redundant mappings become costly for the driver to maintain. Prior to this patch, apps could only prevent these gdr_map calls on dmabuf arguments by disabling gdrcopy entirely through environment variables before launch. But apps may wish to use fi_mr_regattr with dmabuf arguments in the default case, while still reserving the right to call fi_mr_regattr with iov arguments on the same domain, where the gdr flow may still be desired. This makes that possible. Signed-off-by: Nicholas Sielicki <[email protected]>
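
The resulting decision can be sketched as a single predicate. `SKETCH_MR_DMABUF` and `enum sketch_iface` are hypothetical stand-ins for FI_MR_DMABUF and the hmem interface enum; the real flag value and the efa_mr_hmem_setup code are not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins, not the real libfabric definitions. */
#define SKETCH_MR_DMABUF (1ULL << 40)

enum sketch_iface { SKETCH_IFACE_SYSTEM, SKETCH_IFACE_CUDA };

/* Decide whether a registration takes the gdrcopy path
 * (ofi_hmem_dev_register, i.e. gdr_pin followed by gdr_map).
 * dmabuf registrations already carry a valid CUDA mapping, so skip it. */
static bool use_gdrcopy(enum sketch_iface iface, uint64_t flags,
			bool gdrcopy_enabled)
{
	if (iface != SKETCH_IFACE_CUDA || !gdrcopy_enabled)
		return false;
	return !(flags & SKETCH_MR_DMABUF);
}
```

Because the choice is made per call from the flags, iov-based and dmabuf-based registrations can coexist on the same domain.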

fi_multinode command line arguments changed. Update script to accommodate the change. Signed-off-by: Amir Shehata <[email protected]>

Set FT_OPT_ADDR_IS_OOB by default. It enables out-of-band address exchange, which is needed by CXI. Signed-off-by: Amir Shehata <[email protected]>

Signed-off-by: Peinan Zhang <[email protected]>

Add clarification in the man page indicating that the owner is responsible for creating unique fi_peer_*_contexts for each peer and that the peers are only allowed to set the peer ops of that context. Signed-off-by: Alexia Ingerson <[email protected]>

The peer API has been updated to specify that the owner must allocate the peer's fid_peer_srx. The shm implementation was allocating its own internal fid_peer_srx. This updates the shm implementation to assume it has a unique fid_peer_srx and to update the imported fid_peer_srx's peer_ops, saving a pointer to the fid_peer_srx instead of the internal fid_ep, which required a wrapper function to get back to the fid_peer_srx. It also returns an internal fid_ep for the created srx, which the owner uses to close the srx. Even though shm doesn't need anything attached to the internal fid_ep, it is there for consistency and to track domain reference counting for errors. This patch also moves the srx-specific functions into smr_domain, where they belong. Signed-off-by: Alexia Ingerson <[email protected]>

The previous definition of the peer API didn't specify who allocates the second peer structure (the one referenced by the peer). The shm implementation chose to duplicate the imported srx and set it internally. The new definition specifies that the owner handles duplicating the peer resource, which is then imported into the peer, which only needs to set it. Shm has been updated accordingly, but efa needs to be updated to create a second peer_srx and set its fields from the original one so that the peer references the owner_ops correctly. This also adds a missing fi_close for the shm srx resource. Signed-off-by: Alexia Ingerson <[email protected]> Signed-off-by: Shi Jin <[email protected]>

Signed-off-by: OFIWG Bot <[email protected]>

Signed-off-by: Zach Dworkin <[email protected]>

In order to receive unmap events, uffd uses 'mode missing' when registering memory regions. This implies getting page fault events as well. So handle them by returning a zero-filled page. Page faults come in 3 flavors: reads, writes and writes to protected pages. The only ones we can handle are writes to non-backed pages. Signed-off-by: Mike Uttormark <[email protected]> Signed-off-by: Ian Ziemba <[email protected]>
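
A sketch of the fault classification described above, using the `UFFD_PAGEFAULT_FLAG_WRITE` and `UFFD_PAGEFAULT_FLAG_WP` bits from `<linux/userfaultfd.h>` (with fallback definitions if the header is absent). `classify_fault` is a hypothetical helper; how the zero-filled page is actually supplied is not stated in the commit and is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#if defined(__has_include)
#if __has_include(<linux/userfaultfd.h>)
#include <linux/userfaultfd.h>
#endif
#endif

/* Fallbacks matching the kernel header, for systems without it. */
#ifndef UFFD_PAGEFAULT_FLAG_WRITE
#define UFFD_PAGEFAULT_FLAG_WRITE (1 << 0)
#endif
#ifndef UFFD_PAGEFAULT_FLAG_WP
#define UFFD_PAGEFAULT_FLAG_WP (1 << 1)
#endif

enum fault_action { FAULT_ZERO_FILL, FAULT_UNHANDLED };

/* flags is msg.arg.pagefault.flags from a UFFD_EVENT_PAGEFAULT message. */
static enum fault_action classify_fault(uint64_t flags)
{
	bool is_write = flags & UFFD_PAGEFAULT_FLAG_WRITE;
	bool is_wp = flags & UFFD_PAGEFAULT_FLAG_WP;

	/* Only writes to non-backed pages get a zero-filled page;
	 * reads and write-protect faults are left unhandled here. */
	if (is_write && !is_wp)
		return FAULT_ZERO_FILL;
	return FAULT_UNHANDLED;
}
```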

Bumps [actions/checkout](https://github.com/actions/checkout) from 4.2.0 to 4.2.1. - [Release notes](https://github.com/actions/checkout/releases) - [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md) - [Commits](actions/checkout@d632683...eef6144) --- updated-dependencies: - dependency-name: actions/checkout dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]>

Bumps [github/codeql-action](https://github.com/github/codeql-action) from 3.26.11 to 3.26.13. - [Release notes](https://github.com/github/codeql-action/releases) - [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md) - [Commits](github/codeql-action@6db8d63...f779452) --- updated-dependencies: - dependency-name: github/codeql-action dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]>

Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 4.4.1 to 4.4.3. - [Release notes](https://github.com/actions/upload-artifact/releases) - [Commits](actions/upload-artifact@604373d...b4b15b8) --- updated-dependencies: - dependency-name: actions/upload-artifact dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]>

Signed-off-by: Itai Masuari <[email protected]>

Fix incorrect atomic LOR on complex numbers. The values were incorrectly ANDed together instead of ORed. This went unnoticed because the code was very difficult to read. This also refactors the logical checks into a helper function to make them more readable and less error-prone. Signed-off-by: Alexia Ingerson <[email protected]>
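
A minimal sketch of a logical-OR helper for complex operands, assuming C99 `<complex.h>` types. `complex_is_true` and `complex_lor` are hypothetical names, not the ofi_atomic.c helpers. A complex value counts as true when either component is non-zero, and the operands are combined with `||`.

```c
#include <assert.h>
#include <complex.h>
#include <stdbool.h>

/* A complex value is "true" when its real or imaginary part is non-zero. */
static bool complex_is_true(float complex v)
{
	return crealf(v) != 0.0f || cimagf(v) != 0.0f;
}

/* Logical OR of two complex operands, yielding 1 or 0 as a complex value.
 * The bug described above amounted to combining the operands with &&. */
static float complex complex_lor(float complex a, float complex b)
{
	return (complex_is_true(a) || complex_is_true(b)) ? 1.0f : 0.0f;
}
```

Routing every logical op through one predicate keeps the "is this operand true" rule in a single place, which is the readability point the commit makes.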

This allows fabtests to make use of the atomic validation code. There were many Windows atomics bugs, inconsistencies, and missing definitions. This patch also cleans up the entire ofi_atomic.c implementation for Unix and Windows. The following changes are included:

- Separate fill and check based on real or complex types, as setting and reading complex values on Windows is not allowed (not a native datatype, abstracted). Complex versions use eq and set functions specific to complex types, defined in osd.h
- Remove duplicated ofi_complex definitions in ofi_atomic (already in the osd.h file)
- Add general check_atomic and fill_atomic calls and use them in ubertest
- Add an EXPAND ( x ) x define to work nicely with Windows VA_ARGS handling
- Fix inconsistency with ofi_complex_type/or naming ('complex' should always come first)
- Fix inconsistency with op names "equ" and "mul" -> "eq" and "prod"
- Add missing lxor complex op definitions on Windows

Signed-off-by: Alexia Ingerson <[email protected]>
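
The `EXPAND(x) x` define addresses a well-known quirk of MSVC's traditional preprocessor, which forwards `__VA_ARGS__` to another macro as a single argument. The sketch shows the usual pattern with a hypothetical argument-counting macro; it is a generic illustration, not the ofi_atomic macros themselves. On GCC and Clang the wrapper is a no-op.

```c
#include <assert.h>

/* Forces an extra rescan so __VA_ARGS__ is split into separate
 * arguments under MSVC's traditional preprocessor. */
#define EXPAND(x) x

/* Hypothetical macros: count 1 to 4 arguments. */
#define COUNT_ARGS_IMPL(_1, _2, _3, _4, N, ...) N
#define COUNT_ARGS(...) EXPAND(COUNT_ARGS_IMPL(__VA_ARGS__, 4, 3, 2, 1, 0))
```

Without the `EXPAND` wrapper, the traditional MSVC preprocessor would bind the whole `__VA_ARGS__` list to `_1`, so the count would always come out as 1.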

To properly validate atomic data, we need host bounce buffers for the result and compare buffers in addition to the regular bounce buffer for the tx/rx bufs. This adds two extra bufs allocated only for atomic purposes and adds hmem support to the common atomic validation path. It also renames the alloc/free_tx_buf calls to the generic alloc/free_host_bufs, which allocate all three buffers at once. Signed-off-by: Alexia Ingerson <[email protected]>
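
An all-or-nothing allocation of the three host buffers might look like this. The struct layout, the size parameters and the `-1` return code are assumptions made for the sketch; only the alloc/free_host_bufs names come from the commit.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical container for the three host bounce buffers. */
struct host_bufs {
	void *tx;      /* regular tx/rx bounce buffer */
	void *result;  /* atomic result buffer */
	void *compare; /* atomic compare buffer */
};

static void free_host_bufs(struct host_bufs *b)
{
	free(b->tx);
	free(b->result);
	free(b->compare);
	b->tx = b->result = b->compare = NULL;
}

/* Allocate all three buffers or none of them (sizes must be non-zero). */
static int alloc_host_bufs(struct host_bufs *b, size_t tx_size, size_t atomic_size)
{
	b->tx = malloc(tx_size);
	b->result = malloc(atomic_size);
	b->compare = malloc(atomic_size);
	if (!b->tx || !b->result || !b->compare) {
		free_host_bufs(b); /* free(NULL) is a no-op */
		return -1;
	}
	return 0;
}
```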

Signed-off-by: Jianxin Xiong <[email protected]>

Signed-off-by: Jessie Yang <[email protected]>

pingpong doesn't support FI_MR_ENDPOINT today, so the mr is associated with domain instead of ep. It is unsafe to close mr before closing ep because it can cause an EBUSY error when there are outstanding recvs of the mr posted to the ep/qp. This patch fixes this issue by moving the mr close after the ep close. Signed-off-by: Shi Jin <[email protected]>

Signed-off-by: Zach Dworkin <[email protected]>

Uplevel pre-build directory so that it is not scp'd Signed-off-by: Zach Dworkin <[email protected]>

Signed-off-by: Zach Dworkin <[email protected]>

Put slow stages first so they start executing and other tests can complete in parallel while the slow one is running. Signed-off-by: Zach Dworkin <[email protected]>

…ions fixed fabtests send and recv functions to use uint64_t instead of int for the flags argument, as the underlying fi calls use uint64_t. Removed the declaration of the unused function ft_writemsg from shared.h. Also fixed functions calling ft_sendmsg and ft_recvmsg to use uint64_t for flags. Signed-off-by: Nikhil Nanal <[email protected]>
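
The reason for the type change fits in a few lines: a flag in the upper 32 bits does not survive a trip through an `int` parameter. `SKETCH_FLAG_HIGH` and both wrapper functions are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag above bit 31, standing in for 64-bit libfabric flags. */
#define SKETCH_FLAG_HIGH (1ULL << 40)

/* Before: flags passed as int; with a 32-bit int the upper bits are lost
 * (the out-of-range conversion is implementation-defined). */
static uint64_t forward_flags_int(int flags)
{
	return (uint64_t)flags;
}

/* After: flags keep the same type as the underlying fi_* calls. */
static uint64_t forward_flags_u64(uint64_t flags)
{
	return flags;
}
```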

Look up all teams and users in the ofiwg GitHub team. If the submitter is not in the list of users, deny them. Signed-off-by: Zach Dworkin <[email protected]>

lpp includes stdatomic.h, but configure does not check for it, so the build can fail on a system without it. Signed-off-by: Alexia Ingerson <[email protected]>

Could result in a peer getting incorrectly unmapped. Signed-off-by: Alexia Ingerson <[email protected]>

This commit fixes the following bugs in neuron fabtests:

1. The neuron accelerator detection is broken on some OSs because the full path of the executable `neuron-ls` was not used.
2. Before this commit, each pytest worker was assigned a single neuron core. This works on multi-node tests but fails on single-node tests because a neuron core can only be opened by a single process. This commit assigns two different neuron cores to each pytest worker for client-server tests: one for the server and one for the client. Trn1 has 2 cores per neuron device and Trn2 has 8 cores per neuron device, so this assignment works for both.
3. When running in serial mode, the env var PYTEST_XDIST_WORKER is not set, so the NEURON_RT_VISIBLE_CORES env var is also not set. This causes the server to occupy all neuron cores, and the client fails. So this commit assigns device 0 to the server and client when running with one worker.

Signed-off-by: Sai Sunku <[email protected]>
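
One way to express the per-worker assignment in item 2 and the serial-mode fallback in item 3, written in C for consistency even though the fabtests harness is pytest. The mapping (worker `gwN` gets core `2N` for the server and `2N+1` for the client, and an unset worker id maps to worker 0, whose two cores sit on device 0 for both Trn1 and Trn2) is an assumption, not necessarily the exact scheme in the commit.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* "gw3" -> 3; NULL (serial mode, PYTEST_XDIST_WORKER unset) -> 0. */
static int worker_index(const char *worker_id)
{
	if (worker_id == NULL || strncmp(worker_id, "gw", 2) != 0)
		return 0;
	return atoi(worker_id + 2);
}

/* Hypothetical mapping: two distinct cores per worker, one per process. */
static void assign_cores(const char *worker_id, int *server_core, int *client_core)
{
	int w = worker_index(worker_id);

	*server_core = 2 * w;
	*client_core = 2 * w + 1;
}
```

In a test runner the id would come from `getenv("PYTEST_XDIST_WORKER")`, and each core number would become that process's `NEURON_RT_VISIBLE_CORES` value.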

Before this change, the EFA AV entry contained a reference to efa_rdm_peer which is specific to a given endpoint. This member also prevented binding a single AV to multiple endpoints. This change removes efa_rdm_peer from AV entry by adding a hashmap to the endpoint that maps fi_addr to efa_rdm_peer. And it also enables multiple EFA endpoints to bind to the same AV. Co-authored-by: Shi Jin <[email protected]> Signed-off-by: Sai Sunku <[email protected]>
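
A sketch of the per-endpoint lookup, using a tiny fixed-size open-addressing table in place of whatever hashmap EFA actually uses. `sketch_addr_t` stands in for `fi_addr_t`, and `struct sketch_peer` and the `peer_map_*` functions are hypothetical. Because each endpoint owns its map, two endpoints bound to the same AV get distinct peer objects for the same address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint64_t sketch_addr_t; /* stand-in for fi_addr_t */

/* Hypothetical per-endpoint peer state. */
struct sketch_peer {
	int ep_id;
};

#define PEER_MAP_CAP 64 /* power of two; no resizing or removal here */

struct peer_slot {
	sketch_addr_t addr;
	struct sketch_peer *peer;
	int used;
};

/* One map per endpoint, so the shared AV entry needs no peer pointer. */
struct peer_map {
	struct peer_slot slots[PEER_MAP_CAP];
};

static size_t slot_of(sketch_addr_t addr)
{
	/* Multiplicative hash; the mask keeps the index in range. */
	return (size_t)(addr * 0x9E3779B97F4A7C15ULL) & (PEER_MAP_CAP - 1);
}

/* Return the peer for addr, or NULL if this endpoint has none yet. */
static struct sketch_peer *peer_map_get(const struct peer_map *m, sketch_addr_t addr)
{
	size_t i = slot_of(addr);

	for (size_t n = 0; n < PEER_MAP_CAP; n++, i = (i + 1) & (PEER_MAP_CAP - 1)) {
		if (!m->slots[i].used)
			return NULL;
		if (m->slots[i].addr == addr)
			return m->slots[i].peer;
	}
	return NULL;
}

/* Return the endpoint's peer for addr, creating it on first use. */
static struct sketch_peer *peer_map_get_or_create(struct peer_map *m,
						   sketch_addr_t addr, int ep_id)
{
	size_t i = slot_of(addr);

	for (size_t n = 0; n < PEER_MAP_CAP; n++, i = (i + 1) & (PEER_MAP_CAP - 1)) {
		struct peer_slot *s = &m->slots[i];

		if (s->used && s->addr == addr)
			return s->peer;
		if (!s->used) {
			s->peer = calloc(1, sizeof(*s->peer));
			if (!s->peer)
				return NULL;
			s->peer->ep_id = ep_id;
			s->addr = addr;
			s->used = 1;
			return s->peer;
		}
	}
	return NULL; /* table full */
}

static void peer_map_destroy(struct peer_map *m)
{
	for (size_t i = 0; i < PEER_MAP_CAP; i++)
		if (m->slots[i].used)
			free(m->slots[i].peer);
}
```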

Bumps [github/codeql-action](https://github.com/github/codeql-action) from 3.27.6 to 3.27.9. - [Release notes](https://github.com/github/codeql-action/releases) - [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md) - [Commits](github/codeql-action@aa57810...df409f7) --- updated-dependencies: - dependency-name: github/codeql-action dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]>

Signed-off-by: Seth Zegelstein <[email protected]>

Signed-off-by: OFIWG Bot <[email protected]>

Remove all CXI_MAP_IOVA_ALLOC references from libfabric. Signed-off-by: Soumendu Satapathy <[email protected]>

Currently, when the local side supports unsolicited write recv but the peer doesn't, the peer crashes because it expects a valid wr_id for the IBV_WC_RECV_RDMA_WITH_IMM opcode. This peer crash can cause confusing error messages on the sender side's CQ while it is still sending data to the peer. When the local side doesn't support unsolicited write recv but the peer does, the local side also gets a CQ error for the RDMA op with "Unexpected status". This patch makes the initiator of an RDMA write with immediate data detect the unsolicited write recv support status on both sides. If they are inconsistent, the initiator returns an error with clear error messages that describe the mitigation. Signed-off-by: Shi Jin <[email protected]>
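
The initiator-side check reduces to comparing the two capability bits before posting the write. `check_unsolicited_write_recv` and its message text are hypothetical; the commit's actual error messages and mitigation advice are not reproduced.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical check run by the initiator before posting an RDMA write
 * with immediate data. Returns 0 if both sides agree, -EINVAL otherwise. */
static int check_unsolicited_write_recv(bool local_support, bool peer_support)
{
	if (local_support == peer_support)
		return 0;

	fprintf(stderr,
		"unsolicited write recv mismatch: local=%s peer=%s; "
		"configure both sides the same way\n",
		local_support ? "on" : "off", peer_support ? "on" : "off");
	return -EINVAL;
}
```

Failing on the initiator turns a remote crash or an opaque "Unexpected status" CQ error into an immediate, local error with a readable reason.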

efa_fork_support_enable_if_requested was moved to EFA_INI, so efa_fork_support_install_fork_handler can be registered at any later stage. Move efa_fork_support_install_fork_handler back to efa_domain_open to avoid installing the fork handler for non-EFA providers during fi_getinfo's provider discovery process. Signed-off-by: Jessie Yang <[email protected]>

This commit disables most Intel CI and should not be merged.

We may receive uring events before we're fully connected, so don't try to progress rx until the connection is established.
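
A minimal sketch of the guard, assuming a simple per-endpoint state enum; `enum ep_state`, `struct sketch_ep` and `on_uring_rx_event` are hypothetical names rather than the tcp provider's own types.

```c
#include <assert.h>

/* Hypothetical endpoint states. */
enum ep_state { EP_IDLE, EP_CONNECTING, EP_CONNECTED, EP_DISCONNECTED };

struct sketch_ep {
	enum ep_state state;
	int rx_progressed; /* counts how often rx progress actually ran */
};

static void progress_rx(struct sketch_ep *ep)
{
	ep->rx_progressed++;
}

/* Called when a uring completion for this endpoint's socket arrives. */
static void on_uring_rx_event(struct sketch_ep *ep)
{
	/* Events can arrive before the connection is fully established;
	 * skip rx progress until it is. */
	if (ep->state != EP_CONNECTED)
		return;
	progress_rx(ep);
}
```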

The previously used io_uring_prep_readv function does not support flags; instead, the flags were being passed as the offset, triggering an illegal seek error.
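
In liburing, `io_uring_prep_readv(sqe, fd, iovecs, nr_vecs, offset)` has no flags parameter, while `io_uring_prep_readv2` adds a trailing `flags` argument after `offset`. The same `(offset, flags)` pair appears in the `preadv2` system call, so the failure mode can be reproduced without io_uring: a value intended as flags lands in the offset slot, and a positional read on a socket fails with `ESPIPE` ("Illegal seek"). The sketch assumes Linux 4.6 or later with glibc 2.26 or later; `read_with` is a hypothetical helper.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Read into buf with preadv2(); returns the byte count or -errno.
 * offset -1 means "use the current file position", the only offset
 * that works on a non-seekable fd such as a socket. */
static ssize_t read_with(int fd, void *buf, size_t len, off_t offset, int flags)
{
	struct iovec iov = { .iov_base = buf, .iov_len = len };
	ssize_t n = preadv2(fd, &iov, 1, offset, flags);

	return n < 0 ? -errno : n;
}
```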

Multishot is not supported on older kernels (prior to 5.19) and is unreliable in early 6.x kernels. For now, use single-shot and re-submit.

ooststep force-pushed the uring branch 3 times, most recently from 9c8b7eb to 9ebf395 on August 14, 2024 18:34

ooststep force-pushed the uring branch from 9ebf395 to 2a04e17 on August 20, 2024 19:38

dependabot bot and others added 26 commits October 7, 2024 11:20

prov/efa: report correct inject_msg_size for zcpy rx

765cc16

When zcpy rx is on, both inject_msg_size and inject_rma_size should be reported as inline buf size. Signed-off-by: Shi Jin <[email protected]>

contrib/intel/jenkins: Temporarily disable psm3 in onecclgpu

e31af23

psm3 is failing onecclgpu because of a missing package. Disable it until the package dependency is resolved. Signed-off-by: Zach Dworkin <[email protected]>

fabtests/lpp: remove deprecated FI_MR_BASIC

c1820bf

Remove deprecated FI_MR_BASIC flag Signed-off-by: Tadeusz Struk <[email protected]>

fabtests: Update runmultinode.py with args

030d734

fi_multinode command line arguments changed. Update script to accommodate the change. Signed-off-by: Amir Shehata <[email protected]>

fabtests: fi_multinode update

cd63ccf

Set FT_OPT_ADDR_IS_OOB by default. It enables out-of-band address exchange, which is needed by CXI. Signed-off-by: Amir Shehata <[email protected]>

prov/hook/trace: Add trace log for domain_attr.

ed239a5

Signed-off-by: Peinan Zhang <[email protected]>

Updated nroff-generated man pages

64426b9

Signed-off-by: OFIWG Bot <[email protected]>

contrib/intel/jenkins: Split mpichtestsuite into multiple stages

d38a92e

Signed-off-by: Zach Dworkin <[email protected]>

contrib/intel/jenkins: Re-enable PSM3 to run in OneCCL-GPU

8e01a3f

Signed-off-by: Zach Dworkin <[email protected]>

use new synapse api

50a90d5

Signed-off-by: Itai Masuari <[email protected]>

j-xiong and others added 8 commits December 13, 2024 10:35

configure: Bump the version to 2.1.0a1

2d4ac0e

Signed-off-by: Jianxin Xiong <[email protected]>

prov/efa: Add unit tests for efa_rma

ebca5ec

Signed-off-by: Jessie Yang <[email protected]>

contrib/intel/jenkins: Update slurm partitions for new head node

2c200a4

Signed-off-by: Zach Dworkin <[email protected]>

contrib/intel/jenkins: Uplevel pre-build directory

2f94ead

Uplevel pre-build directory so that it is not scp'd Signed-off-by: Zach Dworkin <[email protected]>

contrib/intel/jenkins: Force Cleanup in Post

e4a7c57

Signed-off-by: Zach Dworkin <[email protected]>

contrib/intel/jenkins: Cleanup trailing whitespace

8c33b6f

Signed-off-by: Zach Dworkin <[email protected]>

contrib/intel/jenkins: Re-order stages to put slow ones first

e5fe96e

Put slow stages first so they start executing and other tests can complete in parallel while the slow one is running. Signed-off-by: Zach Dworkin <[email protected]>

ooststep force-pushed the uring branch 4 times, most recently from fedf1c5 to 7617ac3 on December 17, 2024 22:25

nikhilnanal and others added 17 commits December 18, 2024 13:07

contrib/intel/jenkins: Do not run pipeline for unauthorized users

482e474

Look up all teams and users in the ofiwg GitHub team. If the submitter is not in the list of users, deny them. Signed-off-by: Zach Dworkin <[email protected]>

prov/lpp: add check for atomics

fde8569

lpp includes stdatomic.h, but configure does not check for it, so the build can fail on a system without it. Signed-off-by: Alexia Ingerson <[email protected]>

prov/shm: fix name compare bug

442fa89

Could result in a peer getting incorrectly unmapped. Signed-off-by: Alexia Ingerson <[email protected]>

man/fi_setup: Complete partial sentence

7ae9698

Signed-off-by: Seth Zegelstein <[email protected]>

Updated nroff-generated man pages

90f3ba9

Signed-off-by: OFIWG Bot <[email protected]>

prov/cxi: Remove CXI_MAP_IOVA_ALLOC flag.

6e4daf1

Remove all CXI_MAP_IOVA_ALLOC references from libfabric. Signed-off-by: Soumendu Satapathy <[email protected]>

-- DO NOT MERGE --

b430479

This commit disables most Intel CI and should not be merged.

prov/tcp: enable all tests for io uring

46b50a4

prov/tcp: only progress rx when connected

0a3a39d

We may receive uring events before we're fully connected, so don't try to progress rx until the connection is established.

prov/tcp: use readv2 when passing flags to io uring

751056a

The previously used io_uring_prep_readv function does not support flags; instead, the flags were being passed as the offset, triggering an illegal seek error.

prov/tcp: don't use uring multishot

9905731

Multishot is not supported on older kernels (prior to 5.19) and is unreliable in early 6.x kernels. For now, use single-shot and re-submit.

ooststep force-pushed the uring branch from 7617ac3 to 9905731 on January 6, 2025 15:39
