uring enablement #8

Open
wants to merge 337 commits into
base: main
Conversation

ooststep
Owner

@ooststep ooststep commented Aug 9, 2024

No description provided.

dependabot bot and others added 26 commits October 7, 2024 11:20
Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 4.4.0 to 4.4.1.
- [Release notes](https://github.com/actions/upload-artifact/releases)
- [Commits](actions/upload-artifact@5076954...604373d)

---
updated-dependencies:
- dependency-name: actions/upload-artifact
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Bumps [github/codeql-action](https://github.com/github/codeql-action) from 3.26.10 to 3.26.11.
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](github/codeql-action@e2b3eaf...6db8d63)

---
updated-dependencies:
- dependency-name: github/codeql-action
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
When zcpy rx is on, both inject_msg_size and inject_rma_size
should be reported as inline buf size.

Signed-off-by: Shi Jin <[email protected]>
A provider may update the memory region that is added, for instance to
align the region to a larger page boundary. In such cases, the MR cache
info used to search the cache should use the updated region.

This allows the provider to avoid walking /proc/pid/smaps if the
underlying kernel component may more efficiently determine the backing
page size.

Signed-off-by: Steve Welch <[email protected]>
Signed-off-by: Ian Ziemba <[email protected]>
Applications should use the FI_MR_DMABUF API to pass the dmabuf
fd and offset so that Libfabric registers the mr via
dmabuf. The only exception is synapseai, because
dmabuf is the only way to register a Gaudi device buffer
and it was implemented before the FI_MR_DMABUF API.
Keep this behavior unchanged for compatibility.

Signed-off-by: Shi Jin <[email protected]>
psm3 is failing onecclgpu because of a missing package.
Disable it until the package dependency is resolved.

Signed-off-by: Zach Dworkin <[email protected]>
Remove deprecated FI_MR_BASIC flag

Signed-off-by: Tadeusz Struk <[email protected]>
This patch adds the missing inband sync in ft_fabric_init_cm
to handle the case where rx buffers are not pre-posted by
the application.
The default behaviour in fabtests is to pre-post an rx buffer.
This change enables fabtests using ft_fabric_init_cm to consume
the posted receive with an inband sync by setting the test option
FT_OPT_NO_PRE_POSTED_RX.

Similar changes have been made to ft_init_fabric
ofiwg#10394

Signed-off-by: Nikhil Nanal <[email protected]>
efa_mr_hmem_setup previously always called ofi_hmem_dev_register on all
FI_HMEM_CUDA calls, regardless of the presence of FI_MR_DMABUF in flags.
When gdrcopy is enabled, this means deconstructing the fi_mr_dmabuf into
a struct iovec from its {base, offset, len} 3-tuple, then passing the
resulting iovec to gdr_pin followed by gdr_map.

A dmabuf cannot be exported by the nvidia module without an implicit
promise that the address space is already reserved and mapped in the
current pid, of appropriate size and alignment, and that all
pages/ranges backing it can be made available to an importer. All
requirements are enforced by the cuda APIs used to acquire one.

At best, calls to libgdrcopy here are unnecessary for dmabufs, and at
worst the pgprots set by gdrdrv are different enough from the ones set up
by cuda proper to cause issues, or the redundant mappings become costly
for the driver to maintain.

Prior to this patch, apps could only prevent these gdr_map calls on dmabuf
arguments by disabling gdrcopy entirely through environment variables
before launch. But apps may wish to use fi_mr_regattr with dmabuf
arguments in the default case, while still reserving the right to call
fi_mr_regattr with iov arguments on the same domain, where the gdr flow
may still be desired in the latter case. This makes that possible.

Signed-off-by: Nicholas Sielicki <[email protected]>
fi_multinode command line arguments changed. Update script to
accommodate the change.

Signed-off-by: Amir Shehata <[email protected]>
Set FT_OPT_ADDR_IS_OOB by default. It enables out-of-band
address exchange, which is needed by CXI.

Signed-off-by: Amir Shehata <[email protected]>
Add clarification in the man page indicating that the owner is
responsible for creating unique fi_peer_*_contexts for each peer
and that the peers are only allowed to set the peer ops of that
context.

Signed-off-by: Alexia Ingerson <[email protected]>
The peer API has been updated to specify that the owner must allocate
the peer's fid_peer_srx. The shm implementation was allocating its
own internal fid_peer_srx.
This updates the shm implementation to assume it has a unique
fid_peer_srx and updates the imported fid_peer_srx peer_ops, saving
a pointer to the fid_peer_srx instead of the internal fid_ep which
required a wrapper function to get back to the fid_peer_srx

It also returns an internal fid_ep for the created srx which is used
to close the srx by the owner. Even though shm doesn't need anything
attached to the internal fid_ep, it is there for consistency and to
track the domain reference counting for errors.

This patch also moves the srx specific functions into smr_domain
where they belong

Signed-off-by: Alexia Ingerson <[email protected]>
The previous definition of the peer API didn't specify who allocated the
second peer structure (the one referenced by the peer). The shm implementation
chose to duplicate the imported srx and set it internally. The new
definition specifies that the owner handles the duplication of the peer resource,
which the peer then imports and simply sets. Shm has been updated accordingly,
but efa needs to be updated to create a second peer_srx and set the fields to the
original one so that the peer references the owner_ops correctly.

This also adds a missing fi_close for the shm srx resource

Signed-off-by: Alexia Ingerson <[email protected]>
Signed-off-by: Shi Jin <[email protected]>
In order to receive unmap events, uffd uses 'mode missing'
when registering memory regions. This implies getting page
fault events as well. So handle them by returning a zero-filled page.

Page faults come in 3 flavors: reads, writes and writes to protected
pages.  The only ones we can handle are writes to non-backed pages.
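
The three fault flavors correspond to the flag bits the kernel reports in a userfaultfd page-fault message. A minimal sketch of that classification, assuming (as the message states) that only plain write faults are resolved; `classify_fault` and `fault_is_handleable` are hypothetical names, not libfabric functions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <linux/userfaultfd.h>

enum fault_kind {
    FAULT_READ,           /* neither WRITE nor WP is set */
    FAULT_WRITE,          /* write to a page with no backing */
    FAULT_WRITE_PROTECT   /* write to a write-protected page */
};

/* Classify the flags field of a UFFD_EVENT_PAGEFAULT message. */
static enum fault_kind classify_fault(uint64_t pagefault_flags)
{
    if (pagefault_flags & UFFD_PAGEFAULT_FLAG_WP)
        return FAULT_WRITE_PROTECT;
    if (pagefault_flags & UFFD_PAGEFAULT_FLAG_WRITE)
        return FAULT_WRITE;
    return FAULT_READ;
}

/* Per the commit message, only writes to non-backed pages are handled. */
static bool fault_is_handleable(uint64_t pagefault_flags)
{
    return classify_fault(pagefault_flags) == FAULT_WRITE;
}
```

For example, `assert(fault_is_handleable(UFFD_PAGEFAULT_FLAG_WRITE))` holds, while a flags value of 0 (a read) is rejected.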

Signed-off-by: Mike Uttormark <[email protected]>
Signed-off-by: Ian Ziemba <[email protected]>
Bumps [actions/checkout](https://github.com/actions/checkout) from 4.2.0 to 4.2.1.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](actions/checkout@d632683...eef6144)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Bumps [github/codeql-action](https://github.com/github/codeql-action) from 3.26.11 to 3.26.13.
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](github/codeql-action@6db8d63...f779452)

---
updated-dependencies:
- dependency-name: github/codeql-action
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 4.4.1 to 4.4.3.
- [Release notes](https://github.com/actions/upload-artifact/releases)
- [Commits](actions/upload-artifact@604373d...b4b15b8)

---
updated-dependencies:
- dependency-name: actions/upload-artifact
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Signed-off-by: Itai Masuari <[email protected]>
Fix incorrect atomic LOR on complex numbers. The values were incorrectly
getting ANDed together instead of ORed. This went unnoticed because the
code was very difficult to read. This also refactors the logical checks
with a helper function to make it more readable and less prone to errors.
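
A minimal sketch of what a logical OR on complex values can look like once the truth test is pulled into a helper. The helper names and the choice of 1.0/0.0 as the result encoding are assumptions for illustration, not the code in ofi_atomic.c:

```c
#include <assert.h>
#include <complex.h>
#include <stdbool.h>

/* A complex value counts as "true" when it is not equal to zero,
 * that is, when its real or imaginary part is non-zero. */
static bool complex_is_true(double complex v)
{
    return v != 0.0;
}

/* Logical OR: true when either operand is true. The reported bug
 * corresponds to combining the two tests with && instead of ||. */
static double complex complex_lor(double complex dst, double complex src)
{
    return (complex_is_true(dst) || complex_is_true(src)) ? 1.0 : 0.0;
}
```

With this helper, `assert(complex_lor(0.0, 2.0 * I) == 1.0)` holds; an AND-based version would return 0 for the same inputs.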

Signed-off-by: Alexia Ingerson <[email protected]>
This allows fabtests to make use of atomic validation code

There were many Windows atomics bugs, inconsistencies, and missing
definitions. This patch also cleans up the entire ofi_atomic.c
implementation for unix and windows

The following changes are included:
- Separate fill and check based on real or complex types, as setting
  and reading complexes on Windows is not allowed (not a native datatype, abstracted).
  Complex versions use eq and set functions specific to complexes defined in osd.h
- Remove duplicated ofi_complex definitions in ofi_atomic (already in osd.h file)
- Add general check_atomic and fill_atomic calls and use them in ubertest
- Add EXPAND ( x ) x define to work nicely with Windows VA_ARGS handling
- Fix inconsistency with ofi_complex_type/or naming ('complex' should always come first)
- Fix inconsistency with op names "equ" and "mul" -> "eq" and "prod"
- Add missing lxor complex op definitions on Windows
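
The EXPAND item refers to a common workaround: the traditional MSVC preprocessor hands `__VA_ARGS__` to a nested macro as a single argument unless the invocation is rescanned. A small sketch of the pattern follows; `PICK_THIRD` and `HAS_TWO_ARGS` are hypothetical macros for illustration, and how ofi_atomic.c actually uses EXPAND is not shown here:

```c
#include <assert.h>

/* Forces one more expansion pass, so __VA_ARGS__ is split into
 * separate arguments before the inner macro is invoked. */
#define EXPAND(x) x

/* Hypothetical helper: selects its third argument. */
#define PICK_THIRD(a, b, c, ...) c

/* Evaluates to 1 when called with two arguments and 0 with one. */
#define HAS_TWO_ARGS(...) EXPAND(PICK_THIRD(__VA_ARGS__, 1, 0, 0))

static int one_arg(void)
{
    return HAS_TWO_ARGS(7);
}

static int two_args(void)
{
    return HAS_TWO_ARGS(7, 8);
}
```

On GCC and Clang the wrapper is a no-op, so the same definition can be shared across platforms; for instance, `assert(two_args() == 1)` holds.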

Signed-off-by: Alexia Ingerson <[email protected]>
To properly validate atomic data, we need host bounce buffers for the
result and compare buffers in addition to the regular bounce buffer for
the tx/rx bufs.
This adds two extra bufs allocated only for atomic purposes and adds hmem
support to the common atomic validation path.
It also renames the alloc/free_tx_buf calls to generic alloc/free_host_bufs
which allocates all three buffers at once.

Signed-off-by: Alexia Ingerson <[email protected]>
j-xiong and others added 8 commits December 13, 2024 10:35
pingpong doesn't support FI_MR_ENDPOINT today,
so the mr is associated with domain instead of ep.
It is unsafe to close mr before closing ep because
it can cause an EBUSY error when there are outstanding
recvs of the mr posted to the ep/qp. This patch fixes
this issue by moving the mr close after the ep close.

Signed-off-by: Shi Jin <[email protected]>
Uplevel pre-build directory so that it is not scp'd

Signed-off-by: Zach Dworkin <[email protected]>
Put slow stages first so they start executing and other tests
can complete in parallel while the slow one is running.

Signed-off-by: Zach Dworkin <[email protected]>
@ooststep ooststep force-pushed the uring branch 4 times, most recently from fedf1c5 to 7617ac3 Compare December 17, 2024 22:25
nikhilnanal and others added 17 commits December 18, 2024 13:07
…ions

Fixed fabtests send and recv functions to use uint64_t instead of int
as the flags argument type, as the underlying fi calls use uint64_t.
Removed the declaration of the unused function ft_writemsg from shared.h.

Also fixed functions calling ft_sendmsg and ft_recvmsg to use uint64_t for flags.
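
The reason the width matters can be sketched as follows: any flag bit above bit 31 is lost when the value travels through an `int`. `EXAMPLE_HIGH_FLAG` is a hypothetical constant, not a real libfabric flag, and the wrappers are stand-ins for illustration only:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag that lives above bit 31. */
#define EXAMPLE_HIGH_FLAG (1ULL << 58)

/* Old-style wrapper: an int parameter cannot carry the high bits. */
static uint64_t flags_via_int(int flags)
{
    return (uint64_t) flags;
}

/* New-style wrapper: flags keep their full 64-bit width. */
static uint64_t flags_via_u64(uint64_t flags)
{
    return flags;
}

/* Narrow a 64-bit flags value the way an int parameter would.
 * The conversion is implementation-defined; GCC and Clang keep
 * the low 32 bits. */
static int narrow_flags(uint64_t flags)
{
    return (int) flags;
}
```

With a value such as `EXAMPLE_HIGH_FLAG | 0x1`, the 64-bit wrapper returns it unchanged, whereas the narrowed path no longer contains `EXAMPLE_HIGH_FLAG`.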

Signed-off-by: Nikhil Nanal <[email protected]>
Look up all teams and users in the ofiwg GitHub team.
If the submitter is not in the list of users, deny them.

Signed-off-by: Zach Dworkin <[email protected]>
lpp includes stdatomic.h, but configure does not check for it,
so a build can fail on a system without it

Signed-off-by: Alexia Ingerson <[email protected]>
Could result in a peer getting incorrectly unmapped

Signed-off-by: Alexia Ingerson <[email protected]>
This commit fixes the following bugs in neuron fabtests:
1. The neuron accelerator detection is broken on some OSs because the
   full path of the executable `neuron-ls` was not used

2. Before this commit, each pytest worker was assigned a single neuron
   core. This works on multi node tests but fails on single node tests
   because a neuron core can only be opened by a single process. This
   commit assigns two different neuron cores to each pytest worker for
   client-server tests: one for the server and one for the client. Trn1 has
   2 cores per neuron device and Trn2 has 8 cores per neuron device, so
   this assignment works for both.

3. When running in serial mode, the env var PYTEST_XDIST_WORKER is not
   set, so the NEURON_RT_VISIBLE_CORES env var is also not set. This
   causes the server to occupy all neuron cores and the client fails. So
   this commit assigns device 0 to the server and client when running with
   one worker.

Signed-off-by: Sai Sunku <[email protected]>
Before this change, the EFA AV entry contained a reference to
efa_rdm_peer which is specific to a given endpoint. This member also
prevented binding a single AV to multiple endpoints.

This change removes efa_rdm_peer from the AV entry by adding a hashmap
to the endpoint that maps fi_addr to efa_rdm_peer. It also
enables multiple EFA endpoints to bind to the same AV.
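
The idea of keeping peer state in a per-endpoint map can be sketched as below. This is a simplified illustration with hypothetical types and a fixed-size chained table; it is not the EFA provider's data structure:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define PEER_MAP_BUCKETS 64   /* hypothetical fixed size for the sketch */

struct peer {                 /* stand-in for endpoint-specific peer state */
    uint64_t addr;            /* fi_addr-like key */
    int tx_count;             /* example of per-endpoint state */
    struct peer *next;        /* bucket chain */
};

struct endpoint {
    struct peer *buckets[PEER_MAP_BUCKETS];
};

/* Find the peer for addr on this endpoint, creating it on first use. */
static struct peer *ep_get_peer(struct endpoint *ep, uint64_t addr)
{
    struct peer **slot = &ep->buckets[addr % PEER_MAP_BUCKETS];
    struct peer *p;

    for (p = *slot; p; p = p->next)
        if (p->addr == addr)
            return p;

    p = calloc(1, sizeof(*p));
    if (!p)
        return NULL;
    p->addr = addr;
    p->next = *slot;
    *slot = p;
    return p;
}

/* Release every peer owned by the endpoint. */
static void ep_free_peers(struct endpoint *ep)
{
    for (size_t i = 0; i < PEER_MAP_BUCKETS; i++) {
        while (ep->buckets[i]) {
            struct peer *p = ep->buckets[i];
            ep->buckets[i] = p->next;
            free(p);
        }
    }
}
```

Because the map lives in the endpoint, two endpoints that resolve the same address through a shared AV each get their own peer object, which is the property that a peer pointer stored in the AV entry could not provide.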

Co-authored-by: Shi Jin <[email protected]>
Signed-off-by: Sai Sunku <[email protected]>
Bumps [github/codeql-action](https://github.com/github/codeql-action) from 3.27.6 to 3.27.9.
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](github/codeql-action@aa57810...df409f7)

---
updated-dependencies:
- dependency-name: github/codeql-action
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Remove all CXI_MAP_IOVA_ALLOC references from libfabric.

Signed-off-by: Soumendu Satapathy <[email protected]>
Currently, when the local side supports unsolicited write recv but the
peer doesn't, the peer will crash because it expects to get
a valid wr_id for the IBV_WC_RECV_RDMA_WITH_IMM op code. This peer crash
can cause confusing error messages on the sender side's cq while it is still
sending data to the peer. When the local side doesn't support unsolicited
write recv but the peer does, the local side will also get a cq error for
the rdma op, reported as "Unexpected status".

This patch makes the initiator of rdma write imm
detect the unsolicited write recv support status on both sides. If
the two sides are inconsistent, the initiator will return an error with a
clear error message that describes the mitigation.

Signed-off-by: Shi Jin <[email protected]>
efa_fork_support_enable_if_requested was moved to EFA_INI, so
efa_fork_support_install_fork_handler can be registered at any
later stage. Move efa_fork_support_install_fork_handler
back to efa_domain_open to avoid installing the fork handler for non-EFA
providers during fi_getinfo's provider discovery process.

Signed-off-by: Jessie Yang <[email protected]>
This commit disables most Intel CI and should not be merged.

We may receive uring events before we're fully connected, so
don't try to progress rx until that connection is established.

The previously used io_uring_prep_readv function does not
support flags; instead, flags were being passed as an offset,
triggering an illegal seek error.

Multishot is not supported on older kernels (prior to 5.19) and is
unreliable in early 6.x kernels.

For now, use single-shot and re-submit.
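
The io_uring_prep_readv problem described above comes down to an argument slot mismatch: a readv-style call takes an offset where a recv-style call takes flags. The sketch below is a plain-syscall analogue using `preadv` and `recv` on a socketpair. It is not io_uring code and not the provider's code, and the helper names are hypothetical. On Linux, a positioned read on a socket fails with ESPIPE, whose message is "Illegal seek":

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Create a connected socket pair with one byte queued on sv[1]. */
static int make_pair_with_data(int sv[2])
{
    char byte = 'x';

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    if (write(sv[0], &byte, 1) != 1) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    return 0;
}

/* Pass a flags value in the offset slot of a positioned read.
 * Returns the resulting errno, or 0 if the read succeeded. */
static int positioned_read_errno(off_t flags_as_offset)
{
    int sv[2], err = 0;
    char out = 0;
    struct iovec iov = { .iov_base = &out, .iov_len = 1 };

    if (make_pair_with_data(sv) < 0)
        return errno;
    if (preadv(sv[1], &iov, 1, flags_as_offset) < 0)
        err = errno;
    close(sv[0]);
    close(sv[1]);
    return err;
}

/* Receive with an explicit flags argument. Returns bytes received. */
static ssize_t recv_with_flags(int flags)
{
    int sv[2];
    char out = 0;
    ssize_t n;

    if (make_pair_with_data(sv) < 0)
        return -1;
    n = recv(sv[1], &out, 1, flags);
    close(sv[0]);
    close(sv[1]);
    return n;
}
```

In this analogue, `positioned_read_errno(MSG_DONTWAIT)` reports ESPIPE while `recv_with_flags(MSG_DONTWAIT)` receives the queued byte, which mirrors why the fix moves to a call that accepts flags.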