PR #21708: NUMA-pin host memory buffers for D2H/H2D transfers #22243
+177 −97
Imported from GitHub PR #21708
This ensures that the pinned host buffers used for transfers between host and device are allocated on the NUMA node closest to the device. This change had a previous life as #15216.
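For context, here is a minimal sketch of the general technique on Linux, assuming libnuma and the CUDA runtime are available. The helper names (`DeviceNumaNode`, `AllocatePinnedOnNode`) are hypothetical and this is not the code added by the PR, which goes through XLA's own host allocators:

```cpp
// Minimal sketch, not the PR's actual code. Assumes Linux, libnuma and the
// CUDA runtime; the helper names here are hypothetical.
#include <algorithm>
#include <cctype>
#include <cstdlib>
#include <fstream>
#include <string>
#include <cuda_runtime.h>
#include <numa.h>

// Find the NUMA node closest to `device` by asking sysfs for the NUMA
// affinity of the GPU's PCI device (-1 means "unknown / no affinity").
int DeviceNumaNode(int device) {
  char bus_id[32] = {};
  if (cudaDeviceGetPCIBusId(bus_id, sizeof(bus_id), device) != cudaSuccess) return -1;
  std::string path(bus_id);  // e.g. "0000:3B:00.0"; sysfs wants lowercase hex
  std::transform(path.begin(), path.end(), path.begin(),
                 [](unsigned char c) { return std::tolower(c); });
  std::ifstream f("/sys/bus/pci/devices/" + path + "/numa_node");
  int node = -1;
  f >> node;
  return node;
}

// Allocate `size` bytes of host memory bound to `node` (falling back to a
// plain 256-byte-aligned allocation when NUMA is unavailable, mirroring the
// fallback mentioned in the commit log below), then page-lock the range so
// the device can DMA directly to/from it.
void* AllocatePinnedOnNode(size_t size, int node) {
  const bool use_numa = node >= 0 && numa_available() >= 0;
  void* ptr = use_numa ? numa_alloc_onnode(size, node)  // page-aligned, bound to node
                       : std::aligned_alloc(256, (size + 255) / 256 * 256);
  if (ptr == nullptr) return nullptr;
  if (cudaHostRegister(ptr, size, cudaHostRegisterDefault) != cudaSuccess) {
    if (use_numa) numa_free(ptr, size); else std::free(ptr);
    return nullptr;
  }
  return ptr;
}
```

The important property is that the placement decision is made per device inside the process, which is what a single process driving all GPUs needs; deallocation would use `cudaHostUnregister` followed by `numa_free` or `std::free`, depending on which path allocated the buffer.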
In a benchmark that triggers large, concurrent copies from all devices to the host, the achieved D2H throughput is around 33 GiB/s with NUMA pinning on a DGX H100 node (2x CPU, 8x H100). Without pinning, the same benchmark achieves around 13.5 GiB/s.
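A sketch of the kind of benchmark described (hypothetical, not the actual benchmark behind the numbers above): one large device-to-host copy per visible GPU, all launched before any synchronization so that the copies overlap.

```cpp
// Hypothetical benchmark sketch: concurrent D2H copies from every GPU.
#include <chrono>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

int main() {
  int n = 0;
  cudaGetDeviceCount(&n);
  const size_t bytes = 1ull << 30;  // 1 GiB per device (illustrative size)
  std::vector<void*> src(n), dst(n);
  std::vector<cudaStream_t> streams(n);
  for (int d = 0; d < n; ++d) {
    cudaSetDevice(d);
    cudaMalloc(&src[d], bytes);
    cudaMallocHost(&dst[d], bytes);  // pinned; ideally NUMA-placed as in this PR
    cudaStreamCreate(&streams[d]);
  }
  auto t0 = std::chrono::steady_clock::now();
  for (int d = 0; d < n; ++d) {  // launch every copy first ...
    cudaSetDevice(d);
    cudaMemcpyAsync(dst[d], src[d], bytes, cudaMemcpyDeviceToHost, streams[d]);
  }
  for (int d = 0; d < n; ++d) {  // ... then wait, so the copies run concurrently
    cudaSetDevice(d);
    cudaStreamSynchronize(streams[d]);
  }
  double s = std::chrono::duration<double>(std::chrono::steady_clock::now() - t0).count();
  std::printf("aggregate D2H throughput: %.1f GiB/s\n", n * bytes / s / (1ull << 30));
  return 0;
}
```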
While correct NUMA pinning can already be achieved in process-per-GPU and process-per-NUMA-node configurations using numactl or similar, achieving correct pinning in the process-per-node configuration requires logic inside XLA.

Copybara import of the project:
--
1a2d98b by Olli Lupton [email protected]:
NUMA-pin host memory buffers for D2H/H2D transfers
--
60a4659 by Olli Lupton [email protected]:
256 byte alignment for host allocations when NUMA is not enabled
--
839da45 by Olli Lupton [email protected]:
Address review comments
--
b61ce94 by Olli Lupton [email protected]:
Drop TENSORFLOW_USE_NUMA
--
793fde0 by Olli Lupton [email protected]:
std::string_view -> absl::string_view
Merging this change closes #21708
FUTURE_COPYBARA_INTEGRATE_REVIEW=#21708 from olupton:numa 793fde0