non-hipGraph MSCCL++ tests for allReduce and allGather #1503

Open
wants to merge 13 commits into
base: develop

Conversation

isaki001
Contributor

Details

Do not mention proprietary info or link to internal work items in this PR.

Work item: "Internal", or link to GitHub issue (if applicable).

What were the changes?
Added functional tests for allGather and allReduce that use the MSCCL++ kernels in non-hipGraph mode, with and without managed memory.

Why were the changes made?
There was no test covering user-buffer registration in non-hipGraph mode.

How was the outcome achieved?
The TestBed infrastructure was encountering a hang. As a workaround, I added a simple routine that creates 8 processes through fork(), each of which calls allReduce/allGather.
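
For reviewers who want a picture of the fork-based approach, here is a minimal sketch of the general pattern, not the code in this PR. The helper name `runInForkedRanks` and the `body(rank, nranks)` callback are hypothetical; in a real test the callback would set up the communicator and issue the allReduce/allGather call, which is omitted here so the sketch stays free of HIP/RCCL dependencies.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <cassert>
#include <cstdio>
#include <functional>
#include <vector>

// Hypothetical helper (not part of RCCL): forks `nranks` child processes,
// runs `body(rank, nranks)` in each child, and returns how many ranks failed.
// A rank counts as failed if its body returns non-zero or it exits abnormally.
int runInForkedRanks(int nranks, const std::function<int(int, int)>& body) {
  assert(nranks > 0);
  std::vector<pid_t> children;
  children.reserve(nranks);
  int failures = 0;

  for (int rank = 0; rank < nranks; ++rank) {
    std::fflush(nullptr);  // flush stdio so children do not replay buffered output
    pid_t pid = fork();
    if (pid < 0) {
      std::perror("fork");
      ++failures;  // stop launching further ranks, but still reap those started
      break;
    }
    if (pid == 0) {
      // Child process: run this rank's work, then exit without returning
      // into the parent's test harness.
      int rc = body(rank, nranks);
      std::fflush(nullptr);
      _exit(rc == 0 ? 0 : 1);
    }
    children.push_back(pid);  // parent: remember the child so it can be reaped
  }

  // Parent: wait for every child and count the ones that did not exit cleanly.
  for (pid_t pid : children) {
    int status = 0;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status) ||
        WEXITSTATUS(status) != 0) {
      ++failures;
    }
  }
  return failures;
}
```

A test would call something like `runInForkedRanks(8, body)` and assert that the result is 0. Reporting per-rank failures through exit codes keeps the parent simple, at the cost of losing any detail beyond pass/fail.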

Additional Documentation:
What else should the reviewer know?

Approval Checklist

Do not approve until these items are satisfied.

  • Verify the CHANGELOG has been updated, if
    • there are any NCCL API version changes,
    • any changes impact library users, and/or
    • any changes impact any other ROCm library.

src/register.cc (outdated, resolved)
Collaborator

@corey-derochie-amd corey-derochie-amd left a comment

Please give the PR a more descriptive title, as this will become the commit message.

src/misc/msccl/msccl_lifecycle.cc (outdated, resolved)
src/misc/msccl/msccl_lifecycle.cc (outdated, resolved)
@isaki001 changed the title from "Test work" to "non-hipGraph MSCCL++ tests for allReduce and allGather" Jan 23, 2025
@corey-derochie-amd corey-derochie-amd dismissed their stale review January 24, 2025 18:56

Request completed

test/AllReduceTests.cpp (outdated, resolved)
@corey-derochie-amd corey-derochie-amd dismissed their stale review January 28, 2025 22:47

More changes needed.

corey-derochie-amd and others added 2 commits January 28, 2025 15:48
…tandaloneUtils is for the Standalone tests. Renamed the functions to be slightly more accurate and follow existing naming conventions.
2 participants