-
Notifications
You must be signed in to change notification settings - Fork 10.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Initial import of OpenSHMEM support #4571
base: master
Are you sure you want to change the base?
Conversation
Be warned that the current MPI implementation is broken, see #3334 for a mostly working implementation. There is still an issue on that branch related to the KV cache operations. There are further, smaller bugs here and there that I've fixed in a different branch of mine, so if you have any problems with your implementation you should be able to look at my branches for possible solutions |
@AutonomicPerfectionist cool, will review #3334 and make modifications as necessary. thank you for the heads up. |
@AutonomicPerfectionist is your mpi example program a sound representative sample of the main program driver? ( |
That was mainly a playground example I was using to verify my fixes worked, I plan on removing it as soon as I have time. The regular main example on that branch should work with MPI |
@AutonomicPerfectionist other question, I read there's a bug in the MPI code related to the 0 rank and the other ranks. Something about an end of stream token, could you expand on that issue? |
The original design didn't have a way to cleanly terminate the worker nodes when the head node exited. It's not really a bug to do with tokens, just that there was no way to signal that it's time to exit. The code on my PR doesn't fix that as it's out of scope for it, but I did fix it on a different branch of mine by using tags and MPI_Probe to determine the type of message being received |
@AutonomicPerfectionist interesting, so the compute ranks (ranks != 0) are in a loop, and for the compute ranks to break from the loop, they need a "break message". Then they break the loop and call the MPI finalization function wrapper. Which of your branches has a fix? I'd like to look it over, much appreciated! |
Yep, that's exactly right. The branch that has a fix is |
This PR provides llama.cpp support for OpenSHMEM as per #4570 . The implementation provides cmake scripts to detect OpenSHMEM and PMI, makefile support, updates to the README.md, and the openshmem support implementation in C. The implementation is modeled after the existing MPI support.