Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial import of OpenSHMEM support #4571

Open
wants to merge 16 commits into
base: master
Choose a base branch
from

Conversation

ct-clmsn
Copy link

@ct-clmsn ct-clmsn commented Dec 21, 2023

This PR provides llama.cpp support for OpenSHMEM as per #4570 . The implementation provides cmake scripts to detect OpenSHMEM and PMI, makefile support, updates to the README.md, and the openshmem support implementation in C. The implementation is modeled after the existing MPI support.

@ct-clmsn ct-clmsn mentioned this pull request Dec 21, 2023
@AutonomicPerfectionist
Copy link
Contributor

Be warned that the current MPI implementation is broken, see #3334 for a mostly working implementation. There is still an issue on that branch related to the KV cache operations. There are further, smaller bugs here and there that I've fixed in a different branch of mine, so if you have any problems with your implementation you should be able to look at my branches for possible solutions

@ct-clmsn
Copy link
Author

@AutonomicPerfectionist cool, will review #3334 and make modifications as necessary. thank you for the heads up.

@ct-clmsn
Copy link
Author

@AutonomicPerfectionist is your mpi example program a sound representative sample of the main program driver? (int main)

@AutonomicPerfectionist
Copy link
Contributor

That was mainly a playground example I was using to verify my fixes worked, I plan on removing it as soon as I have time. The regular main example on that branch should work with MPI

@ct-clmsn
Copy link
Author

@AutonomicPerfectionist other question, I read there's a bug in the MPI code related to the 0 rank and the other ranks. Something about an end of stream token, could you expand on that issue?

@AutonomicPerfectionist
Copy link
Contributor

The original design didn't have a way to cleanly terminate the worker nodes when the head node exited. It's not really a bug to do with tokens, just that there was no way to signal that it's time to exit. The code on my PR doesn't fix that as it's out of scope for it, but I did fix it on a different branch of mine by using tags and MPI_Probe to determine the type of message being received

@ct-clmsn
Copy link
Author

@AutonomicPerfectionist interesting, so the compute ranks (ranks != 0) are in a loop, and for the compute ranks to break from the loop, they need a "break message". Then they break the loop and call the MPI finalization function wrapper. Which of your branches has a fix? I'd like to look it over, much appreciated!

@AutonomicPerfectionist
Copy link
Contributor

Yep, that's exactly right. The branch that has a fix is mpi-speculative, but be warned that this branch is extremely volatile and is not designed to be merged upstream. It's my research branch for my master's degree, and many of the design decisions were made in light of significant time constraints. There's also a significant number of changes on that branch, most of which are undocumented

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants