Can you migrate the new server from llama.cpp to here? #51
Comments
I'm currently working on three larger extensions. By now falcon is quite a bit more advanced in some features: finetune handling and syntax, stopwords, and some bugfixes. All of that is likely missing from llama.cpp's server. I'm also working on large-context tests (such as processing 64k tokens and more while keeping the model sane), a fully evaluated system prompt, and replacing context swapping with continued generation. I think the server is useful but maybe not needed. Porting the server itself is probably not a huge job, but properly integrating the new features as well is a bit bigger.
I can see that, and it is not the only thing that makes ggllm.cpp the more interesting single project for me at the moment. In the end, ggllm.cpp may finish a large-context build for falcon sooner than llama.cpp does for llamas. As for the server: if this server could be addressed in the same way as llama.cpp's, then programs that already use that server could use falcon immediately without any change, and #52 would resolve itself. I started looking at the code over the last two days, and porting the server seemed like a good occasion to dig in, because it could work even if I don't understand large parts of the code.
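To make the compatibility point concrete: a client written against llama.cpp's HTTP server would keep working unchanged as long as a falcon server exposed the same interface. The sketch below assumes the `/completion` endpoint and JSON body that the llama.cpp server example exposes, and talks to it with cpp-httplib (the header-only HTTP library bundled with that example); host, port, and prompt are placeholders.

```cpp
// Minimal client sketch against a llama.cpp-style server over HTTP.
// Assumption: a falcon server would expose the same POST /completion
// endpoint with a JSON body, so this client would not need to change.
#include <iostream>
#include <string>
#include "httplib.h"   // cpp-httplib, bundled with llama.cpp's server example

int main() {
    httplib::Client cli("localhost", 8080);   // placeholder host/port

    // The same request body a llama.cpp server client would send today.
    const std::string body =
        R"({"prompt": "Write a haiku about falcons.", "n_predict": 64})";

    if (auto res = cli.Post("/completion", body, "application/json")) {
        std::cout << res->body << std::endl;  // JSON response with the generated text
    } else {
        std::cerr << "request failed" << std::endl;
        return 1;
    }
    return 0;
}
```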
If I understand it correctly, this PR ensures that when the server is accessed by different clients, each client gets its own context. That change is important in itself for offering a server function in a meaningful way, and implementing it touched all the important libraries and example programs. Whether this would be a suitable project for becoming more familiar with the codebase as a whole (though the machine-learning and neural-network-layer parts still remain largely foreign to me), or whether such a change will completely overwhelm me, I don't know yet - but if I do:
It would probably be best to modularize it slightly, so that main() is always a server-type loop. Regarding the name changes: whenever I touch something that needs a modification from the original behavior, I usually rename it to falcon. Regarding PRs: if you take a llama server example, look at the differences between that loop and the falcon_main loop, implement/adapt as much as you can, and provide a working version, that's appreciated. It's basically a separate example to work on. Regarding the "stateless" part: we are mostly there already; model loading and context creation are split.
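As a rough illustration of that "stateless" point - weights loaded once, a fresh context per client - here is a minimal sketch. The `falcon_load_model_from_file` / `falcon_new_context_with_model` names are assumptions modeled on llama.cpp's split API and are stubbed out so the snippet compiles on its own; the actual libfalcon symbols may differ.

```cpp
// Sketch of "load the model once, give every client its own context".
// The falcon_* names below are assumed, not taken from libfalcon.h; the
// two factory functions are stubbed so this compiles stand-alone.
#include <map>
#include <string>

struct falcon_model   {};  // weights, shared by all clients (assumed shape)
struct falcon_context {};  // per-client state such as the KV cache (assumed shape)

static falcon_model * falcon_load_model_from_file(const char * /*path*/) {
    return new falcon_model{};            // stand-in for the real loader
}
static falcon_context * falcon_new_context_with_model(falcon_model * /*model*/) {
    return new falcon_context{};          // stand-in for the real context factory
}

struct server_state {
    falcon_model * model = nullptr;
    std::map<std::string, falcon_context *> client_ctx;  // one context per client

    falcon_context * context_for(const std::string & client_id) {
        auto it = client_ctx.find(client_id);
        if (it == client_ctx.end()) {
            // The weights already live in `model`, so per-client contexts stay cheap.
            it = client_ctx.emplace(client_id,
                                    falcon_new_context_with_model(model)).first;
        }
        return it->second;
    }
};

int main() {
    server_state srv;
    srv.model = falcon_load_model_from_file("falcon-7b.ggml");  // hypothetical path
    falcon_context * ctx_a = srv.context_for("client-a");
    falcon_context * ctx_b = srv.context_for("client-b");
    return ctx_a != ctx_b ? 0 : 1;  // distinct clients get distinct contexts
}
```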
In the past few days, the server example from llama.cpp has become a really useful piece of software - so much so that for many things it could replace the main program as the primary tool for interacting with a model.
How difficult will it be to make this server available for falcon as well?
I have no idea how much falcon-specific code is actually in falcon_main - shouldn't most of the specific stuff be in the libraries, especially falcon_common and libfalcon?
How much is left to do once you've changed all the external calls in server.cpp to the corresponding calls from falcon_common and libfalcon?
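For illustration only, here is what the mechanical part of that substitution could look like if libfalcon mirrors llama.cpp's API with `falcon_`-prefixed names - an assumption to verify against libfalcon.h, not a statement about the actual headers. A thin alias layer like this would let most of server.cpp compile with few textual changes:

```cpp
// falcon_compat.h - hypothetical shim for porting server.cpp (sketch only).
// Every falcon_* symbol referenced here is assumed, not verified; check
// libfalcon.h and falcon_common for the real names and signatures.
#pragma once
#include "libfalcon.h"              // assumed header name in ggllm.cpp

// Route the llama_* names server.cpp already uses to their (assumed)
// libfalcon counterparts, so only genuine behavioral differences need edits.
#define llama_context        falcon_context
#define llama_tokenize       falcon_tokenize
#define llama_eval           falcon_eval
#define llama_token_to_str   falcon_token_to_str
```

If something like that works, the remaining effort would presumably sit in the parts that have no llama.cpp counterpart, such as the falcon-specific finetune, stopword, and sampling handling mentioned above.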