Choosing the model via the POST request when making an API call #7
Comments
If I understand you correctly, you want the model to be loaded as specified by the client side? So, something like: `response = client.chat.completions.create(model="OpenGVLab/InternVL2-Llama3-76B", messages=messages, **params)`. This is a bit complex because you can't specify options like --load-in-4bit, flash-attn, etc. It would probably need a model-specific default config that is loaded along with the request. I'm working on a system for this with the openedai-image server, but I'm not really happy yet with how complex it is.
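For illustration, a model-specific default config might look something like the sketch below; the model names and option keys are hypothetical examples, not the project's actual settings:

```python
# Hypothetical sketch only: map each model name to the load options it should
# default to when a client requests it via the API.
MODEL_DEFAULTS = {
    "OpenGVLab/InternVL2-Llama3-76B": {"load_in_4bit": True, "use_flash_attn": True},
    "OpenGVLab/InternVL2-8B": {"load_in_4bit": False, "use_flash_attn": True},
}

def resolve_load_options(requested_model: str) -> dict:
    # Fall back to conservative defaults for models without an explicit entry.
    return MODEL_DEFAULTS.get(requested_model, {"load_in_4bit": True, "use_flash_attn": False})
```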
Yes, correct. To keep it simple, we could set some default values such as --load-in-4bit, flash-attn, etc. for all models to start with. Based on the request it receives, the server would download the model and get it ready to be served (which means the first API call would take some time to return a response).
Just FYI, I think the openai client times out after about 30 or 60 seconds by default, so this likely won't work well unless the model is very small. What about a web UI instead? I just don't think the API is well suited for model management, but I do admit it's a nice feature.
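That said, the openai Python client's request timeout can be raised; a minimal sketch, assuming the server is running locally (the base URL, port, and API key below are placeholders):

```python
from openai import OpenAI

# Assumed local server address and a placeholder API key; adjust for your setup.
client = OpenAI(
    base_url="http://localhost:5006/v1",
    api_key="sk-ip",
    timeout=600.0,  # seconds; allows a cold model load to finish before timing out
)

response = client.chat.completions.create(
    model="OpenGVLab/InternVL2-Llama3-76B",
    messages=[{"role": "user", "content": "Hello"}],
)
```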
Sure, I get you, that's fine. It was just another thought; if it's not well suited and too complicated, then we don't need to do this. These are just ideas. Yes, a web UI is fine instead. Thanks :)
Here is how it is done by llama.cpp: https://github.com/Jaimboh/Llama.cpp-Local-OpenAI-server/blob/main/README.md ("Multiple Model Load with Config", `cat config.json`).
You can preload an array of models as specified in config.json, and it is smart enough to swap in the right model as specified in the client request.
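For reference, a config.json along those lines could be generated like this; the file paths, aliases, and exact field names are illustrative, not the authoritative llama-cpp-python schema:

```python
import json

# Illustrative only: preload several models under aliases that clients can
# request by name in the "model" field of their API calls.
config = {
    "host": "0.0.0.0",
    "port": 8000,
    "models": [
        {"model": "models/llama-3-8b-instruct.Q4_K_M.gguf", "model_alias": "llama-3-8b"},
        {"model": "models/mistral-7b-instruct.Q4_K_M.gguf", "model_alias": "mistral-7b"},
    ],
}

with open("config.json", "w") as f:
    json.dump(config, f, indent=2)
```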
Here is a blog post with a simple Streamlit GUI interface too: https://medium.com/@odhitom09/running-openais-server-locally-with-llama-cpp-5f29e0d955b7
I would love some kind of GUI for interacting with these multimodal models, especially initially, before I know how to automate it.
For manually interacting with the models, I can highly recommend either open-webui (via Docker, which also works with openedai-speech, whisper, images, etc. - I use this) or web.chatbox.app (can be used fully in the browser, without any installation); in either one you can configure an OpenAI API provider (with the API base URL) pointing at the server. For testing, I prefer the 'raw' text output from the included console app.
@matatonic Thanks again for providing the newest models so swiftly. |
It's doable, and I will probably do this along with model switching/selecting via the API in an upcoming release. It's a more significant change, and I'll need to update my testing as well, so it might take a bit longer. PS. I'm currently out of the country and have limited access to the internet.
Original issue description: Currently, providing the model is a required argument.
Aim: add the ability to choose the model when calling the API. This would be a great option; it gives additional flexibility.