LoRA support #51
Hello, thanks for raising this issue. Looking through the MLX LM implementation, it seems quite reasonable for us to add support for this.
Hi, thanks for your prompt response. Absolutely! Ideally, though, the web server would accept such an [optional] argument so that it can be passed as a parameter in the JSON payload, allowing requests like this:
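(A sketch of such a request, assuming a hypothetical adapter field in an OpenAI-style chat completions payload; this field is what is being requested here, not an existing LM Studio option, and the model name, port, and adapter path are placeholders.)

```python
import requests

# Hypothetical request shape: the "adapter" field is the optional parameter
# being requested in this issue, not part of LM Studio's current API.
payload = {
    "model": "mlx-community/Mistral-7B-Instruct-v0.3-4bit",  # placeholder model name
    "adapter": "path/to/my_lora_adapters",                   # requested optional field
    "messages": [
        {"role": "user", "content": "Summarize this report."}
    ],
    "temperature": 0.7,
}

# LM Studio's local server exposes an OpenAI-compatible endpoint (default port shown).
response = requests.post("http://localhost:1234/v1/chat/completions", json=payload)
print(response.json()["choices"][0]["message"]["content"])
```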
Please note the adapter parameter in the payload.
Would creating and using a fused model be a suitable alternative? There is more documentation here: https://github.com/ml-explore/mlx-examples/blob/main/llms/mlx_lm/LORA.md#fuse. I think it's non-trivial to enhance the API to make adapter selection a good user experience; the LM Studio team would need to design a process to register one or more adapters for each model so that the API request can stay as simple as passing an adapter name.
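(For reference, a rough sketch of the fused-model route using the mlx_lm.fuse entry point described in the linked LORA.md; the model name and paths are placeholders.)

```python
import subprocess

# Fuse LoRA adapter weights into the base model so the result can be served
# like any ordinary model. Model name and paths below are placeholders.
subprocess.run(
    [
        "python", "-m", "mlx_lm.fuse",
        "--model", "mlx-community/Mistral-7B-Instruct-v0.3-4bit",  # base model
        "--adapter-path", "adapters",                               # trained LoRA weights
        "--save-path", "fused_model",                               # output directory
    ],
    check=True,
)
```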
I have used the fused-model approach and it works, but it requires storing and managing a large number of near-identical models, whereas with MLX only the adapter needs to be loaded, especially if the base model is already in memory. Fused models don't scale either: imagine receiving multiple simultaneous requests, each pointing to a different fused model. Each one is treated as a completely different model, so the entire LLM has to be reloaded into memory even though another instance of it (differing only by the LoRA weights) is already loaded. With the MLX approach, the base model is loaded once and it's just a matter of loading a different LoRA adapter for each request, which makes a huge difference.
I will raise this feature request with the team so we can place it on our roadmap. In the meantime, using fused models seems to be a workable (though suboptimal and inflexible) way to use adapters. I will keep this issue open to track the feature request, since we can definitely improve the UX on this front.
Hi,
I use MLX to create LoRA adapters in .safetensors format.
MLX LM has a web server that can take the name/path of a LoRA adapter at inference time: when using the chat completions endpoint, the adapter is specified via the adapter parameter in the JSON payload.
It would be great to see the same supported in LM Studio.
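(For context, a minimal sketch of applying such an adapter with the mlx_lm Python API; the model name and adapter path are placeholders. The standalone server can likewise be launched with an adapter via its --adapter-path argument.)

```python
from mlx_lm import load, generate

# Load the base model and apply LoRA adapter weights saved as .safetensors.
# "adapters" is the directory produced by mlx_lm LoRA training (placeholder path).
model, tokenizer = load(
    "mlx-community/Mistral-7B-Instruct-v0.3-4bit",
    adapter_path="adapters",
)

print(generate(model, tokenizer, prompt="Hello!", max_tokens=64))
```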