Allow specifying adapter_id on chat/completions requests #2939

Open
tsvisab opened this issue Jan 22, 2025 · 4 comments
tsvisab commented Jan 22, 2025

Feature request

It seems that if I want to load a base model with an adapter and consume it, I have to use the /generate route, which is the only one that allows specifying adapter_id:

curl 127.0.0.1:3000/generate \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{
        "inputs": "Was \"The Office\" the funniest TV series ever?",
        "parameters": {
            "max_new_tokens": 200,
            "adapter_id": "tv_knowledge_id"
        }
    }'

but I can't do the same with v1/chat/completions.

Are you planning to support this?
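
For illustration, the request I'd like to be able to send would look something like this (the adapter_id field inside the chat/completions payload is hypothetical, it is what this issue is asking for):

curl 127.0.0.1:3000/v1/chat/completions \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{
        "messages": [{"role": "user", "content": "Was \"The Office\" the funniest TV series ever?"}],
        "max_tokens": 200,
        "adapter_id": "tv_knowledge_id"
    }'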

Motivation

Many users consume v1/chat/completions and train LoRA adapters for it.

Your contribution

Maybe, if you're over capacity.

@alvarobartt
Member

Hi @tsvisab, thanks for the question! That is indeed supported via the model parameter: if you provide the adapter_id as the model when sending the request, the loaded LoRA adapter will be used instead of the base model. If that doesn't work for you, I'm happy to reproduce your test and see if we can fix it 🤗 See an example cURL request:

curl http://localhost:8080/v1/chat/completions \
    -X POST \
    -d '{"messages":[{"role":"user","content":"What is Deep Learning?"}],"temperature":0.7,"top_p":0.95,"max_tokens":256,"model":"your-username/your-lora-adapter"}}' \
    -H 'Content-Type: application/json'

To send requests to the base model instead, just remove model or set it to the actual model value, e.g. meta-llama/Llama-3.1-8B-Instruct:

curl http://localhost:8080/v1/chat/completions \
    -X POST \
    -d '{"messages":[{"role":"user","content":"What is Deep Learning?"}],"temperature":0.7,"top_p":0.95,"max_tokens":256,"model":"meta-llama/Llama-3.1-8B-Instruct"}}' \
    -H 'Content-Type: application/json'

@tsvisab
Author

tsvisab commented Jan 28, 2025

Thanks! This definitely does something: when I use "model": "something that does not exist" it acts as the base model, but when I use the adapter key (i.e. my_adapter, given TGI was launched with --lora-adapters "my_adapter=/path/to/local/folder") it generates nonsense (e.g. a long run of !!!!!!!! characters), while merging the adapter into the base model works fine.
P.S.: I launched the model with the adapter folder rather than the path to the adapter file, does that make sense?
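
For reference, my launch looks roughly like the following; the image tag, base model id, and paths are placeholders for my actual values:

docker run --gpus all --shm-size 1g -p 3000:80 \
    -v /path/to/local/folder:/adapters/my_adapter \
    ghcr.io/huggingface/text-generation-inference:latest \
    --model-id meta-llama/Llama-3.1-8B-Instruct \
    --lora-adapters "my_adapter=/adapters/my_adapter"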

@alvarobartt
Member

So on startup you should be seeing something like the startup output shown in https://huggingface.co/docs/google-cloud/examples/gke-tgi-multi-lora-deployment, in case that tutorial is useful to you too!

@alvarobartt
Member

P.S. If the adapters are public on the Hub, I'll be happy to reproduce and let you know!
