Skip to content

b4406

Compare
Choose a tag to compare
@github-actions github-actions released this 02 Jan 14:41
0da5d86
server : allow using LoRA adapters per-request (#10994)

* slot.can_batch_with

* lora per request

* test: force disable cache prompt

* move can_batch_with check

* fix condition

* add slow test with llama 8b

* update docs

* move lora change task to queue

* Apply suggestions from code review

Co-authored-by: Georgi Gerganov <[email protected]>

* lora_base

* remove redundant check

---------

Co-authored-by: Georgi Gerganov <[email protected]>