It is worth mentioning https://github.com/predibase/lorax on this list. It was built on TGI v0.9.4 (when TGI still had the Apache license) to enable dynamic LoRA adapter loading at inference time.
Related: maybe it's also worth listing dynamic LoRA adapter loading as a feature in its own right. The two approaches I've seen so far are LoRAX (load the LoRA adapter from disk at request time) and S-LoRA (pre-load adapters into GPU memory and route the compute at request time).
I can start a PR to add LoRAX to the table.
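
To make the difference between the two approaches concrete, here is a toy sketch in plain Python. It is not how LoRAX or S-LoRA is implemented: the class names, the `fake_load_from_disk` helper, and the in-memory caching of an adapter after its first load are all assumptions made for illustration, and a list of floats stands in for real adapter weights.

```python
from typing import Callable, Dict, List

Adapter = List[float]  # stand-in for real LoRA weights (assumption)


class LazyAdapterCache:
    """Load-at-request-time idea: fetch an adapter the first time it is asked for."""

    def __init__(self, loader: Callable[[str], Adapter]) -> None:
        self._loader = loader
        self._cache: Dict[str, Adapter] = {}
        self.loads = 0  # how many times the loader was actually called

    def get(self, adapter_id: str) -> Adapter:
        if adapter_id not in self._cache:
            # First request for this adapter pays the load cost.
            self._cache[adapter_id] = self._loader(adapter_id)
            self.loads += 1
        return self._cache[adapter_id]


class PreloadedAdapterRouter:
    """Pre-load idea: every adapter is resident up front; requests only route."""

    def __init__(self, adapters: Dict[str, Adapter]) -> None:
        self._adapters = dict(adapters)

    def get(self, adapter_id: str) -> Adapter:
        # Raises KeyError for adapters that were never pre-loaded.
        return self._adapters[adapter_id]


def fake_load_from_disk(adapter_id: str) -> Adapter:
    """Hypothetical loader; a real one would read weight files from disk."""
    return [float(len(adapter_id))]


lazy = LazyAdapterCache(fake_load_from_disk)
lazy.get("customer-a")
lazy.get("customer-a")  # second call is served from the cache

router = PreloadedAdapterRouter({"customer-b": [1.0, 2.0]})
router.get("customer-b")
```

The trade-off the sketch is meant to show: the lazy variant can serve any adapter it can find but pays a load cost on first use, while the pre-loaded variant has no load cost at request time but is limited to the adapters it was started with.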