Make the best out of the race to the bottom
A convenience script we use internally that wraps a collection of providers offering cheap infrastructure for LLM inference.
If you like what we are doing:
- Leave a star on the repo
- Support us on buymeacoffee
- Check out our socials and follow us there
- Rename the .env.template file to .env and add the credentials for the providers you want to use (see the pricing and performance comparison below). A sketch of what the file might look like follows this list.
- You can either run the script directly to test the endpoints, or call the inference function from your own program logic.
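A minimal sketch of what the .env file might contain. The exact variable names depend on what llm_inference_script expects, so treat these keys as hypothetical placeholders:

```
# Hypothetical key names -- match whatever llm_inference_script actually reads
OPENROUTER_API_KEY=sk-or-...
TOGETHER_API_KEY=...
DEEPINFRA_API_KEY=...
```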
To use the inference function directly, copy the .env file and llm_inference_script to your project and import the function.
Example:

```python
from llm_inference_script import llm_inference
from dotenv import load_dotenv
import os

# Load the key for your chosen provider from .env
load_dotenv()
KEY = os.getenv("PROVIDER_API_KEY")  # put the correct provider name here

provider = "Together"  # name of the provider to call
prompt = "Hello, world"
output = llm_inference(provider, prompt, KEY)
print(output)
```
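Since the whole point is having several cheap providers on hand, a natural pattern is to fall back to the next provider when one fails. A minimal sketch built on the llm_inference signature above; the provider names and env-var names are assumptions, not guaranteed to match the script:

```python
import os
from dotenv import load_dotenv
from llm_inference_script import llm_inference

load_dotenv()

# Ordered roughly by the latency results below; env-var names are assumed
PROVIDERS = [
    ("Together", "TOGETHER_API_KEY"),
    ("OpenRouter", "OPENROUTER_API_KEY"),
    ("DeepInfra", "DEEPINFRA_API_KEY"),
]

def llm_inference_with_fallback(prompt):
    """Try each configured provider in order; return the first success."""
    for provider, env_var in PROVIDERS:
        key = os.getenv(env_var)
        if not key:
            continue  # provider not configured in .env
        try:
            return llm_inference(provider, prompt, key)
        except Exception as exc:
            print(f"{provider} failed ({exc}); trying next provider")
    raise RuntimeError("All providers failed or none were configured")
```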
As of 19-12-2023, processing time per request (seconds):

| Provider | Run 1 | Run 2 |
|---|---|---|
| OpenRouter | 4.35 | 3.52 |
| Anyscale | 3.77 | 3.73 |
| Together | 2.82 | 2.72 |
| DeepInfra | 19.72 | 18.04 |
| AbacusAI | 11.18 | 14.29 |
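I don't know exactly how the original numbers were collected, but a simple way to reproduce this kind of benchmark is to wrap llm_inference in a wall-clock timer, as in this sketch (prompt and key handling as in the example above):

```python
import time
from llm_inference_script import llm_inference

def time_provider(provider, prompt, key, runs=2):
    """Time llm_inference over a few runs, printing per-run wall-clock seconds."""
    for i in range(1, runs + 1):
        start = time.perf_counter()
        llm_inference(provider, prompt, key)
        elapsed = time.perf_counter() - start
        print(f"{provider} run {i}: {elapsed:.2f}s")
```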
Pricing:

| Provider | Price per Million Tokens | Free Credits |
|---|---|---|
| OpenRouter | $0.30–$0.70 | $1 |
| Anyscale | $0.50 | $10 |
| Together | $0.60 | $25 |
| DeepInfra | $0.27 | Unclear |
| AbacusAI | $0.30 | Unclear |
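To put the free credits in perspective: at Together's $0.60 per million tokens, its $25 of free credits covers roughly 41 million tokens before you pay anything.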
My recommendation: use Together for latency-critical inference and DeepInfra for non-latency-critical applications. These two gave me the most flexibility in what I can run, with reasonable pricing and a very good user experience.
Also check out this; I don't know if it works, but it seems interesting.