Token generation limit #1
Having the same issue. Matt probably has a way higher token limit than most of us!
Same issue.
I believe this error is due to OpenAI rate-limiting your requests. You can either increase your rate limit or adjust the retry backoff. To request a rate limit increase, submit the form here: https://docs.google.com/forms/d/e/1FAIpQLSc6gSL3zfHFlL6gNIyUcjkEv29jModHGxg5_XGyr-PrE2LaHw/viewform. You can also learn more about error mitigation for model rate limits: https://platform.openai.com/docs/guides/rate-limits/error-mitigation.
The problem with increasing the rate limit, for me, is the increase in cost. The code I posted above is basically a simple retry backoff, and I haven't had any issues with it. I was hoping for a solution that caps the output token count as it approaches the rate limit, but that messes up the outputs.
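(The retry-backoff snippet referenced above isn't preserved in this thread. A minimal sketch of that kind of exponential backoff, assuming the pre-1.0 `openai` Python SDK; the function name and retry parameters here are illustrative, not the original code:)

```python
import random
import time

import openai

def completion_with_backoff(max_retries=6, **kwargs):
    """Retry openai.ChatCompletion.create with exponential backoff on rate limits."""
    delay = 1.0
    for attempt in range(max_retries):
        try:
            return openai.ChatCompletion.create(**kwargs)
        except openai.error.RateLimitError:
            if attempt == max_retries - 1:
                raise
            # Sleep with jitter, doubling the delay on each attempt.
            time.sleep(delay * (1 + random.random()))
            delay *= 2

response = completion_with_backoff(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello"}],
)
```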
You can try using the LiteLLM Router if you have multiple deployments of the same model; this will let you increase your effective rate limit:

```python
import asyncio
import os

from litellm import Router

model_list = [{  # list of model deployments
    "model_name": "gpt-3.5-turbo",  # model alias used by the router
    "litellm_params": {  # params for the litellm completion/embedding call
        "model": "azure/chatgpt-v-2",  # actual model name
        "api_key": os.getenv("AZURE_API_KEY"),
        "api_version": os.getenv("AZURE_API_VERSION"),
        "api_base": os.getenv("AZURE_API_BASE"),
    },
}, {
    "model_name": "gpt-3.5-turbo",
    "litellm_params": {
        "model": "azure/chatgpt-functioncalling",
        "api_key": os.getenv("AZURE_API_KEY"),
        "api_version": os.getenv("AZURE_API_VERSION"),
        "api_base": os.getenv("AZURE_API_BASE"),
    },
}, {
    "model_name": "gpt-3.5-turbo",
    "litellm_params": {
        "model": "gpt-3.5-turbo",
        "api_key": os.getenv("OPENAI_API_KEY"),
    },
}]

router = Router(model_list=model_list)

async def main():
    # Drop-in replacement for openai.ChatCompletion.create
    response = await router.acompletion(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Hey, how's it going?"}],
    )
    print(response)

asyncio.run(main())
```
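The Router picks among all deployments registered under the same `"gpt-3.5-turbo"` alias, so requests are spread across them and the individual deployments' rate limits effectively combine.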
I was getting the RateLimitReached error yesterday (after around the 80th generation; each prompt is around 10,000 tokens). My simple workaround is below, but is there a better way?
Here is a local version: https://github.com/xiscoding/local_gpt_llm_trainer