[Bug]: Rate Limit Errors when using with PaperQA #7358
Hi @gurugecl,
Your error is coming from Google saying your resource is exhausted. This doesn't even look like it's coming from the router, just from the backend LLM API. Closing for now as I don't see a LiteLLM-specific issue. cc @jamesbraza, let me know if I'm missing something here.
Yes, but the issue is that the error comes on the very first request to Gemini, and no amount of exponential backoff or waiting helps, which I believe indicates something behind the scenes, either in PaperQA or LiteLLM, is exhausting the resource that quickly. Issue 7188 seems to indicate that Gemini Flash 2.0 is supported, and I even tried setting rate limits in the llm config according to the PaperQA docs (see the sketch below), but that did not seem to help.
However, if this is not a LiteLLM issue, I can reach out to PaperQA for further assistance.
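For reference, a rough sketch of the kind of rate-limit configuration I was trying, following the `llm_config`/`rate_limit` pattern from the PaperQA docs. The model name and limit string here are illustrative, not the verbatim snippet from my setup:

```python
from paperqa import Settings

# Illustrative sketch of a PaperQA rate-limit configuration; the limit value
# "30000 per 1 minute" is an example, not the exact number from my config.
settings = Settings(
    llm="gemini/gemini-2.0-flash-exp",
    llm_config={
        "rate_limit": {
            "gemini/gemini-2.0-flash-exp": "30000 per 1 minute",
        }
    },
)
```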
@gurugecl you can validate this with a simple test against LiteLLM directly (see the sketch below).
cc @jamesbraza, let me know if I'm missing something here.
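A minimal direct LiteLLM call along these lines (the prompt is just a placeholder) would confirm whether the rate-limit error comes from the Gemini backend itself rather than from PaperQA:

```python
import litellm

# Direct call to Gemini through LiteLLM, bypassing PaperQA entirely.
# Assumes GEMINI_API_KEY is set in the environment.
response = litellm.completion(
    model="gemini/gemini-2.0-flash-exp",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```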
Yes, that works fine, as does using Gemini through the API directly. However, the problem arises when I'm using Gemini with PaperQA, which in turn uses LiteLLM, as I describe in the issue I just opened with them (786). I'm using their Docs object since I need to manually add papers, but most of their documentation covers the ask function for querying papers rather than the Docs class, so I'm not sure if the rate limit can be set the same way, and setting max_concurrent_requests as they suggested hasn't resolved the issue yet either.
@gurugecl does it work when trying to use the OpenAI API with them?
If so, one way around this is to spin up the CLI proxy:

```shell
export GEMINI_API_KEY=<your-key>
litellm --model gemini/gemini-2.0-flash-exp
# RUNNING on http://0.0.0.0:4000
```

and then set the OpenAI API base from PaperQA to this ^
More docs: https://docs.litellm.ai/docs/#quick-start-proxy---cli
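A rough sketch of pointing an OpenAI-style client at that proxy (the base URL matches the proxy output above; the API key value is a placeholder, since the proxy forwards to Gemini using GEMINI_API_KEY):

```python
import openai

# The LiteLLM proxy speaks the OpenAI API, so any OpenAI-compatible client
# (including the one PaperQA uses under the hood) can be pointed at it.
client = openai.OpenAI(
    base_url="http://0.0.0.0:4000",  # the proxy started above
    api_key="anything",              # no real OpenAI key needed by the proxy here
)

response = client.chat.completions.create(
    model="gemini/gemini-2.0-flash-exp",
    messages=[{"role": "user", "content": "Hello from the proxy"}],
)
print(response.choices[0].message.content)
```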
Yes, it works fine with the OpenAI API, which I believe is used by default. Unfortunately, I couldn't quite get the proxy to work due to additional errors, but I was able to get the implementation working without it by modifying the setup of the Docs object. I'm not sure yet why the original approach worked initially, but thanks a lot for your assistance with this. Really appreciate it!
Nice. Curious, what were the proxy errors?
Bump on this? @gurugecl, it would be helpful so I know if there's a key issue we might've missed.
So I'm not sure how much of an issue this is on your end, but after setting up everything as we discussed, I received the following error when trying to add new papers to the PaperQA Docs object:
minimal script:
So I modified the litellm command to run like this:
However, at that point it just hung when trying to add papers, and that's when I was able to resolve the issue without using the proxy as I mentioned, so I didn't attempt to debug the proxy issue beyond this point.
It looks like you've misconfigured your settings. I am interested where the base64 encoding format is coming from.
I believe that value is set when calling OpenAI embeddings (I believe on their SDK, although I can check that).
Nope, their default is float.
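For context, a minimal sketch of where that parameter would be passed explicitly in an OpenAI-style embeddings call (the model name and input are placeholders; per the OpenAI API reference, omitting encoding_format leaves it at the server-side default of float):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

# encoding_format is the parameter in question; "float" is the API-level default,
# so a request would only carry "base64" if some caller set it explicitly.
resp = client.embeddings.create(
    model="text-embedding-3-small",
    input="example text to embed",
    encoding_format="float",
)
print(len(resp.data[0].embedding))
```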
@jamesbraza yup, sorry, the settings errors are what I fixed to get it to work without the proxy, and they're not related to the proxy issue. Yeah, it's the other base64 error that comes up when using the proxy and adding new papers to the Docs object.
We call And I guess it seems from
Where do you see this value being set? @jamesbraza, in general, yes - I don't think it needs to be set unless there's a specific need for it.
It doesn't look like it's set on our side. I wonder if something in @gurugecl's code or LiteLLM is setting it.
@jamesbraza it was when he tried to call Gemini via the CLI proxy using the OpenAI route in the PaperQA code.
Yeah, the code that throws the error just consists of the minimal example below plus running the proxy commands I was given in a separate terminal.
What happened?
Starting recently, I keep getting rate limit errors when using models like Gemini Flash 2.0, even though I should be below the rate limit based on the number of requests I'm initiating. Previously this was working fine. I am using LiteLLM via PaperQA. There also seems to be an async issue, but that was not previously causing a rate limit error, so I'm not sure if it's related. I've tried a number of ways to avoid hitting the rate limit, but so far none have worked, so any assistance with this would be greatly appreciated.
https://github.com/Future-House/paper-qa
I've also seen the following message when using gpt-4o, but it still works without an issue in that case:
AFC is enabled with max remote calls: 10.
Below is how I am setting up the Settings object, which then throws the rate limit error.
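The original snippet did not survive here; as a rough stand-in, a Gemini-backed PaperQA Settings/Docs setup of this general shape (model names and the paper path are placeholders, not the exact code from the report):

```python
import asyncio
from paperqa import Docs, Settings

# Illustrative sketch only: a Gemini-backed Settings object plus manual paper
# addition through Docs, roughly the shape of setup described in this issue.
settings = Settings(
    llm="gemini/gemini-2.0-flash-exp",
    summary_llm="gemini/gemini-2.0-flash-exp",
)

async def main() -> None:
    docs = Docs()
    # Adding a paper is where the rate limit error was reported; the path is a placeholder.
    await docs.aadd("my_paper.pdf", settings=settings)

asyncio.run(main())
```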
Relevant log output
Are you a ML Ops Team?
Yes
What LiteLLM version are you on ?
1.45.0
Twitter / LinkedIn details
No response