
[Bug]: Rate Limit Errors when using with PaperQA #7358

Closed
gurugecl opened this issue Dec 22, 2024 · 19 comments
Labels: bug · mlops · user request

@gurugecl commented Dec 22, 2024

What happened?

Starting recently, I keep getting rate limit errors when using models like Gemini 2.0 Flash, even though I should be below the rate limit based on the number of requests I'm initiating. Previously this worked fine. I am using LiteLLM via PaperQA. There also seems to be an async issue, but that was not previously causing a rate limit error, so I'm not sure whether it's related. I have tried a number of ways to avoid hitting the rate limit, but so far none have worked, so any assistance with this would be greatly appreciated.

https://github.com/Future-House/paper-qa

I've also seen the following message when using gpt-4o, but in that case it still works without issue:

AFC is enabled with max remote calls: 10.

Below is how I am setting up the Settings object, which then throws the rate limit error:

import os

from paperqa import Docs, Settings

settings = Settings(
    llm="gemini/gemini-2.0-flash-exp",
    summary_llm="gemini/gemini-2.0-flash-exp",
    llm_config={
        "model_list": [{
            "model_name": "gemini/gemini-2.0-flash-exp",
            "litellm_params": {
                "model": "gemini/gemini-2.0-flash-exp",
                "api_key": os.environ.get('GEMINI_API_KEY'),
            },
        }]
    },
    summary_llm_config={
        "model_list": [{
            "model_name": "gemini/gemini-2.0-flash-exp",
            "litellm_params": {
                "model": "gemini/gemini-2.0-flash-exp",
                "api_key": os.environ.get('GEMINI_API_KEY'),
            },
        }]
    },
)

max_choices = len(list(docs.docnames))
settings.answer.answer_max_sources = max_choices
settings.answer.evidence_k = relevancy * max_choices

model_response = docs.query(model_input, settings=settings)

Relevant log output

litellm.acompletion(model=gemini/gemini-2.0-flash-exp) Exception litellm.APIConnectionError: <asyncio.locks.Event object at 0x3855b0950 [unset]> is bound to a different event loop
Traceback (most recent call last):
  File "/opt/homebrew/Caskroom/miniforge/base/envs/django/lib/python3.11/site-packages/litellm/main.py", line 421, in acompletion
    response = await init_response
               ^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/django/lib/python3.11/site-packages/litellm/llms/vertex_ai_and_google_ai_studio/gemini/vertex_and_google_ai_studio_gemini.py", line 1206, in async_completion
    response = await client.post(api_base, headers=headers, json=request_body)  # type: ignore
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/django/lib/python3.11/site-packages/litellm/llms/custom_httpx/http_handler.py", line 138, in post
    raise e
  File "/opt/homebrew/Caskroom/miniforge/base/envs/django/lib/python3.11/site-packages/litellm/llms/custom_httpx/http_handler.py", line 100, in post
    response = await self.client.send(req, stream=stream)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/django/lib/python3.11/site-packages/httpx/_client.py", line 1661, in send
    response = await self._send_handling_auth(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/django/lib/python3.11/site-packages/httpx/_client.py", line 1689, in _send_handling_auth
    response = await self._send_handling_redirects(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/django/lib/python3.11/site-packages/httpx/_client.py", line 1726, in _send_handling_redirects
    response = await self._send_single_request(request)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/django/lib/python3.11/site-packages/httpx/_client.py", line 1763, in _send_single_request
    response = await transport.handle_async_request(request)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/django/lib/python3.11/site-packages/httpx/_transports/default.py", line 373, in handle_async_request
    resp = await self._pool.handle_async_request(req)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/django/lib/python3.11/site-packages/httpcore/_async/connection_pool.py", line 216, in handle_async_request
    raise exc from None
  File "/opt/homebrew/Caskroom/miniforge/base/envs/django/lib/python3.11/site-packages/httpcore/_async/connection_pool.py", line 196, in handle_async_request
    response = await connection.handle_async_request(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/django/lib/python3.11/site-packages/httpcore/_async/connection.py", line 101, in handle_async_request
    return await self._connection.handle_async_request(request)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/django/lib/python3.11/site-packages/httpcore/_async/http11.py", line 143, in handle_async_request
    raise exc
  File "/opt/homebrew/Caskroom/miniforge/base/envs/django/lib/python3.11/site-packages/httpcore/_async/http11.py", line 113, in handle_async_request
    ) = await self._receive_response_headers(**kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/django/lib/python3.11/site-packages/httpcore/_async/http11.py", line 186, in _receive_response_headers
    event = await self._receive_event(timeout=timeout)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/django/lib/python3.11/site-packages/httpcore/_async/http11.py", line 224, in _receive_event
    data = await self._network_stream.read(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/django/lib/python3.11/site-packages/httpcore/_backends/anyio.py", line 35, in read
    return await self._stream.receive(max_bytes=max_bytes)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/django/lib/python3.11/site-packages/anyio/streams/tls.py", line 205, in receive
    data = await self._call_sslobject_method(self._ssl_object.read, max_bytes)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/django/lib/python3.11/site-packages/anyio/streams/tls.py", line 147, in _call_sslobject_method
    data = await self.transport_stream.receive()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/django/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 1142, in receive
    await self._protocol.read_event.wait()
  File "/opt/homebrew/Caskroom/miniforge/base/envs/django/lib/python3.11/asyncio/locks.py", line 210, in wait
    fut = self._get_loop().create_future()
          ^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/django/lib/python3.11/asyncio/mixins.py", line 20, in _get_loop
    raise RuntimeError(f'{self!r} is bound to a different event loop')
RuntimeError: <asyncio.locks.Event object at 0x3855b0950 [unset]> is bound to a different event loop

03:21:53 - LiteLLM:INFO: utils.py:2977 - 
LiteLLM completion() model= gemini-2.0-flash-exp; provider = gemini

LiteLLM completion() model= gemini-2.0-flash-exp; provider = gemini
HTTP Request: POST https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash-exp:generateContent?key=xxxxxx "HTTP/1.1 429 Too Many Requests"
03:21:53 - LiteLLM Router:INFO: router.py:849 - litellm.acompletion(model=gemini/gemini-2.0-flash-exp) Exception litellm.RateLimitError: litellm.RateLimitError: VertexAIException - {
  "error": {
    "code": 429,
    "message": "Resource has been exhausted (e.g. check quota).",
    "status": "RESOURCE_EXHAUSTED"
  }
}

litellm.acompletion(model=gemini/gemini-2.0-flash-exp) Exception litellm.RateLimitError: litellm.RateLimitError: VertexAIException - {
  "error": {
    "code": 429,
    "message": "Resource has been exhausted (e.g. check quota).",
    "status": "RESOURCE_EXHAUSTED"
  }
}

03:21:53 - LiteLLM:INFO: utils.py:2977 - 
LiteLLM completion() model= gemini-2.0-flash-exp; provider = gemini
HTTP Request: POST https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash-exp:generateContent?key=xxxxxxxx "HTTP/1.1 429 Too Many Requests"

LiteLLM completion() model= gemini-2.0-flash-exp; provider = gemini
HTTP Request: POST https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash-exp:generateContent?key=xxxxxx "HTTP/1.1 429 Too Many Requests"
03:21:53 - LiteLLM Router:INFO: router.py:849 - litellm.acompletion(model=gemini/gemini-2.0-flash-exp) Exception litellm.RateLimitError: litellm.RateLimitError: VertexAIException - {
  "error": {
    "code": 429,
    "message": "Resource has been exhausted (e.g. check quota).",
    "status": "RESOURCE_EXHAUSTED"
  }
}

litellm.acompletion(model=gemini/gemini-2.0-flash-exp) Exception litellm.RateLimitError: litellm.RateLimitError: VertexAIException - {
  "error": {
    "code": 429,
    "message": "Resource has been exhausted (e.g. check quota).",
    "status": "RESOURCE_EXHAUSTED"
  }
}

03:21:53 - LiteLLM Router:INFO: router.py:849 - litellm.acompletion(model=gemini/gemini-2.0-flash-exp) Exception litellm.RateLimitError: litellm.RateLimitError: VertexAIException - {
  "error": {
    "code": 429,
    "message": "Resource has been exhausted (e.g. check quota).",
    "status": "RESOURCE_EXHAUSTED"
  }
}

Are you a ML Ops Team?

Yes

What LiteLLM version are you on ?

1.45.0

Twitter / LinkedIn details

No response

gurugecl added the bug label on Dec 22, 2024
gurugecl changed the title from "[Bug]: Rate Limit Errors with models other than OAI" to "[Bug]: Rate Limit Errors when using with PaperQA" on Dec 22, 2024
@krrishdholakia (Contributor)

Hi @gurugecl,

"error": {
"code": 429,
"message": "Resource has been exhausted (e.g. check quota).",
"status": "RESOURCE_EXHAUSTED"
}

Your error is coming from Google saying your resource is exhausted. This doesn't even look like it's coming from the router, just from the backend LLM API. Closing for now as I don't see a LiteLLM-specific issue. cc @jamesbraza: let me know if I'm missing something here.

krrishdholakia closed this as not planned (won't fix, can't repro, duplicate, stale) on Dec 27, 2024
@gurugecl (Author) commented Jan 3, 2025

Yes, but the issue seems to be that the error occurs on the very first request to Gemini, and no amount of exponential backoff or waiting helps, which I believe indicates that something behind the scenes, either in PaperQA or LiteLLM, is exhausting the resource that quickly. #7188 seems to indicate that Gemini 2.0 Flash is supported, and I even tried setting rate limits in the llm_config as below, per the PaperQA docs, but that did not seem to help:

"rate_limit": {
                "gemini/gemini-2.0-flash-exp": "10 per 1 minute"  # Gemini's actual rate limit
            }
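
For completeness, here is roughly how I nested that key (a sketch; the placement of rate_limit alongside model_list follows my reading of the PaperQA docs, so treat it as an assumption rather than a verified fix):

import os
from paperqa import Settings

gemini = "gemini/gemini-2.0-flash-exp"

llm_config = {
    "model_list": [{
        "model_name": gemini,
        "litellm_params": {
            "model": gemini,
            "api_key": os.environ.get("GEMINI_API_KEY"),
        },
    }],
    # Client-side throttle intended to stay under Gemini's free-tier request limit.
    "rate_limit": {gemini: "10 per 1 minute"},
}

settings = Settings(
    llm=gemini,
    summary_llm=gemini,
    llm_config=llm_config,
    summary_llm_config=llm_config,
)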

However, if this is not a LiteLLM issue, I can reach out to PaperQA for further assistance.

@krrishdholakia (Contributor)

@gurugecl, you can validate this with a simple LiteLLM test:

from litellm import completion
import os

os.environ['GEMINI_API_KEY'] = ""  # set your Gemini API key here
response = completion(
    model="gemini/gemini-2.0-flash-exp", 
    messages=[{"role": "user", "content": "write code for saying hi from LiteLLM"}]
)
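
Since the traceback above goes through litellm.acompletion (the async path the router uses), the async variant is worth sanity-checking too. A sketch; the prompt is arbitrary and the key placeholder still needs to be filled in:

import asyncio
import os

from litellm import acompletion

os.environ["GEMINI_API_KEY"] = ""  # set your Gemini API key here

async def main():
    # Same check as above, but via the async entry point that shows up in the traceback.
    response = await acompletion(
        model="gemini/gemini-2.0-flash-exp",
        messages=[{"role": "user", "content": "write code for saying hi from LiteLLM"}],
    )
    print(response.choices[0].message.content)

asyncio.run(main())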

cc @jamesbraza: let me know if I'm missing something here.

@gurugecl (Author) commented Jan 4, 2025

Yes, that works fine, as does using Gemini through the API directly. However, the problem arises when I'm using Gemini with PaperQA, which in turn uses LiteLLM, I believe, as I describe in the issue I just opened with them (Future-House/paper-qa#786). I'm using their Docs object because I need to add papers manually, but most of their documentation covers the ask function for querying papers rather than the Docs class, so I'm not sure whether the rate limit can be set the same way. Setting max_concurrent_requests as they suggested hasn't resolved the issue yet either.

@krrishdholakia (Contributor)

@gurugecl, does it work with them when you use the OpenAI API?

@krrishdholakia (Contributor) commented Jan 4, 2025

If so, one way around this is to spin up the CLI proxy:

export GEMINI_API_KEY=<your-key>
litellm --model gemini/gemini-2.0-flash-exp

# RUNNING on http://0.0.0.0:4000

and then point paperqa's OpenAI API base at the proxy:

export OPENAI_API_BASE="http://0.0.0.0:4000"

More docs: https://docs.litellm.ai/docs/#quick-start-proxy---cli
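
A minimal client-side sketch of that setup (assumes the proxy above is running on http://0.0.0.0:4000; the dummy key and the example PDF path are placeholders, and paper-qa's defaults picking up OPENAI_API_BASE is an assumption rather than something verified here):

import os

os.environ["OPENAI_API_BASE"] = "http://0.0.0.0:4000"
os.environ["OPENAI_API_KEY"] = "sk-anything"  # placeholder; adjust if your proxy has auth configured

from paperqa import Docs

docs = Docs()
docs.add("example_paper.pdf")  # hypothetical local PDF
answer = docs.query("What is the main finding?")
print(answer)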

@gurugecl (Author) commented Jan 4, 2025

Yes, it works fine with the OpenAI API, which I believe is used by default. Unfortunately, I couldn't quite get the proxy to work due to additional errors, but I was able to get the implementation working without it by modifying the setup of the Docs object. I'm not sure yet why the original approach worked initially, but thanks a lot for your assistance with this. Really appreciate it!

@krrishdholakia (Contributor)

Nice. Curious, what were the proxy errors?

@krrishdholakia (Contributor)

Bump on this, @gurugecl?

It would be helpful so I know if there's a key issue we might've missed.

@gurugecl (Author) commented Jan 4, 2025

So I'm not sure how much of an issue this is on your end, but after setting everything up as we discussed, I received the following error when trying to add new papers to the PaperQA Docs object:

LiteLLM.Info: If you need to debug this error, use `litellm.set_verbose=True'.

Failed to load ddu148.pdf: litellm.BadRequestError: OpenAIException - Error code: 400 - {'error': {'message': "litellm.UnsupportedParamsError: Setting {'encoding_format': 'base64'} is not supported by gemini. To drop it from the call, set `litellm.drop_params = True`.", 'type': 'None', 'param': None, 'code': '400'}} LiteLLM Retried: 3 times

Loaded 0 documents

Minimal script:

from paperqa import Docs, Settings
import os
from pathlib import Path
import litellm

# Set LiteLLM verbose logging
litellm.set_verbose = True

# Set up environment
os.environ["OPENAI_API_BASE"] = "http://0.0.0.0:4000"

# Initialize Docs
docs = Docs()

# Load PDFs from a directory
pdf_dir = Path("test_pdfs") 
for pdf_file in pdf_dir.glob("*.pdf"):
    print(f"Loading {pdf_file.name}")
    try:
        docs.add(str(pdf_file))
    except Exception as e:
        print(f"Failed to load {pdf_file.name}: {str(e)}")

So I modified the litellm command to run like this:

litellm --model gemini/gemini-2.0-flash-exp --drop_params

However, at that point it just hung when trying to add papers, and that's when I was able to resolve the issue without using the proxy, as I mentioned above, so I didn't attempt to debug the proxy issue beyond that point.

@jamesbraza (Contributor)

It looks like you've misconfigured Settings a bit based on the Pydantic errors, but the more interesting one to me is this one: Failed to load ddu148.pdf: litellm.BadRequestError: OpenAIException - Error code: 400 - {'error': {'message': "litellm.UnsupportedParamsError: Setting {'encoding_format': 'base64'} is not supported by gemini.

I am interested in where the base64 part comes from; it's not within paper-qa.

@krrishdholakia (Contributor)

I believe that value is set when calling OpenAI embeddings (on their SDK, I believe, although I can check that).

@krrishdholakia (Contributor)

Nope, their default is float:

(screenshot from the original comment omitted)

--
How do you handle embedding calls? @jamesbraza

@gurugecl (Author) commented Jan 5, 2025

@jamesbraza Yup, sorry, the Settings errors are what I fixed to get it working without the proxy; they're not related to the proxy issue. Yeah, it's the other, base64 error that comes up when using the proxy and adding new papers to the Docs object.

@jamesbraza (Contributor)

-- how do you handle embedding calls? @jamesbraza

We call litellm.aembedding: https://github.com/Future-House/llm-client/blob/v0.0.7/llmclient/embeddings.py#L149-L154


And from Setting {'encoding_format': 'base64'} is not supported by gemini, it seems base64 is not supported by Gemini. Maybe base64 shouldn't be the default for all model providers?
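
One way to check whether the default is the culprit would be to pin the format explicitly on the embedding call. A sketch only: the model name, proxy address, and dummy key are placeholders, and whether LiteLLM forwards encoding_format unchanged is an assumption to verify:

import asyncio

import litellm

async def main():
    # Pin the encoding so the OpenAI SDK's base64 default never reaches a provider
    # that rejects it (Gemini, per the error above).
    response = await litellm.aembedding(
        model="text-embedding-3-small",   # example model routed through the proxy
        input=["paper-qa embedding smoke test"],
        encoding_format="float",
        api_base="http://0.0.0.0:4000",   # the CLI proxy from earlier in the thread
        api_key="sk-anything",            # placeholder if the proxy has no auth configured
    )
    print(len(response.data[0]["embedding"]))

asyncio.run(main())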

@krrishdholakia (Contributor) commented Jan 5, 2025

Maybe base64 shouldn't be the default for all model providers?

Where do you see this value being set? @jamesbraza

In general, yes; I don't think it needs to be unless there's a specific need for it.

@jamesbraza (Contributor)

It looks "base64" gets set within OpenAI's client library during httpx request creation: https://github.com/openai/openai-python/blob/v1.58.1/src/openai/resources/embeddings.py#L217

I wonder if something in @gurugecl's code or LiteLLM's embedding logic is incorrectly causing an OpenAI control flow to be used for a Gemini model. That could explain why {'encoding_format': 'base64'} is being passed to Gemini.

@krrishdholakia (Contributor)

@jamesbraza It was when he tried to call Gemini via the CLI proxy using the OpenAI route in the paperqa code.

@gurugecl (Author) commented Jan 5, 2025

It looks "base64" gets set within OpenAI's client library during httpx request creation: https://github.com/openai/openai-python/blob/v1.58.1/src/openai/resources/embeddings.py#L217

I wonder if something in @gurugecl 's code or LiteLLM embedding is incorrectly causing an OpenAI control flow to be used for a Gemini model. It could explain why {'encoding_format': 'base64'} is being passed to Gemini

Yeah, the code that throws the error consists of just the minimal example below:

#7358 (comment)

plus running the proxy commands I was given in a separate terminal:

#7358 (comment)
