[FEATURE] Implement a rate limiter for API calls #1058

Open
plaguss opened this issue Nov 13, 2024 · 0 comments
Labels: enhancement (New feature or request)

plaguss commented Nov 13, 2024

Is your feature request related to a problem? Please describe.
Some users run into the call limits of API providers when using LLMs such as OpenAILLM and would like a way to cap the rate of calls.

Describe the solution you'd like
We can define a @with_rate_limit decorator to keep the logic separate, and apply it to the async def agenerate methods in the AsyncLLM subclasses. By default, no rate limit is applied, but the behaviour can be changed via an argument to agenerate, making it accessible to the user while remaining simple to maintain.


Additional context

Initial draft (courtesy of Claude):

import asyncio
import functools
from datetime import datetime
from typing import Optional, Dict
from weakref import WeakKeyDictionary

def with_rate_limit():
    """
    Decorator that enables optional rate limiting through a rate_limit parameter.
    The parameter can be passed to any decorated method to enable rate limiting for that specific call.
    """
    # Use WeakKeyDictionary to avoid memory leaks from stored instances
    limiters: Dict[object, Dict[float, asyncio.Lock]] = WeakKeyDictionary()
    last_calls: Dict[object, Dict[float, Optional[datetime]]] = WeakKeyDictionary()
    
    def decorator(func):
        @functools.wraps(func)
        async def wrapper(self, *args, **kwargs):
            # Extract rate_limit from kwargs, default to None (no limiting)
            rate_limit = kwargs.pop('rate_limit', None)
            
            if rate_limit is not None:
                # Initialize instance storage if needed
                if self not in limiters:
                    limiters[self] = {}
                    last_calls[self] = {}
                
                # Initialize rate limit specific storage if needed
                if rate_limit not in limiters[self]:
                    limiters[self][rate_limit] = asyncio.Lock()
                    last_calls[self][rate_limit] = None
                
                # Apply rate limiting: hold the lock only while spacing out
                # call starts, then release it before awaiting the wrapped
                # coroutine so concurrent calls are spaced rather than
                # serialized behind each other's I/O.
                async with limiters[self][rate_limit]:
                    now = datetime.now()
                    last_call = last_calls[self][rate_limit]
                    if last_call is not None:
                        elapsed = (now - last_call).total_seconds()
                        min_interval = 1.0 / rate_limit
                        if elapsed < min_interval:
                            await asyncio.sleep(min_interval - elapsed)
                    last_calls[self][rate_limit] = datetime.now()

            # Runs unlocked; this is also the path when rate_limit is None
            return await func(self, *args, **kwargs)
                
        return wrapper
    return decorator
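
For a quick sanity check (reusing the imports above), the decorator can be exercised on a dummy class; DummyLLM here is hypothetical, purely for illustration:

class DummyLLM:
    @with_rate_limit()
    async def agenerate(self, prompt: str, rate_limit: float | None = None) -> str:
        return f"echo: {prompt}"

async def main() -> None:
    llm = DummyLLM()
    start = datetime.now()
    # Four concurrent calls at 2 calls/second should take roughly 1.5 seconds
    await asyncio.gather(*(llm.agenerate(f"p{i}", rate_limit=2.0) for i in range(4)))
    print(f"elapsed: {(datetime.now() - start).total_seconds():.2f}s")

asyncio.run(main())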

Applied to a specific case:

class OpenAILLM(AsyncLLM):
    @with_rate_limit()
    async def agenerate(self, ..., rate_limit: float | None = None):
        ...

Example of use (when going through common methods such as generate_outputs this would be handled internally, and could be configured via generation_kwargs={"rate_limit": 10}):

# Without rate limiting
await llm.agenerate("prompt")

# With rate limiting (2 calls per second)
await llm.agenerate("prompt", rate_limit=2.0)

# With different rate limit (1 call every 5 seconds)
await llm.agenerate("prompt", rate_limit=0.2)
plaguss added the enhancement label Nov 13, 2024
plaguss added this to the 1.5.0 milestone Nov 13, 2024
gabrielmbmb removed this from the 1.5.0 milestone Jan 16, 2025