Is your feature request related to a problem? Please describe.
Some users hit the call limits of API providers such as OpenAILLM and would like a way to cap the rate of calls.
Describe the solution you'd like
We can define a @rate_limited decorator to keep the logic separate, and apply it to the async def agenerate methods in AsyncLLM. By default no rate limit is applied, but the behaviour can be enabled via an argument to agenerate, keeping the feature accessible to the user while remaining simple to maintain.
Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.
Additional context
Initial draft (courtesy of Claude):
```python
import asyncio
import functools
from datetime import datetime
from typing import Dict, Optional
from weakref import WeakKeyDictionary


def with_rate_limit():
    """
    Decorator that enables optional rate limiting through a ``rate_limit``
    parameter. The parameter can be passed to any decorated method to enable
    rate limiting for that specific call.
    """
    # Use WeakKeyDictionary to avoid memory leaks from stored instances
    limiters: "WeakKeyDictionary[object, Dict[float, asyncio.Lock]]" = WeakKeyDictionary()
    last_calls: "WeakKeyDictionary[object, Dict[float, Optional[datetime]]]" = WeakKeyDictionary()

    def decorator(func):
        @functools.wraps(func)
        async def wrapper(self, *args, **kwargs):
            # Extract rate_limit from kwargs, default to None (no limiting)
            rate_limit = kwargs.pop("rate_limit", None)

            if rate_limit is not None:
                # Initialize instance storage if needed
                if self not in limiters:
                    limiters[self] = {}
                    last_calls[self] = {}

                # Initialize rate-limit-specific storage if needed
                if rate_limit not in limiters[self]:
                    limiters[self][rate_limit] = asyncio.Lock()
                    last_calls[self][rate_limit] = None

                # Apply rate limiting
                async with limiters[self][rate_limit]:
                    now = datetime.now()
                    if last_calls[self][rate_limit] is not None:
                        elapsed = (now - last_calls[self][rate_limit]).total_seconds()
                        min_interval = 1.0 / rate_limit
                        if elapsed < min_interval:
                            delay = min_interval - elapsed
                            await asyncio.sleep(delay)
                    last_calls[self][rate_limit] = datetime.now()
                    return await func(self, *args, **kwargs)
            else:
                # No rate limiting if rate_limit is None
                return await func(self, *args, **kwargs)

        return wrapper

    return decorator
```
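As a sanity check, here is a self-contained sketch that compacts the draft decorator (state lookups via setdefault, time.monotonic() instead of datetime for interval math) and applies it to a ToyLLM class, which is a hypothetical stand-in for an AsyncLLM subclass; the timing at the end illustrates the expected pacing of three calls at 10 calls/second:

```python
import asyncio
import functools
import time
from weakref import WeakKeyDictionary


def with_rate_limit():
    # Per-instance, per-rate state; WeakKeyDictionary avoids leaking instances
    limiters = WeakKeyDictionary()
    last_calls = WeakKeyDictionary()

    def decorator(func):
        @functools.wraps(func)
        async def wrapper(self, *args, **kwargs):
            rate_limit = kwargs.pop("rate_limit", None)
            if rate_limit is None:
                # No rate limiting requested: call through directly
                return await func(self, *args, **kwargs)
            limiters.setdefault(self, {}).setdefault(rate_limit, asyncio.Lock())
            last_calls.setdefault(self, {}).setdefault(rate_limit, None)
            async with limiters[self][rate_limit]:
                last = last_calls[self][rate_limit]
                if last is not None:
                    # Sleep off whatever remains of the minimum interval
                    remaining = 1.0 / rate_limit - (time.monotonic() - last)
                    if remaining > 0:
                        await asyncio.sleep(remaining)
                last_calls[self][rate_limit] = time.monotonic()
                return await func(self, *args, **kwargs)

        return wrapper

    return decorator


class ToyLLM:
    """Hypothetical stand-in for an AsyncLLM subclass."""

    @with_rate_limit()
    async def agenerate(self, prompt):
        return f"echo: {prompt}"


async def main():
    llm = ToyLLM()
    start = time.monotonic()
    outputs = []
    # 3 calls at 10 calls/second -> at least ~0.2 s between first and last
    for i in range(3):
        outputs.append(await llm.agenerate(f"prompt {i}", rate_limit=10.0))
    return outputs, time.monotonic() - start


results, elapsed = asyncio.run(main())
print(f"3 calls took {elapsed:.2f}s")
```

Note that using time.monotonic() sidesteps wall-clock adjustments that could skew the datetime-based interval math in the draft above; either works for a first pass.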
Example of use (this would be taken care of when using the common methods such as generate_outputs; it could be modified with generation_kwargs={"rate_limit": 10}):

```python
# Without rate limiting
await llm.agenerate("prompt")

# With rate limiting (2 calls per second)
await llm.agenerate("prompt", rate_limit=2.0)

# With a different rate limit (1 call every 5 seconds)
await llm.agenerate("prompt", rate_limit=0.2)
```