Is your feature request related to a problem? Please describe.
Some users hit the call limits of API providers such as OpenAILLM and would like a way to cap the rate of calls.
Describe the solution you'd like
We can define a @rate_limited decorator to keep the logic separate, and apply it to the async def agenerate methods in AsyncLLM. By default no rate limit is applied, but the behaviour can be enabled via an argument to agenerate, keeping the feature accessible to the user while remaining simple to maintain.
Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.
Additional context
Initial draft (courtesy of Claude):
```python
import asyncio
import functools
from datetime import datetime
from typing import Dict, Optional
from weakref import WeakKeyDictionary


def with_rate_limit():
    """
    Decorator that enables optional rate limiting through a ``rate_limit``
    parameter. The parameter can be passed to any decorated method to enable
    rate limiting for that specific call.
    """
    # Use WeakKeyDictionary to avoid memory leaks from stored instances
    limiters: "WeakKeyDictionary[object, Dict[float, asyncio.Lock]]" = WeakKeyDictionary()
    last_calls: "WeakKeyDictionary[object, Dict[float, Optional[datetime]]]" = WeakKeyDictionary()

    def decorator(func):
        @functools.wraps(func)
        async def wrapper(self, *args, **kwargs):
            # Extract rate_limit from kwargs, default to None (no limiting)
            rate_limit = kwargs.pop("rate_limit", None)

            if rate_limit is not None:
                # Initialize instance storage if needed
                if self not in limiters:
                    limiters[self] = {}
                    last_calls[self] = {}

                # Initialize rate-limit-specific storage if needed
                if rate_limit not in limiters[self]:
                    limiters[self][rate_limit] = asyncio.Lock()
                    last_calls[self][rate_limit] = None

                # Apply rate limiting
                async with limiters[self][rate_limit]:
                    now = datetime.now()
                    if last_calls[self][rate_limit] is not None:
                        elapsed = (now - last_calls[self][rate_limit]).total_seconds()
                        min_interval = 1.0 / rate_limit
                        if elapsed < min_interval:
                            delay = min_interval - elapsed
                            await asyncio.sleep(delay)
                    last_calls[self][rate_limit] = datetime.now()
                    return await func(self, *args, **kwargs)
            else:
                # No rate limiting if rate_limit is None
                return await func(self, *args, **kwargs)

        return wrapper

    return decorator
```
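As a sanity check, here is a self-contained sketch that compacts the draft decorator (state lookups via setdefault, time.monotonic() instead of datetime for interval math) and applies it to a ToyLLM class, which is a hypothetical stand-in for an AsyncLLM subclass; the timing at the end illustrates the expected pacing of three calls at 10 calls/second:

```python
import asyncio
import functools
import time
from weakref import WeakKeyDictionary


def with_rate_limit():
    # Per-instance, per-rate state; WeakKeyDictionary avoids leaking instances
    limiters = WeakKeyDictionary()
    last_calls = WeakKeyDictionary()

    def decorator(func):
        @functools.wraps(func)
        async def wrapper(self, *args, **kwargs):
            rate_limit = kwargs.pop("rate_limit", None)
            if rate_limit is None:
                # No rate limiting requested: call through directly
                return await func(self, *args, **kwargs)
            limiters.setdefault(self, {}).setdefault(rate_limit, asyncio.Lock())
            last_calls.setdefault(self, {}).setdefault(rate_limit, None)
            async with limiters[self][rate_limit]:
                last = last_calls[self][rate_limit]
                if last is not None:
                    # Sleep off whatever remains of the minimum interval
                    remaining = 1.0 / rate_limit - (time.monotonic() - last)
                    if remaining > 0:
                        await asyncio.sleep(remaining)
                last_calls[self][rate_limit] = time.monotonic()
                return await func(self, *args, **kwargs)

        return wrapper

    return decorator


class ToyLLM:
    """Hypothetical stand-in for an AsyncLLM subclass."""

    @with_rate_limit()
    async def agenerate(self, prompt):
        return f"echo: {prompt}"


async def main():
    llm = ToyLLM()
    start = time.monotonic()
    outputs = []
    # 3 calls at 10 calls/second -> at least ~0.2 s between first and last
    for i in range(3):
        outputs.append(await llm.agenerate(f"prompt {i}", rate_limit=10.0))
    return outputs, time.monotonic() - start


results, elapsed = asyncio.run(main())
print(f"3 calls took {elapsed:.2f}s")
```

Note that using time.monotonic() sidesteps wall-clock adjustments that could skew the datetime-based interval math in the draft above; either works for a first pass.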
Example of use (this would be taken care of when using the common methods such as generate_outputs; it could be modified with generation_kwargs={"rate_limit": 10}):

```python
# Without rate limiting
await llm.agenerate("prompt")

# With rate limiting (2 calls per second)
await llm.agenerate("prompt", rate_limit=2.0)

# With a different rate limit (1 call every 5 seconds)
await llm.agenerate("prompt", rate_limit=0.2)
```