
Serverless rate limiting for AI model endpoints #814

Open
ebubae opened this issue Oct 23, 2024 · 0 comments
ebubae commented Oct 23, 2024

Is your feature request related to a problem? Please describe.
The existing rate limiting solution requires the endpoint to be continually running to correctly limit requests, but our serverless infrastructure cannot easily maintain state between teardowns of the application on subsequent requests.

Describe the solution you'd like
Use Upstash rate limiting to rate limit incoming requests. More specifically, we should:

  • Rate limit incoming requests by developer token
  • Rate limit requests made with the internal main key by the incoming IP address

Describe alternatives you've considered
The existing Redis solution does not scale as well as a serverless solution and lacks built-in rate limiting functionality.
