Use async/multi-threaded requests #258
Replies: 2 comments 1 reply
-
Hi @krrishdholakia! Let me convert this into a discussion.
-
Thanks for offering to submit a PR! We've considered threading here. The issue is that some LLM providers rate-limit or block requests sent with the same token if too many occur in parallel. I know for sure that OpenAI does this, and I suspect that Anthropic and Cohere aren't much different in this regard. We are aware that executing prompts one after the other is a very unsatisfactory solution. OpenAI supports batching in their deprecated Completions API.
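One way to reconcile the two concerns above is to parallelize with a small, bounded worker pool rather than firing all requests at once. This is just a sketch of that idea, not spacy-llm's actual code; `call_api` is a hypothetical stand-in for the per-prompt HTTP request:

```python
# Sketch: parallelize REST calls with a bounded thread pool so the number
# of in-flight requests per token stays small enough to avoid provider-side
# rate limiting. `call_api` is a hypothetical placeholder, not spacy-llm's
# actual request function.
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, List


def call_api(prompt: str) -> str:
    # Placeholder for a blocking HTTP POST to the provider's endpoint.
    return f"response:{prompt}"


def query_parallel(prompts: Iterable[str], max_workers: int = 4) -> List[str]:
    # max_workers caps concurrent requests with the same token, which is
    # what tends to trigger rate limits when it grows unbounded.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves prompt order in the returned responses.
        return list(pool.map(call_api, prompts))


print(query_parallel(["a", "b", "c"]))  # ['response:a', 'response:b', 'response:c']
```

The `max_workers` knob could even be exposed as a model config option, so users hitting 429s can dial concurrency back down to 1.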
-
Hey @rmitsch / @kabirkhan,
have y'all considered using async/threading here to make this call faster?
https://github.com/explosion/spacy-llm/blob/f03da9094ee49626ae3aaccd3129e7c3237454ee/spacy_llm/models/rest/anthropic/model.py#L96C1-L99C10
Happy to make a PR to help out here. I'm working on a library to simplify LLM API calling and noticed y'all call the REST endpoints directly, which is awesome!