feat: retry mechanism #240
base: development
Conversation
When you say it "already retries requests", which cases does that cover? Network failures? The idea was to cover problems like the token count being too big, or the response from the LLM being truncated (sometimes the JSON is malformed, for example). Will test in a bit.
Yes, network failures. I thought the issue was only about network failures.
Very sorry if that wasn't clear enough; I will update the spec. Added the following:
Shouldn't the prompt be split into appropriately sized chunks according to the model's token limit before sending it to the LLM? I think brute-forcing is not a good solution.
@whilefoo how do you determine the size? There is no API for this (AFAIK), and for the same model you can have different limits depending on the plan (or tier) you are using. Not sure if there is a better way to handle it.
You can count tokens using |
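A minimal sketch of counting tokens client-side before sending a prompt, assuming the js-tiktoken package and a naive paragraph-based splitter (neither the package choice nor the helper names are part of this PR):

```typescript
// Sketch only: count tokens locally and split a prompt into chunks that stay
// under a token budget. The budget value is illustrative, not tied to any
// specific model or tier.
import { getEncoding } from "js-tiktoken";

const encoder = getEncoding("cl100k_base");

export function countTokens(text: string): number {
  return encoder.encode(text).length;
}

export function splitByTokenLimit(text: string, maxTokens: number): string[] {
  // Naive split on paragraph boundaries; a real implementation would also
  // reserve room for the system prompt and the expected completion.
  const chunks: string[] = [];
  let current = "";
  for (const paragraph of text.split("\n\n")) {
    const candidate = current ? `${current}\n\n${paragraph}` : paragraph;
    if (countTokens(candidate) > maxTokens && current) {
      chunks.push(current);
      current = paragraph;
    } else {
      current = candidate;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```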
Maybe I don't understand but |
I didn't know that OpenAI also limits the context window, as I couldn't find any information about that online. My understanding is that the 128k context window is the same for everyone, but if you're on tier 1 you only have 30k TPM, so you can't use the whole context window. If that's correct, then even if you split prompts into 30k-token chunks, you will still get rate limited if you send those requests within the same minute?
I believe you are correct. Would this be handled by the built-in retry, or should we add a delay between API calls? I think the biggest issue the retry can cover is malformed JSON; there have been plenty of times we had to restart the whole comment evaluation because one output was incorrect and broke the whole evaluation.
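A rough sketch of the kind of custom retry wrapper being discussed, retrying both on thrown errors (network failures, rate limits) and on malformed JSON; the names and defaults are illustrative, not the actual implementation in this PR:

```typescript
// Sketch: retry both transient API failures and malformed JSON output.
// Function names, option names and defaults are illustrative.
interface RetryOptions {
  maxRetries: number;
  baseDelayMs: number;
}

export async function withRetry<T>(
  produce: () => Promise<string>,
  parse: (raw: string) => T,
  { maxRetries, baseDelayMs }: RetryOptions = { maxRetries: 3, baseDelayMs: 1000 }
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      const raw = await produce();
      // A parse failure (truncated or malformed JSON) also triggers a retry.
      return parse(raw);
    } catch (error) {
      lastError = error;
      if (attempt < maxRetries) {
        // Exponential backoff, so repeated attempts also ease TPM pressure.
        const delayMs = baseDelayMs * 2 ** attempt;
        await new Promise((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  throw lastError;
}

// Usage: const result = await withRetry(() => callLlm(prompt), (raw) => JSON.parse(raw));
```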
I think they don't add any delays because their code is auto-generated from the OpenAPI spec.
I think this can be solved by using structured outputs, but that is OpenAI-specific, so if we plan to use OpenRouter it still makes sense to implement our own retry.
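For reference, structured outputs in the OpenAI SDK constrain the completion to a JSON schema; the model name and schema below are only examples, not the plugin's actual ones:

```typescript
// Sketch: OpenAI structured outputs via a JSON schema response_format.
// The schema and model are examples only.
import OpenAI from "openai";

const client = new OpenAI();

const completion = await client.chat.completions.create({
  model: "gpt-4o-2024-08-06",
  messages: [{ role: "user", content: "Evaluate this comment's relevance." }],
  response_format: {
    type: "json_schema",
    json_schema: {
      name: "comment_evaluation",
      strict: true,
      schema: {
        type: "object",
        properties: {
          score: { type: "number" },
          reasoning: { type: "string" },
        },
        required: ["score", "reasoning"],
        additionalProperties: false,
      },
    },
  },
});

// With strict mode the content should always parse as valid JSON.
const evaluation = JSON.parse(completion.choices[0].message.content ?? "{}");
```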
@whilefoo, this task has been idle for a while. Please provide an update.
Now that #225 is merged, I will continue with this.
@whilefoo, this task has been idle for a while. Please provide an update.
Resolves #236
When I was testing this, I realized that the OpenAI client already retries requests, so technically we don't need this, but our implementation is more customizable.
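For reference, the built-in retry in the official openai Node SDK is tunable via `maxRetries` (globally or per request), but it only covers failed HTTP requests, not malformed output in an otherwise successful response:

```typescript
// Sketch: configuring the openai Node SDK's built-in retries. It retries
// connection errors, 408/409/429 and 5xx responses with backoff, but it
// cannot retry when a successful response contains malformed JSON.
import OpenAI from "openai";

const client = new OpenAI({
  maxRetries: 5, // SDK default is 2
});

const completion = await client.chat.completions.create(
  {
    model: "gpt-4o",
    messages: [{ role: "user", content: "Hello" }],
  },
  { maxRetries: 0 } // per-request override, e.g. to disable built-in retries
);
```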