
feat: retry mechanism #240

Open
wants to merge 2 commits into development
Conversation

whilefoo
Member

Resolves #236

When I was testing this I realized that the OpenAI client already retries requests, so technically we don't need this, but this implementation is more customizable.
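For context, the built-in behaviour referred to here is the SDK's `maxRetries` option; a minimal sketch, with illustrative model name and retry counts rather than this plugin's actual config:

```typescript
import OpenAI from "openai";

// The SDK retries connection errors, 408/409/429 and 5xx responses with exponential backoff.
const client = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  maxRetries: 3,
});

// A per-request override is also possible:
const completion = await client.chat.completions.create(
  { model: "gpt-4o", messages: [{ role: "user", content: "ping" }] },
  { maxRetries: 5 }
);
```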

@whilefoo
Member Author

@gentlementlegen

@gentlementlegen
Member

When you say "already retries requests", in which cases? Network failures? The idea was to cover problems like the token size being too big, or the response from the LLM being truncated (sometimes the JSON is malformed, for example). Will test in a bit.

@whilefoo
Member Author

> When you say "already retries requests", in which cases? Network failures? The idea was to cover problems like the token size being too big, or the response from the LLM being truncated (sometimes the JSON is malformed, for example). Will test in a bit.

Yes, network failures. I thought the issue was only about network failures too.

@gentlementlegen
Member

gentlementlegen commented Jan 15, 2025

While some errors are not recoverable, ones like sending more tokens than can be handled can be avoided by splitting the prompt and retrying with smaller prompts until they fit the given model.

Very sorry if that wasn't clear enough, I will update the spec.

Added the following:

> covering cases like the token amount sent being too large, responses having truncated JSON content, network failures
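A rough sketch of the split-and-retry idea described above; the helper and the error detection are assumptions for illustration, not this plugin's actual code:

```typescript
import OpenAI from "openai";

// Hypothetical: evaluate a prompt, and if the model rejects it for being too large,
// split it in half and evaluate the parts independently until they fit.
async function evaluateWithSplitting(client: OpenAI, prompt: string): Promise<string[]> {
  try {
    const res = await client.chat.completions.create({
      model: "gpt-4o",
      messages: [{ role: "user", content: prompt }],
    });
    return [res.choices[0].message.content ?? ""];
  } catch (err) {
    // Assumption: a context-length failure surfaces as an APIError mentioning the context length.
    const tooLong =
      err instanceof OpenAI.APIError && /context length|too many tokens/i.test(err.message);
    if (!tooLong || prompt.length < 2) throw err;
    const mid = Math.floor(prompt.length / 2);
    return [
      ...(await evaluateWithSplitting(client, prompt.slice(0, mid))),
      ...(await evaluateWithSplitting(client, prompt.slice(mid))),
    ];
  }
}
```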

@whilefoo
Member Author

Shouldn't the prompt be split into appropriately sized chunks according to the model's token limit before sending it to the LLM? I think brute-forcing is not a good solution.

@gentlementlegen
Member

@whilefoo how do you determine the size? There is no API for this (afaik), and for the same model you can have different limits according to the plan (or tier) you are using. Not sure if there is a better way to handle it.

@whilefoo
Member Author

> @whilefoo how do you determine the size? There is no API for this (afaik), and for the same model you can have different limits according to the plan (or tier) you are using. Not sure if there is a better way to handle it.

You can count tokens using tiktoken, and the model token limit is the same for all tiers; the only difference between tiers is the rate limit (tokens per minute), so you just have to wait until the rate limit clears.
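For illustration, counting tokens with js-tiktoken before calling the API; a minimal sketch assuming the installed release already knows about gpt-4o:

```typescript
import { encodingForModel } from "js-tiktoken";

// Assumption: this js-tiktoken release maps "gpt-4o" to its o200k_base encoding.
const encoder = encodingForModel("gpt-4o");

export function countTokens(text: string): number {
  return encoder.encode(text).length;
}

// Compare the count against the model's context window (and the tier's TPM) before sending.
```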

@gentlementlegen
Member

gentlementlegen commented Jan 16, 2025

Maybe I don't understand, but js-tiktoken can count tokens; it cannot tell you that the model gpt-4o is limited to 128000, can it? I know that at first we were limited to half of this and asked OpenAI to increase the token limit, which is now upgraded to 128000; that's why I mention this.

@whilefoo
Member Author

I didn't know that OpenAI also limits the context window, as I couldn't find any information about that on the internet.

My understanding is that the 128k context window is the same for everyone, but if you're on tier 1 you only have 30k TPM, so you can't use the whole context window. If my understanding is correct, then even if you split prompts into 30k-token chunks, you will still get rate limited if you send those requests within the same minute?

@gentlementlegen
Member

I believe you are correct. Would this be handled by the built-in retry, or should we add a delay between API calls?

I think the biggest issue the retry can cover is malformed JSON; there have been plenty of times we had to restart the whole comments evaluation because one output was incorrect and broke the whole run.
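A minimal sketch of that retry for the malformed-JSON case; the `askModel` wrapper is hypothetical:

```typescript
// Hypothetical: `askModel` wraps the chat-completion call and returns the raw string output.
async function getJsonWithRetry<T>(askModel: () => Promise<string>, maxAttempts = 3): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const raw = await askModel();
    try {
      // Truncated or malformed JSON throws here and triggers another attempt
      // instead of aborting the whole comments evaluation.
      return JSON.parse(raw) as T;
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}
```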

@whilefoo
Member Author

whilefoo commented Jan 16, 2025

> I believe you are correct. Would this be handled by the built-in retry, or should we add a delay between API calls?

I think they don't add any delays because their code is auto-generated from the OpenAPI spec.
We could make a dummy request and read the rate limits from the response headers.
Basically, we know the model's context window either by hard-coding it for the most used models or from the config. We can technically get the rate limits, so together with the context window we can calculate how many chunks we need in advance and then just retry until it succeeds.
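As a sketch of that dummy-request idea: OpenAI returns its documented `x-ratelimit-*` headers on every response, which openai-node v4 can expose via `withResponse()`; the probe prompt below is purely illustrative:

```typescript
import OpenAI from "openai";

const client = new OpenAI();

// A tiny probe request: we only care about the response headers, not the completion.
const { response } = await client.chat.completions
  .create({ model: "gpt-4o", messages: [{ role: "user", content: "ping" }], max_tokens: 1 })
  .withResponse();

const tokensPerMinute = Number(response.headers.get("x-ratelimit-limit-tokens"));
const remainingTokens = Number(response.headers.get("x-ratelimit-remaining-tokens"));

// With the context window (hard-coded or from config) plus TPM, the number of chunks
// can be computed in advance, then each chunk retried until it succeeds.
console.log({ tokensPerMinute, remainingTokens });
```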

> I think the biggest issue the retry can cover is malformed JSON; there have been plenty of times we had to restart the whole comments evaluation because one output was incorrect and broke the whole run.

I think this can be solved by using structured outputs, but that is OpenAI-specific, so if we plan to use OpenRouter then it still makes sense to implement the retry.
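For comparison, a sketch of the structured-outputs route using openai-node's zod helper (the beta parse API at the time of writing); the schema is illustrative, not the plugin's real output shape:

```typescript
import OpenAI from "openai";
import { z } from "zod";
import { zodResponseFormat } from "openai/helpers/zod";

// Illustrative schema; the real one would mirror the comment-relevance output.
const RelevanceSchema = z.object({
  scores: z.array(z.object({ commentId: z.string(), relevance: z.number() })),
});

const client = new OpenAI();
const completion = await client.beta.chat.completions.parse({
  model: "gpt-4o-2024-08-06",
  messages: [{ role: "user", content: "Score the relevance of these comments ..." }],
  response_format: zodResponseFormat(RelevanceSchema, "relevance"),
});

// `parsed` is schema-validated, so truncated or malformed JSON is rejected by the SDK
// rather than silently breaking the evaluation. OpenRouter offers no such guarantee,
// which is why the generic retry still makes sense.
const result = completion.choices[0].message.parsed;
console.log(result?.scores);
```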


@whilefoo, this task has been idle for a while. Please provide an update.

@whilefoo
Member Author

Now that #225 is merged, I will continue with this.


@whilefoo, this task has been idle for a while. Please provide an update.

Successfully merging this pull request may close these issues.

Implement retry mechanism for LLM failures