Token generation limit #1
Having the same issue. Matt probably has a way higher token limit than most of us!
Same issue.
I believe this error is due to OpenAI rate-limiting your requests. You can either increase your rate limit or adjust the retry backoff. To request a rate limit increase, submit the form here: https://docs.google.com/forms/d/e/1FAIpQLSc6gSL3zfHFlL6gNIyUcjkEv29jModHGxg5_XGyr-PrE2LaHw/viewform. You can also learn more about error mitigation for model rate limits: https://platform.openai.com/docs/guides/rate-limits/error-mitigation.
The problem with increasing the rate limit, for me, is the increase in cost. The code I posted above is basically a simple retry backoff, and I haven't had any issues with it. I was hoping for a solution that caps the output token count as it approaches the rate limit, but that messes up the outputs.
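(The retry-backoff snippet referenced above isn't preserved in this thread. A minimal sketch of that kind of exponential backoff, assuming the pre-1.0 `openai` Python SDK; the function name and retry parameters here are illustrative, not the original code:)

```python
import random
import time

import openai

def completion_with_backoff(max_retries=6, **kwargs):
    """Retry openai.ChatCompletion.create with exponential backoff on rate limits."""
    delay = 1.0
    for attempt in range(max_retries):
        try:
            return openai.ChatCompletion.create(**kwargs)
        except openai.error.RateLimitError:
            if attempt == max_retries - 1:
                raise
            # Sleep with jitter, doubling the delay on each attempt.
            time.sleep(delay * (1 + random.random()))
            delay *= 2

response = completion_with_backoff(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello"}],
)
```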
You can try using the LiteLLM Router if you have multiple deployments of the same model; this will let you increase your effective rate limit:

```python
import asyncio
import os

from litellm import Router

model_list = [{  # list of model deployments
    "model_name": "gpt-3.5-turbo",  # model alias used by the router
    "litellm_params": {  # params for the litellm completion/embedding call
        "model": "azure/chatgpt-v-2",  # actual model name
        "api_key": os.getenv("AZURE_API_KEY"),
        "api_version": os.getenv("AZURE_API_VERSION"),
        "api_base": os.getenv("AZURE_API_BASE"),
    },
}, {
    "model_name": "gpt-3.5-turbo",
    "litellm_params": {
        "model": "azure/chatgpt-functioncalling",
        "api_key": os.getenv("AZURE_API_KEY"),
        "api_version": os.getenv("AZURE_API_VERSION"),
        "api_base": os.getenv("AZURE_API_BASE"),
    },
}, {
    "model_name": "gpt-3.5-turbo",
    "litellm_params": {
        "model": "gpt-3.5-turbo",
        "api_key": os.getenv("OPENAI_API_KEY"),
    },
}]

router = Router(model_list=model_list)

async def main():
    # Drop-in replacement for openai.ChatCompletion.create
    response = await router.acompletion(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Hey, how's it going?"}],
    )
    print(response)

asyncio.run(main())
```
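The Router picks among all deployments registered under the same `"gpt-3.5-turbo"` alias, so requests are spread across them and the individual deployments' rate limits effectively combine.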
I was getting the RateLimitReached error yesterday (after around the 80th generation; each prompt is around 10,000 tokens). My simple workaround is below, but is there a better way?
Here is a local version: https://github.com/xiscoding/local_gpt_llm_trainer