
Token generation limit #1

Open
xiscoding opened this issue Aug 14, 2023 · 5 comments

@xiscoding commented Aug 14, 2023

I was getting a RateLimitReached error yesterday (after around the 80th generation; each prompt is around 10,000 tokens). My simple workaround is below, but is there a better way?

import time

import openai

def generate_examples(tokenizer, prompt, number_of_examples):
    # Generate examples, pausing and retrying whenever the rate limit is hit.
    # generate_example and temperature are defined elsewhere in the trainer script.
    prev_examples = []
    i = 0
    while i < number_of_examples:
        try:
            print(f'Generating example {i}')
            prompt_tokens = tokenizer.tokenize(prompt)
            prev_examples_tokens = [tokenizer.tokenize(example) for example in prev_examples]
            total_tokens = len(prompt_tokens) + sum(len(tokens) for tokens in prev_examples_tokens)
            print(f'Tokens in prompt and previous examples: {total_tokens}')
            example = generate_example(prompt, prev_examples, temperature)
            print(example)
            prev_examples.append(example)
            i += 1
        except openai.error.RateLimitError:
            # Back off and retry the same example instead of skipping it
            print("RATELIMITREACHED: waiting 10 seconds")
            time.sleep(10)
    return prev_examples

Here is a local version: https://github.com/xiscoding/local_gpt_llm_trainer

@fredzannarbor

Having the same issue. Matt probably has a way higher token limit than most of us!

@nurena24

Same issue.

@tuanha1305

I believe this error is due to OpenAI rate-limiting your requests. You can either request a rate-limit increase or adjust your retry backoff. To request an increase, submit the form here: https://docs.google.com/forms/d/e/1FAIpQLSc6gSL3zfHFlL6gNIyUcjkEv29jModHGxg5_XGyr-PrE2LaHw/viewform. You can also learn more about rate limits and error mitigation here: https://platform.openai.com/docs/guides/rate-limits/error-mitigation.
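
If you go the backoff route, a plain exponential backoff helper needs no extra dependencies. A minimal sketch, assuming the pre-1.0 openai SDK (where rate-limit errors surface as openai.error.RateLimitError, as in the snippet above); with_backoff and the delay values are illustrative, not part of the trainer:

import time

import openai

def with_backoff(call, max_retries=5, base_delay=10):
    # Retry `call` with exponentially increasing delays on rate-limit errors
    for attempt in range(max_retries):
        try:
            return call()
        except openai.error.RateLimitError:
            delay = base_delay * (2 ** attempt)  # 10s, 20s, 40s, ...
            print(f"Rate limit hit, retrying in {delay} seconds")
            time.sleep(delay)
    raise RuntimeError("rate limit retries exhausted")

# usage: example = with_backoff(lambda: generate_example(prompt, prev_examples, temperature))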

@xiscoding (Author)

> I believe this error is due to OpenAI rate-limiting your requests. You can either request a rate-limit increase or adjust your retry backoff. To request an increase, submit the form here: https://docs.google.com/forms/d/e/1FAIpQLSc6gSL3zfHFlL6gNIyUcjkEv29jModHGxg5_XGyr-PrE2LaHw/viewform. You can also learn more about rate limits and error mitigation here: https://platform.openai.com/docs/guides/rate-limits/error-mitigation.

The problem with increasing the rate limit, for me, is the increase in cost. The code I posted above is basically a simple retry backoff, and I haven't had any issues with it. I was hoping for a solution that limits the token count of the request as it approaches the rate limit, but that messes up the outputs.
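
For the token-count idea, one option is to trim the oldest previous examples until the request fits a fixed budget, rather than truncating the output itself. A minimal sketch, assuming the same tokenizer and generate_example as above; trim_to_budget and the 6000-token budget are illustrative:

def trim_to_budget(tokenizer, prompt, prev_examples, max_tokens=6000):
    # Drop the oldest previous examples until prompt + examples fit the budget
    examples = list(prev_examples)
    def total_tokens():
        return len(tokenizer.tokenize(prompt)) + sum(
            len(tokenizer.tokenize(e)) for e in examples)
    while examples and total_tokens() > max_tokens:
        examples.pop(0)  # discard the oldest example first
    return examples

# usage inside the generation loop:
# example = generate_example(prompt, trim_to_budget(tokenizer, prompt, prev_examples), temperature)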

@ishaan-jaff commented Nov 22, 2023

You can try using the litellm Router if you have multiple deployments of the same model; this will let you increase your effective rate limit.
docs: https://docs.litellm.ai/docs/routing

import asyncio
import os

from litellm import Router

model_list = [{ # list of model deployments 
    "model_name": "gpt-3.5-turbo", # model alias 
    "litellm_params": { # params for litellm completion/embedding call 
        "model": "azure/chatgpt-v-2", # actual model name
        "api_key": os.getenv("AZURE_API_KEY"),
        "api_version": os.getenv("AZURE_API_VERSION"),
        "api_base": os.getenv("AZURE_API_BASE")
    }
}, {
    "model_name": "gpt-3.5-turbo", 
    "litellm_params": { # params for litellm completion/embedding call 
        "model": "azure/chatgpt-functioncalling", 
        "api_key": os.getenv("AZURE_API_KEY"),
        "api_version": os.getenv("AZURE_API_VERSION"),
        "api_base": os.getenv("AZURE_API_BASE")
    }
}, {
    "model_name": "gpt-3.5-turbo", 
    "litellm_params": { # params for litellm completion/embedding call 
        "model": "gpt-3.5-turbo", 
        "api_key": os.getenv("OPENAI_API_KEY"),
    }
}]

router = Router(model_list=model_list)

# openai.ChatCompletion.create replacement
# (acompletion is async, so it must be awaited inside an async function)
async def main():
    response = await router.acompletion(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Hey, how's it going?"}])
    print(response)

asyncio.run(main())
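
With this setup the Router spreads requests across the deployments that share the "gpt-3.5-turbo" alias, so the effective rate limit is roughly the sum of the individual deployments' limits (assuming each key has its own quota).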
