
[Usage]: vllm context length handling method #12146

Open
1 task done
whoo9112 opened this issue Jan 17, 2025 · 0 comments
Labels
usage How to use vllm

Comments

@whoo9112

Your current environment

I have a question that came up while testing with vLLM.
I ran RAG with the Gemma 2 model using vLLM on an A100 40GB.

We have observed that the retrieved document fragments sometimes cause the prompt sent to Gemma 2 to exceed the 8192-token max context length.

However, it still generates an answer. In this regard, may I know which function or method is responsible for handling the max context length?

How would you like to use vllm

No response
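No maintainer has answered yet, but the general behavior can be illustrated with a sketch. In vLLM, the maximum context length is controlled by the `max_model_len` engine argument; prompts longer than that limit are rejected rather than silently truncated. The function below is a hypothetical, simplified version of such a check, not vLLM's actual implementation (the real enforcement happens inside vLLM's engine during request processing), assuming the prompt is already tokenized into a list of token IDs.

```python
def check_context_length(prompt_tokens, max_model_len, max_new_tokens):
    """Hypothetical, simplified context-length check (not vLLM's code).

    Rejects prompts longer than max_model_len, and caps the number of
    new tokens so that prompt + generation fits in the context window.
    """
    if len(prompt_tokens) > max_model_len:
        raise ValueError(
            f"Prompt has {len(prompt_tokens)} tokens, which exceeds "
            f"the maximum model length of {max_model_len}."
        )
    if len(prompt_tokens) + max_new_tokens > max_model_len:
        # Cap generation so the total stays within the window.
        max_new_tokens = max_model_len - len(prompt_tokens)
    return max_new_tokens
```

For example, with an 8000-token prompt, an 8192-token window, and 512 requested new tokens, this sketch would cap generation at 192 tokens; a 9000-token prompt would be rejected outright.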

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
whoo9112 added the usage (How to use vllm) label on Jan 17, 2025