[Bug]: Qwen2.5-72B model hang/crash #61
Comments
cc @uaydonat
Looks like it happened again. I rebooted the system and reset the cards before running.
This was after 147 prompts completed in BBH:
I have not been able to reproduce this hang (which appears to occur during prefill, based on the logs) on …
Will do; it might be a while before I get to that, but thanks for trying to repro. For reference in this channel, I was able to repro 4x by sending ISL=3900/OSL=128 and ISL=4096/OSL=128, both at batch 32, over multiple batches of requests. The hang typically occurred after 64 to 256 requests.
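For anyone else trying to reproduce, here is a minimal sketch of that load pattern against vLLM's OpenAI-compatible completions endpoint. The endpoint URL, served model name, and the "hello "-repetition trick are assumptions (not from the original report); repeating a short word only approximates the stated ISL, since exact token counts depend on the tokenizer.

```python
# Repro sketch: batches of 32 concurrent fixed-length completions,
# alternating ISL between 3900 and 4096 with OSL=128.
# Assumptions: vLLM's OpenAI-compatible server is on localhost:8000
# and the served model name below matches your deployment.
import concurrent.futures
import requests

API_URL = "http://localhost:8000/v1/completions"  # assumed endpoint
MODEL = "Qwen/Qwen2.5-72B-Instruct"               # assumed served model name

def send_request(isl: int, osl: int) -> int:
    # Crude fixed-length prompt; "hello " is roughly one token per repeat
    # with most BPE tokenizers, so this only approximates the target ISL.
    prompt = "hello " * isl
    resp = requests.post(
        API_URL,
        json={"model": MODEL, "prompt": prompt, "max_tokens": osl},
        timeout=600,
    )
    resp.raise_for_status()
    return resp.status_code

# 16 batches x 32 requests = 512 total; the hang reportedly showed up
# within the first 64-256 requests.
with concurrent.futures.ThreadPoolExecutor(max_workers=32) as pool:
    for batch in range(16):
        isl = 3900 if batch % 2 == 0 else 4096
        futures = [pool.submit(send_request, isl, 128) for _ in range(32)]
        for f in concurrent.futures.as_completed(futures):
            f.result()
        print(f"batch {batch} done (ISL={isl})")
```

If the hang reproduces, the script should stall mid-batch (requests stop returning) rather than raise, which matches the prefill-stage symptom described above.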
Your current environment
Ubuntu 20.04
tt-metal (hf-llama branch as of yesterday): https://github.com/tenstorrent/tt-metal/tree/aac80d111840ccc324a105d499060e814ca7f2c0
vllm (dev pinned commit): https://github.com/tenstorrent/vllm/tree/b9564bf364e95a3850619fc7b2ed968cc71e30b7
Running with: https://github.com/tenstorrent/tt-inference-server/blob/tstesco/qwen25-support/vllm-tt-metal-llama3/src/run_vllm_api_server.py
Model Input Dumps
No response
🐛 Describe the bug
After a few hundred completions during a BBH eval run, I got the crash shown below.
Repro:
Crash log (full log attached):
2025-02-05-qwen25-72b-crash.log