
[Bug]: Llama-3.2-11B-Vision-Instruct: block table already exists #55

Open
tstescoTT opened this issue Jan 23, 2025 · 2 comments
Labels: bug (Something isn't working)

@tstescoTT

Your current environment

Model: Llama-3.2-11B-Vision-Instruct
TT device: N300
MESH_DEVICE=N300

Docker image: ghcr.io/tenstorrent/tt-inference-server/tt-metal-llama3-70b-src-base-vllm-ubuntu-22.04-amd64:v0.0.1-47fb1a2fb6e0-2f33504bad49

tt-metal branch: main (last verified commit: 47fb1a2)
vLLM branch: dev (last verified commit: 2f33504)

Model Input Dumps

No response

🐛 Describe the bug

I hit this error a few times when running Llama-3.2-11B-Vision-Instruct in vLLM with successive batches of requests.

Repro script: https://github.com/tenstorrent/tt-inference-server/blob/tstesco/dev/utils/prompt_client_cli.py

python prompt_client_cli.py \
    --num_prompts 1000 \
    --batch_size 16 \
    --tokenizer_model meta-llama/Llama-3.2-11B-Vision-Instruct \
    --distribution uniform \
    --max_prompt_length 2048 \
    --output_seq_len 2048 \
    --include_images \
    --image_width 512 \
    --image_height 512 \
    --use_chat_api \
    --skip_trace_precapture

vLLM logs

2025-01-23 06:31:03.164 | INFO     | models.demos.llama3.tt.generator:prefill_forward:392 - Prefilling User 2
2025-01-23 06:31:20.781 | INFO     | models.demos.llama3.tt.generator:prefill_forward:415 - Finished prefill for all users up to 1355 tokens, Starting decode...
INFO 01-23 06:31:20 engine.py:291] Added request chat-acfbf7706b2e43f097050667073e45db.
INFO 01-23 06:31:20 engine.py:291] Added request chat-c1bad9a30af946bcbd8f816e755f7e2f.
INFO 01-23 06:31:20 engine.py:291] Added request chat-6dcbf97b82f44f09a0ef0edda5826cd1.
INFO 01-23 06:31:20 engine.py:291] Added request chat-876baf03e5ea460493f26da5e1fa1d31.
INFO 01-23 06:31:20 engine.py:291] Added request chat-99395597fa604cdc850617b7dc454141.
INFO 01-23 06:31:20 engine.py:291] Added request chat-4276c1b3855c4521a216361466bee5cf.
INFO 01-23 06:31:20 engine.py:291] Added request chat-761d8db4be254b2baba1ece31dae19ec.
INFO 01-23 06:31:20 engine.py:291] Added request chat-858105b6143249938484930e1de1f540.
INFO 01-23 06:31:20 engine.py:291] Added request chat-e73c4d0cf147471d86d68b19bfffd1f5.
INFO 01-23 06:31:20 engine.py:291] Added request chat-bfa94d273cc14891a1cb079634c2e779.
INFO 01-23 06:31:20 engine.py:291] Added request chat-55f4e116fcfa44d29922709a2022818b.
INFO 01-23 06:31:20 engine.py:291] Added request chat-1a1fa5ce5889464c8cf6e4c6dab0fb27.
INFO 01-23 06:31:20 engine.py:291] Added request chat-ed093b5f24404e0db1d28f6d9f915147.
INFO 01-23 06:31:20 engine.py:291] Added request chat-208d21d33e2f4fbe97955a054d615dfd.
2025-01-23 06:31:20.854 | INFO     | models.demos.llama3.tt.generator:prefill_forward:392 - Prefilling User 1
2025-01-23 06:31:25.035 | INFO     | models.demos.llama3.tt.generator:prefill_forward:392 - Prefilling User 2
2025-01-23 06:31:29.867 | INFO     | models.demos.llama3.tt.generator:prefill_forward:392 - Prefilling User 3
DEBUG 01-23 06:31:35 client.py:154] Heartbeat successful.
DEBUG 01-23 06:31:38 client.py:170] Waiting for output from MQLLMEngine.
2025-01-23 06:31:46.061 | INFO     | models.demos.llama3.tt.generator:prefill_forward:392 - Prefilling User 4
2025-01-23 06:31:50.894 | INFO     | models.demos.llama3.tt.generator:prefill_forward:392 - Prefilling User 5
2025-01-23 06:32:07.262 | INFO     | models.demos.llama3.tt.generator:prefill_forward:392 - Prefilling User 6
2025-01-23 06:32:11.291 | INFO     | models.demos.llama3.tt.generator:prefill_forward:392 - Prefilling User 7
DEBUG 01-23 06:32:16 client.py:154] Heartbeat successful.
2025-01-23 06:32:27.381 | INFO     | models.demos.llama3.tt.generator:prefill_forward:392 - Prefilling User 8
2025-01-23 06:32:30.794 | INFO     | models.demos.llama3.tt.generator:prefill_forward:392 - Prefilling User 9
2025-01-23 06:32:35.063 | INFO     | models.demos.llama3.tt.generator:prefill_forward:392 - Prefilling User 10
2025-01-23 06:32:39.741 | INFO     | models.demos.llama3.tt.generator:prefill_forward:392 - Prefilling User 11
2025-01-23 06:32:43.717 | INFO     | models.demos.llama3.tt.generator:prefill_forward:392 - Prefilling User 12
2025-01-23 06:32:48.788 | INFO     | models.demos.llama3.tt.generator:prefill_forward:392 - Prefilling User 13
2025-01-23 06:32:53.521 | INFO     | models.demos.llama3.tt.generator:prefill_forward:392 - Prefilling User 14
DEBUG 01-23 06:32:56 client.py:154] Heartbeat successful.
2025-01-23 06:32:58.786 | INFO     | models.demos.llama3.tt.generator:prefill_forward:415 - Finished prefill for all users up to 1928 tokens, Starting decode...
INFO 01-23 06:32:58 metrics.py:396] Avg prompt throughput: 13.1 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 16 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 89.8%, CPU KV cache usage: 0.0%.
2025-01-23 06:33:20.416 | INFO     | models.demos.llama3.tt.generator:_capture_trace:551 - Done Compiling Model
2025-01-23 06:33:20.480 | INFO     | models.demos.llama3.tt.generator:_capture_trace:622 - Done Capturing Decode Trace
INFO 01-23 06:33:20 metrics.py:396] Avg prompt throughput: 536.6 tokens/s, Avg generation throughput: 0.6 tokens/s, Running: 16 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 89.8%, CPU KV cache usage: 0.0%.
INFO 01-23 06:33:26 metrics.py:396] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 16.6 tokens/s, Running: 16 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 90.6%, CPU KV cache usage: 0.0%.
INFO 01-23 06:33:32 metrics.py:396] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 16.8 tokens/s, Running: 16 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 91.4%, CPU KV cache usage: 0.0%.
DEBUG 01-23 06:33:36 client.py:154] Heartbeat successful.
INFO 01-23 06:33:37 metrics.py:396] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 16.7 tokens/s, Running: 16 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 92.1%, CPU KV cache usage: 0.0%.
INFO 01-23 06:33:43 metrics.py:396] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 16.7 tokens/s, Running: 16 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 92.7%, CPU KV cache usage: 0.0%.
INFO 01-23 06:33:49 metrics.py:396] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 16.7 tokens/s, Running: 16 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 93.5%, CPU KV cache usage: 0.0%.
INFO 01-23 06:33:55 metrics.py:396] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 16.7 tokens/s, Running: 16 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 94.2%, CPU KV cache usage: 0.0%.
INFO 01-23 06:34:00 metrics.py:396] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 16.6 tokens/s, Running: 16 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 94.9%, CPU KV cache usage: 0.0%.
INFO 01-23 06:34:06 metrics.py:396] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 16.6 tokens/s, Running: 16 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 95.6%, CPU KV cache usage: 0.0%.
INFO 01-23 06:34:12 metrics.py:396] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 16.2 tokens/s, Running: 16 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 96.4%, CPU KV cache usage: 0.0%.
DEBUG 01-23 06:34:16 client.py:154] Heartbeat successful.
INFO 01-23 06:34:18 metrics.py:396] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 16.6 tokens/s, Running: 16 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 97.1%, CPU KV cache usage: 0.0%.
INFO 01-23 06:34:24 metrics.py:396] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 16.6 tokens/s, Running: 16 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 97.9%, CPU KV cache usage: 0.0%.
INFO 01-23 06:34:29 metrics.py:396] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 16.5 tokens/s, Running: 16 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 98.6%, CPU KV cache usage: 0.0%.
INFO 01-23 06:34:35 metrics.py:396] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 16.5 tokens/s, Running: 16 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 99.4%, CPU KV cache usage: 0.0%.
WARNING 01-23 06:34:41 scheduler.py:1483] Sequence group chat-208d21d33e2f4fbe97955a054d615dfd is preempted by PreemptionMode.RECOMPUTE mode because there is not enough KV cache space. This can affect the end-to-end performance. Increase gpu_memory_utilization or tensor_parallel_size to provide more KV cache memory. total_num_cumulative_preemption=1
INFO 01-23 06:34:41 metrics.py:396] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 16.2 tokens/s, Running: 15 reqs, Swapped: 0 reqs, Pending: 1 reqs, GPU KV cache usage: 98.2%, CPU KV cache usage: 0.0%.
INFO 01-23 06:34:47 metrics.py:396] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 15.5 tokens/s, Running: 15 reqs, Swapped: 0 reqs, Pending: 1 reqs, GPU KV cache usage: 98.8%, CPU KV cache usage: 0.0%.
INFO 01-23 06:34:53 metrics.py:396] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 15.4 tokens/s, Running: 15 reqs, Swapped: 0 reqs, Pending: 1 reqs, GPU KV cache usage: 99.4%, CPU KV cache usage: 0.0%.
DEBUG 01-23 06:34:56 client.py:154] Heartbeat successful.
INFO 01-23 06:34:59 metrics.py:396] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 14.9 tokens/s, Running: 14 reqs, Swapped: 0 reqs, Pending: 2 reqs, GPU KV cache usage: 98.9%, CPU KV cache usage: 0.0%.
INFO 01-23 06:35:05 metrics.py:396] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 14.4 tokens/s, Running: 14 reqs, Swapped: 0 reqs, Pending: 2 reqs, GPU KV cache usage: 99.6%, CPU KV cache usage: 0.0%.
INFO 01-23 06:35:11 metrics.py:396] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 13.9 tokens/s, Running: 13 reqs, Swapped: 0 reqs, Pending: 3 reqs, GPU KV cache usage: 98.2%, CPU KV cache usage: 0.0%.
INFO 01-23 06:35:16 metrics.py:396] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 13.4 tokens/s, Running: 13 reqs, Swapped: 0 reqs, Pending: 3 reqs, GPU KV cache usage: 98.7%, CPU KV cache usage: 0.0%.
INFO 01-23 06:35:22 metrics.py:396] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 13.3 tokens/s, Running: 13 reqs, Swapped: 0 reqs, Pending: 3 reqs, GPU KV cache usage: 99.3%, CPU KV cache usage: 0.0%.
INFO 01-23 06:35:28 metrics.py:396] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 13.3 tokens/s, Running: 13 reqs, Swapped: 0 reqs, Pending: 3 reqs, GPU KV cache usage: 99.9%, CPU KV cache usage: 0.0%.
INFO 01-23 06:35:34 metrics.py:396] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 12.3 tokens/s, Running: 12 reqs, Swapped: 0 reqs, Pending: 4 reqs, GPU KV cache usage: 99.0%, CPU KV cache usage: 0.0%.
DEBUG 01-23 06:35:36 client.py:154] Heartbeat successful.
INFO 01-23 06:35:40 metrics.py:396] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 12.3 tokens/s, Running: 12 reqs, Swapped: 0 reqs, Pending: 4 reqs, GPU KV cache usage: 99.6%, CPU KV cache usage: 0.0%.
INFO 01-23 06:35:46 metrics.py:396] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 11.6 tokens/s, Running: 11 reqs, Swapped: 0 reqs, Pending: 5 reqs, GPU KV cache usage: 98.1%, CPU KV cache usage: 0.0%.
INFO 01-23 06:35:52 metrics.py:396] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 11.2 tokens/s, Running: 11 reqs, Swapped: 0 reqs, Pending: 5 reqs, GPU KV cache usage: 98.6%, CPU KV cache usage: 0.0%.
INFO 01-23 06:35:58 metrics.py:396] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 11.3 tokens/s, Running: 11 reqs, Swapped: 0 reqs, Pending: 5 reqs, GPU KV cache usage: 99.1%, CPU KV cache usage: 0.0%.
INFO 01-23 06:36:04 metrics.py:396] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 11.2 tokens/s, Running: 11 reqs, Swapped: 0 reqs, Pending: 5 reqs, GPU KV cache usage: 99.7%, CPU KV cache usage: 0.0%.
INFO 01-23 06:36:09 metrics.py:396] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 10.7 tokens/s, Running: 10 reqs, Swapped: 0 reqs, Pending: 6 reqs, GPU KV cache usage: 97.7%, CPU KV cache usage: 0.0%.
INFO 01-23 06:36:14 metrics.py:396] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 10.2 tokens/s, Running: 10 reqs, Swapped: 0 reqs, Pending: 6 reqs, GPU KV cache usage: 98.2%, CPU KV cache usage: 0.0%.
DEBUG 01-23 06:36:16 client.py:154] Heartbeat successful.
INFO 01-23 06:36:20 metrics.py:396] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 10.2 tokens/s, Running: 10 reqs, Swapped: 0 reqs, Pending: 6 reqs, GPU KV cache usage: 98.6%, CPU KV cache usage: 0.0%.
INFO 01-23 06:36:26 metrics.py:396] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 10.2 tokens/s, Running: 10 reqs, Swapped: 0 reqs, Pending: 6 reqs, GPU KV cache usage: 99.1%, CPU KV cache usage: 0.0%.
INFO 01-23 06:36:32 metrics.py:396] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 10.1 tokens/s, Running: 10 reqs, Swapped: 0 reqs, Pending: 6 reqs, GPU KV cache usage: 99.5%, CPU KV cache usage: 0.0%.
INFO 01-23 06:36:38 metrics.py:396] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 10.1 tokens/s, Running: 10 reqs, Swapped: 0 reqs, Pending: 6 reqs, GPU KV cache usage: 99.9%, CPU KV cache usage: 0.0%.
...
ERROR 01-23 06:36:40 engine.py:159] AssertionError('block table already exists')
ERROR 01-23 06:36:40 engine.py:159] Traceback (most recent call last):
ERROR 01-23 06:36:40 engine.py:159]   File "/home/user/vllm/vllm/engine/multiprocessing/engine.py", line 157, in start
ERROR 01-23 06:36:40 engine.py:159]     self.run_engine_loop() 
ERROR 01-23 06:36:40 engine.py:159]   File "/home/user/vllm/vllm/engine/multiprocessing/engine.py", line 220, in run_engine_loop
ERROR 01-23 06:36:40 engine.py:159]     request_outputs = self.engine_step()
ERROR 01-23 06:36:40 engine.py:159]   File "/home/user/vllm/vllm/engine/multiprocessing/engine.py", line 238, in engine_step 
ERROR 01-23 06:36:40 engine.py:159]     raise e
ERROR 01-23 06:36:40 engine.py:159]   File "/home/user/vllm/vllm/engine/multiprocessing/engine.py", line 229, in engine_step 
ERROR 01-23 06:36:40 engine.py:159]     return self.engine.step()
ERROR 01-23 06:36:40 engine.py:159]   File "/home/user/vllm/vllm/engine/llm_engine.py", line 1358, in step
ERROR 01-23 06:36:40 engine.py:159]     ) = self.scheduler[virtual_engine].schedule()
ERROR 01-23 06:36:40 engine.py:159]   File "/home/user/vllm/vllm/core/scheduler.py", line 1219, in schedule
ERROR 01-23 06:36:40 engine.py:159]     scheduler_outputs: SchedulerOutputs = self._schedule()
ERROR 01-23 06:36:40 engine.py:159]   File "/home/user/vllm/vllm/core/scheduler.py", line 1178, in _schedule
ERROR 01-23 06:36:40 engine.py:159]     return self._schedule_default()
ERROR 01-23 06:36:40 engine.py:159]   File "/home/user/vllm/vllm/core/scheduler.py", line 1013, in _schedule_default
ERROR 01-23 06:36:40 engine.py:159]     prefills = self._schedule_prefills(budget,
ERROR 01-23 06:36:40 engine.py:159]   File "/home/user/vllm/vllm/core/scheduler.py", line 949, in _schedule_prefills
ERROR 01-23 06:36:40 engine.py:159]     self._allocate_and_set_running(seq_group)
ERROR 01-23 06:36:40 engine.py:159]   File "/home/user/vllm/vllm/core/scheduler.py", line 1413, in _allocate_and_set_running 
ERROR 01-23 06:36:40 engine.py:159]     self.block_manager.allocate(seq_group)
ERROR 01-23 06:36:40 engine.py:159]   File "/home/user/vllm/vllm/core/block_manager.py", line 190, in allocate
ERROR 01-23 06:36:40 engine.py:159]     assert (request_id
ERROR 01-23 06:36:40 engine.py:159] AssertionError: block table already exists
...
ERROR:    Exception in ASGI application
...
vllm.engine.multiprocessing.MQEngineDeadError: Engine loop is not running. Inspect the stacktrace to find the original error: AssertionError('block table already exists').
CRITICAL 01-23 06:36:40 launcher.py:99] MQLLMEngine is already dead, terminating server process
INFO:     127.0.0.1:58226 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
INFO:     Shutting down
INFO:     Waiting for application shutdown.
INFO:     Application shutdown complete.
INFO:     Finished server process [37226]
DEBUG 01-23 06:36:43 client.py:157] Shutting down MQLLMEngineClient check health loop.
DEBUG 01-23 06:36:43 client.py:224] Shutting down MQLLMEngineClient output handler.
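For context, here is a minimal sketch (an assumed shape for illustration, not vLLM's actual implementation) of the invariant the traceback points at: `block_manager.allocate` asserts that a block table does not already exist for the request being scheduled, so a preempted-and-recomputed request that gets rescheduled before its old block table is freed would trip exactly this assertion. The `BlockManagerSketch` class and request id below are hypothetical.

```python
class BlockManagerSketch:
    """Toy stand-in for vLLM's block manager allocation guard."""

    def __init__(self):
        self.block_tables = {}  # request_id -> list of block ids

    def allocate(self, request_id, num_blocks):
        # The invariant from the traceback: allocating for a request that
        # already has a block table is treated as a scheduler bug.
        assert request_id not in self.block_tables, "block table already exists"
        self.block_tables[request_id] = list(range(num_blocks))

    def free(self, request_id):
        # Preemption with RECOMPUTE should free blocks before rescheduling.
        self.block_tables.pop(request_id, None)


mgr = BlockManagerSketch()
mgr.allocate("chat-example", num_blocks=4)

# If the free() step is skipped before the request is rescheduled,
# re-allocation trips the same AssertionError seen in the logs.
try:
    mgr.allocate("chat-example", num_blocks=4)
    reproduced = False
except AssertionError:
    reproduced = True  # reproduced is now True
```

This is only meant to show which state the assertion protects; whether the preemption path is actually the trigger here would need confirmation from the vLLM scheduler code.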

tstescoTT added the bug (Something isn't working) label on Jan 23, 2025
@tstescoTT (Author)

I also sometimes see hangs during prefill with the same workload on N300. In the logs below I had bumped VLLM_RPC_TIMEOUT to 15 minutes to rule out a slow prefill:

INFO 01-24 01:32:47 engine.py:291] Added request chat-ff8f90f321124398bc24ff642f1247c4.
INFO 01-24 01:32:47 engine.py:291] Added request chat-662a199459da43f180bbc26947e968c7.
2025-01-24 01:32:47.303 | INFO     | models.demos.llama3.tt.generator:prefill_forward:392 - Prefilling User 1
2025-01-24 01:32:50.630 | INFO     | models.demos.llama3.tt.generator:prefill_forward:392 - Prefilling User 2
2025-01-24 01:32:54.265 | INFO     | models.demos.llama3.tt.generator:prefill_forward:392 - Prefilling User 3
2025-01-24 01:32:57.918 | INFO     | models.demos.llama3.tt.generator:prefill_forward:392 - Prefilling User 4
2025-01-24 01:33:01.514 | INFO     | models.demos.llama3.tt.generator:prefill_forward:392 - Prefilling User 5
2025-01-24 01:33:05.184 | INFO     | models.demos.llama3.tt.generator:prefill_forward:392 - Prefilling User 6
2025-01-24 01:33:08.932 | INFO     | models.demos.llama3.tt.generator:prefill_forward:392 - Prefilling User 7
2025-01-24 01:33:12.989 | INFO     | models.demos.llama3.tt.generator:prefill_forward:392 - Prefilling User 8
2025-01-24 01:33:17.748 | INFO     | models.demos.llama3.tt.generator:prefill_forward:392 - Prefilling User 9
DEBUG 01-24 01:33:21 client.py:154] Heartbeat successful.
2025-01-24 01:33:23.628 | INFO     | models.demos.llama3.tt.generator:prefill_forward:392 - Prefilling User 10
DEBUG 01-24 01:47:47 client.py:170] Waiting for output from MQLLMEngine.
ERROR 01-24 01:48:21 client.py:250] TimeoutError('No heartbeat received from MQLLMEngine')
ERROR 01-24 01:48:21 client.py:250] NoneType: None
DEBUG 01-24 01:48:21 client.py:144] Shutting down MQLLMEngineClient check health loop due to timeout
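For reference, the timeout bump mentioned above can be applied through vLLM's `VLLM_RPC_TIMEOUT` environment variable before launching the server. The value is read in milliseconds (worth double-checking against your vLLM version), so 15 minutes is 15 * 60 * 1000:

```shell
# Raise the MQLLMEngine client RPC/heartbeat timeout to 15 minutes
# (VLLM_RPC_TIMEOUT is in milliseconds), then start the server as usual.
export VLLM_RPC_TIMEOUT=900000
echo "VLLM_RPC_TIMEOUT=${VLLM_RPC_TIMEOUT}"
```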

@tstescoTT (Author)

Occurred again on https://github.com/tenstorrent/vllm/tree/b9564bf364e95a3850619fc7b2ed968cc71e30b7

  File "/home/container_app_user/vllm/vllm/utils.py", line 506, in merge_async_iterators
    item = await d
  File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/responses.py", line 253, in wrap
    await func()
  File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/responses.py", line 242, in stream_response
    async for chunk in self.body_iterator:
  File "/home/container_app_user/vllm/vllm/entrypoints/openai/serving_completion.py", line 262, in completion_stream_generator
    async for prompt_idx, res in result_generator:
  File "/home/container_app_user/vllm/vllm/utils.py", line 506, in merge_async_iterators
    item = await d
  File "/home/container_app_user/vllm/vllm/engine/multiprocessing/client.py", line 598, in _process_request
    raise request_output
  File "/home/container_app_user/vllm/vllm/engine/multiprocessing/client.py", line 598, in _process_request
    raise request_output
  File "/home/container_app_user/vllm/vllm/engine/multiprocessing/client.py", line 598, in _process_request
    raise request_output
  [Previous line repeated 95 more times]
AssertionError: block table already exists
                 Device | INFO     | Closing user mode device drivers
CRITICAL 02-02 23:11:55 launcher.py:99] MQLLMEngine is already dead, terminating server process
INFO:     127.0.0.1:45676 - "POST /v1/completions HTTP/1.1" 500 Internal Server Error
INFO:     Shutting down
INFO:     Waiting for application shutdown.
INFO:     Application shutdown complete.
INFO:     Finished server process [37]
DEBUG 02-02 23:11:55 client.py:157] Shutting down MQLLMEngineClient check health loop.
DEBUG 02-02 23:11:55 client.py:224] Shutting down MQLLMEngineClient output handler.
                 Device | INFO     | Closing user mode device drivers
