
[Bug]: Llama-3.2-11B-Vision-Instruct: block table already exists #55

Open
tstescoTT opened this issue Jan 23, 2025 · 2 comments
Labels: bug (Something isn't working)

@tstescoTT

Your current environment

Model: Llama-3.2-11B-Vision-Instruct
TT device: N300
MESH_DEVICE=N300

Docker image: ghcr.io/tenstorrent/tt-inference-server/tt-metal-llama3-70b-src-base-vllm-ubuntu-22.04-amd64:v0.0.1-47fb1a2fb6e0-2f33504bad49

tt-metal branch: main (last verified commit: 47fb1a2)
vLLM branch: dev (last verified commit: 2f33504)

Model Input Dumps

No response

🐛 Describe the bug

I hit this error a few times when running Llama-3.2-11B-Vision-Instruct in vLLM with successive batches of requests.

Repro script: https://github.com/tenstorrent/tt-inference-server/blob/tstesco/dev/utils/prompt_client_cli.py

python prompt_client_cli.py \
    --num_prompts 1000 \
    --batch_size 16 \
    --tokenizer_model meta-llama/Llama-3.2-11B-Vision-Instruct \
    --distribution uniform \
    --max_prompt_length 2048 \
    --output_seq_len 2048 \
    --include_images \
    --image_width 512 \
    --image_height 512 \
    --use_chat_api \
    --skip_trace_precapture

vLLM logs

2025-01-23 06:31:03.164 | INFO     | models.demos.llama3.tt.generator:prefill_forward:392 - Prefilling User 2
2025-01-23 06:31:20.781 | INFO     | models.demos.llama3.tt.generator:prefill_forward:415 - Finished prefill for all users up to 1355 tokens, Starting decode...
INFO 01-23 06:31:20 engine.py:291] Added request chat-acfbf7706b2e43f097050667073e45db.
INFO 01-23 06:31:20 engine.py:291] Added request chat-c1bad9a30af946bcbd8f816e755f7e2f.
INFO 01-23 06:31:20 engine.py:291] Added request chat-6dcbf97b82f44f09a0ef0edda5826cd1.
INFO 01-23 06:31:20 engine.py:291] Added request chat-876baf03e5ea460493f26da5e1fa1d31.
INFO 01-23 06:31:20 engine.py:291] Added request chat-99395597fa604cdc850617b7dc454141.
INFO 01-23 06:31:20 engine.py:291] Added request chat-4276c1b3855c4521a216361466bee5cf.
INFO 01-23 06:31:20 engine.py:291] Added request chat-761d8db4be254b2baba1ece31dae19ec.
INFO 01-23 06:31:20 engine.py:291] Added request chat-858105b6143249938484930e1de1f540.
INFO 01-23 06:31:20 engine.py:291] Added request chat-e73c4d0cf147471d86d68b19bfffd1f5.
INFO 01-23 06:31:20 engine.py:291] Added request chat-bfa94d273cc14891a1cb079634c2e779.
INFO 01-23 06:31:20 engine.py:291] Added request chat-55f4e116fcfa44d29922709a2022818b.
INFO 01-23 06:31:20 engine.py:291] Added request chat-1a1fa5ce5889464c8cf6e4c6dab0fb27.
INFO 01-23 06:31:20 engine.py:291] Added request chat-ed093b5f24404e0db1d28f6d9f915147.
INFO 01-23 06:31:20 engine.py:291] Added request chat-208d21d33e2f4fbe97955a054d615dfd.
2025-01-23 06:31:20.854 | INFO     | models.demos.llama3.tt.generator:prefill_forward:392 - Prefilling User 1
2025-01-23 06:31:25.035 | INFO     | models.demos.llama3.tt.generator:prefill_forward:392 - Prefilling User 2
2025-01-23 06:31:29.867 | INFO     | models.demos.llama3.tt.generator:prefill_forward:392 - Prefilling User 3
DEBUG 01-23 06:31:35 client.py:154] Heartbeat successful.
DEBUG 01-23 06:31:38 client.py:170] Waiting for output from MQLLMEngine.
2025-01-23 06:31:46.061 | INFO     | models.demos.llama3.tt.generator:prefill_forward:392 - Prefilling User 4
2025-01-23 06:31:50.894 | INFO     | models.demos.llama3.tt.generator:prefill_forward:392 - Prefilling User 5
2025-01-23 06:32:07.262 | INFO     | models.demos.llama3.tt.generator:prefill_forward:392 - Prefilling User 6
2025-01-23 06:32:11.291 | INFO     | models.demos.llama3.tt.generator:prefill_forward:392 - Prefilling User 7
DEBUG 01-23 06:32:16 client.py:154] Heartbeat successful.
2025-01-23 06:32:27.381 | INFO     | models.demos.llama3.tt.generator:prefill_forward:392 - Prefilling User 8
2025-01-23 06:32:30.794 | INFO     | models.demos.llama3.tt.generator:prefill_forward:392 - Prefilling User 9
2025-01-23 06:32:35.063 | INFO     | models.demos.llama3.tt.generator:prefill_forward:392 - Prefilling User 10
2025-01-23 06:32:39.741 | INFO     | models.demos.llama3.tt.generator:prefill_forward:392 - Prefilling User 11
2025-01-23 06:32:43.717 | INFO     | models.demos.llama3.tt.generator:prefill_forward:392 - Prefilling User 12
2025-01-23 06:32:48.788 | INFO     | models.demos.llama3.tt.generator:prefill_forward:392 - Prefilling User 13
2025-01-23 06:32:53.521 | INFO     | models.demos.llama3.tt.generator:prefill_forward:392 - Prefilling User 14
DEBUG 01-23 06:32:56 client.py:154] Heartbeat successful.
2025-01-23 06:32:58.786 | INFO     | models.demos.llama3.tt.generator:prefill_forward:415 - Finished prefill for all users up to 1928 tokens, Starting decode...
INFO 01-23 06:32:58 metrics.py:396] Avg prompt throughput: 13.1 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 16 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 89.8%, CPU KV cache usage: 0.0%.
2025-01-23 06:33:20.416 | INFO     | models.demos.llama3.tt.generator:_capture_trace:551 - Done Compiling Model
2025-01-23 06:33:20.480 | INFO     | models.demos.llama3.tt.generator:_capture_trace:622 - Done Capturing Decode Trace
INFO 01-23 06:33:20 metrics.py:396] Avg prompt throughput: 536.6 tokens/s, Avg generation throughput: 0.6 tokens/s, Running: 16 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 89.8%, CPU KV cache usage: 0.0%.
INFO 01-23 06:33:26 metrics.py:396] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 16.6 tokens/s, Running: 16 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 90.6%, CPU KV cache usage: 0.0%.
INFO 01-23 06:33:32 metrics.py:396] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 16.8 tokens/s, Running: 16 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 91.4%, CPU KV cache usage: 0.0%.
DEBUG 01-23 06:33:36 client.py:154] Heartbeat successful.
INFO 01-23 06:33:37 metrics.py:396] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 16.7 tokens/s, Running: 16 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 92.1%, CPU KV cache usage: 0.0%.
INFO 01-23 06:33:43 metrics.py:396] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 16.7 tokens/s, Running: 16 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 92.7%, CPU KV cache usage: 0.0%.
INFO 01-23 06:33:49 metrics.py:396] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 16.7 tokens/s, Running: 16 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 93.5%, CPU KV cache usage: 0.0%.
INFO 01-23 06:33:55 metrics.py:396] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 16.7 tokens/s, Running: 16 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 94.2%, CPU KV cache usage: 0.0%.
INFO 01-23 06:34:00 metrics.py:396] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 16.6 tokens/s, Running: 16 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 94.9%, CPU KV cache usage: 0.0%.
INFO 01-23 06:34:06 metrics.py:396] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 16.6 tokens/s, Running: 16 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 95.6%, CPU KV cache usage: 0.0%.
INFO 01-23 06:34:12 metrics.py:396] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 16.2 tokens/s, Running: 16 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 96.4%, CPU KV cache usage: 0.0%.
DEBUG 01-23 06:34:16 client.py:154] Heartbeat successful.
INFO 01-23 06:34:18 metrics.py:396] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 16.6 tokens/s, Running: 16 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 97.1%, CPU KV cache usage: 0.0%.
INFO 01-23 06:34:24 metrics.py:396] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 16.6 tokens/s, Running: 16 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 97.9%, CPU KV cache usage: 0.0%.
INFO 01-23 06:34:29 metrics.py:396] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 16.5 tokens/s, Running: 16 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 98.6%, CPU KV cache usage: 0.0%.
INFO 01-23 06:34:35 metrics.py:396] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 16.5 tokens/s, Running: 16 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 99.4%, CPU KV cache usage: 0.0%.
WARNING 01-23 06:34:41 scheduler.py:1483] Sequence group chat-208d21d33e2f4fbe97955a054d615dfd is preempted by PreemptionMode.RECOMPUTE mode because there is not enough KV cache space. This can affect the end-to-end performance. Increase gpu_memory_utilization or tensor_parallel_size to provide more KV cache memory. total_num_cumulative_preemption=1
INFO 01-23 06:34:41 metrics.py:396] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 16.2 tokens/s, Running: 15 reqs, Swapped: 0 reqs, Pending: 1 reqs, GPU KV cache usage: 98.2%, CPU KV cache usage: 0.0%.
INFO 01-23 06:34:47 metrics.py:396] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 15.5 tokens/s, Running: 15 reqs, Swapped: 0 reqs, Pending: 1 reqs, GPU KV cache usage: 98.8%, CPU KV cache usage: 0.0%.
INFO 01-23 06:34:53 metrics.py:396] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 15.4 tokens/s, Running: 15 reqs, Swapped: 0 reqs, Pending: 1 reqs, GPU KV cache usage: 99.4%, CPU KV cache usage: 0.0%.
DEBUG 01-23 06:34:56 client.py:154] Heartbeat successful.
INFO 01-23 06:34:59 metrics.py:396] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 14.9 tokens/s, Running: 14 reqs, Swapped: 0 reqs, Pending: 2 reqs, GPU KV cache usage: 98.9%, CPU KV cache usage: 0.0%.
INFO 01-23 06:35:05 metrics.py:396] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 14.4 tokens/s, Running: 14 reqs, Swapped: 0 reqs, Pending: 2 reqs, GPU KV cache usage: 99.6%, CPU KV cache usage: 0.0%.
INFO 01-23 06:35:11 metrics.py:396] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 13.9 tokens/s, Running: 13 reqs, Swapped: 0 reqs, Pending: 3 reqs, GPU KV cache usage: 98.2%, CPU KV cache usage: 0.0%.
INFO 01-23 06:35:16 metrics.py:396] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 13.4 tokens/s, Running: 13 reqs, Swapped: 0 reqs, Pending: 3 reqs, GPU KV cache usage: 98.7%, CPU KV cache usage: 0.0%.
INFO 01-23 06:35:22 metrics.py:396] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 13.3 tokens/s, Running: 13 reqs, Swapped: 0 reqs, Pending: 3 reqs, GPU KV cache usage: 99.3%, CPU KV cache usage: 0.0%.
INFO 01-23 06:35:28 metrics.py:396] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 13.3 tokens/s, Running: 13 reqs, Swapped: 0 reqs, Pending: 3 reqs, GPU KV cache usage: 99.9%, CPU KV cache usage: 0.0%.
INFO 01-23 06:35:34 metrics.py:396] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 12.3 tokens/s, Running: 12 reqs, Swapped: 0 reqs, Pending: 4 reqs, GPU KV cache usage: 99.0%, CPU KV cache usage: 0.0%.
DEBUG 01-23 06:35:36 client.py:154] Heartbeat successful.
INFO 01-23 06:35:40 metrics.py:396] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 12.3 tokens/s, Running: 12 reqs, Swapped: 0 reqs, Pending: 4 reqs, GPU KV cache usage: 99.6%, CPU KV cache usage: 0.0%.
INFO 01-23 06:35:46 metrics.py:396] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 11.6 tokens/s, Running: 11 reqs, Swapped: 0 reqs, Pending: 5 reqs, GPU KV cache usage: 98.1%, CPU KV cache usage: 0.0%.
INFO 01-23 06:35:52 metrics.py:396] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 11.2 tokens/s, Running: 11 reqs, Swapped: 0 reqs, Pending: 5 reqs, GPU KV cache usage: 98.6%, CPU KV cache usage: 0.0%.
INFO 01-23 06:35:58 metrics.py:396] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 11.3 tokens/s, Running: 11 reqs, Swapped: 0 reqs, Pending: 5 reqs, GPU KV cache usage: 99.1%, CPU KV cache usage: 0.0%.
INFO 01-23 06:36:04 metrics.py:396] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 11.2 tokens/s, Running: 11 reqs, Swapped: 0 reqs, Pending: 5 reqs, GPU KV cache usage: 99.7%, CPU KV cache usage: 0.0%.
INFO 01-23 06:36:09 metrics.py:396] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 10.7 tokens/s, Running: 10 reqs, Swapped: 0 reqs, Pending: 6 reqs, GPU KV cache usage: 97.7%, CPU KV cache usage: 0.0%.
INFO 01-23 06:36:14 metrics.py:396] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 10.2 tokens/s, Running: 10 reqs, Swapped: 0 reqs, Pending: 6 reqs, GPU KV cache usage: 98.2%, CPU KV cache usage: 0.0%.
DEBUG 01-23 06:36:16 client.py:154] Heartbeat successful.
INFO 01-23 06:36:20 metrics.py:396] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 10.2 tokens/s, Running: 10 reqs, Swapped: 0 reqs, Pending: 6 reqs, GPU KV cache usage: 98.6%, CPU KV cache usage: 0.0%.
INFO 01-23 06:36:26 metrics.py:396] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 10.2 tokens/s, Running: 10 reqs, Swapped: 0 reqs, Pending: 6 reqs, GPU KV cache usage: 99.1%, CPU KV cache usage: 0.0%.
INFO 01-23 06:36:32 metrics.py:396] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 10.1 tokens/s, Running: 10 reqs, Swapped: 0 reqs, Pending: 6 reqs, GPU KV cache usage: 99.5%, CPU KV cache usage: 0.0%.
INFO 01-23 06:36:38 metrics.py:396] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 10.1 tokens/s, Running: 10 reqs, Swapped: 0 reqs, Pending: 6 reqs, GPU KV cache usage: 99.9%, CPU KV cache usage: 0.0%.
...
ERROR 01-23 06:36:40 engine.py:159] AssertionError('block table already exists')
ERROR 01-23 06:36:40 engine.py:159] Traceback (most recent call last):
ERROR 01-23 06:36:40 engine.py:159]   File "/home/user/vllm/vllm/engine/multiprocessing/engine.py", line 157, in start
ERROR 01-23 06:36:40 engine.py:159]     self.run_engine_loop() 
ERROR 01-23 06:36:40 engine.py:159]   File "/home/user/vllm/vllm/engine/multiprocessing/engine.py", line 220, in run_engine_loop
ERROR 01-23 06:36:40 engine.py:159]     request_outputs = self.engine_step()
ERROR 01-23 06:36:40 engine.py:159]   File "/home/user/vllm/vllm/engine/multiprocessing/engine.py", line 238, in engine_step 
ERROR 01-23 06:36:40 engine.py:159]     raise e
ERROR 01-23 06:36:40 engine.py:159]   File "/home/user/vllm/vllm/engine/multiprocessing/engine.py", line 229, in engine_step 
ERROR 01-23 06:36:40 engine.py:159]     return self.engine.step()
ERROR 01-23 06:36:40 engine.py:159]   File "/home/user/vllm/vllm/engine/llm_engine.py", line 1358, in step
ERROR 01-23 06:36:40 engine.py:159]     ) = self.scheduler[virtual_engine].schedule()
ERROR 01-23 06:36:40 engine.py:159]   File "/home/user/vllm/vllm/core/scheduler.py", line 1219, in schedule
ERROR 01-23 06:36:40 engine.py:159]     scheduler_outputs: SchedulerOutputs = self._schedule()
ERROR 01-23 06:36:40 engine.py:159]   File "/home/user/vllm/vllm/core/scheduler.py", line 1178, in _schedule
ERROR 01-23 06:36:40 engine.py:159]     return self._schedule_default()
ERROR 01-23 06:36:40 engine.py:159]   File "/home/user/vllm/vllm/core/scheduler.py", line 1013, in _schedule_default
ERROR 01-23 06:36:40 engine.py:159]     prefills = self._schedule_prefills(budget,
ERROR 01-23 06:36:40 engine.py:159]   File "/home/user/vllm/vllm/core/scheduler.py", line 949, in _schedule_prefills
ERROR 01-23 06:36:40 engine.py:159]     self._allocate_and_set_running(seq_group)
ERROR 01-23 06:36:40 engine.py:159]   File "/home/user/vllm/vllm/core/scheduler.py", line 1413, in _allocate_and_set_running 
ERROR 01-23 06:36:40 engine.py:159]     self.block_manager.allocate(seq_group)
ERROR 01-23 06:36:40 engine.py:159]   File "/home/user/vllm/vllm/core/block_manager.py", line 190, in allocate
ERROR 01-23 06:36:40 engine.py:159]     assert (request_id
ERROR 01-23 06:36:40 engine.py:159] AssertionError: block table already exists
...
ERROR:    Exception in ASGI application
...
vllm.engine.multiprocessing.MQEngineDeadError: Engine loop is not running. Inspect the stacktrace to find the original error: AssertionError('block table already exists').
CRITICAL 01-23 06:36:40 launcher.py:99] MQLLMEngine is already dead, terminating server process
INFO:     127.0.0.1:58226 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
INFO:     Shutting down
INFO:     Waiting for application shutdown.
INFO:     Application shutdown complete.
INFO:     Finished server process [37226]
DEBUG 01-23 06:36:43 client.py:157] Shutting down MQLLMEngineClient check health loop.
DEBUG 01-23 06:36:43 client.py:224] Shutting down MQLLMEngineClient output handler.
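For context, here is a minimal sketch (an assumed shape for illustration, not vLLM's actual implementation) of the invariant the traceback points at: `block_manager.allocate` asserts that a block table does not already exist for the request being scheduled, so a preempted-and-recomputed request that gets rescheduled before its old block table is freed would trip exactly this assertion. The `BlockManagerSketch` class and request id below are hypothetical.

```python
class BlockManagerSketch:
    """Toy stand-in for vLLM's block manager allocation guard."""

    def __init__(self):
        self.block_tables = {}  # request_id -> list of block ids

    def allocate(self, request_id, num_blocks):
        # The invariant from the traceback: allocating for a request that
        # already has a block table is treated as a scheduler bug.
        assert request_id not in self.block_tables, "block table already exists"
        self.block_tables[request_id] = list(range(num_blocks))

    def free(self, request_id):
        # Preemption with RECOMPUTE should free blocks before rescheduling.
        self.block_tables.pop(request_id, None)


mgr = BlockManagerSketch()
mgr.allocate("chat-example", num_blocks=4)

# If the free() step is skipped before the request is rescheduled,
# re-allocation trips the same AssertionError seen in the logs.
try:
    mgr.allocate("chat-example", num_blocks=4)
    reproduced = False
except AssertionError:
    reproduced = True  # reproduced is now True
```

This is only meant to show which state the assertion protects; whether the preemption path is actually the trigger here would need confirmation from the vLLM scheduler code.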

tstescoTT added the bug (Something isn't working) label on Jan 23, 2025
@tstescoTT (Author)

I also sometimes see hangs during prefill with the same workload on N300. In the logs below I had bumped VLLM_RPC_TIMEOUT to 15 minutes to rule out a slow prefill:

INFO 01-24 01:32:47 engine.py:291] Added request chat-ff8f90f321124398bc24ff642f1247c4.
INFO 01-24 01:32:47 engine.py:291] Added request chat-662a199459da43f180bbc26947e968c7.
2025-01-24 01:32:47.303 | INFO     | models.demos.llama3.tt.generator:prefill_forward:392 - Prefilling User 1
2025-01-24 01:32:50.630 | INFO     | models.demos.llama3.tt.generator:prefill_forward:392 - Prefilling User 2
2025-01-24 01:32:54.265 | INFO     | models.demos.llama3.tt.generator:prefill_forward:392 - Prefilling User 3
2025-01-24 01:32:57.918 | INFO     | models.demos.llama3.tt.generator:prefill_forward:392 - Prefilling User 4
2025-01-24 01:33:01.514 | INFO     | models.demos.llama3.tt.generator:prefill_forward:392 - Prefilling User 5
2025-01-24 01:33:05.184 | INFO     | models.demos.llama3.tt.generator:prefill_forward:392 - Prefilling User 6
2025-01-24 01:33:08.932 | INFO     | models.demos.llama3.tt.generator:prefill_forward:392 - Prefilling User 7
2025-01-24 01:33:12.989 | INFO     | models.demos.llama3.tt.generator:prefill_forward:392 - Prefilling User 8
2025-01-24 01:33:17.748 | INFO     | models.demos.llama3.tt.generator:prefill_forward:392 - Prefilling User 9
DEBUG 01-24 01:33:21 client.py:154] Heartbeat successful.
2025-01-24 01:33:23.628 | INFO     | models.demos.llama3.tt.generator:prefill_forward:392 - Prefilling User 10
DEBUG 01-24 01:47:47 client.py:170] Waiting for output from MQLLMEngine.
ERROR 01-24 01:48:21 client.py:250] TimeoutError('No heartbeat received from MQLLMEngine')
ERROR 01-24 01:48:21 client.py:250] NoneType: None
DEBUG 01-24 01:48:21 client.py:144] Shutting down MQLLMEngineClient check health loop due to timeout
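For reference, the timeout bump mentioned above can be applied through vLLM's `VLLM_RPC_TIMEOUT` environment variable before launching the server. The value is read in milliseconds (worth double-checking against your vLLM version), so 15 minutes is 15 * 60 * 1000:

```shell
# Raise the MQLLMEngine client RPC/heartbeat timeout to 15 minutes
# (VLLM_RPC_TIMEOUT is in milliseconds), then start the server as usual.
export VLLM_RPC_TIMEOUT=900000
echo "VLLM_RPC_TIMEOUT=${VLLM_RPC_TIMEOUT}"
```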

@tstescoTT (Author)

Occurred again on https://github.com/tenstorrent/vllm/tree/b9564bf364e95a3850619fc7b2ed968cc71e30b7

  File "/home/container_app_user/vllm/vllm/utils.py", line 506, in merge_async_iterators
    item = await d
  File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/responses.py", line 253, in wrap
    await func()
  File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/responses.py", line 242, in stream_response
    async for chunk in self.body_iterator:
  File "/home/container_app_user/vllm/vllm/entrypoints/openai/serving_completion.py", line 262, in completion_stream_generator
    async for prompt_idx, res in result_generator:
  File "/home/container_app_user/vllm/vllm/utils.py", line 506, in merge_async_iterators
    item = await d
  File "/home/container_app_user/vllm/vllm/engine/multiprocessing/client.py", line 598, in _process_request
    raise request_output
  File "/home/container_app_user/vllm/vllm/engine/multiprocessing/client.py", line 598, in _process_request
    raise request_output
  File "/home/container_app_user/vllm/vllm/engine/multiprocessing/client.py", line 598, in _process_request
    raise request_output
  [Previous line repeated 95 more times]
AssertionError: block table already exists
                 Device | INFO     | Closing user mode device drivers
CRITICAL 02-02 23:11:55 launcher.py:99] MQLLMEngine is already dead, terminating server process
INFO:     127.0.0.1:45676 - "POST /v1/completions HTTP/1.1" 500 Internal Server Error
INFO:     Shutting down
INFO:     Waiting for application shutdown.
INFO:     Application shutdown complete.
INFO:     Finished server process [37]
DEBUG 02-02 23:11:55 client.py:157] Shutting down MQLLMEngineClient check health loop.
DEBUG 02-02 23:11:55 client.py:224] Shutting down MQLLMEngineClient output handler.
                 Device | INFO     | Closing user mode device drivers
