[Bug]: Multiprocessing FileNotFound error in triton cache #6180

Closed
jl3676 opened this issue Jul 7, 2024 · 2 comments
Labels
bug Something isn't working

Comments

jl3676 commented Jul 7, 2024

Your current environment

Hi,

When loading the mistralai/Mixtral-8x22B-Instruct-v0.1 model with the LLM class, I keep running into a FileNotFoundError: one VllmWorkerProcess appears to try to read a temporary Triton cache file belonging to another VllmWorkerProcess, and that file no longer exists. The code ran fine the first time I launched it with this model, but after I killed that first run and tried to run the same code again, I have been hitting this error every time. Loading a different model, meta-llama/Meta-Llama-3-70B-Instruct, with the same code works. I've tried clearing the Triton and Python caches (roughly as sketched below), but the error persisted. I wasn't able to find any information about this error online, so I'm reaching out for help; I'd appreciate any ideas for fixing this issue. The full log is attached below.

Thanks,
Jing-Jing Li
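
(The cache clearing mentioned above amounts to roughly the following sketch; the ~/.triton/cache path matches the one in the error log, while the exact commands are an assumption rather than the reporter's verbatim steps.)

# remove the on-disk Triton kernel cache (path taken from the error log below)
rm -rf ~/.triton/cache
# remove Python bytecode caches under the project directory
find . -type d -name "__pycache__" -exec rm -rf {} +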

🐛 Describe the bug

The transformers counterpart, which ran without errors:

from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

Minimal code that reproduces the error:

from vllm import LLM

model = LLM(model="mistralai/Mixtral-8x22B-Instruct-v0.1",
            dtype="auto",
            trust_remote_code=True,
            tokenizer_mode="auto",
            tensor_parallel_size=8)

Log:

INFO 07-06 17:29:56 config.py:698] Defaulting to use mp for distributed inference
INFO 07-06 17:29:56 llm_engine.py:169] Initializing an LLM engine (v0.5.1) with config: model='mistralai/Mixtral-8x22B-Instruct-v0.1', speculative_config=None, tokenizer='mistralai/Mixtral-8x22B-Instruct-v0.1', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, rope_scaling=None, rope_theta=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=65536, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=8, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None), seed=0, served_model_name=mistralai/Mixtral-8x22B-Instruct-v0.1, use_v2_block_manager=False, enable_prefix_caching=False)
(VllmWorkerProcess pid=676) INFO 07-06 17:29:58 multiproc_worker_utils.py:215] Worker ready; awaiting tasks
(VllmWorkerProcess pid=677) INFO 07-06 17:29:58 multiproc_worker_utils.py:215] Worker ready; awaiting tasks
(VllmWorkerProcess pid=680) INFO 07-06 17:29:58 multiproc_worker_utils.py:215] Worker ready; awaiting tasks
(VllmWorkerProcess pid=681) INFO 07-06 17:29:58 multiproc_worker_utils.py:215] Worker ready; awaiting tasks
(VllmWorkerProcess pid=679) INFO 07-06 17:29:58 multiproc_worker_utils.py:215] Worker ready; awaiting tasks
(VllmWorkerProcess pid=682) INFO 07-06 17:29:58 multiproc_worker_utils.py:215] Worker ready; awaiting tasks
(VllmWorkerProcess pid=678) INFO 07-06 17:29:58 multiproc_worker_utils.py:215] Worker ready; awaiting tasks
(VllmWorkerProcess pid=680) INFO 07-06 17:29:59 utils.py:741] Found nccl from library libnccl.so.2
INFO 07-06 17:29:59 utils.py:741] Found nccl from library libnccl.so.2
(VllmWorkerProcess pid=680) INFO 07-06 17:29:59 pynccl.py:63] vLLM is using nccl==2.20.5
(VllmWorkerProcess pid=676) INFO 07-06 17:29:59 utils.py:741] Found nccl from library libnccl.so.2
(VllmWorkerProcess pid=678) INFO 07-06 17:29:59 utils.py:741] Found nccl from library libnccl.so.2
INFO 07-06 17:29:59 pynccl.py:63] vLLM is using nccl==2.20.5
(VllmWorkerProcess pid=676) INFO 07-06 17:29:59 pynccl.py:63] vLLM is using nccl==2.20.5
(VllmWorkerProcess pid=678) INFO 07-06 17:29:59 pynccl.py:63] vLLM is using nccl==2.20.5
(VllmWorkerProcess pid=679) INFO 07-06 17:29:59 utils.py:741] Found nccl from library libnccl.so.2
(VllmWorkerProcess pid=681) INFO 07-06 17:29:59 utils.py:741] Found nccl from library libnccl.so.2
(VllmWorkerProcess pid=679) INFO 07-06 17:29:59 pynccl.py:63] vLLM is using nccl==2.20.5
(VllmWorkerProcess pid=682) INFO 07-06 17:29:59 utils.py:741] Found nccl from library libnccl.so.2
(VllmWorkerProcess pid=681) INFO 07-06 17:29:59 pynccl.py:63] vLLM is using nccl==2.20.5
(VllmWorkerProcess pid=682) INFO 07-06 17:29:59 pynccl.py:63] vLLM is using nccl==2.20.5
(VllmWorkerProcess pid=677) INFO 07-06 17:29:59 utils.py:741] Found nccl from library libnccl.so.2
(VllmWorkerProcess pid=677) INFO 07-06 17:29:59 pynccl.py:63] vLLM is using nccl==2.20.5
(VllmWorkerProcess pid=676) WARNING 07-06 17:29:59 custom_all_reduce.py:118] Custom allreduce is disabled because it's not supported on more than two PCIe-only GPUs. To silence this warning, specify disable_custom_all_reduce=True explicitly.
(VllmWorkerProcess pid=677) WARNING 07-06 17:29:59 custom_all_reduce.py:118] Custom allreduce is disabled because it's not supported on more than two PCIe-only GPUs. To silence this warning, specify disable_custom_all_reduce=True explicitly.
(VllmWorkerProcess pid=682) WARNING 07-06 17:29:59 custom_all_reduce.py:118] Custom allreduce is disabled because it's not supported on more than two PCIe-only GPUs. To silence this warning, specify disable_custom_all_reduce=True explicitly.
WARNING 07-06 17:29:59 custom_all_reduce.py:118] Custom allreduce is disabled because it's not supported on more than two PCIe-only GPUs. To silence this warning, specify disable_custom_all_reduce=True explicitly.
(VllmWorkerProcess pid=681) WARNING 07-06 17:29:59 custom_all_reduce.py:118] Custom allreduce is disabled because it's not supported on more than two PCIe-only GPUs. To silence this warning, specify disable_custom_all_reduce=True explicitly.
(VllmWorkerProcess pid=680) WARNING 07-06 17:29:59 custom_all_reduce.py:118] Custom allreduce is disabled because it's not supported on more than two PCIe-only GPUs. To silence this warning, specify disable_custom_all_reduce=True explicitly.
(VllmWorkerProcess pid=679) WARNING 07-06 17:29:59 custom_all_reduce.py:118] Custom allreduce is disabled because it's not supported on more than two PCIe-only GPUs. To silence this warning, specify disable_custom_all_reduce=True explicitly.
(VllmWorkerProcess pid=678) WARNING 07-06 17:29:59 custom_all_reduce.py:118] Custom allreduce is disabled because it's not supported on more than two PCIe-only GPUs. To silence this warning, specify disable_custom_all_reduce=True explicitly.
(VllmWorkerProcess pid=678) INFO 07-06 17:30:00 weight_utils.py:218] Using model weights format ['*.safetensors']
(VllmWorkerProcess pid=682) INFO 07-06 17:30:00 weight_utils.py:218] Using model weights format ['*.safetensors']
(VllmWorkerProcess pid=681) INFO 07-06 17:30:00 weight_utils.py:218] Using model weights format ['*.safetensors']
(VllmWorkerProcess pid=679) INFO 07-06 17:30:00 weight_utils.py:218] Using model weights format ['*.safetensors']
INFO 07-06 17:30:00 weight_utils.py:218] Using model weights format ['*.safetensors']
(VllmWorkerProcess pid=677) INFO 07-06 17:30:00 weight_utils.py:218] Using model weights format ['*.safetensors']
(VllmWorkerProcess pid=680) INFO 07-06 17:30:00 weight_utils.py:218] Using model weights format ['*.safetensors']
(VllmWorkerProcess pid=676) INFO 07-06 17:30:00 weight_utils.py:218] Using model weights format ['*.safetensors']
INFO 07-06 17:30:46 model_runner.py:255] Loading model weights took 32.7642 GB
(VllmWorkerProcess pid=678) INFO 07-06 17:30:46 model_runner.py:255] Loading model weights took 32.7642 GB
(VllmWorkerProcess pid=676) INFO 07-06 17:30:46 model_runner.py:255] Loading model weights took 32.7642 GB
(VllmWorkerProcess pid=677) INFO 07-06 17:30:46 model_runner.py:255] Loading model weights took 32.7642 GB
(VllmWorkerProcess pid=681) INFO 07-06 17:30:46 model_runner.py:255] Loading model weights took 32.7642 GB
(VllmWorkerProcess pid=680) INFO 07-06 17:30:46 model_runner.py:255] Loading model weights took 32.7642 GB
(VllmWorkerProcess pid=679) INFO 07-06 17:30:46 model_runner.py:255] Loading model weights took 32.7642 GB
(VllmWorkerProcess pid=682) INFO 07-06 17:30:46 model_runner.py:255] Loading model weights took 32.7642 GB
(VllmWorkerProcess pid=676) ERROR 07-06 17:30:48 multiproc_worker_utils.py:226] Exception in worker VllmWorkerProcess while processing method determine_num_available_blocks: [Errno 2] No such file or directory: '/home/jingjingl/.triton/cache/071003440889f160321353eb6ba91eac/fused_moe_kernel.cubin.tmp.pid_677_441001', Traceback (most recent call last):
(VllmWorkerProcess pid=676) ERROR 07-06 17:30:48 multiproc_worker_utils.py:226]   File "/home/jingjingl/.conda/envs/harm_project/lib/python3.10/site-packages/vllm/executor/multiproc_worker_utils.py", line 223, in _run_worker_process
(VllmWorkerProcess pid=676) ERROR 07-06 17:30:48 multiproc_worker_utils.py:226]     output = executor(*args, **kwargs)
(VllmWorkerProcess pid=676) ERROR 07-06 17:30:48 multiproc_worker_utils.py:226]   File "/home/jingjingl/.conda/envs/harm_project/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
(VllmWorkerProcess pid=676) ERROR 07-06 17:30:48 multiproc_worker_utils.py:226]     return func(*args, **kwargs)
(VllmWorkerProcess pid=676) ERROR 07-06 17:30:48 multiproc_worker_utils.py:226]   File "/home/jingjingl/.conda/envs/harm_project/lib/python3.10/site-packages/vllm/worker/worker.py", line 173, in determine_num_available_blocks
(VllmWorkerProcess pid=676) ERROR 07-06 17:30:48 multiproc_worker_utils.py:226]     self.model_runner.profile_run()
(VllmWorkerProcess pid=676) ERROR 07-06 17:30:48 multiproc_worker_utils.py:226]   File "/home/jingjingl/.conda/envs/harm_project/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
(VllmWorkerProcess pid=676) ERROR 07-06 17:30:48 multiproc_worker_utils.py:226]     return func(*args, **kwargs)
(VllmWorkerProcess pid=676) ERROR 07-06 17:30:48 multiproc_worker_utils.py:226]   File "/home/jingjingl/.conda/envs/harm_project/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 874, in profile_run
(VllmWorkerProcess pid=676) ERROR 07-06 17:30:48 multiproc_worker_utils.py:226]     self.execute_model(model_input, kv_caches, intermediate_tensors)
(VllmWorkerProcess pid=676) ERROR 07-06 17:30:48 multiproc_worker_utils.py:226]   File "/home/jingjingl/.conda/envs/harm_project/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
(VllmWorkerProcess pid=676) ERROR 07-06 17:30:48 multiproc_worker_utils.py:226]     return func(*args, **kwargs)
(VllmWorkerProcess pid=676) ERROR 07-06 17:30:48 multiproc_worker_utils.py:226]   File "/home/jingjingl/.conda/envs/harm_project/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 1243, in execute_model
(VllmWorkerProcess pid=676) ERROR 07-06 17:30:48 multiproc_worker_utils.py:226]     hidden_or_intermediate_states = model_executable(
(VllmWorkerProcess pid=676) ERROR 07-06 17:30:48 multiproc_worker_utils.py:226]   File "/home/jingjingl/.conda/envs/harm_project/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
(VllmWorkerProcess pid=676) ERROR 07-06 17:30:48 multiproc_worker_utils.py:226]     return self._call_impl(*args, **kwargs)
(VllmWorkerProcess pid=676) ERROR 07-06 17:30:48 multiproc_worker_utils.py:226]   File "/home/jingjingl/.conda/envs/harm_project/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
(VllmWorkerProcess pid=676) ERROR 07-06 17:30:48 multiproc_worker_utils.py:226]     return forward_call(*args, **kwargs)
(VllmWorkerProcess pid=676) ERROR 07-06 17:30:48 multiproc_worker_utils.py:226]   File "/home/jingjingl/.conda/envs/harm_project/lib/python3.10/site-packages/vllm/model_executor/models/mixtral.py", line 348, in forward
(VllmWorkerProcess pid=676) ERROR 07-06 17:30:48 multiproc_worker_utils.py:226]     hidden_states = self.model(input_ids, positions, kv_caches,
(VllmWorkerProcess pid=676) ERROR 07-06 17:30:48 multiproc_worker_utils.py:226]   File "/home/jingjingl/.conda/envs/harm_project/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
(VllmWorkerProcess pid=676) ERROR 07-06 17:30:48 multiproc_worker_utils.py:226]     return self._call_impl(*args, **kwargs)
(VllmWorkerProcess pid=676) ERROR 07-06 17:30:48 multiproc_worker_utils.py:226]   File "/home/jingjingl/.conda/envs/harm_project/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
(VllmWorkerProcess pid=676) ERROR 07-06 17:30:48 multiproc_worker_utils.py:226]     return forward_call(*args, **kwargs)
(VllmWorkerProcess pid=676) ERROR 07-06 17:30:48 multiproc_worker_utils.py:226]   File "/home/jingjingl/.conda/envs/harm_project/lib/python3.10/site-packages/vllm/model_executor/models/mixtral.py", line 276, in forward
(VllmWorkerProcess pid=676) ERROR 07-06 17:30:48 multiproc_worker_utils.py:226]     hidden_states, residual = layer(positions, hidden_states,
(VllmWorkerProcess pid=676) ERROR 07-06 17:30:48 multiproc_worker_utils.py:226]   File "/home/jingjingl/.conda/envs/harm_project/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
(VllmWorkerProcess pid=676) ERROR 07-06 17:30:48 multiproc_worker_utils.py:226]     return self._call_impl(*args, **kwargs)
(VllmWorkerProcess pid=676) ERROR 07-06 17:30:48 multiproc_worker_utils.py:226]   File "/home/jingjingl/.conda/envs/harm_project/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
(VllmWorkerProcess pid=676) ERROR 07-06 17:30:48 multiproc_worker_utils.py:226]     return forward_call(*args, **kwargs)
(VllmWorkerProcess pid=676) ERROR 07-06 17:30:48 multiproc_worker_utils.py:226]   File "/home/jingjingl/.conda/envs/harm_project/lib/python3.10/site-packages/vllm/model_executor/models/mixtral.py", line 232, in forward
(VllmWorkerProcess pid=676) ERROR 07-06 17:30:48 multiproc_worker_utils.py:226]     hidden_states = self.block_sparse_moe(hidden_states)
(VllmWorkerProcess pid=676) ERROR 07-06 17:30:48 multiproc_worker_utils.py:226]   File "/home/jingjingl/.conda/envs/harm_project/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
(VllmWorkerProcess pid=676) ERROR 07-06 17:30:48 multiproc_worker_utils.py:226]     return self._call_impl(*args, **kwargs)
(VllmWorkerProcess pid=676) ERROR 07-06 17:30:48 multiproc_worker_utils.py:226]   File "/home/jingjingl/.conda/envs/harm_project/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
(VllmWorkerProcess pid=676) ERROR 07-06 17:30:48 multiproc_worker_utils.py:226]     return forward_call(*args, **kwargs)
(VllmWorkerProcess pid=676) ERROR 07-06 17:30:48 multiproc_worker_utils.py:226]   File "/home/jingjingl/.conda/envs/harm_project/lib/python3.10/site-packages/vllm/model_executor/models/mixtral.py", line 95, in forward
(VllmWorkerProcess pid=676) ERROR 07-06 17:30:48 multiproc_worker_utils.py:226]     final_hidden_states = self.experts(hidden_states, router_logits)
(VllmWorkerProcess pid=676) ERROR 07-06 17:30:48 multiproc_worker_utils.py:226]   File "/home/jingjingl/.conda/envs/harm_project/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
(VllmWorkerProcess pid=676) ERROR 07-06 17:30:48 multiproc_worker_utils.py:226]     return self._call_impl(*args, **kwargs)
(VllmWorkerProcess pid=676) ERROR 07-06 17:30:48 multiproc_worker_utils.py:226]   File "/home/jingjingl/.conda/envs/harm_project/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
(VllmWorkerProcess pid=676) ERROR 07-06 17:30:48 multiproc_worker_utils.py:226]     return forward_call(*args, **kwargs)
(VllmWorkerProcess pid=676) ERROR 07-06 17:30:48 multiproc_worker_utils.py:226]   File "/home/jingjingl/.conda/envs/harm_project/lib/python3.10/site-packages/vllm/model_executor/layers/fused_moe/layer.py", line 186, in forward
(VllmWorkerProcess pid=676) ERROR 07-06 17:30:48 multiproc_worker_utils.py:226]     final_hidden_states = self.quant_method.apply(
(VllmWorkerProcess pid=676) ERROR 07-06 17:30:48 multiproc_worker_utils.py:226]   File "/home/jingjingl/.conda/envs/harm_project/lib/python3.10/site-packages/vllm/model_executor/layers/fused_moe/layer.py", line 68, in apply
(VllmWorkerProcess pid=676) ERROR 07-06 17:30:48 multiproc_worker_utils.py:226]     return fused_moe(x,
(VllmWorkerProcess pid=676) ERROR 07-06 17:30:48 multiproc_worker_utils.py:226]   File "/home/jingjingl/.conda/envs/harm_project/lib/python3.10/site-packages/vllm/model_executor/layers/fused_moe/fused_moe.py", line 574, in fused_moe
(VllmWorkerProcess pid=676) ERROR 07-06 17:30:48 multiproc_worker_utils.py:226]     return fused_experts(hidden_states,
(VllmWorkerProcess pid=676) ERROR 07-06 17:30:48 multiproc_worker_utils.py:226]   File "/home/jingjingl/.conda/envs/harm_project/lib/python3.10/site-packages/vllm/model_executor/layers/fused_moe/fused_moe.py", line 488, in fused_experts
(VllmWorkerProcess pid=676) ERROR 07-06 17:30:48 multiproc_worker_utils.py:226]     invoke_fused_moe_kernel(curr_hidden_states,
(VllmWorkerProcess pid=676) ERROR 07-06 17:30:48 multiproc_worker_utils.py:226]   File "/home/jingjingl/.conda/envs/harm_project/lib/python3.10/site-packages/vllm/model_executor/layers/fused_moe/fused_moe.py", line 246, in invoke_fused_moe_kernel
(VllmWorkerProcess pid=676) ERROR 07-06 17:30:48 multiproc_worker_utils.py:226]     fused_moe_kernel[grid](
(VllmWorkerProcess pid=676) ERROR 07-06 17:30:48 multiproc_worker_utils.py:226]   File "/home/jingjingl/.conda/envs/harm_project/lib/python3.10/site-packages/triton/runtime/jit.py", line 167, in <lambda>
(VllmWorkerProcess pid=676) ERROR 07-06 17:30:48 multiproc_worker_utils.py:226]     return lambda *args, **kwargs: self.run(grid=grid, warmup=False, *args, **kwargs)
(VllmWorkerProcess pid=676) ERROR 07-06 17:30:48 multiproc_worker_utils.py:226]   File "/home/jingjingl/.conda/envs/harm_project/lib/python3.10/site-packages/triton/runtime/jit.py", line 416, in run
(VllmWorkerProcess pid=676) ERROR 07-06 17:30:48 multiproc_worker_utils.py:226]     self.cache[device][key] = compile(
(VllmWorkerProcess pid=676) ERROR 07-06 17:30:48 multiproc_worker_utils.py:226]   File "/home/jingjingl/.conda/envs/harm_project/lib/python3.10/site-packages/triton/compiler/compiler.py", line 202, in compile
(VllmWorkerProcess pid=676) ERROR 07-06 17:30:48 multiproc_worker_utils.py:226]     return CompiledKernel(so_path, metadata_group.get(metadata_filename))
(VllmWorkerProcess pid=676) ERROR 07-06 17:30:48 multiproc_worker_utils.py:226]   File "/home/jingjingl/.conda/envs/harm_project/lib/python3.10/site-packages/triton/compiler/compiler.py", line 230, in __init__
(VllmWorkerProcess pid=676) ERROR 07-06 17:30:48 multiproc_worker_utils.py:226]     self.asm = {
(VllmWorkerProcess pid=676) ERROR 07-06 17:30:48 multiproc_worker_utils.py:226]   File "/home/jingjingl/.conda/envs/harm_project/lib/python3.10/site-packages/triton/compiler/compiler.py", line 231, in <dictcomp>
(VllmWorkerProcess pid=676) ERROR 07-06 17:30:48 multiproc_worker_utils.py:226]     file.suffix[1:]: file.read_bytes() if file.suffix[1:] == driver.binary_ext else file.read_text()
(VllmWorkerProcess pid=676) ERROR 07-06 17:30:48 multiproc_worker_utils.py:226]   File "/home/jingjingl/.conda/envs/harm_project/lib/python3.10/pathlib.py", line 1134, in read_text
(VllmWorkerProcess pid=676) ERROR 07-06 17:30:48 multiproc_worker_utils.py:226]     with self.open(mode='r', encoding=encoding, errors=errors) as f:
(VllmWorkerProcess pid=676) ERROR 07-06 17:30:48 multiproc_worker_utils.py:226]   File "/home/jingjingl/.conda/envs/harm_project/lib/python3.10/pathlib.py", line 1119, in open
(VllmWorkerProcess pid=676) ERROR 07-06 17:30:48 multiproc_worker_utils.py:226]     return self._accessor.open(self, mode, buffering, encoding, errors,
(VllmWorkerProcess pid=676) ERROR 07-06 17:30:48 multiproc_worker_utils.py:226] FileNotFoundError: [Errno 2] No such file or directory: '/home/jingjingl/.triton/cache/071003440889f160321353eb6ba91eac/fused_moe_kernel.cubin.tmp.pid_677_441001'
(VllmWorkerProcess pid=676) ERROR 07-06 17:30:48 multiproc_worker_utils.py:226]
jl3676 added the bug label Jul 7, 2024
youkaichao (Member) commented Jul 7, 2024

Is it related to #6140?

jl3676 (Author) commented Jul 7, 2024

Is it related to #6140?

Thanks for the pointer! That does look like a related fix, but it doesn't seem to apply to my setup because I'm not running vLLM from the Docker image. However, through the issues linked there, I found that someone worked around this bug by building Triton from source (it appears to be a Triton bug, and a fix was recently pushed upstream). Upgrading to the latest Triton nightly release solved my issue.
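
For anyone hitting the same error, the fix described above amounts to roughly the following sketch (the nightly package index and build steps are assumptions that may have changed since; check the Triton README for current instructions):

# Option 1: install a Triton nightly wheel in place of the release build
pip uninstall -y triton
pip install --index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/Triton-Nightly/pypi/simple/ triton-nightly

# Option 2: build Triton from source (may require cmake, ninja, and a recent C++ compiler)
git clone https://github.com/triton-lang/triton.git
cd triton/python
pip install -e .

After reinstalling, clearing ~/.triton/cache and rerunning the vLLM script should pick up the new Triton build.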

jl3676 closed this as completed Jul 7, 2024