Unable to deploy mistralai/Mistral-Nemo-Instruct-2407 #88
2024-07-30T07:27:23.496500082Z File "/usr/local/lib/python3.10/dist-packages/transformers/configuration_utils.py", line 768, in from_dict
2024-07-30T20:39:41.413020519Z ValueError:

Guess I get a different error trying meta-llama/Meta-Llama-3.1-8B-Instruct. I am not sure what the issue is; it seems to happen every time I do something Llama-related.
This is actually an error with the vLLM base image. The issue is fixed in the newest version of vLLM (v0.5.3), but this project has not been updated yet.
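A quick sanity check (a minimal sketch, assuming you can run Python inside the deployed container) is to print the vLLM version; the engine log further down shows it initializing as v0.4.2, which predates the fix:

```python
# Minimal check of the vLLM version baked into the image.
# The engine log in this thread reports v0.4.2; the fix lands in 0.5.3.
import vllm

print(vllm.__version__)
```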
Awesome, thanks for letting me know; I will just keep an eye out for updates. This is a super awesome project, thank you all.
I tried with that Dockerfile and deployed it, but the code also needs to be updated for the new version of vLLM.
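For reference, a minimal sketch of the engine bootstrap this worker goes through, reconstructed from the traceback below (/src/engine.py wraps this in a vLLMEngine class; the model name and config keys here are assumptions, since the real worker builds them from its environment):

```python
# Sketch of the bootstrap path seen in the traceback: handler.py builds a
# vLLMEngine, which internally calls AsyncLLMEngine.from_engine_args(...).
from vllm import AsyncEngineArgs, AsyncLLMEngine

engine_args = AsyncEngineArgs(
    model="mistralai/Mistral-Nemo-Instruct-2407",  # assumed; set via env in the real worker
    trust_remote_code=True,
    max_model_len=1024000,  # matches max_seq_len in the engine log below
)
engine = AsyncLLMEngine.from_engine_args(engine_args)
```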
Hello all, I keep scratching my head over why I can deploy some models on the list while others run into issues. Anyway, these are my logs from trying to use this repo with https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407. I think the same error happened with Llama 3.1 8B too; not sure about the quantized ones. I know it's noob stuff, but thanks for the help. Here are the logs:
2024-07-30T07:21:51.526398877Z /usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
2024-07-30T07:21:51.526440887Z warnings.warn(
2024-07-30T07:21:51.915487979Z INFO 07-30 07:21:51 llm_engine.py:100] Initializing an LLM engine (v0.4.2) with config: model='mistralai/Mistral-Nemo-Instruct-2407', speculative_config=None, tokenizer='mistralai/Mistral-Nemo-Instruct-2407', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=1024000, download_dir='/runpod-volume/huggingface-cache/hub', load_format=LoadFormat.AUTO, tensor_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), seed=0, served_model_name=mistralai/Mistral-Nemo-Instruct-2407)
2024-07-30T07:21:52.759581288Z INFO 07-30 07:21:52 utils.py:628] Found nccl from environment variable VLLM_NCCL_SO_PATH=/usr/lib/x86_64-linux-gnu/libnccl.so.2
2024-07-30T07:21:53.481568872Z INFO 07-30 07:21:53 selector.py:27] Using FlashAttention-2 backend.
2024-07-30T07:21:53.845809406Z engine.py :110 2024-07-30 07:21:53,845 Error initializing vLLM engine: Head size 160 is not supported by PagedAttention. Supported head sizes are: [64, 80, 96, 112, 128, 256].
2024-07-30T07:21:53.854397126Z [rank0]: Traceback (most recent call last):
2024-07-30T07:21:53.854422216Z [rank0]: File "/src/handler.py", line 6, in <module>
2024-07-30T07:21:53.854425386Z [rank0]: vllm_engine = vLLMEngine()
2024-07-30T07:21:53.854427736Z [rank0]: File "/src/engine.py", line 25, in __init__
2024-07-30T07:21:53.854429746Z [rank0]: self.llm = self._initialize_llm() if engine is None else engine
2024-07-30T07:21:53.854432376Z [rank0]: File "/src/engine.py", line 111, in _initialize_llm
2024-07-30T07:21:53.854434466Z [rank0]: raise e
2024-07-30T07:21:53.854437186Z [rank0]: File "/src/engine.py", line 105, in _initialize_llm
2024-07-30T07:21:53.854439236Z [rank0]: engine = AsyncLLMEngine.from_engine_args(AsyncEngineArgs(**self.config))
2024-07-30T07:21:53.854441546Z [rank0]: File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 366, in from_engine_args
2024-07-30T07:21:53.854444006Z [rank0]: engine = cls(
2024-07-30T07:21:53.854446326Z [rank0]: File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 324, in __init__
2024-07-30T07:21:53.854448836Z [rank0]: self.engine = self._init_engine(*args, **kwargs)
2024-07-30T07:21:53.854450826Z [rank0]: File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 442, in _init_engine
2024-07-30T07:21:53.854452766Z [rank0]: return engine_class(*args, **kwargs)
2024-07-30T07:21:53.854454676Z [rank0]: File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 160, in __init__
2024-07-30T07:21:53.854456636Z [rank0]: self.model_executor = executor_class(
2024-07-30T07:21:53.854458526Z [rank0]: File "/usr/local/lib/python3.10/dist-packages/vllm/executor/executor_base.py", line 41, in __init__
2024-07-30T07:21:53.854460446Z [rank0]: self._init_executor()
2024-07-30T07:21:53.854462396Z [rank0]: File "/usr/local/lib/python3.10/dist-packages/vllm/executor/gpu_executor.py", line 23, in _init_executor
2024-07-30T07:21:53.854464276Z [rank0]: self._init_non_spec_worker()
2024-07-30T07:21:53.854466236Z [rank0]: File "/usr/local/lib/python3.10/dist-packages/vllm/executor/gpu_executor.py", line 69, in _init_non_spec_worker
2024-07-30T07:21:53.854468156Z [rank0]: self.driver_worker.load_model()
2024-07-30T07:21:53.854470026Z [rank0]: File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker.py", line 118, in load_model
2024-07-30T07:21:53.854471956Z [rank0]: self.model_runner.load_model()
2024-07-30T07:21:53.854473856Z [rank0]: File "/usr/local/lib/python3.10/dist-packages/vllm/worker/model_runner.py", line 164, in load_model
2024-07-30T07:21:53.854490726Z [rank0]: self.model = get_model(
2024-07-30T07:21:53.854492966Z [rank0]: File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/model_loader/__init__.py", line 19, in get_model
2024-07-30T07:21:53.854495946Z [rank0]: return loader.load_model(model_config=model_config,
2024-07-30T07:21:53.854498016Z [rank0]: File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/model_loader/loader.py", line 222, in load_model
2024-07-30T07:21:53.854500036Z [rank0]: model = _initialize_model(model_config, self.load_config,
2024-07-30T07:21:53.854502056Z [rank0]: File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/model_loader/loader.py", line 88, in _initialize_model
2024-07-30T07:21:53.854504536Z [rank0]: return model_class(config=model_config.hf_config,
2024-07-30T07:21:53.854506526Z [rank0]: File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/llama.py", line 338, in __init__
2024-07-30T07:21:53.854508526Z [rank0]: self.model = LlamaModel(config, quant_config, lora_config=lora_config)
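For anyone hitting this later: the unsupported head size appears to come from older vLLM deriving it as hidden_size // num_attention_heads instead of reading the model's explicit head_dim (a sketch based on Mistral-Nemo's config.json values, not confirmed by this log; the thread above says vLLM v0.5.3 resolves it):

```python
# Hedged sketch of where "Head size 160" comes from. The values below are
# taken from mistralai/Mistral-Nemo-Instruct-2407's config.json (assumption,
# not printed in this log).
hidden_size = 5120
num_attention_heads = 32

derived = hidden_size // num_attention_heads
print(derived)  # 160 -> not in PagedAttention's list [64, 80, 96, 112, 128, 256]

# The config also pins head_dim explicitly; newer vLLM releases respect it,
# which is why upgrading makes this error go away.
head_dim = 128
print(head_dim in [64, 80, 96, 112, 128, 256])  # True
```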