Unable to deploy mistralai/Mistral-Nemo-Instruct-2407 #88
2024-07-30T07:27:23.496500082Z File "/usr/local/lib/python3.10/dist-packages/transformers/configuration_utils.py", line 768, in from_dict
2024-07-30T20:39:41.413020519Z ValueError:

Guess I get a different error trying meta-llama/Meta-Llama-3.1-8B-Instruct. I am not sure what the issue is; it seems to happen every time I do something Llama-related.
This is actually an error with the vLLM base image. The issue is fixed in the newest version of vLLM (v0.5.3), but this project has not been updated yet.
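A quick sanity check (a minimal sketch, assuming you can run Python inside the deployed container) is to print the vLLM version; the engine log further down shows it initializing as v0.4.2, which predates the fix:

```python
# Minimal check of the vLLM version baked into the image.
# The engine log in this thread reports v0.4.2; the fix lands in 0.5.3.
import vllm

print(vllm.__version__)
```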
Awesome, thanks for letting me know; I will just keep an eye out for updates. This is a super awesome project, thank you all.
I tried with that Dockerfile and deployed it, but the code also needs to be updated for the new version of vLLM.
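For reference, a minimal sketch of the engine bootstrap this worker goes through, reconstructed from the traceback below (/src/engine.py wraps this in a vLLMEngine class; the model name and config keys here are assumptions, since the real worker builds them from its environment):

```python
# Sketch of the bootstrap path seen in the traceback: handler.py builds a
# vLLMEngine, which internally calls AsyncLLMEngine.from_engine_args(...).
from vllm import AsyncEngineArgs, AsyncLLMEngine

engine_args = AsyncEngineArgs(
    model="mistralai/Mistral-Nemo-Instruct-2407",  # assumed; set via env in the real worker
    trust_remote_code=True,
    max_model_len=1024000,  # matches max_seq_len in the engine log below
)
engine = AsyncLLMEngine.from_engine_args(engine_args)
```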
Hello all, I keep scratching my head over why I can deploy some models on the list while others run into issues. Anyway, these are my logs from trying to use this repo with https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407. I think the same error happened with Llama 3.1 8B too; not sure about the quantized ones. I know it's noob stuff, but thanks for the help. Here are the logs:
2024-07-30T07:21:51.526398877Z /usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
2024-07-30T07:21:51.526440887Z warnings.warn(
2024-07-30T07:21:51.915487979Z INFO 07-30 07:21:51 llm_engine.py:100] Initializing an LLM engine (v0.4.2) with config: model='mistralai/Mistral-Nemo-Instruct-2407', speculative_config=None, tokenizer='mistralai/Mistral-Nemo-Instruct-2407', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=1024000, download_dir='/runpod-volume/huggingface-cache/hub', load_format=LoadFormat.AUTO, tensor_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), seed=0, served_model_name=mistralai/Mistral-Nemo-Instruct-2407)
2024-07-30T07:21:52.759581288Z INFO 07-30 07:21:52 utils.py:628] Found nccl from environment variable VLLM_NCCL_SO_PATH=/usr/lib/x86_64-linux-gnu/libnccl.so.2
2024-07-30T07:21:53.481568872Z INFO 07-30 07:21:53 selector.py:27] Using FlashAttention-2 backend.
2024-07-30T07:21:53.845809406Z engine.py :110 2024-07-30 07:21:53,845 Error initializing vLLM engine: Head size 160 is not supported by PagedAttention. Supported head sizes are: [64, 80, 96, 112, 128, 256].
2024-07-30T07:21:53.854397126Z [rank0]: Traceback (most recent call last):
2024-07-30T07:21:53.854422216Z [rank0]: File "/src/handler.py", line 6, in <module>
2024-07-30T07:21:53.854425386Z [rank0]: vllm_engine = vLLMEngine()
2024-07-30T07:21:53.854427736Z [rank0]: File "/src/engine.py", line 25, in __init__
2024-07-30T07:21:53.854429746Z [rank0]: self.llm = self._initialize_llm() if engine is None else engine
2024-07-30T07:21:53.854432376Z [rank0]: File "/src/engine.py", line 111, in _initialize_llm
2024-07-30T07:21:53.854434466Z [rank0]: raise e
2024-07-30T07:21:53.854437186Z [rank0]: File "/src/engine.py", line 105, in _initialize_llm
2024-07-30T07:21:53.854439236Z [rank0]: engine = AsyncLLMEngine.from_engine_args(AsyncEngineArgs(**self.config))
2024-07-30T07:21:53.854441546Z [rank0]: File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 366, in from_engine_args
2024-07-30T07:21:53.854444006Z [rank0]: engine = cls(
2024-07-30T07:21:53.854446326Z [rank0]: File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 324, in __init__
2024-07-30T07:21:53.854448836Z [rank0]: self.engine = self._init_engine(*args, **kwargs)
2024-07-30T07:21:53.854450826Z [rank0]: File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 442, in _init_engine
2024-07-30T07:21:53.854452766Z [rank0]: return engine_class(*args, **kwargs)
2024-07-30T07:21:53.854454676Z [rank0]: File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 160, in __init__
2024-07-30T07:21:53.854456636Z [rank0]: self.model_executor = executor_class(
2024-07-30T07:21:53.854458526Z [rank0]: File "/usr/local/lib/python3.10/dist-packages/vllm/executor/executor_base.py", line 41, in __init__
2024-07-30T07:21:53.854460446Z [rank0]: self._init_executor()
2024-07-30T07:21:53.854462396Z [rank0]: File "/usr/local/lib/python3.10/dist-packages/vllm/executor/gpu_executor.py", line 23, in _init_executor
2024-07-30T07:21:53.854464276Z [rank0]: self._init_non_spec_worker()
2024-07-30T07:21:53.854466236Z [rank0]: File "/usr/local/lib/python3.10/dist-packages/vllm/executor/gpu_executor.py", line 69, in _init_non_spec_worker
2024-07-30T07:21:53.854468156Z [rank0]: self.driver_worker.load_model()
2024-07-30T07:21:53.854470026Z [rank0]: File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker.py", line 118, in load_model
2024-07-30T07:21:53.854471956Z [rank0]: self.model_runner.load_model()
2024-07-30T07:21:53.854473856Z [rank0]: File "/usr/local/lib/python3.10/dist-packages/vllm/worker/model_runner.py", line 164, in load_model
2024-07-30T07:21:53.854490726Z [rank0]: self.model = get_model(
2024-07-30T07:21:53.854492966Z [rank0]: File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/model_loader/__init__.py", line 19, in get_model
2024-07-30T07:21:53.854495946Z [rank0]: return loader.load_model(model_config=model_config,
2024-07-30T07:21:53.854498016Z [rank0]: File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/model_loader/loader.py", line 222, in load_model
2024-07-30T07:21:53.854500036Z [rank0]: model = _initialize_model(model_config, self.load_config,
2024-07-30T07:21:53.854502056Z [rank0]: File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/model_loader/loader.py", line 88, in _initialize_model
2024-07-30T07:21:53.854504536Z [rank0]: return model_class(config=model_config.hf_config,
2024-07-30T07:21:53.854506526Z [rank0]: File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/llama.py", line 338, in __init__
2024-07-30T07:21:53.854508526Z [rank0]: self.model = LlamaModel(config, quant_config, lora_config=lora_config)
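For anyone hitting this later: the unsupported head size appears to come from older vLLM deriving it as hidden_size // num_attention_heads instead of reading the model's explicit head_dim (a sketch based on Mistral-Nemo's config.json values, not confirmed by this log; the thread above says vLLM v0.5.3 resolves it):

```python
# Hedged sketch of where "Head size 160" comes from. The values below are
# taken from mistralai/Mistral-Nemo-Instruct-2407's config.json (assumption,
# not printed in this log).
hidden_size = 5120
num_attention_heads = 32

derived = hidden_size // num_attention_heads
print(derived)  # 160 -> not in PagedAttention's list [64, 80, 96, 112, 128, 256]

# The config also pins head_dim explicitly; newer vLLM releases respect it,
# which is why upgrading makes this error go away.
head_dim = 128
print(head_dim in [64, 80, 96, 112, 128, 256])  # True
```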