
RUN ERROR #35

Closed
superdabuniu opened this issue Dec 13, 2024 · 5 comments

@superdabuniu

Problem

In src/training/trainer.py, at line 206:
state_dict = {k:v for k, v in state_dict.items() if "wte" not in k}
Error: state_dict is None

Background

I just ran scripts/finetune.sh on a single A100 GPU. Because mpi4py is missing, my environment can't use DeepSpeed, so I removed --deepspeed scripts/zero3.json.

My solution

Before this line, I added the following, referring to lines 190-191:
if state_dict is None:
state_dict = self.model.state_dict()

Is it correct?
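The guard described above can be sketched as a self-contained function (a sketch only; the function name and `model` argument are illustrative, not the repository's actual code):

```python
# Minimal sketch of the proposed guard. Mirrors the filtering on
# trainer.py line 206 plus the fallback from lines 190-191.
def filter_state_dict(model, state_dict=None):
    # Without DeepSpeed, the caller may pass state_dict=None, so fall
    # back to the model's own parameters before filtering.
    if state_dict is None:
        state_dict = model.state_dict()
    # Drop word-token-embedding ("wte") entries, as line 206 does.
    return {k: v for k, v in state_dict.items() if "wte" not in k}
```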

@2U1 (Owner) commented Dec 13, 2024

Yes, I've fixed the trainer code.
I hadn't noticed it since I was only using DeepSpeed. Thanks.

if state_dict is None:
state_dict = self.model.state_dict()

@superdabuniu (Author)

Background

I executed bash scripts/merge_lora.sh.

Problem

  1. Error: "lack of config.json or configuration_phi3_v.py."
     I copied the *.json and *.py files from phi-3.5-vision-instruct into checkpoint-xxx.

  2. Error: "ModuleNotFoundError: No module named 'transformers_modules.phi_3'"
     In merge_lora_weights.py, line 6, I added trust_remote_code=True to the load_pretrained_model call.

It still doesn't work. Can you give me some suggestions?

@2U1 (Owner) commented Dec 13, 2024

That's because the auto config model name has changed; you need to open the config file and fix it.
I've made a change to the code, but I don't know why it isn't working in your case right now.

Change the AutoConfig entry in config.json like:
"AutoConfig": "microsoft/Phi-3.5-vision-instruct--configuration_phi3_v.Phi3VConfig"
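For reference, the corresponding auto_map block in config.json would look roughly like this (the AutoModelForCausalLM entry is my assumption, following the same naming pattern; verify it against the base model's config.json):

```json
{
  "auto_map": {
    "AutoConfig": "microsoft/Phi-3.5-vision-instruct--configuration_phi3_v.Phi3VConfig",
    "AutoModelForCausalLM": "microsoft/Phi-3.5-vision-instruct--modeling_phi3_v.Phi3VForCausalLM"
  }
}
```

Note that, as far as I can tell, the `repo--module.Class` form resolves against the Hub (or its local cache); in a fully offline setup, plain module references like `configuration_phi3_v.Phi3VConfig`, with the copied .py files sitting next to config.json, may be what actually works.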

@superdabuniu (Author)


My environment is offline; I downloaded phi-3.5-vision-instruct to my local disk.

@2U1 (Owner) commented Dec 13, 2024

@superdabuniu You could change the model loading script in utils.py

processor = AutoProcessor.from_pretrained(model_base, trust_remote_code=True)
print('Loading Phi3-Vision from base model...')
model = AutoModelForCausalLM.from_pretrained(model_base, low_cpu_mem_usage=True, config=lora_cfg_pretrained, trust_remote_code=True, **kwargs)

to load it from your local directory.
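Since the environment is offline, a small helper like this (illustrative, not part of the repo) captures the distinction `from_pretrained` draws between a local directory and a Hub id; `local_files_only=True` is a real `from_pretrained` keyword that prevents any network access:

```python
import os

def resolve_model_source(path_or_id):
    # If the string points at an existing local directory,
    # from_pretrained will read it directly; local_files_only=True
    # additionally keeps transformers from ever touching the Hub.
    if os.path.isdir(path_or_id):
        return path_or_id, {"local_files_only": True}
    # Otherwise treat it as a Hub repo id and allow downloads.
    return path_or_id, {}
```

A call like `AutoProcessor.from_pretrained(src, trust_remote_code=True, **extra)` would then behave the same whether `src` is a local path or a Hub id.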

@2U1 2U1 closed this as completed Feb 13, 2025