
No such file or directory: pytorch_model.bin #74

Open
yaolu-zjut opened this issue Aug 29, 2024 · 2 comments
yaolu-zjut commented Aug 29, 2024

> Issue resolved. The problem is that when constructing the trainer, `save_safetensors=False` should be set. Otherwise, the above `safe_serialization=False` will not work.
>
> https://huggingface.co/docs/transformers/v4.36.1/en/main_classes/trainer#transformers.TrainingArguments.save_safetensors

Originally posted by @WilliamYi96 in #45 (comment)

I use gemma2-2b-it with transformers, and it reports 'No such file or directory: pytorch_model.bin'. I then followed the instructions of @WilliamYi96, but it still does not work. Can somebody help me? Here is my code:

```python
import os

import transformers

trainer = transformers.Trainer(
    model=model,
    train_dataset=train_data,
    eval_dataset=val_data,
    args=transformers.TrainingArguments(
        per_device_train_batch_size=args.micro_batch_size,
        gradient_accumulation_steps=gradient_accumulation_steps,
        warmup_steps=100,  # 100 ori
        num_train_epochs=args.num_epochs,
        learning_rate=args.learning_rate,
        fp16=True,  # not torch.cuda.is_bf16_supported()
        bf16=False,  # torch.cuda.is_bf16_supported()
        logging_steps=10,
        logging_first_step=True,
        optim="adamw_torch",
        evaluation_strategy="steps",
        save_strategy="steps",
        eval_steps=100,
        save_steps=200,
        output_dir=args.output_dir,
        save_total_limit=20,
        max_grad_norm=1.0,
        report_to="none",
        load_best_model_at_end=True,
        # lr_scheduler_type="linear",
        ddp_find_unused_parameters=False if ddp else None,
        group_by_length=args.group_by_length,
        run_name=args.output_dir.split('/')[-1],
        metric_for_best_model="{}_loss".format(args.data_path),
        save_safetensors=False,
    ),
    data_collator=transformers.DataCollatorForSeq2Seq(
        tokenizer, pad_to_multiple_of=8, return_tensors="pt", padding=True
    ),
)

model.config.use_cache = False  # silence the warnings. Please re-enable for inference!
trainer.train()

if args.save_model:
    output_lora_dir = '/public/MountData/yaolu/LLM_pretrained/pruned_model/partial_tuing_alpaca_{}_{}/'.format(args.base_model, args.partial_layer_name)
    if not os.path.exists(output_lora_dir):
        os.mkdir(output_lora_dir)
    model.save_pretrained(output_lora_dir, safe_serialization=False)
```
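For anyone debugging this class of error: a quick way to see which format a checkpoint was actually saved in is to check which weights file exists on disk. This is a minimal stdlib-only sketch (the helper name `find_weights_file` is mine, not from the thread); it assumes the standard Trainer checkpoint layout, where `save_safetensors=True` (the default in recent transformers releases) produces `model.safetensors` and `save_safetensors=False` produces `pytorch_model.bin`:

```python
import os


def find_weights_file(checkpoint_dir):
    """Return the path to the model weights in a checkpoint directory.

    Prefers model.safetensors (written when save_safetensors=True, the
    default in recent transformers versions) and falls back to
    pytorch_model.bin (written when save_safetensors=False).
    """
    for name in ("model.safetensors", "pytorch_model.bin"):
        path = os.path.join(checkpoint_dir, name)
        if os.path.exists(path):
            return path
    raise FileNotFoundError(
        "No model.safetensors or pytorch_model.bin found in %s" % checkpoint_dir
    )
```

If this raises on a checkpoint you expected to contain `pytorch_model.bin`, the checkpoint was most likely written with the safetensors default still in effect, which is exactly the mismatch the quoted fix addresses.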
yaolu-zjut changed the title from "Issue resolved. The problem is that when constructing the trainer, save_safetensors=False should be set. Otherwise, the above safe_serialization=False will not work." to "No such file or directory: pytorch_model.bin" on Aug 29, 2024
irislin1006 commented:
Hi @yaolu-zjut,

This should work on the latest version. I posted more details in huggingface/transformers#31734 (comment).

VincentZ-2020 commented:

I also hit this. Upgrading transformers to version 4.44.2 does not solve it, but setting `save_safetensors=False` works.
