
No such file or directory: pytorch_model.bin #74

Open
yaolu-zjut opened this issue Aug 29, 2024 · 2 comments
yaolu-zjut commented Aug 29, 2024

> Issue resolved. The problem is that when constructing the trainer, `save_safetensors=False` should be set. Otherwise, the above `safe_serialization=False` will not work.
>
> https://huggingface.co/docs/transformers/v4.36.1/en/main_classes/trainer#transformers.TrainingArguments.save_safetensors

Originally posted by @WilliamYi96 in #45 (comment)

I use gemma2-2b-it with transformers, and it reports 'No such file or directory: pytorch_model.bin'. I then followed the instructions of @WilliamYi96, but it still does not work. Can somebody help me? Here is my code:

```python
import os

import transformers

trainer = transformers.Trainer(
    model=model,
    train_dataset=train_data,
    eval_dataset=val_data,
    args=transformers.TrainingArguments(
        per_device_train_batch_size=args.micro_batch_size,
        gradient_accumulation_steps=gradient_accumulation_steps,
        warmup_steps=100,  # 100 ori
        num_train_epochs=args.num_epochs,
        learning_rate=args.learning_rate,
        fp16=True,  # not torch.cuda.is_bf16_supported()
        bf16=False,  # torch.cuda.is_bf16_supported()
        logging_steps=10,
        logging_first_step=True,
        optim="adamw_torch",
        evaluation_strategy="steps",
        save_strategy="steps",
        eval_steps=100,
        save_steps=200,
        output_dir=args.output_dir,
        save_total_limit=20,
        max_grad_norm=1.0,
        report_to="none",
        load_best_model_at_end=True,
        # lr_scheduler_type="linear",
        ddp_find_unused_parameters=False if ddp else None,
        group_by_length=args.group_by_length,
        run_name=args.output_dir.split('/')[-1],
        metric_for_best_model="{}_loss".format(args.data_path),
        save_safetensors=False,
    ),
    data_collator=transformers.DataCollatorForSeq2Seq(
        tokenizer, pad_to_multiple_of=8, return_tensors="pt", padding=True
    ),
)

model.config.use_cache = False  # silence the warnings. Please re-enable for inference!
trainer.train()

if args.save_model:
    output_lora_dir = '/public/MountData/yaolu/LLM_pretrained/pruned_model/partial_tuing_alpaca_{}_{}/'.format(args.base_model, args.partial_layer_name)
    if not os.path.exists(output_lora_dir):
        os.mkdir(output_lora_dir)
    model.save_pretrained(output_lora_dir, safe_serialization=False)
```
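For anyone debugging this class of error: a quick way to see which format a checkpoint was actually saved in is to check which weights file exists on disk. This is a minimal stdlib-only sketch (the helper name `find_weights_file` is mine, not from the thread); it assumes the standard Trainer checkpoint layout, where `save_safetensors=True` (the default in recent transformers releases) produces `model.safetensors` and `save_safetensors=False` produces `pytorch_model.bin`:

```python
import os


def find_weights_file(checkpoint_dir):
    """Return the path to the model weights in a checkpoint directory.

    Prefers model.safetensors (written when save_safetensors=True, the
    default in recent transformers versions) and falls back to
    pytorch_model.bin (written when save_safetensors=False).
    """
    for name in ("model.safetensors", "pytorch_model.bin"):
        path = os.path.join(checkpoint_dir, name)
        if os.path.exists(path):
            return path
    raise FileNotFoundError(
        "No model.safetensors or pytorch_model.bin found in %s" % checkpoint_dir
    )
```

If this raises on a checkpoint you expected to contain `pytorch_model.bin`, the checkpoint was most likely written with the safetensors default still in effect, which is exactly the mismatch the quoted fix addresses.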
yaolu-zjut changed the title from "Issue resolved. The problem is that when constructing the trainer, save_safetensors=False should be set. Otherwise, the above safe_serialization=False will not work." to "No such file or directory: pytorch_model.bin" on Aug 29, 2024
irislin1006 commented:
Hi @yaolu-zjut,

This should work on the latest version. I posted more details in huggingface/transformers#31734 (comment).

VincentZ-2020 commented:

I also hit this. Upgrading transformers to version 4.44.2 does not solve it, but setting `save_safetensors=False` works.
