
Post training more than 1 epoch leads to performance degradation #81

Open · sidhantls opened this issue Sep 22, 2024 · 1 comment · May be fixed by #82
sidhantls commented Sep 22, 2024

I'm running post-training on a pruned model. After post-training I get degraded performance: e.g., MMLU drops to 24%, which is essentially random chance on its 4-option format. Is this expected?

MODEL=meta-llama/Llama-2-7b-hf

prune_ckpt_path='llama_prune'
tune_ckpt_path='model'

RATIO=0.10

# Pruning step
echo "[START] - Start Pruning with RATIO=$RATIO"
python hf_prune.py --base_model=$MODEL --pruning_ratio $RATIO --device cpu --eval_device cuda \
  --block_wise --block_mlp_layer_start 4 --block_mlp_layer_end 30 \
  --block_attention_layer_start 4 --block_attention_layer_end 30 \
  --save_ckpt_log_name $prune_ckpt_path --pruner_type taylor \
  --taylor param_first --save_model

echo "[FINISH] - Finish Pruning Model"

# Tuning step
echo "[START] - Start Tuning with RATIO=$RATIO"
python post_training.py --prune_model $prune_ckpt_path/pytorch_model.bin --data_path yahma/alpaca-cleaned \
  --output_dir $tune_ckpt_path --wandb_project llama_tune --lora_r 8 --num_epochs 2 \
  --learning_rate 1e-4 --batch_size 64

echo "[FINISH] - Finish Tuning for RATIO=$RATIO"
sidhantls (Author) commented

[screenshot of training metrics attached]

Looked into it; the degradation occurs at training step 150.
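If the collapse is tied to a specific step, logging gradient norms and checkpointing frequently around it can show whether it is a loss/gradient spike or a scheduler artifact. A minimal sketch, assuming post_training.py builds a Hugging Face transformers.Trainer; the callback name and thresholds are illustrative, not code from this repo:

# Hypothetical diagnostic for the step-150 collapse. Assumes the training
# script uses transformers.Trainer; everything here is illustrative only.
from transformers import TrainerCallback

class GradSpikeLogger(TrainerCallback):
    """Print the gradient norm at each logging step to spot a spike near step 150.

    Recent transformers versions include 'grad_norm' in the logged metrics;
    if yours does not, compute the norm manually in an optimizer-step hook.
    """

    def on_log(self, args, state, control, logs=None, **kwargs):
        if logs is not None and "grad_norm" in logs:
            print(f"step {state.global_step}: grad_norm = {logs['grad_norm']:.3f}")

# Usage sketch: trainer.add_callback(GradSpikeLogger()) together with a small
# save_steps (e.g. 25) in TrainingArguments, then compare the step-125 and
# step-175 checkpoints with the perplexity probe above.

If a spike does show up, a lower --learning_rate, warmup, or tighter gradient clipping are common first mitigations for LoRA runs of this kind.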

sidhantls linked pull request #82 on Oct 7, 2024 that may fix this issue.