
Post training more than 1 epoch leads to performance degradation #81

Open · sidhantls opened this issue Sep 22, 2024 · 1 comment · May be fixed by #82
sidhantls commented Sep 22, 2024

I'm running post-training on a pruned model. After post-training I get degraded performance: e.g., MMLU drops to 24%, which is essentially random chance on its 4-option format. Is this expected?

MODEL=meta-llama/Llama-2-7b-hf

prune_ckpt_path='llama_prune'
tune_ckpt_path='model'

RATIO=0.10

# Pruning step
echo "[START] - Start Pruning with RATIO=$RATIO"
python hf_prune.py --base_model=$MODEL --pruning_ratio $RATIO --device cpu --eval_device cuda \
  --block_wise --block_mlp_layer_start 4 --block_mlp_layer_end 30 \
  --block_attention_layer_start 4 --block_attention_layer_end 30 \
  --save_ckpt_log_name $prune_ckpt_path --pruner_type taylor \
  --taylor param_first --save_model

echo "[FINISH] - Finish Pruning Model"

# Tuning step
echo "[START] - Start Tuning with RATIO=$RATIO"
python post_training.py --prune_model $prune_ckpt_path/pytorch_model.bin --data_path yahma/alpaca-cleaned \
  --output_dir $tune_ckpt_path --wandb_project llama_tune --lora_r 8 --num_epochs 2 \
  --learning_rate 1e-4 --batch_size 64

echo "[FINISH] - Finish Tuning for RATIO=$RATIO"
sidhantls (Author) commented

[screenshot of training metrics attached]

Looked into it; the degradation occurs at training step 150.
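If the collapse is tied to a specific step, logging gradient norms and checkpointing frequently around it can show whether it is a loss/gradient spike or a scheduler artifact. A minimal sketch, assuming post_training.py builds a Hugging Face transformers.Trainer; the callback name and thresholds are illustrative, not code from this repo:

# Hypothetical diagnostic for the step-150 collapse. Assumes the training
# script uses transformers.Trainer; everything here is illustrative only.
from transformers import TrainerCallback

class GradSpikeLogger(TrainerCallback):
    """Print the gradient norm at each logging step to spot a spike near step 150.

    Recent transformers versions include 'grad_norm' in the logged metrics;
    if yours does not, compute the norm manually in an optimizer-step hook.
    """

    def on_log(self, args, state, control, logs=None, **kwargs):
        if logs is not None and "grad_norm" in logs:
            print(f"step {state.global_step}: grad_norm = {logs['grad_norm']:.3f}")

# Usage sketch: trainer.add_callback(GradSpikeLogger()) together with a small
# save_steps (e.g. 25) in TrainingArguments, then compare the step-125 and
# step-175 checkpoints with the perplexity probe above.

If a spike does show up, a lower --learning_rate, warmup, or tighter gradient clipping are common first mitigations for LoRA runs of this kind.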

sidhantls linked pull request #82 on Oct 7, 2024 that may fix this issue.