LoRAPrune: Pruning Meets Low-Rank Parameter-Efficient Fine-Tuning [arXiv]
Mingyang Zhang1,2, Hao Chen1, Chunhua Shen1,3, Zhen Yang1, Linlin Ou2, Xinyi Yu2, Bohan Zhuang1
Zhejiang University1, Zhejiang University of Technology2, Ant Group3
This repository contains code for reproducing LoRAPrune. LoRAPrune iteratively prunes large pre-trained models (LPMs) in a memory-efficient manner. Specifically, it uses a LoRA-guided pruning criterion that relies on the weights and gradients of LoRA, rather than the gradients of the pre-trained weights, for importance estimation.
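As a rough illustration, the criterion can be sketched as follows (a minimal sketch, not the repository's exact code; the layer shapes and function name are illustrative). Since the pre-trained weight `W` is frozen, its gradient is never materialized; the gradients of the LoRA factors `A` and `B` serve as a proxy:

```python
# Minimal sketch of the LoRA-guided importance criterion (illustrative,
# not the repository's exact implementation). For a linear layer with a
# frozen weight W (out x in) and LoRA factors B (out x r), A (r x in),
# the gradient of W is approximated from the LoRA gradients alone:
#   dL/dW  ~=  (dL/dB) @ A  +  B @ (dL/dA)
import torch

def lora_guided_importance(W: torch.Tensor, A: torch.Tensor, B: torch.Tensor) -> torch.Tensor:
    """Per-weight importance |(W + BA) * g_hat|, computed without grads on W."""
    g_hat = B.grad @ A + B @ A.grad        # proxy for dL/dW from LoRA gradients
    return ((W + B @ A) * g_hat).abs()     # first-order Taylor importance
```

In LoRAPrune this score is accumulated over batches and the lowest-scoring weight groups are masked at each iterative pruning step.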
- June 20, 2024: Code is released!
- May 20, 2024: LoRAPrune is accepted by ACL 2024 Findings!
- TODO: Support more LLMs.
pip install -r requirement.txt
sh script/prune.sh
This script compresses the LLaMA-7B model. You need to download the LLaMA-7B pretrained weights first; the calibration dataset is downloaded and sampled automatically. You can also prune larger LPMs, e.g., LLaMA-13B, LLaMA-30B, and LLaMA-65B.
To save GPU memory, you can optionally quantize the pre-trained weights to 8 bits by adding `--load_in_8bit`.
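For reference, this is roughly what 8-bit loading looks like in Hugging Face Transformers (a hedged sketch; the actual flag handling lives in the repository's scripts, and the checkpoint path is a placeholder):

```python
# Sketch of 8-bit base-model loading via bitsandbytes in Transformers;
# this mirrors what a --load_in_8bit flag typically toggles.
from transformers import LlamaForCausalLM

model = LlamaForCausalLM.from_pretrained(
    "path/to/llama-7b",  # placeholder for your downloaded LLaMA-7B weights
    load_in_8bit=True,   # quantize the frozen pre-trained weights to 8 bits
    device_map="auto",   # dispatch layers across available devices
)
```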
sh script/evaluate.sh
After pruning, you can evaluate the pruned model on the WikiText-2 and PTB datasets.
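The script wraps a standard perplexity measurement. A minimal standalone sketch (assuming a Hugging Face checkpoint at a placeholder path and the public `wikitext-2-raw-v1` split; not the repository's exact evaluation code) looks like this:

```python
# Minimal perplexity-evaluation sketch for WikiText-2 (illustrative; the
# checkpoint path is a placeholder, not the repo's output layout).
import math
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

path = "path/to/pruned-llama-7b"  # placeholder
model = AutoModelForCausalLM.from_pretrained(path).eval()
tokenizer = AutoTokenizer.from_pretrained(path)

text = "\n\n".join(load_dataset("wikitext", "wikitext-2-raw-v1", split="test")["text"])
ids = tokenizer(text, return_tensors="pt").input_ids

nll_sum, n_tokens, stride = 0.0, 0, 2048
for i in range(0, ids.size(1), stride):
    chunk = ids[:, i : i + stride]
    if chunk.size(1) < 2:                         # need at least one shifted target
        break
    with torch.no_grad():
        loss = model(chunk, labels=chunk).loss    # mean next-token NLL on the chunk
    nll_sum += loss.item() * (chunk.size(1) - 1)
    n_tokens += chunk.size(1) - 1

print(f"WikiText-2 perplexity: {math.exp(nll_sum / n_tokens):.2f}")
```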
For non-commercial academic use, this project is licensed under the 2-clause BSD License. For commercial use, please contact Chunhua Shen.
If you find this project useful, please cite:
@misc{zhang2023pruning,
      title={Pruning Meets Low-Rank Parameter-Efficient Fine-Tuning},
      author={Mingyang Zhang and Hao Chen and Chunhua Shen and Zhen Yang and Linlin Ou and Xinyi Yu and Bohan Zhuang},
      year={2023},
      eprint={2305.18403},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}