Add MultiPruner results and improvements to install and readme
Co-authored-by: Yuan, Jinjie <[email protected]>
jpablomch and Yuan0320 committed Dec 12, 2024
1 parent 0b81417 commit ea31bab
Showing 19 changed files with 798 additions and 31 deletions.
60 changes: 35 additions & 25 deletions MultiPruner/README.md
@@ -3,22 +3,22 @@
Official implementation of [Fine-Grained Training-Free Structure Removal in Foundation Models]().

This repo contains the code for **MultiPruner**, a novel pruning approach that surpasses recent training-free pruning
-methods by adopting a multidimensional, iterative, fine-grained pruning strategy.
+methods, e.g., BlockPruner (Zhong et al., 2024) and ShortGPT (Men et al., 2024), by adopting a multidimensional, iterative, fine-grained pruning strategy.
Please refer to our paper for more details.

## News
-- **[2025.xx.xx]** Release the code for **MultiPruner**. :tada:
+- **[2024.12.14]** Release the code for **MultiPruner**. :tada:

## Supported Models

- Llama: [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf)
- Qwen: [Qwen/Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B), [Qwen/Qwen1.5-7B](https://huggingface.co/Qwen/Qwen1.5-7B)

## Setup

-Here is an installation script developed from scratch.
+Use the following instructions to create a virtual environment with the required dependencies.

```
pip install virtualenv
virtualenv multipruner-env
source multipruner-env/bin/activate
pip install torch==2.3.1
# install dependencies
bash install.sh
```
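
Once the script completes, a quick way to confirm that the pinned versions took effect is shown below; the expected values come from `install.sh` and `requirements.txt` (a minimal sanity check, run inside the activated environment):

```python
# Sanity check for the MultiPruner environment. Expected versions come from
# the pins in install.sh and requirements.txt.
import torch
import transformers

print(torch.__version__)         # expected: 2.3.1
print(transformers.__version__)  # expected: 4.42.4 (patched source install)
```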
@@ -115,32 +115,42 @@ This investigation may facilitate practical applications. The results of Llama-2
| MultiPruner w/ finetune | 18% | 66.16 | -2.80% | 95.94% |


-## Released Pruned Models 🤗
+## Released Pruned Models and Configurations 🤗

-We have released several compressed models by MultiPruner:
+We have released several compressed models and pruning configurations to reproduce the results in the paper:

-| Source Model | Pruning Ratio | Recovery Tuning | Pruned Model |
-|---|---|---|---|
-| [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) | 7% | ✘ | [IntelLabs/MultiPruner-Llama-2-6.3b](https://huggingface.co/IntelLabs/MultiPruner-Llama-2-6.3b) |
-| [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) | 10% | ✘ | [IntelLabs/MultiPruner-Llama-2-6.1b](https://huggingface.co/IntelLabs/MultiPruner-Llama-2-6.1b) |
-| [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) | 12% | ✘ | [IntelLabs/MultiPruner-Llama-2-5.9b](https://huggingface.co/IntelLabs/MultiPruner-Llama-2-5.9b) |
-| [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) | 12% | ✔ | [IntelLabs/MultiPruner-Llama-2-5.9b-alpaca](https://huggingface.co/IntelLabs/MultiPruner-Llama-2-5.9b-alpaca) |
-| [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) | 14% | ✘ | [IntelLabs/MultiPruner-Llama-2-5.8b](https://huggingface.co/IntelLabs/MultiPruner-Llama-2-5.8b) |
-| [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) | 15% | ✘ | [IntelLabs/MultiPruner-Llama-2-5.7b](https://huggingface.co/IntelLabs/MultiPruner-Llama-2-5.7b) |
-| [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) | 15% | ✔ | [IntelLabs/MultiPruner-Llama-2-5.7b-alpaca](https://huggingface.co/IntelLabs/MultiPruner-Llama-2-5.7b-alpaca) |
-| [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) | 18% | ✘ | [IntelLabs/MultiPruner-Llama-2-5.5b](https://huggingface.co/IntelLabs/MultiPruner-Llama-2-5.5b) |
-| [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) | 18% | ✔ | [IntelLabs/MultiPruner-Llama-2-5.5b-alpaca](https://huggingface.co/IntelLabs/MultiPruner-Llama-2-5.5b-alpaca) |
-| [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) | 22% | ✘ | [IntelLabs/MultiPruner-Llama-2-5.3b](https://huggingface.co/IntelLabs/MultiPruner-Llama-2-5.3b) |
-| [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) | 22% | ✔ | [IntelLabs/MultiPruner-Llama-2-5.3b-alpaca](https://huggingface.co/IntelLabs/MultiPruner-Llama-2-5.3b-alpaca) |
-| [Qwen/Qwen1.5-7B](https://huggingface.co/Qwen/Qwen1.5-7B) | 22% | ✘ | [IntelLabs/MultiPruner-Qwen1.5-6b](https://huggingface.co/IntelLabs/MultiPruner-Qwen1.5-6b) |
-| [baichuan-inc/Baichuan2-7B-Base](https://huggingface.co/baichuan-inc/Baichuan2-7B-Base) | 22% | ✘ | [IntelLabs/MultiPruner-Baichuan2-5.8b](https://huggingface.co/IntelLabs/MultiPruner-Baichuan2-5.8b) |
+| Source Model | Pruning Ratio | Pruned Model Configuration / HF link |
+|---|---|---|
+| [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) | 7% | [MultiPruner-Llama-2-6.3b Config File](./results/Llama-2-7B/ratio_7) |
+| [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) | 10% | [MultiPruner-Llama-2-6.1b Config File](./results/Llama-2-7B/ratio_10) |
+| [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) | 12% | [MultiPruner-Llama-2-5.9b Config File](./results/Llama-2-7B/ratio_12) |
+| [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) | 14% | [MultiPruner-Llama-2-5.8b Config File](./results/Llama-2-7B/ratio_14) |
+| [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) | 15% | [MultiPruner-Llama-2-5.7b Config File](./results/Llama-2-7B/ratio_15) |
+| [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) | 18% | [MultiPruner-Llama-2-5.5b Config File](./results/Llama-2-7B/ratio_18) |
+| [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) | 22% | [MultiPruner-Llama-2-5.3b Config File](./results/Llama-2-7B/ratio_22) |
+| [Qwen/Qwen1.5-7B](https://huggingface.co/Qwen/Qwen1.5-7B) | 22% | [IntelLabs/MultiPruner-Qwen1.5-6b](https://huggingface.co/IntelLabs/MultiPruner-Qwen1.5-6b) |
+| [baichuan-inc/Baichuan2-7B-Base](https://huggingface.co/baichuan-inc/Baichuan2-7B-Base) | 22% | [IntelLabs/MultiPruner-Baichuan2-5.8b](https://huggingface.co/IntelLabs/MultiPruner-Baichuan2-5.8b) |
<sup>*</sup> *For Llama models, we provide the pruning configuration files to reproduce the results in the paper.*

### Loading the compressed model for evaluation

```bash
python eval.py --model_path <path to compressed model> --output_path <path to evaluation results>
```
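
Beyond `eval.py`, the released checkpoints on the Hugging Face Hub can be loaded like standard causal LMs for a quick smoke test. A minimal sketch (the model id comes from the table above; the dtype and device placement are illustrative assumptions):

```python
# Load a released MultiPruner checkpoint and generate a few tokens.
# Assumes the checkpoint behaves as a standard Hugging Face causal LM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "IntelLabs/MultiPruner-Qwen1.5-6b"  # any released model from the table
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

inputs = tokenizer("Structured pruning of large language models", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```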

## Acknowledgement

MultiPruner benefits from the following work:

```bibtex
@article{zhong2024blockpruner,
  title={BlockPruner: Fine-grained Pruning for Large Language Models},
  author={Zhong, Longguang and Wan, Fanqi and Chen, Ruijun and Quan, Xiaojun and Li, Liangzhi},
  journal={arXiv preprint arXiv:2406.10594},
  year={2024}
}
```

## Citation
If you find MultiPruner's code and paper helpful, please cite:
```bibtex
21 changes: 15 additions & 6 deletions MultiPruner/install.sh
@@ -3,13 +3,22 @@
set -e
set -x

MULTIPRUNER_PATH=$PWD
-mkdir third_party && cd third_party

-pip install 'numpy<2.0.0' setuptools==69.5.1
+python3.10 -m venv venv
+source venv/bin/activate

+mkdir -pv third_party
+pushd third_party

git clone https://github.com/huggingface/transformers.git
-cd transformers && git checkout v4.42.4 && git apply --ignore-space-change --ignore-whitespace ${MULTIPRUNER_PATH}/patches/transformers-v4.42.4.patch && pip install -e . && cd ..
+pushd transformers
+git checkout v4.42.4
+git apply --ignore-space-change --ignore-whitespace ${MULTIPRUNER_PATH}/patches/transformers-v4.42.4.patch
+pip install -e .
+
+pushd ${MULTIPRUNER_PATH}
+
+pip install -r requirements.txt
+
+echo "Environment ready. Execute 'source venv/bin/activate' to run."

-pip install datasets accelerate sentencepiece protobuf bitsandbytes
-pip install lm-eval==0.4.2
9 changes: 9 additions & 0 deletions MultiPruner/requirements.txt
@@ -0,0 +1,9 @@
numpy<2.0.0
setuptools==69.5.1
datasets
accelerate
sentencepiece
protobuf
bitsandbytes
lm-eval==0.4.2
torch==2.3.1
12 changes: 12 additions & 0 deletions MultiPruner/results/Llama-2-7B/ratio_10/eval.res.json
@@ -0,0 +1,12 @@
{
"total_params": 6738415616,
"pruned_params": 6063132672,
"ratio": 10.02139052385812,
"ppl_wikitext2": 6.55,
"5cs_acc_avg": 67.02,
"arc_challenge": 44.45,
"arc_easy": 71.0,
"hellaswag": 74.07000000000001,
"winogrande": 68.19,
"piqa": 77.37
}
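
The derived fields in these result files are self-consistent: `ratio` is the percentage of parameters removed (so `pruned_params` counts the parameters that remain), and `5cs_acc_avg` is the mean of the five commonsense-task accuracies. A quick check using the numbers above:

```python
# Recompute the derived fields of eval.res.json from its raw values.
total_params = 6738415616
pruned_params = 6063132672  # parameters remaining after pruning

print(100 * (1 - pruned_params / total_params))  # 10.0213... -> "ratio"

# arc_challenge, arc_easy, hellaswag, winogrande, piqa
tasks = [44.45, 71.0, 74.07, 68.19, 77.37]
print(sum(tasks) / len(tasks))  # 67.016 -> "5cs_acc_avg" (67.02)
```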
78 changes: 78 additions & 0 deletions MultiPruner/results/Llama-2-7B/ratio_10/pruning_config.json
@@ -0,0 +1,78 @@
{
"pruned_attn_idx": [
25,
27,
21,
23,
24
],
"pruned_mlp_idx": [],
"pruned_attn_width": {
"0": 4096,
"1": 3840,
"2": 3840,
"3": 4096,
"4": 4096,
"5": 3968,
"6": 4096,
"7": 4096,
"8": 3968,
"9": 4096,
"10": 4096,
"11": 4096,
"12": 4096,
"13": 4096,
"14": 4096,
"15": 4096,
"16": 4096,
"17": 3968,
"18": 4096,
"19": 3968,
"20": 3968,
"21": 4096,
"22": 3968,
"23": 4096,
"24": 4096,
"25": 4096,
"26": 4096,
"27": 4096,
"28": 3968,
"29": 4096,
"30": 3968,
"31": 4096
},
"pruned_mlp_width": {
"0": 11008,
"1": 11008,
"2": 5888,
"3": 11008,
"4": 11008,
"5": 11008,
"6": 11008,
"7": 9984,
"8": 11008,
"9": 11008,
"10": 11008,
"11": 9984,
"12": 11008,
"13": 11008,
"14": 11008,
"15": 11008,
"16": 11008,
"17": 11008,
"18": 11008,
"19": 11008,
"20": 11008,
"21": 11008,
"22": 11008,
"23": 1792,
"24": 11008,
"25": 11008,
"26": 11008,
"27": 1792,
"28": 11008,
"29": 11008,
"30": 11008,
"31": 11008
}
}
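
The configuration above can be summarized with a short script. A minimal sketch, assuming Llama-2-7B's geometry (32 layers, 32 attention heads of dimension 128, MLP intermediate size 11008) and the field names shown above:

```python
# Summarize a MultiPruner pruning_config.json: which attention/MLP blocks are
# removed entirely, and how much width remains in the blocks that were kept.
import json

with open("MultiPruner/results/Llama-2-7B/ratio_10/pruning_config.json") as f:
    cfg = json.load(f)

HEAD_DIM = 128     # Llama-2-7B: hidden size 4096 / 32 heads
ATTN_WIDTH = 4096
MLP_WIDTH = 11008

print("attention blocks removed:", sorted(cfg["pruned_attn_idx"]))
print("MLP blocks removed:", sorted(cfg["pruned_mlp_idx"]))

# Attention widths are multiples of the head dimension, i.e. whole heads are
# pruned; MLP widths are pruned at channel granularity.
for layer, width in cfg["pruned_attn_width"].items():
    if width < ATTN_WIDTH:
        print(f"layer {layer}: {width // HEAD_DIM} of 32 attention heads kept")

for layer, width in cfg["pruned_mlp_width"].items():
    if width < MLP_WIDTH:
        print(f"layer {layer}: MLP intermediate width {width} of {MLP_WIDTH} kept")
```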
12 changes: 12 additions & 0 deletions MultiPruner/results/Llama-2-7B/ratio_12/eval.res.json
@@ -0,0 +1,12 @@
{
"total_params": 6738415616,
"pruned_params": 5931012096,
"ratio": 11.982097365482536,
"ppl_wikitext2": 7.1,
"5cs_acc_avg": 66.47999999999999,
"arc_challenge": 44.03,
"arc_easy": 69.82000000000001,
"hellaswag": 73.77,
"winogrande": 68.43,
"piqa": 76.33
}
79 changes: 79 additions & 0 deletions MultiPruner/results/Llama-2-7B/ratio_12/pruning_config.json
@@ -0,0 +1,79 @@
{
"pruned_attn_idx": [
25,
27,
21,
23,
24,
29
],
"pruned_mlp_idx": [],
"pruned_attn_width": {
"0": 4096,
"1": 4096,
"2": 3840,
"3": 3968,
"4": 4096,
"5": 4096,
"6": 4096,
"7": 4096,
"8": 3968,
"9": 4096,
"10": 4096,
"11": 4096,
"12": 4096,
"13": 4096,
"14": 4096,
"15": 4096,
"16": 3968,
"17": 3968,
"18": 4096,
"19": 3968,
"20": 3968,
"21": 4096,
"22": 3968,
"23": 4096,
"24": 4096,
"25": 4096,
"26": 4096,
"27": 4096,
"28": 3712,
"29": 4096,
"30": 3968,
"31": 4096
},
"pruned_mlp_width": {
"0": 11008,
"1": 11008,
"2": 5888,
"3": 11008,
"4": 11008,
"5": 11008,
"6": 11008,
"7": 9984,
"8": 11008,
"9": 11008,
"10": 11008,
"11": 11008,
"12": 11008,
"13": 11008,
"14": 11008,
"15": 11008,
"16": 11008,
"17": 11008,
"18": 11008,
"19": 11008,
"20": 11008,
"21": 11008,
"22": 11008,
"23": 1792,
"24": 11008,
"25": 1792,
"26": 11008,
"27": 4864,
"28": 11008,
"29": 11008,
"30": 11008,
"31": 11008
}
}
12 changes: 12 additions & 0 deletions MultiPruner/results/Llama-2-7B/ratio_14/eval.res.json
@@ -0,0 +1,12 @@
{
"total_params": 6738415616,
"pruned_params": 5796794368,
"ratio": 13.973926537926385,
"ppl_wikitext2": 7.56,
"5cs_acc_avg": 65.93,
"arc_challenge": 43.519999999999996,
"arc_easy": 68.64,
"hellaswag": 72.27,
"winogrande": 67.96,
"piqa": 77.25999999999999
}