- Aaquib Syed | [email protected]
- Phillip Huang Guo | [email protected]
- Vijaykaarti Sundarapandiyan | [email protected]
Massive language models with billions of parameters are expensive to run and therefore benefit from pruning. Pruning techniques for such models are typically iterative and require extensive weight retraining after pruning. SparseGPT, a recently introduced one-shot technique, enables pruning without retraining. We improve upon SparseGPT by fine-tuning during pruning with a minimal number of training steps, and we compare against magnitude pruning, finding that our iteratively fine-tuned SparseGPT models significantly outperform their magnitude-pruned counterparts at high sparsity.
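For reference, the magnitude-pruning baseline simply zeroes out the lowest-magnitude weights. Below is a minimal PyTorch sketch of that baseline; the function name and the per-layer usage are illustrative and not taken from the notebooks.

```python
import torch

def magnitude_prune_(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Zero out the lowest-magnitude `sparsity` fraction of entries, in place."""
    num_to_prune = int(sparsity * weight.numel())
    if num_to_prune == 0:
        return weight
    # Threshold at the k-th smallest absolute value; entries at or below it are zeroed.
    threshold = weight.abs().flatten().kthvalue(num_to_prune).values
    mask = weight.abs() > threshold
    return weight.mul_(mask)

# Example: prune half the weights of every linear layer in a model.
# for module in model.modules():
#     if isinstance(module, torch.nn.Linear):
#         magnitude_prune_(module.weight.data, sparsity=0.5)
```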
SparseGPT.ipynb contains the code to prune models and Finetuning.ipynb contains the code to fine-tune the pruned models. Use Iterative_Pruning.ipynb to iteratively prune and fine-tune with FullyShardedDataParallel (a minimal FSDP sketch appears after the iterative-pruning steps below).
- Change the model name (see the loading sketch below):
model_name = "facebook/opt-125m"
- Run the notebook
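For reference, the model named above is typically loaded from the Hugging Face Hub as follows; this is a sketch, and the notebook's own loading code may differ.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "facebook/opt-125m"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
```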
- Change the model size:
model_size = "opt-125m"
- Adjust the following parameters (see the calibration sketch after this list):
- Number of sentences used for calibration:
calibration_size=128
- Maximum number of tokens per sentence:
token_length=512
- Number of batches used for calibration:
calibration_batch_size=2
- Small constant added to stabilize matrix inverses:
EPSILON = 1e-8
- Block size for pruning:
B = 4
- Adaptive mask selection block size:
Bs = 2
- Adjust the list of sparsities to generate:
SPARSENESS_LIST = [0.5]
- Run the notebook
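Below is a sketch of how the calibration settings above can fit together, assuming WikiText-2 as the calibration corpus and the standard datasets/transformers APIs; the notebook's actual data source and batching may differ.

```python
import torch
from datasets import load_dataset
from transformers import AutoTokenizer

calibration_size = 128       # number of sentences used for calibration
token_length = 512           # maximum number of tokens per sentence
calibration_batch_size = 2   # number of sentences per calibration batch
EPSILON = 1e-8               # small constant added to stabilize matrix inverses
B = 4                        # block size for pruning
Bs = 2                       # adaptive mask selection block size
SPARSENESS_LIST = [0.5]      # sparsities to generate

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")
# Assumption: WikiText-2 as the calibration corpus; the notebook may use a different dataset.
raw_texts = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")["text"]
texts = [t for t in raw_texts if t.strip()][:calibration_size]

encodings = tokenizer(
    texts,
    max_length=token_length,
    truncation=True,
    padding="max_length",
    return_tensors="pt",
)
# Split the calibration inputs into small batches for the pruning forward passes.
calibration_batches = torch.split(encodings["input_ids"], calibration_batch_size)
```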
- Adjust the model sizes to fine-tune (see the loop sketch after these steps):
model_size in ['opt-1.3b']
- Adjust the sparsities to fine-tune:
SPARSITIES = [1, 0.9, 0.7, 0.5, 0.3, 0.2]
- Run the notebook
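Finetuning.ipynb iterates over these model sizes and sparsities, loading each pruned checkpoint and fine-tuning it. Here is a minimal sketch, assuming a hypothetical pruned/ and finetuned/ checkpoint layout; the real paths and training loop live in the notebook.

```python
from transformers import AutoModelForCausalLM

SPARSITIES = [1, 0.9, 0.7, 0.5, 0.3, 0.2]

for model_size in ["opt-1.3b"]:
    for sparsity in SPARSITIES:
        # Hypothetical checkpoint layout; Finetuning.ipynb defines its own paths.
        checkpoint = f"pruned/{model_size}-sparsity-{sparsity}"
        model = AutoModelForCausalLM.from_pretrained(checkpoint)
        # ... run the notebook's fine-tuning loop on `model` here ...
        model.save_pretrained(f"finetuned/{model_size}-sparsity-{sparsity}")
```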
- Change the model sizes to prune:
model_size in ['opt-125m', 'opt-350m', 'opt-1.3b']
- Adjust the following parameters:
- Number of sentences used for calibration:
calibration_size=128
- Maximum number of tokens per sentence:
token_length=512
- Number of batches used for calibration:
calibration_batch_size=2
- Small constant added to stabilize matrix inverses:
EPSILON = 1e-8
- Block size for pruning:
B = 4
- Adaptive mask selection block size:
Bs = 2
- Adjust the list of sparsities to generate:
SPARSENESS_LIST = [0.5]
- Run the notebook
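Iterative_Pruning.ipynb shards the model with PyTorch's FullyShardedDataParallel while alternating pruning and fine-tuning. Below is a minimal sketch of the FSDP wrapping, assuming a torchrun launch; the notebook's own distributed setup may differ.

```python
import os

import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from transformers import AutoModelForCausalLM

# Assumes a torchrun launch, which sets RANK, WORLD_SIZE, and LOCAL_RANK.
dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

model = AutoModelForCausalLM.from_pretrained("facebook/opt-1.3b")
# Shard parameters, gradients, and optimizer state across the available GPUs.
model = FSDP(model, device_id=local_rank)

# The prune-and-fine-tune iterations then operate on this FSDP-wrapped model.
```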
As the graphs in Figure 1 show, SparseGPT with iterative pruning and fine-tuning outperforms every other technique beyond 0.4 sparsity on OPT-125M and 0.6 sparsity on OPT-1.3B. SparseGPT with non-iterative pruning and fine-tuning is moderately better than pruning without fine-tuning in all cases, but is significantly outperformed by both iterative prune-and-fine-tune methods beyond 0.5 sparsity.