Tested on a MacBook Pro M1 Max with PyTorch nightly.
This repository is intended as a minimal, hackable, and readable example to load LLaMA ([arXiv](https://arxiv.org/abs/2302.13971)) models and run inference. To download the checkpoints and tokenizer, fill out this Google form.
In a conda env with PyTorch / CUDA available, run:

```
pip install -r requirements.txt
```
Then, in this repository:

```
pip install -e .
```
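As a quick sanity check that the editable install worked (assuming the package is exposed under the name `llama`, as in the upstream repository):

```
python -c "import llama; print('ok')"
```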
If you are running on the MPS backend, set these environment variables to disable the MPS memory limit and enable CPU fallback for operations not yet supported on MPS:

```
export PYTORCH_ENABLE_MPS_FALLBACK=1
export PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0
```
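Before running, you can confirm that your PyTorch build actually sees the MPS device, using the standard `torch.backends.mps` API:

```
python -c "import torch; print(torch.backends.mps.is_available(), torch.backends.mps.is_built())"
```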
Then run the example:

```
torchrun --nproc_per_node 1 example.py --ckpt_dir $TARGET_FOLDER/model_size --tokenizer_path $TARGET_FOLDER/tokenizer.model
```
Different models require different model-parallel (MP) values:
| Model | MP |
|-------|----|
| 7B    | 1  |
| 13B   | 2  |
| 33B   | 4  |
| 65B   | 8  |
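For example, to run a larger checkpoint, set `--nproc_per_node` to the matching MP value (a sketch, assuming the 13B weights were downloaded to `$TARGET_FOLDER/13B`):

```
torchrun --nproc_per_node 2 example.py --ckpt_dir $TARGET_FOLDER/13B --tokenizer_path $TARGET_FOLDER/tokenizer.model
```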
See MODEL_CARD.md for model details.
See the LICENSE file.