This example uses Meteorite to create an HTTP API for hosting models. It serves a LLaMA model with llama.cpp through the llama-cpp-python bindings.
The LLaMA model weights can be obtained from here and should be arranged in a `model` directory with the following layout:
.
├── ...
├── model
│   ├── 7B
│   │   ├── ggml-model-q4_0.bin
│   │   ├── ggml-model-f16.bin
│   │   ├── params.json
│   │   ├── consolidated.00.pth
│   │   └── checklist.chk
│   ├── ggml-vocab.bin
│   ├── tokenizer.model
│   └── tokenizer_checklist.chk
└── ...
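Before putting the model behind an HTTP API, you can sanity-check the weights with llama-cpp-python directly. A minimal sketch (the model path follows the layout above; the prompt is arbitrary):

```python
from llama_cpp import Llama

# Load the 4-bit quantized weights from the tree above.
llm = Llama(model_path="./model/7B/ggml-model-q4_0.bin")

# Run a short completion to confirm the model loads and generates.
out = llm(
    "Q: What is the capital of France? A:",
    max_tokens=32,
    stop=["Q:", "\n"],
)
print(out["choices"][0]["text"])
```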
The example was developed with Python 3.8. Install the dependencies:
pip install -r requirements.txt
The Meteorite application is written in `main.py`. Start it with:
python main.py
Once the application is started, you should see output similar to:
➜ python main.py
[2023-04-16T15:43:36Z INFO actix_server::builder] starting 1 workers
[2023-04-16T15:43:36Z INFO actix_server::server] Actix runtime found; starting in Actix runtime
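With the server running, you can exercise the API from Python. The sketch below is illustrative only: the bind address, route, and JSON payload shape are assumptions, so check `main.py` for the actual endpoint:

```python
import json
import urllib.request

# Hypothetical route and payload -- consult main.py for the real endpoint and schema.
req = urllib.request.Request(
    "http://127.0.0.1:8080/",
    data=json.dumps({"prompt": "Q: What is the capital of France? A:"}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(resp.read().decode("utf-8"))
```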
This example would not be possible without the following projects:
- llama.cpp - Inference of LLaMA model in pure C/C++
- llama-cpp-python - Simple Python bindings for @ggerganov's llama.cpp library.