This example uses Meteorite to create an HTTP API for hosting models. It serves a LLaMA model with llama.cpp through the llama-cpp-python bindings.
The LLaMA model weights can be obtained from here and should be arranged in a `model` directory with the following layout:
.
├── ...
├── model
│   ├── 7B
│   │   ├── ggml-model-q4_0.bin
│   │   ├── ggml-model-f16.bin
│   │   ├── params.json
│   │   ├── consolidated.00.pth
│   │   └── checklist.chk
│   ├── ggml-vocab.bin
│   ├── tokenizer.model
│   └── tokenizer_checklist.chk
└── ...
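Before putting the model behind an HTTP API, you can sanity-check the weights with llama-cpp-python directly. A minimal sketch (the model path follows the layout above; the prompt is arbitrary):

```python
from llama_cpp import Llama

# Load the 4-bit quantized weights from the tree above.
llm = Llama(model_path="./model/7B/ggml-model-q4_0.bin")

# Run a short completion to confirm the model loads and generates.
out = llm(
    "Q: What is the capital of France? A:",
    max_tokens=32,
    stop=["Q:", "\n"],
)
print(out["choices"][0]["text"])
```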
The example was developed with Python 3.8. Install the dependencies:
pip install -r requirements.txt
The Meteorite application is written in `main.py`. Start it with:
python main.py
Once the application is started, you should see output similar to:
➜ python main.py
[2023-04-16T15:43:36Z INFO actix_server::builder] starting 1 workers
[2023-04-16T15:43:36Z INFO actix_server::server] Actix runtime found; starting in Actix runtime
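With the server running, you can exercise the API from Python. The sketch below is illustrative only: the bind address, route, and JSON payload shape are assumptions, so check `main.py` for the actual endpoint:

```python
import json
import urllib.request

# Hypothetical route and payload -- consult main.py for the real endpoint and schema.
req = urllib.request.Request(
    "http://127.0.0.1:8080/",
    data=json.dumps({"prompt": "Q: What is the capital of France? A:"}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(resp.read().decode("utf-8"))
```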
This example would not be possible without the following projects:
- llama.cpp - Inference of LLaMA model in pure C/C++
- llama-cpp-python - Simple Python bindings for @ggerganov's llama.cpp library.