The models used in our paper can be downloaded from the following links (compatible speculative models are listed as children of their respective target models):
-
The above links host files for several different quantization levels. The specific quantizations used can be found in Table 1 of our paper.
When running PipeInfer, pass the large target model as the `-m` parameter and the smaller speculative model as the `-md` parameter. For example, to run Dolphin 70B with TinyLlama as the speculative model, given that each model is downloaded to a `models/` folder:

```
-m models/dolphin-2.2-70b.Q3_K_M.gguf -md models/tinyllama-1.1b-1t-openorca.Q4_K_M.gguf
```
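As a sketch, a complete invocation might look like the following. Note that the binary name `pipeinfer` and the prompt and token-count flags (`-p`, `-n`) are assumptions for illustration, not confirmed details; consult the repository's build output and `--help` for the actual binary name and supported options.

```shell
# Hypothetical full invocation: binary name and extra flags are assumptions.
# -m  : large target model
# -md : smaller speculative (draft) model
./pipeinfer \
    -m models/dolphin-2.2-70b.Q3_K_M.gguf \
    -md models/tinyllama-1.1b-1t-openorca.Q4_K_M.gguf \
    -p "Once upon a time" \
    -n 128
```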