The models used in our paper can be downloaded from the following links (compatible speculative models are listed as children of their respective target models):

  • Dolphin 70B

  • Goliath 120B

  • Falcon 180B

    The above links host files for several different quantization levels. The specific quantizations used can be found in Table 1 of our paper.

    When running PipeInfer, pass the large target model as the -m parameter and the smaller speculative model as the -md parameter. For example, to run Dolphin 70B with TinyLlama as the speculative model, assuming each model has been downloaded to a models/ folder:

    ```
    -m models/dolphin-2.2-70b.Q3_K_M.gguf -md models/tinyllama-1.1b-1t-openorca.Q4_K_M.gguf
    ```
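
    For reference, a full invocation might look like the sketch below. Only the -m and -md parameters come from this document; the binary name and the prompt/length flags are assumptions and may differ in your build of PipeInfer:

    ```sh
    # Hypothetical full command line. The binary name (./pipeinfer) and the
    # -p / -n flags are assumptions; only -m and -md are documented above.
    ./pipeinfer \
      -m models/dolphin-2.2-70b.Q3_K_M.gguf \
      -md models/tinyllama-1.1b-1t-openorca.Q4_K_M.gguf \
      -p "Once upon a time" \
      -n 128
    ```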