Backend Selection
AutoGGUF provides an easy and powerful utility to download various llama.cpp backends. You can find the interface in the top right corner:
By default, it's blank. You can have AutoGGUF fetch the latest releases from the GitHub releases API on launch by setting the AUTOGGUF_BACKEND
environment variable to enabled.
To fetch releases manually (roughly the 40 most recent are available), click the Refresh Releases button.
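If you launch AutoGGUF from source, one way to set the variable is from a small wrapper script. This is only a sketch: the entry-point path is illustrative, and the variable name is taken from the text above.

```python
import os
import subprocess
import sys

# Copy the current environment and enable automatic release fetching
# on launch (AUTOGGUF_BACKEND is the variable described above).
env = dict(os.environ, AUTOGGUF_BACKEND="enabled")

# Launch AutoGGUF from source with the variable set; the entry-point
# path below is illustrative and may differ in your checkout:
# subprocess.run([sys.executable, "src/main.py"], env=env)
```

On Windows you could equivalently run `set AUTOGGUF_BACKEND=enabled` in the shell before launching.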
You should see the fields populated after it grabs the data:
The Select Release area shows the release currently selected for download, similar to what you would see on the GitHub Releases page. The Select Asset dropdown lists the various builds for different systems, such as 64-bit builds for Windows, macOS, and Linux. There are also cudart builds: these contain the DLLs llama.cpp needs for CUDA acceleration.
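To illustrate how the asset dropdown separates regular builds from cudart bundles, here is a rough sketch of filtering asset names by platform. The function and the asset names are illustrative (modeled on llama.cpp's release naming), not AutoGGUF's actual code.

```python
def pick_assets(asset_names, platform, want_cudart=False):
    """Return asset names matching a platform tag, optionally only cudart bundles."""
    matches = []
    for name in asset_names:
        is_cudart = name.startswith("cudart-")
        if is_cudart != want_cudart:
            continue
        if platform in name:
            matches.append(name)
    return matches

# Example asset names in the style of llama.cpp releases (illustrative):
assets = [
    "llama-b3600-bin-win-avx2-x64.zip",
    "llama-b3600-bin-win-cuda-cu12.2.0-x64.zip",
    "llama-b3600-bin-macos-arm64.zip",
    "cudart-llama-bin-win-cu12.2.0-x64.zip",
]

print(pick_assets(assets, "win"))        # regular Windows builds
print(pick_assets(assets, "win", True))  # cudart DLL bundles
```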
The process for downloading a release is as follows (non-CUDA):
- Refresh the releases
- Select the appropriate release
- Click Download (the progress bar will display the download progress)
- Any errors will be written to the log file, or to the console if you're running from source
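The Download step above can be sketched as a plain HTTP download with a progress callback. This is an assumption about how such a step could be implemented, not AutoGGUF's actual code; `urllib` is used here only for illustration.

```python
import urllib.request

def download_with_progress(url, dest):
    """Download url to dest, printing coarse percentage progress
    (standing in for the GUI progress bar described above)."""
    def hook(blocks, block_size, total_size):
        if total_size > 0:
            done = min(blocks * block_size, total_size)
            print(f"\r{done * 100 // total_size}%", end="")
    urllib.request.urlretrieve(url, dest, reporthook=hook)
    print()
```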
The process for downloading a CUDA enabled release:
- Refresh the releases
- Select an appropriate CUDA release (NOT the ones that say cudart)
- Download it
- Select the cudart asset matching the CUDA version of the release you downloaded and your system's CUDA installation
- Select Extract CUDA Files
- Select the CUDA release you downloaded earlier
- Click Download
- A progress bar will show the download, and when it finishes the cudart files will be extracted automatically into your CUDA-enabled release
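The final extraction step above, placing the cudart DLLs alongside the CUDA-enabled llama.cpp binaries, can be sketched as a simple zip extraction. The function name and layout are illustrative, not AutoGGUF's actual implementation.

```python
import zipfile
from pathlib import Path

def extract_cudart(cudart_zip, backend_dir):
    """Extract a cudart DLL bundle into the backend directory, mirroring
    the 'Extract CUDA Files' step described above (illustrative sketch)."""
    backend_dir = Path(backend_dir)
    backend_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(cudart_zip) as zf:
        zf.extractall(backend_dir)
    # Return the resulting contents so the caller can verify the DLLs landed
    return sorted(p.name for p in backend_dir.iterdir())
```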
After downloading the releases, you can use the Llama.cpp Backends dropdown to select one to be used for quantization:
If your newly downloaded backend does not appear, simply hit Refresh Backends and select it from the dropdown.
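What Refresh Backends does can be pictured as scanning the downloads directory for folders that contain a llama.cpp quantize binary. This is a sketch under assumptions: the binary names and directory layout below are illustrative, not necessarily what AutoGGUF checks for.

```python
from pathlib import Path

def find_backends(backends_root):
    """List subdirectories containing a llama.cpp quantize binary,
    roughly what a backend-refresh would repopulate the dropdown with."""
    found = []
    root = Path(backends_root)
    # Binary names vary by llama.cpp version and OS (illustrative list):
    candidates = ("llama-quantize", "llama-quantize.exe", "quantize", "quantize.exe")
    for sub in sorted(p for p in root.iterdir() if p.is_dir()):
        if any((sub / exe).exists() for exe in candidates):
            found.append(sub.name)
    return found
```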