Backend Selection
AutoGGUF provides an easy and powerful utility to download various llama.cpp backends. You can find the interface in the top right corner:
By default, it's blank. You can have AutoGGUF fetch the latest releases from the GitHub releases API on launch by setting the AUTOGGUF_BACKEND
environment variable to enabled.
To fetch releases manually (roughly the 40 most recent are available), click the Refresh Releases button.
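If you launch AutoGGUF from source, one way to set the variable is from a small wrapper script. This is only a sketch: the entry-point path is illustrative, and the variable name is taken from the text above.

```python
import os
import subprocess
import sys

# Copy the current environment and enable automatic release fetching
# on launch (AUTOGGUF_BACKEND is the variable described above).
env = dict(os.environ, AUTOGGUF_BACKEND="enabled")

# Launch AutoGGUF from source with the variable set; the entry-point
# path below is illustrative and may differ in your checkout:
# subprocess.run([sys.executable, "src/main.py"], env=env)
```

On Windows you could equivalently run `set AUTOGGUF_BACKEND=enabled` in the shell before launching.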
You should see the fields populated after it grabs the data:
The Select Release area shows the release currently selected for download, similar to what you would see on the GitHub Releases page. The Select Asset dropdown lists the various builds for different systems, such as 64-bit builds for Windows, macOS, and Linux. There are also cudart builds: these contain the DLLs llama.cpp needs for CUDA acceleration.
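To illustrate how the asset dropdown separates regular builds from cudart bundles, here is a rough sketch of filtering asset names by platform. The function and the asset names are illustrative (modeled on llama.cpp's release naming), not AutoGGUF's actual code.

```python
def pick_assets(asset_names, platform, want_cudart=False):
    """Return asset names matching a platform tag, optionally only cudart bundles."""
    matches = []
    for name in asset_names:
        is_cudart = name.startswith("cudart-")
        if is_cudart != want_cudart:
            continue
        if platform in name:
            matches.append(name)
    return matches

# Example asset names in the style of llama.cpp releases (illustrative):
assets = [
    "llama-b3600-bin-win-avx2-x64.zip",
    "llama-b3600-bin-win-cuda-cu12.2.0-x64.zip",
    "llama-b3600-bin-macos-arm64.zip",
    "cudart-llama-bin-win-cu12.2.0-x64.zip",
]

print(pick_assets(assets, "win"))        # regular Windows builds
print(pick_assets(assets, "win", True))  # cudart DLL bundles
```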
The process for downloading a release is as follows (non-CUDA):
- Refresh the releases
- Select the appropriate release
- Click Download (the progress bar will display the download progress)
- Any errors will be written to the log file, or to the console if you're running from source
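The Download step above can be sketched as a plain HTTP download with a progress callback. This is an assumption about how such a step could be implemented, not AutoGGUF's actual code; `urllib` is used here only for illustration.

```python
import urllib.request

def download_with_progress(url, dest):
    """Download url to dest, printing coarse percentage progress
    (standing in for the GUI progress bar described above)."""
    def hook(blocks, block_size, total_size):
        if total_size > 0:
            done = min(blocks * block_size, total_size)
            print(f"\r{done * 100 // total_size}%", end="")
    urllib.request.urlretrieve(url, dest, reporthook=hook)
    print()
```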
The process for downloading a CUDA enabled release:
- Refresh the releases
- Select an appropriate CUDA release (NOT the ones that say cudart)
- Download it
- Select the cudart asset matching the CUDA version of the release you downloaded and your system's CUDA installation
- Select Extract CUDA Files
- Select the CUDA release you downloaded earlier
- Click Download
- A progress bar will show the download, and when it finishes the cudart files will be extracted automatically into your CUDA-enabled release
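The final extraction step above, placing the cudart DLLs alongside the CUDA-enabled llama.cpp binaries, can be sketched as a simple zip extraction. The function name and layout are illustrative, not AutoGGUF's actual implementation.

```python
import zipfile
from pathlib import Path

def extract_cudart(cudart_zip, backend_dir):
    """Extract a cudart DLL bundle into the backend directory, mirroring
    the 'Extract CUDA Files' step described above (illustrative sketch)."""
    backend_dir = Path(backend_dir)
    backend_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(cudart_zip) as zf:
        zf.extractall(backend_dir)
    # Return the resulting contents so the caller can verify the DLLs landed
    return sorted(p.name for p in backend_dir.iterdir())
```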
After downloading the releases, you can use the Llama.cpp Backends dropdown to select one to be used for quantization:
If your newly downloaded backend does not appear, simply hit Refresh Backends and select it from the dropdown.
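What Refresh Backends does can be pictured as scanning the downloads directory for folders that contain a llama.cpp quantize binary. This is a sketch under assumptions: the binary names and directory layout below are illustrative, not necessarily what AutoGGUF checks for.

```python
from pathlib import Path

def find_backends(backends_root):
    """List subdirectories containing a llama.cpp quantize binary,
    roughly what a backend-refresh would repopulate the dropdown with."""
    found = []
    root = Path(backends_root)
    # Binary names vary by llama.cpp version and OS (illustrative list):
    candidates = ("llama-quantize", "llama-quantize.exe", "quantize", "quantize.exe")
    for sub in sorted(p for p in root.iterdir() if p.is_dir()):
        if any((sub / exe).exists() for exe in candidates):
            found.append(sub.name)
    return found
```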