QCompanion is a fork of Luxadevi/Ollama-Companion modified to remove the Ollama components and support a more versatile quantization and finetuning workflow. It's built using Streamlit to provide a web interface to a robust scheduling system that queues jobs in llama.cpp and LLaMA-Factory for model quantization and finetuning, respectively, with Lilac integration coming soon. Fork it yourself or run it vanilla - your choice.
QCompanion is designed around its scheduling system: schedule jobs now and leave them to run later. All jobs share the same queue, from conversion to quantization to finetuning (downloading is still a work in progress), so you can leave your PC on overnight or while you're away without worrying about jobs colliding. You can also edit the queue directly (e.g. to add custom commands not supported by the GUI) by editing `util/queue.txt`.
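The queue file format isn't documented here; as a sketch, assuming it stores one shell command per line, a hand-added entry might look like:

```
# util/queue.txt (hypothetical contents -- the one-command-per-line format is an assumption)
python llama.cpp/convert-hf-to-gguf.py llama.cpp/models/MyModel --outtype f16
```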
Clone the repository and run the install script:

```bash
git clone https://github.com/christianazinn/QCompanion.git
cd QCompanion
sudo chmod +x install.sh
./install.sh
```
To start QCompanion locally, run

```bash
streamlit run Homepage.py
```

or

```bash
./start.sh -local
```

to serve QCompanion on `localhost:8501`. To serve QCompanion publicly over a Cloudflare tunnel, run

```bash
./start.sh
```

without the `-local` flag.
Note: Windows support is currently unavailable. Both llama.cpp and LLaMA-Factory are compatible with Windows, but this project is a work in progress, and the focus is on adding more functionality before porting to Windows. In the meantime, you can run QCompanion on a Windows machine via the Windows Subsystem for Linux (WSL).
You can develop your own Streamlit components and integrate them into QCompanion in the `pages` subdirectory. Add them to the page list in `Homepage.py` to be able to view them from the app itself. Be aware of the `utils` subdirectory and make good use of what's already built there.
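As a minimal sketch of what a custom page can look like (the filename and the queue-append behavior are illustrative assumptions, not taken from the codebase):

```python
# pages/My_Custom_Page.py -- hypothetical custom page; the filename and
# queue format are assumptions, not taken from the QCompanion codebase.
import streamlit as st

st.title("My Custom Page")

# Collect an arbitrary command from the user and hand it to the scheduler,
# assuming jobs are appended to util/queue.txt one command per line.
command = st.text_input("Shell command to queue")
if st.button("Queue job") and command:
    with open("util/queue.txt", "a") as queue_file:
        queue_file.write(command + "\n")
    st.success("Job queued!")
```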
To download model files from HuggingFace, follow these steps (a library-level sketch follows the list):

- Visit the Model Page: Go to the Hugging Face model page you wish to download, for example MistralAI/Mistral-7B-Instruct-v0.2.
- Copy Model Path: On the model page, locate the copy icon next to the username of the model's author and click it to copy the model path, e.g. `mistralai/Mistral-7B-Instruct-v0.2`.
- Paste in the Input Field: Paste the copied model path directly into the designated input field in the app.
- Get File List: Click the "Get file list" button to retrieve a list of available files in this repository.
- Review File List: Ensure the list contains the correct model files you wish to download. These will usually be `safetensors` and related files.
- Download Model: Click the "Download Model" button to queue a download job for the selected files.
- File Storage: The model files will be saved in the `llama.cpp/models` directory on your device.
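A queued download job amounts to fetching files from the Hugging Face Hub; a rough equivalent using the `huggingface_hub` library (an illustration, not QCompanion's actual code) is:

```python
# Rough equivalent of a queued download job; the paths and file patterns
# are illustrative, not taken from the QCompanion codebase.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="mistralai/Mistral-7B-Instruct-v0.2",
    local_dir="llama.cpp/models/Mistral-7B-Instruct-v0.2",
    allow_patterns=["*.safetensors", "*.json", "*.model"],  # weights, config, tokenizer
)
```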
A `safetensors` model in HuggingFace format must be converted and quantized to `gguf` format for use in llama.cpp and related applications. To do this, follow these steps (a command-line sketch follows the list):
- Select a Model Folder: Choose a folder within `llama.cpp/models` that contains the model you wish to convert.
- Set Conversion Options: Select your desired conversion options from the provided checkboxes: FP32, FP16, or Q8_0, in decreasing order of quality and size.
- Execute Conversion: Click the "Run Commands" button to queue a conversion job.
- Output Location: Converted models will be saved in the `High-Precision-Quantization` subfolder within the selected model folder.
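Each conversion job corresponds roughly to llama.cpp's HF-to-GGUF conversion script; the script name and flags vary across llama.cpp versions, so treat this as a sketch:

```bash
# Rough equivalent of a queued FP16 conversion job; the script name and
# output path are assumptions based on llama.cpp conventions.
python llama.cpp/convert-hf-to-gguf.py llama.cpp/models/Mistral-7B-Instruct-v0.2 \
    --outtype f16 \
    --outfile llama.cpp/models/Mistral-7B-Instruct-v0.2/High-Precision-Quantization/mistral-7b-instruct-v0.2-f16.gguf
```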
To further quantize a converted GGUF model to medium precision:

- Select GGUF File: Choose the GGUF file you wish to quantize from the dropdown list.
- Quantization Options: Check the boxes next to the quantization options you want to apply. Q, K, and I-quants are supported.
- Run Quantization: Click the "Run Selected Commands" button to queue the quantization jobs.
- Save Location: The quantized models will be saved in the model's `Medium-Precision-Quantization` subfolder.
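Each quantization job corresponds roughly to llama.cpp's `quantize` tool (renamed `llama-quantize` in newer builds); paths here are illustrative:

```bash
# Rough equivalent of a queued Q4_K_M quantization job; the binary name
# and paths are assumptions based on llama.cpp conventions.
llama.cpp/quantize \
    llama.cpp/models/Mistral-7B-Instruct-v0.2/High-Precision-Quantization/mistral-7b-instruct-v0.2-f16.gguf \
    llama.cpp/models/Mistral-7B-Instruct-v0.2/Medium-Precision-Quantization/mistral-7b-instruct-v0.2-Q4_K_M.gguf \
    Q4_K_M
```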
Use this section to securely upload your converted models to Hugging Face.
- Select a Model: Choose a model from the dropdown list. These models are located in the `llama.cpp/models` directory.
- Enter Repository Name: Specify a name for the new Hugging Face repository where your model will be uploaded.
- Choose Files for Upload: Select the files you wish to upload from the subfolders of the chosen model.
- Add README Content: Optionally, write content for the README.md file of your new repository.
- Provide a Token: For enhanced security, encrypt your Hugging Face token on the Token Encrypt page and enter it in the "Enter Encrypted Token" field. Alternatively, enter an unencrypted Hugging Face token directly.
- Upload Files: Click the "Upload Selected Files" button to initiate the upload to Hugging Face.

After completing these steps, your uploaded models will be accessible at https://huggingface.co/your-username/your-repo-name.
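At its core, the upload step creates a Hub repository and pushes files to it; a rough `huggingface_hub` equivalent (an illustration, not QCompanion's actual code, and omitting token decryption) is:

```python
# Sketch of the upload flow; the repo name, file path, and token handling
# are illustrative, not taken from the QCompanion codebase.
from huggingface_hub import HfApi

api = HfApi(token="hf_your_token_here")  # your (decrypted) Hugging Face token
repo_id = "your-username/your-repo-name"
api.create_repo(repo_id=repo_id, exist_ok=True)
api.upload_file(
    path_or_fileobj="llama.cpp/models/Mistral-7B-Instruct-v0.2/Medium-Precision-Quantization/mistral-7b-instruct-v0.2-Q4_K_M.gguf",
    path_in_repo="mistral-7b-instruct-v0.2-Q4_K_M.gguf",
    repo_id=repo_id,
)
```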
Coming soon, we have:
- Importance matrix (imatrix) support for quantization
- Options to set a custom output directory and more command-line arguments (e.g. `-c`, `-b`)
- LLaMA-Factory integration for finetuning
- Lilac integration for dataset management
- GPU offload for select tasks
- The ability to queue jobs for files that don't yet exist
Check the docs for more information!
Licensed under the Apache License.