
epic: Model Converter Pipeline #22

Closed · 2 tasks · Tracked by #3614
dan-homebrew opened this issue Sep 8, 2024 · 8 comments · Fixed by janhq/cortex.llamacpp#231

dan-homebrew commented Sep 8, 2024

Goal

  • Built-in model library has curated model.yaml with best parameters
  • Aim for a best-in-class user experience
  • Include system prompts that provide good experience? (may be anti-pattern)

User Story

  • We have a Model Converter that can take in a Huggingface Model repo
  • Compile to GGUF and push to a Cortex Model Repo (i.e., tag-based)
  • Future: ONNX, TensorRT-LLM (using TRTLLM-Cloud)
  • Should clearly show any errors
  • Should auto-populate README

Decisions

Tasklist

Model Compilation Pipeline

  • Update model compilation infra
  • Is there a way for us to "queue" up models?

Future Roadmap

  • Model Recommendations: can we consider recommending bigger models (e.g. q8) if hardware is strong?
dan-homebrew changed the title from "epic: Jan has built-in Models" to "epic: Jan and Cortex's Built-in Model Library" Sep 8, 2024
dan-homebrew changed the title to "epic: Jan and Cortex's Built-in Model Library + Pipeline" Sep 8, 2024
dan-homebrew added the type: epic (A major feature or initiative) label Sep 8, 2024
dan-homebrew changed the title to "epic: Jan and Cortex's Built-in Model Library + Model Compilation Pipeline" Sep 8, 2024
dan-homebrew changed the title to "epic: Jan and Cortex's Built-in Model Library has latest models + pipeline" Sep 10, 2024
nguyenhoangthuan99 (Contributor) commented Sep 13, 2024

Objectives

  1. Implement model quantization CI
  2. Update model.yaml for three models
  3. Organize branch structure per the discussion in epic: Implement new Model Folder and model.yaml (cortex.cpp#1154)

Quantization Strategy

  • Each quantization will be tagged in the Hugging Face repo, e.g., 8b-gguf-q4-km
  • This approach will:
    • Facilitate easier management of models from cortex.cpp
    • Simplify model downloading and execution commands

Example Command

Example commands to pull and run a model by tag:

cortex pull llama3.1:8b-gguf-q4-km
cortex run llama3.1:8b-gguf-q4-km

This concise command provides sufficient information for users.

Tasks

  1. Develop CI runner for building all quantizations for each model:

    • Download from original source
    • Convert to GGUF format
    • Perform quantization
    • Update Hugging Face repository
  2. Create script to update model.yaml for models:

    • Update default parameters
    • Update system prompts

This approach will streamline model management and improve user experience when working with cortex.cpp.

nguyenhoangthuan99 (Contributor) commented Sep 17, 2024

CI Pipelines for Model Conversion and Quantization

This PR introduces two CI pipelines to streamline the model processing workflow:

1. CI Convert and Quantization Pipeline

This pipeline automates the process of converting and quantizing models.

Inputs:

  • Source Hugging Face model repository (e.g., meta-llama/Meta-Llama-3.1-8B-Instruct)
  • Source model size (e.g., 8b)
  • Target model ID: The repo_id in cortexso/janhq where the processed model will be pushed (e.g., llama3.1)
  • Quantization level: Either a specific level (e.g., 'q4-km') or 'all' for all supported levels
    Supported levels: q2-k, q3-ks, q3-km, q3-kl, q4-ks, q4-km, q5-ks, q5-km, q6-k, q8-0

Process:

  1. Download the source model repository if not already present
  2. Convert the source model to GGUF format
  3. Quantize the GGUF model to the specified level(s)
  4. Upload the quantized model to the target repository under the appropriate branch
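
A minimal sketch of these four steps, assuming llama.cpp's standard tooling (convert_hf_to_gguf.py and the llama-quantize binary) plus huggingface-cli; the actual CI scripts are not shown in this thread, so the paths, file names, and arguments below are illustrative only:

# Illustrative sketch only; the real CI scripts may differ.
# 1. Download the source model repository
huggingface-cli download meta-llama/Meta-Llama-3.1-8B-Instruct --local-dir ./src-model
# 2. Convert to GGUF (convert_hf_to_gguf.py ships with llama.cpp)
python llama.cpp/convert_hf_to_gguf.py ./src-model --outfile model-f16.gguf
# 3. Quantize to the requested level (q4-km corresponds to llama.cpp's Q4_K_M)
llama.cpp/llama-quantize model-f16.gguf model-q4-km.gguf Q4_K_M
# 4. Upload to the target repo under the matching branch (e.g., 8b-gguf-q4-km)
huggingface-cli upload cortexso/llama3.1 model-q4-km.gguf model.gguf --revision 8b-gguf-q4-km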

Result:

After successful processing, new tags will be added to the model repository. For example, see the llama3 repository:

Image showing model tags

2. CI Update model.yml Pipeline

This pipeline updates the model.yml file with new information.

Inputs:

  • Key-value pairs to update, separated by spaces (e.g., "max_tokens=4096 top_p=0.9 top_k=0.5")
  • Source model size (e.g., 8b)
  • Target model ID: The repo_id in cortexso/janhq where the updated model.yml will be pushed (e.g., llama3.1)
  • Quantization level: Either a specific level (e.g., 'q4-km') or 'all' for all supported levels
    Supported levels: q2-k, q3-ks, q3-km, q3-kl, q4-ks, q4-km, q5-ks, q5-km, q6-k, q8-0

Process:

  1. Set up the necessary environment
  2. Execute a script to update the model.yml file with the new information
  3. Upload the updated model.yml file to Hugging Face
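
A hedged sketch of what such an update script could do, assuming model.yml lives on each quantization branch and huggingface-cli is available; the key rewriting below is deliberately simplified (a real script would use a YAML parser), and repo names and keys are illustrative:

# Illustrative sketch only.
# Fetch model.yml from one quantization branch
huggingface-cli download cortexso/llama3.1 model.yml --revision 8b-gguf-q4-km --local-dir .
# Apply the "key=value" pairs by rewriting matching top-level keys
sed -i 's/^top_p:.*/top_p: 0.9/; s/^max_tokens:.*/max_tokens: 4096/' model.yml
# Push the updated file back to the same branch
huggingface-cli upload cortexso/llama3.1 model.yml model.yml --revision 8b-gguf-q4-km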

These pipelines automate crucial steps in model processing and metadata management, streamlining the workflow for model updates and deployments.

freelerobot (Contributor) commented:

@nguyenhoangthuan99 how do we use this pipeline, i.e., how are we adding new models?

freelerobot reopened this Sep 24, 2024
nguyenhoangthuan99 (Contributor) commented Sep 25, 2024

The cortexso model repo must be created before running this pipeline (e.g., llama3 must exist before running the example below; the HF login token used in CI doesn't have permission to create repos).

Supported quantization levels: q2-k, q3-ks, q3-km, q3-kl, q4-ks, q4-km, q5-ks, q5-km, q6-k, q8-0

To use this pipeline:

  • Go to https://github.com/janhq/cortex.llamacpp/actions

  • Select the Convert model to gguf with specified quant workflow in the Actions tab

  • Click Run workflow and input all parameters
    Note that the Target HuggingFace model ID to push is the cortexso model repo; in my example it is llama3

  • After clicking Run workflow, go to the Actions tab, where we can see the workflow running

  • When the CI is finished, go to the cortexso repo https://huggingface.co/cortexso/llama3 to check that the model is updated
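
As an alternative to clicking through the web UI, the workflow can also be dispatched with the GitHub CLI. A sketch under assumptions: the workflow file name (convert-model.yml) and input names below are hypothetical and should be checked against the repo's .github/workflows directory before use:

# Hypothetical workflow file and input names; verify against the repo.
gh workflow run convert-model.yml --repo janhq/cortex.llamacpp \
  -f source_model_id=meta-llama/Meta-Llama-3.1-8B-Instruct \
  -f source_model_size=8b \
  -f target_model_id=llama3 \
  -f quantization_level=q4-km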

dan-homebrew changed the title to "epic: Jan and Cortex's Built-in Model Library has latest model" Sep 26, 2024
dan-homebrew changed the title to "epic: Model Converter Pipeline" Sep 26, 2024
dan-homebrew (Author) commented Sep 26, 2024

@nguyenhoangthuan99 I am refactoring the "Built-in Model Library" to a separate epic: #21

  • We will need to do a lot of housekeeping
  • Let's focus this epic on the Model Converter Pipeline.

hiento09 (Contributor) commented Sep 27, 2024

Infra:

  • Grant the GitHub Actions runner group permission for the repo janhq/models
  • Grant GitHub Actions secret permission for the repo janhq/models

nguyenhoangthuan99 (Contributor) commented:
I added the updated model converter pipeline to the janhq/models repo, and also added a pipeline to automatically update the model.yml file on Hugging Face (cc @gabrielle-ong). We can now run the CI pipelines in this repo.

Guide for updating the model.yml file

  1. Select the Update model.yml with specific quant workflow
  2. Click Run workflow

Please pass the updates in this format:
"top_p=0.9" "top_k=40" "stop=['<|end_of_text|>', '<|eot_id|>']"

Note that the prompt_template field should not be updated this way, because special characters in that string are sometimes not handled properly.
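
For example, dispatching this workflow from the GitHub CLI would pass the whole quoted options string as a single input. The workflow file name (update-model-yml.yml) and input names here are assumptions for illustration only:

# Hypothetical workflow/input names; the options string is one quoted input.
gh workflow run update-model-yml.yml --repo janhq/models \
  -f model_id=llama3.1 \
  -f options="\"top_p=0.9\" \"top_k=40\""

Note that prompt_template is excluded, per the caveat above.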

dan-homebrew transferred this issue from janhq/cortex.cpp Sep 29, 2024
gabrielle-ong commented:
Marking as complete; successfully done for mistral-nemo and llama3.2.
The model converter pipeline is run from the janhq/models repo.

github-project-automation bot moved this from Completed to Review + QA in Jan & Cortex Oct 2, 2024