
epic: Model Converter Pipeline #22

Closed · 2 tasks · Tracked by #3614
dan-homebrew opened this issue Sep 8, 2024 · 8 comments · Fixed by janhq/cortex.llamacpp#231

dan-homebrew commented Sep 8, 2024

Goal

  • Built-in model library has curated model.yaml with best parameters
  • Aim for a best-in-class user experience
  • Include system prompts that provide good experience? (may be anti-pattern)

User Story

  • We have a Model Converter that can take in a Huggingface Model repo
  • Compile to GGUF and push to a Cortex Model Repo (i.e., tag-based)
  • Future: ONNX, TensorRT-LLM (using TRTLLM-Cloud)
  • Should clearly show any errors
  • Should auto-populate README

Decisions

Tasklist

Model Compilation Pipeline

  • Update model compilation infra
  • Is there a way for us to "queue" up models?

Future Roadmap

  • Model Recommendations: can we consider recommending bigger models (e.g. q8) if hardware is strong?
dan-homebrew changed the title from "epic: Jan has built-in Models" to "epic: Jan and Cortex's Built-in Model Library" Sep 8, 2024
dan-homebrew changed the title to "epic: Jan and Cortex's Built-in Model Library + Pipeline" Sep 8, 2024
dan-homebrew added the type: epic (A major feature or initiative) label Sep 8, 2024
dan-homebrew changed the title to "epic: Jan and Cortex's Built-in Model Library + Model Compilation Pipeline" Sep 8, 2024
dan-homebrew changed the title to "epic: Jan and Cortex's Built-in Model Library has latest models + pipeline" Sep 10, 2024
nguyenhoangthuan99 (Contributor) commented Sep 13, 2024

Objectives

  1. Implement model quantization CI
  2. Update model.yaml for three models
  3. Organize branch structure per the discussion in epic: Implement new Model Folder and model.yaml (cortex.cpp#1154)

Quantization Strategy

  • Each quantization will be tagged in the Hugging Face repo, e.g., 8b-gguf-q4-km
  • This approach will:
    • Facilitate easier management of models from cortex.cpp
    • Simplify model downloading and execution commands

Example Command

Example commands to pull and run a model by tag:

cortex pull llama3.1:8b-gguf-q4-km
cortex run llama3.1:8b-gguf-q4-km

This concise command provides sufficient information for users.

Tasks

  1. Develop CI runner for building all quantizations for each model:

    • Download from original source
    • Convert to GGUF format
    • Perform quantization
    • Update Hugging Face repository
  2. Create script to update model.yaml for models:

    • Update default parameters
    • Update system prompts

This approach will streamline model management and improve user experience when working with cortex.cpp.

nguyenhoangthuan99 (Contributor) commented Sep 17, 2024

CI Pipelines for Model Conversion and Quantization

This PR introduces two CI pipelines to streamline the model processing workflow:

1. CI Convert and Quantization Pipeline

This pipeline automates the process of converting and quantizing models.

Inputs:

  • Source Hugging Face model repository (e.g., meta-llama/Meta-Llama-3.1-8B-Instruct)
  • Source model size (e.g., 8b)
  • Target model ID: The repo_id in cortexso/janhq where the processed model will be pushed (e.g., llama3.1)
  • Quantization level: Either a specific level (e.g., 'q4-km') or 'all' for all supported levels
    Supported levels: q2-k, q3-ks, q3-km, q3-kl, q4-ks, q4-km, q5-ks, q5-km, q6-k, q8-0

Process:

  1. Download the source model repository if not already present
  2. Convert the source model to GGUF format
  3. Quantize the GGUF model to the specified level(s)
  4. Upload the quantized model to the target repository under the appropriate branch
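
A minimal sketch of these four steps, assuming llama.cpp's standard tooling (convert_hf_to_gguf.py and the llama-quantize binary) plus huggingface-cli; the actual CI scripts are not shown in this thread, so the paths, file names, and arguments below are illustrative only:

# Illustrative sketch only; the real CI scripts may differ.
# 1. Download the source model repository
huggingface-cli download meta-llama/Meta-Llama-3.1-8B-Instruct --local-dir ./src-model
# 2. Convert to GGUF (convert_hf_to_gguf.py ships with llama.cpp)
python llama.cpp/convert_hf_to_gguf.py ./src-model --outfile model-f16.gguf
# 3. Quantize to the requested level (q4-km corresponds to llama.cpp's Q4_K_M)
llama.cpp/llama-quantize model-f16.gguf model-q4-km.gguf Q4_K_M
# 4. Upload to the target repo under the matching branch (e.g., 8b-gguf-q4-km)
huggingface-cli upload cortexso/llama3.1 model-q4-km.gguf model.gguf --revision 8b-gguf-q4-km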

Result:

After successful processing, new tags will be added to the model repository. For example, see the llama3 repository:

Image showing model tags

2. CI Update model.yml Pipeline

This pipeline updates the model.yml file with new information.

Inputs:

  • Key-value pairs to update, separated by spaces (e.g., "max_tokens=4096 top_p=0.9 top_k=0.5")
  • Source model size (e.g., 8b)
  • Target model ID: The repo_id in cortexso/janhq where the updated model.yml will be pushed (e.g., llama3.1)
  • Quantization level: Either a specific level (e.g., 'q4-km') or 'all' for all supported levels
    Supported levels: q2-k, q3-ks, q3-km, q3-kl, q4-ks, q4-km, q5-ks, q5-km, q6-k, q8-0

Process:

  1. Set up the necessary environment
  2. Execute a script to update the model.yml file with the new information
  3. Upload the updated model.yml file to Hugging Face
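
A hedged sketch of what such an update script could do, assuming model.yml lives on each quantization branch and huggingface-cli is available; the key rewriting below is deliberately simplified (a real script would use a YAML parser), and repo names and keys are illustrative:

# Illustrative sketch only.
# Fetch model.yml from one quantization branch
huggingface-cli download cortexso/llama3.1 model.yml --revision 8b-gguf-q4-km --local-dir .
# Apply the "key=value" pairs by rewriting matching top-level keys
sed -i 's/^top_p:.*/top_p: 0.9/; s/^max_tokens:.*/max_tokens: 4096/' model.yml
# Push the updated file back to the same branch
huggingface-cli upload cortexso/llama3.1 model.yml model.yml --revision 8b-gguf-q4-km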

These pipelines automate crucial steps in model processing and metadata management, streamlining the workflow for model updates and deployments.

freelerobot (Contributor) commented:

@nguyenhoangthuan99 how do we use this pipeline, i.e., how are we adding new models?

freelerobot reopened this Sep 24, 2024
nguyenhoangthuan99 (Contributor) commented Sep 25, 2024

The cortexso model repo must be created before running this pipeline (e.g., llama3 must exist before running the example below; the HF login token used in CI doesn't have permission to create repos).

Supported quantization levels: q2-k, q3-ks, q3-km, q3-kl, q4-ks, q4-km, q5-ks, q5-km, q6-k, q8-0

To use this pipeline:

  • Go to https://github.com/janhq/cortex.llamacpp/actions

  • Select the Convert model to gguf with specified quant workflow in the Actions tab

  • Click Run workflow and input all parameters
    Note that the Target HuggingFace model ID to push is the cortexso model repo; in my example it is llama3

  • After clicking Run workflow, go to the Actions tab, where we can see the workflow running

  • When the CI is finished, go to the cortexso repo https://huggingface.co/cortexso/llama3 to check that the model is updated
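
As an alternative to clicking through the web UI, the workflow can also be dispatched with the GitHub CLI. A sketch under assumptions: the workflow file name (convert-model.yml) and input names below are hypothetical and should be checked against the repo's .github/workflows directory before use:

# Hypothetical workflow file and input names; verify against the repo.
gh workflow run convert-model.yml --repo janhq/cortex.llamacpp \
  -f source_model_id=meta-llama/Meta-Llama-3.1-8B-Instruct \
  -f source_model_size=8b \
  -f target_model_id=llama3 \
  -f quantization_level=q4-km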

dan-homebrew changed the title to "epic: Jan and Cortex's Built-in Model Library has latest model" Sep 26, 2024
dan-homebrew changed the title to "epic: Model Converter Pipeline" Sep 26, 2024
dan-homebrew (Author) commented Sep 26, 2024

@nguyenhoangthuan99 I am refactoring the "Built-in Model Library" to a separate epic: #21

  • We will need to do a lot of housekeeping
  • Let's focus this epic on the Model Converter Pipeline.

hiento09 (Contributor) commented Sep 27, 2024

Infra:

  • Grant the GitHub Actions runner group permission for the repo janhq/models
  • Grant GitHub Actions secret permission for the repo janhq/models

nguyenhoangthuan99 (Contributor) commented:
I added the updated model converter pipeline to the janhq/models repo, and also added a pipeline to automatically update the model.yml file on Hugging Face (cc @gabrielle-ong). We can now run the CI pipelines in this repo.

Guide for updating the model.yml file

  1. Select the Update model.yml with specific quant workflow
  2. Click Run workflow

Please pass the updates in this format:
"top_p=0.9" "top_k=40" "stop=['<|end_of_text|>', '<|eot_id|>']"

Note that the prompt_template field should not be updated this way, because special characters in that string are sometimes not handled properly.
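
For example, dispatching this workflow from the GitHub CLI would pass the whole quoted options string as a single input. The workflow file name (update-model-yml.yml) and input names here are assumptions for illustration only:

# Hypothetical workflow/input names; the options string is one quoted input.
gh workflow run update-model-yml.yml --repo janhq/models \
  -f model_id=llama3.1 \
  -f options="\"top_p=0.9\" \"top_k=40\""

Note that prompt_template is excluded, per the caveat above.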

dan-homebrew transferred this issue from janhq/cortex.cpp Sep 29, 2024
gabrielle-ong commented:
Marking as complete; successfully done for mistral-nemo and llama3.2.
The model converter pipeline is run from the janhq/models repo.

github-project-automation bot moved this from Completed to Review + QA in Jan & Cortex Oct 2, 2024