Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bug: Fix, update & improve models in Jan Hub #46

Open
5 tasks
imtuyethan opened this issue Oct 21, 2024 · 23 comments
Open
5 tasks

bug: Fix, update & improve models in Jan Hub #46

imtuyethan opened this issue Oct 21, 2024 · 23 comments
Assignees
Labels
P1: important Important feature / fix type: bug Something isn't working type: enhancement New feature or request

Comments

@imtuyethan
Copy link

imtuyethan commented Oct 21, 2024

Problem

I have encountered many issues with the wrong model default settings (incorrect prompt template, the stop words missing, etc.).
e.g., comments in Jan 0.5.7 Release Sign Off janhq/jan#3818


Model Testing Results

I have tested 45 models from Jan Hub, here are the results.

Next step

  • Update correct default settings for failed models
  • Better description for all models
  • Consider removing legacy models
  • Update Hub with new trending models?

cc @hahuyhoang411

No. Model Name Issue Identified Status
1 Llama 3.2 1B Instruct Q8
2 Llama 3.2 3B Instruct Q8
3 Qwen2.5 7B Instruct Q4
4 Qwen2.5 Coder 7B Instruct Q4
5 Llama 3.1 8B Instruct Q4
6 Qwen2.5 14B Instruct Q4
7 Codestral 22B Q4 Error in response format, wrong prompt template?
8 TinyLlama Chat 1.1B Q4 Garbled response, error in response format
9 LlamaCorn 1.1B Q8
10 Deepseek Coder 1.3B Instruct Q8
11 Gemma 1.1 2B Q4 Error in response format, wrong prompt template?
12 Gemma 2 2B Q4
13 Phi-3 Mini Instruct Q4
14 Stable Zephyr 3B Q8
15 Llama 2 Chat 7B Q4 Error in response format, wrong stop word insertion?
16 CodeNinja 7B Q4 Error in response format, wrong prompt template?
17 LaVa 7B Garbled response, sometimes cannot run
18 Mistral 7B Instruct Q4 Error in response format, wrong stop word insertion?
19 Noromaid 7B Q4
20 Openchat-3.5 7B Q4
21 Stealth 7B Q4
22 Trinity-v1.2 7B Q4
23 Vistral 7B Q4 Error in response format, wrong stop word insertion?
24 Qwen 2 7B Instruct Q4 Error in response format, wrong prompt template?
25 Qwen Chat 7B Q4
26 Llama 3 8B Instruct Q4
27 Hermes Pro Llama 3 8B Q4
28 Aya 23 8B Q4
29 Gemma 1.1 7B Q4 Error in response format, wrong stop word insertion?
30 BakLlava 1 Garbled response, sometimes cannot run, wrong stop word insertion?
31 Gemma 2 9B Q4
32 LaVa 13B Q4 Garbled response; prompt template issue?
33 Wizard Coder Python 13B Q4 Garbled response; prompt template issue?
34 Phi-3 Medium Instruct Q4
35 Gemma 2 27B Q4
36 Qwen2.5 32B Instruct Q4
37 Deepseek Coder 35B Instruct Q4
38 Phind 34B Q4 Error in response format, wrong stop word insertion?
39 Yi 34B Q4
40 Command-R v01 34B Q4 Garbled response; prompt template issue?
41 Aya 23 35B Q4
42 Mixtral 8x7B Instruct Q4 Error in response format, wrong stop word insertion?
43 Llama 3.1 70B Instruct Q4
44 Llama 2 Chat 70B Q4 Error in response format, wrong stop word insertion?
45 Qwen2.5 72B Instruct Q4

On one note

We will need to develop model.yaml to easily define model capabilities (e.g. function calling, vision, etc). Users are facing an issue with imported LlaVa: janhq/jan#3855

  • model.yaml should have some sort of capabilities field, e.g. tools: true
  • Jan allows users to "edit" Models, e.g. view a model's functionalities + edit it
  • Cortex: users will just edit model.yaml directly

Related

@imtuyethan imtuyethan self-assigned this Oct 21, 2024
@imtuyethan
Copy link
Author

imtuyethan commented Oct 21, 2024

Off topic:

Grammar issue (for all self-imported models by users):

Screenshot 2024-10-16 at 11 58 36 PM
  • Please change to "Self-imported model by user"
  • The way we define tags is weird.

Cloud models description could be better

These descriptions are not helpful:

Screenshot 2024-10-17 at 12 04 57 AM Screenshot 2024-10-17 at 12 04 32 AM

@imtuyethan
Copy link
Author

imtuyethan commented Oct 21, 2024

114 (windows-dev-tensorRT-llm)
OS: Windows 11 Pro (Version 23H2, build 22631.4037)
CPU: AMD Ryzen Threadripper PRO 5955WX (16 cores)
RAM: 32 GB
GPU: NVIDIA GeForce RTX 3090
Storage: 599 GB local disk (C:)


Codestral 22B Q4:

The response is weird:

Screen.Recording.2024-10-21.at.7.25.14.PM.mov
Screen.Recording.2024-10-21.at.7.36.25.PM.mov

@imtuyethan
Copy link
Author

imtuyethan commented Oct 21, 2024

Operating System: MacOS Sonoma 14.2
Processor: Apple M2
RAM: 16GB


Model: Tinyllama Chat 1.1B Q4

Seems like wrong prompt template?

Screenshot 2024-10-16 at 9 43 43 PM Screenshot 2024-10-16 at 9 43 53 PM

With the same prompt, Llama 3.2 1B Instruct Q8 gave me a correct/thorough answer.

@imtuyethan
Copy link
Author

imtuyethan commented Oct 21, 2024

Operating System: MacOS Sonoma 14.2
Processor: Apple M2
RAM: 16GB


Gemma 1.1 2B Q4

Wrong prompt template?

Screenshot 2024-10-22 at 1 52 33 AM

@imtuyethan
Copy link
Author

Operating System: MacOS Sonoma 14.2
Processor: Apple M2
RAM: 16GB


Llama 2 Chat 7B Q4

Wrong prompt template?

Screenshot 2024-10-22 at 2 32 32 PM

@imtuyethan
Copy link
Author

Operating System: MacOS Sonoma 14.2
Processor: Apple M2
RAM: 16GB


CodeNinja 7B Q4

Wrong prompt template?

Screenshot 2024-10-22 at 2 34 32 PM

@imtuyethan
Copy link
Author

imtuyethan commented Oct 22, 2024

Operating System: MacOS Sonoma 14.2
Processor: Apple M2
RAM: 16GB


LlaVa 7B

Weird responses:

Reported by user: https://zoom.us/clips/share/riUumJZ0uuzb5vQvZ2eZbMkmOq1nvU7O8VTD5FuBNtxRaO89rp9xA7CibJFCLlGju3nfyLsB_19iPegc0nSM4qxV.POPOcY7WXml_Ab8P

Screen.Recording.2024-10-22.at.2.37.24.PM.mov
Screen.Recording.2024-10-22.at.5.23.12.PM.mov

@imtuyethan
Copy link
Author

Operating System: MacOS Sonoma 14.2
Processor: Apple M2
RAM: 16GB


Mistral 7B Instruct Q4

Missing stop word?

Screenshot 2024-10-22 at 6 03 59 PM

@imtuyethan
Copy link
Author

Operating System: MacOS Sonoma 14.2
Processor: Apple M2
RAM: 16GB


Vistral 7B Q4

Missing stop word?

Screenshot 2024-10-22 at 6 07 18 PM

@imtuyethan
Copy link
Author

Operating System: MacOS Sonoma 14.2
Processor: Apple M2
RAM: 16GB


Qwen 2 7B Instruct Q4

Weird format:

Screenshot 2024-10-22 at 6 08 58 PM

@imtuyethan
Copy link
Author

imtuyethan commented Oct 22, 2024

Operating System: MacOS Sonoma 14.2
Processor: Apple M2
RAM: 16GB


BakLlava 1

Issue similar as LlaVa 7B

Screenshot 2024-10-22 at 7 11 57 PM

@imtuyethan
Copy link
Author

Operating System: MacOS Sonoma 14.2
Processor: Apple M2
RAM: 16GB


Gemma 1.1 7B Q4

Wrong prompt template?

Screenshot 2024-10-22 at 7 13 58 PM

@imtuyethan
Copy link
Author

Operating System: MacOS Sonoma 14.2
Processor: Apple M2
RAM: 16GB


LlaVa 13B Q4

Wrong prompt template?

Screenshot 2024-10-22 at 7 33 55 PM

@imtuyethan
Copy link
Author

Operating System: MacOS Sonoma 14.2
Processor: Apple M2
RAM: 16GB


Wizard Coder Python 13B Q4

Wrong prompt template?

Screenshot 2024-10-22 at 7 35 40 PM

@imtuyethan
Copy link
Author

114 (windows-dev-tensorRT-llm)
OS: Windows 11 Pro (Version 23H2, build 22631.4037)
CPU: AMD Ryzen Threadripper PRO 5955WX (16 cores)
RAM: 32 GB
GPU: NVIDIA GeForce RTX 3090
Storage: 599 GB local disk (C:)


Command-R v01 34B Q4

Pretty sure wrong prompt template:

Screenshot 2024-10-22 at 7 42 22 PM

@dan-homebrew
Copy link

@imtuyethan I recommending converting the Checklist you have above, into a table so we can track the status/fixing status.

Please work with @hahuyhoang411 - it may be that certain models are unsavable, and we should just remove them from the library.

@imtuyethan
Copy link
Author

Device: windows-dev-tensorrt-llm
Status: Running
Node: 3x-3090s
CPU: 1.26% of 16
RAM: 6.06/96 GiB
Disk: 600 GiB


Mixtral 8x7B Instruct Q4

Screenshot 2024-10-22 at 11 08 56 PM

@imtuyethan
Copy link
Author

Device: windows-dev-tensorrt-llm
Status: Running
Node: 3x-3090s
CPU: 1.26% of 16
RAM: 6.06/96 GiB
Disk: 600 GiB


Phind 34B Q4

Screenshot 2024-10-22 at 11 45 28 PM

@imtuyethan
Copy link
Author

Device: windows-dev-tensorrt-llm
Status: Running
Node: 3x-3090s
CPU: 1.26% of 16
RAM: 6.06/96 GiB
Disk: 600 GiB


Llama 2 Chat 70B Q4

Screenshot 2024-10-22 at 11 53 20 PM

@imtuyethan
Copy link
Author

imtuyethan commented Oct 22, 2024

Tasklist

I have QA-ed all models, please check ticket description for the latest update:

  • Llama 3.2 1B Instruct Q8
  • Llama 3.2 3B Instruct Q8
  • Qwen2.5 7B Instruct Q4
  • Qwen2.5 Coder 7B Instruct Q4
  • Llama 3.1 8B Instruct Q4
  • Qwen2.5 14B Instruct Q4
  • Codestral 22B Q4
  • TinyLlama Chat 1.1B Q4
  • LlamaCorn 1.1B Q8
  • Deepseek Coder 1.3B Instruct Q8
  • Gemma 1.1 2B Q4
  • Gemma 2 2B Q4
  • Phi-3 Mini Instruct Q4
  • Stable Zephyr 3B Q8
  • Llama 2 Chat 7B Q4
  • CodeNinja 7B Q4
  • LlaVa 7B
  • Mistral 7B Instruct Q4
  • Noromaid 7B Q4
  • Openchat-3.5 7B Q4
  • Stealth 7B Q4
  • Trinity-v1.2 7B Q4
  • Vistral 7B Q4
  • Qwen 2 7B Instruct Q4
  • Qwen Chat 7B Q4
  • Llama 3 8B Instruct Q4
  • Hermes Pro Llama 3 8B Q4
  • Aya 23 8B Q4
  • Gemma 1.1 7B Q4
  • BakLlava 1
  • Gemma 2 9B Q4
  • LlaVa 13B Q4
  • Wizard Coder Python 13B Q4
  • Phi-3 Medium Instruct Q4
  • Gemma 2 27B Q4
  • Qwen2.5 32B Instruct Q4
  • Deepseek Coder 33B Instruct Q4
  • Phind 34B Q4
  • Yi 34B Q4
  • Command-R v01 34B Q4
  • Aya 23 35B Q4
  • Mixtral 8x7B Instruct Q4
  • Llama 3.1 70B Instruct Q4
  • Llama 2 Chat 70B Q4
  • Qwen2.5 72B Instruct Q4

@imtuyethan imtuyethan added the type: bug Something isn't working label Oct 22, 2024
@imtuyethan imtuyethan changed the title QA: Test all models from Hub feat: Fix, update & improve models in Jan Hub Oct 22, 2024
@imtuyethan imtuyethan added type: enhancement New feature or request P1: important Important feature / fix labels Oct 22, 2024
@imtuyethan imtuyethan changed the title feat: Fix, update & improve models in Jan Hub bug: Fix, update & improve models in Jan Hub Oct 22, 2024
@hahuyhoang411
Copy link
Collaborator

hahuyhoang411 commented Oct 24, 2024

Current hub contains a lot of outdated models, and some new models have a prompt template bug. Here is my suggestion based on @imtuyethan QA-ed list:

The rationale for this delete list is model has been released >6months will be removed.

Delete list:

  • TinyLlama Chat 1.1B Q4
  • LlamaCorn 1.1B Q8
  • Deepseek Coder 1.3B Instruct Q8
  • Gemma 1.1 2B Q4 (Only keep Gemma 2 2B Q4)
  • Phi-3 Mini Instruct Q4 -> microsoft/Phi-3.5-mini-instruct
  • Stable Zephyr 3B Q8
  • Llama 2 Chat 7B Q4
  • CodeNinja 7B Q4
  • Mistral 7B Instruct Q4 -> mistralai/Ministral-8B-Instruct-2410
  • Noromaid 7B Q4
  • Openchat-3.5 7B Q4
  • Stealth 7B Q4 (bye our merge)
  • Trinity-v1.2 7B Q4 (bye another merge)
  • Vistral 7B Q4
  • Qwen 2 7B Instruct Q4
  • Qwen Chat 7B Q4
  • Llama 3 8B Instruct Q4
  • Hermes Pro Llama 3 8B Q4
  • Gemma 1.1 7B Q4
  • BakLlava 1
  • LlaVa 7B -> Llava 1.6
  • LlaVa 13B Q4
  • Wizard Coder Python 13B Q4
  • Phi-3 Medium Instruct Q4
  • Deepseek Coder 33B Instruct Q4 -> deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct
  • Phind 34B Q4
  • Yi 34B Q4
  • Mixtral 8x7B Instruct Q4 -> mistralai/Mistral-Small-Instruct-2409
  • Llama 2 Chat 70B Q4
  • Aya 23 8B Q4
  • Aya 23 35B Q4

Keep list:

  • LLM:

    • Meta:
      • Llama 3.3 70B Instruct Q4
      • Llama 3.2 1B Instruct Q8
      • Llama 3.2 3B Instruct Q8
      • Llama 3.1 8B Instruct Q4
      • Llama 3.1 70B Instruct Q4
    • Qwen:
      • Qwen2.5 7B Instruct Q4
      • Qwen2.5 0.5B Instruct Q4
      • Qwen2.5 1.5B Instruct Q4
      • Qwen2.5 1.5B Math Q4
      • Qwen2.5 1.5B Coder Q4
      • Qwen2.5 3B Instruct Q4
      • Qwen2.5 Coder 7B Instruct Q4
      • Qwen2.5 Math 7B Instruct Q4
      • Qwen2.5 14B Instruct Q4
      • Qwen2.5 32B Instruct Q4
      • Qwen2.5 72B Instruct Q4
      • QwQ 32B Reasoning Q4
    • Google:
      • Gemma 2 2B Q4
      • Gemma 2 9B Q4
      • Gemma 2 27B Q4
    • Cohere:
      • Command-R v01 34B Q4
      • Aya Expanse 8B Q4
      • Aya Expanse 32B Q4
    • Mistral:
      • Codestral 22B Q4
      • Ministral-8B-Instruct-2410 (new)
      • Mistral-Small-Instruct-2409 (new)
      • Mistral-Large-Instruct-2407 (new) -> too large
    • Deepseek:
      • DeepSeek-Coder-V2-Lite-Instruct (new) -> too large
    • Sailor (SEA languages):
      • Sailor 2 1B Q8
      • Sailor 2 8B Q4
      • Sailor 2 20B Q4
    • Microsoft:
      • Phi-3.5-mini-instruct (new)
    • NVIDIA:
      • Llama-3.1-Nemotron-70B-Instruct-HF (new)
    • IBM:
      • Granite-3.0 3B (new)
      • Granite-3.0 8B (new)
    • Intellect Prime:
      • Intellect-1 10B Q4
    • AIDC-AI:
    • Marco-o1
    • AllenAI:
      • Olmo2
      • Tulu3 8B Q4
  • VLM: VLMs are a bit more tricky
    LLava 1.6 (new)
    Qwen2-VL-7B-Instruct (new)
    Pixtral-12B-2409 (new)
    Llama-3.2-11B-Vision-Instruct (new)
    GOT-OCR2_0 (new)
    Molmo-7B-D-0924 (new)
    MiniCPM-V-2_6 (new)

@imtuyethan
Copy link
Author

imtuyethan commented Oct 25, 2024

@hahuyhoang411 Should we add more new/trending models? The list seems short for a whole model hub.

Some edge cases we need to handle:

We can delete them from Hub, but they still show up on the users' side if they have downloaded these legacy models. How do we inform them when these models don't work?

@dan-homebrew dan-homebrew transferred this issue from janhq/jan Oct 31, 2024
@hahuyhoang411 hahuyhoang411 moved this from Scheduled to In Progress in Jan & Cortex Nov 12, 2024
@hahuyhoang411
Copy link
Collaborator

I edited the list based on current trends. cc @imtuyethan

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P1: important Important feature / fix type: bug Something isn't working type: enhancement New feature or request
Projects
Status: In Progress
Development

No branches or pull requests

3 participants