bug: Model failed to run due to MacOS filesystem incompatibility (MacOS behavior with exFAT drives) #3846

Closed
1 of 3 tasks
J-Siu opened this issue Oct 20, 2024 · 11 comments
Assignees: imtuyethan
Labels: can't reproduce (Needs repro instructions), type: bug (Something isn't working)

Comments

@J-Siu

J-Siu commented Oct 20, 2024

Jan version

0.5.6

Describe the Bug

When I try to use "Codestral 22B Q4", the prompt gives no response. The app.log says the model failed to load.

In the log it is trying to load:
'/Volumes/T7B01/ai/jan/models/codestral-22b/._Codestral-22B-v0.1-Q4_K_M.gguf'

I checked the folder; the actual file is:
'/Volumes/T7B01/ai/jan/models/codestral-22b/Codestral-22B-v0.1-Q4_K_M.gguf'

PS: The same thing happens with Deepseek Coder 33B Instruct Q4. Jan tries to load a file starting with ._, which does not exist.
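
(For reference, a quick way to check from a terminal whether macOS has dropped hidden "._" AppleDouble files next to the models; Finder hides dot-files by default, and the path below is just my models folder:)

$ find /Volumes/T7B01/ai/jan/models -name '._*'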

Steps to Reproduce

  1. Go to the model page and use "Codestral 22B Q4".
  2. In the prompt, type "hi"; nothing happens.

Screenshots / Logs

2024-10-20T20:29:38.271Z [SPECS]::Version: 0.5.6
2024-10-20T20:29:38.274Z [SPECS]::CPUs: [{"model":"Apple M2","speed":2400,"times":{"user":4908840,"nice":0,"sys":3007610,"idle":25556600,"irq":0}},{"model":"Apple M2","speed":2400,"times":{"user":4555520,"nice":0,"sys":2512530,"idle":26444360,"irq":0}},{"model":"Apple M2","speed":2400,"times":{"user":4131220,"nice":0,"sys":2167740,"idle":27270840,"irq":0}},{"model":"Apple M2","speed":2400,"times":{"user":4890100,"nice":0,"sys":1997070,"idle":26731880,"irq":0}},{"model":"Apple M2","speed":2400,"times":{"user":3225190,"nice":0,"sys":1412130,"idle":29022680,"irq":0}},{"model":"Apple M2","speed":2400,"times":{"user":2168460,"nice":0,"sys":1003950,"idle":30524130,"irq":0}},{"model":"Apple M2","speed":2400,"times":{"user":1269420,"nice":0,"sys":776550,"idle":31672900,"irq":0}},{"model":"Apple M2","speed":2400,"times":{"user":978150,"nice":0,"sys":597670,"idle":32158780,"irq":0}}]
2024-10-20T20:29:38.275Z [SPECS]::Machine: arm64
2024-10-20T20:29:38.275Z [SPECS]::Endianness: LE
2024-10-20T20:29:38.275Z [SPECS]::Free Mem: 7015940096
2024-10-20T20:29:38.275Z [SPECS]::Parallelism: 8
2024-10-20T20:29:38.275Z [SPECS]::Total Mem: 25769803776
2024-10-20T20:29:38.275Z [SPECS]::OS Version: Darwin Kernel Version 24.0.0: Tue Sep 24 23:37:13 PDT 2024; root:xnu-11215.1.12~1/RELEASE_ARM64_T8112
2024-10-20T20:29:38.275Z [SPECS]::OS Platform: darwin
2024-10-20T20:29:38.275Z [SPECS]::OS Release: 24.0.0
2024-10-20T20:29:38.275Z [CORTEX]::Debug: Adding additional dependencies for @janhq/inference-cortex-extension 1.0.19
2024-10-20T20:29:38.275Z [CORTEX]::CPU information - 8
2024-10-20T20:29:38.276Z [CORTEX]:: Request to kill cortex
2024-10-20T20:29:38.292Z [CORTEX]:: cortex process is terminated
2024-10-20T20:29:38.293Z [CORTEX]:: Spawning cortex subprocess...
2024-10-20T20:29:38.293Z [CORTEX] PATH: /usr/bin:/bin:/usr/sbin:/sbin::/Volumes/T7B01/ai/jan/engines/@janhq/inference-cortex-extension/1.0.19:/Volumes/T7B01/ai/jan/extensions/@janhq/inference-cortex-extension/dist/bin/mac-arm64
2024-10-20T20:29:38.293Z [CORTEX]:: Spawn cortex at path: /Volumes/T7B01/ai/jan/extensions/@janhq/inference-cortex-extension/dist/bin/mac-arm64/cortex-cpp, and args: 1,127.0.0.1,3928
2024-10-20T20:29:38.293Z [CORTEX]::Debug: Cortex engine path: /Volumes/T7B01/ai/jan/extensions/@janhq/inference-cortex-extension/dist/bin/mac-arm64
2024-10-20T20:29:38.400Z [CORTEX]:: Loading model with params {"cpu_threads":8,"ctx_len":4096,"prompt_template":"{system_message} [INST] {prompt} [/INST]","llama_model_path":"/Volumes/T7B01/ai/jan/models/codestral-22b/._Codestral-22B-v0.1-Q4_K_M.gguf","ngl":57,"system_prompt":"","user_prompt":" [INST] ","ai_prompt":" [/INST]","model":"codestral-22b"}
2024-10-20T20:29:38.400Z [CORTEX]:: cortex is ready
2024-10-20T20:29:38.413Z [CORTEX]:: 20241020 20:29:38.299871 UTC 440817 INFO  cortex-cpp version: 0.5.0 - main.cc:73
20241020 20:29:38.300316 UTC 440817 INFO  Server started, listening at: 127.0.0.1:3928 - main.cc:78
20241020 20:29:38.300317 UTC 440817 INFO  Please load your model - main.cc:79
20241020 20:29:38.300319 UTC 440817 INFO  Number of thread is:8 - main.cc:86
20241020 20:29:38.404506 UTC 440819 INFO  CPU instruction set: fpu = 0| mmx = 0| sse = 0| sse2 = 0| sse3 = 0| ssse3 = 0| sse4_1 = 0| sse4_2 = 0| pclmulqdq = 0| avx = 0| avx2 = 0| avx512_f = 0| avx512_dq = 0| avx512_ifma = 0| avx512_pf = 0| avx512_er = 0| avx512_cd = 0| avx512_bw = 0| has_avx512_vl = 0| has_avx512_vbmi = 0| has_avx512_vbmi2 = 0| avx512_vnni = 0| avx512_bitalg = 0| avx512_vpopcntdq = 0| avx512_4vnniw = 0| avx512_4fmaps = 0| avx512_vp2intersect = 0| aes = 0| f16c = 0| - server.cc:288
20241020 20:29:38.412285 UTC 440819 INFO  Loaded engine: cortex.llamacpp - server.cc:314
20241020 20:29:38.412294 UTC 440819 INFO  cortex.llamacpp version: 0.1.25 - llama_engine.cc:163
20241020 20:29:38.412687 UTC 440819 INFO  Number of parallel is set to 1 - llama_engine.cc:352
20241020 20:29:38.412691 UTC 440819 DEBUG [LoadModelImpl] cache_type: f16 - llama_engine.cc:365
20241020 20:29:38.412694 UTC 440819 DEBUG [LoadModelImpl] Enabled Flash Attention - llama_engine.cc:374
20241020 20:29:38.412702 UTC 440819 DEBUG [LoadModelImpl] stop: null
 - llama_engine.cc:395
{"timestamp":1729456178,"level":"INFO","function":"LoadModelImpl","line":418,"message":"system info","n_threads":8,"total_threads":8,"system_info":"AVX = 0 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 0 | NEON = 1 | SVE = 0 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | SSSE3 = 0 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | "}

2024-10-20T20:29:38.413Z [CORTEX]::Error: gguf_init_from_file: invalid magic characters '����'
llama_model_load: error loading model: llama_model_loader: failed to load model from /Volumes/T7B01/ai/jan/models/codestral-22b/._Codestral-22B-v0.1-Q4_K_M.gguf

llama_load_model_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model '/Volumes/T7B01/ai/jan/models/codestral-22b/._Codestral-22B-v0.1-Q4_K_M.gguf'

2024-10-20T20:29:38.413Z [CORTEX]:: {"timestamp":1729456178,"level":"ERROR","function":"LoadModel","line":185,"message":"llama.cpp unable to load model","model":"/Volumes/T7B01/ai/jan/models/codestral-22b/._Codestral-22B-v0.1-Q4_K_M.gguf"}
20241020 20:29:38.413322 UTC 440819 ERROR Error loading the model - llama_engine.cc:423

2024-10-20T20:29:38.417Z [CORTEX]:: Load model success with response {}
2024-10-20T20:29:38.417Z [CORTEX]:: Validating model codestral-22b
2024-10-20T20:29:38.419Z [CORTEX]:: Validate model state with response 409
2024-10-20T20:29:38.420Z [CORTEX]:: Validate model state failed with response {"message":"Model has not been loaded, please load model into cortex.llamacpp"} and status is "Conflict"
2024-10-20T20:29:38.420Z [CORTEX]::Error: Validate model status failed

What is your OS?

  • MacOS
  • Windows
  • Linux
@J-Siu J-Siu added the type: bug Something isn't working label Oct 20, 2024
@imtuyethan imtuyethan self-assigned this Oct 21, 2024
@imtuyethan imtuyethan added this to Menlo Oct 21, 2024
@github-project-automation github-project-automation bot moved this to Investigating in Menlo Oct 21, 2024
@imtuyethan
Contributor

114 (windows-dev-tensorRT-llm)
OS: Windows 11 Pro (Version 23H2, build 22631.4037)
CPU: AMD Ryzen Threadripper PRO 5955WX (16 cores)
RAM: 32 GB
GPU: NVIDIA GeForce RTX 3090
Storage: 599 GB local disk (C:)


Can run Codestral 22B Q4 on my end:

However, the response is weird; maybe a prompt template issue, which I will follow up on in this ticket instead: janhq/models#46
https://github.com/user-attachments/assets/5380f2b7-d137-423d-beaa-21d41e33d67f

App log:
Screenshot 2024-10-21 at 7 29 46 PM

@J-Siu Have you ever tried renaming the model? Can you delete the model, download it again, and try again? Thanks a bunch!

Can also run Deepseek Coder 33B Instruct Q4 on my end:

Screen.Recording.2024-10-21.at.7.23.34.PM.mov

@J-Siu You may check out #3703 if you have encountered the same issue; we've been through many updates, and the legacy model could be corrupted.

@imtuyethan imtuyethan added the can't reproduce Needs repro instructions label Oct 21, 2024
@J-Siu
Author

J-Siu commented Oct 21, 2024

This looks like #3703 but is not the same. In the #3703 video, the model started; in my case, the model did not start.

I am deleting the model and downloading again. Will update soon.

@J-Siu
Author

J-Siu commented Oct 21, 2024

@imtuyethan I basically deleted the whole data directory and re-downloaded everything to test. It is working now. The strange thing is I downloaded that model yesterday 🤦‍♂️

However, there are a lot of cortex errors in app.log; is that normal?

2024-10-21T17:03:02.035Z [SPECS]::Version: 0.5.6
2024-10-21T17:03:02.036Z [SPECS]::CPUs: [{"model":"Apple M2","speed":2400,"times":{"user":13558770,"nice":0,"sys":6269080,"idle":85707820,"irq":0}},{"model":"Apple M2","speed":2400,"times":{"user":12951890,"nice":0,"sys":5430740,"idle":87218150,"irq":0}},{"model":"Apple M2","speed":2400,"times":{"user":11123090,"nice":0,"sys":4621160,"idle":89990010,"irq":0}},{"model":"Apple M2","speed":2400,"times":{"user":14848750,"nice":0,"sys":4465670,"idle":86533630,"irq":0}},{"model":"Apple M2","speed":2400,"times":{"user":6163930,"nice":0,"sys":1995580,"idle":97807020,"irq":0}},{"model":"Apple M2","speed":2400,"times":{"user":4703450,"nice":0,"sys":1256960,"idle":100088390,"irq":0}},{"model":"Apple M2","speed":2400,"times":{"user":2148090,"nice":0,"sys":895430,"idle":103035350,"irq":0}},{"model":"Apple M2","speed":2400,"times":{"user":1430550,"nice":0,"sys":669980,"idle":104000130,"irq":0}}]
2024-10-21T17:03:02.036Z [SPECS]::Machine: arm64
2024-10-21T17:03:02.036Z [SPECS]::Endianness: LE
2024-10-21T17:03:02.036Z [SPECS]::Parallelism: 8
2024-10-21T17:03:02.036Z [SPECS]::Free Mem: 3581624320
2024-10-21T17:03:02.036Z [SPECS]::Total Mem: 25769803776
2024-10-21T17:03:02.036Z [SPECS]::OS Platform: darwin
2024-10-21T17:03:02.036Z [SPECS]::OS Release: 24.0.0
2024-10-21T17:03:02.036Z [SPECS]::OS Version: Darwin Kernel Version 24.0.0: Tue Sep 24 23:37:13 PDT 2024; root:xnu-11215.1.12~1/RELEASE_ARM64_T8112
2024-10-21T17:03:02.036Z [CORTEX]::Debug: Adding additional dependencies for @janhq/inference-cortex-extension 1.0.19
2024-10-21T17:03:02.036Z [CORTEX]::CPU information - 8
2024-10-21T17:03:02.037Z [CORTEX]:: Request to kill cortex
2024-10-21T17:03:02.054Z [CORTEX]:: cortex process is terminated
2024-10-21T17:03:02.054Z [CORTEX]:: Spawning cortex subprocess...
2024-10-21T17:03:02.054Z [CORTEX]::Debug: Cortex engine path: /Volumes/T7W01/ai/jan/extensions/@janhq/inference-cortex-extension/dist/bin/mac-arm64
2024-10-21T17:03:02.054Z [CORTEX]:: Spawn cortex at path: /Volumes/T7W01/ai/jan/extensions/@janhq/inference-cortex-extension/dist/bin/mac-arm64/cortex-cpp, and args: 1,127.0.0.1,3928
2024-10-21T17:03:02.054Z [CORTEX] PATH: /usr/bin:/bin:/usr/sbin:/sbin::/Volumes/T7B01/ai/jan/engines/@janhq/inference-cortex-extension/1.0.19::/Volumes/T7W01/ai/jan/engines/@janhq/inference-cortex-extension/1.0.19:/Volumes/T7W01/ai/jan/extensions/@janhq/inference-cortex-extension/dist/bin/mac-arm64
2024-10-21T17:03:02.482Z [CORTEX]:: cortex is ready
2024-10-21T17:03:02.482Z [CORTEX]:: Loading model with params {"cpu_threads":8,"ctx_len":4096,"prompt_template":"{system_message} [INST] {prompt} [/INST]","llama_model_path":"/Volumes/T7W01/ai/jan/models/codestral-22b/Codestral-22B-v0.1-Q4_K_M.gguf","ngl":57,"system_prompt":"","user_prompt":" [INST] ","ai_prompt":" [/INST]","model":"codestral-22b"}
2024-10-21T17:03:02.703Z [CORTEX]:: 20241021 17:03:02.405761 UTC 1128482 INFO  cortex-cpp version: 0.5.0 - main.cc:73
20241021 17:03:02.406469 UTC 1128482 INFO  Server started, listening at: 127.0.0.1:3928 - main.cc:78
20241021 17:03:02.406471 UTC 1128482 INFO  Please load your model - main.cc:79
20241021 17:03:02.406474 UTC 1128482 INFO  Number of thread is:8 - main.cc:86
20241021 17:03:02.492463 UTC 1128498 INFO  CPU instruction set: fpu = 0| mmx = 0| sse = 0| sse2 = 0| sse3 = 0| ssse3 = 0| sse4_1 = 0| sse4_2 = 0| pclmulqdq = 0| avx = 0| avx2 = 0| avx512_f = 0| avx512_dq = 0| avx512_ifma = 0| avx512_pf = 0| avx512_er = 0| avx512_cd = 0| avx512_bw = 0| has_avx512_vl = 0| has_avx512_vbmi = 0| has_avx512_vbmi2 = 0| avx512_vnni = 0| avx512_bitalg = 0| avx512_vpopcntdq = 0| avx512_4vnniw = 0| avx512_4fmaps = 0| avx512_vp2intersect = 0| aes = 0| f16c = 0| - server.cc:288
20241021 17:03:02.701597 UTC 1128498 INFO  Loaded engine: cortex.llamacpp - server.cc:314
20241021 17:03:02.701669 UTC 1128498 INFO  cortex.llamacpp version: 0.1.25 - llama_engine.cc:163
20241021 17:03:02.701771 UTC 1128498 INFO  Number of parallel is set to 1 - llama_engine.cc:352
20241021 17:03:02.701775 UTC 1128498 DEBUG [LoadModelImpl] cache_type: f16 - llama_engine.cc:365
20241021 17:03:02.701792 UTC 1128498 DEBUG [LoadModelImpl] Enabled Flash Attention - llama_engine.cc:374
20241021 17:03:02.701801 UTC 1128498 DEBUG [LoadModelImpl] stop: null
 - llama_engine.cc:395
{"timestamp":1729530182,"level":"INFO","function":"LoadModelImpl","line":418,"message":"system info","n_threads":8,"total_threads":8,"system_info":"AVX = 0 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 0 | NEON = 1 | SVE = 0 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | SSSE3 = 0 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | "}

2024-10-21T17:03:02.716Z [CORTEX]::Error: llama_model_loader: loaded meta data with 27 key-value pairs and 507 tensors from /Volumes/T7W01/ai/jan/models/codestral-22b/Codestral-22B-v0.1-Q4_K_M.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = Codestral-22B-v0.1
llama_model_loader: - kv   2:                          llama.block_count u32              = 56
llama_model_loader: - kv   3:                       llama.context_length u32              = 32768
llama_model_loader: - kv   4:                     llama.embedding_length u32              = 6144
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 16384
llama_model_loader: - kv   6:                 llama.attention.head_count u32              = 48
llama_model_loader: - kv   7:              llama.attention.head_count_kv u32              = 8
llama_model_loader: - kv   8:                       llama.rope.freq_base f32              = 1000000.000000
llama_model_loader: - kv   9:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  10:                          general.file_type u32              = 15
llama_model_loader: - kv  11:                           llama.vocab_size u32              = 32768
llama_model_loader: - kv  12:                 llama.rope.dimension_count u32              = 128

2024-10-21T17:03:02.716Z [CORTEX]::Error: llama_model_loader: - kv  13:                       tokenizer.ggml.model str              = llama
llama_model_loader: - kv  14:                         tokenizer.ggml.pre str              = default

2024-10-21T17:03:02.721Z [CORTEX]::Error: llama_model_loader: - kv  15:                      tokenizer.ggml.tokens arr[str,32768]   = ["<unk>", "<s>", "</s>", "[INST]", "[...

2024-10-21T17:03:02.731Z [CORTEX]::Error: llama_model_loader: - kv  16:                      tokenizer.ggml.scores arr[f32,32768]   = [0.000000, 0.000000, 0.000000, 0.0000...

2024-10-21T17:03:02.733Z [CORTEX]::Error: llama_model_loader: - kv  17:                  tokenizer.ggml.token_type arr[i32,32768]   = [2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, ...
llama_model_loader: - kv  18:                tokenizer.ggml.bos_token_id u32              = 1
llama_model_loader: - kv  19:                tokenizer.ggml.eos_token_id u32              = 2
llama_model_loader: - kv  20:               tokenizer.ggml.add_bos_token bool             = true
llama_model_loader: - kv  21:               tokenizer.ggml.add_eos_token bool             = false
llama_model_loader: - kv  22:               general.quantization_version u32              = 2
llama_model_loader: - kv  23:                      quantize.imatrix.file str              = /models/Codestral-22B-v0.1-GGUF/Codes...
llama_model_loader: - kv  24:                   quantize.imatrix.dataset str              = /training_data/calibration_datav3.txt
llama_model_loader: - kv  25:             quantize.imatrix.entries_count i32              = 392
llama_model_loader: - kv  26:              quantize.imatrix.chunks_count i32              = 148
llama_model_loader: - type  f32:  113 tensors
llama_model_loader: - type q4_K:  337 tensors
llama_model_loader: - type q6_K:   57 tensors

2024-10-21T17:03:02.739Z [CORTEX]::Error: llm_load_vocab: special tokens cache size = 751

2024-10-21T17:03:02.742Z [CORTEX]::Error: llm_load_vocab: token to piece cache size = 0.1732 MB
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = llama
llm_load_print_meta: vocab type       = SPM
llm_load_print_meta: n_vocab          = 32768
llm_load_print_meta: n_merges         = 0
llm_load_print_meta: vocab_only       = 0
llm_load_print_meta: n_ctx_train      = 32768
llm_load_print_meta: n_embd           = 6144
llm_load_print_meta: n_layer          = 56
llm_load_print_meta: n_head           = 48
llm_load_print_meta: n_head_kv        = 8
llm_load_print_meta: n_rot            = 128
llm_load_print_meta: n_swa            = 0
llm_load_print_meta: n_embd_head_k    = 128

2024-10-21T17:03:02.742Z [CORTEX]::Error: llm_load_print_meta: n_embd_head_v    = 128
llm_load_print_meta: n_gqa            = 6
llm_load_print_meta: n_embd_k_gqa     = 1024
llm_load_print_meta: n_embd_v_gqa     = 1024
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale    = 0.0e+00
llm_load_print_meta: n_ff             = 16384
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: causal attn      = 1
llm_load_print_meta: pooling type     = 0
llm_load_print_meta: rope type        = 0
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 1000000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_ctx_orig_yarn  = 32768
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: ssm_d_conv       = 0
llm_load_print_meta: ssm_d_inner      = 0
llm_load_print_meta: ssm_d_state      = 0
llm_load_print_meta: ssm_dt_rank      = 0
llm_load_print_meta: model type       = ?B
llm_load_print_meta: model ftype      = Q4_K - Medium
llm_load_print_meta: model params     = 22.25 B
llm_load_print_meta: model size       = 12.42 GiB (4.80 BPW) 
llm_load_print_meta: general.name     = Codestral-22B-v0.1
llm_load_print_meta: BOS token        = 1 '<s>'
llm_load_print_meta: EOS token        = 2 '</s>'
llm_load_print_meta: UNK token        = 0 '<unk>'
llm_load_print_meta: LF token         = 781 '<0x0A>'
llm_load_print_meta: max token length = 48

2024-10-21T17:03:02.742Z [CORTEX]::Error: llm_load_tensors: ggml ctx size =    0.47 MiB

2024-10-21T17:03:02.796Z [CORTEX]::Error: ggml_backend_metal_log_allocated_size: allocated buffer, size = 12288.00 MiB, (12288.06 / 16384.02)

ggml_backend_metal_log_allocated_size: allocated buffer, size =   483.98 MiB, (12772.05 / 16384.02)

2024-10-21T17:03:02.796Z [CORTEX]::Error: llm_load_tensors: offloading 56 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 57/57 layers to GPU
llm_load_tensors:        CPU buffer size =   108.00 MiB
llm_load_tensors:      Metal buffer size = 12614.46 MiB
.............................
2024-10-21T17:03:02.796Z [CORTEX]::Error: ..................
2024-10-21T17:03:02.796Z [CORTEX]::Error: ..........
2024-10-21T17:03:02.796Z [CORTEX]::Error: ......
2024-10-21T17:03:02.796Z [CORTEX]::Error: .......
2024-10-21T17:03:02.796Z [CORTEX]::Error: ..........
2024-10-21T17:03:02.796Z [CORTEX]::Error: .......
2024-10-21T17:03:02.796Z [CORTEX]::Error: ....
2024-10-21T17:03:02.796Z [CORTEX]::Error: ......
2024-10-21T17:03:02.796Z [CORTEX]::Error: ..

2024-10-21T17:03:02.798Z [CORTEX]::Error: llama_new_context_with_model: n_ctx      = 4096
llama_new_context_with_model: n_batch    = 2048
llama_new_context_with_model: n_ubatch   = 2048
llama_new_context_with_model: flash_attn = 1
llama_new_context_with_model: freq_base  = 1000000.0
llama_new_context_with_model: freq_scale = 1
ggml_metal_init: allocating

2024-10-21T17:03:02.798Z [CORTEX]::Error: ggml_metal_init: found device: Apple M2

2024-10-21T17:03:02.798Z [CORTEX]::Error: ggml_metal_init: picking default device: Apple M2

2024-10-21T17:03:02.799Z [CORTEX]::Error: ggml_metal_init: using embedded metal library

2024-10-21T17:03:02.805Z [CORTEX]::Error: ggml_metal_init: GPU name:   Apple M2
ggml_metal_init: GPU family: MTLGPUFamilyApple8  (1008)
ggml_metal_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_init: GPU family: MTLGPUFamilyMetal3  (5001)
ggml_metal_init: simdgroup reduction support   = true
ggml_metal_init: simdgroup matrix mul. support = true
ggml_metal_init: hasUnifiedMemory              = true
ggml_metal_init: recommendedMaxWorkingSetSize  = 17179.89 MB

2024-10-21T17:03:02.886Z [CORTEX]::Error: llama_kv_cache_init:      Metal KV buffer size =   896.00 MiB
llama_new_context_with_model: KV self size  =  896.00 MiB, K (f16):  448.00 MiB, V (f16):  448.00 MiB
llama_new_context_with_model:        CPU  output buffer size =     0.13 MiB

2024-10-21T17:03:02.887Z [CORTEX]::Error: llama_new_context_with_model:      Metal compute buffer size =   448.02 MiB
llama_new_context_with_model:        CPU compute buffer size =    80.02 MiB
llama_new_context_with_model: graph nodes  = 1575
llama_new_context_with_model: graph splits = 2

2024-10-21T17:03:44.916Z [CORTEX]:: Load model success with response {}
2024-10-21T17:03:44.918Z [CORTEX]:: Validating model codestral-22b
2024-10-21T17:03:44.922Z [CORTEX]:: Validate model state with response 200
2024-10-21T17:03:44.924Z [CORTEX]:: Validate model state success with response {"model_data":"{\"frequency_penalty\":0.0,\"grammar\":\"\",\"ignore_eos\":false,\"logit_bias\":[],\"min_p\":0.05000000074505806,\"mirostat\":0,\"mirostat_eta\":0.10000000149011612,\"mirostat_tau\":5.0,\"model\":\"/Volumes/T7W01/ai/jan/models/codestral-22b/Codestral-22B-v0.1-Q4_K_M.gguf\",\"n_ctx\":4096,\"n_keep\":0,\"n_predict\":2,\"n_probs\":0,\"penalize_nl\":false,\"penalty_prompt_tokens\":[],\"presence_penalty\":0.0,\"repeat_last_n\":64,\"repeat_penalty\":1.0,\"seed\":4294967295,\"stop\":[],\"stream\":false,\"temperature\":0.800000011920929,\"tfs_z\":1.0,\"top_k\":40,\"top_p\":0.949999988079071,\"typical_p\":1.0,\"use_penalty_prompt_tokens\":false}","model_loaded":true}

@imtuyethan
Contributor

Thanks, @J-Siu, for reporting. Cortex.cpp is going through a lot of refactoring right now to stabilize the app; thanks for being patient with us.

@namchuai @vansangpfiev Can you guys help take a look at the Cortex logs, thank you!

@J-Siu
Author

J-Siu commented Oct 22, 2024

@imtuyethan Thanks for the reply. Closing this as the original issue is fixed.

PS: I went through the app.log again. It seems to me the log level of those errors is set wrong; maybe they should be "INFO" instead of "ERROR"? Let me know if you want me to open a new issue for the cortex error.

@J-Siu J-Siu closed this as completed Oct 22, 2024
@github-project-automation github-project-automation bot moved this from Investigating to Review + QA in Menlo Oct 22, 2024
@imtuyethan
Contributor

imtuyethan commented Oct 22, 2024

@imtuyethan Thanks for the reply. Closing this as the original issue is fixed.

PS: I went through the app.log again. It seems to me the log level of those errors is set wrong; maybe they should be "INFO" instead of "ERROR"? Let me know if you want me to open a new issue for the cortex error.

@J-Siu That would be great; the Cortex team usually only triages bugs reported in their repo: https://github.com/janhq/cortex.cpp

@imtuyethan imtuyethan moved this from Review + QA to Completed in Menlo Oct 22, 2024
@dan-menlo
Contributor

@imtuyethan Thanks for the reply. Closing this as the original issue is fixed.

PS: I went through the app.log again. It seems to me the log level of those errors is set wrong; maybe they should be "INFO" instead of "ERROR"? Let me know if you want me to open a new issue for the cortex error.

@J-Siu I'll relay this message to the Cortex team - I think this is a really great reminder and thanks for holding us to coding best practices 🙏

@imtuyethan imtuyethan added this to the v0.5.7 milestone Oct 23, 2024
@J-Siu
Author

J-Siu commented Oct 24, 2024

I think I have narrowed down the issue.

This happens on macOS when the Jan data folder is on an external drive formatted as exFAT, which is the default format for most external SSDs.

app.log

$cat app.log|grep -na unable
42:2024-10-24T09:55:56.679Z [CORTEX]:: {"timestamp":1729763756,"level":"ERROR","function":"LoadModel","line":185,"message":"llama.cpp unable to load model","model":"/Volumes/T7B01/ai/jan/models/llama3.2-1b-instruct/._Llama-3.2-1B-Instruct-Q8_0.gguf"}
86:2024-10-24T09:56:08.774Z [CORTEX]:: {"timestamp":1729763768,"level":"ERROR","function":"LoadModel","line":185,"message":"llama.cpp unable to load model","model":"/Volumes/T7B01/ai/jan/models/dolphin-phi-2/._dolphin-2_6-phi-2.Q8_0.gguf"}

The issue went away yesterday when I moved the data folder to one of my APFS SSDs for testing. However, once I moved it back to an exFAT SSD, the issue came back.

My Jan data folder was on my exFAT SSD for a long time (I was the one asking on Discord for a warning popup when moving the data folder, so it must have been a while). Not sure what changed.
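
For anyone who wants to verify the same thing, something like this should show it (paths are from my setup; look for "exfat" vs "apfs" in the mount output):

$ mount | grep /Volumes/T7B01            # what filesystem the volume is mounted as
$ find /Volumes/T7B01/ai/jan/models -name '._*'    # any AppleDouble stubs next to the models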

@J-Siu J-Siu reopened this Oct 24, 2024
@github-project-automation github-project-automation bot moved this from Completed to In Progress in Menlo Oct 24, 2024
@imtuyethan imtuyethan moved this from In Progress to Investigating in Menlo Oct 25, 2024
@imtuyethan imtuyethan removed this from the v0.5.7 milestone Oct 25, 2024
@J-Siu
Author

J-Siu commented Oct 26, 2024

The exFAT / "._" file issue seems to affect multiple apps/tools I use. I am throwing in the towel and switching all my macOS data drives to APFS 😭

I did some searching and found https://apple.stackexchange.com/questions/14980/why-are-dot-underscore-files-created-and-how-can-i-avoid-them
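
A common workaround is macOS's bundled dot_clean utility, which merges the "._" AppleDouble files back into the main files and, with -m, deletes them afterwards; it only helps temporarily, since macOS recreates them the next time it writes extended attributes to the exFAT volume. Path below is my models folder:

$ dot_clean -m /Volumes/T7B01/ai/jan/models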

This seems to be affecting .git directories too, and that is the final straw for me.

Let me know if you want me to close this.

@imtuyethan imtuyethan changed the title bug: Cannot start Codestral 22B Q4 bug: Model fails to run due to MacOS filesystem incompatibility Nov 4, 2024
@imtuyethan imtuyethan changed the title bug: Model fails to run due to MacOS filesystem incompatibility bug: Model failed to run due to MacOS filesystem incompatibility Nov 4, 2024
@github-project-automation github-project-automation bot moved this from Investigating to Review + QA in Menlo Nov 4, 2024
@imtuyethan
Contributor

The exFAT / "._" file issue seems to affect multiple apps/tools I use. I am throwing in the towel and switching all my macOS data drives to APFS 😭

I did some searching and found https://apple.stackexchange.com/questions/14980/why-are-dot-underscore-files-created-and-how-can-i-avoid-them

This seems to be affecting .git directories too, and that is the final straw for me.

Let me know if you want me to close this.

@J-Siu Thanks for the detailed investigation! Seems like it is a known MacOS behavior with exFAT drives.

Since this affects multiple apps/tools (not just Jan), switching to APFS was definitely the right call. I'll add a note to our docs to recommend using APFS over exFAT for storing Jan's data folder on MacOS to help other users avoid this issue.
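
If we add that note, it could also mention that moving an existing external drive to APFS means reformatting it (which erases everything on it, so back up first), either via Disk Utility or from the command line; the disk identifier below is only an example:

$ diskutil list                                  # find the external disk, e.g. /dev/disk4
$ diskutil eraseDisk APFS T7B01 GPT /dev/disk4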

@imtuyethan imtuyethan moved this from Review + QA to Completed in Menlo Nov 4, 2024
@imtuyethan imtuyethan changed the title bug: Model failed to run due to MacOS filesystem incompatibility bug: Model failed to run due to MacOS filesystem incompatibility (MacOS behavior with exFAT drives) Nov 4, 2024
@J-Siu
Author

J-Siu commented Nov 4, 2024

@imtuyethan Maybe Jan should refuse to create the data folder on exFAT, with a pop-up (unless you go the route of updating the file access calls on macOS 🤣), and show a popup warning for people who already have their data folder on exFAT.
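
As a rough sketch of what such a check could look like on macOS (this is not Jan's actual code, just the system command an app could shell out to; the path is my data folder):

$ diskutil info /Volumes/T7B01 | grep 'File System Personality'
# prints something like "File System Personality: ExFAT"; anything other than APFS (or Journaled HFS+) could trigger the warning popup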
