
Connecting Ollama to Pot fails with "Test failed TypeError: Failed to fetch" #730

Open
BurningSky0306 opened this issue Mar 5, 2024 · 0 comments

Comments


Description

After starting Ollama, I can chat with the model directly in Windows PowerShell. I can also use the "Ollama-Logseq-Plugin" in Logseq to call the model for conversation, but I cannot add Ollama to Pot. I tried qwen:7b, mistral:7b, gemma:7b, and llama2:7b, and every one of them shows the same problem.

Ollama version: 0.1.28
CPU: AMD R5-5600H
GPU: NVIDIA GTX 1650

I checked the corresponding error output in Windows PowerShell; it is as follows:

PS C:\Users\Lenovo> ollama serve
time=2024-03-05T11:25:49.684+08:00 level=INFO source=images.go:710 msg="total blobs: 25"
time=2024-03-05T11:25:49.711+08:00 level=INFO source=images.go:717 msg="total unused blobs removed: 0"
time=2024-03-05T11:25:49.714+08:00 level=INFO source=routes.go:1021 msg="Listening on 127.0.0.1:11434 (version 0.1.28)"
time=2024-03-05T11:25:49.714+08:00 level=INFO source=payload_common.go:107 msg="Extracting dynamic libraries..."
time=2024-03-05T11:25:49.883+08:00 level=INFO source=payload_common.go:146 msg="Dynamic LLM libraries [cpu_avx cpu_avx2 cpu cuda_v11.3]"
[GIN] 2024/03/05 - 11:25:56 | 200 |     12.9131ms |       127.0.0.1 | GET      "/api/tags"
[GIN] 2024/03/05 - 11:25:57 | 200 |       9.464ms |       127.0.0.1 | GET      "/api/tags"
[GIN] 2024/03/05 - 11:25:57 | 200 |     10.3098ms |       127.0.0.1 | GET      "/api/tags"
[GIN] 2024/03/05 - 11:25:58 | 200 |     10.6555ms |       127.0.0.1 | GET      "/api/tags"
[GIN] 2024/03/05 - 11:25:58 | 200 |     10.2732ms |       127.0.0.1 | GET      "/api/tags"
time=2024-03-05T11:26:04.714+08:00 level=INFO source=gpu.go:94 msg="Detecting GPU type"
time=2024-03-05T11:26:04.714+08:00 level=INFO source=gpu.go:265 msg="Searching for GPU management library nvml.dll"
time=2024-03-05T11:26:04.722+08:00 level=INFO source=gpu.go:311 msg="Discovered GPU libraries: [c:\\Windows\\System32\\nvml.dll C:\\Windows\\system32\\nvml.dll]"
time=2024-03-05T11:26:05.548+08:00 level=INFO source=gpu.go:99 msg="Nvidia GPU detected"
time=2024-03-05T11:26:05.548+08:00 level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-03-05T11:26:05.583+08:00 level=INFO source=gpu.go:146 msg="CUDA Compute Capability detected: 7.5"
time=2024-03-05T11:26:05.583+08:00 level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-03-05T11:26:05.596+08:00 level=INFO source=gpu.go:146 msg="CUDA Compute Capability detected: 7.5"
time=2024-03-05T11:26:05.596+08:00 level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-03-05T11:26:05.596+08:00 level=INFO source=dyn_ext_server.go:385 msg="Updating PATH to C:\\Users\\Lenovo\\AppData\\Local\\Temp\\ollama868531629\\cuda_v11.3;C:\\Windows\\system32;C:\\Windows;C:\\Windows\\System32\\Wbem;C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\;C:\\Windows\\System32\\OpenSSH\\;C:\\Program Files (x86)\\NVIDIA Corporation\\PhysX\\Common;C:\\Program Files\\NVIDIA Corporation\\NVIDIA NvDLISR;C:\\Program Files\\dotnet\\;D:\\MPlusEMM\\;D:\\MATLAB2023aEMM\\bin;C:\\Users\\Lenovo\\AppData\\Local\\Microsoft\\WindowsApps;D:\\ffmpegEMM\\ffmpeg\\bin;C:\\Users\\Lenovo\\AppData\\Local\\Pandoc\\;C:\\Users\\Lenovo\\AppData\\Local\\Programs\\Ollama"
loading library C:\Users\Lenovo\AppData\Local\Temp\ollama868531629\cuda_v11.3\ext_server.dll
time=2024-03-05T11:26:05.620+08:00 level=INFO source=dyn_ext_server.go:90 msg="Loading Dynamic llm server: C:\\Users\\Lenovo\\AppData\\Local\\Temp\\ollama868531629\\cuda_v11.3\\ext_server.dll"
time=2024-03-05T11:26:05.620+08:00 level=INFO source=dyn_ext_server.go:150 msg="Initializing llama server"
ggml_init_cublas: GGML_CUDA_FORCE_MMQ:   no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 CUDA devices:
  Device 0: GeForce GTX 1650, compute capability 7.5, VMM: yes
llama_model_loader: loaded meta data with 20 key-value pairs and 387 tensors from C:\Users\Lenovo\.ollama\models\blobs\sha256-87f26aae09c7f052de93ff98a2282f05822cc6de4af1a2a159c5bd1acbd10ec4 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = qwen2
llama_model_loader: - kv   1:                               general.name str              = Qwen2-beta-7B-Chat
llama_model_loader: - kv   2:                          qwen2.block_count u32              = 32
llama_model_loader: - kv   3:                       qwen2.context_length u32              = 32768
llama_model_loader: - kv   4:                     qwen2.embedding_length u32              = 4096
llama_model_loader: - kv   5:                  qwen2.feed_forward_length u32              = 11008
llama_model_loader: - kv   6:                 qwen2.attention.head_count u32              = 32
llama_model_loader: - kv   7:              qwen2.attention.head_count_kv u32              = 32
llama_model_loader: - kv   8:     qwen2.attention.layer_norm_rms_epsilon f32              = 0.000001
llama_model_loader: - kv   9:                qwen2.use_parallel_residual bool             = true
llama_model_loader: - kv  10:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  11:                      tokenizer.ggml.tokens arr[str,151936]  = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  12:                  tokenizer.ggml.token_type arr[i32,151936]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  13:                      tokenizer.ggml.merges arr[str,151387]  = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
llama_model_loader: - kv  14:                tokenizer.ggml.eos_token_id u32              = 151643
llama_model_loader: - kv  15:            tokenizer.ggml.padding_token_id u32              = 151643
llama_model_loader: - kv  16:                tokenizer.ggml.bos_token_id u32              = 151643
llama_model_loader: - kv  17:                    tokenizer.chat_template str              = {% for message in messages %}{{'<|im_...
llama_model_loader: - kv  18:               general.quantization_version u32              = 2
llama_model_loader: - kv  19:                          general.file_type u32              = 2
llama_model_loader: - type  f32:  161 tensors
llama_model_loader: - type q4_0:  225 tensors
llama_model_loader: - type q6_K:    1 tensors
llm_load_vocab: special tokens definition check successful ( 293/151936 ).
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = qwen2
llm_load_print_meta: vocab type       = BPE
llm_load_print_meta: n_vocab          = 151936
llm_load_print_meta: n_merges         = 151387
llm_load_print_meta: n_ctx_train      = 32768
llm_load_print_meta: n_embd           = 4096
llm_load_print_meta: n_head           = 32
llm_load_print_meta: n_head_kv        = 32
llm_load_print_meta: n_layer          = 32
llm_load_print_meta: n_rot            = 128
llm_load_print_meta: n_embd_head_k    = 128
llm_load_print_meta: n_embd_head_v    = 128
llm_load_print_meta: n_gqa            = 1
llm_load_print_meta: n_embd_k_gqa     = 4096
llm_load_print_meta: n_embd_v_gqa     = 4096
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-06
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: n_ff             = 11008
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: pooling type     = 0
llm_load_print_meta: rope type        = 2
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx  = 32768
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: model type       = 7B
llm_load_print_meta: model ftype      = Q4_0
llm_load_print_meta: model params     = 7.72 B
llm_load_print_meta: model size       = 4.20 GiB (4.67 BPW)
llm_load_print_meta: general.name     = Qwen2-beta-7B-Chat
llm_load_print_meta: BOS token        = 151643 '<|endoftext|>'
llm_load_print_meta: EOS token        = 151643 '<|endoftext|>'
llm_load_print_meta: PAD token        = 151643 '<|endoftext|>'
llm_load_print_meta: LF token         = 30 '?'
llm_load_tensors: ggml ctx size =    0.30 MiB
llm_load_tensors: offloading 17 repeating layers to GPU
llm_load_tensors: offloaded 17/33 layers to GPU
llm_load_tensors:        CPU buffer size =  4297.21 MiB
llm_load_tensors:      CUDA0 buffer size =  1846.89 MiB
...................................................................................
llama_new_context_with_model: n_ctx      = 2048
llama_new_context_with_model: freq_base  = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init:  CUDA_Host KV buffer size =   480.00 MiB
llama_kv_cache_init:      CUDA0 KV buffer size =   544.00 MiB
llama_new_context_with_model: KV self size  = 1024.00 MiB, K (f16):  512.00 MiB, V (f16):  512.00 MiB
llama_new_context_with_model:  CUDA_Host input buffer size   =    13.02 MiB
llama_new_context_with_model:      CUDA0 compute buffer size =   148.00 MiB
llama_new_context_with_model:  CUDA_Host compute buffer size =   320.75 MiB
llama_new_context_with_model: graph splits (measure): 3
{"function":"initialize","level":"INFO","line":433,"msg":"initializing slots","n_slots":1,"tid":"20388","timestamp":1709609168}
{"function":"initialize","level":"INFO","line":445,"msg":"new slot","n_ctx_slot":2048,"slot_id":0,"tid":"20388","timestamp":1709609168}
time=2024-03-05T11:26:08.854+08:00 level=INFO source=dyn_ext_server.go:161 msg="Starting llama main loop"
{"function":"update_slots","level":"INFO","line":1565,"msg":"all slots are idle and system prompt is empty, clear the KV cache","tid":"21248","timestamp":1709609168}
{"function":"launch_slot_with_data","level":"INFO","line":826,"msg":"slot is processing task","slot_id":0,"task_id":0,"tid":"21248","timestamp":1709609168}
{"function":"update_slots","level":"INFO","line":1801,"msg":"slot progression","n_past":0,"n_prompt_tokens_processed":67,"slot_id":0,"task_id":0,"tid":"21248","timestamp":1709609168}
{"function":"update_slots","level":"INFO","line":1825,"msg":"kv cache rm [p0, end)","p0":0,"slot_id":0,"task_id":0,"tid":"21248","timestamp":1709609168}
CUDA error: out of memory
  current device: 0, in function ggml_cuda_pool_malloc_vmm at C:\Users\jmorg\git\ollama\llm\llama.cpp\ggml-cuda.cu:8583
  cuMemCreate(&handle, reserve_size, &prop, 0)
GGML_ASSERT: C:\Users\jmorg\git\ollama\llm\llama.cpp\ggml-cuda.cu:256: !"CUDA error"
PS C:\Users\Lenovo>

After starting the Ollama service in Windows PowerShell, I tried to connect from Pot, and the "CUDA error: out of memory" error appeared. I searched Ollama's issues about this problem and found that someone said it went away after updating Ollama to 0.1.28. After updating to 0.1.28, the problem still occurs when connecting Ollama to Pot, but it does not occur when connecting Ollama to Logseq through the "Ollama-Logseq-Plugin".
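
To separate the two failures, the generation request that a client such as Pot triggers can be reproduced outside of Pot with a plain HTTP call to the local Ollama endpoint. This is only a minimal sketch using the Python standard library; the model name and prompt are placeholders. If this call alone makes ollama serve crash with "CUDA error: out of memory", the crash is independent of Pot:

# Minimal sketch: send a single non-streaming generation request to the local
# Ollama server, to check whether the CUDA out-of-memory crash is triggered by
# any client, not just Pot. Assumes the default 127.0.0.1:11434 address.
import json
import urllib.request

payload = {
    "model": "qwen:7b",   # any locally pulled model
    "prompt": "hello",    # placeholder prompt
    "stream": False,      # ask for one JSON response instead of a stream
}

req = urllib.request.Request(
    "http://127.0.0.1:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

try:
    with urllib.request.urlopen(req, timeout=120) as resp:
        print(json.loads(resp.read())["response"])
except Exception as exc:
    # If ollama serve has crashed with "CUDA error: out of memory",
    # this request fails in much the same way Pot's connection test does.
    print("request failed:", exc)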

Reproduction

1. Start Ollama from Windows PowerShell or via the shortcut
2. Open Pot
3. Open the service settings
4. Add the built-in "Ollama" service
5. Set the model to "qwen:7b" (I have already downloaded this model)
6. Pot is able to detect Ollama
(screenshot: Pot detecting the local Ollama service)
7. Click "Save"
8. The error "Test failed TypeError: Failed to fetch" appears (a manual check of the same endpoint is sketched below)
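
Before retrying the save step, it can help to confirm from the same machine that the endpoint Pot queries actually responds. The ollama serve log above shows the client hitting GET /api/tags; the sketch below repeats that check using only the Python standard library, assuming the default 127.0.0.1:11434 address:

# Minimal sketch: list locally installed models via GET /api/tags, the same
# request that shows up as 200 responses in the ollama serve log above.
# Assumes Ollama is listening on the default 127.0.0.1:11434.
import json
import urllib.request

with urllib.request.urlopen("http://127.0.0.1:11434/api/tags", timeout=10) as resp:
    tags = json.loads(resp.read())

for model in tags.get("models", []):
    print(model["name"])  # e.g. qwen:7b, mistral:7b, gemma:7b, llama2:7b

If this listing works but Pot's test still fails with "Failed to fetch", the problem is likely in the request Pot sends after detection (the generation call that triggers the CUDA out-of-memory crash above) rather than in reaching the server.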

Platform

Windows

System Version

Windows 10 Home Chinese Edition, 22H2

Window System (Linux Only)

None

Software Version

Pot 2.7.9

Log File

No response

Additional Information

In fact, when I first updated Pot to 2.7.9, installed Ollama for the first time, and downloaded the gemma:2b model through Pot, I was able to connect Ollama to Pot and use the model for translation in Pot without any problems. After I later installed more models myself from the command line in Windows PowerShell, I could no longer connect any additional models to Pot. I tried reinstalling Ollama and re-adding the built-in "Ollama" service in Pot, but since then I have not been able to connect Ollama to Pot at all.
In Logseq I can call the model for conversation through the "Ollama-Logseq-Plugin"; this is the plugin's GitHub page: Ollama-Logseq-Plugin
