Feature Request: Performance enhancement in ARM CPU (Kunpeng 920)

Feature Description
llama.cpp runs very slowly on an ARM CPU (Kunpeng 920).
I pulled the Docker image for ARM and ran an instance pinned to 4 cores on the same NUMA node.
commands I use:
CPU info:
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture: aarch64
CPU op-mode(s): 64-bit
Byte Order: Little Endian
CPU(s): 256
On-line CPU(s) list: 0-255
Vendor ID: HiSilicon
Model name: Kunpeng-920
Model: 0
Thread(s) per core: 1
Core(s) per cluster: 64
Socket(s): -
Cluster(s): 4
Stepping: 0x1
Frequency boost: disabled
CPU max MHz: 2600.0000
CPU min MHz: 200.0000
BogoMIPS: 200.00
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma dcpop asimddp asimdfhm ssbs
L1d cache: 16 MiB (256 instances)
L1i cache: 16 MiB (256 instances)
L2 cache: 128 MiB (256 instances)
L3 cache: 256 MiB (8 instances)
NUMA node(s): 8
NUMA node0 CPU(s): 0-31
NUMA node1 CPU(s): 32-63
NUMA node2 CPU(s): 64-95
NUMA node3 CPU(s): 96-127
NUMA node4 CPU(s): 128-159
NUMA node5 CPU(s): 160-191
NUMA node6 CPU(s): 192-223
NUMA node7 CPU(s): 224-255
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Retbleed: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; __user pointer sanitization
Vulnerability Spectre v2: Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] pyzmq==26.2.0
[pip3] torch==2.5.1
[pip3] torchvision==0.20.1
[pip3] transformers==4.46.3
[conda] Could not collect
ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: 0.6.4.post2.dev152+g1f6584ee.d20241127
vLLM Build Flags:
CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology:
Could not collect
VLLM_CPU_KVCACHE_SPACE=1
LD_LIBRARY_PATH=/usr/local/lib/python3.10/dist-packages/cv2/../../lib64:
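(The "commands I use" block above was not captured in this report, so the following is only an illustrative sketch of how 4 cores on a single NUMA node could be pinned when starting the ARM server image; the image tag and model path are placeholders, not the reporter's actual invocation. Per the lscpu output, NUMA node0 covers CPUs 0-31, so cores 0-3 all sit on one node.)

# Illustrative only; image tag and model path are placeholders.
# --cpuset-cpus keeps all threads on node0, --cpuset-mems keeps allocations on node0's memory.
docker run --rm -p 8080:8080 \
  --cpuset-cpus=0-3 --cpuset-mems=0 \
  -v /models:/models \
  ghcr.io/ggerganov/llama.cpp:server \
  -m /models/model.gguf -t 4 -c 4096 --host 0.0.0.0 --port 8080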
result:
slot print_timing: id 0 | task 60 |
prompt eval time = 3636.68 ms / 126 tokens ( 28.86 ms per token, 34.65 tokens per second)
eval time = 4827.58 ms / 10 tokens ( 482.76 ms per token, 2.07 tokens per second)
total time = 8464.26 ms / 136 tokens
slot launch_slot_: id 0 | task 71 | processing task
slot update_slots: id 0 | task 71 | new prompt, n_ctx_slot = 4096, n_keep = 0, n_prompt_tokens = 123
slot update_slots: id 0 | task 71 | kv cache rm [0, end)
slot update_slots: id 0 | task 71 | prompt processing progress, n_past = 123, n_tokens = 123, progress = 1.000000
slot update_slots: id 0 | task 71 | prompt done, n_past = 123, n_tokens = 123
slot release: id 0 | task 71 | stop processing: n_past = 132, truncated = 0
slot print_timing: id 0 | task 71 |
prompt eval time = 3569.76 ms / 123 tokens ( 29.02 ms per token, 34.46 tokens per second)
eval time = 4830.39 ms / 10 tokens ( 483.04 ms per token, 2.07 tokens per second)
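(For reference, the eval line above works out to 10 tokens / 4.83 s ≈ 2.07 tokens/s of generation, versus the ~10 tokens/s hoped for below. To separate the build/threading behaviour from the serving path, llama.cpp also ships a llama-bench tool that reports prompt-processing and generation tokens/s directly; the model path, thread count, and token counts below are placeholders, not taken from this report.)

# Hypothetical invocation; model path is a placeholder.
./llama-bench -m /models/model.gguf -t 4 -p 128 -n 32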
cpu usage:
CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS
6e0062c2574d nice_banzai 255.28% 158MiB / 61.34GiB 0.25% 0B / 0B 0B / 0B 513
Motivation
Hope llama.cpp can run with high performance, like 10 tokens/s :)
Possible Implementation
No response
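(Not part of the original report, but as a hedged sketch of one possible direction: the lscpu flags above include asimddp and asimdhp, i.e. the ARM dot-product and FP16 extensions, so building llama.cpp natively on the Kunpeng 920 rather than using a generic prebuilt ARM image lets GGML target those instructions. Paths and the job count below are illustrative.)

# Illustrative build sketch; assumes GCC/CMake are available on the host.
# GGML_NATIVE targets the local CPU (-march=native), picking up the
# dotprod/fp16 extensions reported by lscpu above.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build -DCMAKE_BUILD_TYPE=Release -DGGML_NATIVE=ON
cmake --build build --config Release -j 8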