
[Question] The locally deployed deepseek-v3 loses 5 points compared to the API #212

Open
Wen1163204547 opened this issue Jan 3, 2025 · 7 comments

Comments

@Wen1163204547

I deployed deepseek-v3 locally on 8x H20 and tested LiveBench-0831 with temperature=0 and no system prompt. The result shows a 5-point drop compared to the API. Is this released model the same as the one behind the API?
[screenshot: LiveBench-0831 score comparison]
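For context, requests matching the described setup (temperature=0, no system prompt) against a local vLLM deployment would look roughly like this. This is a minimal sketch, assuming vLLM's OpenAI-compatible server on its default local endpoint; the model name and prompt are placeholders, not the actual benchmark harness:

```python
from openai import OpenAI

# vLLM exposes an OpenAI-compatible API; the base_url and api_key below are
# the usual defaults for a local deployment (placeholders, not the
# commenter's actual setup).
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="deepseek-v3",  # placeholder served-model name
    messages=[
        # no system message, matching the test setup
        {"role": "user", "content": "..."},
    ],
    temperature=0,  # greedy decoding, as in the benchmark run
)
print(resp.choices[0].message.content)
```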

@GeeeekExplorer
Contributor

Which inference engine are you using in the local deployment?

@Wen1163204547
Author

@GeeeekExplorer vllm-0.6.6.post1

@qq1469617613

Could you share a deployment guide for the local setup? I'd also like to try deploying it locally.

@Wen1163204547
Author

> Could you share a deployment guide for the local setup? I'd also like to try deploying it locally.

I've done some custom integration, so I can only share the parameter configuration:
max_num_seqs: 32
quantization: fp8
max_model_len: 9000
trust_remote_code: true
tensor_parallel_size: 8
enable_chunked_prefill: true
gpu_memory_utilization: 0.98
max_num_batched_tokens: 1024
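
For reference, these settings map directly onto vLLM's engine arguments. A minimal offline-inference sketch, assuming vLLM 0.6.x and a placeholder model path:

```python
from vllm import LLM, SamplingParams

# Engine arguments mirror the configuration listed above;
# the model path is a placeholder.
llm = LLM(
    model="/path/to/DeepSeek-V3",
    quantization="fp8",
    tensor_parallel_size=8,
    max_model_len=9000,
    max_num_seqs=32,
    max_num_batched_tokens=1024,
    enable_chunked_prefill=True,
    gpu_memory_utilization=0.98,
    trust_remote_code=True,
)

# temperature=0 matches the benchmark setup described in this thread.
outputs = llm.generate(["..."], SamplingParams(temperature=0, max_tokens=256))
print(outputs[0].outputs[0].text)
```

The same options can be passed as flags to the OpenAI-compatible server (e.g. `--tensor-parallel-size 8 --quantization fp8`), which is the more common way to run an evaluation client against the model.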

@chenatu

chenatu commented Jan 8, 2025

How many tokens per second do you get with 8x H20?

@Wen1163204547
Author

@GeeeekExplorer I tried vllm-0.6.6.post1 and vllm-0.6.6; both score around 53, while the API reaches about 57. Is the API serving the same model as the open-source release?

@Wen1163204547
Author

> How many tokens per second do you get with 8x H20?

Use two H100 machines instead; they're much faster than the H20.
