Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Gnark MSM is slow #107

Open
doutv opened this issue Nov 5, 2024 · 1 comment
Open

Gnark MSM is slow #107

doutv opened this issue Nov 5, 2024 · 1 comment

Comments

@doutv
Copy link
Collaborator

doutv commented Nov 5, 2024

Settings:
10 signatures, 297w constraints
doutv/gnark@8d84de2
#101

CPU Parallel MSM

2200ms
MSM: 80% time
14:39:12 DBG computed H acceleration=none backend=groth16 computeH took=380.517423 curve=bn254 nbConstraints=2976099
14:39:13 DBG ar.MultiExp done MSMG1 2322982 took=735.652846 acceleration=none backend=groth16 curve=bn254 nbConstraints=2976099
14:39:14 DBG bs1.MultiExp done MSMG1 4156677 took=1158.065916 acceleration=none backend=groth16 curve=bn254 nbConstraints=2976099
14:39:14 DBG Bs.MultiExp done MSMG2 4156677 took=1377.753998 acceleration=none backend=groth16 curve=bn254 nbConstraints=2976099
14:39:14 DBG krs2.MultiExp done MSMG1 4194303 took=1364.779481 acceleration=none backend=groth16 curve=bn254 nbConstraints=2976099
14:39:14 DBG krs.MultiExp done MSMG1 2508595 took=947.443756 acceleration=none backend=groth16 curve=bn254 nbConstraints=2976099
14:39:14 DBG prover done acceleration=none backend=groth16 curve=bn254 nbConstraints=2976099 took=2187.031405

GPU

Possible direction: Learn from https://github.com/cysic-labs/ZPrize-23-Prize1/tree/main

Parallel MSM

~1900ms
14:40:36 DBG computed H acceleration=zeknox backend=groth16 curve=bn254 nbConstraints=2976099 took=420.794825
14:40:36 DBG CPU krs done MSMG1 4900423 took=340.924428 acceleration=zeknox backend=groth16 curve=bn254 nbConstraints=2976099
14:40:37 DBG ar done MSMG1 2322982 took=1340.610384 acceleration=zeknox backend=groth16 curve=bn254 nbConstraints=2976099
14:40:37 DBG bs1 done MSMG1 4156677 took=1322.286289 acceleration=zeknox backend=groth16 curve=bn254 nbConstraints=2976099
14:40:37 DBG Bs done MSMG2 4156677 took=1374.595625 acceleration=zeknox backend=groth16 curve=bn254 nbConstraints=2976099
14:40:37 DBG krs2 done MSMG1 4194303 took=91.539657 acceleration=zeknox backend=groth16 curve=bn254 nbConstraints=2976099
14:40:37 DBG prover done acceleration=zeknox backend=groth16 curve=bn254 nbConstraints=2976099 took=1873.277231

Serial MSM

~2300ms
14:59:21 DBG constraint system solver done nbConstraints=2976099 took=2367.610769
14:59:22 DBG ar done MSMG1 2322982 took=77.610163 acceleration=zeknox backend=groth16 curve=bn254 nbConstraints=2976099
14:59:22 DBG bs1 done MSMG1 4156677 took=111.784353 acceleration=zeknox backend=groth16 curve=bn254 nbConstraints=2976099
14:59:22 DBG CPU krs done MSMG1 4900423 took=407.84321 acceleration=zeknox backend=groth16 curve=bn254 nbConstraints=2976099
14:59:22 DBG computed H acceleration=zeknox backend=groth16 curve=bn254 nbConstraints=2976099 took=434.042317
14:59:22 DBG krs2 done MSMG1 4194303 took=33.391437 acceleration=zeknox backend=groth16 curve=bn254 nbConstraints=2976099
14:59:24 DBG Bs done MSMG2 4156677 took=1187.116359 acceleration=zeknox backend=groth16 curve=bn254 nbConstraints=2976099
14:59:24 DBG prover done acceleration=zeknox backend=groth16 curve=bn254 nbConstraints=2976099 took=2298.206923
Bs MSMG2 too slow

@doutv
Copy link
Collaborator Author

doutv commented Nov 5, 2024

Real MSM in Gnark prove is much slower than mock data benchmark

func BenchmarkMsmG1(b *testing.B) {

Benchmark MSM G1 size: 12 million
Device input without data copy: 94ms
img_v3_02fv_b2aa1658-3d28-4601-b967-6aac54dd146h

Real MSM G1 size: 4 million
Device input without data copy: 407ms

14:59:22 DBG CPU krs done MSMG1 4900423 took=407.84321 acceleration=zeknox backend=groth16 curve=bn254 nbConstraints=2976099

G2 is similar, real MSM G2 is much slower than benchmarkG2

Potential reason

mock data: scalars are random generated, they are uniformly distributed over the domain
real MSM: scalars are unevenly distributed. Some ranges are crowded -> Some MSM buckets with too many scalars -> Some CUDA threads are overloaded, computation is not evenly distributed across all threads

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant