compute `fft` on multiple cores #1

kwanCCC · 2022-05-05T14:24:29Z

Benchmark for proof generation

64 cores
AMD EPYC 7542 32-Core Processor
Linux AMD-10-10-17-3 5.4.0-26-generic Ubuntu SMP x86_64 GNU/Linux

| Number of gates | Serial | Parallel |
|---|---|---|
| 32 | [35.099 ms 35.177 ms 35.261 ms] | [79.786 ms 79.922 ms 80.079 ms] |
| 64 | [44.420 ms 44.566 ms 44.709 ms] | [135.46 ms 135.71 ms 135.87 ms] |
| 128 | [72.038 ms 72.250 ms 72.458 ms] | [162.33 ms 162.57 ms 162.90 ms] |
| 256 | [120.19 ms 120.50 ms 120.80 ms] | [207.36 ms 207.65 ms 207.90 ms] |
| 512 | [177.21 ms 177.69 ms 178.04 ms] | [255.12 ms 255.95 ms 256.74 ms] |
| 1024 | [318.83 ms 319.37 ms 319.88 ms] | [373.69 ms 374.70 ms 375.35 ms] |
| 2048 | [561.37 ms 563.03 ms 565.32 ms] | [564.12 ms 565.35 ms 566.25 ms] |
| 4096 | [939.41 ms 941.26 ms 944.20 ms] | [840.42 ms 842.35 ms 843.70 ms] |
| 8192 | [1.8037 s 1.8067 s 1.8101 s] | [1.4302 s 1.4327 s 1.4344 s] |
| 32768 | [7.4818 s 7.4935 s 7.5064 s] | [5.2018 s 5.2176 s 5.2342 s] |
| 65536 | [16.076 s 16.151 s 16.219 s] | [10.904 s 10.976 s 11.045 s] |
| 131072 | [38.577 s 38.758 s 38.927 s] | [27.012 s 27.167 s 27.336 s] |

summary:

When the number of gates is 2048 or lower, the serial version is as fast as or faster than the parallel version, because the thread pool is created and destroyed too many times.
As the number of gates increases beyond that, the parallel version becomes faster, reaching roughly 1.4 to 1.5 times the serial performance from 32768 gates upward.
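
One common way to avoid the small-input penalty described in the summary is to fall back to a plain loop below a size cut-off. The following is a minimal sketch of that idea, not the code in this PR: `PARALLEL_THRESHOLD`, `scale`, and `scale_all` are hypothetical names, `u64` stands in for field elements, and `std::thread::scope` stands in for whatever thread pool the implementation actually uses.

```rust
use std::thread;

/// Hypothetical cut-off: below this many elements, stay on one thread.
/// 2048 mirrors the crossover seen in the benchmark, not a value from the code.
const PARALLEL_THRESHOLD: usize = 2048;

/// Stand-in for a per-element field operation (assumption: the real code
/// works on scalar field elements, not u64).
fn scale(x: u64, k: u64) -> u64 {
    x.wrapping_mul(k)
}

/// Multiply every element by `k`, going parallel only for large inputs.
fn scale_all(values: &mut [u64], k: u64, workers: usize) {
    if values.len() < PARALLEL_THRESHOLD || workers <= 1 {
        // Small input: thread start-up would cost more than the work itself.
        for v in values.iter_mut() {
            *v = scale(*v, k);
        }
        return;
    }
    // Large input: split into one contiguous chunk per worker.
    let chunk_len = (values.len() + workers - 1) / workers;
    thread::scope(|s| {
        for chunk in values.chunks_mut(chunk_len) {
            s.spawn(move || {
                for v in chunk.iter_mut() {
                    *v = scale(*v, k);
                }
            });
        }
    });
}

fn main() {
    let mut small: Vec<u64> = (0..1024u64).collect();
    let mut large: Vec<u64> = (0..65_536u64).collect();
    scale_all(&mut small, 3, 8); // takes the serial path
    scale_all(&mut large, 3, 8); // takes the parallel path
    println!("small[10] = {}, large[10] = {}", small[10], large[10]);
}
```

The right cut-off depends on the machine and on the cost of each element operation, so it would need to be measured rather than copied from this sketch.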

keyvank · 2022-05-05T23:18:31Z

Beautifully written. Thank you! Though the speedup is much lower than what I expected. Only 1.5 times for a 64-core CPU? :( Feel free to merge yourself. LGTM 👍

keyvank · 2022-05-05T23:26:21Z

Ah, this is for the whole proof. I thought it's only for fft! That's a good speedup for a start :)
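
To put the whole-proof speedup in perspective, here is a rough sketch that recomputes the ratios from the median timings in the benchmark and, under Amdahl's law, estimates what share of the serial runtime would have to be parallelised to produce them. The assumption of 64 ideally scaling workers is mine, and the model ignores thread overhead and memory bandwidth, so the fraction is only an order-of-magnitude guide.

```rust
/// Median timings in seconds copied from the benchmark: (gates, serial, parallel).
const MEDIANS: [(u32, f64, f64); 4] = [
    (8192, 1.8067, 1.4327),
    (32768, 7.4935, 5.2176),
    (65536, 16.151, 10.976),
    (131072, 38.758, 27.167),
];

/// Speedup of the parallel run over the serial run.
fn speedup(serial: f64, parallel: f64) -> f64 {
    serial / parallel
}

/// Fraction p of the serial runtime that is parallelised, solved from
/// Amdahl's law S = 1 / ((1 - p) + p / n) for speedup `s` on `n` workers.
fn parallel_fraction(s: f64, n: f64) -> f64 {
    (1.0 - 1.0 / s) * n / (n - 1.0)
}

fn main() {
    for (gates, serial, parallel) in MEDIANS {
        let s = speedup(serial, parallel);
        // 64.0 is an assumed worker count taken from the "64 cores" note.
        let p = parallel_fraction(s, 64.0);
        println!("{gates:>6} gates: speedup {s:.2}x, implied parallel share {:.0}%", p * 100.0);
    }
}
```

Under these assumptions a speedup near 1.4x corresponds to only about 30% of the proof time running in parallel, which fits the observation that the figure covers the whole proof rather than the FFT alone.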

kwanCCC · 2022-05-06T02:02:47Z

Thanks man. Yep, this is used as a generally applicable optimization for the whole proof. As far as I know, there are three PLONKish implementations: dusk-network, bellman, and the newest, halo2. They differ not only in implementation but also in performance. For the current repo, I think there are some further optimizations, including the GPU version like you said and making a gadget for aggregation of proofs. I will keep working on this :)

keyvank · 2022-05-06T08:01:47Z

@kwanCCC Great job! Thanks man. Can you also check if different parts of the proof generation process can be done in parallel with each other? This also helps: dusk-network#680
Feel free to merge this yourself. Also, can you keep it rebased on the original dusk-plonk repo?
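
As an illustration of running independent parts of proof generation side by side, the sketch below commits to three polynomials concurrently. It is a toy under stated assumptions: `commit` and `commit_all` are hypothetical names, the hash-like fold stands in for a real polynomial commitment, and which prover steps are truly independent would have to be checked against the actual code.

```rust
use std::thread;

/// Stand-in for an expensive, independent step such as committing to one
/// polynomial (hypothetical: the real step works on curve points, not u64).
fn commit(poly: &[u64]) -> u64 {
    poly.iter()
        .fold(0u64, |acc, c| acc.wrapping_mul(31).wrapping_add(*c))
}

/// Commit to three polynomials concurrently; none needs another's result.
fn commit_all(a: &[u64], b: &[u64], c: &[u64]) -> (u64, u64, u64) {
    thread::scope(|s| {
        let ha = s.spawn(|| commit(a));
        let hb = s.spawn(|| commit(b));
        // Run the third commitment on the current thread instead of idling.
        let cc = commit(c);
        (ha.join().unwrap(), hb.join().unwrap(), cc)
    })
}

fn main() {
    let a: Vec<u64> = (0..1000u64).collect();
    let b: Vec<u64> = (1000..2000u64).collect();
    let c: Vec<u64> = (2000..3000u64).collect();
    let (ca, cb, cc) = commit_all(&a, &b, &c);
    println!("commitments: {ca} {cb} {cc}");
}
```

Steps that consume a challenge derived from earlier commitments cannot be overlapped this way, so the gain is limited to work inside a single round or to work that does not depend on the transcript.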

Optimize PLONK

kwanCCC added 2 commits May 5, 2022 17:23

- compute fft on multiple cores (6a988ee)
- ignore .idea in .gitignore (f4206d3)

kwanCCC requested a review from keyvank May 5, 2022 14:24

kwanCCC self-assigned this May 5, 2022

kwanCCC and others added 7 commits May 18, 2022 15:29

- gpu kit (807a5f4)
- concurrent computation on GPUs (be64651)
- Optimize PLONK (40396d1)
- Optimize PLONK (e34b015)
- optimize: preprocess, round 1, round2, round3, round4 (50daff2)
- optimize: round 6 (ea81f54)
- Merge pull request #2 from zeeka-network/feature/gpu (3f2e1d6)
