compute fft on multiple cores #1

Open
wants to merge 9 commits into
base: master


@kwanCCC kwanCCC commented May 5, 2022

benchmark for making proof

64 cores
AMD EPYC 7542 32-Core Processor
Linux AMD-10-10-17-3 5.4.0-26-generic Ubuntu SMP x86_64 GNU/Linux

| Number of gates | Serial | Parallel |
|---|---|---|
| 32 | [35.099 ms 35.177 ms 35.261 ms] | [79.786 ms 79.922 ms 80.079 ms] |
| 64 | [44.420 ms 44.566 ms 44.709 ms] | [135.46 ms 135.71 ms 135.87 ms] |
| 128 | [72.038 ms 72.250 ms 72.458 ms] | [162.33 ms 162.57 ms 162.90 ms] |
| 256 | [120.19 ms 120.50 ms 120.80 ms] | [207.36 ms 207.65 ms 207.90 ms] |
| 512 | [177.21 ms 177.69 ms 178.04 ms] | [255.12 ms 255.95 ms 256.74 ms] |
| 1024 | [318.83 ms 319.37 ms 319.88 ms] | [373.69 ms 374.70 ms 375.35 ms] |
| 2048 | [561.37 ms 563.03 ms 565.32 ms] | [564.12 ms 565.35 ms 566.25 ms] |
| 4096 | [939.41 ms 941.26 ms 944.20 ms] | [840.42 ms 842.35 ms 843.70 ms] |
| 8192 | [1.8037 s 1.8067 s 1.8101 s] | [1.4302 s 1.4327 s 1.4344 s] |
| 32768 | [7.4818 s 7.4935 s 7.5064 s] | [5.2018 s 5.2176 s 5.2342 s] |
| 65536 | [16.076 s 16.151 s 16.219 s] | [10.904 s 10.976 s 11.045 s] |
| 131072 | [38.577 s 38.758 s 38.927 s] | [27.012 s 27.167 s 27.336 s] |
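The speedup at each size can be read off the measurements as serial time divided by parallel time. The following is a minimal sketch of that arithmetic, using the middle value of a few rows; the constant and function names are made up for illustration and are not part of this PR.

```rust
/// Middle values copied from the benchmark rows, converted to seconds.
/// Each entry is (number of gates, serial, parallel).
const MEASUREMENTS: [(u32, f64, f64); 6] = [
    (1024, 0.31937, 0.37470),
    (2048, 0.56303, 0.56535),
    (4096, 0.94126, 0.84235),
    (32768, 7.4935, 5.2176),
    (65536, 16.151, 10.976),
    (131072, 38.758, 27.167),
];

/// Speedup of the parallel run over the serial run.
/// Values above 1.0 mean the parallel run was faster.
fn speedup(serial: f64, parallel: f64) -> f64 {
    serial / parallel
}

fn main() {
    for (gates, serial, parallel) in MEASUREMENTS {
        println!("{gates:>7} gates: {:.2}x", speedup(serial, parallel));
    }
}
```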

summary:

  1. When the number of gates is below 2048, the serial version is faster than the parallel version, because the thread pool is created and destroyed too many times. At 2048 gates the two are roughly equal.
  2. As the number of gates grows, the parallel version pulls ahead. From 32768 gates upward it is roughly 1.4 to 1.5 times as fast as the serial version.
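One common way to avoid paying the threading overhead on small inputs is to fall back to a serial loop below a size cutoff. The sketch below is only an illustration of that idea, not the code in this PR: `scale_in_place`, `MIN_PARALLEL_LEN`, and the `u64` wrapping multiplication (standing in for field arithmetic) are all assumptions, and it spawns scoped threads per call with `std::thread::scope` rather than using a thread pool.

```rust
use std::thread;

/// Hypothetical cutoff: below this many elements, stay on one thread.
/// The right value depends on the workload and should be measured.
const MIN_PARALLEL_LEN: usize = 2048;

/// Multiply every element by `factor`, in parallel only when the slice is large.
/// Wrapping u64 multiplication stands in for field multiplication here.
fn scale_in_place(values: &mut [u64], factor: u64, workers: usize) {
    if values.len() < MIN_PARALLEL_LEN || workers <= 1 {
        // Small input: starting threads would cost more than the work itself.
        for v in values.iter_mut() {
            *v = v.wrapping_mul(factor);
        }
        return;
    }
    // Large input: hand one contiguous chunk to each worker thread.
    let chunk_len = (values.len() + workers - 1) / workers;
    thread::scope(|scope| {
        for chunk in values.chunks_mut(chunk_len) {
            scope.spawn(move || {
                for v in chunk.iter_mut() {
                    *v = v.wrapping_mul(factor);
                }
            });
        }
    });
}

fn main() {
    let workers = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    let mut small: Vec<u64> = (0..16).collect();
    let mut large: Vec<u64> = (0..1 << 16).collect();
    scale_in_place(&mut small, 3, workers); // takes the serial path
    scale_in_place(&mut large, 3, workers); // takes the parallel path
    assert_eq!(small[5], 15);
    assert_eq!(large[40_000], 120_000);
}
```

Both paths produce the same result, so the cutoff can be tuned from benchmarks without changing behavior.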

@kwanCCC kwanCCC requested a review from keyvank May 5, 2022 14:24
@kwanCCC kwanCCC self-assigned this May 5, 2022
Member

keyvank commented May 5, 2022

Beautifully written. Thank you! Though the speedup is much lower than what I expected. Only 1.5 times for a 64-core CPU? :( Feel free to merge yourself. LGTM 👍

Member

keyvank commented May 5, 2022

Ah, this is for the whole proof. I thought it was only for the FFT! That’s a good speedup for a start :)
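A rough way to see why a whole-proof speedup of about 1.5 on 64 cores is plausible is Amdahl's law: if only a fraction of the total runtime is parallelized, the serial remainder caps the overall gain. The sketch below is an idealized model that ignores threading overhead; the function names are made up for illustration, and the source does not state how much of the proof time the FFT actually accounts for.

```rust
/// Amdahl's law: overall speedup when a fraction `p` of the runtime
/// is spread perfectly over `n` cores and the rest stays serial.
fn amdahl_speedup(p: f64, n: f64) -> f64 {
    1.0 / ((1.0 - p) + p / n)
}

/// Inverse of the formula above: the parallel fraction implied by
/// an observed overall speedup `s` on `n` cores.
fn implied_parallel_fraction(s: f64, n: f64) -> f64 {
    (1.0 - 1.0 / s) / (1.0 - 1.0 / n)
}

fn main() {
    let p = implied_parallel_fraction(1.5, 64.0);
    println!("implied parallel fraction: {:.2}", p);
    println!("speedup limit with unlimited cores: {:.2}", 1.0 / (1.0 - p));
    // Round trip: plugging the fraction back in recovers the observed speedup.
    assert!((amdahl_speedup(p, 64.0) - 1.5).abs() < 1e-9);
}
```

Under this model, a speedup of 1.5 on 64 cores corresponds to roughly a third of the runtime being parallelized.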

Author

kwanCCC commented May 6, 2022

Thanks man. Yep, this is a general optimization that applies to the whole proof. As far as I know, there are three PLONKish implementations: dusk-network, bellman, and the newest, halo2. They differ not only in implementation but also in performance. For the current repo, I think there are some further optimizations, including the GPU version you mentioned and a gadget for proof aggregation. I will keep working on this :)

Member

keyvank commented May 6, 2022

@kwanCCC Great job! Thanks man. Can you also check whether different parts of the proof generation process can run in parallel with each other? This also helps: dusk-network#680
Feel free to merge this yourself. Also, can you keep it rebased on the original dusk-plonk repo?
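Running independent parts of proof generation side by side could look something like the sketch below. It is only an illustration: `stage_a`, `stage_b`, and `run_stages` are hypothetical stand-ins, not functions from dusk-plonk, and the approach only applies to steps that do not depend on each other's output.

```rust
use std::thread;

// Hypothetical stand-ins for two proof-generation steps that read the
// same input but do not depend on each other's output.
fn stage_a(input: &[u64]) -> u64 {
    input.iter().fold(0u64, |acc, x| acc.wrapping_add(*x))
}

fn stage_b(input: &[u64]) -> u64 {
    input.iter().filter(|x| **x % 2 == 0).count() as u64
}

/// Run both stages concurrently and return their results together.
fn run_stages(input: &[u64]) -> (u64, u64) {
    thread::scope(|scope| {
        let handle_a = scope.spawn(|| stage_a(input));
        // Stage B runs on the current thread while stage A runs on the spawned one.
        let b = stage_b(input);
        let a = handle_a.join().expect("stage A panicked");
        (a, b)
    })
}

fn main() {
    let data: Vec<u64> = (1..=10).collect();
    let (a, b) = run_stages(&data);
    assert_eq!((a, b), (55, 5));
}
```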
