faster algorithm #15
@hdevalence hey, if you get some time, could you take a look at this? I think I applied everything we talked about, but afterwards I discovered how difficult it is to switch back and forth between the two scalar representations. The approach I came up with seems sound, but I'd appreciate another pair of eyeballs on it, along with any cleanups or better approaches you can think of. The big comment in

I'm bummed that a significant part of the search time (3.2us out of 3.4us) is spent converting the Edwards point (where we can use addition) to the Montgomery form (which is what gets base64-encoded). I was hoping that point addition would be the dominant factor, but it's only 0.2us. I can't think of any way around that, however.

@tarcieri, we discussed this approach a long time ago too; I'd love to hear your thoughts.
The basic operation used to take 17us/iter on my 2019 Mac mini (3.2GHz Core i7). The new approach takes 3.8us/iter.
refs #12
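For readers following along, the conversion being discussed can be sketched in a few lines. This is only an illustration of the standard Edwards-to-Montgomery map u = (1 + y) / (1 - y) over the field of size 2^255 - 19, not the code from this branch; the function names are made up for the example. The modular inversion is what makes the conversion so much more expensive than a point addition.

```python
P = 2**255 - 19  # prime of the Curve25519 / Ed25519 field


def edwards_y_to_montgomery_u(y: int) -> int:
    """Map an affine Edwards y-coordinate to the Montgomery u-coordinate."""
    # u = (1 + y) / (1 - y) mod P. The division is a field inversion,
    # done here with Fermat's little theorem (one big exponentiation).
    inv = pow((1 - y) % P, P - 2, P)
    return (1 + y) * inv % P


def montgomery_u_to_bytes(u: int) -> bytes:
    """Encode u as the 32 little-endian bytes that later get base64-encoded."""
    return u.to_bytes(32, "little")


# The Ed25519 base point has y = 4/5, which maps to the X25519 base point u = 9.
base_y = 4 * pow(5, P - 2, P) % P
base_u = edwards_y_to_montgomery_u(base_y)
```

With projective or extended coordinates the same map becomes u = (Z + Y) / (Z - Y), so one inversion per candidate key is still needed.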
Could we get this merged in?
I can't speak for anyone else, but there's no way I'm going to download this mysterious zip file with no explanation or anything, from someone who isn't already a part of the existing issue. If you want to help, please explain what you have as a solution, and link to source code we can review instead of a black box zip file.
I made a statically linked build, running in a Docker container, that targets this branch (source code). Hopefully this is less sketchy ;)
I like how you sent it to VirusTotal. And what did you get? Nothing?
Hello, I've created a similar tool, https://github.com/AlexanderYastrebov/wireguard-vanity-key, based on your ideas from here 👍 To squeeze out the last drops of performance, I've eliminated allocations and adjusted the scalar only once, outside of the main search loop.
Indeed, my benchmark shows the same.
Looks like any offset may work, but I found that other offset values fail the clamping test much more often.
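To make the remark about offsets concrete, here is a small sketch. It assumes the "clamping test" means checking that a scalar is unchanged by the standard X25519 clamping rule (clear the low three bits, clear bit 255, set bit 254); `is_clamped` is a hypothetical helper, not a function from either tool. Under that assumption, stepping by 8 keeps the low three bits clear, while an odd step breaks them immediately.

```python
def clamp(key: bytes) -> bytes:
    """Apply the standard X25519 clamping to a 32-byte private key."""
    b = bytearray(key)
    b[0] &= 248    # clear the low three bits
    b[31] &= 127   # clear bit 255
    b[31] |= 64    # set bit 254
    return bytes(b)


def is_clamped(scalar: int) -> bool:
    """Hypothetical check: clamping the scalar's bytes leaves them unchanged."""
    raw = scalar.to_bytes(32, "little")
    return clamp(raw) == raw


# Start from an arbitrary clamped scalar and try two different offsets.
start = int.from_bytes(clamp(bytes(range(32))), "little")
step_by_8_ok = is_clamped(start + 8)   # low three bits stay zero
step_by_1_ok = is_clamped(start + 1)   # low bits are now 001
```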
Hello, I found a way to speed up Montgomery bytes encoding using vector division; see AlexanderYastrebov/wireguard-vanity-key#3. The speedup is 7x, and it makes point addition dominate over Montgomery byte encoding.
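One plausible reading of "vector division" is batch inversion (Montgomery's trick), where a whole batch of field elements is inverted with a single modular inversion plus a few multiplications per element. The sketch below shows that general technique under this assumption; it is not taken from the linked pull request, and `batch_invert` is a name chosen for the example.

```python
P = 2**255 - 19  # prime of the Curve25519 field


def batch_invert(values, p=P):
    """Invert every (nonzero) element of `values` modulo p with one inversion."""
    # prefix[i] holds the product values[0] * ... * values[i-1].
    prefix = []
    acc = 1
    for v in values:
        prefix.append(acc)
        acc = acc * v % p

    # One expensive inversion of the product of all elements.
    inv = pow(acc, p - 2, p)

    # Walk backwards, peeling one element off the running inverse each step.
    out = [0] * len(values)
    for i in range(len(values) - 1, -1, -1):
        out[i] = inv * prefix[i] % p   # equals 1 / values[i]
        inv = inv * values[i] % p      # now the inverse of values[0..i-1]
    return out


denominators = [3, 5, 7, 2**200 + 1, P - 2]
inverses = batch_invert(denominators)
```

In a vanity-key search, the denominators would be the (Z - Y) values of a batch of candidate points, so the cost of the inversion is shared across the whole batch.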
5x speedup by switching away from generating a new keypair for each trial. Instead, we count scalars and add points. This will also help with distributing the search among untrusted worker machines.
closes #12
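The "count scalars and add points" idea can be sketched as follows. This is a slow, pure-Python illustration and not the Rust code in this PR: the point arithmetic follows the reference formulas in RFC 8032, the step of 8 is an assumption (chosen so the low three bits of a clamped scalar stay clear), and `search` with its parameters is a hypothetical helper.

```python
import base64

P = 2**255 - 19                            # field prime
D = -121665 * pow(121666, P - 2, P) % P    # Edwards curve constant d


def point_add(a, b):
    """Add two points in extended coordinates (X, Y, Z, T), per RFC 8032."""
    A = (a[1] - a[0]) * (b[1] - b[0]) % P
    B = (a[1] + a[0]) * (b[1] + b[0]) % P
    C = 2 * a[3] * b[3] * D % P
    E2 = 2 * a[2] * b[2] % P
    E, F, G, H = B - A, E2 - C, E2 + C, B + A
    return (E * F % P, G * H % P, F * G % P, E * H % P)


def point_mul(s, pt):
    """Double-and-add scalar multiplication (only needed once, up front)."""
    acc = (0, 1, 1, 0)  # neutral element
    while s > 0:
        if s & 1:
            acc = point_add(acc, pt)
        pt = point_add(pt, pt)
        s >>= 1
    return acc


def base_point():
    """Recover the Ed25519 base point from y = 4/5."""
    y = 4 * pow(5, P - 2, P) % P
    x2 = (y * y - 1) * pow(D * y * y + 1, P - 2, P) % P
    x = pow(x2, (P + 3) // 8, P)
    if (x * x - x2) % P != 0:
        x = x * pow(2, (P - 1) // 4, P) % P  # multiply by sqrt(-1)
    if x & 1:
        x = P - x                            # base point has even x
    return (x, y, 1, x * y % P)


def public_key(pt):
    """Montgomery u = (Z + Y) / (Z - Y), base64-encoded as WireGuard does."""
    u = (pt[2] + pt[1]) * pow((pt[2] - pt[1]) % P, P - 2, P) % P
    return base64.b64encode(u.to_bytes(32, "little")).decode()


def search(start_scalar, prefix, max_trials, step=8):
    """Try start_scalar, start_scalar + step, ... using one addition per trial."""
    g = base_point()
    point = point_mul(start_scalar, g)   # the only full scalar multiplication
    step_point = point_mul(step, g)
    for i in range(max_trials):
        pub = public_key(point)
        if pub.startswith(prefix):
            return start_scalar + i * step, pub
        point = point_add(point, step_point)  # cheap: scalar grows by `step`
    return None
```

A call such as `search(start, "abc", 1_000_000)` would return the matching scalar and its public key, or `None` if no candidate in that range matches. Because a worker only needs a starting point and not the underlying scalar, the same loop lends itself to handing out ranges to machines that never see the private key.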