
faster algorithm #15

Open · warner wants to merge 3 commits into master from 12-count-scalars
Conversation

warner (Owner) commented Mar 4, 2020

5x speedup: instead of generating a new keypair for each trial, we count scalars and add points. This will also help with distributing the search among untrusted worker machines.

closes #12
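The "count scalars, add points" idea can be sketched with a toy stand-in group. This is a hypothetical illustration, not the crate's actual API: the multiplicative group mod a prime plays the role of Curve25519, so "scalar multiplication" is `pow()` and "point addition" is modular multiplication.

```python
# Toy sketch of the search restructuring in this PR. The real code works
# on Curve25519; here (Z_P*, *) stands in for the curve group. All names
# are illustrative.

P = 2**61 - 1          # a Mersenne prime; its multiplicative group plays the curve's role
G = 7                  # generator, standing in for the curve base point

def slow_search(start_scalar, trials):
    # old approach: one full "scalar multiplication" per trial
    return [pow(G, start_scalar + 8 * i, P) for i in range(trials)]

def fast_search(start_scalar, trials):
    # new approach: one scalar multiplication up front, then a cheap
    # group operation ("point addition") per candidate
    step = pow(G, 8, P)            # fixed offset point, computed once
    point = pow(G, start_scalar, P)
    out = []
    for _ in range(trials):
        out.append(point)
        point = point * step % P   # "add" the offset point
    return out

assert slow_search(12345, 100) == fast_search(12345, 100)
```

The offset of 8 mirrors the PR's choice: stepping the private scalar by a multiple of 8 keeps it compatible with X25519 clamping, as discussed later in the thread.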

warner force-pushed the 12-count-scalars branch from da601dc to b83e8ee on Mar 4, 2020 05:55
warner (Owner, Author) commented Mar 4, 2020

@hdevalence hey, if you get some time, could you take a look at this? I think I applied everything we talked about, but afterwards I discovered the difficulty of switching back and forth between the two scalar representations. The approach I came up with seems sound, but I'd appreciate another pair of eyeballs on it, along with any cleanups or better ways to approach this you can think of. The big comment in lib.rs should explain my reasoning.

I'm bummed that a significant part of the search time (3.2us out of 3.4us) is spent in converting the Edwards point (where we can use addition) to the Montgomery form (which is what gets base64-converted). I was hoping that point addition would be the dominant factor, but it's only 0.2us. I can't think of any way around that, however.
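For context, the Edwards-to-Montgomery step being discussed is the birational map from RFC 7748, u = (1 + y) / (1 - y) mod p; the field inversion it requires is what makes the conversion so much more expensive than a point addition. A minimal sketch of just the map (not the crate's code):

```python
# Edwards y-coordinate -> Montgomery u-coordinate, per RFC 7748.
# The pow(..., -1, p) modular inversion is the costly operation.

p = 2**255 - 19                       # Curve25519 field prime

def edwards_y_to_montgomery_u(y):
    return (1 + y) * pow(1 - y, -1, p) % p

# Sanity check: the Ed25519 base point has y = 4/5 mod p, and its
# Montgomery u-coordinate is 9 (the X25519 base point).
y_base = 4 * pow(5, -1, p) % p
assert edwards_y_to_montgomery_u(y_base) == 9
```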

@tarcieri, we discussed this approach a long time ago too, I'd love to hear your thoughts.

warner self-assigned this Mar 4, 2020
warner added 3 commits March 27, 2020 18:18
The basic operation used to take 17us/iter on my 2019 mac mini (3.2GHz Core
i7). The new approach takes 3.8us/iter.

refs #12
warner force-pushed the 12-count-scalars branch from b83e8ee to 803e4c5 on Mar 28, 2020 01:22
eliliam commented Dec 1, 2021

Could we get this merged in?

megapro17 commented:

wireguard-vanity-address.zip

eliliam commented Dec 13, 2022

> wireguard-vanity-address.zip

I can't speak for anyone else, but there's no way I'm going to download this mysterious zip file with no explanation or anything, from someone who isn't already a part of the existing issue.

If you want to help, please explain what you have as a solution, and link to source code we can review instead of a black box zip file.

mchangrh commented:

I made a statically linked build running in a docker container that targets this branch - source code

Hopefully this is less sketchy ;)

megapro17 commented:

I like how you sent it to VirusTotal, and what did you get? Nothing?

AlexanderYastrebov commented:

Hello, I've created a similar tool, https://github.com/AlexanderYastrebov/wireguard-vanity-key, based on your ideas from here 👍

To squeeze out the last drops of performance, I eliminated allocations and adjust the scalar only once, outside of the main search loop.

> I'm bummed that a significant part of the search time (3.2us out of 3.4us) is spent in converting the Edwards point (where we can use addition) to the Montgomery form (which is what gets base64-converted). I was hoping that point addition would be the dominant factor, but it's only 0.2us. I can't think of any way around that, however.

Indeed, my benchmark shows the same.

> // We offset by 8 to make sure that each new privkey will meet the same
> // clamping criteria: we assume the keyspace is large enough that we're
> // unlikely to wrap around.

It looks like any offset may work, but I found that other offset values fail the clamping test much more often. It might be possible to make addition faster for special offset values (e.g. the identity point), but this likely won't make any difference given that BytesMontgomery dominates the time spent (96%).

[Screenshot: benchmark profile, 2025-01-11]

AlexanderYastrebov commented:

Hello, I found a way to speed up Montgomery bytes encoding using vector division; see AlexanderYastrebov/wireguard-vanity-key#3

The speedup is 7x and makes point addition dominate over Montgomery byte encoding:

[Screenshot: benchmark profile, 2025-02-05]
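A standard way to amortize the per-candidate field inversion in u = (1 + y)/(1 - y) is Montgomery's simultaneous-inversion ("batch inversion") trick: n inversions become one inversion plus roughly 3n multiplications. Whether the linked "vector division" PR does exactly this is an assumption on my part, but it is the usual technique, sketched here:

```python
# Montgomery's batch-inversion trick over the Curve25519 field.
# One real inversion (pow(..., -1, p)) serves the whole batch.

p = 2**255 - 19

def batch_inverse(xs):
    # prefix[i] = xs[0] * xs[1] * ... * xs[i] mod p
    prefix = []
    acc = 1
    for x in xs:
        acc = acc * x % p
        prefix.append(acc)
    inv = pow(acc, -1, p)              # the only field inversion
    out = [0] * len(xs)
    for i in range(len(xs) - 1, 0, -1):
        out[i] = inv * prefix[i - 1] % p   # peel off xs[i]'s inverse
        inv = inv * xs[i] % p              # inv now inverts prefix[i-1]
    out[0] = inv
    return out

xs = [3, 5, 7, 11, 2**200 + 1]
assert batch_inverse(xs) == [pow(x, -1, p) for x in xs]
```

Batching only helps when candidates can be buffered, e.g. converting a window of consecutive points before checking their base64 encodings, so the search loop structure has to change slightly to exploit it.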

Successfully merging this pull request may close: faster search algorithm.
5 participants