faster search algorithm #12
Comments
This will require using |
The
Not clean enough to land just yet, but it's looking quite promising. |
and benchmark smaller pieces, for comparison against an upcoming (faster) algorithm (#12)
The basic operation used to take 17us/iter on my 2019 mac mini (3.2GHz Core i7). The new approach takes 3.8us/iter. refs #12
At RWC this year, we (maybe @gtank or @tarcieri?) talked about a faster approach to finding suitable keypairs. The current scheme performs a large number of independent trials (in parallel on all available CPU cores), where each one picks a random secret key (a scalar), performs the (expensive) scalarmult operation to transform it into a public key (a point), then examines the base64-encoded public key for a match against the desired prefix.
The new scheme would take advantage of the fact that point addition is much, much faster than scalar multiplication. The initialization step looks like:
- set `s` to a randomly-selected scalar, clamped as usual
- set `scalar_offset` to 8 (i.e. the Ed25519 group's cofactor)
- set `p` to `scalarmult(s, BASEPOINT)`
- set `point_offset` to `scalarmult(scalar_offset, BASEPOINT)`
Then each step of the loop looks like:
- base64-encode `p` and test it against the desired prefix. If it matches, print the result and start over again from initialization
- otherwise, set `s = s + scalar_offset` and `p = p + point_offset`
The scalar addition is done modulo the group order, and the point addition follows the usual rules of point addition.
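The initialization and loop described above can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the project's code: `search`, `encode_pubkey`, `clamp`, `recover_x` and `max_steps` are hypothetical names, the point arithmetic uses the textbook extended-coordinate Edwards formulas (the same shape as the RFC 8032 reference code), and the public key is assumed to be the base64 encoding of the Montgomery u-coordinate, as X25519 uses. The sketch also keeps `s` as a plain integer instead of reducing it modulo the group order; both give the same point, because `BASEPOINT` has exactly that order.

```python
import base64
import secrets

P = 2**255 - 19                                # field prime
D = (-121665 * pow(121666, P - 2, P)) % P      # Edwards curve constant d

def recover_x(y):
    """Recover the even x-coordinate for a given y on edwards25519."""
    x2 = (y * y - 1) * pow(D * y * y + 1, P - 2, P) % P
    x = pow(x2, (P + 3) // 8, P)
    if (x * x - x2) % P != 0:
        x = x * pow(2, (P - 1) // 4, P) % P    # multiply by sqrt(-1)
    return x if x % 2 == 0 else P - x

BY = 4 * pow(5, P - 2, P) % P                  # base point y = 4/5
BX = recover_x(BY)
BASEPOINT = (BX, BY, 1, BX * BY % P)           # extended coords (X, Y, Z, T)
IDENTITY = (0, 1, 1, 0)

def point_add(p, q):
    """Add two points in extended coordinates."""
    a = (p[1] - p[0]) * (q[1] - q[0]) % P
    b = (p[1] + p[0]) * (q[1] + q[0]) % P
    c = 2 * p[3] * q[3] * D % P
    d = 2 * p[2] * q[2] % P
    e, f, g, h = b - a, d - c, d + c, b + a
    return (e * f % P, g * h % P, f * g % P, e * h % P)

def scalarmult(s, p):
    """Double-and-add scalar multiplication (the expensive operation)."""
    q = IDENTITY
    while s > 0:
        if s & 1:
            q = point_add(q, p)
        p = point_add(p, p)
        s >>= 1
    return q

def clamp(raw):
    """Standard X25519 clamping of 32 random bytes, returned as an integer."""
    b = bytearray(raw)
    b[0] &= 248
    b[31] &= 127
    b[31] |= 64
    return int.from_bytes(b, "little")

def encode_pubkey(p):
    """Base64 of the Montgomery u-coordinate, u = (Z + Y) / (Z - Y)."""
    _, y, z, _ = p
    u = (z + y) * pow(z - y, P - 2, P) % P
    return base64.b64encode(u.to_bytes(32, "little")).decode()

def search(prefix, max_steps, seed=None):
    """Walk p, p + 8B, p + 16B, ... until the encoded key starts with prefix."""
    s = clamp(seed if seed is not None else secrets.token_bytes(32))
    scalar_offset = 8                          # the cofactor
    p = scalarmult(s, BASEPOINT)               # one scalarmult per start
    point_offset = scalarmult(scalar_offset, BASEPOINT)
    for _ in range(max_steps):
        pub = encode_pubkey(p)
        if pub.startswith(prefix):
            return s, pub
        s += scalar_offset                     # plain integer add (assumption)
        p = point_add(p, point_offset)         # cheap step
    return None
```

Calling `search("A", 5000)` should return a `(private_scalar, public_key)` pair after roughly 64 steps on average, since a single base64 character fixes 6 bits. The sketch still pays for a field inversion in every `encode_pubkey` call; a real implementation would likely want to reduce that cost, which is outside what this issue describes.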
The speed is bounded by the point addition, and we only do one scalarmult per output keypair. Much much faster. For any sensible scalar (large enough to have wrapped around at least once, which is ensured when the clamping step sets the `2^254` bit, I think), adding `point_offset` makes the point jump to an entirely new portion of the numberspace, so the prefix won't be very correlated, and we won't lose much yield to correlation. (If our prefix came from the high-order bits, and the point addition only changed the low-order bits, then the prefix would change very slowly.)

Some details to figure out before implementing this:
- `scalar_offset` must be a multiple of the cofactor for this to work at all (assuming `s` is a valid+safe private key, `s + scalar_offset` must also be one)
- if `clamp(s + scalar_offset) != s + scalar_offset`, then we'll also have problems. We could either test this at each cycle of the loop, or rely upon the fact that the group order is so large (about `2^252`) that we'll never be unlucky enough for this to ever happen.

This is related to #1 because you could give each worker the starting point `p` (and they'd all use the same `point_offset`) and then just ask them to tell you how many steps they took before getting to a suitable pubkey. Then on the leader machine (which is the only one to know the private scalar) you just add their step count, multiplied by `scalar_offset`, to the private scalar to get the resulting private key.
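Both the clamping concern and the leader-side reconstruction can be checked with integer arithmetic alone. A minimal sketch, assuming the scalar is kept as a plain integer (not reduced modulo the group order) and that a worker reports only its step count; `clamp_int`, `recover_private_scalar` and `steps_until_clamp_breaks` are hypothetical names introduced for illustration.

```python
SCALAR_OFFSET = 8  # the cofactor, as in the initialization step


def clamp_int(s):
    """X25519 clamping on an integer: clear bits 0-2 and 255, set bit 254."""
    s &= (1 << 255) - 8
    s |= 1 << 254
    return s


def steps_until_clamp_breaks(s):
    """Number of steps after which s + steps * 8 first reaches 2^255.

    Adding multiples of 8 never disturbs the three low bits, so for a
    clamped s the only way clamping can change the sum is overflow
    past bit 254.
    """
    return ((1 << 255) - s) // SCALAR_OFFSET


def recover_private_scalar(s, steps):
    """Leader side: turn a worker's step count into the private scalar."""
    candidate = s + steps * SCALAR_OFFSET
    if clamp_int(candidate) != candidate:
        raise ValueError("step count pushed the scalar out of clamped form")
    return candidate
```

For a randomly chosen clamped `s` the headroom reported by `steps_until_clamp_breaks` is almost always far larger than any realistic search, which is consistent with the suggestion to rely on never being that unlucky. Doing the check once on the leader, rather than at each cycle of the loop, costs a single comparison per result.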