Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unbiased random prime generation #23

Open
fjarri opened this issue May 2, 2023 · 5 comments
Open

Unbiased random prime generation #23

fjarri opened this issue May 2, 2023 · 5 comments
Assignees
Labels
enhancement New feature or request maths Needs help of a mathematician

Comments

@fjarri
Copy link
Member

fjarri commented May 2, 2023

Currently random prime generation starts from a random number and runs a sieve until a prime is found. This can introduce bias, selecting primes with large leads more often. Some assorted considerations:

  • How dangerous is it, actually? Any sources?
  • GMP re-samples the start position every 0x10000 samples, quoting "deep science" (with no further explanations). OpenSSL doesn't re-sample.
  • Just sampling random numbers each time and running BPSW on them makes the generation several times slower.
  • To avoid touching the RNG many times, we can start with a random a < 2^(k-1), and generate candidates as 2^(k-1) + (a + i * b mod 2^(k-1)) where b is a random odd number. This will uniformly cover all the range [2^(k-1), 2^k) (right?) May be a little faster but not too much.
  • See https://eprint.iacr.org/2011/481.pdf ("Close to Uniform Prime Number Generation With Fewer Random Bits") for a more advanced algorithm.
@fjarri fjarri added the enhancement New feature or request label May 2, 2023
@fjarri
Copy link
Member Author

fjarri commented May 3, 2023

Some investigation results of the algorithm from the paper above (Fig. 2). The steps 1-7 (finding b that's coprime to a product of small primes m) are quite fast, usually b is found in 1-2 iterations. But since m is combined out of a smaller number of primes we use in the sieve, the iteration in the steps 8-11 produces more composites compared to Sieve. For example, for U1024, and \ell = 128, m is composed out of ~120 primes, while Sieve uses 2047 primes. This makes finding a prime about two times slower.

So it doesn't seem like we can just replace the existing sieve, but the method does have value when an unbiased generation is required. Notes:

  • How do we select b if we're looking for a safe prime? Can we adjust the iteration so that the resulting 2b + 1 (or b/2) is also coprime to m? Otherwise the difference with regular sieving will be huge.
  • We will have to pre-compute m, lambda(m) and so on for each Uint size we plan to use. Probably put it in a trait.
  • Note that since we're using Montgomery representation to find b, m must be odd, so b may be even. In the steps 8-11 we can adjust the generated a so that a * m + b is always odd (or = 3 mod 4 for safe primes).
  • It would be useful to have a "random nonzero DynResidue" method, so that we don't have to convert the generated randoms into Montgomery in the step 4.

@fjarri fjarri added the maths Needs help of a mathematician label May 4, 2023
@kayabaNerve
Copy link

kayabaNerve commented Dec 27, 2023

I can note my explicit interest in this for use in a hash-to-prime function. I'd also be curious to benchmark it when it's not using a distinctly sampled random numbers, yet only generating a single new byte/chunk and shifting over the generated bits (questioning how much of the cost is due to generating KB-MB of random numbers).

@fjarri
Copy link
Member Author

fjarri commented Dec 27, 2023

I didn't measure it, but it feels like the contribution of RNG is quite minimal.

@dvdplm
Copy link
Contributor

dvdplm commented Nov 7, 2024

How dangerous is it, actually? Any sources?

I think we are safe from this, at least as far as the presets go. Here's why: the dangerous bias enters the game when the same sieve is used many times with the same sequence.

Our sieve starts with a random odd integer k and generates a sequence k, k+2, k+4, …, k+200, … k+2n. Let's say we find a prime with candidate k+200 and then continue searching for more primes in the same sequence of candidates, then the primes found will be biased towards the specific value of k (for instance if k is congruent to 1 mod 3, then all primes from that sequence of candidates will be congruent to 1 mod 3).

In our case however, we exit the loop once a prime is found and discard the Sieve; if the user needs a second prime, they have to generate a new random odd k and this will result in a sequence of candidates equally likely to be congruent to 1 mod 3 as to 2 mod 3.

For use cases where the Sieve is used to pre-sieve a large number of candidates and then "here's a bunch of ints, find me all primes therein", then those primes could display a bias. As far as I can tell, how dangerous such a bias is really depends on what the user do with the primes.

@fjarri
Copy link
Member Author

fjarri commented Nov 13, 2024

A way to get an unbiased sampling of safe primes given an unbiased sampling of regular primes: https://www.degruyter.com/document/doi/10.1515/jmc-2013-5011/pdf.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request maths Needs help of a mathematician
Projects
None yet
Development

No branches or pull requests

3 participants