Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

uniform_real_distribution should use uniform_int_distribution #44

Open
drauch opened this issue Sep 22, 2018 · 6 comments
Open

uniform_real_distribution should use uniform_int_distribution #44

drauch opened this issue Sep 22, 2018 · 6 comments

Comments

@drauch
Copy link

drauch commented Sep 22, 2018

Hello again!

generate_uniform_real for is_integral<Engine::result_type> == true should probably do something like this (pseudo code):

auto dist = uniform_int_distribution<unsigned long>( unsigned long min, unsigned long max );
auto numerator = dist(eng);
auto divisor = unsigned long max;
[...]

instead of using:

result_type numerator = static_cast<T>(subtract<base_result>()(eng(), (eng.min)()));
result_type divisor = static_cast<T>(subtract<base_result>()((eng.max)(), (eng.min)())) + 1;
[...]

The current implementation fails to produce more different values than the range of the Engine is able to give. E.g., a PRNG which only ever produces values in the range [0..1] would lead to returning only 2 different real values.

What are your thoughts about this?

Best regards,
Dominik

PS: I noticed this when using good ol' rand() as an Engine behind the scenes.

@swatanabe
Copy link
Collaborator

swatanabe commented Sep 22, 2018 via email

@drauch
Copy link
Author

drauch commented Sep 22, 2018

As an entry-level C++ programmer I haven't heard about Boost's multiprecision floats until now - nice thingy :-). I thought that unsigned long is covering enough grounds for double (which it is, imho), but you're right, one had to choose an integral type which has the same amount of bits as the corresponding real type has - is there some kind of big_integral<bits> type in Boost as well which we could use?

Thanks for your pointers, as I'm trying to translate some of boost::random to C#, and the highest supported type of my library is going to be C#'s double, I conclude that I can go with uniform_int_distribution.

Thanks again, you're really helpful with your quick answers!

@swatanabe
Copy link
Collaborator

swatanabe commented Sep 22, 2018 via email

@DiscreteLogarithm
Copy link

DiscreteLogarithm commented Oct 27, 2018

@drauch

The current implementation fails to produce more different values than the range of the Engine is able to give.

The actual number of different real values produced is much less than the number of different integer values fed into the uniform_01 generator. For example, when you divide a 32-bit integer by 232, the spacing between consecutive numbers is 2-32, while the spacing between consecutive 32-bit single precision float numbers in [0.5, 1) is 2-24. So for numbers in [0.5, 1), every 28 different integer values are rounded to only one float value. Considering these rounding effects only 10 × 223 different real values are produced.

That being said, the number of different values of a 32-bit single precision float number in [0, 1) is also much less than 232 (obviously it should be able to represent negative numbers and numbers greater than 1 as well). The number of representable single precision float numbers in [0, 1) is 127 × 223. But the common algorithm of dividing a 32-bit integer by 232 can not even produce all these 127 × 223 values because it can not produce numbers less than 2-32 and it can not achieve the precision of single precision floats in [2-32, 2-9).

In summary the common algorithm has two problems: it wastes the entropy of the Engine for numbers near 1 and it fails to produce all representable floating-point numbers near 0. Trying to relieve one problem by changing the size of the integer only exacerbate the other. Using a bigger integer type increases the waste and usually will not fully resolve the problem of not producing all representable float values. To represent all single precision floating-point numbers, 149-bit random integers are needed. To do the same for double precision floating-point numbers requires 1074-bit random integers.

An efficient algorithm should only uses enough bits for the production of the mantissa (23 in the case of single precision) and use the rest towards the production of a geometric variate which would be used as the exponent. The exponent is a geometric variate with p=0.5 because the probability of a uniform real being in [0.5, 1) is 0.5, the probability of being in [0.25, 0.5) is 0.25, in [0.125, 0.25) it is 0.125 and so on. Details of such an algorithm is described in section 4 of this preprint. There's also an implementation of this algorithm. The implementation uses templates and traits of std::numeric_limit class and does not assume any specific size for the result_type of the Engine or the float_type requested.

  • generate_canonical allows you to specify the number of bits that you want.

Actually the standard says that the number of bits generated is the lesser of requested bits and numeric_limits::digits so it does not solve the problem

template<class RealType, size_t bits, class URBG>
RealType generate_canonical(URBG& g);
Complexity: Exactly k = max(1, ⌈b/ log2 R⌉) invocations of g, where b is the lesser of numeric_limits::digits and bits, and R is the value of g.max() − g.min() + 1.

So for example in the case of single precision float, specifying any number of bits greater than 23 won't make a difference.

PS: At the time of writing issue #47 , I haven't carefully read this one and I opened a new one. I'm sorry if that decision was wrong.

@drauch
Copy link
Author

drauch commented Oct 28, 2018

If you generate the exponent and mantissa separately, how do you fix the "within the range min/max" issue? E.g., in this scenario:

min = mantissa all 1, exponent 1025
max = mantissa all 0, exponent 1027

You need to generate 1025/1027 with a probability greater than 0 but very very low (1 / number of existing mantissas + 2).

@DiscreteLogarithm
Copy link

Actually it's not similar to uniform_real_distribution in the sense that it does not generate uniform variates with a given min and max. It only generates numbers in [0, 1). One can multiply the result by max-min and then add min, to get numbers in [min, max).

It may not produce all representable floating-point values in [min, max) with the correct probability due to possible rounding errors but it would perform much better than the common algorithm which produces only a fraction of representable floating-point values before multiplying the result by max-min and adding min.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants