fix: respect both multiple_of and minimum/maximum constraints #505
Conversation
@richardxia thanks for the writeup! I haven't had a chance to look into it in detail yet, but I'll do that soon :) Also, feel free to change the API if it's necessary, or add a new function to handle these cases. For example, you mentioned …
@richardxia sorry for the delay, but I've had a chance to look into the PR. I think the biggest issue we're facing here is the opaque nature of … However, these would have to be deterministic, since users have to be able to reliably create the same instances with the same values if needed (based on the seeding of the …). I think we have room for making whatever changes we need here, and deprecating accordingly, without too many issues, since most, if not all, users wouldn't be using these functions but only the APIs directly related to the factories.
Also, I think we should be fine as long as reasonable scenarios are working. If hypothesis keeps coming up with extreme cases that won't work, then we can just restrict hypothesis or something along those lines and document that such extreme cases may fail.
@guacs, thanks for taking the time to review my PR and leaving comments. What you said sounds good to me, in terms of deprecating the …
@@ -1,8 +1,9 @@
from __future__ import annotations

from decimal import Decimal
from math import ceil, floor, ulp
It looks like I'm failing the Python 3.8 tests because I'm importing `ulp`, which was only added in Python 3.9. I'll probably have to implement an alternative way of computing the correct increment for a given floating-point number.
I pushed a completely rewritten solution to this branch, but it looks like I'm currently failing the Python 3.8 tests because I accidentally used a function that was only implemented in Python 3.9. I'll have to revisit this tomorrow. In the meantime, please let me know if the new changes I made look like they're in the right direction. I made some breaking changes to the functions in …
Python 3.8 doesn't have the ulp() function, so we need to manually compute an increment ourselves.
Documentation preview will be available shortly at https://litestar-org.github.io/polyfactory-docs-preview/505
}
return cast("T", values[t_type])
# See https://github.com/python/mypy/issues/17045 for why the redundant casts are ignored.
if t_type == int:
All these checks should maybe use `is_safe_subclass` so that we handle custom `int` types as well. Same for `float` and `Decimal`.
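A rough sketch of what that suggestion could look like. The import path of `is_safe_subclass`, its exact signature, and the helper name `pick_increment_branch` are assumptions of this sketch, not the PR's actual code:

```python
# Illustrative only: replace exact type comparisons with is_safe_subclass so
# that user-defined subclasses of int/float/Decimal take the same branches.
from decimal import Decimal

from polyfactory.utils.predicates import is_safe_subclass  # assumed import path


def pick_increment_branch(t_type: type) -> str:
    if is_safe_subclass(t_type, int):  # instead of `t_type == int`
        return "int"
    if is_safe_subclass(t_type, float):  # instead of `t_type == float`
        return "float"
    if is_safe_subclass(t_type, Decimal):  # instead of `t_type == Decimal`
        return "decimal"
    raise TypeError(f"Unsupported type: {t_type!r}")
```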
@richardxia thanks for this! I think this looks good. We just have to make sure that any functions we're changing (whether their names or the arguments they take) are deprecated and that we create new functions as replacements for them. Once that's done, I think this is good to go :)
# When ``value`` is large in magnitude, we need to choose an increment that is large enough
# to not be rounded away, but when ``value`` is small in magnitude, we need to prevent the
# increment from vanishing. ``float_info.epsilon`` is defined as the smallest delta that
# can be represented between 1.0 and the next largest number, but it's not sufficient for
# larger values. We instead open up the floating-point representation to grab the exponent
# and calculate our own increment. This can be replaced with ``math.ulp`` in Python 3.9 and
# later.
_, exp = frexp(value)
increment = float_info.radix ** (exp - float_info.mant_dig + 1)
return cast("T", max(increment, float_info.epsilon))
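As a sanity check on that arithmetic (illustrative, not part of the PR): on Python 3.9+ the exponent-based increment lands within a factor of the radix of `math.ulp` for positive, normal floats.

```python
# Requires Python 3.9+ since it compares against math.ulp directly.
from math import frexp, ulp
from sys import float_info


def manual_increment(value: float) -> float:
    # Same arithmetic as the diff above: rebuild the spacing between adjacent
    # floats from the binary exponent and the mantissa width.
    _, exp = frexp(value)
    return float_info.radix ** (exp - float_info.mant_dig + 1)


for x in (1.0, 1e-10, 12345.678, 1e300):
    assert ulp(x) <= manual_increment(x) <= float_info.radix * ulp(x)
```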
I think it's probably fine to use `math.ulp` in the case of Python 3.9+ and then fall back to this when we're on 3.8. We have a constant called `PY_38` in `polyfactory.constants` that you can use to handle this. Or maybe even something like:

if PY_38:
    def get_float_increment(value: float) -> float:
        ...  # the current logic
else:
    def get_float_increment(value: float) -> float:
        ...  # math.ulp based logic

Then we can just call this function without having to worry about the Python version.
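A slightly fuller sketch of that suggestion, combining the exponent-based fallback from the diff above with `math.ulp`. The `PY_38` constant and its module are taken from the comment above; its exact meaning (True on Python 3.8) and everything else here are assumptions of this sketch:

```python
from math import frexp
from sys import float_info

from polyfactory.constants import PY_38  # assumed to be True when running on Python 3.8

if PY_38:

    def get_float_increment(value: float) -> float:
        # Python 3.8 has no math.ulp, so rebuild the increment from the exponent.
        _, exp = frexp(value)
        increment = float_info.radix ** (exp - float_info.mant_dig + 1)
        return max(increment, float_info.epsilon)

else:
    from math import ulp

    def get_float_increment(value: float) -> float:
        # math.ulp (Python 3.9+) gives the spacing at ``value`` directly; keep
        # float_info.epsilon as a floor so the increment never vanishes near zero.
        return max(ulp(value), float_info.epsilon)
```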
-def get_increment(t_type: type[T]) -> T:
+def get_increment(value: T, t_type: type[T]) -> T:
This would be a breaking change. Instead, we should have another `get_increment_v2` function and deprecate this one.
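For illustration, the deprecation pattern being suggested might look something like this. The name `get_increment_v2` comes from the comment above; the wrapper details (the `DeprecationWarning` and the neutral value passed through) are assumptions of this sketch:

```python
from __future__ import annotations

import warnings
from decimal import Decimal
from typing import TypeVar

T = TypeVar("T", int, float, Decimal)


def get_increment_v2(value: T, t_type: type[T]) -> T:
    # New value-aware implementation would live here (see the diff above).
    ...


def get_increment(t_type: type[T]) -> T:
    # Keep the old signature working, but steer callers to the new function.
    warnings.warn(
        "get_increment is deprecated; use get_increment_v2 instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return get_increment_v2(t_type(1), t_type)  # neutral value; an assumption of this sketch
```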
@@ -99,8 +100,8 @@ def is_multiply_of_multiple_of_in_range(
    return False


-def passes_pydantic_multiple_validator(value: T, multiple_of: T) -> bool:
-    """Determine whether a given value passes the pydantic multiple_of validation.
+def is_almost_multiple_of(value: T, multiple_of: T) -> bool:
We should keep `passes_pydantic_multiple_validator` as it is and deprecate it, and then create a new `is_almost_multiple_of` function as a replacement.
Actually, we could just do something like this:

def passes_pydantic_multiple_validator(value: T, multiple_of: T) -> bool:
    return is_almost_multiple_of(value, multiple_of)

That is, changing the logic inside it is fine since it does the same thing, but we would need to keep the old name around.
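For context, a hedged sketch of what the deferred-cast check described in the PR might look like; the tolerance and the zero-step handling here are assumptions, not the PR's actual implementation:

```python
from __future__ import annotations

from decimal import Decimal
from math import isclose
from typing import TypeVar

T = TypeVar("T", int, float, Decimal)


def is_almost_multiple_of(value: T, multiple_of: T) -> bool:
    # Do the division in the value's own type (important for Decimal) and only
    # cast to float at the very end for the tolerance comparison.
    if multiple_of == 0:
        return value == 0  # avoid dividing by zero; an assumption of this sketch
    remainder = (value / multiple_of) % 1
    return isclose(float(remainder), 0.0, abs_tol=1e-8) or isclose(
        float(remainder), 1.0, abs_tol=1e-8
    )
```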
@@ -210,33 +225,36 @@ def get_constrained_number_range(
    return minimum, maximum


-def generate_constrained_number(
+def generate_constrained_multiple_of(
Same as above regarding this being a breaking change.
Pull Request Checklist
Description
Library Code Changes
Previously, `generate_constrained_number()` would potentially generate invalid numbers when `multiple_of` is not None and exactly one of either `minimum` or `maximum` is not None, since it would just return `multiple_of` without respecting the upper or lower bound.

This significantly changes the implementation of the code to handle this case correctly. The `generate_constrained_number()` method has been completely removed and replaced with a `generate_constrained_multiple_of()` function. A major difference between the old function and the new one is that the new one does not accept a `method` argument for generating random numbers. This is because the new function always uses `create_random_integer()`, since the problem reduces to generating a random integer multiplier.

The high-level algorithm behind `generate_constrained_multiple_of()` is to constrain the random integer generator so that the numbers it generates, when multiplied with `multiple_of`, still fit within the original bounds. This simply involves dividing the original bounds by `multiple_of`, with some special handling for negative `multiple_of` values as well as carefully chosen rounding behavior, as sketched below.
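To make that concrete, here is a minimal sketch of the bound-division idea for the integer case only. The function name, the fallback multiplier range, and the use of `random.Random` are illustrative; the PR's actual function also handles floats and Decimals and uses `create_random_integer()`:

```python
from __future__ import annotations

from math import ceil, floor
from random import Random


def generate_constrained_multiple_of_sketch(
    random: Random,
    minimum: int | None,
    maximum: int | None,
    multiple_of: int,
) -> int:
    # Find an integer multiplier k such that minimum <= k * multiple_of <= maximum.
    # Dividing the bounds by multiple_of turns this into a plain integer range.
    if multiple_of < 0:
        # With a negative step, the original lower bound constrains the multiplier
        # from above and vice versa, so swap the bounds before dividing.
        minimum, maximum = maximum, minimum
    # Fallback multiplier range when a bound is absent (an assumption of this sketch).
    low = ceil(minimum / multiple_of) if minimum is not None else -10
    high = floor(maximum / multiple_of) if maximum is not None else 10
    # Assumes the constraints are satisfiable, i.e. low <= high.
    return random.randint(low, high) * multiple_of
```

For example, with `minimum=10`, `multiple_of=-3`, and no `maximum`, the swap makes `high = floor(10 / -3) = -4`, so every generated value `k * -3` with `k <= -4` is at least 12 and therefore respects the lower bound.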
We also need to make some changes to other functions. `get_increment()` needs to take an additional argument for the actual value that the increment is for. This is because floating-point numbers can't use a static increment, or else it might get rounded away if the numbers are too large. Python fortunately provides a `math.ulp()` function for computing this for a given float value, so we make use of that function. We still use the original `float_info.epsilon` constant as a lower bound on the increment, though, since if the value is too close to zero we still need to make sure that the increment doesn't disappear when used against other numbers.

Finally, we rename and modify `passes_pydantic_multiple_validator()` to `is_almost_multiple_of()`, deferring the casting of values to `float()` to minimize rounding errors. This specifically affects Decimal numbers, where casting to float too early causes too much loss of precision.

Test Changes
A significant number of changes were made to the tests as well, since the original tests missed the bug being fixed here. Each of the integer, floating-point, and decimal tests has been updated to assert that the result is actually within the minimum and maximum constraints. In addition, we remove some unnecessary sorting of the randomly generated test input values, since it was unnecessarily constraining `multiple_of` to be greater than or less than the minimum and maximum values, causing a lot of the scenarios involving negative values to be skipped.

Lastly, the floating-point and decimal tests need additional constraints to keep unrealistic extreme values from hitting precision issues. This was done by adding a number of constraints on the number of significant digits in the input numbers and on the relative magnitudes of the input numbers.
Close Issue(s)
`ge` and `multiple_of` but not `le` constraints causes invalid values to be generated #503