Remove TimeDelta `serialization_type` parameter; deserialize from float; improve serialization perf and accuracy #2654

ddelange · 2024-12-10T12:06:00Z

This PR allows propagating floats and strings with decimals to TimeDelta.deserialize without silently downcasting the input to int. Currently, serialization_type is used not only for serialization but also for deserialization, which adds no value: a float cast can always safely be used for deserialization and is backwards compatible (while fixing this bug).

originally reported here: sloria/environs#372

MRE:

    >>> from marshmallow import fields
    >>> field = fields.TimeDelta()
    >>> field.deserialize(12.9)
    datetime.timedelta(seconds=12)

After this PR:

    >>> from marshmallow import fields
    >>> field = fields.TimeDelta()
    >>> field.deserialize(12.9)
    datetime.timedelta(seconds=12, microseconds=900000)

I added one test for string input, and changed the existing test for float input.

There are already math.isclose tests when passing floats to microseconds (which will be rounded by the datetime.timedelta constructor).
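
To make the difference concrete without relying on marshmallow internals, here is a minimal standalone sketch. The helper `to_timedelta` and its `cast` parameter are made up for illustration and are not marshmallow API; they only model the two casts discussed above, at seconds precision.

```python
from datetime import timedelta


def to_timedelta(value, cast=float):
    """Hypothetical stand-in for the cast applied before building a timedelta."""
    return timedelta(seconds=cast(value))


# An int cast silently drops the fractional part of a float input.
truncated = to_timedelta(12.9, cast=int)

# A float cast keeps it, and also accepts strings with decimals.
from_float = to_timedelta(12.9)
from_string = to_timedelta("12.9")

print(truncated, from_float, from_string)
```

Integer inputs go through the float cast unchanged, which is why the cast should be safe for existing integer payloads.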

sloria

Thank you for the PR ❤️ This seems like the right behavior.

Lgtm but I'll give @lafrech some time to take a look in case I'm not thinking of some case where end users might rely on the current behavior.

tests/test_deserialization.py

lafrech · 2024-12-10T20:18:33Z

[I enabled issues again. The spam streak is over on flask-smorest. Let's hope it is here too.]

I don't have a full understanding of the issue but in case it went unnoticed, here are related issues/PRs:

I think current behaviour was introduced a while back in TimeDelta field bug #105.
Possibly related: Support serialization as float in TimeDelta field #1998.

Also, the docstring should be updated if we merge.

ddelange · 2024-12-10T22:34:20Z

Hi @lafrech 👋

You're right, the docstring needed some work. Please see the last commit. I hope that also clarifies the bug that this PR fixes: needlessly using serialization_type during deserialization.

sloria · 2024-12-11T16:13:39Z

Docs update looks good to me. Thanks @ddelange .

Thanks for digging up the original issue and PRs @lafrech. Looking back at those, I think allowing truncation to an integer was an oversight on my part rather than an explicit decision. Even the ISO8601 spec mentioned in #105 (comment) allows for fractional components:

The smallest value used may also have a decimal fraction, as in "P0.5Y" to indicate half a year. This decimal fraction may be specified with either a comma or a full stop, as in "P0,5Y" or "P0.5Y".

@lafrech @deckar01 Do we consider this a breaking change? Even though this is a much smaller change than previous major version bumps, I'm not opposed to a major version bump with some explanatory text in the changelog that says that the change is very limited in scope and only affects users of the TimeDelta field.

ddelange · 2024-12-11T17:32:22Z

fwiw, this change only affects (helps) users that were serializing using serialization_type=float, but then not taking care when deserializing.

I would say this does not warrant a major version bump, but rather a bugfix release.

sloria · 2024-12-11T18:00:08Z

It would also affect users who didn't explicitly pass serialization_type or passed serialization_type=int, right (as your MRE shows)? Users who were relying on floats getting truncated would experience a breakage (not saying this is likely or desirable, but it's possible).

ddelange · 2024-12-11T18:20:46Z

right. the good ol' rely on a bug motto. programming sucks...

sloria · 2024-12-11T18:28:38Z

lol, yeah... i guess i'm not 100% sure it's a bug, given that there was an explicit test for the current behavior. i'm like 80% sure it was oversight, but it was long ago so want to give a bit of time for others to chime in before shipping the change

lafrech

I'm not sure.

One of our concerns when designing a field is how values round-trip.

Current behaviour lets the user decide whether the serialized version should be int or float, and considers it is being given serialized values accordingly. If values are serialized as int, then the type that is expected as input when deserializing is int.

The original name of the parameter was serde_type, not very nice but showing it works both ways.

Before this PR:

    # TimeDelta(serialization_type=int)
    12 -> timedelta(seconds=12) -> 12
    12.42 -> timedelta(seconds=12) -> 12

    # TimeDelta(serialization_type=float)
    12 -> timedelta(seconds=12) -> 12.0
    12.42 -> timedelta(seconds=12, microseconds=420000) -> 12.42

After this PR:

    # TimeDelta(serialization_type=int)
    12 -> timedelta(seconds=12) -> 12
    12.42 -> timedelta(seconds=12, microseconds=420000) -> 12

    # TimeDelta(serialization_type=float)
    12 -> timedelta(seconds=12) -> 12.0
    12.42 -> timedelta(seconds=12, microseconds=420000) -> 12.42

From this perspective, I'm not sure the latter is less surprising behaviour.

The PR makes the field a bit more permissive on inputs, accepting floats when expecting integers. This is not in line with Integer field which truncates floats.

If floats are to be expected, then serialization_type=float should be used.

If the issue is that TimeDelta defaults to int, then maybe the default could be changed in the next major version. Until then, nothing prevents the user from specifying float, right?

I might be totally mistaken, in which case please ignore.
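
The four round-trips listed above can be reproduced with a small stdlib-only model. This is a sketch under assumptions: `round_trip` and `cast_on_load` are hypothetical names, the field is fixed at seconds precision, and the dump side is approximated by casting `total_seconds()` rather than by marshmallow's actual code.

```python
from datetime import timedelta


def round_trip(value, serialization_type, cast_on_load):
    """Model load -> dump for a seconds-precision field (illustrative only)."""
    loaded = timedelta(seconds=cast_on_load(value))
    dumped = serialization_type(loaded.total_seconds())
    return loaded, dumped


# Before the PR: the load-side cast follows serialization_type.
before_int = round_trip(12.42, int, cast_on_load=int)
before_float = round_trip(12.42, float, cast_on_load=float)

# After the PR: the load-side cast is always float.
after_int = round_trip(12.42, int, cast_on_load=float)
after_float = round_trip(12.42, float, cast_on_load=float)

for name, result in [
    ("before/int", before_int),
    ("before/float", before_float),
    ("after/int", after_int),
    ("after/float", after_float),
]:
    print(name, result)
```

The only case that changes is `after_int`: the loaded timedelta keeps the fraction, while the dumped value still drops it, which is the asymmetry being questioned here.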

src/marshmallow/fields.py

lafrech · 2024-12-11T21:36:42Z

src/marshmallow/fields.py

-        Allow (de)serialization to `float` through use of a new `serialization_type` parameter.
-        `int` is the default to retain previous behaviour.
+        Allow serialization to `float` through use of a new `serialization_type` parameter.
+        Defaults to `int` for backwards compatibility.


Should we add a new versionchanged line here?

Never truncate a `float` to `int` during deserialization.

too concise?

or maybe

Never truncate a `float` to an `int` before deserializing it into a :class:`datetime.timedelta`.

ddelange · 2024-12-12T00:01:16Z

Thanks for your elaborate review!

From this perspective, I'm not sure the latter is less surprising behaviour.

To me the latter definitely looks 'right'. From another perspective: that float came from marshmallow (starting off with a timedelta). It got serialized and sent elsewhere for deserialization. When the receiving party deserializes, in the latter example he doesn't need prior knowledge about the sender's settings. Deserialization has now become robust.

lafrech · 2024-12-12T20:43:35Z

Is there anything wrong with TimeDelta(serialization_type=float), apart from the fact that it is not the default?

I mean does the change in this PR allow a use case that can't be achieved with current parameters? Or is the only benefit to make it the default choice?

ddelange · 2024-12-12T21:25:13Z

re: anything wrong - my argument here is that serialization_type should be used only for serialization and not for deserialization. but that's about it :)

lafrech · 2024-12-12T21:30:32Z

So would it be fine if the parameter was named serialization_deserialization_type, de_serialization_type or serialized_type or whatever we can come up with that makes it explicit it works both ways?

If it is just that, we could use a better naming and deprecate the old one.

ddelange · 2024-12-13T07:00:48Z

from my side there's no need for a breaking change here, this change just seemed like a nice robustness improvement to me (not considering this breaking).

it only came up because a user was surprised that his float was being silently truncated when feeding into @sloria's environs package. since that package only ever deserializes (values from unknown sources), this seemed like the right way forward. but feel free to close / take over if you do consider this a 'bad' change, no hard feelings :)

it will look uglier than now, but environs could fix this without any PR on marshmallow by subclassing TimeDelta and overriding __init__

sloria · 2024-12-18T17:53:53Z

I generally agree with @ddelange's comment in #2654 (comment), and the proposed change certainly makes sense for the use case in environs. On one hand, we might as well be robust on input. Boolean will accept various strings and integers while outputting to a bool. On the other hand, marshmallow doesn't always attempt to be lenient on inputs. For example, Date and DateTime assume that the input and output formats are the same.

@lafrech @deckar01 Any strong opinions?

sloria · 2024-12-18T22:21:31Z

Another possible path would be to

Add a deserialization_type to this PR and make it default to int (backwards-compatible)
Add a warning when deserializing a float

something like

diff --git a/src/marshmallow/fields.py b/src/marshmallow/fields.py
index b564aee..19d82f6 100644
--- a/src/marshmallow/fields.py
+++ b/src/marshmallow/fields.py
@@ -1487,6 +1487,8 @@ def __init__(
         self,
         precision: str = SECONDS,
         serialization_type: type[int | float] = int,
+        *,
+        deserialization_type: type[int | float] = int,
         **kwargs,
     ):
         precision = precision.lower()
@@ -1508,9 +1510,12 @@ def __init__(
 
         if serialization_type not in (int, float):
             raise ValueError("The serialization type must be one of int or float")
+        if deserialization_type not in (int, float):
+            raise ValueError("The deserialization type must be one of int or float")
 
         self.precision = precision
         self.serialization_type = serialization_type
+        self.deserialization_type = deserialization_type
         super().__init__(**kwargs)
 
     def _serialize(self, value, attr, obj, **kwargs):
@@ -1527,8 +1532,15 @@ def _serialize(self, value, attr, obj, **kwargs):
         return value.total_seconds() / base_unit.total_seconds()
 
     def _deserialize(self, value, attr, data, **kwargs):
+        if isinstance(value, float) and self.deserialization_type is not float:
+            warnings.warn(
+                f"float value for attribute {attr!r} will be truncated to an integer value in {self.precision}. "
+                + "It will be deserialized as a float in marshmallow 3.25.0.",
+                UserWarning,
+                stacklevel=2,
+            )
         try:
-            value = float(value)
+            value = self.deserialization_type(value)
         except (TypeError, ValueError) as error:
             raise self.make_error("invalid") from error

Release this in 3.24.0.
Change the default deserialization_type to float and remove the warning.
Release 3.25.0

this warns potentially-affected users of the change. but maybe it's not worth the added API surface? ¯\_(ツ)_/¯
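
For anyone wanting to see how such a transitional warning would surface to callers, here is a stdlib-only sketch. `load_seconds` is a hypothetical stand-in for the proposed `_deserialize` logic, not marshmallow code, and the warning text is shortened.

```python
import warnings
from datetime import timedelta


def load_seconds(value, deserialization_type=int):
    """Hypothetical model of the transitional behavior, not marshmallow code."""
    if isinstance(value, float) and deserialization_type is not float:
        warnings.warn(
            "float value will be truncated to an integer",
            UserWarning,
            stacklevel=2,
        )
    return timedelta(seconds=deserialization_type(value))


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    default_result = load_seconds(12.9)   # warns, then truncates
    opted_in = load_seconds(12.9, float)  # no warning, keeps the fraction

print(len(caught), default_result, opted_in)
```

Users who want the new behavior early would opt in explicitly; everyone else gets one warning per truncated float until the default flips.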

deckar01 · 2024-12-21T09:08:00Z

total_seconds() returns a float, so that should be the default. I don't think it needs an extra arg (deserialize from == serialize to). Truncation was documented and tested, so I think it needs a major release, even though the use case is questionable.

We might consider deprecating the type arg while we are at it. The truncation can be enveloped or a custom field and does not seem to warrant a convenience API.
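
As a small illustration of the point about `total_seconds()`, using only the standard library: the float is what `datetime` hands back natively, and truncation is a one-line step a caller (or a custom field) could apply explicitly. The variable names are illustrative.

```python
from datetime import timedelta

delta = timedelta(seconds=12, microseconds=420000)

as_float = delta.total_seconds()     # float, fractional part preserved
as_int = int(delta.total_seconds())  # explicit truncation, done by the caller

print(as_float, as_int)
```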

ddelange · 2024-12-21T09:16:45Z

do you suggest to deprecate serialization_type altogether (major release) and make float the default? sounds good to me

lafrech · 2024-12-21T11:54:49Z

It seems to me that the current code allows everything using the right parameters and it's just the default behaviour that is being discussed, so there's no need to rush fixing/improving this in 3.x by adding another parameter or a deprecation notice. Simplifying things and removing parameters can be done in a 4.x change. We could pick the low-hanging fruits from the 4.0 milestone and ship soonish, postponing the other issues to 5.0.

…

-- Jérôme

sloria · 2024-12-21T16:52:01Z

@lafrech if i'm understanding your suggestion correctly:

change this PR to remove the serialization_type parameter and mark this for 4.0
optionally merge other small-but-breaking changes

in environs, serialization_type can be defaulted to float for marshmallow 3 compatibility.

if this is the proposal, i'm good with it

lafrech · 2024-12-21T19:24:33Z

Yes, scratch this and keep things as is in 3.x, then drop `serialization_type` and only support the `float` case in 4.0 as @deckar01 suggests. And rather than postpone 4.0 endlessly, select PRs from the 4.0 milestone that can be finalized easily.

ddelange

OK! now targeting 4.0: I've demolished serialization_type (with deprecation warning), updated tests and added to docstring:

.. versionchanged:: 4.0.0
    Deprecate `serialization_type` parameter, always serialize to float.

src/marshmallow/fields.py

ddelange · 2024-12-23T15:26:38Z

src/marshmallow/fields.py

+        if precision not in self._unit_to_microseconds_mapping:
+            units = ", ".join(self._unit_to_microseconds_mapping)
+            msg = f"The precision must be one of: {units}."


unified this msg formatting with the Enum field ref.

ddelange · 2024-12-23T21:01:39Z

src/marshmallow/fields.py

+        # limit float arithmetics to a single division to minimize precision loss
+        microseconds: int = utils.timedelta_to_microseconds(value)
+        microseconds_per_unit: int = self._unit_to_microseconds_mapping[self.precision]
+        return microseconds / microseconds_per_unit


apart from float precision issues with the old total_seconds approach, the new approach is 5x faster:

    In [1]: from marshmallow import utils
       ...: from datetime import timedelta
       ...:
       ...: WEEKS = "weeks"
       ...: DAYS = "days"
       ...: HOURS = "hours"
       ...: MINUTES = "minutes"
       ...: SECONDS = "seconds"
       ...: MILLISECONDS = "milliseconds"
       ...: MICROSECONDS = "microseconds"
       ...:
       ...: _unit_to_microseconds_mapping = {
       ...:     WEEKS: 1000000 * 60 * 60 * 24 * 7,
       ...:     DAYS: 1000000 * 60 * 60 * 24,
       ...:     HOURS: 1000000 * 60 * 60,
       ...:     MINUTES: 1000000 * 60,
       ...:     SECONDS: 1000000,
       ...:     MILLISECONDS: 1000,
       ...:     MICROSECONDS: 1,
       ...: }
       ...:
       ...: precision = WEEKS
       ...:
       ...: def serialize_old(value):
       ...:     base_unit = timedelta(**{precision: 1})
       ...:     return value.total_seconds() / base_unit.total_seconds()
       ...:
       ...: def serialize_new(value):
       ...:     microseconds: int = utils.timedelta_to_microseconds(value)
       ...:     microseconds_per_unit: int = _unit_to_microseconds_mapping[precision]
       ...:     return microseconds / microseconds_per_unit
       ...:
       ...: value = timedelta(weeks=1, microseconds=1)

    In [2]: %timeit serialize_old(value)
    558 ns ± 3.16 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

    In [3]: %timeit serialize_new(value)
    113 ns ± 0.583 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
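
On the accuracy side, the idea can be checked without marshmallow. The sketch below is an assumption-laden model: `to_microseconds` is a hand-written stand-in for `utils.timedelta_to_microseconds`, and `MICROSECONDS_PER_UNIT` mirrors the mapping from the benchmark. The old path rounds up to three times (two `total_seconds()` calls plus a float division), whereas the new path performs a single int-by-int true division, which CPython rounds once, so it matches the exact rational value converted to float.

```python
from datetime import timedelta
from fractions import Fraction

MICROSECONDS_PER_UNIT = {
    "weeks": 1_000_000 * 60 * 60 * 24 * 7,
    "days": 1_000_000 * 60 * 60 * 24,
    "hours": 1_000_000 * 60 * 60,
    "minutes": 1_000_000 * 60,
    "seconds": 1_000_000,
    "milliseconds": 1_000,
    "microseconds": 1,
}


def to_microseconds(value: timedelta) -> int:
    """Stand-in for utils.timedelta_to_microseconds: exact integer arithmetic."""
    return (value.days * 86_400 + value.seconds) * 1_000_000 + value.microseconds


def serialize_old(value: timedelta, precision: str) -> float:
    # Two float conversions followed by a float division.
    base_unit = timedelta(**{precision: 1})
    return value.total_seconds() / base_unit.total_seconds()


def serialize_new(value: timedelta, precision: str) -> float:
    # One division of two exact integers.
    return to_microseconds(value) / MICROSECONDS_PER_UNIT[precision]


value = timedelta(weeks=1, microseconds=1)
exact = Fraction(to_microseconds(value), MICROSECONDS_PER_UNIT["weeks"])
old = serialize_old(value, "weeks")
new = serialize_new(value, "weeks")

print(old, new, float(exact))
```

Whether `old` and `new` differ in the last bits depends on the input; the sketch only guarantees that `new` equals the correctly rounded exact value.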


Avoid silent integer downcast in TimeDelta.deserialize

1dcc544

This was referenced Dec 10, 2024

Providing millisecondes in timedelta does not work sloria/environs#372

Closed

fix: timedelta parsing for int and floats sloria/environs#368

Closed

sloria approved these changes Dec 10, 2024

sloria reviewed Dec 10, 2024

Polish docstring

0755fe1

ddelange force-pushed the timedelta-deserialize branch from ea0509a to 0755fe1 Compare December 10, 2024 22:51

lafrech reviewed Dec 11, 2024

ddelange force-pushed the timedelta-deserialize branch 2 times, most recently from fd171e8 to 210339f Compare December 22, 2024 16:50

ddelange commented Dec 22, 2024

ddelange changed the title ~~Avoid silent integer downcast in TimeDelta.deserialize~~ Deprecate TimeDelta serialization_type parameter Dec 22, 2024

ddelange force-pushed the timedelta-deserialize branch 8 times, most recently from 0742e3f to 8d166b1 Compare December 23, 2024 15:23

Deprecate TimeDelta serialization_type parameter

3470680

ddelange force-pushed the timedelta-deserialize branch from 8d166b1 to 3470680 Compare December 23, 2024 15:24

ddelange commented Dec 23, 2024

sloria added this to the 4.0 milestone Dec 27, 2024

sloria reviewed Dec 30, 2024

sloria changed the base branch from dev to 4.0 December 30, 2024 05:13

Remove deprecation warning

7519edc

ddelange requested review from lafrech and sloria December 30, 2024 11:40

sloria changed the title ~~Deprecate TimeDelta serialization_type parameter~~ Remove TimeDelta serialization_type parameter Dec 30, 2024

sloria and others added 3 commits December 30, 2024 13:59

Update changelog

1f73216

Merge branch '4.0' into timedelta-deserialize

6c28329

[pre-commit.ci] auto fixes from pre-commit.com hooks

e77aeec

for more information, see https://pre-commit.ci

sloria changed the title ~~Remove TimeDelta serialization_type parameter~~ Remove TimeDelta serialization_type parameter; deserialize from float; improve serialization perf and accuracy Dec 30, 2024

sloria approved these changes Dec 30, 2024

sloria merged commit 0c90110 into marshmallow-code:4.0 Dec 30, 2024
9 checks passed
