Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid silent integer downcast in TimeDelta.deserialize #2654

Open
wants to merge 2 commits into
base: dev
Choose a base branch
from

Conversation

ddelange
Copy link

@ddelange ddelange commented Dec 10, 2024

This PR allows propagating floats/strings with decimals to TimeDelta.deserialize without silently downcasting the input to int. Currently, serialization_type is not only being used for serialization, but also being used for deserialization, which adds no added value: a float cast can always safely be used for deserialization and is backwards compatible (but fixes this bug).

originally reported here: sloria/environs#372

MRE:

>>> from marshmallow import fields
>>> field = fields.TimeDelta()
>>> field.deserialize(12.9)
datetime.timedelta(seconds=12)

After this PR:

>>> from marshmallow import fields
>>> field = fields.TimeDelta()
>>> field.deserialize(12.9)
datetime.timedelta(seconds=12, microseconds=900000)

I added one test for string input, and changed the existing test for float input.

There are already math.isclose tests when passing floats to microseconds (which will be rounded by the datetime.timedelta constructor).

Copy link
Member

@sloria sloria left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR ❤️ This seems like the right behavior.

Lgtm but I'll give @lafrech some time to take a look in case I'm not thinking of some case where end users might rely on the current behavior.

@@ -741,7 +748,7 @@ def test_timedelta_field_deserialization(self):
assert isinstance(result, dt.timedelta)
assert result.days == 0
assert result.seconds == 12
assert result.microseconds == 0
assert result.microseconds == 900000
Copy link
Member

@sloria sloria Dec 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 This was explicitly asserting that microseconds were lost. I'm not sure why, though

@lafrech
Copy link
Member

lafrech commented Dec 10, 2024

[I enabled issues again. The spam streak is over on flask-smorest. Let's hope it is here too.]

I don't have a full understanding of the issue but in case it went unnoticed, here are related issues/PRs:

Also, the docstring should be updated if we merge.

@ddelange
Copy link
Author

Hi @lafrech 👋

You're right, the docstring needed some work. Please see the last commit. I hope that also clarifies the bug that this PR fixes: needlessly using serialization_type during deserialization.

@ddelange ddelange force-pushed the timedelta-deserialize branch from ea0509a to 0755fe1 Compare December 10, 2024 22:51
@sloria
Copy link
Member

sloria commented Dec 11, 2024

Docs update looks good to me. Thanks @ddelange .

Thanks for digging up the original issue and PRs @lafrech. Looking back at those, I think allowing truncation to an integer was an oversight on my part rather than an explicit decision. Even the ISO8601 spec mentioned in #105 (comment) allows for fractional components:

The smallest value used may also have a decimal fraction,[39] as in "P0.5Y" to indicate half a year. This decimal fraction may be specified with either a comma or a full stop, as in "P0,5Y" or "P0.5Y".

@lafrech @deckar01 Do we consider this a breaking change? Even though this is a much smaller change than previous major version bumps, I'm not opposed to a major version bump with some explanatory text in the changelog that says that the change is very limited in scope and only affects users of the TimeDelta field.

@ddelange
Copy link
Author

fwiw, this change only affects (helps) users that were serializing using serialization_type=float, but then not taking care when deserializing.

I would say this does not warrant a major version bump, but rather a bugfix release.

@sloria
Copy link
Member

sloria commented Dec 11, 2024

It would also affect users who didn't explicitly pass serialization_type or passed serialization_type=int, right (as your MRE shows)? Users who were relying on floats getting truncated would experience a breakage (not saying this is likely or desirable, but it's possible).

@ddelange
Copy link
Author

right. the good ol' rely on a bug motto. programming sucks...

@sloria
Copy link
Member

sloria commented Dec 11, 2024

lol, yeah... i guess i'm not 100% sure it's a bug, given that there was an explicit test for the current behavior. i'm like 80% sure it was oversight, but it was long ago so want to give a bit of time for others to chime in before shipping the change

Copy link
Member

@lafrech lafrech left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure.

One of our concerns when designing a field is how values round-trip.

Current behaviour lets the user decide whether the serialized version should be int or float, and considers it is being given serialized values accordingly. If values are serialized as int, then the type that is expected as input when deserializing is int.

The original name of the parameter was serde_type, not very nice but showing it works both ways.

Before this PR:

    # TimeDelta(serialization_type=int)
    12 -> timedelta(seconds=12) -> 12
    12.42 -> timedelta(seconds=12) -> 12

    # TimeDelta(serialization_type=float)
    12 -> timedelta(seconds=12) -> 12.0
    12.42 -> timedelta(seconds=12, microseconds=420000) -> 12.42

After this PR:

    # TimeDelta(serialization_type=int)
    12 -> timedelta(seconds=12) -> 12
    12.42 -> timedelta(seconds=12, microseconds=420000) -> 12

    # TimeDelta(serialization_type=float)
    12 -> timedelta(seconds=12) -> 12.0
    12.42 -> timedelta(seconds=12, microseconds=420000) -> 12.42

From this perspective, I'm not sure the latter is less surprising behaviour.

The PR makes the field a bit more permissive on inputs, accepting floats when expecting integers. This is not in line with Integer field which truncates floats.

If floats are to be expected, then serialization_type=float should be used.

If the issue is that TimeDelta defaults to int then maybe the default could be changed in next major version. Until then, nothing prevents the user to specify float, right?

I might be totally mistaken, in which case please ignore.

@@ -1526,7 +1528,7 @@ def _serialize(self, value, attr, obj, **kwargs):

def _deserialize(self, value, attr, data, **kwargs):
try:
value = self.serialization_type(value)
value = float(value)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible that when deserializing an int this introduces an error due to floating point precision?

Like float(42) -> 42.0000000000000000001

Answering to myself, no because this will be rounded in timedelta:

>>> field.deserialize(12.9000000001)
datetime.timedelta(seconds=12, microseconds=900000)

Allow (de)serialization to `float` through use of a new `serialization_type` parameter.
`int` is the default to retain previous behaviour.
Allow serialization to `float` through use of a new `serialization_type` parameter.
Defaults to `int` for backwards compatibility.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add a new versionchanged line here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Never truncate a `float` to `int` during deserialization.

too concise?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or maybe

Never truncate a `float` to an `int` before deserializing it into a :class:`datetime.timedelta`.

@ddelange
Copy link
Author

Thanks for your elaborate review!

From this perspective, I'm not sure the latter is less surprising behaviour.

To me the latter definitely looks 'right'. From another perspective: that float came from marshmallow (starting off with a timedelta). It got serialized and sent elsewhere for deserialization. When the receiving party deserializes, in the latter example he doesn't need prior knowledge about the sender's settings. Deserialization has now become robust.

@lafrech
Copy link
Member

lafrech commented Dec 12, 2024

Is there anything wrong with TimeDelta(serialization_type=float), apart from the fact that it is not the default?

I mean does the change in this PR allow a use case that can't be achieved with current parameters? Or is the only benefit to make it the default choice?

@ddelange
Copy link
Author

re: anything wrong - my argument here is that serialization_type should be used only for serialization and not for deserialization. but that's about it :)

@lafrech
Copy link
Member

lafrech commented Dec 12, 2024

So would it be fine if the parameter was named serialization_deserialization_type, de_serialization_type or serialized_type of whatever we can come up with that makes it explicit it works both ways?

If it is just that, we could use a better naming and deprecate the old one.

@ddelange
Copy link
Author

from my side there's no need for a breaking change here, this change just seemed like a nice robustness improvement to me (not considering this breaking).

it only came up because a user was surprised that his float was being silently truncated when feeding into @sloria's environs package. since that package only ever deserializes (values from unknown sources), this seemed like the right way forward. but feel free to close / take over if you do consider this a 'bad' change, no hard feelings :)

will look uglier than now, but environs could fix this without any PR om marshmallow by subclassing TimeDelta.__init__

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants