Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug: Fields items are serialized more than once by fields.Method #1995

Closed
benedekh opened this issue May 29, 2022 · 2 comments
Closed

Bug: Fields items are serialized more than once by fields.Method #1995

benedekh opened this issue May 29, 2022 · 2 comments

Comments

@benedekh
Copy link

benedekh commented May 29, 2022

After fixing #1993 in #1994, I stumbled upon another bug. List items are serialized more than once by fields.Method. To reproduce the bug, let's have the following Schema:

class ExampleSchema(Schema):
	dummy_list = fields.List(fields.Method(serialize="serialize_me"))

	def serialize_me(self, obj):
		return [str(value) for value in obj["dummy_list"]]

Let's have the following serialization code:

obj = {"dummy_list": ["hello", "world"]}
dumped = ExampleSchema().dump(obj)

After serialization dumped will be

{"dummy_list": [["hello", "world"], ["hello", "world"]]}

instead of

{"dummy_list": ["hello", "world"]}

The cause of the problem is, the serialize_me method never receives the value that should be serialized (neither as an argument, nor in **kwargs), it only gets the obj that is being serialized. By looking at the source code here, we see that the _serialize_method of fields.Method does not get the value as parameter.


Although the fix seems to be quite simple, just pass the value to the _serialize_method, it would be a breaking API change that would need a new major release.

To avoid a new major release, I would like to ask the maintainer(s) of the repository if they see any other solution to fix this bug?

@benedekh benedekh changed the title Bug: Fields items are serialized more than once by fields.Method. Bug: Fields items are serialized more than once by fields.Method Jun 10, 2022
@lafrech lafrech added the bug label Oct 17, 2022
@lafrech
Copy link
Member

lafrech commented Oct 17, 2022

I'd say the error lies in container fields such as List field.

    def _serialize(self, value, attr, obj, **kwargs) -> list[typing.Any] | None:
        if value is None:
            return None
        return [self.inner._serialize(each, attr, obj, **kwargs) for each in value]

When serializing a list, obj is the object, attr the name of the attribute holding the list and value the content of that attribute (the elements in the list). Then we call the inner field serialize method on each element, and we pass attr and obj which doesn't make much sense considering that at this stage it is supposed to work on a single element, with no knowledge about the list. We can't really make up an attr and an obj that make sense here.

This is related to #2039.

I don't see a nice solution to this. In fact, what I say in #2039 is that we shouldn't pass attr and an obj but this would break Function and Method field and I have no idea how to make those fields work without passing the object.

Thanks for adding another example of why this design is wrong, but I can't think of a better implementation.

(Sorry I didn't get the time to further look into #1993 and #1994.)

@sloria
Copy link
Member

sloria commented Jan 19, 2025

i think the current behavior is actually the expected behavior of List, which is meant to use the serde behavior of the inner field. so it makes sense that the Method would be called for each element in the list.

that said, this isn't the intended usage for fields.Method. perhaps we should document this, but Method shouldn't be nested within other fields. it should be used directly under a schema, like so:

from dataclasses import dataclass

from marshmallow import Schema, fields


@dataclass
class Book:
    name: str


@dataclass
class Author:
    books: list[Book]


class BookSchema(Schema):
    name = fields.String()


class AuthorSchema(Schema):
    upper_book_names = fields.Method("get_book_names")

    def get_book_names(self, author: Author):
        return [book.name.upper() for book in author.books]


author = Author(books=[Book(name="Book1"), Book(name="Book2")])

schema = AuthorSchema()

dumped = schema.dump(author)
# {
#     "upper_book_names": [
#         "BOOK1",
#         "BOOK2",
#     ],
# }

this a bit of a contrived example though. the above use case is better solved with fields.Pluck. assuming we implement #2787 , the uppercasing could then be done with post_dump.

def uppercase(value):
    return value.upper()


class AuthorSchema(Schema):
    upper_book_names = fields.List(
        fields.Pluck(BookSchema(), "name"), attribute="books", post_dump=(uppercase,)
    )

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants