Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Question] How to map an operation to treespec instead of only node #173

Open
1 task done
lqhuang opened this issue Nov 15, 2024 · 8 comments
Open
1 task done

[Question] How to map an operation to treespec instead of only node #173

lqhuang opened this issue Nov 15, 2024 · 8 comments
Labels
question Further information is requested

Comments

@lqhuang
Copy link

lqhuang commented Nov 15, 2024

Required prerequisites

  • I have searched the Issue Tracker that this hasn't already been reported. (comment there if it has.)

Motivation

I'm designing a lib for serializing and deserializing Python structured data class (like dataclass or attrs). I realized the idea and operations defined by PyTree are especially suitable for this purpose.

Basically, serializing a Python container into a specific protocol (like JSON or YAML) is flattening a custom data container into primitive and then reconstruct back to serializable native container like dict/list/tuple`.

That's totally the same with what tree_map did (flatten → traverse → unflatten) but with customized flatten and unflatten functions.

And in previous implementations (see xattrs _struct_funcs.py#L61 or attrs _funcs.py#L11), it strongly depends on explicit recursions, which are quite cumbersome and hard to debug. I think the PyTree way will make all these easier.

I have researched all major PyTree implementations, even thinking about writing it by myself, but I found optree now (@XuehaiPan excellent)! Unfortunately, in my practice, a real serde system is often require doing some transforms on key fields, but current operations don't support this feature yet.

def asdict(
    inst: Any,
    *,
    dict_factory: type[Mapping] = dict,
    filter=None,
    key_serializer=None,
    value_serializer=None,
    copy=deepcopy,
) -> Mapping[Hashable, Any]:
    ...

For example, in the above codes, key_serializer would be applied on key (treespec) to transform case (e.g.: from full_name to fullName), and value_serializer would be applied on value (all leaves or nodes) to decide how value to be serialized.

Solution

Add some functions like optree.tree_map_on_treespec?

I'm still thinking the concrete signature.

Alternatives

No response

Additional context

I know it's an unusual or even “unused” feature (for its original purpose). If you would like to accept the proposal, I'm glad to try to implement first MVP under your guidance. And I need to know where should I start to investigate first.

@lqhuang lqhuang added the enhancement New feature or request label Nov 15, 2024
@XuehaiPan
Copy link
Member

XuehaiPan commented Nov 15, 2024

Hi @lqhuang, the treespec object is hashable and pickleable. I do not understand the obstacle of serializing a pytree very well.

In [1]: import optree

In [2]: treespec = optree.tree_structure({'a': 1, 'b': 2, 'c': {'d': 3, 'e': 4}})

In [3]: treespec
Out[3]: PyTreeSpec({'a': *, 'b': *, 'c': {'d': *, 'e': *}})

In [4]: hash(treespec)
Out[4]: 2985958577142014292

In [5]: import pickle

In [6]: pickle.dumps(treespec)
Out[6]: b'\x80\x04\x95\xaa\x00\x00\x00\x00\x00\x00\x00\x8c\x06optree\x94\x8c\nPyTreeSpec\x94\x93\x94)\x81\x94((K\x01K\x00NNNK\x01K\x01Nt\x94(K\x01K\x00NNNK\x01K\x01Nt\x94(K\x01K\x00NNNK\x01K\x01Nt\x94(K\x01K\x00NNNK\x01K\x01Nt\x94(K\x05K\x02]\x94(\x8c\x01d\x94\x8c\x01e\x94eNNK\x02K\x03]\x94(h\th\net\x94(K\x05K\x03]\x94(\x8c\x01a\x94\x8c\x01b\x94\x8c\x01c\x94eNNK\x04K\x06]\x94(h\x0eh\x0fh\x10et\x94t\x94\x89\x8c\x00\x94\x87\x94b.'

In [7]: pickle.loads(pickle.dumps(treespec))
Out[7]: PyTreeSpec({'a': *, 'b': *, 'c': {'d': *, 'e': *}})

Also, have you ever tried the paths and accessors, they are also hashable and pickleable.

In [8]: treespec.paths()
Out[8]: [('a',), ('b',), ('c', 'd'), ('c', 'e')]

In [9]: treespec.accessors()
Out[9]: 
[
    PyTreeAccessor(*['a'], (MappingEntry(key='a', type=<class 'dict'>),)),
    PyTreeAccessor(*['b'], (MappingEntry(key='b', type=<class 'dict'>),)),
    PyTreeAccessor(*['c']['d'], (MappingEntry(key='c', type=<class 'dict'>), MappingEntry(key='d', type=<class 'dict'>))),
    PyTreeAccessor(*['c']['e'], (MappingEntry(key='c', type=<class 'dict'>), MappingEntry(key='e', type=<class 'dict'>)))
]

@lqhuang
Copy link
Author

lqhuang commented Nov 16, 2024

@XuehaiPan Hey, Yeah, I don't want to just serialize a PyTree, yeah it's not an obstacle.

I want to a tree_map-like functions working on "key". And as you showed in reply, I thought it's probably already implemented, which named as tree_map_with_accessor / tree_map_with_paths.

I will try it later. Thanks for your hints! 🥹

Best wishes. Has a happy weekend!

@lqhuang lqhuang closed this as completed Nov 16, 2024
@lqhuang lqhuang reopened this Nov 16, 2024
@lqhuang
Copy link
Author

lqhuang commented Nov 16, 2024

I have read the documents and example of https://optree.readthedocs.io/en/latest/ops.html#optree.tree_map_with_accessor. I think what I need is a little slightly different with tree_map_with_accessor.

Likewise, Here is an example for tree_map_with_key (function name may update in the future):

>>> tree_map_with_key(lambda k, x: k * 2, {'x': 7, 'y': (42, 64), 'z': None})
{'xx': 8, 'yy': (42, 64), 'zz': None}

That means the function or the operator is applied on treespec / keys / paths.

@XuehaiPan
Copy link
Member

XuehaiPan commented Nov 16, 2024

One thing I want to point out is that a pytree is not narrowed to the nested dictionary. It can be an arbitrary nested container or an opaque object. In your example, it is not well-defined to change the path if the entry is a sequence index.

To change the dictionary keys, you can do:

In [1]: import optree

In [2]: tree = {'a': 1, 'b': 2, 'c': {'d': 3, 'e': 4}}

In [3]: def double(x):
   ...:     return x * 2
   ...:     

In [4]: def is_dict(x):
   ...:     return type(x) is dict
   ...:     

In [5]: def double_dict_keys(x):
   ...:     if is_dict(x):
   ...:         return {double(k): double_dict_keys(v) for k, v in x.items()}
   ...:     return x
   ...:     

In [6]: optree.tree_map(double_dict_keys, tree, is_leaf=is_dict)
Out[6]: {'aa': 1, 'bb': 2, 'cc': {'dd': 3, 'ee': 4}}

@lqhuang
Copy link
Author

lqhuang commented Nov 16, 2024

It can be an arbitrary nested container or an opaque object.

Yeah, it's ok in my cases. It would raise a NotImplemented-like Error.

In your example, it is not well-defined to change the path if the entry is a sequence index.

Could you give me a more concrete example about input data? I think I will do nothing when the entry is a sequence index if I find treespec point it's a sequence like data container.

optree.tree_map(double_dict_keys, tree, is_leaf=is_dict)

Here is a question about is_leaf=is_dict, what if I want to keep the same leaf/node definitions? As I understand, this may alter the structure of the whole tree?

And return {double(k): double_dict_keys(v) for k, v in x.items()}, we look like back to recursion again.

Thanks for your demo for changing the dictionary keys, good example!

@XuehaiPan
Copy link
Member

XuehaiPan commented Nov 16, 2024

Could you give me a more concrete example about input data? I think I will do nothing when the entry is a sequence index if I find treespec point it's a sequence like data container.

In [1]: import optree

In [2]: import collections

In [3]: MyTuple = collections.namedtuple('MyTuple', ['x', 'y', 'z'])

In [4]: tree = {'a': 1, 'b': (2, MyTuple(3, 4, 5), [6, 7]), 'c': {'d': (8,), 'e': [9, 0]}}

In [5]: optree.tree_structure(tree)
Out[5]: PyTreeSpec({'a': *, 'b': (*, MyTuple(x=*, y=*, z=*), [*, *]), 'c': {'d': (*,), 'e': [*, *]}})

In [6]: optree.tree_leaves(tree)
Out[6]: [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]

In [7]: optree.tree_paths(tree)
Out[7]: [('a',), ('b', 0), ('b', 1, 0), ('b', 1, 1), ('b', 1, 2), ('b', 2, 0), ('b', 2, 1), ('c', 'd', 0), ('c', 'e', 0), ('c', 'e', 1)]

In [8]: optree.tree_accessors(tree)
Out[8]:
[
    PyTreeAccessor(*['a'], (MappingEntry(key='a', type=<class 'dict'>),)),
    PyTreeAccessor(*['b'][0], (MappingEntry(key='b', type=<class 'dict'>), SequenceEntry(index=0, type=<class 'tuple'>))),
    PyTreeAccessor(*['b'][1].x, (MappingEntry(key='b', type=<class 'dict'>), SequenceEntry(index=1, type=<class 'tuple'>), NamedTupleEntry(field='x', type=<class '__main__.MyTuple'>))),
    PyTreeAccessor(*['b'][1].y, (MappingEntry(key='b', type=<class 'dict'>), SequenceEntry(index=1, type=<class 'tuple'>), NamedTupleEntry(field='y', type=<class '__main__.MyTuple'>))),
    PyTreeAccessor(*['b'][1].z, (MappingEntry(key='b', type=<class 'dict'>), SequenceEntry(index=1, type=<class 'tuple'>), NamedTupleEntry(field='z', type=<class '__main__.MyTuple'>))),
    PyTreeAccessor(*['b'][2][0], (MappingEntry(key='b', type=<class 'dict'>), SequenceEntry(index=2, type=<class 'tuple'>), SequenceEntry(index=0, type=<class 'list'>))),
    PyTreeAccessor(*['b'][2][1], (MappingEntry(key='b', type=<class 'dict'>), SequenceEntry(index=2, type=<class 'tuple'>), SequenceEntry(index=1, type=<class 'list'>))),
    PyTreeAccessor(*['c']['d'][0], (MappingEntry(key='c', type=<class 'dict'>), MappingEntry(key='d', type=<class 'dict'>), SequenceEntry(index=0, type=<class 'tuple'>))),
    PyTreeAccessor(*['c']['e'][0], (MappingEntry(key='c', type=<class 'dict'>), MappingEntry(key='e', type=<class 'dict'>), SequenceEntry(index=0, type=<class 'list'>))),
    PyTreeAccessor(*['c']['e'][1], (MappingEntry(key='c', type=<class 'dict'>), MappingEntry(key='e', type=<class 'dict'>), SequenceEntry(index=1, type=<class 'list'>)))
]

Here is a question about is_leaf=is_dict, what if I want to keep the same leaf/node definitions? As I understand, this may alter the structure of the whole tree?

Changing the key of the dictionary already changed the tree structure. This is_leaf function should be used with the recursion together in the map function as you mentioned.

@XuehaiPan XuehaiPan changed the title [Feature Request] Add features or functions that could apply or map an operation to treespec instead of only node [Question] How to map an operation to treespec instead of only node Nov 16, 2024
@XuehaiPan XuehaiPan added question Further information is requested and removed enhancement New feature or request labels Nov 16, 2024
@XuehaiPan
Copy link
Member

It can be an arbitrary nested container or an opaque object.

Yeah, it's ok in my cases. It would raise a NotImplemented-like Error.

In your example, it is not well-defined to change the path if the entry is a sequence index.

Could you give me a more concrete example about input data? I think I will do nothing when the entry is a sequence index if I find treespec point it's a sequence like data container.

I'm changing this issue to Question instead of feature request because it is difficult to give an API definition to general pytree (beyond nested dictionary).

@lqhuang
Copy link
Author

lqhuang commented Nov 17, 2024

I'm changing this issue to Question instead of feature request because it is difficult to give an API definition to general pytree (beyond nested dictionary).

Sorry, I will try to give more basic examples and

I'm still thinking the concrete signature.

try to write first sensible function signature.

Thanks for your kind help

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

No branches or pull requests

2 participants