[Question] How to map an operation to `treespec` instead of only `node` #173

lqhuang · 2024-11-15T13:59:26Z

Required prerequisites

I have searched the Issue Tracker that this hasn't already been reported. (comment there if it has.)

Motivation

I'm designing a lib for serializing and deserializing Python structured data class (like dataclass or attrs). I realized the idea and operations defined by PyTree are especially suitable for this purpose.

Basically, serializing a Python container into a specific protocol (like JSON or YAML) is flattening a custom data container into primitive and then reconstruct back to serializable native container like dict/list/tuple`.

That's totally the same with what tree_map did (flatten → traverse → unflatten) but with customized flatten and unflatten functions.

And in previous implementations (see xattrs _struct_funcs.py#L61 or attrs _funcs.py#L11), it strongly depends on explicit recursions, which are quite cumbersome and hard to debug. I think the PyTree way will make all these easier.

I have researched all major PyTree implementations, even thinking about writing it by myself, but I found optree now (@XuehaiPan excellent)! Unfortunately, in my practice, a real serde system is often require doing some transforms on key fields, but current operations don't support this feature yet.

def asdict(
    inst: Any,
    *,
    dict_factory: type[Mapping] = dict,
    filter=None,
    key_serializer=None,
    value_serializer=None,
    copy=deepcopy,
) -> Mapping[Hashable, Any]:
    ...

For example, in the above codes, key_serializer would be applied on key (treespec) to transform case (e.g.: from full_name to fullName), and value_serializer would be applied on value (all leaves or nodes) to decide how value to be serialized.

Solution

Add some functions like optree.tree_map_on_treespec?

I'm still thinking the concrete signature.

Alternatives

No response

Additional context

I know it's an unusual or even “unused” feature (for its original purpose). If you would like to accept the proposal, I'm glad to try to implement first MVP under your guidance. And I need to know where should I start to investigate first.

The text was updated successfully, but these errors were encountered:

XuehaiPan · 2024-11-15T14:18:13Z

Hi @lqhuang, the treespec object is hashable and pickleable. I do not understand the obstacle of serializing a pytree very well.

In [1]: import optree

In [2]: treespec = optree.tree_structure({'a': 1, 'b': 2, 'c': {'d': 3, 'e': 4}})

In [3]: treespec
Out[3]: PyTreeSpec({'a': *, 'b': *, 'c': {'d': *, 'e': *}})

In [4]: hash(treespec)
Out[4]: 2985958577142014292

In [5]: import pickle

In [6]: pickle.dumps(treespec)
Out[6]: b'\x80\x04\x95\xaa\x00\x00\x00\x00\x00\x00\x00\x8c\x06optree\x94\x8c\nPyTreeSpec\x94\x93\x94)\x81\x94((K\x01K\x00NNNK\x01K\x01Nt\x94(K\x01K\x00NNNK\x01K\x01Nt\x94(K\x01K\x00NNNK\x01K\x01Nt\x94(K\x01K\x00NNNK\x01K\x01Nt\x94(K\x05K\x02]\x94(\x8c\x01d\x94\x8c\x01e\x94eNNK\x02K\x03]\x94(h\th\net\x94(K\x05K\x03]\x94(\x8c\x01a\x94\x8c\x01b\x94\x8c\x01c\x94eNNK\x04K\x06]\x94(h\x0eh\x0fh\x10et\x94t\x94\x89\x8c\x00\x94\x87\x94b.'

In [7]: pickle.loads(pickle.dumps(treespec))
Out[7]: PyTreeSpec({'a': *, 'b': *, 'c': {'d': *, 'e': *}})

Also, have you ever tried the paths and accessors, they are also hashable and pickleable.

In [8]: treespec.paths()
Out[8]: [('a',), ('b',), ('c', 'd'), ('c', 'e')]

In [9]: treespec.accessors()
Out[9]: 
[
    PyTreeAccessor(*['a'], (MappingEntry(key='a', type=<class 'dict'>),)),
    PyTreeAccessor(*['b'], (MappingEntry(key='b', type=<class 'dict'>),)),
    PyTreeAccessor(*['c']['d'], (MappingEntry(key='c', type=<class 'dict'>), MappingEntry(key='d', type=<class 'dict'>))),
    PyTreeAccessor(*['c']['e'], (MappingEntry(key='c', type=<class 'dict'>), MappingEntry(key='e', type=<class 'dict'>)))
]

lqhuang · 2024-11-16T07:42:14Z

@XuehaiPan Hey, Yeah, I don't want to just serialize a PyTree, yeah it's not an obstacle.

I want to a tree_map-like functions working on "key". And as you showed in reply, I thought it's probably already implemented, which named as tree_map_with_accessor / tree_map_with_paths.

I will try it later. Thanks for your hints! 🥹

Best wishes. Has a happy weekend!

lqhuang · 2024-11-16T07:52:32Z

I have read the documents and example of https://optree.readthedocs.io/en/latest/ops.html#optree.tree_map_with_accessor. I think what I need is a little slightly different with tree_map_with_accessor.

Likewise, Here is an example for tree_map_with_key (function name may update in the future):

>>> tree_map_with_key(lambda k, x: k * 2, {'x': 7, 'y': (42, 64), 'z': None})
{'xx': 8, 'yy': (42, 64), 'zz': None}

That means the function or the operator is applied on treespec / keys / paths.

XuehaiPan · 2024-11-16T10:33:58Z

One thing I want to point out is that a pytree is not narrowed to the nested dictionary. It can be an arbitrary nested container or an opaque object. In your example, it is not well-defined to change the path if the entry is a sequence index.

To change the dictionary keys, you can do:

In [1]: import optree

In [2]: tree = {'a': 1, 'b': 2, 'c': {'d': 3, 'e': 4}}

In [3]: def double(x):
   ...:     return x * 2
   ...:     

In [4]: def is_dict(x):
   ...:     return type(x) is dict
   ...:     

In [5]: def double_dict_keys(x):
   ...:     if is_dict(x):
   ...:         return {double(k): double_dict_keys(v) for k, v in x.items()}
   ...:     return x
   ...:     

In [6]: optree.tree_map(double_dict_keys, tree, is_leaf=is_dict)
Out[6]: {'aa': 1, 'bb': 2, 'cc': {'dd': 3, 'ee': 4}}

lqhuang · 2024-11-16T14:34:47Z

It can be an arbitrary nested container or an opaque object.

Yeah, it's ok in my cases. It would raise a NotImplemented-like Error.

In your example, it is not well-defined to change the path if the entry is a sequence index.

Could you give me a more concrete example about input data? I think I will do nothing when the entry is a sequence index if I find treespec point it's a sequence like data container.

optree.tree_map(double_dict_keys, tree, is_leaf=is_dict)

Here is a question about is_leaf=is_dict, what if I want to keep the same leaf/node definitions? As I understand, this may alter the structure of the whole tree?

And return {double(k): double_dict_keys(v) for k, v in x.items()}, we look like back to recursion again.

Thanks for your demo for changing the dictionary keys, good example!

XuehaiPan · 2024-11-16T16:10:56Z

Could you give me a more concrete example about input data? I think I will do nothing when the entry is a sequence index if I find treespec point it's a sequence like data container.

In [1]: import optree

In [2]: import collections

In [3]: MyTuple = collections.namedtuple('MyTuple', ['x', 'y', 'z'])

In [4]: tree = {'a': 1, 'b': (2, MyTuple(3, 4, 5), [6, 7]), 'c': {'d': (8,), 'e': [9, 0]}}

In [5]: optree.tree_structure(tree)
Out[5]: PyTreeSpec({'a': *, 'b': (*, MyTuple(x=*, y=*, z=*), [*, *]), 'c': {'d': (*,), 'e': [*, *]}})

In [6]: optree.tree_leaves(tree)
Out[6]: [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]

In [7]: optree.tree_paths(tree)
Out[7]: [('a',), ('b', 0), ('b', 1, 0), ('b', 1, 1), ('b', 1, 2), ('b', 2, 0), ('b', 2, 1), ('c', 'd', 0), ('c', 'e', 0), ('c', 'e', 1)]

In [8]: optree.tree_accessors(tree)
Out[8]:
[
    PyTreeAccessor(*['a'], (MappingEntry(key='a', type=<class 'dict'>),)),
    PyTreeAccessor(*['b'][0], (MappingEntry(key='b', type=<class 'dict'>), SequenceEntry(index=0, type=<class 'tuple'>))),
    PyTreeAccessor(*['b'][1].x, (MappingEntry(key='b', type=<class 'dict'>), SequenceEntry(index=1, type=<class 'tuple'>), NamedTupleEntry(field='x', type=<class '__main__.MyTuple'>))),
    PyTreeAccessor(*['b'][1].y, (MappingEntry(key='b', type=<class 'dict'>), SequenceEntry(index=1, type=<class 'tuple'>), NamedTupleEntry(field='y', type=<class '__main__.MyTuple'>))),
    PyTreeAccessor(*['b'][1].z, (MappingEntry(key='b', type=<class 'dict'>), SequenceEntry(index=1, type=<class 'tuple'>), NamedTupleEntry(field='z', type=<class '__main__.MyTuple'>))),
    PyTreeAccessor(*['b'][2][0], (MappingEntry(key='b', type=<class 'dict'>), SequenceEntry(index=2, type=<class 'tuple'>), SequenceEntry(index=0, type=<class 'list'>))),
    PyTreeAccessor(*['b'][2][1], (MappingEntry(key='b', type=<class 'dict'>), SequenceEntry(index=2, type=<class 'tuple'>), SequenceEntry(index=1, type=<class 'list'>))),
    PyTreeAccessor(*['c']['d'][0], (MappingEntry(key='c', type=<class 'dict'>), MappingEntry(key='d', type=<class 'dict'>), SequenceEntry(index=0, type=<class 'tuple'>))),
    PyTreeAccessor(*['c']['e'][0], (MappingEntry(key='c', type=<class 'dict'>), MappingEntry(key='e', type=<class 'dict'>), SequenceEntry(index=0, type=<class 'list'>))),
    PyTreeAccessor(*['c']['e'][1], (MappingEntry(key='c', type=<class 'dict'>), MappingEntry(key='e', type=<class 'dict'>), SequenceEntry(index=1, type=<class 'list'>)))
]

Here is a question about is_leaf=is_dict, what if I want to keep the same leaf/node definitions? As I understand, this may alter the structure of the whole tree?

Changing the key of the dictionary already changed the tree structure. This is_leaf function should be used with the recursion together in the map function as you mentioned.

XuehaiPan · 2024-11-16T16:13:51Z

It can be an arbitrary nested container or an opaque object.

Yeah, it's ok in my cases. It would raise a NotImplemented-like Error.

In your example, it is not well-defined to change the path if the entry is a sequence index.

Could you give me a more concrete example about input data? I think I will do nothing when the entry is a sequence index if I find treespec point it's a sequence like data container.

I'm changing this issue to Question instead of feature request because it is difficult to give an API definition to general pytree (beyond nested dictionary).

lqhuang · 2024-11-17T02:30:31Z

I'm changing this issue to Question instead of feature request because it is difficult to give an API definition to general pytree (beyond nested dictionary).

Sorry, I will try to give more basic examples and

I'm still thinking the concrete signature.

try to write first sensible function signature.

Thanks for your kind help

lqhuang added the enhancement New feature or request label Nov 15, 2024

lqhuang closed this as completed Nov 16, 2024

lqhuang reopened this Nov 16, 2024

XuehaiPan changed the title ~~[Feature Request] Add features or functions that could apply or map an operation to treespec instead of only node~~ [Question] How to map an operation to treespec instead of only node Nov 16, 2024

XuehaiPan added question Further information is requested and removed enhancement New feature or request labels Nov 16, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Question] How to map an operation to `treespec` instead of only `node` #173

[Question] How to map an operation to `treespec` instead of only `node` #173

lqhuang commented Nov 15, 2024 •

edited

Loading

XuehaiPan commented Nov 15, 2024 •

edited

Loading

lqhuang commented Nov 16, 2024 •

edited

Loading

lqhuang commented Nov 16, 2024 •

edited

Loading

XuehaiPan commented Nov 16, 2024 •

edited

Loading

lqhuang commented Nov 16, 2024 •

edited

Loading

XuehaiPan commented Nov 16, 2024 •

edited

Loading

XuehaiPan commented Nov 16, 2024

lqhuang commented Nov 17, 2024

[Question] How to map an operation to treespec instead of only node #173

[Question] How to map an operation to treespec instead of only node #173

Comments

lqhuang commented Nov 15, 2024 • edited Loading

Required prerequisites

Motivation

Solution

Alternatives

Additional context

XuehaiPan commented Nov 15, 2024 • edited Loading

lqhuang commented Nov 16, 2024 • edited Loading

lqhuang commented Nov 16, 2024 • edited Loading

XuehaiPan commented Nov 16, 2024 • edited Loading

lqhuang commented Nov 16, 2024 • edited Loading

XuehaiPan commented Nov 16, 2024 • edited Loading

XuehaiPan commented Nov 16, 2024

lqhuang commented Nov 17, 2024

[Question] How to map an operation to `treespec` instead of only `node` #173

[Question] How to map an operation to `treespec` instead of only `node` #173

lqhuang commented Nov 15, 2024 •

edited

Loading

XuehaiPan commented Nov 15, 2024 •

edited

Loading

lqhuang commented Nov 16, 2024 •

edited

Loading

lqhuang commented Nov 16, 2024 •

edited

Loading

XuehaiPan commented Nov 16, 2024 •

edited

Loading

lqhuang commented Nov 16, 2024 •

edited

Loading

XuehaiPan commented Nov 16, 2024 •

edited

Loading