
Moderately large BTrees fail to pickle with RuntimeError: maximum recursion depth exceeded #44

Open

jamadden (Member) opened this issue Aug 25, 2016 · 10 comments

Example:

Python 3.4.5 (default, Jun 27 2016, 04:57:21)
[GCC 4.2.1 Compatible Apple LLVM 7.0.2 (clang-700.1.81)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import BTrees, pickle
>>> bt = BTrees.LOBTree.BTree()
>>> for i in range(15000): bt[i] = i
...
>>> pickle.dumps(bt)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: maximum recursion depth exceeded while calling a Python object

You can of course call sys.setrecursionlimit to make this go away...up to a point, at which time your Python segfaults with something like this (I think _pickle.so here is from zodbpickle):

Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0   org.python.python               0x0000000105f41b1e _PyUnicode_AsUTF8String + 26
1   org.python.python               0x0000000105f416f5 PyUnicode_AsEncodedString + 409
2   cPersistence.so                 0x0000000107981978 Per_getattro + 232
3   org.python.python               0x0000000105ef2054 PyObject_CallMethodObjArgs + 124
4   cPersistence.so                 0x0000000107980050 pickle___reduce__ + 144
5   org.python.python               0x0000000105ef19f2 PyObject_Call + 103
6   org.python.python               0x0000000105f899d2 PyEval_CallObjectWithKeywords + 165
7   org.python.python               0x0000000105f3b256 object_reduce_ex + 196
8   org.python.python               0x0000000105ef19f2 PyObject_Call + 103
9   _pickle.so                      0x0000000107e3a71d _Pickle_FastCall + 51
10  _pickle.so                      0x0000000107e394a6 save + 4598
11  _pickle.so                      0x0000000107e3b8b0 store_tuple_elements + 59
12  _pickle.so                      0x0000000107e39003 save + 3411
13  _pickle.so                      0x0000000107e3ab80 save_reduce + 1068
14  _pickle.so                      0x0000000107e39a1d save + 5997
15  _pickle.so                      0x0000000107e3b8b0 store_tuple_elements + 59
16  _pickle.so                      0x0000000107e39003 save + 3411
17  _pickle.so                      0x0000000107e3ab80 save_reduce + 1068
18  _pickle.so                      0x0000000107e39a1d save + 5997
19  _pickle.so                      0x0000000107e3b8b0 store_tuple_elements + 59
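
For illustration, a minimal sketch of the sys.setrecursionlimit workaround mentioned above (the limit value is arbitrary); whether it completes or crashes for a given tree depends on the platform's C stack, so it only defers the problem:

import sys
import pickle
import BTrees

bt = BTrees.LOBTree.BTree()
for i in range(15000):
    bt[i] = i

# Raising the interpreter's limit may let this particular pickle finish,
# but every bucket still consumes real C stack inside the pickler; push
# the tree (or the limit) far enough and the process segfaults instead
# of raising RuntimeError, as in the crash report above.
sys.setrecursionlimit(100000)
data = pickle.dumps(bt)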

The exact number of items it takes for this to happen depends on the BTree type; an IIBTree, for example, can hold more items before it gets here (different bucket sizes?). I suspect it also depends on the system and the word size. On my system the limit is somewhere under 15,000 items for the LOBTree above (where even the values are primitive types).

This is a consequence of the way a BTree is composed of buckets, each of which references the next bucket in its state, and all of that stored state boils down to nested tuples:

>>> state = bt.__getstate__()
>>> len(state), type(state)
(2, <class 'tuple'>)
>>> type(state[0][0]), type(state[0][0].__getstate__()[0])
(<class 'BTrees.LOBTree.LOBucket'>, <class 'tuple'>)

This works out fine when used in the context of ZODB because the sub-buckets are replaced with persistent object identifiers. It just means that a large-ish BTree can't be pickled outside of ZODB or some similar system.
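
To make the bucket chain concrete, here's a small sketch that walks it iteratively; it relies only on the __getstate__ layout shown above (an implementation detail, not a documented API), in which a multi-bucket tree's state ends with the first bucket and each bucket's state optionally ends with the next bucket:

def bucket_chain_length(tree):
    # State of a multi-bucket tree: (children, firstbucket).
    state = tree.__getstate__()
    bucket = state[1] if state is not None and len(state) > 1 else None
    length = 0
    while bucket is not None:
        length += 1
        # State of a bucket: (items,) for the last one, else (items, next_bucket).
        bucket_state = bucket.__getstate__()
        bucket = bucket_state[1] if len(bucket_state) > 1 else None
    return length

The chain length grows linearly with the number of items, and the pickler spends several stack frames per bucket along it, so the default recursion limit of 1000 is exceeded well before the tree itself is very large.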

I doubt anything can be done about this without major redesign and breaking compatibility, but maybe someone will have an idea. And maybe it's worth a mention in some docs somewhere? (I also wanted to leave this here for Google's sake.)

(For what it's worth, this isn't actually a problem for me. It came up in the context of testing some cache persistence strategies for RelStorage. Pickling a single large dict with ~600K keys in it is very slow prior to pickle protocol 4, so I wondered if BTrees might work better. Because of this issue they didn't, and since I didn't want to roll a mini persistent-object system, I went a different direction.)

fgregg (Contributor) commented Oct 27, 2016

I'm running into this right now. There was some work, eight years ago, to have pickle not use Python's call stack, but instead use a deque object. This would have allowed pickling of deeply nested structures. http://bugs.python.org/issue3119

I'm looking to see if anyone has an implementation of this idea, and I'll update this issue if I find one.

fgregg (Contributor) commented Oct 27, 2016

Alternately, it might be possible to take an approach similar to https://github.com/google/pygtrie/blob/master/pygtrie.py#L165-L252, which flattens out the structure in __getstate__ and rehydrates it in __setstate__.
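
As a rough sketch of that idea (the subclass and rebuild helper names are hypothetical, and this deliberately ignores whatever the nested state buys ZODB): pickle the tree as a flat tuple of items so the pickler never walks the bucket chain.

import BTrees

def _rebuild_lobtree(items):
    # Rebuild the tree from the flat sequence of (key, value) pairs.
    tree = PicklableLOBTree()
    for key, value in items:
        tree[key] = value
    return tree

class PicklableLOBTree(BTrees.LOBTree.LOBTree):
    def __reduce__(self):
        # Flatten to a tuple of items instead of the nested bucket state.
        return (_rebuild_lobtree, (tuple(self.items()),))

With this, the 15,000-item example at the top of the issue pickles without hitting the recursion limit, at the cost of materializing all the items at once (and _rebuild_lobtree has to be importable wherever the pickle is loaded).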

@jamadden, @tseaver would you all consider such an approach? It would be very good for folks like me who want to use BTrees outside of Zope. But I'm not sure if it would be a meaningful deoptimization for Zope.

If you think it's a possible path, I might start working on a PR.

tseaver (Member) commented Oct 27, 2016

@fgregg I would be worried about any "intrusive" change to make pickling outside ZODB work: the machinery is tricky. Perhaps another tree implementation would be a better choice? E.g.: bintrees.

fgregg (Contributor) commented Oct 27, 2016

That makes complete sense, @tseaver. Because I am taking advantage of some of the facilities of zope.index, I might pursue a fork of BTrees. If I get something working, we can revisit the question with concrete examples.

jimfulton (Member) commented

I wonder what use case is motivating this.

fgregg (Contributor) commented Nov 11, 2016

Hi Jim, Here's my use case:
http://dedupe.readthedocs.io/en/latest/Making-smart-comparisons.html#index-blocks


jimfulton (Member) commented

There's nothing in the document that indicates to me a need to pickle a BTree.

Is there some implicit assumption that you aren't using ZODB? If so, why dat? :)

fgregg (Contributor) commented Nov 11, 2016

No, I'm not using ZODB, because I've never gotten it to work with my unusual use case.


jimfulton (Member) commented

That's a bummer. If you feel like mentioning the issues you ran into, maybe on the ZODB list, I'd be interested to see if I can help.

ZODB (or some of its machinery) seems like a good fit for this use case, as it inherently breaks BTrees up and pickles the parts separately. It's a little hard to say, though, because I don't know what you're doing with the pickle.
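
For reference, a minimal sketch of that route (a plain FileStorage; the 'data.fs' path and root key are arbitrary), in which each bucket is stored as its own record so no single pickle is deep:

import transaction
import BTrees
import ZODB, ZODB.FileStorage

db = ZODB.DB(ZODB.FileStorage.FileStorage('data.fs'))
conn = db.open()
root = conn.root()

root['index'] = bt = BTrees.LOBTree.BTree()
for i in range(15000):
    bt[i] = i

# On commit, the tree and each of its buckets are written as separate
# persistent records that reference one another by oid, so nothing
# recurses through the whole structure at once.
transaction.commit()
db.close()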

jimfulton (Member) commented

If ZODB isn't involved, is it practical to pickle BTree items?
