TypeError with OOBTree.getitem for composed keys containing None with py3 #74

ml31415 · 2017-07-16T21:40:40Z

from BTrees.OOBTree import OOBTree
keys = [('a', 100), ('b', 200), ('c', None)]

tree = OOBTree()
for i, key in enumerate(keys):
    tree[key] = i

tree[('c', None)] # works
tree[('a', None)] # TypeError
tree[(None, None)] # TypeError

I recently migrated from py2, where the above code worked fine. With py3, the more strict comparison seems to cause some trouble. Tested with the latest version from pypi.

ml31415 · 2017-07-22T14:14:51Z

I'm a bit confused by the comments in the test suite:

        # Check that a None key can be deleted in Python 2.
        # This doesn't work on Python 3 because None is unorderable,
        # so the tree can't be searched. But None also can't be inserted,
        # and we don't support migrating Python 2 databases to Python 3.

This is stated in a test function, which is explicitly run for py2 only. While at the same time in the very same file some lines above, there are a couple of tests using None as a key which are done for py2 as well as for py3. So I can add None as a key, but it doesn't get tested, if I can delete it?! I guess some more consistent behaviour would be desireable in general. Also, not supporting py2 databases is something I'd hope should be considered last resort.

To add some more incentives to fix this issue, I'd set a bug bounty of $300 to get this sorted out in a consistent way.

jamadden · 2017-07-22T14:21:12Z

Those comments are presumably outdated. Support for None was added back to BTrees in 4.4.0, and I'm guessing those comments weren't updated.

The support for None was special cased to make Python 3 behave like Python 2.

In general, though, I don't know what we can do about arbitrary unorderable keys. It doesn't seem practical to try to special case tuples of any length that may have None at any position.

>>> sys.version_info
sys.version_info(major=3, minor=6, micro=1, releaselevel='final', serial=0)
>>> (1, None) < (None, 1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: '<' not supported between instances of 'int' and 'NoneType'

ml31415 · 2017-07-22T14:35:00Z

Would it be possible, to let the user provide a custom compare function? So it might be possible, to recreate the previous behaviour, even if it may be rather slow.

jamadden · 2017-07-22T14:52:40Z

Wouldn't a "custom compare function" just be a class implementing __cmp__? Then you can make sure it works the same in both Python 2 and 3.

Note that it's possible to implement __cmp__ in a collections.namedtuple subclass IIRC.

jamadden · 2017-07-22T15:03:45Z

tree[('c', None)] # works
tree[('a', None)] # TypeError

This is another example of #52 (empty tree will accept a non-orderable type, but non-empty trees won't).

mgedmin · 2017-07-22T15:05:12Z

Nitpick: there's no __cmp__ on Python 3, you have to implement __lt__ and all the other rich comparison functions (usuall the help of functools.total_ordering).

ml31415 · 2017-07-22T15:06:09Z

I'd rather like to keep the original None and native tuples in there and not make a custom comparable class out of it, in order to remain backwards compatible. Also, I consider the pickling of custom classes, even if comfortable in the beginning, as very painful in the long run, as it makes code refactoring always require database updates. And the only way that I see to avoid that, is to work with inbuilt types only. Providing a sort key on the other hand is rather wide spread throughout the whole standard lib.

jamadden · 2017-07-22T22:16:59Z

There are three ways I have thought of to provide sort keys, and they all have drawbacks as far as BTrees itself goes.

I'll prefix this by saying that I feel that providing a comparison function (cmp) is right out. That pattern is deprecated in Python 2 and gone in Python 3 (as @mdedmin reminded us), and it would be unPythonic to try to revive it publicly.

So, implementations. The first way is to copy sorted and list.sort and accept a key function for everything that might need to do some comparisons. That includes both __setitem__ and __getitem__ for the tree[key] and tree[key] = value cases. As BTree is a dict-like object, and a dict-like object only accepts one parameter for those two methods, that just can't work, unless you start down the path of "magic" key values that the BTree knows to inspect and split into a key and a function pair...and nobody wants to deal with that. Nobody wants to deal with having to pass a key function every time they try to subscript a tree either.

The second way would be to make the key function a property of the BTree itself. That avoids the dict-like issues, it's true. But lambda functions can't be pickled, so the key function would have to be named as a module global---and at that point you're right back to "it makes code refactoring always require database updates" (which isn't strictly true, there are ways to manage that pretty cleanly, starting with leaving BWC imports in place moving up to zope.deprecation). It's not clear to me that a module global function is any better than a module global class with custom comparison functions anyway. (This approach has the secondary problem of needing a new pickle format for the tree, which is complicated but possible, I think.)

The last way I thought of is to define a _sort_key instance method for OOBTree and have it internally be called every time the BTree needed to do a comparison (which is a lot). Then you could subclass the OOBTree and override that method. This has performance implications even for those trees that don't use it, which is not good (the C implementation uses a macro to inline the call to the C API comparison function, and this would add the substantial overhead of calling a python function, and it's non-trivial to make sure it's called the absolute minimum number of times; the previous approach could probably do a null-pointer check to avoid this overhead but I don't think that's possible with a potential subclass method). Plus, that's right back to the "it makes code refactoring always require database updates" concern, and once again, it's not strictly clear to me how that's any kind of an improvement over just implementing the comparison in the datatype---surely if they're meant to be sortable in a BTree, it would be good to be able to sort them other places too?

The BTree approach (and indeed, the most general Python approach) to sorting non-orderable objects is to have the user define the order by implementing the appropriate comparison methods. That seems more sound to me than any of the options I discussed above.

If one can accept having custom objects live in the ZODB, then I think code compatibility can easily be kept by using a wrapper (similar to zope.container) around the OOBTree that checks incoming keys to see if they are the appropriate namedtuple comparable subclass, and if not, converting them.

ml31415 · 2017-07-23T18:08:19Z

Big thanks for your long discussion of this issue! I'm afraid I have to agree with you, that the cleanest way would be sortable objects then. In that case, there is not much more to talk about for this issue, except that it's rather unintuitive that the first tuple with None gets accepted.

jamadden · 2017-07-24T13:25:46Z

it's rather unintuitive that the first tuple with None gets accepted

I was thinking about that some. The only reason the first tuple got accepted was because it was automatically already higher in sort order than anything else in the tree simply based on the first element (my comment about this being like #52 is incorrect). If the first tuple with None in it was equal to some other element in its first index, that wouldn't happen:

>>> from BTrees.OOBTree import OOBTree
>>> tree = OOBTree()
>>> tree[('a', 100)] = 1
>>> tree[('a', None)] = 2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unorderable types: int() < NoneType()

The same thing happens in reverse:

>>> tree = OOBTree()
>>> tree[('a', None)] = 1
>>> tree[('a', 100)] = 2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unorderable types: NoneType() < int()

In other words, this is still a continuing result of mixing objects that aren't mutually comparable in the same tree. Because tuples lazily only compare as many elements as necessary to determine the result it can show up at any point, just depending on the data.

>>> tree = OOBTree()
>>> tree[(None, 100)] = 1
>>> tree[('a', 100)] = 2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unorderable types: NoneType() < str()

So maybe it's the tuple's behaviour that's unintuitive 😉

>>> ('a', 100) < ('a',)
False
>>> ('a', 100) < ('a', None)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unorderable types: int() < NoneType()

mgedmin · 2017-07-24T14:13:44Z

Python has a surprising number of footguns for a safe language. :/

ml31415 · 2017-07-25T06:08:03Z

I fully agree, that the tuple behaviour is unintuitive and imho not a good fit for a DB key. Some custom tuple class, fixing length to a certain value, and providing comparison for occuring Nones might be the solution for that.

jamadden · 2017-07-25T11:14:41Z

Cool. Are we ready to close this issue then?

ml31415 mentioned this issue Jul 16, 2017

TypeError with OOTree.__getitem__ for composed keys containing None with py3 zopefoundation/ZODB#167

Closed

ml31415 closed this as completed Jul 26, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

TypeError with OOBTree.getitem for composed keys containing None with py3 #74

TypeError with OOBTree.getitem for composed keys containing None with py3 #74

ml31415 commented Jul 16, 2017 •

edited by jamadden

Loading

ml31415 commented Jul 22, 2017 •

edited

Loading

jamadden commented Jul 22, 2017

ml31415 commented Jul 22, 2017

jamadden commented Jul 22, 2017 •

edited

Loading

jamadden commented Jul 22, 2017

mgedmin commented Jul 22, 2017

ml31415 commented Jul 22, 2017

jamadden commented Jul 22, 2017

ml31415 commented Jul 23, 2017 •

edited

Loading

jamadden commented Jul 24, 2017

mgedmin commented Jul 24, 2017

ml31415 commented Jul 25, 2017

jamadden commented Jul 25, 2017

TypeError with OOBTree.__getitem__ for composed keys containing None with py3 #74

TypeError with OOBTree.__getitem__ for composed keys containing None with py3 #74

Comments

ml31415 commented Jul 16, 2017 • edited by jamadden Loading

ml31415 commented Jul 22, 2017 • edited Loading

jamadden commented Jul 22, 2017

ml31415 commented Jul 22, 2017

jamadden commented Jul 22, 2017 • edited Loading

jamadden commented Jul 22, 2017

mgedmin commented Jul 22, 2017

ml31415 commented Jul 22, 2017

jamadden commented Jul 22, 2017

ml31415 commented Jul 23, 2017 • edited Loading

jamadden commented Jul 24, 2017

mgedmin commented Jul 24, 2017

ml31415 commented Jul 25, 2017

jamadden commented Jul 25, 2017

TypeError with OOBTree.getitem for composed keys containing None with py3 #74

TypeError with OOBTree.getitem for composed keys containing None with py3 #74

ml31415 commented Jul 16, 2017 •

edited by jamadden

Loading

ml31415 commented Jul 22, 2017 •

edited

Loading

jamadden commented Jul 22, 2017 •

edited

Loading

ml31415 commented Jul 23, 2017 •

edited

Loading