Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TypeError with OOBTree.__getitem__ for composed keys containing None with py3 #74

Closed
ml31415 opened this issue Jul 16, 2017 · 13 comments
Closed

Comments

@ml31415
Copy link

ml31415 commented Jul 16, 2017

from BTrees.OOBTree import OOBTree
keys = [('a', 100), ('b', 200), ('c', None)]

tree = OOBTree()
for i, key in enumerate(keys):
    tree[key] = i

tree[('c', None)] # works
tree[('a', None)] # TypeError
tree[(None, None)] # TypeError

I recently migrated from py2, where the above code worked fine. With py3, the more strict comparison seems to cause some trouble. Tested with the latest version from pypi.

@ml31415
Copy link
Author

ml31415 commented Jul 22, 2017

I'm a bit confused by the comments in the test suite:

        # Check that a None key can be deleted in Python 2.
        # This doesn't work on Python 3 because None is unorderable,
        # so the tree can't be searched. But None also can't be inserted,
        # and we don't support migrating Python 2 databases to Python 3.

This is stated in a test function, which is explicitly run for py2 only. While at the same time in the very same file some lines above, there are a couple of tests using None as a key which are done for py2 as well as for py3. So I can add None as a key, but it doesn't get tested, if I can delete it?! I guess some more consistent behaviour would be desireable in general. Also, not supporting py2 databases is something I'd hope should be considered last resort.

To add some more incentives to fix this issue, I'd set a bug bounty of $300 to get this sorted out in a consistent way.

@jamadden
Copy link
Member

Those comments are presumably outdated. Support for None was added back to BTrees in 4.4.0, and I'm guessing those comments weren't updated.

The support for None was special cased to make Python 3 behave like Python 2.

In general, though, I don't know what we can do about arbitrary unorderable keys. It doesn't seem practical to try to special case tuples of any length that may have None at any position.

>>> sys.version_info
sys.version_info(major=3, minor=6, micro=1, releaselevel='final', serial=0)
>>> (1, None) < (None, 1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: '<' not supported between instances of 'int' and 'NoneType'

@ml31415
Copy link
Author

ml31415 commented Jul 22, 2017

Would it be possible, to let the user provide a custom compare function? So it might be possible, to recreate the previous behaviour, even if it may be rather slow.

@jamadden
Copy link
Member

jamadden commented Jul 22, 2017

Wouldn't a "custom compare function" just be a class implementing __cmp__? Then you can make sure it works the same in both Python 2 and 3.

Note that it's possible to implement __cmp__ in a collections.namedtuple subclass IIRC.

@jamadden
Copy link
Member

tree[('c', None)] # works
tree[('a', None)] # TypeError

This is another example of #52 (empty tree will accept a non-orderable type, but non-empty trees won't).

@mgedmin
Copy link
Member

mgedmin commented Jul 22, 2017

Nitpick: there's no __cmp__ on Python 3, you have to implement __lt__ and all the other rich comparison functions (usuall the help of functools.total_ordering).

@ml31415
Copy link
Author

ml31415 commented Jul 22, 2017

I'd rather like to keep the original None and native tuples in there and not make a custom comparable class out of it, in order to remain backwards compatible. Also, I consider the pickling of custom classes, even if comfortable in the beginning, as very painful in the long run, as it makes code refactoring always require database updates. And the only way that I see to avoid that, is to work with inbuilt types only. Providing a sort key on the other hand is rather wide spread throughout the whole standard lib.

@jamadden
Copy link
Member

There are three ways I have thought of to provide sort keys, and they all have drawbacks as far as BTrees itself goes.

I'll prefix this by saying that I feel that providing a comparison function (cmp) is right out. That pattern is deprecated in Python 2 and gone in Python 3 (as @mdedmin reminded us), and it would be unPythonic to try to revive it publicly.

So, implementations. The first way is to copy sorted and list.sort and accept a key function for everything that might need to do some comparisons. That includes both __setitem__ and __getitem__ for the tree[key] and tree[key] = value cases. As BTree is a dict-like object, and a dict-like object only accepts one parameter for those two methods, that just can't work, unless you start down the path of "magic" key values that the BTree knows to inspect and split into a key and a function pair...and nobody wants to deal with that. Nobody wants to deal with having to pass a key function every time they try to subscript a tree either.

The second way would be to make the key function a property of the BTree itself. That avoids the dict-like issues, it's true. But lambda functions can't be pickled, so the key function would have to be named as a module global---and at that point you're right back to "it makes code refactoring always require database updates" (which isn't strictly true, there are ways to manage that pretty cleanly, starting with leaving BWC imports in place moving up to zope.deprecation). It's not clear to me that a module global function is any better than a module global class with custom comparison functions anyway. (This approach has the secondary problem of needing a new pickle format for the tree, which is complicated but possible, I think.)

The last way I thought of is to define a _sort_key instance method for OOBTree and have it internally be called every time the BTree needed to do a comparison (which is a lot). Then you could subclass the OOBTree and override that method. This has performance implications even for those trees that don't use it, which is not good (the C implementation uses a macro to inline the call to the C API comparison function, and this would add the substantial overhead of calling a python function, and it's non-trivial to make sure it's called the absolute minimum number of times; the previous approach could probably do a null-pointer check to avoid this overhead but I don't think that's possible with a potential subclass method). Plus, that's right back to the "it makes code refactoring always require database updates" concern, and once again, it's not strictly clear to me how that's any kind of an improvement over just implementing the comparison in the datatype---surely if they're meant to be sortable in a BTree, it would be good to be able to sort them other places too?

The BTree approach (and indeed, the most general Python approach) to sorting non-orderable objects is to have the user define the order by implementing the appropriate comparison methods. That seems more sound to me than any of the options I discussed above.

If one can accept having custom objects live in the ZODB, then I think code compatibility can easily be kept by using a wrapper (similar to zope.container) around the OOBTree that checks incoming keys to see if they are the appropriate namedtuple comparable subclass, and if not, converting them.

@ml31415
Copy link
Author

ml31415 commented Jul 23, 2017

Big thanks for your long discussion of this issue! I'm afraid I have to agree with you, that the cleanest way would be sortable objects then. In that case, there is not much more to talk about for this issue, except that it's rather unintuitive that the first tuple with None gets accepted.

@jamadden
Copy link
Member

it's rather unintuitive that the first tuple with None gets accepted

I was thinking about that some. The only reason the first tuple got accepted was because it was automatically already higher in sort order than anything else in the tree simply based on the first element (my comment about this being like #52 is incorrect). If the first tuple with None in it was equal to some other element in its first index, that wouldn't happen:

>>> from BTrees.OOBTree import OOBTree
>>> tree = OOBTree()
>>> tree[('a', 100)] = 1
>>> tree[('a', None)] = 2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unorderable types: int() < NoneType()

The same thing happens in reverse:

>>> tree = OOBTree()
>>> tree[('a', None)] = 1
>>> tree[('a', 100)] = 2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unorderable types: NoneType() < int()

In other words, this is still a continuing result of mixing objects that aren't mutually comparable in the same tree. Because tuples lazily only compare as many elements as necessary to determine the result it can show up at any point, just depending on the data.

>>> tree = OOBTree()
>>> tree[(None, 100)] = 1
>>> tree[('a', 100)] = 2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unorderable types: NoneType() < str()

So maybe it's the tuple's behaviour that's unintuitive 😉

>>> ('a', 100) < ('a',)
False
>>> ('a', 100) < ('a', None)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unorderable types: int() < NoneType()

@mgedmin
Copy link
Member

mgedmin commented Jul 24, 2017

Python has a surprising number of footguns for a safe language. :/

@ml31415
Copy link
Author

ml31415 commented Jul 25, 2017

I fully agree, that the tuple behaviour is unintuitive and imho not a good fit for a DB key. Some custom tuple class, fixing length to a certain value, and providing comparison for occuring Nones might be the solution for that.

@jamadden
Copy link
Member

Cool. Are we ready to close this issue then?

@ml31415 ml31415 closed this as completed Jul 26, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants