Asyncio support #311

Open
penn5 opened this issue Jun 3, 2020 · 15 comments

@penn5

penn5 commented Jun 3, 2020

Are there any plans for ZODB to support asynchronous reads and writes from the database with asyncio?

@penn5
Author

penn5 commented Jun 3, 2020

Okay, I found #53. However, I'm not planning to use ZODB in a Zope project but something completely unrelated, which uses asyncio. From my understanding, in the current state this would break:

  • transactions, which are thread-local (explicit declaration will bypass this)
  • the async-ness, because committing a transaction is blocking, and reading a large object from the DB can sometimes block too

@jamadden
Member

jamadden commented Jun 3, 2020

#53 discusses some of the practical issues with asyncio. More fundamentally, ZODB's programming model of transparent demand-paged objects simply does not fit well with asyncio. In ZODB, any attribute access anywhere on any persistent object could potentially lead to (blocking) database calls. That doesn't work well with asyncio's "all yield points must be excruciatingly annotated as such" model.
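To make the point about attribute access concrete, here is a minimal sketch using only the standard library. `LazyRecord` and `slow_load` are hypothetical stand-ins invented for illustration; they are not ZODB's `Persistent` or any real storage API. The sketch only shows how an ordinary-looking attribute read can hide a blocking call.

```python
import time


class LazyRecord:
    """Toy stand-in for a demand-paged object (not ZODB's Persistent)."""

    def __init__(self, loader):
        self._loader = loader
        self._loaded = False

    def __getattribute__(self, name):
        # Private bookkeeping names bypass the lazy-loading hook.
        if name.startswith("_"):
            return object.__getattribute__(self, name)
        if not object.__getattribute__(self, "_loaded"):
            # Blocking call: nothing at the call site marks it as such.
            state = object.__getattribute__(self, "_loader")()
            object.__getattribute__(self, "__dict__").update(state)
            object.__setattr__(self, "_loaded", True)
        return object.__getattribute__(self, name)


def slow_load():
    time.sleep(0.05)  # stands in for a database round trip
    return {"balance": 100}


record = LazyRecord(slow_load)
start = time.perf_counter()
balance = record.balance  # looks like a plain attribute read, but blocks
first_read = time.perf_counter() - start
```

Inside a coroutine, the `record.balance` line would stall the whole event loop for the duration of the load, and nothing in the syntax warns the reader.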

For asynchronous programming, gevent does work extremely well with ZODB (when backed by a storage like RelStorage and at least at one point, ZEO).

@penn5
Author

penn5 commented Jun 3, 2020

Perhaps when accessing these attributes, an await can be applied to the actual attribute, as though the attribute is a coroutine (using an @property, and ignored when not a ghost), and similarly awaiting on transaction saves? Without asyncio support, ZODB is useless to me, as I'm already heavily invested in asyncio.

@jamadden
Member

jamadden commented Jun 3, 2020

Persistent ZODB objects work by overriding __getattribute__ to retrieve their state when needed. (Note that this happens in a base class that all persistent objects must extend.) __getattribute__ is implicitly called by the Python runtime. I don't know how one would make that async.

EDITED to add: The link to __getattribute__ is the pure-Python implementation. CPython typically uses a base class implemented in C.

@penn5
Author

penn5 commented Jun 3, 2020

Yep, was reading the source and saw that. Surely we can return a coroutine from __getattribute__, which can then be awaited by user code?

@jamadden
Member

jamadden commented Jun 3, 2020

Even if that were possible (I'm not sure it is) it would be extremely unpleasant because you'd have to do that for every attribute access. And like a disease, it would spread, to every consumer of any persistent object (asyncio's major fatal flaw). Since persistent objects are just objects that no one has to treat any differently from any other object, that rather defeats the point (e.g., every consumer of any object would have to be modified to handle a persistent object — every function everywhere would have to be async def; what does that even mean for C extensions?).

Hypothetical sketch of what that would look like:

# Hypothetical: assumes every attribute access on a persistent object returns
# an awaitable. AccountError and connection are assumed to be defined elsewhere.
async def debit(account_number, amount):
    p_bank_account = connection.root()[account_number]
    can_debit = await p_bank_account.can_debit
    if not can_debit:
        frozen = await p_bank_account.frozen
        act_type = await p_bank_account.type
        raise AccountError(
            "Cannot debit. Account might be frozen (%s) or wrong type (%s)" % (frozen, act_type))
    # Calling a method first gets the attribute so we have to wait for that
    has_enough_funds = await p_bank_account.has_enough_funds
    # The method itself probably uses other attributes, including methods,
    # so it must be declared async too...
    has_enough_funds = await has_enough_funds(amount)
    if not has_enough_funds:
        raise AccountError("Cannot debit, insufficient funds")
    # Logging is a problem. We can't just pass the object there, logging won't wait for
    # anything.
    logging.info("Debiting %s from account %s", amount, p_bank_account) # WRONG
    # etc…
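For readers who want to try the idea, the following is a small runnable toy, not anything ZODB provides. Python does allow `__getattribute__` to return an awaitable; `AsyncGhost` is a hypothetical class invented here, and `asyncio.sleep(0)` merely stands in for an asynchronous load.

```python
import asyncio


class AsyncGhost:
    """Hypothetical object whose public attributes must be awaited."""

    def __init__(self, **state):
        self._state = state

    def __getattribute__(self, name):
        # Private names are returned directly so the class can function.
        if name.startswith("_"):
            return object.__getattribute__(self, name)

        async def fetch():
            await asyncio.sleep(0)  # stands in for an asynchronous load
            return object.__getattribute__(self, "_state")[name]

        return fetch()


async def main():
    account = AsyncGhost(frozen=False, balance=100)
    frozen = await account.frozen
    balance = await account.balance
    # Forgetting the await yields a coroutine object, not the value.
    pending = account.balance
    is_coroutine = asyncio.iscoroutine(pending)
    pending.close()  # avoid a "never awaited" warning
    return frozen, balance, is_coroutine


result = asyncio.run(main())
```

The toy works, but it also shows the cost described above: every read needs an `await`, and any code that forgets one silently receives a coroutine instead of a value.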

@penn5
Author

penn5 commented Jun 3, 2020

Regarding the way you fetch has_enough_funds, I would assume that methods aren't saved to the db (are transient) by default, so that could be safely removed.

Regarding your complaint that asyncio spreads like a disease, that's sort of the whole point of it. I'd say that it should be optional: there could be an AIODB class, perhaps?

About logging, you can probably just override __repr__ to provide the data in the str only if it's available, so that it just logs "<evicted BankAccount at 0x12345678>" if the object is not loaded into RAM. For that to work you would need to create a new thing, _c_property, which is either the property value or a marker token indicating that the property isn't loaded into RAM.
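A rough sketch of that `__repr__` idea, using a toy class rather than a real persistent object. The `_loaded` flag and `load` method are hypothetical names invented for this example; the real persistent package tracks ghost state through its own attributes (for example `_p_changed`), which this sketch does not use.

```python
class BankAccount:
    """Toy class; `_loaded` is a hypothetical flag, not a ZODB attribute."""

    def __init__(self, number):
        self.number = number
        self.balance = None
        self._loaded = False

    def load(self, balance):
        # Stand-in for the state arriving from storage.
        self.balance = balance
        self._loaded = True

    def __repr__(self):
        # Never trigger a load from __repr__; report the eviction instead.
        if not self._loaded:
            return "<evicted BankAccount at 0x%x>" % id(self)
        return "<BankAccount %s balance=%s>" % (self.number, self.balance)


account = BankAccount("12345")
before = repr(account)
account.load(100)
after = repr(account)
```

With this in place, passing the object to `logging.info("... %s", account)` would not need to wait for anything, at the price of less informative log lines for unloaded objects.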

@jamadden
Member

jamadden commented Jun 3, 2020

Regarding the way you fetch has_enough_funds, I would assume that methods aren't saved to the db (are transient) by default, so that could be safely removed.

Nope. Methods go through the standard attribute lookup process, and that includes invoking __getattribute__. After all, a callable attribute may be anything: a method, a function, a callable object, a class. One doesn't necessarily know which until the object has been activated.

>>> class Persistent(object):
...     def method(self):
...         return 1
...     def __getattribute__(self, name):
...         print("Getting attribute", name)
...         return object.__getattribute__(self, name)
...
>>> p = Persistent()
>>> p.method()
Getting attribute method
1

@jamadden
Member

jamadden commented Jun 3, 2020

there could be an AIODB class, perhaps?

Individual persistent objects talk to an IPersistentDataManager implementation stored in their _p_jar (as in "pickle jar") to save and retrieve their state. In ZODB, IPersistentDataManager is provided by the Connection, which is created by the DB with a particular IStorage instance that holds the data, but there are other persistent data manager implementations out there. At a minimum, one would need to create a new IPersistentDataManager implementation, but to work within the ZODB framework, it's probable that a new IStorage implementation would be the better option. To get the semantics sketched out above, one would probably also have to create a new Persistent base class.

Or perhaps everything async could be hidden inside the IStorage implementation by kicking work off to an asyncio cooperative thread pool and waiting for those tasks.
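One related pattern, seen from the asyncio side rather than from inside an IStorage, is to push blocking calls onto a thread pool with `loop.run_in_executor`. The sketch below is a generic illustration under stated assumptions: `blocking_load` is a hypothetical stand-in for a synchronous storage call, and the code ignores ZODB's own threading rules, which a real integration would have to respect.

```python
import asyncio
import time


def blocking_load(oid):
    """Stand-in for a synchronous storage call (hypothetical)."""
    time.sleep(0.05)
    return {"oid": oid, "balance": 100}


async def load_async(oid):
    loop = asyncio.get_running_loop()
    # Passing None selects the loop's default ThreadPoolExecutor.
    return await loop.run_in_executor(None, blocking_load, oid)


async def main():
    # Both loads run in worker threads, so the event loop stays responsive.
    return await asyncio.gather(load_async(1), load_async(2))


states = asyncio.run(main())
```

This keeps the blocking work off the event loop, but it does not make attribute access on a persistent object awaitable; the object would still need to be fully loaded before being handed back to coroutine code.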

Happy hacking!

@penn5
Author

penn5 commented Jun 3, 2020

Regarding the way you fetch has_enough_funds, I would assume that methods aren't saved to the db (are transient) by default, so that could be safely removed.

Nope. Methods go through the standard attribute lookup process, and that includes invoking __getattribute__. After all, a callable attribute may be anything: a method, a function, a callable object, a class. One doesn't necessarily know which until the object has been activated.

>>> class Persistent(object):
...     def method(self):
...         return 1
...     def __getattribute__(self, name):
...         print("Getting attribute", name)
...         return object.__getattribute__(self, name)
...
>>> p = Persistent()
>>> p.method()
Getting attribute method
1

Of course they go through __getattribute__; I just don't know whether ZODB pickles them. It seems wasteful to pickle the functions when they are provided by the code.

@penn5
Author

penn5 commented Jun 3, 2020

Happy hacking!

Oh, okay. See you next week, when I try to start and utterly fail :P

@jamadden
Member

jamadden commented Jun 3, 2020

Of course they go through __getattribute__; I just don't know whether ZODB pickles them. It seems wasteful to pickle the functions when they are provided by the code.

Sure, usually, they're not. Usually an_object.method() refers to a def method() inside some class statement somewhere. Those of course aren't pickled. But there's no way to know up front if you're dealing with that or if something has been customized by the instance. All __getattribute__ gets is the name.

>>> p.method = lambda: 42
>>> p.method()
Getting attribute method
42

@penn5
Author

penn5 commented Jun 3, 2020

That's true. I guess it will need an await then unless they are annotated @classmethod perhaps. I guess it would still go through getattribute, but we can detect a classmethod and skip it, perhaps? I think that wouldn't work because the classmethod still can be overridden in instances, but perhaps a similar solution could work (applying a marker decorator to class-level functions we want to use, regardless of what's in the db)

@jamadden
Member

jamadden commented Jun 3, 2020

There you're getting into the descriptor protocol, which is somewhat complex (functions themselves are non-data descriptors, which is how they implement binding; a @classmethod just wraps a different descriptor around the function).

Fortunately, that complexity is not the responsibility of Persistent or ZODB. The job of Persistent.__getattribute__ is to restore the object's data (its __dict__ and __slots__) using the storage provided by _p_jar (ZODB). Once that's done, actually finding the correct value, taking all the descriptor mechanics into account, is left to object itself.
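The descriptor mechanics mentioned here can be observed with plain Python, no ZODB involved. The `Account` class below is a hypothetical example; the sketch shows that a function stored on a class is a non-data descriptor whose `__get__` produces the bound method, and that `classmethod` is a separate descriptor wrapped around a function.

```python
class Account:
    def describe(self):
        return "instance method"

    @classmethod
    def kind(cls):
        return cls.__name__


account = Account()
plain_function = Account.__dict__["describe"]

# Functions define __get__ but not __set__, making them non-data descriptors.
is_non_data = hasattr(plain_function, "__get__") and not hasattr(
    plain_function, "__set__"
)

# Attribute lookup calls __get__ to produce the bound method.
bound = plain_function.__get__(account, Account)
same_result = bound() == account.describe()

# classmethod is a different descriptor wrapped around a function.
wrapped = Account.__dict__["kind"]
class_bound = wrapped.__get__(None, Account)
kind_name = class_bound()
```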

Doing anything else would require a different implementation of Persistent.

@penn5
Author

penn5 commented Jun 4, 2020

From my understanding of Persistent, invoking Persistent.__getattribute__ will still trigger the data storage to load the object, even if the object being fetched is a descriptor, and hence, without additional modification to either Persistent or the data store, the object will still be loaded even for fetching a descriptor.

Instance attributes are searched before non-data descriptors on the class (which is what plain functions are), so it is false to assume that class-level descriptors cannot be replaced on instances.

I think descriptors don't really solve the problem. The best way to avoid it would surely be an annotation, provided by Persistent, marking that a class-level attribute never requires a lookup (this would presumably make the library faster too).
Applying this imaginary annotation would also allow methods to bind implicitly, since functions are descriptors: the flag would let __getattribute__ pass through without loading the object.
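A toy version of that imaginary annotation might look like the following. Everything here is hypothetical: `class_level`, `LazyBase`, and `_load_state` are names invented for this sketch and do not exist in persistent or ZODB. The load counter only exists to make the effect visible.

```python
def class_level(func):
    """Hypothetical marker: this attribute never needs instance state."""
    func._class_level = True
    return func


class LazyBase:
    """Toy base class, not ZODB's Persistent."""

    _load_count = 0

    def _load_state(self):
        # Stand-in for fetching state from a storage; counts invocations.
        object.__setattr__(self, "_load_count", self._load_count + 1)

    def __getattribute__(self, name):
        if not name.startswith("_"):
            candidate = getattr(type(self), name, None)
            # Marked class-level attributes skip the load entirely.
            if not getattr(candidate, "_class_level", False):
                object.__getattribute__(self, "_load_state")()
        return object.__getattribute__(self, name)


class Account(LazyBase):
    balance = 100

    @class_level
    def currency(self):
        return "EUR"


account = Account()
currency = account.currency()  # marked: no load
loads_after_marked = account._load_count
balance = account.balance  # unmarked: triggers a load
loads_after_unmarked = account._load_count
```

The trade-off is the one raised earlier in the thread: because the marked lookup never loads the object, an instance-level override stored in the database would be ignored until something else triggers a load.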
