asyncio support #95

Open
wshayes opened this issue Feb 24, 2019 · 31 comments

@wshayes
Contributor

wshayes commented Feb 24, 2019

Hi Joohwan,

We really like your arango library. I am curious what your plans are in regards to asyncio support. I saw you started another repo for it. I'd be interested in helping out where I could to move this forward.

Thanks!

@joowani
Contributor

joowani commented Feb 24, 2019

Hi @wshayes,

Thank you for liking python-arango. I am glad to hear that people are finding it useful.

Unfortunately, I am too busy right now to start a new project, and my knowledge of asyncio (and async programming in general) is novice-level at best. I'm not sure when I will have time to invest in it. I will keep this issue open for any updates in the future.

Best,
Joohwan

@paulhoule

I was about to put this request in when I found it was already in.

I thought about it a bit and here is what I know.

Most of the API would not have to change, in particular, you wouldn't have to make every def an async def. That is because you can make it so that in "async mode" the ordinary functions return a coroutine that returns the result instead of returning a result. Somebody can await that result in async code and it will be all good.

The first step is making an HttpClient that uses aiohttp instead of requests; that should be easy, and the key is that we return a coroutine that returns the result.

Another thing that needs to change is the executors; I think we need to build a parallel set of
async executors, one for each of the four existing executor implementations.

Then there is the plumbing to make sure that when you are using the async HttpClient you also get
the async executors. Then I think we would be in good shape.

The main annoyance I see is that the async/await syntax is Python 3.5+ only. I think the clean solution is to make an interface for the sync/asyncio implementations and then have a second package, which
is Py3.5+ only, that contains the asyncio implementation.
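
A minimal sketch of the idea above, under stated assumptions: `SyncExecutor`, `AsyncExecutor`, `Collection`, and the two fake transports are hypothetical names for illustration, not python-arango's actual classes. The fake transports stand in for clients built on requests and aiohttp so that the example runs without a server. The point is that `count()` stays a plain `def`, and the executor decides whether the caller gets a result or a coroutine.

```python
import asyncio


class FakeSyncTransport:
    """Stand-in for a blocking HTTP client (for example, one built on requests)."""

    def send(self, method, path):
        return {"status": 200, "body": {"count": 3}}


class FakeAsyncTransport:
    """Stand-in for a non-blocking HTTP client (for example, one built on aiohttp)."""

    async def send(self, method, path):
        await asyncio.sleep(0)  # pretend to wait on the network
        return {"status": 200, "body": {"count": 3}}


class SyncExecutor:
    def __init__(self, transport):
        self._transport = transport

    def execute(self, method, path, handler):
        # Blocking mode: send the request and return the handled result.
        return handler(self._transport.send(method, path))


class AsyncExecutor:
    def __init__(self, transport):
        self._transport = transport

    def execute(self, method, path, handler):
        # Async mode: return a coroutine that produces the handled result.
        async def run():
            return handler(await self._transport.send(method, path))

        return run()


class Collection:
    """API class shared by both modes; count() is an ordinary def."""

    def __init__(self, name, executor):
        self._name = name
        self._executor = executor

    def count(self):
        path = f"/_api/collection/{self._name}/count"
        return self._executor.execute("GET", path, lambda r: r["body"]["count"])


# Sync mode: the call returns the value directly.
sync_count = Collection("students", SyncExecutor(FakeSyncTransport())).count()


async def main():
    # Async mode: the same method returns a coroutine to await.
    return await Collection("students", AsyncExecutor(FakeAsyncTransport())).count()


async_count = asyncio.run(main())
```

The response handler is the part that both modes share; only the executor and the transport come in pairs.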

I might work up the motivation to make a pull request.

@joowani joowani self-assigned this Aug 24, 2019
@mooncake4132
Contributor

This will also be a very useful feature for me. @joowani, is this still something you plan to invest time in?

@bloodbare

Hi!

I was in need of an async driver for ArangoDB, so I took yours, @joowani, and made an async-style fork! Feel free to comment on it.

https://github.com/bloodbare/aioarangodb

Thanks for your amazing work, it's been so easy to adapt it to the async world.

@joowani
Contributor

joowani commented Jun 11, 2020

Hi @bloodbare,

Wow this is amazing! I'll make sure to mention your project in python-arango readme. You should also send an email to ArangoDB team so they can have your driver on their website. Awesome work.

@wshayes wshayes closed this as completed Jul 6, 2020
@rennenc

rennenc commented May 20, 2021

@bloodbare's package looks amazing!

Is there still a plan to support async io in the formal package?
It seems like this repo keeps getting updated every few months, while the asyncio fork's last commit is from a year ago and it is less popular.

@mirrorrim

Hi!

I was also looking for an asynchronous version and noticed that the version from @bloodbare had not been updated for over a year, so I made a fork (not a copy) and made it fully asynchronous. The current version (1.0.0) is fully compliant with python-arango 7.2.0.

https://github.com/mirrorrim/aioarango

@rennenc please take a look, I'll try to keep it up to date :)

@rennenc

rennenc commented Jul 5, 2021

Thanks. I still think it would be great to have it adopted and maintained as part of the same repo.
Nevertheless, will take a look.

@bloodbare

bloodbare commented Jul 5, 2021 via email

@mirrorrim

Sorry, I didn't see any activity in your repository: 4 open issues and 1 pull request :(

@bloodbare

bloodbare commented Jul 5, 2021 via email

@alexvanzyl

First, thanks @joowani for all the effort on the python-arango library; it's a really well-written project.

I was looking for an async solution, and judging from the comments it seems that an async version of this library would be beneficial. I see @mirrorrim and @bloodbare have already done some work to make this possible.

@mirrorrim or @bloodbare have you reached out or would you consider making either of your repositories part of the ArangoDB Community?

I think it would be cool to have this as part of an official community package, just to get more eyes on it, and I would be happy to contribute to keeping it up to date with the python-arango package. I would also like to avoid creating another fork if possible.

@incorvia

incorvia commented Jun 14, 2022

It would be great if @mirrorrim's work could be merged back into the main project as an optional API or it was somehow made an official community package as @alexvanzyl suggested.

@joakimnordling

I agree with @incorvia, it would be really nice if we could get the async versions merged into the official community package and maintained as part of it in the future. I also want to highlight that this issue is currently closed. Based on the discussion here, I think it would be worth reopening it and trying to move forward with incorporating the work into the main community package. What do you think?

@cw00dw0rd
Contributor

We would gladly welcome the aioarango package as a part of the arangodb-community org! I think merging it as part of the python-arango package might be beneficial but would need @joowani and @aMahanna to weigh in on that.

If you want to bring the project over please ping me on Slack to discuss the details: https://join.slack.com/t/arangodb-community/shared_invite/zt-1b66mygms-j8TmOdXE7FojR5yA2Yg8kg (Chris.ArangoDB)

@cw00dw0rd cw00dw0rd reopened this Jun 20, 2022
@joowani
Contributor

joowani commented Jun 20, 2022

I'm open to merging into python-arango, but it will nearly double the codebase (and double the effort to maintain). I'm not sure how much code reuse there could be between sync and async yet. It will probably require a non-trivial amount of work to refactor.

@incorvia

This seems to be the main commit that was written to convert python-arango to use async. Perhaps a PR could be opened that uses this as a starting point. It doesn't look like it would double the code base, even if async were made conditional.

@mirrorrim any suggestions here since you did the work?

@joakimnordling

In case you want some ideas for how to easily maintain the codebase with both the sync and async version, here's what we did for firedantic (shares many ideas with arangodantic that we also created, but for which we don't have a sync version).

  1. We created an async version of the library (this commit) in addition to the sync version.
  2. A colleague set up a tool that automatically generates the sync version of the code from the async version (this commit). The main pieces here are the unasync.py script, which generates the sync version from the async one, and the pre-commit hook, which ensures the sync version is regenerated before committing.
  3. Now any further work is just a matter of updating the async version and ensuring the sync version (which is generated automatically) works (mainly reviewing the generated code and running the tests), and possibly making minor adjustments to the unasync.py script.

I should highlight that the idea is from https://github.com/python-trio/unasync and the unasync.py we use is a modified version of https://github.com/encode/httpcore/blob/master/unasync.py.

A suggestion for how to incorporate the same approach into python-arango would be to restructure the code into directories for the sync/async versions, then ensure the async version is up to date, and then create a modified version of unasync.py that is able to generate the sync version from the async one. Some special treatment is likely needed for the client (httpx/requests), but that should be possible as well. After that, the main work of maintaining the library should just be a matter of updating the async version and ensuring there are no issues with the generated sync version, so not a lot more work than maintaining a single code base.
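
To make the code-generation step concrete, here is a deliberately simplified sketch of the technique, assuming plain regular-expression substitutions. It is not the actual unasync tool or the httpcore script, and `AsyncClient`, `SyncClient`, and the sample function are hypothetical names. A real script would also handle things like `__aenter__`/`__aexit__`, imports, and file layout.

```python
import re

# Ordered (pattern, replacement) pairs applied to the async source text.
# AsyncClient/SyncClient are hypothetical class names used for illustration.
SUBSTITUTIONS = [
    (r"\basync def\b", "def"),
    (r"\basync with\b", "with"),
    (r"\basync for\b", "for"),
    (r"\bawait\s+", ""),
    (r"\bAsyncClient\b", "SyncClient"),
]


def unasync_source(source):
    """Return a synchronous version of the given async source text."""
    for pattern, replacement in SUBSTITUTIONS:
        source = re.sub(pattern, replacement, source)
    return source


ASYNC_SOURCE = '''\
async def count(client: AsyncClient, name: str) -> int:
    async with client.session() as session:
        response = await session.get(name)
    return response["count"]
'''

SYNC_SOURCE = unasync_source(ASYNC_SOURCE)
print(SYNC_SOURCE)
```

In a setup like the one described, a script of this kind would run from a pre-commit hook over the async directory and write its output into the sync directory.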

There might be better approaches (if you know any, please let me know as well), but this has served us really well for firedantic.

@davidschrooten

davidschrooten commented Jun 30, 2022

What would be nice is to be able to call an async version of db from the regular client instance. The aioarango library changed every method to async, which might broaden the scope too much. It would probably be better to start with a limited set of functions that benefit the most from async operation, such as reads and insertions.
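
One way this could look, as a rough sketch only: the `aio` accessor, the `AsyncView` class, and the stand-in `Database` below are all hypothetical and not part of python-arango. The async view exposes a small allow-list of methods and runs each one in a worker thread with `asyncio.to_thread` (Python 3.9+).

```python
import asyncio


class AsyncView:
    """Hypothetical async view over an allow-list of blocking methods."""

    def __init__(self, target, allowed):
        self._target = target
        self._allowed = frozenset(allowed)

    def __getattr__(self, name):
        # Only called for names not found normally, i.e. the proxied methods.
        if name not in self._allowed:
            raise AttributeError(f"{name!r} has no async variant")
        method = getattr(self._target, name)

        async def call(*args, **kwargs):
            # Run the blocking method in a worker thread.
            return await asyncio.to_thread(method, *args, **kwargs)

        return call


class Database:
    """Stand-in for a blocking database object (not the real driver class)."""

    def __init__(self):
        self._docs = {}

    def insert(self, key, document):
        self._docs[key] = document
        return {"_key": key}

    def get(self, key):
        return self._docs.get(key)

    def drop_everything(self):
        self._docs.clear()

    @property
    def aio(self):
        # Start small: only reads and insertions get an async variant.
        return AsyncView(self, allowed=("insert", "get"))


async def main():
    db = Database()
    await db.aio.insert("alice", {"age": 30})
    return await db.aio.get("alice")


result = asyncio.run(main())
```

Whether the underlying object is safe to call from several worker threads at once is a separate question that this sketch does not address.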

@joowani
Contributor

joowani commented Jun 30, 2022

@joakimnordling unasync.py looks interesting but feels a little hacky. I'm leaning towards @davidschrooten's suggestion a little more currently (which will allow us to take advantage of @mirrorrim and @bloodbare's work via copy paste essentially), but will explore both.

@ghost

ghost commented Jan 31, 2023

> I'm open to merging into python-arango, but it will nearly double the codebase (and double the effort to maintain). I'm not sure how much code reuse there could be between sync and async yet. It will probably require a non-trivial amount of work to refactor.

Why not keep the async definitions truly async?

@paulhoule

paulhoule commented Jan 31, 2023 via email

@ghost

ghost commented Feb 5, 2023

Today I am using

https://aioarango.readthedocs.io/en/latest/

I also see there is

https://aioarangodb.readthedocs.io/en/latest/

which I have not actually compared against it, so I don't feel much reason
to push for changes.

As of today, neither library is actively developed or maintained.
It's better to keep the synchronous and async libraries separate, as it's doubtful that both versions will be used in the same project. Also, one may find that for other DB drivers the blocking and async versions are developed separately.

@paulhoule

paulhoule commented Feb 10, 2023 via email

@dasTholo

I started a PR at Arangodantic. Maybe you want to have a look at it.
I have chosen [Asyncer](https://asyncer.tiangolo.com/) as the async library and python-arango as the client.
With Asyncer I can call sync code from async code, etc. This package is made by tiangolo, the author of FastAPI.

I would be very happy to receive comments and remarks.

Arangodantic V2 PR

@apetenchea
Member

Hi @dasTholo,

It's great to see proactive steps like yours, especially on such an important and highly requested feature. We recognize the increasing need for asyncio support and are actively pushing to add it to our roadmap.
I like your asyncify approach, it seems to simplify the implementation and avoid code duplication.

@bencz

bencz commented Mar 8, 2024

Well... I have created a simple class to help with this:

import asyncio
from concurrent.futures import ThreadPoolExecutor
from arango import ArangoClient


class ArangoDBRepository:
    def __init__(self, database_url, database_name, username, password):
        self._executor = ThreadPoolExecutor()
        self._client = ArangoClient(hosts=database_url)
        self._db = self._client.db(database_name, username=username, password=password)

    async def run_in_executor(self, func, *args):
        loop = asyncio.get_running_loop()
        try:
            result = await loop.run_in_executor(self._executor, func, *args)
            return result
        except Exception as e:
            print(f"An error occurred: {e}")
            raise

    async def has_collection(self, collection_name):
        return await self.run_in_executor(lambda: self._db.has_collection(collection_name))

    async def create_collection(self, collection_name, edge=False):
        return await self.run_in_executor(lambda: self._db.create_collection(collection_name, edge=edge))

    async def delete_collection(self, collection_name):
        return await self.run_in_executor(lambda: self._db.delete_collection(collection_name))

    async def create_persistent_index(self,
                                      collection_name,
                                      index_name,
                                      index_fields,
                                      cache_enabled):
        return await self.run_in_executor(
            lambda: self._db.collection(collection_name).add_persistent_index(name=index_name,
                                                                              fields=index_fields,
                                                                              cacheEnabled=cache_enabled))

    async def execute_aql_query(self, query, bind_vars=None):
        def aql_query():
            cursor = self._db.aql.execute(query, bind_vars=bind_vars)
            return [doc for doc in cursor]

        return await self.run_in_executor(aql_query)

    async def insert(self, collection_name,
                     document,
                     overwrite,
                     silent,
                     return_new):
        def insert():
            return self._db.collection(collection_name).insert(document,
                                                               overwrite=overwrite,
                                                               silent=silent,
                                                               return_new=return_new)

        return await self.run_in_executor(insert)

    async def insert_many(self,
                          collection_name,
                          documents,
                          overwrite,
                          silent):
        def insert_many():
            return self._db.collection(collection_name).insert_many(documents, overwrite=overwrite, silent=silent)

        return await self.run_in_executor(insert_many)

    async def batch_find_by_key(self,
                                collection_name,
                                key):
        return await self.run_in_executor(lambda: self._db.collection(collection_name).find({'_key': key}).batch())
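
A usage sketch for a wrapper like the one above, assuming a stand-in blocking function instead of a live ArangoDB server (`blocking_query`, `run_blocking`, and the delays are hypothetical). It illustrates two points: several offloaded calls awaited with `asyncio.gather` run in worker threads rather than on the event loop thread, and `functools.partial` is an alternative to the lambdas, since `loop.run_in_executor` only forwards positional arguments.

```python
import asyncio
import functools
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_query(name, *, delay=0.1):
    """Stand-in for a blocking driver call; reports the thread it ran on."""
    time.sleep(delay)
    return name, threading.current_thread().name


async def run_blocking(executor, func, *args, **kwargs):
    loop = asyncio.get_running_loop()
    # partial() binds keyword arguments, which run_in_executor cannot pass.
    bound = functools.partial(func, *args, **kwargs)
    return await loop.run_in_executor(executor, bound)


async def main():
    with ThreadPoolExecutor(max_workers=3, thread_name_prefix="arango") as executor:
        # gather() returns results in the order the awaitables were passed.
        return await asyncio.gather(
            run_blocking(executor, blocking_query, "students", delay=0.1),
            run_blocking(executor, blocking_query, "teachers", delay=0.1),
            run_blocking(executor, blocking_query, "courses", delay=0.1),
        )


results = asyncio.run(main())
names = [name for name, _ in results]
```

The event loop stays free while the worker threads sleep, so other coroutines can make progress in the meantime.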

@apetenchea
Member

Hey @bencz, I like your class. The ThreadPoolExecutor approach is great for ensuring some asynchronous functionality, while keeping the rest of the codebase flexible.

I would like to point out that we're about to start working on an asynchronous driver: python-arango-async. The time allocated for this is limited, but yes, it's finally happening. I can't give a clear estimation of when it would be ready, but I expect about 3 months. We plan to release it gradually, starting with a minimal release once we've covered the basic functionality, and then adding more incrementally. The new driver is going to follow the same python-arango interface for most classes.

In the meantime, workarounds such as yours are a great way to "asynchronize" already existing code. Thanks for providing that example!

@apetenchea apetenchea assigned apetenchea and unassigned joowani May 20, 2024
@arunzone

arunzone commented Jun 11, 2024

I understand the limitations around delivering python-arango-async for production use. But this repo is being updated regularly. Can we have additional APIs in this repo to support async, to avoid differences from the async repo?

@ronenlinx

any news?

@apetenchea
Member

apetenchea commented Jun 20, 2024

Hi! The async repo is still work in progress. It has not been updated in a while because the time I can dedicate to that is unfortunately limited, and first priority has to be the currently existing driver, for which we had to work on a major release.
I totally get that the situation is not great in terms of timing, but we'll be resuming work on the async driver very soon (this week or next one). I can't give you an ETA, but if you're looking for a ballpark estimate, we should release a functional first version of the async driver sometime this year.
Now @arunzone, regarding the additional APIs for this repo, I'm afraid that's currently not possible on our side. The main reason is that I would prefer to put the limited time I have into developing the new async driver and maintaining the existing code in the synchronous one. Adding code, even temporarily, would mean additional maintenance effort on our side, which is not sustainable in the long run. So please bear with me, and if you really need an async wrapper, I know some of our community members have created some in the past (although I'm not sure how up-to-date these are).
Rest assured that I'm aware of the fact that most of our user base is already using the synchronous driver, therefore the async implementation won't totally deviate from the standard that we're all used to. Easy integration is one of our goals.
I know this takes a while, but it's coming, I promise 😄
