Set ThreadPool as default executor #383

Open
kyboi opened this issue Nov 27, 2024 · 1 comment


kyboi commented Nov 27, 2024

Many workflows need to interleave async code with non-async, CPU-intensive blocking code. These steps cannot be split into separate tasks because they share locally stored files. The best solution is therefore to offload the blocking steps to an executor so that they do not block the asyncio loop.
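To make the pattern concrete, here is a minimal standard-library sketch of offloading one blocking step from a coroutine. The `checksum` function, its `rounds` parameter, and the `process` coroutine are hypothetical stand-ins for the real workload, not part of taskiq.

```python
import asyncio
import hashlib
from concurrent.futures import ThreadPoolExecutor


def checksum(data: bytes, rounds: int = 1000) -> str:
    """Hypothetical CPU-bound step: hash the data repeatedly."""
    digest = data
    for _ in range(rounds):
        digest = hashlib.sha256(digest).digest()
    return digest.hex()


async def process(data: bytes, pool: ThreadPoolExecutor) -> str:
    loop = asyncio.get_running_loop()
    # The event loop stays free to run other coroutines while
    # checksum() executes on a pool thread.
    return await loop.run_in_executor(pool, checksum, data)


def main() -> str:
    # The pool reference has to be passed down to every caller,
    # which is the inconvenience described in this issue.
    with ThreadPoolExecutor(max_workers=1) as pool:
        return asyncio.run(process(b"local file contents", pool))
```

Note that every coroutine that wants to offload work needs access to `pool`, which is why the reference ends up being stored somewhere global.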

If I understand correctly, each worker process starts a ThreadPoolExecutor in which sync tasks are run. Being able to reuse this thread pool instead of creating another one would be ideal. We currently work around this with a custom receiver that grabs the thread pool instance and stores a reference to it in the application state:

from taskiq.receiver import Receiver


class CustomReceiver(Receiver):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Store the executor in the broker's state for global access
        # This allows us to run CPU-heavy code on the workers
        # without blocking the asyncio loop
        self.broker.state.executor = self.executor

But I believe a much better solution would be to simply set the created threadpool as the default executor for the asyncio loop so it can be used without passing the reference around:

# Where the worker creates its pool:
with ThreadPoolExecutor(args.max_threadpool_threads) as pool:
    loop = asyncio.get_event_loop()
    loop.set_default_executor(pool)

# In any task, passing None selects the loop's default executor:
await asyncio.get_running_loop().run_in_executor(None, func)
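The proposal can be checked in isolation with the standard library alone. The sketch below is not taskiq's worker code; `worker_main`, `task`, and the `worker-pool` thread name prefix are made up for illustration. It registers a pool as the loop's default executor and confirms that `run_in_executor(None, ...)` lands on one of that pool's threads.

```python
import asyncio
import threading
from concurrent.futures import ThreadPoolExecutor


def blocking_work() -> str:
    # Report which thread actually ran the function.
    return threading.current_thread().name


async def task() -> str:
    # None means "use the loop's default executor"; no pool
    # reference has to be passed in.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, blocking_work)


async def worker_main() -> str:
    # Stand-in for the worker startup: create the pool once and
    # register it as the default executor of the running loop.
    with ThreadPoolExecutor(
        max_workers=2, thread_name_prefix="worker-pool"
    ) as pool:
        asyncio.get_running_loop().set_default_executor(pool)
        return await task()


thread_name = asyncio.run(worker_main())
```

Here `thread_name` starts with `worker-pool`, showing that the task used the registered pool rather than a fresh default one.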

Alternatively, or at a minimum, the API could expose the executor instance.


kyboi commented Dec 12, 2024

In fact, it doesn't seem like a good idea to have more than one thread in that thread pool if it is only used for genuinely CPU-intensive sync tasks and IO-bound work is run with asyncio.

As far as I can tell, the default of having many threads in a thread pool predates the widespread use of asyncio. If the threads only run CPU-bound blocking tasks, then because of the GIL it is counter-productive to have more than one thread. The --workers option, which uses multiprocessing, should be used instead to match the CPU count.
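One way to test this claim on a given machine is a small timing comparison. This is only a sketch: `count_primes` and `timed_run` are hypothetical helpers, and the workload is deliberately pure Python so that it holds the GIL. On a standard GIL-enabled CPython build I would expect little or no speed-up from the extra threads, but the numbers depend on the interpreter and hardware, so none are quoted here.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def count_primes(limit: int) -> int:
    """Pure-Python CPU-bound work: count the primes below limit."""
    return sum(
        all(n % d for d in range(2, int(n ** 0.5) + 1))
        for n in range(2, limit)
    )


def timed_run(
    threads: int, jobs: int = 4, limit: int = 20_000
) -> tuple[list[int], float]:
    """Run the same job several times on a pool of the given size."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        results = list(pool.map(count_primes, [limit] * jobs))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    # Compare one thread against several for identical work.
    for threads in (1, 4):
        results, elapsed = timed_run(threads)
        print(f"{threads} thread(s): {elapsed:.3f}s")
```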


https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.ThreadPoolExecutor
