JobManager: create & start in parallel #719
Comments
@jdries @soxofaan could we run the job starts across multiple threads, e.g. via `from threading import Thread`?
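A minimal sketch of that idea (not the actual JobManager code; `create_and_start_job` is a hypothetical helper, roughly following the `connection.create_job(...)` + `job.start_job()` flow of the openeo Python client):

```python
from threading import Thread


def create_and_start_job(connection, process_graph):
    # Hypothetical helper: create a batch job on the backend and start it.
    job = connection.create_job(process_graph)
    job.start_job()
    return job


def start_all_in_threads(connection, process_graphs):
    # One thread per job; each thread does its own blocking "create + start" calls.
    threads = [
        Thread(target=create_and_start_job, args=(connection, pg))
        for pg in process_graphs
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```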
Another option might be to look into asyncio, which would run a single thread but jump between the various job creations/starts.
Threading won't work I'm afraid, because in Python only one thread can be active at a time and the current "start_job" requests are blocking, so the execution would not be parallel in reality. We would have to use a non-blocking request library like https://www.python-httpx.org, or use multiprocessing to get effective parallelism. Multiprocessing might be the easiest route for now (I'm not so sure how easy it would be to switch to httpx from our classic "requests"-based implementation).
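For reference, a rough sketch of what the non-blocking httpx/asyncio variant could look like (illustrative only; authentication and error handling omitted, and the URL is a placeholder; in the openEO API a batch job is started by POSTing to `/jobs/{job_id}/results`):

```python
import asyncio

import httpx


async def start_job_async(client: httpx.AsyncClient, job_id: str):
    # Start one batch job; raises on HTTP errors.
    response = await client.post(f"/jobs/{job_id}/results")
    response.raise_for_status()


async def start_all(base_url: str, job_ids: list[str]):
    # All starts run concurrently on a single thread via the event loop.
    async with httpx.AsyncClient(base_url=base_url) as client:
        await asyncio.gather(*(start_job_async(client, jid) for jid in job_ids))


# asyncio.run(start_all("https://openeo.example/openeo/1.2", ["job-1", "job-2"]))
```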
Indeed, that would probably be a more modern approach, but it's not trivial to migrate everything we already have (or at least a well-chosen subset) to this new paradigm.
Reading a bit deeper into it: if we indeed want full performance, we need to make sure that all network requests, database queries, etc. can run asynchronously. This might make the code overly complex, given that by default we only support 2 parallel jobs...
Ok I did some testing with requests in threads, and apparently it does work to do requests in parallel that way. I was probably confusing it with another threading problem I had before. |
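This matches how blocking I/O behaves: the requests library releases the GIL while waiting on the network, so threads can overlap. A small timing check along these lines (example URL only; ~2s total indicates parallel execution, ~10s indicates serial):

```python
import time
from threading import Thread

import requests


def fetch(url):
    requests.get(url, timeout=30)


# Each request takes ~2 seconds server-side.
urls = ["https://httpbin.org/delay/2"] * 5
threads = [Thread(target=fetch, args=(url,)) for url in urls]
start = time.time()
for t in threads:
    t.start()
for t in threads:
    t.join()
print(f"5 blocking requests took {time.time() - start:.1f}s")
```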
Do we know what the upper limit is on the number of threads we could use? Being able to add 20 jobs in parallel would already make a big difference. LCFM would probably prefer 100 at once.
Note that it might be counter-productive to do too much in parallel as well: I would default to something like 5, and maybe scale up a bit if you know what you are doing. Threading tutorials typically point to thread pools (with a fixed size limit) to solve this easily.
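A thread-pool sketch with such a fixed limit, reusing the hypothetical `create_and_start_job` helper from the earlier sketch (names are illustrative, not the actual JobManager API):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def start_jobs_in_pool(connection, process_graphs, max_workers=5):
    # At most `max_workers` jobs are created/started concurrently.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(create_and_start_job, connection, pg)
            for pg in process_graphs
        ]
        # .result() re-raises any exception from the worker thread.
        return [f.result() for f in as_completed(futures)]
```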
Just to be sure: is this with the STAC-based implementation of the job manager? Pandas dataframes are not thread-safe, so unless you explicitly add locks, this might run awry at scale.
Same concern here (and in PR #723) about pandas. While using pandas as a kind of database API was handy in the proof-of-concept phase of the job manager, I have the feeling it is now actually making progress harder than it could be. We had various struggles in the past when implementing new job manager features, and it is now making threading-based features quite challenging. I think we should try to get/keep pandas out of the code paths that we want to run in threads.
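If the dataframe cannot be kept out of the threaded paths entirely, one stopgap could be to serialize all writes behind a single lock (sketch only; the column names are hypothetical):

```python
import threading

import pandas as pd

df_lock = threading.Lock()


def update_job_status(df: pd.DataFrame, job_id: str, status: str):
    # All mutations of the shared dataframe go through one lock,
    # so worker threads never modify it concurrently.
    with df_lock:
        df.loc[df["id"] == job_id, "status"] = status
```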
Creating and starting a job takes some time, which means there's an interval during which the job manager is creating new jobs while resources are potentially left unused.
If we can start jobs in parallel, we can decrease this.
Do note that we typically have rate-limiting in place on backends, so we have to be resilient there.
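One possible way to be resilient against that rate limiting when starting jobs in parallel is to back off and retry on HTTP 429 responses. A sketch under those assumptions (the retry policy and `start_fn` callable are illustrative):

```python
import time

import requests


def start_with_backoff(start_fn, max_retries=5, base_delay=10.0):
    # start_fn is a zero-argument callable that performs the actual "start job" request.
    for attempt in range(max_retries):
        try:
            return start_fn()
        except requests.exceptions.HTTPError as exc:
            response = exc.response
            if response is not None and response.status_code == 429 and attempt < max_retries - 1:
                # Backend is rate limiting us: wait with exponential backoff, then retry.
                time.sleep(base_delay * (2 ** attempt))
            else:
                raise
```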