Added num_workers to export files in parallel #923
base: main
Changes from all commits
35aadb5
10f7f04
6cd35dc
c1f387c
9d301a8
5204fa2
5860bf1
440723d
eb96725
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -57,44 +57,65 @@ | |
self._max_threads = max_threads | ||
self._thread_counter = 0 | ||
self._thread_lock = threading.Lock() | ||
self.tasks = set() | ||
self.canceled = False | ||
self.th_pool = None | ||
|
||
def run( | ||
self, | ||
chunk_gen, | ||
progress_bar=None, | ||
): | ||
results = [] | ||
with concurrent.futures.ThreadPoolExecutor(self._max_threads) as th_pool: | ||
tasks = set() | ||
self.th_pool = concurrent.futures.ThreadPoolExecutor(self._max_threads) | ||
try: | ||
self._thread_counter = 0 | ||
for chunk in chunk_gen: | ||
while len(tasks) >= self._max_threads: | ||
if self.canceled: | ||
break | ||
while len(self.tasks) >= self._max_threads: | ||
done, _ = concurrent.futures.wait( | ||
tasks, timeout=1, return_when="FIRST_COMPLETED" | ||
self.tasks, timeout=1, return_when="FIRST_COMPLETED" | ||
) | ||
self.done_task(done) | ||
|
||
tasks = tasks - done | ||
self.tasks = self.tasks - done | ||
self.update_progress_bar(progress_bar) | ||
|
||
tasks.add(th_pool.submit(self.do_task, chunk)) | ||
self.tasks.add(self.th_pool.submit(self.do_task, chunk)) | ||
self.update_progress_bar(progress_bar) | ||
|
||
while tasks: | ||
while self.tasks: | ||
if self.canceled: | ||
break | ||
done, _ = concurrent.futures.wait( | ||
tasks, timeout=1, return_when="FIRST_COMPLETED" | ||
self.tasks, timeout=1, return_when="FIRST_COMPLETED" | ||
) | ||
task_results = self.done_task(done) | ||
if task_results: | ||
results.extend(task_results) | ||
|
||
tasks = tasks - done | ||
self.tasks = self.tasks - done | ||
self.update_progress_bar(progress_bar) | ||
|
||
th_pool.shutdown() | ||
except: | ||
self.cancel_all() | ||
raise | ||
else: | ||
self.th_pool.shutdown() | ||
|
||
return results | ||
|
||
def cancel_all(self): | ||
self.canceled = True | ||
# Canceling tasks just in case any of them is scheduled to run. | ||
# Note that running tasks cannot be canceled, instead we will wait for | ||
# them to finish when shutting down thread loop executor by calling | ||
# shutdown() method. | ||
for task in self.tasks: | ||
task.cancel() | ||
if self.th_pool: | ||
self.th_pool.shutdown() # this will wait for running tasks to finish | ||
So, what does this mean for cancellation? (Could it potentially take a long time?) @skshetry, how does cancellation for prefetch work? Is it more or less immediate, or bounded by some finite timeout?

We schedule a (worker) task to run in a separate thread (not the main thread). For the purposes of this discussion, cancellation should be nearly immediate. Technically, however, it depends on what is currently running in the asyncio event loop: if that is a synchronous function, cancellation will unfortunately block until it completes.
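To make the cancellation question concrete, here is a minimal sketch of how `concurrent.futures` treats running versus queued futures. It is not code from this PR: `demo_cancel` and `blocking_task` are hypothetical names, and a single-worker pool is assumed so that exactly one task is running while the other is still queued.

```python
import concurrent.futures
import threading


def demo_cancel():
    """Show that Future.cancel() only succeeds for tasks that have not started."""
    release = threading.Event()   # lets the running task finish
    started = threading.Event()   # signals that the first task is running

    def blocking_task():
        started.set()
        release.wait(timeout=5)   # simulate a long-running chunk
        return "finished"

    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    running = pool.submit(blocking_task)
    started.wait(timeout=5)       # ensure the first task is actually executing
    pending = pool.submit(lambda: "never runs")  # queued behind the first one

    cancelled_running = running.cancel()   # a running task cannot be cancelled
    cancelled_pending = pending.cancel()   # a queued task can

    release.set()                 # unblock the running task
    pool.shutdown(wait=True)      # waits for the running task to complete
    return cancelled_running, cancelled_pending, running.result()
```

In other words, the time `cancel_all()` takes is bounded by the longest task that is already running, not by the number of queued chunks. On Python 3.9 and later, `shutdown(cancel_futures=True)` cancels the queued futures in the same call.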
||
|
||
def update_progress_bar(self, progress_bar): | ||
if progress_bar is not None: | ||
with self._thread_lock: | ||
|
Separate: this can probably be done without a lock if it is only about `+1`. Locks impose some performance loss. There should be something like an atomic int.
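As a rough sketch of the lock-free idea: the Python standard library has no atomic integer type, but a commonly used workaround relies on `next()` on `itertools.count` being a single C-level call, which is effectively atomic under CPython's GIL. That is an assumption about CPython, not a language guarantee, and `LockFreeIncrementCounter` is a hypothetical name rather than anything in this PR.

```python
import itertools
import threading


class LockFreeIncrementCounter:
    """Counter whose increment path takes no lock (relies on CPython's GIL)."""

    def __init__(self):
        self._counter = itertools.count()
        self._reads = 0                      # next() calls made by value()
        self._read_lock = threading.Lock()   # only readers take this lock

    def increment(self):
        # A single C-level call; no Python-level lock on the hot path.
        next(self._counter)

    def value(self):
        # Reading also advances the counter, so subtract earlier reads.
        with self._read_lock:
            current = next(self._counter) - self._reads
            self._reads += 1
        return current


def count_in_threads(num_threads=8, increments=1000):
    counter = LockFreeIncrementCounter()

    def worker():
        for _ in range(increments):
            counter.increment()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value()
```

Whether this is measurably faster than the existing `with self._thread_lock:` block would need to be benchmarked; for a progress bar updated once per chunk, the lock overhead is likely small.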