Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change file upload queue to eager uploads #293

Merged
merged 2 commits into from
Feb 15, 2024
Merged

Conversation

mathialo
Copy link
Collaborator

There is no real need for the file extractor to batch uploads like the other queue, in fact it makes it slower and worse.

This changes the file upload queues in some significant ways:

  • Instead of waiting to upload until a set of conditions, it starts uploading immediately.
  • The upload() method now acts more like a join, waiting on all the uploads in the queue to complete before returning.
  • A call to add_to_upload_queue when the queue is full will hang until the queue is no longer full before returning, instead of triggering and upload and hanging until everything is uploaded.
  • We require a max size, and remove the max wait time

As long as you use the queue in as a context, ie using

with FileUploadQueue(...) as queue:

and don't use time uploads, you should not have to change anything in your code. The behaviour of the queue will change, it will most likely be much faster, but it will not require any changes from you as a user of the queue.

I propose considering this a breaking change, and bumping the major version on next release, even though most users should be able to just update without changing their code. It's still a big change in the behaviour of the queue.

There is no real need for the file extractor to batch uploads like the
other queue, in fact it makes it slower and worse.

This changes the file upload queues in some significant ways:

 - Instead of waiting to upload until a set of conditions, it starts
   uploading immediately.
 - The `upload()` method now acts more like a `join`, waiting on all the
   uploads in the queue to complete before returning.
 - A call to `add_to_upload_queue` when the queue is full will hang
   until the queue is no longer full before returning, instead of
   triggering and upload and hanging until everything is uploaded.
 - We require a max size, and remove the max wait time

As long as you use the queue in as a context, ie using
``` python
with FileUploadQueue(...) as queue:
```
and don't use time uploads, you should not have to change anything in
your code. The behaviour of the queue will change, it will most likely
be much faster, but it will not require any changes from you as a user
of the queue.

I propose considering this a breaking change, and bumping the major
version on next release, even though most users should be able to just
update without changing their code. It's still a big change in the
behaviour of the queue.
@mathialo mathialo requested a review from a team as a code owner February 14, 2024 13:13
Copy link

codecov bot commented Feb 14, 2024

Codecov Report

Merging #293 (f20ebfb) into master (662593d) will decrease coverage by 0.43%.
The diff coverage is 68.29%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #293      +/-   ##
==========================================
- Coverage   79.52%   79.10%   -0.43%     
==========================================
  Files          22       22              
  Lines        2007     2024      +17     
==========================================
+ Hits         1596     1601       +5     
- Misses        411      423      +12     
Files Coverage Δ
cognite/extractorutils/uploader/files.py 79.20% <68.29%> (-9.17%) ⬇️

... and 2 files with indirect coverage changes

@einarmo
Copy link
Contributor

einarmo commented Feb 14, 2024

@mathialo
Copy link
Collaborator Author

We could rewrite to using conditions, but you would still end up in a polling loop, this time checking the cancellation token. I did not find a neat way of both waiting on a condition and an event.

Maybe that's better? I guess we can poll much slower since triggering a shutdown happens less often and can safely wait a second or two.

@einarmo
Copy link
Contributor

einarmo commented Feb 15, 2024

We could rewrite to using conditions, but you would still end up in a polling loop, this time checking the cancellation token. I did not find a neat way of both waiting on a condition and an event.

Maybe that's better? I guess we can poll much slower since triggering a shutdown happens less often and can safely wait a second or two.

Ugh, yeah, one would think an event could be combined with other conditions somehow, but no.

@mathialo mathialo merged commit 9810600 into master Feb 15, 2024
6 checks passed
@mathialo mathialo deleted the threadpool-files branch February 15, 2024 08:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants