Remove the condition to handle files of size 0 separately #401

Merged

nithinb
Contributor

@nithinb nithinb commented Nov 28, 2024

Files of size 0 need to be handled differently from how they are handled today. The main problem with the earlier approach: when a file with content (call this version of the file V1) was uploaded, it became searchable and downloadable. If the user then removes the content of the file (call this version V2), the earlier version should no longer be accessible, i.e. V1 should not be accessible and V2 should be.

With the old code, the reference to the file with content was not removed, so even though the file was updated in CDF, you could still access the original file (V1) and even download it. Based on the discussions on Slack, an extractor should not make decisions based on the size or content of a file. If a client/user has removed the content of a file, we should treat it as an update and simply upload the new, empty file.

With this approach, we upload an empty file, so the file accessible in the UI is V2.
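To make the V1/V2 scenario concrete, here is a minimal sketch. `FakeFileStore`, `upload_old`, `upload_new` and `simulate` are hypothetical names invented for illustration; they are not the extractor-utils or CDF API, and the old code path is simplified to "skip the content upload when the file is empty".

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class FakeFileStore:
    """Hypothetical in-memory stand-in for CDF Files (not the real API)."""

    contents: Dict[str, bytes] = field(default_factory=dict)

    def upload(self, external_id: str, data: bytes) -> None:
        # Replace whatever content is stored under this external ID.
        self.contents[external_id] = data

    def download(self, external_id: str) -> bytes:
        return self.contents[external_id]


def upload_old(store: FakeFileStore, external_id: str, data: bytes) -> None:
    # Old behaviour (simplified): a zero-size file took a separate path
    # that never replaced the previously uploaded content.
    if len(data) == 0:
        return
    store.upload(external_id, data)


def upload_new(store: FakeFileStore, external_id: str, data: bytes) -> None:
    # New behaviour: no decision based on size; every version is uploaded.
    store.upload(external_id, data)


def simulate(upload: Callable[[FakeFileStore, str, bytes], None]) -> bytes:
    store = FakeFileStore()
    upload(store, "report.txt", b"V1 content")  # V1: file with content
    upload(store, "report.txt", b"")            # V2: user emptied the file
    return store.download("report.txt")


print(simulate(upload_old))  # b'V1 content'  (stale V1 is still downloadable)
print(simulate(upload_new))  # b''            (V2 is what users get)
```

The point of the sketch is that the fix removes a branch rather than adding one: once the size check is gone, an emptied file is just another version to upload.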

@nithinb nithinb self-assigned this Nov 28, 2024
@nithinb nithinb requested a review from a team as a code owner November 28, 2024 09:21

codecov bot commented Nov 28, 2024

Codecov Report

Attention: Patch coverage is 85.71429% with 4 lines in your changes missing coverage. Please review.

Project coverage is 74.04%. Comparing base (0dbca8c) to head (03ad935).
Report is 1 commit behind head on master.

Files with missing lines                    Patch %   Lines
cognite/extractorutils/uploader/files.py    85.71%    4 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #401      +/-   ##
==========================================
+ Coverage   74.00%   74.04%   +0.04%     
==========================================
  Files          41       41              
  Lines        3343     3364      +21     
==========================================
+ Hits         2474     2491      +17     
- Misses        869      873       +4     
Files with missing lines                    Coverage Δ
cognite/extractorutils/uploader/files.py    85.97% <85.71%> (-0.42%) ⬇️
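The figures in the report can be cross-checked by hand. The sketch below assumes Codecov's default of truncating (rounding down) project coverage to two decimals; `pct_round_down` is a hypothetical helper, not part of Codecov. The patch size is not stated in the report, so it is inferred from "85.71429% with 4 lines missing".

```python
def pct_round_down(hits: int, lines: int, places: int = 2) -> str:
    """hits/lines as a percentage, truncated to `places` decimals."""
    scaled = hits * 100 * 10**places // lines  # integer floor = round down
    whole, frac = divmod(scaled, 10**places)
    return f"{whole}.{frac:0{places}d}"


base = {"lines": 3343, "hits": 2474, "misses": 869}
head = {"lines": 3364, "hits": 2491, "misses": 873}

# Hits + misses should account for every tracked line.
for totals in (base, head):
    assert totals["hits"] + totals["misses"] == totals["lines"]

print(pct_round_down(base["hits"], base["lines"]))  # 74.00
print(pct_round_down(head["hits"], head["lines"]))  # 74.04

# Patch: 4 missing lines at 85.71429% implies 1 - 4/n = 6/7, i.e. n = 28.
missing = 4
n = next(n for n in range(missing + 1, 1000)
         if round((n - missing) / n * 100, 5) == 85.71429)
print(n, n - missing)  # 28 24
```

So the patch touched about 28 coverable lines, 24 of them covered, which is consistent with the +17 hits and +4 misses in the diff above.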

@einarmo
Contributor

einarmo commented Nov 28, 2024

This won't work. The files backend doesn't accept empty streams in all clusters.

@nithinb nithinb force-pushed the DOG-4566-if-a-file-has-no-content-it-will-not-be-uploaded branch from 42b64b4 to 357f2de December 2, 2024 07:41
@nithinb
Contributor Author

nithinb commented Dec 3, 2024

@einarmo

I went through the httpx code and printed the headers from within the httpx Request object.

These are the headers for a file with zero-length content. As you can see, Transfer-Encoding is not set; the request carries content-length: 0 instead. I have tried it a few times and have not seen an error so far. Let me know if this is good to go :)

self.headers: Headers({'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'user-agent': 'python-httpx/0.27.2', 'accept': '*/*', 'content-length': '0', 'host': 'bluefield.cognitedata.com', 'x-cdp-app': 'file_extractor'})
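Why the Transfer-Encoding header matters here: an HTTP/1.1 client that cannot know a body's length up front (for example, a generator) typically falls back to Transfer-Encoding: chunked, while a body of known length, including zero, can be sent with Content-Length. The sketch below is a simplified model of that rule under those assumptions; `known_length` and `framing_headers` are hypothetical helpers, not httpx's actual implementation.

```python
import io
import os
from typing import Any, Dict, Optional


def known_length(body: Any) -> Optional[int]:
    """Length of `body` if it can be determined without consuming it, else None."""
    if isinstance(body, (bytes, bytearray)):
        return len(body)
    if hasattr(body, "seek") and hasattr(body, "tell"):
        # Seekable file-like object: measure the remaining bytes, then rewind.
        start = body.tell()
        end = body.seek(0, os.SEEK_END)
        body.seek(start)
        return end - start
    # Generators and other plain iterables: length unknown until consumed.
    return None


def framing_headers(body: Any) -> Dict[str, str]:
    """Hypothetical helper: pick the HTTP/1.1 body-framing header."""
    length = known_length(body)
    if length is None:
        return {"Transfer-Encoding": "chunked"}
    return {"Content-Length": str(length)}


print(framing_headers(b""))              # {'Content-Length': '0'}
print(framing_headers(io.BytesIO(b"")))  # {'Content-Length': '0'}
print(framing_headers(iter([b""])))      # {'Transfer-Encoding': 'chunked'}
```

Under this model, an empty body goes out as content-length: 0 only when its length is knowable before sending, which matches the header dump above.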

Contributor

@einarmo einarmo left a comment

Alright, this does seem sensible; let's hope it doesn't cause any other issues.

…tor-utils into DOG-4566-if-a-file-has-no-content-it-will-not-be-uploaded
@nithinb nithinb merged commit 5820486 into master Dec 3, 2024
5 checks passed
@nithinb nithinb deleted the DOG-4566-if-a-file-has-no-content-it-will-not-be-uploaded branch December 3, 2024 09:13
2 participants