Eliminate inefficiency in form of some extra data downloaded in sync for large files #380

Open
ppolewicz opened this issue Apr 2, 2023 · 0 comments
Labels
help wanted Extra attention is needed

Comments

@ppolewicz
Collaborator

Large files are downloaded in parts, in parallel, by all compliant clients, as specified by the B2 integration checklist.

The extra data cost arises because, when we start streaming a file, we (usually) don't know its size, so we cannot request an exact range for the first part; in effect we request the entire file. As soon as the headers arrive we realize that it is actually a big file, so we start a download session for the remaining parts. Once the data stream for the first part has emitted enough data to satisfy the first part's download range, we close the socket. HOWEVER, because of network latency, the server has already sent some extra data, which is in flight from the server to the client while the connection close travels from the client to the server. These few extra KB are downloaded and then discarded.

In the context of sync, however, we do know the file size. The synchronizer could pass the file size to a specialized downloader implementation, which would then request an exact part size using the Range header even for the first part.
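A minimal sketch of the idea, assuming the file size is known up front. The names `plan_part_ranges` and `range_header` are hypothetical helpers made up for illustration, not part of the existing downloader code, and the part size is an arbitrary example value. It only shows how every part, including the first, could get an exact inclusive byte range:

```python
from typing import List, Tuple


def plan_part_ranges(file_size: int, part_size: int) -> List[Tuple[int, int]]:
    """Split a file of known size into inclusive (start, end) byte ranges.

    Hypothetical helper: because the size is known, the first part gets
    an exact range too, so no stream has to be cut off early.
    """
    if file_size < 0 or part_size <= 0:
        raise ValueError("file_size must be >= 0 and part_size must be > 0")
    ranges = []
    start = 0
    while start < file_size:
        # HTTP byte ranges are inclusive on both ends, hence the "- 1".
        end = min(start + part_size, file_size) - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def range_header(start: int, end: int) -> str:
    """Format an inclusive byte range as an HTTP Range header value."""
    return f"bytes={start}-{end}"


if __name__ == "__main__":
    # Example: a 250-byte file downloaded in parts of at most 100 bytes.
    for start, end in plan_part_ranges(250, 100):
        print(range_header(start, end))
```

With ranges planned this way, the first request asks only for the bytes it will actually consume, so there is nothing left in flight when the part completes.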

@ppolewicz added the "help wanted" label Apr 2, 2023