Eliminate inefficiency in form of some extra data downloaded in sync for large files #380

ppolewicz · 2023-04-02T18:06:26Z

Large files are downloaded in parts, in parallel, by all compliant clients, as specified by the B2 integration checklist.

The extra data cost arises because when we start streaming a file, we (usually) don't know its size, so instead of requesting an exact range for the first part we effectively request the entire file. As soon as the headers arrive we realize that it's actually a big file, so we start a download session for the remaining parts. Once the data stream for the first part has delivered enough data to cover the first part's range, we close the socket. HOWEVER, because of network latency, the server has already sent some extra data, which is in flight from the server to the client while the connection close request travels from the client to the server. These few extra KB are wasted.

In the context of sync, however, we do know the file size. The synchronizer could pass the file size to a specialized downloader implementation, which would then request an exact part size using the Range header even for the first part.
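
To make the idea concrete, here is a minimal sketch of how exact ranges could be planned once the file size is known up front. It is not the SDK's code: `plan_part_ranges` and `range_header` are hypothetical helper names invented for this illustration, and the part size is an arbitrary parameter. The only thing taken as given is the standard HTTP `Range: bytes=start-end` syntax, where both offsets are inclusive.

```python
from typing import Dict, List, Tuple


def plan_part_ranges(file_size: int, part_size: int) -> List[Tuple[int, int]]:
    """Split a file of known size into inclusive (start, end) byte ranges.

    Hypothetical helper for illustration only, not part of any B2 library.
    """
    if file_size < 0:
        raise ValueError("file_size must not be negative")
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    ranges = []
    start = 0
    while start < file_size:
        # HTTP byte ranges are inclusive, so the last byte is end, not end + 1.
        end = min(start + part_size, file_size) - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def range_header(byte_range: Tuple[int, int]) -> Dict[str, str]:
    """Build the Range request header for one inclusive byte range."""
    start, end = byte_range
    return {"Range": f"bytes={start}-{end}"}


if __name__ == "__main__":
    # With the size known, the first part gets a bounded range too,
    # so the server never sends bytes beyond what the part needs.
    for part in plan_part_ranges(file_size=250, part_size=100):
        print(range_header(part))
```

With every part, including the first, requested this way, the client would not need to close the socket early, so nothing would be in flight when the first part completes.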

ppolewicz added the "help wanted" label Apr 2, 2023
