Streaming zip data #3
This could be possible with The zip file is laid out like this:
This means that it would be difficult to stream, since you would need to start from the end of the zip data.
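To make the "start from the end" point concrete, here is a minimal sketch of how a reader usually locates the end of central directory record by scanning backwards from the end of the buffer. The function and interface names are hypothetical and not part of this project; the field offsets follow the zip spec's description of the record, and ZIP64 archives are assumed out of scope.

```typescript
// Hypothetical helper, not part of this project's API.
const EOCD_SIGNATURE = 0x06054b50; // "PK\x05\x06" read as a little-endian uint32
const EOCD_MIN_SIZE = 22; // fixed part of the record, without the comment
const MAX_COMMENT_SIZE = 0xffff; // the comment length is a 16-bit field

interface EndOfCentralDirectory {
  offset: number; // where the record starts in the buffer
  entryCount: number; // total number of central directory entries
  centralDirectorySize: number; // size of the central directory in bytes
  centralDirectoryOffset: number; // where the central directory starts
}

function findEndOfCentralDirectory(data: Uint8Array): EndOfCentralDirectory | undefined {
  const view = new DataView(data.buffer, data.byteOffset, data.byteLength);
  // The record sits at the very end, followed only by an optional comment,
  // so only the last 22 + 65535 bytes can contain its signature.
  const lowest = Math.max(0, data.byteLength - EOCD_MIN_SIZE - MAX_COMMENT_SIZE);
  for (let offset = data.byteLength - EOCD_MIN_SIZE; offset >= lowest; offset--) {
    if (view.getUint32(offset, true) !== EOCD_SIGNATURE) continue;
    // Guard against the signature bytes appearing by chance inside a comment:
    // the declared comment length must reach exactly to the end of the data.
    const commentLength = view.getUint16(offset + 20, true);
    if (offset + EOCD_MIN_SIZE + commentLength !== data.byteLength) continue;
    return {
      offset,
      entryCount: view.getUint16(offset + 10, true),
      centralDirectorySize: view.getUint32(offset + 12, true),
      centralDirectoryOffset: view.getUint32(offset + 16, true),
    };
  }
  return undefined;
}
```

Because the scan needs the tail of the archive, a purely forward stream cannot produce this record until the last bytes have arrived.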
Today I overhauled the internals for processing zip files (check out v0.3.0) and found some interesting things in the zip spec. I've gained a much better understanding of how zip files work. Section 4.3.5 of the spec caught my eye since it mentions streaming. My thoughts on how streaming can be implemented:
Since the zip "header"/"end of central directory" is located at the end of the file, some metadata will not be known until the entire file has been streamed. However, the
public get data(): Uint8Array {
    const data = this.zipData.slice(this._offset + this.size);
    const decompress = decompressionMethods[this.compressionMethod];
    // decompress validation check not included for readability
    return decompress(data, this.compressedSize, this.uncompressedSize, this.flag);
}
All that needs to be done is to pass the zip data buffer to
Note that even if streaming is possible, you still wouldn't be able to use ZipFS without loading it all into memory (since the entire buffer must be passed to the FS).
This is still in the early stages, but adding streaming support is workable. I hope this has helped.
- JP
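As a rough illustration of the forward-reading idea above, the sketch below walks local file headers from the start of a buffer, which is the part of the layout a stream sees first. It is not this project's implementation: `readLocalFileHeader` and `listEntriesForward` are hypothetical names, file names are assumed to be ASCII, and the walk stops at the first entry that uses a data descriptor (general purpose flag bit 3), because in that case the sizes are stored after the file data rather than in the header.

```typescript
// Hypothetical helpers, not part of this project's API.
const LOCAL_HEADER_SIGNATURE = 0x04034b50; // "PK\x03\x04" as a little-endian uint32
const LOCAL_HEADER_FIXED_SIZE = 30; // bytes before the variable-length name
const FLAG_DATA_DESCRIPTOR = 1 << 3; // sizes and CRC follow the file data

interface LocalFileHeader {
  name: string;
  compressionMethod: number;
  compressedSize: number;
  uncompressedSize: number;
  hasDataDescriptor: boolean;
  dataOffset: number; // where the (possibly compressed) file data begins
}

function readLocalFileHeader(data: Uint8Array, offset: number): LocalFileHeader | undefined {
  if (offset + LOCAL_HEADER_FIXED_SIZE > data.byteLength) return undefined;
  const view = new DataView(data.buffer, data.byteOffset, data.byteLength);
  if (view.getUint32(offset, true) !== LOCAL_HEADER_SIGNATURE) return undefined;

  const flags = view.getUint16(offset + 6, true);
  const nameLength = view.getUint16(offset + 26, true);
  const extraLength = view.getUint16(offset + 28, true);
  const nameStart = offset + LOCAL_HEADER_FIXED_SIZE;
  const dataOffset = nameStart + nameLength + extraLength;
  if (dataOffset > data.byteLength) return undefined; // header is truncated

  // Assumption: ASCII names only. Real archives may use CP437 or UTF-8.
  let name = "";
  for (let i = 0; i < nameLength; i++) {
    name += String.fromCharCode(data[nameStart + i]);
  }

  return {
    name,
    compressionMethod: view.getUint16(offset + 8, true),
    compressedSize: view.getUint32(offset + 18, true),
    uncompressedSize: view.getUint32(offset + 22, true),
    hasDataDescriptor: (flags & FLAG_DATA_DESCRIPTOR) !== 0,
    dataOffset,
  };
}

function listEntriesForward(data: Uint8Array): LocalFileHeader[] {
  const entries: LocalFileHeader[] = [];
  let offset = 0;
  for (;;) {
    const header = readLocalFileHeader(data, offset);
    if (!header) break; // reached the central directory or the end of the data
    entries.push(header);
    // With a data descriptor the header does not say where the data ends,
    // so this simple walk cannot skip to the next entry.
    if (header.hasDataDescriptor) break;
    offset = header.dataOffset + header.compressedSize;
  }
  return entries;
}
```

A real streaming reader would have to handle the data descriptor case by decompressing until the stream signals its own end, which this sketch deliberately leaves out.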
Wow, great news, thanks! What do you think of adding support for
Perhaps it would be possible to access the members on the view directly, though that could get complicated since the struct decorators would need to intercept get/set calls. Feel free to look at
From my point of view, I can tell that loading a 1 GB zip with
I'm currently working on releasing core 0.11 and the Emscripten backend. After that, I would be happy to address the ZIP backend. Hopefully that does not delay your project too much.
This actually makes a lot of sense. I apologize if I mistakenly thought the entire blob was preloaded.
I will see what I can do, though processing ZIP files is already convoluted, so I'm not sure what other optimizations I can make.
Issue: I have a 32 GB ZIP archive, and I don't think I should have to read the entire file into RAM just to look at the list of files within it.
AFAIR, the modern file reader API makes it possible to read a file in chunks. Would it be possible to integrate something like that here? Is there a right way to work with large ZIPs efficiently?
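For reference, chunked reading of the kind mentioned above can be sketched with `slice()` and `arrayBuffer()`, which browser `Blob` and `File` objects provide. The `ByteSource` interface and the helper names below are hypothetical and only describe the two members the sketch relies on; nothing here is part of this project's API.

```typescript
// Minimal structural type (an assumption for this sketch); browser Blob and
// File objects are expected to satisfy it.
interface ByteSource {
  readonly size: number;
  slice(start?: number, end?: number): { arrayBuffer(): Promise<ArrayBuffer> };
}

// Read only the bytes in [start, end), clamped to the bounds of the source.
async function readRange(source: ByteSource, start: number, end: number): Promise<Uint8Array> {
  const clampedStart = Math.max(0, start);
  const clampedEnd = Math.min(source.size, end);
  return new Uint8Array(await source.slice(clampedStart, clampedEnd).arrayBuffer());
}

// Read the last `length` bytes, e.g. the region that holds the zip
// end of central directory record.
function readTail(source: ByteSource, length: number): Promise<Uint8Array> {
  return readRange(source, source.size - length, source.size);
}

// Visit the source in fixed-size chunks without holding all of it at once.
async function forEachChunk(
  source: ByteSource,
  chunkSize: number,
  onChunk: (chunk: Uint8Array, offset: number) => void | Promise<void>,
): Promise<void> {
  if (chunkSize <= 0) throw new RangeError("chunkSize must be positive");
  for (let offset = 0; offset < source.size; offset += chunkSize) {
    await onChunk(await readRange(source, offset, offset + chunkSize), offset);
  }
}
```

For listing the contents of a large archive, something like `readTail(file, 22 + 65535)` would fetch only the region where the end of central directory record can live, and a second `readRange` call could then fetch the central directory itself.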