Zip multi-volume archive data sees no files #563

Open
TheLastProject opened this issue Oct 2, 2024 · 2 comments

Comments

@TheLastProject

I got a zip file from a user (context) which the file command on Linux describes as: "Zip multi-volume archive data, at least PKZIP v2.50 to extract".

Calling zipInputStream.getNextEntry() on the ZipInputStream created from that file immediately returns null, without yielding a single entry.

Sadly, I cannot provide the file because it contains private data, and I cannot provide an example file either because I don't know how to create one. I was hoping that creating an issue was better than nothing (hopefully someone else can provide a test case file!).

@obfusk

obfusk commented Oct 22, 2024

FWIW, I made a test case: foo.zip.

@obfusk

obfusk commented Oct 24, 2024

The issue was that, for some unknown reason, these ZIP files start with a spanned archive marker (0x08074b50, stored little endian, so the bytes on disk are 50 4b 07 08), right before the first local file header (0x04034b50). As a result, iterating over the file using ZipInputStream.getNextEntry() fails: the first call immediately returns null.
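To make the byte layout concrete, here is a minimal sketch of how one might check whether data begins with that marker followed directly by a local file header. The class and method names are hypothetical and not part of any library; only the two signature values come from the description above.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper for illustration only; not part of any library.
class SpannedMarkerCheck {
    // Signature values as quoted above; ZIP stores them little endian on disk.
    static final int SPANNED_MARKER = 0x08074b50;
    static final int LOCAL_FILE_HEADER = 0x04034b50;

    /** Returns true if the data starts with the spanned marker directly followed by a local file header. */
    static boolean startsWithSpannedMarker(byte[] head) {
        if (head.length < 8) {
            return false; // too short to hold both signatures
        }
        ByteBuffer buf = ByteBuffer.wrap(head).order(ByteOrder.LITTLE_ENDIAN);
        return buf.getInt() == SPANNED_MARKER && buf.getInt() == LOCAL_FILE_HEADER;
    }

    public static void main(String[] args) {
        byte[] affected = {0x50, 0x4b, 0x07, 0x08, 0x50, 0x4b, 0x03, 0x04};
        byte[] regular = {0x50, 0x4b, 0x03, 0x04, 0x14, 0x00, 0x00, 0x00};
        System.out.println(startsWithSpannedMarker(affected)); // prints true
        System.out.println(startsWithSpannedMarker(regular));  // prints false
    }
}
```

Reading with ByteOrder.LITTLE_ENDIAN lets the comparison use the signature values exactly as they are usually written, rather than their byte-swapped form.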

I would not really consider that a bug, given the limitations of using ZipInputStream, but detecting such a marker and skipping it would be fairly easy and a nice feature to have for these cases. (The ZIP format does of course allow arbitrary data in between entries, and that can't really be handled when treating the file as a stream; you have to read the central directory so you can jump to the correct local header offsets.)

Thus, iterating over the entries using ZipFile.getFileHeaders() instead works as expected.

FWIW it's also possible to use this workaround to manually skip the marker (in case you really need to use ZipInputStream instead of ZipFile):

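// Wrap the source in a BufferedInputStream so that mark()/reset() are supported.
InputStream input = new BufferedInputStream(...);
byte[] buf = new byte[4];
input.mark(4);
// Read the first 4 bytes one at a time; a short read means the file is truncated.
for (int i = 0; i < 4; ++i) {
    if (input.read(buf, i, 1) != 1) {
        throw new IOException("File is less than 4 bytes.");
    }
}
// 0x504b0708 is the spanned marker 0x08074b50 as its bytes appear on disk.
// If the marker is absent, rewind; otherwise the stream stays positioned just past it.
if (new BigInteger(1, buf).intValue() != 0x504b0708) {
    input.reset();
}
ZipInputStream zipInputStream = new ZipInputStream(input, ...);
...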
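For anyone who wants to reproduce the symptom without a real affected file, the following is a rough, self-contained sketch. It is an assumption-laden stand-in: it uses the JDK's own java.util.zip classes rather than the library's ZipInputStream, builds a synthetic archive in memory by prepending the four marker bytes, and all class and method names are hypothetical. The JDK stream happens to show the same behavior (no entries until the marker is skipped), but the library's internals may differ.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class; uses java.util.zip as a stand-in for illustration.
class SpannedMarkerDemo {
    // Spanned archive marker 0x08074b50 as its bytes appear on disk.
    static final byte[] MARKER = {0x50, 0x4b, 0x07, 0x08};

    /** Builds a one-entry ZIP in memory and prepends the spanned marker bytes. */
    static byte[] buildPrefixedZip() throws IOException {
        ByteArrayOutputStream zipBytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(zipBytes)) {
            zos.putNextEntry(new ZipEntry("hello.txt"));
            zos.write("hello".getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(MARKER);
        out.write(zipBytes.toByteArray());
        return out.toByteArray();
    }

    /** Returns a stream positioned just past the marker if present, otherwise at the start. */
    static InputStream skipSpannedMarker(InputStream raw) throws IOException {
        BufferedInputStream in = new BufferedInputStream(raw);
        in.mark(4);
        byte[] head = in.readNBytes(4); // requires Java 11 or later
        if (!Arrays.equals(head, MARKER)) {
            in.reset(); // no marker: rewind so the ZIP reader sees the whole stream
        }
        return in;
    }

    /** Counts the entries a streaming reader can see. */
    static int countEntries(InputStream in) throws IOException {
        int count = 0;
        try (ZipInputStream zis = new ZipInputStream(in)) {
            while (zis.getNextEntry() != null) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = buildPrefixedZip();
        System.out.println(countEntries(new ByteArrayInputStream(data)));                    // prints 0
        System.out.println(countEntries(skipSpannedMarker(new ByteArrayInputStream(data)))); // prints 1
    }
}
```

Unlike the snippet above, this variant treats an input shorter than four bytes as "no marker" and rewinds instead of throwing, leaving any error reporting to the ZIP reader.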
