Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add zlib support to Decompressor #1189

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

JohnHBrock
Copy link

Closes #1163.

Changes

  • Add zlib support to Decompressor.
  • Add unit tests for zlib functionality.

Python unfortunately doesn't have a file-like class for zlib, so I created my own called _ZlibFile.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jun 18, 2023
@@ -640,6 +664,70 @@ def test_decompressor_iterdatapipe(self):
with self.assertRaisesRegex(TypeError, "has no len"):
len(tar_decompress_dp)

def test_zlibfile_readall(self):
uncompressed_data_test_cases = [b"", b"some data", 10_000 * b"some data"]
for uncompressed_data in uncompressed_data_test_cases:
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't find an obvious way of using pytest parameterized tests that was compatible with expecttest, so I resorted to just manually iterating over a list of inputs.

# so that the whole input buffer can be decompressed within one
# .decompress() call.
data = self.read(sys.maxsize)
while data:
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't sure if python 3.7 support was desired, so I didn't use the walrus operator here.

@@ -87,6 +92,8 @@ def _detect_compression_type(self, path: str) -> CompressionType:
return self.types.ZIP
elif ext == ".bz2":
return self.types.BZIP2
elif ext == ".zlib":
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There doesn't seem to be a strong standard for what extension zlib files should have. I've also seen .z and .zz.

@JohnHBrock
Copy link
Author

@ejguan This could use a review.

@ejguan
Copy link
Contributor

ejguan commented Jul 18, 2023

Unfortunately, TorchData is paused for development. You might want to just fork it and land it on your own repo.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add zlib support to Decompressor
3 participants