Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider replacing objects in the ZipRangeReader->files array with instances of a class #13

Open
digitaldogsbody opened this issue Sep 21, 2022 · 3 comments
Labels
discussion Things that need discussion

Comments

@digitaldogsbody
Copy link
Owner

At the moment, we extract the most useful bits of data (filename, (un)compressed size, offset, CRC, etc) and populate the public array.

However, if we start to implement features such as #12, then we will want to also be able to access other information from the Central Directory Headers at arbitrary points (i.e when ZipRangeReader::getStream is called, and it would be stupid to have to re-retrieve the CDR each time.

In order to make this cleaner for end users, it might be better to hide that data, otherwise the files array could start to get very messy.

This is where the discussion comes in - there are basically three options.

  1. Populate files with all the data from the CDRs and hope the end user doesn't get too confused. This would be adding 15 new pieces of data per file, some of which would be unprintable bytestrings.
  2. Add a internal_data key to the generated arrays, and stick everything in there, along with a warning to the user in the documentation to just ignore it.
  3. Write a class for ZipFileInformation, in which we could store all the information, and then populate the files array with instantiated objects of that class. This would allow us to use getters for all the data, meaning a) we could keep the useful information in public vars, and the end user could do $zip_info->files["MyFile"]->CRC and b) we could keep internally useful data in protected vars and have helper methods that return it when needed (via any logic that needs to be done)
@digitaldogsbody digitaldogsbody added the discussion Things that need discussion label Sep 21, 2022
@digitaldogsbody
Copy link
Owner Author

My personal feeling is that 3 is nice, because then functionality that is currently in ZipRangeReader, but applies to individual files can be devolved when appropriate.

For example, ZipRangeReader::getStream does the calculations to find the offset of the compressed bytestream (something not exposed directly in the headers due to their variable length). Having a ZipFileInformation class would allow us to have a getter function ZipFileInformation->byteStreamOffset to return the offset directly. That would promote reusability, extensibility etc.

This might also come in handy for potential future functionality, where we want to deal with other bits of data from the headers that are unprintable bytestrings - perhaps retrieving the last modified date/time (which btw are in a horrible format!), so having reusable code would be nice.

@DiegoPino
Copy link
Collaborator

Yes. I agree. 3 is the way

@digitaldogsbody
Copy link
Owner Author

Perhaps ZipFileInformation is not the best name, in case of confusion that it is information about the zip as a whole, rather than the files inside it. Hmmmm 🤔

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discussion Things that need discussion
Projects
None yet
Development

No branches or pull requests

2 participants