[FR] (possibly heresy) Shared Resources #858

Closed
God-damnit-all opened this issue Jan 9, 2022 · 4 comments

@God-damnit-all

There are certain pages I like to make backups of whenever my script detects they've been changed. However, over the past couple of years, my backup folder has grown to take up a lot more disk space than I originally anticipated.

While it's antithetical to the primary intent of the project, I'm wondering if you'd be willing to implement an option to instead 'localize' resources into a 'SingleFile' subfolder, with an MD5 hash of each resource appended to its filename (using SparkMD5 for fast generation). For instance, a resource like https://ajax.googleapis.com/ajax/libs/jquery/1.12.4/jquery.min.js would be saved to: SingleFile/ajax.googleapis.com/ajax/libs/jquery/jquery.min-4F252523D4AF0B478C810C2547A63E19.js

This would allow multiple pages to all utilize the same resources without causing any file collisions due to changes in the resources.
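
A minimal sketch of that naming scheme, for illustration only. It is written in Python with the standard-library `hashlib` rather than SparkMD5, and the function name `localized_path` and its `root` parameter are hypothetical, not part of SingleFile. It also assumes the full URL directory is kept under the host folder, whereas the example path above leaves out the version directory.

```python
import hashlib
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def localized_path(url: str, content: bytes, root: str = "SingleFile") -> str:
    """Build '<root>/<host>/<url dir>/<stem>-<MD5><ext>' for one resource.

    Hypothetical helper: the layout is an assumption based on the example
    in this issue, not an existing SingleFile option.
    """
    parts = urlsplit(url)
    url_path = PurePosixPath(parts.path)
    # Upper-case hex digest, matching the style of the example filename.
    digest = hashlib.md5(content).hexdigest().upper()
    hashed_name = f"{url_path.stem}-{digest}{url_path.suffix}"
    # Drop the leading "/" so the URL directory nests under the host folder.
    directory = str(url_path.parent).lstrip("/")
    return str(PurePosixPath(root, parts.netloc, directory, hashed_name))
```

Because the hash is computed from the resource's content, a changed version of the same URL gets a different filename and cannot overwrite the copy that older page saves still refer to.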

@gildas-lormeau
Owner

gildas-lormeau commented Jan 10, 2022

Thanks for the suggestion. I think this issue would be more suited to SingleFileZ than SingleFile. To be honest, I have a problem with the idea that SingleFile saves web pages in several files...

@God-damnit-all
Author

God-damnit-all commented Jan 10, 2022

> Thanks for the suggestion. I think this issue would be more suited to SingleFileZ than SingleFile.

Really? I would've guessed the opposite. The extension seems more suited for backing up things that one simply comes across, while the CLI tool seems more suited for long-term use, as backups keep building up.

> To be honest, I have a problem with the idea that SingleFile saves web pages in several files...

Well, that might depend on one's interpretation. Making multiple page saves share resources without clobbering each other is difficult without using something like WARC. So, in essence, rather than having a copy of the same resource for each page, each instance of a resource is a single file that is shared throughout.

@God-damnit-all
Author

God-damnit-all commented Jan 11, 2022

Having thought about it, particularly due to Windows filename restrictions and, potentially, maximum filepath length issues with long URLs, it would probably be better to have all the files named after only their hash and extension, and for them all to be in the same folder. SingleFile already offers the option to indicate what the filename of Base64 images originally was, so perhaps the same could be done for every other resource downloaded in this manner. I don't know if every tag has a supported property like that, but it wouldn't necessarily have to be a supported property so long as the user could look at the source or use inspect element to figure it out.

This would also potentially prevent duplicates from cropping up due to the website changing where the image is being retrieved from. This can be a real issue with sites that make heavy use of CDNs especially.
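
A rough sketch of this revised idea, again in Python and only as an illustration. The helper `store_resource` is hypothetical. It assumes every resource goes into one flat folder under the name `<MD5><extension>`, and that the original filename is handed back to the caller so it can be recorded somewhere in the saved page (for example in an attribute on the referencing tag, which is an assumption here, not an existing SingleFile feature).

```python
import hashlib
from pathlib import Path, PurePosixPath
from urllib.parse import urlsplit


def store_resource(store_dir: Path, url: str, content: bytes) -> tuple[str, str]:
    """Save content as '<MD5><ext>' in a single flat folder.

    Returns (stored_name, original_name). Hypothetical helper, not a
    SingleFile API.
    """
    # Only the last path segment is used; host, directories and query are ignored.
    original_name = PurePosixPath(urlsplit(url).path).name
    extension = PurePosixPath(original_name).suffix
    stored_name = hashlib.md5(content).hexdigest().upper() + extension
    target = store_dir / stored_name
    # Identical content always maps to the same name, so write it only once.
    if not target.exists():
        store_dir.mkdir(parents=True, exist_ok=True)
        target.write_bytes(content)
    return stored_name, original_name
```

With this layout the stored name is always 32 hex characters plus the extension, which sidesteps long URL paths and characters that Windows does not allow in filenames. The same image fetched from two different CDN hosts ends up as one file, as long as the bytes and the extension are identical.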

@gildas-lormeau
Owner

I'm closing this issue because it's out of scope. However, I recommend that you take a look at this issue: gildas-lormeau/SingleFileZ#119.
