Cleanup all unused images #3

Open
Nutomic opened this issue Apr 5, 2024 · 3 comments
Labels
enhancement New feature or request

Comments


Nutomic commented Apr 5, 2024

It's easy to end up with unused images on Lemmy, e.g. when someone uploads an image and then doesn't actually use it in the new post, or when a comment gets deleted and its embedded images become unused. It's very tricky to clean this up from within Lemmy, as images can be referenced by many different tables, and in many cases they are embedded in markdown.

However, it should be very easy to clean them up as follows:

  • Create a database dump
  • Loop through images stored in pictrs (I don't see an API endpoint for a list of all images; it seems you have to use POST /internal/export to store it in a file)
  • For each image, check whether the SQL dump contains the image URL. If not, delete the image
  • Images uploaded after the database dump was created should be ignored
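
The steps above could be sketched roughly like this in Python. This is only an illustration, not an existing tool: `find_unreferenced` is a hypothetical helper, the image aliases are assumed to have been obtained from pictrs separately, and the dump is assumed to be a plain-text SQL file that can be read line by line. Actually deleting the images is left out.

```python
from typing import Iterable, Set


def find_unreferenced(aliases: Iterable[str], dump_lines: Iterable[str]) -> Set[str]:
    """Return the aliases that never appear as a substring of any dump line.

    `aliases` is assumed to come from pictrs; `dump_lines` can be an open
    text file, so the dump is never loaded into memory as a whole.
    """
    remaining = set(aliases)
    for line in dump_lines:
        if not remaining:
            break  # every image has been seen somewhere in the dump
        # Drop every alias that this line mentions.
        remaining -= {alias for alias in remaining if alias in line}
    return remaining
```

With a real dump this would be called as `find_unreferenced(aliases, open("dump.sql", encoding="utf-8"))`. Every alias is checked against every line, so the cost grows with the number of images times the size of the dump.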

Neriderc commented Apr 18, 2024

This would be amazing!

I'd say you'd need a little buffer. It's possible someone uploads an image to a comment or post, but hasn't submitted it yet. You'd probably want to only delete images that were created, say, 24 hours or more before the lemmy/pictrs database dumps.
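
A small sketch of that buffer in Python, assuming the upload time of each image is somehow known. The `uploads` mapping and the `deletion_candidates` name are made up for illustration; where the timestamps would come from is not settled in this thread.

```python
from datetime import datetime, timedelta
from typing import Dict, Set


def deletion_candidates(
    uploads: Dict[str, datetime],
    dump_time: datetime,
    buffer: timedelta = timedelta(hours=24),
) -> Set[str]:
    """Keep only images uploaded at least `buffer` before the dump was taken.

    `uploads` maps an image alias to its upload time (hypothetical input).
    Images uploaded after the dump fall outside the cutoff automatically.
    """
    cutoff = dump_time - buffer
    return {alias for alias, uploaded in uploads.items() if uploaded <= cutoff}
```

Only the aliases returned here would then be checked against the dump, so anything newer is never considered for deletion.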

wereii added the enhancement label May 5, 2024
Owner

wereii commented May 6, 2024

Considering the db dump requirement this is pretty involved.

Also, having to scan through gigabytes of dump data (my little instance currently dumps about 5G of uncompressed data) is probably not going to be efficient unless specifically optimized.
Maybe it would help to scan the dump once, extract anything that looks like the URL of a media file (maybe even matching the instance host), insert it into an unlogged table in Postgres, and only after that go through each pict-rs link and check for existence?
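
A rough Python sketch of the single-pass idea. Note the substitutions: it collects matches in an in-memory set instead of an unlogged Postgres table, and it assumes image links look like `https://<host>/pictrs/image/<alias>.<ext>`, which would need to be checked against a real instance. `referenced_aliases` is a hypothetical name.

```python
import re
from typing import Iterable, Set


def referenced_aliases(dump_lines: Iterable[str], host: str) -> Set[str]:
    """Scan the dump once and collect every image alias linked on `host`.

    Assumption: links have the form https://<host>/pictrs/image/<alias>.<ext>.
    """
    pattern = re.compile(
        r"https?://" + re.escape(host) + r"/pictrs/image/([A-Za-z0-9_-]+\.[A-Za-z0-9]+)"
    )
    found: Set[str] = set()
    for line in dump_lines:
        # findall returns only the captured group, i.e. "<alias>.<ext>"
        found.update(pattern.findall(line))
    return found
```

The unused images would then be `set(all_aliases) - referenced_aliases(dump, host)`, which takes one pass over the dump regardless of how many images pict-rs holds.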

From my POV this will have to be a separate tool, at least because of the dump requirement. I also don't like the idea of a tool calling pg_dumpall on my db; I would rather do that part manually and run the tool on top of it.

Author

Nutomic commented May 6, 2024

The other alternative is to write an SQL query which checks all the fields where images can be linked: all markdown fields (post, comment, sidebars, user bio, private messages, etc.) as well as avatars, banners, etc. This is definitely the cleaner solution but takes more effort to implement, whereas the db dump is quick and dirty.
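
To give an idea of what such a query could look like, here is a Python sketch that generates it from a list of (table, column) pairs. The pairs below are placeholders and have not been checked against the actual Lemmy schema, the list is far from complete, and `build_reference_query` is a made-up helper. The `%(pattern)s` placeholder follows the named-parameter style used by psycopg.

```python
from typing import Iterable, List, Tuple

# Assumed (table, column) pairs that may hold an image link; illustrative only.
IMAGE_COLUMNS: List[Tuple[str, str]] = [
    ("post", "body"),
    ("comment", "content"),
    ("person", "avatar"),
    ("person", "banner"),
    ("person", "bio"),
]


def build_reference_query(columns: Iterable[Tuple[str, str]]) -> str:
    """Build one query that reports whether any listed column matches a pattern."""
    parts = [
        f'SELECT 1 FROM "{table}" WHERE "{column}" LIKE %(pattern)s'
        for table, column in columns
    ]
    # EXISTS lets Postgres stop at the first matching row.
    return "SELECT EXISTS (\n  " + "\n  UNION ALL\n  ".join(parts) + "\n) AS referenced;"
```

The query would be run once per image with a pattern such as `%<alias>%`. Since `_` and `%` are wildcards in `LIKE`, aliases containing them would need escaping first.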
