Some duplicate images not in resultlist #27

blannoy · 2023-10-15T17:36:46Z

Hello,

The scan seems to miss some photos that are exactly the same (image, filename, dimensions). I don't know why they don't show up in the similarity map. Sometimes running it a few times eventually finds them. I suppose it's because the model compares only the image content and ignores the metadata?

This got me thinking that a simple scanning feature could just compare photo metadata instead of content.
So this is somewhere between a bug and a feature request.

The deletion feature is also something that could be used separately, e.g. load a list of IDs and run that through the plugin.

PS: it's a really cool project

mtalcott · 2023-10-15T22:14:24Z

Yes, currently photo metadata like filename and dimensions is not taken into account when calculating similarity scores. So this is somewhat expected behavior. I agree that the same filename (perhaps excluding common suffixes for dupes like copy, (1), etc.) and same dimensions should boost the similarity score. Count that as a desired enhancement!
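As a rough illustration of the enhancement described above, here is a minimal Python sketch. It is not code from the project: `normalize_filename`, `boosted_score`, the suffix patterns, the dict keys, and the `boost` value of 0.01 are all assumptions chosen for the example.

```python
import re
from pathlib import PurePath

# Assumed patterns for common "duplicate" suffixes: " copy", "-copy", "(1)", " (2)", ...
# A separator is required before "copy" so names like "photocopy" are left alone.
_DUPE_SUFFIX = re.compile(r"(?:[\s_-]+copy|\s*\(\d+\))+$", re.IGNORECASE)


def normalize_filename(name: str) -> str:
    """Drop the extension and trailing dupe suffixes, then lowercase."""
    stem = PurePath(name).stem
    return _DUPE_SUFFIX.sub("", stem).strip().lower()


def boosted_score(score: float, a: dict, b: dict, boost: float = 0.01) -> float:
    """Raise the content similarity score when the metadata also matches.

    `a` and `b` are hypothetical photo records with "filename", "width"
    and "height" keys. The boost amount is an arbitrary illustrative value.
    """
    same_name = normalize_filename(a["filename"]) == normalize_filename(b["filename"])
    same_dims = (a["width"], a["height"]) == (b["width"], b["height"])
    if same_name and same_dims:
        return min(1.0, score + boost)
    return score
```

With something like this, a pair that scores just under the limit on content alone could still clear it when filename and dimensions agree, while unrelated pairs keep their original score.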

The model that calculates image similarity is static though, so I would not expect the output to vary for the same image. Perhaps it was comparing against different photos, resulting in slightly different similarity scores. Do you have any example images you'd be willing to provide?

blannoy · 2023-10-16T07:19:00Z

Hi,

I did some digging...

I extracted both 250px images from Mongo and the filesystem. When I run them through the model online (MediaPipe MobileNet large), I get a score of 98.58%, so that's below the standard 99% limit. So I lowered the limit in the deduper and now I get the images (score of 98.79%; they are maybe a bit too noisy). The problem is that I get other images as well, alongside the ones that were missing the first time. Some of them are real duplicates, but others are not (though they are very similar).

If there were a way to filter the results down to "only the ones with the same size/dimensions/filename", that would be better.
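The trade-off described above can be sketched in a few lines of Python. Only the 0.9858 score comes from this thread. The other scores, the filenames, and the `matches` helper are made up for illustration and do not reflect the deduper's actual code.

```python
# Hypothetical pairwise similarity scores. Only 0.9858 is a value reported
# in this thread; the rest are invented to show the effect of the limit.
scores = {
    ("a.jpg", "a_dupe.jpg"): 0.9858,   # true duplicate
    ("b.jpg", "b_burst.jpg"): 0.9850,  # made up: very similar, not a duplicate
    ("c.jpg", "d.jpg"): 0.9100,        # made up: unrelated photos
}


def matches(scores: dict, threshold: float) -> list:
    """Return the pairs whose similarity score meets the threshold."""
    return sorted(pair for pair, score in scores.items() if score >= threshold)


strict = matches(scores, 0.99)  # misses the true duplicate
loose = matches(scores, 0.98)   # finds it, plus a non-duplicate
```

At 0.99 the true duplicate is dropped. At 0.98 it is found, but the merely similar pair comes along with it, which is why a threshold alone cannot separate the two cases.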
So I guess a front-end result filter would do the trick, but my React skills are not up to the task ;)
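For illustration, the same filter could also be sketched on the Python side rather than in React. This is only a sketch under assumptions: the record layout (dicts with `filename`, `width`, `height`), the `(photo_a, photo_b, score)` tuple shape, and the function names are hypothetical, not the project's data model.

```python
def same_metadata(a: dict, b: dict, keys=("filename", "width", "height")) -> bool:
    """True when both photo records agree on every listed metadata key."""
    return all(a.get(k) == b.get(k) for k in keys)


def filter_pairs(pairs: list, keys=("filename", "width", "height")) -> list:
    """Keep only candidate pairs whose metadata matches.

    `pairs` is a list of (photo_a, photo_b, score) tuples, where each photo
    is a dict. This shape is an assumption made for the example.
    """
    return [(a, b, s) for a, b, s in pairs if same_metadata(a, b, keys)]


candidates = [
    ({"filename": "IMG_1.jpg", "width": 250, "height": 188},
     {"filename": "IMG_1.jpg", "width": 250, "height": 188}, 0.9858),
    ({"filename": "IMG_2.jpg", "width": 250, "height": 188},
     {"filename": "IMG_3.jpg", "width": 250, "height": 188}, 0.9850),
]
exact = filter_pairs(candidates)
```

Passing a different `keys` tuple (for example, one that includes a file size field if the records carry it) would tighten or loosen the filter without changing the similarity threshold.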

mtalcott · 2023-12-31T20:34:00Z

Yes, I agree the filtering would be a great addition. That's represented as a desired enhancement over on #7.

It's also a bummer that true duplicates get a 98.58% match with the MobileNet-V3 (large) model. Thanks for confirming that; I've found the same with some dupes in my own photos. I explored lowering the default limit, but also found that it started including many more non-duplicates as well. This undesirable behavior is likely due to using a model optimized for mobile use. Previously I was using the clip-ViT-B-32 model with sentence-transformers, which had better accuracy but at the expense of MUCH longer (3x?) runtime and out-of-memory errors due to MUCH higher memory usage. Perhaps there is a better model compatible with MediaPipe that can be used...

mtalcott added the bug and enhancement labels and removed the bug label · Oct 15, 2023
