Some duplicate images not in resultlist #27
Yes, currently photo metadata like filename and dimensions is not taken into account when calculating similarity scores, so this is somewhat expected behavior. I agree that the same filename (perhaps excluding common suffixes for dupes like
The model that calculates image similarity is static though, so I would not expect the output to vary for the same image. Perhaps it was comparing against different photos, resulting in slightly different similarity scores. Do you have any example images you'd be willing to provide?
Hi, I did some digging... I extracted both 250px images from mongo & filesystem. When I run them through the model online (mediapipe mobilenet large) I get a score of 98.58%, so that's below the default 99% limit. So I lowered the limit in the deduper and now I get the images (score of 98.79%, they are maybe a bit too noisy). The problem is that I get other images as well, alongside the ones that were missing the first time. Some of them are real duplicates, but others are not (though very similar). If there were a way to filter the results to "only the ones with the same size/dimensions/filename", that would be better.
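The filter suggested here could be sketched roughly as follows. This is a minimal Python illustration, not the plugin's actual code: the `Photo` and `Match` types, the `filter_matches` function, and the `relaxed` threshold value are all hypothetical names and numbers chosen for the example. It assumes similarity scores are available as fractions between 0 and 1, and that "same filename" means a case-insensitive exact match.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Photo:
    # Hypothetical record; field names are assumptions, not the plugin's schema.
    id: str
    filename: str
    width: int
    height: int


@dataclass(frozen=True)
class Match:
    a: Photo
    b: Photo
    score: float  # similarity as a fraction, e.g. 0.9858 for 98.58%


def same_metadata(a: Photo, b: Photo) -> bool:
    """True when filename (case-insensitive, an assumption) and dimensions agree."""
    return (
        a.filename.lower() == b.filename.lower()
        and (a.width, a.height) == (b.width, b.height)
    )


def filter_matches(matches, strict=0.99, relaxed=0.98):
    """Keep matches at or above `strict` unconditionally.

    Matches between `relaxed` and `strict` are kept only when the
    metadata of both photos agrees; everything else is dropped.
    """
    kept = []
    for m in matches:
        if m.score >= strict:
            kept.append(m)
        elif m.score >= relaxed and same_metadata(m.a, m.b):
            kept.append(m)
    return kept
```

With this shape, a pair scoring 98.58% would only be reported when both photos share a filename and dimensions, while pairs at or above the 99% limit would be reported as before.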
Yes, I agree the filtering would be a great addition. That's represented as a desired enhancement over on #7. It's also a bummer that true duplicates get a 98.58% match with the MobileNet-V3 (large) model. Thanks for confirming that; I've found the same with some dupes in my own photos. I explored lowering the default limit, but also found that it started including many more non-duplicates as well. This is undesirable behavior, likely due to using a model optimized for mobile use; previously I was using the
Hello,
The scan seems to miss some photos that are exactly the same (image, filename, dimensions). I don't know why they don't show up in the similarity map. Sometimes running it a few times eventually finds them. I suppose it's because the model doesn't use the metadata, only the content, to compare the images?
This got me thinking that a simple scanning feature could just compare photo metadata instead of content.
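A metadata-only scan like this could look roughly like the sketch below. It is only an illustration under assumptions: the `find_metadata_duplicates` function and the dictionary keys (`id`, `filename`, `width`, `height`) are hypothetical and do not reflect the plugin's real data model, and filenames are compared case-insensitively as a guess at what would be useful.

```python
from collections import defaultdict


def find_metadata_duplicates(photos):
    """Group photo IDs that share filename and dimensions.

    `photos` is an iterable of dicts with the (assumed) keys
    "id", "filename", "width" and "height". Returns a list of
    ID groups, each containing at least two photos.
    """
    groups = defaultdict(list)
    for photo in photos:
        # Case-insensitive filename match is an assumption of this sketch.
        key = (photo["filename"].lower(), photo["width"], photo["height"])
        groups[key].append(photo["id"])
    # Only groups with more than one member are potential duplicates.
    return [ids for ids in groups.values() if len(ids) > 1]


if __name__ == "__main__":
    library = [
        {"id": "a", "filename": "IMG_0001.jpg", "width": 4000, "height": 3000},
        {"id": "b", "filename": "IMG_0002.jpg", "width": 4000, "height": 3000},
        {"id": "c", "filename": "IMG_0001.jpg", "width": 4000, "height": 3000},
    ]
    print(find_metadata_duplicates(library))
```

Matching on metadata alone can also flag different photos that happen to share a name and size, so the groups are best treated as candidates to review rather than confirmed duplicates.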
So this is somewhere between a bug and a feature request.
The deletion feature is also something that could be used separately, e.g. load a list of IDs and run that through the plugin.
PS: it's a really cool project