Blacklist image hashes? #651

Open
frozenpandaman opened this issue Dec 16, 2023 · 14 comments

Comments

@frozenpandaman

Sorry for all the questions/issues lately. Is there a recommended way to blacklist certain image hashes? Hoping this will cut down on bot spam that seemingly posts the same things over and over.

I'm guessing the naïve way would be to hardcode it into some .php file (if the image hash is in_array(whatever1, whatever2), return without doing anything or serve an error message), but I'm not sure where the image upload logic happens.

@crazy4cars69

I think vichan has a feature that uses MD5 hashes in filters to prevent certain images from being posted.

@RealAngeleno

Yes. Do something like this:

    $config['filters'][] = array(
        'condition' => array(
            // Match posts whose file hash is on the blacklist below.
            'custom' => function($post) {
                return array_key_exists('filehash', $post) && in_array($post['filehash'], array(
                    'your filehash',
                ));
            }
        ),
        'action' => 'ban',
        'add_note' => true,
        'all_boards' => true,
        'expires' => 60 * 60 * 72, // Three days
        'reason' => 'Ban evasion.'
    );

Kuz hacked up a script so that mods can see the MD5 hashes, but he never made it public. The best way to find a hash is to check the filehash column in the posts_[board] table.

@crazy4cars69

Here is how to show the file hash to mods only.

Edit /templates/post/fileinfo.html

Paste the following before {% include "post/image_identification.html" %}:

		{% if post.mod|hasPermission(config.mod.show_file_hash) %}
			<br />
			<span>HASH: {{ post.filehash }}</span>
		{% endif %}

Make sure to also add this to /inc/config.php

    // View file hash
    $config['mod']['show_file_hash'] = MOD;

@frozenpandaman
Author

@RealAngeleno @crazy4cars69 Thank you both so much! Really appreciated.

I'm not too great with PHP myself so being able to implement this was extremely helpful. Do let me know if you'd happen to be able to throw together some simple QuestyCaptcha code (i.e. just an input 'verification' field that checks the entered string against something in config.php) if you have time in the future.
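
The check described here (compare the entered string against accepted answers kept in the config) might look roughly like the sketch below. It is written in Python for brevity and is not vichan code: `ACCEPTED_ANSWERS` and `check_answer` are hypothetical names, and a real version would be PHP with the answers read from config.php.

```python
# Hypothetical stand-in for a value that would be set in config.php.
ACCEPTED_ANSWERS = {"blue", "azure"}


def check_answer(entered: str, accepted=ACCEPTED_ANSWERS) -> bool:
    """Return True if the entered text matches one of the accepted answers.

    The comparison ignores surrounding whitespace and letter case.
    """
    normalized = entered.strip().casefold()
    return normalized in {answer.casefold() for answer in accepted}
```

A post would then be rejected whenever `check_answer` returns False for the submitted verification field.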

@frozenpandaman
Author

Unfortunately this doesn't seem to be helping with spam issues. Spambots have the same image saved presumably at different compression levels or with different metadata or something, causing the blacklist for specific hashes to be ineffective.
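
This is expected behaviour for an exact hash: MD5 is computed over the file's bytes, so a copy that differs by even one byte gets an unrelated digest. A small Python illustration follows, using made-up bytes rather than a real image; `md5_hex` is a hypothetical helper name.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of the given bytes as a hex string."""
    return hashlib.md5(data).hexdigest()


# Stand-in bytes, not a real image file.
original = b"\xff\xd8\xff\xe0" + b"image data" * 100
# The same content with one extra trailing byte, e.g. changed metadata.
tweaked = original + b"\x00"

print(md5_hex(original) == md5_hex(tweaked))  # the digests differ
```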

@crazy4cars69

> Unfortunately this doesn't seem to be helping with spam issues. Spambots have the same image saved presumably at different compression levels or with different metadata or something, causing the blacklist for specific hashes to be ineffective.

Yeah, that will be a problem. The alternative is to ban the IP or the IP range. Right now vichan doesn't have effective file hash spam detection apart from the duplicate file check and the MD5 blacklist.

@RealAngeleno

I was asked a bit about questycaptcha from a few other people too. That'll be my top priority for now, along with the wiki.

@RealAngeleno

Though I wouldn't know the best way to implement it, as there's many ways to do it.

RealAngeleno reopened this Jan 5, 2024

@crazy4cars69

> Though I wouldn't know the best way to implement it, as there's many ways to do it.

There could be two versions: a JS one, which would need to be added to additional_javascript and would display the image in base64 and be reloadable, and a non-JS one, which would use an iframe and a reload button to refresh the captcha. Keep the questycaptcha config files in inc/questycaptcha/ and the main questycaptcha file in the root directory as captcha.php. Don't forget to add it to the report_captcha config.

@frozenpandaman
Author

> I was asked a bit about questycaptcha from a few other people too. That'll be my top priority for now

@RealAngeleno AMAZING to hear, thank you!!!

@Zankaria

Zankaria commented Feb 23, 2024

A possible solution would be using perceptual hashing.
It's potentially more CPU intensive, but perceptual hashes extracted from an image (not a file) have a short Hamming distance from hashes extracted from similar images.

Basically, you open the image you want to block, resize it to a fixed size, produce a perceptual hash of that, and store it. Then, when a user tries to post a new image, you open that image, resize it, and hash it the same way.
If the Hamming distance between the two hashes is below a given threshold, you classify the images as "similar enough" and reject the post.
This doesn't shield much from cropping or image rotation, but it handles resizing, changes in metadata, and recompression very well.
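
The steps above can be sketched in Python as follows. This is a minimal illustration and not vichan code: it uses an average hash as one simple kind of perceptual hash, it takes an already decoded grayscale image as a list of rows of 0-255 values (decoding the uploaded file would need an image library and is left out), and the names `resize_gray`, `average_hash`, `hamming_distance`, `is_blocked` and the threshold of 5 are all assumptions made for the example.

```python
def resize_gray(pixels, size=8):
    """Shrink a 2D grid of grayscale values to size x size by averaging blocks."""
    height, width = len(pixels), len(pixels[0])
    out = []
    for ty in range(size):
        y0 = ty * height // size
        y1 = max((ty + 1) * height // size, y0 + 1)
        row = []
        for tx in range(size):
            x0 = tx * width // size
            x1 = max((tx + 1) * width // size, x0 + 1)
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def average_hash(pixels, size=8):
    """Return a size*size-bit integer: one bit per cell, set if brighter than the mean."""
    flat = [value for row in resize_gray(pixels, size) for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a, b):
    """Count the bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def is_blocked(pixels, blocked_hashes, threshold=5):
    """True if the image hash is within the threshold of any stored hash."""
    image_hash = average_hash(pixels)
    return any(hamming_distance(image_hash, h) < threshold for h in blocked_hashes)
```

The stored blacklist then holds the integers returned by `average_hash` for each unwanted image, and every upload is checked with `is_blocked`. A lower threshold means fewer false positives but makes the check easier to evade.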

@frozenpandaman
Author

> I was asked a bit about questycaptcha from a few other people too. That'll be my top priority for now, along with the wiki.

Just wanted to ask if there might be any updates on this yet, @RealAngeleno? Happy to make a new issue to track it if it's a bit separate from this original topic. Thanks!

@Black-Hand-Radio

> Unfortunately this doesn't seem to be helping with spam issues. Spambots have the same image saved presumably at different compression levels or with different metadata or something, causing the blacklist for specific hashes to be ineffective.

In my experience they just add a watermark at different locations. Since the images are always relatively high resolution, and they place the watermark at the middle/bottom, you can create partial image hashes, for example hash the decoded data of a certain region of an image and base your blacklist on that. This adds more processing, but there are several easier ways to block them before you even reach this point, for example referer detection or browser fingerprinting.

However I have no idea if they still do this watermarking thing or if they switched to a different method. Once you manage to block them, they'll turn up less and less often, so I simply don't get enough data lately.
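
The partial image hash idea above could be sketched like this in Python. It is an illustration under assumptions, not vichan code: the image is taken as already decoded into rows of 0-255 grayscale values, `region_hash` is a hypothetical helper, and which rectangle to hash depends on where the watermark usually lands. Because it hashes exact pixel values, it only matches when the chosen region decodes identically, so it would not survive recompression.

```python
import hashlib


def region_hash(pixels, top, left, height, width):
    """Return the MD5 of the decoded pixel values inside one rectangle."""
    digest = hashlib.md5()
    for row in pixels[top:top + height]:
        # Each row slice holds ints in 0..255, so it converts directly to bytes.
        digest.update(bytes(row[left:left + width]))
    return digest.hexdigest()
```

For spam that places its watermark in the middle or bottom, hashing only a strip at the top of the image gives the same digest for every watermarked variant.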

@RealAngeleno

Yeah, vichan's built-in MD5 hashing doesn't work well for this. I use a (still rather janky, but less janky than before) method that calls imagehash in Python to get perceptual hashes instead, which works wonders, but I haven't added it because it requires users to install yet another thing and because it can be slow. I was also given a PHP implementation from an IB, though from what I've seen it's not very good at perceptual hashing and is also slow.


5 participants