Added auto captcha solving, closing of ads and consent forms #135

Open
wants to merge 12 commits into
base: main

alawi2306

The script should be able to run headlessly

Added pytesseract to auto solve the captcha at the start

I haven't done extensive testing. Feel free to test this out and add your own changes.

@simonfarah
Owner

Hello. This is just what I was working on for this project! However, it seems you committed your virtual env and the geckodriver log file. Can you please remove them so I can review the code and test it?

@alawi2306
Author

Sure, let me remove those

@simonfarah simonfarah self-requested a review January 10, 2025 15:29
Owner

@simonfarah simonfarah left a comment

I tested it and here are some bugs:

  • Closing ads/popups is not working. From what I noticed, all ads/popups have the word "google" in them (in the class name, script, etc.). So if we detect those elements and delete them from the DOM, we might be able to solve this.
  • OCR is working great. However, in some cases when the detected text is wrong, zefoy redirects to the main page (the one containing the services) with a wrong-CAPTCHA popup. This is a case we should handle, since the CAPTCHA text is sometimes fairly hard to read.
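
The "google" heuristic from the first point could be tried along these lines. This is only a sketch: it assumes a Selenium WebDriver (anything exposing `execute_script`), and the helper name `remove_matching_elements`, the tag list, and the attributes checked are illustrative assumptions, not code from this PR. Deleting everything that mentions "google" might also remove widgets the page needs, so it would have to be tested against the real site.

```python
# Hypothetical helper: the names, tag list, and attributes are assumptions.
REMOVE_BY_KEYWORD_JS = """
const keyword = String(arguments[0]).toLowerCase();
let removed = 0;
// Only inspect tags that commonly host ads or popups (an assumed list).
for (const el of Array.from(document.querySelectorAll('iframe, ins, script, div'))) {
  if (!el.isConnected) continue;  // already removed together with an ancestor
  const haystack = [
    el.id,
    el.getAttribute('class'),
    el.getAttribute('src'),
  ].join(' ').toLowerCase();
  if (haystack.includes(keyword)) {
    el.remove();
    removed += 1;
  }
}
return removed;
"""


def remove_matching_elements(driver, keyword: str = "google") -> int:
    """Delete elements whose id, class, or src mentions `keyword`.

    `driver` is expected to be a Selenium WebDriver; only its
    `execute_script(script, *args)` method is used. Returns the number
    of elements the script reports as removed.
    """
    removed = driver.execute_script(REMOVE_BY_KEYWORD_JS, keyword)
    return int(removed or 0)
```

Because ads can be injected again after the page loads, a call like `remove_matching_elements(driver)` would probably need to run more than once, for example before each interaction.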

Thank you a lot for your contribution! If this is a bit overwhelming, we can split the tasks between us.

@alawi2306
Author

alawi2306 commented Jan 10, 2025

Thanks for your response!

I ran a test of 10 script calls on my local machine.

- You are right, the OCR is very inconsistent. It worked 50% of the time for me, and the mistakes were often only one letter, where the picture purposefully made it difficult to tell. What we can do is preprocess the images (making them larger, converting them to grayscale, etc.) so the OCR can interpret them better. We can also implement a retry mechanism: when it says "captcha incorrect", retry in a loop until it gets it right.

- The ads are actually working fine for me. I didn't have to close any manually in the runs I did. It's worth noting that closing the cookies popup requires a bit of patience, as the script may be doing other things before trying to close it. Try looking for "popup seen" to check that the popup function is being triggered.
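
The image preprocessing idea from the first point could look roughly like this. It is a sketch using Pillow; the function name `preprocess_captcha` and the default `scale` and `threshold` values are assumptions that would need tuning against real CAPTCHA images, and the binarization step is an extra that was not mentioned above.

```python
from PIL import Image


def preprocess_captcha(img: Image.Image, scale: int = 3, threshold: int = 128) -> Image.Image:
    """Return a larger, grayscale, black-and-white copy of `img` for OCR.

    `scale` and `threshold` are guessed defaults, not tuned values.
    """
    # Grayscale drops color noise that can confuse OCR.
    gray = img.convert("L")

    # Upscale so thin strokes cover more pixels. `Image.Resampling` exists in
    # newer Pillow releases; older ones expose the constant on `Image` itself.
    resample = getattr(Image, "Resampling", Image).LANCZOS
    big = gray.resize((gray.width * scale, gray.height * scale), resample)

    # Binarize: pixels brighter than the threshold become white, the rest black.
    return big.point(lambda p: 255 if p > threshold else 0)
```

The result could then be passed to `pytesseract.image_to_string`, for example `pytesseract.image_to_string(preprocess_captcha(Image.open(path)))`, where `path` stands for wherever the script saves the CAPTCHA screenshot.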

I'll be attempting these fixes over the coming days. I'll keep you updated!

@simonfarah
Owner

That's great. I will be working on automatic driver installation and Cloudflare bypassing, since those were next on my list.

@alawi2306
Author

Hello,

I have added preprocessing, as well as retry logic for the captcha.

I had issues closing the retry modal, so instead we refresh the page and run the captcha logic again. This is working well.
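
The refresh-and-retry flow described above can be sketched as follows. This is not the code from the PR: the function `solve_with_retries` and its three callables are hypothetical stand-ins for the script's own OCR, form submission, and page refresh steps, and the attempt limit is an added assumption so the loop cannot run forever.

```python
from typing import Callable


def solve_with_retries(
    read_captcha: Callable[[], str],
    submit: Callable[[str], bool],
    refresh: Callable[[], None],
    max_attempts: int = 5,
) -> bool:
    """Try the captcha up to `max_attempts` times, refreshing after each failure.

    read_captcha: returns the OCR guess for the captcha currently shown.
    submit:       sends the guess and returns True if the site accepted it.
    refresh:      reloads the page so a new captcha is shown.
    """
    for _ in range(max_attempts):
        guess = read_captcha().strip()
        # An empty OCR result cannot be right, so skip straight to a refresh.
        if guess and submit(guess):
            return True
        refresh()
    return False
```

With Selenium, `refresh` could simply be `driver.refresh`, while `read_captcha` and `submit` would wrap whatever the script already does to screenshot the captcha and fill in the form.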

The ad closing/consent closing is working perfectly for me.

The last thing to do is to bypass Cloudflare, as you sometimes get a 502 gateway error when, I believe, zefoy blocks you. This is minor, though, and not essential.

@simonfarah
Owner

Hello! Thank you for that. Sorry for taking so long to respond, but I am a bit overwhelmed at the moment. I will review those changes and get back to you asap.

@simonfarah simonfarah self-requested a review January 15, 2025 11:24

2 participants