[Feature Request] Add ability to scrape scene URL on File Match window #1835
Comments
To my knowledge, most scrapers work and scrape all publicly posted scenes. There are some sites that hide some content behind a paywall (xSinsVR), others have switched to React (LethalHardcore), and others are defunct (Whorecraft).

There are lots of options to help prevent overloading your network with scraping. You can set a random time delay before requesting a new page, and you can limit the maximum number of scrapers running at once. #1665 has the instructions for rate limiting. To limit parallel scrapers, you need to append

If you are finding a scene on the scene page and then scraping it to match it on the files page, something is not right. Reset your search index if this is occurring. Sometimes XBVR scrapes a scene but it never gets indexed, or other black magic computer things happen. Resetting your search index works most of the time.
I found that if I re-run the individual scraper a couple of times, it eventually gets most if not all scenes. However, if you don't want to fill your database with content you'll never need (like one-offs from a site), then on-demand single-scene scraping is still a useful feature. I don't want to go overboard, because with feature creep this would become FappARR. Also, a lot of the --flags don't really make sense for those running the Docker instance.
With Docker, you can use the environment variable that the launch command sets. In this case it would be
This is definitely you tripping the DDoS protection of the website. Sites with a low bar for triggering this are NA, VRPorn, and any VRBangers studio, while SLR flags multiple parallel connections (at this time they don't appear to care how quickly). You just have to be careful about how many full site scrapes you do. Start small and build up. I have lost count of how many scrapers I run every 12 hours, but with these slowdowns in place I have never hit the DDoS protection.
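For anyone wondering what the two slowdowns discussed above (a random delay before each page request, and a cap on how many scrapers run at once) amount to, here is a rough Python sketch. It is not XBVR's code and does not use its flags or settings; `ThrottledRunner`, `max_parallel`, `delay_range`, and the `fetch` callback are all made-up names for illustration, and the delay bounds are arbitrary.

```python
import random
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class ThrottledRunner:
    """Hypothetical helper: runs one scraper per site, at most
    `max_parallel` at a time, sleeping a random interval before each page."""

    def __init__(self, max_parallel=2, delay_range=(1.0, 5.0)):
        self.max_parallel = max_parallel
        self.delay_range = delay_range  # (min_seconds, max_seconds), assumed values
        self.peak_active = 0            # highest number of scrapers seen running together
        self._active = 0
        self._lock = threading.Lock()

    def _scrape_site(self, site, pages, fetch):
        with self._lock:
            self._active += 1
            self.peak_active = max(self.peak_active, self._active)
        try:
            results = []
            for page in pages:
                # Random pause before every request so pages are not hit back to back.
                time.sleep(random.uniform(*self.delay_range))
                results.append(fetch(site, page))
            return results
        finally:
            with self._lock:
                self._active -= 1

    def run(self, sites, pages, fetch):
        # The pool size is the cap on parallel scrapers.
        with ThreadPoolExecutor(max_workers=self.max_parallel) as pool:
            futures = {
                site: pool.submit(self._scrape_site, site, list(pages), fetch)
                for site in sites
            }
            return {site: fut.result() for site, fut in futures.items()}
```

In this sketch `fetch(site, page)` stands in for whatever actually downloads a page. Starting with a small `max_parallel` and a wide `delay_range`, then loosening them gradually, matches the "start small and build up" advice.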
It seems scraping doesn't go as far back as I'd like (even with limit to new scenes turned off), so I have to search for the scene's page and then put that in the scrape individual scene tool under options. It is a lot of extra steps and easy to lose my place. It would be great if I could click a button on the "Match file to scene" modal dialog and just paste in the URL of the scene to scrape, and both scrape the new scene details and match the file to the scene at the same time.
Would it be better if the scraper actually could scrape the entire history of a site? Yes. (I think it would make sense to let me do this in segments so I'm not hammering the website/getting banned for scraping, but I don't know if that's asking too much.)
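To make the "segments" idea concrete, here is a small Python sketch of what I mean. It assumes the site's listing is paginated with numbered pages, which may not hold for every studio; `page_segments` and `scrape_history` are hypothetical names, not anything XBVR provides, and `scrape_page` is a placeholder for the real per-page scrape.

```python
def page_segments(first_page, last_page, segment_size):
    """Split an inclusive page range into consecutive segments."""
    if segment_size < 1:
        raise ValueError("segment_size must be at least 1")
    for start in range(first_page, last_page + 1, segment_size):
        yield range(start, min(start + segment_size, last_page + 1))


def scrape_history(first_page, last_page, segment_size, scrape_page, start_segment=0):
    """Scrape one segment per call and return the index of the next segment,
    or None once the whole history has been covered."""
    segments = list(page_segments(first_page, last_page, segment_size))
    if start_segment >= len(segments):
        return None
    for page in segments[start_segment]:
        scrape_page(page)
    next_segment = start_segment + 1
    return next_segment if next_segment < len(segments) else None
```

The caller would store the returned segment index and pass it back as `start_segment` on the next run (for example the next day), so the full back catalogue gets covered over several sessions instead of in one burst.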