[Feature Request] Add ability to scrape scene URL on File Match window #1835

moToroTor · 2024-08-30T05:11:37Z

It seems scraping doesn't go as far back as I'd like (even with limit to new scenes turned off), so I have to search for the scene's page and then put that in the scrape individual scene tool under options. It is a lot of extra steps and easy to lose my place. It would be great if I could click a button on the "Match file to scene" modal dialog and just paste in the URL of the scene to scrape, and both scrape the new scene details and match the file to the scene at the same time.

Would it be better if the scraper actually could scrape the entire history of a site? Yes. (I think it would make sense to let me do this in segments so I'm not hammering the website/getting banned for scraping, but I don't know if that's asking too much.)

pops64 · 2024-08-31T01:26:53Z

Most scrapers, to my knowledge, work and scrape all publicly posted scenes. There are some sites that hide some content behind a paywall (xSinsVR), others have switched to React (LethalHardcore), and others are defunct (Whorecraft).

There are lots of options to help prevent overloading your network with scraping. You can set a random time delay before requesting a new page. You can also limit the maximum number of scrapers running at once.
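To make those two knobs concrete, here is a minimal Python sketch of the general idea: a random pause before each request plus a cap on how many scrapers run in parallel. This is not XBVR's actual implementation. The function name `run_scrapers`, its parameters, and the stand-in scraper callables are all hypothetical and exist only for illustration.

```python
import random
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_scrapers(scrapers, max_concurrent=2, min_delay=1.0, max_delay=3.0):
    """Run each scraper callable with at most max_concurrent in flight,
    pausing a random interval before each one starts.

    Returns (results, peak), where peak is the highest number of scrapers
    that were running at the same moment.
    """
    lock = threading.Lock()
    state = {"running": 0, "peak": 0}

    def worker(scrape):
        # Random pause so requests do not arrive in a regular burst.
        time.sleep(random.uniform(min_delay, max_delay))
        with lock:
            state["running"] += 1
            state["peak"] = max(state["peak"], state["running"])
        try:
            return scrape()
        finally:
            with lock:
                state["running"] -= 1

    # The pool size is the cap on parallel scrapers.
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        results = list(pool.map(worker, scrapers))
    return results, state["peak"]


if __name__ == "__main__":
    # Hypothetical stand-ins for real site scrapers.
    sites = [lambda name=name: f"scraped {name}" for name in ("site-a", "site-b", "site-c")]
    results, peak = run_scrapers(sites, max_concurrent=2, min_delay=0.1, max_delay=0.3)
    print(results, "peak parallel scrapers:", peak)
```

Lowering `max_concurrent` and widening the delay range both reduce how hard a site gets hit, at the cost of a longer total scrape time.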

The instructions for rate limiting are in #1665. To limit parallel scrapers, append --concurrent_scrapers x, where x is the number of scrapers to run at once, to your xbvr launch command.

If you can find a scene on the scenes page but then have to scrape it again to match it on the files page, something is not right. Reset your search index if this is occurring. Sometimes XBVR scrapes a scene but it never gets indexed, or other black-magic computer things happen. Resetting your search index works most of the time.

moToroTor · 2024-08-31T01:46:47Z

I found that if I re-run the individual scraper a couple of times, it eventually gets most if not all scenes. However, if you don't want to fill your database with content you'll never need (like one-offs from a site), then on-demand single-scene scraping is still a useful feature. I don't want to go overboard, because with feature creep this would become FappARR.

Also, a lot of the --flags don't really make sense for those running the Docker instance.

pops64 · 2024-08-31T05:29:12Z

With Docker, you can use the environment variable that corresponds to the launch flag. In this case it would be -e CONCURRENT_SCRAPERS=x; append this to any other -e switches already present in your docker command.

> I found that if I re-run the individual scraper a couple times

This is definitely you tripping the website's DDoS protection. Sites with a low bar for triggering it are NA, VRPorn, and any VRBangers studio. SLR flags multiple parallel connections (at this time they don't appear to care about how quickly requests come in). You just have to be careful about how many full site scrapes you do. Start small and build up.

I have lost count of how many scrapers I run every 12 hours, but with these slowdowns in place I have never hit the DDoS protection.

pops64 mentioned this issue Sep 3, 2024

SLR not scraping #1815

Open
