LethalHardCoreVR Scraper Rebuild for React #1824

Closed
wants to merge 3 commits into from

Conversation

pops64
Contributor

@pops64 pops64 commented Aug 20, 2024

Fully functional again. Works 99% of the time. Cards aren't loaded in sequence, and my current logic only checks whether the last card is loaded before scrolling the page to load more. This occasionally causes a late-loading card to be missed because it is scrolled out of the viewport (2 times out of the 10+ times I have run the test). A rescrape will usually pick these missed ones back up.
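One possible way to close that gap, as a minimal sketch only: check every card rather than just the last one, and cap the number of polls so the scraper cannot hang. The `card` type, `pendingCards`, `waitAllLoaded`, and the `snapshot` callback are hypothetical names for illustration and are not taken from this PR; in the real scraper the snapshot would come from querying the page through chromedp.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// card is a hypothetical snapshot of one scene card read from the page.
type card struct {
	ID     string
	Loaded bool // assumption: true once the card's content has rendered
}

// pendingCards returns the IDs of all cards that have not finished loading,
// regardless of their position in the list.
func pendingCards(cards []card) []string {
	var pending []string
	for _, c := range cards {
		if !c.Loaded {
			pending = append(pending, c.ID)
		}
	}
	return pending
}

var errNotReady = errors.New("cards still loading after max polls")

// waitAllLoaded polls snapshot until no card is pending. It gives up after
// maxPolls attempts so a stuck page returns an error instead of hanging.
func waitAllLoaded(snapshot func() []card, maxPolls int, interval time.Duration) error {
	for i := 0; i < maxPolls; i++ {
		if len(pendingCards(snapshot())) == 0 {
			return nil
		}
		time.Sleep(interval)
	}
	return errNotReady
}

func main() {
	polls := 0
	// Fake page: the middle card finishes loading after the last card does.
	snapshot := func() []card {
		polls++
		return []card{
			{ID: "a", Loaded: true},
			{ID: "b", Loaded: polls >= 3},
			{ID: "c", Loaded: true},
		}
	}
	err := waitAllLoaded(snapshot, 10, time.Millisecond)
	fmt.Println(err == nil, polls)
}
```

The scroll step would only run once `waitAllLoaded` returns nil. An empty snapshot counts as ready in this sketch, so a real caller would probably also want to require a minimum card count.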

The Dockerfile breaks ARM Docker images, as it uses amd64 Chrome. There is probably a way to install ARM Chrome, or it may be a non-issue. There appears to be no way to install Chromium on Docker Ubuntu that I could find, as Chromium from Ubuntu's repo throws an error requiring it to be installed via snap.

Functionality that has been removed: gallery images aren't scraped. This requires scraping the photos index pages, as there doesn't appear to be any link from the main scene page to the photos. The synopsis has been removed from LH.

Things left to do

  • Trailer - Link is available, but I don't understand XBVR's functionality with trailers nor which link to pull.
  • Docker Arm Compatibility
  • Mac Compatibility
  • WhorecraftVR is no longer available. This should become an SLR custom site for scraping, but that would break users' DBs. I will print to the log that Whorecraft is no longer available.

Devices I have tested it on

  • Windows. Works, assuming Chrome is installed. Did not try Chromium.
  • Linux. Through yarn dev. Works; also requires Chromium to be installed.
  • Docker. Tested on the amd64 platform. Works, but Ubuntu images require Chrome rather than Chromium to be installed, unless using snap Chromium.

pops64 and others added 3 commits August 18, 2024 04:23
Doesn't fully scrape all scenes. Breaks on cover image scrape. Rest of scene meta appears to work. Requires installing chromedp manually. Nodes aren't fully loaded by LH; need to determine some way of telling when the page is ready.
There are some optimizations that could be done in the XPath query. It can usually pull all scenes in the first scrape. Occasionally something slows down the network and causes a misfire, resulting in a few missed scenes. Needs to be tested on a slow network. Currently hangs if something really goes wrong and it can't find what it's looking for.
Remove some old diag code. Correct Docker install. Due to Ubuntu being Ubuntu, we need to install Chrome directly.
@crwxaj
Collaborator

crwxaj commented Aug 22, 2024

I'm sorry, but I won't merge this. I don't want to add that big of a dependency (and maintenance liability) for a single scraper.

@pops64 pops64 closed this Aug 23, 2024