Request will be rejected by webserver when headless is true. Code works well when headless is false. #1979

PurpleLightning312 · 2024-08-07T17:52:18Z

start_urls = ["http://www.chinaunicombidding.cn/bidInformation"]
Response.text with headless enabled:
<html><head></head><body></body></html>

lunden23 · 2024-08-08T14:09:03Z

Certain functionality doesn't work in headless mode. For example, requests that are created by JS and require authentication relying on proper browser emulation don't work in headless mode. Since you didn't provide much context, all I can say is that you will have to run it in headful mode.
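A minimal sketch of the headful approach, assuming the `undetected_chromedriver` package and its `headless` constructor argument. `is_empty_body` and `fetch_page_source` are hypothetical helper names, not part of the library. The check flags the empty document quoted in the original post, so a blocked run can be detected instead of silently scraped.

```python
from html.parser import HTMLParser


class _BodyCollector(HTMLParser):
    """Counts child tags and collects non-blank text found inside <body>."""

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.child_tags = 0
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif self.in_body:
            self.child_tags += 1

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False

    def handle_data(self, data):
        if self.in_body and data.strip():
            self.text.append(data.strip())


def is_empty_body(html: str) -> bool:
    """Return True when <body> holds no elements and no text (or is missing)."""
    parser = _BodyCollector()
    parser.feed(html)
    parser.close()
    return parser.child_tags == 0 and not parser.text


def fetch_page_source(url: str, headless: bool = False) -> str:
    """Load url in Chrome (headful by default) and return the rendered HTML."""
    # Third-party dependency, imported lazily so the helper above works without it.
    import undetected_chromedriver as uc

    driver = uc.Chrome(headless=headless)
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()
```

Calling `is_empty_body(fetch_page_source(url))` after each request gives a cheap signal that the site refused to serve content. Pages that render late through JS may need an explicit wait before reading `page_source`.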

ddavis-ssc · 2024-10-16T22:10:57Z

Adding to what @lunden23 mentioned here, my scraping success rate was absolute trash when running in headless mode (you might as well just use requests with headers and a proxy; I've seen that do better).

My workaround is to have 2 dedicated workstations used purely for scraping. I can easily run 12 independent Chrome windows / drivers on each machine and have them scrape in parallel with subprocesses.
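One way this kind of setup could look, as a rough sketch rather than the exact script used here: URLs are dealt out round-robin to worker processes, and each worker drives its own headful Chrome window. The function names are hypothetical, and `undetected_chromedriver` is assumed to be installed.

```python
from concurrent.futures import ProcessPoolExecutor


def split_round_robin(urls, n_workers):
    """Distribute urls across n_workers lists, one URL at a time."""
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    return [urls[i::n_workers] for i in range(n_workers)]


def scrape_batch(urls):
    """Worker: open one headful Chrome window and visit each URL in turn."""
    # Third-party dependency, imported inside the child process.
    import undetected_chromedriver as uc

    driver = uc.Chrome(headless=False)
    pages = {}
    try:
        for url in urls:
            driver.get(url)
            pages[url] = driver.page_source
    finally:
        driver.quit()
    return pages


def scrape_parallel(urls, n_workers=12):
    """Run one browser per worker process and merge the results by URL."""
    batches = [b for b in split_round_robin(list(urls), n_workers) if b]
    if not batches:
        return {}
    results = {}
    with ProcessPoolExecutor(max_workers=len(batches)) as pool:
        for pages in pool.map(scrape_batch, batches):
            results.update(pages)
    return results
```

Call `scrape_parallel(urls)` from under an `if __name__ == "__main__":` guard so that worker processes can be spawned safely on Windows and macOS.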

Note that if you do this, you will need to clean up your Chrome profile data frequently, as starting a Chrome session with undetected-chromedriver creates a new profile. I learned this the hard way when I saw I had 1 TB of Chrome profiles on my PC.
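A possible cleanup helper, sketched under the assumption that every session is launched with `user_data_dir` pointing at a sub-directory of one dedicated root folder, so the cleanup never touches a regular Chrome profile. The function name and the age threshold are hypothetical. Run it only while no browser is using that root.

```python
import shutil
import time
from pathlib import Path


def remove_stale_profiles(root, max_age_hours=24.0, now=None):
    """Delete sub-directories of root last modified over max_age_hours ago.

    Returns the list of removed paths. Plain files in root are left alone.
    """
    root = Path(root)
    now = time.time() if now is None else now
    cutoff = now - max_age_hours * 3600
    removed = []
    if not root.is_dir():
        return removed
    for entry in root.iterdir():
        # Directory mtime is only a rough proxy for "last used".
        if entry.is_dir() and entry.stat().st_mtime < cutoff:
            shutil.rmtree(entry, ignore_errors=True)
            removed.append(entry)
    return removed
```

Scheduling this once a day, or calling it at the start of each scraping run, should keep the profile folder from growing without bound.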
