Catch unhandled Browsershot exceptions in `crawlFailed` #469

superpenguin612 · 2024-07-29T21:21:37Z

Fixes #324.

I ran into this issue myself, and it crashed the crawler prematurely. This PR catches any `ProcessFailedException` thrown by the Browsershot call and passes it to observers via `crawlFailed`. I also included a test that loads a URL that never reaches network idle, triggering Puppeteer's 30s network timeout. Previously this threw an unhandled exception; now it is caught and reported.
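
The general shape of the change can be sketched as follows. This is a minimal, dependency-free illustration, not the code in this PR: `RenderFailedException`, `RecordingObserver`, and `renderOrReport` are hypothetical names, the exception class stands in for `ProcessFailedException`, and the real `crawlFailed` signature in the crawler may differ.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for ProcessFailedException, so the sketch runs without dependencies.
class RenderFailedException extends RuntimeException {}

// Hypothetical observer that simply records failures; the real observer API may differ.
class RecordingObserver
{
    /** @var array<string, Throwable> */
    public array $failures = [];

    public function crawlFailed(string $url, Throwable $exception): void
    {
        $this->failures[$url] = $exception;
    }
}

/**
 * Render a page and hand any rendering failure to the observers
 * instead of letting it abort the whole crawl.
 *
 * @param callable(string): string $render
 * @param RecordingObserver[] $observers
 */
function renderOrReport(string $url, callable $render, array $observers): ?string
{
    try {
        return $render($url);
    } catch (RenderFailedException $exception) {
        foreach ($observers as $observer) {
            $observer->crawlFailed($url, $exception);
        }

        return null; // Nothing was rendered; the caller can move on to the next URL.
    }
}

$observer = new RecordingObserver();

// Simulate a page that never reaches network idle and therefore times out.
$html = renderOrReport(
    'https://example.com/never-idle',
    function (string $url): string {
        throw new RenderFailedException("Timed out rendering {$url}");
    },
    [$observer]
);
```

After this runs, `$html` is `null` and the observer holds the exception, so the failure is visible to observers without stopping the crawl.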

One note: exceptions such as Puppeteer not finding a Chrome binary are no longer displayed to the user by default; they would need to inspect the exception in `crawlFailed`. I'm not sure there is a way to "fix" that, but in theory that is what should happen anyway.

Feel free to ask for a different method of testing this. The test takes 30s to execute, which may be undesirable (maybe use Browsershot's `waitForFunction` to get the 5s timeout, like what @kmcluckie posted?).

freekmurze · 2024-07-31T10:44:16Z

Thanks!

David Racovan added 2 commits July 29, 2024 16:21

Add error handling around Browsershot call and send exception to craw…

0210d60

…lFailed

Add tests

8f0e644

superpenguin612 changed the title ~~Catch unhandled Browsershot exceptions in crawlFailed.~~ Catch unhandled Browsershot exceptions in crawlFailed Jul 29, 2024

superpenguin612 mentioned this pull request Jul 29, 2024

Browsershot exceptions unhandled #324

Closed

David Racovan added 4 commits July 29, 2024 17:42

Remove erroneous only

3bb0e5b

Respect request delay

ef3115f

Revert "Respect request delay"

ae09480

This reverts commit ef3115f.

Respect request delay

ef07126

freekmurze merged commit 099ea77 into spatie:main Jul 31, 2024
10 checks passed
