DOM Cleaner: mb_eregi_replace errors out with retry-limit-in-match #283

Open
half0wl opened this issue Jul 27, 2021 · 2 comments

Comments

half0wl commented Jul 27, 2021

Reproduction:

>>> use PHPHtmlParser\Dom;
>>> $dom = new Dom;
>>> $dom->loadFromUrl("https://casper.com/gifts/?clickid=T02U6OVQYxyLUbdwUx0Mo36dUkB1HNWwiSMnwQ0");

Throws:

PHP Warning:  mb_eregi_replace(): mbregex search failure in php_mbereg_replace_exec(): retry-limit-in-match over in <stripped>/paquettg/php-html-parser/src/PHPHtmlParser/Dom/Cleaner.php on line 81
PHPHtmlParser\Exceptions\LogicalException with message 'mb_eregi_replace returned false instead of a string. Error when attempting to remove scripts 2.'

After some Googling on the error I tried ini_set("pcre.backtrack_limit", "10000000000"), but it had no effect.

I can reproduce this on pages with huge <script></script> tags, typically when one contains a giant JSON blob.

Deewde commented Jan 28, 2022

I have the exact same problem, but with a different URL. As a quick fix I disabled script removal from the HTML with $dom->setOptions((new Options())->setRemoveScripts(false)); but I would rather have a real fix, especially because there's a warning that keeping script tags could have unforeseen consequences.

Any help on this issue please @paquettg ?

Deewde commented Jan 28, 2022

Ok, I've fixed it without disabling tag removal by increasing the mb retry limit to 10 million. The self-documented php.ini describes this:

; This directive specifies maximum retry count for mbstring regular expressions. It is similar
; to the pcre.backtrack_limit for PCRE.
; Default: 1000000
;mbstring.regex_retry_limit=1000000

so I've used

ini_set("mbstring.regex_retry_limit", "10000000");

and everything works fine on this front now.
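
For anyone who would rather not raise the limit for the whole process, here is a minimal sketch of scoping the change to a single call. The mb_ereg* functions are backed by mbstring's own regex engine rather than PCRE, which would explain why pcre.backtrack_limit made no difference above. The helper name withRegexRetryLimit is hypothetical (it is not part of php-html-parser), and the sketch assumes PHP 7.4 or later with mbstring loaded, since that is where mbstring.regex_retry_limit exists.

```php
<?php
// Hypothetical helper, not part of php-html-parser: run $fn with a
// temporarily raised mbstring regex retry limit, then restore the old value.
function withRegexRetryLimit(int $limit, callable $fn)
{
    // ini_set() returns the previous value as a string, or false on failure
    // (for example when the directive does not exist in this PHP build).
    $previous = ini_set('mbstring.regex_retry_limit', (string) $limit);
    if ($previous === false) {
        throw new RuntimeException(
            'Could not set mbstring.regex_retry_limit (assumes PHP >= 7.4 with mbstring).'
        );
    }

    try {
        return $fn();
    } finally {
        // Restore the original limit even if $fn throws.
        ini_set('mbstring.regex_retry_limit', $previous);
    }
}

// Assumed usage with the reproduction from this issue ($url is a placeholder):
//
//   $dom = new PHPHtmlParser\Dom();
//   withRegexRetryLimit(10000000, function () use ($dom, $url) {
//       return $dom->loadFromUrl($url);
//   });
```

The 10 million figure is simply the value reported to work in this thread. Much larger scripts may need a higher limit, so this remains a workaround rather than a fix for the underlying regex in the cleaner.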
