no-invalid-regex rule is probably broken #594

bartlomieju · 2021-01-17T14:49:11Z

Reported by @RDambrosio016 on Discord:

i think this test is incorrect
assert_eq!(validator.validate_pattern("[\\c0-�]", false), Ok(()));

that range is out of order
the odd thing is v8 accepts it but the other tools i tried don't

i think there's something weird going on with a test in dlint's regex validator tests: [🌷-🌸] is expected to be invalid, but running dlint on a file containing it doesn't yield any errors. I still don't know why v8 does not accept it; i think it comes down to UTF-16 code units, because the range is in order when compared by Unicode code point.
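
A minimal Rust sketch of the UTF-16 explanation above. The helper names `utf16_units` and `range_endpoints_without_u` are hypothetical and not part of dlint. It assumes that, without the /u flag, the class body is read one UTF-16 code unit at a time, so the `-` ends up between the low surrogate of 🌷 and the high surrogate of 🌸.

```rust
/// Returns the UTF-16 code units that encode `c` (one or two units).
fn utf16_units(c: char) -> Vec<u16> {
    let mut buf = [0u16; 2];
    c.encode_utf16(&mut buf).to_vec()
}

/// Hypothetical helper: for a class body written as `lo-hi`, returns the
/// (start, end) of the range as seen when the pattern is read unit by unit
/// (no /u flag): the last unit of `lo` and the first unit of `hi`.
fn range_endpoints_without_u(lo: char, hi: char) -> (u16, u16) {
    let lo_units = utf16_units(lo);
    let hi_units = utf16_units(hi);
    // encode_utf16 always yields at least one unit, so indexing is safe.
    (lo_units[lo_units.len() - 1], hi_units[0])
}
```

For 🌷 (U+1F337) and 🌸 (U+1F338) the units are `[0xD83C, 0xDF37]` and `[0xD83C, 0xDF38]`, so `range_endpoints_without_u('🌷', '🌸')` gives `(0xDF37, 0xD83C)`: the start is greater than the end, even though the two code points themselves are in order.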


bartlomieju · 2021-01-18T17:19:49Z

Further investigation by @RDambrosio016 https://discord.com/channels/684898665143206084/775366479143108608/800774226894258207

@bartlomieju yeah, according to the spec, if the regex doesn't have /u then the chars are UTF-16 code units; if it's parsed with /u then they are Unicode code points (rust chars)
i don't think it's very hard to fix the validator to treat code points right
as far as i know, it should be fine to just call encode_utf16() on the char, then if it yields multiple code units, take the first one
although utf8 makes this... weird
because i don't think it's possible in utf8 to advance partway through a multi-unit char without ending up off a char boundary. :sweating:
i think for now i'm going to just keep 32-bit code points since:
- 16-bit code units are hard to get working correctly
- most people don't put chars that need multiple code units in their regex
- it makes error reporting easier for rules like no-misleading-character-class

if that's fine for you
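
One possible way around the char-boundary problem, sketched under assumptions: decode the whole pattern up front into a vector of units instead of advancing through the UTF-8 string. The function `pattern_units` is hypothetical and is not how dlint's validator is written; it only shows the two views of the same pattern.

```rust
/// Hypothetical helper: the sequence of "characters" a validator would walk.
/// With the /u flag each element is a Unicode code point; without it each
/// element is a single UTF-16 code unit.
fn pattern_units(pattern: &str, unicode: bool) -> Vec<u32> {
    if unicode {
        // One element per Rust `char`.
        pattern.chars().map(|c| c as u32).collect()
    } else {
        // One element per UTF-16 code unit; surrogate pairs are split.
        pattern.encode_utf16().map(u32::from).collect()
    }
}
```

With this, `pattern_units("[🌷-🌸]", true)` has 5 elements and `pattern_units("[🌷-🌸]", false)` has 7, because each flower is split into two surrogates. The trade-off mentioned above still applies: positions in the unit vector no longer map directly to byte offsets in the source, which makes error reporting harder.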

bartlomieju added the bug Something isn't working label Jan 17, 2021
