feat(parser): parse regular expression with regex parser #4998

Merged: 1 commit merged into main from parser-add-regex on Aug 22, 2024

@Boshen (Member) commented Aug 20, 2024

Many false positives and incorrect errors. @leaysgur Enjoy 😁

Run `just conformance` to update the snapshot.

@Boshen requested a review from leaysgur August 20, 2024 07:05

graphite-app bot commented Aug 20, 2024

Your org has enabled the Graphite merge queue for merging into main

Add the label “merge” to the PR and Graphite will automatically add it to the merge queue when it’s ready to merge. Or use the label “hotfix” to add to the merge queue as a hot fix.

You must have a Graphite account and log in to Graphite in order to use the merge queue.

@github-actions bot added the A-parser (Area - Parser) label Aug 20, 2024
Review thread on tasks/coverage/src/suite.rs (outdated, resolved)
@Boshen marked this pull request as draft August 20, 2024 07:07

codspeed-hq bot commented Aug 20, 2024

CodSpeed Performance Report

Merging #4998 will not alter performance

Comparing parser-add-regex (afe728a) with main (7156fd2)

Summary

✅ 29 untouched benchmarks

@leaysgur (Contributor) commented Aug 20, 2024

@Boshen Thanks for the setup! I have a question.

How do I find the errors to fix? Do I check each snapshot like this...?

[screenshot]

@Boshen (Member, Author) commented Aug 20, 2024

It's not producing the correct errors for these tests:

Expect Syntax Error: tasks/coverage/test262/test/language/literals/regexp/named-groups/invalid-non-id-continue-groupspecifier-4-u.js
Expect Syntax Error: tasks/coverage/test262/test/language/literals/regexp/named-groups/invalid-non-id-continue-groupspecifier-4.js

For example, https://github.com/tc39/test262/blob/main/test/language/literals/regexp/named-groups/invalid-non-id-continue-groupspecifier-4-u.js has this header comment:


/*---
description: GroupSpecifier must be identifier-like.
esid: prod-GroupSpecifier
negative:
  phase: parse
  type: SyntaxError
features: [regexp-named-groups]
---*/

$DONOTEVALUATE();

/(?<a\>.)/u;
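A minimal Rust sketch of the identifier check this test expects. `is_valid_group_name` is a hypothetical helper, not oxc's API. As a simplification, it approximates the spec's ID_Start/ID_Continue with `char::is_alphabetic`/`char::is_alphanumeric` and does not support the `\u` escapes the spec allows inside group names. If `(?<a\>.)` is scanned up to the closing `>`, the name text is `a\`, which this check rejects.

```rust
/// Hypothetical helper: validates the name in a named group such as `(?<name>...)`.
/// Assumption: ID_Start/ID_Continue are approximated with `char::is_alphabetic` /
/// `char::is_alphanumeric`; `\u` escapes (legal in real group names) are rejected here.
fn is_valid_group_name(name: &str) -> bool {
    let mut chars = name.chars();
    // First character: approximated ID_Start, plus `$` and `_`.
    match chars.next() {
        Some(c) if c == '$' || c == '_' || c.is_alphabetic() => {}
        _ => return false,
    }
    // Remaining characters: approximated ID_Continue, plus `$`, ZWNJ and ZWJ.
    chars.all(|c| {
        c == '$' || c == '_' || c.is_alphanumeric() || c == '\u{200C}' || c == '\u{200D}'
    })
}
```

For example, `is_valid_group_name("a")` is true, while `is_valid_group_name("a\\")` and `is_valid_group_name("1a")` are false. A parser would report the SyntaxError at that point.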

Files listed under "Expect to Parse" should parse without any errors, for example:

Expect to Parse: tasks/coverage/test262/test/annexB/built-ins/RegExp/incomplete_hex_unicode_escape.js

e.g. https://github.com/tc39/test262/blob/main/test/annexB/built-ins/RegExp/incomplete_hex_unicode_escape.js

/*---
description: An incomplete HexEscape or UnicodeEscape should be treated as an Identity Escape
info: |
    An incomplete HexEscape (e.g. /\x/) or UnicodeEscape (/\u/) should fall
    through to IdentityEscape
esid: prod-AtomEscape
---*/

// Hex escape
assert(/\x/.test("x"), "/\\x/");
assert(/\xa/.test("xa"), "/\\xa/");

// Unicode escape
assert(/\u/.test("u"), "/\\u/");
assert(/\ua/.test("ua"), "/\\ua/");
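A sketch of this Annex B fallback. It assumes a hand-written parser that has just consumed the backslash. `Escape` and `parse_hex_or_unicode_escape` are hypothetical names. Only `\xHH` and `\uHHHH` are handled; `\u{...}` is out of scope. The fallback applies only without the `u` flag, because in unicode mode an incomplete escape is a SyntaxError.

```rust
/// Result of parsing the text right after a backslash (hypothetical type).
#[derive(Debug, PartialEq)]
enum Escape {
    /// A complete escape such as `\x41` or `\u0041`.
    CodePoint(u32),
    /// An incomplete `\x` / `\u`, treated as IdentityEscape: the letter itself.
    Identity(char),
}

/// `rest` starts right after the backslash. Returns the escape and the number of
/// bytes of `rest` it consumed, or `None` if `rest` does not start with `x` or `u`.
fn parse_hex_or_unicode_escape(rest: &str) -> Option<(Escape, usize)> {
    let letter = rest.chars().next()?;
    let digits = match letter {
        'x' => 2,
        'u' => 4,
        _ => return None,
    };
    // Take exactly `digits` characters after the letter, only if all are hex digits
    // (checked first, because `from_str_radix` would also accept a leading `+`).
    let value = rest
        .get(1..1 + digits)
        .filter(|h| h.bytes().all(|b| b.is_ascii_hexdigit()))
        .and_then(|h| u32::from_str_radix(h, 16).ok());
    match value {
        Some(cp) => Some((Escape::CodePoint(cp), 1 + digits)),
        // Too few hex digits: consume only the letter; what follows stays literal.
        None => Some((Escape::Identity(letter), 1)),
    }
}
```

With this fallback, `/\xa/` becomes the identity escape `x` followed by a literal `a`, which is why it matches `"xa"` in the test above.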

@leaysgur (Contributor) commented Aug 20, 2024

Thank you!

I will try to fix the ones reported by `rg 'Expect (to Parse|Syntax Error)' parser_*.snap | rg RegExp -i`.

  • Stage 3 RegExp modifiers
  • ES2025 duplicate named capturing groups

These are not supported yet, so their tests are skipped: fdae394
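One possible way for a coverage runner to skip these tests is to read the `features:` line of the test262 frontmatter, as in the `features: [regexp-named-groups]` header quoted earlier. This is for illustration only and is not necessarily how fdae394 does it. The tag names `regexp-modifiers` and `regexp-duplicate-named-groups` and the helper names are assumptions. Only the single-line `features: [a, b]` form is handled.

```rust
/// Hypothetical list of test262 feature tags whose tests are skipped (assumed tag names).
const SKIPPED_FEATURES: &[&str] = &["regexp-modifiers", "regexp-duplicate-named-groups"];

/// Returns the entries of a single-line `features: [a, b]` frontmatter line, if present.
fn parse_features(source: &str) -> Vec<&str> {
    source
        .lines()
        .map(str::trim)
        .find_map(|line| line.strip_prefix("features:"))
        .map(|list| {
            list.trim()
                .trim_start_matches('[')
                .trim_end_matches(']')
                .split(',')
                .map(str::trim)
                .filter(|f| !f.is_empty())
                .collect::<Vec<_>>()
        })
        .unwrap_or_default()
}

/// True if the test uses any feature the parser does not support yet.
fn should_skip(source: &str) -> bool {
    parse_features(source)
        .iter()
        .any(|f| SKIPPED_FEATURES.contains(f))
}
```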

  • parser_test262.snap
    • Fixed ✨
  • parser_typescript.snap
    • Expect Syntax Error: tasks/coverage/typescript/tests/cases/compiler/firstMatchRegExpMatchArray.ts
      • Not related 🤔

@leaysgur marked this pull request as ready for review August 21, 2024 01:28
@leaysgur (Contributor) commented:

@Boshen Is this what you are looking for...?!

[screenshot]

@Boshen (Member, Author) commented Aug 21, 2024

> @Boshen Is this what you are looking for...?!

yes yes yes!


The next batch seems to be the ones in tasks/coverage/parser_typescript.snap

e.g.

Expect Syntax Error: tasks/coverage/typescript/tests/cases/conformance/es6/unicodeExtendedEscapes/unicodeExtendedEscapesInRegularExpressions08.ts

@Boshen (Member, Author) commented Aug 21, 2024

I ran this in our E2E test suite and got the following errors.

Error
"node_modules/.pnpm/…/node_modules/ajv-formats/dist/formats.js"
  x Invalid regular expression: Too many capturing groups

Error
"node_modules/.pnpm/…/node_modules/ajv-formats/src/formats.ts"
  x Invalid regular expression: Too many capturing groups

Error
"node_modules/.pnpm/…/node_modules/ajv-formats/src/formats.js"
  x Invalid regular expression: Too many capturing groups

Error
"node_modules/.pnpm/…/node_modules/@nuxt/devtools/dist/client/_nuxt/JOMs3kJS.js"
  x Invalid regular expression: Too many capturing groups

Error
"node_modules/.pnpm/…/node_modules/ajv-formats/dist/formats.js"
  x Invalid regular expression: Too many capturing groups

Error
"node_modules/.pnpm/…/node_modules/ajv-formats/src/formats.ts"
  x Invalid regular expression: Too many capturing groups

Error
"node_modules/.pnpm/…/node_modules/ajv-formats/src/formats.js"
  x Invalid regular expression: Too many capturing groups

The error reports are also missing spans.

ajv-formats has some crazy regexes lol

exports.fastFormats = {
    ...exports.fullFormats,
    date: fmtDef(/^\d\d\d\d-[0-1]\d-[0-3]\d$/, compareDate),
    time: fmtDef(/^(?:[0-2]\d:[0-5]\d:[0-5]\d|23:59:60)(?:\.\d+)?(?:z|[+-]\d\d(?::?\d\d)?)$/i, compareTime),
    "date-time": fmtDef(/^\d\d\d\d-[0-1]\d-[0-3]\dt(?:[0-2]\d:[0-5]\d:[0-5]\d|23:59:60)(?:\.\d+)?(?:z|[+-]\d\d(?::?\d\d)?)$/i, compareDateTime),
    "iso-time": fmtDef(/^(?:[0-2]\d:[0-5]\d:[0-5]\d|23:59:60)(?:\.\d+)?(?:z|[+-]\d\d(?::?\d\d)?)?$/i, compareIsoTime),
    "iso-date-time": fmtDef(/^\d\d\d\d-[0-1]\d-[0-3]\d[t\s](?:[0-2]\d:[0-5]\d:[0-5]\d|23:59:60)(?:\.\d+)?(?:z|[+-]\d\d(?::?\d\d)?)?$/i, compareIsoDateTime),
    // uri: https://github.com/mafintosh/is-my-json-valid/blob/master/formats.js
    uri: /^(?:[a-z][a-z0-9+\-.]*:)(?:\/?\/)?[^\s]*$/i,
    "uri-reference": /^(?:(?:[a-z][a-z0-9+\-.]*:)?\/?\/)?(?:[^\\\s#][^\s#]*)?(?:#[^\\\s]*)?$/i,
    // email (sources from jsen validator):
    // http://stackoverflow.com/questions/201323/using-a-regular-expression-to-validate-an-email-address#answer-8829363
    // http://www.w3.org/TR/html5/forms.html#valid-e-mail-address (search for 'wilful violation')
    email: /^[a-z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?(?:\.[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?)*$/i,
};
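Every group in the regexes quoted above is a non-capturing `(?:...)` group, so none of them contains a capturing group at all. Below is a minimal sketch of counting capturing groups while recording a span for each one, so that an error can point at the offending `(`. `Span` and `capturing_group_spans` are hypothetical names, not oxc's types. The scanner is simplified: it skips escaped characters and `[...]` classes, and treats `(?<name>` as capturing but `(?<=` / `(?<!` as not. It does not handle the nested classes allowed by the `v` flag.

```rust
/// Hypothetical byte-offset span into the pattern text (not oxc's `Span` type).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Span {
    start: u32,
    end: u32,
}

/// Returns one span per capturing group in `pattern` (the text between the slashes),
/// each covering the group's opening `(`.
fn capturing_group_spans(pattern: &str) -> Vec<Span> {
    let bytes = pattern.as_bytes();
    let mut spans = Vec::new();
    let mut in_class = false;
    let mut i = 0;
    while i < bytes.len() {
        match bytes[i] {
            // Skip the escaped character, e.g. `\(` or `\]`.
            b'\\' => i += 1,
            b'[' => in_class = true,
            b']' => in_class = false,
            b'(' if !in_class => {
                // A plain `(` captures; `(?:`, `(?=`, `(?!` do not.
                let plain = bytes.get(i + 1) != Some(&b'?');
                // `(?<name>` captures; `(?<=` and `(?<!` are lookbehinds.
                let named = bytes[i + 1..].starts_with(b"?<")
                    && !matches!(bytes.get(i + 3), Some(&b'=') | Some(&b'!'));
                if plain || named {
                    spans.push(Span { start: i as u32, end: i as u32 + 1 });
                }
            }
            _ => {}
        }
        i += 1;
    }
    spans
}
```

Because each `Span` is kept, a "Too many capturing groups" diagnostic can be labeled at the group that crosses the limit instead of being reported without a location.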

@leaysgur (Contributor) commented Aug 21, 2024

parser_typescript.snap

  • Expect Syntax Error: tasks/coverage/typescript/tests/cases/conformance/es6/unicodeExtendedEscapes/
    • unicodeExtendedEscapesInRegularExpressions01.ts
    • unicodeExtendedEscapesInRegularExpressions02.ts
    • unicodeExtendedEscapesInRegularExpressions03.ts
    • unicodeExtendedEscapesInRegularExpressions04.ts
    • unicodeExtendedEscapesInRegularExpressions05.ts
    • unicodeExtendedEscapesInRegularExpressions06.ts
    • unicodeExtendedEscapesInRegularExpressions08.ts
    • unicodeExtendedEscapesInRegularExpressions09.ts
    • unicodeExtendedEscapesInRegularExpressions10.ts
    • unicodeExtendedEscapesInRegularExpressions11.ts
    • unicodeExtendedEscapesInRegularExpressions13.ts
    • unicodeExtendedEscapesInRegularExpressions15.ts
    • unicodeExtendedEscapesInRegularExpressions16.ts
    • unicodeExtendedEscapesInRegularExpressions18.ts

These all seem to be errors caused by using the `u` flag with `target: es5`.

I don't think we have such a mechanism. What should we do?

The missing numbers, such as 07 and 12, include invalid RegExp patterns, so they are not listed.

  • Expect Syntax Error: tasks/coverage/typescript/tests/cases/conformance/parser/ecmascript5/RegularExpressions/
    • parseRegularExpressionMixedWithComments.ts
    • parserRegularExpression2.ts
    • parserRegularExpression3.ts
    • parserRegularExpression4.ts
    • parserRegularExpression5.ts
    • parserRegularExpressionDivideAmbiguity1.ts
    • parserRegularExpressionDivideAmbiguity2.ts
    • parserRegularExpressionDivideAmbiguity5.ts

Here too, every case uses a RegExp, but the errors are not related to the RegExp pattern itself.


I will add spans for "Too many capturing groups" tomorrow~!

@Boshen (Member, Author) commented Aug 21, 2024

> These all seem to be errors caused by using the `u` flag with `target: es5`.
>
> I don't think we have such a mechanism. What should we do?
>
> Here too, every case uses a RegExp, but the errors are not related to the RegExp pattern itself.

Let's skip these.

@Boshen (Member, Author) commented Aug 21, 2024

> I will add spans for "Too many capturing groups" tomorrow~!

They seem to be false positives too 😅

@leaysgur (Contributor) commented:

@Boshen

  • Ignore TS tests using target:es5: 67ddf09
    • tasks/coverage/typescript/tests/cases/conformance/parser/ecmascript5/RegularExpressions/ are left as is
    • These were bugs of oxc_parser itself from the start 👀
  • Fix false positive "Too many capturing groups": 195ff4d
    • Just a mistake: `2 ^ 32` should have been `2.pow(32)` in Rust...
  • Add Spans for all error reports: 353d4d6
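The root cause is easy to reproduce. In Rust, `^` is bitwise XOR, not exponentiation, so `2 ^ 32` evaluates to 34 instead of roughly four billion. Below is a minimal sketch of the corrected limit, assuming the ECMAScript early error for `Pattern`: a SyntaxError when the number of left-capturing parentheses is ≥ 2^32 − 1. The constant and function names are hypothetical, and the exact comparison in 195ff4d may differ.

```rust
/// Early error for `Pattern`: SyntaxError if the number of left-capturing
/// parentheses is >= 2^32 - 1. `u64::pow` is a const fn, so this is computed at compile time.
const MAX_CAPTURING_GROUPS: u64 = 2u64.pow(32) - 1;

/// The buggy spelling: `^` is XOR in Rust, so 0b000010 ^ 0b100000 == 0b100010 == 34.
const XOR_NOT_POW: u64 = 2 ^ 32;

/// Hypothetical check behind the "Too many capturing groups" diagnostic.
fn too_many_capturing_groups(count: u64) -> bool {
    count >= MAX_CAPTURING_GROUPS
}
```

With `XOR_NOT_POW` in place of `MAX_CAPTURING_GROUPS`, a pattern with only a few dozen capturing groups would already be rejected.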

@Boshen added the 0-merge (Merge with Graphite Merge Queue) label Aug 22, 2024

graphite-app bot commented Aug 22, 2024

Merge activity

  • Aug 21, 11:06 PM EDT: The merge label 'merge' was detected. This PR will be added to the Graphite merge queue once it meets the requirements.
  • Aug 21, 11:09 PM EDT: Boshen added this pull request to the Graphite merge queue.
  • Aug 21, 11:13 PM EDT: Boshen merged this pull request with the Graphite merge queue.

@Boshen (Member, Author) commented Aug 22, 2024

Nice work! 💯

The last favor I wanna ask: could you review all the code once more and see if there are any refactoring opportunities or performance improvements to be found?

I always do another round of reviews myself when I finish such a big project, because I often forget what I wrote in the first place 😅

@Boshen (Member, Author) commented Aug 22, 2024

My next task is adding the regex to the AST and making it accessible in the linter: #5060

@graphite-app bot merged commit afe728a into main Aug 22, 2024
26 checks passed
@graphite-app bot deleted the parser-add-regex branch August 22, 2024 03:13
@leaysgur (Contributor) commented:

✌🏻

> I always do another round of reviews myself when I finish such a big project, because I often forget what I wrote in the first place 😅

Coincidence!
I’ve been working on something similar, and I already have some notes for that. 😎

However, there are a few things I need to confirm with you, so I’ll create another issue later.

@oxc-bot mentioned this pull request Aug 23, 2024
Boshen added a commit that referenced this pull request Aug 23, 2024
## [0.25.0] - 2024-08-23

- 78f135d ast: [**BREAKING**] Remove `ReferenceFlag` from
`IdentifierReference` (#5077) (Boshen)

- f2b8d82 semantic: [**BREAKING**] `ScopeTree::get_child_ids` +
`get_child_ids_mut` return value not `Option` (#5058) (overlookmotel)

- 5f4c9ab semantic: [**BREAKING**] Rename `SymbolTable::get_flag` to
`get_flags` (#5030) (overlookmotel)

- 58bf215 semantic: [**BREAKING**] Rename `Reference::flag` and
`flag_mut` methods to plural (#5025) (overlookmotel)

- c4c08a7 ast: [**BREAKING**] Rename
`IdentifierReference::reference_flags` field (#5024) (overlookmotel)

- d262a58 syntax: [**BREAKING**] Rename `ReferenceFlag` to
`ReferenceFlags` (#5023) (overlookmotel)

- c30e2e9 semantic: [**BREAKING**] `Reference::flag` method return
`ReferenceFlag` (#5019) (overlookmotel)

- ce4d469 codegen: [**BREAKING**] Remove const generic `MINIFY` (#5001)
(Boshen)

- b2ff2df parser: [**BREAKING**] Remove builder pattern from `Parser`
struct (#5000) (Boshen)

- f88970b ast: [**BREAKING**] Change order of fields in CallExpression
(#4859) (Burlin)

### Features

- 714373d ast: `inherit_variants!` macro add `into_*` methods (#5005)
(overlookmotel)
- 6800e69 oxc: Add `Compiler` and `CompilerInterface` (#4954) (Boshen)
- 2b21be3 oxc_minifier: Define plugin with postfix wildcard (#4979)
(IWANABETHATGUY)
- afe728a parser: Parse regular expression with regex parser (#4998)
(Boshen)
- 4b49cf8 transformer: Always pass in symbols and scopes (#5087)
(Boshen)
- f51d3f9 transformer/nullish-coalescing-operator: Handles nullish
coalescing expression in the FormalParameter (#4975) (Dunqing)
- f794870 transformer/nullish-coalescing-operator: Generate the correct
binding name (#4974) (Dunqing)
- 72ff2c6 transformer/nullish-coalescing-operator: Add comments in top
of file (#4972) (Dunqing)
- 6b885fe traverse: Expose `generate_uid_based_on_node` and
`generate_uid_in_current_scope_based_on_node` from `TraverseCtx` (#4965)
(Dunqing)

### Bug Fixes

- 7f3129e ast: Correct code comment (#5004) (overlookmotel)
- 1bd9365 coverage: Correctly check semantic data after transform
(#5035) (Boshen)
- 185eb20 isolated_declarations: Namespaces that are default exported
should be considered for expando functions (#4935) (michaelm)
- 2a5e15d npm: `libc` field should not be `null` (Boshen)
- efbdced parser: Only show flow error if it's a flow file (#5069)
(Boshen)
- ad2be97 semantic: Incorrect semantic check for label has same name
(#5041) (heygsc)
- d5de97d semantic: Transform checker check reference flags (#5092)
(overlookmotel)
- 90c74ee semantic: Transform checker check reference symbol IDs (#5090)
(overlookmotel)
- a8005b9 semantic: Transform checker check symbol redeclarations
(#5089) (overlookmotel)
- 205bff7 semantic: Transform checker check symbol references (#5088)
(overlookmotel)
- 4a57086 semantic: Transform checker check symbol IDs (#5078)
(overlookmotel)
- ea7d216 semantic: Transform checker check symbol spans (#5076)
(overlookmotel)
- 1b6b27a semantic: Transform checker check symbol flags (#5074)
(overlookmotel)
- 6d87b0f semantic: Fix error message for duplicated label (#5071)
(Boshen)
- 05fff16 semantic: Transform checker compare binding symbol IDs (#5057)
(overlookmotel)
- f187b71 semantic: Transform checker compare scope children (#5056)
(overlookmotel)
- b52c6a4 semantic: Transform checker compare scope parents (#5055)
(overlookmotel)
- da64014 semantic: Transform checker catch more scope flags mismatches
(#5054) (overlookmotel)
- 67d1a96 semantic: Transform checker compare scope flags (#5052)
(overlookmotel)
- 863b9cb semantic: Transform checker handle conditional scopes (#5040)
(overlookmotel)
- 47029c4 semantic: Transform checker output symbol names in errors
(#5038) (overlookmotel)
- 6ffbd78 transformer: Remove an `AstBuilder::copy` call from TS
namespace transform (#4987) (overlookmotel)
- a8dfdda transformer: Remove an `AstBuilder::copy` call from TS module
transform (#4986) (overlookmotel)
- 1467eb3 transformer: Remove an `AstBuilder::copy` call from TS enum
transform (#4985) (overlookmotel)
- 1365feb transformer: Remove an `AstBuilder::copy` call for TS
`AssignmentTarget` transform (#4984) (overlookmotel)
- edacf93 transformer: Remove an `AstBuilder::copy` call (#4983)
(overlookmotel)
- 3b35332 transformer/logical-assignment-operators: Fix semantic errors
(#5047) (Dunqing)
- b7db235 Comments gen regression (#5003)
(IWANABETHATGUY)

### Documentation

- 178d1bd transformer: Add documentation for exponentiation-operator
plugin (#5084) (Dunqing)
- d50eb72 transformer: Add documentation for `optional-catch-binding`
plugin (#5064) (Dunqing)
- 4425b17 transformer: Add documentation for
`logical-assignment-operators` plugin (#5012) (Dunqing)
- 1bd5853 transformer: Updated README re: order of methods (#4993)
(overlookmotel)

### Refactor

- a4247e9 allocator: Move `Box` and `Vec` into separate files (#5034)
(overlookmotel)
- cca7440 ast: Replace `AstBuilder::move_statement_vec` with `move_vec`
(#4988) (overlookmotel)
- 4012260 ast: `AstBuilder::move_identifier_reference` do not allocate
empty string (#4977) (overlookmotel)
- 96422b6 ast: Make AstBuilder non-exhaustive (#4925) (DonIsaac)
- ca70cc7 linter, mangler, parser, semantic, transformer, traverse,
wasm: Rename various `flag` vars to `flags` (#5028) (overlookmotel)
- 0f64d10 minifier: Remove duplicated helper `move_out_expression`
(#5007) (IWANABETHATGUY)
- cd9cf5e oxc: Remove `remove_whitespace` (Boshen)
- b4407c4 oxc,mangler: `oxc` crate add mangler; mangler use options API
(Boshen)
- 9da6a21 semantic: Rename transform checker output for reference symbol
mismatches (#5091) (overlookmotel)
- fb46eaf semantic: Add remap functions to transform checker (#5082)
(overlookmotel)
- a00bf18 semantic: Add `IdMapping` to transform checker (#5079)
(overlookmotel)
- b14a302 semantic: Transform checker: change symbol name mismatch error
(#5075) (overlookmotel)
- b8c6ce5 semantic: Rename vars in transform checker (#5072)
(overlookmotel)
- 7156fd2 semantic: Transform checker `Pair` structure (#5053)
(overlookmotel)
- 0ba6f50 semantic: Simplify raising errors in transform checker (#5051)
(overlookmotel)
- ee7ac8b semantic: Store all data in `PostTransformChecker` in
transform checker (#5050) (overlookmotel)
- 4e1f4ab semantic: Add `SemanticIds` to transformer checker (#5048)
(overlookmotel)
- c1da574 semantic: Add comments to transformer checker (#5045)
(overlookmotel)
- 8cded08 semantic: Rename error labels in transformer checker snapshots
(#5044) (overlookmotel)
- 602244f semantic: Rename vars in transformer checker (#5043)
(overlookmotel)
- ae94b9a semantic: Remove unused function params in transformer checker
(#5042) (overlookmotel)
- 586e15c semantic: Reformat transform checker errors (#5039)
(overlookmotel)
- d69e34e semantic: Fix indentation (#5037) (overlookmotel)
- 4336a32 semantic: Rename fields in snapshots from `flag` to `flags`
(#5032) (overlookmotel)
- 83dfb14 semantic: Rename vars from `flag` to `flags` (#5031)
(overlookmotel)
- 3b7de18 semantic: Rename `SemanticBuilder::current_reference_flags`
field (#5027) (overlookmotel)
- 0bacdd8 semantic: Rename `Reference::flag` field to `flags` (#5026)
(overlookmotel)
- 896b92f semantic: Correct typo in doc comment (#5009) (overlookmotel)
- d677b8e semantic: Do not reserve space in `resolved_references`
(#4962) (overlookmotel)
- a7ef30d semantic: `UnresolvedReferencesStack` contain only
`ReferenceId` (#4960) (overlookmotel)
- 59d15c7 semantic: `root_unresolved_references` contain only
`ReferenceId` (#4959) (overlookmotel)
- 7706523 span: Clarify `Atom` conversion methods lifetimes (#4978)
(overlookmotel)
- 4fdf26d transform_conformance: Add driver (#4969) (Boshen)
- 8d15e65 transformer: Use `into_member_expression` (#5006)
(overlookmotel)
- 4796ece transformer: TS annotations transform use `move_expression`
(#4982) (overlookmotel)
- a9fcf29 transformer/es2016: Move all entry points to implementation of
Traverse trait (#5085) (Dunqing)
- deda6ac transformer/es2019: Move all entry points to implementation of
Traverse trait (#5065) (Dunqing)
- 9df2f80 transformer/es2020: Move all entry points to implementation of
Traverse trait (#4973) (Dunqing)
- 3f9433c transformer/es2021: Move all entry points to implementation of
Traverse trait (#5013) (Dunqing)
- c60a50d transformer/exponentiation-operator: Use built-in
`ctx.clone_identifier_reference` (#5086) (Dunqing)
- bcc8da9 transformer/logical-assignment-operator: Use
`ctx.clone_identifier_reference` (#5014) (Dunqing)
- 38d4434 transformer/nullish-coalescing-operator: Move internal methods
to bottom of the file (#4996) (Dunqing)

### Testing

- 0df1a94 semantic: Add more symbol and reference checks to
`PostTransformChecker` (Boshen)

Co-authored-by: Boshen
Labels: 0-merge (Merge with Graphite Merge Queue), A-parser (Area - Parser)

2 participants