Possible bug with raw string handling #162

Open
DavisVaughan opened this issue Nov 25, 2024 · 3 comments

@DavisVaughan
Member

```r
> r"(foo(arg = "\\dots", ...))"
[1] "foo(arg = \"\\\\dots\", ...)"
```


```r
> treesitter::text_parse('r"(foo(arg = "\\dots", ...))"', treesitter.r::language())
<tree_sitter_node>

── Text ──────────────────────────────────────────────────────────────────────────────────────────────────────
r"(foo(arg = "\dots", ...))"

── S-Expression ──────────────────────────────────────────────────────────────────────────────────────────────
(program [(0, 0), (0, 28)]
  (identifier [(0, 0), (0, 1)])
  (string [(0, 1), (0, 14)]
    "\"" [(0, 1), (0, 2)]
    content: (string_content [(0, 2), (0, 13)])
    "\"" [(0, 13), (0, 14)]
  )
  (ERROR [(0, 14), (0, 15)]
    "\\" [(0, 14), (0, 15)]
  )
  (identifier [(0, 15), (0, 19)])
  (string [(0, 19), (0, 28)]
    "\"" [(0, 19), (0, 20)]
    content: (string_content [(0, 20), (0, 27)])
    "\"" [(0, 27), (0, 28)]
  )
)
```
@DavisVaughan
Member Author

Odd minimal reprex

works

```r
treesitter::text_parse('r"( () )"', treesitter.r::language())
#> <tree_sitter_node>
#> 
#> ── Text ────────────────────────────────────────────────────────────────────────
#> r"( () )"
#> 
#> ── S-Expression ────────────────────────────────────────────────────────────────
#> (program [(0, 0), (0, 9)]
#>   (string [(0, 0), (0, 9)])
#> )
```

works

```r
treesitter::text_parse('r"(() )"', treesitter.r::language())
#> <tree_sitter_node>
#> 
#> ── Text ────────────────────────────────────────────────────────────────────────
#> r"(() )"
#> 
#> ── S-Expression ────────────────────────────────────────────────────────────────
#> (program [(0, 0), (0, 8)]
#>   (string [(0, 0), (0, 8)])
#> )
```

doesn't work

```r
treesitter::text_parse('r"(())"', treesitter.r::language())
#> <tree_sitter_node>
#> 
#> ── Text ────────────────────────────────────────────────────────────────────────
#> r"(())"
#> 
#> ── S-Expression ────────────────────────────────────────────────────────────────
#> (program [(0, 0), (0, 7)]
#>   (identifier [(0, 0), (0, 1)])
#>   (string [(0, 1), (0, 7)]
#>     "\"" [(0, 1), (0, 2)]
#>     content: (string_content [(0, 2), (0, 6)])
#>     "\"" [(0, 6), (0, 7)]
#>   )
#> )
```

doesn't work

```r
treesitter::text_parse('r"( ())"', treesitter.r::language())
#> <tree_sitter_node>
#> 
#> ── Text ────────────────────────────────────────────────────────────────────────
#> r"( ())"
#> 
#> ── S-Expression ────────────────────────────────────────────────────────────────
#> (program [(0, 0), (0, 8)]
#>   (identifier [(0, 0), (0, 1)])
#>   (string [(0, 1), (0, 8)]
#>     "\"" [(0, 1), (0, 2)]
#>     content: (string_content [(0, 2), (0, 7)])
#>     "\"" [(0, 7), (0, 8)]
#>   )
#> )
```

So the trailing double `))` is the issue?

@DavisVaughan
Member Author

DavisVaughan commented Nov 25, 2024

Ah, got it. It's a bad interaction between calling `advance()` in the loop increment AND in the loop body. It really only affects this weird case of back-to-back `)` characters (maybe you can construct a case with `-` involved, not sure).

  • Line 330: hit the first `)`, advance to the second `)`
  • Line 351: the second `)` is not the closing quote character, so `continue`
  • Loop increment: advance to `"`, skipping the handling of the second `)`!
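The trace above can be reproduced outside tree-sitter with a small C harness. This is a hypothetical sketch, not code from tree-sitter-r: `MockLexer` is an assumed stand-in for tree-sitter's `TSLexer` that keeps only the `lookahead` and `advance` members the loop touches, and `find_close_buggy` / `buggy_finds_close` are illustrative names. The loop copies the control flow of the `scanner.c` excerpt (advancing in both the `for` increment and the body). Each input is the text after the opening `r"(`, and the result says whether the closing `)"` was found.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Hypothetical stand-in for tree-sitter's TSLexer: only the members the
// loop uses, plus a cursor into a C string (mock only).
typedef struct MockLexer MockLexer;
struct MockLexer {
  int32_t lookahead;                        // current character, 0 at end of input
  void (*advance)(MockLexer *, bool skip);  // step to the next character
  const char *cursor;                       // mock only: position in the input
};

static void mock_advance(MockLexer *lexer, bool skip) {
  (void)skip;  // whitespace skipping is irrelevant for this harness
  if (*lexer->cursor != '\0') {
    lexer->cursor++;
  }
  lexer->lookahead = (unsigned char)*lexer->cursor;
}

// Same control flow as the scanner.c excerpt: advance() runs in the `for`
// increment AND after matching the closing bracket, so a character that
// fails the hyphen or quote check is stepped over without being re-examined.
static bool find_close_buggy(MockLexer *lexer, int32_t closing_bracket,
                             int hyphen_count, int32_t quote) {
  assert(hyphen_count >= 0);
  for (; lexer->lookahead != 0; lexer->advance(lexer, false)) {
    if (lexer->lookahead != closing_bracket) {
      continue;
    }
    lexer->advance(lexer, false);
    bool hyphens_ok = true;
    for (int i = 0; i < hyphen_count; i++) {
      if (lexer->lookahead != '-') {
        hyphens_ok = false;
        break;
      }
      lexer->advance(lexer, false);
    }
    if (!hyphens_ok) {
      continue;
    }
    if (lexer->lookahead != quote) {
      continue;
    }
    return true;  // assumption: the real scanner finishes the token here
  }
  return false;
}

// `body` is the text after the opening r"( of a raw string.
static bool buggy_finds_close(const char *body) {
  MockLexer lexer = {(unsigned char)*body, mock_advance, body};
  return find_close_buggy(&lexer, ')', 0, '"');
}
```

With this harness, `buggy_finds_close("() )\"")` returns true while `buggy_finds_close("())\"")` returns false, matching the reprex results: after the first `)` is consumed, the lookahead sits on the second `)`, fails the quote check, and is then skipped by the loop increment.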

tree-sitter-r/src/scanner.c

Lines 328 to 352 in a0d3e33

```c
for (; lexer->lookahead != 0; lexer->advance(lexer, false)) {
  // consume a closing bracket
  if (lexer->lookahead != closing_bracket) {
    continue;
  }
  lexer->advance(lexer, false);
  // consume hyphens
  bool hyphens_ok = true;
  for (int i = 0; i < hyphen_count; i++) {
    if (lexer->lookahead != '-') {
      hyphens_ok = false;
      break;
    }
    lexer->advance(lexer, false);
  }
  if (!hyphens_ok) {
    continue;
  }
  // consume a closing quote character
  if (lexer->lookahead != quote) {
    continue;
  }
  // ...
```

I imagine that if we change this to a `while` loop and only advance in the body, it should work. We'd probably need to add an `advance()` in that first `if (lexer->lookahead != closing_bracket) {` branch.
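A minimal sketch of that suggestion, again using a hypothetical `MockLexer` in place of `TSLexer`; `find_close_fixed` and `fixed_finds_close` are illustrative names, not functions in tree-sitter-r. The loop becomes a `while`, a non-bracket character is consumed inside the first `if`, and a failed hyphen or quote check continues *without* advancing, so the current character (possibly another `)`) is examined again on the next pass.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Hypothetical stand-in for tree-sitter's TSLexer (mock only).
typedef struct MockLexer MockLexer;
struct MockLexer {
  int32_t lookahead;                        // current character, 0 at end of input
  void (*advance)(MockLexer *, bool skip);  // step to the next character
  const char *cursor;                       // mock only: position in the input
};

static void mock_advance(MockLexer *lexer, bool skip) {
  (void)skip;
  if (*lexer->cursor != '\0') {
    lexer->cursor++;
  }
  lexer->lookahead = (unsigned char)*lexer->cursor;
}

// Proposed shape: advance only inside the loop body.
static bool find_close_fixed(MockLexer *lexer, int32_t closing_bracket,
                             int hyphen_count, int32_t quote) {
  assert(hyphen_count >= 0);
  while (lexer->lookahead != 0) {
    if (lexer->lookahead != closing_bracket) {
      // Not a candidate terminator: consume it and move on.
      lexer->advance(lexer, false);
      continue;
    }
    // consume the closing bracket
    lexer->advance(lexer, false);
    // consume hyphens
    bool hyphens_ok = true;
    for (int i = 0; i < hyphen_count; i++) {
      if (lexer->lookahead != '-') {
        hyphens_ok = false;
        break;
      }
      lexer->advance(lexer, false);
    }
    // On either failed check, do not advance: the current lookahead
    // (e.g. a second `)`) is re-examined at the top of the loop.
    if (!hyphens_ok) {
      continue;
    }
    if (lexer->lookahead != quote) {
      continue;
    }
    return true;  // assumption: the real scanner finishes the token here
  }
  return false;
}

// `body` is the text after the opening delimiter, e.g. after r"( or r"-(.
static bool fixed_finds_close(const char *body, int hyphen_count) {
  MockLexer lexer = {(unsigned char)*body, mock_advance, body};
  return find_close_fixed(&lexer, ')', hyphen_count, '"');
}
```

Every iteration still consumes at least one character (either the non-bracket character or the closing bracket), so the loop always makes progress and stops at the end of input. For example, `fixed_finds_close("())\"", 0)` returns true for the `r"(())"` case that fails with the original loop.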

@DavisVaughan
Member Author

A few more test cases

```r
r"((\d+))" # -> yellow wave under the "r"
stringr::str_extract("foo123", r"((\d+))")  # -> red wave under the "r"
```
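These cases, along with the earlier reprexes, could be kept as regression tests. One hedged way to compute the expected result independently of the scanner is a small reference oracle based on R's raw string rules (R 4.0.0 and later): an `r` or `R` prefix, a `"` or `'` quote, an optional run of dashes, an opening `(`, `[` or `{`, and a literal that ends at the first occurrence of the matching closing bracket, the same number of dashes, and the same quote. `raw_string_length` is a hypothetical helper, not part of tree-sitter-r or the treesitter R package, and it uses `strstr` rather than the scanner's character-by-character loop.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Returns the length of the R raw string literal at the start of `src`
// (e.g. r"(...)", R'[...]', r"--{...}--"), or 0 if `src` does not start
// with a complete raw string. Reference oracle only, not the tree-sitter scanner.
static size_t raw_string_length(const char *src) {
  assert(src != NULL);
  const char *p = src;
  if (*p != 'r' && *p != 'R') return 0;
  p++;
  char quote = *p;
  if (quote != '"' && quote != '\'') return 0;
  p++;
  size_t dashes = 0;
  while (*p == '-') {
    dashes++;
    p++;
  }
  char close;
  switch (*p) {
    case '(': close = ')'; break;
    case '[': close = ']'; break;
    case '{': close = '}'; break;
    default: return 0;
  }
  p++;
  // Build the closing sequence: bracket, the same number of dashes, quote.
  char terminator[64];
  if (dashes + 3 > sizeof terminator) return 0;  // keep the sketch simple
  terminator[0] = close;
  memset(terminator + 1, '-', dashes);
  terminator[dashes + 1] = quote;
  terminator[dashes + 2] = '\0';
  // The literal ends at the FIRST occurrence of the closing sequence.
  const char *end = strstr(p, terminator);
  if (end == NULL) return 0;
  return (size_t)(end - src) + dashes + 2;
}
```

For `r"(())"` the oracle gives 7, the full span that the parse in the reprex above split into an `identifier` and a `string`; a regression test could assert that the scanner produces a single `string` node of that length.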
