newline causes erroneous class parsing #75

Open
brandonspark opened this issue Mar 14, 2023 · 0 comments
Labels: bug (Something isn't working), grammar (Related to the grammar), help wanted (Extra attention is needed)

brandonspark (Contributor) commented Mar 14, 2023

Current behavior:

Suppose I have a class that looks like this:

public class Foo constructor () { 
   < stuff in here >
}

The grammar is meant to parse this as a class_declaration with a primary_constructor, and the current grammar does so:

(source_file [0, 0] - [1, 0]
  (class_declaration [0, 0] - [0, 35]
    (modifiers [0, 0] - [0, 6]
      (visibility_modifier [0, 0] - [0, 6]))
    (type_identifier [0, 13] - [0, 16])
    (primary_constructor [0, 17] - [0, 31])
    (class_body [0, 32] - [0, 35])))

If I add a newline, however:

public class Foo
constructor () {
  < stuff in here>
}

it parses as two separate top-level nodes, a class_declaration with no constructor or body, followed by a call_expression:

(source_file [0, 0] - [2, 0]
  (class_declaration [0, 0] - [0, 16]
    (modifiers [0, 0] - [0, 6]
      (visibility_modifier [0, 0] - [0, 6]))
    (type_identifier [0, 13] - [0, 16]))
  (call_expression [1, 0] - [1, 18]
    (call_expression [1, 0] - [1, 14]
      (simple_identifier [1, 0] - [1, 11])
      (call_suffix [1, 12] - [1, 14]
        (value_arguments [1, 12] - [1, 14])))
    (call_suffix [1, 15] - [1, 18]
      (annotated_lambda [1, 15] - [1, 18]
        (lambda_literal [1, 15] - [1, 18])))))

The cause is that the newline creates an opportunity for automatic semicolon insertion, so the token stream looks like:

PUBLIC CLASS "Foo" SEMICOLON CONSTRUCTOR ( ) ...

Because everything in a class_declaration after the class name is optional, the parser accepts public class Foo as a complete, standalone class (with almost nothing in it). It then parses what follows as a call expression: a call to constructor with a trailing lambda, I think.
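To make the effect concrete, here is a toy lexer that reproduces the token stream above. It is only a sketch: `tokenize`, `KEYWORDS`, and the rule "emit SEMICOLON at any newline that follows an identifier" are assumptions made for illustration, not the grammar's actual scanner logic.

```python
import re

# Hypothetical keyword table for this toy example only.
KEYWORDS = {"public": "PUBLIC", "class": "CLASS", "constructor": "CONSTRUCTOR"}
TOKEN_RE = re.compile(r"\n|[A-Za-z_]\w*|[(){}]")


def tokenize(source):
    """Toy lexer: emits SEMICOLON at every newline that follows an identifier."""
    tokens = []
    for match in TOKEN_RE.finditer(source):
        text = match.group()
        if text == "\n":
            # Naive automatic semicolon: an identifier could end a statement,
            # so the newline after it is turned into SEMICOLON.
            if tokens and tokens[-1].startswith('"'):
                tokens.append("SEMICOLON")
        elif text in KEYWORDS:
            tokens.append(KEYWORDS[text])
        elif text[0].isalpha() or text[0] == "_":
            tokens.append(f'"{text}"')  # identifiers are shown quoted
        else:
            tokens.append(text)
    return tokens


one_line = tokenize("public class Foo constructor () {\n}\n")
two_lines = tokenize("public class Foo\nconstructor () {\n}\n")

print(" ".join(one_line))
print(" ".join(two_lines))
```

In the two-line input, the SEMICOLON lands between the class name and `constructor`, which is what lets the parser close the class declaration early.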

Expected behavior:
The above example should parse to a single class declaration with a primary constructor, not to two top-level entities. This could be done by suppressing the automatic semicolon at that position (which may require refactoring the external scanner and having it carry more state), or by some grammar-level hacking.
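One way to picture the suppression approach is a lookahead check at the newline. This is a minimal sketch, assuming a hypothetical helper `should_insert_semicolon` that sees the raw source text. The grammar's real external scanner is not shown here and works differently, so treat this as an illustration of the idea rather than a proposed patch.

```python
import re


def should_insert_semicolon(source, newline_pos):
    """Decide whether the newline at newline_pos may act as an automatic semicolon.

    Hypothetical rule for illustration: suppress the semicolon when the next
    word after the newline is `constructor`.
    """
    rest = source[newline_pos + 1:]
    next_word = re.match(r"\s*([A-Za-z_]\w*)", rest)
    return not (next_word and next_word.group(1) == "constructor")


class_source = "public class Foo\nconstructor () {\n}\n"
other_source = "val x = 1\nfoo()\n"

suppressed = should_insert_semicolon(class_source, class_source.index("\n"))
kept = should_insert_semicolon(other_source, other_source.index("\n"))
print(suppressed, kept)
```

A check this blunt has a cost. `constructor` is a soft keyword in Kotlin, so a line that begins with a call to a function named `constructor` would also lose its semicolon. That is presumably where the extra scanner state mentioned above would come in, for example tracking whether the preceding tokens form a class header.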

Let me know what you think! I am trying to add better Kotlin support to Semgrep (https://github.com/returntocorp/semgrep), an open-source static analysis tool, and this issue is blocking that work.

fwcd added the bug and grammar labels Oct 6, 2023
VladimirMakaev added the help wanted label Nov 9, 2023