feat: Introduce custom identifier extraction mechanism #62

Merged
jpedroh merged 14 commits into main from feat-introduce-identifier on Jul 21, 2024

Conversation

jpedroh
Owner

@jpedroh jpedroh commented Jul 20, 2024

Up until now, we used matching handlers to capture special node identifiers, which diverged from their intended purpose. This approach also required users to add Rust code to the project, limiting the tool's generality and extensibility.

This PR changes how node identifiers are extracted by moving the extraction process to the parsing step. This improves performance (as it runs only once per parse) and leverages Tree-sitter’s pattern matching query functionality.

Users can now provide a configuration with node types and a Tree-sitter query expression to extract identifiers. For example, in a Java class, a user can extract a field declaration identifier using the query (variable_declarator name: _ @field_name), which captures the field name.

However, Tree-sitter pattern matching can fall short in some cases. For instance, when trying to retrieve the identifier for a class with an inner class:

class A {
    class B {
    }
}

Using the query (class_declaration (identifier) @class_name) matches both classes A and B, resulting in [A, B] as the identifier, which is incorrect. Tree-sitter’s query language doesn’t support restricting a pattern to a single match, so this would have to be done in userland code, which would complicate the identifier extraction process.

To address this, this PR introduces the option to use a regular expression for identifier extraction. The regular expression runs on the node’s source code, and only its first match is used. In this case, class [A-Za-z_][A-Za-z0-9_]* matches the outer class A first, so the later match for class B is safely discarded.
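
The first-match behavior described above can be illustrated with a minimal sketch. It uses Java's standard `java.util.regex` only to demonstrate the semantics; the tool itself is written in Rust and its actual implementation is not shown here. The class name `FirstMatchIdentifier` and the helpers `firstMatch` and `allMatches` are hypothetical names chosen for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration; not code from this PR.
public class FirstMatchIdentifier {
    // The pattern quoted in the PR description.
    static final Pattern CLASS_NAME =
        Pattern.compile("class [A-Za-z_][A-Za-z0-9_]*");

    // Returns only the first match found in the node's source text, if any.
    static Optional<String> firstMatch(Pattern pattern, String nodeSource) {
        Matcher matcher = pattern.matcher(nodeSource);
        return matcher.find() ? Optional.of(matcher.group()) : Optional.empty();
    }

    // Returns every match, for contrast with the first-match behavior.
    static List<String> allMatches(Pattern pattern, String nodeSource) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = pattern.matcher(nodeSource);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Source text of the outer class node, including the inner class.
        String source = "class A {\n    class B {\n    }\n}\n";
        System.out.println(firstMatch(CLASS_NAME, source)); // Optional[class A]
        System.out.println(allMatches(CLASS_NAME, source)); // [class A, class B]
    }
}
```

Collecting every match reproduces the nested-class problem, while taking only the first match yields a single result for the outer class. Note that with this exact pattern the matched text includes the `class` keyword.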

These changes simplify the introduction of new extractors and eliminate approximately 600 lines of Rust code (tests and source code) previously used for node identifier extraction.

@coveralls

coveralls commented Jul 20, 2024

Coverage Status

coverage: 80.247% (-2.0%) from 82.296%
when pulling 4003a53 on feat-introduce-identifier
into f6d66a5 on main.

@jpedroh jpedroh force-pushed the feat-introduce-identifier branch from 76756d8 to d6ada29 Compare July 20, 2024 18:02
@jpedroh jpedroh marked this pull request as ready for review July 21, 2024 18:04
@jpedroh jpedroh merged commit 2c6a135 into main Jul 21, 2024
8 checks passed
@jpedroh jpedroh deleted the feat-introduce-identifier branch July 21, 2024 18:05