feat: Introduce custom identifier extraction mechanism #62
Merged
jpedroh force-pushed the feat-introduce-identifier branch from 76756d8 to d6ada29 (July 20, 2024 18:02)
Up until now, we used matching handlers to capture special node identifiers, diverging from their intended purpose. This approach also required users to add Rust code to the project, limiting the tool's generalization and extensibility.
This PR changes how node identifiers are extracted by moving the extraction process to the parsing step. This improves performance (as it runs only once per parse) and leverages Tree-sitter’s pattern matching query functionality.
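To illustrate the "runs only once per parse" point, here is a minimal Python sketch (not the project's Rust code). It assumes the identifier is computed while the node is built and stored on it, so later comparisons only read the stored value. `ParsedNode`, `build_node`, `same_identifier`, and the stand-in extractor are hypothetical names used only for this illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# Hypothetical extractor signature: node source text in, identifier (or None) out.
Extractor = Callable[[str], Optional[str]]


@dataclass(frozen=True)
class ParsedNode:
    kind: str
    source: str
    identifier: Optional[str]  # computed once, when the node is built


def build_node(kind: str, source: str, extractors: Dict[str, Extractor]) -> ParsedNode:
    """Build a node and run the extractor configured for its kind, if any."""
    extractor = extractors.get(kind)
    identifier = extractor(source) if extractor else None
    return ParsedNode(kind, source, identifier)


def same_identifier(left: ParsedNode, right: ParsedNode) -> bool:
    """Compare stored identifiers; no extraction happens here."""
    return left.identifier is not None and left.identifier == right.identifier


# Stand-in extractor that counts how often it is invoked.
calls = {"count": 0}


def counting_extractor(source: str) -> Optional[str]:
    calls["count"] += 1
    words = source.split()
    return words[1] if len(words) > 1 else None
```

However many times two nodes are compared afterwards, the extractor is invoked only once per node in this sketch, which is the saving the description refers to.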
Users can now provide a configuration with node types and a Tree-sitter query expression to extract identifiers. For example, in a Java class, a user can extract a field declaration identifier using the query
(variable_declarator name: _ @field_name)
, which captures the field name.
However, Tree-sitter pattern matching can fall short in some cases. For instance, consider retrieving the identifier for a class A that contains an inner class B.
Using the query
(class_declaration (identifier) @class_name)
matches both classes A and B, resulting in [A, B]
as the identifier, which is incorrect. Tree-sitter’s query language doesn’t support restricting a query to a single match, so this would have to be done in userland code, which would complicate the identifier extraction process.
To address this, this PR introduces the option to use a Regular Expression for identifier extraction. The regular expression runs on the node’s source code and captures only the first match. In this case,
class [A-Za-z_][A-Za-z0-9_]*
correctly matches the class name, and we can safely discard the match for class B (since only the first match is considered).
These changes simplify the introduction of new extractors and eliminate approximately 600 lines of Rust code (tests and source code) previously used for node identifier extraction.
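The first-match behaviour described above can be illustrated with a short Python sketch that uses the standard `re` module instead of the project's Rust implementation. The Java snippet and the helper names `extract_identifier` and `all_matches` are assumptions made for this example; only the regular expression itself comes from the description.

```python
import re

# Hypothetical node source: class A with an inner class B.
JAVA_SOURCE = """\
class A {
    int x;

    class B {
    }
}
"""

# The regular expression quoted in the description.
CLASS_PATTERN = re.compile(r"class [A-Za-z_][A-Za-z0-9_]*")


def extract_identifier(node_source, pattern):
    """Return the text of the first match in the node's source, or None."""
    match = pattern.search(node_source)
    return match.group(0) if match else None


def all_matches(node_source, pattern):
    """Return every match, to show what taking only the first one discards."""
    return pattern.findall(node_source)
```

On this input, `all_matches` finds both `class A` and `class B`, while `extract_identifier` returns only `class A`. The matched text includes the `class` keyword. The description does not say whether the tool strips it, so this sketch leaves it in.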