feat: Introduce custom identifier extraction mechanism (#62)
Up until now, we used matching handlers to capture special node identifiers, which diverged from their intended purpose. This approach also required users to add Rust code to the project, limiting the tool's generalization and extensibility.

This PR changes how node identifiers are extracted by moving the extraction process to the parsing step. This improves performance (extraction runs only once per parse) and leverages Tree-sitter's pattern matching query functionality. Users can now provide a configuration with node types and a Tree-sitter query expression to extract identifiers. For example, in a Java class, a user can extract a field declaration identifier using the query `(variable_declarator name: _ @field_name)`, which captures the field name.

However, Tree-sitter pattern matching can fall short in some cases. For instance, when trying to retrieve the identifier for a class with an inner class:

```java
class A {
    class B { }
}
```

Using the query `(class_declaration (identifier) @class_name)` matches both classes A and B, resulting in `[A, B]` as the identifier, which is incorrect. Tree-sitter's query language does not support restricting a capture to a single match, so the filtering would have to be done in userland code, which would complicate the identifier extraction process.

To address this, this PR introduces the option to use a regular expression for identifier extraction. The regular expression runs on the node's source code and captures only the first match. In this case, `class [A-Za-z_][A-Za-z0-9_]*` correctly matches the declaration of class A, and the match for class B is safely discarded (since only the first match is considered).

These changes simplify the introduction of new extractors and eliminate approximately 600 lines of Rust code (tests and source code) previously used for node identifier extraction.
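The "first match only" behaviour described above can be illustrated with a small sketch. It does not use the project's code or a regex engine: `first_class_match` is a hypothetical helper that hand-emulates the pattern `class [A-Za-z_][A-Za-z0-9_]*` with the standard library alone, so the example stays dependency-free. How the tool post-processes the matched text is not stated in this commit and is not modelled here.

```rust
/// Hypothetical helper (not part of the project): returns the first substring
/// of `source` that the pattern `class [A-Za-z_][A-Za-z0-9_]*` would match,
/// or `None` when nothing matches. Later matches are never inspected.
fn first_class_match(source: &str) -> Option<&str> {
    const KEYWORD: &str = "class ";
    let bytes = source.as_bytes();
    let mut search_from = 0;

    while let Some(offset) = source[search_from..].find(KEYWORD) {
        let start = search_from + offset;
        let name_start = start + KEYWORD.len();

        // The first identifier character must be an ASCII letter or underscore.
        let first_ok = bytes
            .get(name_start)
            .map_or(false, |b| b.is_ascii_alphabetic() || *b == b'_');

        if first_ok {
            // Consume the remaining identifier characters greedily.
            let name_len = bytes[name_start..]
                .iter()
                .take_while(|b| b.is_ascii_alphanumeric() || **b == b'_')
                .count();
            return Some(&source[start..name_start + name_len]);
        }

        // No identifier follows this keyword; resume one byte further on.
        // `start` points at the ASCII 'c', so `start + 1` is a char boundary.
        search_from = start + 1;
    }
    None
}
```

Applied to the source of the outer class from the example, `first_class_match("class A { class B { } }")` yields `Some("class A")`; the inner declaration of `B` is never reached because the scan stops at the first hit.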
Showing 24 changed files with 275 additions and 676 deletions.
@@ -0,0 +1,19 @@
use model::CSTNode;

pub trait Matches {
    fn matches(&self, right: &CSTNode) -> bool;
}

impl Matches for CSTNode<'_> {
    fn matches(&self, right: &CSTNode) -> bool {
        match (self, right) {
            (CSTNode::Terminal(left), CSTNode::Terminal(right)) => {
                left.get_identifier() == right.get_identifier()
            }
            (CSTNode::NonTerminal(left), CSTNode::NonTerminal(right)) => {
                left.kind == right.kind && left.get_identifier() == right.get_identifier()
            }
            (_, _) => false,
        }
    }
}
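To see how the `Matches` trait above behaves, here is a minimal, self-contained sketch. The `Terminal`, `NonTerminal` and `CSTNode` types below are simplified, hypothetical stand-ins for the project's `model::CSTNode` (whose real fields, lifetimes and `get_identifier` return type are not shown in this diff), and `class_node`, `field_node` and `terminal` are illustrative constructors only.

```rust
// Hypothetical stand-ins for the project's model types; the real
// definitions are not part of this diff and may differ.
struct Terminal {
    identifier: Option<String>,
}

struct NonTerminal {
    kind: &'static str,
    identifier: Option<String>,
}

impl Terminal {
    fn get_identifier(&self) -> Option<&str> {
        self.identifier.as_deref()
    }
}

impl NonTerminal {
    fn get_identifier(&self) -> Option<&str> {
        self.identifier.as_deref()
    }
}

enum CSTNode {
    Terminal(Terminal),
    NonTerminal(NonTerminal),
}

trait Matches {
    fn matches(&self, right: &CSTNode) -> bool;
}

// Same matching rules as in the diff: terminals compare identifiers only,
// non-terminals compare kind and identifier, mixed pairs never match.
impl Matches for CSTNode {
    fn matches(&self, right: &CSTNode) -> bool {
        match (self, right) {
            (CSTNode::Terminal(left), CSTNode::Terminal(right)) => {
                left.get_identifier() == right.get_identifier()
            }
            (CSTNode::NonTerminal(left), CSTNode::NonTerminal(right)) => {
                left.kind == right.kind && left.get_identifier() == right.get_identifier()
            }
            (_, _) => false,
        }
    }
}

// Illustrative constructors (assumed names, not from the project).
fn class_node(name: &str) -> CSTNode {
    CSTNode::NonTerminal(NonTerminal {
        kind: "class_declaration",
        identifier: Some(name.to_string()),
    })
}

fn field_node(name: &str) -> CSTNode {
    CSTNode::NonTerminal(NonTerminal {
        kind: "field_declaration",
        identifier: Some(name.to_string()),
    })
}

fn terminal(name: &str) -> CSTNode {
    CSTNode::Terminal(Terminal {
        identifier: Some(name.to_string()),
    })
}
```

Under these assumptions, `class_node("A").matches(&class_node("A"))` is `true`, while comparing against `class_node("B")`, `field_node("A")` or `terminal("A")` is `false`. Because the identifier is computed once at parse time, the matching step reduces to a plain equality check.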