Skip to content

Latest commit

 

History

History
232 lines (184 loc) · 7.69 KB

README.md

File metadata and controls

232 lines (184 loc) · 7.69 KB

java-tree-sitter · MIT license javadoc Maven Central

Java bindings for tree-sitter. Originally developed by serenadeai.

This fork was originally created to simplify integration into DL4SE. Along the way, the project evolved to include useful features from other forks, while introducing support for those that were completely absent from the original project. Highlights include:

  • Incremental abstract syntax tree edits
  • Parser and lexer debugging through loggers
  • APIs for querying parsed abstract syntax trees
  • Support for both macOS and Linux out of the box
  • A wide range of languages supported out of the box
  • Streamlined native library construction, packaging and runtime loading
  • Safer interop with native code to minimize risks of segmentation faults
  • Direct mapping between the abstract syntax tree nodes and the source code contents
  • Multiple export formats: DOT, XML, symbolic expression and human-readable syntax trees
  • Various other quality-of-life improvements

Our end-goal is to offer all features that are available in the official tree-sitter bindings, features that one might expect from py-tree-sitter or node-tree-sitter.

Local development

Recursively clone the project with submodules:

git clone https://github.com/seart-group/java-tree-sitter.git --recursive

Or clone first and update the submodules afterward:

git clone https://github.com/seart-group/java-tree-sitter.git
git submodule update --init --recursive  
# or: git submodule init && git submodule update

Building dependency locally

To build the project for development purposes, all one has to do is run the following:

mvn clean package

This will generate both the header files in lib, as well as the shared library produced by build.py. For it to work, you must have the following installed:

Dependency Version
Java 11
Maven 3.9
Python 3.10
Docker 23

Adding dependency to project

To use in your own Maven project, include the following in your POM file:

<dependency>
  <groupId>ch.usi.si.seart</groupId>
  <artifactId>java-tree-sitter</artifactId>
  <version>1.12.0</version>
</dependency>

Example usage

First, load the shared object somewhere in your application:

import ch.usi.si.seart.treesitter.*;

public class Example {

    static {
        LibraryLoader.load();
    }
}

Then you can create a Parser initialized to a Language, and use it to parse a string of source code:

import ch.usi.si.seart.treesitter.*;

public class Example {
    
    // init omitted...

    public static void main(String[] args) {
        try (
            Parser parser = Parser.getFor(Language.PYTHON);
            Tree tree = parser.parse("def foo(bar, baz):\n  print(bar)\n  print(baz)")
        ) {
            Node root = tree.getRootNode();
            assert root.getChildCount() == 1;
            assert root.getType().equals("module");
            assert root.getStartByte() == 0;
            assert root.getEndByte() == 44;
            Node function = root.getChild(0);
            assert function.getType().equals("function_definition");
            assert function.getChildCount() == 5;
        } catch (Exception ex) {
            // ...
        }
    }
}

Use TreeCursor instances to traverse trees, as it is more efficient than both manual traversal, and through Node iterators:

import ch.usi.si.seart.treesitter.*;

public class Example {

    // init omitted...

    public static void main(String[] args) {
        String type;
        try (
            Parser parser = Parser.getFor(Language.PYTHON);
            Tree tree = parser.parse("def foo(bar, baz):\n  print(bar)\n  print(baz)");
            TreeCursor cursor = tree.getRootNode().walk()
        ) {
            type = cursor.getCurrentTreeCursorNode().getType();
            assert type.equals("module");
            cursor.gotoFirstChild();
            type = cursor.getCurrentTreeCursorNode().getType();
            assert type.equals("function_definition");
            cursor.gotoFirstChild();
            type = cursor.getCurrentTreeCursorNode().getType();
            assert type.equals("def");
            cursor.gotoNextSibling();
            cursor.gotoParent();
        } catch (Exception ex) {
            // ...
        }
    }
}

The Query class can be used to specify subtrees to match, while the QueryCursor can be used to iterate over matched nodes:

import ch.usi.si.seart.treesitter.*;

public class Example {

    // init omitted...

    public static void main(String[] args) {
        Language language = Language.PYTHON;
        try (
            Query query = Query.getFor(language, "(identifier) @target");
            Parser parser = Parser.getFor(language);
            Tree tree = parser.parse("def foo(bar, baz):\n  print(bar)\n  print(baz)");
            QueryCursor cursor = tree.getRootNode().walk(query)
        ) {
            int count = 0;
            for (QueryMatch match: cursor) count++;
            assert count == 7;
        } catch (Exception ex) {
            // ...
        }
    }
}

We also provide a way to print the syntax tree, similar to the online playground:

import ch.usi.si.seart.treesitter.*;
import ch.usi.si.seart.treesitter.printer.*;

public class Example {

    // init omitted...

    public static void main(String[] args) {
        try (
            Parser parser = Parser.getFor(Language.PYTHON);
            Tree tree = parser.parse("print(\"hi\")");
            TreeCursor cursor = tree.getRootNode().walk()
        ) {
            SyntaxTreePrinter printer = new SyntaxTreePrinter(cursor);
            String actual = printer.print();
            String expected =
                "module [0:0] - [0:11]\n" +
                "  expression_statement [0:0] - [0:11]\n" +
                "    call [0:0] - [0:11]\n" +
                "      function: identifier [0:0] - [0:5]\n" +
                "      arguments: argument_list [0:5] - [0:11]\n" +
                "        string [0:6] - [0:10]\n";
            assert expected.equals(actual);
        } catch (Exception ex) {
            // ...
        }
    }
}

If you would like to debug the parsing process, you can attach loggers directly to a Parser:

import ch.usi.si.seart.treesitter.*;
import org.slf4j.*;

public class Example {

    // init omitted...

    public static void main(String[] args) {
        Logger logger = LoggerFactory.getLogger(Example.class);
        try (Parser parser = Parser.getFor(Language.PYTHON)) {
            parser.setLogger(logger);
            parser.parse("pass").close();
        } catch (Exception ex) {
            // ...
        }
    }
}

For more usage examples, take a look at the tests. You can also refer to the full documentation here.