Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integrate TLA+/PlusCal tree-sitter grammar? #248

Closed
ahelwer opened this issue Feb 1, 2022 · 6 comments
Closed

Integrate TLA+/PlusCal tree-sitter grammar? #248

ahelwer opened this issue Feb 1, 2022 · 6 comments
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@ahelwer
Copy link

ahelwer commented Feb 1, 2022

The TLA+ tree-sitter grammar now supports PlusCal: https://github.com/tlaplus-community/tree-sitter-tlaplus/

You can see how to consume the grammar from TypeScript here: https://github.com/tlaplus-community/tlaplus-tool-dev-examples/tree/main/tree-sitter/typescript

There are a few ways the grammar could help:

  • Much more robust syntax highlighting compared to regex-based methods (however would have to use semantic highlighting API probably)
  • Code navigation (find all references, go to definition, etc.)
  • Code folding

Basically the grammar gives you access to an incrementally-updated error-tolerant queryable syntax tree. Let me know whether you think this would be a useful addition (or how else it could be used) and I will be happy to help with integration.

@lemmy
Copy link
Member

lemmy commented Feb 1, 2022

@edaena, I believe you're familiar with the relevant parts of the extension. One question to answer is how this integrates with SANY. For example, tree-sitter rejecting a spec shouldn't prevent users from launching TLC if SANY is happy, and vice versa. From experience with TLAPS, we know that two parsers disagree in subtle ways (which is why there is an (incomplete) effort to replace the TLAPS parser with SANY). Related, I'd suggest thinking about integrating with the PlusCal translator, such as users modifying the PlusCal without retranslating before model-checking...

Note that the proposed code navigation already exists.

Related:

@ahelwer
Copy link
Author

ahelwer commented Feb 1, 2022

Of note, tree-sitter doesn't really reject specs; it will provide as good of a parse as it can get, and end up looking something like (sample taken from different language):

(source_file [0, 0] - [3, 0]
  (class_def [0, 0] - [2, 1]
    (class [0, 0] - [0, 8])
    (class_method_name [1, 3] - [1, 6])
    (function_block [1, 6] - [1, 19]
      (parameter_list [1, 7] - [1, 18]
        (argument [1, 9] - [1, 11]
          name: (identifier [1, 9] - [1, 11]))
        (ERROR [1, 11] - [1, 17]
          (integer [1, 13] - [1, 17]))))))

There is a flag you can check to see whether there are any error nodes in the parse tree, but unless you go hunting for them you won't know they're there.

You can see the real-time parse tree (with errors or no) generated by the TLA+ tree-sitter grammar here: https://tlaplus-community.github.io/tree-sitter-tlaplus/

In the future, code navigation could be augmented by stack graphs which are built on tree-sitter grammars. I'm assuming how it works now is analogous to a string search in the file. Moving to tree-sitter grammar queries would make that a bit more robust, but you wouldn't get cross-file referencing.

@lemmy
Copy link
Member

lemmy commented Feb 1, 2022

To clarify: Are you're saying that the list of use-cases is exhaustive s.t. tree-sitter should only replace regex-based syntax highlighting and improve code navigation/completion? For everything else, however, SANY remains authoritative? This then sounds reasonable, especially, if syntax highlighting and code navigation continue to work in the presence of parse failures.
By the way, does the tree-sitter parser implement SANY's lookup sequence for extended, instantiated, nested, and re-defined modules (regardless of where they come from)? If not, a user might end up looking at a different module than what the tools will work with (compare #181 and #218).

In the light of #239, what are the (runtime) dependencies of tree-sitter?

@ahelwer
Copy link
Author

ahelwer commented Feb 1, 2022

Yes, SANY remains authoritative. Tree-sitter doesn't even have the capability to print error messages or anything, it's purely designed to support language tooling. I don't think my list of use-cases is exhaustive really, there are other possibilities like code folding or code formatting/linting that other people have implemented.

Tree-sitter does not implement any semantic analysis or restrictions whatsoever, it is purely a syntax-level tool against which you can write queries. The lookup logic (outside of a single file) could eventually be implemented by encoding it as stack graph rules, although stack graphs still don't have good developer documentation and are only currently used for code navigation on a handful of languages on github itself.

Tree-sitter grammars are all C and C++ files. The generated grammar is a 32 MB C file (src/parser.c in the repo) and the handwritten context-sensitive parsing logic for junction lists and proofs is in src/scanner.cc. When compiled this is about 3 MB in size, and it links to the tree-sitter library which is a few hundred KB in size. This can all easily be compiled with emscripten into WASM (this is what's running in the web demo I linked up above, see https://github.com/tlaplus-community/tlaplus-community.github.io/tree/main/tree-sitter-tlaplus/js). The WASM is also available as a pre-built file in the NPM module.

@lemmy lemmy added enhancement New feature or request help wanted Extra attention is needed labels Feb 13, 2022
@lemmy
Copy link
Member

lemmy commented Feb 17, 2022

@ahelwer
Copy link
Author

ahelwer commented Feb 17, 2022

Oh hey neat, that must be new! People have been asking for native vs code support for tree-sitter for a while now. Guess this is the compromise.

@ahelwer ahelwer closed this as not planned Won't fix, can't repro, duplicate, stale Feb 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request help wanted Extra attention is needed
Development

No branches or pull requests

2 participants