Fix seg fault in scanner.c: Allocate the stack array if null #136
@ObserverOfTime @fwcd please check this PR. This is a serious issue. Probably there is another cause for |
Can you post the crashing tests?
@ObserverOfTime It's not easy to reproduce it with a simple example. It's happening in our private repo, and there is a lot of rewriting happening. I think something goes wrong in the Stack serialization/deserialization. Do other tree-sitter parsers use the same mechanism for serializing/deserializing as in your refactoring?
Also, regardless of the example, under which circumstances |
Can you at least run the fuzzer (on a Linux machine) using your repo's Kotlin files as the corpus? `make fuzz LANG_NAME=kotlin LANG_DIR=/path/to/tree-sitter-kotlin CORPUS_DIR=/path/to/corpus`
The built-in array API handles the |
I don’t have access to Linux unfortunately, only Mac. I looked a bit more into the example that triggered the unsafe memory access, and it's not the example that is problematic. What's happening is that there are many changes to the file, and the incremental parser tries to deserialize the scanner and this happens.
Two questions here:
- Is this the way Scanner deserialization is supposed to work in tree-sitter? Do other languages work like this?
- What are the guarantees here for `stack->contents` not being null? @amaanq
Just FYI, the fuzzer does actually work on macOS, you just need to use the Homebrew-installed version of LLVM (`PATH="/opt/homebrew/opt/llvm/bin:$PATH" make LANG_NAME=kotlin LANG_DIR=path/to/tree-sitter-kotlin ...`). (Perhaps a note on this could be added to the fuzzer repo, might be useful for other developers on macOS?)
I don't know what's going on, but when I run the fuzzer using the instructions, I get this:
Also, I'm not sure why it would help. As I mentioned, this is happening when modifying a single file and serializing/deserializing. You're not gonna hit this bug with single-file parsing. You're gonna hit this bug in an IDE, for example. Anyway, I think I now have a more proper fix. I was looking at which other tree-sitter grammars use an external scanner and saw Python. If you look here: You'll see that before
Please check my new commit that uses |
3a47896 to 2291c83
Neat, thanks for digging into this and for the fix!
#115 introduces a segfault in the following function in `scanner.c`. I noticed `(signal: 11, SIGSEGV: invalid memory reference)` in some of our tests after upgrading. This PR fixes the segfault by adding a null check here.
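For readers following along, here is a minimal sketch of the kind of guard the description refers to. It is not the actual `scanner.c` code: the `Stack` layout and the `stack_reserve` and `stack_deserialize` helpers are hypothetical names made up for illustration. It assumes the crash comes from copying serialized bytes into a stack whose `contents` array was never allocated.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the scanner's stack; not the real scanner.c type. */
typedef struct {
    uint8_t *contents; /* may still be NULL if nothing was ever pushed */
    uint32_t size;
    uint32_t capacity;
} Stack;

/* Make sure `contents` can hold at least `needed` bytes, allocating it
 * if it is still NULL. Returns 0 on success, -1 on allocation failure. */
int stack_reserve(Stack *stack, uint32_t needed) {
    assert(stack != NULL);
    if (stack->contents != NULL && stack->capacity >= needed) {
        return 0;
    }
    uint32_t new_capacity = needed > 0 ? needed : 1;
    /* realloc(NULL, n) behaves like malloc(n), so the NULL case is covered. */
    uint8_t *grown = realloc(stack->contents, new_capacity);
    if (grown == NULL) {
        return -1;
    }
    stack->contents = grown;
    stack->capacity = new_capacity;
    return 0;
}

/* Restore the stack from a serialized buffer. An empty buffer simply resets
 * the stack, so `contents` is never dereferenced while it is NULL. */
int stack_deserialize(Stack *stack, const char *buffer, uint32_t length) {
    assert(stack != NULL);
    stack->size = 0;
    if (buffer == NULL || length == 0) {
        return 0;
    }
    if (stack_reserve(stack, length) != 0) {
        return -1;
    }
    memcpy(stack->contents, buffer, length);
    stack->size = length;
    return 0;
}
```

The point of the sketch is only that the copy is preceded by a check that both handles a NULL array and an empty buffer; whether the real fix allocates lazily or relies on a shared array helper is up to the grammar's scanner.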