Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix seg fault in scanner.c: Allocate the stack array if null #136

Merged
merged 2 commits into from
Aug 2, 2024

Conversation

afroozeh
Copy link
Contributor

@afroozeh afroozeh commented Aug 1, 2024

#115 introduces a segfault in the following function in scanner.c. I noticed
(signal: 11, SIGSEGV: invalid memory reference in some of our tests after upgrading.

void tree_sitter_kotlin_external_scanner_deserialize(void *payload, const char *buffer, unsigned length) {
  Stack *stack = (Stack *)payload;
  if (length > 0) {
    memcpy(stack->contents, buffer, length); // <<< stack.contents can be null
    stack->size = length;
  } else {
    array_clear(stack);
  }
}

This PR fixes the segfault by adding a null check here.

@github-actions github-actions bot added the scanner Related to the external scanner label Aug 1, 2024
@afroozeh
Copy link
Contributor Author

afroozeh commented Aug 1, 2024

@ObserverOfTime @fwcd please check this PR. This is a serious issue. Probably there is another cause for stack.content being null here, so I'm not sure if this is a good fix, also not sure about the de-allocation.

@ObserverOfTime
Copy link
Contributor

Can you post the crashing tests?

@afroozeh
Copy link
Contributor Author

afroozeh commented Aug 1, 2024

@ObserverOfTime It's not easy to reproduce it with a simple example. It's happening in our private repo and there are a lot of rewriting happening. I think something goes wrong in the Stack serialization/deserialization. Do other tree-sitter parsers use the same mechanism for serializing/deserializing as in your refactoring?
btw, this is the stack trace when the bad memory access happens.
Screenshot 2024-08-01 at 17 04 06

@afroozeh
Copy link
Contributor Author

afroozeh commented Aug 1, 2024

Also, regardless of the example, under which circumstances stack->contents can be null here?

@ObserverOfTime
Copy link
Contributor

It's not easy to reproduce it with a simple example. It's happening in our private repo and there are a lot of rewriting happening.

Can you at least run the fuzzer (on a Linux machine) using your repo's Kotlin files as the corpus?

make fuzz LANG_NAME=kotlin LANG_DIR=/path/to/tree-sitter-kotlin CORPUS_DIR=/path/to/corpus

Also, regardless of the example, under which circumstances stack->contents can be null here?

The built-in array API handles the contents field. You should ask @amaanq for details.

@afroozeh
Copy link
Contributor Author

afroozeh commented Aug 1, 2024 via email

@fwcd
Copy link
Owner

fwcd commented Aug 2, 2024

Just fyi, the fuzzer does actually work on macOS, you just need to use the Homebrew-installed version of LLVM (brew install llvm) since Apple Clang apparently does not bundle libfuzzer:

PATH="/opt/homebrew/opt/llvm/bin:$PATH" make LANG_NAME=kotlin LANG_DIR=path/to/tree-sitter-kotlin ...

(Perhaps a note on this could be added to the fuzzer repo, might be useful for other developers on macOS?)

@afroozeh
Copy link
Contributor Author

afroozeh commented Aug 2, 2024

I don't know what's going on but when I run the fuzzer using the instruction, I get this:

fuzzer(20459,0x1f3664c00) malloc: nano zone abandoned due to inability to reserve vm space.
ParseDictionaryFile: file does not exist or is empty
make: *** [fuzz] Error 1

Also, not sure why it would help. As I mentioned, this is happening when modifying a single file, serializing/deserializing. Not gonna hit this bug with a single file parsing. You'll gonna hit this bug in an IDE, for example.

Anyway, I think now I have the more proper fix. I was looking which other tree-sitter grammars use an external scanner and saw Python. If you look here:

https://github.com/tree-sitter/tree-sitter-python/blob/0dee05ef958ba2eae88d1e65f24b33cad70d4367/src/scanner.c#L403

You'll see that before memcpy the size of the array is checked and enlarged if needed using array_reserve. In this function, there is this check that malloc is called if array is null:

if (self->contents) {
      self->contents = ts_realloc(self->contents, new_capacity * element_size);
    } else {
      self->contents = ts_malloc(new_capacity * element_size);
    }

Please check my new commit that uses array_resize. This should fix the problem.

@fwcd
Copy link
Owner

fwcd commented Aug 2, 2024

Neat, thanks for digging into this and for the fix!

@fwcd fwcd merged commit e1a2d5a into fwcd:main Aug 2, 2024
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
scanner Related to the external scanner
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants