#632 Buffer chunks to avoid partial matches in nearleyc #635

naraen · 2023-04-06T15:07:45Z

Issue: #632

Cause: The StreamWriter receives the grammar file in chunks and passes each chunk to the parser, which immediately passes it to the lexer. The default chunk size is 64KB. A token in a production rule can span two chunks, leading the lexer to match a partial token from the incomplete chunk.
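To make the failure concrete, here is a toy regex-based lexer fed the same rule twice: once whole, and once split across two chunks. This is a hypothetical illustration only. The function name lexChunk and its token pattern are assumptions, and the lexer nearleyc actually uses is more involved.

```javascript
// Toy lexer for illustration only (hypothetical; not nearley's lexer).
// It emits "->" and runs of word characters, skipping whitespace, and
// stops at the first character it does not recognise.
function lexChunk(text) {
  const tokens = [];
  const re = /\s*(->|\w+)/y; // sticky: each match must start where the last ended
  let m;
  while ((m = re.exec(text)) !== null) {
    tokens.push(m[1]);
  }
  return tokens;
}

// Lexing the whole line yields the intended tokens.
console.log(lexChunk("main -> expr\n")); // [ 'main', '->', 'expr' ]

// Lexing the same line split across two chunks turns "expr" into two tokens.
console.log([...lexChunk("main -> ex"), ...lexChunk("pr\n")]); // [ 'main', '->', 'ex', 'pr' ]
```

Each chunk is lexed independently, so the lexer has no way to know that "ex" continues in the next chunk.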

Fix: Production rules in a nearley grammar are delimited by newlines (\n), so passing only the characters up to the last newline should always result in full matches. StreamWriter buffers any characters after the last newline in a chunk and prepends them to the next chunk.
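A minimal sketch of the buffering step, assuming the input has already been decoded to strings. LineBuffer, push and flush are hypothetical names chosen for illustration, not the code in this PR.

```javascript
// Hypothetical helper: holds back the text after the last "\n" and
// prepends it to the next chunk.
class LineBuffer {
  constructor() {
    this.pending = ""; // characters after the last newline seen so far
  }

  // Returns the text that is safe to hand to the parser: everything up to
  // and including the last newline in (pending + chunk).
  push(chunk) {
    const text = this.pending + chunk;
    const cut = text.lastIndexOf("\n") + 1; // 0 when there is no newline
    this.pending = text.slice(cut);
    return text.slice(0, cut);
  }

  // Returns and clears whatever is left over (for use at end of input).
  flush() {
    const rest = this.pending;
    this.pending = "";
    return rest;
  }
}

const buf = new LineBuffer();
console.log(JSON.stringify(buf.push("main -> ex")));   // ""
console.log(JSON.stringify(buf.push("pr\nfoo -> b"))); // "main -> expr\n"
console.log(JSON.stringify(buf.flush()));              // "foo -> b"
```

A chunk with no newline at all produces nothing for the parser; its text simply waits in pending until a later chunk completes the line.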

Implements Writable._finish to flush the buffer, so the last line of a grammar file that does not end with a newline is still passed to the parser.
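A sketch of how the buffering and the end-of-input flush could fit into a Node stream.Writable. Node calls _final(callback) once, after the last _write and before emitting 'finish', so this sketch uses that hook for the flush. LineBufferedWriter and its feed callback (for example text => parser.feed(text)) are hypothetical. A StringDecoder is added as an assumption so that a multi-byte UTF-8 character split across two chunks is not corrupted.

```javascript
const { Writable } = require("stream");
const { StringDecoder } = require("string_decoder");

// Hypothetical writer: forwards only complete lines to `feed` and holds back
// the text after the last newline until more input (or the end) arrives.
class LineBufferedWriter extends Writable {
  constructor(feed) {
    super(); // default options: every chunk reaches _write as a Buffer
    this.feed = feed; // assumed interface, e.g. (text) => parser.feed(text)
    this.decoder = new StringDecoder("utf8"); // keeps split UTF-8 sequences intact
    this.pending = ""; // characters after the last newline, not yet fed
  }

  _write(chunk, encoding, callback) {
    try {
      const text = this.pending + this.decoder.write(chunk);
      const cut = text.lastIndexOf("\n") + 1; // 0 when the text has no newline
      this.pending = text.slice(cut);
      if (cut > 0) this.feed(text.slice(0, cut));
    } catch (err) {
      return callback(err); // e.g. a parse error raised by feed
    }
    callback();
  }

  // Runs once after the last _write and before 'finish': flush the remainder,
  // which is non-empty when the file does not end with a newline.
  _final(callback) {
    try {
      const rest = this.pending + this.decoder.end();
      this.pending = "";
      if (rest) this.feed(rest);
    } catch (err) {
      return callback(err);
    }
    callback();
  }
}
```

In use it would sit where the stream wrapper sits today, e.g. piping fs.createReadStream(file) into new LineBufferedWriter((text) => parser.feed(text)). Passing errors from feed to the callback lets the stream emit 'error' instead of throwing out of _write.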

naraen added 2 commits April 5, 2023 21:33

Concatenate partial token from subsequent chunk before passing to lexer

cbd767b

Flush buffer when EOF not preceded by line break

c6ce94c

naraen commented Apr 6, 2023
