feature request: support incremental/streaming lexing #67

Open
cartazio opened this issue Jun 29, 2015 · 11 comments
@cartazio

In a number of application domains I need to handle streaming inputs incrementally, and having a streaming lexer / tokenization layer helps immensely with writing the layers on top.

If adding such capabilities to Alex is viable, I'd be very interested in trying to help add them, rather than having to reinvent a lot of the tooling that Alex provides.

Would this be a feature you'd be open to having added, @simonmar?

@cartazio (Author)

Even better would be if Alex already tacitly supports this and I'm simply not understanding it yet :)

@simonmar (Member)

I'd happily accept a patch, provided it doesn't compromise the speed of non-streaming lexers.

@dcoutts (Contributor) commented Sep 15, 2016

@cartazio In many cases this can already be made to work, though it requires knowing something about the maximum token length. For example, we have implemented a streaming JSON lexer using Alex. It relies on the fact that there is a largest possible token length (around 6 bytes, IIRC, for JSON), so that when the lexer returns an error near the end of a chunk we can tell whether it ran out of input or hit a real failure. If it fails within 6 bytes of the end, we need to supply more input and try again; but if more input than that is available, it's a real lex error.
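
A minimal sketch of that chunk-boundary heuristic, assuming a hypothetical `scanToken` function standing in for an Alex-generated scanner that reports the unconsumed input at the point of failure; none of these names come from Alex's actual wrappers:

```haskell
import qualified Data.ByteString as BS

-- Largest possible token, per the JSON example above.
maxTokenLen :: Int
maxTokenLen = 6

data Step tok
  = Token tok BS.ByteString      -- a token, plus the remaining input
  | NeedMoreInput BS.ByteString  -- failed too close to the chunk end; rescan later
  | LexError                     -- a genuine lexical error

-- 'scanToken' is a hypothetical stand-in for the generated scanner:
-- Right (token, rest) on success, Left unconsumed on failure, where
-- 'unconsumed' is the input from the start of the failed match.
step :: (BS.ByteString -> Either BS.ByteString (tok, BS.ByteString))
     -> BS.ByteString
     -> Step tok
step scanToken chunk =
  case scanToken chunk of
    Right (tok, rest) -> Token tok rest
    Left unconsumed
      -- Within maxTokenLen bytes of the end, the failure may simply
      -- mean the token is split across chunks: ask for more input.
      | BS.length unconsumed < maxTokenLen -> NeedMoreInput unconsumed
      -- Otherwise no amount of extra input could help: real lex error.
      | otherwise -> LexError
```

On `NeedMoreInput`, the driver would append the next chunk to the saved suffix and rescan from there.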

@simonmar (Member)

Interesting. I have many questions :) Where is your Alex lexer for JSON? Do you have a parser too? Is it faster than aeson?

@cartazio (Author)

I have a properly streaming one I wrote at work a year ago that has way better memory behavior and incremental ingestion.


@simonmar (Member)

I am very happy for you.

@cartazio (Author)

I can see about cleaning it up and getting that onto Hackage if you want :)


@iteratee

I got something working that is pull-based, and I'd be happy to try and get it cleaned up and merged.

You supply a monadic action that can be used to fetch additional data, along with a maximum token length.

The lexer treats an empty result from the action as EOF. If there is a lex error, it checks for additional data and rescans when the remaining data is shorter than the user-supplied maximum token length. It also attempts to fetch more data at EOF.

There is probably room for improvement in differentiating errors caused by EOF from other errors, but this is a rough first cut.

It currently works only for ByteStrings, with code borrowed from the monad template. It could accommodate user state fairly readily, but I didn't need that, so it isn't written.
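
A rough sketch of that pull-based loop, under assumed names (`refill` for the user-supplied monadic action, `scan` for the generated ByteString scanner); this illustrates the scheme described above, not the actual patch:

```haskell
import qualified Data.ByteString as BS

-- Result of one scanner call; 'ScanErr' carries the unconsumed input
-- from the start of the failed match. Hypothetical, for illustration.
data Scanned tok
  = Scanned tok BS.ByteString
  | ScanEOF
  | ScanErr BS.ByteString

lexPull :: Monad m
        => m BS.ByteString                -- refill action; empty result = EOF
        -> Int                            -- user-supplied maximum token length
        -> (BS.ByteString -> Scanned tok) -- stand-in for the Alex scanner
        -> BS.ByteString                  -- current buffer
        -> m (Either String (Maybe (tok, BS.ByteString)))
lexPull refill maxLen scan buf =
  case scan buf of
    Scanned tok rest -> pure (Right (Just (tok, rest)))
    ScanEOF -> do
      more <- refill                      -- try to get more data at EOF
      if BS.null more
        then pure (Right Nothing)         -- empty result: real EOF
        else lexPull refill maxLen scan more
    ScanErr unconsumed
      | BS.length unconsumed < maxLen -> do
          more <- refill                  -- error near chunk end: maybe truncated
          if BS.null more
            then pure (Left "lexical error")
            else lexPull refill maxLen scan (unconsumed <> more)
      | otherwise -> pure (Left "lexical error")
```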

@cartazio (Author)

Ooh, this sounds amazing!

@cartazio (Author)

This repo has the parser I mentioned: https://github.com/cartazio/streaming-machine-json

@andreasabel (Member)

@iteratee If this is fully backwards-compatible and does not affect performance of what we have now, a PR would be welcome!

@simonmar wrote:

I'd happily accept a patch, provided it doesn't compromise the speed of non-streaming lexers.
