Consider writing scanners by hand #167
If you decide to rewrite them by hand, perhaps my code could be useful (https://github.com/Knagis/CommonMark.NET/blob/master/CommonMark/Parser/Scanner.cs). I rewrote them by hand, and it wasn't the hardest part of porting the parser. As for maintenance, I haven't had any problems with the few changes that have been required so far.
Would … But generally, hand-coded scanners are IMO preferable for size and configurability reasons (i.e., such scanning is easier to influence by command-line options to the …
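To illustrate the configurability point: a hand-written C scanner can take runtime parameters, whereas a pattern compiled by re2c is fixed when scanners.c is generated. The sketch below is hypothetical (`scan_thematic_break_opts` and its `markers` argument are not part of cmark) and assumes the input is NUL-terminated.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical configurable scanner (not part of cmark): a thematic break is
   three or more copies of the same marker character, optionally separated by
   spaces or tabs, followed by a line ending. `markers` is a runtime setting
   such as "*-_". `p` must be NUL-terminated. Returns the number of bytes
   matched, including the line ending, or 0 if there is no match. */
static size_t scan_thematic_break_opts(const unsigned char *p,
                                       const char *markers)
{
    unsigned char c = p[0];
    size_t n = 0;
    size_t count = 0;

    /* The line must start with one of the configured marker characters.
       Check for NUL first: strchr would otherwise find the terminator. */
    if (c == '\0' || strchr(markers, c) == NULL)
        return 0;

    /* Consume the markers plus any interleaved spaces or tabs; a different
       marker character (as in "*-*") ends the run. */
    while (p[n] == c || p[n] == ' ' || p[n] == '\t') {
        if (p[n] == c)
            count++;
        n++;
    }

    /* Require at least three markers, then LF, CR, or CRLF. */
    if (count < 3)
        return 0;
    if (p[n] == '\n')
        return n + 1;
    if (p[n] == '\r')
        return (p[n + 1] == '\n') ? n + 2 : n + 1;
    return 0;
}
```

With `markers` set to `"*-_"` this accepts `- - -` followed by a newline; passing `"*"` instead would restrict breaks to asterisks, the kind of switch a command-line option could flip without regenerating anything.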
I say go for it. It doesn't need to be all or nothing; we could just do a few functions at a time. There may be some speed improvements as well.
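One way to migrate a few functions at a time, sketched under assumptions: keep the old regex as a reference and check each hand-written replacement against it on sample inputs. Here the reference is the POSIX `regcomp`/`regexec` API rather than the re2c-generated code, so this only builds on POSIX systems. `scan_spacechars_hand`, `differential_check`, and the whitespace pattern are hypothetical examples, not cmark code.

```c
#include <regex.h>
#include <stddef.h>
#include <stdio.h>

/* Common shape for a scanner: bytes matched at p, or 0 for no match. */
typedef size_t (*scanner_fn)(const unsigned char *);

/* Hypothetical hand-written scanner for the pattern [ \t\v\f\r\n]+
   anchored at p; p must be NUL-terminated. */
static size_t scan_spacechars_hand(const unsigned char *p)
{
    size_t n = 0;
    while (p[n] == ' ' || p[n] == '\t' || p[n] == '\v' ||
           p[n] == '\f' || p[n] == '\r' || p[n] == '\n')
        n++;
    return n;
}

/* Reference result: length of the POSIX ERE match in s, or 0 if none.
   The pattern must be anchored with '^' so the match starts at offset 0. */
static size_t regex_prefix_len(const regex_t *re, const char *s)
{
    regmatch_t m;
    if (regexec(re, s, 1, &m, 0) != 0)
        return 0;
    return (size_t)m.rm_eo;
}

/* Runs the hand-written scanner and the regex on every input and returns
   the number of disagreements, or -1 if the pattern fails to compile. */
static int differential_check(scanner_fn hand, const char *pattern,
                              const char *const *inputs, size_t count)
{
    regex_t re;
    int mismatches = 0;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    for (size_t i = 0; i < count; i++) {
        size_t want = regex_prefix_len(&re, inputs[i]);
        size_t got = hand((const unsigned char *)inputs[i]);
        if (want != got) {
            fprintf(stderr, "input %zu: regex %zu, hand-written %zu\n",
                    i, want, got);
            mismatches++;
        }
    }
    regfree(&re);
    return mismatches;
}
```

Each converted scanner can then be checked this way against its original pattern before the corresponding re2c rule is deleted.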
What is your exact concern with shipping this scanners.c file? My only concern about this would be version control, but I'm pretty sure there are ways around it.
+++ Mathieu Duponchelle [Dec 05 16 04:41]:
> What is your exact concern with shipping this scanners.c file? My only concern about this would be version control, but I'm pretty sure there are ways around it.
It's not a big problem. But it's a very big source file. Some people object to that, it seems (see the md4c announcement).

(There's also the problem that it's a generated file in version control, but that hasn't been a big problem so far.)
@jgm, unless there's an actual technical reason to do so, I honestly don't see why we should do that. If compiling this file took multiple gigabytes of memory, for example, that could be a compelling reason, but I don't see the point of "compactness" for its own sake. I'd much rather have easy-to-read sources.
While I find the ~30,000-line (~400 KB) … Re-generating this file (and committing it) has never bothered me. So, all in all, I don't lose sleep over it, at least not for the time being ;-)
There's also issue #121, but that is caused by a GCC bug. The biggest problem I see is that re2c generates needlessly repetitive code for quantifiers like …
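As a general illustration of the quantifier point: a DFA has no counter, so a bounded repetition such as `{1,6}` generally becomes one state per repetition, while a hand-written scanner can use a single counted loop. The function below is a hypothetical hand-written equivalent of the regex `[#]{1,6} ([ \t]+ | [\r\n])` (an ATX heading opener); the name is illustrative, and NUL-terminated input is assumed.

```c
#include <stddef.h>

/* Hypothetical hand-written equivalent of  [#]{1,6} ([ \t]+ | [\r\n])
   anchored at p (p must be NUL-terminated). Returns the number of bytes
   matched, or 0 if there is no match. */
static size_t scan_atx_heading_start_hand(const unsigned char *p)
{
    size_t n = 0;

    /* [#]{1,6}: a counted loop consumes at most six '#' characters. */
    while (n < 6 && p[n] == '#')
        n++;
    if (n == 0)
        return 0;

    /* [ \t]+ : one or more spaces or tabs, consumed greedily. */
    if (p[n] == ' ' || p[n] == '\t') {
        while (p[n] == ' ' || p[n] == '\t')
            n++;
        return n;
    }

    /* [\r\n] : exactly one line-ending character. */
    if (p[n] == '\r' || p[n] == '\n')
        return n + 1;

    /* Anything else, including a seventh '#', is not a match. */
    return 0;
}
```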
I regenerated scanners.c using re2c 0.16, which includes some DFA minimization. That cut 65K+ from the generated source (though it's still pretty big).
Currently we generate a number of scanners from regexes using re2c.
This has two advantages:
Disadvantage: Either we require re2c as a build dependency, or we have to ship a gigantic scanners.c file.
Should we simply hand-write the scanners and dispense with re2c and scanners.re?