CS2 Discussion: Project: Revisiting implementation of a new parser #4970
Comments
From @rattrayalex on 2017-03-20 19:13
Is it? My impression is that backwards-incompatibility is the biggest limiting factor, coupled with a lack of general appetite for new language features. Perhaps I haven't been paying close enough attention. I've personally been super-impressed with what the CS2 team has accomplished, and I'm not personally aware of issues that are blocked on "the parser is too hard to work with". Perhaps it would help to list a few?
From @bd82 on 2017-03-20 19:41
Thanks for the quick feedback @rattrayalex. I'm not talking about CS2, but about the longer term: CS3+.
My suggestion tries to find a gradual process and architecture to attack these longer
References: (more in coffeescript6/discuss#25)
From @GeoffreyBooth on 2017-04-04 06:34
@bd82 Thanks for suggesting this, and offering your time. I for one would be interested, though I shudder to think of how ambitious this effort could be. But if you want to take it on, be my guest.
When I was working on adding support for modules, @lydell and I discussed someday mapping the CoffeeScript grammar nodes to Babel’s AST, and we named the nodes (like
The other benefit is that adopting new ECMAScript features would happen much faster and with less effort. If Babel already knew how to generate the JavaScript for a certain feature, all we would need to do is decide what CoffeeScript’s syntax for it should be and figure out how to parse it into an AST. For many features, that’s the easy part.
From @bd82 on 2017-04-04 07:45
I'll have some time over the Passover holidays to play around with some POCs.
On Modules and Babel.
Just to be clear, I'm not trying to change the compiler backend (code generation). I'm limiting my scope to modifying parts of the compiler frontend (parser).
From @GeoffreyBooth on 2017-04-04 17:52
Sure. The parser could certainly be more organized. As long as all the tests continue to pass once you're done 😄
From @auvipy on 2017-04-14 11:27
how about incorporating changes from https://github.com/michaelficarra/CoffeeScriptRedux ?
From @lydell on 2017-04-14 11:46
@auvipy That has already been discussed in other issues.
From @bd82 on 2017-08-03 07:40
Oops, forgot to update this issue 😢
I played around with implementing this a few months ago and reached the conclusion that CoffeeScript is better off with a bottom-up (LR) parser instead of a top-down (LL) parser.
I initially thought that a conversion would only require handling different patterns for handling
However, as I went deeper and deeper into the grammar I realized that many rules actually

    // statement
    return 1
    // expression, 2 tokens lookahead to distinguish
    return 1 if (true)
    // expression, infinite tokens lookahead to distinguish.
    return 1 + 1 + 1 + ... + 1 if (true)

This works with an LR parser because it can "delay" the decision and reduce the possible
This does not mean that the grammar cannot be converted to a top-down (LL) form (this was already done in other CoffeeScript-related projects...)
So because it appears the kind of tool I wanted to use is not the best tool for the job
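The lookahead problem in the comment above can be sketched in a few lines of JavaScript. This is only an illustration under simplifying assumptions: tokens are plain strings, and the names `lookaheadNeeded` and `parseReturn` are made up for this sketch. It does not reflect how CoffeeScript's grammar or any related project handles the case. The second function shows one possible workaround (left-factoring the shared `return <value>` prefix), which hints at why an LL conversion is possible but changes the shape of the rules.

```javascript
// Tokens are plain strings; a deliberate simplification, not CoffeeScript's lexer output.

// How many tokens after `return` a predictive parser must inspect before it
// knows which alternative it is in (end of input counts as one token).
function lookaheadNeeded(tokens) {
  const ifIndex = tokens.indexOf("if");
  return ifIndex === -1 ? tokens.length : ifIndex;
}

// Left-factored alternative: parse the shared `return <value>` prefix first,
// then decide based on the single token that follows it.
function parseReturn(tokens) {
  if (tokens[0] !== "return") throw new SyntaxError("Expected `return`");
  let pos = 1;
  const value = [];
  while (pos < tokens.length && tokens[pos] !== "if") {
    value.push(tokens[pos]);
    pos += 1;
  }
  const returnNode = { type: "Return", value };
  if (pos === tokens.length) return returnNode;
  return { type: "PostfixIf", condition: tokens.slice(pos + 1), body: returnNode };
}

const shortInput = "return 1 if ( true )".split(" ");
const longInput = "return 1 + 1 + 1 + 1 if ( true )".split(" ");
console.log(lookaheadNeeded(shortInput)); // 2
console.log(lookaheadNeeded(longInput)); // 8
console.log(parseReturn(longInput).type); // "PostfixIf"
```

The required lookahead grows with the length of the returned expression, so no fixed k is enough for a parser that must choose the alternative up front.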
From @bd82 on 2017-03-20 18:46
Hello. I'm a long time lurker here and would like to (re?)-raise a proposal.
Sorry for the long post 😄
The What:
I'd like to re-raise the issue of implementing a new parser for CoffeeScript.
Some previous related discussions:
Scope:
The scope is intentionally limited to only creating a new parser.
No intent to touch the lexer & re-writer nor to modify the code generation parts.
The why:
As previously discussed the existing CS compiler infrastructure is a limiting
factor in the long term for CoffeeScript.
Replacing the whole pipeline at once requires more resources than available to this project.
And even if those resources were available it is still a very risky approach.
Therefore an incremental approach is needed.
Architecture:
I propose to create a separation between the syntactic analysis and the AST creation.
This means that logic that creates the AST must not be embedded inside the parser.
Instead, the parser should create a lower-level structure, a Parse Tree / Concrete Syntax Tree,
which could be transformed afterwards to serve different needs, for example:
language services tools such as formatting & refactoring.
This proposed separation of concerns will help to future proof the CoffeeScript compiler
by enabling future incremental changes such as replacing the compiler backend without
modifying (or diverging from) the compiler frontend (parser).
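A minimal sketch of this separation, written in plain JavaScript rather than with Chevrotain or the real CoffeeScript grammar. The toy language (integers joined by `+`) and every name in it (`tokenize`, `parseToCst`, `cstToAst`, `formatCst`) are assumptions made for illustration. The parser only records what it saw in a Concrete Syntax Tree; building the AST and formatting are two independent consumers of that tree.

```javascript
// Toy language: integers joined by "+". All names here are hypothetical.
function tokenize(source) {
  const text = source.trimEnd();
  const pattern = /\s*(\d+|\+)/y; // sticky: each match must start at lastIndex
  const tokens = [];
  while (pattern.lastIndex < text.length) {
    const start = pattern.lastIndex;
    const match = pattern.exec(text);
    if (match === null) throw new SyntaxError(`Unexpected input at offset ${start}`);
    const image = match[1];
    tokens.push({
      type: image === "+" ? "Plus" : "Number",
      image,
      offset: pattern.lastIndex - image.length,
    });
  }
  return tokens;
}

// Syntactic analysis only: records every token under a named rule, builds no AST.
function parseToCst(tokens) {
  const children = [];
  let pos = 0;
  const consume = (type) => {
    const token = tokens[pos];
    if (token === undefined || token.type !== type) {
      throw new SyntaxError(`Expected ${type} at token index ${pos}`);
    }
    children.push(token);
    pos += 1;
  };
  consume("Number");
  while (pos < tokens.length) {
    consume("Plus");
    consume("Number");
  }
  return { name: "addition", children };
}

// Consumer 1: turn the parse tree into a left-nested AST.
function cstToAst(cst) {
  return cst.children
    .filter((token) => token.type === "Number")
    .map((token) => ({ type: "NumberLiteral", value: Number(token.image) }))
    .reduce((left, right) => ({ type: "Add", left, right }));
}

// Consumer 2: a formatter that reuses the same tree without touching the AST.
function formatCst(cst) {
  return cst.children.map((token) => token.image).join(" ");
}

const cst = parseToCst(tokenize("1+2 +  3"));
console.log(formatCst(cst)); // prints "1 + 2 + 3"
console.log(JSON.stringify(cstToAst(cst)));
```

Because the formatter never sees the AST, the AST node shapes (or the whole backend) could change without touching `parseToCst`, which is the kind of incremental replacement described above.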
The How:
Warning: sales pitch incoming.
Normally the standard approach to writing a parser for a compiler is to write one "by hand".
The problem with this approach is that the work can be repetitive and error-prone,
and that implementing more advanced capabilities such as fault tolerance can be complex.
Fortunately, the last time I needed to write a hand-built parser I was too lazy 😸 and instead
created a library that makes it easier to hand-build parsers in JavaScript: Chevrotain,
without any code generation.
Relevant Highlights:
The proposal is to write the new CoffeeScript parser in CoffeeScript (no code generation),
using the Chevrotain parsing library.
The who:
I can contribute enough time to try implementing this.
I obviously can't make any promises, but this won't be the first parser I've written so I've got a decent
chance of success.
Risks & Issues:
Factoring away left recursion (for LL(k) parser) may result in uglier parse trees.
Do CoffeeScript's tokens contain full position information?
lexer -> re-writer -> parser flow, but that is a less incremental approach.
My CoffeeScript skills are lacking; I may require assistance in getting the code to decent quality.
Error message contents and structure for invalid inputs will change.
Testing that the AST output is the same requires a large amount of valid CS source code.
The additional abstraction and separation will have a performance overhead.
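To illustrate the first risk in the list above (factoring away left recursion), here is a small JavaScript sketch. The subtraction rule is a hypothetical toy, not taken from CoffeeScript's grammar, and the names `parseSubtractionFlat`, `foldLeft` and `evaluate` are made up for this example. The LL-friendly rule yields a flat operand list, so left associativity has to be restored in a later step.

```javascript
// Hypothetical toy rule, not taken from CoffeeScript's grammar:
//   left-recursive:  subtraction -> subtraction "-" Number | Number
//   LL-friendly:     subtraction -> Number ("-" Number)*
function parseSubtractionFlat(tokens) {
  const operands = [];
  let pos = 0;
  const readNumber = () => {
    const image = tokens[pos];
    if (image === undefined || !/^\d+$/.test(image)) {
      throw new SyntaxError(`Expected a number at token index ${pos}`);
    }
    operands.push(Number(image));
    pos += 1;
  };
  readNumber();
  while (pos < tokens.length) {
    if (tokens[pos] !== "-") throw new SyntaxError(`Expected "-" at token index ${pos}`);
    pos += 1;
    readNumber();
  }
  return { name: "subtraction", operands }; // flat: associativity is not visible
}

// Post-processing step that rebuilds the left-nested tree the
// left-recursive rule would have produced directly.
function foldLeft(flat) {
  return flat.operands
    .map((value) => ({ type: "Num", value }))
    .reduce((left, right) => ({ type: "Sub", left, right }));
}

function evaluate(node) {
  return node.type === "Num" ? node.value : evaluate(node.left) - evaluate(node.right);
}

const flat = parseSubtractionFlat(["8", "-", "3", "-", "2"]);
console.log(flat.operands); // [ 8, 3, 2 ]
console.log(evaluate(foldLeft(flat))); // 3, i.e. (8 - 3) - 2
```

The extra fold is cheap, but it is one more place where the new parser's output could diverge from the existing AST, which ties into the testing risk listed above.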
Questions:
Any feedback / suggestions?
Am I missing some blocker or potential show stopper here?
Is this approach acceptable/approved by the project leaders?
If a POC succeeds will there be assistance in integrating this into the CoffeeScript code base?
What percentage of the CoffeeScript running time is spent parsing?