Draft: Introduce new %error handler mode and catch for resumable parsing #272

Closed
wants to merge 10 commits into from

sgraf812
Collaborator

@sgraf812 sgraf812 commented Feb 2, 2024

Superseded by #318.

A drafty PoC serving as motivation for a GSoC proposal; I don't want to see this merged or reviewed for now.


Consider this excerpt from an example (errormonad-resume.y):

%name parseStmts Stmts
%tokentype { Token }
%error { \_ -> abort } { reportError } -- the entire point of this test

%monad { ParseM } { (>>=) } { return }

%token
  '1' { TOne }
  '+' { TPlus }
  ';' { TSemi }

%%

Stmts : {- empty -}           { [] }
      | Stmt                  { [$1] }
      | Stmts ';' Stmt        { $1 ++ [$3] }
      | catch ';' Stmt %shift { [$3] } -- Could insert error AST token here in place of $1

Stmt : Exp { ExpStmt $1 }

Exp : '1'                { One }
    | Exp '+' Exp %shift { Plus $1 $3 }

{
recordParseError :: [String] -> ParseM ()
recordParseError expected = recordError [ParseError expected]

reportError :: [Token] -> [String] -> ([Token] -> ParseM a) -> ParseM a
reportError ts expected resume = do
  recordParseError expected
  resume ts

The point of this example is that reporting a parse error (i.e., adding a diagnostic) is independent of aborting a parse (i.e., a fatal error that can't produce a syntax tree). Hence the new %error form takes two code blocks: one for the abort handler, the other for the report handler.

Additionally, the report handler gets a resume continuation that it may call to resume parsing; otherwise it would simply have to abort (TODO: In the current encoding it should perhaps simply be ParseM () and we call the resumption unconditionally).
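To make the split between reporting and aborting concrete, here is a minimal sketch of a monad that could back the excerpt. The excerpt does not show how ParseM, ParseError, recordError, abort, or Token are defined, so the definitions below are assumptions for illustration only (a writer of diagnostics layered over Maybe), not the definitions used in the test file. Only reportError is taken from the excerpt as is.

```haskell
import Control.Monad (ap, liftM)

-- Assumed token type, mirroring the %token declarations in the excerpt.
data Token = TOne | TPlus | TSemi deriving (Eq, Show)

-- Assumed diagnostic type: carries the list of expected tokens.
newtype ParseError = ParseError [String] deriving (Eq, Show)

-- Assumed monad: the diagnostics recorded so far, plus a result that is
-- Nothing once the parse has been aborted.
newtype ParseM a = ParseM { runParseM :: ([ParseError], Maybe a) }

instance Functor ParseM where
  fmap = liftM

instance Applicative ParseM where
  pure x = ParseM ([], Just x)
  (<*>) = ap

instance Monad ParseM where
  m >>= k = case runParseM m of
    -- An abort stops the computation but keeps earlier diagnostics.
    (errs, Nothing) -> ParseM (errs, Nothing)
    (errs, Just x)  -> let (errs', r) = runParseM (k x)
                       in ParseM (errs ++ errs', r)

-- Fatal: no syntax tree can be produced.
abort :: ParseM a
abort = ParseM ([], Nothing)

-- Non-fatal: add diagnostics and keep going.
recordError :: [ParseError] -> ParseM ()
recordError errs = ParseM (errs, Just ())

-- The report handler from the excerpt: record, then resume.
reportError :: [Token] -> [String] -> ([Token] -> ParseM a) -> ParseM a
reportError ts expected resume = do
  recordError [ParseError expected]
  resume ts
```

With these assumed definitions, a diagnostic recorded by reportError survives even if the continuation later aborts, which is the independence the two handlers are meant to express.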

Where does the parser resume parsing? That is mostly up to the user to specify, through use of the special catch terminal. In the example above, catch occurs before a ; is shifted, so that upon an error while parsing a Stmt, the parser will "unwind" the stack until it finds a situation in which catch can be shifted. Having done that, it will discard input until it finds the next ; and resume parsing from there.

The result is that for inputs such as 1++1;1;+, two errors (and a partial syntax tree) can be reported: one at the second + and the other at the third.
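The recovery strategy can be imitated by hand to see where the two errors come from. The sketch below is not how the generated LR parser works (there is no stack unwinding and no catch terminal); it is a hand-written recursive-descent analogue that records the index of the offending token, discards input up to the next ;, and carries on. The names lexer, parseExp, and parseStmts, the choice to report 0-based token indices, and the behaviour when no further ; is found (stop and return what parsed so far) are all assumptions made for this illustration.

```haskell
data Token = TOne | TPlus | TSemi deriving (Eq, Show)

data Exp = One | Plus Exp Exp deriving (Eq, Show)

newtype Stmt = ExpStmt Exp deriving (Eq, Show)

-- Hypothetical lexer for the three tokens of the example grammar.
lexer :: String -> [Token]
lexer = map tok
  where
    tok '1' = TOne
    tok '+' = TPlus
    tok ';' = TSemi
    tok c   = error ("unexpected character: " ++ [c])

-- Exp : '1' | Exp '+' Exp, parsed right-nested here.
-- Left carries the remaining input, starting at the offending token.
parseExp :: [Token] -> Either [Token] (Exp, [Token])
parseExp (TOne : TPlus : rest) = do
  (e, rest') <- parseExp rest
  Right (Plus One e, rest')
parseExp (TOne : rest) = Right (One, rest)
parseExp ts = Left ts

-- Returns the token indices of all recorded errors and the statements
-- that were parsed after the last recovery point.
parseStmts :: [Token] -> ([Int], [Stmt])
parseStmts toks
  | null toks = ([], [])
  | otherwise = stmt toks
  where
    n = length toks

    stmt ts = case parseExp ts of
      Right (e, [])           -> ([], [ExpStmt e])
      Right (e, TSemi : rest) -> let (errs, ss) = stmt rest
                                 in (errs, ExpStmt e : ss)
      Right (_, bad)          -> recover bad
      Left bad                -> recover bad

    -- Record the error position, then discard input until the next ';'.
    recover bad =
      let pos = n - length bad
          (errs, ss) = case dropWhile (/= TSemi) bad of
            []       -> ([], [])
            _ : rest -> stmt rest
      in (pos : errs, ss)
```

For the input 1++1;1;+, parseStmts (lexer "1++1;1;+") records errors at token indices 2 and 7, which are the second and third +, and returns the single statement parsed between them.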

I've already tried applying this patch to GHC, and it seems to work; see https://gitlab.haskell.org/ghc/ghc/-/merge_requests/11990 for a worked example.

@sgraf812
Collaborator Author

sgraf812 commented Feb 5, 2024

For context, I proposed bringing this PR into a mergeable state and applying the result to GHC as a GSoC proposal.

@kd1729

kd1729 commented Mar 4, 2024

Hi @sgraf812, I went through this issue: https://summer.haskell.org/ideas.html#parse-error-recovery
I am interested in this and want to take it up for GSoC'24. I am going to write a draft proposal and would like to have it reviewed by you. Any guidelines or suggestions are appreciated.

@sgraf812
Collaborator Author

sgraf812 commented Mar 6, 2024

Hi Kaustubh, that's great! Perhaps it's good to have a short chat in private to see whether you would actually enjoy working on this project.

For example,

  1. What is your background in Haskell?
  2. Do you maintain any open source projects in Haskell or related to compilers?
  3. Have you previously used parser generators such as bison or yacc?
  4. Have you attended any previous classes on compiler engineering at your university?
  5. What kind of improvements to happy would you find exciting to have that are not listed in the GSoC proposal, and why?

@alinab

alinab commented Mar 16, 2024

@sgraf812 It would be great if I could ask a few questions on the scope of this work. Would setting up a chat work for this? If so, please just let me know how. Thanks.

@xevor11

xevor11 commented Mar 19, 2024

I'm a potential GSoC contributor and I am interested in this project! I am currently taking a compiler course at my university along with a functional programming course in Haskell. My thesis project revolves around extending the Cool Compiler (a subset of Scala, utilized by Stanford) with LLVM as its backend, utilizing ANTLR. I have recently worked with yacc in building a lexer and with scala bison (a version of bison) in building a parser for the same language. Last year, I was a GSoC contributor for the GNU organization, working on adding support for the Hurd OS to the Rust Compiler. Since I am a beginner in Haskell and have had some experience working with parser generators and defining grammars, I think this project would be a good starting point for me. I'd be happy to have a short private chat, @sgraf812, to further discuss my background and the projects I have worked on, to see if this could be a good fit. This is one project I attempted using the E-Graph Library (that might be interesting):
https://github.com/xevor11/E-Graph-Optimizer
Thanks!
Vedant

@sgraf812
Collaborator Author

Hi Alina and Vedant, feel free to reach out to me via mail (Vedant already did so) or via Matrix (@sgraf812:matrix.org).

@xevor11

xevor11 commented Mar 19, 2024

Thanks for the update! I wanted to mention that I sent an email with my first rough draft proposal. I am eager to get your feedback, if possible, on improving the Milestones and Deliverables section; I provided the specifications in the email.

4 participants