Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Design of the AST builder #747

Open
larshum opened this issue May 30, 2023 · 2 comments
Open

Design of the AST builder #747

larshum opened this issue May 30, 2023 · 2 comments

Comments

@larshum
Copy link
Contributor

larshum commented May 30, 2023

I don't like how the AST builder works right now, so I would like to bring up a couple of issues I have with it. I'm bringing them up in this issue instead of directly fixing them as it would take quite some effort to do this (the AST builder is used almost everywhere in the compiler), so I would like to make sure this is something we want first. My first point could be a bit controversial.

My main complaints with the current AST builder are that:

  1. By default, the functions do not take an info field argument, but they set it to NoInfo (). This approach is acceptable when generating tests, but I don't think it is OK when generating code in a compiler pass. Even when the generated code has no direct connection to the input AST, I think it should have an info field stating where it originates (using the filename field). Otherwise, the code we generate will be very difficult to debug. Therefore, the default functions, such as var_, should take an info field as the first argument.
  2. The functions are implemented as let-bindings, meaning they are verbose (use MExprAst in ...) and risk being shadowed by later definitions. I think they should be implemented as sems in language fragments.
@elegios
Copy link
Contributor

elegios commented May 31, 2023

I agree fairly strongly with the first point, and I've been lightly considering having some term that expands to an Info literal containing the location of the term itself, so you could do things like lam_ HERE ident ty body, where HERE would expand to something like Info{filename="example.mc", row1=1, col1=6, row2=1, col2=9}. There are preprocessor macros for this in C, but we could just have it as a normal term. A potential side-effect is that it could leak the path to stdlib for the person compiling the compiler.

(What the syntax would be is of course up for discussion, HERE is just a placeholder to demonstrate the idea)

@elegios
Copy link
Contributor

elegios commented Jun 1, 2023

Summary of thoughts during the meeting today:

  • Having NoInfo on a term causes bad debuggability, and there should always be something we can put there, even if it's just the Info of the code that generated the term.
  • We need easy ways to generate terms. Explicitly writing the constructors is quite verbose. Long term we'd want quoting and splicing, but we're not there yet. Right now we have two options:
    • Creating a string and parsing it using boot-parser. This doesn't handle splicing, and doesn't have an easy way to attach symbols to contained identifiers.
    • Using ast-builder.mc. These functions use NoInfo by default, and you can use withInfo to attach Info. Unfortunately that makes it significantly more unwieldy to "do the right thing" and attach Info to all terms.

Approaches to improve the status quo, at varying levels of difficulty and breakage:

  • Keep ast-builder.mc as is, add extra versions of all functions that take Info values. ilam_, iulam_, inulam_, etc. The non i-functions would essentially be deprecated.
  • Change all functions in ast-builder.mc to always take an Info value first. This is breaking, but fixing things might be automatable:
    1. Add something like HERE in my previous comment that is parsed by boot, then immediately replaced by Info{...}.
    2. Rewrite all uses of such functions to add HERE as an argument, i.e., ulam_ becomes ulam_ HERE. Most likely this would be done in two steps (using ulam_ as a placeholder for all ast-builder.mc functions):
      1. Use Comby to replace occurrences of f ulam_ with f (ulam_).
      2. Then rewrite all ulam_ to ulam_ HERE.
  • Once we have a parser for MExpr in MCore (which should "just" be a matter of connecting tool.mc and lrk.mc, plus transformations to the normal ast.mc, which is purely mechanical) we can parse strings with some added syntax for splicing terms.

There are more combinations of the above ideas possible, and the syntax HERE is just a placeholder. Other syntax suggestions:

  • @ as a term.
  • @ as a syntactic sugar operator, where @foo x would mean (foo (Info {...})) x. This would make automatic updates slightly easier since we wouldn't have to care about precedence of function application, and it would mean that the Info value could point at the function as well. On the other hand it's somewhat niche, and means that if you just want the Info value you have to say @identity. Postfix is of course also an option, i.e., foo@ x would mean (foo (Info {...})) x.
  • #here as a term, or some other word, e.g., #info.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants