Skip to content

Links Desugaring Pipeline

Jan Stolarek edited this page Feb 15, 2019 · 10 revisions

Links compilation pipeline

Desugaring pipeline defined in frontend.ml. There are separate pipelines for compilation and REPL.

Compilation pipeline is invoked from Loader.read_file_source and from bin/links.ml via bin/driver.ml. At a high level the invocation starts with

let sugar, pos_context = ModuleUtils.try_parse_file filename`

try_parse_file calls Parse.read (via Parse.parse_file), which returns a tuple of type 'result * SourceCode.source_code. Parse.read directly invokes the Parser.file, which is an entry point for the parsing of source files. 'result of parsing a file is of type (Sugartypes.binding list * Sugartypes.phrase option), which is a list of declarations in a file paired with optional expression. SourceCode.source_code is an object for handling source file (extracting line ranges, etc.).

Output of a compilation pipeline is of type ((Sugartypes.program * Types.datatype * Types.typing_environment) * string list). Importantly, Sugartypes.program is the desugared program and string list is a list of FFI files.

Program desugaring pipeline - early steps

Program compilation pipeline is defined as a function that takes tyenv, pos_context and program. pos_context is the same SourceCode.source_code object that was returned by parsing a file; program is a tuple of declarations and an optional phrase.

  1. Resolve positions by replacing (start, finish, None) with (start, finish, Some pos_context):

    (ResolvePositions.resolve_positions pos_context)#program program

  2. Desugar alien blocks:

    DesugarAlienBlocks.transform_alien_blocks program

  3. Desugar modules (optional, only if modules enabled). Returns a program with desugared modules paired with a list of filenames for FFI bindings (filenames stored in Foreign nodes). Note: DesugarModules.flatten_simple and DesugarAlienBlocks.flatten_simple are identical, but flatten_bindings are not.

  4. Check correctness of XML quasi quotes:

    CheckXmlQuasiquotes.checker#program program

    Raises an error if something is wrong. If everything correct, the result is discarded.

  5. Check correctness of settings for session exceptions, handlers, and relational lenses:

    DesugarSessionExceptions.settings_check program ExperimentalExtensions.check#program

Program desugaring pipeline - main phases

Main desugaring pipeline is a chained sequence of calls to desugar modules:

((( (* Early source transformations on the AST *)
     ExperimentalExtensions.check#program
 ->- before_typing_ext session_exceptions DesugarSessionExceptions.wrap_linear_handlers
 ->- DesugarHandlers.desugar_handlers_early#program
 ->- DesugarLAttributes.desugar_lattributes#program
 ->- RefineBindings.refine_bindings#program
 ->- DesugarDatatypes.program tyenv.Types.tycon_env

     (* Typechecking the AST *)
 ->- TypeSugar.Check.program tyenv

     (* Typability preserving late source transformations *)
 ->- after_typing ((DesugarCP.desugar_cp tyenv)#program ->- snd3)
 ->- after_typing ((DesugarInners.desugar_inners tyenv)#program ->- snd3)
 ->- after_typing_ext session_exceptions ((DesugarSessionExceptions.insert_toplevel_handlers tyenv)#program ->- snd3)
 ->- after_typing_ext session_exceptions ((DesugarSessionExceptions.desugar_session_exceptions tyenv)#program ->- snd3)
 ->- after_typing ((DesugarProcesses.desugar_processes tyenv)#program ->- snd3)
 ->- after_typing ((DesugarDbs.desugar_dbs tyenv)#program ->- snd3)
 ->- after_typing ((DesugarFors.desugar_fors tyenv)#program ->- snd3)
 ->- after_typing ((DesugarRegexes.desugar_regexes tyenv)#program ->- snd3)
 ->- after_typing ((DesugarFormlets.desugar_formlets tyenv)#program ->- snd3)
 ->- after_typing ((DesugarPages.desugar_pages tyenv)#program ->- snd3)
 ->- after_typing ((DesugarFuns.desugar_funs tyenv)#program ->- snd3))
  program), ffi_files)

Meaning of combinators:

  • ->- chains calls one by one

  • snd3 returns second component of triple

  • after_typing calls a function on a first component of a triple

  • _ext runs a pass conditionally, if a given extension is enabled.

Importantly, many desugaring transformations derive from TransformSugar.transform, which defines the type of program method as:

program -> ('self_type * program * Types.datatype option)

So a single step in the late desugaring pipeline, e.g.:

after_typing ((DesugarFors.desugar_fors tyenv)#program ->- snd3)

takes a (Sugartypes.program * Types.datatype * Types.typing_environment) triple as an input and calls a desugaring on the first component (after_typing). That call returns a triple, of which we take the second component (snd3).

In a mail to Links-dev on 22/11/2018 I asked about the details of Links compilation pipeline and Daniel provided a detailed response, which I quote below:

There are two distinct notions of desugaring in Links: 1) source-to-source transformation, and 2) translation between two separate languages. To the best of my knowledge, there are three separate phases that performs source-to-source transformations, and two separate phases that translates between internal languages. A bird's eye view of the compilation pipeline might look something like (where the pseudo types in brackets denotes a pseudo signature for the respective phase):

  1. Parsing textual source into an AST [Textual source -> Sugartypes]
  2. Early source transformations on the AST [Sugartypes -> Sugartypes]
  3. Typechecking the AST [Sugartypes * type_env -> Sugartypes * type * type_env]
  4. Typability preserving late source transformations [Sugartypes * type * type_env -> Sugartypes * type * type_env]
  5. Desugaring of the AST into IR [env * Sugartypes -> Ir * name_env] 5.5. (Optional) Typability preserving source transformations (or optimisations) on IR [type_env * Ir -> Ir * type]
  6. Evaluation directly on the IR... [ Value.env * IR -> Value.t * Value.env ] (Value.env maps IR names to abstract machine Value terms)
  7. or desugaring of IR into CODE (the JavaScript intermediate language) [ Value.env * IR -> CODE * value_env ] (value_env maps IR names to JavaScript strings)

The early source-to-source transformation phase identifies and groups recursive functions, expands type aliases, fills in kind information on type variables, etc. Sometimes the phase is also exploited to perform some purely syntactic elaboration (e.g. the phrase handler h(x) { M } is elaborated into fun h(x) { handle(x()) { M } }).

Furthermore the transformations performed in the second phase are untyped, whilst the transformations performed in the fourth phase are (intended) to preserve typability, in particular, such transformations are obliged to return the type of the transformed expression which may involve a transformation on types.

The fifth phase is akin to the desugaring phase from AST to Core that you are familiar with in GHC. Although, in Links this phase performs name resolution too.

It is worth noting that some of the transformations in phase 2 and 4 are unhygienic as they depend on surface names and sometimes the particular source order of functions in prelude (the source order sensitivity might be due to the grouping hack in refineBindings.ml).

And some more information from a follow-up email:

What is the intended behaviour if a fragment of code that is being transformed is incorrect? Is it expected that an error is raised in such a case? Or are such errors ignored and caught later on?

I suppose the intention is that the type checker will catch any semantic error. The ad-hoc nature of the early transformations make it is difficult to debug errors arising from them.

Now I'm wondering how typechecking works without names being resolved first...

I suppose technically names are resolved "on-the-fly", but they are not assigned unique names. The type checker passes an environment that contains an immutable map from surface names (strings) to types (specifically the types defined in types.ml).

Would it improve matters if we had a separate phase 1.5 that resolved names?

It would be more robust to resolve names earlier, and move the few special functions in the prelude into the compiler itself. Then we could use a structured API to query the resolved names of these built-ins. It would also allow us to get rid of the type-patching hack that runs after type checking. This hack patches up the inferred types of the special functions in the prelude, for example the wild effect is dropped in the signature of concatMap such that it can be compiled to the database.

Desugaring passes

Transform alien blocks

Entry point: DesugarAlienBlocks.transform_alien_blocks program

Invariants before: Positions resolved to (start, finish, Some pos_context), where pos_context is a SourceCode.source_code object returned by parsing.

Invariants after: All AlienBlock bindings replaced with Foreign bindings.

Program is a list of bindings and an optional phrase. This pass builds a new list of bindings and keeps the phrase. A list of bindings inside an AlienBlock gets flattened to a list of Foreign binding, so:

alien javascript "test.js" {
  setTitle : (String) ~> ()
  alertBox : (String) ~> ()
}

gets replaced with:

alien javascript "test.js" setTitle : (String) ~> ()
alien javascript "test.js" alertBox : (String) ~> ()

Bindings inside Module and Blocks get flattened in the same way, but they are obviously kept inside their respective modules/blocks and not floated to the top level. flatten_simple is responsible for block flattening. As pointed out earlier the same block flattening mechanism is used in DesugarModules.

Modules (optional, if module system enabled)

Note: Skipping a more detailed description due to ongoing work on the module system, which will very likely change the behaviour of current desugaring.

Entry point: DesugarModules.desugarModules prog_with_deps

Invariants before:

Invariants after: All Module and QualifiedImport bindings are removed. There are definitely some extra invariants here (c.f. resolve function).

Wrap linear handler (optional, if session exceptions enabled)

Entry point: DesugarSessionExceptions.wrap_linear_handlers

Invariants before:

Invariants after: TryInOtherwise phrases wrapped inside a switch statement.

This transformation replaces try L as x in M otherwise N with:

switch (try L as x in Just(x) otherwise Nothing) {
  case Just(x) -> M
  case Nothing -> N
}

There's a detailed comment in the source code describing the rationale behind this transformation, but I don't understand it.

Handlers

Entry point: DesugarHandlers.desugar_handlers_early#program

Invariants before:

Invariants after:

XML "l:" attributes

Entry point: DesugarLAttributes.desugar_lattributes#program

Invariants before: attributes starting with "l:" appear only as "l:action", "l:name" or "l:on*" attributes in XML forms or as "l:href" in link () XML tags.

At the beginning of the source file there is a comment that says "l:href" and "l:action" attributes should be disallowed in client-side XML, but I don't see any code that would enforce that in any way or that would cause errors because of this. Desugared attribute functions receive Server location though.

Invariants after: No Xml phrasenodes contain attributes with names that start with "l:". This is enforced in replace_lattrs` function that runs a check after desugaring and raises an error if it finds any attribute with name starting with "l:".

I don't really understand the details of what the transformation generates, since I'm not familiar with Links web programming model.

Question: Where does the "l:" prefix come from? Is this a special convention introduced in Links?

Note: discarding of source positions is definitely too eager in desugar_form and desugar_lnames. We traverse XML children nodes to modify the attributes. These new attributes should have dummy positions, but the XML children nodes should maintain their existing positions. Reported as #480.

Refine bindings

Note: from a quick glimpse it seems that this is responsible for identifying groups of mutually recursive bindings. Since there is an ongoing work on this (#457) I'm skipping this module for now since it is likely to change.

Entry point: RefineBindings.refine_bindings#program

Invariants before: Funs are absent in the AST. They are not generated by the parser but introduced in this step.

Invariants after:

Datatypes

Note: Daniel mentioned that he's working on changing this module as part of work on the module system. Skipping for now.

Entry point: DesugarDatatypes.program tyenv.Types.tycon_env

Invariants before:

Invariants after:

Typecheck (?)

Entry point: TypeSugar.Check.program tyenv

Invariants before:

Invariants after:

CPs

Entry point: (DesugarCP.desugar_cp tyenv)#program

Invariants before:

Invariants after:

Inners

Entry point: (DesugarInners.desugar_inners tyenv)#program

Invariants before:

Invariants after:

Insert toplevel handlers

Entry point: (DesugarSessionExceptions.insert_toplevel_handlers tyenv)#program

Invariants before:

Invariants after:

Session exceptions

Entry point: (DesugarSessionExceptions.desugar_session_exceptions tyenv)#program

Invariants before:

Invariants after:

Processes

Entry point: (DesugarProcesses.desugar_processes tyenv)#program

Invariants before:

Invariants after:

Databases

Entry point: (DesugarDbs.desugar_dbs tyenv)#program

Invariants before:

Invariants after:

Fors

Entry point: (DesugarFors.desugar_fors tyenv)#program

Invariants before:

Invariants after:

Regular expressions

Entry point: (DesugarRegexes.desugar_regexes tyenv)#program

Invariants before: Regex phrasenodes appear only as the right argument of RegexMatch binop. And the other way around: if anything else than a Regex appear as a rhs of RegexMatch an exception is raised.

Invariants after: No Regex phrasenodes and no RegexMatch binops are present in the tree, which implies that expressions of regex AST are erased completely. All Regexes are converted to phrases. Only variables appear in regexes. All spliced expressions are replaced with let-bindings placed in a block with the regex they are used in.

Formlets

Entry point: (DesugarFormlets.desugar_formlets tyenv)#program

Invariants before: Formlet body contains only: FormBinding, Xml, TextNode or Block phrases.

Invariants after: No Formlet phrasenodes in AST. TextNode and FormBinding that were located inside Formlet body are eliminated as well.

Pages

Entry point: (DesugarPages.desugar_pages tyenv)#program

Invariants before: Page contains only: FormletPlacement, PagePlacement, Xml, TextNode or Block phrases.

Invariants after: No Page phrasenodes in AST. FormletPlacement and PagePlacement placed directly under Page are also eliminated. I wonder whether the intention to eliminate FormletPlacement and PagePlacement phrasenodes completely. Currently this is not enforced, as hinted by a TODO note. Firstly, such nodes might appear outside of Page phrasenode since primary_expression production in the parser admits XML expressions. Secondly, they might appear somewhere in a Block expression. Note that Block expressions can only appear under a Page expressions if they have been introduced by desugaring - they cannot be created by the parser.

Functions

Entry point: (DesugarFuns.desugar_funs tyenv)#program

Invariants before:

Invariants after: Section (Project e) are eliminated by turning them into function bindings enclosed inside a block:

  (.l)
-->
  { fun f(x:r) {x.l}; f }

Note that this is different from what the source code comment in DesugarFuns says. FunLit (anonymous functions) are named, uncurried, and eliminated in the same way (again, comment says something else):

  fun [qs](xs1)...(xsk) {e}
-->
  {fun f[qs](xs1) {
     fun f1[](xs2) {
     ...
       fun fk[](xsk) {
         e
       }; fk
     ...
     }; f1
  }; f}
  ```

  Function bindings (`Fun`, `Funs`) are uncurried in the same way.