Skip to content

Links Desugaring Pipeline

Jan Stolarek edited this page Feb 18, 2019 · 10 revisions

Links compilation pipeline

Desugaring pipeline defined in frontend.ml. There are separate pipelines for compilation and REPL.

Compilation pipeline is invoked from Loader.read_file_source and from bin/links.ml via bin/driver.ml. At a high level the invocation starts with

let sugar, pos_context = ModuleUtils.try_parse_file filename`

try_parse_file calls Parse.read (via Parse.parse_file), which returns a tuple of type 'result * SourceCode.source_code. Parse.read directly invokes the Parser.file, which is an entry point for the parsing of source files. 'result of parsing a file is of type (Sugartypes.binding list * Sugartypes.phrase option), which is a list of declarations in a file paired with optional expression. SourceCode.source_code is an object for handling source file (extracting line ranges, etc.).

Output of a compilation pipeline is of type ((Sugartypes.program * Types.datatype * Types.typing_environment) * string list). Importantly, Sugartypes.program is the desugared program and string list is a list of FFI files.

Program desugaring pipeline - early steps

Program compilation pipeline is defined as a function that takes tyenv, pos_context and program. pos_context is the same SourceCode.source_code object that was returned by parsing a file; program is a tuple of declarations and an optional phrase.

  1. Resolve positions by replacing (start, finish, None) with (start, finish, Some pos_context):

    (ResolvePositions.resolve_positions pos_context)#program program

  2. Desugar alien blocks:

    DesugarAlienBlocks.transform_alien_blocks program

  3. Desugar modules (optional, only if modules enabled). Returns a program with desugared modules paired with a list of filenames for FFI bindings (filenames stored in Foreign nodes). Note: DesugarModules.flatten_simple and DesugarAlienBlocks.flatten_simple are identical, but flatten_bindings are not.

  4. Check correctness of XML quasi quotes:

    CheckXmlQuasiquotes.checker#program program

    Raises an error if something is wrong. If everything correct, the result is discarded.

  5. Check correctness of settings for session exceptions, handlers, and relational lenses:

    DesugarSessionExceptions.settings_check program ExperimentalExtensions.check#program

Program desugaring pipeline - main phases

Main desugaring pipeline is a chained sequence of calls to desugar modules:

((( (* Early source transformations on the AST *)
     ExperimentalExtensions.check#program
 ->- before_typing_ext session_exceptions DesugarSessionExceptions.wrap_linear_handlers
 ->- DesugarHandlers.desugar_handlers_early#program
 ->- DesugarLAttributes.desugar_lattributes#program
 ->- RefineBindings.refine_bindings#program
 ->- DesugarDatatypes.program tyenv.Types.tycon_env

     (* Typechecking the AST *)
 ->- TypeSugar.Check.program tyenv

     (* Typability preserving late source transformations *)
 ->- after_typing ((DesugarCP.desugar_cp tyenv)#program ->- snd3)
 ->- after_typing ((DesugarInners.desugar_inners tyenv)#program ->- snd3)
 ->- after_typing_ext session_exceptions ((DesugarSessionExceptions.insert_toplevel_handlers tyenv)#program ->- snd3)
 ->- after_typing_ext session_exceptions ((DesugarSessionExceptions.desugar_session_exceptions tyenv)#program ->- snd3)
 ->- after_typing ((DesugarProcesses.desugar_processes tyenv)#program ->- snd3)
 ->- after_typing ((DesugarDbs.desugar_dbs tyenv)#program ->- snd3)
 ->- after_typing ((DesugarFors.desugar_fors tyenv)#program ->- snd3)
 ->- after_typing ((DesugarRegexes.desugar_regexes tyenv)#program ->- snd3)
 ->- after_typing ((DesugarFormlets.desugar_formlets tyenv)#program ->- snd3)
 ->- after_typing ((DesugarPages.desugar_pages tyenv)#program ->- snd3)
 ->- after_typing ((DesugarFuns.desugar_funs tyenv)#program ->- snd3))
  program), ffi_files)

Meaning of combinators:

  • ->- chains calls one by one

  • snd3 returns second component of triple

  • after_typing calls a function on a first component of a triple

  • _ext runs a pass conditionally, if a given extension is enabled.

Importantly, many desugaring transformations derive from TransformSugar.transform, which defines the type of program method as:

program -> ('self_type * program * Types.datatype option)

So a single step in the late desugaring pipeline, e.g.:

after_typing ((DesugarFors.desugar_fors tyenv)#program ->- snd3)

takes a (Sugartypes.program * Types.datatype * Types.typing_environment) triple as an input and calls a desugaring on the first component (after_typing). That call returns a triple, of which we take the second component (snd3).

In a mail to Links-dev on 22/11/2018 I asked about the details of Links compilation pipeline and Daniel provided a detailed response, which I quote below:

There are two distinct notions of desugaring in Links: 1) source-to-source transformation, and 2) translation between two separate languages. To the best of my knowledge, there are three separate phases that performs source-to-source transformations, and two separate phases that translates between internal languages. A bird's eye view of the compilation pipeline might look something like (where the pseudo types in brackets denotes a pseudo signature for the respective phase):

  1. Parsing textual source into an AST [Textual source -> Sugartypes]
  2. Early source transformations on the AST [Sugartypes -> Sugartypes]
  3. Typechecking the AST [Sugartypes * type_env -> Sugartypes * type * type_env]
  4. Typability preserving late source transformations [Sugartypes * type * type_env -> Sugartypes * type * type_env]
  5. Desugaring of the AST into IR [env * Sugartypes -> Ir * name_env] 5.5. (Optional) Typability preserving source transformations (or optimisations) on IR [type_env * Ir -> Ir * type]
  6. Evaluation directly on the IR... [ Value.env * IR -> Value.t * Value.env ] (Value.env maps IR names to abstract machine Value terms)
  7. or desugaring of IR into CODE (the JavaScript intermediate language) [ Value.env * IR -> CODE * value_env ] (value_env maps IR names to JavaScript strings)

The early source-to-source transformation phase identifies and groups recursive functions, expands type aliases, fills in kind information on type variables, etc. Sometimes the phase is also exploited to perform some purely syntactic elaboration (e.g. the phrase handler h(x) { M } is elaborated into fun h(x) { handle(x()) { M } }).

Furthermore the transformations performed in the second phase are untyped, whilst the transformations performed in the fourth phase are (intended) to preserve typability, in particular, such transformations are obliged to return the type of the transformed expression which may involve a transformation on types.

The fifth phase is akin to the desugaring phase from AST to Core that you are familiar with in GHC. Although, in Links this phase performs name resolution too.

It is worth noting that some of the transformations in phase 2 and 4 are unhygienic as they depend on surface names and sometimes the particular source order of functions in prelude (the source order sensitivity might be due to the grouping hack in refineBindings.ml).

And some more information from a follow-up email:

What is the intended behaviour if a fragment of code that is being transformed is incorrect? Is it expected that an error is raised in such a case? Or are such errors ignored and caught later on?

I suppose the intention is that the type checker will catch any semantic error. The ad-hoc nature of the early transformations make it is difficult to debug errors arising from them.

Now I'm wondering how typechecking works without names being resolved first...

I suppose technically names are resolved "on-the-fly", but they are not assigned unique names. The type checker passes an environment that contains an immutable map from surface names (strings) to types (specifically the types defined in types.ml).

Would it improve matters if we had a separate phase 1.5 that resolved names?

It would be more robust to resolve names earlier, and move the few special functions in the prelude into the compiler itself. Then we could use a structured API to query the resolved names of these built-ins. It would also allow us to get rid of the type-patching hack that runs after type checking. This hack patches up the inferred types of the special functions in the prelude, for example the wild effect is dropped in the signature of concatMap such that it can be compiled to the database.

Desugaring passes

Transform alien blocks

Entry point: DesugarAlienBlocks.transform_alien_blocks program

Invariants before: Positions resolved to (start, finish, Some pos_context), where pos_context is a SourceCode.source_code object returned by parsing.

Invariants after: All AlienBlock bindings replaced with Foreign bindings.

Program is a list of bindings and an optional phrase. This pass builds a new list of bindings and keeps the phrase. A list of bindings inside an AlienBlock gets flattened to a list of Foreign binding, so:

alien javascript "test.js" {
  setTitle : (String) ~> ()
  alertBox : (String) ~> ()
}

gets replaced with:

alien javascript "test.js" setTitle : (String) ~> ()
alien javascript "test.js" alertBox : (String) ~> ()

Bindings inside Module and Blocks get flattened in the same way, but they are obviously kept inside their respective modules/blocks and not floated to the top level. flatten_simple is responsible for block flattening. As pointed out earlier the same block flattening mechanism is used in DesugarModules.

Modules (optional, if module system enabled)

Note: Skipping a more detailed description due to ongoing work on the module system, which will very likely change the behaviour of current desugaring.

Entry point: DesugarModules.desugarModules prog_with_deps

Invariants before:

Invariants after: All Module and QualifiedImport bindings are removed. There are definitely some extra invariants here (c.f. resolve function).

Wrap linear handler (optional, if session exceptions enabled)

Entry point: DesugarSessionExceptions.wrap_linear_handlers

Invariants before:

Invariants after: TryInOtherwise phrases wrapped inside a switch statement.

This transformation replaces try L as x in M otherwise N with:

switch (try L as x in Just(x) otherwise Nothing) {
  case Just(x) -> M
  case Nothing -> N
}

There's a detailed comment in the source code describing the rationale behind this transformation, but I don't understand it.

Handlers

Entry point: DesugarHandlers.desugar_handlers_early#program

Invariants before:

Invariants after:

XML "l:" attributes

Entry point: DesugarLAttributes.desugar_lattributes#program

Invariants before: attributes starting with "l:" appear only as "l:action", "l:name" or "l:on*" attributes in XML forms or as "l:href" in link () XML tags.

At the beginning of the source file there is a comment that says "l:href" and "l:action" attributes should be disallowed in client-side XML, but I don't see any code that would enforce that in any way or that would cause errors because of this. Desugared attribute functions receive Server location though.

Invariants after: No Xml phrasenodes contain attributes with names that start with "l:". This is enforced in replace_lattrs` function that runs a check after desugaring and raises an error if it finds any attribute with name starting with "l:".

I don't really understand the details of what the transformation generates, since I'm not familiar with Links web programming model.

Question: Where does the "l:" prefix come from? Is this a special convention introduced in Links?

Note: discarding of source positions is definitely too eager in desugar_form and desugar_lnames. We traverse XML children nodes to modify the attributes. These new attributes should have dummy positions, but the XML children nodes should maintain their existing positions. Reported as #480.

Refine bindings

Note: from a quick glimpse it seems that this is responsible for identifying groups of mutually recursive bindings. Since there is an ongoing work on this (#457) I'm skipping this module for now since it is likely to change.

Entry point: RefineBindings.refine_bindings#program

Invariants before: Funs are absent in the AST. They are not generated by the parser but introduced in this step.

Invariants after:

Datatypes

Note: Daniel mentioned that he's working on changing this module as part of work on the module system. Skipping for now.

Entry point: DesugarDatatypes.program tyenv.Types.tycon_env

Invariants before:

Invariants after:

Typecheck

Entry point: TypeSugar.Check.program tyenv

Invariants before: Links Prelude is loaded and typechecked, with the results stored in tyenv.

Invariants after: (Presumed) Program should be semantically correct.

Typechecking algorithm takes type environment (contains Prelude), bindings and body. Bindings are typechecked and results are contained in a new environment tyenv'. Body is typechecked in an extended environment (tyenv + tyenv'). Result of typechecking is (bindings, Some body), typ, tyenv'. Observation: we only return tyenv', i.e. environment generated by typechecking bindings. This does NOT contain original environment passed to the typechecking function. In other words, typing environments don't accumulate.

CPs

Entry point: (DesugarCP.desugar_cp tyenv)#program

Invariants before:

Invariants after: CP nodes are absent from the AST, mostly replaced with a block containing a binding (or a group of bindings) and an expression.

Inners

Entry point: (DesugarInners.desugar_inners tyenv)#program

Invariants before: binders in Funs declarations are typed (second binder component Types.datatype option is Some); (outer, extras) field exists (see below).

Invariants after: extras in Funs declarations are None (see below). Also, extras are unbound in the resulting environment because "any existing functions with the same name will now be shadowed by the functions defined in this binding - and hence will not need any extra type variables adding".

Comment in the source code says:

Recursive functions must be used monomorphically inside their function bodies (unless annotated with a polymorphic type), but they may still be used polymorphically outside (after having been generalised). This pass unifies the inner and outer types of functions, which is necessary for converting to the IR, which stores a single type for each recursive function.

Traverses TAppl, named InfixAppl and UnaryAppl, Spawn, Escape and Block phrasenodes but doesn't modify them, extends only the typing environment. Also traverses groups of function bindings (Funs). Fun bindings are ignored - presumably because we are only interested in recursive bindings (does this mean Fun binding cannot be recursive?).

Note: I don't really understand the meaning of arguments to Funs constructor, in particular the (Types.datatype * Types.quantifier option list) option bit, which is referred to as Some (outer, extras). What are outer and extras?

Insert toplevel handlers

Entry point: (DesugarSessionExceptions.insert_toplevel_handlers tyenv)#program

Invariants before:

Invariants after: bodies of Spawn pharsenodes (except for Wait) are enclosed inside a TryOrOtherwise. TODO: write down the exact nature of the transformation.

Session exceptions

Entry point: (DesugarSessionExceptions.desugar_session_exceptions tyenv)#program

Invariants before:

Invariants after: Raise and TryInOtherwise phrasenodes are removed from AST. Raise is replaced with DoOperation, TryInOtherwise with a constructed deep Handle.

Processes

Entry point: (DesugarProcesses.desugar_processes tyenv)#program

Invariants before:

Invariants after: Spawn phrasenodes are replaced with calls to builtin functions spawnWait/spawnAt/spawnAngelAt. Receive phrasenodes are replaced with Switch phrasenodes.

Databases

Entry point: (DesugarDbs.desugar_dbs tyenv)#program

Invariants before:

Invariants after: DBInsert phrasenodes are absent from the AST (replaced with calls to built-in functions).

This module looks like a leftover from some earlier design. According to the comment desugaring of delete statements has been moved to the IR and the same should happen for insert statements, which would make this module obsolete.

NOTE: labels field is ignored.

Fors

Entry point: (DesugarFors.desugar_fors tyenv)#program

Invariants before:

Invariants after: Iteration phrasenode and all iterpatt (comprehension generators) are absent from the AST.

Desugars for comprehensions into primitives (for -> concatMap, order by -> sortBy, where -> if then else).

Regular expressions

Entry point: (DesugarRegexes.desugar_regexes tyenv)#program

Invariants before: Regex phrasenodes appear only as the right argument of RegexMatch binop. And the other way around: if anything else than a Regex appear as a rhs of RegexMatch an exception is raised.

Invariants after: No Regex phrasenodes and no RegexMatch binops are present in the tree, which implies that expressions of regex AST are erased completely. All Regexes are converted to phrases. Only variables appear in regexes. All spliced expressions are replaced with let-bindings placed in a block with the regex they are used in.

Formlets

Entry point: (DesugarFormlets.desugar_formlets tyenv)#program

Invariants before: Formlet body contains only: FormBinding, Xml, TextNode or Block phrases.

Invariants after: No Formlet phrasenodes in AST. TextNode and FormBinding that were located inside Formlet body are eliminated as well.

Pages

Entry point: (DesugarPages.desugar_pages tyenv)#program

Invariants before: Page contains only: FormletPlacement, PagePlacement, Xml, TextNode or Block phrases.

Invariants after: No Page phrasenodes in AST. FormletPlacement and PagePlacement placed directly under Page are also eliminated. I wonder whether the intention to eliminate FormletPlacement and PagePlacement phrasenodes completely. Currently this is not enforced, as hinted by a TODO note. Firstly, such nodes might appear outside of Page phrasenode since primary_expression production in the parser admits XML expressions. Secondly, they might appear somewhere in a Block expression. Note that Block expressions can only appear under a Page expressions if they have been introduced by desugaring - they cannot be created by the parser.

Functions

Entry point: (DesugarFuns.desugar_funs tyenv)#program

Invariants before:

Invariants after: Section (Project e) are eliminated by turning them into function bindings enclosed inside a block:

  (.l)
-->
  { fun f(x:r) {x.l}; f }

Note that this is different from what the source code comment in DesugarFuns says. FunLit (anonymous functions) are named, uncurried, and eliminated in the same way (again, comment says something else):

  fun [qs](xs1)...(xsk) {e}
-->
  {fun f[qs](xs1) {
     fun f1[](xs2) {
     ...
       fun fk[](xsk) {
         e
       }; fk
     ...
     }; f1
  }; f}

Function bindings (Fun, Funs) are uncurried in the same way.