-
Notifications
You must be signed in to change notification settings - Fork 43
Links Desugaring Pipeline
Desugaring pipeline defined in frontend.ml
. There are separate pipelines for
compilation and REPL.
Compilation pipeline is invoked from Loader.read_file_source
and from
bin/links.ml
via bin/driver.ml
. At a high level the invocation starts with
let sugar, pos_context = ModuleUtils.try_parse_file filename`
try_parse_file
calls Parse.read
(via Parse.parse_file
), which returns a
tuple of type 'result * SourceCode.source_code
. Parse.read
directly invokes
the Parser.file
, which is an entry point for the parsing of source
files. 'result
of parsing a file is of type (Sugartypes.binding list * Sugartypes.phrase option)
, which is a list of declarations in a file paired
with optional expression. SourceCode.source_code
is an object for handling
source file (extracting line ranges, etc.).
Output of a compilation pipeline is of type ((Sugartypes.program * Types.datatype * Types.typing_environment) * string list)
. Importantly,
Sugartypes.program
is the desugared program and string list
is a list of FFI
files.
Program compilation pipeline is defined as a function that takes tyenv
,
pos_context
and program
. pos_context
is the same SourceCode.source_code
object that was returned by parsing a file; program
is a tuple of declarations
and an optional phrase.
-
Resolve positions by replacing
(start, finish, None)
with(start, finish, Some pos_context)
:(ResolvePositions.resolve_positions pos_context)#program program
-
Desugar alien blocks:
DesugarAlienBlocks.transform_alien_blocks program
-
Desugar modules (optional, only if modules enabled). Returns a program with desugared modules paired with a list of filenames for FFI bindings (filenames stored in
Foreign
nodes). Note:DesugarModules.flatten_simple
andDesugarAlienBlocks.flatten_simple
are identical, butflatten_bindings
are not. -
Check correctness of XML quasi quotes:
CheckXmlQuasiquotes.checker#program program
Raises an error if something is wrong. If everything correct, the result is discarded.
-
Check correctness of settings for session exceptions, handlers, and relational lenses:
DesugarSessionExceptions.settings_check program
ExperimentalExtensions.check#program
Main desugaring pipeline is a chained sequence of calls to desugar modules:
((( (* Early source transformations on the AST *)
ExperimentalExtensions.check#program
->- before_typing_ext session_exceptions DesugarSessionExceptions.wrap_linear_handlers
->- DesugarHandlers.desugar_handlers_early#program
->- DesugarLAttributes.desugar_lattributes#program
->- RefineBindings.refine_bindings#program
->- DesugarDatatypes.program tyenv.Types.tycon_env
(* Typechecking the AST *)
->- TypeSugar.Check.program tyenv
(* Typability preserving late source transformations *)
->- after_typing ((DesugarCP.desugar_cp tyenv)#program ->- snd3)
->- after_typing ((DesugarInners.desugar_inners tyenv)#program ->- snd3)
->- after_typing_ext session_exceptions ((DesugarSessionExceptions.insert_toplevel_handlers tyenv)#program ->- snd3)
->- after_typing_ext session_exceptions ((DesugarSessionExceptions.desugar_session_exceptions tyenv)#program ->- snd3)
->- after_typing ((DesugarProcesses.desugar_processes tyenv)#program ->- snd3)
->- after_typing ((DesugarDbs.desugar_dbs tyenv)#program ->- snd3)
->- after_typing ((DesugarFors.desugar_fors tyenv)#program ->- snd3)
->- after_typing ((DesugarRegexes.desugar_regexes tyenv)#program ->- snd3)
->- after_typing ((DesugarFormlets.desugar_formlets tyenv)#program ->- snd3)
->- after_typing ((DesugarPages.desugar_pages tyenv)#program ->- snd3)
->- after_typing ((DesugarFuns.desugar_funs tyenv)#program ->- snd3))
program), ffi_files)
Meaning of combinators:
-
->-
chains calls one by one -
snd3
returns second component of triple -
after_typing
calls a function on a first component of a triple -
_ext
runs a pass conditionally, if a given extension is enabled.
Importantly, many desugaring transformations derive from
TransformSugar.transform
, which defines the type of program
method as:
program -> ('self_type * program * Types.datatype option)
So a single step in the late desugaring pipeline, e.g.:
after_typing ((DesugarFors.desugar_fors tyenv)#program ->- snd3)
takes a (Sugartypes.program * Types.datatype * Types.typing_environment)
triple as an input and calls a desugaring on the first component
(after_typing
). That call returns a triple, of which we take the second
component (snd3
).
In a mail to Links-dev on 22/11/2018 I asked about the details of Links compilation pipeline and Daniel provided a detailed response, which I quote below:
There are two distinct notions of desugaring in Links: 1) source-to-source transformation, and 2) translation between two separate languages. To the best of my knowledge, there are three separate phases that performs source-to-source transformations, and two separate phases that translates between internal languages. A bird's eye view of the compilation pipeline might look something like (where the pseudo types in brackets denotes a pseudo signature for the respective phase):
- Parsing textual source into an AST [Textual source -> Sugartypes]
- Early source transformations on the AST [Sugartypes -> Sugartypes]
- Typechecking the AST [Sugartypes * type_env -> Sugartypes * type * type_env]
- Typability preserving late source transformations [Sugartypes * type * type_env -> Sugartypes * type * type_env]
- Desugaring of the AST into IR [env * Sugartypes -> Ir * name_env] 5.5. (Optional) Typability preserving source transformations (or optimisations) on IR [type_env * Ir -> Ir * type]
- Evaluation directly on the IR... [ Value.env * IR -> Value.t * Value.env ] (Value.env maps IR names to abstract machine Value terms)
- or desugaring of IR into CODE (the JavaScript intermediate language) [ Value.env * IR -> CODE * value_env ] (value_env maps IR names to JavaScript strings)
The early source-to-source transformation phase identifies and groups recursive functions, expands type aliases, fills in kind information on type variables, etc. Sometimes the phase is also exploited to perform some purely syntactic elaboration (e.g. the phrase
handler h(x) { M }
is elaborated intofun h(x) { handle(x()) { M } }
).Furthermore the transformations performed in the second phase are untyped, whilst the transformations performed in the fourth phase are (intended) to preserve typability, in particular, such transformations are obliged to return the type of the transformed expression which may involve a transformation on types.
The fifth phase is akin to the desugaring phase from AST to Core that you are familiar with in GHC. Although, in Links this phase performs name resolution too.
It is worth noting that some of the transformations in phase 2 and 4 are unhygienic as they depend on surface names and sometimes the particular source order of functions in prelude (the source order sensitivity might be due to the grouping hack in refineBindings.ml).
And some more information from a follow-up email:
What is the intended behaviour if a fragment of code that is being transformed is incorrect? Is it expected that an error is raised in such a case? Or are such errors ignored and caught later on?
I suppose the intention is that the type checker will catch any semantic error. The ad-hoc nature of the early transformations make it is difficult to debug errors arising from them.
Now I'm wondering how typechecking works without names being resolved first...
I suppose technically names are resolved "on-the-fly", but they are not assigned unique names. The type checker passes an environment that contains an immutable map from surface names (strings) to types (specifically the types defined in types.ml).
Would it improve matters if we had a separate phase 1.5 that resolved names?
It would be more robust to resolve names earlier, and move the few special functions in the prelude into the compiler itself. Then we could use a structured API to query the resolved names of these built-ins. It would also allow us to get rid of the type-patching hack that runs after type checking. This hack patches up the inferred types of the special functions in the prelude, for example the wild effect is dropped in the signature of
concatMap
such that it can be compiled to the database.
Entry point: DesugarAlienBlocks.transform_alien_blocks program
Invariants before: Positions resolved to (start, finish, Some pos_context)
, where pos_context
is a SourceCode.source_code
object returned by parsing.
Invariants after: All AlienBlock
bindings replaced with Foreign
bindings.
Program is a list of bindings and an optional phrase. This pass builds a new
list of bindings and keeps the phrase. A list of bindings inside an
AlienBlock
gets flattened to a list of Foreign
binding, so:
alien javascript "test.js" {
setTitle : (String) ~> ()
alertBox : (String) ~> ()
}
gets replaced with:
alien javascript "test.js" setTitle : (String) ~> ()
alien javascript "test.js" alertBox : (String) ~> ()
Bindings inside Module
and Block
s get flattened in the same way, but they
are obviously kept inside their respective modules/blocks and not floated to the
top level. flatten_simple
is responsible for block flattening. As pointed out
earlier the same block flattening mechanism is used in DesugarModules
.
Note: Skipping a more detailed description due to ongoing work on the module system, which will very likely change the behaviour of current desugaring.
Entry point: DesugarModules.desugarModules prog_with_deps
Invariants before:
Invariants after: All Module
and QualifiedImport
bindings are removed.
There are definitely some extra invariants here (c.f. resolve
function).
Entry point: DesugarSessionExceptions.wrap_linear_handlers
Invariants before:
Invariants after: TryInOtherwise
phrases wrapped inside a switch
statement.
This transformation replaces try L as x in M otherwise N
with:
switch (try L as x in Just(x) otherwise Nothing) {
case Just(x) -> M
case Nothing -> N
}
There's a detailed comment in the source code describing the rationale behind this transformation, but I don't understand it.
Entry point: DesugarHandlers.desugar_handlers_early#program
Invariants before:
Invariants after:
Entry point: DesugarLAttributes.desugar_lattributes#program
Invariants before: attributes starting with "l:" appear only as "l:action", "l:name" or "l:on*" attributes in XML forms or as "l:href" in link () XML tags.
At the beginning of the source file there is a comment that says "l:href" and "l:action" attributes should be disallowed in client-side XML, but I don't see any code that would enforce that in any way or that would cause errors because of this. Desugared attribute functions receive Server location though.
Invariants after: No Xml phrasenodes contain attributes with names that start with "l:". This is enforced in
replace_lattrs` function that runs a
check after desugaring and raises an error if it finds any attribute with name
starting with "l:".
I don't really understand the details of what the transformation generates, since I'm not familiar with Links web programming model.
Question: Where does the "l:" prefix come from? Is this a special convention introduced in Links?
Note: discarding of source positions is definitely too eager in desugar_form
and desugar_lnames
. We traverse XML children nodes to modify the attributes.
These new attributes should have dummy positions, but the XML children nodes
should maintain their existing positions. Reported as #480.
Note: from a quick glimpse it seems that this is responsible for identifying groups of mutually recursive bindings. Since there is an ongoing work on this (#457) I'm skipping this module for now since it is likely to change.
Entry point: RefineBindings.refine_bindings#program
Invariants before: Funs
are absent in the AST. They are not generated by
the parser but introduced in this step.
Invariants after:
Note: Daniel mentioned that he's working on changing this module as part of work on the module system. Skipping for now.
Entry point: DesugarDatatypes.program tyenv.Types.tycon_env
Invariants before:
Invariants after:
Entry point: TypeSugar.Check.program tyenv
Invariants before: Links Prelude is loaded and typechecked, with the results
stored in tyenv
.
Invariants after: (Presumed) Program should be semantically correct.
Typechecking algorithm takes type environment (contains Prelude), bindings and
body. Bindings are typechecked and results are contained in a new environment
tyenv'
. Body is typechecked in an extended environment (tyenv + tyenv').
Result of typechecking is (bindings, Some body), typ, tyenv'
. Observation: we
only return tyenv'
, i.e. environment generated by typechecking bindings. This
does NOT contain original environment passed to the typechecking function. In
other words, typing environments don't accumulate.
Entry point: (DesugarCP.desugar_cp tyenv)#program
Invariants before:
Invariants after: CP
nodes are absent from the AST, mostly replaced with
a block containing a binding (or a group of bindings) and an expression.
Entry point: (DesugarInners.desugar_inners tyenv)#program
Invariants before: binder
s in Funs
declarations are typed (second binder
component Types.datatype option
is Some
); (outer, extras)
field exists
(see below).
Invariants after: extras
in Funs
declarations are None
(see below).
Also, extras are unbound in the resulting environment because "any existing
functions with the same name will now be shadowed by the functions defined in
this binding - and hence will not need any extra type variables adding".
Comment in the source code says:
Recursive functions must be used monomorphically inside their function bodies (unless annotated with a polymorphic type), but they may still be used polymorphically outside (after having been generalised). This pass unifies the inner and outer types of functions, which is necessary for converting to the IR, which stores a single type for each recursive function.
Traverses TAppl
, named InfixAppl
and UnaryAppl
, Spawn
, Escape
and
Block
phrasenodes but doesn't modify them, extends only the typing
environment. Also traverses groups of function bindings (Funs
). Fun
bindings are ignored - presumably because we are only interested in recursive
bindings (does this mean Fun
binding cannot be recursive?).
Note: I don't really understand the meaning of arguments to Funs
constructor,
in particular the (Types.datatype * Types.quantifier option list) option
bit,
which is referred to as Some (outer, extras)
. What are outer
and extras
?
Entry point: (DesugarSessionExceptions.insert_toplevel_handlers tyenv)#program
Invariants before:
Invariants after: bodies of Spawn
pharsenodes (except for Wait
) are
enclosed inside a TryOrOtherwise
. TODO: write down the exact nature of the
transformation.
Entry point: (DesugarSessionExceptions.desugar_session_exceptions tyenv)#program
Invariants before:
Invariants after: Raise
and TryInOtherwise
phrasenodes are removed from
AST. Raise
is replaced with DoOperation
, TryInOtherwise
with a
constructed deep Handle
.
Entry point: (DesugarProcesses.desugar_processes tyenv)#program
Invariants before:
Invariants after: Spawn
phrasenodes are replaced with calls to builtin
functions spawnWait
/spawnAt
/spawnAngelAt
. Receive
phrasenodes are
replaced with Switch
phrasenodes.
Entry point: (DesugarDbs.desugar_dbs tyenv)#program
Invariants before:
Invariants after: DBInsert
phrasenodes are absent from the AST (replaced
with calls to built-in functions).
This module looks like a leftover from some earlier design. According to the comment desugaring of delete statements has been moved to the IR and the same should happen for insert statements, which would make this module obsolete.
NOTE: labels
field is ignored.
Entry point: (DesugarFors.desugar_fors tyenv)#program
Invariants before:
Invariants after: Iteration
phrasenode and all iterpatt
(comprehension
generators) are absent from the AST.
Desugars for
comprehensions into primitives (for
-> concatMap
, order by
-> sortBy
, where
-> if then else
).
Entry point: (DesugarRegexes.desugar_regexes tyenv)#program
Invariants before: Regex
phrasenodes appear only as the right argument of
RegexMatch
binop. And the other way around: if anything else than a Regex
appear as a rhs of RegexMatch
an exception is raised.
Invariants after: No Regex
phrasenodes and no RegexMatch
binops are
present in the tree, which implies that expressions of regex
AST are erased
completely. All Regex
es are converted to phrases. Only variables appear in
regexes. All spliced expressions are replaced with let-bindings placed in a
block with the regex they are used in.
Entry point: (DesugarFormlets.desugar_formlets tyenv)#program
Invariants before: Formlet
body contains only: FormBinding
, Xml
,
TextNode
or Block
phrases.
Invariants after: No Formlet
phrasenodes in AST. TextNode
and
FormBinding
that were located inside Formlet
body are eliminated as well.
Entry point: (DesugarPages.desugar_pages tyenv)#program
Invariants before: Page
contains only: FormletPlacement
,
PagePlacement
, Xml
, TextNode
or Block
phrases.
Invariants after: No Page
phrasenodes in AST. FormletPlacement
and
PagePlacement
placed directly under Page
are also eliminated. I wonder
whether the intention to eliminate FormletPlacement
and PagePlacement
phrasenodes completely. Currently this is not enforced, as hinted by a TODO
note. Firstly, such nodes might appear outside of Page
phrasenode since
primary_expression
production in the parser admits XML expressions.
Secondly, they might appear somewhere in a Block
expression. Note that
Block
expressions can only appear under a Page
expressions if they have
been introduced by desugaring - they cannot be created by the parser.
Entry point: (DesugarFuns.desugar_funs tyenv)#program
Invariants before:
Invariants after: Section (Project e)
are eliminated by turning them into
function bindings enclosed inside a block:
(.l)
-->
{ fun f(x:r) {x.l}; f }
Note that this is different from what the source code comment in DesugarFuns
says. FunLit
(anonymous functions) are named, uncurried, and eliminated in
the same way (again, comment says something else):
fun [qs](xs1)...(xsk) {e}
-->
{fun f[qs](xs1) {
fun f1[](xs2) {
...
fun fk[](xsk) {
e
}; fk
...
}; f1
}; f}
Function bindings (Fun
, Funs
) are uncurried in the same way.