-
Notifications
You must be signed in to change notification settings - Fork 43
Links Desugaring Pipeline
Desugaring pipeline defined in frontend.ml
. There are separate pipelines for
compilation and REPL.
Compilation pipeline is invoked from Loader.read_file_source
and from
bin/links.ml
via bin/driver.ml
. At a high level the invocation starts with
let sugar, pos_context = ModuleUtils.try_parse_file filename`
try_parse_file
calls Parse.read
(via Parse.parse_file
), which returns a
tuple of type 'result * SourceCode.source_code
. Parse.read
directly invokes
the Parser.file
, which is an entry point for the parsing of source
files. 'result
of parsing a file is of type (Sugartypes.binding list * Sugartypes.phrase option)
, which is a list of declarations in a file paired
with optional expression. SourceCode.source_code
is an object for handling
source file (extracting line ranges, etc.).
Output of a compilation pipeline is of type ((Sugartypes.program * Types.datatype * Types.typing_environment) * string list)
. Importantly,
Sugartypes.program
is the desugared program and string list
is a list of FFI
files.
Program compilation pipeline is defined as a function that takes tyenv
,
pos_context
and program
. pos_context
is the same SourceCode.source_code
object that was returned by parsing a file; program
is a tuple of declarations
and an optional phrase.
-
Resolve positions by replacing
(start, finish, None)
with(start, finish, Some pos_context)
:(ResolvePositions.resolve_positions pos_context)#program program
-
Desugar alien blocks:
DesugarAlienBlocks.transform_alien_blocks program
-
Desugar modules (optional, only if modules enabled). Returns a program with desugared modules paired with a list of filenames for FFI bindings (filenames stored in
Foreign
nodes). Note:DesugarModules.flatten_simple
andDesugarAlienBlocks.flatten_simple
are identical, butflatten_bindings
are not. -
Check correctness of XML quasi quotes:
CheckXmlQuasiquotes.checker#program program
Raises an error if something is wrong. If everything correct, the result is discarded.
-
Check correctness of settings for session exceptions, handlers, and relational lenses:
DesugarSessionExceptions.settings_check program
ExperimentalExtensions.check#program
Main desugaring pipeline is a chained sequence of calls to desugar modules:
((( (* Early source transformations on the AST *)
ExperimentalExtensions.check#program
->- before_typing_ext session_exceptions DesugarSessionExceptions.wrap_linear_handlers
->- DesugarHandlers.desugar_handlers_early#program
->- DesugarLAttributes.desugar_lattributes#program
->- RefineBindings.refine_bindings#program
->- DesugarDatatypes.program tyenv.Types.tycon_env
(* Typechecking the AST *)
->- TypeSugar.Check.program tyenv
(* Typability preserving late source transformations *)
->- after_typing ((DesugarCP.desugar_cp tyenv)#program ->- snd3)
->- after_typing ((DesugarInners.desugar_inners tyenv)#program ->- snd3)
->- after_typing_ext session_exceptions ((DesugarSessionExceptions.insert_toplevel_handlers tyenv)#program ->- snd3)
->- after_typing_ext session_exceptions ((DesugarSessionExceptions.desugar_session_exceptions tyenv)#program ->- snd3)
->- after_typing ((DesugarProcesses.desugar_processes tyenv)#program ->- snd3)
->- after_typing ((DesugarDbs.desugar_dbs tyenv)#program ->- snd3)
->- after_typing ((DesugarFors.desugar_fors tyenv)#program ->- snd3)
->- after_typing ((DesugarRegexes.desugar_regexes tyenv)#program ->- snd3)
->- after_typing ((DesugarFormlets.desugar_formlets tyenv)#program ->- snd3)
->- after_typing ((DesugarPages.desugar_pages tyenv)#program ->- snd3)
->- after_typing ((DesugarFuns.desugar_funs tyenv)#program ->- snd3))
program), ffi_files)
Meaning of combinators:
-
->-
chains calls one by one -
snd3
returns second component of triple -
after_typing
calls a function on a first component of a triple -
_ext
runs a pass conditionally, if a given extension is enabled.
Importantly, many desugaring transformations derive from
TransformSugar.transform
, which defines the type of program
method as:
program -> ('self_type * program * Types.datatype option)
So a single step in the late desugaring pipeline, e.g.:
after_typing ((DesugarFors.desugar_fors tyenv)#program ->- snd3)
takes a (Sugartypes.program * Types.datatype * Types.typing_environment)
triple as an input and calls a desugaring on the first component
(after_typing
). That call returns a triple, of which we take the second
component (snd3
).
In a mail to Links-dev on 22/11/2018 I asked about the details of Links compilation pipeline and Daniel provided a detailed response, which I quote below:
There are two distinct notions of desugaring in Links: 1) source-to-source transformation, and 2) translation between two separate languages. To the best of my knowledge, there are three separate phases that performs source-to-source transformations, and two separate phases that translates between internal languages. A bird's eye view of the compilation pipeline might look something like (where the pseudo types in brackets denotes a pseudo signature for the respective phase):
- Parsing textual source into an AST [Textual source -> Sugartypes]
- Early source transformations on the AST [Sugartypes -> Sugartypes]
- Typechecking the AST [Sugartypes * type_env -> Sugartypes * type * type_env]
- Typability preserving late source transformations [Sugartypes * type * type_env -> Sugartypes * type * type_env]
- Desugaring of the AST into IR [env * Sugartypes -> Ir * name_env] 5.5. (Optional) Typability preserving source transformations (or optimisations) on IR [type_env * Ir -> Ir * type]
- Evaluation directly on the IR... [ Value.env * IR -> Value.t * Value.env ] (Value.env maps IR names to abstract machine Value terms)
- or desugaring of IR into CODE (the JavaScript intermediate language) [ Value.env * IR -> CODE * value_env ] (value_env maps IR names to JavaScript strings)
The early source-to-source transformation phase identifies and groups recursive functions, expands type aliases, fills in kind information on type variables, etc. Sometimes the phase is also exploited to perform some purely syntactic elaboration (e.g. the phrase
handler h(x) { M }
is elaborated intofun h(x) { handle(x()) { M } }
).Furthermore the transformations performed in the second phase are untyped, whilst the transformations performed in the fourth phase are (intended) to preserve typability, in particular, such transformations are obliged to return the type of the transformed expression which may involve a transformation on types.
The fifth phase is akin to the desugaring phase from AST to Core that you are familiar with in GHC. Although, in Links this phase performs name resolution too.
It is worth noting that some of the transformations in phase 2 and 4 are unhygienic as they depend on surface names and sometimes the particular source order of functions in prelude (the source order sensitivity might be due to the grouping hack in refineBindings.ml).
And some more information from a follow-up email:
What is the intended behaviour if a fragment of code that is being transformed is incorrect? Is it expected that an error is raised in such a case? Or are such errors ignored and caught later on?
I suppose the intention is that the type checker will catch any semantic error. The ad-hoc nature of the early transformations make it is difficult to debug errors arising from them.
Now I'm wondering how typechecking works without names being resolved first...
I suppose technically names are resolved "on-the-fly", but they are not assigned unique names. The type checker passes an environment that contains an immutable map from surface names (strings) to types (specifically the types defined in types.ml).
Would it improve matters if we had a separate phase 1.5 that resolved names?
It would be more robust to resolve names earlier, and move the few special functions in the prelude into the compiler itself. Then we could use a structured API to query the resolved names of these built-ins. It would also allow us to get rid of the type-patching hack that runs after type checking. This hack patches up the inferred types of the special functions in the prelude, for example the wild effect is dropped in the signature of
concatMap
such that it can be compiled to the database.
Entry point: DesugarAlienBlocks.transform_alien_blocks program
Invariants before: Positions resolved to (start, finish, Some pos_context)
, where pos_context
is a SourceCode.source_code
object returned by parsing.
Invariants after: All AlienBlock
bindings replaced with Foreign
bindings.
Program is a list of bindings and an optional phrase. This pass builds a new
list of bindings and keeps the phrase. A list of bindings inside an
AlienBlock
gets flattened to a list of Foreign
binding, so:
alien javascript "test.js" {
setTitle : (String) ~> ()
alertBox : (String) ~> ()
}
gets replaced with:
alien javascript "test.js" setTitle : (String) ~> ()
alien javascript "test.js" alertBox : (String) ~> ()
Bindings inside Module
and Block
s get flattened in the same way, but they
are obviously kept inside their respective modules/blocks and not floated to the
top level. flatten_simple
is responsible for block flattening. As pointed out
earlier the same block flattening mechanism is used in DesugarModules
.
Note: Skipping a more detailed description due to ongoing work on the module system, which will very likely change the behaviour of current desugaring.
Entry point: DesugarModules.desugarModules prog_with_deps
Invariants before:
Invariants after: All Module
and QualifiedImport
bindings are removed.
There are definitely some extra invariants here (c.f. resolve
function).
Entry point: DesugarSessionExceptions.wrap_linear_handlers
Invariants before:
Invariants after: TryInOtherwise
phrases wrapped inside a switch
statement.
This transformation replaces try L as x in M otherwise N
with:
switch (try L as x in Just(x) otherwise Nothing) {
case Just(x) -> M
case Nothing -> N
}
There's a detailed comment in the source code describing the rationale behind this transformation, but I don't understand it.
Entry point: DesugarHandlers.desugar_handlers_early#program
Invariants before:
Invariants after:
Entry point: DesugarLAttributes.desugar_lattributes#program
Invariants before: attributes starting with "l:" appear only as "l:action", "l:name" or "l:on*" attributes in XML forms or as "l:href" in link () XML tags.
At the beginning of the source file there is a comment that says "l:href" and "l:action" attributes should be disallowed in client-side XML, but I don't see any code that would enforce that in any way or that would cause errors because of this. Desugared attribute functions receive Server location though.
Invariants after: No Xml phrasenodes contain attributes with names that start with "l:". This is enforced in
replace_lattrs` function that runs a
check after desugaring and raises an error if it finds any attribute with name
starting with "l:".
I don't really understand the details of what the transformation generates, since I'm not familiar with Links web programming model.
Question: Where does the "l:" prefix come from? Is this a special convention introduced in Links?
Note: discarding of source positions is definitely too eager in desugar_form
and desugar_lnames
. We traverse XML children nodes to modify the attributes.
These new attributes should have dummy positions, but the XML children nodes
should maintain their existing positions. Reported as #480.
Note: from a quick glimpse it seems that this is responsible for identifying groups of mutually recursive bindings. Since there is an ongoing work on this (#457) I'm skipping this module for now since it is likely to change.
Entry point: RefineBindings.refine_bindings#program
Invariants before:
Invariants after:
Note: Daniel mentioned that he's working on changing this module as part of work on the module system. Skipping for now.
Entry point: DesugarDatatypes.program tyenv.Types.tycon_env
Invariants before:
Invariants after:
Entry point: TypeSugar.Check.program tyenv
Invariants before:
Invariants after:
Entry point: (DesugarCP.desugar_cp tyenv)#program
Invariants before:
Invariants after:
Entry point: (DesugarInners.desugar_inners tyenv)#program
Invariants before:
Invariants after:
Entry point: (DesugarSessionExceptions.insert_toplevel_handlers tyenv)#program
Invariants before:
Invariants after:
Entry point: (DesugarSessionExceptions.desugar_session_exceptions tyenv)#program
Invariants before:
Invariants after:
Entry point: (DesugarProcesses.desugar_processes tyenv)#program
Invariants before:
Invariants after:
Entry point: (DesugarDbs.desugar_dbs tyenv)#program
Invariants before:
Invariants after:
Entry point: (DesugarFors.desugar_fors tyenv)#program
Invariants before:
Invariants after:
Entry point: (DesugarRegexes.desugar_regexes tyenv)#program
Invariants before: Regex
phrasenodes appear only as the right argument of
RegexMatch
binop. And the other way around: if anything else than a Regex
appear as a rhs of RegexMatch
an exception is raised.
Invariants after: No Regex
phrasenodes and no RegexMatch
binops are
present in the tree, which implies that expressions of regex
AST are erased
completely. All Regex
es are converted to phrases. Only variables appear in
regexes. All spliced expressions are replaced with let-bindings placed in a
block with the regex they are used in.
Entry point: (DesugarFormlets.desugar_formlets tyenv)#program
Invariants before: Formlet
body contains only: FormBinding
, Xml
,
TextNode
or Block
phrases.
Invariants after: No Formlet
phrasenodes in AST. TextNode
and
FormBinding
that were located inside Formlet
body are eliminated as well.
Entry point: (DesugarPages.desugar_pages tyenv)#program
Invariants before: Page
contains only: FormletPlacement
,
PagePlacement
, Xml
, TextNode
or Block
phrases.
Invariants after: No Page
phrasenodes in AST. FormletPlacement
and
PagePlacement
placed directly under Page
are also eliminated. I wonder
whether the intention to eliminate FormletPlacement
and PagePlacement
phrasenodes completely. Currently this is not enforced, as hinted by a TODO
note. Firstly, such nodes might appear outside of Page
phrasenode since
primary_expression
production in the parser admits XML expressions.
Secondly, they might appear somewhere in a Block
expression. Note that
Block
expressions can only appear under a Page
expressions if they have
been introduced by desugaring - they cannot be created by the parser.
Entry point: (DesugarFuns.desugar_funs tyenv)#program
Invariants before:
Invariants after: