-
Notifications
You must be signed in to change notification settings - Fork 105
Local Module Import & Export
The importing & exporting of local modules (not to be confused with Source modules) is available in Source 2+. It is done as a preprocessing step where the Abstract Syntax Trees (ASTs) that are parsed from the various files/modules involved are transformed in such a way that outputs a single AST that handles all imports & exports.
In Source, we make the distinction between the following modules:
- Source modules, which can be thought of as libraries which can be imported from. There is a separate system for handling these in
js-slang
. The modules that are available can be found in themodules
repository. - Local modules, which are essentially user-written files to allow for multi-file Source programs.
More details on how this distinction is made can be found below.
How the AST transformation works is best described by the equivalent code.
Say we have a Source 2+ program consisting of 4 files where /a.js
is the entrypoint file (i.e., the file containing the code that is the entrypoint of the program - because evaluation of any program needs to have a defined starting point):
-
/a.js
import { a as x, b as y } from "./b.js"; x + y;
-
/b.js
import y, { square } from "./c.js"; const a = square(y); const b = 3; export { a, b };
-
/c.js
import { mysteryFunction } from "./d.js"; const x = mysteryFunction(5); export function square(x) { return x * x; } export default x;
-
/d.js
const addTwo = x => x + 2; export { addTwo as mysteryFunction };
The resulting AST after transformation will be equivalent to the following Source code:
function __$b$dot$js__(___$c$dot$js___) {
const y = __access_export__(___$c$dot$js___, "default");
const square = __access_export__(___$c$dot$js___, "square");
const a = square(y);
const b = 3;
return pair(null, list(pair("a", a), pair("b", b)));
}
function __$c$dot$js__(___$d$dot$js___) {
const mysteryFunction = __access_export__(___$d$dot$js___, "mysteryFunction");
const x = mysteryFunction(5);
function square(x) {
return x * x;
}
return pair(x, list(pair("square", square)));
}
function __$d$dot$js__() {
const addTwo = x => x + 2;
return pair(null, list(pair("mysteryFunction", addTwo)));
}
const ___$d$dot$js___ = __$d$dot$js__();
const ___$c$dot$js___ = __$c$dot$js__(___$d$dot$js___);
const ___$b$dot$js___ = __$b$dot$js__(___$c$dot$js___);
const x = __access_export__(___$b$dot$js___, "a");
const y = __access_export__(___$b$dot$js___, "b");
x + y;
Note however that the transformation is not done as the syntax (code) level, but at the AST level. The above example merely serves as a mental model for how imports/exports work.
The evaluation of a Source program follows the following high-level steps:
- First, the entrypoint file is parsed into an AST.
- The AST of the entrypoint file is then filtered for import declarations.
- For each import declaration corresponding to a local file/module, we parse the corresponding file into its own AST.
- Then, we recursively filter its AST for import declarations to determine which other files to parse. While this is done, we build up a directed graph of local module imports.
- If our module graph is not acyclic (meaning that there are circular imports), the program evaluation fails. This is because our mental model is sequential and cannot support cyclic imports.
- Otherwise, every AST except the one corresponding to the entrypoint file is transformed into a function whose parameters are the results of the transformed functions it imports from, and its return value is a structure containing all of its exported values.
- Finally, all of these transformed functions and their invocations are assembled into the final AST.
The key ideas behind the AST transformation are discussed in this section.
Each module has its own environment.
As such, we cannot substitute imported code into the places where they are used (i.e., no referential transparency).
Consider the following contrived example Source program where the entrypoint file is /a.js
:
-
/a.js
import { increment_and_print } from "./b.js"; increment_and_print(); increment_and_print(); increment_and_print();
-
/b.js
let x = 0; export function increment_and_print() { x = x + 1; display(x); }
In /b.js
, the variable x
is incremented through multiple calls to the increment_and_print
function.
As a result, the value of x
that is display
ed for each call is different.
To keep track of the state of the variable x
across multiple calls, we need to maintain an environment for the module corresponding to the file /b.js
.
This was achieved by wrapping each module inside a function, which creates a new environment when called.
Note that because of this, when multiple modules import a single module, we only ever call the transformed function once so that only a single environment exists for the module. This necessitates the storing of a single copy of the result of calling the transformed function (which contains the module's exported values).
The entrypoint module does not need to be transformed into a function as its environment is simply that of the resulting program.
For every imported module, we only want the exported values to be available to other modules.
This is done by having each transformed function returning a structure that contains the module's exported values.
The structure is a pair where the head element is the default export or null
, and the tail element is a list of pairs.
Each pair is a mapping where the head element is the name of the value when exported, while the tail element is the identifier that is being exported.
As an example, the following module:
// /b.js
const square = x => x * x;
const x = 5;
const y = 6;
export default x;
export { square, y as z };
would be transformed into:
function __$b$dot$js__() {
const square = x => x * x;
const x = 5;
const y = 6;
return pair(x, list(pair("square", square), pair("z", y)));
}
This has the added benefit of allowing us to rename exported names as can be seen in the example above.
While an imported module can have many exported values, it is often the case that we are only interested in a subset of them. As such, we do not want all exported values to be available in the environment of the module that is importing another module (as that would cause unnecessary name collisions in the environment).
As such, we want to only declare the imported values in the environment.
To do so, we make use of an __access_export__
function to lookup the imported value in the structure containing exported values:
// Prelude functions
function __access_named_export__(named_exports, lookup_name) {
if (is_null(named_exports)) {
return undefined;
} else {
const name = head(head(named_exports));
const identifier = tail(head(named_exports));
if (name === lookup_name) {
return identifier;
} else {
return __access_named_export__(tail(named_exports), lookup_name);
}
}
}
function __access_export__(exports, lookup_name) {
if (lookup_name === "default") {
return head(exports);
} else {
const named_exports = tail(exports);
return __access_named_export__(named_exports, lookup_name);
}
}
The __access_export__
function (along with its helper functions) are inserted as prelude functions to Source 2+ programs.
This means that these functions are injected at the start of Source 2+ programs and are available to use.
Note: Because we want students to have a mental model of imports/exports, these are inserted as prelude functions so that their implementation can be seen when using tools such as the stepper.
As an example, the following module:
import { square as sq } from "./b.js";
sq(5);
would be transformed into:
const ___$b$dot$js___ = __$b$dot$js__();
const sq = __access_export__(___$b$dot$js___, "square");
sq(5);
As can be seen in the example above, renaming imports is also supported.
We cannot use function expressions (i.e., anonymous functions) because they are banned in Source. As such, we need to make use of function declarations. This means that when we transform modules into functions, the functions must have a name.
Valid function names in Source are comprised of alphanumeric characters, the underscore (_
), as well as the dollar sign ($
).
The set of valid characters for function names is much smaller than that of file paths.
In order to prevent name collisions from occurring when we derive the names of functions from the file path of the respective modules, we need to ensure that our derivation of function name from file path is a bijective function.
This means that for every derived function name, it corresponds to a unique file path.
To do so, we restrict the set of valid characters for file paths to alphanumeric characters and characters which have an encoding defined (currently /
, .
and -
) in /src/localImports/filePaths.ts
:
/**
* Maps characters that are legal in file paths but illegal in
* function names to strings which are legal in function names.
*/
export const charEncoding: Record<string, string> = {
'/': '$',
'.': '$dot$',
'-': '$dash$'
}
For example:
transformFilePathToValidFunctionName("/dir/a.js") === "__$dir$a$dot$js__";
Given that imported modules can themselves import other modules, it is important that we call the transformed functions in an order such that if A imports B, B must be called before A so that the result of calling B can be passed into the function call of A.
This is in fact a topological ordering problem. We can view nodes in our import graph as being modules, while edges as being dependencies between modules (when a module imports another). Since the import relationship is asymmetrical, these edges in the import graph are directed.
Once we have our directed graph abstraction, we can then run Kahn's algorithm to determine a valid topological ordering of modules (assuming that the import graph is acyclic). This topological ordering informs our AST transformer on the order in which the transformed functions should be invoked.
If we look at the example program above, /a.js
imports from /b.js
which imports from /c.js
which in turn imports from /d.js
.
Ignoring /a.js
since it is the entrypoint module and does not need to be transformed into a function, our transformer ends up with the following order of function calls:
const ___$d$dot$js___ = __$d$dot$js__();
const ___$c$dot$js___ = __$c$dot$js__(___$d$dot$js___);
const ___$b$dot$js___ = __$b$dot$js__(___$c$dot$js___);
A key principle behind why imports/exports are handled as described above is so that students have a mental model of how imports/exports work that can be described in Source code. Consequently, this also means that the import & export system for local modules cannot support circular dependencies. This is because the function calls must be made sequentially, yet a circular dependency implies that at least two of these function rely on the other being called first.
In the interest of having informative error messages, we have modified Kahn's algorithm above to also find any cycle in the import graph if it is not acyclic. This works by picking an arbitrary node on the cycle (as indicated by having a non-zero in-degree (number of incoming edges)), and then walking through an arbitrary neighbour of the node which also has a non-zero in-degree recursively. Once we visit the same node twice, we have completed a cycle and can stop the walk. All nodes that we visited in between the first and last time the same node was visited is part of a cycle, which we print in the error message.
The implementation of local module import & export supports more features than is enabled. For instance, only named exports are enabled in Source 2 through 4, even though default exports are implemented.
We control which import/export features are available at the AST node level.
The types of AST nodes allowed in the various chapters of Source are defined in the syntaxBlacklist
object.
Given that Source is a subset of JavaScript, the syntax for imports and exports is the same as JavaScript's. We make use of the Acorn parser to parse Source programs into their Abstract Syntax Tree (AST) representation. This particular representation is (mostly) compliant with the ESTree spec.
As such, we have the following import/export-related AST nodes to work with:
-
ExportNamedDeclaration
- any named exports.const x = 5; function square(x) { return x * x; } export { x as y, square };
-
ExportDefaultDeclaration
- the default export of a file/module.function cube = x => x * x * x; export default cube;
-
ExportAllDeclaration
- re-exporting/aggregating values exported from other modules.export * from "./b.js";
-
ImportDeclaration
- the parent AST node of all import specifiers. -
ImportSpecifier
- any named imports.import { x as y, z } from "./a.js";
-
ImportDefaultSpecifier
- the default import of a file/module.import x from "./a.js";
-
ImportNamespaceSpecifier
- imports all exported values from a module as an object.import * as x from "./a.js";
The Source chapters in which they are enabled are:
- 1 -
ImportDeclaration
,ImportSpecifier
- Needed for Source module imports.
- 2 -
ExportNamedDeclaration
- For Source 1 through 4, only named imports/exports is enabled. However, the implementation actually supports all AST nodes if possible so that they can be enabled by changing their values in the
syntaxBlacklist
.
- For Source 1 through 4, only named imports/exports is enabled. However, the implementation actually supports all AST nodes if possible so that they can be enabled by changing their values in the
- Library parser language (100) -
ExportDefaultDeclaration
,ImportDefaultSpecifier
- Default exports are implemented but not enabled in Source 1 through 4. They can however be used in the library parser language.
- Infinity (cannot be used at all) -
ExportAllDeclaration
,ImportNamespaceSpecifier
-
ExportAllDeclaration
will eventually be implemented, but not enabled in Source 1 through 4. Because of that, it is low priority and will be added at a later date. -
ImportNamespaceSpecifier
cannot be supported as Source does not allow dot notation (to access properties on objects).
-