JIT Ideas
Update March 2021: More concrete thoughts and some experimental JIT mockups.
Not sure how best to collaborate on ideas -- maybe just edit this page. Especially the example code section: once we're happy with it, we can build a JIT to produce it.
To achieve significantly better performance we need to employ just-in-time compilation and caching techniques (see Deutsch/Schiffman). This includes:
- using actual temp vars for temporaries instead of the stack
- passing method arguments as function parameters instead of stack
- eliminating context creation for message sends
- reification of context objects when needed
- inline caching of method lookup result
- mapping jump bytecodes to JS control structures
Context switches can occur at each message send and backward jump. We need to be able to return control to the browser at these points, as well as switch execution to a different Smalltalk process.
Idea: Have a "fast path" that assumes there will be no interrupts. It uses direct function calling and returning, never reifying the context object. Only in the case of an interrupt does it create the actual context object, filling it with the values of the temp vars. Then it returns, and the calling function also needs to reify its context and return, and so on up the call chain.
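Roughly, that unwinding could look like the following sketch (all names here are invented for illustration, this is not the VM's actual API): each jitted function calls the next one directly; when an interrupt is detected, the innermost function builds its context object and returns a sentinel, and every caller on the JS stack does the same until control is back in the main loop.

```javascript
// Sketch only: LEAVE_FAST_PATH, reifiedContexts, and interruptPending are invented names.
var LEAVE_FAST_PATH = {};          // sentinel: "I left the fast path, keep unwinding"
var reifiedContexts = [];          // contexts rebuilt while unwinding the JS stack
var interruptPending = false;      // would be set by the scheduler / browser

function callee(rcvr) {
    var tmp0 = rcvr;               // stand-in for the method's real work
    if (interruptPending) {
        // reify: turn the JS locals back into a context object
        reifiedContexts.push({ method: 'callee', pc: 5, temps: [tmp0] });
        return LEAVE_FAST_PATH;
    }
    return tmp0;
}

function caller(rcvr) {
    var tmp0 = callee(rcvr);       // direct call, no context allocated
    if (tmp0 === LEAVE_FAST_PATH) {
        // our callee bailed out: reify our own frame and keep unwinding
        reifiedContexts.push({ method: 'caller', pc: 2, temps: [] });
        return LEAVE_FAST_PATH;
    }
    return tmp0;                   // still on the fast path
}
```

The mockup further down uses `false` as that sentinel and `env.reifyStack(pc, ...)` for the reification step.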
Resuming a context object is harder. Ideally it would work just the other way around: temps are initialized with values from the context object, and then "somehow" the execution continues from the middle of the method. The question is how to do that without a go-to statement in JS.
The SimpleJIT emulates go-to using a switch statement inside an endless loop, with each case being a jump target. My current thinking is to simply fall back to the SimpleJIT or even the interpreter when a method needs to be resumed (that is, when the context's PC is not zero). This avoids having to check for that case in the "fast path". We would need to add support to the SimpleJIT for entering the fast path, so we get back to fast land ASAP.
Jitted Smalltalk methods could be represented as unbound functions, or as methods on the class objects. Functions might be easier to compose, e.g. if we need to break up functions into code blocks to be able to "jump" into the middle of a method after an interrupt.
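For illustration, the two representations might look like this (the `jittedFunc` slot and the `jsClass` object are invented for this sketch):

```javascript
// Sketch only: "method" and "jsClass" stand in for the VM's own objects.
var method = {};    // a CompiledMethod
var jsClass = {};   // a JS object representing a Smalltalk class

// Option 1: the jitted code is an unbound function hanging off the method.
// Easy to compose or split into smaller blocks later.
method.jittedFunc = function (rcvr, arg0) { return arg0; };

// Option 2: the jitted code is installed on the class object, keyed by selector,
// so a send becomes an ordinary JS property lookup plus call.
jsClass['at:'] = method.jittedFunc;

var result = jsClass['at:']({ /* receiver */ }, 42);   // → 42
```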
Slava Egorov gave a nice talk about implementing fast dynamic lookup in JS: Video and Slides. We probably want to do something along these lines.
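As a baseline, a monomorphic inline cache for one send site could be shaped like the sketch below (the `lookupMethod` helper and the object layout are assumptions; the mockup further down uses the same pattern with `class0`/`func0`):

```javascript
// Sketch of a monomorphic inline cache at one send site.
// lookupMethod() stands in for the VM's real method lookup.
function lookupMethod(sqClass, selector) {
    return sqClass.methods[selector];
}

function makeSendSite(selector) {
    var cachedClass = null, cachedFunc = null;
    return function send(rcvr, args) {
        if (rcvr.sqClass !== cachedClass) {   // cache miss: do the slow lookup once
            cachedClass = rcvr.sqClass;
            cachedFunc = lookupMethod(cachedClass, selector);
        }
        return cachedFunc(rcvr, args);        // cache hit: direct call
    };
}

// usage
var pointClass = { methods: { x: function (rcvr) { return rcvr._x; } } };
var send_x = makeSendSite('x');
send_x({ sqClass: pointClass, _x: 3 });   // slow path, fills the cache
send_x({ sqClass: pointClass, _x: 4 });   // fast path, cached
```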
Primitive methods expect arguments to be passed on the current context's stack. If we want to avoid having to pass arguments via the stack, we will have to rewrite the primitives. This is probably hard for the generated primitives. Perhaps it is sufficient to have a fast path for numbered primitives, and a slower path for named primitives?
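One possible shape for that split (all function names here are invented, this is not the existing primitive interface): numbered primitives get wrappers that take their arguments directly, while everything else goes through the existing stack-based machinery.

```javascript
// Sketch only: primAdd and runPrimitiveOnStack are made-up names.
var numberedPrims = {
    1: function primAdd(rcvr, arg) {       // e.g. SmallInteger>>#+
        if (typeof rcvr !== 'number' || typeof arg !== 'number') return null;  // fail
        return rcvr + arg;                 // fast path: plain JS arithmetic
    }
};

function runPrimitiveOnStack(vm, index, rcvr, args) {
    // placeholder for the slow path: push rcvr/args onto the context's stack
    // and call the existing (stack-based) primitive code
    return null;
}

function tryPrimitive(vm, index, rcvr, args) {
    var fast = numberedPrims[index];
    if (fast) return fast(rcvr, args[0]);                  // direct-argument call
    return runPrimitiveOnStack(vm, index, rcvr, args);     // fall back to the stack
}
```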
JavaScript limits the maximum stack depth to a few thousand calls. To avoid running into a "call stack size exceeded" exception we will have to limit our call depth: every send and return needs to decrement/increment a global counter, and when the limit is reached, trigger an interrupt (return to the main loop).
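A sketch of that bookkeeping (the counter lives on a shared `env` object here, the limit is arbitrary):

```javascript
// Sketch: bound the JS call depth with a budget that sends use up and returns give back.
var env = { depth: 1000 };

function fastSend(func, rcvr) {
    if (env.depth-- <= 0) {        // budget exhausted: trigger an "interrupt"
        env.depth = 1000;          // reset for the next fast-path run
        return false;              // stand-in for "reify this frame and unwind"
    }
    var result = func(rcvr);       // the actual (direct) call
    env.depth++;                   // return: give the budget back
    return result;
}
```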
Another idea would be to use continuation-passing style (CPS), but the resulting code is not very readable so I'd rather avoid that. See this tutorial for a good explanation.
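For comparison, a CPS version of a loop like the one in the example below might look roughly like this, which shows the readability problem (the `send` and `nilObj` here are dummies so the snippet is self-contained):

```javascript
// CPS sketch: every call takes an explicit continuation `k` instead of returning.
var nilObj = { isNil: true };
function send(rcvr, selector, k) { k(rcvr); }   // dummy send: #yourself answers the receiver

function method(rcvr, k) {
    // [value yourself == nil] whileFalse. ^ 42   -- in CPS form
    (function loop() {
        send(rcvr.pointers[0], 'yourself', function (tmp0) {
            if (tmp0 === nilObj) k(42);   // loop condition met: "return" 42
            else loop();                  // otherwise iterate
        });
    })();
}

method({ pointers: [nilObj] }, function (result) { console.log(result); });  // 42
```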
What should the generated code look like?
Our current simple JIT uses the same execution mechanism as the interpreter, that is, it has a single generic `send` method that allocates a new context and then invokes the associated method. The speedup comes mostly from doing the bytecode decoding work when jitting, and from better locality of the resulting bytecode sequence, which allows the JS JIT to optimize on a method level.
Example: This source method
```smalltalk
method
    [value yourself == nil] whileFalse.
    ^ 42
```
with these bytecodes
```
0 <00> pushInstVar: 0
1 <D0> send: #yourself
2 <73> pushConst: nil
3 <C6> send: #==
4 <A8 02> jumpIfTrue: 8
6 <A3 F8> jumpTo: 0
8 <21> pushConst: 42
9 <7C> return: topOfStack
```
is currently translated to this jitted code:
```javascript
function Class_method(vm) {
    var context = vm.activeContext,
        stack = context.pointers,
        rcvr = vm.receiver,
        inst = rcvr.pointers,
        temp = vm.homeContext.pointers,
        lit = vm.method.pointers;
    while (true) switch (vm.pc) {
    case 0:
        // 0 <00> pushInstVar: 0
        stack[++vm.sp] = inst[0];
        // 1 <D0> send: #yourself
        vm.pc = 2; vm.send(lit[1], 0, false);
        if (context !== vm.activeContext || vm.breakOutOfInterpreter !== false)
            return;
    case 2:
        // 2 <73> pushConst: nil
        stack[++vm.sp] = vm.nilObj;
        // 3 <C6> send: #==
        var cond = stack[vm.sp-1] === stack[vm.sp];
        stack[--vm.sp] = cond ? vm.trueObj : vm.falseObj;
        // 4 <A8 02> jumpIfTrue: 8
        var cond = stack[vm.sp--];
        if (cond === vm.trueObj) {
            vm.pc = 8;
            continue;
        }
    case 6:
        // 6 <A3 F8> jumpTo: 0
        vm.pc = 0;
        if (vm.interruptCheckCounter-- <= 0) {
            vm.checkForInterrupts();
            if (context !== vm.activeContext || vm.breakOutOfInterpreter !== false)
                return;
        }
        continue;
    case 8:
        // 8 <21> pushConst: 42
        stack[++vm.sp] = lit[2];
        // 9 <7C> return: topOfStack
        vm.pc = 10; vm.doReturn(stack[vm.sp]);
        return;
    default:
        // someone played around with vm.pc
        vm.interpretOne(true);
        return;
    }
}
```
This method can be invoked with any valid PC. E.g. if there was a context switch inside `yourself`, this method would be resumed with `vm.pc == 2`, jumping right to the second case. The default case even falls back to the interpreter, in case the method was entered with an 'unusual' program counter. This can pretty much only happen during single-stepping in the debugger.
As you can see, for each send it creates a new context and returns. The main loop outside this code then executes that method. This has the advantage of allowing an unlimited stack depth, because all calls are shallow. It also has the disadvantage that called methods can never be inlined by the JS JIT.
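In other words, the current scheme is a trampoline: jitted code never calls other jitted code directly, it installs a new active context and returns to an outer loop along these lines (heavily simplified, names only loosely follow the actual VM):

```javascript
// Simplified trampoline sketch: all calls between methods go through this loop,
// so the JS stack never grows deeper than one jitted frame.
function interpretLoop(vm) {
    while (!vm.breakOutOfInterpreter) {
        var method = vm.activeContext.method;
        if (method.compiled)
            method.compiled(vm);   // run the jitted function until it sends or returns
        else
            vm.interpretOne();     // not jitted (yet): interpret bytecodes
    }
}
```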
Bytecodes and pseudo code:
```
PC                             SP   while (true) {
 0 <00> pushInstVar: 0          1     tmp0 = rcvr.instvar[0]
 1 <D0> send: #yourself         1     tmp0 = send(tmp0, #yourself)
 2 <73> pushConst: nil          2     tmp1 = nil
 3 <C6> send: #==               1     tmp0 = tmp0 == tmp1
 4 <A8 02> jumpIfTrue: 8        0     if (tmp0 !== false) break;
 6 <A3 F8> jumpTo: 0            0   }
 8 <21> pushConst: 42           1   tmp0 = 42
 9 <7C> return: topOfStack      0   return tmp0
```
Here, temps are named according to their position on the stack. This should make it easy to reify the stack when needed. Not sure if that will work within blocks, though.
Rough idea for generated code:
```javascript
new Function('env', `
    var class0 = null, func0;
    return function Class_method(rcvr) {
        var tmp0, tmp1;
        if (env.depth-- < 0) return env.reifyStack(0);
        while (true) {
            tmp0 = rcvr.pointers[0];
            if (class0 !== tmp0.sqClass) {
                class0 = tmp0.sqClass;
                func0 = env.lookup(class0, lit[0], false);
            }
            tmp0 = func0(tmp0); if (tmp0 === false) return env.reifyStack(2, tmp0);
            tmp1 = env.nilObj;
            tmp0 = tmp0 === tmp1 ? env.trueObj : env.falseObj;
            if (tmp0 === env.trueObj) break;
            else if (tmp0 !== env.falseObj) return env.mustBeBoolean(4, tmp0);
        }
        tmp0 = 42;
        return tmp0;
    }
`)(env);
```
The inline-cache check (`class0 !== tmp0.sqClass` above) should work even for SmallInts, whose `sqClass` is `undefined`. `func0` is the jitted method for `#yourself`. It answers the special value `false` to indicate that we have to leave the fast path. `env.reifyStack(pc, ...args)` would create a proper context object from the given `pc` and temp vars (if any).
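A sketch of what `env.reifyStack(pc, ...args)` might do, assuming the `env` object from the mockup above (the `resumeContext` and `currentSender` fields are invented here):

```javascript
// Sketch: rebuild a context-like object from the interrupted fast-path frame.
// A real implementation would allocate a proper MethodContext instead.
env.reifyStack = function (pc /*, ...temps */) {
    var temps = Array.prototype.slice.call(arguments, 1);
    var ctx = {
        pc: pc,                     // where to resume inside the method
        sp: temps.length - 1,       // the temps double as the value stack
        stack: temps,               // tmp0, tmp1, ... in stack order
        sender: env.currentSender   // the caller, which is also about to reify itself
    };
    env.resumeContext = ctx;        // hand it to the scheduler / slow path
    return false;                   // the sentinel that makes our caller unwind as well
};
```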