SYNTAX-6-OPTIMIZATIONS.markdown

Code optimization

Code optimization runs on compiled (mlog) code. The compiled code is inspected for sequences of instructions that can be removed or replaced by a functionally equivalent, but shorter and/or faster, sequence of instructions. The new sequence might even be longer than the original one if it executes faster (see the goal option).

The information on compiler optimizations is a bit technical. It might be useful if you're trying to better understand how Mindcode generates the mlog code.

Temporary Variables Elimination

The compiler sometimes creates temporary variables whose only function is to store the output value of an instruction before passing it somewhere else. This optimization removes all assignments from temporary variables that merely carry over the output value of the preceding instruction. The set instruction is removed, while the preceding instruction is updated to replace the temporary variable with the target variable used in the set statement.

The optimization is performed only when the following conditions are met:

  • The set instruction assigns from a temporary variable.
  • The temporary variable is used in exactly one other instruction. The other instruction immediately precedes the instruction producing the temporary variable.
  • All arguments of the other instruction referencing the temporary variable are output ones.

An additional optimization is performed when an instruction has a temporary output variable which isn't read by any other instruction. In this case, the unused output variable is replaced by 0 (literal zero value). Such an instruction will be executed correctly by Mindustry Logic, but no new variable will be allocated for the replaced argument.

push and pop instructions are ignored by the above algorithm. push/pop instructions of any eliminated variables are removed by the Stack Optimization down the line.
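
The rule above can be sketched as a peephole pass over a simplified instruction list. This is an illustrative Python sketch, not Mindcode's actual implementation; the instruction encoding is hypothetical:

```python
# Illustrative sketch of temporary-variable elimination; not Mindcode's
# actual implementation. Instructions are (opcode, args) tuples; for "op",
# args are [operation, result, operand1, operand2], and names starting
# with "*" denote compiler temporaries.
def eliminate_temporaries(code):
    result = []
    for instr in code:
        opcode, args = instr
        # Look for `set target *tmpX` right after the producer of *tmpX.
        if opcode == "set" and args[1].startswith("*") and result:
            target, tmp = args
            prev_opcode, prev_args = result[-1]
            # The temporary must appear in exactly two instructions:
            # the producer and this set.
            uses = sum(a == tmp for _, arglist in code for a in arglist)
            if prev_opcode == "op" and prev_args[1] == tmp and uses == 2:
                # Rewrite the producer to write the target directly
                # and drop the set instruction entirely.
                result[-1] = ("op", [prev_args[0], target] + prev_args[2:])
                continue
        result.append(instr)
    return result
```

For example, the pair `op add *tmp0 a b` / `set x *tmp0` collapses into the single instruction `op add x a b`.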

Case Expression Optimization

Case expressions allocate a temporary variable to hold the value of the input expression, even if the input expression is actually a user-defined variable. This optimization detects these instances, removes the temporary variable and replaces it with the original variable containing the value of the input expression. The set instruction is removed, while the other instructions are updated to replace the temporary variable with the one used in the set statement.

The optimization is performed only when the following conditions are met:

  • The set instruction assigns to a case-expression temporary variable.
  • The set instruction is the first of all those using the temporary variable (the check is based on absolute instruction sequence in the program, not on the actual program flow).
  • Each subsequent instruction using the temporary variable conforms to the code generated by the compiler (i.e., it has the form jump target <condition> *tmpX testValue).

Dead Code Elimination

This optimization inspects the entire code and removes all instructions that write to variables, if none of the variables written to are actually read anywhere in the code.

This optimization supports the basic and advanced levels of optimization. On the advanced level, the optimization removes all dead assignments, even assignments to unused program parameters, global variables and main variables.

Dead Code Elimination also inspects your code and prints out suspicious variables:

  • Unused variables: those are the variables that were, or could be, eliminated. On basic level, some unused variables might not be reported.
  • Uninitialized variables: those are global variables that are read by the program, but never written to. (The Data Flow Optimization detects uninitialized local and function variables.)

Both cases deserve a closer inspection, as they might indicate a problem with your program.
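
The core of the elimination can be sketched as a fixed-point pass: removing one dead write may make an earlier write dead in turn. This is an illustrative Python sketch, not Mindcode's actual implementation; the instruction encoding is hypothetical:

```python
# Illustrative sketch of dead code elimination; not Mindcode's actual
# implementation. Each instruction lists the variables it writes ('out')
# and reads ('in'); 'effect' marks instructions such as print or jump
# that must be kept regardless of their outputs.
def eliminate_dead_code(code):
    while True:
        read = {v for instr in code for v in instr["in"]}
        kept = [instr for instr in code
                if instr["effect"] or not instr["out"]
                or set(instr["out"]) & read]
        if len(kept) == len(code):
            return kept
        code = kept   # removing writes may make earlier writes dead too
```

For `set a 1; set b a; print c`, both set instructions are removed: b is never read, and once its assignment is gone, neither is a.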

Jump Normalization

This optimization handles conditional jumps whose condition can be fully evaluated:

  • always false conditional jumps are removed,
  • always true conditional jumps are converted to unconditional ones.

A condition can be fully evaluated if both of its operands are literals, or if they're variables whose values were determined to be constant by the Data Flow Optimization.

The first case reduces the code size and speeds up execution. The second one in itself improves neither size nor speed, but allows those jumps to be handled by other optimizations aimed at unconditional jumps.
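
The evaluation rule above can be sketched as follows. This is an illustrative Python sketch, not Mindcode's actual implementation; jump targets are treated as symbolic labels here, so dropping an instruction needs no retargeting:

```python
import operator

# Illustrative sketch of jump normalization; not Mindcode's actual code.
# Conditional jumps whose operands are all known constants are either
# dropped (always false) or turned into unconditional jumps (always true).
CONDITIONS = {"equal": operator.eq, "notEqual": operator.ne,
              "lessThan": operator.lt, "greaterThanEq": operator.ge}

def normalize_jumps(code, known):
    """known: variable name -> constant value, as established by data
    flow analysis; numeric literals are resolved directly."""
    def value(operand):
        try:
            return float(operand)
        except ValueError:
            return known.get(operand)   # None if not a known constant

    result = []
    for instr in code:
        parts = instr.split()
        if parts[0] == "jump" and parts[2] in CONDITIONS:
            a, b = value(parts[3]), value(parts[4])
            if a is not None and b is not None:
                if CONDITIONS[parts[2]](a, b):
                    result.append(f"jump {parts[1]} always 0 0")
                continue            # always-false jump: drop it
        result.append(instr)
    return result
```

With `x` known to be 0, `jump end equal x 0` becomes `jump end always 0 0`; with `x` known to be 1, the jump is removed entirely.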

Jump Optimization

Conditional jumps are sometimes compiled into an op instruction evaluating a boolean expression, and a conditional jump acting on the value of the expression.

This optimization turns the following sequence of instructions:

op <comparison> var A B
jump label equal/notEqual var false

into

jump label <inverse of comparison>/<comparison> A B

Prerequisites:

  • jump is an equal/notEqual comparison to false,
  • var is a temporary variable,
  • var is not used anywhere in the program except these two instructions,
  • <comparison> has an inverse/<comparison> exists as a condition.
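
The transformation above can be sketched as a peephole pass. This is an illustrative Python sketch, not Mindcode's actual implementation; instructions are plain mlog-like strings:

```python
# Illustrative sketch of jump optimization; not Mindcode's actual
# implementation. An `op <comparison>` producing a temporary, followed by
# `jump ... equal/notEqual <temporary> false`, is merged into one jump.
INVERSES = {"equal": "notEqual", "notEqual": "equal",
            "lessThan": "greaterThanEq", "greaterThanEq": "lessThan",
            "lessThanEq": "greaterThan", "greaterThan": "lessThanEq"}

def optimize_jumps(code):
    result = []
    for instr in code:
        parts = instr.split()
        if (parts[0] == "jump" and parts[2] in ("equal", "notEqual")
                and parts[4] == "false" and result):
            prev = result[-1].split()
            var = parts[3]
            # the variable must not be used anywhere else in the program
            uses = sum(line.split().count(var) for line in code)
            if (prev[0] == "op" and prev[1] in INVERSES
                    and prev[2] == var and uses == 2):
                # `jump ... equal var false` jumps when the comparison is
                # false, so the comparison is inverted; notEqual keeps it.
                cond = INVERSES[prev[1]] if parts[2] == "equal" else prev[1]
                result[-1] = f"jump {parts[1]} {cond} {prev[3]} {prev[4]}"
                continue
        result.append(instr)
    return result
```

For example, `op lessThan *tmp0 a b` followed by `jump 5 equal *tmp0 false` merges into `jump 5 greaterThanEq a b`.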

Single Step Elimination

This optimizer simplifies the following jump sequences that result from certain combinations of optimizations. Jumps that are part of jump tables are never removed.

  • A conditional or unconditional jump targeting the next instruction is removed.
  • A conditional or unconditional jump is removed if an identical jump immediately follows it. The second jump may be a target of another jump.
  • On the advanced level, a jump is removed if there is an identical jump preceding it and these conditions are met:
    • the jump doesn't contain volatile variables,
    • there are no instructions affecting control flow between the jumps, including points targeted by jumps,
    • there are no instructions modifying the jump condition variables between the jumps.
  • On the advanced level, the end and jump 0 always instructions at the very end of the program are removed, as the processor will jump to the first instruction of the program upon reaching the end of the instruction list anyway, saving execution of one instruction.
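
The first rule in the list can be sketched as follows. This is an illustrative Python sketch, not the actual implementation; jump targets are symbolic labels kept as separate pseudo-instructions, so removal needs no retargeting:

```python
# Illustrative sketch of one Single Step Elimination rule; not the real
# implementation. A jump whose target label immediately follows it does
# nothing and can be dropped.
def drop_single_step_jumps(code):
    result = []
    for i, instr in enumerate(code):
        parts = instr.split()
        if parts[0] == "jump" and i + 1 < len(code):
            # a jump straight to the next instruction is useless
            if code[i + 1] == f"label {parts[1]}":
                continue
        result.append(instr)
    return result
```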

Expression Optimization

This optimization looks for certain expressions that can be performed more efficiently. Currently, the following optimizations are available:

  • floor function called on a multiplication by a constant or a division. Combines the two operations into one integer division (idiv) operation. In the case of multiplication, the constant operand is inverted to become the divisor in the idiv operation.
  • sensor var @this @x and sensor var @this @y are replaced by set var @thisx and set var @thisy respectively. Data Flow Optimization may then apply constant propagation to the @thisx/@thisy built-in constants.
  • All set instructions assigning a variable to itself (e.g. set x x) are removed.
  • When both operands of the instruction are known to have the same value, some operations always produce a fixed value. If this is the case, the operation is replaced by a set instruction setting the target variable to the fixed value:
    • equal, lessThanEq, greaterThanEq, strictEqual: sets the result to 1 (true)
    • notEqual, lessThan, greaterThan: sets the result to 0 (false)
    • sub, xor: sets the result to 0
    • or, land: sets the result directly to the first operand, if the instruction doesn't represent the boolean version of the operation (|| or &&),
    • and, min and max: sets the result directly to the first operand.
  • The result of some operations may be determined by a known value of one of its operands. In some cases, the result doesn't depend on the other operand (a multiplication by zero is always zero regardless of the other operand), while in other cases the result is equal to the other operand (a multiplication by one or a subtraction of zero). All the performed replacements are listed in this table:
| Operation | Operator | First operand | Second operand | Result | Level | Note |
|-----------|----------|---------------|----------------|--------|-------|------|
| mul | | var | 1 | var | Basic | Commutative |
| mul | | var | 0 | 0 | Basic | Commutative |
| div | | var | 1 | var | Basic | |
| div, idiv, mod | | var | 0 | null | Basic | |
| div, idiv, mod | | 0 | var | 0 | Advanced | |
| add, xor | | var | 0 | var | Basic | Commutative |
| sub | | var | 0 | var | Basic | |
| shl, shr | | var | 0 | var | Advanced | |
| shl, shr | | 0 | var | 0 | Basic | |
| or | \|, or | var | 0 | var | Advanced | Commutative |
| or | \|\|, or | var | nonzero | 1 | Basic | Commutative |
| and, land | | var | 0 | 0 | Basic | Commutative |
| land | and | var | nonzero | var | Advanced | Commutative |

In case of commutative operations, the result is the same if the first and second operands are swapped. var represents a variable with an arbitrary, unknown value.

Some mlog operations are produced by different Mindcode operators. When the optimizations are only applied to operations corresponding to specific Mindcode operators, these operators are listed in the Operator column.
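
As an illustration, rules like those in the table can be applied with a simple lookup keyed by the operation and the known operand value. This is a hypothetical Python sketch, not Mindcode's actual data structures; only a few commutative rules are shown:

```python
# Illustrative sketch of applying algebraic identity rules; not
# Mindcode's actual implementation. OTHER stands for "the other,
# unknown operand"; only commutative rules are modeled here.
OTHER = object()

RULES = {("mul", 1): OTHER, ("mul", 0): 0,
         ("add", 0): OTHER, ("xor", 0): OTHER,
         ("and", 0): 0, ("land", 0): 0}

def simplify(operation, a, b):
    """a, b: numeric constants or variable-name strings; returns the
    replacement `set` value, or None when no rule applies."""
    for known, other in ((a, b), (b, a)):   # try both operand orders
        result = RULES.get((operation, known))
        if result is OTHER:
            return ("set", other)
        if result is not None:
            return ("set", result)
    return None
```

For example, `simplify("mul", "x", 1)` yields `("set", "x")`, i.e. the multiplication collapses into a plain assignment.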

Important

Some optimizations applied to or, and, land, xor, shl and shr operations with non-integer or non-boolean operands may produce different results from unoptimized code: unoptimized code would result in an integer value (or a boolean value in the case of land), while the optimized code may produce a non-integer value. Passing a non-integer/non-boolean value into these operators is unusual, but not impossible. These optimizations are therefore only performed on the advanced level and can be turned off by setting the level to basic.

If the optimization level is advanced, the following additional expressions are handled:

  • If the @constant in a sensor var @constant @id instruction is a known item, liquid, block or unit constant, the object's Mindustry ID is looked up and the instruction is replaced by set var <id>, where <id> is a numeric literal.
  • If the id in a lookup <type> id instruction is a constant, Mindcode searches for the item, liquid, block or unit with the given id, and if it finds one, the instruction is replaced by set var <object>, where <object> is an item, liquid, block or unit literal.

Mindcode contains a list of known Mindustry objects and their IDs, obtained from the latest Mindustry version. These IDs are expected to be immutable, but the optimization can be turned off by setting the optimization level to basic if the actual IDs turn out to be different from the IDs known to Mindcode.

If Expression Optimization

This optimization consists of three types of modifications performed on blocks of code created by if/ternary expressions. All possible optimizations are done independently.

Value propagation

The value of ternary expressions and if expressions is sometimes assigned to a user-defined variable. In these situations, the true and false branches of the if/ternary expression assign the value to a temporary variable, which is then assigned to the user variable. This optimization detects these situations and when possible, assigns the final value to the user variable directly in the true/false branches:

abs = if x < 0 then
    negative += 1;
    -x;
else
    positive += 1;
    x;
end;

produces this code:

jump 4 greaterThanEq :x 0
op add :negative :negative 1
op sub :abs 0 :x
jump 6 always 0 0
op add :positive :positive 1
set :abs :x
end

As the example demonstrates, value propagation works on more than just the set instruction. All instructions having exactly one output parameter (based on instruction metadata) are handled.

Instruction propagation

If the instruction immediately following the if expression isn't a set instruction, but another instruction taking the resulting value of the if expression, and the resulting value is stored using a set instruction, the set instruction will be replaced by the instruction actually consuming the value. The optimization targets these conditional expressions passed as a parameter into a function, for example print(a > 0 ? "positive" : "negative");, which produces:

jump 3 lessThanEq :a 0
print "positive"
jump 0 always 0 0
print "negative"

This saves the execution of one instruction storing the resulting value (positive or negative, in this case) before passing it into the print instruction.

All kinds of instructions are supported, for example approach(@thisx, @thisy, @unit.@type == @mega ? 8 : 12); produces

sensor *tmp0 @unit @type
jump 4 notEqual *tmp0 @mega
ucontrol approach @thisx @thisy 8 0 0
jump 5 always 0 0
ucontrol approach @thisx @thisy 12 0 0

The optimization handles even nested if expressions, such as

print(a < 100 ? a < 10 ? "units" : "tens" : a < 1000 ? "hundreds" : "thousands");

which produces

jump 6 greaterThanEq :a 100
jump 4 greaterThanEq :a 10
print "units"
jump 0 always 0 0
print "tens"
jump 0 always 0 0
jump 9 greaterThanEq :a 1000
print "hundreds"
jump 0 always 0 0
print "thousands"

This optimization is available on the advanced level.

Forward assignment

Some conditional expressions can be rearranged to save instructions while keeping execution time unchanged:

param x = 5;
text = x < 0 ? "negative" : "positive";

Without If Expression Optimization, the produced code is

jump 3 greaterThanEq x 0
set *tmp1 "negative"
jump 4 always 0 0
set *tmp1 "positive"
set :text *tmp1

Execution speed:

  • x is negative: 4 instructions (0, 1, 2, 4) are executed,
  • x is positive: 3 instructions (0, 3, 4) are executed.

The If Expression Optimization turns the code into this:

set :text "positive"
jump 3 greaterThanEq x 0
set :text "negative"

Execution speed:

  • x is negative: 3 instructions (0, 1, 2) are executed,
  • x is positive: 2 instructions (0, 1) are executed.

Compared with applying value propagation alone, the execution time per branch is the same, but one less instruction is generated.

The forward assignment optimization is performed when these conditions are met:

  • optimization level is advanced,
  • both branches assign a value to the same variable (resulting variable) as the last statement,
  • the resulting variable is not global (global variables can be modified or used in function calls),
  • the resulting variable is not used in the condition in any way,
  • at least one branch consists of just one instruction which sets the resulting variable (this is the branch to be moved),
  • the other branch doesn't use the resulting variable anywhere except the last statement,
  • the last statement of the other branch doesn't depend on the resulting variable.

Depending on the type of condition and the branch sizes, either true branch or false branch can get eliminated this way. Average execution time remains the same, although in some cases the number of executed instructions per branch can change by one (total number of instructions executed by both branches remains the same).

Compound condition elimination

The instruction generator always generates the true branch first. In some cases, the jump condition cannot be expressed as a single jump and requires an additional instruction (this only happens with the strict equality operator ===, which doesn't have an opposite operator in Mindustry Logic).

The additional instruction can be avoided when the true and false branches in the code are swapped. When this optimizer detects such a situation, it does exactly that:

if @unit.@dead === 0 then
    print("alive");
else
    print("dead");
end;

Notice the print "dead" occurs before print "alive" now:

sensor *tmp0 @unit @dead
jump 4 strictEqual *tmp0 0
print "dead"
jump 0 always 0 0
print "alive"
end

Chained if-else statements

The elsif statements are equivalent to nesting the elsif part in the else branch of the outer expression. Optimizations of these nested statements work as expected:

y = if x < 0 then
    "negative";
elsif x > 0 then
    "positive";
else
    "zero";
end;

produces

set y "negative"
jump 5 lessThan x 0
set y "zero"
jump 5 lessThanEq x 0
set y "positive"
end

saving four instructions over the code compiled without If Expression Optimization:

jump 3 greaterThanEq x 0
set *tmp1 "negative"
jump 8 always 0 0
jump 6 lessThanEq x 0
set *tmp3 "positive"
jump 7 always 0 0
set *tmp3 "zero"
set *tmp1 *tmp3
set y *tmp1
end

Data Flow Optimization

This optimization inspects the actual data flow in the program and removes instructions and variables (both user defined and temporary) that are dispensable or have no effect on the program execution. Each individual optimization performed is described separately below.

Data Flow Optimizations can have profound effect on the resulting code. User-defined variables can get completely eliminated, and variables in expressions can get replaced by various other variables that were determined to hold the same value. The goal of these replacements is to allow elimination of some instructions. The optimizer doesn't try to avoid variable replacements that do not lead to instruction elimination - this would make the resulting code more understandable, but the optimizer would have to be more complex and therefore more prone to errors.

Important

One of Mindcode's goals is to facilitate making small changes to the compiled code, allowing users to change crucial parameters without the need to recompile the entire program. To this end, assignments to program parameters are never removed. Any changes to set instructions assigning values to program parameters in the compiled code are fully supported, and the resulting program is fully functional, as if the value was assigned to the program parameter in the source code itself.

All other variables can be completely removed by this optimization. Even if they stay in the compiled code, changing the value assigned to any variables other than program parameters may not produce the same effect as compiling the program with the changed value. In other words, changing a value assigned to a global, main or local variable in the compiled code may break the compiled program.

If you wish to create a parametrized code, follow these rules for best results:

  • Use program parameters as placeholders for the parametrized values.
  • Declare the parameters at the very beginning of the program (this way the parameters will be easy to find in the compiled code).
  • If you sanitize or otherwise modify a parameter value before it is used by the program, store the result of these operations in another variable. Program parameters are read-only in Mindcode and cannot be modified once set.
  • Do not modify instructions other than set instructions assigning values to program parameters in the compiled code.

Warning

In older versions of Mindcode, global variables were intended for program parametrization. In the current version, global variables are fully optimized in a way similar to main or local variables and aren't suitable for program parametrization.

Optimization levels

On basic optimization level, the Data Flow Optimization preserves last values assigned to main variables (except main variables that served as a loop control variable in at least one unrolled loop). This protection is employed to allow trying out snippets of code in the web application (which runs on basic optimization level by default) without the optimizer eliminating most or all of the compiled code due to it not having any effect on the Mindustry world.

#set optimization = basic;
x0 = 0.001;
y0 = 0.002;
x1 = x0 * x0 + y0 * y0;
y1 = 2 * x0 * y0;

produces

set x0 0.001
set y0 0.002
op add x1 0.000001 0.000004
op mul y1 0.002 0.002
end

On advanced optimization level, no special protection to main variables is awarded, and they can be completely removed from the resulting code:

#set optimization = advanced;
x0 = 0.001;
y0 = 0.002;
x1 = x0 * x0 + y0 * y0;
y1 = 2 * x0 * y0;

produces an empty program. All the assignments were removed, as they wouldn't have any effect on the Mindustry world when run in an actual logic processor.

Handling of uninitialized variables

The data flow analysis reveals cases where variables might not be properly initialized, i.e. situations where a value of a variable is read before it is known that some value has been written to the variable. Warnings are generated for each uninitialized variable found.

Since Mindustry Logic executes the code repeatedly while preserving variable values, not initializing a variable might be a valid choice, relying on the fact that all variables are assigned a value of null by Mindustry at the beginning. If you intentionally leave a variable uninitialized, declare the variable using a noinit modifier, which suppresses the warning:

noinit var count;
count++;
print(count);
printflush(message1);

Data Flow Optimization assumes that values assigned to variables detected as uninitialized might be reused on the next program execution. Assignments to uninitialized variables before calling the end() function are therefore protected, while assignments to initialized variables aren't - they will be overwritten on the next program execution anyway:

noinit var initialized;
var foo = rand(10);
if initialized == 0 then
    print("Initializing...");
    // Do some initialization
    initialized = 1;
    foo = 1;
    end();
end;
print("Doing actual work");
print(initialized);
print(foo);

produces this code:

op rand .foo 10 0
jump 5 notEqual .initialized 0
print "Initializing..."
set .initialized 1
end
print "Doing actual work"
print .initialized
print .foo

Notice the initialized = 1 statement is preserved, while foo = 1 is not.

This protection is also applied to assignment to uninitialized variables made before calling a user function which, directly or indirectly, calls or may call the end() function:

#set unreachable-code-elimination = none;
print(foo);
foo = 5;
bar();
foo = 6;
bar();

def bar()
    end();
end;

preserves both assignments to foo:

print :foo
set :foo 5
end
set :foo 6

See also end() function.

Unnecessary assignment elimination

All assignments to variables (except global variables) are inspected and unnecessary assignments are removed. The assignment is unnecessary if the variable is not read after being assigned, or if it is not read before another assignment to the variable is made:

a = rand(10);
a = rand(20);
print(a);
a = rand(30);

compiles to:

op rand :a 20 0
print :a
op rand :a 30 0
end

The first assignment to :a is removed, because :a is not read before another value is assigned to it. The last assignment to :a is preserved on basic optimization level, but would be removed on advanced level, because :a is not read after that assignment at all.

An assignment can also become unnecessary due to other optimizations.

Constant propagation

When a variable is used in an instruction and the value of the variable is known to be a constant value, the variable itself is replaced by the constant value. This can in turn make the original assignment unnecessary. See for example:

#set optimization = advanced;
a = 10;
b = 20;
c = @tick + b;
print($"$a, $b, $c.");

produces

op add :c @tick 20
print "10, 20, "
print :c
print "."

Constant folding

Constant propagation described above ensures that constant values are used instead of variables where possible. When a deterministic operation is performed on constant values (such as addition by the op add instruction), constant folding evaluates the expression and replaces the operation with the resulting value, eliminating an instruction. For example:

#set optimization = advanced;
a = 10;
b = 20;
c = a + b;
print($"$a, $b, $c.");

produces

print "10, 20, 30."

Looks quite spectacular, doesn't it? Here's what happened:

  • The optimizer figured out that variables a and b are not needed, because they only hold a constant value.
  • Then it found out the c = a + b expression has a constant value too.
  • What was left was a sequence of print statements, each printing a constant value. Print Merging optimization on advanced level then merged it all together.

Not every opportunity for constant folding is detected at this moment. While x = 1 + y + 2; is optimized to op add :x :y 3, x = 1 + y + z + 2; is too complex to process at this moment, and the constant values of 1 and 2 won't be folded at compile time.

If the result of a constant expression doesn't have a valid mlog representation, or if the mlog representation of the result incurs a precision loss, the optimization is not performed.
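
The interplay of constant propagation and constant folding can be sketched over straight-line code. This is an illustrative Python sketch, not Mindcode's actual analysis, which works on the full control flow; the instruction encoding is hypothetical:

```python
import operator

# Illustrative sketch combining constant propagation and constant
# folding over straight-line code (no jumps); not Mindcode's actual
# implementation.
FOLDABLE = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}

def propagate_and_fold(code):
    known = {}      # variable -> constant value
    result = []
    for instr in code:
        if instr[0] == "set":                  # ("set", target, value)
            _, target, value = instr
            known[target] = known.get(value, value)
            result.append(("set", target, known[target]))
        else:                                  # ("op", oper, target, a, b)
            _, oper, target, a, b = instr
            a = known.get(a, a)                # propagate known constants
            b = known.get(b, b)
            if (oper in FOLDABLE and isinstance(a, (int, float))
                    and isinstance(b, (int, float))):
                known[target] = FOLDABLE[oper](a, b)   # fold at "compile time"
                result.append(("set", target, known[target]))
            else:
                known.pop(target, None)        # target no longer constant
                result.append(("op", oper, target, a, b))
    return result
```

Running it on `set a 10; set b 20; op add c a b` folds the addition into `set c 30`; the now-dead sets of a and b would then fall to unnecessary assignment elimination.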

Common subexpressions optimization

The Data Flow Optimizer keeps track of expressions that have been evaluated. When the same expression is encountered a second (third, fourth, ...) time, the result of the previous computation is reused instead of evaluating the expression again. In the following code:

a = rand(10);
b = a + 1;
c = 2 * (a + 1);
d = 3 * (a + 1);
print(a, b, c, d);

the optimizer notices that the value a + 1 was assigned to b after it was computed for the first time and reuses it in the subsequent instructions:

op rand :a 10 0
op add :b :a 1
op mul :c 2 :b
op mul :d 3 :b
print :a
print :b
print :c
print :d
end

Again, not every possible opportunity is used. Instructions are not rearranged, for example, even if doing so would allow more evaluations to be reused.

On the other hand, entire complex expressions are reused if they're identical. In the following code

a = rand(10);
b = rand(10);
x = 1 + sqrt(a * a + b * b);
y = 2 + sqrt(a * a + b * b);
print(x, y);

the entire square root is evaluated only once:

op rand :a 10 0
op rand :b 10 0
op mul *tmp2 :a :a
op mul *tmp3 :b :b
op add *tmp4 *tmp2 *tmp3
op sqrt *tmp5 *tmp4 0
op add :x 1 *tmp5
op add :y 2 *tmp5
print :x
print :y
end
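
The reuse of previously computed expressions can be sketched as a pass that remembers which variable currently holds each evaluated (operation, operand, operand) triple. This is an illustrative Python sketch, not Mindcode's actual implementation; the instruction encoding is hypothetical:

```python
# Illustrative sketch of common subexpression reuse; not Mindcode's
# actual implementation. Instructions are (operation, target, a, b)
# tuples; a repeated expression is replaced by a set from the variable
# that already holds its value.
def reuse_subexpressions(code):
    holds = {}                        # (operation, a, b) -> holding variable
    result = []
    for oper, target, a, b in code:
        key = (oper, a, b)
        if key in holds:
            result.append(("set", target, holds[key]))
        else:
            result.append((oper, target, a, b))
        # writing `target` invalidates every cached expression that reads
        # `target` or whose cached result was held in `target`
        holds = {k: v for k, v in holds.items()
                 if v != target and target not in (k[1], k[2])}
        if target not in (a, b):
            holds[key] = target
    return result
```

When a + 1 is computed a second time, the second `op add` is replaced by a set from the variable that already holds the sum.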

Function call optimizations

Variables and expressions passed as arguments to inline functions, as well as return values of inline functions, are processed in the same way as other local variables. Using an inlined function therefore doesn't incur any overhead at all in Mindcode.

Data flow analysis, with some restrictions, is also applied to stackless and recursive function calls. Assignments to global variables inside stackless and recursive functions are tracked and properly handled. Optimizations are applied to function arguments and return values.

Support for other optimizations

Data Flow Optimization gathers information about variables and the values they attain in the program. Apart from using this information for its own optimizations, it also provides it to the other optimizers. The other optimizers can then improve their own optimizations further (e.g. to determine that a conditional jump is always true or always false and act accordingly). Data Flow Optimization is therefore crucial for best performance of some other optimizers as well. Some optimizations, such as Loop Unrolling, might outright require the Data Flow Optimization to be active for their own work.

Loop Hoisting

Loop hoisting is an optimization which tries to identify loop invariant code (i.e. code inside loops which executes identically in each loop iteration) and moves it in front of the loop. This way the code is executed only once, instead of on each loop iteration.

For example, in the following code

param MAX = 10;
for i in 0 ... MAX do
    print(2 * MAX);
end;

the evaluation of 2 * MAX is moved in front of the loop in the compiled code:

set MAX 10
set :i 0
op mul *tmp1 2 MAX
jump 7 greaterThanEq 0 MAX
print *tmp1
op add :i :i 1
jump 4 lessThan :i MAX
end

A loop condition is processed as well as a loop body, and invariant code in nested loops is hoisted all the way to the top when possible:

param MAX = 10;
for j in 0 ... MAX do
    i = 0;
    while i < MAX + 10 do
        i = i + 1;
        print(i);
    end;
end;

compiles into

set MAX 10
set :j 0
op add *tmp1 MAX 10
jump 11 greaterThanEq 0 MAX
set :i 0
jump 9 greaterThanEq 0 *tmp1
op add :i :i 1
print :i
jump 6 lessThan :i *tmp1
op add :j :j 1
jump 4 lessThan :j MAX
end

On advanced optimization level, Loop Hoisting is capable of handling some if expressions as well:

#set optimization = advanced;
param MAX = 10;
for i in 1 ... MAX do
    k = (MAX % 2 == 0) ? "Even" : "Odd";
    print(i, k);
end;
print("end");

produces

set MAX 10
set :i 1
set *tmp3 "Odd"
op mod *tmp1 MAX 2
jump 6 notEqual *tmp1 0
set *tmp3 "Even"
jump 11 greaterThanEq 1 MAX
print :i
print *tmp3
op add :i :i 1
jump 7 lessThan :i MAX
print "end"

At this moment, the following limitations apply:

  • If the loop contains a stackless or recursive function call, global variables that might be modified by that function call are marked as loop dependent and expressions based on them aren't hoisted, since the compiler must assume the value of the global variable would potentially change inside these functions.
  • if expressions are hoisted only when part of simple expressions. Specifically, when the if expression is nested in a function call (such as print(A < 0 ? "positive" : "negative");), it won't be optimized.
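
The invariant detection itself can be sketched as follows. This is an illustrative Python sketch, not Mindcode's actual algorithm; a real implementation iterates to a fixed point so that hoisted results can enable further hoisting:

```python
# Illustrative sketch of loop-invariant detection; not Mindcode's actual
# algorithm. An instruction is invariant when none of its inputs is
# written anywhere inside the loop body.
def hoist_invariants(loop_body):
    """loop_body: list of (outputs, inputs, text) triples."""
    written = {v for outputs, _, _ in loop_body for v in outputs}
    hoisted, remaining = [], []
    for outputs, inputs, text in loop_body:
        if any(v in written for v in inputs):
            remaining.append(text)   # depends on a loop-varying value
        else:
            hoisted.append(text)     # safe to execute before the loop
    return hoisted, remaining
```

For the first example above, `op mul *tmp1 2 MAX` is classified as invariant (MAX is never written in the loop), while the print and the increment of :i remain in the loop body.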

Loop Optimization

The loop optimization improves loops with the condition at the beginning by performing these modifications:

  • If the loop jump condition is invertible, the unconditional jump at the end of the loop back to the loop condition is replaced by a conditional jump with the inverted loop condition targeting the first instruction of the loop body. This doesn't change the number of instructions, but saves the execution of one instruction per loop iteration.
    • If the loop condition isn't invertible (that is, the jump condition is ===), the optimization isn't done, since the saved jump would be spent on inverting the condition, and the code size would increase for no benefit at all.
  • If the previous optimization was done and the loop condition is known to be true before the first iteration of the loop, the optimizer removes the jump at the front of the loop. The Loop Optimizer utilizes information gathered by Data Flow Optimization to evaluate the initial loop condition.
  • Loop conditions that are complex expressions spanning several instructions can still be replicated at the end of the loop, if the code generation goal is set to speed (the default setting at the moment). As a result, the code size might actually increase after performing this kind of optimization. See optimization for speed for details on performing these optimizations.

The result of the first two optimizations in the list can be seen here:

param LIMIT = 10;
for i in 0 ... LIMIT do
    cell1[i] = 1;
end;
print("Done.");

produces

set LIMIT 10
set :i 0
jump 6 greaterThanEq 0 LIMIT
write 1 cell1 :i
op add :i :i 1
jump 3 lessThan :i LIMIT
print "Done."
end

Executing the entire loop (including the i variable initialization) takes 32 steps. Without optimization, the loop would require 43 steps. That's quite a significant difference, especially for tight loops.

The third modification is demonstrated here:

while switch1.enabled and switch2.enabled do
    print("Doing something.");
end;
print("A switch has been reset.");

which produces:

sensor *tmp0 switch1 @enabled
sensor *tmp1 switch2 @enabled
op land *tmp2 *tmp0 *tmp1
jump 9 equal *tmp2 false
print "Doing something."
sensor *tmp0 switch1 @enabled
sensor *tmp1 switch2 @enabled
op land *tmp2 *tmp0 *tmp1
jump 4 notEqual *tmp2 false
print "A switch has been reset."
end

Loop Unrolling

Loop unrolling is a speed optimization, and as such is only active when the goal option is set to speed or auto. Furthermore, loop unrolling depends on the Data Flow optimization and isn't functional when Data Flow Optimization is not active.

The fundamentals of loop unrolling

Loop unrolling optimization works by replacing loops whose number of iterations can be determined by the compiler with a linear sequence of instructions. This speeds up program execution, since the jump instruction representing the loop's terminating condition, and oftentimes also the instruction(s) updating the loop control variable, can be removed from the unrolled loop; only the instructions actually performing the intended work of the loop remain. The optimization is most efficient on loops that are very "tight" - loops that contain very few instructions apart from the loop bookkeeping itself. The most dramatic practical example is probably something like this:

#set loop-unrolling = none;
for i in 0 ... 10 do
    cell1[i] = 0;
end;

This code clears the first ten slots of a memory cell. Without loop unrolling, the code would look like this:

set :i 0
write 0 cell1 :i
op add :i :i 1
jump 1 lessThan :i 10

It takes 31 instruction executions to perform this code. With loop unrolling, though, the code is changed to this:

write 0 cell1 0
write 0 cell1 1
write 0 cell1 2
write 0 cell1 3
write 0 cell1 4
write 0 cell1 5
write 0 cell1 6
write 0 cell1 7
write 0 cell1 8
write 0 cell1 9

The size of the loop is now 10 instructions instead of 4, but it takes just these 10 instruction executions to complete, instead of 31 in the previous case - three times as fast!
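
The arithmetic behind this speedup can be checked with a few lines of Python (a plain instruction count, not actual compiler logic):

```python
# Rolled loop: one "set :i 0", then 10 iterations of
# write / op add / jump (3 executed instructions per iteration).
ITERATIONS = 10
rolled = 1 + ITERATIONS * 3   # 31 instruction executions

# The unrolled loop executes exactly one write per memory slot.
unrolled = ITERATIONS         # 10 instruction executions

print(rolled, unrolled)       # 31 10 - roughly a threefold speedup
```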

The price for this speedup is the increased number of instructions. Since there's a hard limit of 1000 instructions in a Mindustry Logic program, loops with a large number of iterations obviously cannot be unrolled. See speed optimization for an explanation of how Mindcode decides whether to unroll a loop.

Apart from removing the superfluous instructions, loop unrolling also replaces variables with constant values. This can create further optimization opportunities, especially for the Data Flow Optimizer and possibly for others. A not particularly practical, but nonetheless striking example is this program, which computes the sum of numbers from 0 to 100:

#set optimization = advanced;
sum = 0;
for i in 0 .. 100 do
    sum += i;
end;
print(sum);

which compiles to:

print 5050
end

What happened here is this: the loop was unrolled to individual instructions in this basic form:

set :sum 0
set :i 0
op add :sum :sum :i
op add :i :i 1 
op add :sum :sum :i
op add :i :i 1 
...

Data Flow Optimization then evaluated all those expressions using the combination of constant propagation and constant folding, all the way into the final sum of 5050, which is then directly printed.
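
The folding performed by the Data Flow Optimization is equivalent to evaluating the unrolled sequence at compile time; a minimal Python sketch of that evaluation:

```python
# Emulate the unrolled instruction sequence shown above:
# "op add :sum :sum :i" followed by "op add :i :i 1", 101 times.
sum_, i = 0, 0
for _ in range(101):   # "for i in 0 .. 100" iterates 101 times
    sum_ += i          # op add :sum :sum :i
    i += 1             # op add :i :i 1
print(sum_)            # 5050 - the value the optimizer prints directly
```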

Loop unrolling preconditions

A list iteration loop can always be unrolled, if there's enough instruction space left.

For other loops, unrolling can generally be performed when Mindcode can determine that the loop has a fixed number of iterations and can infer other properties of the loop, such as the variable that controls the iterations. A loop is eligible for unrolling when the following conditions are met:

  • The loop is controlled by a single, local or main variable: the loop condition must consist of a variable which is modified inside a loop, and a constant or an effectively constant variable. Loops based on global variables cannot be unrolled.
  • The loop control variable is modified inside the loop only by op instructions which have the loop control variable as a first argument and a constant or an effectively constant variable as a second argument; the op instructions must be deterministic. Any other instruction that sets the value of the loop control variable precludes loop unrolling.
  • All modifications of the loop control variable happen directly in the loop body: the variable must not be modified in a nested loop or in an if statement, for example.
  • The loop has a nonzero number of iterations. The upper limit of the number of iterations depends on available instruction space, but generally can never exceed 1000 iterations.

Furthermore:

  • If the optimization level is basic, the loop control variable can only be modified by add and sub operations. Every value the loop control variable attains in individual iterations must be an integer (meaning that the starting value, the ending value and every change of the iteration variable must be integers as well). The loop condition must be expressed using one of these operators: >, <, >= or <=. In this mode, the total number of iterations is computed from the starting and ending values of the variable and the change in each iteration (resulting in a fast computation).
  • If the optimization level is advanced, every deterministic update of the loop control variable by a constant value and every form of loop condition is allowed. In this case, Mindcode determines the total number of iterations by emulating the entire execution of the loop, until the loop exits or the maximum number of iterations allowed by the available instruction space is reached (in which case the loop cannot be unrolled).
  • break and continue statements are supported. However, when a break statement is used, the entire loop is still unrolled. Possible superfluous code after a break statement might be removed by other optimizers.
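
The iteration counting on the advanced level can be pictured as a small emulator that steps the loop control variable until the condition fails or a cap is hit. The following is only an illustrative sketch - the function and its parameters are hypothetical, not Mindcode's actual implementation:

```python
def count_iterations(start, cond, step, limit=1000):
    """Emulate a loop to determine its iteration count; return None
    when the cap is exceeded (the loop then cannot be unrolled)."""
    value, count = start, 0
    while cond(value):
        count += 1
        if count > limit:
            return None
        value = step(value)
    return count

# The "i <<= 1" loop from the advanced-level examples below:
# for i = 1; i < 100000; i <<= 1
print(count_iterations(1, lambda i: i < 100000, lambda i: i << 1))
```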

Examples of loops that can be unrolled on basic optimization level:

// Basic case
for i in 0 .. 10 do
    cell1[i] = cell1[i + 1];
end;

// Two separate increments of the loop control variable
j = 0;
while j < 20 do
    println(j);
    j += 1;
    println(j);
    j += 2;
end;

// The loop control variable can be used in further expressions
for k in 0 ... 10 do
    cell1[k] = cell1[2 * k];
end;

for l in 0 ... 10 do
    println(l % 2 ? "Odd" : "Even");
end;

// Loop inside an inline function: can be unrolled if a constant value is passed into the size parameter
inline def copy(src, dst, size)
    for i in 0 ... size do
        dst[i] = src[i];
    end;
end;

// This will produce a loop that can be unrolled
copy(cell1, cell2, 10);

// This produces a loop that CANNOT be unrolled: SIZE is not a constant value
param SIZE = 10;
copy(cell1, cell2, SIZE);

// Some loops containing expressions in the condition can still be unrolled,
// but it strongly depends on the structure of the expression
i = 0;
while (i += 1) < 10 do
    print(i);
end;

Examples of loops that can be unrolled on advanced optimization level:

// An operation different from add and sub is supported
for i = 1; i < 100000; i <<= 1 do
    println(i);
end;

// This loop can be unrolled, because it terminates
// (if the condition was j > 0, it would never terminate)
j = 10;
while j > 1 do
    j += 1;
    println(j);
    j \= 2;
    println(j);
end;

// This loop is unrolled, but the number of iterations is 11!
// The code produces the same output as if it wasn't unrolled.
// This is because of rounding errors when evaluating floating-point expressions 
for k = 0; k < 1; k += 0.1 do
    println(k);
end;
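
The eleven iterations can be reproduced in any language using IEEE 754 doubles; here in Python:

```python
# Ten additions of 0.1 yield 0.9999999999999999 rather than 1.0,
# so the condition k < 1 holds for one extra iteration.
k, count = 0.0, 0
while k < 1:
    count += 1
    k += 0.1
print(count)   # 11
```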

Examples of loops that cannot be unrolled:

// LIMIT is a program parameter and as such the value assigned to it isn't considered constant
param LIMIT = 10;
for i in 0 ... LIMIT do
    cell1[i] = 0;
end;

// The loop control variable is changed inside an if
i = 0;
while i < 10 do
    i += 1;
    print(i);
    if i % 5 == 0 then
        i += 1;
        print("Five");
    end;
end;

// There isn't a single loop control variable - the loop condition depends on both i and j
for i = 0, j = 10; i < j; i += 1, j -= 1 do
    print(i);
end;

// The expression changing the loop control variable is too complex.
// (Rewriting the assignment to i *= 2; i += 1; would allow unrolling) 
i = 0;
while i < 1000 do
  i = 2 * i + 1;
  print(i);
end;

// This loop won't be unrolled. We know it ends after 5 iterations due to the break statement,
// but Mindcode assumes 2000 iterations, always reaching the instruction limit.  
for i in 0 ... 2000 do
    if i > 5 then break; end;
    print(i);
end;

Unrolling nested loops

Nested loops can also be unrolled, and the optimizer prefers unrolling the inner loop:

k = 0;
for i in 0 ... 100 do
    for j in 0 ... 100 do
        k = k + rand(j); // Prevents collapsing the loop by Data Flow Optimization 
    end;
end;

Both loops are eligible for unrolling at the beginning, and the inner one is chosen. After that the outer loop can no longer be unrolled, because the instruction limit would be hit.

Sometimes unrolling an outer loop can make the inner loop eligible for unrolling too. In this case, the inner loop cannot be unrolled first, as its number of iterations is not constant:

#set optimization = advanced;
first = true;
for i in 1 .. 5 do
    for j in i .. 5 do
        if first then
            first = false;
        else
            print(" ");
        end;
        print(10 * i + j);
    end;
end;

In this example, the outer loop is unrolled first, after which each copy of the inner loop can be unrolled independently. Further optimizations (including Print Merging) then compact the entire computation into a single print instruction:

print "11 12 13 14 15 22 23 24 25 33 34 35 44 45 55"
end

List iteration loop with modifications

For list iteration loops with modifications, the output loop control variable is completely replaced with the variable assigned to it for the iteration. This helps in some more complicated cases where the Data Flow Optimization alone wasn't able to perform the substitution on its own.

Function Inlining

Function Inlining is a speed optimization, and as such is only active when the goal option is set to speed or auto.

The fundamentals of function inlining

Function inlining converts out-of-line function calls into inline function calls. This conversion alone saves a few instructions: storing the return address, jumping to the function body and jumping back at the original address. However, many additional optimizations might be available once a function is inlined, especially if the inlined function call is using constant argument values. In such a situation many other powerful optimizations, such as constant folding or loop unrolling, may become available.

A user-defined, non-recursive function which is called just once in the entire program is automatically inlined, and this cannot be prevented: such code is always both faster and smaller. It is also possible to declare individual functions using the inline keyword, forcing all calls to such functions to be inlined.

Automatic function inlining

This optimization can inline additional functions that aren't recursive and also aren't declared inline in the source code. If there's enough instruction space, all function calls may be inlined and the original function body removed from the program.

When the optimization level is set to advanced and there isn't enough instruction space, just one or a few specific function calls may be inlined; in such a case, the original function body remains in the program and is used by the function calls that weren't inlined. If only the last two function calls remain, either both of them or none of them will be inlined.

It is therefore no longer necessary to use the inline keyword, except in cases when Mindcode's automatic inlining chooses a function different from the one(s) you prefer to be inlined.

Case Switching

Case Switching is a speed optimization, and as such is only active when the goal option is set to speed or auto.

Case expressions are normally compiled into a sequence of conditional jumps: for each when branch, the entry condition(s) of that clause are evaluated; when they are false, control is transferred to the next when branch, and eventually to the else branch or the end of the expression. This means the case expression evaluates - on average - half of all existing conditions, assuming an even distribution of the input values of the case expression. (If some input values of the case expression are more frequent, it is possible to achieve better average execution times by placing those values first.)

The Case Switching optimization improves case expressions which branch on integer values of the expression.

Warning

It is assumed that a case expression branching exclusively on integer values always gets an integer input value as well. If the input value of the case expression may take on non-integer values, this optimization will produce wrong code. At this moment, Mindcode isn't able to recognize such a situation; if this is the case, you need to disable the Case Switching optimization manually.

The sequence of conditional jumps in such statements is replaced by a jump table. A jump table facilitates direct jumps to the corresponding when branch. The actual instructions used to build a jump table are

jump <else branch address> lessThan value minimal_when_value
jump <else branch address> greaterThan value maximal_when_value
op add @counter <start_of_jump_table - minimal_when_value> value
start_of_jump_table:
jump <when branch for minimal when value address>
jump <when branch for minimal when value + 1 address>
jump <when branch for minimal when value + 2 address>
...
jump <when branch for maximal when value address>

The jump table is put in front of the when branches. Original conditions in front of each processed when branch are removed. Each when branch jumps to the end of the case expression as usual.

To build the jump table, the minimum and maximum value of existing when branches are determined first. Values outside this range are handled by the else branch (if there isn't an explicit else branch in the case statement, the else branch just jumps to the end of the case expression). Values inside this range are mapped to a particular when branch, or, if the value doesn't correspond to any of the when branches, to the else branch.

The first two instructions in the example above (jump lessThan, jump greaterThan) handle the cases where the input value lies outside the range supported by the jump table. The op add @counter instruction then transfers the control to the corresponding specific jump in the jump table and consequently to the proper when branch.

The jump table executes at most four instructions on each case expression execution (fewer if the input value lies outside the supported range). We've mentioned above that the original case statement executes half of its conditional jumps on average. This means that converting the case expression to a jump table only makes sense when there are more than 8 conditional jumps in the case expression.
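
The break-even point follows directly from comparing the two average costs; a sketch of the arithmetic:

```python
JUMP_TABLE_COST = 4   # two range-check jumps + op add @counter + table jump

def avg_linear(n):
    # a chain of n conditional jumps evaluates about n / 2 of them
    # on average, assuming evenly distributed input values
    return n / 2

# smallest number of conditional jumps where the jump table wins
break_even = next(n for n in range(1, 100) if avg_linear(n) > JUMP_TABLE_COST)
print(break_even)   # 9, i.e. more than 8 conditional jumps
```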

Notes:

  • If you put the more frequent values first in the case expression, and the value distribution is very skewed, converting the case expression to a jump table might actually worsen the average execution time. Mindcode has no way to figure this out on its own; if you encounter this situation, you might need to disable the Case Switching optimization for your program.
  • If the input value of the case expression could be determined by Mindcode to already lie in a specific range, it would be possible to avoid the jump lessThan and/or jump greaterThan instructions at the beginning of the jump table, as their function is to ensure the value lies in a proper range. This would provide a potentially significant additional speedup and is planned for some future version.
  • Currently, there's no limit on the size of the jump table. For a case expression handling values 1 to 10 and then a value of 100, the jump table would have 100 entries. This affects the computation of the optimization's benefit and might make the optimization less favorable compared to other optimizations; however, if the available space permits it, such a jump table will be created.

Preconditions

The following conditions must be met for a case expression to be processed by this optimization:

  • All values used in when clauses must be effectively constant.
  • All values used in when clauses must be integers.
  • Values used in when clauses must be unique.
  • There are no ranges in the when clauses.

Example

value = floor(rand(20));
text = case value
    when 0 then "None";
    when 1 then "One";
    when 2, 3, 4 then "A few";
    when 5, 6, 7, 8 then "Many";
    when 10 then "Ten";
    else "I don't known this number!";
end;
print(text);

The above case expression is transformed to this:

op rand *tmp0 20 0
op floor :value *tmp0 0
jump 26 lessThan :value 0
jump 26 greaterThan :value 10
op add @counter :value 5
jump 16 always 0 0
jump 18 always 0 0
jump 20 always 0 0
jump 20 always 0 0
jump 20 always 0 0
jump 22 always 0 0
jump 22 always 0 0
jump 22 always 0 0
jump 22 always 0 0
jump 26 always 0 0
jump 24 always 0 0
set *tmp2 "None"
jump 27 always 0 0
set *tmp2 "One"
jump 27 always 0 0
set *tmp2 "A few"
jump 27 always 0 0
set *tmp2 "Many"
jump 27 always 0 0
set *tmp2 "Ten"
jump 27 always 0 0
set *tmp2 "I don't known this number!"
set :text *tmp2
print *tmp2
end
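
The compiled jump table behaves like a bounded array lookup. The following Python emulation mirrors the dispatch above (a sketch of the semantics, not of the generated code; the else string is shortened here):

```python
# Map each input value 0..10 to its when branch; value 9 has no
# branch of its own and falls through to the else branch.
branches = {0: "None", 1: "One",
            2: "A few", 3: "A few", 4: "A few",
            5: "Many", 6: "Many", 7: "Many", 8: "Many",
            10: "Ten"}
ELSE = "else branch"

def dispatch(value):
    if value < 0 or value > 10:        # jump lessThan / jump greaterThan
        return ELSE
    return branches.get(value, ELSE)   # op add @counter + jump table

print(dispatch(3), dispatch(9), dispatch(42))
```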

Return Optimization

Return Optimization is a speed optimization, and as such is only active when the goal option is set to speed or auto.

The Return Optimization is very simple. Whenever there's an unconditional jump to the final sequence of instructions representing a return from a call (which is always three instructions long), the jump is replaced by the entire return sequence. The jump execution is avoided at the price of two additional instructions.

The impact of this optimization is probably marginal. Recursive functions are of limited use by themselves, and this optimization only applies in rather specific contexts.

Jump Straightening

This optimization detects situations where a conditional jump skips a following unconditional one, and replaces the pair with a single conditional jump that has a reversed condition and the target of the second jump. Example:

jump *label0 equal *tmp9 false
jump *label1
label *label0

will be turned into

jump *label1 notEqual *tmp9 false

The optimization isn't performed if the condition doesn't have an inverse (strictEqual).
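
The rewrite relies on a table of inverse conditions. A sketch of that table (condition names as used in mlog; strictEqual deliberately has no entry):

```python
# Inverse mlog jump conditions; strictEqual has no mlog inverse,
# so jumps using it cannot be straightened.
INVERSE = {
    "equal": "notEqual", "notEqual": "equal",
    "lessThan": "greaterThanEq", "greaterThanEq": "lessThan",
    "greaterThan": "lessThanEq", "lessThanEq": "greaterThan",
}

def invert(condition):
    return INVERSE.get(condition)   # None means: leave the jumps alone

print(invert("equal"), invert("strictEqual"))
```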

These sequences of instructions may arise when using the break or continue statements:

while true do
    ...
    if not_alive then
        break;
    end;
end;

Jump Threading

If a jump (conditional or unconditional) targets an unconditional jump, the target of the first jump is redirected to the target of the second jump, repeated until the end of jump chain is reached. Moreover:

  • on the advanced level, the end instruction is handled identically to jump 0 always,
  • conditional jumps in the jump chain are followed if:
    • their condition is identical to the condition of the first jump, and
    • the condition arguments do not contain a volatile variable (@time, @tick, @counter etc.).
  • unconditional jumps targeting an indirect jump (a set @counter instruction, internally represented by a virtual goto instruction) are replaced with the indirect jump itself.

No instructions are removed or added, but the execution of the code is faster.
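
The chain following can be sketched as a target-resolution loop with a guard against jump cycles (a hypothetical data representation, not Mindcode's internals):

```python
def resolve(target, unconditional):
    """Follow a chain of unconditional jumps to its final target.
    `unconditional` maps an instruction address to the address its
    unconditional jump points at."""
    seen = set()
    while target in unconditional and target not in seen:
        seen.add(target)   # guard against infinite jump loops
        target = unconditional[target]
    return target

# a jump targeting address 2, which jumps to 5, which jumps to 9:
print(resolve(2, {2: 5, 5: 9}))   # 9
```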

Unreachable Code Elimination

This optimizer removes instructions that are unreachable. There are several ways unreachable instructions might appear:

  1. Jump Threading can create unreachable jumps that are no longer targeted.
  2. User-created unreachable regions, such as while false ... end, or code following a while true loop.
  3. User defined functions which are called from an unreachable region.

Instruction removal is done by analyzing the control flow of the program and removing instructions that are never executed. When Jump Normalization is not active, some sections of unreachable code may not be recognized.

The end instruction, even when unreachable, is not removed unless the optimization level is advanced. The main program body and function definitions are each terminated by an end instruction, and removing it might make the produced code somewhat less readable.

Stack Optimization

Optimizes stack usage by eliminating push/pop instruction pairs determined to be unnecessary. The following optimizations are performed:

  • push/pop instruction elimination for function variables that are not used anywhere apart from the push/pop instructions. This happens when variables are eliminated by other optimizations. The optimization is done globally, in a single pass across the entire program.
  • push/pop instruction elimination for function variables that are read neither by any instruction between the call instruction and the end of the function, nor by any instruction which is part of the same loop as the call instruction. Implicit reads by recursive calls to the same function with the value of the parameter unchanged are also detected.
  • push/pop instruction elimination for function parameters/variables that are never modified within the function.
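
The first rule above can be sketched as a single pass that collects every variable referenced outside push/pop instructions and drops the push/pop pairs of all others (a simplified instruction model, not Mindcode's internals):

```python
def strip_unused_push_pop(program):
    """Drop push/pop instructions whose variable appears in no other
    instruction of the program (one global pass)."""
    used = {arg for op, *args in program if op not in ("push", "pop")
            for arg in args}
    return [ins for ins in program
            if ins[0] not in ("push", "pop") or ins[1] in used]

program = [("push", ":a"), ("push", ":b"),
           ("call", "foo"), ("set", ":x", ":b"),
           ("pop", ":b"), ("pop", ":a")]
print(strip_unused_push_pop(program))   # :a's push/pop pair is gone
```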

Print Merging

This optimization merges together print instructions with string literal arguments. The print instructions get merged even if they aren't consecutive, as long as there are no instructions between them that could break the print sequence (a jump, a label, or a print of a variable).

Effectively, this optimization eliminates a print instruction by turning this

println("Items: ", items);
println("Time: ", @time);

into this:

print("Items: ", items);
print("\nTime: ", @time, "\n");

On advanced level, all constant values - not just string constants - are merged. For example:

#set optimization = advanced;
const MAX_VALUE = 10;
print($"Step $i of $MAX_VALUE\n");

produces

print "Step "
print i
print " of 10\n"

On basic optimization level, the output would be:

print "Step "
print i
print " of "
print 10
print "\n"
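
The merging pass for adjacent string literals, including the basic-level length limit, can be sketched like this (a simplified model; Mindcode's real pass also handles non-adjacent prints):

```python
def merge_prints(instructions, limit=34):
    """Merge adjacent `print "literal"` instructions while the merged
    string stays within the basic-level 34-character limit."""
    result = []
    for op, arg in instructions:
        if (op == "print" and arg.startswith('"') and result
                and result[-1][0] == "print"
                and result[-1][1].startswith('"')):
            merged = result[-1][1][:-1] + arg[1:]   # join the two literals
            if len(merged) - 2 <= limit:            # don't count the quotes
                result[-1] = ("print", merged)
                continue
        result.append((op, arg))
    return result

print(merge_prints([("print", '"Step "'), ("print", '"1"'),
                    ("print", '" of "'), ("print", '"10"')]))
```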

The format instruction

The format instruction included in Mindustry Logic 8 allows more efficient optimizations to be done. For example, println($"Minimum: $min, middle: $mid, maximum: $max"); without using the format instruction gets compiled to this:

print "Minimum: "
print :min
print ", middle: "
print :mid
print ", maximum: "
print :max
print "\n"
end

The optimization utilizing format saves three instructions by producing

print "Minimum: {0}, middle: {0}, maximum: {0}\n"
format :min
format :mid
format :max

The format instruction is used in optimization when these conditions are met:

  • The language target supports the format instruction (8A or later)
  • The compiled code entering this optimization contains neither a string literal containing a {0} placeholder, nor any other substrings that could produce the {0} in a text buffer (for example, print("{{1}}"); format("0"); produces {0} in the text buffer and disables this optimization).

If the {0} placeholder is avoided, the formatting mechanism can be used freely in the code without any limitations and the print merging optimization with the format instruction can still happen. Just use placeholders starting at {1}:

#set target = 8;
param a = 10;               // prevent a from being propagated as a constant
println("{2} {1}");         // if you use "{0} {1}" instead - different optimization will happen 
format("Before");
println($"Value: $a");
format("After");

This program will compile to

set a 10
print "{2} {1}\n"
format "Before"
print "Value: {0}\n"
format a
format "After"
end

and will output

After Before
Value: 10
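
This behavior matches the following model of the format instruction: it replaces the first occurrence of the lowest-numbered placeholder currently in the text buffer. A Python emulation of the example above (our reading of the Mindustry Logic 8 semantics, not official code):

```python
import re

def mlog_format(buffer, value):
    """Replace the first occurrence of the lowest-numbered {N}
    placeholder in the text buffer."""
    numbers = sorted(int(n) for n in re.findall(r"\{(\d)\}", buffer))
    if not numbers:
        return buffer
    return buffer.replace("{%d}" % numbers[0], str(value), 1)

buffer = "{2} {1}\n"                     # print "{2} {1}\n"
buffer = mlog_format(buffer, "Before")   # format "Before" fills {1}
buffer += "Value: {0}\n"                 # print "Value: {0}\n"
buffer = mlog_format(buffer, 10)         # format a fills {0}
buffer = mlog_format(buffer, "After")    # format "After" fills {2}
print(buffer, end="")
```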

String length limit

On the basic level, the optimization won't merge print instructions if the merge would produce a string longer than 34 characters (36 when counting the double quotes). On the advanced level, such instructions are merged regardless. This can create long string constants, but according to our tests these can be pasted into Mindustry processors even if they're longer than what the Mindustry GUI allows you to enter.

Remarks

The print merging optimization will also merge print instructions generated by the remark() function. Instructions generated by remarks are merged differently from standard print instructions:

  • print instructions generated from different remark() function calls are never merged. Only instructions generated from a single remark() are merged together.
  • All constant values are merged together regardless of the resulting string length, even on the basic optimization level.

If the print merging optimization is not active, instructions from remark() functions aren't merged.

Optimization for speed

In some cases, it is possible to rearrange the code in such a way that, even though it consists of more instructions, it actually executes faster. An example of this kind of optimization is function call inlining: instead of having a single copy of the function called from various places, the entire code of the function can be replicated at each place it is called from. The speedup stems from the fact that passing arguments to the function, storing the return address and later returning from the function can all be avoided this way, while the code size increase is a consequence of having two or more copies of the function instead of just one.

Mindustry processors limit the total number of instructions in a program to a thousand. If the compiled code exceeds this limit, it is unusable; but if the compiled code is smaller than the limit, the unused instruction space brings no benefit at all. The best approach, therefore, is to use as much of the available instruction space as possible, without exceeding the limit, to generate faster code where possible.

Mindcode provides the goal option to specify whether it should generate code that is smaller (the goal is size) or faster (speed). There's a third option, auto, which is the default for the compiler and currently is identical to the speed option. When the goal is set to size, only optimizations with zero or negative cost are performed, and no statistics about speed optimizations are displayed.

Another setting affecting speed optimizations is the instruction-limit option, which allows changing the instruction limit considered by the optimizer.

When generating the code, there might be several opportunities for speed optimizations, and applying all of them could exceed the maximal number of instructions. Mindcode uses the following process to choose the optimizations that will be realized:

  • At a certain point in the optimization process, all opportunities for speed optimizations are analyzed, and those that can be realized in the remaining instruction space are gathered.
  • For each such possible optimization, the following quantities are computed:
    • cost: the number of additional instructions that will be created by the optimization,
    • benefit: a measure of execution improvement achieved by the optimization,
    • efficiency: computed by dividing the benefit by the cost, obtaining a measure of program improvement per additional instruction of code.
  • The optimization with the highest efficiency is applied. If several optimizations have the same efficiency, the one with the smallest cost is applied.
  • The entire process is repeated from start until there are no optimization possibilities left, or none of the optimizations is permissible given available instruction space.

In short, as many optimizations as possible are applied, in order from the one giving the best return to the one giving the worst. Different strategies for choosing optimizations might be possible (for example, instead of realizing one large optimization, it might be better to apply two smaller ones), but at this point such alternative arrangements aren't considered.
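
The selection loop can be sketched as follows (hypothetical candidate records; Mindcode's actual bookkeeping is more involved):

```python
def select_optimizations(candidates, free_space):
    """Greedily apply the optimization with the highest benefit/cost
    ratio that still fits; ties are broken by the lower cost."""
    applied = []
    while True:
        viable = [c for c in candidates
                  if c not in applied and c["cost"] <= free_space]
        if not viable:
            return applied
        best = max(viable,
                   key=lambda c: (c["benefit"] / c["cost"], -c["cost"]))
        applied.append(best)
        free_space -= best["cost"]

picks = select_optimizations(
    [{"name": "unroll inner loop", "cost": 40, "benefit": 120},
     {"name": "inline function",   "cost": 20, "benefit": 50},
     {"name": "unroll outer loop", "cost": 900, "benefit": 1000}],
    free_space=100)
print([p["name"] for p in picks])
```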

Oftentimes, applying an optimization might open up opportunities for further optimizations (in some cases it might be possible to unroll an inner loop after unrolling the outer loop, for example). Since all remaining optimizations are considered again after applying the most efficient one, Mindcode will automatically discover and process these additional optimizations, in the order described above.

It's quite obvious that calculating the benefit is a key part of the optimization evaluation. Avoiding an instruction that is executed a lot provides more benefit than avoiding one that is executed just once (in Mindustry, each instruction takes the same time to execute, which simplifies things greatly). Mindcode therefore assigns each instruction a weight, a measure of how often the instruction is expected to be executed. In general, it is impossible to compute this number precisely. Mindcode uses a very simple algorithm to determine instruction weights:

  • At the beginning of code generation, current weight is established:
    • 1 for the main program,
    • number of places a function is called from for out-of-line or recursive functions.
  • Each generated instruction is assigned the current weight.
  • When entering a branch of an if statement, the current weight is halved. It is restored when exiting the branch. This corresponds to a very simplistic expectation that the condition will be evaluated to true 50% of the time, and therefore a branch in the if statement will be executed only half as often as the surrounding code.
  • When entering a when branch of a case expression, the weight is divided by the number of branches in the expression. The reasoning (and inaccuracy) is analogous to the if statement approximation described above.
  • When entering a loop body, the weight is multiplied by 25. The implicit expectation is that a loop will iterate 25 times on average (which is certainly wrong). Even when the number of iterations is known at compilation time, the weight adjustment is the same for all loops. This prevents Mindcode from preferring optimizations of loops with a large, known number of iterations (such as setting all values of a memory bank to 0) over other loops whose number of iterations cannot be known.
  • Weight of stackless and recursive functions is adjusted:
    • Stackless function weights are iteratively recomputed as a total weight of the respective CALL instructions of the given function.
    • Recursive function weights are then computed as a total weight of the respective CALLREC instructions of the given function. No iterative updating is made for recursive functions.
  • When instructions are duplicated or moved during optimizations, their weights are adjusted according to the context into which they were duplicated or moved.

The benefit of an optimization is then computed as the sum of weights of instructions that would be avoided thanks to the optimization. The net result is that Mindcode strongly prefers optimizing code inside loops, and defers optimizations inside the branches of if and case statements.
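
The weight rules can be illustrated with plain arithmetic (the factor 25 and the halving are taken from the description above):

```python
LOOP_FACTOR = 25          # assumed average number of loop iterations

weight = 1.0              # main program body
weight *= LOOP_FACTOR     # entering a loop body
in_loop = weight          # 25.0

weight /= 2               # entering an if branch inside the loop
in_if_branch = weight     # 12.5

weight *= 2               # leaving the if branch restores the weight
weight *= LOOP_FACTOR     # entering a nested loop
in_nested_loop = weight   # 625.0

print(in_loop, in_if_branch, in_nested_loop)
```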


« Previous: Advanced features   |   Up: Contents   |   Next: Function reference for Mindustry Logic 6 »