Write assembly language applications for the EVM (and EVM-enabled blockchains) using Javascript!
evmscript
is a method of writing "bare-metal" assembly applications for the Ethereum Virtual Machine (EVM) using Javascript. evmscript
takes advantage of the Javascript interpreter so that you can write commented, multi-line assembly without a hex editor, with all the preprocessing and helper functions you could ever ask for.
$("deployable", true)
const TIMES = 5;
push(0) // start index at 0
mainloop =
// ... do something here, TIMES times ...
// Now check counter
push(1) // Add 1 to the index
add()
dup1() // save a copy for the next operation
push(TIMES)
gt() // TIMES > index ? push(1); push(0)
jumpi(mainloop)
stop()
Or this:
$("deployable", true)
// What is the meaning of life, the universe, and everything,
// and the return value of this code?
;[$value] = push(1)
;[$counter] = push(1)
loop =
add($value, mul(sub($counter, 1), $counter))
set($value)
set($counter, add($counter, 1))
jumpi(loop, lt($counter, 6))
set($value, add($value, 1))
allocStack($value)
ret()
It all started because I wanted a "jump map" (at least, that's what I'm calling it), which can best be thought of as an array of jump positions that can be referenced by their index. For gas effeciency reasons, I needed each lookup/jump to have constant gas costs. Here's an example of a jump map, written in evmscript
of course:
// Imagine the following is loaded into memory at 0x80:
//
// 00A100B700CF00D100E9...
// ^ ^ ^ ^ ^--- 4th jump destination
// | | | -------- 3rd jump destination
// | | ------------ 2nd jump destination
// | ---------------- 1st jump destination
// -------------------- 0th jump destination
//
// Notice every jump destination is two bytes.
// With that in memory, you could then do the following:
push(4) // 1. Load first call data slot past solidity 4-byte hash.
calldataload() // This will load the index passed in. It's 2 bytes per
push(0x2) // index, so take that into account.
mul() //
push(0x80) // 2. Calculate the offset for the index based on start of the map.
add()
mload() // 3. Load 32 bytes from the index offset onto the stack
push(240) // and get rid of the 30 we don't need (bitwise shift).
shr() //
jump() // And 4. Viola! Jump to the desired location.
Here we jump to the first index found in the call data, but theoretically we could stick this in a loop and process many lookups/jumps.
Of course I didn't intend to create my own EVM scripting mechanism, but Yul, the most used EVM assembly language baked directly into Solidity, didn't work for me (described below), and there was no way I was going to write straight assembly. Ain't nobody got time for that, especially in a project where needs can change rapidly.
Yul didn't work specifically because of this quote (emphasis mine):
[...] Yul provides high-level constructs like for loops, if and switch statements and function calls. These should be sufficient for adequately representing the control flow for assembly programs. Therefore, no explicit statements for SWAP, DUP, JUMPDEST, JUMP and JUMPI are provided, because the first two obfuscate the data flow and the last two obfuscate control flow.
I disagree with the author on that last point, because (somewhat pedantically) control flow and data flow aren't actually obfuscated in assembly. It's just confusing! Assembly language is by definition confusing to read and understand, and anything we can do simplify things can be a huge boon to program authors. Yul chooses to simplify things via familiar if
, switch
, and loop statements, which is definitely helpful but can be limiting in some situations, as mentioned.
When looking for an alternative beyond Yul, what I quickly realized was that writing straight assembly with the addition of helpful preprocessing functions can make the task a lot easier. I originally set out to solve my needs for a jump map, and after piggy-backing on Javascript's interpreter for translation, the features just kept coming. evmscript
gives you:
- A generalized
push()
function, which choosed the correctPUSH*
instruction based on input - Named jump destinations, and
jump()
andjumpi()
functions that will calculate the correct jump destination based on the name passed in. As well, named code locations that can be pointed to and resolved during preprocessing (see our own deployer). - Helpful common routines like
alloc()
that loads arbitrarily-sized data into memory (andallocUnsafe()
, which can do it more cheaply but it comes with some safety tradeoffs). Bothalloc()
andallocUnsafe()
pair well withret()
andrevert()
, teeing up the stack to so you can use those functions directly afterward -- no futzing with pushing large pieces of data, data lengths, and byte shifting. Oh, and do you want to get out quick with no revert message? Usebail()
! - Preprocessing-specific functions that can help you craft the right data, such as
$concat()
which concatenates hex values;$bytelen()
which calculates the byte length of arbitrary input;$hex()
which can turn string literals into their hexadecimal representation; and$ptr()
which will give you the bytecode-location of any named code location. - Helpers like
$("deployable", true)
, which add the necessary bytecode to ensure your code creates a deployable contract. - Ability to define constants and variables within your code, as well as your own preprocessing functions to greatly reduce copy/paste.
- Wield the full power of Javascript and Node by using environment variables, importing external helper functions, and even using
require()
to import any libraries you might need.
It's just Javascript! We use the features provided by Node's vm
module -- effectively a dressed up eval()
-- to bootstrap your code with functions you need. These functions, like push()
, build your program's bytecode as they're executed, and give you the ability to do as much preprocessing as you need.
First install it:
$ npm install evmscript
Then, in your code:
import {preprocess, preprocessFile} from "evmscript";
// Preprocess a file
let bytecode = preprocessFile("./path/to/file.bytecode", {
// ... constants/functions to inject in your code
});
// Or, preprocess code directly
let bytecode = preprocess(code, {
// ... constants/functions to inject in your code
})
// Then use that bytecode in a transaction. This example uses
// the `ethers` package. We also assume your bytecode is set
// to deployable.
// Set up ethers
let provider = new ethers.providers.JsonRpcProvider();
let signer = provider.getSigner();
// Deploy the bytecode
let tx = await signer.sendTransaction({
data: bytecode
})
// Get a receipt so we can record the deployed contract address
let receipt = await provider.getTransactionReceipt(tx.hash);
// Now let's call our code using ethers.
// Here we make up an ABI that will create the call data our bytecode expects.
// We generally recommend you follow Solidity conventions with
// calldata since almost all contracts are written in Solidity.
let contract = new ethers.Contract(receipt.contractAddress, [
"function myFunc() public pure returns(uint)" // Note: This is dependent on your code
], signer);
let value:ethers.BigNumber = await contract.myFunc();
// Then do something with the value!
TODO. For now, see the tests.
Note for TODO: We should show here how to use Javascript niceties, higher level examples, environment variables, injected variables, etc., and not just function definitions.
TODO: Fill this out.
TODO: Fill this out.
Keep stack references within scope. Be careful of using stack references out of scope (see first point)
TODO: Fill this out.
The following lists the helper functions available for preprocessing. Each function contains a type, which is either an Action
or Expression
, and a definition that describes how it manipulates the stack. Here's how to read this:
Action
: Adds instructions to the code. Also can manipulate the stack, consuming stack data and/or pushing new values.Expression
: Does not add instructions to the code nor manipulate the stack. Used solely during preprocessing. Expression functions start with$
.
Read the stack definition like the following:
[consumed stack, ...], <function> => [items added to the stack, ...]
In these defintiions, the left most data in stack arrays []
is the stack top.
Pushes an arbitrary amount of data (1 to 32 bytes) to the stack. This function will automatically determine the correct PUSH*
instruction to use based on input data.
Definition: Action: [], push(input:HexableValue) => [input, ...]
Input can be of type number
, BigInt
, 0x-prefixed string
, or a code pointer.
push(5) // number
push(0x1) // hex number
push("0x10101010") // 0x-prefixed string
push(128n) // bigint
push($ptr("somelabel")) // Push a code pointer to "somelabel" (see jump())
// Will be translated to the correct code location
// during preprocessing.
This function is exactly like push()
, except that it enforces the byte length of the input and errors if the input does not equal the expected length.
Available values of X
range from push1()
to push32()
.
Definition: Action: [], push<X>(input:HexableValue) => [input, ...]
push4(0x01020304) // Valid!
push5(0x01020304) // Errors! Expects 5 bytes.
Accept named solidity types, like uint
and bytes
that represent the expected call data, and push the values and offsets to the stack such that the first item in the call data is at the top of the stack.
Definition: Action: [], pushCallDataOffsets(item1, ..., itemN) => [(offset1, length1)|value1, ..., (offsetN, lengthN)|valueN]
Currently supported types (more to come):
uint
bytes
For non-dynamic types, like uint
, this function pushes [value, ...]
to the stack, since there's no data offset or length necessary. For dynamic types like bytes
, this function pushes [offset, length, ...]
instead. Note that this function does not load the data of dynamic types - that's up to you and your application. As well, the offset returned for dynamic types is a calldata offset, and not a memory offset, so do not use ret()
or revert()
directly afterward.
pushCallDataOffsets("uint", "bytes")
// For the calldata representing (5, "0x123456"), you should expect the following stack:
//
// 0: 000000...000005 (top) // Value of 5
// 1: 000000...000064 // Calldata offset of bytes data
// 2: 000000...000003 // Length of bytes data
See the integration tests for an example.
Allocate a preprocessed value in memory using a series of push()
's and mstore()
's. input
can be of arbitrary length. Will push the memory offset and the input length to the stack for use after allocation.
Definition: Action: [], alloc(input:HexableValue) => [offset, length, ...]
// This example allocates an ABI-encoded string that'll be consumed
// by revert() as a revert reason string (triggered by 0x08c379a0).
// This is only an example - you don't need to do this yourself!
// See revert() definition.
let ERROR_MESSAGE = $hex("This is an error message!");
alloc(
$concat(
"0x08c379a0",
$pad(0x20, 32),
$pad($bytelen(ERROR_MESSAGE), 32, "right"),
ERROR_MESSAGE
)
)
revert()
This function does the exact same thing as alloc()
, except it does so by inserting the input into the resultant bytecode and then using CODECOPY
to get the data into memory. This is considered unsafe because the input to allocUnsafe()
will be seen by the EVM as runnable code. Although allocUnsafe()
will never run the input as code, it is technically possible to jump to any JUMPDEST
's contained within the input. allocUnsafe()
has cheaper gas costs than alloc()
. When in doubt, use alloc()
.
Definition: Action: [], alloc(input:HexableValue) => [offset, length, ...]
// This is an example deployment script using allocUnsafe().
// Assume CODE is the code being deployed.
assertNonPayable() // The deployment shouldn't be payable
allocUnsafe(CODE)
// RETURN has everything it needs due to stack output of allocUnsafe()
ret()
Consume and store the top N items from the stack into memory. This will push [offset, length, ...]
of the memory stored to the stack.
Defintion: Action: [val1, ..., valN, ...], allocStack(input:number) => [offset, length, ...]
// Example that returns 3 words: 00..03, 00..02, 00..01
push(1)
push(2)
push(3)
allocStack(3)
ret()
Jump to the code location defined by input
. If none is passed, this function will act as a normal JUMP
instruction.
The jump()
function can take arbitrary data as input, as well as code pointers. Use $ptr()
to jump to named code locations when the location variable hasn't yet been defined during execution of the preprocessor (e.g., when the code location is lower in the code than the jump instruction). You can use the variable itself after it has been defined. See somelabel
in the example below.
Definitions:
- Using code location:
Action: [], jump(input:HexableValue) => []
- Normal jump:
Action: [location, ...] jump() => []
// Jump using a pointer to "somelabel"
jump($ptr("somelabel"))
somelabel =
// ... some code ...
// Jump using variable reference
// Remember, this is Javascript. Variable has now been defined.
jump(somelabel)
// Normal jump
push(0x55)
jump() // Pulls 0x55 from stack
This works exactly like jump()
except that this function expects an added conditional value on the stack.
Definitions:
- Using code location:
Action: [conditional], jumpi(input:HexableValue) => []
- Normal jump:
Action: [location, conditional, ...] jumpi() => []
// Loop example:
push(0) // our index variable
mainloop =
// ... do something, then check counter
push(1) // add 1 to the index
add()
push(5) // push 5 for comparison
dup2() // copy the index
eq() // consume 5 index and copy; push 1 if equal, 0 if not
jumpi(mainloop) // jump to mainloop if top of the stack is 1
stop()
Evaluate the function signature passed within the call data (the first four bytes) and if that value matches a function within the record, jump to the pointer associated with with that function.
This is best shown by example, so see below. That said, solidityFunctionSignature
is the string representation of the function, and pointerObject
is the return value of $ptr()
.
Note that dispatch()
will not revert if the signature found in calldata doesn't match one provided.
Definition: Action: [], dispatch(mapping:Record<solidityFunctionSignature, pointerObject>) => []
dispatch({
"setOwner(address)": $ptr("setOwner"),
"getOwner()": $ptr("getOwner")
})
setOwner =
pushCallDataOffsets()
// do something...
getOwner =
// do something...
This function directly inserts an arbitrary value into the code. It does not check if the inserted value is valid code. This is used under the hood within allocUnsafe()
, and contains the same safety risks. Use with caution.
Definition: Action: [], insert(input:HexableValue) => []
push(0xAA)
insert("0x6001") // Insert 6001 directly into the code
push(0xBB)
// When run, the above code results in the following bytecode:
//
// 0x60AA600160BB
// ^^^^ -> Directly inserted
// ^^^^ ^^^^ -> Filler for example purposes
Revert, causing the transaction to error. When input
is defined, revert()
will construct an ABI-encoded revert reason string using the input
that external callers can underestand. When no input
is given, this function is treated as a normal REVERT
instruction.
Definitions:
- When reason string is passed:
Action: [], revert(input:HexableValue) => []
- Normal revert:
Action: [offset, length, ...] revert() => []
// Will send "Bad input!" as the revert reason string
revert($hex("Bad input!"))
// Normal revert, passing no data
push(0)
dup(1)
revert()
This is a simple helper that reverts with no reason string. From a gas perspective, this is the cheapest revert available.
Definition: Action: [], bail() => []
bail() // revert!
// bail() is equivalent to:
push(0)
dup(1)
revert()
Will revert if Ether has been sent to the transaction. If input
is passed, it'll be used as the revert reason string. See revert()
.
Definitions:
Action: [], assertNonPayable(input:Hexable) => []
Action: [], assertNonPayable() => []
assertNonPayable("Do not send Ether!");
// This is equivalent to:
callvalue() // Jump to allgood if no Ether
iszero()
jumpi($ptr("allgood"))
revert("Do not send Ether!") // Ether passed? Revert!
allgood =
// ... continue onward
Both shr()
and shl()
are helpers for the right and left shift operations, respectively. When no value is passed they act like normal SHR
and SHL
instructions, consuming data from the stack. When a value is passed, the passed value is pushed to the stack and then used normally.
These function accept a value purely for readability and reducing lines of code.
Definitions:
Action: [shift amount, value, ...] shr() / shl() => [shifted value, ...]
Action: [value, ...] shr(input:HexableValue) / shl(input:HexableValue) => [shifted value, ...]
// The following are equivalent
// 1. Normal SHR
push(1) // value
push(8) // shift by 1 byte (8 bits)
shr()
// 2. Helper version
push(1) // value
shr(8) // shift by 1 byte (8 bits)
This is a helper function for loading specific data from calldata. When no value for offset
is passed, this function acts like a normal CALLDATALOAD
instruction. However, when offset
is passed, calldataload()
will load 32 bytes of data from calldata starting at the passed offset. When length is set and is less than 32, this function will shift the loaded data to include only the bytes from offset
to offset + length
. The resultant data is left padded. See example.
Definition:
Action: [offset], calldataload() => [msg.data[offset]..msg.data[offset+32], ...]
Action: [], calldataload(offset) => [msg.data[offset]..msg.data[offset+32], ...]
Action: [], calldataload(offset, length) => [msg.data[offset]..msg.data[offset+length], ...]
// Load the solidity 4-byte function signature (note length of 4)
calldataload(0, 4)
// Stack:
// 0: 00000000...1b22f9c5 (top)
// Know there's a uint as the first param? Load it yourself
calldataload(4)
// Stack now:
// 0: 00000000...00000005 (top)
// 1: 00000000...1b22f9c5 // Function signature previously loaded
Set up:
$ git clone [email protected]:tcoulter/evmscript.git
$ cd evmscript
$ npm install
Run tests:
$ npm test
All code is written in Typescript, including the tests, save for our own contract deployer that's written directly in evmscript
!
- ethervm: EVM assembly reference and decompiler. https://ethervm.io/