Test accuracy and speed of different function-signature and arguments extractors
For results, refer to the main README.md.
- Get N Etherscan-verified contracts, save the bytecode and ABI to `datasets/NAME/ADDR.json`.
- Extract function signatures/arguments from the bytecode. Each tool runs inside a Docker container and is limited to 1 CPU (see `providers/NAME` and `Makefile`).
- Assume selectors and arguments from Etherscan's ABI as ground truth.
- Compare the results against it: count false positives and false negatives for signatures, and count exact string matches for argument lists.
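The comparison step can be sketched roughly as follows. This is a minimal illustration, not the code in `compare.py`: the function names `score_selectors` and `arguments_match`, the selector values, and the use of plain sets and strings are all assumptions made for the example.

```python
def score_selectors(ground_truth: set[str], extracted: set[str]) -> tuple[int, int]:
    """Return (false_positives, false_negatives) for one contract.

    Hypothetical helper: selectors are assumed to be hex strings.
    """
    false_positives = len(extracted - ground_truth)  # reported, but not in the ABI
    false_negatives = len(ground_truth - extracted)  # in the ABI, but missed
    return false_positives, false_negatives


def arguments_match(ground_truth: str, extracted: str) -> bool:
    """An argument list counts as correct only if the strings are equal."""
    return ground_truth == extracted


# Made-up data for a single contract
truth = {"a9059cbb", "70a08231", "18160ddd"}
found = {"a9059cbb", "70a08231", "deadbeef"}

fp, fn = score_selectors(truth, found)
print(fp, fn)  # 1 1
print(arguments_match("address,uint256", "address,uint256"))  # True
```

Summing these per-contract counts over a dataset gives totals comparable across providers.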
Set the performance mode using `sudo cpupower frequency-set -g performance` and run `make benchmark-selectors` or `make benchmark-arguments` (GNU Make) inside the `benchmark/` directory.
To use Podman instead of Docker: DOCKER=podman make benchmark-selectors
You can run only a specific step; for example:
# Only build docker-images
$ make build
# Only run tests for selectors (assume that docker-images are already built)
$ make run-selectors
# Build `etherscan` docker image
$ make etherscan.build
# Run `etherscan` on dataset `largest1k` to extract function selectors
$ make etherscan.selectors/largest1k
# Run `etherscan` on dataset `largest1k` to extract function arguments
$ make etherscan.arguments/largest1k
To process the results, run `compare.py`:
# default mode: compare 'selectors' results
$ python3 compare.py
# compare 'arguments' results
$ python3 compare.py --mode=arguments
# compare 'arguments' results for specified providers and datasets, show errors
$ python3 compare.py --mode=arguments --datasets largest1k --providers etherscan evmole-py --show-errors
# compare in web-browser
$ ../.venv/bin/python3 compare.py --web-listen 127.0.0.1:8080
- Find all Solidity contracts:
$ cd smart-contract-sanctuary/ethereum/contracts/mainnet/
# (contract_size_in_bytes) (contract_file_path)
$ find ./ -name "*.sol" -printf "%s %p\n" > all.txt
- Get ~1200 largest (by size) contracts:
$ cat all.txt | sort -rn | head -n 1200 | cut -d'/' -f3 | cut -d'_' -f1 > top.txt
- Get ~55,000 random contracts:
$ cat all.txt | cut -d'/' -f3 | cut -d'_' -f1 | sort -u | shuf | head -n 55000 > random.txt
- Get all Vyper contracts:
$ find ./ -type f -name '*.vy' | cut -d'/' -f3 | cut -d'_' -f1 > vyper.txt
- Download the contracts' code and ABI:
$ poetry run python3 datasets/download.py --etherscan-api-key=CHANGE_ME --addrs-list=top.txt --out-dir=datasets/largest1k --limit=1000 --code-regexp='^0x(?!73).'
$ poetry run python3 datasets/download.py --etherscan-api-key=CHANGE_ME --addrs-list=random.txt --out-dir=datasets/random50k --limit=50000 --code-regexp='^0x(?!73).'
$ poetry run python3 datasets/download.py --etherscan-api-key=CHANGE_ME --addrs-list=vyper.txt --out-dir=datasets/vyper --code-regexp='^0x(?!73).'
We use `--code-regexp='^0x(?!73).'` to:
- Skip contracts with empty code (`{"code": "0x",`) - these are self-destructed contracts.
- Skip contracts whose code starts with `0x73` (`PUSH20` opcode). Compiled Solidity libraries begin with this opcode, and because non-storage structs are referred to by their fully qualified name, they are not yet supported by our reference Etherscan extractor (`providers/etherscan`). This issue may be fixed later.
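As a quick check of the filter described above, the following sketch applies the same regular expression to three made-up bytecode strings. It assumes the pattern is matched against the hex-encoded code with Python's `re` module; how `datasets/download.py` applies it internally is not shown here, and the sample strings and the `keep_contract` helper are hypothetical.

```python
import re

# Same pattern as passed via --code-regexp
CODE_FILTER = re.compile(r"^0x(?!73).")


def keep_contract(code: str) -> bool:
    """Hypothetical helper: True if the contract code passes the filter."""
    return CODE_FILTER.search(code) is not None


# Made-up samples
empty_code = "0x"                              # self-destructed contract
library_code = "0x73" + "00" * 20 + "3014"     # starts with PUSH20
regular_code = "0x6080604052"                  # typical Solidity prologue

print(keep_contract(empty_code))    # False: '.' needs a character after '0x'
print(keep_contract(library_code))  # False: the lookahead (?!73) rejects it
print(keep_contract(regular_code))  # True
```

The trailing `.` is what excludes empty code: the negative lookahead alone would still accept a bare `0x`.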