Benchmarks

Test the accuracy and speed of different function-signature and argument extractors.

For results, refer to the main README.md.

Methodology

  1. Get N Etherscan-verified contracts and save the bytecode and ABI of each to datasets/NAME/ADDR.json.
  2. Extract function signatures/arguments from the bytecode. Each tool runs inside a Docker container and is limited to 1 CPU (see providers/NAME and Makefile).
  3. Treat the selectors and arguments from Etherscan's ABI as ground truth.
  4. Compare each tool's results against the ground truth: for signatures, count False Positives and False Negatives; for argument lists, count the results that are correct (exact string match).
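
The comparison in step 4 can be sketched in Python as follows. This is a minimal illustration, not the code in compare.py: the function names and the in-memory data layout (sets of hex selectors, dicts mapping selector to argument string) are assumptions made for the example.

```python
# Hypothetical helpers illustrating step 4; not taken from compare.py.

def compare_selectors(ground_truth: set, extracted: set) -> tuple:
    """Return (false_positives, false_negatives) for one contract."""
    # False Positive: the tool reported a selector that the ABI does not contain.
    false_positives = len(extracted - ground_truth)
    # False Negative: the ABI contains a selector that the tool missed.
    false_negatives = len(ground_truth - extracted)
    return false_positives, false_negatives


def count_correct_arguments(ground_truth: dict, extracted: dict) -> int:
    """Count selectors whose extracted argument string equals the ABI one exactly."""
    return sum(
        1 for selector, args in ground_truth.items()
        if extracted.get(selector) == args
    )


# Ground truth as it might be derived from an Etherscan ABI (ERC-20 selectors).
truth_args = {
    "a9059cbb": "address,uint256",  # transfer(address,uint256)
    "70a08231": "address",          # balanceOf(address)
    "18160ddd": "",                 # totalSupply()
}
# Made-up tool output: one wrong argument list, one bogus selector, one miss.
tool_args = {
    "a9059cbb": "address,uint256",
    "70a08231": "uint256",
    "deadbeef": "",
}

fp, fn = compare_selectors(set(truth_args), set(tool_args))
correct = count_correct_arguments(truth_args, tool_args)
print(fp, fn, correct)
```

In this example the tool reports one selector that is not in the ABI, misses one that is, and gets exactly one argument list right.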

Reproduce

Set the CPU frequency governor to performance mode using sudo cpupower frequency-set -g performance, then run make benchmark-selectors or make benchmark-arguments (GNU Make) inside the benchmark/ directory.

To use Podman instead of Docker: DOCKER=podman make benchmark-selectors

You can also run only a specific step; for example:

# Only build the Docker images
$ make build

# Only run tests for selectors (assumes that the Docker images are already built)
$ make run-selectors

# Build the `etherscan` Docker image
$ make etherscan.build

# Run `etherscan` on dataset `largest1k` to extract function selectors
$ make etherscan.selectors/largest1k

# Run `etherscan` on dataset `largest1k` to extract function arguments
$ make etherscan.arguments/largest1k

To process the results, run compare.py:

# default mode: compare 'selectors' results
$ python3 compare.py

# compare 'arguments' results
$ python3 compare.py --mode=arguments

# compare 'arguments' results for specified providers and datasets, show errors
$ python3 compare.py --mode=arguments --datasets largest1k --providers etherscan evmole-py --show-errors

# compare in a web browser
$ ../.venv/bin/python3 compare.py --web-listen 127.0.0.1:8080

How datasets/ was constructed

  1. Clone tintinweb/smart-contract-sanctuary

  2. Find all Solidity contracts:

$ cd smart-contract-sanctuary/ethereum/contracts/mainnet/

# (contract_size_in_bytes) (contract_file_path)
$ find ./ -name "*.sol" -printf "%s %p\n" > all.txt
  3. Get the ~1200 largest (by size) contracts:
$ cat all.txt | sort -rn | head -n 1200 | cut -d'/' -f3 | cut -d'_' -f1 > top.txt
  4. Get ~55,000 random contracts:
$ cat all.txt | cut -d'/' -f3 | cut -d'_' -f1 | sort -u | shuf | head -n 55000 > random.txt
  5. Get all Vyper contracts:
$ find ./ -type f -name '*.vy' | cut -d'/' -f3 | cut -d'_' -f1 > vyper.txt
  6. Download the contracts' code and ABIs:
$ poetry run python3 datasets/download.py --etherscan-api-key=CHANGE_ME --addrs-list=top.txt --out-dir=datasets/largest1k --limit=1000 --code-regexp='^0x(?!73).'
$ poetry run python3 datasets/download.py --etherscan-api-key=CHANGE_ME --addrs-list=random.txt --out-dir=datasets/random50k --limit=50000 --code-regexp='^0x(?!73).'
$ poetry run python3 datasets/download.py --etherscan-api-key=CHANGE_ME --addrs-list=vyper.txt --out-dir=datasets/vyper --code-regexp='^0x(?!73).'

We use --code-regexp='^0x(?!73).' to:

  1. Skip contracts with empty code ({"code": "0x",). These are self-destructed contracts.
  2. Skip contracts whose code starts with 0x73 (the PUSH20 opcode). Compiled Solidity libraries begin with this opcode, and because non-storage structs are referred to by their fully qualified names, such contracts are not yet supported by our reference Etherscan extractor (providers/etherscan). This issue may be fixed later.
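
To see how the pattern behaves, here is a small Python sketch. It assumes the expression is applied to the hex string with re.match; how datasets/download.py actually applies it is not shown here, and the keep_contract helper is hypothetical.

```python
import re

# Same pattern as passed via --code-regexp.
CODE_RE = re.compile(r'^0x(?!73).')


def keep_contract(code: str) -> bool:
    """Hypothetical filter: True if the bytecode hex string passes the pattern."""
    return CODE_RE.match(code) is not None


# Ordinary contract bytecode (starts with PUSH1 0x80): kept.
print(keep_contract("0x6080604052"))
# Empty code: the trailing '.' needs at least one character after '0x', so it is skipped.
print(keep_contract("0x"))
# Code starting with 0x73 (PUSH20): rejected by the negative lookahead (?!73).
print(keep_contract("0x730000000000000000000000000000000000000000"))
```

The negative lookahead rejects the PUSH20 prefix without consuming input, and the final dot is what excludes the bare "0x" of an empty contract.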