@eth-optimism/fault-detector

The fault-detector is a simple service for detecting discrepancies between your local view of the Optimism network and the L2 output proposals published to Ethereum.

Installation

Clone, install, and build the Optimism monorepo:

git clone https://github.com/ethereum-optimism/optimism.git
cd optimism
yarn install
yarn build

Running the service

Copy .env.example into a new file named .env, then set the environment variables listed there. Additional settings are listed in the output of yarn start --help. If you are running the fault detector against a custom OP chain, you must also set the address of the relevant contract for that chain:

  • Bedrock: OptimismPortal
  • Legacy: StateCommitmentChain

Once your environment variables or flags have been set, run the service via:

yarn start
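
Before starting the service, it can help to sanity-check the configuration. The sketch below is not part of the fault-detector; it is a hypothetical helper (missingSettings) that reports which of the documented environment variables are unset. It assumes that both RPC providers are required, that FAULT_DETECTOR__BEDROCK is written as the string "true", and that a custom chain needs exactly one of the two contract addresses.

```typescript
// Hypothetical pre-flight check; not part of the fault-detector itself.
type EnvVars = Record<string, string | undefined>;

function missingSettings(env: EnvVars, customChain: boolean): string[] {
  // Assumption: both RPC providers must be set for the service to work.
  const required: string[] = [
    'FAULT_DETECTOR__L1_RPC_PROVIDER',
    'FAULT_DETECTOR__L2_RPC_PROVIDER',
  ];
  if (customChain) {
    // Assumption: the flag is expressed as the string "true" in .env.
    const bedrock = env['FAULT_DETECTOR__BEDROCK'] === 'true';
    required.push(
      bedrock
        ? 'FAULT_DETECTOR__OPTIMISM_PORTAL_ADDRESS'
        : 'FAULT_DETECTOR__STATE_COMMITMENT_CHAIN_ADDRESS'
    );
  }
  const missing: string[] = [];
  for (let i = 0; i < required.length; i++) {
    // Treat unset and empty values the same way.
    if (!env[required[i]]) missing.push(required[i]);
  }
  return missing;
}

// Example with placeholder values (the URLs are illustrative only).
const exampleEnv: EnvVars = {
  FAULT_DETECTOR__L1_RPC_PROVIDER: 'http://localhost:8545',
  FAULT_DETECTOR__L2_RPC_PROVIDER: 'http://localhost:9545',
  FAULT_DETECTOR__BEDROCK: 'true',
};
console.log(missingSettings(exampleEnv, true));
```

In a Node.js script, process.env can be passed in place of exampleEnv.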

Ports

  • API is exposed at $FAULT_DETECTOR__HOSTNAME:$FAULT_DETECTOR__PORT/api
  • Metrics are exposed at $FAULT_DETECTOR__HOSTNAME:$FAULT_DETECTOR__PORT/metrics
  • $FAULT_DETECTOR__HOSTNAME defaults to 0.0.0.0
  • $FAULT_DETECTOR__PORT defaults to 7300
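
As a small illustration of the defaults above, the following sketch builds the two URLs from the environment. The helper name faultDetectorEndpoints and the http:// scheme are assumptions made for this example.

```typescript
// Sketch: derive the API and metrics URLs from the documented variables.
type PortEnv = Record<string, string | undefined>;

interface Endpoints {
  api: string;
  metrics: string;
}

function faultDetectorEndpoints(env: PortEnv): Endpoints {
  // Fall back to the documented defaults when a variable is unset or empty.
  const hostname = env['FAULT_DETECTOR__HOSTNAME'] || '0.0.0.0';
  const port = env['FAULT_DETECTOR__PORT'] || '7300';
  // Assumption: the service is served over plain HTTP.
  const base = 'http://' + hostname + ':' + port;
  return { api: base + '/api', metrics: base + '/metrics' };
}

console.log(faultDetectorEndpoints({}));
console.log(faultDetectorEndpoints({ FAULT_DETECTOR__PORT: '9000' }));
```

Note that 0.0.0.0 is a bind address; when querying the service from the same machine, a client would typically use localhost instead.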

What this service does

The fault-detector detects differences between the transaction results generated by your local Optimism node and the transaction results actually published to Ethereum. Currently, transaction results take the form of the root of the Optimism state trie.

  • After the Bedrock upgrade, the state root of the block is published to the L2OutputOracle contract on Ethereum.
    • Note: The service takes the OptimismPortal address as a flag instead of the L2OutputOracle address, for backwards compatibility with early versions of these contracts. The L2OutputOracle address is inferred from the portal contract.
  • For pre-Bedrock (legacy) chains, the state root of the block is published to the StateCommitmentChain contract on Ethereum.

We can therefore detect differences by comparing, for each block, the state root reported by an Optimism node with the state root published to Ethereum. The fault detector cannot tell Bedrock and legacy chains apart on its own, so make sure to pass --bedrock when running against a Bedrock chain.
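
The comparison described above can be sketched as follows. This is a simplified illustration, not the service's actual implementation: the RootRecord shape and the findFirstMismatch helper are hypothetical, the roots are short placeholder strings, and fetching the data from L1 and L2 is left out.

```typescript
// Hypothetical record of a state root published to Ethereum for one L2 block.
interface RootRecord {
  blockNumber: number;
  stateRoot: string; // hex string
}

// Returns the first published record whose root differs from the local
// node's root for the same block, or undefined if everything matches.
function findFirstMismatch(
  published: RootRecord[],
  localRoots: Record<number, string>
): RootRecord | undefined {
  for (let i = 0; i < published.length; i++) {
    const record = published[i];
    const local = localRoots[record.blockNumber];
    // Skip blocks the local node does not have yet; nothing to compare.
    if (local === undefined) continue;
    // Hex strings may differ in case, so compare case-insensitively.
    if (local.toLowerCase() !== record.stateRoot.toLowerCase()) return record;
  }
  return undefined;
}

// Placeholder data: block 102 disagrees between L1 and the local node.
const publishedRoots: RootRecord[] = [
  { blockNumber: 100, stateRoot: '0xAAA' },
  { blockNumber: 101, stateRoot: '0xbbb' },
  { blockNumber: 102, stateRoot: '0xccc' },
];
const localNodeRoots: Record<number, string> = {
  100: '0xaaa',
  101: '0xbbb',
  102: '0xddd',
};
console.log(findFirstMismatch(publishedRoots, localNodeRoots));
```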

We export a series of Prometheus metrics that you can use to trigger alerting when issues are detected. Check the list of available metrics via yarn start --help:

> yarn start --help
yarn run v1.22.19
$ ts-node ./src/service.ts --help
Usage: service [options]

Options:
  --l1rpcprovider    Provider for interacting with L1 (env: FAULT_DETECTOR__L1_RPC_PROVIDER)
  --l2rpcprovider    Provider for interacting with L2 (env: FAULT_DETECTOR__L2_RPC_PROVIDER)
  --startbatchindex  Batch index to start checking from. Setting it to -1 will cause the fault detector to find the first state batch index that has not yet passed the fault proof window (env: FAULT_DETECTOR__START_BATCH_INDEX, default value: -1)
  --loopintervalms   Loop interval in milliseconds (env: FAULT_DETECTOR__LOOP_INTERVAL_MS)
  --bedrock          Whether or not the service is running against a Bedrock chain (env: FAULT_DETECTOR__BEDROCK, default value: false)
  --optimismportaladdress        [Custom Bedrock Chains] Deployed OptimismPortal contract address. Used to retrieve necessary info for ouput verification  (env: FAULT_DETECTOR__OPTIMISM_PORTAL_ADDRESS, default 0x0)
  --statecommitmentchainaddress  [Custom Legacy Chains] Deployed StateCommitmentChain contract address. Used to fetch necessary info for output verification. (env: FAULT_DETECTOR__STATE_COMMITMENT_CHAIN_ADDRESS, default 0x0)

  --port             Port for the app server (env: FAULT_DETECTOR__PORT)
  --hostname         Hostname for the app server (env: FAULT_DETECTOR__HOSTNAME)
  -h, --help         display help for command

Metrics:
  highest_checked_batch_index   Highest good batch index (type: Gauge)
  highest_known_batch_index     Highest known batch index (type: Gauge)
  is_currently_mismatched       0 if state is ok, 1 if state is mismatched (type: Gauge)
  l1_node_connection_failures   Number of times L1 node connection has failed (type: Gauge)
  l2_node_connection_failures   Number of times L2 node connection has failed (type: Gauge)
  metadata                      Service metadata (type: Gauge)
  unhandled_errors              Unhandled errors (type: Counter)

Done in 2.19s.
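
To show how these metrics might drive alerting outside of Prometheus itself, here is a minimal sketch that reads a gauge from the text served at /metrics. The readGauge helper is hypothetical, and the sample text is illustrative: the fault_detector_ name prefix and the label are assumptions, which is why the helper matches on the metric name suffix.

```typescript
// Sketch: read one gauge value from Prometheus text exposition format.
function readGauge(metricsText: string, nameSuffix: string): number | undefined {
  const lines = metricsText.split('\n');
  for (let i = 0; i < lines.length; i++) {
    const line = lines[i].trim();
    // Skip blank lines and "# HELP" / "# TYPE" comment lines.
    if (line === '' || line.charAt(0) === '#') continue;
    // Sample lines look like: name{labels} value [timestamp]
    const match = /^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{.*\})?\s+(\S+)/.exec(line);
    if (match === null) continue;
    const name = match[1];
    const offset = name.length - nameSuffix.length;
    // Match on the suffix because the exported name may carry a prefix.
    if (offset >= 0 && name.lastIndexOf(nameSuffix) === offset) {
      return Number(match[3]);
    }
  }
  return undefined;
}

// Illustrative sample; real output from the service may look different.
const sampleMetrics = [
  '# TYPE fault_detector_is_currently_mismatched gauge',
  'fault_detector_is_currently_mismatched 1',
  'fault_detector_highest_checked_batch_index{type="state"} 1234',
].join('\n');

if (readGauge(sampleMetrics, 'is_currently_mismatched') === 1) {
  console.log('State mismatch detected');
}
```

In practice the text would come from an HTTP request to the /metrics endpoint, and an alerting rule on is_currently_mismatched in Prometheus would usually replace a hand-written check like this.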