ANS-103: Succinct Proofs of Random Access

Status: Draft
Authors: Sam Williams [email protected], Lev Berman [email protected]

Abstract

This document describes a new consensus mechanism for the Arweave network, based on a competition to find a chunk of past data within a set of historical chunks inferred from the latest agreed-upon blockweave state.

Motivation

At the time of writing, the consensus mechanism employed by the Arweave network is classical Proof of Work with the additional requirement of including a chunk (up to 256 KiB) of past data in the hash preimage, where the chunk is deterministically derived from the latest state of the blockweave.

While this approach incentivizes the network to keep historical data, it does not impose any significant restrictions on the speed of access to data a miner needs to be competitive. Specifically, miners can benefit from using a remote storage pool. Moreover, combined with a computation pool, such a pool can serve Proof of Work preimages to millions of clients per second over a Gbit Internet link. The existence of such a storage and computation pool has been evidenced in the Arweave network by a decreasing number of public nodes, a simultaneous increase in network hashpower, and reports from people claiming to be part of the pool.

Therefore, the new consensus algorithm's first goal is to make the mining advantage grow sharply with the speed of access to data, promoting replication more aggressively.

The second, no less important, goal is to reduce the energy consumed by the network. Proof of Work networks are known for their energy intensity. Reducing energy consumption reduces the carbon footprint, which is widely believed to be very harmful to the planet. We believe the environmental friendliness of the consensus mechanism is crucial for the long-term sustainability of the Arweave platform.

To sum up, the consensus algorithm described here aims at two major goals.

  • Disincentivize miners from retrieving data on demand from the network. In other words, incentivize miners to store data as close to the mining machine as possible.
  • Reduce the network's energy consumption.

Reference Implementation

Link.

Specification

Prerequisites

  1. Indexed Dataset

The core of the new mechanism is the continuous retrieval of chunks of past data. Every chunk is identified by a global offset, as we want the possession of every byte to be equally incentivized. Therefore, the whole weave has to be indexed so that every chunk is quickly accessible by its global offset.

Starting from release 2.1, the Arweave Erlang client maintains such an index.
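
To illustrate, here is a minimal in-memory sketch of such an index in Python; the layout and names are hypothetical and only approximate the client's persistent index, which keys chunks by their global end offsets.

import bisect

class ChunkIndex:
    # A hypothetical offset-to-chunk index: end_offsets is kept sorted,
    # and chunks maps each chunk's global end offset to its data.
    def __init__(self):
        self.end_offsets = []
        self.chunks = {}

    def insert(self, end_offset, chunk):
        bisect.insort(self.end_offsets, end_offset)
        self.chunks[end_offset] = chunk

    def lookup(self, global_offset):
        # The chunk containing a byte is the first one whose end offset
        # exceeds the byte's global offset.
        i = bisect.bisect_right(self.end_offsets, global_offset)
        if i == len(self.end_offsets):
            return None  # the offset lies beyond the locally synced weave
        return self.chunks[self.end_offsets[i]]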

  2. Slow Hash

The consensus mechanism needs a deterministic but unpredictable way to choose candidate chunks, to make miners continuously access storage. However, choosing candidate chunks must not be too cheap, as that would allow miners to reduce the number of chunks they ever work with. There are two threats associated with that. One is that the computational expenditure may cost less than storing the extra data required for the same probability of a reward. The other is existing computation technology that, although more expensive than storing the entire weave, is so efficient that it outperforms even the most efficient data-retrieval-based client.

Starting from release 1.7, the Arweave protocol relies on RandomX, a proof-of-work algorithm optimized for general-purpose CPUs.
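
To illustrate the role of the slow hash, here is a stand-in sketch: iterating a fast hash makes every candidate nonce expensive to evaluate. This is illustration only; the protocol uses RandomX, which additionally resists specialized hardware, while naive iteration does not.

import hashlib

def slow_hash(preimage: bytes, rounds: int = 100_000) -> bytes:
    # Stand-in for RandomX: each call costs many sequential SHA-256
    # rounds, so sampling candidate chunks cannot be made arbitrarily cheap.
    h = preimage
    for _ in range(rounds):
        h = hashlib.sha256(h).digest()
    return h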

Algorithm Description

Every block uniquely determines a Search Space: a set of offsets (Recall Bytes) within the interval [0, Search Space Upper Bound], where the Search Space Upper Bound is the weave size at a particular block.

Miner Steps

  • Generate a random nonce and compute a slow hash (H0) of the hash of a Merkle tree containing the current state, the candidate block, and the nonce.
  • Compute the unique Recall Byte from H0, the hash of the previous block (PrevH), and the Search Space Upper Bound.
  • Search the local storage for the chunk containing the computed Recall Byte. If not found, repeat from the first step.
  • Compute a fast hash of the hash of the Merkle tree containing the slow hash computed at the first step and the located chunk.
  • Check whether the computed hash is bigger than the current mining difficulty (i.e., whether the number given by its big-endian binary representation exceeds the difficulty). If not, repeat from the first step. If yes, pack and distribute the block (the block contains the nonce and the chunk).
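
The difficulty check in the last step reduces to an integer comparison; a minimal Python sketch:

def meets_difficulty(solution_hash: bytes, diff: int) -> bool:
    # The hash's big-endian byte representation is read as an integer
    # and must exceed the current mining difficulty.
    return int.from_bytes(solution_hash, "big") > diff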

We call the solution chunk, together with the Merkle proofs of its inclusion in the blockweave, a Succinct Proof of Random Access (SPoRA), and we use the same name for the new consensus mechanism.

Verifier Steps

Perform one iteration of the miner steps, taking the nonce and the chunk from the block being verified.
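
A minimal sketch of this verification, assuming hypothetical helpers randomx_hash, pick_recall_byte (as in the pseudocode below), and verify_merkle_proof, and assuming all block fields are bytes:

def validate_block(block, prev_h, search_space_upper_bound, diff):
    # Recompute the slow hash from the nonce and block preimage carried
    # by the block being verified.
    h0 = randomx_hash(block.nonce + block.preimage)
    recall_byte = pick_recall_byte(h0, prev_h, search_space_upper_bound)
    # The Merkle proofs must place the included chunk at the Recall Byte.
    if not verify_merkle_proof(block.chunk, block.chunk_proof, recall_byte):
        return False
    solution_hash = randomx_hash(h0 + prev_h + block.timestamp + block.chunk)
    # The solution hash must exceed the current mining difficulty.
    return int.from_bytes(solution_hash, "big") > diff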

Rationale

Search Space Constraints
  1. Search Space needs to be big enough for two reasons:
  • to make it prohibitively expensive to download the entire search space on demand; note that the prior consensus mechanism can be viewed as an edge case of SPoRA where the search space consists of a single chunk; the efficiency of serving data to computing agents on demand depends on the network bandwidth, which grows over time;
  • to make it prohibitively expensive to compensate for the lack of data by hashing.
  2. On the other hand, Search Space needs to be small enough to incentivize miners to replicate the rarer parts of the weave, which would give them an advantage over miners who replicated less of the corresponding area at the corresponding blocks.

We choose the Search Space size to be 10% of the weave. In this case, in a network where everybody stores 90% of the weave, the 10% of miners who additionally store the remaining, unique 10% of the weave are roughly 1.2 times more efficient than the rest of the miners. This holds for various ratios between the time it takes to compute a RandomX hash and the time it takes to look up a chunk.

Pseudocode

The ar_deep_hash definition.

mine(Nonce, BlockPreimage, PrevH, SearchSpaceUpperBound):
    // Compute a slow hash.
    H0 := randomx_hash(concat(Nonce, BlockPreimage))
    RecallByte := pick_recall_byte(H0, PrevH, SearchSpaceUpperBound)
    // Search the local storage for the chunk containing the Recall Byte.
    SPoA := get_spoa_by_byte(RecallByte)
    if SPoA == not_found
        return mine(random_nonce(), BlockPreimage, PrevH, SearchSpaceUpperBound)
    Chunk := get_chunk(SPoA)
    // Timestamp and Diff are taken from the candidate block.
    SolutionHash := randomx_hash(concat(H0, PrevH, Timestamp, Chunk))
    if validate(SolutionHash, Diff)
        return {Nonce, SPoA}
    return mine(random_nonce(), BlockPreimage, PrevH, SearchSpaceUpperBound)

pick_recall_byte(H0, PrevH, SearchSpaceUpperBound):
    SubspaceNumber := remainder(hash_to_number(H0), SUBSPACES_COUNT)
    SearchSpaceSize := integer_division(SearchSpaceUpperBound, 10)
    EvenSubspaceSize := integer_division(SearchSpaceUpperBound, SUBSPACES_COUNT)
    SearchSubspaceSize := integer_division(SearchSpaceSize, SUBSPACES_COUNT)
    SubspaceStart := SubspaceNumber * EvenSubspaceSize
    SubspaceSize := min(SearchSpaceUpperBound - SubspaceStart, EvenSubspaceSize)
    EncodedSubspaceNumber := number_to_binary(SubspaceNumber)
    SearchSubspaceSeed := hash_to_number(sha256(concat(PrevH, EncodedSubspaceNumber)))
    SearchSubspaceStart := remainder(SearchSubspaceSeed, SubspaceSize)
    SearchSubspaceByteSeed := hash_to_number(sha256(H0))
    SearchSubspaceByte := remainder(SearchSubspaceByteSeed, SearchSubspaceSize)
    return SubspaceStart + remainder(SearchSubspaceStart + SearchSubspaceByte, SubspaceSize)
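
For concreteness, below is a direct, runnable Python translation of pick_recall_byte. It is a sketch under stated assumptions: the SUBSPACES_COUNT value and the integer encoding width are assumed rather than taken from the reference implementation, and the weave is assumed large enough that no subspace is empty.

import hashlib

SUBSPACES_COUNT = 1024  # assumed value; see the reference implementation

def hash_to_number(h: bytes) -> int:
    # Interpret a hash as a big-endian integer.
    return int.from_bytes(h, "big")

def pick_recall_byte(h0: bytes, prev_h: bytes, search_space_upper_bound: int) -> int:
    subspace_number = hash_to_number(h0) % SUBSPACES_COUNT
    # The Search Space is 10% of the weave, split evenly across subspaces.
    search_space_size = search_space_upper_bound // 10
    even_subspace_size = search_space_upper_bound // SUBSPACES_COUNT
    search_subspace_size = search_space_size // SUBSPACES_COUNT
    subspace_start = subspace_number * even_subspace_size
    subspace_size = min(search_space_upper_bound - subspace_start, even_subspace_size)
    # The 32-byte encoding width is an assumption.
    encoded_subspace_number = subspace_number.to_bytes(32, "big")
    seed = hashlib.sha256(prev_h + encoded_subspace_number).digest()
    search_subspace_start = hash_to_number(seed) % subspace_size
    byte_seed = hash_to_number(hashlib.sha256(h0).digest())
    search_subspace_byte = byte_seed % search_subspace_size
    return subspace_start + (search_subspace_start + search_subspace_byte) % subspace_size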

Related Work

The work was heavily inspired by Permacoin: Repurposing Bitcoin Work for Data Preservation by Andrew Miller, Ari Juels, Elaine Shi, Bryan Parno, and Jonathan Katz from Microsoft Research.

We suggest using the existing, growing, and always accessible Arweave dataset instead of one pre-generated by a trusted dealer. We use a slow, specialized-hardware-resistant hash to make it prohibitively expensive to compensate for the lack of local data with computation. Finally, we provide the network with an incentive to replicate data uniformly.