Database

Authenticated Attributes is built on top of Hyperbee, a key-value store with BitTorrent-esque replication. Authenticated Attributes is also a key-value store, but every value is signed, timestamped, and optionally encrypted. Our schema is designed around CIDs and making "attestations about media", rather than just generic key-value storage.

Key-value format

Every key starts with a type prefix. For an attestation the prefix is att, followed by a slash, the CID of the asset, another slash, and the name of the attribute being attested. An example attestation key is:

att/bafkreif7gtpfl7dwi5nflge2rsfp6vq6q5kkwfm7uvxyyezxhsnde5ly3y/description

The value is described below.
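
The key layout above can be sketched as a pair of helpers. This is a minimal illustration, not the library's API: the function names are hypothetical, and the range helper assumes the keys are read through a store that accepts gte/lt bounds on a read stream, as Hyperbee does.

```javascript
// Sketch only: these helper names are hypothetical, not part of the library.
const ATTESTATION_PREFIX = 'att'

// Build a key such as "att/<cid>/<attribute>".
function attestationKey (cid, attribute) {
  return `${ATTESTATION_PREFIX}/${cid}/${attribute}`
}

// Split a key back into its type prefix, CID, and attribute name.
function parseKey (key) {
  const first = key.indexOf('/')
  const second = key.indexOf('/', first + 1)
  if (first === -1 || second === -1) throw new Error(`Malformed key: ${key}`)
  return {
    type: key.slice(0, first),
    cid: key.slice(first + 1, second),
    attribute: key.slice(second + 1)
  }
}

// Bounds covering every attestation of one asset. "0" is the character
// right after "/" in ASCII, so `lt` stops before keys of any other CID.
function attestationRange (cid) {
  const prefix = `${ATTESTATION_PREFIX}/${cid}/`
  return { gte: prefix, lt: prefix.slice(0, -1) + '0' }
}
```

Because keys sort lexicographically, a range like this groups all attributes of one asset together.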

Encoding

Database entries are stored as binary data, encoded with DAG-CBOR. This is like CBOR, but has canonical encoding and native support for CIDs. If you don't know CBOR, it's like JSON but binary. This allows for easy storage of binary data alongside any other types.

Schema

{
  version: "1.0",
  signature: {
    pubKey: Uint8Array(32),
    sig: Uint8Array(64),
    // CID of "attestation" object
    msg: CID(bafyreietqpflteqz6kj7lmdqz76kzkwdo65o4bhivxrmqvha7pdgixxos4)
  },
  timestamp: {
    ots: { // OpenTimestamps
      proof: Uint8Array(503),
      upgraded: false,
      // CID of signature and attestation objects together in a map:
      // {signature, attestation}
      msg: CID(bafyreialprnoiwl25t37feen7wbkwwr4l5bpnokjydkog3mhiuodi2av6m)
    }
    // Possible other timestamp formats in the future
  },
  attestation: {
    // CID of asset file, same CID as in the database key
    CID: CID(bafkreif7gtpfl7dwi5nflge2rsfp6vq6q5kkwfm7uvxyyezxhsnde5ly3y),
    value: 'Web archive foo bar',
    attribute: 'description',
    encrypted: false,
    timestamp: '2023-05-29T19:03:28.601Z'
  }
}

The binary data of timestamp.ots.proof has no fixed size. The 503 bytes shown above are just an example, and the actual length varies between proofs.

Where CID(...) is shown, it represents a CID stored natively rather than as text, which the DAG-CBOR encoding makes possible. We are also able to compute the CID of non-file data, such as a particular DAG-CBOR object. This is what allows CIDs to be used for signature.msg and timestamp.ots.msg.

Some information already in the database key is repeated in the attestation, such as CID and attribute. This allows for export of the whole object for external verification and use elsewhere.
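
Since the CID and attribute appear both in the key and in the attestation, a reader can cross-check them. The following is a minimal sketch of such a check. The helper name is hypothetical, and it assumes the native CID object stringifies to the same text form used in the key.

```javascript
// Hypothetical helper: confirm an entry belongs under the key it was read from.
// Assumes String(cidObject) yields the same text form that appears in the key.
function matchesKey (key, entry) {
  const [type, cid, ...rest] = key.split('/')
  const attribute = rest.join('/')
  const { attestation } = entry
  return type === 'att' &&
    String(attestation.CID) === cid &&
    attestation.attribute === attribute
}
```

A mismatch would suggest the entry was stored under the wrong key or altered after export.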

When the attestation is encrypted, the schema looks very similar to the above. The only change is attestation.encrypted is true, and attestation.value is always binary data. That binary data, once decrypted, is a DAG-CBOR encoding of whatever the original value was: object, binary data, string, integer, etc.
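
As an illustration of the shape rules described so far, here is a small structural check over an already-decoded entry. It is a sketch under assumptions: the helper name is hypothetical, it only covers the fields whose types or sizes are stated in this document, and it does not verify the signature or timestamp.

```javascript
// Hypothetical helper: list structural problems in a decoded entry.
// It checks shapes only; it does not verify signatures or proofs.
function checkEntryShape (entry) {
  const problems = []
  const isBytes = (value, length) =>
    value instanceof Uint8Array && (length === undefined || value.length === length)
  const { signature = {}, attestation = {} } = entry

  if (!isBytes(signature.pubKey, 32)) problems.push('signature.pubKey must be 32 bytes')
  if (!isBytes(signature.sig, 64)) problems.push('signature.sig must be 64 bytes')
  if (typeof attestation.attribute !== 'string') problems.push('attestation.attribute must be a string')
  if (typeof attestation.encrypted !== 'boolean') problems.push('attestation.encrypted must be a boolean')
  // Encrypted values are always stored as binary data.
  if (attestation.encrypted === true && !isBytes(attestation.value)) {
    problems.push('encrypted attestation.value must be binary')
  }
  return problems
}
```

An empty array means none of these checks failed.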

Currently only version 1.0 is supported. Older debug databases had no version field, and a missing version is treated as equivalent to 1.0. Future non-breaking changes will only increment the minor version, the number after the dot.
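
The version rule can be sketched as a small check. The helper and constant names are hypothetical, and accepting only minor version 0 is an assumption based on 1.0 being the only version supported today.

```javascript
// Hypothetical names; a sketch of the version rule, not the library's code.
const SUPPORTED_MAJOR = 1
const SUPPORTED_MINOR = 0 // assumption: raise this when a newer minor is understood

function isSupportedVersion (entry) {
  // Entries from old debug databases have no version field; treat them as "1.0".
  const version = entry.version === undefined ? '1.0' : entry.version
  const match = /^(\d+)\.(\d+)$/.exec(version)
  if (match === null) return false
  const major = Number(match[1])
  const minor = Number(match[2])
  return major === SUPPORTED_MAJOR && minor <= SUPPORTED_MINOR
}
```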

Other database entries

For more information on specific kinds of attestation, or other types of key-value pairs stored in the database, please see schema.md.

Cryptographic keys

Signing

Every attestation stored in the database is signed with an ed25519 keypair. The private key can be loaded from a PEM file such as those generated by openssl, or directly from a 32-byte Buffer.

An ed25519 private key can be generated with the command openssl genpkey -algorithm ED25519.
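
To make the key handling concrete, here is a minimal sketch using only Node's built-in crypto module. It is not the library's own signing code: the message is a placeholder, and the variable names are chosen for illustration. It shows a PEM round trip and the 32-byte public key and 64-byte signature sizes that appear in the schema.

```javascript
const crypto = require('node:crypto')

// Generate a keypair in-process. The exported PEM is PKCS#8, the same
// format produced by `openssl genpkey -algorithm ED25519`.
const { privateKey, publicKey } = crypto.generateKeyPairSync('ed25519')
const pem = privateKey.export({ type: 'pkcs8', format: 'pem' })

// Load the private key back from PEM text, as one would from a file.
const loadedKey = crypto.createPrivateKey(pem)

// Raw 32-byte public key, the size shown for signature.pubKey.
const pubKey = Buffer.from(publicKey.export({ format: 'jwk' }).x, 'base64url')

// Placeholder message; the exact bytes the library signs are not shown here.
const msg = Buffer.from('example message')

// Ed25519 has no separate digest step, so the algorithm argument is null.
const sig = crypto.sign(null, msg, loadedKey)
const ok = crypto.verify(null, msg, publicKey, sig)
```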

Encryption

Attestations can optionally be encrypted on a per-attestation basis. Symmetric encryption is used, so a single secret key needs to be generated for encryption. This can just be a Buffer of 32 random bytes.

The NaCl API is used, so the specific encryption algorithm is xsalsa20-poly1305. The nonce is prepended before storing.
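
The storage layout with a prepended nonce can be sketched as follows. This shows only the byte layout, with hypothetical helper names. The encryption itself would be done by a NaCl implementation and is not reproduced here. The 24-byte nonce length is the one NaCl defines for xsalsa20-poly1305.

```javascript
const crypto = require('node:crypto')

const KEY_BYTES = 32 // secret key size described above
const NONCE_BYTES = 24 // nonce size NaCl defines for xsalsa20-poly1305

// Hypothetical helpers for generating the random inputs.
const generateKey = () => crypto.randomBytes(KEY_BYTES)
const generateNonce = () => crypto.randomBytes(NONCE_BYTES)

// Store nonce and ciphertext as one byte string, nonce first.
function packEncrypted (nonce, ciphertext) {
  if (nonce.length !== NONCE_BYTES) throw new Error('nonce must be 24 bytes')
  return Buffer.concat([nonce, ciphertext])
}

// Split a stored value back into its nonce and ciphertext.
function unpackEncrypted (stored) {
  if (stored.length < NONCE_BYTES) throw new Error('value too short to hold a nonce')
  return {
    nonce: stored.subarray(0, NONCE_BYTES),
    ciphertext: stored.subarray(NONCE_BYTES)
  }
}
```

Keeping the nonce next to the ciphertext means only the secret key has to be supplied at decryption time.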

Timestamping

Attestations are timestamped with OpenTimestamps. This requires Internet access and takes about one second to finish. At first only the incomplete proof is stored (indicated by timestamp.ots.upgraded being false), but the proof can be upgraded at a later date.
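
Because proofs start out incomplete, a maintenance job needs to find the entries that still have to be upgraded. A minimal sketch, assuming the entries have already been decoded into objects following the schema above (the helper name is hypothetical):

```javascript
// Hypothetical helper: select entries whose OpenTimestamps proof is still
// incomplete, judged by timestamp.ots.upgraded being false.
function pendingUpgrades (entries) {
  return entries.filter(entry => entry.timestamp?.ots?.upgraded === false)
}
```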

The timestamp proves that the attestation was not made after attestation.timestamp, within the error bars of several hours afforded by the system. In practice, this means attestation.timestamp is provably accurate to within about a day.

If you trust the signer you can ignore the proof and rely on attestation.timestamp alone, making it accurate to about a second.

Timestamping methods

Modern trusted timestamping methods usually fall into two groups. Centralized timestamping requires trusting a central authority that will sign your data (or a hash) with a timestamp. Decentralized timestamping requires inserting your data (or a hash) into a timestamped widely-copied database, such as a blockchain or a printed newspaper.

This repo currently uses decentralized timestamping via OpenTimestamps, which uses the Bitcoin blockchain and could support other blockchains in the future.

There is an existing standard for centralized timestamping, RFC 3161, but it isn't used here due to the large proof size that would need to be stored for each attestation. See this issue for more details.

API

There are two ways of accessing the database: using our Node.js library on local files, or over HTTP. In an ideal world, prospective readers would take advantage of Hyperbee's sparse downloading to partially clone the database over the P2P network, then query it using our code. In practice, the HTTP API is likely to see more use, as it is simpler and works in browsers. It is also the only way to do remote writes.

Node.js library

Public functions are documented in lib.md. You can see some example usage in files like demo.js, demo-get.js, and siblings. The actual library source code is easy to read and is all in the src folder.

HTTP API

Please see the documentation for this in http.md. The source code for the server is available in server.js.