IPIP-0421: HTTP Delegated Routing Reader Privacy Upgrade #421

ischasny · 2023-06-07T13:18:55Z

CC @gammazero @masih @guillaumemichel @lidel

lidel

Thank you for submitting this @ischasny.
I'm OoO for the rest of the week, but I'm dropping some quick questions after a very rushed first pass (fine to link to prior discussions if these were already asked before).

lidel · 2023-06-07T17:34:17Z

src/routing/http-routing-reader-privacy-v1.md

+- **`MH`** is the [Multihash](https://github.com/multiformats/multihash) contained in a `CID`. It corresponds to the 
+digest of a hash function over some content. `MH` is represented as a 32-byte array.


Does this mean CIDs with longer hash functions are truncated at 32-byte mark?

Removed the 32-byte length as indeed it can be longer.
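
To make the point above concrete: a multihash is self-describing, so the digest length travels with the value instead of being fixed at 32 bytes. The following is a minimal Python sketch, assuming only the standard multihash layout (varint hash code, varint digest length, digest) and the registered codes `0x12` (sha2-256) and `0x13` (sha2-512). The helper names are illustrative and do not come from the spec.

```python
import hashlib

# Multicodec codes for the two hash functions used in this sketch.
SHA2_256 = 0x12
SHA2_512 = 0x13


def encode_varint(n):
    """Encode a non-negative integer as an unsigned LEB128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf, pos=0):
    """Return (value, next_position) for the varint starting at pos."""
    value, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def make_multihash(code, digest):
    """Prefix a digest with its hash code and its own length."""
    return encode_varint(code) + encode_varint(len(digest)) + digest


def digest_of(mh):
    """Extract the digest, trusting the embedded length, not a constant."""
    _, pos = decode_varint(mh)
    length, pos = decode_varint(mh, pos)
    digest = mh[pos:pos + length]
    if len(digest) != length:
        raise ValueError("truncated multihash")
    return digest


content = b"hello"
mh256 = make_multihash(SHA2_256, hashlib.sha256(content).digest())
mh512 = make_multihash(SHA2_512, hashlib.sha512(content).digest())
print(len(digest_of(mh256)), len(digest_of(mh512)))  # prints: 32 64
```

A reader that assumed a 32-byte array would silently truncate the sha2-512 case, which is why the fixed length was dropped from the definition.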

Co-authored-by: Marcin Rataj <[email protected]>

guillaumemichel · 2023-06-12T13:18:45Z

LGTM!

* Refine paragraphs for better readability.
* Change section on router selection based on code, since from multihash code alone we cannot determine whether the request is encrypted or not.
* Update alternatives section to explain how the IPIP can be enhanced with OHTTP and Tor.

Refine routing specification and add byte frame diagram to clearly illustrate the content of SALT values.

aschmahmann · 2023-08-01T19:46:25Z

src/ipips/ipip-0421.md

+
+#### Backwards Compatibility
+
+Users will need to deliberately activate Reader Privacy on their nodes. A new flag could be introduced into IPFS implementations such as Kubo's HTTP Delegated Content Router configuration to streamline this process. Users on older nodes can continue using the existing API and switch on Reader Privacy later.


I'd hope this doesn't need to be the case in an application that has some IPFS smarts (rather than a simple HTTP client). If enough features are expressed through something like #388 then the client should be able to have plausible defaults here (e.g. if my delegated router supports IPNI + DHT, but only IPNI has double-hashing support and the client can run its own DHT client it could choose to send double-hashed requests to the delegated router for IPNI and do the DHT lookups itself).

Obviously some clients will still offer configurability (e.g. would you rather ask the delegated router to do DHT lookups for you in cleartext, or not do them at all) but having reasonable default behavior should be possible.
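
One way to picture the "plausible defaults" described in this comment is as a small selection policy. The sketch below is purely hypothetical: `Backend`, `plan_lookups`, and the action names are invented for illustration, and it assumes the client has already learned each backend's capabilities by some mechanism (such as the one discussed in #388), which this thread does not define.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backend:
    """A routing system behind the delegated router (hypothetical model)."""
    name: str            # e.g. "ipni" or "dht"
    double_hashed: bool  # True if reader-privacy lookups are supported


def plan_lookups(backends, client_has_dht, allow_cleartext=False):
    """Choose an action per backend, preferring private lookups.

    Returns a dict mapping backend name to one of:
    "delegated-double-hashed", "local", "delegated-cleartext", "skip".
    """
    plan = {}
    for backend in backends:
        if backend.double_hashed:
            # Private lookups are available: use them by default.
            plan[backend.name] = "delegated-double-hashed"
        elif backend.name == "dht" and client_has_dht:
            # No private lookups, but the client can query the DHT itself.
            plan[backend.name] = "local"
        elif allow_cleartext:
            # Explicit user opt-in to cleartext delegated lookups.
            plan[backend.name] = "delegated-cleartext"
        else:
            plan[backend.name] = "skip"
    return plan


router = [Backend("ipni", double_hashed=True), Backend("dht", double_hashed=False)]
print(plan_lookups(router, client_has_dht=True))
```

With these inputs the plan sends double-hashed requests for IPNI and keeps DHT lookups local, matching the example in the comment. The `allow_cleartext` switch stands in for the configurability the second paragraph mentions.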

aschmahmann · 2023-08-01T19:49:34Z

src/routing/http-routing-reader-privacy-v1.md

+
+All salts below are 64-bytes long and represent a string padded with `\x00`.
+
+- `SALT_DOUBLEHASH`: The string value `CR_DOUBLEHASH`, where each if the 13 characters are represented by their byte value. The remainder of the 64 bytes is filled with null bytes represented by `\x00`. This results in 51 null bytes after the `CR_DOUBLEHASH` string. The following illustrates its corresponding byte frame diagram:


Suggested change

- `SALT_DOUBLEHASH`: The string value `CR_DOUBLEHASH`, where each if the 13 characters are represented by their byte value. The remainder of the 64 bytes is filled with null bytes represented by `\x00`. This results in 51 null bytes after the `CR_DOUBLEHASH` string. The following illustrates its corresponding byte frame diagram:

- `SALT_DOUBLEHASH`: The string value `CR_DOUBLEHASH`, where each of the 13 characters are represented by their byte value. The remainder of the 64 bytes is filled with null bytes represented by `\x00`. This results in 51 null bytes after the `CR_DOUBLEHASH` string. The following illustrates its corresponding byte frame diagram:

aschmahmann · 2023-08-01T19:50:21Z

src/routing/http-routing-reader-privacy-v1.md

+  43 52 5F 44 4F 55 42 4C 45 48 41 53 48 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
+  ```
+
+- `SALT_ENCRYPTIONKEY`: The string value `CR_ENCRYPTIONKEY`, where each if the 15 characters are represented by their byte value. The remainder of the 64 bytes is filled with null bytes represented by `\x00`. This results in 49 null bytes after the `CR_ENCRYPTIONKEY` string. The following illustrates its corresponding byte frame diagram:


Suggested change

- `SALT_ENCRYPTIONKEY`: The string value `CR_ENCRYPTIONKEY`, where each if the 15 characters are represented by their byte value. The remainder of the 64 bytes is filled with null bytes represented by `\x00`. This results in 49 null bytes after the `CR_ENCRYPTIONKEY` string. The following illustrates its corresponding byte frame diagram:

- `SALT_ENCRYPTIONKEY`: The string value `CR_ENCRYPTIONKEY`, where each of the 16 characters are represented by their byte value. The remainder of the 64 bytes is filled with null bytes represented by `\x00`. This results in 48 null bytes after the `CR_ENCRYPTIONKEY` string. The following illustrates its corresponding byte frame diagram:
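
The corrected counts can be checked mechanically. This is a small Python sketch, assuming only what the quoted text states (ASCII label, right-padded with `\x00` to 64 bytes). `make_salt` is an illustrative helper name, not part of the spec.

```python
SALT_LENGTH = 64


def make_salt(label):
    """Encode an ASCII label and right-pad it with null bytes to 64 bytes."""
    raw = label.encode("ascii")
    if len(raw) > SALT_LENGTH:
        raise ValueError("label longer than salt length")
    return raw.ljust(SALT_LENGTH, b"\x00")


SALT_DOUBLEHASH = make_salt("CR_DOUBLEHASH")
SALT_ENCRYPTIONKEY = make_salt("CR_ENCRYPTIONKEY")

for name, salt in [("SALT_DOUBLEHASH", SALT_DOUBLEHASH),
                   ("SALT_ENCRYPTIONKEY", SALT_ENCRYPTIONKEY)]:
    label_len = len(salt.rstrip(b"\x00"))
    # Label length, then number of padding bytes.
    print(name, label_len, SALT_LENGTH - label_len)
    # Byte frame in the same style as the diagrams in the spec.
    print(salt.hex(" ").upper())
```

This prints 13 and 51 for `SALT_DOUBLEHASH`, and 16 and 48 for `SALT_ENCRYPTIONKEY`, in line with the suggested change.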

aschmahmann · 2023-08-01T19:55:05Z

src/routing/http-routing-reader-privacy-v1.md

+  43 52 5F 45 4E 43 52 59 50 54 49 4F 4E 4B 45 59 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
+  ```
+
+These magic values are utilized to compute distinct digests from identical values for varying purposes. For instance, a hash of a Multihash employed for lookups should differ from the one used for key derivation, despite originating from the same value. To achieve this, the Multihash is concatenated with different magic values before applying the hash function: `SALT_DOUBLEHASH` for lookups and `SALT_ENCRYPTIONKEY` for key derivation as elaborated in the `Glossary`.


Not strictly needed, but it might be nice to explain a bit why the hashes should be different, for people who come looking at this later.
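
As a possible illustration for such an explanation: the lookup identifier is sent to the server, so if the encryption key were derived with the same salt, the server could presumably reconstruct the key from the request alone. The Python sketch below shows the two derivations diverging from the same multihash. The `HASH2` computation follows the glossary (`SHA2_256` multihash over `SALT_DOUBLEHASH || MH`). The use of plain SHA-256 in `derive_key_material` is an assumption, since the quoted text does not name the key-derivation hash, and both function names are illustrative.

```python
import hashlib

SALT_DOUBLEHASH = b"CR_DOUBLEHASH".ljust(64, b"\x00")
SALT_ENCRYPTIONKEY = b"CR_ENCRYPTIONKEY".ljust(64, b"\x00")

# Multihash prefix for sha2-256: code 0x12, digest length 0x20 (32 bytes).
SHA2_256_PREFIX = bytes([0x12, 0x20])


def hash2(mh):
    """HASH2: a sha2-256 multihash over SALT_DOUBLEHASH || MH."""
    digest = hashlib.sha256(SALT_DOUBLEHASH + mh).digest()
    return SHA2_256_PREFIX + digest


def derive_key_material(mh):
    """Key material from SALT_ENCRYPTIONKEY || MH.

    ASSUMPTION: SHA-256 is used here only for illustration; the actual
    deriveKey function is not specified in the quoted text.
    """
    return hashlib.sha256(SALT_ENCRYPTIONKEY + mh).digest()


# A stand-in multihash for some content (hypothetical input).
mh = SHA2_256_PREFIX + hashlib.sha256(b"example content").digest()

lookup_id = hash2(mh)          # what the server sees
key = derive_key_material(mh)  # what only holders of MH can compute

print(lookup_id[2:] != key)  # prints: True
```

Only a party that already knows `MH` can compute both values; knowing `lookup_id` alone does not yield `key`.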

aschmahmann · 2023-08-01T19:57:46Z

src/routing/http-routing-reader-privacy-v1.md

+- **`CID`** stands for [Content IDentifier](https://github.com/multiformats/cid).
+- **`MH`** refers to the [Multihash](https://github.com/multiformats/multihash) contained in a `CID`. It corresponds to the hash function's digest over certain content.
+- **`HASH2`** is a second hash over the multihash. Second Hashes must follow the `Multihash` format with `SHA2_256` codec. The digest must be calculated as `hash(SALT_DOUBLEHASH || MH)`.
+- **`ProviderRecord`** is a JSON with Provider Record as described in the [HTTP Delegated Routing Specification](http-routing-v1.md).


nit:

Suggested change

- **`ProviderRecord`** is a JSON with Provider Record as described in the [HTTP Delegated Routing Specification](http-routing-v1.md).

- **`ProviderRecord`** is a JSON object with Provider Record as described in the [HTTP Delegated Routing Specification](http-routing-v1.md).

aschmahmann · 2023-08-11T13:20:07Z

src/routing/http-routing-reader-privacy-v1.md

+
+- `EncProviderRecordKeys` is a list of base64 encoded `EncProviderRecordKey`;
+
+#### `GET /routing/v1/encrypted/metadata/{HashProviderRecordKey}`


Same question about encoding as for HASH2

aschmahmann · 2023-08-11T13:21:45Z

src/routing/http-routing-reader-privacy-v1.md

+- **`ProviderRecord`** is a JSON with Provider Record as described in the [HTTP Delegated Routing Specification](http-routing-v1.md).
+- **`ProviderRecordKey`** is a concatenation of `peerID || contextID`. Explicit encoding lengths are unnecessary as they are inherently encoded as part of the multihash format. Max `contextID` length is 64 bytes.
+- **`EncProviderRecordKey`** is `Nonce || enc(deriveKey(multihash), Nonce, ProviderRecordKey)`. Max `EncProviderRecordKey` is 200 bytes.
+- **`HashProviderRecordKey`**  is a hash over `ProviderRecordKey`, calculated as `hash(SALT_DOUBLEHASH || ProviderRecordKey)`.


Is this a SHA256 multihash as well?
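
For reference while this question is open, here is a sketch of the derivation as literally written in the quoted glossary, in Python. It assumes SHA-256 as the hash and returns the raw digest. Whether the result should also be wrapped as a multihash is exactly what the question asks, so that step is deliberately left out. The peer ID and context ID values are placeholders, and the function names are illustrative.

```python
import hashlib

SALT_DOUBLEHASH = b"CR_DOUBLEHASH".ljust(64, b"\x00")
MAX_CONTEXT_ID_LENGTH = 64  # from the glossary: max contextID is 64 bytes


def provider_record_key(peer_id, context_id):
    """ProviderRecordKey = peerID || contextID."""
    if len(context_id) > MAX_CONTEXT_ID_LENGTH:
        raise ValueError("contextID exceeds 64 bytes")
    return peer_id + context_id


def hash_provider_record_key(record_key):
    """hash(SALT_DOUBLEHASH || ProviderRecordKey).

    ASSUMPTION: SHA-256, raw digest, no multihash prefix.
    """
    return hashlib.sha256(SALT_DOUBLEHASH + record_key).digest()


# Placeholder peer ID: a sha2-256 multihash over dummy key bytes.
peer_id = bytes([0x12, 0x20]) + hashlib.sha256(b"dummy public key").digest()
context_id = b"example-context"

record_key = provider_record_key(peer_id, context_id)
print(hash_provider_record_key(record_key).hex())
```

Because the same `SALT_DOUBLEHASH` is used as for `HASH2`, the two identifiers differ only through their inputs, which may be worth stating explicitly in the spec alongside the encoding.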

aschmahmann · 2023-08-11T13:38:06Z

src/routing/http-routing-reader-privacy-v1.md

+- **`CID`** stands for [Content IDentifier](https://github.com/multiformats/cid).
+- **`MH`** refers to the [Multihash](https://github.com/multiformats/multihash) contained in a `CID`. It corresponds to the hash function's digest over certain content.
+- **`HASH2`** is a second hash over the multihash. Second Hashes must follow the `Multihash` format with `SHA2_256` codec. The digest must be calculated as `hash(SALT_DOUBLEHASH || MH)`.
+- **`ProviderRecord`** is a JSON with Provider Record as described in the [HTTP Delegated Routing Specification](http-routing-v1.md).


Is it? The routing-v1 spec allows for opaque blobs in the provider record. Where's the line between "metadata" and "provider record" here?

aschmahmann · 2023-08-11T13:40:06Z

src/routing/http-routing-reader-privacy-v1.md

+```json
+{
+    "EncProviderRecordKeys": [
+        "EBxdYDhd.....",
+        "IOknr9DK....."
+    ]
+}


It might be helpful/illustrative to also show what the unencrypted form of the encrypted data is expected to look like.
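
Until the spec shows the plaintext form, the part of the response handling that is already pinned down can be sketched: each entry is a base64 string that decodes to an opaque `EncProviderRecordKey` of at most 200 bytes, which the glossary defines as `Nonce || enc(...)` over a `ProviderRecordKey` (`peerID || contextID`). The Python sketch below covers the decoding step only. It assumes the standard base64 alphabet with padding, uses placeholder bytes instead of real ciphertext, and does not attempt decryption because the cipher is not named in this excerpt.

```python
import base64
import json

MAX_ENC_PROVIDER_RECORD_KEY = 200  # from the glossary


def decode_enc_provider_record_keys(body):
    """Parse the JSON response and base64-decode every entry."""
    doc = json.loads(body)
    keys = []
    for item in doc["EncProviderRecordKeys"]:
        raw = base64.b64decode(item, validate=True)
        if len(raw) > MAX_ENC_PROVIDER_RECORD_KEY:
            raise ValueError("EncProviderRecordKey exceeds 200 bytes")
        keys.append(raw)
    return keys


# Placeholder payloads standing in for Nonce || ciphertext (not real data).
placeholders = [bytes(range(40)), bytes(range(60, 120))]
body = json.dumps({
    "EncProviderRecordKeys": [
        base64.b64encode(p).decode("ascii") for p in placeholders
    ]
})

decoded = decode_enc_provider_record_keys(body)
print([len(k) for k in decoded])  # prints: [40, 60]
```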

aschmahmann · 2023-08-11T14:40:53Z

src/routing/http-routing-reader-privacy-v1.md

+
+### Notes
+
+Assembling a full `ProviderRecord` from the encrypted data requires multiple server roundtrips. The first fetches a list of `EncProviderRecordKey`s, followed by one for each `EncProviderRecordKey` to retrieve `EncMetadata`. To minimize the number of roundtrips to one, the client implementation should use the local libp2p peerstore for multiaddress discovery and [libp2p multistream select](https://github.com/multiformats/multistream-select) for protocol negotiation.


@masih we've talked about this before and it's more exploratory, but I'm noting here to make it public and see if folks have any thoughts about how this might impact the interface/API here in the future.

Note: This is not a "please rewrite everything" request.

How well this works is a function of how important the metadata is to performing a useful retrieval, and how important the metadata is depends on the distribution of information between the "ProviderRecord" and the "(ProviderRecord)Metadata".

IIUC the reason it's implemented this way is to keep data storage in routing backends like IPNI from needing to store the same data but encrypted many many times (i.e. once per multihash advertised).

At the extremes we have:

- A lot of space could be saved by making ProviderRecord just contain a small pointer and Metadata contain all the actual provider record information (e.g. peerID, multiaddrs, protocols, ...). This means two round-trips unless the client already has the pointer information locally (e.g. if the pointer was a peerID, then having the multiaddrs, etc. locally, and if the pointer was some system-specific unique-ID then having that cached from a prior lookup). Also, unless there's some aggregation service/proxy it allows correlation between many different requests that use the same metadata (not necessarily a big deal here).
- The second round-trip could disappear if we store all the information in the ProviderRecord portion. However, this means encrypting all the data for every advertised multihash.

I could situationally see reasons to shuffle data between these two, depending on things like:

- how reusable the metadata is
- how frequently the metadata information is to be cached
- cost models for routing systems
- ...

Two areas where I could see the extremes being in use:

- Save storage by just returning the target's "identifier": e.g. libp2p peerID or an unauthenticated HTTP+libp2p URL (that has .well_known for protocol negotiation) since peer routing makes sense to be separable in libp2p, and protocol negotiation to be separate for unauthenticated HTTP + libp2p.
- Save round-trips by returning all the information: e.g. a webseeds-like advertisement that points to an outboard blake3 HTTP URL and an HTTP URL for the data (data could be a separate advertisement)

This makes me wonder if there's a better way to do this. For example:

- encrypted/providers returns a (JSON) blob mimicking the routing-v1 results
  - It contains some information indicating if there's metadata and/or what might be in the metadata
- encrypted/metadata provides the metadata

This could allow systems like IPNI to optimize data layouts on ingestion and have some flexibility without breaking downstream clients.
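
To put rough numbers on the round-trip trade-off discussed in this comment, the Python sketch below lists the requests a client would issue for one lookup under the current two-endpoint design. The metadata path comes from the quoted spec. The providers path is an assumption modeled on it. `plan_requests` and the key names are hypothetical, and in practice the record key hashes only become known after the first response has been fetched and decrypted.

```python
# Path taken from the quoted spec text.
METADATA_PATH = "/routing/v1/encrypted/metadata/{key}"
# ASSUMPTION: the providers path is modeled on the metadata path.
PROVIDERS_PATH = "/routing/v1/encrypted/providers/{hash2}"


def plan_requests(hash2, record_key_hashes, cached=frozenset()):
    """List the request paths for one lookup.

    One request fetches the encrypted provider record keys; one more is
    needed per key whose provider information is not already held locally.
    """
    paths = [PROVIDERS_PATH.format(hash2=hash2)]
    for key in record_key_hashes:
        if key not in cached:
            paths.append(METADATA_PATH.format(key=key))
    return paths


keys = ["key-a", "key-b", "key-c"]
cold = plan_requests("example-hash2", keys)
warm = plan_requests("example-hash2", keys, cached={"key-a", "key-b", "key-c"})
print(len(cold), len(warm))  # prints: 4 1
```

The cold case costs one request plus one per provider record key, while a client with everything cached locally stays at a single request. That is the gap the proposed reshuffling of data between the two endpoints would move in one direction or the other.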

docs: IPIP for Delegated Routing Privacy Upgrade

7fba2e7

ischasny requested a review from a team as a code owner June 7, 2023 13:18

ischasny added 2 commits June 7, 2023 14:25

Update with PR number

f01558f

Update with the PR number

f857f33

ischasny changed the title ~~docs: IPIP for Delegated Routing Privacy Upgrade~~ IPIP-0421: IPIP for Delegated Routing Privacy Upgrade Jun 7, 2023

lidel requested changes Jun 7, 2023

lidel changed the title ~~IPIP-0421: IPIP for Delegated Routing Privacy Upgrade~~ IPIP-0421: HTTP Delegated Routing Reader Privacy Upgrade Jun 7, 2023

ischasny and others added 3 commits June 8, 2023 10:22

Update src/ipips/ipip-0421.md

caede81

Co-authored-by: Marcin Rataj <[email protected]>

Update src/routing/http-routing-reader-privacy-v1.md

07967b3

Co-authored-by: Marcin Rataj <[email protected]>

Review comments

f76c87c

lidel reviewed Jun 16, 2023

chore(ipip-421): add missing sections

0d2948e

lidel reviewed Jun 16, 2023

ipip-421: editorials

6c76a33

lidel reviewed Jun 16, 2023

ipip-421: fix typo

2d800ad

lidel reviewed Jun 16, 2023

lidel reviewed Jun 16, 2023

ipip-421: editorials

18b258e

masih reviewed Jun 21, 2023

masih mentioned this pull request Jun 21, 2023

Separate encrypted lookup path entirely ipni/indexstar#127

Closed

masih added 4 commits July 12, 2023 15:46

Test write access to branch

25242f6

Merge remote-tracking branch 'origin/main'

ef341c1

Refine routing specification and add byte frame diagrams

0195260

Refine routing specification and add byte frame diagram to clearly illustrate the content of SALT values.

masih requested review from lidel and aschmahmann July 12, 2023 16:21

gammazero approved these changes Jul 23, 2023

Merge branch 'main' into main

23eb7d3

This was referenced Aug 4, 2023

Communicate with network indexers for content routing using reader privacy with double hashing ipfs/kubo#9455

Open

Implement default content routing selection ipfs/roadmap#110

Closed

lidel mentioned this pull request Aug 7, 2023

IPIP-334: Double-Hashed Find Providers in Reframe #334

Closed

aschmahmann reviewed Aug 11, 2023
