Propose Naam as naming system powered by IPNI #4

# Naam: Naming As Advertisement

![wip](https://img.shields.io/badge/status-wip-orange.svg?style=flat-square)

**Author(s)**:

- [Masih Derkani](https://github.com/masih)

**Maintainer(s)**:

- [Masih Derkani](https://github.com/masih)

* * *

**Abstract**

This document defines the **N**aming **A**s **A**dvertise**m**ent (Naam) protocol: a mechanism by
which [IPNI](IPNI.md) can be used to both publish and
resolve [IPNS](https://github.com/ipfs/specs/blob/main/ipns/IPNS.md) records. Naam utilises the
Comment on lines +17 to +19


IIUC what you've identified here is reasonably that you can use this mechanism to store basically any type of small data. In particular, you don't actually need to change how any IPNI node operates in order for this proposal to function.

Instead, the proposal is more about letting people know what to do consistently so that they can coordinate across applications in the same way that the Bitswap and FilecoinGraphSync-v1 codes and metadata are defined so as to increase coordination. Also, it allows IPNI nodes to use APIs like ipfs/specs#337 reasonably instead of only responding to FindProviders queries and having clients decipher them on their end.

I don't know if down the road advertising chains and networks might want to be divided up more so people can choose to participate in them more selectively, but without damaging the system as a whole. Doesn't seem necessary now or for this PR (given that it already works as is), but perhaps something to consider as the system and the data it stores grows.

Member Author

Spot on; that's exactly why this spec exists.

On separation of networks: maybe down the line. It seems like overkill to separate just for IPNS records, considering how few of them are out there compared to content CIDs.

extensible IPNI [advertisement metadata](IPNI.md#metadata) to encode IPNS records and form an
advertisement. The resulting advertisements are consumable by indexer nodes, just like any other
advertisement. Further, to resolve IPNS records, Naam utilises the existing
IPNI [multihash lookup API](IPNI.md#get-multihashmultihash) to discover IPNS records.

## Table of Contents

* [Introduction](#introduction)
* [Specification](#specification)
+ [IPNS Record Publication](#ipns-record-publication)
+ [Updating IPNS Records](#updating-ipns-records)
+ [Removing IPNS Records](#removing-ipns-records)
+ [Resolving IPNS Records](#resolving-ipns-records)
* [Limitations](#limitations)
+ [IPNS Record Size](#ipns-record-size)
+ [Record Validity Duration](#record-validity-duration)
* [Security](#security)
+ [Advertisement Verifiability](#advertisement-verifiability)
+ [Reputation](#reputation)
* [Privacy](#privacy)
* [Implementations](#implementations)
* [Related Resources](#related-resources)
* [Copyright](#copyright)

## Introduction

IPNI introduces a set of protocols to discover providers for content-addressable data, along with
the protocols over which the data can be retrieved. Content providers maintain a chain of
advertisements that collectively capture the list of multihashes they provide. The IPNI network is
actively being integrated across multiple networks, including IPFS and Filecoin.

As the IPFS network grows, it is desirable to enable alternative content routing mechanisms in order
to provide the best user experience possible; e.g. fast time-to-first-byte, low latency lookups,
etc. IPNI already offers a DHT alternative to discover providers for a given CID. However, the DHT
in IPFS provides other functionalities that go beyond provider lookup. One such functionality is
IPNS: the InterPlanetary Naming System.

IPNS complements the immutability of IPFS by providing a way to create cryptographically
verifiable, mutable pointers to content-addressed data. It is part of the core IPFS functionality,
which offers an intuitive way to enable naming services, such as mapping domain names to changing
CIDs. For IPNI to become a complete content routing system and a viable alternative to the existing
IPFS DHT, it needs to support IPNS. This is where **N**aming **A**s **A**dvertise**m**ent
(Naam) comes in.

## Specification

Naam is a naming protocol that builds on top of the existing IPNI protocol to facilitate a naming
system. It enables publication and resolution of IPNS records using IPNI advertisements and the
indexer node find APIs. Naam utilises the extensible advertisement metadata field to embed IPNS
records and produce advertisements that are consumable by existing indexer nodes without requiring
any change to the existing ingestion pipeline or lookup API.

This approach offers two key advantages; it:

1) reuses the existing IPNI network to implement a name service, which takes IPNI closer to offering
a fully featured DHT alternative for the IPFS network without having to run any additional
   services dedicated to resolving IPNS records, and
2) preserves an immutable historical catalogue of all IPNS records published by a peer, which can be
used for reputation analysis and ultimately holding peers responsible for their participation in
the network.

The Naam naming system involves three main components:

1) **Publisher** - which makes a chain of specifically crafted advertisements available for
ingestion by the network indexers,
2) **Network Indexer** - which is an IPNI node responsible for ingesting advertisements and exposing
find APIs, and
3) **Resolver** - which, with the aid of network indexers, resolves a given IPNS key to its
corresponding record.

The flow of information between the three components is similar to the indexing ecosystem depicted
in the IPNI spec: publishers create and publish advertisements -> network indexers ingest them ->
and, similar to retrieval clients, resolvers use the find API to map a given IPNS key to an IPNS
record. In this flow no changes are needed on network indexers. Therefore, the remainder of this
document focuses on the interactions of Publishers and Resolvers, and the data format exchanged
between them. For
more information on how network indexers ingest advertisements and provide lookup APIs, please
see [IPNI](IPNI.md).

### IPNS Record Publication

Naam publishes IPNS records via Publishers, which create IPNI-compliant advertisements that
embed IPNS records and are signed by the publisher's identity. The produced advertisements, sketched
after the list below, consist of:

* **`PreviousID`** - the optional link to the previous Naam advertisement. `nil` indicates no
previous advertisements.
* **`Provider`** - the publisher peer ID.
* **`Addresses`** - the array of multiaddrs over which the publisher can be reached.
Contributor

What will the publisher addresses be used for? Indexers get the advertisements from the address specified in the announcement message and map the peer ID multihash to an IPNS record contained in IPNI metadata. IPNS clients resolve a peer ID multihash to the IPNS record. They do not care about what the publisher's address is. So, in the context of IPNS, the advertisement's Addresses field seems unnecessary.

* **`Entries`** - the link to an `EntryChunk` IPLD node, as specified by IPNI, that contains a
  single entry: the multihash of the IPNS record key, with `nil` as the link to `Next`.
* **`ContextID`** - fixed to UTF-8 bytes encoding of the value `/ipni/naam`.
* **`Metadata`** - the bytes representation of Naam metadata, consisting of:
* the varint representation of `ipns-record` multicodec code, i.e. `0x0300`, as bytes,
* followed by the marshalled IPNS record.
* **`IsRm`** - the boolean value specifying whether an IPNS entry is being added/updated,
i.e. `false`, or removed, i.e. `true`.
* **`Signature`** - the advertisement signature calculated as specified by IPNI specification.
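
As a non-normative illustration, the sketch below pictures these fields as a Go struct. The
`NaamAdvertisement` type and its simplified field types are assumptions made purely for
illustration; a real publisher would use an IPNI advertisement library with proper IPLD links
rather than plain strings.

```go
// NaamAdvertisement is an illustrative, simplified view of the fields listed above.
type NaamAdvertisement struct {
	PreviousID string   // CID of the previous Naam advertisement; empty when there is none
	Provider   string   // publisher peer ID
	Addresses  []string // multiaddrs over which the publisher can be reached
	Entries    string   // CID of the single-entry EntryChunk (see Entries below)
	ContextID  []byte   // always []byte("/ipni/naam")
	Metadata   []byte   // varint(0x0300) followed by the marshalled IPNS record (see Metadata below)
	IsRm       bool     // false to add or update the record, true to remove it
	Signature  []byte   // advertisement signature, computed as specified by IPNI
}
```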

#### `Entries`

`Entries` in a typical advertisement are responsible for capturing the list of multihashes
corresponding to the content hosted by a provider. Once ingested, the indexer nodes facilitate
lookup of providers over find APIs via the same multihashes.

Naam repurposes the same advertisement structure by specifically crafting the entries of an
advertisement such that the indexer node find API can also be repurposed to look up IPNS records.
Naam forms a single `EntryChunk` IPLD node that contains a single multihash, calculated as:

* The SHA-256 multihash of IPNS Record Key, i.e. `/ipns/<ipns-key>`.
* The multihash value is encoded according to the IPNI advertisement encoding. See [IPNI Specification](IPNI.md).
* There are no restrictions on the concrete IPNS key format as long as it complies with the [IPNS routing record specification](https://github.com/ipfs/specs/blob/main/ipns/IPNS.md#routing-record).

This approach allows the resolvers to deterministically calculate the multihash used for lookup when
interacting with the indexer node find APIs.
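
As a non-normative illustration, the lookup multihash can be computed with the
[`go-multihash`](https://github.com/multiformats/go-multihash) library as below; the example IPNS
key string is a placeholder.

```go
package main

import (
	"fmt"

	mh "github.com/multiformats/go-multihash"
)

// naamEntryMultihash returns the single multihash placed in the Naam EntryChunk:
// the SHA-256 multihash of the IPNS record key, i.e. "/ipns/<ipns-key>".
func naamEntryMultihash(ipnsKey string) (mh.Multihash, error) {
	return mh.Sum([]byte("/ipns/"+ipnsKey), mh.SHA2_256, -1)
}

func main() {
	m, err := naamEntryMultihash("k51qzi5uqu5d...") // placeholder IPNS key
	if err != nil {
		panic(err)
	}
	// Resolvers derive the same multihash when querying the indexer find API.
	fmt.Println(m.B58String())
}
```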

#### Context ID

The field `ContextID` in a regular advertisement is used as a way to uniquely identify the metadata
associated with a list of multihashes. Naam uses a fixed value, `/ipni/naam`, as the context ID to
ensure that the metadata associated with the published advertisement for the same IPNS key is
singular and uniquely identifiable.

Using a fixed value as the context ID also enables network indexers to quickly differentiate Naam
advertisements from typical content advertisements. This opens up future opportunities for
optimisation when it comes to ingesting Naam advertisements. For example, indexers can proactively
process IPNS records and offer bespoke APIs that understand IPNS record expiry/validity.

#### Metadata

`Metadata` is typically used to convey the list of protocols over which the advertised content is
retrievable. Further, this field is designed to be extensible: valid metadata should start with a
`varint` signifying _some_ protocol ID followed by arbitrary bytes. There are currently three
well-known protocol IDs. The protocol ID itself need not be known by the indexer nodes. In fact,
indexers treat metadata as a black box and carry on processing advertisements. Instead, it is meant
to be a signal to the consumers of `Metadata`, i.e. retrieval clients, as a hint on how to decode
the remaining bytes.

Naam utilises this extensibility to directly encode IPNS records as metadata. The Naam advertisement
metadata, sketched after the list below, consists of:

* `ipns-record` multicodec code `0x0300` as protocol ID,
* followed by marshalled IPNS Record.
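
A minimal sketch of that encoding using Go's standard library is shown below; `marshalledRecord`
stands in for the serialised IPNS record produced by an IPNS library such as `go-ipns`.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// ipnsRecordCode is the ipns-record multicodec code used as the metadata protocol ID.
const ipnsRecordCode = 0x0300

// naamMetadata prefixes a marshalled IPNS record with the varint encoding of the
// ipns-record multicodec code, producing the Naam advertisement metadata bytes.
func naamMetadata(marshalledRecord []byte) []byte {
	prefix := binary.AppendUvarint(nil, ipnsRecordCode)
	return append(prefix, marshalledRecord...)
}

func main() {
	// The varint encoding of 0x0300 is the two bytes 0x80 0x06.
	fmt.Printf("% x\n", naamMetadata([]byte("<marshalled-ipns-record>")))
}
```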

The use of metadata to capture IPNS records enables Naam to also utilise the built-in IPNI mechanism
for updating metadata to update IPNS records, because neither the context ID nor the multihash in
Naam advertisements ever changes for the same IPNS key. For more information,
see [Updating IPNS Records](#updating-ipns-records).

### Updating IPNS Records

IPNI offers a built-in mechanism to efficiently update the metadata associated with a list of
multihashes without having to republish the entries. Indexer nodes uniquely identify the metadata by
the advertisement `Provider` and `ContextID`.

Both of these values remain the same in Naam advertisements: the former is the same peer ID that
forms part of the IPNS key, i.e. `/ipns/<ipns-key>`, and the latter is fixed. As a result, to update
an IPNS record, all that's needed is to publish a new Naam advertisement that:

* links to the previous advertisement via the `PreviousID` link as specified by IPNI,
* uses the same `Provider` and `ContextID` values as before,
* sets `IsRm` to `false`, and
* has the updated IPNS record in Naam Metadata format.

Note that there is no need to specify `Entries` when updating an existing record. It is inferred by
the network indexers since `Provider` and `ContextID` in Naam advertisements never change.
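
As a sketch only, and reusing the illustrative `NaamAdvertisement` type and `naamMetadata` helper
from the earlier sketches in this document, an update is simply a new advertisement on the same
chain carrying the fresh record; `headAdCid`, `providerID` and `newRecord` are placeholders.

```go
// newNaamUpdate sketches an update advertisement for an existing IPNS key; NaamAdvertisement
// and naamMetadata refer to the illustrative definitions sketched earlier in this document.
func newNaamUpdate(headAdCid, providerID string, addrs []string, newRecord []byte) NaamAdvertisement {
	return NaamAdvertisement{
		PreviousID: headAdCid,               // link to the current head of the publisher's chain
		Provider:   providerID,              // unchanged peer ID
		Addresses:  addrs,
		ContextID:  []byte("/ipni/naam"),    // unchanged, fixed context ID
		Metadata:   naamMetadata(newRecord), // the updated IPNS record in Naam metadata format
		IsRm:       false,
		// Entries is left unset: indexers infer it from the unchanged Provider and ContextID.
		// The Signature must still be computed as specified by IPNI before publishing.
	}
}
```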

### Removing IPNS Records

IPNS has EOL and TTL features built in. This means explicit removal of records is not strictly
required; instead, records eventually expire. Though not strictly necessary, the Naam protocol can
utilise IPNI to explicitly remove records, just like a traditional DNS nameserver.

The process to remove an IPNS record published via the Naam protocol is identical to the removal
advertisement specified by IPNI. Similar to a regular advertisement, an IPNS record may be removed
by publishing an advertisement that:

* links to the previous advertisement via the `PreviousID` link as specified by IPNI,
* uses the same `Provider` and `ContextID` values as before,
* sets `IsRm` to `true`, and
* sets `Entries` link to `NoEntries` as specified by IPNI.

Once ingested, indexer nodes will eventually remove the Naam advertisement metadata along with the
publisher information.

### Resolving IPNS Records

IPNS record resolution involves finding the IPNS record associated with a given IPNS key, i.e.
`/ipns/<ipns-key>`. As alluded to earlier, Naam uses the find API already provided by the network
indexers to achieve this. To resolve an IPNS record, a resolver (see the sketch after this list):

* calculates the multihash of the given IPNS key as its SHA-256,
* looks up the provider info for that multihash via the IPNI find API,
* decodes the metadata from the provider info as IPNS record,
* validates the record, and
* if valid, returns its value.
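
A minimal client-side sketch of the metadata handling in these steps is shown below. The
`ProviderResult` type is a simplified, assumed view of what the IPNI find API returns for the
multihash of `/ipns/<ipns-key>`; validation of the decoded record (signature, EOL) is left to an
IPNS library and omitted here.

```go
package naam

import (
	"bytes"
	"encoding/binary"
	"errors"
)

// ProviderResult is a simplified, assumed view of a single find API result.
type ProviderResult struct {
	ContextID []byte
	Metadata  []byte
}

// extractIPNSRecord picks the marshalled IPNS record out of Naam provider results.
// The caller is expected to validate the record before using its value.
func extractIPNSRecord(results []ProviderResult) ([]byte, error) {
	for _, pr := range results {
		if !bytes.Equal(pr.ContextID, []byte("/ipni/naam")) {
			continue // not a Naam result
		}
		code, n := binary.Uvarint(pr.Metadata)
		if n <= 0 || code != 0x0300 {
			continue // metadata does not start with the ipns-record protocol ID
		}
		return pr.Metadata[n:], nil // the marshalled IPNS record
	}
	return nil, errors.New("no Naam IPNS record found")
}
```
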
Comment on lines +211 to +213


A note/thought here. Unlike for provider records, IPNS records have a single "best" value. That is to say that while it's totally reasonable that if 100 different peers are advertising they have the block with multihash M returning them all to the user is very understandable (unless they've asked for some kind of limiting, like geographic or recently seen online). However, given a 100 different peers that have all put up IPNS records corresponding to the key K there is at most 1 record worth returning (unless you're trying to allow for querying historical information). Recall that even if Alice is the only one with the key K, anyone can store and therefore store with IPNI an IPNS record signed by Alice corresponding with the key K (e.g. in case her publisher goes offline and her records are deleted).

Perhaps it's fine to ask the caller to decode all of the returned values and calculate the best themselves so as to not require the IPNI nodes to do any validation. On the other hand, if the IPNI nodes do validation before storing the data in their database they can both only store a single IPNS record per key and save the caller time on validation.

I can see how it's probably easier to shift it to being the caller's responsibility to do the processing given that it doesn't require any database alterations in storetheindex and given how the network behaves today the number of nodes advertising a given IPNS record will be very low. In this case perhaps it'd be worthwhile to do the work of selecting the "best" record in between the IPNI database and the HTTP API (i.e. ipfs/specs#337) server responses. Then later if the savings look substantial implementers can switch to only storing a single record per key.

Member Author
@masih masih Dec 9, 2022

> there is at most 1 record worth returning

Yep; the go implementation does exactly that, in that in the context of naming over advertisement, only the "head" advertisement matters, and because advertisements are signed we can verify that the advertisement containing the IPNS record was indeed published by the peer that corresponds to the IPNS key.

> anyone can store and therefore store with IPNI an IPNS record signed by Alice

Yes I think of this as a good thing, since even if Alice is offline, IPNS key is resolvable. I think the same is true about the existing DHT-based IPNS resolution?

> decode all of the returned values and calculate the best themselves...
> if the IPNI nodes do validation before storing the data in their database

Two things to point out here:

  • Because the provider and context ID are identical across all NAAM advertisements, there will only be a single record as a result of processing those ads.
  • The IPNI built-in validation checks the signature of the ad and would guarantee that the ad is from the publisher/provider.

> do the work of selecting the "best" record in between the IPNI database and the HTTP API

Thank you for mentioning the HTTP API, that seems reasonable, though I must point out in IPNI there will only be one IPNS record for a given IPNS key. Reading this comment I am worried that I have not spelled out the fact that only the head advertisement matters, and because provider and context ID are identical across all NAAM advertisements then there will only be a single provider record for it and therefore a single IPNS record.

Having pointed out the singularity of IPNS records resolved via IPNI, selecting the best between HTTP and API would become a matter of EOL value and whichever record is newer, I think.


> and because advertisements are signed we can verify that the advertisement containing the IPNS record was indeed published by the peer that corresponds to the IPNS key.

You can't really do this. There may be many IPNS keys owned by a single peer, Alice. Of course she can also use her libp2p node ID as a key but this isn't the only one.

> though I must point out in IPNI there will only be one IPNS record for a given IPNS key.

I'm not sure if I'm misunderstanding or misexplaining myself. Say the following happens:

  1. On day 1 Alice signs an IPNS record R1 using key K that's valid for 30 days and has a sequence number 1, then advertises it to IPNI from the AliceProvider
  2. On day 2 Bob sees Alice's IPNS record R1 and says "hey, I should try and keep that alive" so advertises it to IPNI from the BobProvider
  3. On day 3 Alice signs an IPNS record R2 using key K that's valid for 30 days and has a sequence number 2, then advertises it to IPNI from the AliceProvider
  4. On day 4 when Charlie tries to look up the record associated with the IPNS identifier derived from K from IPNI there are two records (AliceProvider, R2) and (BobProvider, R1) which will either both need to be returned (for the client to figure out which is valid) or the IPNI node will need to figure out that R2 > R1 and only publish the result from the AliceProvider
    • Note: In theory R2 could be invalid (EOL was on day -7, or even day 3.5, invalid signature, etc.) so just choosing R2 because Alice published R1 and R2 to IPNI isn't sufficient

Member Author

On step 2: this is where I need your help to get my head around this; I suspect I am missing something fundamental about IPNS.
In Naam, BobProvider can only advertise IPNS records with key /ipns/<bob's-key>, because Bob does not have Alice's key. What am I missing?

On step 4: thank you for pointing out the case where R2 could have expired or invalid. As the spec stands, such publication by Alice would make R1 no longer discoverable. I am curious why should R1 remain discoverable if Alice has published a new record for the same key that has expired? One could argue that because the key is the same and the value has changed, the value corresponding to the key should be the latest, i.e. the expired record, and it is Alice's responsibility to publish new records to correct it. As for the case where the signature of R2 is invalid, again one can argue that it is Alice's responsibility to publish correct IPNS records if Alice wants the records to be discoverable.

WDYT?


> On step 2: this is where I need your help to get my head around this; I suspect I am missing something fundamental about IPNS.

> In Naam, BobProvider can only advertise IPNS records with key /ipns/<bob's-key>, because Bob does not have Alice's key. What am I missing?

I thought I was explaining this above, but I'll try to give more examples to see if that helps.

Alice and Bob can each have hundreds of keys (e.g. Ed25519 key pairs) for a variety of purposes. Any of these keys can be used to create IPNS identifiers, such as /ipns/alices-favorite-book-key, /ipns/alices-least-favorite-movie-key, etc. These may/may not be the same as the alices-ipni-provider-node-key or alices-laptop-kubo-node-libp2p-key.

This means that this approach (i.e. one key per IPNI provider) is not sufficient, nor necessary https://github.com/ipni/go-naam/blob/9002ac91a0e3d606d81748df34d63e5f826e821a/naam.go#L142-L143.

Additionally, once Alice has signed a record R1 for /ipns/alices-favorite-book-key with her private key there's nothing stopping anyone else (e.g. Bob) from publishing R1 as well since that signed record is now public information.

> On step 4: thank you for pointing out the case where R2 could have expired or invalid. As the spec stands, such publication by Alice would make R1 no longer discoverable. I am curious why should R1 remain discoverable if Alice has published a new record for the same key that has expired?

Given the above let's revisit. Let's take the go-naam implementation as-is as an example. We can see that there is no server side code at all, only client side code. This means a few things:

  1. IPNS records are not validated server side, meaning they might be completely invalid (missing a signature, signed with the wrong key, etc.) not just expired
  2. Anyone can publish R1 including both Alice and Bob AND there is no way for IPNI to know whether Alice or Bob is the "canonical" author.
  3. 1 + 2 implies that only sending back a single value would be unsafe since there's no logic to decide which one to send back. This is fixed if there's server-side logic deciding which to send back (i.e. negating 1).

This means that, as is, https://github.com/ipni/go-naam/blob/9002ac91a0e3d606d81748df34d63e5f826e821a/indexer_client.go#L105-L110 seems to be a DoS vulnerability since a malicious Mallory could advertise /ipns/alice-favorite-book-key with a totally bogus and invalid value (i.e. not signed). This is because a) the server doesn't validate the publish, b) the server sends back all matches for a given multihash, and c) the client code chooses the first response, which might be Mallory's and would prevent resolving the IPNS address.

The solutions here are, as mentioned above, either:

  1. Have the client process all the results and choose the best valid one (the server will already send back multiple results as-is). go-ipns has tools for this.
  2. Have the server do processing and only store the best valid entry and return only that one

@masih does that make sense?

Member Author

It does, thank you for helping me understand. Would this address your concern? This is to make it more explicit that the identity used to sign and publish the advertisement must match the identity in the IPNS key.

This specification is written to require zero changes to the existing ad ingest pipeline in the hope of making it easier to integrate. Relaxing that condition slightly, it should be straightforward to also expand server-side validations.


I'm not sure. IIRC pr.Provider from here refers to the "provider" of a CID, not the index provider node, and there's no proven relationship the owner of AliceIndexProviderKey is the owner of AliceMHProviderKey (for example, I could delegate all my IPNI record publishes to a third party). IIUC the plan is to have some reputation to track good behavior here, but it's TBD and best-effort.

However, even if this did work it would limit utility by only allowing 1 key per index provider (e.g. really bad for infra providers, or just anyone with multiple mutable websites to manage under different keys). Given that your proposed change is client side, why not just use go-ipns to select the best record?

For example in go-naam do something roughly like:

```go
ipnsValidator := ipns.Validator{}
var records [][]byte
for _, pr := range mhr.ProviderResults {
  // Skip results that are not Naam advertisements.
  if !bytes.Equal(pr.ContextID, ContextID) { continue }
  // Strip the Naam metadata prefix to get the marshalled IPNS record.
  recBytes, err := getIPNSBytesFromMetadata(pr.Metadata)
  if err != nil { continue }
  // Validate the extracted record (not the raw metadata) against the IPNS key.
  if ipnsValidator.Validate(ipnsID, recBytes) != nil { continue }
  records = append(records, recBytes)
}
// Pick the best valid record, e.g. the one with the highest sequence number.
bestRecordIndex, err := ipnsValidator.Select(ipnsID, records)
if err != nil || bestRecordIndex < 0 { return nil, routing.ErrNotFound }
return records[bestRecordIndex], nil
```

Basically, because the validation is only happening client-side anyway why not do a little more CPU work (rather than relying on assumptions) in the validation logic in order to unlock a lot of functionality. If it becomes important later servers can add the validation onto their end.

Member Author
@masih masih Dec 13, 2022

> proven relationship the owner of AliceIndexProviderKey is the owner of AliceMHProviderKey

Advertisement signature would prove that which is verified at the time of ingest by indexer nodes. Right?

> only allowing 1 key per index provider

That's a reasonable concern. Though, I must point out, a Naam publish is way more lightweight than a regular index provider setup. The two are also not bound to each other: a user with multiple identities can publish different advertisement chains under different identities. The Naam ad chains can be published on their own exclusive chain.

> why not do a little more CPU work

go-naam already does the validation here. Is there something I have missed in validating the records?

Thank you for pointing out select, I missed it! I still think using select doesn't make sense because the indexer has to return at most one record for the same Provider ID + Context ID + Multihash. If it does not then there is a bug in the indexer implementation. When the result can only be a single one then select usage doesn't quite make sense.


> Advertisement signature would prove that which is verified at the time of ingest by indexer nodes. Right?

Ah ok thx. I knew you could publish for others, but I guess that depends on the policy. Not sure how that's determined, but if it's sufficiently trusted then that's probably ok.

> The Naam ad chains can be published on their own exclusive chain.

For sure, and maybe this depends on what the delegated advertising policies look like but it'd be annoying to have to have hundreds of libp2p hosts (and ports) for hundreds of naam-index-provider nodes. If it's as simple as making all the private keys available to a single libp2p index provider node that's probably ok (although note there might be some operational complexity around creating advertisements if people want to isolate their private keys to colder storage/hardware devices).

> I still think using select doesn't make sense because the indexer has to return at most one record for the same Provider ID + Context ID + Multihash.

Right, this is for enabling multiple return values. The only use case I can think of for this at the moment is to increase some resiliency of the IPNS record which is not currently allowed. For example:

  1. Alice decides to make a consistency vs availability tradeoff and always publish her records with EOLs of effectively infinity
  2. At some point Alice takes her node (and index provider) offline
  3. A few days later index providers garbage collect her records
  4. Bob tries to keep Alice's record alive by publishing it to IPNI, but can't which means /ipns/alice-key is effectively dead even though it doesn't have to be

This isn't necessarily the most important, although it's something that's being lost here. It could of course be added later as well if needed.

Member Author

> If it's as simple as making all the private keys available to a single libp2p index provider node that's probably ok

yup that should suffice.

> it's something that's being lost here.

Agreed; already added as a limitation. As is, if the provider side is offline for longer than a week, records are no longer resolvable. Server-side changes to treat the Naam context ID differently could be a way to rectify this.

Thank you so much for the great pointers.


## Limitations

### IPNS Record Size

The maximum size of IPNS records supported by Naam is 1 KiB, even though the IPNS specification
suggests a maximum limit of 10 KiB. This stems from the 1 KiB limit currently imposed by network
indexers on the maximum size of advertisement metadata.
Comment on lines +220 to +221


Is 1KiB the current network limit, but 100B is the recommended (you will have a good time) number? https://github.com/ipni/specs/blob/69a5f6bfdacf6050958a901324322c1e61b1611e/IPNI.md#advertisements

Member Author

The spec might be out of date, though I think 100B is a reasonable value for a typical ad. The indexer nodes currently have a hard limit of 1KiB.


Basic tests show that a 1KiB limit is sufficiently large for storing simple IPNS records pointing to


Some background info on the limits:

When trying to figure out a limit to use, here was some of the input information: ipfs/specs#319 (comment). The max record length found exceeded 1KiB.

At the moment IPNS records have bloat to them (e.g. some data is stored in the record twice, once as cbor and another in the protobuf) so there's room to evolve the spec to cut down on used space.

You may already have metrics on this for IPNI, but my suspicion is that the current problem is not long paths but people using really large identity cids. IIRC the only way you end up needing long paths is either with non-UnixFS IPLD queries (e.g. traversing a deep path inside a single block, a deep path within an ADL that captures the query, the types of complexity that can come with the IPLD URI scheme, etc.), or with nested IPNS/dnslink queries (e.g. pointing to /ipns/otherkey/foo/bar/baz/....). Neither of these are common at the moment, people abusing identity cids definitely does happen though.

Member Author

> people using really large identity cids

Ah yes IDENTITY CIDs strike again! :) My vote would be to stop the abuse of IDENTITY CIDs, and I would go as far as saying that NAAM should reject IDENTITY CIDs (and maybe IPFS too). I will add this as a limitation.

I am curious what is the actual usecase for putting IDENTITY CIDs inside IPNS records?


> Ah yes IDENTITY CIDs strike again!

Yes, along with their lack of max-size specification 😭.

> I am curious what is the actual usecase for putting IDENTITY CIDs inside IPNS records?

This one's actually pretty straightforward, especially compared to standard IPNI provider records for identity CIDs.

  1. Let's say Alice publishes her current favorite quote under her key with identifier /ipns/alicekey
  2. On day 1 Alice's favorite quote is "You only live once, but if you do it right, once is enough" which is 58 characters. Rather than using SHA2-512 on it (which is 64 bytes) she decides to encode it as an identity CID to save on bytes, and uses it as the value of the IPNS record (along with the raw codec)
  3. On day 2 Alice's favorite quote is "In the end, it's not the years in your life that count. It's the life in your years" which is 83 characters so Alice uses SHA2-512 (along with the raw codec) to get her CID, and updates her IPNS record to point at the CID of her new favorite quote

Basically, as long as there are any valid uses for identity CIDs, there are reasons to have them in IPNS records. Where things start getting out of hand is people pushing identity CIDs to be bigger and bigger to save themselves from doing network queries.

Contributor

Seems like forbidding identity CIDs in IPNS records would still be a reasonable restriction, or having the IPNS spec explicitly limit the length of identity CIDs to less than the length of the largest hash.

IPFS paths, especially for records that do not need the peer's public key to be embedded into IPNS
records. For example, an Ed25519 peer ID also captures the peer's public key. A marshalled IPNS
Comment on lines +224 to +225


This could get a little tricky for people trying to use say RSA 4096 keys. Depending on how much of a sticking point this is, there's possibly a couple ways around it. Notably, the RSA key is not covered by the signature so you could store as separate advertisements the identifier -> public key mapping and the IPNS record and then combine them when returning results.

Member Author

Thank you for pointing this out; I need to think about this one a bit. Public key resolution feels like something that's beyond the scope of this spec. Unless you are saying such keys make up the majority of IPNS users today and without it, the size limitations of the approach would make it much less useful to people?

Member Author
@masih masih Dec 9, 2022

Looks like RSA keys are very popular.

Maybe we can extend this to support CID -> public key lookup, where the NAAM IPNS record has the CID of public key which is resolvable to data just like a regular content CID via IPNI.


@lidel can explain how those numbers were collected. Note: that number showing only RSA PK records is a bit misleading. PK records shouldn't exist for Ed25519 since they're derivable from the identifiers, the question is what percentage of IPNS records use RSA keys (and of what size) vs ed25519, secp256k1, ...

I'd hope that by now most of the keys are ed25519, but I'm not sure about the distribution without some checking.

@lidel lidel Feb 2, 2023

(Disclaimer: someone executed some lookups on hydra's for me back then, and we ended up marking RSA as SHOULD in the spec so I did not dig deeper)

I remember these RSA vs ED25519 numbers had to be interpreted with additional steps for /ipns and /pk prefixes (the number for RSA included keys referred from peer records and not used for IPNS, Ed25519 missing because it's already inlined, etc.)

Either way, RSA keys were the old default so we should support them. Ed25519 is the new default, and we are nudging people to move away from RSA (ipfs/kubo#9507, ipfs/kubo#9506), so over time, there will be fewer of them.

record with such a key type and an empty value constructed
using the [`go-ipns`](https://github.com/ipfs/go-ipns) library comes to `257` bytes, which leaves
sufficient space for encoding fairly long IPFS paths as the record value.

Because of this size limitation, Naam records are also limited in their ability to store large IDENTITY CIDs as the record value.

### Record Validity Duration

The indexers will maintain information from inactive providers for up to a week. This means IPNS
records published by Naam with an EOL beyond a week should be re-published to remain resolvable.

## Security

### Advertisement Verifiability

Naam advertisements will benefit from all the verifiability features of regular advertisements. They
are signed by the IPNS record publisher and are verified for validity before being ingested by the
network indexers.

### Reputation

Similar to the IPNI indexing protocol, the use of an immutable advertisement chain by Naam provides a
verifiable and immutable catalogue of IPNS records exposed by a network participant. Access to such
historical data enables verifiable analysis of participant behaviour, and offers an opportunity to
design and develop reputation systems, which ultimately can contribute to the health of the network.
As far as I know, this is something that is not currently offered by the existing implementations of
IPNS.

Even though the specification of such a reputation system is beyond the scope of this document, it is
important to point out such potential, because as the size of the IPFS network grows, so will the
need for reputation systems that enable users to choose from the ever-increasing "choices" across
the network.

## Privacy

To be revisited when double-hashing work is finalised.
Contributor

See go-naam PR 13.

When reader-privacy is enabled (enabled by default), a double-hashed lookup is done when resolving a key to an IPNS record. The client wishing to resolve a key to an IPNS record creates a hash of the key and sends that to the indexer for lookup. The reader-privacy-enabled indexer returns encrypted metadata that is indexed by the hash of the key. The client then decrypts this metadata using the original key and extracts the IPNS record from the decrypted metadata.


## Implementations

* [`ipni/go-naam`](https://github.com/ipni/go-naam) - Golang implementation of the IPNI Naam protocol.

## Related Resources

* [IPNS: InterPlanetary Naming System](https://github.com/ipfs/specs/blob/main/ipns/IPNS.md)
* [IPNS PubSub Router](https://github.com/ipfs/specs/blob/main/ipns/IPNS_PUBSUB.md)
* [IPNI: InterPlanetary Network Indexer](IPNI.md)

## Copyright

Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).