feat: `create` a document with a provided identifier #263

sroze · 2023-12-29T11:53:40Z

This allows creating documents with a given identifier. In particular, when integrating other systems with Automerge, it is very useful to be able to create documents from predictable identifiers (e.g. UUID v5) so we don't need to store any 'reference' within existing systems.

sroze · 2023-12-29T11:54:41Z

packages/automerge-repo/src/index.ts

@@ -31,6 +31,7 @@ export {
  isValidAutomergeUrl,
  parseAutomergeUrl,
  stringifyAutomergeUrl,
+  interpretAsDocumentId,


Exported so that applications can do the same as this piece of code in the tests and fetch such a document.

pvh · 2023-12-29T17:38:36Z

Hi @sroze, thanks for the PR. I appreciate you taking the time and including a test. I've had to turn down variations on this patch a few times but I'd be happy to help you find some kind of a solution that works for you. Let me explain.

In early versions of Automerge-Repo, we actually required the user to provide a document ID, but this led to serious problems where users would create documents without shared ancestry but with the same document IDs. In the most naive case, the ID would just be a string like "my-document", but the same problem would exist with any externally sourced UUID.

The issue is that Automerge needs shared object history to merge. If you import the same document on two different nodes, it will have small differences (such as time stamps) that will result in the hashes in the change graph not matching and merges will result in full document conflicts or just be rejected outright. For an analogy, imagine two people importing the same codebase into git, or pasting the same text into a Google Doc. The files don't have a shared history even if they have very similar contents.
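The hash mismatch described above can be illustrated with a toy model. This is only a sketch: `ToyChange` and `hashChange` are hypothetical names, and the JSON-plus-SHA-256 hashing is an assumption made for illustration, not Automerge's actual change format or hash function.

```typescript
import { createHash } from "node:crypto";

// Hypothetical, simplified stand-in for one entry in a change graph.
// Real Automerge changes have more fields and a binary encoding.
interface ToyChange {
  actor: string; // id of the node that made the change
  timestamp: number; // wall-clock time of the change
  deps: string[]; // hashes of the changes this one builds on
  payload: string; // the imported content
}

// Hash every field, so any difference in metadata changes the hash.
function hashChange(change: ToyChange): string {
  return createHash("sha256").update(JSON.stringify(change)).digest("hex");
}

// Two nodes import identical content independently of each other.
const onNodeA: ToyChange = { actor: "node-a", timestamp: 1703850000000, deps: [], payload: "hello" };
const onNodeB: ToyChange = { actor: "node-b", timestamp: 1703850000042, deps: [], payload: "hello" };

const hashA = hashChange(onNodeA);
const hashB = hashChange(onNodeB);

// Same payload, different root hashes: the two histories share no ancestor.
const sharedHistory = hashA === hashB;
console.log({ hashA, hashB, sharedHistory });
```

In this model the payloads are identical, yet the root hashes differ because the actor and timestamp differ, which is the same situation as the two independent git imports in the analogy.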

As a result, I moved to a model where we treat the documentIds as opaque system-generated identifiers. My feeling is that storing an extra ~16b per document (plus some key overhead, I suppose) is probably a good trade to avoid introducing corruption bugs in your synchronization system.

Deriving a document ID from a content hash at import might seem at first glance to improve the situation, but that would leave us in a position where everyone starting from the same documentID (who happens to share a sync-path) would wind up merging all the changes for their documents. We could add a salt, I suppose, which would help... but I want to be very careful whatever we do here and would want to think about both correctness/expectations and any potential security problems that could be introduced.
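A minimal sketch of the content-hash idea and the salt variation, under assumptions: `idFromContent` is a hypothetical helper, and truncating a SHA-256 digest to 32 hex characters is an arbitrary choice for the example, not how Automerge-Repo generates document IDs.

```typescript
import { createHash, randomBytes } from "node:crypto";

// Hypothetical helper: derive an identifier from imported content,
// optionally mixed with a salt.
function idFromContent(content: string, salt: Buffer = Buffer.alloc(0)): string {
  return createHash("sha256")
    .update(salt)
    .update(content, "utf8")
    .digest("hex")
    .slice(0, 32);
}

const template = "# Untitled document";

// Unsalted: every user importing the same template gets the same ID,
// so their unrelated documents would be treated as one.
const aliceUnsalted = idFromContent(template);
const bobUnsalted = idFromContent(template);

// Salted: a random salt per import separates the IDs again.
const aliceSalted = idFromContent(template, randomBytes(16));
const bobSalted = idFromContent(template, randomBytes(16));

console.log(aliceUnsalted === bobUnsalted, aliceSalted === bobSalted);
```

Note that a random salt has to be stored somewhere to re-derive the ID later, so in this sketch the salted ID is no longer predictable from the content alone.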

Anyway, sorry to be the bearer of bad news! One thing I have been considering is adding support for local-only "pet names" for documents. This would allow something like repo.openMy("rootDocument") (likely not with this name). This may not solve your problem completely but would it help?

It might also help to hear a little bit more about your integration story. Maybe there are other approaches we can take to solving the underlying problem.

sroze · 2023-12-29T18:18:10Z

Thank you so much for the detailed answer. The problem completely makes sense; I fully appreciate the challenge of resolving the (real) conflict that comes from merging two documents with the same identifier but different peers/histories. Technically, this will even be an issue at some point with system-generated identifiers (i.e. UUID collisions). Is there currently any mechanism in Automerge to report or handle impossible synchronisations (i.e. conflicts), or has that been deliberately avoided given the conflict-free nature of each document?

> My feeling is that storing an extra ~16b per document (plus some key overhead, I suppose) is probably a good trade to avoid introducing corruption bugs in your synchronization system.

From Automerge's perspective, I tend to agree, given that it moves complexity away from the library and onto its users. To illustrate the example I have at hand today, in case it was unclear: I have a system storing its state in a traditional Postgres database and I would like to use Automerge alongside it. In order to start storing things in Automerge, I need a document identifier. In other systems, I use a UUIDv5 of the object ID stored in Postgres, and I have my 'new' identifier, without having to manage any state whatsoever regarding this integration. With Automerge, I'd have to 1) create an empty document and 2) store the document ID in Postgres. It's completely feasible but more work for most users of the library (who I assume will always use Automerge alongside something else).
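The two integration styles contrasted above can be sketched side by side. Everything here is illustrative: `uuidv5` is a hand-rolled RFC 4122 name-based UUID (SHA-1), the namespace is a placeholder, the `Map` stands in for a Postgres column, and `createOpaqueDocumentId` is a hypothetical stand-in for creating an Automerge document; none of it is Automerge-Repo API.

```typescript
import { createHash, randomUUID } from "node:crypto";

// Placeholder namespace (the RFC 4122 DNS namespace); a real integration
// would pick its own fixed namespace UUID.
const DNS_NAMESPACE = "6ba7b810-9dad-11d1-80b4-00c04fd430c8";

// Name-based UUID, version 5: SHA-1 over namespace bytes followed by the name.
function uuidv5(name: string, namespace: string): string {
  const nsBytes = Buffer.from(namespace.replace(/-/g, ""), "hex");
  const digest = createHash("sha1").update(nsBytes).update(name, "utf8").digest();
  const bytes = digest.subarray(0, 16);
  bytes.writeUInt8((bytes.readUInt8(6) & 0x0f) | 0x50, 6); // set version 5
  bytes.writeUInt8((bytes.readUInt8(8) & 0x3f) | 0x80, 8); // set RFC 4122 variant
  const hex = bytes.toString("hex");
  return [
    hex.slice(0, 8),
    hex.slice(8, 12),
    hex.slice(12, 16),
    hex.slice(16, 20),
    hex.slice(20, 32),
  ].join("-");
}

// Style 1 (stateless): derive the identifier from the Postgres object ID.
function derivedIdFor(objectId: string): string {
  return uuidv5(objectId, DNS_NAMESPACE);
}

// Style 2 (stateful): let the system generate an opaque ID and store the mapping.
const documentIdByObjectId = new Map<string, string>(); // stand-in for a Postgres column

// Hypothetical stand-in for creating an empty document and reading back its ID.
function createOpaqueDocumentId(): string {
  return randomUUID();
}

function storedIdFor(objectId: string): string {
  let documentId = documentIdByObjectId.get(objectId);
  if (documentId === undefined) {
    documentId = createOpaqueDocumentId();
    documentIdByObjectId.set(objectId, documentId);
  }
  return documentId;
}

console.log(derivedIdFor("order-42"), storedIdFor("order-42"));
```

The derived ID needs no storage but is the same for anyone who knows the object ID and namespace; the stored ID requires one extra column but stays opaque and system-generated.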

sroze · 2023-12-30T09:44:22Z

@pvh as described in this comment, there is a need for sync servers to reject proposed changes for authorisation reasons anyway. Couldn't we see this problem as part of the same category (i.e. sync servers might reject the creation of existing documents)?

feat: create a document with a stable identifier

a06d926

sroze changed the title ~~feat: create a document with a stable identifier~~ feat: create a document with a provided identifier Dec 29, 2023

pvh marked this pull request as draft January 24, 2024 00:50

pvh force-pushed the main branch 2 times, most recently from e61f8e3 to d3d1a7d Compare July 26, 2024 20:13
