feat: create a document with a provided identifier #263
base: main
Title changed from "create a document with a stable identifier" to "create a document with a provided identifier"
@@ -31,6 +31,7 @@ export {
   isValidAutomergeUrl,
   parseAutomergeUrl,
   stringifyAutomergeUrl,
+  interpretAsDocumentId,
Exposed so that applications can do the same as this piece of code in the tests, and thus fetch such a document.
Hi @sroze, thanks for the PR. I appreciate you taking the time and including a test. I've had to turn down variations on this patch a few times, but I'd be happy to help you find some kind of solution that works for you. Let me explain.

In early versions of Automerge-Repo, we actually required the user to provide a document ID, but this led to serious problems where users would create documents without shared ancestry but with the same document IDs. In the most naive case, the ID would just be a string like "my-document", but the same problem would exist with any externally sourced UUID.

The issue is that Automerge needs shared object history to merge. If you import the same document on two different nodes, the two copies will have small differences (such as timestamps) that cause the hashes in the change graph not to match, and merges will result in full-document conflicts or just be rejected outright. For an analogy, imagine two people importing the same codebase into

As a result, I moved to a model where we treat document IDs as opaque, system-generated identifiers. My feeling is that storing an extra ~16 bytes per document (plus some key overhead, I suppose) is probably a good trade to avoid introducing corruption bugs in your synchronization system.

Deriving a document ID from a content hash at import might seem at first glance to improve the situation, but that would leave us in a position where everyone starting from the same document ID (who happens to share a sync path) would wind up merging all the changes for their documents. We could add a salt, I suppose, which would help... but I want to be very careful whatever we do here, and would want to think about both correctness/expectations and any potential security problems that could be introduced.

Anyway, sorry to be the bearer of bad news! One thing I have been considering is adding support for local-only "pet names" for documents. This would allow something like

It might also help to hear a little bit more about your integration story. Maybe there are other approaches we can take to solving the underlying problem.
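The shared-ancestry point above can be illustrated with a toy model of a hash-linked change graph. This is a sketch under loudly stated assumptions: the `Change` record, `hashChange`, `importContent`, and `shareAncestry` are hypothetical names, and the record is not Automerge's real (binary) change format. It only keeps the property the comment relies on: a change's hash covers its metadata (actor, timestamp, dependencies) as well as its operations, so two independent imports of identical content end up with different root hashes and nothing in common to merge from.

```typescript
import { createHash } from "node:crypto";

// Hypothetical, simplified change record (NOT Automerge's real format).
// The hash covers metadata as well as the operations themselves.
interface Change {
  actor: string;     // node that made the change
  timestamp: number; // wall-clock time recorded with the change
  deps: string[];    // hashes of the changes this one builds on
  ops: string[];     // the edits themselves
}

// Hash every field, so any difference in metadata yields a different hash.
function hashChange(change: Change): string {
  return createHash("sha256").update(JSON.stringify(change)).digest("hex");
}

// "Importing" content creates a root change that depends on nothing.
function importContent(actor: string, timestamp: number, ops: string[]): Change {
  return { actor, timestamp, deps: [], ops };
}

// In this model, two histories have a common ancestor if they share any change hash.
function shareAncestry(a: Change[], b: Change[]): boolean {
  const hashesA = new Set(a.map(hashChange));
  return b.some((change) => hashesA.has(hashChange(change)));
}

const content = ["set title = 'Quarterly report'"];

// Two nodes import identical content independently.
const nodeA: Change[] = [importContent("actor-a", 1700000000000, content)];
const nodeB: Change[] = [importContent("actor-b", 1700000000042, content)];

// A third history that was forked from nodeA's document instead.
const root = nodeA[0];
const forked: Change[] = [
  root,
  { actor: "actor-b", timestamp: 1700000000100, deps: [hashChange(root)], ops: ["set status = 'draft'"] },
];

console.log(shareAncestry(nodeA, nodeB));  // false: same content, different root hashes
console.log(shareAncestry(nodeA, forked)); // true: both contain the same root change
```

Reusing one document ID for both `nodeA` and `nodeB` would not change this picture in the model: the ID names the document, but only a shared change (as in `forked`) gives the histories something to merge from.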
Thank you so much for the detailed answer. The problem makes complete sense; I fully appreciate the challenge of resolving the (real) conflict that arises from merging two documents with the same identifier coming from different peers/histories. Technically, this will be an issue at some point even with system-generated identifiers (i.e. UUID collisions). Is there currently any mechanism in Automerge to report/handle impossible synchronisations (i.e. conflicts), or has the approach been to avoid them altogether, given the conflict-free nature of each document?
From Automerge's perspective, I tend to agree, given that it moves complexity away from the library and onto its users. To illustrate the case I have at hand today, in case it was unclear: I have a system storing its state in a traditional Postgres database, and I would like to use Automerge alongside it. In order to start storing things in Automerge, I need a document identifier. In other systems, I use a UUIDv5 of the object ID stored in Postgres, and that gives me my 'new' identifier without having to manage any state whatsoever for this integration. With Automerge, I'd have to 1) create an empty document and 2) store the document ID in Postgres. It's completely feasible, but it's more work for most users of the library (who, I assume, will always use Automerge alongside something else).
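The two-step workflow described above (create an empty document, then store its ID next to the existing row) amounts to a get-or-create lookup. The sketch below is an assumption-laden stand-in that runs without Automerge or Postgres: `DocumentIdStore`, `createEmptyDocument`, and `getOrCreateDocumentId` are hypothetical names, a `Map` stands in for the Postgres column, and `createEmptyDocument` only mints a random UUID. With automerge-repo, that step would instead go through the repo's document-creation call (for example `repo.create()`), whose returned handle carries the system-generated ID.

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical stand-in for the Postgres side: object ID -> document ID.
// In a real system this would be a column or a small mapping table.
type DocumentIdStore = Map<string, string>;

// Hypothetical stand-in for step 1, "create an empty document".
// Here we only mint an opaque random identifier so the sketch runs standalone.
function createEmptyDocument(): string {
  return randomUUID();
}

// Return the document ID for an object, creating and recording one if needed.
function getOrCreateDocumentId(store: DocumentIdStore, objectId: string): string {
  const existing = store.get(objectId);
  if (existing !== undefined) {
    return existing;
  }
  const documentId = createEmptyDocument(); // step 1: create an empty document
  store.set(objectId, documentId);          // step 2: store its ID next to the object
  return documentId;
}

const store: DocumentIdStore = new Map();
const first = getOrCreateDocumentId(store, "order-42");
const again = getOrCreateDocumentId(store, "order-42");
console.log(first === again); // true: the second call reuses the stored ID
```

Against a real database, two concurrent callers could both reach step 1. A unique constraint on the mapping plus `INSERT ... ON CONFLICT DO NOTHING` followed by a re-read would let them converge on a single stored ID, leaving the losing caller's empty document unused.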
@pvh, as described in this comment, there is a need for sync servers to reject proposed changes for authorisation reasons anyway. Couldn't we see this problem as part of the same category (i.e. sync servers might reject the creation of existing documents)?
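One way to read the suggestion above is as a server-side policy that treats "create a document whose ID already exists with a different history" like any other rejected request. The sketch below is hypothetical throughout: `CreateRequest`, `Decision`, `RootRegistry`, and `checkCreate` are not automerge-repo APIs, and it simplifies by assuming each document has exactly one root change whose hash identifies its history.

```typescript
// Hypothetical server-side policy; nothing here is an existing automerge-repo API.

interface CreateRequest {
  documentId: string;
  rootChangeHash: string; // simplification: one root change per document
}

type Decision = { accept: true } | { accept: false; reason: string };

// documentId -> root change hash the server first accepted for it.
type RootRegistry = Map<string, string>;

function checkCreate(registry: RootRegistry, req: CreateRequest): Decision {
  const knownRoot = registry.get(req.documentId);
  if (knownRoot === undefined) {
    registry.set(req.documentId, req.rootChangeHash); // first writer wins
    return { accept: true };
  }
  if (knownRoot === req.rootChangeHash) {
    return { accept: true }; // same history announced again, e.g. by another peer
  }
  // Same ID, unrelated history: reject instead of attempting a merge.
  return {
    accept: false,
    reason: `document ${req.documentId} already exists with a different history`,
  };
}

const registry: RootRegistry = new Map();
console.log(checkCreate(registry, { documentId: "doc-1", rootChangeHash: "aaa" }).accept); // true
console.log(checkCreate(registry, { documentId: "doc-1", rootChangeHash: "aaa" }).accept); // true
console.log(checkCreate(registry, { documentId: "doc-1", rootChangeHash: "bbb" }).accept); // false
```

First-writer-wins only relocates the problem: the rejected node still has a local document under that ID and needs some way to surface the failure, which connects back to the earlier question about reporting impossible synchronisations.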
Branch updated from e61f8e3 to d3d1a7d
This allows creating documents with a given identifier. In particular, when integrating other systems with Automerge, it is very useful to be able to create documents from predictable identifiers (e.g. UUID v5), so we don't need to store any 'reference' within existing systems.
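For reference, a predictable identifier of the kind mentioned above can be derived with an RFC 4122 version-5 (name-based, SHA-1) UUID: hash the 16 namespace bytes followed by the name bytes, keep the first 16 bytes, then set the version and variant bits. This is a standalone sketch using only Node's `crypto`; `uuidToBytes`, `uuidV5`, and `ORDERS_NAMESPACE` are hypothetical names, and the namespace value is an arbitrary example that an application would pick once and keep fixed.

```typescript
import { createHash } from "node:crypto";

// Parse a canonical UUID string ("xxxxxxxx-xxxx-...") into its 16 bytes.
function uuidToBytes(uuid: string): Buffer {
  const hex = uuid.replace(/-/g, "");
  if (!/^[0-9a-fA-F]{32}$/.test(hex)) {
    throw new Error(`invalid UUID: ${uuid}`);
  }
  return Buffer.from(hex, "hex");
}

// RFC 4122 name-based UUID, version 5: SHA-1(namespace bytes || name bytes).
function uuidV5(name: string, namespace: string): string {
  const hash = createHash("sha1")
    .update(uuidToBytes(namespace))
    .update(Buffer.from(name, "utf8"))
    .digest();
  const bytes = hash.subarray(0, 16);
  bytes[6] = (bytes[6] & 0x0f) | 0x50; // version 5 in the high nibble of byte 6
  bytes[8] = (bytes[8] & 0x3f) | 0x80; // RFC 4122 variant bits in byte 8
  const hex = bytes.toString("hex");
  return `${hex.slice(0, 8)}-${hex.slice(8, 12)}-${hex.slice(12, 16)}-${hex.slice(16, 20)}-${hex.slice(20)}`;
}

// Hypothetical application namespace; any fixed UUID chosen once per application works.
const ORDERS_NAMESPACE = "3f1c2a9e-8b7d-4c6e-9a5f-0d2e4b6a8c10";

// The same Postgres object ID always yields the same identifier.
const id = uuidV5("order:42", ORDERS_NAMESPACE);
console.log(id);
```

This only makes the identifier predictable. Turning such a UUID into an Automerge document ID is what the `interpretAsDocumentId` export in this PR is meant to support, and its exact signature is not shown here. The maintainer's concern also still applies: two nodes that each create a fresh document under the same derived ID will not share history.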