Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add createdAt and createdBy #153

Closed
gmaclennan opened this issue Aug 9, 2023 · 4 comments · Fixed by #274
Closed

Add createdAt and createdBy #153

gmaclennan opened this issue Aug 9, 2023 · 4 comments · Fixed by #274
Assignees

Comments

@gmaclennan
Copy link
Member

Problem

Clients need to know which device a record was created by, because for some devices (currently mobile, could be based on role later) can only edit records that they created.

Clients need to know when a record was created in order to sort by created date.

createdAt and createdBy should not be editable.

Possible soultions

  1. Store createdAt and createdBy in every record

    This is the simplest to implement. Every time a document is updated then these two fields are duplicated to the new version. However there is nothing to stop a "rogue" device changing these fields. However this might be ok, if we are assuming that trusted users are only interacting with the database through the Mapeo Core API, which can enforce that these never change.

  2. Derive createdAt and createBy from "root" record

    The indexer could track back to the "root" record (e.g. the version with links: [] - the record from the create() action). createdBy can be derived from the hypercore key where this record is written. createdAt just needs to be trusted. This means a "rogue" client cannot change these fields by updating a document, but it adds more complexity to the indexer. If the "root" record is unavailable (e.g. if the original has not synced for whatever reason) then these fields would be unknown / undefined.

  3. Encode createdAt and createdBy to the docId

    Reading about snowflake IDs made me think of this. This has the advantage of (2) without the cost of indexing complexity and the problem of missing "root" records. (we still essentially implicitly trust that a docId is not changed by an update - but that would effectively create a new document anyway). We currently use 256-bit random numbers for our docId, which is overkill really. 72-bits of entropy with 1 million document IDs would have a 1 in 9.4 billion chance of collision, which seems like plenty - I think worth adding an extra byte over 64-bit, which would have a 1 in 37 million chance of collision with 1 million IDs.

    Some ways we might implement our IDs:

    • Encode a hash of the creator device public key as the device ID, so that the ID can only be used to find the creator device ID if they know the public key. Could hash with the random part of the ID, but would be expensive to derive the device ID for every record (because when reading would need to calculate a hash for each record). Device public keys are 256-bits, and the hash would be the same. Suggest just using the first 64-bits of the hashed device ID.
    • Encode 72-bits of random data
    • Encode a 48-bit timestamp as milliseconds since UNIX epoc (gives us until 10,889 AD)
    • Encode the random 72-bit part first (e.g. docID to be ${random72bits}${deviceID64bits}${timestamp48bits} so that when a user shares a doc ID externally by only sharing the first few bytes then we can lookup the ID quickly from SQLite index tables via searching for IDs that start with this value.
    • We don't need IDs to be sortable by creation date - can decode that to a separate column when indexing.

I'm leaning towards option (3), but option (1) might be ok, but we can't change it in the future, so maybe worth encoding this info to docIds anyway?

Related

@tomasciccola
Copy link
Contributor

As @achou11 said on slack, I would say I'm not qualified enough to have a strong opinion.It seems that option 3 is the most efficient/robust but I can't anticipate any shortcomings for it. In terms of implementation, adding those fields on the header seems pretty straightaway. Maybe there's some complexity in grabbing and validating those fields; but maybe if that's the case we can encode both fields in the header AND on the doc for know (so, basically implement (1) and (3)) since we can remove those fields on subsequent versions of the schema.
Regarding different encoding posibilities for 3, I don't have any opinion at a glance

@gmaclennan
Copy link
Member Author

Created a quick proof-of-concept for (3) this morning. Seems to make sense.

@gmaclennan gmaclennan changed the title Discussion: createdAt and createdBy Add createdAt and createdBy Aug 22, 2023
@gmaclennan gmaclennan added the post-mvp de-scoped to after MVP label Aug 22, 2023
@achou11 achou11 self-assigned this Aug 31, 2023
@achou11 achou11 removed the post-mvp de-scoped to after MVP label Aug 31, 2023
@gmaclennan
Copy link
Member Author

Revisiting this, I'm having second thoughts. Encoding this information to the docId feels too "clever" or "magic", and will either encode a truncated device identifier, or we have massive IDs.

I think (1) is nice and simple, and we can use (2) to verify. We can verify at index time, on every read, or within a "validate database" option. We can deal with verification post-MVP.

However I would make one change: instead of createdBy store originalVersionId (naming TBC), using the same format as versionId (encoded to protobuf as a version object with index and key). This allows us to quickly and easily look up the original record, and it contains the core key (or discovery key once we change it) of the hypercore where the original record was created.

TL;DR: Recommended plan is to add two fields to common properties for both the protobuf and json schema:

  1. createdAt - a timestamp (ISO string timestamp in JSON) that is duplicated across all record updates.
  2. originalVersionId - the discovery key and index of where the record was first created.

At a later stage we might want to consider parsing the versionId and originalVersionId at index time or read time to extract the creator/updator discoveryKey, from which we can look up the creator/updator deviceID (via the core ownership records).

@gmaclennan
Copy link
Member Author

Re-thinking and not sure about this again! originalVersionId could still be changed by a bad actor, so can't be used to lookup the record for validation, and adds a layer of obfuscation when trying to get the actual createdBy. Thinking maybe to stick to createdBy, which is the discoveryKey of the hypercore where the original record was created.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants