Add `createdAt` and `createdBy` #153

gmaclennan · 2023-08-09T11:16:01Z

Problem

Clients need to know which device created a record, because some devices (currently mobile; this could be based on role later) can only edit records that they created.

Clients need to know when a record was created in order to sort by created date.

createdAt and createdBy should not be editable.

Possible solutions

1. Store createdAt and createdBy in every record

This is the simplest to implement. Every time a document is updated, these two fields are copied to the new version. Nothing stops a "rogue" device from changing these fields, but this might be OK if we assume that trusted users only interact with the database through the Mapeo Core API, which can enforce that these fields never change.
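
A rough sketch of what "the API enforces that these never change" could look like. The `Doc` shape and the `createDoc` / `updateDoc` helpers are hypothetical names for illustration, not the actual Mapeo Core API; the point is only that the update path never accepts `createdAt` or `createdBy` from the caller.

```typescript
// Hypothetical document shape, for illustration only.
interface Doc {
  docId: string;
  versionId: string;
  links: string[]; // versionIds this version replaces; [] for the first version
  createdAt: string; // ISO timestamp, set once on create
  createdBy: string; // identifier of the creating device, set once on create
  updatedAt: string;
  value: Record<string, unknown>;
}

function createDoc(
  docId: string,
  versionId: string,
  deviceId: string,
  value: Record<string, unknown>,
  now: Date = new Date()
): Doc {
  const timestamp = now.toISOString();
  return {
    docId,
    versionId,
    links: [],
    createdAt: timestamp,
    createdBy: deviceId,
    updatedAt: timestamp,
    value,
  };
}

// The caller supplies only the new value. createdAt and createdBy are always
// copied from the previous version, so they cannot be changed through this API.
function updateDoc(
  prev: Doc,
  versionId: string,
  value: Record<string, unknown>,
  now: Date = new Date()
): Doc {
  return {
    docId: prev.docId,
    versionId,
    links: [prev.versionId],
    createdAt: prev.createdAt,
    createdBy: prev.createdBy,
    updatedAt: now.toISOString(),
    value,
  };
}
```

This does nothing against a device that writes to its hypercore without going through the API, which is the trade-off described above.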

2. Derive createdAt and createdBy from "root" record

The indexer could track back to the "root" record (e.g. the version with links: [] - the record from the create() action). createdBy can be derived from the hypercore key where this record is written. createdAt just needs to be trusted. This means a "rogue" client cannot change these fields by updating a document, but it adds more complexity to the indexer. If the "root" record is unavailable (e.g. if the original has not synced for whatever reason) then these fields would be unknown / undefined.
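
To make the indexer cost concrete, here is a minimal sketch of the walk back to the "root" version. The `Version` shape, the `coreKey` and `timestamp` field names, and the in-memory `Map` are assumptions for illustration; the real indexer would read from its own storage.

```typescript
// Hypothetical view of a stored version, as an indexer might see it.
interface Version {
  versionId: string;
  links: string[]; // versionIds of the versions this one replaces
  coreKey: string; // key of the hypercore this version was written to
  timestamp: string; // timestamp written by the authoring device
}

interface Created {
  createdAt: string;
  createdBy: string;
}

// Follow links from the given version until a version with links: [] is found.
// Returns undefined when no root is reachable, e.g. the original has not synced.
function deriveCreated(
  head: Version,
  store: Map<string, Version>
): Created | undefined {
  const stack: Version[] = [head];
  const seen = new Set<string>();
  while (stack.length > 0) {
    const current = stack.pop()!;
    if (seen.has(current.versionId)) continue; // already visited via another fork
    seen.add(current.versionId);
    if (current.links.length === 0) {
      // Root record: createdBy comes from the core it was written to,
      // createdAt is the root's own timestamp and just has to be trusted.
      return { createdAt: current.timestamp, createdBy: current.coreKey };
    }
    for (const id of current.links) {
      const parent = store.get(id);
      if (parent) stack.push(parent);
    }
  }
  return undefined;
}
```

The `undefined` branch is the "unknown" case mentioned above: clients would need to handle records whose creator cannot be determined yet.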

3. Encode createdAt and createdBy in the docId

Reading about snowflake IDs made me think of this. This has the advantage of (2) without the cost of indexing complexity and the problem of missing "root" records. (We still implicitly trust that a docId is not changed by an update, but changing it would effectively create a new document anyway.) We currently use 256-bit random numbers for our docId, which is overkill really. 72 bits of entropy with 1 million document IDs would have a 1 in 9.4 billion chance of collision, which seems like plenty. I think it is worth adding an extra byte over 64 bits, which would have a 1 in 37 million chance of collision with 1 million IDs.
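
For anyone wanting to check the collision numbers, they follow from the usual birthday approximation p ≈ n(n-1) / (2 · 2^bits). A small sketch (the function name is just for illustration):

```typescript
// Returns N such that the chance of at least one collision among `count`
// random IDs of `bits` bits is roughly 1 in N. Uses the birthday
// approximation, which is accurate while the probability is small.
function collisionOdds(bits: number, count: number): number {
  const probability = (count * (count - 1)) / (2 * Math.pow(2, bits));
  return 1 / probability;
}

const oneMillion = 1_000_000;
console.log(collisionOdds(72, oneMillion)); // roughly 9.4 billion
console.log(collisionOdds(64, oneMillion)); // roughly 37 million
```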

Some ways we might implement our IDs:
- Encode a hash of the creator device public key as the device ID, so that someone can only identify the creator device from the ID if they already know its public key. We could hash the key together with the random part of the ID, but deriving the device ID would then be expensive (on read we would need to calculate a hash for each record). Device public keys are 256 bits, and the hash would be the same size. Suggest using just the first 64 bits of the hashed device ID.
- Encode 72 bits of random data
- Encode a 48-bit timestamp as milliseconds since the Unix epoch (gives us until 10,889 AD)
- Put the random 72-bit part first (e.g. docId as ${random72bits}${deviceID64bits}${timestamp48bits}) so that when a user shares a doc ID externally by sharing only the first few bytes, we can look up the ID quickly from SQLite index tables by searching for IDs that start with this value.
- We don't need IDs to be sortable by creation date - can decode that to a separate column when indexing.
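
A minimal sketch of the layout described in the list above, using Node's built-in `crypto` and `Buffer`. SHA-256 and hex encoding are assumptions made for this example (the list only says "a hash" and does not pick a string encoding), and `encodeDocId` / `decodeDocId` are hypothetical names.

```typescript
import { createHash, randomBytes } from "node:crypto";

const RANDOM_BYTES = 9; // 72 bits of random data
const DEVICE_BYTES = 8; // first 64 bits of the hashed device public key
const TIME_BYTES = 6; // 48-bit timestamp, milliseconds since the Unix epoch
const ID_BYTES = RANDOM_BYTES + DEVICE_BYTES + TIME_BYTES; // 23 bytes total

// Layout: random part first, then device hash, then timestamp.
function encodeDocId(devicePublicKey: Buffer, createdAtMs: number = Date.now()): string {
  const random = randomBytes(RANDOM_BYTES);
  // Assumption: SHA-256 stands in for whichever hash is actually chosen.
  const deviceHash = createHash("sha256")
    .update(devicePublicKey)
    .digest()
    .subarray(0, DEVICE_BYTES);
  const time = Buffer.alloc(TIME_BYTES);
  time.writeUIntBE(createdAtMs, 0, TIME_BYTES); // big-endian, up to 48 bits
  return Buffer.concat([random, deviceHash, time]).toString("hex");
}

function decodeDocId(docId: string): { random: string; deviceHash: string; createdAtMs: number } {
  const buf = Buffer.from(docId, "hex");
  if (buf.length !== ID_BYTES) {
    throw new Error(`Expected ${ID_BYTES} bytes, got ${buf.length}`);
  }
  return {
    random: buf.subarray(0, RANDOM_BYTES).toString("hex"),
    deviceHash: buf.subarray(RANDOM_BYTES, RANDOM_BYTES + DEVICE_BYTES).toString("hex"),
    createdAtMs: buf.readUIntBE(RANDOM_BYTES + DEVICE_BYTES, TIME_BYTES),
  };
}
```

With hex encoding this gives 46-character IDs. Because the random part comes first, a shared prefix of an ID is still mostly random and works for prefix lookups, and the timestamp can be decoded into its own column at index time.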

I'm leaning towards option (3). Option (1) might be OK, but we can't change it in the future, so maybe it is worth encoding this info in docIds anyway?

tomasciccola · 2023-08-10

As @achou11 said on Slack, I would say I'm not qualified enough to have a strong opinion. It seems that option 3 is the most efficient/robust, but I can't anticipate any shortcomings for it. In terms of implementation, adding those fields on the header seems pretty straightforward. Maybe there's some complexity in grabbing and validating those fields; if that's the case, we can encode both fields in the header AND on the doc for now (so, basically implement (1) and (3)), since we can remove those fields in subsequent versions of the schema.
Regarding the different encoding possibilities for 3, I don't have an opinion at a glance.

gmaclennan · 2023-08-11T09:51:19Z

Created a quick proof-of-concept for (3) this morning. Seems to make sense.

gmaclennan · 2023-09-13T07:11:39Z

Revisiting this, I'm having second thoughts. Encoding this information in the docId feels too "clever" or "magic", and it would mean either encoding a truncated device identifier or having massive IDs.

I think (1) is nice and simple, and we can use (2) to verify. We can verify at index time, on every read, or within a "validate database" option. We can deal with verification post-MVP.

However, I would make one change: instead of createdBy, store originalVersionId (naming TBC), using the same format as versionId (encoded to protobuf as a version object with index and key). This allows us to quickly and easily look up the original record, and it contains the core key (or discovery key once we change it) of the hypercore where the original record was created.

TL;DR: Recommended plan is to add two fields to common properties for both the protobuf and json schema:

- createdAt: a timestamp (ISO string timestamp in JSON) that is duplicated across all record updates.
- originalVersionId: the discovery key and index of where the record was first created.

At a later stage we might want to consider parsing the versionId and originalVersionId at index time or read time to extract the creator/updater discoveryKey, from which we can look up the creator/updater deviceID (via the core ownership records).
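
A sketch of that later-stage lookup, under two assumptions that are not settled in this thread: that the string form of a versionId is the hex discovery key and the index joined by "/", and that core ownership records can be viewed as a map from discovery key to deviceID. Both function names are hypothetical.

```typescript
interface VersionIdParts {
  coreDiscoveryKey: string;
  index: number;
}

// Assumed string form: "<hex discovery key>/<index>".
function parseVersionId(versionId: string): VersionIdParts {
  const separator = versionId.lastIndexOf("/");
  if (separator <= 0) {
    throw new Error(`Invalid versionId: ${versionId}`);
  }
  const coreDiscoveryKey = versionId.slice(0, separator);
  const indexText = versionId.slice(separator + 1);
  if (!/^\d+$/.test(indexText)) {
    throw new Error(`Invalid index in versionId: ${versionId}`);
  }
  return { coreDiscoveryKey, index: Number(indexText) };
}

// coreOwnership is a stand-in for the core ownership records:
// discovery key (hex) -> deviceID.
function lookupDeviceId(
  versionId: string,
  coreOwnership: Map<string, string>
): string | undefined {
  return coreOwnership.get(parseVersionId(versionId).coreDiscoveryKey);
}
```

The same function would serve for both createdBy (from originalVersionId) and updatedBy (from versionId).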

gmaclennan · 2023-09-13T12:19:45Z

Re-thinking, and not sure about this again! originalVersionId could still be changed by a bad actor, so it can't be used to look up the record for validation, and it adds a layer of obfuscation when trying to get the actual createdBy. Thinking maybe we should stick to createdBy, which is the discoveryKey of the hypercore where the original record was created.

tomasciccola mentioned this issue Aug 17, 2023

Add tests for encoding/decoding header on mapeo docs digidem/comapeo-schema#123

Closed

gmaclennan changed the title ~~Discussion: createdAt and createdBy~~ Add createdAt and createdBy Aug 22, 2023

gmaclennan added the post-mvp de-scoped to after MVP label Aug 22, 2023

This was referenced Aug 22, 2023

Index creatorCoreId digidem/mapeo-sqlite-indexer#7

Closed

Derive createdBy and updatedBy fields for data returned to client #136

Open

achou11 self-assigned this Aug 31, 2023

achou11 removed the post-mvp de-scoped to after MVP label Aug 31, 2023

achou11 mentioned this issue Sep 12, 2023

feat: add createdBy field for all records digidem/comapeo-schema#142

Merged

achou11 assigned tomasciccola and achou11 and unassigned achou11 Sep 18, 2023

tomasciccola mentioned this issue Sep 19, 2023

feat: addCreatedBy #274

Merged

tomasciccola closed this as completed in #274 Sep 21, 2023

gmaclennan mentioned this issue Nov 10, 2023

Change createdBy to originalVersionId #371

Closed
