From de59ba7383c7f206d2ac3324343fb7a37652f47f Mon Sep 17 00:00:00 2001 From: Henry Tsai Date: Tue, 6 Jun 2023 10:50:38 -0700 Subject: [PATCH 01/12] TP25 --- tp025/README.md | 129 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 129 insertions(+) create mode 100644 tp025/README.md diff --git a/tp025/README.md b/tp025/README.md new file mode 100644 index 0000000..84e046b --- /dev/null +++ b/tp025/README.md @@ -0,0 +1,129 @@ +# TP25 Snapshots for DWeb Nodes + +```yaml +TP: 25 +Title: Snapshots for DWeb Nodes +Author(s): Henry Tsai (@thehenrytsai) +Comments URI: https://github.com/TBD54566975/technical-proposals/discussions/ +Status: Draft +Created: June 5, 2023 +Updated: June 5, 2023 +``` + +## Problem Statement + +Currently, if an entity is granted write access to a DWeb Node but later has that access revoked, the entity can still write to the DWeb Node by manipulating the timestamp of messages to be earlier than the revocation time. + + +## Proposal + +Implement a mechanism that allows the owner to capture a 'snapshot' of the state of all records within a specific scope at a particular time. This snapshot would serve to discard any messages with timestamps earlier than the snapshot but not included in it. + +The snapshot will be created be through `SnapshotsCreate` method, an example of the message: + +```json +{ + "descriptor": { + "interface": "Snapshots", + "method": "Create", + "dateCreated": "2023-06-05T11:22:33.445566Z", + "scope": "protocols//", + "dataCid": "", + "", + "", + ... 
+] +``` + +### Scoping + +In order to optimize the efficiency of snapshot creation and message authorization against snapshots, a snapshot `scope` property can only have a value that maps to a position in the logical tree structure below: + +```mermaid + graph TD; + global[global - empty string]-->protocol[`protocol`]; + protocol-->path[`protocolPath`] + schema-->schema-data-format[`dataFormat`] + global-->schema[`schema` - protocol-less]; + global-->data-format[`dataFormat` - schema-less]; + +``` + +1. `"scope": '' | undefined` + + This means the snapshot is taken at a global scope. ie. any message not included in the snapshot is deleted. + +1. `"scope": 'protocols/'` + + All messages under a particular protocol. + +1. `"scope": 'protocols//'` + + All messages under a particular protocol path under a protocol. + +1. `"scope": 'schemas/'` + + All non-protocol messages under a particular schema. + +1. `"scope": 'schemas//data-formats/'` + + All non-protocol-based messages under a particular schema and data-format. + Unsure of its practical use, this is mainly for illustration purpose. + +1. `"scope": 'data=formats/'` + + All schema-less messages under a particular data format. + +The intent of the prescribed scoping structure is to minimize the possible permutation of scopes a message can appear in by enforcing a message to appear in only one the hierarchy branch; if we allow a more free-formed scoping syntax (based on filters for example), we'd need to have a more complex include-list computation logic and/or iterate over many snapshots for: + 1. evaluating if a message under the scope a newly snapshot needs to be kept or removed; and + 1. authorizing a message. + + +It's important to realize that its not possible to create a single 'overall' include-list of message CIDs for authorization purposes, because the appropriate include-list varies based on the timestamp of the received message. As a result, we the include-list of message CIDs much be dynamically compose. 
+ +### Scope Processing + +General rules: + +1. A newer snapshot erases all older snapshots with the same or a descendent scope. (e.g. a newer snapshot with "protocol" scope overwrites all older snapshots with any "protocol path" scope under the same protocol) + +1. A newer snapshot invalidates and overwrites messages that are under the same (sub)scope included in any older snapshot with an ancestral scope. (e.g. a newer snapshot with a "protocol path" scope invalidates and overwrites all messages that are under the same "protocol path" included in a parent "protocol" snapshot) + +## Additional Considerations + +1. Since this is a highly privileged operation, this TP suggests initially limiting it to the DID owner and potentially extending access to other actors later. + +1. Sync will likely need to have awareness of Snapshots messages. Consider a scenario where a DWN receives a snapshot containing messages it does not (no longer) have: + + ```mermaid + sequenceDiagram + Alice->>DWN1: Snapshot1(ScopeX, CID1) + Alice->>DWN2: Snapshot2(ScopeX, CID1, CID2) + DWN1->>DWN2: Sync(Snapshot1) + DWN2->>DWN2: Discard Snapshot1 since it is older than Snapshot2 + DWN2->>DWN1: Sync(Snapshot2) + DWN1->>DWN1: Discovers message CID2 is needed but does not exist + Note right of DWN1: Possible remedy: + DWN1->>DWN2: Fetch(CID2) + DWN2-->>DWN1: Message of CID2 + ``` + +1. It is apparent that snapshot scoping turns out to be quite "tailor-made" towards permission and protocol, so maybe it is really not practical to have a pure general purpose snapshot feature beyond the first 2 levels of scoping hierarchy. + +1. The currently proposed structure falls short if there is a need to snapshot a specific protocol context (it's likely there are additional unsupported scenarios). We could introduce support for it under the "protocol" subtree, but that would violate the current design goal of "no intersecting message set in branches". + +1. 
It is be extremely desirable if not necessary for the scoping scheme used in snapshot be compatible to permission scoping and protocol hierarchy, such that messages in a snapshots created can roughly match the scope of permission or path of a protocol feature toggle. It does not make sense functionally and dangerous even for example to allow a snapshot of both protocol-authorized and non-protocol-authorized Records messages with schema `x`. + + If we allow intersecting scopes, it means we'd need to know the snapshot scopes each messages belongs in at any given time, so that we'd have necessary information when evaluating if a message should be kept or not due to a new snapshot being created. Furthermore, when authorizing a message against the snapshots, we'd need to iterate overall many/all snapshots. + +1. It doesn't seem logical to permit the deletion of a snapshot once it's created for authorization purposes. If the snapshot is deleted, the DWeb Node will no longer be able to utilize the deleted snapshot to prevent unauthorized access. From 53dcb19965fa4f7cd5732c11831f60eec53ef0aa Mon Sep 17 00:00:00 2001 From: Henry Tsai Date: Tue, 6 Jun 2023 10:52:54 -0700 Subject: [PATCH 02/12] fixed typo --- tp025/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tp025/README.md b/tp025/README.md index 84e046b..adb52c6 100644 --- a/tp025/README.md +++ b/tp025/README.md @@ -89,7 +89,7 @@ The intent of the prescribed scoping structure is to minimize the possible permu 1. authorizing a message. -It's important to realize that its not possible to create a single 'overall' include-list of message CIDs for authorization purposes, because the appropriate include-list varies based on the timestamp of the received message. As a result, we the include-list of message CIDs much be dynamically compose. 
+It's important to realize that its not possible to create a single 'overall' include-list of message CIDs for authorization purposes, because the appropriate include-list varies based on the timestamp of the received message. As a result, we the include-list of message CIDs much be dynamically composed. ### Scope Processing From c571e6bebad49a9aff697b9559a5425b7fd86e91 Mon Sep 17 00:00:00 2001 From: Henry Tsai Date: Tue, 6 Jun 2023 10:56:55 -0700 Subject: [PATCH 03/12] typos --- tp025/README.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tp025/README.md b/tp025/README.md index adb52c6..2b29218 100644 --- a/tp025/README.md +++ b/tp025/README.md @@ -27,9 +27,9 @@ The snapshot will be created be through `SnapshotsCreate` method, an example of "interface": "Snapshots", "method": "Create", "dateCreated": "2023-06-05T11:22:33.445566Z", - "scope": "protocols//", + "scope": "protocols//", "dataCid": " Date: Tue, 6 Jun 2023 12:25:01 -0700 Subject: [PATCH 04/12] edits --- tp025/README.md | 21 +++++++++++++++++---- 1 file changed, 17 insertions(+), 4 deletions(-) diff --git a/tp025/README.md b/tp025/README.md index 2b29218..e662327 100644 --- a/tp025/README.md +++ b/tp025/README.md @@ -84,12 +84,14 @@ In order to optimize the efficiency of snapshot creation and message authorizati All schema-less messages under a particular data format. -The intent of the prescribed scoping structure is to minimize the possible permutation of scopes a message can appear in by enforcing a message to appear in only one the hierarchy branch; if we allow a more free-formed scoping syntax (based on filters for example), we'd need to have a more complex include-list computation logic and/or iterate over many snapshots for: +We only need to keep newest snapshot of any given scope. + +The intent of the prescribed scoping structure is to minimize the possible permutation of scopes a message can appear in by enforcing a message to appear in only one the hierarchy branch. 
if we allow a more free-formed scoping syntax (based on filters for example), we'd need to have a more complex include-list computation logic and/or iterate over many snapshots for: 1. evaluating if a message under the scope a newly snapshot needs to be kept or removed; and 1. authorizing a message. -It's important to realize that its not possible to create a single 'overall' include-list of message CIDs for authorization purposes, because the appropriate include-list varies based on the timestamp of the received message. As a result, we the include-list of message CIDs much be dynamically composed. +Note that there may not be a single 'overall' include-list of message CIDs for authorization purposes, because snapshots can be taken with scopes that have no intersection. (e.g. snapshot A with protocol X scope and snapshot B with schema-less data format Y scope) ### Scope Processing @@ -99,6 +101,17 @@ General rules: 1. A newer snapshot invalidates and overwrites messages that are under the same (sub)scope included in any older snapshot with an ancestral scope. (e.g. a newer snapshot with a "protocol path" scope invalidates and overwrites all messages that are under the same "protocol path" included in a parent "protocol" snapshot) +Pseudo-code for `SnapshotsCreate` handling: +```typescript + +// figure out if this snapshot should be ignored or processed +// TODO: + +for (const cid of cidsInSnapshot) { + // TODO: +} +``` + ## Additional Considerations 1. Since this is a highly privileged operation, this TP suggests initially limiting it to the DID owner and potentially extending access to other actors later. @@ -122,8 +135,8 @@ General rules: 1. The currently proposed structure falls short if there is a need to snapshot a specific protocol context (it's likely there are additional unsupported scenarios). We could introduce support for it under the "protocol" subtree, but that would violate the current design goal of "no intersecting message set in branches". -1. 
It is be extremely desirable if not necessary for the scoping scheme used in snapshot be compatible to permission scoping and protocol hierarchy, such that messages in a snapshots created can roughly match the scope of permission or path of a protocol feature toggle. It does not make sense functionally and dangerous even for example to allow a snapshot of both protocol-authorized and non-protocol-authorized Records messages with schema `x`. +1. It is be extremely desirable if not necessary for the scoping scheme used in snapshot be compatible to permission scoping and protocol hierarchy, such that messages in a snapshots created can roughly match the scope of permission or path of a protocol feature toggle. - If we allow intersecting scopes, it means we'd need to know the snapshot scopes each messages belongs in at any given time, so that we'd have necessary information when evaluating if a message should be kept or not due to a new snapshot being created. Furthermore, when authorizing a message against the snapshots, we'd need to iterate overall many/all snapshots. +1. It does not make sense functionally and dangerous even to allow scopes that cuts across both protocol-authorized messages and protocol-less message. 1. It doesn't seem logical to permit the deletion of a snapshot once it's created for authorization purposes. If the snapshot is deleted, the DWeb Node will no longer be able to utilize the deleted snapshot to prevent unauthorized access. From a8191b912ff4b2f8306c0a9d692732b3c770a780 Mon Sep 17 00:00:00 2001 From: Henry Tsai Date: Tue, 6 Jun 2023 12:25:44 -0700 Subject: [PATCH 05/12] typo --- tp025/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tp025/README.md b/tp025/README.md index e662327..4ad242d 100644 --- a/tp025/README.md +++ b/tp025/README.md @@ -80,7 +80,7 @@ In order to optimize the efficiency of snapshot creation and message authorizati All non-protocol-based messages under a particular schema and data-format. 
Unsure of its practical use, this is mainly for illustration purpose. -1. `"scope": 'data=formats/'` +1. `"scope": 'data-formats/'` All schema-less messages under a particular data format. From dbacf6ce1363292260650b9a446d080f64110893 Mon Sep 17 00:00:00 2001 From: Henry Tsai Date: Tue, 6 Jun 2023 15:13:25 -0700 Subject: [PATCH 06/12] updated TP --- tp025/README.md | 47 +++++++++++++++++++++++++++++++++++++++++++---- 1 file changed, 43 insertions(+), 4 deletions(-) diff --git a/tp025/README.md b/tp025/README.md index 4ad242d..427789c 100644 --- a/tp025/README.md +++ b/tp025/README.md @@ -99,17 +99,52 @@ General rules: 1. A newer snapshot erases all older snapshots with the same or a descendent scope. (e.g. a newer snapshot with "protocol" scope overwrites all older snapshots with any "protocol path" scope under the same protocol) -1. A newer snapshot invalidates and overwrites messages that are under the same (sub)scope included in any older snapshot with an ancestral scope. (e.g. a newer snapshot with a "protocol path" scope invalidates and overwrites all messages that are under the same "protocol path" included in a parent "protocol" snapshot) +1. A newer descendent snapshot overwrites inclusion of messages that falls under its (sub)scope in the parent snapshot. (e.g. 
a newer descendent snapshot with a `protocolX/pathSegment1/pathSegment2` scope overwrites inclusion of messages of a parent snapshot with scope `protocolX`) Pseudo-code for `SnapshotsCreate` handling: ```typescript // figure out if this snapshot should be ignored or processed -// TODO: +const newerSnapshots = getNewerSnapshots(incomingSnapshot.timestamp); +for (const newerSnapshot of newerSnapshots) { + if (newerSnapshot.scope.isSuperSetOf(incomingSnapshot.scope)) { + return; // no need to process this snapshot + } +} -for (const cid of cidsInSnapshot) { - // TODO: +// delete all CIDs that fall under the scope of the incoming snapshot so we can repopulate that subsection correctly +// an alternate strategy is to iterate over all CIDs under the scope and remove as needed, but that seems less efficient +deleteAllCidsUnderScope(incomingSnapshot.scope); + +// const inclusionList = new Map; // CID -> scope map + +// computes the complete inclusion list at the scope of the given snapshot +function computeInclusionList(currentSnapshot) { + // NOTE: immediate descending snapshots do NOT have to have direct child scope + const immediateDescendingSnapshots = getImmediateDescendingSnapshots(currentSnapshot); + for (const immediateDescendingSnapshot of immediateDescendingSnapshots) { + computeInclusionList(immediateDescendingSnapshot); + } + + for (const cid of currentSnapshot.cids) { + const messageScope = getMessageFullScope(cid); // consideration: opportunity for optimization + + if (immediateDescendingSnapshots.scopes.hasSuperSetOf(messageScope)) { + continue; // a newer descendent snapshot overwrites inclusion of messages that fall under its (sub)scope. 
+ } + + // else + // `finalInclusionList` is the in-memory global inclusion list that also stores the concise scope (for optimization) + this.finalInclusionList.set(cid, messageScope); + } } + +computeInclusionList(incomingSnapshot); + +// delete all messages that are not in the inclusion list +const cidsUnderIncomingSnapshotScope = getCidsUnderScope(incomingSnapshot.scope); +this.storageController.deleteMessageAndData(cidsUnderIncomingSnapshotScope); + ``` ## Additional Considerations @@ -135,6 +170,10 @@ for (const cid of cidsInSnapshot) { 1. The currently proposed structure falls short if there is a need to snapshot a specific protocol context (it's likely there are additional unsupported scenarios). We could introduce support for it under the "protocol" subtree, but that would violate the current design goal of "no intersecting message set in branches". +1. The CID inclusion list construction requires the most concise scope of every message referenced in a snapshot by CID, while we can obtain the info by fetching the actual message per CID, this approach is highly inefficient. We could require scope to be included for each CID in the snapshot for an instance lookup, but can we blindly trust the value given to us? This requires further thinking. + +1. The deletion of messages does not take into account of their corresponding Record, this means an semantically valid but logically invalid list of CIDs can render the DWN in a corrupt state (e.g. containing only pruned initial `RecordsWrite` without subsequent `RecordsWrite` or `RecordsDelete`). + 1. It is be extremely desirable if not necessary for the scoping scheme used in snapshot be compatible to permission scoping and protocol hierarchy, such that messages in a snapshots created can roughly match the scope of permission or path of a protocol feature toggle. 1. 
It does not make sense functionally and dangerous even to allow scopes that cuts across both protocol-authorized messages and protocol-less message. From 86dee0f155ee5bb3043aca66316a2fda5321e687 Mon Sep 17 00:00:00 2001 From: Henry Tsai Date: Tue, 6 Jun 2023 16:55:09 -0700 Subject: [PATCH 07/12] pseudo-code for authorization --- tp025/README.md | 58 ++++++++++++++++++++++++++++++++++--------------- 1 file changed, 41 insertions(+), 17 deletions(-) diff --git a/tp025/README.md b/tp025/README.md index 427789c..dcbb608 100644 --- a/tp025/README.md +++ b/tp025/README.md @@ -47,7 +47,7 @@ The payload will simply contain a list of message CIDs, a potential structure: ### Scoping -In order to optimize the efficiency of snapshot creation and message authorization against snapshots, a snapshot `scope` property can only have a value that maps to a position in the logical tree structure below: +In order to optimize the efficiency of `SnapshotsCreate` processing, the snapshot `scope` property can only have a value that maps to a position in the logical tree structure below: ```mermaid graph TD; @@ -59,39 +59,36 @@ In order to optimize the efficiency of snapshot creation and message authorizati ``` -1. `"scope": '' | undefined` +1. `"scope": "" | undefined` This means the snapshot is taken at a global scope. ie. any message not included in the snapshot is deleted. -1. `"scope": 'protocols/'` +1. `"scope": "protocols/"` - All messages under a particular protocol. + Messages under a particular protocol. -1. `"scope": 'protocols//'` +1. `"scope": "protocols//"` - All messages under a particular protocol path under a protocol. + Messages under a particular protocol path under a protocol. -1. `"scope": 'schemas/'` +1. `"scope": "schemas/"` - All non-protocol messages under a particular schema. + Non-protocol messages under a particular schema. -1. `"scope": 'schemas//data-formats/'` +1. 
`"scope": "schemas//data-formats/"` - All non-protocol-based messages under a particular schema and data-format. + Non-protocol-based messages under a particular schema and data-format. Unsure of its practical use, this is mainly for illustration purpose. -1. `"scope": 'data-formats/'` +1. `"scope": "data-formats/"` - All schema-less messages under a particular data format. + Schema-less messages under a particular data format. We only need to keep newest snapshot of any given scope. -The intent of the prescribed scoping structure is to minimize the possible permutation of scopes a message can appear in by enforcing a message to appear in only one the hierarchy branch. if we allow a more free-formed scoping syntax (based on filters for example), we'd need to have a more complex include-list computation logic and/or iterate over many snapshots for: - 1. evaluating if a message under the scope a newly snapshot needs to be kept or removed; and - 1. authorizing a message. +The intent of the prescribed scoping structure is to minimize the possible permutation of scopes a message can appear in by enforcing a message to appear in only one the hierarchy branch. if we allow a more free-formed scoping syntax (based on filters for example), we'd need to have a more complex include-list computation logic and/or iterate over many snapshots when evaluating if a message under the scope a newly snapshot needs to be kept or removed. - -Note that there may not be a single 'overall' include-list of message CIDs for authorization purposes, because snapshots can be taken with scopes that have no intersection. (e.g. snapshot A with protocol X scope and snapshot B with schema-less data format Y scope) +Note that there may be multiple logical include-list of message CIDs for authorization purposes, because snapshots can be taken with scopes that have no intersection (e.g. snapshot A with protocol X scope and snapshot B with schema-less data format Y scope). 
A message that does not fall under any scope of any snapshot MUST be kept. ### Scope Processing @@ -147,6 +144,33 @@ this.storageController.deleteMessageAndData(cidsUnderIncomingSnapshotScope); ``` +Pseudo-code for snapshot authorization: +```typescript +// get newer snapshots +const newerSnapshots = getNewerSnapshots(incomingMessage.timestamp); + +// if there is one newer snapshot with scope that the incoming message falls under, then we will need to snapshot-authorize it +let needSnapshotAuthorization = false; +for (const newerSnapshot of newerSnapshots) { + if (newerSnapshot.scope.isSuperSetOf(incomingMessage.scope)) { + needSnapshotAuthorization = true; + break; + } +} + +if (!needSnapshotAuthorization) { + return; +} + +if (this.finalInclusionList.has(incomingMessage.cid)) { + return; +} + +// else +throw Error('Message failed snapshot-authorization.'); + +``` + ## Additional Considerations 1. Since this is a highly privileged operation, this TP suggests initially limiting it to the DID owner and potentially extending access to other actors later. 
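
The snapshot-authorization pseudo-code added in this patch could be sketched as self-contained TypeScript roughly as follows. This is only an illustration under stated assumptions, not the TP's implementation: the `Snapshot` and `IncomingMessage` types and the function names are hypothetical, scopes are modelled as plain `/`-separated strings whose segments contain no `/` themselves, timestamps are assumed to be same-format ISO 8601 UTC strings (so string comparison orders them), and the inclusion list is assumed to have been computed already by `SnapshotsCreate` handling.

```typescript
// Hypothetical types for illustration only; they are not defined by the TP.
type Snapshot = {
  scope: string;       // "" for global, or e.g. "protocols/chat", "protocols/chat/thread"
  dateCreated: string; // ISO 8601 UTC timestamp, assumed uniformly formatted
};

type IncomingMessage = {
  cid: string;
  scope: string;     // leaf-node scope the message falls under
  timestamp: string; // ISO 8601 UTC timestamp claimed by the message
};

// Returns true when `ancestor` is the same scope as `scope` or one of its ancestors.
// Assumption: scope segments never contain "/" (e.g. protocol URIs would be encoded).
function isSameOrAncestorScope(ancestor: string, scope: string): boolean {
  if (ancestor === '') {
    return true; // the global scope covers every message
  }
  return scope === ancestor || scope.startsWith(ancestor + '/');
}

// A message needs snapshot-authorization only if some snapshot is newer than the
// message's timestamp and covers its scope; in that case the message's CID must
// already be in the inclusion list.
function isSnapshotAuthorized(
  message: IncomingMessage,
  snapshots: readonly Snapshot[],
  inclusionList: ReadonlySet<string>,
): boolean {
  const needSnapshotAuthorization = snapshots.some(
    (snapshot) =>
      snapshot.dateCreated > message.timestamp &&
      isSameOrAncestorScope(snapshot.scope, message.scope),
  );

  return !needSnapshotAuthorization || inclusionList.has(message.cid);
}
```

Returning a boolean rather than throwing keeps the check easy to compose; a caller that prefers the behaviour of the pseudo-code can throw when the result is `false`. A backdated message whose CID is missing from the inclusion list is rejected, while a message newer than every covering snapshot, or one outside every snapshot scope, is kept.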
From e4ce79eeccd726d23ba500ef8bbfbd2bae0f4af2 Mon Sep 17 00:00:00 2001 From: Henry Tsai Date: Tue, 6 Jun 2023 17:58:42 -0700 Subject: [PATCH 08/12] added snapshot example + updated text --- tp025/README.md | 41 +++++++++++++++++++++++++++++++---------- 1 file changed, 31 insertions(+), 10 deletions(-) diff --git a/tp025/README.md b/tp025/README.md index dcbb608..bad6d71 100644 --- a/tp025/README.md +++ b/tp025/README.md @@ -50,13 +50,12 @@ The payload will simply contain a list of message CIDs, a potential structure: In order to optimize the efficiency of `SnapshotsCreate` processing, the snapshot `scope` property can only have a value that maps to a position in the logical tree structure below: ```mermaid - graph TD; - global[global - empty string]-->protocol[`protocol`]; - protocol-->path[`protocolPath`] - schema-->schema-data-format[`dataFormat`] - global-->schema[`schema` - protocol-less]; - global-->data-format[`dataFormat` - schema-less]; - +graph TD; + global[global - empty string]-->protocol[`protocol`]; + protocol-->path[`protocolPath`] + schema-->schema-data-format[`dataFormat`] + global-->schema[`schema` - protocol-less]; + global-->data-format[`dataFormat` - schema-less]; ``` 1. `"scope": "" | undefined` @@ -84,11 +83,33 @@ In order to optimize the efficiency of `SnapshotsCreate` processing, the snapsho Schema-less messages under a particular data format. -We only need to keep newest snapshot of any given scope. -The intent of the prescribed scoping structure is to minimize the possible permutation of scopes a message can appear in by enforcing a message to appear in only one the hierarchy branch. if we allow a more free-formed scoping syntax (based on filters for example), we'd need to have a more complex include-list computation logic and/or iterate over many snapshots when evaluating if a message under the scope a newly snapshot needs to be kept or removed. 
+Example snapshots in a DWN: + +- Round box - no snapshot defined +- Padded Square box - snapshot defined + +```mermaid +graph TD; + global(["global scope (no snapshot)"])-->socialMediaProtocol[["social-media protocol snapshot"]]; + socialMediaProtocol-->posts[["posts snapshot"]] + posts-->postComments(["post comments (no snapshot)"]) + socialMediaProtocol-->photoAlbum(["photo albums (no snapshot)"]) + photoAlbum-->photos[["photos snapshot"]] + global-->chatProtocol(["chat protocol (no snapshot)"]); + + global-->vcSchema(["protocol-less VCs (no snapshot)"]); + vcSchema-->vcFormat[["JWT VCs snapshot"]]; + global-->mp3(["schema-less MP3s (no snapshot)"]); +``` + +1. We only need to keep newest snapshot of any given scope. + +1. A Records message will by-design always fall under one and only one leaf-node scope in the hierarchical scoping structure. + +1. The intent of the prescribed scoping structure is to minimize the possible permutation of scopes a message can appear in. If we were to allow a more flexible scoping syntax, such as using filters, it would require a more complex or less efficient logic for maintaining the overall CID include-list when evaluating messages that fall under the scope of a new snapshot for deletion. -Note that there may be multiple logical include-list of message CIDs for authorization purposes, because snapshots can be taken with scopes that have no intersection (e.g. snapshot A with protocol X scope and snapshot B with schema-less data format Y scope). A message that does not fall under any scope of any snapshot MUST be kept. +1. There can be multiple logical include-lists since snapshots can be taken with non-intersecting scopes. For example, snapshot A may have a scope of "protocol X", while snapshot B may have a entirely unrelated scope of "schema-less data format Y" with no parent/global snapshot that link them together. A message that does not fall under any scope defined in any snapshot MUST be kept. 
In the actual implementation, we might be able to utilize a single `Map`, as illustrated in the pseudo-code below. ### Scope Processing From be1ab5849a4631ae4a680ebb25c95ffdac77b0e9 Mon Sep 17 00:00:00 2001 From: Henry Tsai Date: Tue, 6 Jun 2023 19:55:04 -0700 Subject: [PATCH 09/12] updated text --- tp025/README.md | 29 ++++++++++++++--------------- 1 file changed, 14 insertions(+), 15 deletions(-) diff --git a/tp025/README.md b/tp025/README.md index bad6d71..9d3a3e8 100644 --- a/tp025/README.md +++ b/tp025/README.md @@ -19,7 +19,7 @@ Currently, if an entity is granted write access to a DWeb Node but later has tha Implement a mechanism that allows the owner to capture a 'snapshot' of the state of all records within a specific scope at a particular time. This snapshot would serve to discard any messages with timestamps earlier than the snapshot but not included in it. -The snapshot will be created be through `SnapshotsCreate` method, an example of the message: +A snapshot will be created through a `SnapshotsCreate` message, an example of the message: ```json { @@ -47,7 +47,7 @@ The payload will simply contain a list of message CIDs, a potential structure: ### Scoping -In order to optimize the efficiency of `SnapshotsCreate` processing, the snapshot `scope` property can only have a value that maps to a position in the logical tree structure below: +In order to optimize the efficiency of `SnapshotsCreate` processing, the snapshot `scope` property can only have a value that maps to one position in the logical tree structure below: ```mermaid graph TD; @@ -81,8 +81,7 @@ graph TD; 1. `"scope": "data-formats/"` - Schema-less messages under a particular data format. - + Schema-less messages with a particular data format. Example snapshots in a DWN: @@ -103,11 +102,11 @@ graph TD; global-->mp3(["schema-less MP3s (no snapshot)"]); ``` -1. We only need to keep newest snapshot of any given scope. +1. 
Snapshot scope will by-design fall under one and only one branch in the hierarchical scoping structure. Similarly a Records message will by-design always fall under one and only one leaf-node scope in the hierarchical scoping structure. -1. A Records message will by-design always fall under one and only one leaf-node scope in the hierarchical scoping structure. +1. We only need to keep the newest snapshot of any given scope. -1. The intent of the prescribed scoping structure is to minimize the possible permutation of scopes a message can appear in. If we were to allow a more flexible scoping syntax, such as using filters, it would require a more complex or less efficient logic for maintaining the overall CID include-list when evaluating messages that fall under the scope of a new snapshot for deletion. +1. The intent of the prescribed scoping structure is to minimize the possible permutation of scopes a message can appear in. If we were to allow a more flexible scoping syntax, such as using filters, it would require a more complex or less efficient logic for maintaining the overall CID include-list when evaluating messages that fall under the scope of a new snapshot for retention/deletion. 1. There can be multiple logical include-lists since snapshots can be taken with non-intersecting scopes. For example, snapshot A may have a scope of "protocol X", while snapshot B may have a entirely unrelated scope of "schema-less data format Y" with no parent/global snapshot that link them together. A message that does not fall under any scope defined in any snapshot MUST be kept. In the actual implementation, we might be able to utilize a single `Map`, as illustrated in the pseudo-code below. @@ -115,7 +114,7 @@ graph TD; General rules: -1. A newer snapshot erases all older snapshots with the same or a descendent scope. (e.g. a newer snapshot with "protocol" scope overwrites all older snapshots with any "protocol path" scope under the same protocol) +1. 
A newer snapshot erases all older snapshots with the same or a descendent scope. (e.g. a newer snapshot with "protocol X" scope overwrites all older snapshots with any "protocol path" scope under the same protocol) 1. A newer descendent snapshot overwrites inclusion of messages that falls under its (sub)scope in the parent snapshot. (e.g. a newer descendent snapshot with a `protocolX/pathSegment1/pathSegment2` scope overwrites inclusion of messages of a parent snapshot with scope `protocolX`) @@ -194,7 +193,7 @@ throw Error('Message failed snapshot-authorization.'); ## Additional Considerations -1. Since this is a highly privileged operation, this TP suggests initially limiting it to the DID owner and potentially extending access to other actors later. +1. Since this is a highly privileged operation, this TP suggests initially limiting access to the DID owner and potentially extending access to other actors later. 1. Sync will likely need to have awareness of Snapshots messages. Consider a scenario where a DWN receives a snapshot containing messages it does not (no longer) have: @@ -211,16 +210,16 @@ throw Error('Message failed snapshot-authorization.'); DWN2-->>DWN1: Message of CID2 ``` -1. It is apparent that snapshot scoping turns out to be quite "tailor-made" towards permission and protocol, so maybe it is really not practical to have a pure general purpose snapshot feature beyond the first 2 levels of scoping hierarchy. +1. It is apparent that currently snapshot scoping turns out to be quite "tailor-made" towards permission and protocol, so maybe it is really not practical to have a pure general purpose snapshot feature beyond the first 2 levels of scoping hierarchy. -1. The currently proposed structure falls short if there is a need to snapshot a specific protocol context (it's likely there are additional unsupported scenarios). 
We could introduce support for it under the "protocol" subtree, but that would violate the current design goal of "no intersecting message set in branches". +1. The currently proposed structure falls short if there is a need to snapshot a specific protocol context (it's likely there are additional unsupported scenarios). We could introduce support for it under the "protocol" subtree, but that would violate the current design goal of "strictly one leaf-code scope per message". -1. The CID inclusion list construction requires the most concise scope of every message referenced in a snapshot by CID, while we can obtain the info by fetching the actual message per CID, this approach is highly inefficient. We could require scope to be included for each CID in the snapshot for an instance lookup, but can we blindly trust the value given to us? This requires further thinking. +1. The CID inclusion list construction needs the leaf-node scope of every message referenced in a snapshot by CID, while we can obtain this info by fetching the actual message for each CID, this approach is highly inefficient. We could require scope to be included for each CID in the snapshot for an instance lookup, but can we blindly trust the value given to us? This requires further thinking. 1. The deletion of messages does not take into account of their corresponding Record, this means an semantically valid but logically invalid list of CIDs can render the DWN in a corrupt state (e.g. containing only pruned initial `RecordsWrite` without subsequent `RecordsWrite` or `RecordsDelete`). -1. It is be extremely desirable if not necessary for the scoping scheme used in snapshot be compatible to permission scoping and protocol hierarchy, such that messages in a snapshots created can roughly match the scope of permission or path of a protocol feature toggle. +1. It is highly desirable for the snapshot scoping scheme to align with permission scoping and protocol hierarchy. -1. 
It does not make sense functionally and dangerous even to allow scopes that cuts across both protocol-authorized messages and protocol-less message. +1. It does not make sense functionally and even potentially dangerous even to allow scopes that span across both protocol-authorized messages and protocol-less message. -1. It doesn't seem logical to permit the deletion of a snapshot once it's created for authorization purposes. If the snapshot is deleted, the DWeb Node will no longer be able to utilize the deleted snapshot to prevent unauthorized access. +1. It does not seem logical to permit the deletion of a snapshot once it is created for authorization purposes. If the snapshot is deleted, the DWeb Node will no longer be able to utilize the deleted snapshot to prevent unauthorized access. From 3f1b4886ece02d99a88311568365eba1596f48a1 Mon Sep 17 00:00:00 2001 From: Henry Tsai Date: Fri, 9 Jun 2023 19:12:24 -0700 Subject: [PATCH 10/12] updated based on review --- tp025/README.md | 115 ++++++++++++++++++++++++++++++++++-------------- 1 file changed, 82 insertions(+), 33 deletions(-) diff --git a/tp025/README.md b/tp025/README.md index 9d3a3e8..c5262eb 100644 --- a/tp025/README.md +++ b/tp025/README.md @@ -47,15 +47,18 @@ The payload will simply contain a list of message CIDs, a potential structure: ### Scoping -In order to optimize the efficiency of `SnapshotsCreate` processing, the snapshot `scope` property can only have a value that maps to one position in the logical tree structure below: +In order to minimize complexity of `SnapshotsCreate` processing, the snapshot `scope` property must have a value that maps to one of the allowed positions in the logical tree structure below: ```mermaid graph TD; global[global - empty string]-->protocol[`protocol`]; - protocol-->path[`protocolPath`] + protocol-->path[`protocolPath`]:::intersectingScope + protocol-->context[`contextId`]:::intersectingScope schema-->schema-data-format[`dataFormat`] global-->schema[`schema` - 
protocol-less]; global-->data-format[`dataFormat` - schema-less]; + + classDef intersectingScope stroke-dasharray: 5 ``` 1. `"scope": "" | undefined` @@ -66,10 +69,14 @@ graph TD; Messages under a particular protocol. -1. `"scope": "protocols//"` +1. `"scope": "protocols//path/"` Messages under a particular protocol path under a protocol. +1. `"scope": "protocols//context/"` + + Messages under a particular context ID under a protocol. + 1. `"scope": "schemas/"` Non-protocol messages under a particular schema. @@ -102,13 +109,13 @@ graph TD; global-->mp3(["schema-less MP3s (no snapshot)"]); ``` -1. Snapshot scope will by-design fall under one and only one branch in the hierarchical scoping structure. Similarly a Records message will by-design always fall under one and only one leaf-node scope in the hierarchical scoping structure. +1. The `protocolPath` and `contextId` snapshot scopes are the only two leaf scopes that intersect. That is, a message can simultaneously be referenced in a `protocolPath` scoped snapshot as well as a `contextId` scoped snapshot. 1. We only need to keep the newest snapshot of any given scope. -1. The intent of the prescribed scoping structure is to minimize the possible permutation of scopes a message can appear in. If we were to allow a more flexible scoping syntax, such as using filters, it would require a more complex or less efficient logic for maintaining the overall CID include-list when evaluating messages that fall under the scope of a new snapshot for retention/deletion. +1. The intent of the prescribed scoping structure is to minimize the possible permutations of scopes a message can appear in. If we were to allow a more flexible scoping syntax, such as using filters, it would require more complex or less efficient logic for maintaining the overall CID retention list. -1. There can be multiple logical include-lists since snapshots can be taken with non-intersecting scopes.
For example, snapshot A may have a scope of "protocol X", while snapshot B may have a entirely unrelated scope of "schema-less data format Y" with no parent/global snapshot that link them together. A message that does not fall under any scope defined in any snapshot MUST be kept. In the actual implementation, we might be able to utilize a single `Map`, as illustrated in the pseudo-code below. +1. A message that does not fall under the scope of any snapshot is not subject to snapshot authorization, thus MUST be kept. ### Scope Processing @@ -116,51 +123,91 @@ General rules: 1. A newer snapshot erases all older snapshots with the same or a descendent scope. (e.g. a newer snapshot with "protocol X" scope overwrites all older snapshots with any "protocol path" scope under the same protocol) -1. A newer descendent snapshot overwrites inclusion of messages that falls under its (sub)scope in the parent snapshot. (e.g. a newer descendent snapshot with a `protocolX/pathSegment1/pathSegment2` scope overwrites inclusion of messages of a parent snapshot with scope `protocolX`) +1. A newer descendent snapshot overwrites retention of messages that falls under its (sub)scope in the parent snapshot. (e.g. a newer descendent snapshot with a `protocolX/pathSegment1/pathSegment2` scope overwrites retention of messages of a parent snapshot with scope `protocolX`) -Pseudo-code for `SnapshotsCreate` handling: +Pseudo-code for `SnapshotsCreate` processing: ```typescript -// figure out if this snapshot should be ignored or processed +// processing algorithm in a nutshell: +// maintain an aggregate retention list of CIDs (`this.finalRetentionList`) for quick snapshot-authorization evaluation +// +// 0. determine if the incoming snapshot should be ignored or processed +// 1. first delete all the CIDs in the final retention list that came from snapshots with scope same as, or sub-scope of, the incoming snapshot scope, so that we can rebuild that section of the retention list +// 2. 
delete all older snapshots with same or sub-scope, because a newer parent scope snapshot trumps any older snapshots with a sub-scope +// 3. update the final retention list by inserting the CIDs that need to be retained under the incoming snapshot scope; then +// 4. delete all DWN messages under the incoming snapshot scope (including sub-scopes) that are not in the retention list + + +// 0. determine if the incoming snapshot should be ignored or processed const newerSnapshots = getNewerSnapshots(incomingSnapshot.timestamp); for (const newerSnapshot of newerSnapshots) { - if (newerSnapshot.scope.isSuperSetOf(incomingSnapshot.scope)) { + if (newerSnapshot.scope.isParentScopeOf(incomingSnapshot.scope)) { return; // no need to process this snapshot } } -// delete all CIDs that falls under the scope of the incoming snapshot so we can repopulate that subsection correctly -// an alternate strategy is to iterate overall all CIDs under the scope and remove as needed, but that seems less efficient -deleteAllCidsUnderScope(incomingSnapshot.scope); +// 1. delete all the CIDs in the final retention list that came from snapshots with scope same as, or sub-scope of, the incoming snapshot scope +// +// get all CIDs of retained in snapshots under the incoming snapshot scope (including sub-scopes), regardless of snapshot timestamp +// NOTE: logic for doing this may be optimized to look very similar to the recursive retention list computation +const snapshotsUnderIncomingSnapshotScope = getSnapshotsUnderScope(incomingSnapshot.scope); +const cidCandidatesForRemoval = getAllRetainedCidsInSnapshots(snapshotsUnderIncomingSnapshotScope); +for (const cid in cidCandidatesForRemoval) { + // TODO: figure out how to efficiently obtain full scope info without fetching the message + const messageScope = getMessageFullScope(cid); + // this is where we need to make sure we don't remove CIDs in the final retention list that are referenced by an external intersecting scope + // ie. 
if incoming snapshot scope is `contextId` scoped, + // then we need to make sure we check the retention list of intersecting snapshots with `protocolPath` scope + // eg. if a CID is being evaluated for deletion under the `contextId` scope and its `protocolPath` is `foo/bar/baz`, + // we need to check snapshots with `protocolPath` scope of `foo`, and `foo/bar`, and `foo/bar/baz` + // to make sure none of those snapshots attempts to retrain the same CID + // TODO: requires more drilling into as it appears to be very costly + const intersectingSnapshots = getIntersectingSnapshots(messageScope, incomingSnapshot.scope); + for (const intersectingSnapshot of intersectingSnapshot) { + const retainedInIntersectingSnapshot = intersectingSnapshot.includes(cid); + if (retainedInIntersectingSnapshot) { + continue; // can't remove the CID from the final retention list if it is being referenced by an intersecting snapshot + } + } + + // delete if not referenced in any snapshots with externally intersecting scopes + this.finalRetentionList.delete(cid); +} + + +// 2. 
delete all older snapshots with same or sub-scope, because a newer parent scope snapshot trumps any older snapshots with a sub-scope +deleteOlderSnapshotsWithSameOrSubScope(); -// const inclusionList = new Map; // CID -> scope map +// computes the complete retention list at the scope of the given snapshot +function updateRetentionList(currentSnapshot) { + // NOTE: immediate newer descending snapshots do NOT have to have direct child scope + const immediateNewerDescendingSnapshots = getImmediateNewerDescendingSnapshots(currentSnapshot); -// computes the complete inclusion list at the scope of thd given snapshot -function computeInclusionList(currentSnapshot) { - // NOTE: immediate descending snapshots do NOT have to have direct child scope - const immediateDescendingSnapshots = getImmediateDescendingSnapshots(currentSnapshot); - for (const immediateDescendingSnapshot in immediateDescendingSnapshots) { - computeInclusionList(immediateDescendingSnapshot.scope); + // NOTE: looping through the immediate descending snapshots could probably also happen after looping over the CIDs in this snapshot below + for (const immediateNewerDescendingSnapshot of immediateNewerDescendingSnapshots) { + updateRetentionList(immediateNewerDescendingSnapshot); } for (const cid of currentSnapshot.cids) { - const messageScope = getMessageFullScope(cid); // consideration: opportunity for optimization + // TODO: figure out how to efficiently obtain full scope info without fetching the message + const messageScope = getMessageFullScope(cid); - if (immediateDescendingSnapshots.scopes.hasSuperSetOf(messageScope)) { - continue; // a newer descendent snapshot overwrites inclusion of messages that falls under its (sub)scope. + if (immediateDescendingSnapshotsHaveAParentScopeOf(messageScope)) { + continue; // a newer descendent snapshot overwrites retention list that falls under its (sub)scope.
} // else - // `finalInclusionList` is the in-memory global inclusion list that also stores the concise scope (for optimization) - this.finalInclusionList.set(messageCid, messageScope); + this.finalRetentionList.add(cid); } } -computeInclusionList(incomingSnapshot); +// 3. update the final retention list by inserting the CIDs that need to be retained under the incoming snapshot scope; then +updateRetentionList(incomingSnapshot); -// delete all messages that are not in the inclusion list -const cidsUnderIncomingSnapshotScope = getCidsUnderScope(incomingSnapshot.scope); -this.storageController.deleteMessageAndData(cidsUnderIncomingSnapshotScope); +// 4. delete all DWN messages under the incoming snapshot scope (including sub-scopes) that are not in the retention list +// NOTE: seems super expensive to iterate over all CIDs but unavoidable?! +const cidsUnderIncomingSnapshotScope = getCidsInDwnUnderScope(incomingSnapshot.scope); +this.storageController.deleteMessageAndDataUnlessInRetentionList(cidsUnderIncomingSnapshotScope, this.finalRetentionList); ``` @@ -182,7 +229,7 @@ if (!needSnapshotAuthorization) { return; } -if (this.finalInclusionList.has(incomingMessage.cid)) { +if (this.finalRetentionList.has(incomingMessage.cid)) { return; } @@ -210,11 +257,13 @@ throw Error('Message failed snapshot-authorization.'); DWN2-->>DWN1: Message of CID2 ``` -1. It is apparent that currently snapshot scoping turns out to be quite "tailor-made" towards permission and protocol, so maybe it is really not practical to have a pure general purpose snapshot feature beyond the first 2 levels of scoping hierarchy. + Consider an extension to the scenario above: if the message of `CID2` is removed after `Snapshot2` is taken and before the sync of `Snapshot2`, the DWN would not be able to fetch the message of `CID2` even if it tries. + +1. It would not make sense to allow scope to use mutable properties. -1.
The currently proposed structure falls short if there is a need to snapshot a specific protocol context (it's likely there are additional unsupported scenarios). We could introduce support for it under the "protocol" subtree, but that would violate the current design goal of "strictly one leaf-code scope per message". +1. Supporting both "protocolPath" and "contextId" in the scope structure adds extra complexity. -1. The CID inclusion list construction needs the leaf-node scope of every message referenced in a snapshot by CID, while we can obtain this info by fetching the actual message for each CID, this approach is highly inefficient. We could require scope to be included for each CID in the snapshot for an instance lookup, but can we blindly trust the value given to us? This requires further thinking. +1. The CID retention list construction needs the leaf-node scope of every message referenced in a snapshot by CID. While we can obtain this info by fetching the actual message for each CID, this approach is highly inefficient. We could require scope to be included for each CID in the snapshot for an instant lookup, but can we blindly trust the value given to us? This requires further thinking. 1. The deletion of messages does not take into account of their corresponding Record, this means an semantically valid but logically invalid list of CIDs can render the DWN in a corrupt state (e.g. containing only pruned initial `RecordsWrite` without subsequent `RecordsWrite` or `RecordsDelete`). 
From d8c3eb92f22cdb70f5bd0d816de64eaef3061538 Mon Sep 17 00:00:00 2001 From: Henry Tsai Date: Mon, 12 Jun 2023 11:34:37 -0700 Subject: [PATCH 11/12] minor updtaes --- tp025/README.md | 25 ++++++++++++++----------- 1 file changed, 14 insertions(+), 11 deletions(-) diff --git a/tp025/README.md b/tp025/README.md index c5262eb..7cf3797 100644 --- a/tp025/README.md +++ b/tp025/README.md @@ -132,7 +132,8 @@ Pseudo-code for `SnapshotsCreate` processing: // maintain an aggregate retention list of CIDs (`this.finalRetentionList`) for quick snapshot-authorization evaluation // // 0. determine if the incoming snapshot should be ignored or processed -// 1. first delete all the CIDs in the final retention list that came from snapshots with scope same as, or sub-scope of, the incoming snapshot scope, so that we can rebuild that section of the retention list +// 1. first delete all the CIDs in the final retention list that were retained by snapshots +// with the same scope as, or sub-scope of, the incoming snapshot scope, so that we can rebuild that section of the retention list // 2. delete all older snapshots with same or sub-scope, because a newer parent scope snapshot trumps any older snapshots with a sub-scope // 3. update the final retention list by inserting the CIDs that need to be retained under the incoming snapshot scope; then // 4. delete all DWN messages under the incoming snapshot scope (including sub-scopes) that are not in the retention list @@ -146,21 +147,24 @@ for (const newerSnapshot of newerSnapshots) { } } -// 1. delete all the CIDs in the final retention list that came from snapshots with scope same as, or sub-scope of, the incoming snapshot scope +// 1. 
delete all the CIDs in the final retention list that were retained by snapshots +// with the same scope as, or sub-scope of, the incoming snapshot scope // -// get all CIDs of retained in snapshots under the incoming snapshot scope (including sub-scopes), regardless of snapshot timestamp -// NOTE: logic for doing this may be optimized to look very similar to the recursive retention list computation +// get all CIDs retained by snapshots under the incoming snapshot scope (including sub-scopes), +// regardless of the timestamp of the snapshot that retains them +// NOTE: an optimized algorithm may allow us to only iterate over CIDs retained by older snapshots, +// but we'd need to keep metadata such as timestamp in the final retention list for fast lookup const snapshotsUnderIncomingSnapshotScope = getSnapshotsUnderScope(incomingSnapshot.scope); const cidCandidatesForRemoval = getAllRetainedCidsInSnapshots(snapshotsUnderIncomingSnapshotScope); for (const cid in cidCandidatesForRemoval) { // TODO: figure out how to efficiently obtain full scope info without fetching the message const messageScope = getMessageFullScope(cid); - // this is where we need to make sure we don't remove CIDs in the final retention list that are referenced by an external intersecting scope + // we need to make sure we don't remove CIDs in the final retention list that are referenced by another external intersecting scope // ie. if incoming snapshot scope is `contextId` scoped, // then we need to make sure we check the retention list of intersecting snapshots with `protocolPath` scope // eg.
if a CID is being evaluated for deletion under the `contextId` scope and its `protocolPath` is `foo/bar/baz`, - // we need to check snapshots with `protocolPath` scope of `foo`, and `foo/bar`, and `foo/bar/baz` - // to make sure none of those snapshots attempts to retrain the same CID + // we will need to check snapshots with `protocolPath` scope of `foo`, and `foo/bar`, and `foo/bar/baz` + // to make sure none of those snapshots also retains the same CID // TODO: requires more drilling into as it appears to be very costly const intersectingSnapshots = getIntersectingSnapshots(messageScope, incomingSnapshot.scope); for (const intersectingSnapshot of intersectingSnapshot) { @@ -189,10 +193,9 @@ function updateRetentionList(currentSnapshot) { } for (const cid of currentSnapshot.cids) { - // TODO: figure out how to efficiently obtain full scope info without fetching the message - const messageScope = getMessageFullScope(cid); - - if (immediateDescendingSnapshotsHaveAParentScopeOf(messageScope)) { + // TODO: figure out how to efficiently check this + const cidRetainedByADescendingSnapshot = isCidRetainedByADescendingSnapshot(cid); + if (cidRetainedByADescendingSnapshot) { continue; // a newer descendent snapshot overwrites retention list that falls under its (sub)scope. } From f67d4eb64ddc48276e63b2ffd2f58508363c05c9 Mon Sep 17 00:00:00 2001 From: Henry Tsai Date: Mon, 12 Jun 2023 16:51:24 -0700 Subject: [PATCH 12/12] add one more consideration --- tp025/README.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/tp025/README.md b/tp025/README.md index 7cf3797..dde0c97 100644 --- a/tp025/README.md +++ b/tp025/README.md @@ -275,3 +275,5 @@ throw Error('Message failed snapshot-authorization.'); 1. It does not make sense functionally and even potentially dangerous even to allow scopes that span across both protocol-authorized messages and protocol-less message. 1.
It does not seem logical to permit the deletion of a snapshot once it is created for authorization purposes. If the snapshot is deleted, the DWeb Node will no longer be able to utilize the deleted snapshot to prevent unauthorized access. + +1. We need to add a locking/queueing mechanism to handle concurrent/competing snapshots and record changes during snapshot processing. \ No newline at end of file
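
Reviewer sketch (not part of the patch): step 0 of the `SnapshotsCreate` pseudo-code ignores an incoming snapshot when a newer snapshot with a parent scope already exists. The TypeScript below is one possible way to express that check, under several assumptions that the TP does not state: scopes are compared segment by segment after splitting on `/`, a snapshot with the same scope counts as covering the incoming one, and `dateCreated` values are UTC ISO 8601 strings of identical format so that string comparison orders them. The `Snapshot` type, the helper names, and the example scope values are hypothetical.

```typescript
// Hypothetical shape; the TP only defines `scope` and `dateCreated` on the descriptor.
type Snapshot = { scope: string; dateCreated: string };

// Split a scope string into segments; the global scope ("" or undefined) has none.
function scopeSegments(scope: string | undefined): string[] {
  return (scope ?? "").split("/").filter((segment) => segment.length > 0);
}

// True when `candidate` is the same scope as, or an ancestor of, `scope`.
// Sibling branches such as `path/...` and `context/...` never match each other.
function isSameOrParentScope(candidate: string | undefined, scope: string | undefined): boolean {
  const parent = scopeSegments(candidate);
  const child = scopeSegments(scope);
  return parent.length <= child.length && parent.every((segment, i) => segment === child[i]);
}

// Step 0: ignore the incoming snapshot if a newer snapshot already covers its scope.
// Assumes same-format UTC ISO 8601 timestamps, so string order equals time order.
function shouldIgnoreSnapshot(incoming: Snapshot, existing: Snapshot[]): boolean {
  return existing.some(
    (snapshot) =>
      snapshot.dateCreated > incoming.dateCreated &&
      isSameOrParentScope(snapshot.scope, incoming.scope)
  );
}

// Usage with made-up scope values:
const incoming: Snapshot = {
  scope: "protocols/chat/path/thread",
  dateCreated: "2023-06-05T11:22:33.445566Z",
};
const existing: Snapshot[] = [
  { scope: "protocols/chat", dateCreated: "2023-06-06T00:00:00.000000Z" },
];
console.log(shouldIgnoreSnapshot(incoming, existing));
```

Comparing whole segments rather than raw string prefixes keeps a scope such as `protocols/chat` from being treated as a parent of `protocols/chatroom`. Intersecting `protocolPath` and `contextId` scopes are deliberately not treated as parent and child here; they would still need the separate intersecting-snapshot handling described in step 1.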