Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add README #284

Merged
merged 16 commits into from
Nov 9, 2023
Merged

Add README #284

merged 16 commits into from
Nov 9, 2023

Conversation

neekolas
Copy link
Contributor

@neekolas neekolas commented Oct 25, 2023

I took a first pass at the database schema and state machine for MLS.

We can rely on the total ordering of the Query endpoint for our network to get to a consistent state on every read of the group topic. All messages are read in ascending order, and any message sent for a previous epoch must be discarded. This flow introduces a new concept "group_intents" to handle cases where you might send a message in one epoch, and then upon subsequently reading the network need to re-send in another epoch.

Intents are not bound to any epoch and can therefore be safely retried in cases where conflicts occur.

Some actions can either be done optimistically (send the message then refresh the group state) or pessimistically (refresh group state, send message, refresh again). I have opted to handle regular message sending optimistically and anything that modifies the ratchet tree (add/remove members and key updates) pessimistically. Conflicts are possible either way, but are less likely with the pessimistic approach.

@neekolas neekolas marked this pull request as ready for review October 25, 2023 18:58
@neekolas neekolas requested a review from jhaaaa as a code owner October 25, 2023 18:58
@neekolas neekolas requested review from a team and richardhuaaa October 25, 2023 18:58
@richardhuaaa richardhuaaa requested a review from Bren2010 October 25, 2023 19:50
@neekolas
Copy link
Contributor Author

One thing I'm wondering is whether this is more conservative than necessary.

For sending application messages, which doesn't advance the epoch, we should very rarely need to re-send them. If we set max_past_epochs to something like 3 or 5, a lot of things would have had to happen in the group since your last sync to render a message unreadable. Having to read the group state on every message send feels very heavy-handed, so maybe we can relax that and make some publishes more "fire and forget".

Syncing the group state before and after adding/removing new members is also a lot. We could just send optimistically and re-send later. There would be a higher rate of re-sends with that approach, but it may be a worthwhile trade-off for speed and efficiency in the base case.

Copy link
Contributor

@richardhuaaa richardhuaaa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I love this proposal - the intent concept is powerful and gives a foundation for all of our future work, essentially solving the issue of retries for conflicting commits. This is also laid out incredibly clearly and seems very well thought out.

Building on this concept, my main suggestion would be to increase the asynchronicity and robustness of each of these operations using event loops.

Some other thoughts that came up for me, not necessarily addressable here:

  1. In the event that an update fails due to conflicts, I'm not sure what happens with the cryptographic state, as we have already updated ours when constructing the original payload. I'm not sure if there is forking logic or rewinding logic already built into OpenMLS.
  2. All retries in this proposal happen automatically without user involvement. It may be worth thinking through any cases where the user should manually retry instead - for example if they sent a sensitive message for a set of group members, but another member was added before the message was committed.
  3. The event loop I suggested preserves the ordering of all user actions, but I'm not sure if this ordering is preserved if an update fails due to conflicts.
  4. A malicious group member can add a member to the group, but not send them any welcome messages. This prevents anyone else from adding the member, while keeping the group undiscoverable from the member. It's not the biggest issue in practice, but I don't see anything in the MLS spec for how to address this, curious if @Bren2010 has thoughts

xmtp_mls/README.md Show resolved Hide resolved
xmtp_mls/README.md Outdated Show resolved Hide resolved
xmtp_mls/README.md Outdated Show resolved Hide resolved
xmtp_mls/README.md Outdated Show resolved Hide resolved
xmtp_mls/README.md Outdated Show resolved Hide resolved
@neekolas
Copy link
Contributor Author

@richardhuaaa great questions

In the event that an update fails due to conflicts, I'm not sure what happens with the cryptographic state, as we have already updated ours when constructing the original payload. I'm not sure if there is forking logic or rewinding logic already built into OpenMLS.

Unlike the demo I had done previously, we don't update our cryptographic state until we read the message from the network in this model. Nothing changes in the local ratchet tree/group state at the time of sending the commit.

All retries in this proposal happen automatically without user involvement. It may be worth thinking through any cases where the user should manually retry instead - for example if they sent a sensitive message for a set of group members, but another member was added before the message was committed.

It's a good point. Feels like something we can table until post-launch. Retries of application messages should be very rare if we take advantage of max_past_epochs and only retry if a message is from an epoch so old that the keys have already been discarded (maybe a new group where members are being added in rapid succession by different members).

The event loop I suggested preserves the ordering of all user actions, but I'm not sure if this ordering is preserved if an update fails due to conflicts.

I would expect that if one intent fails due to conflicts all later intents will also fail. That may not be true if we set max_past_epochs > 0, since application messages could actually succeed even if they were from a past epoch. It may be a narrow enough edge case we don't have to worry too much about it.

xmtp_mls/README.md Outdated Show resolved Hide resolved
xmtp_mls/README.md Show resolved Hide resolved
group_id BLOB NOT NULL,
"data" BLOB NOT NULL, -- Some sort of serializable blob that can be used to re-try the message if the first attempt failed due to conflict
"state" INT NOT NULL, -- INTENT_STATE,
message_hash BLOB -- The hash of the encrypted, concrete, form of the message if it was published.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may want to store the epoch in which the message is intended to land in this table. If the message lands in a subsequent epoch, it needs to be retried.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was kinda hoping openmls would handle the logic of deciding "this is from the wrong epoch" rather than us having to do the bookkeeping separately. That way it can transparently handle things like setting max_past_epochs for the group to a value > 0.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OpenMLS is not going to be able to decrypt messages that you yourself sent. So when you are seeing what order messages got put in on the topic, you will not be able to "black box" run your own messages through OpenMLS to see if they ended up decryptable

Copy link
Contributor Author

@neekolas neekolas Oct 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd love to learn more about that. My understand was that I would be able to validate my own messages, and in the case of commits be able to apply them (because we have a copy of the pending commit staged locally). We should at least be able to do some validation of messages even if they can't be decrypted.

xmtp_mls/README.md Outdated Show resolved Hide resolved
xmtp_mls/README.md Outdated Show resolved Hide resolved
xmtp_mls/README.md Outdated Show resolved Hide resolved
neekolas and others added 6 commits October 30, 2023 12:32
Will try to keep these suggestion PR's small, for this one:

1. Serialize welcome messages to `post_commit_data` field on intent rather than separate table
1. Remove some group sync steps (we can handle conflicts anyway)
1. Drop pending proposals for now
1. Describe how payloads can be synced in a non-duplicative way

Other things worth considering:
1. Making some of the steps generic rather than specific to intent type (a lot of it is duplicative/identical)
1. Whether to make commit processing strictly sequential (commit must be fully done before next one begins) or concurrent. If we make it strictly sequential, we need to think about how to handle concurrent syncs (as the application also needs to sync incoming messages, either via pull or subscription)
xmtp_mls/README.md Show resolved Hide resolved
xmtp_mls/README.md Show resolved Hide resolved
xmtp_mls/README.md Show resolved Hide resolved
@@ -0,0 +1,172 @@
# XMTP MLS
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this README, @neekolas! What do you think of updating the title and adding a bit of intro text for context? Some suggested starter text here:

Suggested change
# XMTP MLS
# XMTP group chat using MLS
This document describes plans for how XMTP can provide group chat using [Messaging Layer Security](https://messaginglayersecurity.rocks/) (MLS).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The library today only handles group chat, but in future it's gong to include 1:1 as well.

I will add the intro text, which helps set the reader up for the onslaught of technical detail that follows.

1. If no conflicts: Mark the message as committed.
If conflicts: Go back to step 2 and try again (reset the intent's state to `TO_SEND` and clear the `publish_data` and `post_commit_data` fields)

### Syncing group state
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@neekolas and I tentatively discussed something like this for handling concurrency.

Would love feedback @neekolas, @Bren2010, @insipx! I wrote up a lot of the rationale so that it's easier to follow, but not sure if the rationale belongs in this doc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense to me. I like it, and it feels straightforward enough to implement. Love removing the locking mechanism from the topic_refresh_state table.

@neekolas
Copy link
Contributor Author

neekolas commented Nov 9, 2023

@richardhuaaa you mind approving this so we can merge it? We can always amend it with more PRs as plans change or details get fleshed out.

Copy link
Contributor

@richardhuaaa richardhuaaa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Should be good for the next few weeks.

We still need to sketch out how we're going to map between accounts and installations/track group membership, can do that separately

@neekolas neekolas merged commit b24dcff into main Nov 9, 2023
4 checks passed
@neekolas neekolas deleted the nmolnar/mls-readme branch November 9, 2023 22:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants