Add README #284

Merged
merged 16 commits into from
Nov 9, 2023
Binary file added img/mls-state-machine.png
192 changes: 192 additions & 0 deletions xmtp_mls/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,192 @@
# XMTP MLS
Contributor

Thank you for this README, @neekolas! What do you think of updating the title and adding a bit of intro text for context? Some suggested starter text here:

Suggested change
# XMTP MLS
# XMTP group chat using MLS
This document describes plans for how XMTP can provide group chat using [Messaging Layer Security](https://messaginglayersecurity.rocks/) (MLS).

Contributor Author

The library today only handles group chat, but in the future it's going to include 1:1 as well.

I will add the intro text, which helps set the reader up for the onslaught of technical detail that follows.


## Database Schema

Foreign key constraints and indexes omitted for simplicity.

```sql
CREATE TABLE groups (
-- Random ID generated by group creator
"id" BLOB PRIMARY KEY NOT NULL,
-- Based on the timestamp of the welcome message
"created_at_ns" BIGINT NOT NULL,
-- Enum of GROUP_MEMBERSHIP_STATE
"membership_state" INT NOT NULL
);

-- Allow for efficient sorting of groups
CREATE INDEX groups_created_at_idx ON groups(created_at_ns);

CREATE INDEX groups_membership_state ON groups(membership_state);

-- Successfully processed messages meant to be returned to the user
CREATE TABLE group_messages (
-- Derived via SHA256(CONCAT(decrypted_message_bytes, conversation_id, timestamp))
"id" BLOB PRIMARY KEY NOT NULL,
"group_id" BLOB NOT NULL,
-- Message contents after decryption
"decrypted_message_bytes" BLOB NOT NULL,
-- Based on the timestamp of the message
"sent_at_ns" BIGINT NOT NULL,
-- Enum GROUP_MESSAGE_KIND
"kind" INT NOT NULL,
-- Could remove this if we added a table mapping installation_ids to wallet addresses
"sender_installation_id" BLOB NOT NULL,
"sender_wallet_address" TEXT NOT NULL,
FOREIGN KEY (group_id) REFERENCES groups(id)
);

CREATE INDEX group_messages_group_id_sort_idx ON group_messages(group_id, sent_at_ns);

-- Used to keep track of the last seen message timestamp in a topic
CREATE TABLE topic_refresh_state (
"topic" TEXT PRIMARY KEY NOT NULL,
"last_message_timestamp_ns" BIGINT NOT NULL
);

-- This table is required to retry messages that do not send successfully due to epoch conflicts
CREATE TABLE group_intents (
-- Serial ID auto-generated by the DB
"id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
-- Enum INTENT_KIND
"kind" INT NOT NULL,
"group_id" BLOB NOT NULL,
-- Some sort of serializable blob that can be used to re-try the message if the first attempt failed due to conflict
"publish_data" BLOB NOT NULL,
-- Data needed after applying a commit, such as welcome messages
"post_commit_data" BLOB NOT NULL,
-- Enum INTENT_STATE
"state" INT NOT NULL,
-- The hash of the encrypted, concrete form of the message, if it was published
"message_hash" BLOB,
FOREIGN KEY (group_id) REFERENCES groups(id)
);

CREATE INDEX group_intents_group_id_id ON group_intents(group_id, id);
```
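
The `SHA256(CONCAT(...))` derivation mentioned in the `group_messages` comments reads as the recipe for a message's `id`. The sketch below shows one way to compute it. The README does not say how the fields are serialized, so two assumptions are made here: the group's `id` plays the role of `conversation_id`, and the timestamp is encoded as an unsigned 64-bit big-endian integer. `derive_message_id` is a hypothetical helper name.

```python
import hashlib
import struct


def derive_message_id(decrypted_message_bytes: bytes, group_id: bytes, sent_at_ns: int) -> bytes:
    """Return SHA256(message bytes || group id || timestamp) as raw bytes."""
    # Assumption: the timestamp is serialized as an unsigned 64-bit big-endian integer.
    timestamp_bytes = struct.pack(">Q", sent_at_ns)
    return hashlib.sha256(decrypted_message_bytes + group_id + timestamp_bytes).digest()
```

Because the digest depends only on the message contents, the group, and the timestamp, two installations that decrypt the same message would arrive at the same `id` under these assumptions.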

## Enums

### GROUP_MEMBERSHIP_STATE

- ALLOWED // User has agreed to be a member of the group
- REJECTED // User has rejected an invite to the group or left
- PENDING // User has neither accepted nor rejected the invite to join the group

### INTENT_STATE

- TO_SEND // Either has never been sent to the network or needs to be re-sent
- PUBLISHED // Sent to the network but has not been read back or committed
- COMMITTED // Committed messages could be deleted

### INTENT_KIND

- SEND_MESSAGE // An intent to send a message to the group
- ADD_MEMBERS // An intent to add members to the group
- REMOVE_MEMBERS // An intent to remove members from the group
- KEY_UPDATE // An intent to update your own group key

### OUTBOUND_WELCOME_STATE

- PENDING // Needs to wait for commit to be applied before sending
- READY_TO_SEND
- SENT // Messages may be deleted at this point. We may decide to remove this state altogether.

### GROUP_MESSAGE_KIND

- APPLICATION
- MEMBER_ADDED
- MEMBER_REMOVED
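
Since the schema stores these enums in `INT` columns, one way to model them in application code is as integer enums. This is only a sketch: the numeric values are assumptions (the README does not assign any), and the transition table for `INTENT_STATE` is inferred from the enum comments and the retry-on-conflict behaviour described in this document.

```python
from enum import IntEnum


# All numeric values below are assumptions; the README does not define them.
class GroupMembershipState(IntEnum):
    ALLOWED = 1
    REJECTED = 2
    PENDING = 3


class IntentState(IntEnum):
    TO_SEND = 1
    PUBLISHED = 2
    COMMITTED = 3


class IntentKind(IntEnum):
    SEND_MESSAGE = 1
    ADD_MEMBERS = 2
    REMOVE_MEMBERS = 3
    KEY_UPDATE = 4


class OutboundWelcomeState(IntEnum):
    PENDING = 1
    READY_TO_SEND = 2
    SENT = 3


class GroupMessageKind(IntEnum):
    APPLICATION = 1
    MEMBER_ADDED = 2
    MEMBER_REMOVED = 3


# Inferred transitions: a published intent is either committed or, on conflict, reset to TO_SEND.
ALLOWED_INTENT_TRANSITIONS = {
    IntentState.TO_SEND: {IntentState.PUBLISHED},
    IntentState.PUBLISHED: {IntentState.COMMITTED, IntentState.TO_SEND},
    IntentState.COMMITTED: set(),
}


def can_transition(current: IntentState, new: IntentState) -> bool:
    """Return True if moving an intent from `current` to `new` is allowed."""
    return new in ALLOWED_INTENT_TRANSITIONS[current]
```

An `IntEnum` member converts to and from the stored integer directly, for example `IntentState(row_state)` when reading a row and `int(IntentState.TO_SEND)` when writing one.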

## State Machine

The [following diagram](https://app.excalidraw.com/s/4nwb0c8ork7/6pPH1kQDoj3) illustrates some common flows in the state machine.

![MLS State Machine](../img/mls-state-machine.png "MLS State Machine")

For the first version of MLS in XMTP, all members commit their own proposals immediately, and immediately discard any proposals from other members upon receiving them. Future versions of XMTP will have more sophisticated logic, such as batching proposals, allowing members to commit proposals from other members, as well as more sophisticated validation logic for which proposals are permitted from which members.

### Known missing items from the state machine

- Key updates
- Processing incoming welcome messages
- Tracking group membership at the account/user level
- Permissioning for adding/removing accounts/users
- Mechanism for syncing installations under each account/user

### Add members to a group

Simplified high-level flow for adding members to a group:

1. Create a `group_intent` for adding the members
1. Consume Key Packages for all new members
1. Convert the intent into concrete commit and welcome messages for the current epoch
1. Write the welcome messages to the `post_commit_data` field for later
1. Publish commit message
1. Sync the state of the group with the network
1. If no conflicts: Publish welcome messages to new members.
If conflicts: Go back to step 2 and try again (reset the intent's state to `TO_SEND` and clear the `publish_data` and `post_commit_data` fields)
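
The retry loop in the steps above can be sketched as follows. Everything here is illustrative: the `AddMembersIntent` class, the four callables (`build_commit`, `publish`, `sync_has_conflict`, `send_welcomes`), and the `max_attempts` bound are hypothetical stand-ins for the real MLS and network operations, which this README does not specify. State names are kept as strings for brevity.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class AddMembersIntent:
    members: List[str]
    state: str = "TO_SEND"
    publish_data: Optional[bytes] = None
    post_commit_data: Optional[bytes] = None


def add_members(
    intent: AddMembersIntent,
    build_commit: Callable[[List[str]], Tuple[bytes, bytes]],
    publish: Callable[[bytes], None],
    sync_has_conflict: Callable[[bytes], bool],
    send_welcomes: Callable[[bytes], None],
    max_attempts: int = 5,  # Assumption: the README does not bound the number of retries.
) -> bool:
    for _ in range(max_attempts):
        # Steps 2-4: consume key packages, build the commit and welcomes for the
        # current epoch, and stash the welcomes for after the commit is applied.
        commit, welcomes = build_commit(intent.members)
        intent.publish_data = commit
        intent.post_commit_data = welcomes

        # Step 5: publish the commit.
        publish(commit)
        intent.state = "PUBLISHED"

        # Step 6: sync with the network and check whether our commit was applied.
        if not sync_has_conflict(commit):
            # Step 7, no conflicts: the welcomes can now go out.
            send_welcomes(welcomes)
            intent.state = "COMMITTED"
            return True

        # Step 7, conflicts: reset the intent and retry from step 2.
        intent.state = "TO_SEND"
        intent.publish_data = None
        intent.post_commit_data = None
    return False
```

Holding the welcome messages in `post_commit_data` until the sync confirms the commit means new members are never welcomed into an epoch that lost a conflict.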

### Remove members from a group

Simplified high-level flow for removing members from a group:

1. Create a `group_intent` for removing the members
1. Convert the intent into concrete commit for the current epoch
1. Publish commit to the network
1. Sync the state of the group with the network
1. If no conflicts: Done.
If conflicts: Go back to step 2 and try again (reset the intent's state to `TO_SEND` and clear the `publish_data` and `post_commit_data` fields)

### Send a message

Simplified high-level flow for sending a group message:

1. Create a `group_intent` for sending the message
1. Convert the intent into a concrete message for the current epoch
1. Publish message to the network
1. Sync the state of the group with the network (can be debounced or otherwise only done periodically)
1. If no conflicts: Mark the message as committed.
If conflicts: Go back to step 2 and try again (reset the intent's state to `TO_SEND` and clear the `publish_data` and `post_commit_data` fields)
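
To show how an intent row might move through `INTENT_STATE` during this flow, here is a minimal sketch against a trimmed copy of the `group_intents` table (foreign key omitted). Several things are assumptions: the integer values of the states, SHA-256 as the `message_hash` function, and the helper names. How a conflict is detected is left to the caller. The sketch also keeps `publish_data` on conflict, reading it as the retry blob described in the schema comment; an implementation that stores the encrypted form there would clear it instead.

```python
import hashlib
import sqlite3

# Assumed integer encodings; the README does not fix these values.
TO_SEND, PUBLISHED, COMMITTED = 1, 2, 3
SEND_MESSAGE = 1


def open_intent_db() -> sqlite3.Connection:
    """In-memory database holding a trimmed copy of the group_intents table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE group_intents (
            id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
            kind INT NOT NULL,
            group_id BLOB NOT NULL,
            publish_data BLOB NOT NULL,
            post_commit_data BLOB NOT NULL,
            state INT NOT NULL,
            message_hash BLOB
        )
        """
    )
    return conn


def create_send_intent(conn: sqlite3.Connection, group_id: bytes, message: bytes) -> int:
    """Step 1: record the intent before anything is sent."""
    with conn:
        cur = conn.execute(
            "INSERT INTO group_intents (kind, group_id, publish_data, post_commit_data, state) "
            "VALUES (?, ?, ?, ?, ?)",
            (SEND_MESSAGE, group_id, message, b"", TO_SEND),
        )
    return cur.lastrowid


def mark_published(conn: sqlite3.Connection, intent_id: int, encrypted_payload: bytes) -> bytes:
    """Step 3: remember the hash of what was published so it can be recognized on sync."""
    message_hash = hashlib.sha256(encrypted_payload).digest()  # Assumption: SHA-256
    with conn:
        conn.execute(
            "UPDATE group_intents SET state = ?, message_hash = ? WHERE id = ?",
            (PUBLISHED, message_hash, intent_id),
        )
    return message_hash


def resolve_published(conn: sqlite3.Connection, message_hash: bytes, conflict: bool) -> None:
    """Step 5: commit the intent, or reset it to TO_SEND if the sync found a conflict."""
    with conn:
        if conflict:
            conn.execute(
                "UPDATE group_intents SET state = ?, message_hash = NULL "
                "WHERE message_hash = ? AND state = ?",
                (TO_SEND, message_hash, PUBLISHED),
            )
        else:
            conn.execute(
                "UPDATE group_intents SET state = ? WHERE message_hash = ? AND state = ?",
                (COMMITTED, message_hash, PUBLISHED),
            )
```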

### Syncing group state
Contributor

@neekolas and I tentatively discussed something like this for handling concurrency.

Would love feedback @neekolas, @Bren2010, @insipx! I wrote up a lot of the rationale so that it's easier to follow, but not sure if the rationale belongs in this doc.

Contributor Author

This makes sense to me. I like it, and it feels straightforward enough to implement. Love removing the locking mechanism from the topic_refresh_state table.


The latest payloads on a group could be synced from the server in the following cases:

- Push notifications
- Application-triggered subscription
- Application-triggered pull
- Commit publishing flow

Any syncing strategy must be able to handle the following constraints:

- Payload syncing could be initiated concurrently from multiple locations
- Due to forward secrecy constraints, each payload may only be decrypted successfully once

There are two possible strategies, each with its own limitations:

- Co-ordinated: Syncing can only happen in one location at a time via locks/queues
- Unco-ordinated: Allow syncing to happen in parallel

The latter is simpler to implement in the short term, but raises the following potential challenges:

- How to handle concurrent decryption failures and return the latest data regardless
- How to handle updating the `last_message_timestamp_ns` on the `topic_refresh_state` table
- How to know if a failure is due to the message having already been decrypted, or to a permanent failure

For the initial version, this simple strategy can be used to pull the latest payloads:

1. Read the `last_message_timestamp_ns` from the database and pull all payloads from the server with a timestamp greater than it
1. For each payload, attempt to decrypt it
1. If it succeeds, process the payload. Write the result, update the cryptographic state, and update the `last_message_timestamp_ns`, together in a single transaction. Set `last_message_timestamp_ns` to the larger value out of the value in the database and the payload's timestamp.
1. If it fails, only attempt to update `last_message_timestamp_ns` in the database to the larger value out of the value in the database and the payload's timestamp.
1. To return the result of the sync, pull the latest data from the database rather than using the in-memory data from the syncing process

This strategy effectively means that the processing of each payload succeeds or fails atomically. In the event of failure due to concurrency, the actual result can be read from the database.
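
A minimal sketch of this strategy, using SQLite through Python's standard library, is shown below. The tables are trimmed copies of `topic_refresh_state` and `group_messages`. The names `sync_topic`, `fetch_since`, and `try_decrypt` are hypothetical, using the payload ID as the message `id` is an assumption, and the cryptographic state update that belongs in the same transaction is not modelled.

```python
import sqlite3
from typing import Callable, Iterable, List, Optional, Tuple

Payload = Tuple[bytes, int, bytes]  # (payload_id, timestamp_ns, ciphertext)


def open_sync_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE topic_refresh_state (
            topic TEXT PRIMARY KEY NOT NULL,
            last_message_timestamp_ns BIGINT NOT NULL
        );
        CREATE TABLE group_messages (
            id BLOB PRIMARY KEY NOT NULL,
            group_id BLOB NOT NULL,
            decrypted_message_bytes BLOB NOT NULL,
            sent_at_ns BIGINT NOT NULL
        );
        """
    )
    return conn


def sync_topic(
    conn: sqlite3.Connection,
    topic: str,
    group_id: bytes,
    fetch_since: Callable[[int], Iterable[Payload]],
    try_decrypt: Callable[[bytes], Optional[bytes]],
) -> List[bytes]:
    # Step 1: read the cursor and pull newer payloads.
    row = conn.execute(
        "SELECT last_message_timestamp_ns FROM topic_refresh_state WHERE topic = ?", (topic,)
    ).fetchone()
    since = row[0] if row else 0

    for payload_id, timestamp_ns, ciphertext in fetch_since(since):
        # Step 2: attempt decryption; None stands for any decryption failure.
        plaintext = try_decrypt(ciphertext)
        with conn:  # one transaction per payload: all writes commit or roll back together
            if plaintext is not None:
                # Step 3: store the result (the cryptographic state update would go here too).
                conn.execute(
                    "INSERT OR IGNORE INTO group_messages VALUES (?, ?, ?, ?)",
                    (payload_id, group_id, plaintext, timestamp_ns),
                )
            # Steps 3 and 4: the cursor only ever moves forward.
            conn.execute("INSERT OR IGNORE INTO topic_refresh_state VALUES (?, 0)", (topic,))
            conn.execute(
                "UPDATE topic_refresh_state "
                "SET last_message_timestamp_ns = MAX(last_message_timestamp_ns, ?) "
                "WHERE topic = ?",
                (timestamp_ns, topic),
            )

    # Step 5: return what the database holds, not what this run decrypted.
    rows = conn.execute(
        "SELECT decrypted_message_bytes FROM group_messages WHERE group_id = ? ORDER BY sent_at_ns",
        (group_id,),
    ).fetchall()
    return [r[0] for r in rows]
```

Reading the result back from the database in the last step is what lets a sync that lost a decryption race still return the messages a concurrent sync stored.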

For now, we can put off the issue of detecting if a decryption failure is due to concurrency or permanent failure. If OpenMLS cryptographic state is entirely database-driven, we may be able to detect that a failure is due to concurrency by the fact that `last_message_timestamp_ns` has already been updated. If OpenMLS cryptographic state is partially driven by in-memory data, we can record per-payload successes and failures in a separate table, with successes always overwriting failures.
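
The second option, a separate table of per-payload outcomes in which a success always overwrites a failure, could look like the sketch below. The `payload_outcomes` table and the helper names are hypothetical and not part of the schema in this README.

```python
import sqlite3
from typing import Optional


def open_outcome_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Hypothetical table: one row per payload, succeeded is 0 or 1.
    conn.execute(
        "CREATE TABLE payload_outcomes (payload_id BLOB PRIMARY KEY NOT NULL, succeeded INT NOT NULL)"
    )
    return conn


def record_outcome(conn: sqlite3.Connection, payload_id: bytes, succeeded: bool) -> None:
    with conn:
        # Insert a failure row if none exists; an existing row is left untouched.
        conn.execute(
            "INSERT OR IGNORE INTO payload_outcomes (payload_id, succeeded) VALUES (?, 0)",
            (payload_id,),
        )
        # A success upgrades the row; a failure never downgrades it.
        if succeeded:
            conn.execute(
                "UPDATE payload_outcomes SET succeeded = 1 WHERE payload_id = ?", (payload_id,)
            )


def get_outcome(conn: sqlite3.Connection, payload_id: bytes) -> Optional[bool]:
    """Return True/False for a recorded payload, or None if it was never processed."""
    row = conn.execute(
        "SELECT succeeded FROM payload_outcomes WHERE payload_id = ?", (payload_id,)
    ).fetchone()
    return None if row is None else bool(row[0])
```

With this table, a payload whose only recorded outcomes are failures is a candidate for a permanent failure, while one with any recorded success failed elsewhere only because of concurrency.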

### Updating your list of conversations

1. Read from the welcome topic for your `installation_id`, filtering for messages since `last_message_timestamp_ns`
1. For each message, create a group with a `GROUP_MEMBERSHIP_STATE` of `PENDING`
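
These two steps can be sketched against the `groups` and `topic_refresh_state` tables as follows. The helper names, the `fetch_since` and `extract_group_id` callables, and the integer value of `PENDING` are assumptions. The welcome topic name is passed in because its format is not given here, and real welcome processing (listed above as a known missing item) is not modelled.

```python
import sqlite3
from typing import Callable, Iterable, List, Tuple

PENDING = 3  # Assumed integer encoding of GROUP_MEMBERSHIP_STATE.PENDING


def open_groups_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE groups (
            id BLOB PRIMARY KEY NOT NULL,
            created_at_ns BIGINT NOT NULL,
            membership_state INT NOT NULL
        );
        CREATE TABLE topic_refresh_state (
            topic TEXT PRIMARY KEY NOT NULL,
            last_message_timestamp_ns BIGINT NOT NULL
        );
        """
    )
    return conn


def refresh_conversations(
    conn: sqlite3.Connection,
    welcome_topic: str,
    fetch_since: Callable[[int], Iterable[Tuple[int, bytes]]],
    extract_group_id: Callable[[bytes], bytes],
) -> List[bytes]:
    row = conn.execute(
        "SELECT last_message_timestamp_ns FROM topic_refresh_state WHERE topic = ?",
        (welcome_topic,),
    ).fetchone()
    since = row[0] if row else 0

    for timestamp_ns, welcome_bytes in fetch_since(since):
        group_id = extract_group_id(welcome_bytes)
        with conn:
            # created_at_ns follows the welcome timestamp, per the schema comment.
            # OR IGNORE keeps an existing group's membership_state untouched.
            conn.execute(
                "INSERT OR IGNORE INTO groups (id, created_at_ns, membership_state) VALUES (?, ?, ?)",
                (group_id, timestamp_ns, PENDING),
            )
            conn.execute("INSERT OR IGNORE INTO topic_refresh_state VALUES (?, 0)", (welcome_topic,))
            conn.execute(
                "UPDATE topic_refresh_state "
                "SET last_message_timestamp_ns = MAX(last_message_timestamp_ns, ?) "
                "WHERE topic = ?",
                (timestamp_ns, welcome_topic),
            )

    return [r[0] for r in conn.execute("SELECT id FROM groups ORDER BY created_at_ns")]
```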