
Move staged commit to intents #984

Merged 1 commit from 08-22-move_staged_commit_to_intents into main on Aug 23, 2024

Conversation

@neekolas (Contributor) commented Aug 22, 2024

tl;dr

  • Moves staged_commits from being handled by OpenMLS to being stored on each intent
  • Adds a per-group mutex for sync operations, so only one sync can run at a time per group (a sketch of this follows the list)
  • Creates database columns for storing the staged commit and the epoch the commit was published
  • Updates a test for message conflicts to better check for forked group states
  • I found some cases where we were calling publish_intents and then immediately calling sync. The second call is unnecessary, since sync already calls publish_intents internally
  • Adds tests for parallel and reentrant syncs
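
Here is a minimal sketch of what a per-group sync lock can look like, assuming tokio; the SyncLocks/lock_for names and the group-ID-keyed map are illustrative assumptions, not the PR's actual implementation.

use std::collections::HashMap;
use std::sync::Arc;

use tokio::sync::Mutex;

// Hypothetical registry mapping a group ID to its sync lock. Holding a
// group's mutex for the duration of a sync serializes syncs for that group;
// concurrent callers await the same lock instead of racing.
#[derive(Default)]
pub struct SyncLocks {
    locks: Mutex<HashMap<Vec<u8>, Arc<Mutex<()>>>>,
}

impl SyncLocks {
    pub async fn lock_for(&self, group_id: &[u8]) -> Arc<Mutex<()>> {
        let mut map = self.locks.lock().await;
        map.entry(group_id.to_vec())
            .or_insert_with(|| Arc::new(Mutex::new(())))
            .clone()
    }
}

// Usage inside a hypothetical sync:
//   let lock = locks.lock_for(&group_id).await;
//   let _guard = lock.lock().await; // only one sync per group past this point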

More Info

https://github.com/xmtp/libxmtp/issues/979

@neekolas (Contributor, Author) commented Aug 22, 2024

This stack of pull requests is managed by Graphite.

@neekolas force-pushed the 08-22-move_staged_commit_to_intents branch 3 times, most recently from 805f3f5 to 51dbd6e on August 22, 2024 21:11
@neekolas force-pushed the 08-22-move_staged_commit_to_intents branch 4 times, most recently from 492e673 to 58bbcd2 on August 23, 2024 00:24
@neekolas mentioned this pull request Aug 23, 2024
@neekolas force-pushed the 08-22-move_staged_commit_to_intents branch 4 times, most recently from 2c6cb9e to 1e535db on August 23, 2024 05:48
@@ -1,4 +1,4 @@
 #!/bin/bash
 set -eou pipefail

-docker-compose -f dev/docker/docker-compose.yml -p "libxmtp" "$@"
+docker compose -f dev/docker/docker-compose.yml -p "libxmtp" "$@"
neekolas (Contributor, Author):

The GH action runner has been upgraded and no longer has a separate docker-compose command. It's now a subcommand of docker only.

@neekolas force-pushed the 08-22-move_staged_commit_to_intents branch from 1e535db to 5d277d6 on August 23, 2024 05:58
@neekolas force-pushed the 08-22-move_staged_commit_to_intents branch from 5d277d6 to c9f0f10 on August 23, 2024 15:34
@neekolas marked this pull request as ready for review on August 23, 2024 15:51
@neekolas requested a review from a team as a code owner on August 23, 2024 15:51
@@ -455,11 +464,6 @@ impl MlsGroup {
             Self::into_envelope(message, now)
         });

-        // Skipping a full sync here and instead just firing and forgetting
-        if let Err(err) = self.publish_intents(&provider, client).await {
neekolas (Contributor, Author):

This gets handled inside sync_until_intent_resolved. No need to call it twice

@@ -479,7 +483,6 @@ impl MlsGroup {
         let update_interval = Some(5_000_000);
         self.maybe_update_installations(&provider, update_interval, client)
             .await?;
-        self.publish_intents(&provider, client).await?;
neekolas (Contributor, Author):

Same as above

@@ -66,7 +66,8 @@ impl MlsGroup {
         );

         if let Some(GroupError::ReceiveError(_)) = process_result.as_ref().err() {
-            self.sync(&client).await?;
+            self.sync_with_conn(&client.mls_provider()?, &client)
neekolas (Contributor, Author):

This change alone is going to resolve a lot of thrash and noise. Previously we were calling sync whenever we received a commit from a stream, which would attempt to add missing members first.

Every member of the group who was online would try to do it at the same time, leading to a bunch of commits that would get thrown out.

We don't need to do this as part of stream processing.
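
To make the distinction concrete, here is a rough sketch of the two entry points as this PR's diffs suggest them; the stub types and method bodies are assumptions, not the real libxmtp code.

// Stub types so this sketch compiles standalone; in libxmtp these are the
// real client, provider, and error types.
struct Client;
struct Provider;
#[derive(Debug)]
struct GroupError;
struct MlsGroup;

impl Client {
    fn mls_provider(&self) -> Result<Provider, GroupError> {
        Ok(Provider)
    }
}

impl MlsGroup {
    // `sync` fronts the expensive path: it may first publish commits that add
    // missing members, which is the thrash described above when every online
    // member reacts to the same streamed commit at once.
    async fn sync(&self, client: &Client) -> Result<(), GroupError> {
        let provider = client.mls_provider()?;
        self.maybe_update_installations(&provider, Some(5_000_000), client)
            .await?;
        self.sync_with_conn(&provider, client).await
    }

    // `sync_with_conn` only publishes pending intents and processes incoming
    // messages; it never adds missing members, so it is cheap enough to call
    // from stream processing.
    async fn sync_with_conn(
        &self,
        provider: &Provider,
        client: &Client,
    ) -> Result<(), GroupError> {
        if let Err(err) = self.publish_intents(provider, client).await {
            eprintln!("publish error during sync: {err:?}"); // tolerated
        }
        self.receive(provider, client).await
    }

    async fn maybe_update_installations(
        &self,
        _provider: &Provider,
        _update_interval: Option<i64>,
        _client: &Client,
    ) -> Result<(), GroupError> {
        Ok(())
    }

    async fn publish_intents(&self, _p: &Provider, _c: &Client) -> Result<(), GroupError> {
        Ok(())
    }

    async fn receive(&self, _p: &Provider, _c: &Client) -> Result<(), GroupError> {
        Ok(())
    }
}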

Reviewer (Contributor):

Nice catch. I'm thinking that another optimization would be to update sync so that if we sync during the maybe_update_installations process (add_missing_installations => sync_until_intent_resolved), we don't need to do a second sync_with_conn right afterwards.

Reviewer (Contributor):

Another potential optimization - the main reason we need to call sync, rather than directly processing a streamed commit, is because streams are not guaranteed to deliver the commits in order, while syncs are.

If we can guarantee stream ordering from the server (the same way we're doing it for replication):

  • We don't need to call sync while streaming
  • We may not need to call sync while publishing, especially not repeated syncs (can handle it via stream instead, potentially the same long-running stream can handle all commits)

@neekolas (Contributor, Author) commented Aug 23, 2024

Merge activity

  • Aug 23, 3:37 PM PDT: @neekolas started a stack merge that includes this pull request via Graphite.
  • Aug 23, 3:38 PM PDT: @neekolas merged this pull request with Graphite.

@richardhuaaa (Contributor) left a comment:
These are critical reliability improvements, well done. Left a few questions about things I'm unsure about, but I could be missing details.

@@ -0,0 +1,7 @@
+-- Your SQL goes here
Reviewer (Contributor):

Can remove this comment, here and above

neekolas (Contributor, Author):

Good call

xmtp_mls/src/groups/sync.rs (resolved)
Comment on lines +1357 to +1358
        // TODO: remove clone
        if let Some(commit) = openmls_group.clone().pending_commit() {
Reviewer (Contributor):

Wondering if we can use the commit as a reference rather than cloning the group?

neekolas (Contributor, Author):

Compiler didn't like that because openmls_group is a mutable reference. I'm sure there's some hack to make it work.
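
A toy reproduction of that borrow conflict, with made-up types: holding the reference returned by pending_commit() across a mutation of the group is rejected, while borrowing from a temporary clone is fine.

#[derive(Clone)]
struct Group {
    pending: Option<String>,
    epoch: u64,
}

impl Group {
    fn pending_commit(&self) -> Option<&String> {
        self.pending.as_ref()
    }

    fn merge_pending_commit(&mut self) {
        self.epoch += 1;
        self.pending = None;
    }
}

fn persist(commit: &str) {
    println!("persisting commit: {commit}");
}

fn store_then_merge(group: &mut Group) {
    // Rejected by the borrow checker: `commit` immutably borrows `*group`,
    // which the mutable call below would invalidate.
    //
    // if let Some(commit) = group.pending_commit() {
    //     persist(commit);
    //     group.merge_pending_commit(); // ERROR: `*group` already borrowed
    // }

    // The workaround from the diff: `commit` borrows a temporary clone, so
    // `group` itself stays free for mutation inside the block.
    if let Some(commit) = group.clone().pending_commit() {
        persist(commit);
        group.merge_pending_commit();
    }
}

fn main() {
    let mut group = Group { pending: Some("add-member commit".into()), epoch: 0 };
    store_then_merge(&mut group);
    assert_eq!(group.epoch, 1);
}

If the commit type is Clone, a narrower fix would be group.pending_commit().cloned(), which copies only the commit rather than the whole group.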

xmtp_mls/src/groups/sync.rs (resolved)
Comment on lines +845 to +861
        provider.conn_ref().set_group_intent_published(
            intent.id,
            sha256(payload_slice),
            post_commit_action,
            staged_commit,
            openmls_group.epoch().as_u64() as i64,
        )?;
        log::debug!(
            "client [{}] set stored intent [{}] to state `published`",
            client.inbox_id(),
            intent.id
        );

        client
            .api_client
            .send_group_messages(vec![payload_slice])
            .await?;
Reviewer (Contributor):

Now that we're marking the intent as published before sending it, if the publish fails, either:

  • Network error returned from send_group_messages
  • App dies before the request goes through

Do we have any way of making sure the intent gets re-sent?

neekolas (Contributor, Author):

In the case of a network error we bubble the error back up to the caller and they can choose to retry.

In the case of the app dying, that's the big trade-off of this approach. It's just permanently in limbo. The difference from some of the alternative solutions is that it won't brick the group.
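
For the network-error branch, the caller can wrap the publish in a retry. A hypothetical sketch with stand-in types follows; note it does not help the app-death case, which stays in limbo as described.

#[derive(Debug)]
struct GroupError;

// Stand-in for MlsGroup::publish_intents; illustrative only.
async fn publish_intents() -> Result<(), GroupError> {
    Ok(())
}

// Hypothetical caller-side retry for transient network failures. The
// 3-attempt policy is an assumption, not part of the PR.
async fn publish_with_retry() -> Result<(), GroupError> {
    let mut attempts = 0;
    loop {
        match publish_intents().await {
            Ok(()) => return Ok(()),
            Err(err) if attempts < 3 => {
                attempts += 1;
                eprintln!("publish failed ({err:?}), retry {attempts}/3");
            }
            Err(err) => return Err(err),
        }
    }
}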

Reviewer (Contributor):

Just FYI, we are not bubbling it up in the case of send_message<ApiClient>(). And in the case of send_message_optimistic, the message would have already been rendered to the UI, and the error from the publish may or may not be bubbled up to the user

neekolas (Contributor, Author):

> we are not bubbling it up in the case of send_message().

We would be, because sync_until_last_intent_resolved will never finish.

Comment on lines +869 to +872
        if has_staged_commit {
            log::info!("Canceling all further publishes, since a commit was found");
            return Err(GroupError::PublishCancelled);
        }
Reviewer (Contributor):

This works if this method (publish_intents) is only called once, but won't further publishes still happen if publish_intents is called again? Then we would skip past this published intent because it is no longer in TO_PUBLISH state?

Reviewer (Contributor):

Also, are we sure we want to return an error here, rather than returning success? Won't an error here stop us from calling receive() within sync_with_conn(), and finishing off the intent lifecycle?

neekolas (Contributor, Author):

We continue through publish_intents errors in sync_with_conn. In the old method we would get them frequently if you had two intents lined up that both created commits, because they would trigger PendingCommit errors.

I did actually make that change in one of the down-stack PRs just to clean up error handling.
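
For orientation, here is the intent lifecycle this thread implies, sketched as a Rust enum; only the TO_PUBLISH and `published` states are named above, so the remaining variants are assumptions.

// Rough intent lifecycle implied by this thread; illustrative, not the
// actual libxmtp state machine.
#[allow(dead_code)]
enum IntentState {
    ToPublish, // created locally; publish_intents picks these up
    Published, // set just before send_group_messages (the limbo window)
    Committed, // the published commit came back over the network and merged
    Error,     // e.g. publishing was cancelled after a staged commit
}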

@neekolas merged commit 846b635 into main on Aug 23, 2024 (7 checks passed)
@neekolas deleted the 08-22-move_staged_commit_to_intents branch on August 23, 2024 22:38