fix: improve stability of MLS 1-1 conversations #2063

Merged
typfel merged 7 commits into epic/mls-one2one from fix/mls-hardening on Sep 18, 2023

Conversation

typfel

@typfel typfel commented Sep 14, 2023


PR Submission Checklist for internal contributors

  • The PR Title

    • conforms to the style of semantic commit messages¹ supported in Wire's GitHub Workflow²
    • contains a reference JIRA issue number like SQPIT-764
    • answers the question: If merged, this PR will: ... ³
  • The PR Description

    • is free of optional paragraphs and you have filled the relevant parts to the best of your ability

What's new in this PR?

Issues

During testing, several issues were identified:

  1. Slow sync fails on non-recoverable errors and therefore gets stuck
  2. Message migration for 1-1 conversations would sometimes fail with a constraint failure on the primary key
  3. When establishing 1-1 MLS groups, we didn't add the other self clients
  4. Establishing an MLS group would sometimes fail if the group had already been created locally
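
The first and fourth issues can be illustrated with a minimal sketch. It is not the code from this PR: `Either` below is a small stand-in rather than Kalium's own class, and `SyncFailure`, `resolveAll`, `LocalGroups` and `establishGroup` are hypothetical names chosen for illustration. The idea is that a failure which retrying cannot fix (such as missing key packages) is recorded and skipped so the sync step can finish, while establishing a group that already exists locally is treated as success.

```kotlin
// Minimal stand-in for an Either type (assumption: not Kalium's actual class).
sealed class Either<out L, out R> {
    data class Left<out L>(val value: L) : Either<L, Nothing>()
    data class Right<out R>(val value: R) : Either<Nothing, R>()
}

// Hypothetical failure model for this sketch.
sealed class SyncFailure {
    data class Recoverable(val reason: String) : SyncFailure()    // e.g. a network error
    data class NonRecoverable(val reason: String) : SyncFailure() // e.g. missing key packages
}

// Resolves the 1-1 conversation for every user. Non-recoverable failures are
// collected and skipped; only a recoverable failure aborts the whole step.
fun resolveAll(
    userIds: List<String>,
    resolve: (String) -> Either<SyncFailure, Unit>
): Either<SyncFailure, List<String>> {
    val skipped = mutableListOf<String>()
    for (userId in userIds) {
        when (val result = resolve(userId)) {
            is Either.Right -> Unit
            is Either.Left -> when (val failure = result.value) {
                is SyncFailure.NonRecoverable -> skipped += userId
                is SyncFailure.Recoverable -> return Either.Left(failure)
            }
        }
    }
    return Either.Right(skipped)
}

// Hypothetical in-memory stand-in for the local MLS group store.
class LocalGroups {
    private val groups = mutableSetOf<String>()
    fun exists(groupId: String): Boolean = groupId in groups
    fun create(groupId: String) {
        check(groups.add(groupId)) { "group $groupId already exists" }
    }
}

// Idempotent establish: an existing local group is re-used instead of failing.
// Returns true if the group was created by this call, false if it was re-used.
fun establishGroup(groups: LocalGroups, groupId: String): Boolean {
    if (groups.exists(groupId)) return false
    groups.create(groupId)
    return true
}

fun main() {
    val result = resolveAll(listOf("alice", "bob", "carol")) { userId ->
        if (userId == "bob") Either.Left(SyncFailure.NonRecoverable("no key packages"))
        else Either.Right(Unit)
    }
    println(result) // the skipped user ids, wrapped in Either.Right

    val groups = LocalGroups()
    println(establishGroup(groups, "group-1")) // created on the first call
    println(establishGroup(groups, "group-1")) // re-used on the second call
}
```

Returning the skipped users rather than dropping them silently leaves room for a later retry or a log entry, which is one possible design and not necessarily what the PR does.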

Testing

Test Coverage

  • I have added automated tests to this contribution

PR Post Submission Checklist for internal contributors (Optional)

  • Wire's Github Workflow has automatically linked the PR to a JIRA issue

PR Post Merge Checklist for internal contributors

  • If any sort of configuration variable was introduced by this PR, it has been added to the relevant documents and the CI jobs have been updated.

References
  1. https://sparkbox.com/foundry/semantic_commit_messages
  2. https://github.com/wireapp/.github#usage
  3. E.g. feat(conversation-list): Sort conversations by most emojis in the title #SQPIT-764.

github-actions bot commented Sep 14, 2023

Unit Test Results

   423 files  ±0     423 suites  ±0   24s ⏱️ -1s
2 372 tests +1  2 254 ✔️ +1  118 💤 ±0  0 ±0 

Results for commit 8883ef2. ± Comparison against base commit 74cafe3.

♻️ This comment has been updated with latest results.

@codecov-commenter
Codecov Report

❗ No coverage uploaded for pull request base (epic/mls-one2one@74cafe3).
The diff coverage is n/a.

@@                 Coverage Diff                 @@
##             epic/mls-one2one    #2063   +/-   ##
===================================================
  Coverage                    ?   58.11%           
  Complexity                  ?       24           
===================================================
  Files                       ?     1025           
  Lines                       ?    38836           
  Branches                    ?     3599           
===================================================
  Hits                        ?    22571           
  Misses                      ?    14726           
  Partials                    ?     1539           

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 74cafe3...8883ef2. Read the comment docs.

@datadog-wireapp
datadog-wireapp bot commented Sep 14, 2023

Datadog Report

All test runs b574d02 🔗

2 Total Test Services: 0 Failed, 0 with New Flaky, 2 Passed

Test Services
| Service Name | Failed | Known Flaky | New Flaky | Passed | Skipped | Wall Time |
|---|---|---|---|---|---|---|
| kalium-ios | 0 | 0 | 0 | 2254 | 118 | 10m 41.01s |
| kalium-jvm | 0 | 0 | 0 | 2373 | 101 | 10m 52.01s |

@vitorhugods vitorhugods left a comment

👨🏻‍🍳🤌🏻

} else {
Either.Left(it)
}

@typfel typfel merged commit c894667 into epic/mls-one2one Sep 18, 2023
16 checks passed
@typfel typfel deleted the fix/mls-hardening branch September 18, 2023 14:53
typfel added a commit that referenced this pull request Sep 19, 2023
* fix: don't fail migration query if messages have already been copied

* fix: don't fail the slow sync on a non-recoverable error when resolving 1-1s

* fix: don't fail the slow sync when establishing 1-1 fails due to missing key packages

* fix: re-use existing mls group if it exists

* fix: establish 1-1 also with other self clients

* test: add missing test for establishing 1-1

* chore: fix detekt
typfel added a commit that referenced this pull request Sep 19, 2023
typfel added a commit that referenced this pull request Sep 26, 2023
typfel added a commit that referenced this pull request Oct 4, 2023
typfel added a commit that referenced this pull request Oct 4, 2023
typfel added a commit that referenced this pull request Oct 5, 2023
typfel added a commit that referenced this pull request Oct 6, 2023
typfel added a commit that referenced this pull request Oct 10, 2023
typfel added a commit that referenced this pull request Oct 11, 2023
typfel added a commit that referenced this pull request Oct 13, 2023
typfel added a commit that referenced this pull request Oct 13, 2023
@typfel typfel mentioned this pull request Oct 15, 2023
6 tasks
github-merge-queue bot pushed a commit that referenced this pull request Oct 15, 2023
* feat(mls-migration): fetch migration configuration every 24 hours #1 (#1728)

* feat(mls-migration): migrate proteus conversations to "mixed" protocol #2 (#1729)

* feat: add api support for updating the conversation protocol

* fix: rollback when we lost the race to establish a MLS group

* feat: add method for migrating from proteus to mixed

* feat: allow controlling migration update interval in CLI

* fix: use correct use case for checking if we can register an MLS client

* chore: add debug response since BE is not implemented yet

* test: add test for aborting when losing the race to establish an mls group

* test: add network test for updating protocol

* test: add MLSMigrationManager tests

* test: add MLSMigrator tests

* chore: fix detekt

* chore: make method private

* refactor: add sql query for fetching proteus team conversations

* feat(mls-migration): support the mixed protocol #3 (#1747)

* feat(mls-migration): supported protocols for users #4 (#1786)

* feat: add API support for supported-protocols

* feat: support persisting supported protocols

* feat: add mapping of supported protocols in logic module

* chore: fix detekt

* refactor: rename UserProtocol to SupportedProtocolDTO

* chore: move BaseApi interface to its own file

* chore: fix imports after rebase

* fix: persist supported protocols locally when API request succeeds

* chore: remove debug line

* fix: map supported protocols when fetching one-to-one conversation details

* refactor: move default value for supportedProtocols to the logic mapping layer

* feat(mls-migration): add use case for updating self supported protocols #5 (#1797)

* feat: add use case for calculating & updating your supported protocols

* test: add UpdateSupportedProtocolsUseCase tests

* refactor: simplify UpdateSupportedProtocolsUseCase

* test: update feature config test data

* chore: add mlsPublicKeys to NewClientDTO

* feat: persist if a self client is mls capable or not

* feat: update supported protocols from user.update events

* fix: migration (altering column positions not supported)

* chore: clean up disabled migration configuration

* feat(mls-migration): update conversation protocol to MLS when every member supports MLS #6 (#1802)

* feat: support fetching conversation which can be finalised

* feat: add function for finalising migration of team group conversations

* refactor: move methods for updating conversation protocol to the conversation repository

* chore: fix copy & paste error

* refactor: don't return a flow when fetching conversations for migration

* chore: rebase clean up

* performance: replace string matching by listing all cases

* feat(mls-migration): handle protocol changed events #7 (#1811)

* feat: support persisting protocol changed system messages

* feat: process protocol change events and insert system messages

* feat: insert history lost system message if protocol change is discovered during slow sync

* chore: add migration of new system message table

* fix: check if protocol was updated inside sql query

* fix: only fetch group conversations by protocol

* chore: document return value

* chore: fix test naming

* refactor: rename SystemMessageBuilder to SystemMessageInserter

* chore: fix logging inconsistency

* fix: import

* chore: fix compilation

* fix(mls-migration): migration feature config #8 (#1822)

* fix: mls migration config fields are all optional

* chore: update tests after removing no longer necessary fields

* chore: fix detekt

* feat(mls-migration): update self supported protocols during slow sync (#1826) #9

* feat(mls-migration): force migration when migration deadline arrives #10 (#1831)

* feat: end migration regardless when migration deadline arrives

* refactor: generalise methods for fetching conversations ids

* refactor: better naming

* feat(mls-migration): ignore inactive clients when calculating supported protocols #11 (#1904)

* test: add test cases for ignoring inactive clients when calculating supported protocols

* feat: add isActive computed property on Client model

* feat: ignore inactive clients when checking if all self clients are mls capable

* fix: updating isMLSCapable (#1905)

* fix: updating isMLSCapable

It should be possible to update isMlsCapable from false to true, but not the other way around.

* chore: add comment explaining update logic for is_mls_capable

* chore: fix detekt

* fix: fetch group id on protocol change event (#2051)

* chore: fix rebase issues

* feat(mls-one2one): support fetching mls 1-1 conversation details (#1848)

* feat: support fetching mls 1-1 conversation details

* refactor: better naming of function

* chore: remove another magic number

* feat: support migrating to mls 1-1 (#1875)

* refactor: allow fetching whole conversations by a member (#1940)

* feat(mls): establish one-to-one conversation [WPB-2258] (#1953)

* feat(mls): migrate 1:1 connections from Proteus to MLS [WPB-2258] (#1968)

* feat: choose 1:1 protocol [WPB-2181] (#1993)

Co-authored-by: Mojtaba Chenani <[email protected]>

* feat: check supported protocols when creating/fetching team 1-1 (#1995)

* feat: establish mls 1-1 for team members if mls is the default protocol

* refactor: update GetOrCreateOneToOneConversationUseCaseTest to use the arrangement pattern

* test: update with test cases covering mls as the default protocol

* feat: support persisting the default protocol from mls feature config

* test: update & add tests for persisting the default protocol

* chore: EstablishMLSOneToOneUseCase was renamed into MLSOneOnOneConversationResolver

* test: don't expect team 1-1 to be established

* fix: use protocol selector for deciding which protocol use for team 1-1

* chore: fix detekt

* feat: active one-on-one conversations (#2010)

* feat(persistance): active/inactive one-on-one conversations

* fix: more efficient mapping

Co-authored-by: Mojtaba Chenani <[email protected]>

* refactor: join with Conversation table instead

Co-authored-by: Mojtaba Chenani <[email protected]>

* refactor: rename oneOnOneConversationId to activeOneOnOneconversationId

Co-authored-by: Vitor Hugo Schwaab <[email protected]>

---------

Co-authored-by: Mojtaba Chenani <[email protected]>
Co-authored-by: Vitor Hugo Schwaab <[email protected]>

* feat: resolve one-on-one conversation to either proteus or mls (#2012)

* chore: default selfUser when creator field is missing (#2022)

* feat: resolve one-on-one conversations when receiving a welcome message (#2033)

* feat: resolve one-on-one conversation when the connection is accepted (#2034)

* feat: resolve active 1-1 after accepting a connection request (#2042)

* feat: support parsing quantum safe ciphersuite (#2049)

* feat: avoid race condition when establishing mls 1-1 (#2048)

* feat: add live property to Event model

Distinguish between live events arriving via the websocket and events fetched while catching up after connectivity is restored.

* feat: delay resolving active one-on-one when live

Delay resolving active one-on-one when a connection request is accepted and we are live. This avoids a race to establish the mls group when multiple clients are online, which is wasteful.

* chore: update tests after adding live property

* chore: fix detekt

* fix: always schedule resolving active 1-1 to avoid discarding welcome msg

* fix: improve stability of MLS 1-1 conversations (#2063)

* fix: don't fail migration query if messages have already been copied

* fix: don't fail the slow sync on a non-recoverable error when resolving 1-1s

* fix: don't fail the slow sync when establishing 1-1 fails due to missing key packages

* fix: re-use existing mls group if it exists

* fix: establish 1-1 also with other self clients

* test: add missing test for establishing 1-1

* chore: fix detekt

* feat: limit re-trying commit to 2 retry attempts (#2068)

* feat: limit re-trying commit to 2 retry attempts

* refactor: rename retryCount to remainingAttempts for better readability

* chore: fix rebase issues

* fix: map and ignore BufferedFutureMessage (#2084)

* feat: update supported protocols after deleting a client (#2083)

* feat(mls): recover from stale epoch on message sending (#2076)

* feat: parse epoch timestamp in conversation response

* feat: persist epoch timestamp

* feat: verify if we have lost commits when sending fails

* test: add tests for stale epoch handler

* fix: don't compare against last processed event, instead use time heuristic

* fix: verify epoch using CC in WrongEpochHandler

* refactor: replace MLSWrongEpochHandler with StaleEpochVerifier since they are identical

* fix: fail re-try if verifying epoch fails

* Revert "feat: persist epoch timestamp"

This reverts commit e968af6.

* Revert "feat: parse epoch timestamp in conversation response"

This reverts commit f1bf8b4.

* chore: remove any remaining trace of epochTimestamp

* chore: fix rebase issues

* feat(debug): add use case for disabling event processing (#2096)

* chore: fix rebase issues

* refactor: mls feature config handling (#2114)

* refactor: mls feature config handling

* chore: remove debug println

* chore: use named arguments for better readability

* refactor: extract handlers into variables and re-use them

* fix: assign supported_protocols when migrating user db (#2126)

* chore(mls): propagate activeOneOnOneConversation on ConversationDetails [WPB-4705] (#2130)

* refactor: only have one main query for inserting/updating users (#2124)

* refactor: only have one main query for inserting/updating users

* fix: incorrect user mapping for team member

* chore: inline function which is only used once

* fix: updating connection status

* fix: user type not updating for non-team contacts

* fix: typo in partial user update query

* chore: update tests after rebase

* fix: supported protocols is only available since API v5 (#2128)

* fix: skip updating supported protocols when mls is not supported (#2131)

* chore: fix rebase issues

* chore: squash migrations when possible

---------

Co-authored-by: Vitor Hugo Schwaab <[email protected]>
Co-authored-by: Mojtaba Chenani <[email protected]>