Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: extremely slow paginated conversation list query [WPB-11808] 🍒 #3102

Merged

Conversation

github-actions[bot]
Copy link
Contributor

@github-actions github-actions bot commented Nov 13, 2024

BugWPB-11808 [Android] app stalls when fetching a page of conversations from the local DB

This PR was automatically cherry-picked based on the following PR:

Original PR description:



PR Submission Checklist for internal contributors

  • The PR Title

    • conforms to the style of semantic commits messages¹ supported in Wire's Github Workflow²
    • contains a reference JIRA issue number like SQPIT-764
    • answers the question: If merged, this PR will: ... ³
  • The PR Description

    • is free of optional paragraphs and you have filled the relevant parts to the best of your ability

What's new in this PR?

Issues

The paginated conversation list query is unable to show the list in a reasonable time, if someone has many conversations and messages then it can take multiple minutes.

Causes (Optional)

Left joining messages with multiple columns from different content tables and left joining unread events to just get last message for each conversation and count of different unread events is too far from optimal. We have managed to reduce full table scans but there is still a lot of data to work on by this query.

Solutions

To have a proper data to work on, I filled a database with ~1150 conversations and over 200 000 messages, biggest conversation had around 35 000 messages and 27 000 unread events.
With this DB, I waited more than 10 minutes using the current paginated query to get the result and eventually gave up. So I checked various ideas, below are the ones that looked most promising.

For unread events table, there are two ways: to left join all unread events into main query and calculate it or to make a subquery or view to first count events on just that table and then left join results of that calculations into the main query.

For message table, first idea was to match the solution implemented some time ago and to make a single subquery to get ids of last messages for each conversation and then left join it with all contents and other things - thanks to that it doesn't left join all messages and their contents first, but just the messages that it actually needs.
Second one was to create a dedicated table which contains ids of last message for each conversation, and just that, so it could replace this subquery and make the query even simpler. The main problem with that was to make sure this table is synced with the message table, so 3 triggers were created - when inserting new message, when updating visibility or type of a message (two parameters that are taken into account when selecting last message) and when deleting a last message (to find the one that from now on should be the last one). Another issue is whether this can affect the event handling times, so I checked that by executing 3000x processApplicationMessage using current code and with this additional table which needs to stay synced using triggers when new message is inserted:

image

Results show that there is no difference, so it's safe to use.

To have a perspective, here are the times of three main queries used to fetch conversation list currently without the pagination:
image

Around 175ms to get the list without pagination.
As I said, with current paginated query I couldn't get a result in over 10 minutes, but with improvements explained above, it started to work in a very reasonable time:

image

A, B, C, D are just a combinations of these improvements for joining both message and unread event table, explained here (with average execution times):

image

It looks like the winner is D, and it's unbelievable that the execution of such complex query can take under 20ms, but so far it looks like everything is working correctly.
So here in this PR, the D approach is implemented - with new dedicated table LastMessage and UnreadEventCountsGrouped view used in the main query which doesn't require anymore to be GROUPed BY.

Testing

How to Test

Enable paginated_conversation_list_enabled feature flag and open the conversation list.


PR Post Submission Checklist for internal contributors (Optional)

  • Wire's Github Workflow has automatically linked the PR to a JIRA issue

PR Post Merge Checklist for internal contributors

  • If any soft of configuration variable was introduced by this PR, it has been added to the relevant documents and the CI jobs have been updated.

References
  1. https://sparkbox.com/foundry/semantic_commit_messages
  2. https://github.com/wireapp/.github#usage
  3. E.g. feat(conversation-list): Sort conversations by most emojis in the title #SQPIT-764.

)

* fix: extremely slow paginated conversation list query [WPB-11808]

* fix name

* handle case of moving messages to other conversation

* add tests
@github-actions github-actions bot added cherry-pick PR is cherry-picking changes from another banch echoes: unplanned Any work item that isn’t part of the product or technical roadmap. type: bug / fix 🐞 👕 size: XL labels Nov 13, 2024
Copy link
Contributor Author

github-actions bot commented Nov 13, 2024

Test Results

3 271 tests  +6   3 164 ✅ +6   4m 58s ⏱️ +35s
  557 suites ±0     107 💤 ±0 
  557 files   ±0       0 ❌ ±0 

Results for commit e5613bc. ± Comparison against base commit f883583.

♻️ This comment has been updated with latest results.

Copy link
Contributor Author

🐰 Bencher Report

Branchfix/too-slow-paginated-conversation-list-query-cherry-pick
Testbedubuntu-latest

⚠️ WARNING: The following Measure does not have a Threshold. Without a Threshold, no Alerts will ever be generated!

Click here to create a new Threshold
For more information, see the Threshold documentation.
To only post results if a Threshold exists, set the --ci-only-thresholds CLI flag.

Click to view all benchmark results
BenchmarkLatencynanoseconds (ns)
com.wire.kalium.benchmarks.logic.CoreLogicBenchmark.createObjectInFiles📈 view plot
⚠️ NO THRESHOLD
667,688.90
com.wire.kalium.benchmarks.logic.CoreLogicBenchmark.createObjectInMemory📈 view plot
⚠️ NO THRESHOLD
355,288,301.45
com.wire.kalium.benchmarks.persistence.MessagesNoPragmaTuneBenchmark.messageInsertionBenchmark📈 view plot
⚠️ NO THRESHOLD
1,304,093,985.81
com.wire.kalium.benchmarks.persistence.MessagesNoPragmaTuneBenchmark.queryMessagesBenchmark📈 view plot
⚠️ NO THRESHOLD
21,978,820.06
🐰 View full continuous benchmarking report in Bencher

@codecov-commenter
Copy link

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 54.21%. Comparing base (f883583) to head (e5613bc).

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #3102      +/-   ##
===========================================
+ Coverage    54.19%   54.21%   +0.02%     
===========================================
  Files         1196     1196              
  Lines        35592    35596       +4     
  Branches      3625     3625              
===========================================
+ Hits         19288    19298      +10     
+ Misses       14900    14899       -1     
+ Partials      1404     1399       -5     
Files with missing lines Coverage Δ
...tlin/com/wire/kalium/persistence/db/TableMapper.kt 100.00% <100.00%> (ø)
.../wire/kalium/persistence/db/UserDatabaseBuilder.kt 81.01% <100.00%> (+0.12%) ⬆️

... and 2 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f883583...e5613bc. Read the comment docs.

@datadog-wireapp
Copy link

Datadog Report

Branch report: fix/too-slow-paginated-conversation-list-query-cherry-pick
Commit report: 6964646
Test service: kalium-jvm

✅ 0 Failed, 3164 Passed, 107 Skipped, 37.09s Total Time

@saleniuk saleniuk requested review from a team, typfel, yamilmedina, alexandreferris, saleniuk and damian-kaczmarek and removed request for a team November 14, 2024 08:23
@saleniuk saleniuk added this pull request to the merge queue Nov 14, 2024
Merged via the queue into develop with commit 837f7ea Nov 14, 2024
22 checks passed
@saleniuk saleniuk deleted the fix/too-slow-paginated-conversation-list-query-cherry-pick branch November 14, 2024 10:08
@MohamadJaara MohamadJaara added echoes: technical-roadmap Work contributing to the Technical Roadmap, to improve our velocity or reduce the technical debt. and removed echoes: unplanned Any work item that isn’t part of the product or technical roadmap. labels Nov 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cherry-pick PR is cherry-picking changes from another banch echoes: technical-roadmap Work contributing to the Technical Roadmap, to improve our velocity or reduce the technical debt. 👕 size: XL type: bug / fix 🐞
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants