Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance Improvements in User Stats Processing #124

Merged
merged 20 commits into from
Sep 18, 2024

Conversation

TeachMeTW
Copy link
Contributor

@TeachMeTW TeachMeTW commented Sep 9, 2024

Description

This pull request introduces several performance improvements to the data page:

  1. Timer Integration: Added timing to measure the performance of the data page.
  2. Lazy Loading: Added processing of UUIDs in batches of 10.

Changes Made

1. Timer Integration

Added timing to measure the execution time of the add_user_stats function and other functions.

2. Lazy Loading Implementation

Replaced sequential user processing with batch processing; batches can be set in add_user_stats in db_utils for default value changes or in render_content in data.py

Each batch of 10 takes about ~7 seconds of processing.

@TeachMeTW
Copy link
Contributor Author

This PR would need a squash due to my extensive commits

@shankari
Copy link
Contributor

shankari commented Sep 12, 2024

@JGreenlee @TeachMeTW I looked at this a while ago and came up with a plan (or two) but didn't have time to implement. The key here is to really dig into the user model and understand that the user stats consists of two separate parts:

  • the user profile: which is stored as a single table
  • the user stats: which are computed from the trip table (e.g. % labeled, last trip, first trip, etc)

Reading the first is super fast, because it is a single small table
Reading the second is very slow because we have to essentially query for the entire trip table

We should fix this by:

  • lazy loading the user stats part: I looked up plotly and it is possible to update data as new fields are computed (e.g. something like https://community.plotly.com/t/extend-or-append-data-instead-of-update/8898/4). I might have filed an issue for this as well but can't find it now.
  • pre-computing the user stats and also storing them in the user profile: ideally we would do this as part of the pipeline OR a separate script that runs once a day. Note that we will also need a backwards compat script if we choose this.

@shankari
Copy link
Contributor

Ah I think it was Patch that I originally found for lazily updating data
https://plotly.com/blog/partial-properties/

@TeachMeTW
Copy link
Contributor Author

@shankari I made progress in regards to the lazy loading, the only thing I'd need to figure out is how to stop it from refreshing the ui. I tried to use Patch() but seems to not work fully as intended yet; I'll keep at it.

See below:

8mb.video-WNJ-MZ7G6LG7.mp4

@TeachMeTW
Copy link
Contributor Author

@shankari Progress, discussed with @JGreenlee to help resolve some of the issues I was facing; now loads without reloading the entire thing.

8mb.video-P0Y-tuqh2oNa.mp4

@TeachMeTW TeachMeTW marked this pull request as ready for review September 17, 2024 04:40
@shankari shankari changed the base branch from master to upgrade_dependencies September 18, 2024 02:34
@shankari shankari changed the base branch from upgrade_dependencies to master September 18, 2024 02:34
Copy link
Contributor

@shankari shankari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to work, and I want to get it out to staging ASAP so that we can test and move to production (and unblock everybody). We can always return and polish later.

But I do have some questions for me to understand your changes better

app_sidebar_collapsible.py Show resolved Hide resolved
app_sidebar_collapsible.py Show resolved Hide resolved
requirements.txt Outdated Show resolved Hide resolved
Comment on lines +109 to +112
# Create a Patch object to append data progressively
patched_data = Patch()
patched_data['data'] = processed_data

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand how this is used. I see that the patched_data object is created here, but I don't see it used anywhere else in this PR. I even see that line 129 references this Patch object in a comment, but I don't see any of the Patch object methods, such as append.

Are we actually using patch? If not, what are we doing for lazy loading?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this was a relic of a prior iteration I made; I can remove/clean this section in another PR.

utils/db_utils.py Show resolved Hide resolved
@shankari
Copy link
Contributor

Squash merging this as well, make sure to pull the changes before starting on the next PR, @TeachMeTW

@shankari shankari merged commit b9b0c34 into e-mission:master Sep 18, 2024
shankari pushed a commit that referenced this pull request Sep 22, 2024
* Removed artifacts

* Removed artifacts
TeachMeTW added a commit to TeachMeTW/op-admin-dashboard that referenced this pull request Oct 21, 2024
shankari added a commit that referenced this pull request Oct 21, 2024
shankari added a commit that referenced this pull request Oct 21, 2024
shankari added a commit that referenced this pull request Oct 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Tasks completed
Development

Successfully merging this pull request may close these issues.

2 participants