Collect data on performance improvements to the dashboard #145
to clarify, when I talk about "new PR for reversion", I expect the following
We aim to have the baseline on staging and tested tonight. If good, we'll promote to the 3 prod environments we are testing on, and I will perform the first access at 8am MT tomorrow. If there is any delay, we will push this back to 8am MT Wednesday.
I now see the float error again; we need to cherry-pick that change
Re-opening since this is not actually done.
Unlike @shankari's findings, all STM, Ebike, Smart Commute, and Stage loaded fine for me.
Tested: 4 PM MT - 4:47 PM MT
Load order / speeds: [STM, Smart Commute, Stage, ccebikes]
- Initial Load
- From sidebar click "Data"
- Click "Trips" tab
- Click "Demographics" tab
- Click "Trajectories" tab
- From sidebar click "Map"
- Select "Density Heatmap"
- Select "Unlabeled"
Everything functioned normally for me.
As I reported on Teams this morning:
This held true for the 12pm and 4pm accesses as well. No timeouts on smart commute or stm community.
@JGreenlee what about on stage? Are you testing that periodically as well?
I tested Stage just now. UUIDs eventually loaded. Trips, Trajectories, and Heatmap were empty – I don't think it's due to a timeout; I think it is due to a lack of data in the last week. Starting tomorrow I'll access Stage along with the others.
8 PM MT:
12 AM MT:
8am MT was consistent with my findings from yesterday. The active users box is indeed missing on ccebikes; it was likely missing yesterday as well but I didn't notice.
I'm going to make sure my staging phones are on and connected to Wi-Fi today so that we have at least a couple trips on stage.
Tested: 4 PM MT
@shankari I discussed the analysis with @JGreenlee; he suggested this workflow. What are your thoughts?
I was a little late starting today because I forgot (I should put a reminder in my calendar). I started closer to 4:30pm MT.
@TeachMeTW @JGreenlee is your mentor, so I would take his advice 😄 BTW, I completely agree on "no ML", EDA != ML
8 PM MT:
Interesting. I saw one trip and its associated trajectory on stage (presumably from Jack's phone)
12 AM MT:
I saw my trip on staging yesterday and this morning.
@shankari For the record, I had suggested that @TeachMeTW get your advice in addition to mine since most of my experience with study design + statistical analysis has been in a different domain.
4 PM MT:
I forgot today too and am just loading now. @TeachMeTW did you do this at 4pm PT or 4pm MT?
@shankari I started at 3:15 PST and did not finish until 4 PST; the main issue was ccebike not finishing loading at all.
~ Fixing ~
@TeachMeTW I suggested removing the bottom 80% of stats for the pipeline changes, not the admin dashboard. The admin dashboard is only launched when a user logs in, so the additional data storage is minimal. The pipeline runs every hour, so the additional data storage is substantial. I am not opposed to (eventually) dropping the bottom 80% of timing stats in the admin dashboard, but that is not blocking the push to production of the pipeline timing changes.
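For reference, dropping the bottom 80% of timing stats amounts to a percentile cutoff. A minimal sketch in Python, assuming each stat is a dict with hypothetical `name`/`reading` fields (not necessarily the actual schema):

```python
# Hypothetical sketch: keep only the slowest 20% of timing stats before storing them.
# The "name"/"reading" fields and this helper are illustrative, not the real schema.
import numpy as np

def keep_top_20_percent(timing_stats):
    """Return only the entries at or above the 80th percentile of execution time (seconds)."""
    if not timing_stats:
        return []
    readings = np.array([s["reading"] for s in timing_stats])
    cutoff = np.percentile(readings, 80)
    return [s for s in timing_stats if s["reading"] >= cutoff]

# e.g. keep_top_20_percent([{"name": "segment_trips", "reading": 12.3},
#                           {"name": "read_profile", "reading": 0.02}])
```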
@shankari please reupload the staging data snapshot; the one from Tuesday became corrupted.
I also see
That number is in seconds of execution time; maybe there weren't enough functions. This is what I did:
As for the pipeline, these are the values (not the most recent):
Bottom 80%:
Top 20%:
After merging e-mission/e-mission-server#993 and e-mission/e-mission-server#1005 and #153, I logged in to staging. And the initial "Overview" load was still pretty slow. But we were still loading the UUIDs in a batch when I went to the data table. So presumably the "Overview" load didn't load all the UUIDs? Because if it did, we should just have cached and reused that data, right? So then what is the overview loading? The functionality seems to be working correctly, and we can collect data next week, but I think that this can benefit from some cleanup.
@JGreenlee @TeachMeTW as a suggestion, while cleaning up e-mission/e-mission-server#1005, you can also split the user stats into pipeline-dependent and pipeline-independent. The pipeline-dependent stats can stay as the last stage in the pipeline, and the pipeline-independent stats can be outside the early return. This will allow us to not waste time recomputing the pipeline-dependent stats when the pipeline wasn't run. The current structure of
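The suggested split might look roughly like the following sketch; all function names here are hypothetical placeholders rather than the actual server code:

```python
# Rough sketch of the pipeline-dependent / pipeline-independent split.
# The helper names and stat examples are assumptions, not the real implementation.

def update_pipeline_independent_stats(user_id):
    # e.g. server API call counts, last_call_ts: these change with API traffic,
    # not with pipeline runs, so they are cheap to refresh every time
    print(f"refreshing pipeline-independent stats for {user_id}")

def update_pipeline_dependent_stats(user_id):
    # e.g. confirmed trip counts, last processed trip: these can only change
    # when the pipeline has actually processed new data
    print(f"refreshing pipeline-dependent stats for {user_id}")

def run_user_stats_stage(user_id, pipeline_processed_new_data):
    # Pipeline-independent stats sit outside the early return, so they are
    # always refreshed.
    update_pipeline_independent_stats(user_id)
    if not pipeline_processed_new_data:
        # Early return: nothing new was processed, so the pipeline-dependent
        # stats cannot have changed and we skip the expensive recomputation.
        return
    # Pipeline-dependent stats remain the last stage of the pipeline run.
    update_pipeline_dependent_stats(user_id)
```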
Deployed to the three production environments as well.
I believe the Overview page does create the entire list of UUIDs. On the Data page UUIDs table, we are loading stats for those UUIDs. We already have the full list of UUIDs by the time we get to the Data table, and we split that list into slices of 10 UUIDs and load the stats for one slice at a time.
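In other words, the slicing is roughly as in this sketch (the names are illustrative, not the actual dashboard code):

```python
# Illustrative sketch of loading UUID stats one slice of 10 at a time, given
# that the full UUID list is already known from the Overview page.
def load_stats_in_slices(uuids, fetch_stats_for_slice, slice_size=10):
    all_stats = []
    for start in range(0, len(uuids), slice_size):
        batch = uuids[start:start + slice_size]
        # one request/DB query per slice; the table can render incrementally
        all_stats.extend(fetch_stats_for_slice(batch))
    return all_stats

# e.g. load_stats_in_slices(uuid_list, lambda batch: [{"uuid": u} for u in batch])
```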
Accessed all 4 dashboards ~9am MT. Nothing appears significantly different than before. The Trips and Trajectories tabs did not take as long as I remembered. Maybe it's simply because it's winter and there is less travel in the last week. I didn't notice anything broken on any of the dashboards. There are some recent trips on
This is likely happening faster computationally due to offloading the user stats computation to the pipeline, but we are not observing the effect because of the 20-second interval between batches.
Loading the overview and trajectories seemed slower today across the board.
Same results for me as well.
To add to this, today seems better for smart commute but ccebikes still fails on active users.
Test Stage 2/3/25: Test Process:
Test Notes:
Why are you changing to the 2022-2025 range? Is this something you did earlier?
@shankari it is an arbitrary range to test the date picker and to see if the data would update. Previously, I only changed the month, so perhaps I should stay consistent with that. I chose 2022 - 2025 as it could encompass most data.
We should stick to what we were doing before (which I thought was keeping the filters untouched). If our previous measurements were taken with 1 week of data and our current measurements are taken with 2-3 years of data, we can't make comparisons. It won't matter for UUIDs and Active Users but it will matter for everything else.
@JGreenlee Understood, I will rerun the suite without changing the filters.
@shankari @JGreenlee Updated my testing findings and process.
@JGreenlee @TeachMeTW #156 is now on production across the board. You can start testing against all the canonical production environments.
…tabase
We have made several changes to cache summary information in the user profile. This summary information can be used to improve the scalability of the admin dashboard (e-mission/op-admin-dashboard#145) but will also be used to determine dormant deployments and potentially in a future OpenPATH-wide dashboard.

These changes started by making a single call to cache both trip and call stats (e-mission#1005). This resulted in all the composite trips being read every hour, so we split the stats into pipeline-dependent and pipeline-independent stats in 88bb35a (part of e-mission#1005). The idea was that, since the composite object query was slow because the composite trips were large, we could run only the queries for the server API stats every hour, and read the composite trips only when they were updated.

However, after the improvements to the trip segmentation pipeline (e-mission#1014, results in e-mission/e-mission-docs#1105 (comment)), reading the server stats is now the bottleneck. Checking the computation time on the big deployments (e.g. ccebikes): although the time taken has significantly improved as the database load has gone down, even in the past two days we see a median of ~10 seconds and a max of over two minutes.

And we don't really need to query this data to generate the call summary statistics. Instead of computing them on every run, we can compute them only when _they_ change, which happens when we receive calls to the API. So, in the API's after_request hook, in addition to adding a new stat, we can also just update the `last_call_ts` in the profile. This potentially makes every API call a teeny bit slower since we are adding one more DB write, but it significantly lowers the DB load, so it should make the system as a whole faster.

Testing done:
- the modified `TestUserStat` test for the `last_call_ts` passes
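As a rough illustration of the after_request idea (using Flask purely for the example; the real server framework, the stat-recording call, and the profile-update helper here are assumptions, not the actual code):

```python
# Sketch only: update last_call_ts in the user profile on every API call,
# so the hourly pipeline never has to scan the stats collection to derive it.
import time
from flask import Flask, g

app = Flask(__name__)

def save_api_call_stat(user_id, ts):
    # placeholder for the existing "add a new stat" write
    print(f"stat: user={user_id} ts={ts}")

def update_profile_last_call_ts(user_id, ts):
    # the one extra DB write per API call discussed above
    print(f"profile: user={user_id} last_call_ts={ts}")

@app.after_request
def record_call(response):
    user_id = getattr(g, "user_id", None)  # assume auth middleware set this
    if user_id is not None:
        now = time.time()
        save_api_call_stat(user_id, now)
        update_profile_last_call_ts(user_id, now)
    return response
```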
@shankari Test Process:
Test Notes:
Also tested the new stats and added to op dash:
@TeachMeTW are we testing this in production? I don't see any updates here, and the changes have been on production for a couple of weeks.
The 4 production environments? Yes, I still am; no new updates really, same as #145 (comment). Regardless of the environment, the trajectories tab is quite slow, but nothing new to report other than that. Should I make daily updates or only update when things are different?
I think at least a weekly update would be good.
Tested this week:
Test Process: Open Overview Page
Test Notes: Instant Overview Page loaded
Plan discussed at meeting
We will use the following environments:
We will access the environments at the following times:
Steps to perform:
We will collect data for one week for the baseline and one week for each change.
We will start with reverting the previous batching changes so that we can assess the impact before moving on.
Timeline: