Collect data on performance improvements to the dashboard #145

Open
shankari opened this issue Oct 20, 2024 · 74 comments · Fixed by #150

Comments

@shankari
Contributor

Plan discussed at meeting

We will use the following environments:

  • stage
  • ccebikes (large: 86k trips, 108 users)
  • smart commute (medium, long-term: 18k trips, 21 users)
  • STM community (small: 735 trips, 4 users)

We will access the environments at the following times:

Steps to perform:

  • Log in aka "cold boot" of the dashboard (initializes to "Overview")
  • Data page, UUIDs table, Trip table, Trajectories table
  • Maps page with no filters

We will collect data for one week for the baseline and one week for each change.
We will start with reverting the previous batching changes so that we can assess the impact before moving on.

Timeline:

@shankari
Contributor Author

To clarify, when I talk about a "new PR for reversion", I expect the following:

  1. git revert b9b0c347a633a1f44bf697b4cd50e720a279ac09
  2. commit (generate SHA1)
  3. add timing (generate SHA2)
  4. we determine baseline
  5. then, we do git revert SHA2; commit
  6. git revert SHA1; commit
  7. then we can test the new code

@JGreenlee
Contributor

  1. master has been rolled back to Sep 2, before @TeachMeTW started working on the dashboard
  2. what used to be master is now called future
  3. @TeachMeTW is going to add timing to the new master. Once that is done, it is considered the "baseline".
  4. We will push the baseline to staging

We aim to have the baseline on staging and tested tonight. If it looks good, we'll promote it to the 3 prod environments we are testing on, and I will perform the first access at 8am MT tomorrow.

If there is any delay, we will push this back to 8am MT Wednesday.

@shankari
Contributor Author

I now see the float error again; we need to cherry-pick that change.

ERROR:app_sidebar_collapsible:Exception on /_dash-update-component [POST]
Traceback (most recent call last):
  File "/root/miniconda-23.5.2/envs/emission/lib/python3.9/site-packages/flask/app.py", line 2529, in wsgi_app
    response = self.full_dispatch_request()
  File "/root/miniconda-23.5.2/envs/emission/lib/python3.9/site-packages/flask/app.py", line 1825, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/root/miniconda-23.5.2/envs/emission/lib/python3.9/site-packages/flask/app.py", line 1823, in full_dispatch_request
    rv = self.dispatch_request()
  File "/root/miniconda-23.5.2/envs/emission/lib/python3.9/site-packages/flask/app.py", line 1799, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
  File "/root/miniconda-23.5.2/envs/emission/lib/python3.9/site-packages/dash/dash.py", line 1310, in dispatch
    ctx.run(
  File "/root/miniconda-23.5.2/envs/emission/lib/python3.9/site-packages/dash/_callback.py", line 442, in add_context
    output_value = func(*func_args, **func_kwargs)  # %% callback invoked %%
  File "/usr/src/app/app_sidebar_collapsible.py", line 318, in update_store_trips
    df, user_input_cols = query_confirmed_trips(start_date, end_date, timezone)
  File "/usr/src/app/utils/db_utils.py", line 183, in query_confirmed_trips
    df["data.primary_ble_sensed_mode"] = df.ble_sensed_summary.apply(get_max_mode_from_summary)
  File "/root/miniconda-23.5.2/envs/emission/lib/python3.9/site-packages/pandas/core/series.py", line 4771, in apply
    return SeriesApply(self, func, convert_dtype, args, kwargs).apply()
  File "/root/miniconda-23.5.2/envs/emission/lib/python3.9/site-packages/pandas/core/apply.py", line 1123, in apply
    return self.apply_standard()
  File "/root/miniconda-23.5.2/envs/emission/lib/python3.9/site-packages/pandas/core/apply.py", line 1174, in apply_standard
    mapped = lib.map_infer(
  File "pandas/_libs/lib.pyx", line 2924, in pandas._libs.lib.map_infer
  File "/usr/src/app/utils/db_utils.py", line 179, in <lambda>
    get_max_mode_from_summary = lambda md: max(md["distance"], key=md["distance"].get) if len(md["distance"]) > 0 else "INVALID"
TypeError: 'float' object is not subscriptable
  • Trips trend does not show up
  • Trips table is still blank
  • Contrary to @TeachMeTW's experience, the UUID table is not blank
Screenshot 2024-10-21 at 9 22 39 PM
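The traceback shows `get_max_mode_from_summary` receiving a `float` where it expects a dict. A likely cause (an assumption; the actual cherry-picked fix is not shown in this thread) is that trips with no BLE summary carry `NaN` in `ble_sensed_summary`, and `NaN` is a float. A minimal sketch of a guarded version of the lambda from `db_utils.py`, keeping its "INVALID" fallback:

```python
def get_max_mode_from_summary(md):
    """Return the mode with the largest distance in a BLE sensed summary.

    Falls back to "INVALID" when the summary is missing (e.g. a NaN float,
    as in the traceback), is not a dict, or has an empty "distance" map.
    """
    if not isinstance(md, dict):
        # NaN (a float) or None lands here instead of raising TypeError
        return "INVALID"
    distance = md.get("distance")
    if not isinstance(distance, dict) or len(distance) == 0:
        return "INVALID"
    # Same selection as the original lambda: key with the largest distance
    return max(distance, key=distance.get)


# Used exactly like the original lambda:
# df["data.primary_ble_sensed_mode"] = df.ble_sensed_summary.apply(get_max_mode_from_summary)
```

Checking `isinstance(md, dict)` rather than `math.isnan` also covers `None` and any other non-dict placeholder without a second branch.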

@shankari
Contributor Author

Re-opening since this is not actually done.

@shankari shankari reopened this Oct 23, 2024
@shankari
Contributor Author

While collecting data at 4pm MT (3pm PT) today, I ran into some blank tables, likely due to timeouts.

e.g.
Screenshot 2024-10-23 at 3 33 44 PM

Screenshot 2024-10-23 at 3 37 06 PM Screenshot 2024-10-23 at 3 38 02 PM
Object { message: "Callback error updating tabs-content.children", html: "<html>\r\n<head><title>504 Gateway Time-out</title></head>\r\n<body>\r\n<center><h1>504 Gateway Time-out</h1></center>\r\n<hr><center>nginx</center>\r\n</body>\r\n</html>\r\n" }
[dash_renderer.v2_15_0m1729614367.min.js:2:95440](https://openpath-stage.nrel.gov/admin/_dash-component-suites/dash/dash-renderer/build/dash_renderer.v2_15_0m1729614367.min.js)
    _o https://openpath-stage.nrel.gov/admin/_dash-component-suites/dash/dash-renderer/build/dash_renderer.v2_15_0m1729614367.min.js:2
    ui https://openpath-stage.nrel.gov/admin/_dash-component-suites/dash/dash-renderer/build/dash_renderer.v2_15_0m1729614367.min.js:2
    r https://openpath-stage.nrel.gov/admin/_dash-component-suites/dash/dash-renderer/build/dash_renderer.v2_15_0m1729614367.min.js:2
    r https://openpath-stage.nrel.gov/admin/_dash-component-suites/dash/dash-renderer/build/dash_renderer.v2_15_0m1729614367.min.js:2
    p https://openpath-stage.nrel.gov/admin/_dash-component-suites/dash/dash-renderer/build/dash_renderer.v2_15_0m1729614367.min.js:2
    nt https://openpath-stage.nrel.gov/admin/_dash-component-suites/dash/dash-renderer/build/dash_renderer.v2_15_0m1729614367.min.js:2
    zi https://openpath-stage.nrel.gov/admin/_dash-component-suites/dash/dash-renderer/build/dash_renderer.v2_15_0m1729614367.min.js:2
Screenshot 2024-10-23 at 3 39 02 PM

Not sure whether we are recording that anywhere, or whether we even can, given that it is an error in the Plotly Dash framework.

Screenshot 2024-10-23 at 3 39 35 PM
Object { message: "Callback error updating card-active-users.children", html: "<html>\r\n<head><title>504 Gateway Time-out</title></head>\r\n<body>\r\n<center><h1>504 Gateway Time-out</h1></center>\r\n<hr><center>nginx</center>\r\n</body>\r\n</html>\r\n" }
[dash_renderer.v2_15_0m1729614367.min.js:2:95440](https://ccebikes-openpath.nrel.gov/admin/_dash-component-suites/dash/dash-renderer/build/dash_renderer.v2_15_0m1729614367.min.js)
    _o https://ccebikes-openpath.nrel.gov/admin/_dash-component-suites/dash/dash-renderer/build/dash_renderer.v2_15_0m1729614367.min.js:2
    ui https://ccebikes-openpath.nrel.gov/admin/_dash-component-suites/dash/dash-renderer/build/dash_renderer.v2_15_0m1729614367.min.js:2
    r https://ccebikes-openpath.nrel.gov/admin/_dash-component-suites/dash/dash-renderer/build/dash_renderer.v2_15_0m1729614367.min.js:2
    r https://ccebikes-openpath.nrel.gov/admin/_dash-component-suites/dash/dash-renderer/build/dash_renderer.v2_15_0m1729614367.min.js:2
    p https://ccebikes-openpath.nrel.gov/admin/_dash-component-suites/dash/dash-renderer/build/dash_renderer.v2_15_0m1729614367.min.js:2
    nt https://ccebikes-openpath.nrel.gov/admin/_dash-component-suites/dash/dash-renderer/build/dash_renderer.v2_15_0m1729614367.min.js:2
    zi https://ccebikes-openpath.nrel.gov/admin/_dash-component-suites/dash/dash-renderer/build/dash_renderer.v2_15_0m1729614367.min.js:2

@TeachMeTW
Contributor

TeachMeTW commented Oct 23, 2024

Unlike @shankari, I found that STM, ccebikes, Smart Commute, and Stage all loaded fine.

Tested: 4:00 PM MT to 4:47 PM MT

Load speeds were checked in this order: STM, Smart Commute, Stage, ccebikes

  • Initial load
  • From the sidebar, click "Data"
  • Click the "Trips" tab
  • Click the "Demographics" tab
  • Click the "Trajectories" tab
  • From the sidebar, click "Map"
  • Select "Density Heatmap"
  • Select "Unlabeled"

Everything functioned normally for me.

@JGreenlee
Contributor

As I reported on Teams this morning:

On ccebikes, UUIDs and Trajectories both timed out after several minutes and failed to display

This held true for the 12pm and 4pm accesses as well. No timeouts on smart commute or stm community.

@shankari
Contributor Author

@JGreenlee what about on stage? Are you testing that periodically as well?

@JGreenlee
Contributor

I tested Stage just now. UUIDs eventually loaded. Trips, Trajectories, and Heatmap were empty; I don't think that is due to a timeout, but rather to a lack of data in the last week.

Starting tomorrow I'll access Stage along with the others.

@TeachMeTW
Contributor

8 PM MT:

  • CCEbike did not load active users box
  • Smart Commute did not load uuids
  • Stage did not load uuids or trajectories or trips (might need to add date change to this test?)

@TeachMeTW
Contributor

12 AM MT:

  • CCEbike did not load active users box
  • Smart Commute now loads uuids
  • CCEbike long load times in trajectories
  • Stage loads uuids
  • Stage no trips or trajectories

@JGreenlee
Contributor

8am MT was consistent with my findings from yesterday.

The active users box is indeed missing on ccebikes; it was likely missing yesterday as well but I didn't notice.

@JGreenlee
Contributor

I'm going to make sure my staging phones are on and connected to Wi-Fi today so that we have at least a couple trips on stage to populate Trips, Trajectories and the map

@TeachMeTW
Contributor

Tested: 4 PM MT

  • Home page functioned normally for all
  • CCEbike UUIDs did NOT load
  • Stage still no trips or trajectories; seems not added yet
  • CCEbike slow on trajectories

@TeachMeTW
Contributor

@shankari I discussed the analysis with @JGreenlee, and he suggested this workflow; what are your thoughts?

I recommend performing exploratory analysis and visualization before removing outliers, so that you know the data you're working with and can see and identify the outliers.

You will have dozens of fine-grained measures of execution time for small tasks. I think you should try to extract a couple of higher-level measures that are specifically related to the changes you made.
I suggest something like (i) the total time to load all figures on the Overview and (ii) the time to load the UUIDs table.

Then generate descriptives and charts for each high-level measure ("basic stuff" like mean execution time by week and dataset size).
You could generate these for the fine-grained measures too, for exploratory purposes, but they probably won't make it into the paper.

If I remember my stats correctly, the main analysis you want to perform is probably a two-way repeated measures ANOVA. We are mostly interested in 2 IVs:

  • The week (week 1 = baseline; week 2 = first batch of improvements; week 3 = both batches of improvements)
  • Dataset size (small = stm community, medium = smart commute, large = ccebikes)

with the DV being execution time for the high-level tasks.

This will allow you to test for:

  • Main effect of week (Did performance change significantly as we rolled out improvements?)
  • Main effect of dataset size (Does performance degrade significantly with the size of the dataset?)
  • Interaction effect (Does the performance change differ depending on the size of the dataset? Highlight this for scalability.)

I don't see what a correlation matrix would be useful for.
I don't see a need for modeling or ML unless (i) you think the time of day has a significant effect that you want to model, or (ii) you want a regression model to predict how the system would scale to even larger programs (larger than ccebikes). But I suggest trying to keep your analysis fairly small-scale.
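A minimal sketch of the descriptive step suggested above (mean of one high-level measure per week and dataset size), using only the standard library. It assumes each timing record is a tuple `(week, dataset size, measure, seconds)`; that layout, the measure name `overview_total`, and the numbers are hypothetical placeholders, not values collected in this issue.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical timing records: (week, dataset size, high-level measure, seconds)
records = [
    (1, "small", "overview_total", 4.0),
    (1, "small", "overview_total", 6.0),
    (1, "large", "overview_total", 40.0),
    (2, "small", "overview_total", 3.0),
    (2, "large", "overview_total", 20.0),
    (2, "large", "overview_total", 30.0),
]


def mean_by_week_and_size(records, measure):
    """Group one high-level measure by (week, dataset size) and average it."""
    groups = defaultdict(list)
    for week, size, name, seconds in records:
        if name == measure:
            groups[(week, size)].append(seconds)
    # Sorted so the table reads week by week
    return {key: mean(values) for key, values in sorted(groups.items())}


summary = mean_by_week_and_size(records, "overview_total")
for (week, size), avg in summary.items():
    print(f"week {week}, {size}: mean {avg:.1f} s")
```

The repeated measures ANOVA itself would normally come from a statistics package, for example `AnovaRM` in `statsmodels.stats.anova`; it is not sketched here.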

@shankari
Contributor Author

I was a little late starting today because I forgot (I should put a reminder in my calendar). I started at closer to 4:30pm MT.

@shankari
Contributor Author

shankari commented Oct 24, 2024

@TeachMeTW @JGreenlee is your mentor, so I would take his advice 😄
Seriously, if there is something that JGreenlee is unclear about and needs my help, I am happy to step in.
But otherwise, I trust Jack's judgement, and am happy to review results once they are ready and provide feedback that way.

BTW, I completely agree on "no ML"; EDA != ML.
I would suggest starting with basic data viz (e.g. box plots) even before using "fancy" stats like ANOVA.
Ideally, we could show the difference between weeks/datasets visually.
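A box plot draws the five-number summary of each group. A minimal sketch that computes it with the standard library, so the numbers behind each per-week box can be checked directly; the load times below are hypothetical placeholders, not measurements from this issue.

```python
from statistics import median, quantiles


def five_number_summary(samples):
    """Return (min, Q1, median, Q3, max), the values a box plot draws.

    method="inclusive" is linear interpolation between sorted values,
    matching the default of numpy.percentile / pandas.Series.quantile.
    """
    q1, _, q3 = quantiles(samples, n=4, method="inclusive")
    return (min(samples), q1, median(samples), q3, max(samples))


# Hypothetical UUIDs-table load times (seconds) for one dataset, per week
week1 = [30.0, 32.0, 35.0, 41.0, 90.0]
week2 = [24.0, 25.0, 25.0, 26.0, 60.0]
for label, samples in [("week 1", week1), ("week 2", week2)]:
    print(label, five_number_summary(samples))
```

Box plots are a good first view here because a single timeout (like the 90 s and 60 s values above) shows up as a long whisker instead of dragging the mean.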

@TeachMeTW
Contributor

8 PM MT:

  • Home Page loaded normally albeit slow on ccebikes
  • CCEbike did not load uuids
  • Still no trajectories or trips on stage

@shankari
Contributor Author

> Still no trajectories or trips on stage

Interesting. I saw one trip and its associated trajectory on stage (presumably from Jack's phone)

@TeachMeTW
Contributor

TeachMeTW commented Oct 25, 2024

12 AM MT:

  • Home page loaded normally
  • CCEbike did not load uuids
  • Still no trajectories or trips on stage (maybe it hasn't updated on my end?)

@JGreenlee
Contributor

I saw my trip on staging yesterday and this morning.
Behavior is still fairly consistent on my end, except this morning I noticed that Trajectories on ccebikes loaded successfully instead of timing out.

@JGreenlee
Contributor

> Seriously, if there is something that JGreenlee is unclear about and needs my help, I am happy to step in. But otherwise, I trust Jack's judgement, and am happy to review results once they are ready and provide feedback that way.

@shankari For the record, I had suggested that @TeachMeTW get your advice in addition to mine since most of my experience with study design + statistical analysis has been in a different domain

@TeachMeTW
Contributor

4 PM MT:

  • Only STM loaded uuids
  • Trajectories for CCEbike taking 15+ min/permanently loading

@shankari
Contributor Author

I forgot today too and am just loading now. @TeachMeTW did you do this at 4pm PT or 4pm MT?

@TeachMeTW
Contributor

@shankari I started at 3:15 PM PT and did not finish until 4 PM PT, mainly because ccebikes never finished loading.

@TeachMeTW
Contributor

TeachMeTW commented Oct 26, 2024

~ Fixing ~

@shankari
Contributor Author

@TeachMeTW I suggested removing the bottom 80% of stats for the pipeline changes, not for the admin dashboard. The admin dashboard is only launched when a user logs in, so the additional data storage is minimal. The pipeline runs every hour, so the additional data storage is substantial.

I am not opposed to (eventually) dropping the bottom 80% of timing stats in the admin dashboard, but that is not blocking the push to production of the pipeline timing changes.

@TeachMeTW
Contributor

@shankari please re-upload the staging data snapshot; the one from Tuesday became corrupted.

@shankari
Contributor Author

I also see add_user_stats,350.3015870457756 in the bottom 80% which looks wrong. What does the number there represent?

@TeachMeTW
Contributor

TeachMeTW commented Nov 15, 2024

> I also see add_user_stats,350.3015870457756 in the bottom 80% which looks wrong. What does the number there represent?

That number is the execution time in seconds. Maybe there weren't enough functions; this is what I did:

    # Compute the 80th percentile threshold
    threshold_80 = df_total_time_grouped['data.reading'].quantile(0.8)

    # Split into top 20% and bottom 80%
    df_top_20_total_time = df_total_time_grouped[df_total_time_grouped['data.reading'] > threshold_80]
    df_bottom_80_total_time = df_total_time_grouped[df_total_time_grouped['data.reading'] <= threshold_80]
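For reference, the same split written without pandas. The helper name and the sample numbers are hypothetical (only the name `add_user_stats` comes from the comment above). It illustrates one possible explanation: the 80th-percentile threshold is computed across whatever per-function totals are in the frame, so when a few entries are very large, a 350-second function can still fall at or below the threshold and land in the "bottom 80%".

```python
from statistics import quantiles
from typing import Dict, Tuple


def split_top_20(readings: Dict[str, float]) -> Tuple[float, Dict[str, float], Dict[str, float]]:
    """Split {function name: seconds} at the 80th percentile.

    method="inclusive" is linear interpolation between sorted values, the
    same default that pandas.Series.quantile(0.8) uses.
    """
    # quantiles(n=5) returns the 20th, 40th, 60th and 80th percentiles
    threshold_80 = quantiles(readings.values(), n=5, method="inclusive")[3]
    top_20 = {k: v for k, v in readings.items() if v > threshold_80}
    bottom_80 = {k: v for k, v in readings.items() if v <= threshold_80}
    return threshold_80, top_20, bottom_80


# Hypothetical per-function totals in seconds
readings = {
    "step_a": 0.5,
    "step_b": 5.0,
    "add_user_stats": 350.0,
    "step_c": 400.0,
    "step_d": 600.0,
    "loop": 1800.0,
}
threshold, top, bottom = split_top_20(readings)
print(threshold, sorted(top))  # 600.0 ['loop']
```

With only six entries, the threshold is pulled up to the second-largest value, so everything except the single 1800 s entry counts as "bottom 80%".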

@TeachMeTW
Contributor

As for the pipeline, these are the values (not the most recent ones):

Bottom 80%:

data.name,data.reading
ACCURACY_FILTERING,0.02578527937728916
CLEAN_RESAMPLING,2.146863155281558
CREATE_COMPOSITE_OBJECTS,1.7069802132425573
CREATE_CONFIRMED_OBJECTS,1.4718458523202962
EXPECTATION_POPULATION,0.088950674887729
GENERATE_STORE_AND_RANGE,0.0036104017272961075
JUMP_SMOOTHING,1.0057648629329274
LABEL_INFERENCE,0.18408866121307846
STORE_USER_STATS,0.3237537619999994
TRIP_SEGMENTATION/check_out_of_order_points,0.00028674999999989126
TRIP_SEGMENTATION/create_dist_filter,2.9649600000247746e-06
TRIP_SEGMENTATION/create_places_and_trips/create_new_place,0.00014887752303130057
TRIP_SEGMENTATION/create_places_and_trips/create_raw_trip,0.00015851584249614424
TRIP_SEGMENTATION/create_places_and_trips/get_last_place_entry,0.002671166571432642
TRIP_SEGMENTATION/create_places_and_trips/get_time_series,8.577142859828818e-06
TRIP_SEGMENTATION/create_places_and_trips/handle_untracked_period,0.0034603348518523224
TRIP_SEGMENTATION/create_places_and_trips/insert_last_place,0.0006448212857141604
TRIP_SEGMENTATION/create_places_and_trips/link_and_save,0.002157715270430808
TRIP_SEGMENTATION/create_places_and_trips/start_new_chain,0.0002463035714241256
TRIP_SEGMENTATION/create_time_filter,2.7318399999920473e-06
TRIP_SEGMENTATION/fetch_location_data,0.18090795849999997
TRIP_SEGMENTATION/get_filters_in_df,0.047898718239999916
TRIP_SEGMENTATION/get_time_range,0.0019015625000001757
TRIP_SEGMENTATION/get_time_range_for_segmentation,0.001661970040000007
TRIP_SEGMENTATION/get_time_series,1.3946793103429986e-05
TRIP_SEGMENTATION/handle_out_of_order_points,0.0008165149199999533
TRIP_SEGMENTATION/identify_active_filters,0.0005030420000000646
TRIP_SEGMENTATION/initialize_filters,4.625000000091362e-06
TRIP_SEGMENTATION/mark_segmentation_done,0.0014093750000014893
TRIP_SEGMENTATION/segment_into_trips/append_segmentation,0.0014314497241378363
TRIP_SEGMENTATION/segment_into_trips/calculations_per_iteration,0.005296046663291943
TRIP_SEGMENTATION/segment_into_trips/check_transitions_post_loop,0.0022789169999981596
TRIP_SEGMENTATION/segment_into_trips/continue_just_ended,0.0002355626958475048
TRIP_SEGMENTATION/segment_into_trips/filter_bogus_points,0.025727010249998017
TRIP_SEGMENTATION/segment_into_trips/find_last_valid_point,0.00013377017130620743
TRIP_SEGMENTATION/segment_into_trips/get_filtered_location,0.21980716699999991
TRIP_SEGMENTATION/segment_into_trips/get_last_trip_end_point,0.00016301416981153078
TRIP_SEGMENTATION/segment_into_trips/handle_final_trip_end,0.0024963339999999334
TRIP_SEGMENTATION/segment_into_trips/handle_trip_end,0.0003509445943395742
TRIP_SEGMENTATION/segment_into_trips/has_trip_ended,0.004189222965080378
TRIP_SEGMENTATION/segment_into_trips/initialize_last_ts_processed,1.8749999988187938e-06
TRIP_SEGMENTATION/segment_into_trips/mark_valid,0.0003279579999997395
TRIP_SEGMENTATION/segment_into_trips/post_loop,0.008723180666670771
TRIP_SEGMENTATION/segment_into_trips/process_row,0.004088960585417935
TRIP_SEGMENTATION/segment_into_trips/select_last10Points,3.91455750994308e-05
TRIP_SEGMENTATION/segment_into_trips/set_new_trip_start,0.00039244493103457136
TRIP_SEGMENTATION/segment_into_trips/set_new_trip_start_point,8.748888886442627e-07
TRIP_SEGMENTATION/segment_into_trips/set_valid_column,0.00031091600000010544
TRIP_SEGMENTATION/segment_into_trips_dist/append_segmentation,0.0013334490765433572
TRIP_SEGMENTATION/segment_into_trips_dist/find_last_valid_point,0.0001416281774779063
TRIP_SEGMENTATION/segment_into_trips_dist/get_transition_df,1.6218878335
TRIP_SEGMENTATION/segment_into_trips_dist/handle_final_trip_end,0.002880708499998441
TRIP_SEGMENTATION/segment_into_trips_dist/has_trip_ended,0.0006933613848144605
TRIP_SEGMENTATION/segment_into_trips_dist/process_row,0.003653419676263156
TRIP_SEGMENTATION/segment_into_trips_dist/select_last10Points,3.9679244972060673e-05
TRIP_SEGMENTATION/segment_into_trips_dist/set_new_trip_start,0.0003628507185185189
TRIP_SEGMENTATION/segment_into_trips_dist/set_valid_column,0.00034896849999971336
TRIP_SEGMENTATION/segment_into_trips_time/calculate_last10PointsDistances,0.0007601339955688976
TRIP_SEGMENTATION/segment_into_trips_time/calculate_last5MinTimes,0.0014257610955149005
TRIP_SEGMENTATION/segment_into_trips_time/calculate_last5MinsDistances,0.0016315710044803179
TRIP_SEGMENTATION/segment_into_trips_time/calculate_last5MinsPoints,0.0007967894185909164
TRIP_SEGMENTATION/segment_into_trips_time/filter_bogus_points,0.026154527666664745
TRIP_SEGMENTATION/segment_into_trips_time/find_last_valid_point,0.0002195419999999615
TRIP_SEGMENTATION/segment_into_trips_time/get_last_trip_end_point,0.0004832671014494053
TRIP_SEGMENTATION/segment_into_trips_time/has_trip_ended,0.006613862309123931
TRIP_SEGMENTATION/segment_into_trips_time/process_row,0.0001331676384539252
TRIP_SEGMENTATION/segment_into_trips_time/select_last10Points,4.438657852513701e-05
TRIP_SEGMENTATION/segment_into_trips_time/set_new_trip_start_before,7.739584210796873e-05
TRIP_SEGMENTATION/segment_into_trips_time/set_new_trip_start_else,2.1675398940299194e-06
TRIP_SEGMENTATION/segment_into_trips_time/set_valid_column,0.00032020799999976646
TRIP_SEGMENTATION/setup_filter_methods,2.6244999997704355e-06
USERCACHE,0.3603736498283744
USER_INPUT_MATCH_INCOMING,0.9195849730486005

Top 20%:

data.name,data.reading
MODE_INFERENCE,3.666587927518007
OUTPUT_GEN,24.278678695494197
SECTION_SEGMENTATION,5.911851689464339
TRIP_SEGMENTATION,42.91766042598784
TRIP_SEGMENTATION/create_places_and_trips,9.011653595800025
TRIP_SEGMENTATION/create_places_and_trips/loop_segmentation_points,6.382551351285717
TRIP_SEGMENTATION/get_data_df,14.46887485496
TRIP_SEGMENTATION/process_segmentation_points,3.9579038750000013
TRIP_SEGMENTATION/segment_into_trips,534.1603671744546
TRIP_SEGMENTATION/segment_into_trips/get_filtered_points_df,6.29462725
TRIP_SEGMENTATION/segment_into_trips/get_filtered_points_pre_ts_diff_df,22.4045843335
TRIP_SEGMENTATION/segment_into_trips/get_transition_df,4.042007972000001
TRIP_SEGMENTATION/segment_into_trips/loop,1799.1103744306668
TRIP_SEGMENTATION/segment_into_trips/loop_over_points,15.94234375
TRIP_SEGMENTATION/segment_into_trips_dist/get_filtered_location,2.6661164687500003
TRIP_SEGMENTATION/segment_into_trips_dist/loop_over_points,89.206788323
TRIP_SEGMENTATION/segment_into_trips_time/get_filtered_location,18.28722388525
TRIP_SEGMENTATION/segment_into_trips_time/get_transition_df,4.271482604250001
TRIP_SEGMENTATION/segment_single_filter,10.568774542

@shankari
Contributor Author

shankari commented Dec 22, 2024

After merging e-mission/e-mission-server#993 and e-mission/e-mission-server#1005 and #153, I logged in to staging.

@JGreenlee @TeachMeTW

The initial "Overview" load was still pretty slow, and we were still loading the UUIDs in batches when I went to the data table.

So presumably the "Overview" load didn't load all the UUIDs? Because if it did, we should just have cached and reused that data, right? So then what is the overview loading?

The functionality seems to be working correctly, and we can collect data next week, but I think that this can benefit from some cleanup.

@shankari
Contributor Author

shankari commented Dec 22, 2024

@JGreenlee @TeachMeTW as a suggestion, while cleaning up e-mission/e-mission-server#1005, you can also split the user stats into pipeline-dependent and pipeline-independent. The pipeline-dependent stats can stay as the last stage in the pipeline, and the pipeline-independent stats can be outside the early return. This will allow us to not waste time recomputing the pipeline-dependent stats when the pipeline wasn't run.

The current structure of run_intake_pipeline_for_user may make it hard to test the pipeline-independent code; feel free to refactor/restructure if that makes it cleaner.
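A minimal sketch of the suggested control flow. The function and its callable parameters (`has_new_data`, `run_stages`, `store_independent_stats`, `store_dependent_stats`) are hypothetical stand-ins for the real code in `run_intake_pipeline_for_user`; the point is only where the early return sits relative to the two kinds of stats.

```python
from typing import Callable


def run_intake_pipeline_for_user_sketch(
    uuid: str,
    has_new_data: Callable[[str], bool],
    run_stages: Callable[[str], None],
    store_independent_stats: Callable[[str], None],
    store_dependent_stats: Callable[[str], None],
) -> None:
    # Pipeline-independent stats do not depend on pipeline output,
    # so they run before (outside) the early return.
    store_independent_stats(uuid)

    if not has_new_data(uuid):
        # Early return: nothing was processed, so the pipeline-dependent
        # stats would be unchanged and are not recomputed.
        return

    run_stages(uuid)
    # Pipeline-dependent stats stay as the last stage of the pipeline.
    store_dependent_stats(uuid)


# Usage: record which steps run when there is no new data
calls = []
run_intake_pipeline_for_user_sketch(
    "user-1",
    has_new_data=lambda u: False,
    run_stages=lambda u: calls.append("stages"),
    store_independent_stats=lambda u: calls.append("independent"),
    store_dependent_stats=lambda u: calls.append("dependent"),
)
print(calls)  # ['independent']
```

Passing the steps in as functions is one way to address the testing concern: the pipeline-independent path can be exercised without running any real pipeline stage.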

@shankari
Contributor Author

Deployed to the three production environments as well.
Happy testing and make sure to write down your testing time/results here.

@JGreenlee
Contributor

> And the initial "Overview" load was still pretty slow. But we were still loading the UUIDs in a batch when I went to the data table.
>
> So presumably the "Overview" load didn't load all the UUIDs? Because if it did, we should just have cached and reused that data, right? So then what is the overview loading?

I believe the Overview page does create the entire list of UUIDs.

On the Data page UUIDs table, we are loading stats for those UUIDs. We already have the full list of UUIDs by the time we get to the Data table, and we split that list into slices of 10 UUIDs and load the stats for one slice at a time.
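A minimal sketch of the slicing described above. `uuid_batches` is a hypothetical helper, not the dashboard's actual function, and the per-slice stats query is left as a comment.

```python
from typing import Iterator, List


def uuid_batches(uuids: List[str], batch_size: int = 10) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `batch_size` UUIDs, in order."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(uuids), batch_size):
        yield uuids[start:start + batch_size]


# Usage: 108 placeholder UUIDs (the ccebikes user count) give 11 slices
all_uuids = [f"uuid-{i}" for i in range(108)]
batches = list(uuid_batches(all_uuids))
print(len(batches), len(batches[-1]))  # 11 8
# for batch in batches:
#     load_stats_for(batch)  # hypothetical per-slice stats query
```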

@JGreenlee
Contributor

Accessed all 4 dashboards ~9am MT. Nothing appears significantly different than before.
The Overview is slow. UUIDs table batching works but each batch takes ~25 seconds.

The Trips and Trajectories tabs did not take as long as I remembered. Maybe it's simply because it's winter and there is less travel in the last week.

I didn't notice anything broken on any of the dashboards.

There are some recent trips on stage but no recent trips on stm-community.

@JGreenlee
Contributor

> UUIDs table batching works but each batch takes ~25 seconds.

The computation is likely faster now that the user stats computation has been offloaded to the pipeline, but we are not observing the effect because of the 20-second interval between batches.
I think we will need to look at stored stats/dashboard_time entries to see how much faster it is.
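Rough arithmetic behind this, under the assumption (not measured here) that the 20-second interval is a fixed wait included in each ~25 s batch, using 10 UUIDs per batch and the 108 ccebikes users listed at the top of this issue. `estimate_uuid_table_load` is a hypothetical helper.

```python
import math


def estimate_uuid_table_load(n_users, batch_size=10, observed_batch_s=25.0, interval_s=20.0):
    """Return (batches, compute seconds per batch, total seconds).

    Assumes each observed batch time = fixed interval + stats computation.
    """
    batches = math.ceil(n_users / batch_size)
    compute_per_batch = observed_batch_s - interval_s  # what the stats query costs
    total = batches * observed_batch_s
    return batches, compute_per_batch, total


print(estimate_uuid_table_load(108))  # (11, 5.0, 275.0)
```

Under this assumption, 11 * 20 s = 220 s of the ~275 s total is the fixed interval, so even a large speedup in the stats query would barely change what a person observes; the stored timing entries are the more sensitive measure.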

@JGreenlee
Copy link
Contributor

JGreenlee commented Jan 2, 2025

Loading the overview and trajectories seemed slower today across the board.
I also noticed that the "Active Users" box failed to load on ccebikes and smart-commute. I don't believe it has consistently been that way.

@TeachMeTW
Contributor

Same results for me as well

@TeachMeTW
Contributor

To add to this, today seems better for smart-commute, but ccebikes still fails to load Active Users.

@TeachMeTW
Contributor

TeachMeTW commented Feb 3, 2025

Test Stage 2/3/25:

Test Process:

  • Open Overview Page
  • Open Data page
  • Visit UUIDs Tab
  • Visit Trips Tab
  • Visit Demographics Tab
  • Visit Trajectories Tab
  • Open Maps Page
  • Change Map type to Density
  • Change modes to Unlabeled

Test Notes:

  • Overview Page loads faster than previously
  • UUIDs tab performs the same with batch loading, but has the new columns
  • Trips and Demographics loaded the same
  • Trajectories slower than other tables
  • Changing the date doesn't show a loading screen; the only indication is "Updating..." in the browser tab title
  • Map shows an empty graph until Unlabeled mode is selected; I believe it should initially show an empty map, not an empty graph.

@shankari
Contributor Author

shankari commented Feb 3, 2025

Why are you changing to the 2022-2025 range? Is this something you did earlier?

@TeachMeTW
Contributor

@shankari it is an arbitrary range to test the date picker and to see if the data would update. Previously, I only changed the month, so perhaps I should stay consistent with that. I chose 2022-2025 because it should encompass most of the data.

@JGreenlee
Contributor

JGreenlee commented Feb 3, 2025

We should stick to what we were doing before (which I thought was keeping the filters untouched).

If our previous measurements were taken with 1 week of data and our current measurements are taken with 2-3 years of data, we can't make comparisons.

It won't matter for UUIDs and Active Users, but it will matter for everything else.

@TeachMeTW
Contributor

@JGreenlee Understood, I will rerun the suite without changing the filters.

@TeachMeTW
Contributor

@shankari @JGreenlee Updated my testing findings and process.

@shankari
Contributor Author

@JGreenlee @TeachMeTW #156 is now on production across the board. You can start testing against all the canonical production environments

shankari added a commit to shankari/e-mission-server that referenced this issue Feb 11, 2025
…tabase

We have made several changes to cache summary information in the user profile.
This summary information can be used to improve the scalability of the admin
dashboard (e-mission/op-admin-dashboard#145) but will
also be used to determine dormant deployment and potentially in a future
OpenPATH-wide dashboard.

These changes started by making a single call to cache both trip and call stats
e-mission#1005

This resulted in all the composite trips being read every hour, so we split the
stats into pipeline-dependent and pipeline-independent stats, in
88bb35a
(part of e-mission#1005)

The idea was that, since the composite object query was slow because the
composite trips were large, we could run only the queries for the server API
stats every hour, and read the composite trips only when they were updated.

However, after the improvements to the trip segmentation pipeline
(e-mission#1014, results in
e-mission/e-mission-docs#1105 (comment))
reading the server stats is now the bottleneck.

Checking the computation time on the big deployments (e.g. ccebikes), although
the time taken has significantly improved as the database load has gone down,
even in the past two days, we see a median of ~ 10 seconds and a max of over
two minutes.

And we don't really need to query this data to generate the call summary
statistics. Instead of computing them on every run, we can compute them only
when _they_ change, which happens when we receive calls to the API.

So, in the API's after_request hook, in addition to adding a new stat, we can
also just update the `last_call_ts` in the profile. This potentially makes
every API call a teeny bit slower since we are adding one more DB write, but it
significantly lowers the DB load, so should make the system as a whole faster.

Testing done:
- the modified `TestUserStat` test for the `last_call_ts` passes
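A framework-agnostic sketch of the idea in this commit message. The in-memory `stats` and `profiles` stores and the `after_request` / `call_summary` functions are hypothetical stand-ins for the server's database collections and its web framework's after-request hook; only the `last_call_ts` field name comes from the commit.

```python
import time
from typing import Dict, List, Optional

# Hypothetical in-memory stand-ins for the stats and profile collections
stats: List[dict] = []
profiles: Dict[str, dict] = {}


def after_request(user_id: str, endpoint: str, now: Optional[float] = None) -> None:
    """Record the API call stat, then cache last_call_ts on the profile.

    One extra write per call replaces re-reading the call stats on every
    pipeline run to find the most recent call.
    """
    ts = time.time() if now is None else now
    stats.append({"user_id": user_id, "name": endpoint, "ts": ts})
    profiles.setdefault(user_id, {})["last_call_ts"] = ts


def call_summary(user_id: str) -> Optional[float]:
    """Read the cached value instead of querying the stats."""
    return profiles.get(user_id, {}).get("last_call_ts")


after_request("u1", "endpoint-a", now=100.0)
after_request("u1", "endpoint-b", now=250.0)
print(call_summary("u1"))  # 250.0
```

The trade-off matches the commit message: each call pays for one small write, while the hourly pipeline no longer needs to scan the stats at all to produce the call summary.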
@TeachMeTW
Contributor

@shankari


Tested on CCEbike. As you said, trajectories took a while to load but eventually did.

@TeachMeTW
Contributor

@shankari
Test Stage 2/12/25:

Test Process:

  • Open Overview Page
  • Open Data page
  • Visit UUIDs Tab
  • Visit Trips Tab
  • Visit Demographics Tab
  • Visit Trajectories Tab
  • Changed to background/location
  • Open Maps Page
  • Change Map type to Density
  • Change modes to Unlabeled

Test Notes:

  • Instant Overview Page loaded
  • UUIDs loaded normally and as expected on batches
  • Trips loaded instantly
  • Demographics the same
  • Trajectories quite slow; does this need batch loading? This applies to both recreated and background location, and it is especially slow on background location (a couple of minutes).
  • Map Performed Normally

I also tested the new stats and added them to the op-admin-dashboard:

#164

@shankari
Contributor Author

@TeachMeTW are we testing this in production? I don't see any updates here, and the changes have been on production for a couple of weeks.

@TeachMeTW
Contributor

TeachMeTW commented Feb 20, 2025

> @TeachMeTW are we testing this in production? I don't see any updates here, and the changes have been on production for a couple of weeks.

The 4 production environments? Yes, I still am. There are no real updates; it is the same as #145 (comment): regardless of the environment, the Trajectories tab is quite slow. Nothing new to report other than that.

Should I make daily updates or only update when things are different?

@shankari
Contributor Author

I think at least a weekly update would be good.

TeachMeTW pushed a commit to TeachMeTW/e-mission-server that referenced this issue Feb 26, 2025
@TeachMeTW
Contributor

Tested this week:

Test Process:

  • Open Overview Page
  • Open Data page
  • Visit UUIDs Tab
  • Visit Trips Tab
  • Visit Demographics Tab
  • Visit Trajectories Tab
  • Changed to background/location
  • Open Maps Page
  • Change Map type to Density
  • Change modes to Unlabeled

Test Notes:

  • Instant Overview Page loaded
  • UUIDs loaded normally and as expected on batches
  • Trips loaded instantly
  • Demographics the same
  • Trajectories normal for stage
  • Map Performed Normally
