Draft PR For Distance Confusion Matrices #71

Open
wants to merge 11 commits into
base: master

rahulkulhalli

Questions:

  1. How are we expected to infer the modes of each trip for the reference GT trajectory?
  2. Not all of the 6 runs (for the case of the LA spec) are read. Instead, only runs 0, 1, and 2 are read. Is this expected?
  3. Note: The CM code is non-functional as of now

@shankari
Collaborator

@rahulkulhalli can you update the PR after removing outputs? That will let me see the diff.
Feel free to put the outputs in as a separate file in ./examples_with_outputs

@shankari
Collaborator

@rahulkulhalli to remove outputs, you can use the "Restart kernel and clear outputs" option in a standard jupyter notebook.

@rahulkulhalli
Author

rahulkulhalli commented Jul 22, 2023

@shankari Outputs cleared.

Collaborator

@shankari shankari left a comment

Nit: In general, it would be good if you could avoid extraneous changes to make it easier to review. I understand that the id changes may be unavoidable but there are some whitespace changes as well.

@rahulkulhalli
Author

@shankari I've amended the code. It now generates the CMs for the trajectory data. Please review it when you are available.

Collaborator

@shankari shankari left a comment

@rahulkulhalli High level looks fine. Suggested changes:

  • remove FILE_MAPPING
  • rename all gt_ variables to ref_
  • consider treating all runs the same way (don't special case 0)
  • remove extraneous changes for easier review
  • fix hardcoded replacement of dur with dist

Note that we had discussed generalizing this notebook instead of making a copy, or replacing values. So you could pass in a flag for the suffix, and have an associated function to compute the metric.

Once you refactor to pass in "dist" or "duration", it would be cool to compare them. Are the duration CMs fairly close to the distance CMs in terms of proportion/probability, or are they different? We might also convert the CM into probabilities (divide by the sum?) so that they are both directly comparable without relying on the CM colors.
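
The suggestion to pass in "dist" or "duration" with an associated metric function could look roughly like the sketch below. It is an illustration under assumptions, not the notebook's code: `METRICS`, `duration_metric`, `distance_metric`, and `compute_metric` are hypothetical names, the reference trajectory is assumed to be a DataFrame with `ts`, `latitude`, and `longitude` columns, and distance is approximated as the haversine path length between consecutive points.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, used by the haversine formula


def duration_metric(ref_trajectory, range_start, range_end):
    """Duration of the range in seconds; the trajectory is not needed."""
    return range_end - range_start


def distance_metric(ref_trajectory, range_start, range_end):
    """Haversine path length (metres) of the reference points inside the range."""
    in_range = ref_trajectory[
        (ref_trajectory["ts"] >= range_start) & (ref_trajectory["ts"] <= range_end)
    ]
    if len(in_range) < 2:
        return 0.0
    lat = np.radians(in_range["latitude"].to_numpy())
    lon = np.radians(in_range["longitude"].to_numpy())
    dlat, dlon = np.diff(lat), np.diff(lon)
    a = np.sin(dlat / 2) ** 2 + np.cos(lat[:-1]) * np.cos(lat[1:]) * np.sin(dlon / 2) ** 2
    return float(np.sum(2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))))


# Hypothetical lookup from the criterion flag to the function that computes it.
METRICS = {"duration": duration_metric, "distance": distance_metric}


def compute_metric(criterion, ref_trajectory, range_start, range_end):
    """Dispatch on the criterion so callers never special-case the metric."""
    return METRICS[criterion](ref_trajectory, range_start, range_end)
```

With this shape, the confusion-matrix code only needs the criterion string, and comparing the two metrics means calling the same function twice.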

Comment on lines 202 to 206
"FILE_MAPPING = {\n",
" pv_la: \"unimodal_trip_car_bike_mtv_la\",\n",
" pv_sj: \"car_scooter_brex_san_jose\",\n",
" # pv_ucb: \"train_bus_ebike_mtv_ucb\"\n",
"}"
Collaborator

I thought that this was temporary code, so I didn't comment on it earlier. You should be able to retrieve the timeline id directly from the pv by using pv.spec_details.spec_id

"source": [
"from pathlib import Path\n",
"\n",
"def get_reference_trajectory(trip_id: str, section_id: str):\n",
Collaborator

If you are going to concatenate all the data for a single trip, then why do you need the section id?

Author

Thank you, that's a valid assessment. Initially, I aimed to design the function so as to be able to return a single data frame. I will make amends accordingly.

Collaborator

@shankari shankari Jul 23, 2023

@rahulkulhalli it's fine to leave as is as well - see my next comment below.
Just document why you chose the final design

Comment on lines 322 to 328
" # now, we build a timeline for each trip\n",
" trip = tr.copy()\n",
" trip['ss_timeline'] = tr_ss\n",
" trip['gts_timeline'] = tr_gts\n",
" trip['gt_trajectory'] = pd.concat(gt_traj, axis=0).reset_index(drop=True, inplace=False)\n",
" \n",
" trips.append(trip)\n",
Collaborator

weren't you going to move this into the if loop?

Comment on lines 724 to 725
" if filtered_gt_distance.shape[0] == 0:\n",
" dist = 0\n",
Collaborator

Does this happen a lot? I would not expect that to happen and we might want to assert instead

Author

It happens on some occasions, yes.

Collaborator

@rahulkulhalli can we print out the trip/section for which this happens (in addition to setting dist=0)? I don't think this should happen, and I would like to verify against the stored reference trajectories for the specific use cases.
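
Reporting the offending trip and section while still falling back to dist = 0 might look like the sketch below. The helper name `ref_distance_or_zero`, its arguments, and the per-point `distance` column are assumptions made for illustration; the notebook's actual distance computation is not shown in this thread.

```python
import pandas as pd


def ref_distance_or_zero(filtered_ref_distance, trip_id, section_id):
    """Sum the (assumed) per-point `distance` column, logging empty inputs."""
    if filtered_ref_distance.shape[0] == 0:
        # Keep the existing fallback, but record where it happened so the case
        # can be checked against the stored reference trajectories.
        print(f"No reference points for trip={trip_id!r}, section={section_id!r}; setting dist=0")
        return 0.0
    return float(filtered_ref_distance["distance"].sum())
```

If the printed list turns out to be empty or explainable, the fallback could later be replaced by an assert.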

@shankari
Collaborator

shankari commented Jul 23, 2023

We might also convert the CM into probabilities (divide by the sum?) so that they are both directly comparable without relying on the CM colors.

@rahulkulhalli There is a standard scikit-learn method to normalize confusion matrices that you can look at to do this 😄

@rahulkulhalli
Author

@shankari Updated the notebook and added an entry to .gitignore. Amendments made:

  • Renamed references as recommended
  • Synced fork with main and incorporated ucb CM data
  • Removed debugging print statements

The code runs up to plot_cm(). Please generate the CMs and verify that they match expectations.

TODO:

  • Incorporate changes for plot_f_score()
  • Rename some titles for the confusion matrices
  • Incorporate count normalization

@rahulkulhalli
Author

We might also convert the CM into probabilities (divide by the sum?) so that they are both directly comparable without relying on the CM colors.

@rahulkulhalli There is a standard scikit-learn method to normalize confusion matrices that you can look at to do this 😄

What normalization would you like me to incorporate? We could normalize w.r.t. the rows, the columns, or both. I can also add an extra kwarg in the plot function that would accept the desired normalization mode (None by default)
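
As a rough sketch of that kwarg: the function below takes a confusion matrix as a DataFrame and a `normalization` mode. The mode names `'true'`, `'pred'`, and `'all'` mirror the `normalize` parameter of scikit-learn's `confusion_matrix`; the function name `normalize_cm` and the layout (rows are reference modes, columns are predicted modes) are assumptions, and the notebook may orient its matrix differently.

```python
import pandas as pd


def normalize_cm(cm, normalization=None):
    """Normalize a confusion matrix (assumed layout: rows=reference, columns=predicted)."""
    if normalization is None:
        return cm  # raw counts, unchanged
    if normalization == "true":
        # Each row sums to 1: distribution of predictions per reference mode.
        normalized = cm.div(cm.sum(axis=1), axis=0)
    elif normalization == "pred":
        # Each column sums to 1: distribution of reference modes per prediction.
        normalized = cm.div(cm.sum(axis=0), axis=1)
    elif normalization == "all":
        # The whole matrix sums to 1.
        normalized = cm / cm.to_numpy().sum()
    else:
        raise ValueError(f"Unknown normalization mode: {normalization!r}")
    # A row or column with no entries produces 0/0; report those cells as 0.
    return normalized.fillna(0.0)
```

Keeping `None` as the default leaves the existing integer-count plots untouched.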

Collaborator

@shankari shankari left a comment

Can you also put the normalized CM (at least column-normalized) into the issue conversation so I don't need to run the notebook?

Very few changes this time, getting there...

@@ -708,7 +683,7 @@
" trips = test_trip if type(test_trip) is list else [test_trip]\n",
" TP, FN, FP, TN = {}, {}, {}, {}\n",
" for trip in trips:\n",
" gt_trajectory = trip['gt_trajectory']\n",
" trajectory_data = trip['trajectory_data']\n",
Collaborator

I'm going to ask for changes again, but can we use the word reference or the prefix ref? that helps us distinguish between sensed trajectories and the reference trajectories.

Author

Noted!

@rahulkulhalli
Author

@shankari, while running a sanity check against the notebook, I found this in the trajectory data:

| ts | longitude | latitude | geometry | source | fmt_time | run |
|---|---|---|---|---|---|---|
| 1564279411.000000 | -122.110771 | 37.381396 | POINT (-122.1107706521593 37.38139592982434) | midpoint | 2019-07-27T19:03:31-07:00 | 0 |
| 1564279412.000000 | -122.110734 | 37.381398 | POINT (-122.1107336965717 37.38139812812486) | midpoint | 2019-07-27T19:03:32-07:00 | 0 |
| 1564279413.000000 | -122.110698 | 37.381402 | POINT (-122.1106982006907 37.38140218597469) | midpoint | 2019-07-27T19:03:33-07:00 | 0 |
| 1564279414.000000 | -122.110669 | 37.381407 | POINT (-122.1106688924615 37.38140720835417) | midpoint | 2019-07-27T19:03:34-07:00 | 0 |
| 1564279415.000000 | -122.110632 | 37.381407 | POINT (-122.1106316354284 37.38140690330513) | midpoint | 2019-07-27T19:03:35-07:00 | 0 |
| 1567295351.000000 | -122.084306 | 37.389666 | POINT (-122.0843056 37.3896664) | android | 2019-08-31T16:49:11-07:00 | 5 |

There seem to be some readings with source == 'midpoint'. These readings will never be read because we filter only on either 'android' or 'iOS'. How would you like me to handle this case?

@shankari
Collaborator

shankari commented Jul 25, 2023

@rahulkulhalli we should not filter the reference trajectory on android or ios. The reference trajectory is like derived ground truth. There is not a separate ground truth for android and for iOS. So there is not a separate reference trajectory for android and iOS. The sensed trajectories are android or iOS specific, the reference trajectories are not.

For the record, 'midpoint' means that the reference trajectory for that section was the midpoint of the android and iOS accuracy control streams.
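
In code, that means selecting reference points by time range only, with no condition on `source`. A minimal sketch, assuming the reference trajectory is a DataFrame with the `ts` and `source` columns shown in the table above; `filter_ref_trajectory` is a hypothetical helper name.

```python
import pandas as pd


def filter_ref_trajectory(ref_trajectory, range_start, range_end):
    """Select reference points in [range_start, range_end], whatever their source."""
    in_range = (ref_trajectory["ts"] >= range_start) & (ref_trajectory["ts"] <= range_end)
    # Deliberately no filter on `source`: 'midpoint', 'android', and 'ios' rows
    # all belong to the single reference trajectory.
    return ref_trajectory[in_range]
```

The sensed trajectories can still be filtered per phone OS; only the reference side drops that condition.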

@rahulkulhalli
Author

rahulkulhalli commented Jul 26, 2023

gis_distance_cm_ios_normalized
gis_duration_cm_ios_normalized

iOS distance and duration confusion matrices [Top: Distance, Bottom: Duration]

@rahulkulhalli
Author

gis_distance_cm_android_normalized
gis_duration_cm_android_normalized

Android distance and duration confusion matrices [Top: Distance, Bottom: Duration]

Collaborator

@shankari shankari left a comment

More minor fixes - mainly around comments and variable names

@@ -293,20 +278,26 @@
"\n",
" tr_gts.append(gts)\n",
"\n",
" trajectory_data = get_reference_trajectory(pv.spec_details.CURR_SPEC_ID, tr['trip_id_base'])\n",
" trajectory_data = get_reference_trajectory(pv.spec_details.CURR_SPEC_ID, trip['trip_id_base'])\n",
Collaborator

Suggested change
" trajectory_data = get_reference_trajectory(pv.spec_details.CURR_SPEC_ID, trip['trip_id_base'])\n",
" curr_ref_trajectory = get_reference_trajectory(pv.spec_details.CURR_SPEC_ID, trip['trip_id_base'])\n",

Comment on lines 1144 to 1146
"\n",
" if normalization == 'pred':\n",
" # After transposing, predictions are axis=0\n",
Collaborator

Suggested change
"\n",
" if normalization == 'pred':\n",
" # After transposing, predictions are axis=0\n",
"\n",
" # Copied from the sklearn implementation of confusion matrix.normalize\n",
" ..... (add URL Here) \n",
" if normalization == 'pred':\n",
" # After transposing, predictions are axis=0\n",

Comment on lines 1174 to 1178
" ax[k].text(j, i, np.round(df.transpose().iat[i,j], 3), horizontalalignment='center', \n",
" color='white' \n",
" if df.transpose().iat[i,j] < color_thresh \n",
" else 'black')\n",
" \n",
Collaborator

what is the rationale for this change (e.g. adding the 3 parameter?)

Author

After normalizing, the calculated values are floating point values and do not round-off whilst displaying the confusion matrix. This makes the display texts overlap each other, rendering the entire confusion matrix unreadable. If normalization is not required, we cast to int (default implementation)

Author

I assumed that 3 digit precision would be an ideal trade-off between readability and precision.
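
The display rule described here can be sketched as a small helper. `format_cm_cell` and its `normalized` flag are hypothetical names, not the notebook's; the 3-digit default follows the choice explained above.

```python
def format_cm_cell(value, normalized, digits=3):
    """Text to draw in one confusion-matrix cell."""
    if normalized:
        # Normalized cells are fractions; round them so labels do not overlap.
        return str(round(float(value), digits))
    # Raw cells are counts; show them as integers, as in the default plot.
    return str(int(value))
```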

Comment on lines 1549 to 1550
"def export_confusion_matrix(matrix: pd.DataFrame, output_dir: Path):\n",
" pass"
Collaborator

haven't you implemented this yet?

Author

Yes, it is implemented. I will push the finished code.

@shankari
Collaborator

shankari commented Jul 26, 2023

These:
#71 (comment)
#71 (comment)

are very sketchy. While the distance matrix might have a different proportion than the duration matrix, I find it hard to believe that there could be so many more zeros in distance versus duration.

(As an aside, these should have "distance" and "duration" in the title to distinguish between them. I checked for it in the code review and thought I saw it, but it is only there for plot_selected; you should fix it for plot_cm as well.)
https://github.com/MobilityNet/mobilitynet-analysis-scripts/pull/71/files#diff-8c69e252ed21d8d644df6c004d8bd58a514637b6604b3b6416d64969c3b58bcfR1235

I would focus on the selected matrix, and the entries that are obviously and blatantly different.
For example, for 'BUS' and 'GIS' and MAHFDC we have a pretty substantial difference. And there aren't that many bus trips.
Can we look closely at the BUS ground truthed trips (skip the loop if the gt mode is not BUS) and see where the difference is?

Screenshot 2023-07-26 at 12 07 56 PM Screenshot 2023-07-26 at 12 08 24 PM

I will try to do this too, but unfortunately, I have other priorities to work on as well 😄

@shankari
Collaborator

shankari commented Jul 26, 2023

or maybe even better is GIS, iOS, HAMFDC has a zero for distance

Screenshot 2023-07-26 at 12 20 31 PM Screenshot 2023-07-26 at 12 18 28 PM

Or HAHFDC, GIS, iOS
...

By focusing on a single mode, we can expect the distance and duration to be proportional. If not, we may have a bug.
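
One way to look for such cells, sketched under assumptions: given a distance CM and a duration CM as DataFrames with identical row and column labels, list the cells that are zero in one matrix but not in the other. `find_zero_mismatches` is a hypothetical helper, not part of the notebook.

```python
import pandas as pd


def find_zero_mismatches(distance_cm, duration_cm):
    """Return (row, column) labels of cells that are zero in exactly one matrix.

    Both matrices must share the same index and columns; pandas raises a
    ValueError when comparing differently labeled DataFrames.
    """
    mismatch = (distance_cm == 0) != (duration_cm == 0)
    return [
        (row, col)
        for row in mismatch.index
        for col in mismatch.columns
        if mismatch.loc[row, col]
    ]
```

Running this on the matrices for a single ground-truth mode keeps the list short enough to check each cell by hand.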

@rahulkulhalli
Author

Noted. I will focus on the recommended parameters and see if there is any bug in the code.

@shankari
Collaborator

@rahulkulhalli when did you last run the duration code? I wrote a new function to generate side by side distance and duration matrices and I am seeing a lot of zeros in the duration matrix as well. And of course, the notebook has been running only distance lately.

image

@shankari
Collaborator

shankari commented Jul 26, 2023

Note also that removing the normalization makes it seem better, at least for duration.
image

@shankari
Collaborator

Duration CM: From the paper (L), from the computation in the notebook (R), from the comparison (bottom):
So we have at least not regressed on the duration values

Screenshot 2023-07-26 at 3 14 01 PM Screenshot 2023-07-26 at 3 18 46 PM Screenshot 2023-07-26 at 3 22 02 PM

@rahulkulhalli
Author

@rahulkulhalli when did you last run the duration code? I wrote a new function to generate side by side distance and duration matrices and I am seeing a lot of zeros in the duration matrix as well. And of course, the notebook has been running only distance lately.

image

I ran it today. The generated plots are from this afternoon.

@shankari
Collaborator

(Duration) [Bicycling, NO_SENSED_MIDDLE]: 455 | (Distance) [Bicycling, NO_SENSED_MIDDLE]: 0

I would want to understand this better before declaring victory. It is super small, but I am worried that it might be a symptom of something more serious that we are missing.

@rahulkulhalli
Author

I agree. I will investigate why the NO_SENSED_MIDDLE values are being missed.

@shankari
Collaborator

shankari commented Jul 27, 2023

I tried to hack together a solution for the polygons for at least the start and end of travel legs by forcing the use of the "midpoint" method.

The good news is that we do have a filled in reference trajectory. The bad news is that because we consider all the inside_polygons trajectories together, we end up with a line between the start and end polygons, which messes things up.

    # Location data for the android (a) and iOS (b) control phones
    unfiltered_loc_df_a = emd.to_geo_df(e["temporal_control"]["android"]["location_df"])
    unfiltered_loc_df_b = emd.to_geo_df(e["temporal_control"]["ios"]["location_df"])
    # Keep only the points that fall inside the start/end polygons
    ends_loc_df_a = unfiltered_loc_df_a.query("outside_polygons==False")
    ends_loc_df_b = unfiltered_loc_df_b.query("outside_polygons==False")
    new_location_df_a = get_int_aligned_trajectory(ends_loc_df_a, tz)
    new_location_df_i = get_int_aligned_trajectory(ends_loc_df_b, tz)
Screenshot 2023-07-27 at 11 33 39 AM

@rahulkulhalli
Author

rahulkulhalli commented Jul 27, 2023

Investigation log no. 1:

Confusion matrix invocation: get_confusion_matrix('ios', 'HAHFDC', gisv_la, criterion='distance')

Output:
Found pred=NO_SENSED_MIDDLE for gts['mode']='WALKING'
range_start=1564279157.495898, range_end=1564279266.611826, ref_trajectory.ts.min()=1564274457.0, ref_trajectory.ts.max()=1564280331.0
Duration: 1.8185987989107768 minutes
++++++++++
Found pred=NO_SENSED_MIDDLE for gts['mode']='BICYCLING'
range_start=1564279266.626976, range_end=1564279335.2597582, ref_trajectory.ts.min()=1564274457.0, ref_trajectory.ts.max()=1564280331.0
Duration: 1.1438797036806743 minutes
++++++++++
Found pred=NO_SENSED_MIDDLE for gts['mode']='BICYCLING'
range_start=1564280388.9959145, range_end=1564280393.9957416, ref_trajectory.ts.min()=1564274457.0, ref_trajectory.ts.max()=1564280331.0
Duration: 0.08333045244216919 minutes
++++++++++
Found pred=NO_SENSED_MIDDLE for gts['mode']='WALKING'
range_start=1564339686.740189, range_end=1564339729.558238, ref_trajectory.ts.min()=1564334472.0, ref_trajectory.ts.max()=1564340862.0
Duration: 0.7136341492335002 minutes
++++++++++
Found pred=NO_SENSED_MIDDLE for gts['mode']='BICYCLING'
range_start=1564339729.5745811, range_end=1564339864.2439723, ref_trajectory.ts.min()=1564334472.0, ref_trajectory.ts.max()=1564340862.0
Duration: 2.244489852587382 minutes
++++++++++
Found pred=NO_SENSED_MIDDLE for gts['mode']='WALKING'
range_start=1564340965.9951763, range_end=1564340970.9950037, ref_trajectory.ts.min()=1564334472.0, ref_trajectory.ts.max()=1564340862.0
Duration: 0.08333045641581217 minutes
++++++++++
Found pred=NO_SENSED_MIDDLE for gts['mode']='WALKING'
range_start=1564356822.1187859, range_end=1564356889.523539, ref_trajectory.ts.min()=1564351534.0, ref_trajectory.ts.max()=1564357955.0
Duration: 1.12341255346934 minutes
++++++++++
Found pred=NO_SENSED_MIDDLE for gts['mode']='BICYCLING'
range_start=1564356889.550396, range_end=1564356954.610282, ref_trajectory.ts.min()=1564351534.0, ref_trajectory.ts.max()=1564357955.0
Duration: 1.0843314329783122 minutes
++++++++++
Found pred=NO_SENSED_MIDDLE for gts['mode']='BICYCLING'
range_start=1564358014.9966855, range_end=1564358016.9966118, ref_trajectory.ts.min()=1564351534.0, ref_trajectory.ts.max()=1564357955.0
Duration: 0.03333210547765096 minutes
++++++++++
Found pred=NO_SENSED_MIDDLE for gts['mode']='WALKING'
range_start=1565575844.882667, range_end=1565575868.047845, ref_trajectory.ts.min()=1565571209.0, ref_trajectory.ts.max()=1565576969.0
Duration: 0.3860862970352173 minutes
++++++++++
Found pred=NO_SENSED_MIDDLE for gts['mode']='BICYCLING'
range_start=1565575868.070318, range_end=1565575959.262051, ref_trajectory.ts.min()=1565571209.0, ref_trajectory.ts.max()=1565576969.0
Duration: 1.5198622186978659 minutes
++++++++++
Found pred=NO_SENSED_MIDDLE for gts['mode']='BICYCLING'
range_start=1565577019.9966128, range_end=1565577022.9965, ref_trajectory.ts.min()=1565571209.0, ref_trajectory.ts.max()=1565576969.0
Duration: 0.049998120466868086 minutes
++++++++++
Found pred=NO_SENSED_MIDDLE for gts['mode']='WALKING'
range_start=1567294205.318416, range_end=1567294273.010757, ref_trajectory.ts.min()=1567288839.0, ref_trajectory.ts.max()=1567295355.0
Duration: 1.1282056808471679 minutes
++++++++++
Found pred=NO_SENSED_MIDDLE for gts['mode']='BICYCLING'
range_start=1567294273.024204, range_end=1567294352.9456358, ref_trajectory.ts.min()=1567288839.0, ref_trajectory.ts.max()=1567295355.0
Duration: 1.3320238629976908 minutes
++++++++++
Found pred=NO_SENSED_MIDDLE for gts['mode']='BICYCLING'
range_start=1567295410.9967937, range_end=1567295414.9966612, ref_trajectory.ts.min()=1567288839.0, ref_trajectory.ts.max()=1567295355.0
Duration: 0.06666445732116699 minutes
++++++++++

Majority NO_SENSED_MIDDLEs for gts['mode'] = BICYCLING
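
The log prints both the queried range and the reference trajectory's time span, so one check it makes possible is whether each range lies inside that span. A minimal sketch, with `range_within_reference` as a hypothetical helper; the two calls use values copied from the first and third entries of the log above. Whether ranges outside the span account for the missing distances is not established here.

```python
def range_within_reference(range_start, range_end, ref_ts_min, ref_ts_max):
    """True if [range_start, range_end] lies entirely inside the reference span."""
    return ref_ts_min <= range_start and range_end <= ref_ts_max


# First log entry (WALKING): the range is inside the reference span.
print(range_within_reference(1564279157.495898, 1564279266.611826,
                             1564274457.0, 1564280331.0))  # True

# Third log entry (BICYCLING): the range starts after ref_trajectory.ts.max().
print(range_within_reference(1564280388.9959145, 1564280393.9957416,
                             1564274457.0, 1564280331.0))  # False
```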

@rahulkulhalli
Author

rahulkulhalli commented Jul 27, 2023

Investigation log 2:

Confusion matrix invocation: get_confusion_matrix('ios', 'HAHFDC', gisv_sj, criterion='distance')

Output:
Found pred=NO_SENSED_MIDDLE for gts['mode']='CAR'
range_start=1563823262.9935746, range_end=1563823264.9935062, ref_trajectory.ts.min()=1563821657.0, ref_trajectory.ts.max()=1563842610.0
Duration: 0.03333219289779663 minutes
++++++++++
Found pred=NO_SENSED_MIDDLE for gts['mode']='WALKING'
range_start=1563837063.449956, range_end=1563837459.9278617, ref_trajectory.ts.min()=1563821657.0, ref_trajectory.ts.max()=1563842610.0
Duration: 6.607965095837911 minutes
++++++++++
Found pred=NO_SENSED_MIDDLE for gts['mode']='WALKING'
range_start=1563842069.9949102, range_end=1563842070.994874, ref_trajectory.ts.min()=1563821657.0, ref_trajectory.ts.max()=1563842610.0
Duration: 0.016666062672932944 minutes
++++++++++
Found pred=NO_SENSED_MIDDLE for gts['mode']='WALKING'
range_start=1563898620.9937162, range_end=1563898621.993705, ref_trajectory.ts.min()=1563896908.0, ref_trajectory.ts.max()=1563915576.0
Duration: 0.016666479905446372 minutes
++++++++++
Found pred=NO_SENSED_MIDDLE for gts['mode']='WALKING'
range_start=1563910899.814049, range_end=1563911016.306354, ref_trajectory.ts.min()=1563896908.0, ref_trajectory.ts.max()=1563915576.0
Duration: 1.9415384173393249 minutes
++++++++++
Found pred=NO_SENSED_MIDDLE for gts['mode']='WALKING'
range_start=1563915062.990666, range_end=1563915063.990633, ref_trajectory.ts.min()=1563896908.0, ref_trajectory.ts.max()=1563915576.0
Duration: 0.016666118303934732 minutes
++++++++++
Found pred=NO_SENSED_MIDDLE for gts['mode']='WALKING'
range_start=1564245616.9967937, range_end=1564245620.9966562, ref_trajectory.ts.min()=1564244205.0, ref_trajectory.ts.max()=1564261764.0
Duration: 0.06666437387466431 minutes
++++++++++
Found pred=NO_SENSED_MIDDLE for gts['mode']='WALKING'
range_start=1564257218.267362, range_end=1564257311.928831, ref_trajectory.ts.min()=1564244205.0, ref_trajectory.ts.max()=1564261764.0
Duration: 1.5610244830449422 minutes
++++++++++
Found pred=NO_SENSED_MIDDLE for gts['mode']='WALKING'
range_start=1564261282.9902134, range_end=1564261283.990182, ref_trajectory.ts.min()=1564244205.0, ref_trajectory.ts.max()=1564261764.0
Duration: 0.016666142145792644 minutes
++++++++++
Found pred=NO_SENSED_MIDDLE for gts['mode']='WALKING'
range_start=1565027758.9921055, range_end=1565027759.9920692, ref_trajectory.ts.min()=1565026142.0, ref_trajectory.ts.max()=1565047083.0
Duration: 0.016666062672932944 minutes
++++++++++
Found pred=NO_SENSED_MIDDLE for gts['mode']='WALKING'
range_start=1565042553.9210582, range_end=1565042612.198419, ref_trajectory.ts.min()=1565026142.0, ref_trajectory.ts.max()=1565047083.0
Duration: 0.9712893486022949 minutes
++++++++++
Found pred=NO_SENSED_MIDDLE for gts['mode']='ESCOOTER'
range_start=1565042612.2229319, range_end=1565042678.6395702, ref_trajectory.ts.min()=1565026142.0, ref_trajectory.ts.max()=1565047083.0
Duration: 1.1069439729054769 minutes
++++++++++
Found pred=NO_SENSED_MIDDLE for gts['mode']='WALKING'
range_start=1565044004.987143, range_end=1565044006.9870703, ref_trajectory.ts.min()=1565026142.0, ref_trajectory.ts.max()=1565047083.0
Duration: 0.0333321213722229 minutes
++++++++++
Found pred=NO_SENSED_MIDDLE for gts['mode']='WALKING'
range_start=1565046555.9895692, range_end=1565046557.9895043, ref_trajectory.ts.min()=1565026142.0, ref_trajectory.ts.max()=1565047083.0
Duration: 0.03333225250244141 minutes
++++++++++
Found pred=NO_SENSED_MIDDLE for gts['mode']='WALKING'
range_start=1565131571.756772, range_end=1565131719.29528, ref_trajectory.ts.min()=1565116212.0, ref_trajectory.ts.max()=1565136530.0
Duration: 2.4589751323064166 minutes
++++++++++
Found pred=NO_SENSED_MIDDLE for gts['mode']='WALKING'
range_start=1565136013.9938207, range_end=1565136014.9937887, ref_trajectory.ts.min()=1565116212.0, ref_trajectory.ts.max()=1565136530.0
Duration: 0.016666134198506672 minutes
++++++++++
Found pred=NO_SENSED_MIDDLE for gts['mode']='WALKING'
range_start=1565196433.9950595, range_end=1565196434.9950233, ref_trajectory.ts.min()=1565194838.0, ref_trajectory.ts.max()=1565214609.0
Duration: 0.016666062672932944 minutes
++++++++++
Found pred=NO_SENSED_MIDDLE for gts['mode']='ESCOOTER'
range_start=1565210594.9998326, range_end=1565210595.9998245, ref_trajectory.ts.min()=1565194838.0, ref_trajectory.ts.max()=1565214609.0
Duration: 0.016666531562805176 minutes
++++++++++
Found pred=NO_SENSED_MIDDLE for gts['mode']='BUS'
range_start=1565214084.9878192, range_end=1565214087.9877076, ref_trajectory.ts.min()=1565194838.0, ref_trajectory.ts.max()=1565214609.0
Duration: 0.049998140335083006 minutes
++++++++++

Majority NO_SENSED_MIDDLEs for gts['mode'] = WALKING

@shankari
Collaborator

shankari commented Jul 27, 2023

Seems like the highest duration misses (duration > 3 minutes) are when pred = NO_SENSED_MIDDLE and gt = NO_GT_MIDDLE

Right, so we are excluding all NO_GT_MIDDLE in my patch
#71 (comment)
#71 (comment)

I really just wanted to understand the NO_SENSED_MIDDLE bicycling trip. Were we missing a reference trajectory there?

@rahulkulhalli
Author

rahulkulhalli commented Jul 27, 2023

Log for UCB:

Invocation parameters: get_confusion_matrix('ios', 'HAHFDC', gisv_ucb, criterion='distance')

Output:
Found pred=NO_SENSED_MIDDLE for gts['mode']='TRAIN'
range_start=1563982148.860445, range_end=1563982185.4986596, ref_trajectory.ts.min()=1563979911.0, ref_trajectory.ts.max()=1564023555.0
Duration: 0.6106369098027548 minutes
++++++++++
Found pred=NO_SENSED_MIDDLE for gts['mode']='WALKING'
range_start=1563984657.9924605, range_end=1563984658.9924257, ref_trajectory.ts.min()=1563979911.0, ref_trajectory.ts.max()=1564023555.0
Duration: 0.016666086514790852 minutes
++++++++++
Found pred=NO_SENSED_MIDDLE for gts['mode']='WALKING'
range_start=1563989116.9876733, range_end=1563989125.9873734, ref_trajectory.ts.min()=1563979911.0, ref_trajectory.ts.max()=1564023555.0
Duration: 0.14999500115712483 minutes
++++++++++
Found pred=NO_SENSED_MIDDLE for gts['mode']='E_BIKE'
range_start=1564012923.9893892, range_end=1564012924.78658, ref_trajectory.ts.min()=1563979911.0, ref_trajectory.ts.max()=1564023555.0
Duration: 0.01328651507695516 minutes
++++++++++
Found pred=NO_SENSED_MIDDLE for gts['mode']='WALKING'
range_start=1564012924.802222, range_end=1564012931.9890728, ref_trajectory.ts.min()=1563979911.0, ref_trajectory.ts.max()=1564023555.0
Duration: 0.11978084643681844 minutes
++++++++++
Found pred=NO_SENSED_MIDDLE for gts['mode']='WALKING'
range_start=1564016393.9962385, range_end=1564016394.9962049, ref_trajectory.ts.min()=1563979911.0, ref_trajectory.ts.max()=1564023555.0
Duration: 0.016666106383005776 minutes
++++++++++
Found pred=NO_SENSED_MIDDLE for gts['mode']='WALKING'
range_start=1564018043.998538, range_end=1564018044.9985135, ref_trajectory.ts.min()=1563979911.0, ref_trajectory.ts.max()=1564023555.0
Duration: 0.01666625738143921 minutes
++++++++++
Found pred=NO_SENSED_MIDDLE for gts['mode']='WALKING'
range_start=1564022442.9959133, range_end=1564022446.9957867, ref_trajectory.ts.min()=1563979911.0, ref_trajectory.ts.max()=1564023555.0
Duration: 0.06666455666224162 minutes
++++++++++
Found pred=NO_SENSED_MIDDLE for gts['mode']='WALKING'
range_start=1564070915.985266, range_end=1564070917.9851942, ref_trajectory.ts.min()=1564067195.0, ref_trajectory.ts.max()=1564109948.0
Duration: 0.03333213726679484 minutes
++++++++++
Found pred=NO_SENSED_MIDDLE for gts['mode']='WALKING'
range_start=1564071255.992466, range_end=1564071256.9924357, ref_trajectory.ts.min()=1564067195.0, ref_trajectory.ts.max()=1564109948.0
Duration: 0.016666162014007568 minutes
++++++++++
Found pred=NO_SENSED_MIDDLE for gts['mode']='SUBWAY'
range_start=1564074817.2283502, range_end=1564074824.926834, ref_trajectory.ts.min()=1564067195.0, ref_trajectory.ts.max()=1564109948.0
Duration: 0.12830806573232015 minutes
++++++++++
Found pred=NO_SENSED_MIDDLE for gts['mode']='WALKING'
range_start=1564074824.940014, range_end=1564074841.207154, ref_trajectory.ts.min()=1564067195.0, ref_trajectory.ts.max()=1564109948.0
Duration: 0.2711190025011698 minutes
++++++++++
Found pred=NO_SENSED_MIDDLE for gts['mode']='WALKING'
range_start=1564075598.99119, range_end=1564075600.9911242, ref_trajectory.ts.min()=1564067195.0, ref_trajectory.ts.max()=1564109948.0
Duration: 0.033332236607869464 minutes
++++++++++
Found pred=NO_SENSED_MIDDLE for gts['mode']='WALKING'
range_start=1564099055.9879663, range_end=1564099056.9879246, ref_trajectory.ts.min()=1564067195.0, ref_trajectory.ts.max()=1564109948.0
Duration: 0.016665971279144286 minutes
++++++++++
Found pred=NO_SENSED_MIDDLE for gts['mode']='BUS'
range_start=1564100498.961628, range_end=1564100576.9653723, ref_trajectory.ts.min()=1564067195.0, ref_trajectory.ts.max()=1564109948.0
Duration: 1.3000624060630799 minutes
++++++++++
Found pred=NO_SENSED_MIDDLE for gts['mode']='BUS'
range_start=1564102626.9901922, range_end=1564102628.99012, ref_trajectory.ts.min()=1564067195.0, ref_trajectory.ts.max()=1564109948.0
Duration: 0.03333212931950887 minutes
++++++++++
Found pred=NO_SENSED_MIDDLE for gts['mode']='BUS'
range_start=1564102752.9856157, range_end=1564102763.9852161, ref_trajectory.ts.min()=1564067195.0, ref_trajectory.ts.max()=1564109948.0
Duration: 0.18332667350769044 minutes
++++++++++
Found pred=NO_SENSED_MIDDLE for gts['mode']='WALKING'
range_start=1564102891.9970846, range_end=1564102892.9970627, ref_trajectory.ts.min()=1564067195.0, ref_trajectory.ts.max()=1564109948.0
Duration: 0.016666301091512046 minutes
++++++++++
Found pred=NO_SENSED_MIDDLE for gts['mode']='WALKING'
range_start=1564104317.9938264, range_end=1564104319.9937544, ref_trajectory.ts.min()=1564067195.0, ref_trajectory.ts.max()=1564109948.0
Duration: 0.03333213329315186 minutes
++++++++++
Found pred=NO_SENSED_MIDDLE for gts['mode']='WALKING'
range_start=1564108873.9982843, range_end=1564108874.9982548, ref_trajectory.ts.min()=1564067195.0, ref_trajectory.ts.max()=1564109948.0
Duration: 0.016666173934936523 minutes
++++++++++
Found pred=NO_SENSED_MIDDLE for gts['mode']='TRAIN'
range_start=1564156002.9980292, range_end=1564156004.9981625, ref_trajectory.ts.min()=1564153876.0, ref_trajectory.ts.max()=1564196385.0
Duration: 0.033335554599761966 minutes
++++++++++
Found pred=NO_SENSED_MIDDLE for gts['mode']='WALKING'
range_start=1564157527.991729, range_end=1564157528.991693, ref_trajectory.ts.min()=1564153876.0, ref_trajectory.ts.max()=1564196385.0
Duration: 0.01666606664657593 minutes
++++++++++
Found pred=NO_SENSED_MIDDLE for gts['mode']='WALKING'
range_start=1564162003.9857054, range_end=1564162006.9856012, ref_trajectory.ts.min()=1564153876.0, ref_trajectory.ts.max()=1564196385.0
Duration: 0.04999826351801554 minutes
++++++++++
Found pred=NO_SENSED_MIDDLE for gts['mode']='WALKING'
range_start=1564175793.5701408, range_end=1564175848.4848464, ref_trajectory.ts.min()=1564153876.0, ref_trajectory.ts.max()=1564196385.0
Duration: 0.9152450919151306 minutes
++++++++++
Found pred=NO_SENSED_MIDDLE for gts['mode']='WALKING'
range_start=1564182927.507062, range_end=1564182935.6827846, ref_trajectory.ts.min()=1564153876.0, ref_trajectory.ts.max()=1564196385.0
Duration: 0.13626204331715902 minutes
++++++++++
Found pred=NO_SENSED_MIDDLE for gts['mode']='WALKING'
range_start=1564185598.042931, range_end=1564185607.9924884, ref_trajectory.ts.min()=1564153876.0, ref_trajectory.ts.max()=1564196385.0
Duration: 0.16582595507303874 minutes
++++++++++
Found pred=NO_SENSED_MIDDLE for gts['mode']='WALKING'
range_start=1564188702.9903646, range_end=1564188703.9903243, ref_trajectory.ts.min()=1564153876.0, ref_trajectory.ts.max()=1564196385.0
Duration: 0.016665995121002197 minutes
++++++++++
Found pred=NO_SENSED_MIDDLE for gts['mode']='LIGHT_RAIL'
range_start=1564190914.990709, range_end=1564190921.251579, ref_trajectory.ts.min()=1564153876.0, ref_trajectory.ts.max()=1564196385.0
Duration: 0.10434783299763997 minutes
++++++++++
Found pred=NO_SENSED_MIDDLE for gts['mode']='WALKING'
range_start=1564190921.268532, range_end=1564190924.9903421, ref_trajectory.ts.min()=1564153876.0, ref_trajectory.ts.max()=1564196385.0
Duration: 0.062030168374379475 minutes
++++++++++
Found pred=NO_SENSED_MIDDLE for gts['mode']='WALKING'
range_start=1564190929.9901583, range_end=1564190930.9901223, ref_trajectory.ts.min()=1564153876.0, ref_trajectory.ts.max()=1564196385.0
Duration: 0.01666606664657593 minutes
++++++++++
Found pred=NO_SENSED_MIDDLE for gts['mode']='WALKING'
range_start=1564195289.9916215, range_end=1564195290.991588, ref_trajectory.ts.min()=1564153876.0, ref_trajectory.ts.max()=1564196385.0
Duration: 0.016666110356648764 minutes
++++++++++
Found pred=NO_SENSED_MIDDLE for gts['mode']='TRAIN'
range_start=1568129418.269621, range_end=1568129425.9909544, ref_trajectory.ts.min()=1568128360.0, ref_trajectory.ts.max()=1568168550.0
Duration: 0.12868889172871908 minutes
++++++++++
Found pred=NO_SENSED_MIDDLE for gts['mode']='WALKING'
range_start=1568131633.395437, range_end=1568131643.9984407, ref_trajectory.ts.min()=1568128360.0, ref_trajectory.ts.max()=1568168550.0
Duration: 0.17671672900517782 minutes
++++++++++
Found pred=NO_SENSED_MIDDLE for gts['mode']='WALKING'
range_start=1568136933.996101, range_end=1568136934.996066, ref_trajectory.ts.min()=1568128360.0, ref_trajectory.ts.max()=1568168550.0
Duration: 0.016666086514790852 minutes
++++++++++
Found pred=NO_SENSED_MIDDLE for gts['mode']='WALKING'
range_start=1568158811.9918032, range_end=1568158813.9917216, ref_trajectory.ts.min()=1568128360.0, ref_trajectory.ts.max()=1568168550.0
Duration: 0.03333197434743245 minutes
++++++++++
Found pred=NO_SENSED_MIDDLE for gts['mode']='WALKING'
range_start=1568161880.985497, range_end=1568161884.9853492, ref_trajectory.ts.min()=1568128360.0, ref_trajectory.ts.max()=1568168550.0
Duration: 0.06666420300801595 minutes
++++++++++
Found pred=NO_SENSED_MIDDLE for gts['mode']='WALKING'
range_start=1568163677.998957, range_end=1568163681.9988372, ref_trajectory.ts.min()=1568128360.0, ref_trajectory.ts.max()=1568168550.0
Duration: 0.06666467189788819 minutes
++++++++++
Found pred=NO_SENSED_MIDDLE for gts['mode']='WALKING'
range_start=1568163796.994754, range_end=1568163797.9947186, ref_trajectory.ts.min()=1568128360.0, ref_trajectory.ts.max()=1568168550.0
Duration: 0.016666074593861897 minutes
++++++++++
Found pred=NO_SENSED_MIDDLE for gts['mode']='WALKING'
range_start=1568167469.9902563, range_end=1568167470.99023, ref_trajectory.ts.min()=1568128360.0, ref_trajectory.ts.max()=1568168550.0
Duration: 0.016666229565938315 minutes
++++++++++
Found pred=NO_SENSED_MIDDLE for gts['mode']='WALKING'
range_start=1568218046.258317, range_end=1568218052.6606212, ref_trajectory.ts.min()=1568214738.0, ref_trajectory.ts.max()=1568257054.0
Duration: 0.10670506954193115 minutes
++++++++++
Found pred=NO_SENSED_MIDDLE for gts['mode']='WALKING'
range_start=1568222217.9927611, range_end=1568222218.992733, ref_trajectory.ts.min()=1568214738.0, ref_trajectory.ts.max()=1568257054.0
Duration: 0.016666197776794435 minutes
++++++++++
Found pred=NO_SENSED_MIDDLE for gts['mode']='WALKING'
range_start=1568223326.995452, range_end=1568223327.995418, ref_trajectory.ts.min()=1568214738.0, ref_trajectory.ts.max()=1568257054.0
Duration: 0.016666102409362792 minutes
++++++++++
Found pred=NO_SENSED_MIDDLE for gts['mode']='WALKING'
range_start=1568245904.9858875, range_end=1568245905.9858465, ref_trajectory.ts.min()=1568214738.0, ref_trajectory.ts.max()=1568257054.0
Duration: 0.01666598320007324 minutes
++++++++++
Found pred=NO_SENSED_MIDDLE for gts['mode']='WALKING'
range_start=1568249746.9971282, range_end=1568249747.9970932, ref_trajectory.ts.min()=1568214738.0, ref_trajectory.ts.max()=1568257054.0
Duration: 0.01666608254114787 minutes
++++++++++
Found pred=NO_SENSED_MIDDLE for gts['mode']='WALKING'
range_start=1568256084.9858088, range_end=1568256085.9857852, ref_trajectory.ts.min()=1568214738.0, ref_trajectory.ts.max()=1568257054.0
Duration: 0.01666627327601115 minutes
++++++++++
Found pred=NO_SENSED_MIDDLE for gts['mode']='WALKING'
range_start=1568736842.000698, range_end=1568736851.9892392, ref_trajectory.ts.min()=1568732887.0, ref_trajectory.ts.max()=1568772835.0
Duration: 0.16647568543752034 minutes
++++++++++
Found pred=NO_SENSED_MIDDLE for gts['mode']='WALKING'
range_start=1568763381.986533, range_end=1568763382.986492, ref_trajectory.ts.min()=1568732887.0, ref_trajectory.ts.max()=1568772835.0
Duration: 0.01666598320007324 minutes
++++++++++
Found pred=NO_SENSED_MIDDLE for gts['mode']='WALKING'
range_start=1568766337.995494, range_end=1568766338.9954555, ref_trajectory.ts.min()=1568732887.0, ref_trajectory.ts.max()=1568772835.0
Duration: 0.016666026910146077 minutes
++++++++++
Found pred=NO_SENSED_MIDDLE for gts['mode']='WALKING'
range_start=1568768048.9961667, range_end=1568768049.9961324, ref_trajectory.ts.min()=1568732887.0, ref_trajectory.ts.max()=1568772835.0
Duration: 0.016666094462076824 minutes
++++++++++
Found pred=NO_SENSED_MIDDLE for gts['mode']='WALKING'
range_start=1568768095.9959073, range_end=1568768097.9958477, ref_trajectory.ts.min()=1568732887.0, ref_trajectory.ts.max()=1568772835.0
Duration: 0.03333233992258708 minutes
++++++++++
Found pred=NO_SENSED_MIDDLE for gts['mode']='WALKING'
range_start=1568771864.998498, range_end=1568771868.9983878, ref_trajectory.ts.min()=1568732887.0, ref_trajectory.ts.max()=1568772835.0
Duration: 0.06666483084360758 minutes
++++++++++

The majority of the NO_SENSED_MIDDLE entries are for gts['mode'] = WALKING.
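A tally like this can be reproduced from the log text itself. The following is a minimal sketch, assuming the log keeps the `Found pred=... for gts['mode']='...'` line format shown above; `count_modes` and the sample text are hypothetical and not part of this PR.

```python
import re
from collections import Counter

# Matches lines such as: Found pred=NO_SENSED_MIDDLE for gts['mode']='WALKING'
LINE_RE = re.compile(r"Found pred=(\w+) for gts\['mode'\]='(\w+)'")


def count_modes(log_text, pred="NO_SENSED_MIDDLE"):
    """Count ground-truth modes over the log lines that carry the given predicted label."""
    counts = Counter()
    for match in LINE_RE.finditer(log_text):
        if match.group(1) == pred:
            counts[match.group(2)] += 1
    return counts


# Three made-up lines in the same format as the log above
sample = """Found pred=NO_SENSED_MIDDLE for gts['mode']='BUS'
Found pred=NO_SENSED_MIDDLE for gts['mode']='WALKING'
Found pred=NO_SENSED_MIDDLE for gts['mode']='WALKING'
"""
print(count_modes(sample).most_common())
```

Running it over the full captured output would give the per-mode counts behind the summary line.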

@rahulkulhalli
Author

Edited the logs to exclude NO_GT_MIDDLE

@shankari
Collaborator

I really just wanted to understand the NO_SENSED_MIDDLE bicycling trip. Were we missing a reference trajectory there?

So which timeline did this case happen in, and were we in fact missing a reference trajectory there?

@rahulkulhalli
Author

rahulkulhalli commented Jul 27, 2023

This happened for the LA timeline. Indeed, we do not have any reference trajectory datapoints within the time range.

@rahulkulhalli
Author

Parameters: compare_cm(oses=['ios'], roles=['HAHFDC'], criteria=['duration', 'distance'])
Ranges: gisv_la, gisv_sj, gisv_ucb

Column normalization:
col_normalization

Row normalization:
row_norm

@shankari
Collaborator

The column normalizations are what is more important, since we use them in the scripts for the downstream metrics. And fortunately, they seem pretty close! So this should not affect the results for the "count every trip" papers significantly, which means that we will likely not have to rewrite them.

@shankari
Collaborator

Tried to filter separately by start and end, and it kind of worked, but with some kinks:

    start_loc_df_a = emd.filter_geo_df(
        emd.to_geo_df(e["temporal_control"]["android"]["location_df"]).copy(),
        section_gt_shapes.filter(["start_loc"]))
    start_loc_df_b = emd.filter_geo_df(
        emd.to_geo_df(e["temporal_control"]["ios"]["location_df"]).copy(),
        section_gt_shapes.filter(["start_loc"]))

    end_loc_df_a = emd.filter_geo_df(
        emd.to_geo_df(e["temporal_control"]["android"]["location_df"]).copy(),
        section_gt_shapes.filter(["end_loc"]))
    end_loc_df_b = emd.filter_geo_df(
        emd.to_geo_df(e["temporal_control"]["ios"]["location_df"]).copy(),
        section_gt_shapes.filter(["end_loc"]))
Screenshot 2023-07-27 at 1 13 19 PM

Then I realized that this filters everything except start!!

@shankari
Collaborator

@rahulkulhalli the column normalizations seem broken - e.g. the bicycling column does not add up to 1.
Can you please re-verify?
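A quick way to re-verify is to sum each column after normalizing. This is a minimal sketch with the confusion matrix as a plain list of rows; `normalize`, `columns_not_summing_to_one` and the numbers are hypothetical and are not taken from the notebook in this PR.

```python
def normalize(cm, axis):
    """Normalize a confusion matrix given as a list of rows.

    axis="columns" divides by column totals, axis="rows" by row totals.
    All-zero rows or columns are left as zeros.
    """
    if axis == "rows":
        return [[v / sum(row) if sum(row) else 0.0 for v in row] for row in cm]
    if axis == "columns":
        col_sums = [sum(col) for col in zip(*cm)]
        return [[v / s if s else 0.0 for v, s in zip(row, col_sums)] for row in cm]
    raise ValueError(f"unknown axis: {axis!r}")


def columns_not_summing_to_one(norm_cm, tol=1e-9):
    """Indices of non-empty columns whose entries do not add up to 1."""
    bad = []
    for j, col in enumerate(zip(*norm_cm)):
        total = sum(col)
        if abs(total) > tol and abs(total - 1.0) > tol:
            bad.append(j)
    return bad


# Made-up values, purely for illustration
cm = [[30.0, 5.0, 0.0],
      [10.0, 15.0, 0.0]]
print(columns_not_summing_to_one(normalize(cm, "columns")))
print(columns_not_summing_to_one(normalize(cm, "rows")))
```

A column-normalized matrix should report no offending columns. A row-normalized matrix plotted under a "column" label generally will report some, which is one way a mix-up between the two shows up.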

@shankari
Collaborator

shankari commented Jul 27, 2023

Filtering properly, everything seems to work

Screenshot 2023-07-27 at 1 25 02 PM Screenshot 2023-07-27 at 1 25 47 PM Screenshot 2023-07-27 at 1 26 30 PM
But I have an error in the ct computation
Traceback (most recent call last):
  File "/Users/kshankar/e-mission/mobilitynet-analysis-scripts/emeval/metrics/reference_trajectory.py", line 503, in final_ref_ensemble
    ct_ref_df = ref_ct_general(e, b_merge_midpoint, dist_threshold, tz, include_ends)
  File "/Users/kshankar/e-mission/mobilitynet-analysis-scripts/emeval/metrics/reference_trajectory.py", line 353, in ref_ct_general
    initial_ends_gpdf = ref_ends(e, tz)
  File "/Users/kshankar/e-mission/mobilitynet-analysis-scripts/emeval/metrics/reference_trajectory.py", line 310, in ref_ends
    start_filtered_merged_df = start_merged_df.query("t_distance < @dist_threshold")
  File "/Users/kshankar/miniconda-4.8.3/envs/emissioneval/lib/python3.8/site-packages/pandas/core/frame.py", line 3231, in query
    res = self.eval(expr, **kwargs)
  File "/Users/kshankar/miniconda-4.8.3/envs/emissioneval/lib/python3.8/site-packages/pandas/core/frame.py", line 3346, in eval
    return _eval(expr, inplace=inplace, **kwargs)
  File "/Users/kshankar/miniconda-4.8.3/envs/emissioneval/lib/python3.8/site-packages/pandas/core/computation/eval.py", line 337, in eval
    ret = eng_inst.evaluate()
  File "/Users/kshankar/miniconda-4.8.3/envs/emissioneval/lib/python3.8/site-packages/pandas/core/computation/engines.py", line 127, in evaluate
    return self.expr()
  File "/Users/kshankar/miniconda-4.8.3/envs/emissioneval/lib/python3.8/site-packages/pandas/core/computation/expr.py", line 771, in __call__
    return self.terms(self.env)
  File "/Users/kshankar/miniconda-4.8.3/envs/emissioneval/lib/python3.8/site-packages/pandas/core/computation/ops.py", line 396, in __call__
    return self.func(left, right)
  File "/Users/kshankar/miniconda-4.8.3/envs/emissioneval/lib/python3.8/site-packages/pandas/core/ops/common.py", line 64, in new_method
    return method(self, other)
  File "/Users/kshankar/miniconda-4.8.3/envs/emissioneval/lib/python3.8/site-packages/pandas/core/ops/__init__.py", line 529, in wrapper
    res_values = comparison_op(lvalues, rvalues, op)
  File "/Users/kshankar/miniconda-4.8.3/envs/emissioneval/lib/python3.8/site-packages/pandas/core/ops/array_ops.py", line 256, in comparison_op
    res_values = invalid_comparison(lvalues, rvalues, op)
  File "/Users/kshankar/miniconda-4.8.3/envs/emissioneval/lib/python3.8/site-packages/pandas/core/ops/invalid.py", line 34, in invalid_comparison
    raise TypeError(f"Invalid comparison between dtype={left.dtype} and {typ}")
TypeError: Invalid comparison between dtype=float64 and str

With include_ends=True, we have ct computed correctly, although we don't choose it.
So let's fix that exception so we don't introduce any regressions.
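The traceback shows `ref_ends(e, tz)` being called while line 310 compares `t_distance` against `dist_threshold`, so one plausible cause is a timezone string landing in the threshold slot. Below is a minimal sketch of two guards against that, assuming a hypothetical signature; `filter_by_distance` and `ref_ends_sketch` are illustrative stand-ins, not the functions in `reference_trajectory.py`.

```python
def filter_by_distance(distances, dist_threshold):
    """Keep distances below the threshold, rejecting non-numeric thresholds early."""
    if isinstance(dist_threshold, bool) or not isinstance(dist_threshold, (int, float)):
        raise TypeError(f"dist_threshold must be a number, got {dist_threshold!r}")
    return [d for d in distances if d < dist_threshold]


def ref_ends_sketch(t_distances, *, dist_threshold, tz):
    """Hypothetical stand-in for ref_ends.

    dist_threshold and tz are keyword-only, so a positional tz cannot
    silently end up being used as the threshold.
    """
    return filter_by_distance(t_distances, dist_threshold)


print(ref_ends_sketch([3.0, 30.0], dist_threshold=25, tz="America/Los_Angeles"))
```

With keyword-only arguments, a call such as `ref_ends_sketch(distances, "UTC")` fails at the call site instead of deep inside the comparison.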

GEO_DF: before filtering, len(e['temporal_control']['android']['location_df'])=283 and len(e['temporal_control']['ios']['location_df'])=150
GEO_DF: after filtering, len(filtered_utm_loc_df_a)=263 and len(filtered_utm_loc_df_b)=136
After filtering, 254 of 254 (1.0) for android and 249 of 249 (1.0) for ios
After merging, found 175 / 254 of android 254 (0.6889763779527559), ios 249 (0.7028112449799196)
Validated tf, stats are {'coverage_density': 0.6490024909685729, 'coverage_time': 0.9382721726574225, 'coverage_max_gap': 0.1186747412056819}
MATCH TRAJECTORY: len(filtered_loc_df_a)=263, len(filtered_loc_df_b)=136
After filtering, retained 158 of 254 (0.6220472440944882)
Validated ct, stats are {'coverage_density': 0.5859565347030544, 'coverage_time': 0.8937691447052918, 'coverage_max_gap': 0.2818525103634945}
for tf = 0.1186747412056819 v/s ct = 0.2818525103634945, density 0.6490024909685729 v/s 0.5859565347030544, returning tf len = 175 not cf len = 158

Fix that and then regenerate everything.

@shankari
Collaborator

Bingo! Top (tf), bottom (cf)

Filtering at ends, for threshold 25, before merging, android: len(start_loc_df_a)=4, len(end_loc_df_a)=16, ios: len(start_loc_df_b)=4, len(end_loc_df_a)=16

Filtering at ends, for threshold UTC, before merging, android: len(start_loc_df_a)=4, len(end_loc_df_a)=16, ios: len(start_loc_df_b)=4, len(end_loc_df_a)=16

@rahulkulhalli
Author

@shankari Normalization is fixed. Adding plots here:

Column normalization:
col_normalization

Row normalization:
row_norm

@shankari
Collaborator

Fixed. Next tried on one of the walk_to_bus segments, which failed because there were no temporal control points in the end polygon. Added zero checks to overcome the error. Now, if there are no temporal control points in the polygon, the reference trajectory is blank.

HOWEVER there are temporal points in the reference trajectory

Screenshot 2023-07-27 at 4 09 19 PM
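import shapely as shp
import shapely.prepared  # make the prepared submodule available as shp.prepared
# Prepare the end polygon once, then test the last few iOS points against it
end_loc_p = shp.prepared.prep(breaking_e["ground_truth"]["gt_shapes"]["end_loc"])
geo_df = emd.to_geo_df(breaking_e["temporal_control"]["ios"]["location_df"].tail())
geo_df.apply(lambda p: end_loc_p.contains(p.geometry), axis=1)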

Let's see why this is broken. We can revert the zero check in that case.

@shankari
Collaborator

Ah it's because we were printing the wrong output (len(end_loc_df_a) twice). So now we have android with no entries but iOS with entries.

Filtering at ends, for threshold 25, before merging, android: len(start_unfiltered_loc_a_df)=177, len(end_unfiltered_loc_a_df)=177 -> len(start_loc_df_a)=101, len(end_loc_df_a)=0, ios: len(start_unfiltered_loc_b_df)=129, len(end_unfiltered_loc_b_df)=129 -> len(start_loc_df_b)=93, len(end_loc_df_b)=13

I guess we can also have the other side.

So we keep the zero checks, but if one of them is non-zero, we return a match to the ground truth using the code from ref_gt_general. Gah this is so finicky and bad...

@shankari
Collaborator

Ok, so I actually implemented this

    def _match_single_to_gt(filtered_loc_df, dist_threshold):
        # tz and utm_gt_linestring are not parameters; they are taken from the surrounding scope
        new_location_df = get_int_aligned_trajectory(filtered_loc_df, tz)

        new_location_df_u = emd.to_utm_df(new_location_df)

        # Compute the distance to, and projection onto, the ground truth line in UTM coordinates
        add_gt_error_projection(new_location_df_u, utm_gt_linestring)

        new_location_df["gt_distance"] = new_location_df_u.gt_distance
        new_location_df["gt_projection"] = new_location_df_u.gt_projection

        # .copy() so that adding a column below does not trigger SettingWithCopyWarning
        filtered_location_df = new_location_df.query("gt_distance < @dist_threshold").copy()
        filtered_location_df["source"] = "match_gt"
        # filtered_location_df.drop(columns=["gt_distance", "])
        return gpd.GeoDataFrame(filtered_location_df)

But the problem is that, once we sort by timestamp, the iOS entries inside the polygon are interspersed with the android locations outside the polygon.
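The interleaving can be reproduced without the notebook. This is a minimal sketch using `(ts, source)` tuples in place of the location dataframes; the `match_gt` and `android` labels mirror the `source` column of the reference trajectory, while the timestamps and helper names are made up for illustration.

```python
def merge_by_ts(*sources):
    """Concatenate (ts, source) rows from several sources and sort them by timestamp."""
    return sorted((row for rows in sources for row in rows), key=lambda r: r[0])


def count_source_switches(rows):
    """Count how often two consecutive rows come from different sources."""
    return sum(1 for a, b in zip(rows, rows[1:]) if a[1] != b[1])


# One point per second from each phone over the same few seconds
match_gt_rows = [(7, "match_gt"), (8, "match_gt"), (9, "match_gt")]
android_rows = [(7, "android"), (8, "android"), (9, "android"), (10, "android")]

merged = merge_by_ts(match_gt_rows, android_rows)
print(merged)
print(count_source_switches(merged))
```

A high switch count relative to the number of rows means the two sources overlap in time, so a line drawn through the sorted points zig-zags between them.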

breaking_e = ref_tree['train_bus_ebike_mtv_ucb']['mtv_to_berkeley_sf_bart/walk_to_bus_1']

%%capture --no-stdout
ref_trajectory = emr.final_ref_ensemble(breaking_e, 25, tz="America/Los_Angeles", include_ends=True)

ref_trajectory[1].tail(n=20)

curr_map = folium.Map()
gt_leg_gj = pv_ucb.spec_details.get_geojson_for_leg(breaking_e["ground_truth"]["leg"])
sensed_section_gj = ezgj.get_geojson_for_loc_df(breaking_e["temporal_control"]["ios"]["location_df"].tail(n=15))
gt_leg_gj_feature = folium.GeoJson(gt_leg_gj, name="ground_truth")
sensed_leg_gj_feature = folium.GeoJson(sensed_section_gj, name="sensed_values")
ref_trajectory_gj = ezgj.get_geojson_for_loc_df(ref_trajectory[1], color="blue")
ref_leg_gj_feature = folium.GeoJson(ref_trajectory_gj, name="reference_values")
curr_map.add_child(gt_leg_gj_feature)
curr_map.add_child(sensed_leg_gj_feature)
curr_map.add_child(ref_leg_gj_feature)
curr_map.fit_bounds(sensed_leg_gj_feature.get_bounds())
folium.LayerControl().add_to(curr_map)
curr_map
ts            longitude    latitude   geometry                     source    fmt_time                   gt_distance  gt_projection
1.564075e+09  -122.267728  37.871076  POINT (-122.26773 37.87108)  match_gt  2019-07-25T10:17:07-07:00  12.117121    69.98162
1.564075e+09  -122.267724  37.870879  POINT (-122.26772 37.87088)  android   2019-07-25T10:17:07-07:00  NaN          NaN
1.564075e+09  -122.267730  37.871085  POINT (-122.26773 37.87108)  match_gt  2019-07-25T10:17:08-07:00  13.128124    69.98162
1.564075e+09  -122.267725  37.870881  POINT (-122.26772 37.87088)  android   2019-07-25T10:17:08-07:00  NaN          NaN
1.564075e+09  -122.267725  37.870882  POINT (-122.26773 37.87088)  android   2019-07-25T10:17:09-07:00  NaN          NaN
1.564075e+09  -122.267729  37.871089  POINT (-122.26773 37.87109)  match_gt  2019-07-25T10:17:09-07:00  13.603426    69.98162
1.564075e+09  -122.267728  37.871094  POINT (-122.26773 37.87109)  match_gt  2019-07-25T10:17:10-07:00  14.079957    69.98162
1.564075e+09  -122.267725  37.870883  POINT (-122.26773 37.87088)  android   2019-07-25T10:17:10-07:00  NaN          NaN
1.564075e+09  -122.267725  37.870885  POINT (-122.26773 37.87089)  android   2019-07-25T10:17:11-07:00  NaN          NaN
1.564075e+09  -122.267727  37.871098  POINT (-122.26773 37.87110)  match_gt  2019-07-25T10:17:11-07:00  14.554887    69.98162
1.564075e+09  -122.267727  37.871096  POINT (-122.26773 37.87110)  match_gt  2019-07-25T10:17:12-07:00  14.349172    69.98162
1.564075e+09  -122.267725  37.870888  POINT (-122.26773 37.87089)  android   2019-07-25T10:17:12-07:00  NaN          NaN
1.564075e+09  -122.267726  37.870892  POINT (-122.26773 37.87089)  android   2019-07-25T10:17:13-07:00  NaN          NaN
1.564075e+09  -122.267728  37.871094  POINT (-122.26773 37.87109)  match_gt  2019-07-25T10:17:13-07:00  14.143616    69.98162
1.564075e+09  -122.267728  37.871092  POINT (-122.26773 37.87109)  match_gt  2019-07-25T10:17:14-07:00  13.938226    69.98162
1.564075e+09  -122.267726  37.870897  POINT (-122.26773 37.87090)  android   2019-07-25T10:17:14-07:00  NaN          NaN
1.564075e+09  -122.267728  37.871090  POINT (-122.26773 37.87109)  match_gt  2019-07-25T10:17:15-07:00  13.733009    69.98162
1.564075e+09  -122.267722  37.870901  POINT (-122.26772 37.87090)  android   2019-07-25T10:17:15-07:00  NaN          NaN
1.564075e+09  -122.267714  37.870903  POINT (-122.26771 37.87090)  android   2019-07-25T10:17:16-07:00  NaN          NaN
1.564075e+09  -122.267709  37.870905  POINT (-122.26771 37.87090)  android   2019-07-25T10:17:17-07:00  NaN          NaN

So we get something that looks like this

Screenshot 2023-07-27 at 5 54 23 PM

We can probably do something fancy to fix this, like merging these entries and seeing which is closer to the ground truth, but I'm not going to attempt it now. We are either outside the polygon or inside the polygon, and if our two phones give us inconsistent results, it is not clear which one to believe. We could also do something like delete the overlapping time windows for the end polygon from the trajectory, but I don't want to generalize from n=1.

So if either of the trajectories is not present, we just return 0.
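The rule can be sketched as follows. This is a simplified illustration, assuming the points inside a polygon are kept per phone as lists of dicts with a `ts` key, and that "return 0" corresponds to an empty result; `ends_reference` is a hypothetical name, not the function in `reference_trajectory.py`.

```python
def ends_reference(points_in_polygon_by_phone):
    """Merge the per-phone points inside a polygon, or give up if any phone has none.

    Returns an empty list as soon as one phone has no points in the polygon,
    because there is no principled way to pick between inconsistent phones.
    """
    if any(len(points) == 0 for points in points_in_polygon_by_phone.values()):
        return []
    merged = [p for points in points_in_polygon_by_phone.values() for p in points]
    return sorted(merged, key=lambda p: p["ts"])


# Mirrors the walk_to_bus_1 end polygon: android has 0 points, iOS has 13
end_points = {
    "android": [],
    "ios": [{"ts": 1564075027 + i} for i in range(13)],
}
print(len(ends_reference(end_points)))
```

The cost of this choice is that the iOS points in the end polygon are discarded for that section even though they look plausible.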

@shankari
Collaborator

@rahulkulhalli I have added computations of the reference trajectories in the start and end polygons as well.

The main pending limitation is that we skip reference trajectory creation if either of the accuracy control phones has no points inside the start or end polygons, which is reasonable to state and move on.

The reference trajectories without the start and end polygons are in no_ends; the reference trajectories with the start and end polygons are in with_ends. Please change your script to read with_ends and recreate the confusion matrices. These are the final CMs that we will use in the "count every trip" papers.

@rahulkulhalli
Author

@shankari The stable code (without the inclusion of the start and end polygons) is pushed.

@rahulkulhalli
Author

CMs with ends included.

iOS, HAHFDC, GIS, [gisv_la, gisv_ucb, gisv_sj]:
with_ends_ios_hahfdc

android, HAMFDC, GIS, [gisv_la, gisv_ucb, gisv_sj]:
with_ends_android_hamfdc

(Both are column-normalized)

@rahulkulhalli
Author

rahulkulhalli commented Jul 28, 2023

I have added a flag with_ends: bool to the reference trajectory retrieval function so that we may switch between modes whenever needed.
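Roughly, the switch looks like the sketch below. The directory names `with_ends` and `no_ends` are the ones mentioned earlier in this thread; `ref_trajectory_dir`, the `base_dir` argument and the default value are assumptions made for illustration, not the actual code in the notebook.

```python
from pathlib import Path


def ref_trajectory_dir(base_dir, with_ends: bool = True) -> Path:
    """Pick the directory holding the requested flavour of reference trajectories.

    with_ends=True  -> trajectories that include the start and end polygons
    with_ends=False -> trajectories without the start and end polygons
    """
    return Path(base_dir) / ("with_ends" if with_ends else "no_ends")


# "reference_trajectories" is a placeholder path
print(ref_trajectory_dir("reference_trajectories"))
print(ref_trajectory_dir("reference_trajectories", with_ends=False))
```

Defaulting to `with_ends=True` follows the request to use the with-ends trajectories for the final CMs; the other mode stays one argument away.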

@shankari
Collaborator

Have you addressed all of my comments? If so, you should move this off draft.
Please make a pass and verify (including the part about from X import Y versus import X as x).

@rahulkulhalli rahulkulhalli marked this pull request as ready for review August 24, 2023 21:09