Draft PR For Distance Confusion Matrices #71
base: master
Conversation
@rahulkulhalli can you update the PR after removing outputs, that will let me see the diff.
@rahulkulhalli to remove outputs, you can use the "Restart kernel and clear outputs" option in a standard jupyter notebook.
@shankari Outputs cleared.
Nit: In general, it would be good if you could avoid extraneous changes to make it easier to review. I understand that the id changes may be unavoidable, but there are some whitespace changes as well.
@shankari I've made some amendments to the code. It generates the CMs for the trajectory data. Please review the code at your convenience.
@rahulkulhalli High level looks fine. Suggested changes:
- remove `FILE_MAPPING`
- rename all `gt_` variables to `ref_`
- consider treating all runs the same way (don't special-case 0)
- remove extraneous changes for easier review
- fix hardcoded replacement of `dur` with `dist`
Note that we had discussed generalizing this notebook instead of making a copy, or replacing values. So you could pass in a flag for the suffix, and have an associated function to compute the metric.
Once you refactor to pass in "dist" or "duration", it would be cool to compare them. Are the duration CMs fairly close to the distance CMs in terms of proportion/probability, or are they different? It might also be good to convert the CM into probabilities (divide by the sum?) so that they are both directly comparable without relying on the CM colors.
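The "divide by the sum" idea above can be sketched in a few lines. This is a minimal illustration with made-up count matrices (`dist_cm`, `dur_cm`, and the mode labels are hypothetical, not the notebook's actual data):

```python
import pandas as pd

def to_probabilities(cm: pd.DataFrame) -> pd.DataFrame:
    """Divide every cell by the grand total so the whole matrix sums to 1,
    making count matrices on different scales directly comparable."""
    return cm / cm.to_numpy().sum()

# Hypothetical distance and duration count matrices for the same modes.
modes = ["WALKING", "BICYCLING", "CAR"]
dist_cm = pd.DataFrame([[120, 10, 5], [8, 90, 2], [3, 4, 150]],
                       index=modes, columns=modes)
dur_cm = pd.DataFrame([[60, 6, 2], [4, 45, 1], [2, 2, 80]],
                      index=modes, columns=modes)

# Once both are probabilities, a cell-wise difference highlights where the
# two metrics disagree, without relying on the CM colors.
diff = (to_probabilities(dist_cm) - to_probabilities(dur_cm)).abs()
```

Each normalized matrix sums to 1, so the element-wise difference is a scale-free way to compare distance against duration.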
classification_analysis.ipynb
Outdated
"FILE_MAPPING = {\n",
" pv_la: \"unimodal_trip_car_bike_mtv_la\",\n",
" pv_sj: \"car_scooter_brex_san_jose\",\n",
" # pv_ucb: \"train_bus_ebike_mtv_ucb\"\n",
"}"
I thought that this was temporary code, so I didn't comment on it earlier. You should be able to retrieve the timeline id directly from the pv by using `pv.spec_details.spec_id`.
classification_analysis.ipynb
Outdated
"source": [
"from pathlib import Path\n",
"\n",
"def get_reference_trajectory(trip_id: str, section_id: str):\n",
If you are going to concatenate all the data for a single trip, then why do you need the `section_id`?
Thank you, that's a valid assessment. Initially, I aimed to design the function to return a single data frame. I will amend it accordingly.
@rahulkulhalli it's fine to leave this as-is as well; see my next comment below. Just document why you chose the final design.
classification_analysis.ipynb
Outdated
" # now, we build a timeline for each trip\n",
" trip = tr.copy()\n",
" trip['ss_timeline'] = tr_ss\n",
" trip['gts_timeline'] = tr_gts\n",
" trip['gt_trajectory'] = pd.concat(gt_traj, axis=0).reset_index(drop=True, inplace=False)\n",
" \n",
" trips.append(trip)\n",
Weren't you going to move this into the if block?
classification_analysis.ipynb
Outdated
" if filtered_gt_distance.shape[0] == 0:\n",
" dist = 0\n",
Does this happen a lot? I would not expect that to happen and we might want to assert instead
It happens on some occasions, yes.
@rahulkulhalli can we print out the trip/section for which this happens (in addition to setting `dist = 0`)? I don't think this should happen, and I would like to verify against the stored reference trajectories for the specific use cases.
@rahulkulhalli There is a standard scikit-learn method to normalize confusion matrices that you can look at to do this 😄
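The standard method referred to here is presumably the `normalize` parameter of `sklearn.metrics.confusion_matrix` (available since scikit-learn 0.22). A small sketch with made-up mode labels:

```python
from sklearn.metrics import confusion_matrix

# Toy ground-truth and sensed mode labels, invented for illustration.
y_true = ["WALKING", "WALKING", "BICYCLING", "CAR", "CAR", "CAR"]
y_pred = ["WALKING", "BICYCLING", "BICYCLING", "CAR", "CAR", "WALKING"]
labels = ["WALKING", "BICYCLING", "CAR"]

# normalize='true' divides each row (true label) by its sum,
# normalize='pred' normalizes each column (predicted label),
# normalize='all' divides by the grand total.
cm_rows = confusion_matrix(y_true, y_pred, labels=labels, normalize="true")
cm_cols = confusion_matrix(y_true, y_pred, labels=labels, normalize="pred")
```

With `normalize="true"` every row sums to 1; with `normalize="pred"` every column does, which matches the column-normalization discussed later in this thread.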
@shankari Updated the notebook and added an entry to .gitignore. Amendments made:
The code runs until `plot_cm()`. Please generate the CMs and verify whether they are as per expectations. TODO:
What normalization would you like me to incorporate? We could normalize w.r.t. the rows, the columns, or both. I can also add an extra kwarg in the plot function that would accept the desired normalization mode (None by default).
Can you also put the normalized CM (at least column-normalized) into the issue conversation so I don't need to run the notebook?
Very few changes this time, getting there...
classification_analysis.ipynb
Outdated
@@ -708,7 +683,7 @@
" trips = test_trip if type(test_trip) is list else [test_trip]\n",
" TP, FN, FP, TN = {}, {}, {}, {}\n",
" for trip in trips:\n",
-" gt_trajectory = trip['gt_trajectory']\n",
+" trajectory_data = trip['trajectory_data']\n",
I'm going to ask for changes again, but can we use the word `reference` or the prefix `ref`? That helps us distinguish between sensed trajectories and the reference trajectories.
Noted!
@shankari, while running a sanity check against the notebook, I found this in the trajectory data:
There seem to be some readings marked 'midpoint'.
@rahulkulhalli we should not filter the reference trajectory on android or iOS. The reference trajectory is like derived ground truth. There is not a separate ground truth for android and for iOS, so there is not a separate reference trajectory for android and iOS. The sensed trajectories are android- or iOS-specific; the reference trajectories are not. For the record, 'midpoint' means that the reference trajectory for that section was the midpoint of the android and iOS accuracy control streams.
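For intuition, the 'midpoint' idea can be sketched as averaging the two accuracy-control streams. This is purely illustrative: the column names (`ts`, `latitude`, `longitude`), the helper name `midpoint_reference`, and the nearest-timestamp pairing are all assumptions, not the project's actual implementation.

```python
import pandas as pd

def midpoint_reference(android_df: pd.DataFrame, ios_df: pd.DataFrame) -> pd.DataFrame:
    """Illustrative sketch: pair the android and iOS accuracy-control
    streams by nearest timestamp and average their coordinates to get
    a single platform-independent reference trajectory."""
    merged = pd.merge_asof(
        android_df.sort_values("ts"),
        ios_df.sort_values("ts"),
        on="ts", suffixes=("_a", "_i"), direction="nearest")
    return pd.DataFrame({
        "ts": merged["ts"],
        "latitude": (merged["latitude_a"] + merged["latitude_i"]) / 2,
        "longitude": (merged["longitude_a"] + merged["longitude_i"]) / 2,
    })

# Two tiny synthetic streams sharing the same timestamps.
a = pd.DataFrame({"ts": [0, 10], "latitude": [0.0, 2.0], "longitude": [0.0, 2.0]})
b = pd.DataFrame({"ts": [0, 10], "latitude": [2.0, 4.0], "longitude": [2.0, 4.0]})
ref = midpoint_reference(a, b)
```

The resulting `ref` frame is neither android nor iOS specific, which is the point being made above about not filtering the reference trajectory by platform.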
More minor fixes - mainly around comments and variable names
classification_analysis.ipynb
Outdated
@@ -293,20 +278,26 @@
"\n",
" tr_gts.append(gts)\n",
"\n",
-" trajectory_data = get_reference_trajectory(pv.spec_details.CURR_SPEC_ID, tr['trip_id_base'])\n",
+" trajectory_data = get_reference_trajectory(pv.spec_details.CURR_SPEC_ID, trip['trip_id_base'])\n",
Suggested change:
-" trajectory_data = get_reference_trajectory(pv.spec_details.CURR_SPEC_ID, trip['trip_id_base'])\n",
+" curr_ref_trajectory = get_reference_trajectory(pv.spec_details.CURR_SPEC_ID, trip['trip_id_base'])\n",
classification_analysis.ipynb
Outdated
"\n",
" if normalization == 'pred':\n",
" # After transposing, predictions are axis=0\n",
Suggested change:
-"\n",
-" if normalization == 'pred':\n",
-" # After transposing, predictions are axis=0\n",
+"\n",
+" # Copied from the sklearn implementation of confusion matrix.normalize\n",
+" ..... (add URL Here) \n",
+" if normalization == 'pred':\n",
+" # After transposing, predictions are axis=0\n",
classification_analysis.ipynb
Outdated
" ax[k].text(j, i, np.round(df.transpose().iat[i,j], 3), horizontalalignment='center', \n",
" color='white' \n",
" if df.transpose().iat[i,j] < color_thresh \n",
" else 'black')\n",
" \n",
What is the rationale for this change (e.g. adding the `3` parameter)?
After normalizing, the calculated values are floating-point and are not rounded off when the confusion matrix is displayed. This makes the displayed texts overlap each other, rendering the entire confusion matrix unreadable. If normalization is not required, we cast to int (the default implementation).
I assumed that 3-digit precision would be an ideal trade-off between readability and precision.
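The behaviour being discussed can be sketched as a small helper (the function name `display_value` is hypothetical, invented here to isolate the round-vs-cast logic from the plotting code):

```python
import numpy as np

def display_value(value, normalized: bool, precision: int = 3):
    """Round normalized (float) cells to a fixed precision for display;
    cast raw counts to int, matching the default (non-normalized) path."""
    return np.round(value, precision) if normalized else int(value)

# A raw count stays an integer; a normalized cell is rounded to 3 digits.
raw = display_value(42.0, normalized=False)
norm = display_value(0.123456, normalized=True)
```

Three digits keeps the cell texts short enough not to overlap while preserving enough precision to compare cells.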
classification_analysis.ipynb
Outdated
"def export_confusion_matrix(matrix: pd.DataFrame, output_dir: Path):\n",
" pass"
Haven't you implemented this yet?
Yes, it is implemented. I will push the finished code.
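Since the finished code is not in this thread, here is one plausible minimal sketch of the stubbed `export_confusion_matrix`. The CSV format and file name are assumptions, not the author's pushed implementation:

```python
from pathlib import Path
import tempfile

import pandas as pd

def export_confusion_matrix(matrix: pd.DataFrame, output_dir: Path) -> Path:
    """Sketch: write the CM to CSV, keeping the mode labels as the
    index and columns so the matrix round-trips losslessly."""
    output_dir.mkdir(parents=True, exist_ok=True)
    out_path = output_dir / "confusion_matrix.csv"  # assumed file name
    matrix.to_csv(out_path)
    return out_path

# Round-trip check with a tiny hypothetical matrix.
cm = pd.DataFrame([[5, 1], [0, 7]],
                  index=["WALKING", "CAR"], columns=["WALKING", "CAR"])
out = export_confusion_matrix(cm, Path(tempfile.mkdtemp()))
restored = pd.read_csv(out, index_col=0)
```

Writing the labels into the CSV means the exported matrix can be reloaded and compared (e.g. distance vs duration) without re-running the notebook.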
These are very sketchy. While the distance matrix might have a different proportion than the duration matrix, I find it hard to believe that there could be so many more zeros in distance versus duration. (As an aside, these should have "distance" and "duration" in the title to distinguish between them.) I checked for it in the code review and I thought I saw it... ah, it is only for
I would focus on the selected matrix, and the entries that are obviously and blatantly different.
[attached: two confusion matrix screenshots]
I will try to do this too, but unfortunately, I have other priorities to work on as well 😄
Noted. I will focus on the recommended parameters and see if there is any bug in the code.
@rahulkulhalli when did you last run the duration code? I wrote a new function to generate side-by-side distance and duration matrices, and I am seeing a lot of zeros in the duration matrix as well. And of course, the notebook has been running only distance lately.
I ran it just today. The generated plots are from this afternoon.
I would want to understand this better before declaring victory. It is also super small, but I am worried that it might be a symptom of something more serious that we are missing.
I agree. I will investigate why the NO_SENSED_MIDDLE values are being missed.
Investigation log no. 1: Confusion matrix invocation: Output:
Majority NO_SENSED_MIDDLEs for gts['mode'] = BICYCLING
Investigation log 2: Confusion matrix invocation: Output:
Majority NO_SENSED_MIDDLEs for gts['mode'] = WALKING
Right, so we are excluding all `NO_GT_MIDDLE` entries. I really just wanted to understand the `NO_SENSED_MIDDLE` cases.
Log for UCB: Invocation parameters: Output:
Majority NO_SENSED_MIDDLEs for gts['mode'] = WALKING
Edited the logs to exclude NO_GT_MIDDLE.
So which timeline did this case happen in, and were we in fact missing a reference trajectory there?
This happened for the LA timeline. Indeed, we do not have any reference trajectory datapoints within the time range.
The column normalizations are what is more important, since we use them in the scripts for the downstream metrics. And fortunately, they seem pretty close! So this should not affect the results for the "count every trip" papers significantly, which means that we will likely not have to rewrite them.
@rahulkulhalli the column normalizations seem broken - e.g. the bicycling column does not add up to 1. |
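The broken-normalization symptom described here (a column not adding up to 1) is easy to catch programmatically. A minimal sketch of such a sanity check, with hypothetical two-mode matrices (`columns_sum_to_one` is an invented helper, not part of the notebook):

```python
import numpy as np
import pandas as pd

def columns_sum_to_one(cm: pd.DataFrame, atol: float = 1e-6) -> bool:
    """Sanity check for a column-normalized CM: every column that has
    any mass must sum to 1 (all-zero columns are skipped)."""
    col_sums = cm.to_numpy().sum(axis=0)
    nonzero = col_sums > 0
    return bool(np.allclose(col_sums[nonzero], 1.0, atol=atol))

modes = ["WALKING", "BICYCLING"]
# Correctly column-normalized: both columns sum to 1.
good = pd.DataFrame([[0.9, 0.2], [0.1, 0.8]], index=modes, columns=modes)
# Broken, like the bicycling column described above: second column sums to 0.7.
broken = pd.DataFrame([[0.9, 0.2], [0.1, 0.5]], index=modes, columns=modes)
```

Running this check right after normalization would surface the bicycling-column problem without manually inspecting the plots.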
Bingo! Top (tf), bottom (cf).
@shankari Normalization is fixed. Adding plots here:
Ah, it's because we were printing the wrong output (
Filtering at ends, for threshold 25, before merging, android: len(start_unfiltered_loc_a_df)=177, len(end_unfiltered_loc_a_df)=177 -> len(start_loc_df_a)=101, len(end_loc_df_a)=0, ios: len(start_unfiltered_loc_b_df)=129, len(end_unfiltered_loc_b_df)=129 -> len(start_loc_df_b)=93, len(end_loc_df_b)=13
I guess we can also have the other side. So we keep the zero checks, but if one of them is non-zero, we return a match to the ground truth using the code from
@rahulkulhalli I have added computations of the reference trajectories in the start and end polygons as well. These expanded trajectories still have the limitation that, as
The main pending limitation is that we skip reference trajectory creation if either of the accuracy control phones has no points inside the start or end polygons, which is reasonable to state and move on. The reference trajectories without the start and end polygons are in
@shankari The stable code (without the inclusion of the start and end polygons) is pushed.
I have added a flag
Have you addressed all of my comments? If so, you should move this off draft.
Questions: