Naive vs CHEER analysis measurement #168

Open
jpfleischer opened this issue Oct 14, 2024 · 116 comments
@jpfleischer
Contributor

jpfleischer commented Oct 14, 2024

The intention is to get quantified measurements and visualizations that compare the naive method and the CHEER method.
The naive method is in historical commits pre-#152,
and #152 brings in the CHEER calculation by using the e-mission-common module.

So, using Smart Commute, we will read in a data frame, use the functions to calculate the footprint both ways, and take the difference.

@Abby-Wheelis

EDIT: The notebook I wrote to address this issue is available at #180

@jpfleischer
Contributor Author

At this point, the naive calculations have been successfully retrieved from the history: we have columns in a data frame that represent the naive calculations. We are making good progress towards the CHEER calculations by using modules from e-mission-common, such as emcb.get_rich_mode_for_value.

We are dropping rows from Smart Commute that meet the criteria below; since we are interested in comparing carbon calculations, non-trips are not useful:

# Keep only rows with a user-confirmed mode, then drop rows labeled as non-trips
expanded_ct = expanded_ct.dropna(subset=['data_user_input_mode_confirm'])
expanded_ct = expanded_ct[expanded_ct['Mode_confirm'] != 'Not a Trip']

@jpfleischer
Contributor Author

We have working functionality to compare the naive and CHEER calculations by outputting a graph of average CO2 emissions for each mode. This uses the Smart Commute data.

Image

Now that we have established this functionality, the intention is to move on to bigger data sets. I have already started the process of getting Bull eBike (Durham, NC) data access by communicating with the TSDC. Furthermore, we can already begin analysis of the whole CanBikeCO data instead of just one part (Smart Commute).

Finally, the main objective is to compare not only across locations, but also across years.
The notebook is at https://github.com/jpfleischer/em-public-dashboard/blob/cheernaive/viz_scripts/naive_cheer_comparison.ipynb

@Abby-Wheelis
Member

Rough timeline

  • 11/4 result graphs generated, code clean and checked in
  • 11/11 results written into the paper, ready to edit
  • 11/18 all large scale editing done, ready to polish
  • 11/21 polished and submitted to conference

@jpfleischer
Contributor Author

jpfleischer commented Oct 29, 2024

10/29 - Solve the index mixture issue, which was causing walk to have emissions (rows with the same index across the CanBikeCO sub-datasets were being added together)
10/30 - Load Bull (Durham), Mass (eBike), and CanBikeCO (logic already there) and generate the preliminary graph shown above for all three.
10/31 - Create the visualizations that were designed on the whiteboard, such as subplots for each year, done for each dataset; comparison between years for the same dataset.
11/1 - per-km calculations

11/4 - clean out the comments, hopefully implement naive^2 (naive-naive), cumulative CO2, and put the graphs in the paper
11/5 - the table is different from the direct comparison between the two methods; the current table shows the change from year to year, so clarify which method is used

Results by end of week.

Per trip vs Per km

We calculate the emissions per trip based on distance and the energy/emissions intensity per mile/km.
Some modes have a per-trip penalty (but that is mostly an edge case).
So, per-trip emissions are influenced by the average trip distance for that mode.

When comparing CHEER vs naive within the same dataset, it does not matter that the metric is per trip. However, comparing CO to NC ends up comparing the combination of the intensity and the distance of those trips; e.g., NC could have cleaner energy. So, focus on per mile (actually per km, because metric is better).
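As a toy illustration of how per-trip values embed distance (a sketch; the trip lengths are hypothetical, and the bus intensity is the naive value quoted later in this thread):

g_per_km_bus = 165.94                    # naive bus intensity (g CO2 per passenger-km)
short_trip_km, long_trip_km = 2.0, 10.0  # hypothetical average trip lengths
print(g_per_km_bus * short_trip_km)      # ~332 g CO2 per trip
print(g_per_km_bus * long_trip_km)       # ~1659 g CO2 per trip: same intensity, 5x the per-trip value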

Image
The idea on the left is not ideal, since we would like to compare calculation methods, not just modes, which is somewhat simplistic. We would like a figure with subplots for each year, where color indicates the calculation method (naive vs CHEER).

We also want a table where the columns are modes such as Car, Bus, and Train, and each row represents a year. The values within the cells are the percentage change between naive and CHEER.
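A minimal sketch of how such a table could be built with pandas, assuming df has hypothetical per-trip columns 'year', 'mode', 'naive_kg_co2', and 'cheer_kg_co2':

import pandas as pd

# totals per (year, mode) for each method
totals = df.groupby(['year', 'mode'])[['naive_kg_co2', 'cheer_kg_co2']].sum()
# percentage change of CHEER relative to naive
pct_change = (totals['cheer_kg_co2'] - totals['naive_kg_co2']) / totals['naive_kg_co2'] * 100
table = pct_change.unstack('mode').round(1)  # rows = years, columns = modes
print(table)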

@shankari
Contributor

shankari commented Oct 30, 2024

  1. Where is the older naive method (the one I implemented originally for my thesis from Chester et al) that only used the inferred modes? We have three entries in the methodology section, we should compare all three. The switch from (App, 2014) to (Dashboard, 2020) will highlight the improvement (carried through to CHEER) caused by including richer modes (Sections III B (1) and III B (2))
  2. I think we should compare naive and CHEER on the same datasets always. As you point out, it doesn't make sense to compare across datasets because there are too many variables. When exploring Durham and MassCEC, we get the geographic variation because the naive coefficients were based on transit and grid data from Denver in 2020 (when the CEO pilot ended and we were creating the plots for the first time). So if we use coefficients from Denver but apply them to Durham, we can see the geographic variation that CHEER enables.
    • We should also consider evaluating each CEO program separately, to showcase the variability even within the same state. Note that the (Dashboard, 2020) values were from Denver, while the other programs were in other areas of Colorado.
    • While the average or per km values are fine, we also want to see cumulative values because that really showcases the impact. This is another reason for us to not attempt to directly compare different datasets.

@Abby-Wheelis
Member

Where is the older naive method (the one I implemented originally for my thesis from Chester et al) that only used the inferred modes?

Great question - that has not been implemented yet, but I'm sure we can. Since the values are documented in the paper, it shouldn't be too hard to multiply the value by the distance of the trip, and it makes a lot of sense to show all three methods.

@jpfleischer I don't think we really have the code for this like we did for the before and after on the public dashboard. I would look at the methods in the paper and do something like: hardcode the mode-value lookup, develop a function to look up the value given the mode and multiply it by the distance, and apply that to the dataframe.
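A rough sketch of that approach (the coefficient values here are placeholders pulled from the paper values quoted later in this thread, and the column names are hypothetical):

# Hardcoded mode -> g CO2 per km lookup (placeholder values; use the ones in the paper)
MODE_G_PER_KM = {'car': 172.78, 'bus': 165.94, 'train': 57.17,
                 'walking': 0.0, 'bicycling': 0.0}

def naive_g_co2(mode, distance_km):
    # modes with no mapping contribute 0
    return MODE_G_PER_KM.get(mode, 0.0) * distance_km

# df: trips dataframe with hypothetical 'mode' and 'distance_km' columns
df['app_2014_g_co2'] = [naive_g_co2(m, d) for m, d in zip(df['mode'], df['distance_km'])]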

I think we should compare naive and CHEER on the same datasets always.

I see what you're saying here, and this makes sense to me. I have been focused on wanting to capture "impact of adapting over time" and "impact of adapting by geography", but you're right that we don't have to compare geographies to do that, since by comparing naive and CHEER we are already showing the difference between "using Denver values" and "using local values".

While the average or per km values are fine, we also want to see cumulative values because that really showcases the impact.

Given some of this feedback, I think I would propose a few things:

  • Compare the methods year-by-year via emissions per mile for CanBikeCO (hopefully showing a growing gap between methods as we advance away from the base year); I think the table could still be helpful.
  • Compare the cumulative difference (so, totals, not per-trip averages) between methods in each location. Hopefully there are some interesting differences here. Could also compare per km. I don't think we should put them in the same figure, but it could be interesting to see how the gaps differ between locations. Is the gap bigger in certain places and/or for certain modes?
  • Could also compare the methods per-program within CanBikeCO to see how the gaps differ within the state (spanning urban, small town, rural...) - also all in the same eGrid region!

@jpfleischer
Contributor Author

Here are the three different datasets - CanBikeCO, Bull Durham, and MassCEC - with standardized y-ranges:
image
image
image

@shankari
Contributor

The original coefficients are also documented in https://github.com/e-mission/e-mission-server/blob/b15fcb983c6b2f40e548f53550d417829a2f08fc/front/server/carbon_calc_details.html#L62

I think it should be as simple as using the new coefficients in the csv, and making sure to use the primary_inferred_mode or primary_sensed_mode for the lookup. Note that the older studies may not have these fields; you may need to re-implement the equivalent of https://github.com/e-mission/e-mission-server/blob/master/bin/historical/migrations/add_sections_and_summaries_to_trips.py to match up the section csvs to the trip csvs.
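For reference, a minimal sketch of that matching step (the file names are hypothetical; the id column names follow the ones used later in this thread and may differ by data vintage):

import pandas as pd

sections = pd.read_csv('inferred_sections.csv')  # hypothetical file name
trips = pd.read_csv('confirmed_trips.csv')       # hypothetical file name
# attach each inferred section to its parent confirmed trip
merged = sections.merge(trips, left_on='data_trip_id',
                        right_on='data_cleaned_trip', how='inner')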

@shankari
Contributor

Does MassCEC not have any bus/train trips?
Also, I am surprised that there are no values shown for e-bike in MassCEC or Durham. Is that just because the number is too small? Or is the grid mix in Mass exactly the same as Denver (seems unlikely)?

@jpfleischer
Contributor Author

There is actually e-bike, just very small 😄 Is a logarithmic y-scale okay?
image

@shankari
Contributor

I would add one regular and one log-scale chart; the log-scale one is just to show the e-bike results. You could put the e-bike results from the three programs into one log-scale chart.
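A quick matplotlib sketch of the side-by-side regular/log panels (placeholder data):

import matplotlib.pyplot as plt

modes = ['Car', 'Bus', 'E-bike']
avg_co2 = [300.0, 150.0, 0.5]  # placeholder values; e-bike is orders of magnitude smaller

fig, (ax_lin, ax_log) = plt.subplots(1, 2, figsize=(8, 3))
for ax in (ax_lin, ax_log):
    ax.bar(modes, avg_co2)
ax_log.set_yscale('log')  # the log panel makes the tiny e-bike bar visible
plt.show()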

@jpfleischer
Contributor Author

jpfleischer commented Oct 31, 2024

Here is a preliminary yearly plot for CanBikeCO.
image

However, that does not have a per-km basis. Here is a per-km average.
image

Durham only has one year.
image

MassCEC:
image

@Abby-Wheelis
Member

We can see the change in CHEER between the years in Colorado!

@jpfleischer
Contributor Author

jpfleischer commented Nov 4, 2024

Here is a preliminary plot for naive-naive (naive squared) method for CanBikeCO.
image

The only values that were sensed were:
['Subway' 'Train' 'Unknown' 'Walking' 'Bicycling']

{0: "unknown", 1: "walking",2: "bicycling",
3: "bus", 4: "train", 5: "car", 6: "air_or_hsr",
7: "subway", 8: "tram", 9: "light_rail"}
Unfortunately this is the case for all three sets (CO, Bull, Mass). They only have 11 (not even an option in https://github.com/e-mission/e-mission-server/blob/c8e808046088b273e38124799ae85ae57e68fb6b/emission/core/wrapper/modeprediction.py#L17C1-L27C19 ), 7 (subway), 4 (train), 2 (bicycling), 1 (walking), and 0 (unknown).

@jpfleischer
Contributor Author

jpfleischer commented Nov 4, 2024

Here is a cumulative plot for CanBikeCO.
image

Bull Durham
image

MassCEC
image

They are vastly different in size. Should we normalize by the number of trips somehow?

@Abby-Wheelis
Member

For the oldest method we're trying to evaluate: the data we currently have is "cleaned sections", which have a "sensed mode" that is one of the "MotionTypes"; what we need is the "inferred sections", which have one of the "PredictedModeTypes".

@Abby-Wheelis
Member

But that still doesn't explain why some of the data has a "sensed mode" that isn't an integer between 0 and 11. That's just a few sections (20 or so out of thousands).

@Abby-Wheelis
Member

For the cumulative, maybe it would be easier to compare if we had the % difference between naive and CHEER? As a table, that could be something like:

Mode  CO    MA    NC
Bus   +50%  +12%  -10%

Or add it to the chart in some way, maybe at the top of the CHEER bar, to clearly indicate its offset from the naive bar?
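One way to compute that % difference from the cumulative totals (a sketch; the column names follow the cumulative emissions table posted later in this thread):

# cumulative: dataframe with one row per mode and per-method kg CO2 totals
cumulative['percentage_diff'] = ((cumulative['cheer_kg_co2'] - cumulative['Mode_confirm_kg_CO2'])
                                 / cumulative['Mode_confirm_kg_CO2'] * 100)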

@jpfleischer
Contributor Author

jpfleischer commented Nov 5, 2024

I have the inferred csvs now. The sensed modes in those data are:
['Bicycling' 'Walking' 'Unknown' 'Car' 'Air_Or_Hsr' 'Bus' 'Train' 'Light_Rail' 'Tram' 'Subway']

Sanity check of average speeds:

Average speed per mode:
  predicted_mode_name  average_speed
0          Air_Or_Hsr     504.371906
1           Bicycling       4.448306
2                 Bus       3.935390
3                 Car       7.859624
4          Light_Rail       9.634933
5              Subway       5.983504
6               Train       9.915621
7                Tram       2.458991
8             Unknown       6.346682
9             Walking       0.882035

Is it ok to make these assumptions about the correspondence between the paper values and the sensed modes?

    g_pkm = {               # g CO2 per passenger-km, from the paper's naive values
        'Car': 172.78,      # ICEV
        'Train': 57.17,
        'Subway': 57.17,    # Treat Subway as Train
        'Bus': 165.94,
        'Air_Or_Hsr': 134.86,  # is it ok to group HSR in too?
        'Walking': 0,
        'Bicycling': 0,
    }

image

@Abby-Wheelis
Member

If the mapping in the paper was confusing, you can view the original mapping (where I got what I put in the paper) here: https://github.com/e-mission/e-mission-server/blob/b15fcb983c6b2f40e548f53550d417829a2f08fc/front/server/carbon_calc_details.html#L69

What you put in the comment looks right to me, though.

@jpfleischer
Contributor Author

jpfleischer commented Nov 6, 2024

Three method comparison:
image
image
image

@shankari
Contributor

shankari commented Nov 6, 2024

I think that this is a good point to start from. Where is CanBikeCO? You would then want to dig deeper into this and explain why CHEER was lower for MassCEC and higher for Durham (maybe now you can split by mode, and then further split transit by fleet and occupancy).

@jpfleischer
Contributor Author

As a sanity check for each dataset, we take its inferred sections and its confirmed trips and sum the total distance covered by each. The (App, 2014) (naive-naive) method uses inferred sections while the other two methods use trips, so we want to make sure that we only use inferred sections that are attached to a trip; this way we can fairly compare the calculation methods.

However, one of CanBikeCO's subsets had section and trip distance totals that were not congruent.

Program  Section distance  Trip distance
4c       46,713,000        46,518,000
CC       187,935,000       371,102,000
FC       75,688,000        75,737,000
PC       168,104,000       168,168,000
SC       70,943,000        70,912,000
VAIL     55,427,000        56,675,000

Does anyone know why CC has such a disparity?

@Abby-Wheelis
Member

CC had 187,935,000 for section, 371,102,000 for trips

I would have to look at the data to know. Do you think there's something different about the ids? Is the dataset shorter than expected? Maybe compare the number of rows between datasets? You'd need to use the total trips, not just the labeled ones, but if you get an idea of the ratio, maybe you can tell if CC comes up short and there might be missing data.

@jpfleischer
Contributor Author

For some reason, there are many trips without sections in CC:

# current_section is a dataframe of the inferred sections; full_csv is a dataframe of the confirmed trips.
trip_id_column = 'data_trip_id' if 'data_trip_id' in current_section.columns else 'tripno'

all_trip_ids = set(full_csv["data_cleaned_trip"].unique())
matched_trip_ids = set(current_section[current_section[trip_id_column].isin(all_trip_ids)][trip_id_column].unique())
trips_without_sections = all_trip_ids - matched_trip_ids
num_trips_without_sections = len(trips_without_sections)

print(f"Total trips in full_csv: {len(all_trip_ids)}")
print(f"Trips in full_csv without any sections: {num_trips_without_sections}")

Total trips in full_csv: 75154
Trips in full_csv without any sections: 49238

This is very peculiar, as the final number here is usually 0, 2, or 3 for the other sets. I will be re-requesting the CC data.

@jpfleischer
Contributor Author

jpfleischer commented Nov 8, 2024

The problem goes away when I use the cleaned sections instead of the inferred sections...
but only for CC.

I am continuing to use the abby_ceo folder for confirmed_trips. When I switch to using the inferred_sections for the distance comparison, only CC/Boulder has a big discrepancy. When I switch to using the cleaned_sections for the distance comparison, only FC/Durango has a big discrepancy.

@Abby-Wheelis
Member

I am continuing to use the abby_ceo folder for confirmed_trips.

If you have confirmed_trips that were sent alongside the sections, I would use those! That folder is many months old, so if anything has changed in the TSDC's process since then, it could be a data-vintage mismatch issue.

Do both cleaned and inferred sections contain the mode used in the calculations? I thought only the inferred sections did; do the cleaned sections have an inferred mode as well?

@jpfleischer
Contributor Author

You are right about the sensed mode. I can only use the inferred sections, because only those have the right values for the sensed mode.

I am going to request the entire dataset with all the types of csvs.

@shankari
Contributor

shankari commented Dec 3, 2024

What about percentage difference? One bar per user would allow for more users.

Yes, I think you should summarize to one metric per user and then display for all users. 500 points is not too much to show in a chart as long as you are looking for patterns and not details.

@jpfleischer
Contributor Author

Here is another per user plot showing all users with one metric. Should we sort from lowest to highest?

image

@Abby-Wheelis
Member

Here is another per user plot showing all users with one metric. Should we sort from lowest to highest?

Yes, I would sort low to high and remove the x tick labels
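A sketch of that tweak, assuming pct_diff is a pandas Series with one % difference value per user:

import matplotlib.pyplot as plt

sorted_vals = pct_diff.sort_values()  # low to high
fig, ax = plt.subplots()
ax.bar(range(len(sorted_vals)), sorted_vals.values)
ax.set_xticks([])                     # user ids are not meaningful tick labels
ax.set_ylabel('% difference (CHEER vs naive)')
plt.show()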

@jpfleischer
Contributor Author

Here it is sorted.

image

@shankari
Contributor

shankari commented Dec 4, 2024

Is that a 500% change? Some of those seem awfully high; I think we will need a better explanation of the > 100% change entries. Also, if there are so many users with over 100% difference, why is the overall difference so low (I can think of many reasons, but we should figure it out and document it).

@jpfleischer
Contributor Author

Most of that user's trips were "Other" mode confirm.
Maybe we should drop "Other."

image

@Abby-Wheelis
Member

What is the energy and emissions intensity for "Other" in CHEER? It doesn't seem right that it would be that much higher than the bus users.

@jpfleischer
Contributor Author

A mismatch is created because Dashboard 2020 (naive) uses Mode_confirm whereas CHEER uses data_user_input_mode_confirm, which leads to this discrepancy of a huge "Other" output. The "Other" trips have a data_user_input_mode_confirm of air, whereas the Mode_confirm is Other.

This was accounted for previously in the figures that we created by dropping Air.

Detail in the paper that Dashboard considers these air values as 0, because there is no mapping for air in mode_labels, whereas CHEER considers them as having an actual value.

Theorized solution:
Drop lowercase air from data_user_input_mode_confirm.
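In pandas terms, the theorized fix is a one-line filter (a sketch, reusing the expanded_ct dataframe name from earlier in the thread):

expanded_ct = expanded_ct[expanded_ct['data_user_input_mode_confirm'] != 'air']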

@jpfleischer
Contributor Author

Here, we dropped where data_user_input_mode_confirm == 'air'

image

@Abby-Wheelis
Member

So it does look like most of the discrepancy was from the "Other"/"air" trips, which is what we thought was the case. I think this is an interesting figure, and we could include it in the revision/polishing of the paper! It highlights the impact of using NTD data for the calculations in a very visible way. In a much smaller way, we see the few negative bars, which appear to be e-bike colored.

I would suggest dropping modes that don't appear (namely the zero-emissions modes), which might make it even easier to read which mode corresponds to which color.

@jpfleischer
Contributor Author

Here is a plot without zero emissions modes.

image

@jpfleischer
Contributor Author

jpfleischer commented Dec 10, 2024

I wanted to note the cumulative emissions, which check out against Figure 5.

Cumulative Emissions Table:
        Mode_confirm  Mode_confirm_kg_CO2  cheer_kg_co2  percentage_diff
                 Bus          4656.827508  11480.766839       146.536227
              E-bike           730.720210    719.616418        -1.519568
        Free Shuttle           137.731089    265.067060        92.452599
Gas Car, drove alone         37751.002071  48018.401370        27.197687
Gas Car, with others         35851.335250  45602.069117        27.197687
       Scooter share             3.954739      3.152436       -20.287145
      Taxi/Uber/Lyft          1345.473222   1635.337568        21.543673
               Train           228.334597    161.803214       -29.137671

Region    2021 (kg/km)  2022 (kg/km)
Colorado  0.32353       0.27817
National  0.31949       0.23628

@jpfleischer
Contributor Author

Q: How many trips are over 100 MPH? And how many of those are not in [air, train]? (Do this for each program.)

Program      Trips over 100 MPH  >100 MPH, not 'Air' or 'Train'
canbikeco    137                 48
bull-durham  11                  3
masscec      9                   7
masscec 9 7

@Abby-Wheelis
Member

What are the modes of the 48 trips in CanBikeCO? Can you do something like a groupby on mode and count? It's odd that the number is so high.
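Something like this sketch would produce that count (fast_trips stands for the already-filtered >100 MPH non-air/train trips; the name is hypothetical):

counts = fast_trips.groupby('data_user_input_mode_confirm').size().sort_values(ascending=False)
print(counts)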

@jpfleischer
Contributor Author

jpfleischer commented Dec 12, 2024

Counts of trips over 100 MPH (excluding 'air' and 'train') in CanBikeCO, grouped by mode:

data_user_input_mode_confirm  count
bus                           1
drove_alone                   8
pilot_ebike                   8
scootershare                  1
shared_ride                   14
taxi                          1
--custom label--              1
walk                          14

Also, the UACE issue is not an issue with the coordinates; rather, there are no agencies that publish data for that UACE, causing the UACE to become None. Even though so much of the dataset is in 09298 (Boulder), only one DR (demand response) transit agency publishes fuel data, and no bus agencies do.

@Abby-Wheelis
Member

Also, the UACE issue is not an issue with the coordinates; rather, there are no agencies that publish data for that UACE, causing the UACE to become None. Even though so much of the dataset is in 09298 (Boulder), only one DR (demand response) transit agency publishes fuel data, and no bus agencies do.

Ah, that makes lots of sense. So "occurred inside a UACE" vs "occurred inside a UACE where at least one agency published data for the requested mode"

@shankari
Contributor

Even though so much of the dataset is in 09298 (Boulder), only one DR (demand response) transit agency publishes fuel data, and no bus agencies do.

The buses in Boulder are run by the RTD (https://bouldercolorado.gov/services/bus).
I looked up RTD, and the UACE that RTD uses is 23527; I can't immediately figure out how to download the shapefiles, but I did find the list of UACEs, and Boulder has a UACE of 09298.

So it seems like this is a limitation of the NTD; it assumes that every transit agency only operates in one UACE.
The Bay Area has a few multi-county transit agencies, so I am checking them...

BART is listed with a UACE of 78904, for example, which is San Francisco--Oakland, CA, but it runs up to Dublin/Pleasanton, which is in the Livermore-Dublin-Pleasanton (50533) UACE, and Antioch, which is in the 02683 UACE.
https://www.bart.gov/sites/default/files/2023-09/BART-Detailed-Map-Web.pdf

@JGreenlee does CHEER support multi-UACE agencies now? If so, how?

@jpfleischer
Contributor Author

jpfleischer commented Dec 13, 2024

From my perspective, CHEER does not support multi-UACE agencies because the NTD only has one UACE per agency.
A solution, in my eyes, would be to use OpenStreetMap Overpass or something similar, to do the following (a sketch; lookup_uace and extend_cheer_uace_codes are hypothetical helpers):

for agency in agencies:
    uace_codes = set()  # collect per agency, not per stop
    for bus_stop in agency.bus_stops:
        # look up the UACE code for this bus stop's coordinates using a shapefile
        code = lookup_uace(bus_stop.latitude, bus_stop.longitude)
        if code is not None:
            uace_codes.add(code)
    # extend CHEER's database of UACE codes for that agency
    extend_cheer_uace_codes(agency, uace_codes)

The only issue would be matching the names of agencies in the NTD to the names of agencies in OpenStreetMap.

@jpfleischer
Contributor Author

jpfleischer commented Dec 16, 2024

After dropping the trips that are over 100 MPH and not in Air or Train, the cumulative amounts are slightly lower now.

CanBikeCO:

image

Previously the figure looked like:
image

I will update the paper figures.

@jpfleischer
Contributor Author

Grouped bar chart:

image

@Abby-Wheelis
Member

That is definitely more compact! I think it might be useful to shorten the title of the NC program. I also wonder if this is the most useful version of this chart, because CanBikeCO is so much higher than the others (I imagine that mostly has to do with a longer collection period and/or more users). Depending on the purpose of the chart (how it is being used to support an argument in the paper), maybe it would be useful to display it differently.

@shankari
Contributor

@Abby-Wheelis correct. Again, our focus is not on the carbon footprint, but on the difference between the baseline and the improved calculation method. I would suggest changing the chart to reflect that.

@jpfleischer
Contributor Author

jpfleischer commented Dec 16, 2024

Here I have plotted the percentage change from App 2014.

image

EDIT: try to use Dashboard 2020 as the baseline instead

@jpfleischer
Contributor Author

Here, Dashboard 2020 is the baseline.

image

@jpfleischer
Contributor Author

Here is a rudimentary version of a flowchart that compares the two methods.

image

@shankari
Contributor

Why are we using Wh/pkm for
delta_all_fuels_wh_pkm_bus_modes_massachusetts_2018_2022.pdf?

pkm is passenger-km, so this is effectively "the coefficient".

I thought that the point of having the two maps was to show the two components of "the coefficient" - the ridership and the fuel efficiency. So we don't want to double-count the people, right? Why isn't the fuel efficiency Wh/km, or Wh/vkm to be even more explicit?

@Abby-Wheelis and @jpfleischer

@Abby-Wheelis
Member

Here is a rudimentary version of a flowchart that compares the two methods.

Good start! I think the CHEER method is incomplete here - where is the energy use?

@Abby-Wheelis
Member

I thought that the point of having the two maps was to show the two components of "the coefficient" - the ridership and the fuel efficiency. So we don't want to double-count the people, right? Why isn't the fuel efficiency Wh/km, or Wh/vkm to be even more explicit?

I didn't think about it this way, but you're right: it would be better to not double-count the people and to show the two components of the coefficient separately.

@jpfleischer
Contributor Author

Here are the Wh/km figures.


image
image
image

@shankari
Contributor

This is also not a correct representation of the Dashboard (2020) approach, which did use electrified fuels for the train and the e-bike:
#168 (comment)
