Integrate e-mission-common's CO2 footprint and energy emission calculation into Public Dashboard #146

Open
iantei opened this issue Aug 23, 2024 · 27 comments

Comments

@iantei
Contributor

iantei commented Aug 23, 2024

  • Currently, custom labels make use of [label_options](https://github.com/e-mission/nrel-openpath-deploy-configs/tree/main/label_options) to extract the CO2 emission calculations, while there is no energy emission available.
  • Use e-mission-common to extract the CO2 footprint and energy emission calculation.
@shankari
Contributor

I am not sure what you mean by "there is no energy emission available". We do in fact compute the energy consumed in the public dashboard.

@iantei
Contributor Author

iantei commented Aug 23, 2024

Yes, we compute the energy consumed in the public dashboard, but we don't display the energy consumption for programs that use custom labels.

The "Timeseries of energy" metric is available for a study/program like nrel-commute, which uses default labels, i.e. does not have label_options. However, for a study/program like usaid-loas-ev-openpath, which uses custom labels, we do not list the "Timeseries of energy" metric, since we only have a richMode such as {"value":"walk", "baseMode":"WALKING", "met_equivalent":"WALKING", "kgCo2PerKm": 0}, which does not carry the information needed for the energy calculation in kWh.

@iantei
Contributor Author

iantei commented Aug 23, 2024

The current computation of the footprint (CO2 and energy) in the public dashboard takes a distance parameter, whereas the computation in e-mission-common requires a trip as a parameter: calc_footprint_for_trip(trip, mode_label_option) (source code). I am trying to understand how we can pass a trip as a parameter instead of distance, which is a column in the dataframe.

def CO2_footprint_default(df, distance, col):
    """ Inputs:
    df = dataframe with data
    distance = distance in miles
    col = Replaced_mode or Mode_confirm
    """

    conversion_lb_to_kilogram = 0.453592 # 1 lb = 0.453592 kg

    conditions_col = [(df[col+'_fuel'] =='gasoline'),
                       (df[col+'_fuel'] == 'diesel'),
                       (df[col+'_fuel'] == 'electric')]
    gasoline_col = (df[distance]*df['ei_'+col]*0.000001)* df['CO2_'+col]
    diesel_col   = (df[distance]*df['ei_'+col]*0.000001)* df['CO2_'+col]
    electric_col = (((df[distance]*df['ei_'+col])+df['ei_trip_'+col])*0.001)*df['CO2_'+col]

    values_col = [gasoline_col,diesel_col,electric_col]
    df[col+'_lb_CO2'] = np.select(conditions_col, values_col)
    df[col+'_kg_CO2'] = df[col+'_lb_CO2'] * conversion_lb_to_kilogram
    return df

For the default label mapping, we depend on energy_intensity.csv and mode_labels.csv, which do not provide the baseMode needed for the required second parameter. Since https://github.com/JGreenlee/e-mission-common/blob/master/src/emcommon/resources/label-options.default.json has been added to the e-mission-common repo, would it be a good idea to use these label options even when label_options is not specified for the program/study in the config file?

@iantei
Contributor Author

iantei commented Aug 23, 2024

We have the trip information available in the columns of the dataframe.

Maybe we can create a dictionary in the required parameter format and pass it into e-mission-common for the footprint calculations. Sample trip format from test_footprint_calculations:

        fake_trip = {
            'distance': 10000,
            'start_fmt_time': '2022-01-01',
            'start_loc': {'coordinates': [-74.006, 40.7128]}
        }
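
One way to build such a dictionary from dataframe rows is sketched below. This is only an illustration: the column names are assumed to mirror the keys in the test trip above, and `row_to_trip` is a hypothetical helper, not part of e-mission-common. Rows are plain dicts here, such as those produced by `df.to_dict('records')`.

```python
from typing import Any, Dict, Mapping

# Keys that the sample test trip contains; assumed (not confirmed) to be
# what the footprint calculation reads from a trip.
REQUIRED_KEYS = ("distance", "start_fmt_time", "start_loc")


def row_to_trip(row: Mapping[str, Any]) -> Dict[str, Any]:
    """Build a trip-like dict from one dataframe row (hypothetical helper)."""
    missing = [key for key in REQUIRED_KEYS if key not in row]
    if missing:
        raise KeyError(f"row is missing trip fields: {missing}")
    return {key: row[key] for key in REQUIRED_KEYS}


# Rows as they might come out of df.to_dict('records'); extra columns are dropped.
rows = [
    {
        "distance": 10000,
        "start_fmt_time": "2022-01-01",
        "start_loc": {"coordinates": [-74.006, 40.7128]},
        "mode_confirm": "bus",
    }
]
trips = [row_to_trip(row) for row in rows]
```

Failing early on a missing column keeps a malformed row from surfacing later as a less obvious error inside the footprint code.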

@iantei
Contributor Author

iantei commented Aug 25, 2024

Trying to integrate emcommon.metrics.footprint.footprint_calculations with the following changes in the environment36.dashboard.additions.yml file:

...
dependencies:
- pip:
  ...
  - git+https://github.com/JGreenlee/e-mission-common@master

Got the below issue -

---> 73 async def get_egrid_region(coords: list[float, float], year: int) -> str | None:
     74     """
     75     Get the eGRID region at the given coordinates in the year.
     76     """
     77     if year < 2018:

TypeError: unsupported operand type(s) for |: 'type' and 'NoneType'

This is likely because the str | None union syntax in annotations requires Python 3.10, while the dashboard still uses Python 3.9.
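
For reference, a 3.9-compatible way to write such a signature uses `typing.Optional`. This is a minimal sketch with a hypothetical function `lookup_region` and a made-up return value; it is not the e-mission-common implementation.

```python
import asyncio
from typing import List, Optional


# `str | None` in an annotation is evaluated when the function is defined and
# raises TypeError before Python 3.10; Optional[str] works on 3.9 as well.
async def lookup_region(coords: List[float], year: int) -> Optional[str]:
    """Hypothetical stand-in: return a region code, or None if unavailable."""
    if year < 2018:
        return None
    return "REGION_PLACEHOLDER"


# asyncio.run works in a plain script; in a notebook, use `await` instead.
region_old = asyncio.run(lookup_region([-74.006, 40.7128], 2017))
region_new = asyncio.run(lookup_region([-74.006, 40.7128], 2022))
```

Another option on 3.9 is `from __future__ import annotations`, which defers evaluation of annotations so the newer syntax is not executed at definition time.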

And while trying to use e-mission-common@0.5.5, I got the following error -

File ~/miniconda-23.5.2/envs/emission/lib/python3.9/site-packages/emcommon/metrics/footprint/footprint_calculations.py:63, in calc_footprint_for_trip(trip, mode_label_option)
     61 mode_footprint = rich_mode['footprint']
     62 if 'transit' in mode_footprint:
---> 63   mode_footprint = get_mode_footprint_for_transit(trip, mode_footprint['transit'])
     64 kwh_total = 0
     65 kg_co2_total = 0

NameError: name 'get_mode_footprint_for_transit' is not defined

This is strange because I specified the previous tag, i.e. 0.5.5, which still defines the function as get_mode_footprint_for_transit(), while master uses get_transit_intensities_for_trip().

While this gets fixed, I will explore how to get access to the trip data and baseMode, which are the required parameters of the function calc_footprint_for_trip.

@iantei
Contributor Author

iantei commented Aug 26, 2024

@JGreenlee
Instead of using the @master tag for e-mission-common, I switched to git+https://github.com/louisg1337/e-mission-common@master, which resolved the TypeError: unsupported operand type(s) for |: 'type' and 'NoneType'.

I incorporated the following code from the https://github.com/JGreenlee/e-mission-common/blob/master/test/metrics/test_footprint_calculations.py test into my Jupyter notebook:

fake_trip = {
'distance': 10000,
'start_fmt_time': '2022-01-01',
'start_loc': {'coordinates': [-74.006, 40.7128]}
}
fake_mode = {'base_mode': 'BUS'}
footprint_energy, footprint_co2 = await emffc.calc_footprint_for_trip(fake_trip, fake_mode)

I am getting the below issue -

get_transit_intensities_for_uace(year, uace, modes, metadata):
     ---> 43 actual_year = intensities_data['metadata']['year']

TypeError: 'NoneType' object is not subscriptable

It seems to look up data for years earlier than 2022, and eventually fails after reaching 2018.
Is there an issue with my approach, or should there be better error handling on the calculations side?

Details of the issue:

DEBUG:root:Getting footprint for trip: {'distance': 10000, 'start_fmt_time': '2022-01-01', 'start_loc': {'coordinates': [-74.006, 40.7128]}}, with mode option: {'base_mode': 'BUS'}
DEBUG:root:Getting rich mode for label_option: {'base_mode': 'BUS'}
DEBUG:root:Rich mode: {'icon': 'bus-side', 'color': '#9240a4', 'met': {'ALL': {'range': [0, inf]}}, 'footprint': {'transit': ['MB', 'RB', 'CB']}}
DEBUG:root:Getting mode footprint for transit modes ['MB', 'RB', 'CB'] in trip: {'distance': 10000, 'start_fmt_time': '2022-01-01', 'start_loc': {'coordinates': [-74.006, 40.7128]}}
DEBUG:root:Getting mode footprint for transit modes ['MB', 'RB', 'CB'] in year 2022 and coords [-74.006, 40.7128]
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): geocoding.geo.census.gov:443
DEBUG:urllib3.connectionpool:https://geocoding.geo.census.gov:443 "GET /geocoder/geographies/coordinates?x=-74.006&y=40.7128&benchmark=Public_AR_Current&vintage=Census2020_Current&layers=87&format=json HTTP/1.1" 200 4978
DEBUG:root:Getting mode footprint for transit modes ['MB', 'RB', 'CB'] in year 2022 and UACE 63217
WARNING:root:ntd data not available for 2022. Trying 2021.
WARNING:root:ntd data not available for 2021. Trying 2020.
WARNING:root:ntd data not available for 2020. Trying 2019.
WARNING:root:ntd data not available for 2019. Trying 2018.
ERROR:root:eGRID lookup failed for 2018.
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[5], line 8
      2 fake_trip = {
      3 'distance': 10000,
      4 'start_fmt_time': '2022-01-01',
      5 'start_loc': {'coordinates': [-74.006, 40.7128]}
      6 }
      7 fake_mode = {'base_mode': 'BUS'}
----> 8 footprint_energy, footprint_co2 = await emffc.calc_footprint_for_trip(fake_trip, fake_mode)
      9 print(f"\n {footprint_energy}, {footprint_co2} \n")

File ~/miniconda-23.5.2/envs/emission/lib/python3.9/site-packages/emcommon/metrics/footprint/footprint_calculations.py:44, in calc_footprint_for_trip(trip, mode_label_option)
     42 mode_footprint = dict(rich_mode['footprint'])
     43 if 'transit' in mode_footprint:
---> 44     (mode_footprint, transit_metadata) = await emcmft.get_transit_intensities_for_trip(trip, mode_footprint['transit'])
     45     merge_metadatas(metadata, transit_metadata)
     46 kwh_total = 0

File ~/miniconda-23.5.2/envs/emission/lib/python3.9/site-packages/emcommon/metrics/footprint/transit.py:22, in get_transit_intensities_for_trip(trip, modes)
     20 year = util.year_of_trip(trip)
     21 coords = trip["start_loc"]["coordinates"]
---> 22 return await get_transit_intensities_for_coords(year, coords, modes)

File ~/miniconda-23.5.2/envs/emission/lib/python3.9/site-packages/emcommon/metrics/footprint/transit.py:30, in get_transit_intensities_for_coords(year, coords, modes, metadata)
     28 metadata.update({'requested_coords': coords})
     29 uace_code = await util.get_uace_by_coords(coords, year)
---> 30 return await get_transit_intensities_for_uace(year, uace_code, modes, metadata)

File ~/miniconda-23.5.2/envs/emission/lib/python3.9/site-packages/emcommon/metrics/footprint/transit.py:43, in get_transit_intensities_for_uace(year, uace, modes, metadata)
     40 Log.debug(
     41     f"Getting mode footprint for transit modes {modes} in year {year} and UACE {uace}")
     42 intensities_data = await util.get_intensities_data(year, 'ntd')
---> 43 actual_year = intensities_data['metadata']['year']
     44 metadata.update({
     45     "data_sources": [f"ntd{actual_year}"],
     46     "data_source_urls": intensities_data['metadata']['data_source_urls'],
   (...)
     51     "ntd_ids": [],
     52 })
     54 total_upt = 0

TypeError: 'NoneType' object is not subscriptable
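
On the error-handling question, one option is to fail with an explicit message when no dataset is found, instead of returning `None` and failing later on a subscript. The sketch below is hypothetical: `load_dataset`, `DATASETS`, and `MIN_YEAR` are illustrative stand-ins, not e-mission-common APIs.

```python
from typing import Dict, Optional

MIN_YEAR = 2018  # assumed earliest year worth trying, mirroring the log above

# Stand-in for bundled resource files; empty to mimic missing data files.
DATASETS: Dict[int, dict] = {}


def load_dataset(year: int) -> Optional[dict]:
    """Hypothetical loader: returns None when no data exists for the year."""
    return DATASETS.get(year)


def get_dataset_with_fallback(year: int) -> dict:
    """Walk back one year at a time; raise a clear error if nothing is found."""
    for candidate in range(year, MIN_YEAR - 1, -1):
        data = load_dataset(candidate)
        if data is not None:
            return data
    raise LookupError(
        f"no dataset available for any year from {year} down to {MIN_YEAR}; "
        "check that the resource files are installed"
    )


try:
    get_dataset_with_fallback(2022)
    message = ""
except LookupError as err:
    message = str(err)
```

With this shape, the caller sees one error that names the years tried rather than a `'NoneType' object is not subscriptable` several frames away from the cause.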

@JGreenlee
Contributor

I have located the issue. It is because only *.py files are being included when emcommon is bundled as a package. Therefore, the resources folder and all its .json files are missing.

I think I need to adjust the pyproject.toml
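
A quick way to check whether non-Python files made it into an installed package is to list them with `importlib.resources` (Python 3.9+). This is a generic sketch with a hypothetical helper name; the example uses the standard-library `json` package purely for demonstration, since the exact emcommon package path to inspect is not shown in this thread.

```python
from importlib import resources
from typing import List


def list_packaged_files(package: str, suffix: str) -> List[str]:
    """Return the names of files directly inside `package` that end with `suffix`."""
    root = resources.files(package)
    return sorted(
        entry.name
        for entry in root.iterdir()
        if entry.is_file() and entry.name.endswith(suffix)
    )


# Demonstration on a standard-library package.
py_files = list_packaged_files("json", ".py")
json_files = list_packaged_files("json", ".json")
```

An empty list for the `.json` suffix on a package that is supposed to ship data files would be consistent with those files being left out of the build.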

@iantei
Contributor Author

iantei commented Aug 27, 2024

Update:

The calc_footprint_for_trip(trip, mode) is an async function.

Approaches tried for calling this async function:

  • Called await calc_footprint_for_trip(trip, mode) directly from the Jupyter notebook, which works perfectly fine.

  • We use the footprint calculation in the energy_calculations.ipynb notebook, which calls add_energy_impact() in scaffolding.py, a synchronous function. We need to call calc_footprint_for_trip(trip, mode) from there.

    • We can't use await calc_footprint_for_trip(trip, mode) from within add_energy_impact() because it gives an error that await is only allowed within an async function.
    • We can't use asyncio.run(calc_footprint_for_trip(trip, mode)) because it gives an error: asyncio.run() cannot be called from a running event loop.
    • So I changed add_energy_impact() to an async function, used await to call add_energy_impact() from the Jupyter notebook, and used await to call calc_footprint_for_trip(trip, mode) from within add_energy_impact(). This way we can call the async function calc_footprint_for_trip(trip, mode).
      Is there any concern with this approach?
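
The pattern described above can be sketched as follows. `fake_calc_footprint` is a hypothetical stand-in for `calc_footprint_for_trip` with made-up constants, and the body of `add_energy_impact` is illustrative only.

```python
import asyncio
from typing import Dict, List, Tuple


async def fake_calc_footprint(trip: Dict, mode: Dict) -> Tuple[float, float]:
    """Hypothetical stand-in returning (kWh, kg CO2) from made-up intensities."""
    await asyncio.sleep(0)  # yield control, as a real network lookup would
    km = trip["distance"] / 1000
    return (0.5 * km, 0.1 * km)


async def add_energy_impact(trips: List[Dict]) -> List[Dict]:
    """Async caller: awaits the footprint coroutine once per trip."""
    enriched = []
    for trip in trips:
        kwh, kg_co2 = await fake_calc_footprint(trip, {"base_mode": "BUS"})
        enriched.append({**trip, "kwh": kwh, "kg_co2": kg_co2})
    return enriched


trips = [{"distance": 10000}, {"distance": 2500}]
# Plain-script entry point; in Jupyter use `await add_energy_impact(trips)`.
result = asyncio.run(add_energy_impact(trips))
```

The notebook already runs inside an event loop, which is why a top-level `await` works there while `asyncio.run` does not.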

@iantei
Contributor Author

iantei commented Aug 27, 2024

As discussed, changing add_energy_impact() to an async function makes it convenient to use await when calling it from the energy_calculations notebook, and this approach looks good.
Next, I want to explore how to determine the baseMode associated with a particular mode of commute.

@iantei
Contributor Author

iantei commented Aug 28, 2024

We currently have baseMode available only for the list of Modes, not for Replaced Modes. However, when computing the energy and CO2 footprint, we calculate the energy impact with df['Energy_Impact(kWH)'] = round((df['Replaced_mode_EI(kWH)'] - df['Mode_confirm_EI(kWH)']),3), and likewise for CO2_Impact.
Even though the list of keys in Mode and Replaced Mode are identical, that's not always the case.
Therefore, we need baseMode also available for Replaced Mode so that we can compute Energy_Impact and CO2_Impact for Replaced Mode too.

@JGreenlee
Contributor

Even though the list of keys in Mode and Replaced Mode are identical, that's not always the case.

In what instances are there a Replaced Mode that does not have a Mode by the same key?

I thought that Replaced Modes were always a subset of Modes

@iantei
Contributor Author

iantei commented Aug 28, 2024

In what instances are there a Replaced Mode that does not have a Mode by the same key?
I thought that Replaced Modes were always a subset of Modes

You're correct! The only Replaced Mode that is not in the list of Modes is No_travel, and No_travel does not need a footprint computation. This should be fine.

@Abby-Wheelis
Member

One idea to cut down on the wait times when mapping from mode_confirm to baseMode: we could extract the mapping once (get the unique mode_confirm list and generate a local mapping) and then apply the local mapping to the whole dataframe synchronously. That way we only wait on the call to emcommon once for each mode_confirm, not once for every row (could be 1000s).

@iantei
Contributor Author

iantei commented Sep 12, 2024

@Abby-Wheelis I think you'd posted a discussion note here. I am unable to see it.

@Abby-Wheelis
Member

re-writing from memory since GitHub seems to have eaten what I wrote yesterday, @iantei feel free to add if you remember any additional points

There are two general approaches that we could take here:

  1. use the list of trips
  • might be easier to pass to the function since the format is what is expected
  • still need to extract the mode and lookup the base mode
  • iterating over a list seems slow
  • still need to convert to a dataframe for the plotting functions and ensure all the same filtering gets applied
  2. use a dataframe of trips
  • would need to massage the data structure into the format the function expects
  • need to extract the mode and look up the base mode (should add base mode as a column, since we want to use it for things like filtering AIR regardless)
  • could use something like asyncio.gather() to speed up the iteration while applying the async footprint lookup
  • allows for filtered dataframe to stay the same (just with more info) and be ready for plotting

Both @iantei and I are leaning towards option 2 at this point, but @shankari do you have any additional thoughts?

@Abby-Wheelis
Member

some pseudocode for my "local copy of base mode mapping" idea

mapping = {}
for mode in expanded_ct.mode_confirm.unique():
  mapping[mode] = await lookup_base_mode(mode)

which can then be used with .apply() to add the base_mode to the df quickly, and means we only await once per unique mode, and not once per row.
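
A runnable version of that idea might look like the following, with the lookups for the unique modes issued concurrently via `asyncio.gather`. `lookup_base_mode` and its table are hypothetical placeholders for whatever e-mission-common call resolves a label to its base mode, and rows are plain dicts rather than a dataframe to keep the sketch dependency-free.

```python
import asyncio
from typing import Dict, List

# Made-up label -> base mode table standing in for the real lookup.
_FAKE_BASE_MODES = {"walk": "WALKING", "bus": "BUS", "e-bike": "E_BIKE"}


async def lookup_base_mode(mode: str) -> str:
    """Hypothetical async lookup; a real one might read label options."""
    await asyncio.sleep(0)
    return _FAKE_BASE_MODES.get(mode, "OTHER")


async def build_base_mode_mapping(modes: List[str]) -> Dict[str, str]:
    """Await once per unique mode, not once per row."""
    unique_modes = sorted(set(modes))
    base_modes = await asyncio.gather(*(lookup_base_mode(m) for m in unique_modes))
    return dict(zip(unique_modes, base_modes))


rows = [{"mode_confirm": m} for m in ["walk", "bus", "walk", "e-bike", "bus"]]
# Plain-script entry point; in a notebook, await the coroutine directly.
mapping = asyncio.run(build_base_mode_mapping([r["mode_confirm"] for r in rows]))
# Synchronous pass over the rows using the local mapping.
for row in rows:
    row["base_mode"] = mapping[row["mode_confirm"]]
```

With a dataframe, the final loop could instead be `df["base_mode"] = df["mode_confirm"].map(mapping)`.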

iantei moved this to Questions for Shankari in OpenPATH Tasks Overview Sep 12, 2024
@shankari
Contributor

@iantei and @Abby-Wheelis I think we discussed this in an earlier team meeting. I think we should go with (1).

To address your points:

  • iterating over a list seems slow: as I pointed out, the data is stored in the database as trips, and is read as a list of trip JSON objects in the server code. We already iterate over the list using _to_data_df to create the dataframe. Please see emission/storage/timeseries/builtin_timeseries.py to understand how the interfaces work under the hood. And although apply is a dataframe method, it essentially iterates over the rows under the hood; it is not a highly efficient vectorized operation.
  • still need to convert to a dataframe for the plotting functions and ensure all the same filtering gets applied: I don't see this as a big win either way. Either you have to convert trips -> dataframe -> trips or trips -> dataframe. The second seems strictly better since there are fewer conversions
  • could use something like asyncio.gather() to speed up the iteration while applying the async footprint lookup: I don't see how this applies only to (2). You can perform operations on trips asynchronously as well.

What am I missing here?

@Abby-Wheelis
Member

Either you have to convert trips -> dataframe -> trips or trips -> dataframe. The second seems strictly better since there are fewer conversions

Particularly for this reason and the other points you made I think it does make more sense to use the list, and after @iantei and I had poked through the server code together earlier this week, I think I see a relatively clear path to doing so. I'll move forward with implementing the data gathering piece while @iantei wraps up the other open PRs (#148, #145, #150), and then plan to pass it back off for the visualization piece!

@Abby-Wheelis
Member

Because of some issues with Docker and loading data, I have not been able to make as much progress on this topic as I would like, but I will pick back up tomorrow. I am also having issues importing from e-mission-common.

@JGreenlee
Contributor

If you are having issues using e-mission-common I may be able to help. I also have some pending updates to e-mission-common that we should discuss and see how they may be useful here.
Let's see if we can set up a short meeting.

@Abby-Wheelis
Member

I think my biggest issue with importing from e-mission-common is that I am using an old version. The version tagged in the server (and therefore what the notebook server uses) is 0.5.3, while the latest is 0.6.0. Much of the work on the footprint was completed between those two releases, so it simply doesn't exist in the version that I'm using. I'm submitting a PR to the server and will use a workaround in the meantime.

@JGreenlee
Contributor

This is the method Ananta and I came up with that will allow you to use a newer version of e-mission-common for local development/testing

diff --git a/viz_scripts/docker/environment36.dashboard.additions.yml b/viz_scripts/docker/environment36.dashboard.additions.yml
index 59d26eb..df84f5e 100644
--- a/viz_scripts/docker/environment36.dashboard.additions.yml
+++ b/viz_scripts/docker/environment36.dashboard.additions.yml
@@ -7,3 +7,4 @@ dependencies:
 - pip:
   - nbparameterise==0.6
   - devcron==0.4
+  - git+https://github.com/JGreenlee/e-mission-common@0.6.0

There should also be a way to use a local e-mission-common repo with pip install -e <path_to_local_repo> (https://github.com/JGreenlee/e-mission-common?tab=readme-ov-file#dev-workflow-to-test-local-changes-in-other-repos)
This is what I do when I'm testing my changes to e-mission-common, running on e-mission-server.
But it may be more complicated to do that here because it is dockerized

@Abby-Wheelis
Member

This is the method Ananta and I came up with that will allow you to use a newer version of e-mission-common for local development/testing

Just added this and am now accessing version 0.6.0!

@Abby-Wheelis
Member

I am now able to add the base_mode and footprint to each of the trips relatively easily. However, when I then call to_data_df with the edited list of trips, there are 0 trips in the frame. If I don't edit the list, there are some trips, so I suspect that editing the trips causes to_data_df to treat them as malformed in some way. I'm just not sure how yet. My next step is to dig through the server code to figure out what format is expected and conform the trips to that format.

@Abby-Wheelis
Member

We need to be sure to handle the replaced mode of no travel properly. As with walk or regular bike, these trips would end up with negative energy and emissions savings.
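
As a worked illustration of that sign convention, the sketch below uses the impact formula quoted earlier in this thread (replaced-mode energy minus confirmed-mode energy). The intensities are made-up numbers, and treating `no_travel` as a zero-footprint replaced mode is an assumption for the sketch.

```python
# Made-up energy intensities in kWh per km, for illustration only.
KWH_PER_KM = {"car": 0.6, "e-bike": 0.02, "walk": 0.0, "no_travel": 0.0}


def energy_impact_kwh(replaced_mode: str, confirmed_mode: str, km: float) -> float:
    """Impact = replaced-mode energy minus confirmed-mode energy, rounded to 3 places."""
    replaced = KWH_PER_KM[replaced_mode] * km
    confirmed = KWH_PER_KM[confirmed_mode] * km
    return round(replaced - confirmed, 3)


# Replacing a car trip saves energy, so the impact is positive.
saving = energy_impact_kwh("car", "e-bike", 10)
# A trip that would not have happened otherwise has a negative impact.
induced = energy_impact_kwh("no_travel", "e-bike", 10)
```

Whether such negative values should be plotted, clipped, or reported separately is a display decision that the sketch does not settle.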

@Abby-Wheelis
Member

I am now able to add the base_mode and footprint to each of the trips relatively easily. However, when I then call to_data_df with the edited list of trips, there are 0 trips in the frame. If I don't edit the list, there are some trips, so I suspect that editing the trips causes to_data_df to treat them as malformed in some way. I'm just not sure how yet.

@shankari if you have any ideas on the best way to insert information into the trips before converting them to a dataframe so that the conversion does not fail, that would be great! I have been trying to trace back through to_data_df but I haven't figured out what is wrong with my formatting yet.

@Abby-Wheelis
Member

#152 is working well enough to generate the old visualizations in energy_calculations. I still need to address the timeseries, but now that we're over the first few hurdles, I think we can start thinking about how we want to display this data.

Our current visualizations show:

  • timeseries of energy
  • timeseries of emissions
  • timeseries of emissions/mile
  • energy impact by replaced mode
  • emissions impact by replaced mode

Would we like to add anything?

  • emissions impact by mode?
  • energy impact by mode?
  • emissions per mile by mode?
  • energy per mile by mode?

Those last two would mostly show our constants ... but could be interesting particularly in cases like transit

Labels
None yet
Projects
Status: Issues being worked on
Development

No branches or pull requests

4 participants