Fix/postprocessing pandas concat error #100

Merged
MaGering merged 5 commits into dev from fix/postprocessing_pandas_concat_error on Mar 1, 2024

MaGering
Collaborator

Resolves the following error:

INFO - Restoring attributes will overwrite existing attributes.
Traceback (most recent call last):
  File "~/Schreibtisch/oemof-B3/scripts/postprocess.py", line 61, in <module>
    rdp = ResultsDataPackage.from_energysytem(es)
  File "~/anaconda3/envs/oemof-B3_desktop/lib/python3.10/site-packages/oemoflex/model/datapackage.py", line 320, in from_energysytem
    data, rel_paths = cls._get_results(cls, es)
  File "~/anaconda3/envs/oemof-B3_desktop/lib/python3.10/site-packages/oemoflex/model/datapackage.py", line 330, in _get_results
    data_scal, rel_paths_scal = self._get_scalars(self, es)
  File "~/anaconda3/envs/oemof-B3_desktop/lib/python3.10/site-packages/oemoflex/model/datapackage.py", line 405, in _get_scalars
    all_scalars = run_postprocessing(es)
  File "~/anaconda3/envs/oemof-B3_desktop/lib/python3.10/site-packages/oemoflex/model/postprocessing.py", line 809, in run_postprocessing
    all_scalars = map_var_names(all_scalars)
  File "~/anaconda3/envs/oemof-B3_desktop/lib/python3.10/site-packages/oemoflex/model/postprocessing.py", line 585, in map_var_names
    scalars.index = scalars.index.map(map_index)
  File "~/anaconda3/envs/oemof-B3_desktop/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 6491, in map
    new_values = self._map_values(mapper, na_action=na_action)
  File "~/anaconda3/envs/oemof-B3_desktop/lib/python3.10/site-packages/pandas/core/base.py", line 921, in _map_values
    return algorithms.map_array(arr, mapper, na_action=na_action, convert=convert)
  File "~/anaconda3/envs/oemof-B3_desktop/lib/python3.10/site-packages/pandas/core/algorithms.py", line 1743, in map_array
    return lib.map_infer(values, mapper, convert=convert)
  File "lib.pyx", line 2972, in pandas._libs.lib.map_infer
  File "~/anaconda3/envs/oemof-B3_desktop/lib/python3.10/site-packages/oemoflex/model/postprocessing.py", line 565, in map_index
    component_id = get_component_id(id)
  File "~/anaconda3/envs/oemof-B3_desktop/lib/python3.10/site-packages/oemoflex/model/postprocessing.py", line 522, in get_component_id
    component_id = get_component_id_in_tuple((id[0], id[1]))
  File "~/anaconda3/envs/oemof-B3_desktop/lib/python3.10/site-packages/oemoflex/model/postprocessing.py", line 170, in get_component_id_in_tuple
    return component_id
UnboundLocalError: local variable 'component_id' referenced before assignment

This error originates here with pandas==2.2.1. With pandas==2.0.3 it still worked.

The two Series invested_capacity and invested_storage_capacity have a Timestamp as their column name, which therefore also ends up in their 'name' attribute. This mixes up the index order in the DataFrame created with the concat function.
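To illustrate how a Series can end up with a Timestamp as its name, here is a minimal sketch. The index labels and values are made up for illustration and are not taken from oemoflex; only the general pandas behaviour (a Series selected from a DataFrame column inherits the column label as its name) is shown.

```python
import pandas as pd

# Hypothetical labels and values, for illustration only.
levels = ["source", "target", "var_name"]
timestamp = pd.Timestamp("2017-01-01 00:00:00")

index = pd.MultiIndex.from_tuples(
    [("storage", "bus", "invest_out")], names=levels
)
frame = pd.DataFrame({timestamp: [1.5]}, index=index)

# Selecting the only column yields a Series named after the column label.
invested_capacity = frame.iloc[:, 0]

# A second Series with a plain integer name, as a stand-in for the others.
flow = pd.Series(
    [2.0],
    index=pd.MultiIndex.from_tuples([("bus", "demand", "flow_in")], names=levels),
    name=0,
)

# The list passed to concat now mixes Timestamp and integer names.
print([type(series.name).__name__ for series in (invested_capacity, flow)])
```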

Renaming the Series with list(map(lambda series: series.rename('0', inplace=True), all_scalars)) did not make any difference, and neither did:

    timestamp_variable = pd.to_datetime("2017-01-01 00:00:00")
    all_scalars = pd.concat(all_scalars, axis=0, keys=['source', 'target', 'var_name', '0', 0, 'var_value', timestamp_variable])
    all_scalars = all_scalars.droplevel(0)

So for now, a workaround is implemented: before concatenation, each multi-index is written to a single-level index whose string labels contain the original multi-index, and after concatenation the multi-index is retrieved from those strings.
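A minimal sketch of such a flatten-and-restore round trip is given below. It is not the code from this pull request: the helper names, the separator, and the sample data are hypothetical, and it assumes that all index labels are plain strings that do not contain the separator.

```python
import pandas as pd

SEPARATOR = "|"  # hypothetical; must not occur in any index label
LEVELS = ["source", "target", "var_name"]


def flatten_index(series):
    """Return a copy whose MultiIndex is joined into one string per row."""
    flat = series.copy()
    flat.index = [SEPARATOR.join(map(str, labels)) for labels in series.index]
    return flat


def restore_index(series):
    """Return a copy whose string index is split back into a MultiIndex."""
    restored = series.copy()
    restored.index = pd.MultiIndex.from_tuples(
        [tuple(label.split(SEPARATOR)) for label in series.index],
        names=LEVELS,
    )
    return restored


# Hypothetical sample data with differently named Series.
first = pd.Series(
    [1.0],
    index=pd.MultiIndex.from_tuples([("a", "b", "flow")], names=LEVELS),
    name=0,
)
second = pd.Series(
    [2.0],
    index=pd.MultiIndex.from_tuples([("c", "d", "invest")], names=LEVELS),
    name=pd.Timestamp("2017-01-01"),
)

flat = pd.concat([flatten_index(series) for series in (first, second)])
all_scalars = restore_index(flat)
```

Note that this round trip turns every label into a string, so it only fits cases where that loss of type information is acceptable.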

MaGering self-assigned this on Feb 28, 2024
MaGering added the bug (Something isn't working) label on Feb 28, 2024
MaGering added this to the v0.0.3 milestone on Feb 28, 2024
MaGering requested a review from henhuy on February 28, 2024, 14:12
Collaborator

henhuy left a comment

Simple request/note.
Rest LGTM

# Todo: To be further investigated

# Index work-around - issues with concat and Multiindex
list(map(lambda series: series.rename("0", inplace=True), all_scalars))
Collaborator

This is not recommended, as you are abusing list creation to perform side effects on the series (see e.g. https://stackoverflow.com/a/5753614/5804947).
Instead you could write:

    all_scalars_renamed = [series.rename("0") for series in all_scalars]
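As a small illustration of the suggestion, the sketch below uses made-up Series in place of the real all_scalars list. It shows that the comprehension returns renamed copies and leaves the original Series untouched, so nothing depends on side effects.

```python
import pandas as pd

# Hypothetical stand-ins for the Series collected during postprocessing.
all_scalars = [
    pd.Series([1.0], name=pd.Timestamp("2017-01-01")),
    pd.Series([2.0], name=0),
]

# Without inplace=True, rename returns a new Series for each element.
all_scalars_renamed = [series.rename("0") for series in all_scalars]

print([series.name for series in all_scalars_renamed])
```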

Collaborator

Or merge this directly with the next call:

    all_scalars_reindexed = [series.rename("0").reset_index() for series in all_scalars]
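The combined call can be sketched end to end as follows. The sample data and the final concat step are assumptions made for illustration; the actual postprocessing code in oemoflex may differ.

```python
import pandas as pd

LEVELS = ["source", "target", "var_name"]


def make_series(labels, value, name):
    """Build a one-row Series with a three-level MultiIndex (sample data)."""
    index = pd.MultiIndex.from_tuples([labels], names=LEVELS)
    return pd.Series([value], index=index, name=name)


# Hypothetical stand-ins with differently typed names.
all_scalars = [
    make_series(("a", "b", "flow"), 1.0, 0),
    make_series(("c", "d", "invest"), 2.0, pd.Timestamp("2017-01-01")),
]

# Give every Series the same name, then move the index levels into columns.
all_scalars_reindexed = [
    series.rename("0").reset_index() for series in all_scalars
]

# All frames now share the same columns, so concat only has to stack rows.
combined = pd.concat(all_scalars_reindexed, ignore_index=True)
print(combined)
```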

Collaborator Author

Thanks a lot! "any seasoned Pythonista will give you hell over it". Don't want to experience that :-D
Adapted with commit a31f733.

Collaborator Author

Unfortunately I get the following error with a31f733:

INFO - 
Traceback (most recent call last):
  File "~/Schreibtisch/oemof-B3/scripts/plot_scalar_results.py", line 524, in <module>
    scalars = load_scalar_results(scalars_path)
  File "~/Schreibtisch/oemof-B3/scripts/plot_scalar_results.py", line 502, in load_scalar_results
    df = dp.format_header(
  File "~/Schreibtisch/oemof-B3/oemof_b3/tools/data_processing.py", line 103, in format_header
    raise ValueError(f"There are extra columns {extra_colums}")
ValueError: There are extra columns ['0']
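One plausible source of a column called '0' is that reset_index() names the value column after the Series. The sketch below only demonstrates that pandas behaviour on made-up data. Whether this is what happened here, and how commit 0fc2956 resolves it, is not shown in this conversation; the target name "var_value" is an assumption.

```python
import pandas as pd

# Hypothetical one-row Series named "0", as produced by rename("0").
index = pd.MultiIndex.from_tuples(
    [("a", "b", "flow")], names=["source", "target", "var_name"]
)
scalars = pd.Series([1.0], index=index, name="0")

# The value column takes over the Series name.
with_name_column = scalars.reset_index()

# reset_index accepts a name for the value column (assumed target name).
with_value_column = scalars.reset_index(name="var_value")

print(list(with_name_column.columns))
print(list(with_value_column.columns))
```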

Collaborator Author

This is fixed with commit 0fc2956. The DataFrame all_scalars now differs from the one produced before the pandas update, but the results of the postprocessing seem to be the same: I have compared the scalars.csv files with each other.

MaGering requested a review from henhuy on February 29, 2024, 10:44
Collaborator

henhuy left a comment

LGTM - as before I did not test it locally.

Collaborator Author

MaGering commented Mar 1, 2024

> LGTM - as before I did not test it locally.

Great, thank you. I just tested it in oemof-B3 (again) and also compared it against the dev version. The results are the same except for minimal differences, for example -0 instead of 0, or numbers on the order of 1e-13 instead of 0, which are probably caused by the new pandas version.
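Differences of this kind can be separated from real changes by comparing with a small absolute tolerance. The values and the tolerance below are made up for illustration; they are not taken from the actual scalars.csv files.

```python
import numpy as np
import pandas as pd

# Hypothetical values standing in for one column of two scalars.csv files.
old = pd.Series([0.0, 0.0, 12.5], name="var_value")
new = pd.Series([-0.0, 3e-13, 12.5], name="var_value")

# Exact comparison: -0.0 equals 0.0, but 3e-13 does not equal 0.0.
exact_match = old.eq(new)

# Comparison with an absolute tolerance (assumed value) absorbs the noise.
close_match = np.isclose(old.to_numpy(), new.to_numpy(), rtol=0.0, atol=1e-9)

print(exact_match.tolist())
print(bool(close_match.all()))
```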

MaGering merged commit 1c7e983 into dev on Mar 1, 2024
2 checks passed
MaGering deleted the fix/postprocessing_pandas_concat_error branch on March 1, 2024, 10:13