Fix Callback Error Updating store-trips.data - TypeError: 'float' object is not subscriptable fixed #121

Merged
21 changes: 20 additions & 1 deletion utils/db_utils.py
@@ -94,7 +94,26 @@ def query_confirmed_trips(start_date: str, end_date: str, tz: str):
# Add primary modes from the sensed, inferred and ble summaries. Note that we do this
# **before** filtering the `all_trip_columns` because the
# *_section_summary columns are not currently valid
get_max_mode_from_summary = lambda md: max(md["distance"], key=md["distance"].get) if len(md["distance"]) > 0 else "INVALID"
Member

This is the same functionality that I had to update in #141. Your checks are much more thorough than mine, so I'll go ahead and update mine to match. However, I set the result to UNKNOWN instead of INVALID, and I'm wondering whether we should do the same thing in both places. I chose UNKNOWN because that is how we display trips where we were unable to sense a mode for most of the distance. What purpose does INVALID serve here? Or rather, what is the difference between an INVALID sensed mode and an UNKNOWN sensed mode?

Contributor Author

@Abby-Wheelis I am not entirely sure what INVALID would represent here as opposed to UNKNOWN. I believe @shankari originally wrote this line of code, which I refactored. I can change it to UNKNOWN to match #141.

Contributor

@TeachMeTW can you look at the commit where I wrote that code and see whether I added an explanation of my choice? I have a vague recollection of the decision-making, but the comments I wrote at the time are likely to be clearer.

Contributor Author

@shankari @Abby-Wheelis based on existing comments, this is how I interpreted it.

INVALID: used to signify that the data or result is not just unknown but invalid. It implies that something went wrong or the data provided does not meet the expected format or criteria. It's a more explicit signal that the data is unusable.

UNKNOWN: indicates that the mode is not known, but does not necessarily imply that there was an error or problem with the data. It often means that the information is not available or could not be determined but is expected to be a valid state to encounter.

Should we keep both INVALID and UNKNOWN, or just use one?
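
To make the distinction concrete, here is a minimal sketch, not the code from this PR. It assumes that a summary's `distance` dictionary can itself carry an `UNKNOWN` key for distance where no mode was sensed (as the earlier comment about how such trips are displayed suggests). The function name `primary_mode` and the sample values are hypothetical.

```python
import math

def primary_mode(md):
    """Return the mode with the largest distance, or "INVALID" for malformed input."""
    # Malformed: not a dict, no "distance" key, or "distance" is not a dict.
    if not isinstance(md, dict) or not isinstance(md.get("distance"), dict):
        return "INVALID"
    distances = md["distance"]
    # An empty distance dict has no maximum, so it is also treated as malformed.
    if not distances:
        return "INVALID"
    return max(distances, key=distances.get)

# Well-formed summary where most of the distance has no sensed mode (assumed shape).
mostly_unsensed = {"distance": {"UNKNOWN": 900.0, "WALKING": 100.0}}
# Missing summary, assumed here to show up as a float NaN.
missing_summary = math.nan

print(primary_mode(mostly_unsensed))  # UNKNOWN
print(primary_mode(missing_summary))  # INVALID
```

Under that assumption, UNKNOWN falls out of well-formed data, while INVALID is produced only when the summary is missing or malformed.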

Member

I think we can keep both, thank you for the explanation! Hopefully, once we figure out #1088 and resolve the underlying data issues, there will be fewer INVALID sensed modes.

Contributor Author

@shankari if possible, please resolve the issue if no further reviews or changes are needed.


# Check if 'md' is not a dictionary or does not contain the key 'distance'
# or if 'md["distance"]' is not a dictionary.
# If any of these conditions are true, return "INVALID".
get_max_mode_from_summary = lambda md: (
    "INVALID"
    if not isinstance(md, dict)
    or "distance" not in md
    or not isinstance(md["distance"], dict)
    # If 'md' is a dictionary and 'distance' is a valid key pointing to a dictionary:
    else (
        # Get the key (mode) with the largest value in 'md["distance"]', using 'md["distance"].get' as the key function for 'max'.
        # This operation only happens if the length of 'md["distance"]' is greater than 0.
        # Otherwise, return "INVALID".
        max(md["distance"], key=md["distance"].get)
        if len(md["distance"]) > 0
        else "INVALID"
    )
)

df["data.primary_sensed_mode"] = df.cleaned_section_summary.apply(get_max_mode_from_summary)
df["data.primary_predicted_mode"] = df.inferred_section_summary.apply(get_max_mode_from_summary)
if 'ble_sensed_summary' in df.columns:
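
For reference, the failure named in the PR title can be reproduced without pandas. The sketch below is illustrative only: it assumes the offending rows held a float NaN in place of a summary dictionary, which is what the error message suggests but which the diff itself does not show. The names `old_get_max_mode`, `new_get_max_mode`, and `missing` are hypothetical stand-ins for the removed and added lambdas.

```python
# The removed one-liner: indexes md["distance"] without checking the type of md.
old_get_max_mode = lambda md: (
    max(md["distance"], key=md["distance"].get) if len(md["distance"]) > 0 else "INVALID"
)

# The guarded version added in this PR.
new_get_max_mode = lambda md: (
    "INVALID"
    if not isinstance(md, dict)
    or "distance" not in md
    or not isinstance(md["distance"], dict)
    else (
        max(md["distance"], key=md["distance"].get)
        if len(md["distance"]) > 0
        else "INVALID"
    )
)

# Assumption: a missing summary reaches the lambda as a float NaN.
missing = float("nan")

try:
    old_get_max_mode(missing)
    old_error = None
except TypeError as exc:
    old_error = str(exc)

print(old_error)                  # 'float' object is not subscriptable
print(new_get_max_mode(missing))  # INVALID
print(new_get_max_mode({"distance": {"WALKING": 120.0, "BICYCLING": 800.0}}))  # BICYCLING
```

The guarded version returns a sentinel instead of raising, so a single bad row no longer aborts the `apply` over the whole column.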