-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Patches: nan summaries breaking some deployments, filter land modes on underlying (not display) label #141
Conversation
switching the condition from the display mode to the underlying mode
Looking into what's happening with We should keep looking into why there are so many trips that are not connected to a mode from a BLE beacon |
Testing with |
So... air -> Air is not in our current csv way of translating the labels, so they get grouped as "Other" Great advocacy for getting those labels unified as soon as we can. I think the quickest fix would be to add "air -> Air" to the csv, flagging that we should get the labels unified (I would try to do that here, but I think #148 / #143 needs to be completed first to prevent mode chaos. The we can retire the csvs as soon as possible as a follow-on/clean-up to the unification of colors (which is proving to require more work with labels than we first thought) |
@iantei feel free to chime in on what you think about my approach to add air->Air to the csv for now, with the plan of unifying the labels and filtering on the base mode as soon as feasible. |
I think the approach to add air->Air to the csv would cover up for the default mapping case, which uses the csv for the mapping. An alternative temporary solution, which I can also think about is to modify the |
The code changes look good! Could you please test for the |
@iantei since |
Can we get the breakdown of the "Other" to verify this? At a high level, in a future fix, I think we should consider user labels if they represent a large enough proportion of the aggregate data. So if 40% of the data is "Other" and 90% of the "Other" is "Air", maybe we should just show "Air". But that is going to require a lot of judgement calls - what is "large enough proportion"? what if the text entered by users is subtly different? so let's punt on that for now. |
The labeled trips at least are not air. Labels not in the dynamic labels include - drove_alone, shared_ride, bus, and e_car_shared_ride - all of the data that I have in this dump is from test users, and I'm guessing a lot of it is from before they added their custom labels. The air/HSR trips (39/79,000 miles) must be unlabeled. |
Yup, makes sense. |
So aside from the tabled point of
This seems to be working as expected, and can serve as a patch pending the more principled unification of labels, and usage of base modes (I added to the csv, and we still filter on |
check more carefully for missing summaries
I'm not going to insist on this right now, but in a cleanup change, I think it would be helpful to maybe have a unified layer that handles both dashboards. Let's wait until the carbon calculations are integrated into the public dashboard, though. If the unified layer is small and generic enough, it could even go into the server codebase, so that anybody else working with mongodumps (e.g. TSDC) could use it as well. |
I have tested with marking the entries that are malformated as
I'll collect opinions at our meeting later today! |
They should be marked as |
@iantei @Abby-Wheelis this is not in "Ready for review", but I am deploying the related admin dashboard changes to staging, so am planning to merge and deploy this as well. Please make sure to pull this into future changes as well! |
if v2l_df.index[0][-1].isupper(): | ||
print("Found uppercase last letter") | ||
# This part if a bit tricky | ||
# We could have already had a non-zero other, and it could be small or large | ||
if "OTHER" not in v2l_df.index: | ||
# zero other will end up with misc_count | ||
if misc_count.vals > 0: | ||
v2l_df.loc["OTHER"] = misc_count | ||
elif "OTHER" in small_chunk.index: | ||
# non-zero small other will already be in misc_count | ||
v2l_df.loc["OTHER"] = misc_count | ||
else: | ||
# non-zero large other, will not already be in misc_count | ||
v2l_df.loc["OTHER"] = v2l_df.loc["OTHER"] + misc_count |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ouch! I hope this is simplified once we start using the basemode values are remove the CSV
To address #140 by updating the way we filter the data from display mode to underlying mode
What I am now using is
mode_confirm
, which should work for now (all of the label sets we have now using the label "air", but I think using thebaseMode
from the label file, which would beAIR
, but that change will be somewhat more complicated.If I figure out why there are so many
Unknown
ble mode entries indfc-fermata
, I'll fix that in this PR as well.