Adding Survey Responses to Public Dashboard #124
The notebook currently uses the survey_info part of the config to get the surveys, and uses the xlsx files to translate between data and readable labels. It generates a plot for every question present in the data. It should work with multiple surveys, but has only been tested with 1 so far.
starting to read via xml instead of spreadsheet, more support across languages
there was a bug (duplicate code) in how I was creating the dictionaries
We only want to display questions that are still in the survey, as those are the questions we will actually have a response for
added code to the index.html file to fetch surveys, add each of the options for the survey questions, and display the charts by default on the dashboard
use labels except for likert, use values
uncovered a few bugs
These will never have answers, so no need to chart them. Example was "Please rate the following statements: "
all question names begin with a capital letter, so we can safely drop all columns that begin lowercase. Had to change from a hardcoded list because different survey data had different extra columns, and we need this to be as general as possible to prevent extra maintenance in the future
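A minimal sketch of the column filter described in this commit, assuming the responses are in a pandas DataFrame. The function name and the column names in the sample data are hypothetical, not taken from the notebook:

```python
import pandas as pd


def keep_question_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only columns whose name starts with an uppercase letter."""
    question_cols = [
        c for c in df.columns if isinstance(c, str) and c[:1].isupper()
    ]
    return df[question_cols]


# Made-up response data; real surveys will have different columns.
responses = pd.DataFrame({
    "How_satisfied": ["4", "5"],
    "Trip_purpose": ["work", "home"],
    "start": ["2024-05-01", "2024-05-02"],
    "meta_instanceID": ["a", "b"],
})

question_df = keep_question_columns(responses)
print(list(question_df.columns))
```

Filtering by a naming rule rather than a hardcoded list means extra columns that vary between surveys are dropped without listing them one by one.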
now properly showing the debug df and the alt text on all charts - including the ones that don't have enough data
before, I was dropping all columns that were not survey questions. This is no longer needed because we are generating charts directly from the list of questions (not the data) - this also matches better with the frontend!
Filter off the "input" type questions -- such as "other, please specify" so that we could choose the best way to display them later
In order to properly generate the quality text, we need to know how many trips a particular survey was presented for. To accomplish this, we can use the strings in the config that determine what survey to show the user (or a python version thereof). We need to use the composite trips to have the sections, and perform some other data manipulations in order to have this work properly. Next is finding all of the responses that are actually present for each survey ... and associating that with each question ...
Each survey will have a different "denominator" for the quality text, which should be presented to the user to represent what percentage of total eligible trips have survey responses. Using "showsIfPy" as a backup for strings that are too complicated, changing access notation for dictionaries in e-mission/nrel-openpath-deploy-configs#88, and replacing && and ! allows us to evaluate the strings in python. Needed to track the survey names in order to know which denominator to use, so turned the list of urls into a dictionary instead of a list, and turned the dictionary of questions into a dictionary of dictionaries, one for each question.
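As a rough illustration of the "replacing && and !" step, the sketch below rewrites a JavaScript-style condition into a Python expression. The function name and the sample conditions are hypothetical, handling `||` is an added assumption (the commit only mentions `&&` and `!`), and the rewrite is purely textual, so it would also touch operators inside string literals:

```python
import re


def js_condition_to_python(expr: str) -> str:
    """Rewrite JS-style boolean operators as Python keywords (text-only)."""
    expr = expr.replace("&&", " and ").replace("||", " or ")
    # Replace "!" only when it is not the start of "!=".
    expr = re.sub(r"!(?!=)", " not ", expr)
    # Collapse the extra spaces introduced by the replacements.
    return " ".join(expr.split())


# Hypothetical conditions, not copied from any deployment config.
print(js_condition_to_python("has_vehicle && !is_personal"))
print(js_condition_to_python("a != b && !c"))
```

Conditions that cannot be translated this way are presumably the ones that fall back to "showsIfPy".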
if there is only one survey, it isn't conditional, so the denominator is all composite trips
when preparing to evaluate the string, I had pulled individual values as a workaround; using row.to_dict() instead is more generalizable
editing this filtering process to explicitly pass the eval string rather than rely on the fact the function shares scope with where it is called - for clarity
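One way to picture the filtering described in the last few commits: the condition string is passed in explicitly and evaluated once per trip against `row.to_dict()`, and an unconditional survey keeps every trip. This is a sketch under stated assumptions, not the notebook's code; `eligible_trips`, the column names, and the quality text wording are all hypothetical, and `eval` is only reasonable here because the strings come from a trusted deployment config:

```python
import pandas as pd


def eligible_trips(trips: pd.DataFrame, condition=None) -> pd.DataFrame:
    """Return the trips for which a survey would have been shown.

    `condition` is a Python expression string; None means the survey is
    unconditional, so every trip counts toward the denominator.
    """
    if condition is None or trips.empty:
        return trips

    def row_matches(row: pd.Series) -> bool:
        # Names in the condition resolve only against this row's fields.
        return bool(eval(condition, {"__builtins__": {}}, row.to_dict()))

    return trips[trips.apply(row_matches, axis=1)]


# Made-up trips; the column names are illustrative only.
trips = pd.DataFrame({
    "distance": [100, 5000, 12000],
    "has_ble": [True, False, True],
    "responded": [False, False, True],
})

eligible = eligible_trips(trips, "has_ble and distance > 50")
answered = int(eligible["responded"].sum())
quality_text = f"Based on {answered} responses out of {len(eligible)} eligible trips"
print(quality_text)
```

Passing the condition as an argument keeps the helper independent of whatever variables happen to be in scope where it is called.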
if there is no data - we catch and create empty dataframes
removed empty cell and changed "None" inputs to dummy values
removing these from the html in the same way I did in the notebook -- maybe we would add these back with a different chart type later
polishing the way quality text is handled
Even though there are no labels, we can still display charts with sensed mode and other generic metrics
emulating the comment for the study case
similar to the mode-specific notebooks, we ONLY want to run this notebook if the trip labels are surveys (ENKETO)
make stacked bar plots larger, add the surveys to the list of stacked metrics so they can have "more info" buttons
Starting to test ... most of the baseline charts are failing to generate ... going back to the notebooks to try and reproduce ... Also failing in the notebooks ... in Actually, to answer my own question, maybe not - the sensed mode is different (bluetooth), so we may need a workaround to introduce it ... but we do have the data (see below) - we'll want to update this to be the matched mode, but maybe that can be a next step So maybe it's the "labels"? We have user input but not traditional labels ...
So, there is an issue with all the charts that expect labeled data not being able to plot the sensed data because of an error with the labeled data. We have a couple choices here:
If I can do it quickly, I think just debugging is the best plan, given that the bluetooth sensed mode is also buggy (which is why we took it out of the quality text for now); otherwise, just cut all broken charts for the time being. @shankari what do you think?
@Abby-Wheelis it might be more meaningful to create a separate notebook for the fleet case. We won't have mode_confirm, and also the sensed mode for the BLE is stored separate ( To plot the `ble_sensed_mode,
Yes, as in, you would need to compute the primary_ble_sensed_mode like we compute the primary mode now, but then you can just use it in the same way. We may want to also group the bars differently for the fleet case - since we won't have labeled versus confirmed, we may want to do trip count versus distance. So basically only one figure but with two bars.
removed stale lines that were breaking notebook syntax!
I think I could handle the
I think this would work, we might face duplicating or losing some of the charts, though:
That's fairly minimal loss and fairly minimal duplication, that feels OK to me, will proceed with this
We can retain trips under 80% and land trips as two more figures with count and distance. I am not sure they are as meaningful for the fleet case, but might be for the other survey cases (like washington commons)
create a dedicated notebook for handling metrics on survey deployments. Currently the option to show the "bluetooth sensed" mode is commented out -- this just shows a lot of unknowns. Would this one day be the exact vehicles and how much they are used? Or just car/ecar?
naming the charts from this notebook with a _survey suffix so they don't get overwritten by the other notebooks
errors encountered while testing
Ah, it was the |
I am going to fix a couple of the issues and then merge. Let's fix the other issues in a cleanup PR.
@@ -69,6 +69,16 @@ def load_all_participant_trips(program, tq, load_test_users):
    disp.display(participant_ct_df.head())
    return participant_ct_df


def filter_composite_trips(all_comp_trips, program, load_test_users):
Now that you have removed `load_composite_trips`, you can also remove `filter_composite_trips`
I see that `filter_composite_trips` is not identical to `filter_labeled_trips` because it doesn't filter by blank user input. However, I don't understand why you are not filtering by blank user input. It should be possible to just call `load_all_participant_trips` instead of `load_all_confirmed_trips` followed by `filter_composite_trips` to achieve the same result.
> However, I don't understand why you are not filtering by blank user input.
I do, later, but I need all of the trips in order to create the "all trips for which a survey was prompted" information
I made some changes to the data loading (including pushing it to scaffolding) in #135 - I have the filtered, unfiltered, and file suffix all returned from a function in scaffolding now - I don't have the debug_df or quality_text included since it varies chart to chart
"tq = scaffolding.get_time_query(year, month)\n",
"all_confirmed_trips = scaffolding.load_all_confirmed_trips(tq)\n",
"#we need to filter out trips (based on if including test users)\n",
"all_confirmed_trips = scaffolding.filter_composite_trips(all_confirmed_trips, program, include_test_users)\n",
"\n",
I think this should all go into scaffolding (similar to `load_viz_notebook_sensor_inference_data`)
"#merge any cols with the same name into 1 col -- should have different values in their survey_name col\n",
"#https://stackoverflow.com/questions/24390645/python-pandas-merge-samed-name-columns-in-a-dataframe\n",
"def sjoin(x): return ';'.join(x[x.notnull()].astype(str))\n",
"df_responses = df_responses.groupby(level=0, axis=1).apply(lambda x: x.apply(sjoin, axis=1))"
what is this for? If the responses are in different surveys, should they be combined into a single column?
I see this was in #124 (comment) but I don't fully understand why the column is duplicated. Shouldn't the survey id be part of the column name as well?
As it stands, the survey id is not part of the column name - I could refactor to include it as a part of the column name and update the way that I translate the column names to be the chart titles
They should end up in different charts since we group and filter by the `survey_name`, which is its own column
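For readers following this thread, here is a small sketch of what merging same-named columns does to the data. It is not the notebook's code: it loops over the unique column names instead of using `groupby(level=0, axis=1)`, since grouping along `axis=1` is deprecated in recent pandas releases. The helper name `merge_duplicate_columns` and the sample data are hypothetical:

```python
import pandas as pd


def sjoin(values: pd.Series) -> str:
    """Join the non-null values of one row with ';'."""
    return ";".join(values.dropna().astype(str))


def merge_duplicate_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Collapse columns that share a name into one column per name."""
    merged = {}
    for name in df.columns.unique():
        block = df.loc[:, df.columns == name]  # every column with this name
        merged[name] = block.apply(sjoin, axis=1)
    return pd.DataFrame(merged, index=df.index)


# Two surveys that both define "Q1"; each row answers only one of them.
df_responses = pd.DataFrame(
    [["yes", None, "survey_a"], [None, "no", "survey_b"]],
    columns=["Q1", "Q1", "survey_name"],
)

merged = merge_duplicate_columns(df_responses)
print(merged)
```

Because each row keeps its own `survey_name`, grouping by that column afterwards still separates the answers by survey.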
We do not need to manually install em-common since it was already installed in the base server image as part of e-mission/e-mission-server#965 and e-mission/e-mission-server@d33ce2e, so we removed it in 334c95b. But we do need to bump up the base image to include it.
- use `.get` to check whether this is an enketo survey so that it works for older deployments that predate the `survey_info` functionality as well. This has no functional difference since, if there is no `survey_info`, this is not a survey. But we get a better exception and avoid confusion later.
- improve error handling for survey_responses, by splitting the attribute and name errors from the more complex errors
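The `.get` check in the first bullet might look roughly like the sketch below. The helper name is hypothetical, and the `"trip-labels"` key and `"ENKETO"` value are assumptions about the config layout rather than something confirmed in this thread:

```python
def is_enketo_survey(dynamic_config: dict) -> bool:
    """Return True only when the config declares survey-based trip labels."""
    # Older deployment configs may have no "survey_info" section at all,
    # so fall back to an empty dict instead of raising a KeyError.
    survey_info = dynamic_config.get("survey_info", {})
    return survey_info.get("trip-labels") == "ENKETO"


# Hypothetical configs for illustration.
old_config = {}
survey_config = {"survey_info": {"trip-labels": "ENKETO"}}

print(is_enketo_survey(old_config))
print(is_enketo_survey(survey_config))
```

A config without `survey_info` is then treated as "not a survey" rather than failing with a `KeyError`.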
What dataset were you using? 336 trips is a lot less than the 990 I have in the May 8th dataset, if it's old enough I would expect only to see
Starting to add the survey responses to the public dashboard - want to allow for any survey configuration, including multiple conditional surveys, see discussion in the issue