Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dashboard Survey Metrics #966

Merged
merged 7 commits into from
May 29, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
10 changes: 7 additions & 3 deletions emission/net/api/cfc_webapp.py
Original file line number Diff line number Diff line change
Expand Up @@ -328,14 +328,18 @@ def summarize_metrics(time_type):
else:
old_style = True
is_return_aggregate = True

app_config = request.json['app_config'] if 'app_config' in request.json else None

Comment on lines +331 to +333
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wait, so you are assuming that the client passes in the app config? That sounds fairly bonkers, particularly since the app config is fairly large. We read the app_config directly from the study_config in multiple locations, including in cfc_webapp - why would we not just read in the app_config where it is used (presumably in api/metrics.py) instead of passing in around?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wait, so you are assuming that the client passes in the app config? That sounds fairly bonkers, particularly since the app config is fairly large.

I agree it's not the best solution. to mitigate the size issue I have the phone sending a partial app_config with only survey_info, since that is what is really needed for the metrics

why would we not just read in the app_config where it is used (presumably in api/metrics.py) instead of passing in around?

If we did this, then every time we query the server, the server has to query github and wait for the config before proceeding, right? I thought this would probably introduce even more latency.

I also didn't think there was a good way to keep the config in memory because I believe cfc_webapp is continuously running and wouldn't receive config updates.

Is there a better way?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see. The standard way to mitigate the performance issue is through caching. You would read the config once a day, and use the cached version otherwise. That would not necessarily introduce a perceptible performance impact.

However, that may still leave us with a consistency issue. If the phone config has updated, but the server has not yet updated, I suspect we may have issues because the survey names may be inconsistent across them. The standard way to resolve such consistency issues is to send over the version that the phone is using. The server would cache the last several versions and use the version that the phone sends in.

While thinking through this, I just realized that this might be even more complex. For example, if you had surveyNameA, surveyNameB and surveyNameC in May, and switched to surveyNameD, surveyNameE and surveyNameF in October, you should still apply the older version of the config (e.g. use surveyNameA, surveyNameB and surveyNameC when generating metrics for June in November. That will not work with the current approach of passing the config from the phone either.

I will merge this for now to unblock ourselves, and we should say that, after a study has been deployed to real users, we will not support changing the surveys. That is a reasonable restriction to impose, given that changing the survey while the data collection is in progress is not a recommended best-practice from a social science perspective - the results are not necessarily comparable with each other.

And then we should figure out the correct performance optimization vs. complexity tradeoff that we want to be at

time_type_map = {
'timestamp': metrics.summarize_by_timestamp,
'local_date': metrics.summarize_by_local_date
'timestamp': metrics.summarize_by_timestamp, # used by old UI
'local_date': metrics.summarize_by_local_date,
'yyyy_mm_dd': metrics.summarize_by_yyyy_mm_dd # used by new UI
}
metric_fn = time_type_map[time_type]
ret_val = metric_fn(user_uuid,
start_time, end_time,
freq_name, metric_list, is_return_aggregate)
freq_name, metric_list, is_return_aggregate, app_config)
if old_style:
logging.debug("old_style metrics found, returning array of entries instead of array of arrays")
assert(len(metric_list) == 1)
Expand Down
19 changes: 17 additions & 2 deletions emission/net/api/metrics.py
Original file line number Diff line number Diff line change
Expand Up @@ -10,16 +10,31 @@

import emission.analysis.result.metrics.time_grouping as earmt
import emission.analysis.result.metrics.simple_metrics as earms
import emission.storage.decorations.analysis_timeseries_queries as esda
import emission.storage.decorations.local_date_queries as esdl
import emission.storage.timeseries.tcquery as esttc

def summarize_by_timestamp(user_id, start_ts, end_ts, freq, metric_list, include_aggregate):
import emcommon.metrics.metrics_summaries as emcms

def summarize_by_timestamp(user_id, start_ts, end_ts, freq, metric_list, include_aggregate, app_config=None):
return _call_group_fn(earmt.group_by_timestamp, user_id, start_ts, end_ts,
freq, metric_list, include_aggregate)

def summarize_by_local_date(user_id, start_ld, end_ld, freq_name, metric_list, include_aggregate):
def summarize_by_local_date(user_id, start_ld, end_ld, freq_name, metric_list, include_aggregate, app_config=None):
local_freq = earmt.LocalFreq[freq_name]
return _call_group_fn(earmt.group_by_local_date, user_id, start_ld, end_ld,
local_freq, metric_list, include_aggregate)

def summarize_by_yyyy_mm_dd(user_id, start_ymd, end_ymd, freq, metric_list, include_agg, app_config):
time_query = esttc.TimeComponentQuery(
"data.start_local_dt",
esdl.yyyy_mm_dd_to_local_date(start_ymd),
esdl.yyyy_mm_dd_to_local_date(end_ymd)
)
trips = esda.get_entries(esda.COMPOSITE_TRIP_KEY, None, time_query)
print('found ' + str([e for e in trips]))
return emcms.generate_summaries(metric_list, trips, app_config)

def _call_group_fn(group_fn, user_id, start_time, end_time, freq, metric_list, include_aggregate):
summary_fn_list = [earms.get_summary_fn(metric_name)
for metric_name in metric_list]
Expand Down
12 changes: 12 additions & 0 deletions emission/storage/decorations/local_date_queries.py
Original file line number Diff line number Diff line change
Expand Up @@ -48,3 +48,15 @@ def get_comparison_query(field_prefix, base_ld, limit_ld, units, gt_or_lt):
return { "$or": or_conditions }
else:
return {}

def yyyy_mm_dd_to_local_date(ymd: str) -> ecwl.LocalDate:
return ecwl.LocalDate({
'year': int(ymd[0:4]),
'month': int(ymd[5:7]),
'day': int(ymd[8:10])
})

def get_yyyy_mm_dd_range_query(field_name, start_ymd: str, end_ymd: str) -> dict:
start_local_date = yyyy_mm_dd_to_local_date(start_ymd)
end_local_date = yyyy_mm_dd_to_local_date(end_ymd)
return get_range_query(field_name, start_local_date, end_local_date)
2 changes: 1 addition & 1 deletion setup/environment36.yml
Original file line number Diff line number Diff line change
Expand Up @@ -19,7 +19,7 @@ dependencies:
- scipy=1.10.0
- utm=0.7.0
- pip:
- git+https://github.com/JGreenlee/e-mission-common@0.4.3
- git+https://github.com/JGreenlee/e-mission-common@0.5.1
- pyfcm==1.5.4
- pygeocoder==1.2.5
- pymongo==4.3.3
Expand Down
Loading