DAP: Leading Indicators #20

kateharwood opened this issue Nov 25, 2020 · 0 comments
Labels: DAP (Data Analysis Project)

DAP Deliverable:
Analysis should cover the following indicators (see the fetch sketch after this list):
- Potential leading indicators for COVID cases: Facebook % CLI, Facebook % CLI-in-community, Doctor Visits, Google Health Trends (replace with the Google Symptoms signal as soon as it's available; coordinate with Nat), and one of the SafeGraph signals (mobility, or something more interesting like trips to bars/restaurants, if that's available).
- Potential leading indicators for COVID deaths: Hospital Admissions.
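
A minimal sketch of pulling these signals with the covidcast Python client. The specific signal names here, and jhu-csse as the case target, are assumptions to be checked against the COVIDcast signal documentation:

```python
from datetime import date

import covidcast  # pip install covidcast

start, end = date(2020, 6, 1), date(2020, 10, 31)

# Candidate leading indicators, as (data_source, signal) pairs.
# Signal names are assumptions; verify against the COVIDcast docs.
indicators = [
    ("fb-survey", "smoothed_cli"),           # Facebook % CLI
    ("fb-survey", "smoothed_hh_cmnty_cli"),  # Facebook % CLI-in-community
    ("doctor-visits", "smoothed_adj_cli"),   # Doctor Visits
    ("ght", "smoothed_search"),              # Google Health Trends
    ("safegraph", "full_time_work_prop"),    # one SafeGraph mobility signal
]

# Target: county-level case rates (7-day averaged incidence per 100k).
cases = covidcast.signal("jhu-csse", "confirmed_7dav_incidence_prop",
                         start, end, geo_type="county")

signals = {
    (src, sig): covidcast.signal(src, sig, start, end, geo_type="county")
    for src, sig in indicators
}
```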

Analysis should be at the county level whenever possible. Be careful not to be misled by counties that have high case rates but low absolute case counts (pre-screening to counties with some minimum number of cumulative cases, like 500, is often helpful; a sketch follows below). If counties aren't available (or too sparse), then metro areas are probably a good backup.
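
A possible pre-screening step, reusing `covidcast` and the `cases` frame from the sketch above; the 500-case cutoff follows the suggestion here, and `confirmed_cumulative_num` is an assumed signal name:

```python
# Keep only counties with at least MIN_CASES cumulative confirmed cases
# by the end of the analysis window.
MIN_CASES = 500

cum = covidcast.signal("jhu-csse", "confirmed_cumulative_num",
                       end, end, geo_type="county")
keep = set(cum.loc[cum["value"] >= MIN_CASES, "geo_value"])
cases = cases[cases["geo_value"].isin(keep)]
```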

Analysis at the state level would also be interesting, but clearly secondary. It would be "easy" and "clean" compared to counties (no pre-screening should be needed for states), but intuitively, state-level signals integrate activity across so many different sublocations that the leading relationship between an indicator and a target seems more complicated.

Analysis should cover at least two time periods: the surge in cases in late June, and the recent one in late September.

Timeline:
By mid-December
First, label times of sharp increase for each location. This can be done programmatically, e.g., by estimating a derivative and then applying some kind of threshold; it could also be done by simply measuring % change in the signal, or by hand. Choose a few different ways and implement them (two possible schemes are sketched below).
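
Two such labeling schemes, sketched with pandas; the window lengths and thresholds here are placeholder assumptions to be tuned:

```python
import pandas as pd


def label_increases_pct(ts: pd.Series, window: int = 7,
                        min_pct: float = 0.25) -> pd.Series:
    """Label days where the signal rose by more than min_pct
    (e.g. 25%) over the past `window` days."""
    return ts.pct_change(periods=window) > min_pct


def label_increases_slope(ts: pd.Series, window: int = 7,
                          thresh: float = 0.5) -> pd.Series:
    """Label days where a smoothed finite-difference derivative
    (rolling mean of day-over-day differences) exceeds thresh."""
    slope = ts.diff().rolling(window, center=True).mean()
    return slope > thresh
```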

For each labeling scheme under consideration, visually inspect the time points it labels, and judge whether it passes a reasonable sanity check before carrying it forward to the next steps.
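
One way to eyeball a labeling scheme, assuming matplotlib and a pandas time series indexed by date:

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_labels(ts: pd.Series, labels: pd.Series, title: str = "") -> None:
    """Plot a signal and highlight the time points a labeling scheme flags."""
    mask = labels.to_numpy(dtype=bool)
    fig, ax = plt.subplots(figsize=(8, 3))
    ax.plot(ts.index, ts.to_numpy(), label="signal")
    ax.scatter(ts.index[mask], ts.to_numpy()[mask],
               color="red", s=12, label="labeled increase")
    ax.set_title(title)
    ax.legend()
    plt.show()
```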

By end of December
For each labeling scheme that passes the sanity check: for each time point at which case rates rise, record whether the signal of interest rose before it (and if so, how many days before), rose after it (and if so, how many days after), or didn't appear to rise at all within a reasonable time buffer around the given time point (sketched below).
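
A sketch of this matching step; the 21-day `buffer_days` default and the nearest-rise matching rule are assumptions, not something specified here:

```python
import numpy as np
import pandas as pd


def lead_lag(case_rise_times, signal_rise_times,
             buffer_days: int = 21) -> pd.Series:
    """For each case-rise time, the offset in days to the nearest signal
    rise within +/- buffer_days (negative = signal led), or NaN if none."""
    sig = pd.DatetimeIndex(signal_rise_times)
    offsets = {}
    for t in pd.DatetimeIndex(case_rise_times):
        diffs = (sig - t).days                      # signal time minus case time
        near = diffs[np.abs(diffs) <= buffer_days]  # rises within the buffer
        offsets[t] = near[np.abs(near).argmin()] if len(near) else np.nan
    return pd.Series(offsets)
```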

Come up with ways of summarizing the statistics computed in the last step, across the various signals and the various labeling schemes for the periods of increase. At a minimum, the summaries should include recall and precision (but need not be limited to those).
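
One possible way to turn the offsets from the matching step into recall and precision; these matching-based definitions are one choice among several:

```python
import pandas as pd


def precision_recall(offsets: pd.Series, n_signal_rises: int) -> tuple:
    """offsets: output of lead_lag(); n_signal_rises: total number of
    rises labeled in the signal over the same locations and period.

    Recall: fraction of case rises with a matching signal rise in the buffer.
    Precision (crudely, assuming roughly one-to-one matching): fraction of
    signal rises that matched some case rise."""
    matched = offsets.notna().sum()
    recall = matched / len(offsets) if len(offsets) else float("nan")
    precision = matched / n_signal_rises if n_signal_rises else float("nan")
    return precision, recall
```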

Re-inspect some visual examples based on what you find in the last two steps. For example, if we get particularly good/bad performance with respect to some labeling scheme (or for some cross-section of counties or times, etc.), then look through those plots.

Any lessons to be learned about why we get leading behavior in some counties/times and not in others? (This one is a stretch goal.)
