DAP: Leading Indicators #20

kateharwood opened this issue Nov 25, 2020 · 0 comments
Labels: DAP (Data Analysis Project)

DAP Deliverable:
Analysis should cover the following indicators (see the fetch sketch after this list):
- Potential leading indicators for COVID cases: Facebook % CLI, Facebook % CLI-in-community, Doctor Visits, Google Health Trends (replace with the Google Symptoms signal as soon as it's available; coordinate with Nat), and one of the SafeGraph signals (mobility, or something more interesting like trips to bars/restaurants, if that's available).
- Potential leading indicators for COVID deaths: Hospital Admissions.
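
A minimal sketch of pulling these signals with the covidcast Python client. The specific signal names here, and jhu-csse as the case target, are assumptions to be checked against the COVIDcast signal documentation:

```python
from datetime import date

import covidcast  # pip install covidcast

start, end = date(2020, 6, 1), date(2020, 10, 31)

# Candidate leading indicators, as (data_source, signal) pairs.
# Signal names are assumptions; verify against the COVIDcast docs.
indicators = [
    ("fb-survey", "smoothed_cli"),           # Facebook % CLI
    ("fb-survey", "smoothed_hh_cmnty_cli"),  # Facebook % CLI-in-community
    ("doctor-visits", "smoothed_adj_cli"),   # Doctor Visits
    ("ght", "smoothed_search"),              # Google Health Trends
    ("safegraph", "full_time_work_prop"),    # one SafeGraph mobility signal
]

# Target: county-level case rates (7-day averaged incidence per 100k).
cases = covidcast.signal("jhu-csse", "confirmed_7dav_incidence_prop",
                         start, end, geo_type="county")

signals = {
    (src, sig): covidcast.signal(src, sig, start, end, geo_type="county")
    for src, sig in indicators
}
```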

Analysis should be at the county level whenever possible. Be careful not to be misled by counties that have high case rates but low absolute case counts (pre-screening to counties with some minimum number of cumulative cases, like 500, is often helpful; a sketch follows below). If counties aren't available (or too sparse), then metro areas are probably a good backup.
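
A possible pre-screening step, reusing `covidcast` and the `cases` frame from the sketch above; the 500-case cutoff follows the suggestion here, and `confirmed_cumulative_num` is an assumed signal name:

```python
# Keep only counties with at least MIN_CASES cumulative confirmed cases
# by the end of the analysis window.
MIN_CASES = 500

cum = covidcast.signal("jhu-csse", "confirmed_cumulative_num",
                       end, end, geo_type="county")
keep = set(cum.loc[cum["value"] >= MIN_CASES, "geo_value"])
cases = cases[cases["geo_value"].isin(keep)]
```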

Analysis at the state level would also be interesting, but clearly secondary. It would be "easy" and "clean" compared to counties (no pre-screening should be needed for states), but intuitively, state-level signals integrate activity across so many different sublocations that the leading relationship between an indicator and a target seems more complicated.

Analysis should cover at least two time periods: the surge in cases in late June, and the recent one in late September.

Timeline:
By mid-December
First, label times of sharp increase for each location. This can be done programmatically, e.g., by estimating a derivative and then applying some kind of threshold; it could also be done by simply measuring % change in the signal, or by hand. Choose a few different ways and implement them (two possible schemes are sketched below).
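
Two such labeling schemes, sketched with pandas; the window lengths and thresholds here are placeholder assumptions to be tuned:

```python
import pandas as pd


def label_increases_pct(ts: pd.Series, window: int = 7,
                        min_pct: float = 0.25) -> pd.Series:
    """Label days where the signal rose by more than min_pct
    (e.g. 25%) over the past `window` days."""
    return ts.pct_change(periods=window) > min_pct


def label_increases_slope(ts: pd.Series, window: int = 7,
                          thresh: float = 0.5) -> pd.Series:
    """Label days where a smoothed finite-difference derivative
    (rolling mean of day-over-day differences) exceeds thresh."""
    slope = ts.diff().rolling(window, center=True).mean()
    return slope > thresh
```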

For each labeling scheme under consideration, visually inspect the time points it labels, and judge whether it passes a reasonable sanity check before carrying it forward to the next steps.
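
One way to eyeball a labeling scheme, assuming matplotlib and a pandas time series indexed by date:

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_labels(ts: pd.Series, labels: pd.Series, title: str = "") -> None:
    """Plot a signal and highlight the time points a labeling scheme flags."""
    mask = labels.to_numpy(dtype=bool)
    fig, ax = plt.subplots(figsize=(8, 3))
    ax.plot(ts.index, ts.to_numpy(), label="signal")
    ax.scatter(ts.index[mask], ts.to_numpy()[mask],
               color="red", s=12, label="labeled increase")
    ax.set_title(title)
    ax.legend()
    plt.show()
```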

By end of December
For each labeling scheme that passes the sanity check: for each time point at which case rates rise, record whether the signal of interest rose before it (and if so, how many days before), rose after it (and if so, how many days after), or didn't appear to rise at all within a reasonable time buffer around the given time point (sketched below).
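
A sketch of this matching step; the 21-day `buffer_days` default and the nearest-rise matching rule are assumptions, not something specified here:

```python
import numpy as np
import pandas as pd


def lead_lag(case_rise_times, signal_rise_times,
             buffer_days: int = 21) -> pd.Series:
    """For each case-rise time, the offset in days to the nearest signal
    rise within +/- buffer_days (negative = signal led), or NaN if none."""
    sig = pd.DatetimeIndex(signal_rise_times)
    offsets = {}
    for t in pd.DatetimeIndex(case_rise_times):
        diffs = (sig - t).days                      # signal time minus case time
        near = diffs[np.abs(diffs) <= buffer_days]  # rises within the buffer
        offsets[t] = near[np.abs(near).argmin()] if len(near) else np.nan
    return pd.Series(offsets)
```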

Come up with ways of summarizing the statistics computed in the last step, across the various signals and the various labeling schemes for the periods of increase. At a minimum, the summaries should include recall and precision (but need not be limited to those).
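
One possible way to turn the offsets from the matching step into recall and precision; these matching-based definitions are one choice among several:

```python
import pandas as pd


def precision_recall(offsets: pd.Series, n_signal_rises: int) -> tuple:
    """offsets: output of lead_lag(); n_signal_rises: total number of
    rises labeled in the signal over the same locations and period.

    Recall: fraction of case rises with a matching signal rise in the buffer.
    Precision (crudely, assuming roughly one-to-one matching): fraction of
    signal rises that matched some case rise."""
    matched = offsets.notna().sum()
    recall = matched / len(offsets) if len(offsets) else float("nan")
    precision = matched / n_signal_rises if n_signal_rises else float("nan")
    return precision, recall
```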

Re-inspect some visual examples based on what you find in the last two steps. For example, if we get particularly good/bad performance with respect to some labeling scheme (or for some cross-section of counties or times, etc.), then look through those plots.

Any lessons to be learned about why we get leading behavior in some counties/times and not in others? (This one is a stretch goal.)
