Merge pull request #938 from cmu-delphi/release/v3.2.12
Release v3.2.12
melange396 authored Feb 28, 2024
2 parents 8355492 + 43713cb commit dee2694
Showing 13 changed files with 165 additions and 25 deletions.
2 changes: 1 addition & 1 deletion content/blog/2023-12-20-changepoint_explore.Rmd
@@ -11,7 +11,6 @@ authors:
- ajoshi
heroImage: blog-lg-changepoint.jpeg
heroImageThumb: blog-thumb-changepoint.jpeg

summary: |
We use changepoint detection algorithms to analyze Delphi's indicators and classify them as early, on-time, late, undefined, or undetermined.
@@ -94,3 +93,4 @@ Overall, Changepoint detection is a powerful tool to identify early indicators.

The python notebook used for this analysis can be found on Github [here](https://github.com/TaraLakdawala/covid-changepoint-detection-exploration).

This work was supported by the Centers for Disease Control and Prevention of the U.S. Department of Health and Human Services (HHS) as part of a cooperative agreement funded solely by CDC/HHS under federal award identification number U01IP001121, “Delphi Influenza Forecasting Center of Excellence”; and by CDC funded contract number 75D30123C15907, "Digital Public Health Surveillance for the 21st Century". The contents are those of the authors and do not necessarily represent the official views of, nor an endorsement by, CDC/HHS or the U.S. Government. Additionally, this material is based upon work supported by the National Science Foundation Graduate Research Fellowship under Grant No. DGE1745016 and DGE2140739. Any opinion, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.
33 changes: 16 additions & 17 deletions content/blog/2023-12-20-changepoint_explore.html
@@ -11,10 +11,8 @@
- ajoshi
heroImage: blog-lg-changepoint.jpeg
heroImageThumb: blog-thumb-changepoint.jpeg

summary: |
We use changepoint detection algorithms to analyze Delphi's indicators and classify them as early, on-time, late, undefined, or undetermined.

output:
blogdown::html_page:
toc: true
@@ -39,9 +37,9 @@ <h2>Introduction</h2>
<h2>Establishing National Ground Truth Changepoints</h2>
<p>To establish when the composition of different COVID-19 variants changed, we used three changepoint detection algorithms: <a href="https://centre-borelli.github.io/ruptures-docs/user-guide/detection/pelt/">PELT</a>, <a href="https://centre-borelli.github.io/ruptures-docs/user-guide/detection/binseg/">Binary Segmentation</a>, and <a href="https://centre-borelli.github.io/ruptures-docs/user-guide/detection/bottomup/">Bottom-up Segmentation</a> on Delphi’s national <a href="https://cmu-delphi.github.io/delphi-epidata/api/covidcast-signals/jhu-csse.html">JHU-CSSE COVID-19 Cases</a> stream<a href="#fn3" class="footnote-ref" id="fnref3"><sup>3</sup></a>. This stream was the smoothed 7-day average of the number of new confirmed COVID-19 cases (for all variants) in the United States (US). After aggregating the changepoints at a weekly level, we marked the weeks where all three methods agreed for the national cases data stream (Fig. 1). To corroborate these points, we also used the <a href="https://covid.cdc.gov/covid-data-tracker/#variant-proportions">CDC’s COVID Data Tracker</a>, which reports the proportions of different COVID-19 sub-variants among the infected US population. We combined the different sub-strains of COVID in the data to represent the three dominant variants: “Original,” “Delta,” and “Omicron.” These changes in variant makeup visually match the ground truth changepoints from the national signal, as seen in Figure 1.</p>
<center>
<div class="float">
<img src="/blog/2023-12-20-changepoint_explore_files/image3.png" alt="Figure 1. Total and Variant Case Counts with variant changepoints on April 24, 2021; June 19, 2021; December 18, 2021; March 5, 2022; June 18, 2022." />
<div class="figcaption"><strong>Figure 1.</strong> Total and Variant Case Counts with variant changepoints on April 24, 2021; June 19, 2021; December 18, 2021; March 5, 2022; June 18, 2022.</div>
<div class="figure">
<img src="/blog/2023-12-20-changepoint_explore_files/image3.png" alt="" />
<p class="caption"><strong>Figure 1.</strong> Total and Variant Case Counts with variant changepoints on April 24, 2021; June 19, 2021; December 18, 2021; March 5, 2022; June 18, 2022.</p>
</div>
</center>
<p> 
@@ -51,9 +49,9 @@ <h2>Establishing National Ground Truth Changepoints</h2>
<h2>Methods</h2>
<p>Next, we needed to identify how these ground truth changepoints compared to changepoints in indicators (using the same three algorithms). We used all available data streams during our selected time range from Delphi’s <a href="https://cmu-delphi.github.io/delphi-epidata/">COVIDCast database</a>, which corresponds to 60 indicators (including the Facebook survey, Google Search, claims, hospitalization, and mortality indicators), each at the state/territory and national tier. When we aggregated the calculated changepoints per indicator per week, as Fig. 2 shows, we identified that some weeks have many more changepoints than others. To match the number of ground truth changepoints, we filter to the top 5 weeks per indicator (major indicator changepoints) to directly compare with the ground truth changepoints.</p>
<center>
<div class="float">
<img src="/blog/2023-12-20-changepoint_explore_files/image5.png" alt="Figure 2. The aggregated count of changepoints per week in all fifty two regions by all three algorithms for estimated COVID related outpatient doctor visits" />
<div class="figcaption"><strong>Figure 2.</strong> The aggregated count of changepoints per week in all fifty two regions by all three algorithms for estimated COVID related outpatient doctor visits</div>
<div class="figure">
<img src="/blog/2023-12-20-changepoint_explore_files/image5.png" alt="" />
<p class="caption"><strong>Figure 2.</strong> The aggregated count of changepoints per week in all fifty two regions by all three algorithms for estimated COVID related outpatient doctor visits</p>
</div>
</center>
<p>Then, we categorized indicators as early, on-time, or late using the following definitions:</p>
@@ -117,23 +115,23 @@ <h2>Results and Analysis</h2>
</table>
<p><strong>Early Indicators:</strong> Early indicators are the most important in identifying changing health dynamics. For example, the majority (6/8) of indicators from <a href="https://cmu-delphi.github.io/delphi-epidata/api/covidcast-signals/google-symptoms.html#symptom-set">Google Symptoms</a> are early indicators, especially indicators related to fevers. In Fig. 3, we see one example of a Google Symptoms early indicator (which includes searches for Fever, Hyperthermia, Chills, Shivering, and Low grade fever), where the major indicator changepoints lead ground truth changepoints by approximately one week.</p>
<center>
<div class="float">
<img src="/blog/2023-12-20-changepoint_explore_files/image2.png" alt="Figure 3. This early indicator has major indicator changepoints (blue bars) which occur a week or more before the ground truth changepoints (black line). This data is sourced from Google Symptoms and shows search data for keywords “Fever, Hyperthermia, Chills, Shivering, Low grade fever”" />
<div class="figcaption"><strong>Figure 3.</strong> This early indicator has major indicator changepoints (blue bars) which occur a week or more before the ground truth changepoints (black line). This data is sourced from Google Symptoms and shows search data for keywords “Fever, Hyperthermia, Chills, Shivering, Low grade fever”</div>
<div class="figure">
<img src="/blog/2023-12-20-changepoint_explore_files/image2.png" alt="" />
<p class="caption"><strong>Figure 3.</strong> This early indicator has major indicator changepoints (blue bars) which occur a week or more before the ground truth changepoints (black line). This data is sourced from Google Symptoms and shows search data for keywords “Fever, Hyperthermia, Chills, Shivering, Low grade fever”</p>
</div>
</center>
<p><strong>On-time Indicators:</strong> On-time indicators (Fig. 4) most frequently correspond to data involving insurance claims, suspected COVID cases among hospital admissions, and, unsurprisingly, other COVID-19 incidence data reports from JHU-CSSE. Other notable early or on-time indicators include the number of doctor visits for cases with COVID or COVID-like symptoms and Data Strategy and Execution Workgroup Community Profile Report (CPR) data on the number of people who were fully vaccinated.</p>
<center>
<div class="float">
<img src="/blog/2023-12-20-changepoint_explore_files/image1.png" alt="Figure 4. This on-time indicator has major indicator changepoints (blue bars) which occur within a two week window around the ground truth changepoints (black)." />
<div class="figcaption"><strong>Figure 4.</strong> This on-time indicator has major indicator changepoints (blue bars) which occur within a two week window around the ground truth changepoints (black).</div>
<div class="figure">
<img src="/blog/2023-12-20-changepoint_explore_files/image1.png" alt="" />
<p class="caption"><strong>Figure 4.</strong> This on-time indicator has major indicator changepoints (blue bars) which occur within a two week window around the ground truth changepoints (black).</p>
</div>
</center>
<p><strong>Variable Indicators:</strong> Variable indicators are perhaps the most interesting set of indicators. For most of the available range, the JHU-CSSE new confirmed COVID-19 cases daily indicator was early, but the last changepoint was late. Many other JHU-CSSE indicators follow this pattern. Another example is the Google Symptom Search data related to smell and taste loss, specifically, “Anosmia, Dysgeusia, Ageusia.” In the figure below, we see the relationship between the indicator’s major changepoints and the ground truth change from early in the first ground truth changepoint to late in the middle of the pandemic, and then early again for the rest of the ground truth changepoints.</p>
<center>
<div class="float">
<img src="/blog/2023-12-20-changepoint_explore_files/image4.png" alt="Figure 5. This variable indicator has changepoints (blue bars) which occur early and late relative to the critical changepoints (black) established using CDC data." />
<div class="figcaption"><strong>Figure 5.</strong> This variable indicator has changepoints (blue bars) which occur early and late relative to the critical changepoints (black) established using CDC data.</div>
<div class="figure">
<img src="/blog/2023-12-20-changepoint_explore_files/image4.png" alt="" />
<p class="caption"><strong>Figure 5.</strong> This variable indicator has changepoints (blue bars) which occur early and late relative to the critical changepoints (black) established using CDC data.</p>
</div>
</center>
<p> 
@@ -144,6 +142,7 @@ <h2>Limitations and Discussion</h2>
<p>Due to state-level reporting inconsistencies, we could only analyze 60 out of 79 indicators available in the timespan investigated. Many of these indicators were missing data; for example, the number of confirmed COVID-19 patients admitted to all hospitals in the state of Alaska was reported only five times within the 595-day search window. We also recognize that not all regions will be impacted by emerging variants at the same time in the same way and that a further detailed analysis which takes into account different impacts per region is an important avenue for future work.</p>
<p>Overall, changepoint detection is a powerful tool to identify early indicators. Of Delphi’s sixty indicators, we identified several on-time and early indicators of emerging variants from the data available. We also found that for many of the indicators, the number of days they led or lagged disease phenomena changed over time. Still, if these public health indicators continue to receive high-quality data, tracking these indicators closely can help us identify changing health dynamics.</p>
<p>The python notebook used for this analysis can be found on Github <a href="https://github.com/TaraLakdawala/covid-changepoint-detection-exploration">here</a>.</p>
<p>This work was supported by the Centers for Disease Control and Prevention of the U.S. Department of Health and Human Services (HHS) as part of a cooperative agreement funded solely by CDC/HHS under federal award identification number U01IP001121, “Delphi Influenza Forecasting Center of Excellence”; and by CDC funded contract number 75D30123C15907, “Digital Public Health Surveillance for the 21st Century”. The contents are those of the authors and do not necessarily represent the official views of, nor an endorsement by, CDC/HHS or the U.S. Government. Additionally, this material is based upon work supported by the National Science Foundation Graduate Research Fellowship under Grant No. DGE1745016 and DGE2140739. Any opinion, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.</p>
</div>
<div class="footnotes footnotes-end-of-document">
<hr />
2 changes: 1 addition & 1 deletion content/blog/2024-01-01-flash-intro.Rmd
@@ -18,7 +18,7 @@ output:
toc: true
---

Delphi publishes millions of public-health-related data points per day, including the total number of daily influenza cases, hospitalizations, and deaths per county and state in the United States (US). This data helps public health practitioners, data professionals, and members of the public make important, informed decisions relating to health and well-being.
Delphi publishes millions of public-health-related data points per day, such as the total number of daily COVID-19 cases, hospitalizations, and deaths per county and state in the United States (US). This data helps public health practitioners, data professionals, and members of the public make important, informed decisions relating to health and well-being.

Yet, as data volumes continue to grow quickly (Delphi's data volume expanded 1000x in just 3 years), it is infeasible for data reviewers to inspect every one of these data points for subtle changes in

8 changes: 4 additions & 4 deletions content/blog/2024-01-01-flash-intro.html
@@ -26,10 +26,10 @@
<li>quality (like those resulting from data delays) or</li>
<li>disease dynamics (like an outbreak).</li>
</ul>
<p>These issues, if undetected, can have critical downstream ramifications on data users (as shown by the example in Fig 1).</p>
<div class="float">
<img src="/blog/2024-01-01-flash-intro/forecast.jpg" alt="Fig 1. Data quality changes in case counts, shown by the large spikes in March and July 2022, when cases were trending down, resulted in similar spikes for predicted counts (red) from multiple forecasts that were then sent to the US CDC. A weekly forecast per state, for cases, hospitalizations, and deaths, up to 4 weeks in the future means that modeling teams would have to review 600 forecasts per week and may not have been able to catch the upstream data issue." />
<div class="figcaption">Fig 1. Data quality changes in case counts, shown by the large spikes in March and July 2022, when cases were trending down, resulted in similar spikes for predicted counts (red) from multiple forecasts that were then sent to the US CDC. A weekly forecast per state, for cases, hospitalizations, and deaths, up to 4 weeks in the future means that modeling teams would have to review 600 forecasts per week and may not have been able to catch the upstream data issue.</div>
<p>These issues, if undetected, can have critical downstream ramifications for data users (as shown by the example in Fig 1).</p>
<div class="figure">
<img src="/blog/2024-01-01-flash-intro/forecast.jpg" alt="" />
<p class="caption">Fig 1. Data quality changes in case counts, shown by the large spikes in March and July 2022, when cases were trending down, resulted in similar spikes for predicted counts (red) from multiple forecasts that were then sent to the US CDC. A weekly forecast per state, for cases, hospitalizations, and deaths, up to 4 weeks in the future means that modeling teams would have to review 600 forecasts per week and may not have been able to catch the upstream data issue.</p>
</div>
<p>We care about finding data issues like these so that we can alert downstream data users accordingly. That is why our goal in the FlaSH team (Flagging Anomalies in Streams related to public Health) is to quickly identify data points that warrant human inspection and create tools to support data review. Towards this goal, our team of researchers, engineers, and data reviewers iterate on our deployed interdisciplinary approach. In this blog series, we will cover the different methods and perspectives of the FlaSH project.</p>
<p>Members: Ananya Joshi, Nolan Gormley, Richa Gadgil, Tina Townes  </p>