-
Notifications
You must be signed in to change notification settings - Fork 9
/
utility_summary_public_geo.Rmd
95 lines (80 loc) · 3.37 KB
/
utility_summary_public_geo.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
---
latex_engine: xelatex
geometry: top = 1cm, bottom=1cm, left=1.5cm, right=2cm
output:
pdf_document: default
html_document:
df_print: paged
word_document: default
tables:
style: Table
layout: autofit
width: 1.0
caption:
style: Table Caption
pre: 'Table '
sep: ': '
conditional:
first_row: true
first_column: false
last_row: false
last_column: false
no_hband: false
no_vband: true
---
\pagenumbering{gobble}
```{r setup, include=FALSE}
source("../R/functions.R")
library(arrow)
library(flextable)
data_dir = "../data/raw"
file_name <- list.files(data_dir, "public_county_geography")
detailed_file_name = paste(data_dir,"/",file_name,sep="")
#change me to whatever version you want to display
dataset_version <- gsub("public_county_geography_","", file_name)
location_quasi_identifiers = c("res_state","res_county")
quasi_identifiers = c("case_month", location_quasi_identifiers, "age_group","sex","race","ethnicity","death_yn")
data <- read_parquet(detailed_file_name, as_data_frame = TRUE)
quick <- quick_summary(data, label="all_fields", qis=quasi_identifiers, print=FALSE)
utility <- summarize_utility(data, quasi_identifiers, print=FALSE)
knitr::opts_chunk$set(echo = FALSE)
set_flextable_defaults(
fonts_ignore=TRUE
)
```
# [COVID-19 Case Surveillance Public Use Data with Geography](https://data.cdc.gov/Case-Surveillance/COVID-19-Case-Surveillance-Public-Use-Data-with-Ge/n8mc-b4w4) Utility Summary
Users should consider the level of completeness, including suppression levels when planning their analyses and use of public datasets. Privacy protections will suppress field values to reduce reidentification risks. Completeness varies by jurisdiction (i.e., state, local, and territorial) and time period. Variables are consistently coded to the value "Unknown" when jurisdictions specify in the case data submitted to CDC that the value is unknown, the value "Missing" when jurisdictions do not provide a value, and the value "NA" when the value is suppressed as part of privacy protections.
Dataset version: `r dataset_version`
Total records in dataset: `r commas(nrow(data))`
## Quick Summary
```{r ft.align="left"}
quick_df <- data.frame(cbind(" "=row.names(quick),quick))
colnames(quick_df) <- c(" ",colnames(quick))
ft1 <- theme_zebra(flextable(quick_df), odd_header="transparent", odd_body="#dce4f4",even_body="transparent")
ft1 <- align(ft1, j=c("all_fields","quasi_fields"), align="right")
ft1 <- line_spacing(ft1, space=1, part="all")
ft1 <- fontsize(ft1, size=8, part="all")
ft1 <- padding(ft1, padding.top=0, part="all")
ft1 <- hrule(ft1, rule="atleast", part="all")
ft1 <- height_all(ft1, 2, part="all")
ft1 <- hline(ft1, part="all")
ft1 <- vline(ft1, part="all")
ft1 <- border_outer(ft1, part="all")
#ft1 <- autofit(ft1, add_h=100)
ft1 <- autofit(ft1)
ft1
```
## Field-level Utility Summary
```{r ft.align="left"}
ft2 <- theme_zebra(flextable(cbind(" "=row.names(utility),utility)), odd_header="transparent", odd_body="#dce4f4",even_body="transparent")
ft2 <- autofit(ft2)
ft2 <- align(ft2, j=c("suppressed","suppressed_percent","missing","missing_percent"), align="right")
#ft2 <- height(ft2, height=1)
#ft2 <- hrule(ft2, rule="auto")
ft2 <- line_spacing(ft2, space=1, part="all")
ft2 <- fontsize(ft2, size=8, part="all")
ft2 <- hline(ft2, part="all")
ft2 <- vline(ft2, part="all")
ft2 <- border_outer(ft2, part="all")
ft2
```