Update guidance on CT counties and planning regions #431

Open
wants to merge 6 commits into
base: version2025
432 changes: 244 additions & 188 deletions geographic-crosswalks/create-county-populations.html

Large diffs are not rendered by default.

65 changes: 33 additions & 32 deletions geographic-crosswalks/create-county-populations.qmd
@@ -1,16 +1,16 @@
---
title: "Create County Population File"
author: "Aaron R. Williams"
author: "Aaron R. Williams and Manu Alcalá Kovalski"
date: today
abstract: "This script pulls US Census Bureau Population Estimates Program and Decennial Census data to create a list of counties with population estimates for 2014-2023."
format:
format:
html:
toc: true
embed-resources: true
execute:
execute:
message: false
warning: false
editor_options:
editor_options:
chunk_output_type: console
---

@@ -42,18 +42,18 @@ split into Chugach Census Area and Copper River Census Area.
#' @return A data frame with estimate for all US counties
#'
get_pop <- function(year) {

pop <- get_estimates("county", year = year, variables = "POP") |>
mutate(year = year)

return(pop)

}


# pull county population data for each year from the Population Estimates Program
pep_2015_2019 <- map_dfr(
.x = 2015:2019,
.x = 2015:2019,
.f = ~get_pop(year = .x)
)

@@ -110,37 +110,37 @@ download and clean up the data.
```{r}
#| label: get-pop-2021-2023
read_pep <- function(year) {

file <- here("geographic-crosswalks", "data", "raw", paste0("pep", year, ".csv"))
url <-

url <-
paste0("https://www2.census.gov/programs-surveys/popest/datasets/",
"2020-", year, "/counties/totals/co-est", year, "-alldata.csv")

if (!file.exists(file)) {

download.file(
url = url,
destfile = file
)


}

pop_var <- paste0("popestimate", year)

pep <- read_csv(file) |>
rename_with(tolower) |>
filter(sumlev == "050") |>
mutate(year = year) |>
select(year, state, county, population = any_of(pop_var), any_of(c("stname", "ctyname")))

return(pep)

}

pep_2021_2023 <- map_dfr(
.x = 2021:2023,
.x = 2021:2023,
.f = read_pep
)

@@ -185,7 +185,7 @@ final_population <- final_population |>
))
```

## Remove Connecticut Counties
## Switch from Connecticut Counties to Planning Regions

Connecticut stopped using its **eight historical counties** as functional
governmental units in 1960. In 2022, the Census Bureau updated its data
@@ -195,14 +195,16 @@ state into **nine planning regions** that reflect the state’s **nine**
reporting, replacing the old **eight historical counties**. As new census data
products will follow this structure, we will only report data for **planning
regions**. Due to data quality concerns with crosswalking from the old
historical counties to planning regions, we **drop any Connecticut data prior to 2022**.
To learn more about this issue, see the [geographic harmonization guide](geographic-harmonization-guide.qmd#sec-ct-planning-regions).
historical counties to planning regions, we **keep using the old counties for 2014-2021** and **switch to planning regions from 2022 onward**.
To learn more about this issue, see the [geographic harmonization guide](geographic-harmonization-guide.qmd#sec-ct-planning-regions).
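
One way to sanity-check this rule is to classify Connecticut rows by their county FIPS code. The sketch below is not part of this script: it assumes a `county` column holding three-digit FIPS codes as strings, the helper `ct_geography_type()` is hypothetical, and the rows are toy data. It relies on the Census coding in which Connecticut's historical counties run from 001 to 015 and its planning regions from 110 to 190.

```r
# Hypothetical helper: label a Connecticut county code as a historical
# county (001-015) or a planning region (110-190).
ct_geography_type <- function(county) {
  code <- as.integer(county)
  ifelse(code >= 110 & code <= 190, "planning region", "historical county")
}

# Toy rows standing in for the Connecticut slice of final_population
ct <- data.frame(
  year = c(2021, 2021, 2022, 2022),
  county = c("001", "003", "110", "120"),
  stringsAsFactors = FALSE
)
ct$geography <- ct_geography_type(ct$county)

# Rule from the text: counties through 2021, planning regions from 2022
expected <- ifelse(ct$year >= 2022, "planning region", "historical county")
stopifnot(all(ct$geography == expected))
```

If a 2021 row carried a planning-region code, the final `stopifnot()` would fail, which is the situation the "Save Data" note about later PEP vintages warns about.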

```{r}
#| label: remove-old-ct-counties
#| label: ct-counties-and-planning-regions

final_population |>
filter(state_name == "Connecticut") |>
distinct(year, county_name)

final_population <- final_population |>
filter(!(state_name == "Connecticut" & year < 2022))
```

```{r}
@@ -224,7 +226,7 @@ n_counties_2022_2023 <- get_n_counties(final_population, year >= 2022)
```

```{epoxy}
Removing these **eight counties** leaves us with:
Switching from the **eight counties** to **nine planning regions** leaves us with:

- **{.comma n_counties_2014_2019}** counties in 2014-2019
- **{.comma n_counties_2020_2021}** counties in 2020-2021
@@ -237,7 +239,7 @@ Removing these **eight counties** leaves us with:
final_population |>
count(year) |>
assert(
within_bounds(3134, 3144),
within_bounds(3142, 3144),
n
)
```
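
The tighter lower bound follows from no longer dropping the eight Connecticut counties: 3,142 minus 8 gives the old bound of 3,134. As a rough illustration only, the same check can be written in base R without assertr. The vector below is toy data that mirrors the totals reported in this document, not output from the script.

```r
# Toy per-year county counts mirroring the totals reported in this document
n_by_year <- c("2019" = 3142, "2021" = 3143, "2023" = 3144)

# Base R equivalent of the bounds check: every yearly count must lie in [3142, 3144]
in_bounds <- n_by_year >= 3142 & n_by_year <= 3144
stopifnot(all(in_bounds))

# The old lower bound was only needed while the eight CT counties were dropped
old_lower_bound <- 3142 - 8
```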
@@ -266,7 +268,7 @@ population_test |>

```

We can look at the counties with the largest proportion change.
We can look at the counties with the largest proportion change.

```{r}
population_test |>
@@ -275,7 +277,7 @@ population_test |>

```

We can look at counties with the largest absolute change in population.
We can look at counties with the largest absolute change in population.

```{r}
population_test |>
@@ -319,10 +321,9 @@ final_population |>

## Save Data

The PEP data are reported using the geographies at the time the estimates were generated. This means the 2023 data include Connecticut's planning regions in 2021 even though they didn't exist at that point. Accordingly, we will download earlier data.
The PEP data are reported using the geographies at the time the estimates were generated. This means the 2023 data include Connecticut's planning regions in 2021 even though they didn't exist at that point. Accordingly, we will download earlier data.

```{r}
write_csv(final_population, here("geographic-crosswalks", "data", "county-populations.csv"))

```

346 changes: 194 additions & 152 deletions geographic-crosswalks/geographic-harmonization-guide.html

Large diffs are not rendered by default.

74 changes: 47 additions & 27 deletions geographic-crosswalks/geographic-harmonization-guide.qmd
Original file line number Diff line number Diff line change
@@ -8,9 +8,9 @@ format:
toc_float: true
embed-resources: true
code-fold: show
execute:
execute:
warning: false
editor_options:
editor_options:
chunk_output_type: console
---

@@ -44,8 +44,8 @@ Check out the Census's comphrensive [guide](https://www.census.gov/programs-surv

We have:

- 3,134 counties for 2014-2019
- 3,135 for 2020-2021
- 3,142 counties for 2014-2019
- 3,143 for 2020-2021
- 3,144 counties for 2022-2023

```{r}
@@ -67,7 +67,8 @@ The number of US counties increased in 2022 **due to a new county-equivalent map

Connecticut’s eight historical counties stopped serving as functioning governmental or administrative units in 1960. Currently, Connecticut has nine [Regional Councils of Government](https://portal.ct.gov/OPM/IGPP/ORG/Planning-Regions/Planning-Regions---Overview) (COG) that carry out regional planning and service delivery activities similar to those performed by county-level governments in other states. Planning regions are administrative entities that have the same boundaries as the state’s COGs.

The Census Bureau used to release data based on Connecticut's eight counties, but starting in 2023, it switched to reporting data for the nine planning regions, following a 2019 request from the state’s Office of Planning and Management. Since new data products will include planning regions, *we should make sure to use them instead of the historical counties moving forward*.
The Census Bureau used to release data based on Connecticut's eight counties, but starting in 2023, it switched to reporting data for the nine planning regions, following a 2019 request from the state’s Office of Planning and Management. Since new data products will include planning regions, *we should make sure to use them instead of the historical counties from 2022 onward*.


See the map below for the planning region boundaries, the towns within each planning region, and historical county boundaries (delineated by thick white borders).

@@ -77,7 +78,7 @@ See the map below for the planning region boundaries, the towns within each plan
County-equivalent planning regions will simply be referred to as “counties” in Census Bureau data products, although the geographic units will be labeled with the names of the planning regions instead of counties.
:::

Due to data quality concerns with crosswalking from the old historical counties to planning regions, we **drop any Connecticut data prior to 2022**. In particular, these concerns arise from the fact that the old counties are **not nested** within the new planning regions. Since tract or block-group level data isn't available for most metrics, we can't use a narrower geography to remediate this problem. This is particularly consequential for **non-count** variables. For more information, see [Geographic Crosswalks at Urban](https://ui-research.github.io/code-library/crosswalk-guide/geographic-crosswalks-at-urban.html#what-are-geographic-crosswalks).
Due to data quality concerns with crosswalking from the old historical counties to planning regions, we will keep using **Connecticut counties for 2014-2021** data and **switch to planning regions for data from 2022 onward**. In particular, these concerns arise from the fact that the old counties are **not nested** within the new planning regions. Since tract or block-group level data isn't available for most metrics, we can't use a narrower geography to remediate this problem. This is particularly consequential for **non-count** variables. For more information, see [Geographic Crosswalks at Urban](https://ui-research.github.io/code-library/crosswalk-guide/geographic-crosswalks-at-urban.html#what-are-geographic-crosswalks).

#### Alaska county splits

@@ -87,9 +88,9 @@ In 2019, the Alaskan county of Valdez-Cordova was split into **Cugach Census Are

In summary, we have

- 3,134 counties for 2014-2019 (**due to removing 8 CT counties)**
- 3,135 for 2020-2021 (**due to the Alaska county split)**
- 3,144 counties for 2022-2023 **(due to the incorporation of 9 CT planning regions)**
- 3,142 counties for 2014-2019
- 3,143 for 2020-2021 **(due to the Alaska county split, which adds one county)**
- 3,144 counties for 2022-2023 **(due to the incorporation of 9 CT planning regions and removal of eight CT counties)**
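
The arithmetic behind these totals can be checked directly. This is a minimal sketch in base R that uses only the counts stated in this guide; the variable names are illustrative.

```r
n_2014_2019 <- 3142

# Alaska split: Valdez-Cordova is dropped and replaced by two census areas (net +1)
n_2020_2021 <- n_2014_2019 - 1 + 2

# Connecticut: eight historical counties are dropped, nine planning regions added (net +1)
n_2022_2023 <- n_2020_2021 - 8 + 9

stopifnot(n_2020_2021 == 3143, n_2022_2023 == 3144)
```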

**Check:** Do we have the same number of counties for each year? Or any dropping in and out?

@@ -116,13 +117,6 @@ counties_comparison <- counties_per_year %>%
)
)
# Unnest the comparison results
counties_dropped_new <- counties_comparison %>%
unnest(c(dropped, new), keep_empty = TRUE) %>%
pivot_longer(cols = c(dropped, new), names_to = "status", values_to = "county") %>%
filter(!is.na(county))
number_of_counties_changed <-
counties_comparison |>
# Remove first year of our data since by definition there can't be changes
@@ -135,36 +129,62 @@ number_of_counties_changed <-
)
number_of_counties_changed
```



**Check:** Only one county from 2020 is dropped from our data (Valdez-Cordova)
**Check:** Only one county is dropped from our data in 2020 (Valdez-Cordova)
```{r}
assert_that(
all(
number_of_counties_changed %>%
filter(year != 2020) %>%
pull(dropped) == 0
)
)
assert_that(
number_of_counties_changed %>%
filter(year == 2020) %>%
pull(dropped) == 1
)
```

**Check:** 9 counties are added in 2022 (CT planning regions)

**Check:** 9 counties are added in 2022 (CT planning regions) and 8 counties are dropped (old CT counties)
```{r}
assert_that(
number_of_counties_changed %>%
filter(year == 2022) %>%
pull(new) == 9
)
assert_that(
all(
number_of_counties_changed %>%
filter(year == 2022) %>%
pull(dropped) == 8
)
)
```

**Check:** Besides the counties removed and added in 2020 and 2022, there are no other dropped or added counties

```{r}
assert_that(
all(
number_of_counties_changed %>%
filter(!(year %in% c(2020, 2022))) %>%
pull(dropped) == 0
)
)
assert_that(
all(
number_of_counties_changed %>%
filter(!(year %in% c(2020, 2022))) %>%
pull(new) == 0
)
)
```
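
The `dropped` and `new` counts checked above come from comparing each year's set of counties with the previous year's. The full comparison code is not shown in this diff, so the following is only a sketch of the idea in base R on toy GEOIDs; `compare_years()` is a hypothetical helper, not the function used in the guide.

```r
# Hypothetical helper: which county IDs disappear and which appear
# between two consecutive years
compare_years <- function(previous, current) {
  list(
    dropped = setdiff(previous, current),  # present in the earlier year only
    new = setdiff(current, previous)       # present in the later year only
  )
}

# Toy example modeled on the Alaska split: one county out, two in
geoids_before <- c("02013", "02261")
geoids_after <- c("02013", "02063", "02066")

changes <- compare_years(geoids_before, geoids_after)
n_dropped <- length(changes$dropped)
n_new <- length(changes$new)
```

Using set differences rather than comparing row counts catches the case where the total stays almost flat but the underlying units change, as with Connecticut in 2022.
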
### Places

Our final data contains incorporated places with populations greater than 75,000 in 2020.
@@ -259,7 +279,7 @@ This process estimates data for the target geography by appropriately weighting

::: {.callout-note}

This overview generally applies for crosswalking **count** variables like population.
This overview generally applies for crosswalking **count** variables like population.
There are some more nuances to consider when crosswalking **non-count** variables. To learn more
about this see [Geographic Crosswalks at Urban](https://ui-research.github.io/code-library/crosswalk-guide/geographic-crosswalks-at-urban.html#what-are-geographic-crosswalks).
:::
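
To make the count versus non-count distinction concrete, here is a small base R sketch with made-up numbers. The weights, values, and column names are all hypothetical. A count is allocated by multiplying it by the share of each source county that falls in each target region and then summing, whereas a rate has to be averaged with weights, which is only an approximation when source areas are not nested within the targets.

```r
# Hypothetical crosswalk: share of each source county's population
# falling in each target region (shares sum to 1 within a source)
xwalk <- data.frame(
  source = c("A", "A", "B"),
  target = c("R1", "R2", "R2"),
  share = c(0.6, 0.4, 1.0)
)

# Made-up source data: a count (population) and a non-count (a rate)
src <- data.frame(
  source = c("A", "B"),
  population = c(1000, 500),
  rate = c(0.10, 0.20)
)

merged <- merge(xwalk, src, by = "source")
merged$pop_part <- merged$population * merged$share

# Counts: sum the allocated pieces within each target
pop_target <- tapply(merged$pop_part, merged$target, sum)

# Rates: population-weighted average, assuming the rate is uniform
# within each source county (often untrue, hence the caution)
rate_target <- tapply(merged$rate * merged$pop_part, merged$target, sum) / pop_target
```

Summing the rates the way counts are summed would give a meaningless number for region R2, which is why non-count variables need the weighted form and extra care.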