organize code more
hrecht committed Apr 5, 2024
1 parent 6ba5979 commit b59c786
Showing 3 changed files with 23 additions and 25 deletions.
2 changes: 1 addition & 1 deletion NEWS.md
@@ -9,7 +9,7 @@
* New `has_api_key()` helper function detects if there is a stored Census Bureau API key in the Renviron, intended mainly for internal use.

### Variable typing
-* `getCensus()` uses improved logic to automatically convert columns that contain all numbers to numeric, unless the column name is in a specific list of geography names or other string type columns. Use `convert_variable = FALSE` to leave all columns as characters.
+* `getCensus()` uses improved logic to automatically convert columns that contain all numbers to numeric, unless the column name is in a specific list of geography names or other string type columns. Use `convert_variables = FALSE` to leave all columns as characters.

### Metadata
* `listCensusApis()` now has optional `name` and `vintage` parameters to get metadata for a subset of datasets or a single dataset. (#103)
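The variable-typing rule in the NEWS entry can be sketched in base R. This is a minimal illustration under stated assumptions, not censusapi's code. `convert_numeric_cols()` is a hypothetical helper. The protected column names are passed in directly instead of being derived from the package's lists. A column counts as "all numbers" here when `as.numeric()` introduces no new `NA` values; the diff does not show the package's actual test.

```r
# Hypothetical helper, NOT part of censusapi: a sketch of the typing rule.
convert_numeric_cols <- function(df, keep_as_string = character(0),
                                 convert_variables = TRUE) {
  # Start with every column as character, as getFunction() does
  df[] <- lapply(df, as.character)
  if (!convert_variables) {
    return(df)
  }
  for (col in setdiff(names(df), keep_as_string)) {
    values <- df[[col]]
    parsed <- suppressWarnings(as.numeric(values))
    # Assumption: "all numbers" means parsing creates no new NAs
    if (identical(is.na(parsed), is.na(values))) {
      df[[col]] <- parsed
    }
  }
  df
}

# Toy data, not real Census values
toy <- data.frame(state = c("01", "02"),
                  NAME = c("Alabama", "Alaska"),
                  B01001_001E = c("1000", "2000"))

out <- convert_numeric_cols(toy, keep_as_string = c("state", "NAME"))
str(out)
```

With `keep_as_string = c("state", "NAME")`, the FIPS code `state` keeps its leading zero and `B01001_001E` becomes numeric. Without the protected names, `state` would become `1, 2`, which is the failure the exception lists guard against. With `convert_variables = FALSE`, every column stays character.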
44 changes: 21 additions & 23 deletions R/getcensus_functions.R
@@ -49,25 +49,24 @@ getFunction <- function(apiurl, name, key, get, region, regionin, time, show_cal
df <- data.frame(raw)
df <- df[-1,]
df <- cleanColnames(df)
# Make all columns character
df[] <- lapply(df, as.character)


# Make columns numeric unless they're in specific string/geography column names lists
# Microdata weighting variables
# } else if (grepl("cps/", name, ignore.case = T) |
# name %in% c("acs/acs5/pums", "acs/acs5/pumspr", "acs/acs1/pums", "acs/acs1/pumspr")) {
# numeric_cols <- grep("[0-9]|PWSSWGT|HWHHWGT|PWFMWGT|PWLGWGT|PWCMPWGT|PWORWGT|PWVETWGT|WGTP|PWGTP", names(df), value=TRUE, ignore.case = T)
# string_cols <- grep(common_string_cols, numeric_cols, value = TRUE, ignore.case = T)
#
# }
# Make all columns character - they already are from the Census but just in
# case the Census does wonky things
df[] <- lapply(df, as.character)

if (convert_variables == TRUE) {
# If these are part of the variable name, keep as string
string_col_parts_list <- c("_TTL", "_NAME", "NAICS", "FAGE4", "LABEL",
"_DESC", "CAT", "UNIT_QY", "_FLAG",
"DISTRICT", "EMPSZES", "POPGROUP")

# Collapse into a | delimited string for grepl
collapse_col_parts <- function(parts) {
collapsed <- paste0(parts, collapse = "|")
return(collapsed)
}
common_string_cols <- collapse_col_parts(string_col_parts_list)

# Geography variables - exact matches only
geos_list <- c("GEO_ID", "GEOID", "GEOID1", "GEOID2", "GEOCOMP",
"SUMLEVEL", "GEOTYPE", "GEOMAME", "GEOVARIANT",
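`collapse_col_parts()` turns the fragment list into a single regular-expression alternation. One case-insensitive `grep()` over the column names then finds every column whose name contains any fragment anywhere. The sketch below reuses the function body from the diff with a shortened fragment list and made-up column names.

```r
# Same body as collapse_col_parts() in the diff
collapse_col_parts <- function(parts) {
  collapsed <- paste0(parts, collapse = "|")
  return(collapsed)
}

# Shortened fragment list for illustration
common_string_cols <- collapse_col_parts(c("_TTL", "_NAME", "NAICS", "LABEL"))
print(common_string_cols)  # "_TTL|_NAME|NAICS|LABEL"

# Made-up column names
col_names <- c("NAICS2017", "naics2017_label", "EMP", "COUNTY_NAME", "PAYANN")
grep(common_string_cols, col_names, value = TRUE, ignore.case = TRUE)
# "NAICS2017" "naics2017_label" "COUNTY_NAME"
```

Because the fragments are used as regular expressions, a fragment containing a metacharacter such as `.` or `(` would need escaping. None of the fragments in `string_col_parts_list` do.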
@@ -127,15 +126,6 @@ getFunction <- function(apiurl, name, key, get, region, regionin, time, show_cal
# SIPP microdata
"TFIPSST")

collapse_col_parts <- function(parts) {
collapsed <- paste0(parts, collapse = "|")
return(collapsed)
}
common_string_cols <- collapse_col_parts(string_col_parts_list)

# Columns that match geos_list exactly
geo_cols <- names(df)[toupper(names(df)) %in% geos_list]

# Microdata APIs - don't convert string identifier variables that appear
# in >5 endpoints as strings only or nearly always as strings
if (grepl("cps/|pums|sipp", name, ignore.case = T)) {
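Two endpoint checks decide which extra exceptions apply: the microdata test that closes this hunk and the ACS-annotation test in the following hunk. Both are plain substring matches on the dataset `name`. The sketch below wraps the same patterns in hypothetical helpers, `is_microdata()` and `is_acs_aggregate()`, and applies them to a few example dataset names.

```r
# Hypothetical helpers; the patterns are copied from the conditions in the diff
is_microdata <- function(name) {
  grepl("cps/|pums|sipp", name, ignore.case = TRUE)
}
is_acs_aggregate <- function(name) {
  grepl("acs/acs", name, ignore.case = TRUE) &
    !grepl("pums", name, ignore.case = TRUE)
}

# Example dataset names
dataset_names <- c("acs/acs5", "acs/acs5/pums", "cps/basic/jan", "dec/pl")
data.frame(name = dataset_names,
           microdata = is_microdata(dataset_names),
           acs_aggregate = is_acs_aggregate(dataset_names))
```

Because `pums` is excluded from the ACS check, an ACS PUMS endpoint such as `acs/acs5/pums` gets only the microdata treatment.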
@@ -154,12 +144,20 @@ getFunction <- function(apiurl, name, key, get, region, regionin, time, show_cal

# For ACS data, also keep as strings ACS annotation variables
# ending in MA or EA or SS
if (grepl("acs/acs", name, ignore.case = T) & !(grepl("pums", name, ignore.case = T))) {
common_string_cols <- collapse_col_parts(c("MA", "EA", "SS", common_string_cols))
if (grepl("acs/acs", name, ignore.case = T) &
!(grepl("pums", name, ignore.case = T))) {
common_string_cols <- collapse_col_parts(
c("MA", "EA", "SS",
common_string_cols))
}

# Columns that contain string parts
# Columns that contain string parts in the name stay as strings
string_part_cols <- grep(common_string_cols, names(df), value = TRUE, ignore.case = T)

# Columns that match geos_list exactly stay as strings (other than case sensitivity)
geo_cols <- names(df)[toupper(names(df)) %in% geos_list]

# Identify all the geo/string columns to keep as strings
string_cols <- c(geo_cols, string_part_cols)

# For columns that aren't explicitly defined here as strings, convert them to numeric
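The comment says the ACS annotation variables end in MA, EA, or SS. However, the pattern built from `c("MA", "EA", "SS", common_string_cols)` is unanchored, so it matches those letters anywhere in a column name. The sketch below contrasts that with an anchored `(MA|EA|SS)$` alternative and then combines the result with the exact geography match. The anchored pattern is an assumption about intent, not the package's code. The column names are made up, and `geos_list` and `base_parts` are short stand-ins for the package's lists.

```r
# Equivalent to collapse_col_parts() in the diff
collapse_col_parts <- function(parts) paste0(parts, collapse = "|")

# Made-up column names and shortened stand-in lists
col_names <- c("GEO_ID", "state", "COUNTY_NAME", "B01001_001E",
               "B01001_001EA", "B01001_001MA", "SEASONAL_RATE")
base_parts <- c("_TTL", "_NAME", "LABEL")
geos_list <- c("GEO_ID", "GEOID", "STATE")

# Unanchored, as in the diff: "SEASONAL_RATE" matches because it contains "EA"
unanchored <- collapse_col_parts(c("MA", "EA", "SS", base_parts))
grep(unanchored, col_names, value = TRUE, ignore.case = TRUE)

# Anchored alternative (assumption): only names ending in MA, EA, or SS
anchored <- collapse_col_parts(c("(MA|EA|SS)$", base_parts))
string_part_cols <- grep(anchored, col_names, value = TRUE, ignore.case = TRUE)

# Exact geography matches, case-insensitive via toupper(), as in the diff
geo_cols <- col_names[toupper(col_names) %in% geos_list]
string_cols <- c(geo_cols, string_part_cols)

# Remaining columns are the candidates for numeric conversion
setdiff(col_names, string_cols)
```

Under these assumptions only `B01001_001E` and `SEASONAL_RATE` are left for numeric conversion. With the unanchored pattern, `SEASONAL_RATE` would also be kept as a string.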
2 changes: 1 addition & 1 deletion docs/news/index.html
