
Iss418 - Pull request for Digital Access - Version 2025 #444

Open
ridhi96 wants to merge 6 commits into version2025

Conversation


@ridhi96 commented Jan 10, 2025

Mobility metric pull request template

Please include the following points in your PR:

  1. A link to the issue that this PR relates to: Reconfigure digital access metric and update data to 2023 #418

  2. A description of the content in this pull request.

  • What was changed?
    • Re-created the digital access metric using data from IPUMS (5-yr ACS).
  • What should the reviewer be focusing on?
    • The methodology used to re-create the variable and the data quality variable calculations
  • Is there a logical order to review the files in?
    • No
  3. Detail on any issues or flags that the metric reviewer/data-team should be aware of.
  • Still working on getting the evaluate_final_data function to run.

@ridhi96 marked this pull request as draft January 10, 2025 23:30
@cdsolari requested a review from kmartinchek January 16, 2025 20:06
@kmartinchek (Collaborator) commented:

  1. Please add a note to users at the beginning of the script to (1) use here::here() to check the root folder and (2) use the R project file so that paths do not need to be manually set.
  2. Add a note that users need to uncomment the code in lines 76-114 (if they run as-is without doing so it wouldn’t work).
  3. Could be useful to flag which lines of code need to be manually adjusted at the top of the file (e.g., the extract number).
  4. After performing specific functions, please add unit tests to confirm they worked. For example, confirm that only GQ codes 1, 2, and 5 remain in the data after line 185, and confirm the race and income subgroup variables were coded correctly using a table of old against new codes after lines 258 and 276 (see the first sketch after this list).
  5. More context as to what you are testing in lines 328-338 would be helpful. Also recommend using an assert to confirm that results are as expected.
  6. Should the testing for CIHISPEED be removed from this script? I think it would be easier to say that CIHISPEED takes on this value when these conditions are met and then test directly for those conditions. That would slim down the codebase.
  7. It is a bit hard to know if your calculation is doing what you expect it to, because I am not sure exactly what you want to capture. I would add these details as a description, for instance: “here, we calculate a household-level measure of whether households have access to a laptop or tablet at home AND have access to broadband internet at home”. This definition seems reasonable to me, and is what you generate. I see that this is on line 548; I would move it up. (A rough sketch of this definition appears after this list.)
  8. Lines 418 to 425 should either be removed or turned into an assert-like statement (e.g., assert that this variable does not take these values), rather than left in their current format.
  9. A broader comment is that I think that the code used here to query IPUMS and merge in the xwalk is easier to parse than some other code used in the UMF codebase—so I really appreciate that.
  10. For the merges, it would be helpful to use library("tidylog", warn.conflicts = FALSE) so you can see the merge information. Regardless, I recommend adding context for what you expect as output from the merge (e.g., how many matches, characteristics of matched/unmatched data), and then creating a unit test to assess this (see the tidylog sketch after this list).
  11. Why do you adjust HHWT for afact for places but not counties? Is this what you intend? I would describe why you do this on line 502.
  12. Why do you drop_na HHWT in line 519? Explain and also do a check of these NA values before you drop them to make sure the drop is not a mistake. It looks like when I run the count code on line 516, all observations report this as false. Same considerations with respect to the place code block that follows.
  13. For the effective sample code lines 575 to 621, I was surprised to see that all counties had an effective sample of at least 30. It does in some ways make sense if you’re using the 5 year estimates vs. 1 year -- is that the extract you want to use? The HH income metric is from 1 year ACS data, so just wanted to double-check.
  14. When you calculate share digital access, recommend adding an assert that these values fall in the bounded range between 0 and 100 percent – I see now this happens on line 777, so I would note that you assess quality there. I would also assert that if effective sample is less than 30, the value is NA. It is helpful to have these tests presented after the calculations (see the assert sketch after this list).
  15. Great job on the data visualization functions.
  16. More detail on the decision for line 931 would be helpful in the file.
  17. Add a note on what sum_products is, why you test it to be unique, and why you use it for data quality. I can’t find any documentation on it—I know it’s a derivative of the geocorr xwalk, but it would be good to reference this documentation where the variable is discussed, because otherwise the user may think it is from IPUMS and not be able to track down the definition.
  18. Also, wouldn’t data quality level two need to be “(effective_sample_count >= 30 & sum_products >= 0.35 & sum_products < 0.75) ~ 2”, otherwise quality 1 would be overwritten? See line 1000.
  19. Do you need to make any adjustments based on AFACT? I have seen this done in other microdata UMF files (like for HH income calculations), but didn’t catch it here.
  20. Confirm that the distinct numbers in lines 1044 to 1061 are what you expect; otherwise, users won’t know whether what their console returns is what is expected.
  21. Really interesting function for geographic equivalence – but it could use slightly more detailed text on what exactly you are testing for and why.
  22. For some of these checks at the end of the file, it may be better to format them as asserts instead of users needing to check the output and hand verify.
  23. When you surface the missing counties for ethnicity, (line 1153 to 1160), I would clarify why they are missing and why it makes sense to add them in. If they erroneously disappeared earlier, adding them back in this manner would not address the underlying issue.
  24. You mentioned the final evaluation function wasn’t working for you, so I did not check this. Also, the CIs were not calculated, so no checking related to this happened in this review.
  25. Could be useful when generating estimates from microdata with CIs: https://github.com/UI-Research/mobility-from-poverty/blob/main/01_financial-well-being/R/calc_income_quantiles.R
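A few R sketches of what some of these suggestions could look like follow; all object and column names other than the IPUMS variables are placeholders for whatever the script actually uses. For point 4, the post-step checks might be as simple as:

```r
library(dplyr)

# Confirm the group-quarters filter kept only household-type GQ codes (1, 2, and 5).
remaining_gq <- household_data |>
  distinct(GQ) |>
  pull(GQ)
stopifnot(all(remaining_gq %in% c(1, 2, 5)))

# Spot-check a recode: cross-tabulating the original and derived subgroup
# variables makes any miscoded combination visible as an unexpected cell.
# `subgroup_race` is a placeholder for the derived variable's actual name.
household_data |>
  count(RACE, subgroup_race)
```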
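For point 7, one reading of that household-level definition is sketched below; the IPUMS value codes shown (1 = "yes" for the device variables, 10 = "yes" for CIHISPEED) are assumptions to verify against the extract's codebook, and the actual script may treat the N/A codes and the universe differently.

```r
library(dplyr)

# Illustrative only: flag households that have a laptop/desktop or tablet at
# home AND broadband (high-speed) internet at home.
households <- households |>
  mutate(
    digital_access = as.integer(
      (CILAPTOP == 1 | CITABLET == 1) & CIHISPEED == 10
    )
  )
```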
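For point 10, the tidylog pattern might look like the following; the data frame names and join keys here are illustrative, not the script's actual ones.

```r
# tidylog wraps the dplyr verbs and prints a summary of each join
# (matched and unmatched rows), which can be compared against the
# expectation documented in the comments.
library("tidylog", warn.conflicts = FALSE)

merged <- left_join(acs_households, puma_place_xwalk, by = c("statefip", "puma"))

# Better still, turn the expectation into a test rather than a console
# read-out, e.g. if every household is expected to match the crosswalk:
stopifnot(!anyNA(merged$afact))
```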
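And for point 14, the bounded-range and suppression asserts could be written roughly as below, assuming the share is stored as a 0-1 proportion in a column called share_digital_access (adjust the bounds if it is on a 0-100 scale).

```r
shares <- final_data$share_digital_access
low_n  <- which(final_data$effective_sample_count < 30)

stopifnot(
  all(shares >= 0 & shares <= 1, na.rm = TRUE),  # share stays within its bounds
  all(is.na(shares[low_n]))                      # suppressed to NA where effective sample < 30
)
```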

@ridhi96 (Author) commented Feb 13, 2025

Hi @kmartinchek! Please find below my responses to your comments. I have tried to provide more context in the code file. Please let me know if anything is unclear and I'd be happy to add more information. Thank you!

  • Please add a note to users at the beginning of the script to (1) use here::here() to check the root folder and (2) use the R project file so that paths do not need to be manually set.
  • Add a note that users need to uncomment the code in lines 76-114 (if they run as-is without doing so it wouldn’t work).
  • Could be useful to flag which lines of code need to be manually adjusted at the top of the file (e.g., the extract number).
  • After performing specific functions, please add unit tests to confirm they worked. For example, confirm that only GQ codes 1, 2, and 5 remain in the data after line 185 and confirm the race and income subgroup variables were coded correctly using a table between old and new codes after line 258 and line 276.
  • More context as to what you are testing in lines 328-338 would be helpful. Also recommend using an assert to confirm that results are as expected.
  • Should the testing for CIHISPEED be removed from this script? I think it would be easier to say that CIHISPEED takes on this value when these conditions are met and then test directly for those conditions. That would slim down the codebase.
    • I added more context around this in the code book. We need to keep this because IPUMS is not clear on why some observations would still carry the N/A (GQ) value for CIHISPEED after group quarters are removed (the label suggests N/A is assigned to observations categorized as group quarters). We decided to use two separate variables to define the universe of observations (the denominator) and the households that have digital access (the numerator). Please let me know if I can provide more context on this.
  • It is a bit hard to know if your calculation is doing what you expect it to because I am not sure exactly what you want to capture. I would add these details as a description—for instance, “here, we calculate a household-level measure of whether households have access to a laptop or tablet at home AND have access to broadband internet at home”. This definition seems reasonable to me, and is what you generate. I see that this is on line 548—I would move this up.
  • Lines 418 to 425 should be removed or transformed into an assert-like statement, like assert that this variable does not have these values or otherwise remove, instead of how it is currently formatted.
  • A broader comment is that I think that the code used here to query IPUMS and merge in the xwalk is easier to parse than some other code used in the UMF codebase—so I really appreciate that.
  • For the merges, it would be helpful to use library("tidylog", warn.conflicts = FALSE) so you can see the merge information. Regardless, I recommend adding context for what you expect as output from the merge (e.g., how many matches, characteristics of matched/unmatched data), and then create a unit test to assess this.
  • Why do you adjust HHWT for afact for places but not counties? Is this what you intend? I would describe why you do this on line 502.
    • I added more context around this.
  • Why do you drop_na HHWT in line 519? Explain and also do a check of these NA values before you drop them to make sure the drop is not a mistake. It looks like when I run the count code on line 516, all observations report this as false. Same considerations with respect to the place code block that follows.
  • For the effective sample code lines 575 to 621, I was surprised to see that all counties had an effective sample of at least 30. It does in some ways make sense if you’re using the 5 year estimates vs. 1 year -- is that the extract you want to use? The HH income metric is from 1 year ACS data, so just wanted to double-check.
    • Yes, I am using 5 year estimates, as mentioned at the beginning of the code book. The effective sample does fall below 30 for some subgroups, though, and I have detailed this through tests in the code book.
  • When you calculate share digital access, recommend adding an assert that these values fall in the bounded range between 0 and 100 percent – I see now this happens on line 777, so I would note that you assess quality there. I would also assert that if effective sample is less than 30, the value is NA. It is helpful to have these tests presented after the calculations.
  • Great job on the data visualization functions.
  • More detail on the decision for line 931 would be helpful in the file.
    • Added earlier in the code, where creating the metric is discussed.
  • Add a note on what sum_products is, why you test it to be unique, and why you use it for data quality. I can’t find any documentation on it—I know it’s a derivative of the geocorr xwalk, but it would be good to reference this documentation where the variable is discussed, because otherwise the user may think it is from IPUMS and not be able to track down the definition.
    • Provided more detail and the crosswalk file used to create this.
  • Also, wouldn’t data quality level two need to be “(effective_sample_count >= 30 & sum_products >= 0.35 & sum_products < 0.75) ~ 2”, otherwise quality 1 would be overwritten? See line 1000.
    • No, we don't need to add sum_products < 0.75 because case_when() evaluates its conditions in order, so rows that already matched the quality-1 condition are never reassigned by the quality-2 line (see the case_when sketch after this list). I have also added some new tests for the data quality ranges to provide more assurance.
  • Do you need to make any adjustments based on AFACT? I have seen this done in other microdata UMF files (like for HH income calculations), but didn’t catch it here.
    • This is done when I adjust HHWT by afact at an earlier point in the metric calculation (see the weighting sketch after this list).
  • Confirm that the distinct numbers in lines 1044 to 1061 are what you expect; otherwise, users won’t know whether what their console returns is what is expected.
  • Really interesting function for geographic equivalence – but it could use slightly more detailed text on what exactly you are testing for and why.
  • For some of these checks at the end of the file, it may be better to format them as asserts instead of users needing to check the output and hand verify.
  • When you surface the missing counties for ethnicity, (line 1153 to 1160), I would clarify why they are missing and why it makes sense to add them in. If they erroneously disappeared earlier, adding them back in this manner would not address the underlying issue.
  • You mentioned the final evaluation function wasn’t working for you, so I did not check this. Also, the CIs were not calculated, so no checking related to this happened in this review.
    • Added the final evaluation function.
  • Could be useful when generating estimates from microdata with CIs: https://github.com/UI-Research/mobility-from-poverty/blob/main/01_financial-well-being/R/calc_income_quantiles.R
    • I didn't really understand what I need to do for this comment. Please let me know and I can update.
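As referenced above, here is a small sketch of the case_when() ordering behaviour behind the data-quality response; the cut-offs mirror the ones quoted in the review, though the script's exact tiers may differ.

```r
library(dplyr)

quality <- tibble(
  effective_sample_count = c(50, 50, 10),
  sum_products           = c(0.80, 0.50, 0.80)
) |>
  mutate(
    data_quality = case_when(
      effective_sample_count >= 30 & sum_products >= 0.75 ~ 1,
      effective_sample_count >= 30 & sum_products >= 0.35 ~ 2,
      TRUE                                                ~ 3
    )
  )

# The first row satisfies both the quality-1 and quality-2 conditions but is
# coded 1: case_when() returns the value of the first matching condition, so
# an explicit "sum_products < 0.75" upper bound on the second line is not needed.
quality
```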
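And, as referenced in the HHWT response above, the place-level adjustment amounts to scaling the household weight by the geocorr allocation factor; this is only an illustration of the general pattern, with assumed column names.

```r
# afact is the geocorr allocation factor: the share of a PUMA's population
# that falls within a given place. Scaling HHWT by it splits a household's
# weight across the places its PUMA overlaps before aggregating to place level.
place_households <- place_households |>
  dplyr::mutate(HHWT_place = HHWT * afact)
```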

@ridhi96 marked this pull request as ready for review February 13, 2025 20:03
@cdsolari requested review from cdsolari and jwalsh28 March 4, 2025 15:39
@ridhi96 marked this pull request as draft March 5, 2025 19:13
@ridhi96 marked this pull request as ready for review March 7, 2025 20:58