Showing 2 changed files with 337 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,236 @@ | ||
--- | ||
title: "Data Anonymization with R's sdcMicro Package" | ||
author: "Renata Goncalves Curty - UCSB Library, Research Data Services" | ||
date: "2023-02-17" | ||
output: | ||
html_document: default | ||
pdf_document: default | ||
--- | ||
|
||
```{r setup, include=FALSE} | ||
knitr::opts_chunk$set(echo = TRUE) | ||
``` | ||
|
||
## South Park Elementary School Data | ||
|
||
![Our Clients](fig/southpark.png) | ||
|
||
Mayor McDaniels and Peter Charles (aka PC Principal) are concerned that even after removing direct identifiers such as names, SSNs, and IDs, students may still be easily re-identified in the yearly assessment dataset and have their math and reading scores revealed. For example, everyone in school knows that Tolkien Williams is the wealthiest kid in the whole town, whereas Kenny and his sister Karen are from a very poor family. | ||
|
||
They have requested our assistance to compute this risk of disclosure, implement strategies to minimize it, and determine the information loss for the anonymized dataset they would like to make public to other school board members\*. We will be using the sdcMicro package for this purpose. | ||
|
||
In summary, our clients have three main questions for us (and none of them involve finding out who keeps killing Kenny and how come he keeps coming back to life): | ||
|
||
*Q1. What is the level of disclosure risk associated with this dataset?* | ||
|
||
*Q2. How can the risk of re-identification be significantly reduced?* | ||
|
||
*Q3. What would be the utility and information loss after implementing the anonymization strategies?* | ||
|
||
\*Caveat: We have a relatively small dataset for this exercise (100 rows and 15 columns), so we can't strive for some of the thresholds recommended in the literature. | ||
|
||
#### Package & Data | ||
|
||
```{r} | ||
library(sdcMicro) | ||
data <- read.csv("southpark-sdc.csv") | ||
``` | ||
|
||
#### Taking a closer look at the variables included in this dataset | ||
|
||
```{r} | ||
# Read the CSV dataset into a data frame | ||
? | ||
# Show the list of variable names | ||
? | ||
``` | ||
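A minimal sketch of this inspection step, using only base R. The three rows are copied from `southpark-sdc.csv` (with a subset of its columns) and inlined so the block runs on its own; `sample_csv` and `sample_df` are names made up for this illustration. With the real file, pass the file name to `read.csv()` instead of `text =`.

```r
# Three rows copied from southpark-sdc.csv (subset of columns),
# inlined so the sketch runs without the file
sample_csv <- 'zip,stu_id,age,sex,race,income,math_sc,read_sc
80220,8206630976,10,Male,White,"200,000-249,999",299,300
80220,5737953702,11,Male,White,"10,000-24,999",200,201
80220,8307803951,10,Male,Black,"500,000+",202,222'

# Read the CSV text into a data frame
sample_df <- read.csv(text = sample_csv)

# Show the list of variable names
names(sample_df)

# Show how each variable was read in (types depend on R's defaults)
str(sample_df)
```

`str()` is what reveals which variables still need converting before they can serve as categorical key variables.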
|
||
#### Data Prep - Converting variables | ||
|
||
As we can see, we will need to convert some of the variables first. | ||
|
||
The stu_id, ssn, name, and dob variables will be removed from the dataset soon, as they are direct identifiers. | ||
|
||
Let's focus on the remaining ones that should be converted before we can proceed. | ||
|
||
```{r} | ||
fname = "southpark-sdc.csv" | ||
file <- read.csv(fname) | ||
file <- varToFactor(obj=file, var=c("zip","age", "sex","race","ethn", "snap", "income", "learn_dis","phys_dis")) | ||
#Convert to numeric math_sc and read_sc | ||
? | ||
``` | ||
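One possible way to do the numeric conversion, sketched in base R rather than with sdcMicro's own `varToNumeric()` helper. The `scores` data frame is a hypothetical stand-in that assumes the scores arrived as text; in the real file they may already be numeric, in which case the conversion is harmless.

```r
# Hypothetical stand-in: test scores that were read in as text
scores <- data.frame(
  math_sc = c("299", "209", "200"),
  read_sc = c("300", "209", "201"),
  stringsAsFactors = FALSE
)

# Convert both score columns to numeric in one step
score_vars <- c("math_sc", "read_sc")
scores[score_vars] <- lapply(scores[score_vars], as.numeric)

# Confirm the resulting class of each column
sapply(scores, class)
```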
|
||
#### Q1. What is the level of disclosure risk associated with this dataset? | ||
|
||
To answer this question, we have to set up an SDC problem. In other words, we must select variables and create an object of class *sdcMicroObj* for the SDC process in *R.* | ||
|
||
```{r} | ||
# Select variables for creating sdcMicro object | ||
# All variable names should correspond to the names in the data file | ||
# select categorical key variables - aka quasi-identifiers | ||
sdcInitial <- createSdcObj(dat=file, | ||
keyVars=c(?), | ||
numVars=c(?), | ||
weightVar=NULL, | ||
hhId=NULL, | ||
strataVar=NULL, | ||
pramVars=NULL, | ||
excludeVars=c(?), | ||
seed=0, | ||
randomizeRecords=FALSE, | ||
alpha=c(1)) | ||
# Summary of object | ||
? | ||
``` | ||
|
||
What about the stu_id? Why are we keeping it? | ||
|
||
Check the results below, and the number of observations that violate 2-, 3-, and 5-anonymity. What does that mean? | ||
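To make the idea of a k-anonymity violation concrete, here is a small base R sketch on hypothetical toy records (not the South Park data). A record violates k-anonymity when fewer than k records share its combination of key values; `toy` and `violations` are illustrative names.

```r
# Hypothetical toy records with two quasi-identifiers
toy <- data.frame(
  sex = c("Male", "Male", "Male", "Female", "Female", "Male"),
  age = c(10, 10, 10, 9, 9, 13)
)

# fk: number of records sharing each record's combination of key values
toy$fk <- ave(rep(1, nrow(toy)), toy$sex, toy$age, FUN = sum)

# A record violates k-anonymity when its fk is below k
violations <- sapply(c(2, 3, 5), function(k) sum(toy$fk < k))
names(violations) <- paste0(c(2, 3, 5), "-anonymity")
violations
```

In this toy set, the single 13-year-old boy is unique, so he violates every level, while the three 10-year-old boys only violate 5-anonymity.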
|
||
##### Time to calculate the risk of re-identification for the entire dataset | ||
|
||
```{r} | ||
# The threshold depends on the size of the dataset and the access control (a conservative number for large surveys is 0.04) | ||
? | ||
``` | ||
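As a rough intuition for what a global risk figure summarizes, the sketch below assumes the simplest case: no sampling weights, so each record's risk is taken as 1/fk. This is a simplification for illustration, and sdcMicro's own estimates may differ. The `fk` values are hypothetical.

```r
# Hypothetical frequency counts (fk) for six records
fk <- c(3, 3, 3, 2, 2, 1)

# Simplified individual risk: one over the number of look-alike records
indiv_risk <- 1 / fk

# Global risk as the average individual risk
global_risk <- mean(indiv_risk)

# Expected number of re-identifications
expected_reident <- sum(indiv_risk)

global_risk
expected_reident
```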
|
||
Was it good? | ||
|
||
Let's see if we can lower that to less than 15% while reaching k = 5. | ||
|
||
We have to get some work done to reduce that. But that would be the first answer to our clients. | ||
|
||
We can inspect this issue a little further before moving to the second question. | ||
|
||
##### Which observations/subjects have a higher risk to be re-identified? | ||
|
||
```{r} | ||
``` | ||
|
||
##### How often does each record's combination of key variables occur? | ||
|
||
```{r} | ||
#Categorical variable risk | ||
#Frequency of the particular combination of key variables (quasi-identifiers) for each record in the sample | ||
? | ||
``` | ||
|
||
#### Q2. How can the risk of re-identification be significantly reduced? | ||
|
||
We learned that there are different techniques to de-identify and anonymize datasets. | ||
|
||
First, let's use some non-perturbative methods such as global recoding and top and bottom coding techniques. | ||
|
||
*Income* | ||
|
||
As mentioned before, the household income of some students may pose a risk to their privacy in this dataset. So let's see if using top and bottom recoding could help reduce that risk. | ||
|
||
```{r} | ||
# Frequencies of income before recoding | ||
table(sdcInitial@manipKeyVars$income) | ||
``` | ||
|
||
```{r} | ||
## Recode variable income (top coding) | ||
sdcInitial <- groupAndRename(obj= sdcInitial, var= c("income"), before=c("200,000-249,999","500,000+"), after=c("200,000+")) | ||
## Recode variable income (bottom coding) | ||
sdcInitial <- groupAndRename(obj= sdcInitial, var= c("income"), before=c("10,000-24,999","75,000-99,999"), after=c("10,000-99,999")) | ||
``` | ||
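What this kind of grouping does to a factor can be sketched in base R: several levels are merged into one, so the rare top bracket disappears into a broader category. The `income` vector here is a hypothetical four-record example, not the sdcMicro object used above.

```r
# Hypothetical income brackets for four records
income <- factor(c("200,000-249,999", "500,000+",
                   "100,000-149,999", "200,000-249,999"))

# Frequencies before recoding: the top bracket holds a single record
table(income)

# Top coding: merge the two highest brackets into one level
top <- c("200,000-249,999", "500,000+")
levels(income)[levels(income) %in% top] <- "200,000+"

# Frequencies after recoding: the unique record is no longer isolated
table(income)
```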
|
||
*Age* | ||
|
||
```{r} | ||
# Frequencies of age before recoding | ||
? | ||
``` | ||
|
||
```{r} | ||
#Recode Age (top and bottom) | ||
? | ||
``` | ||
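Top and bottom coding of a numeric age can be sketched in base R by clamping values to a range. The cutoffs of 9 and 11 are assumptions chosen for illustration, not values prescribed by the worksheet; on the sdcMicro object itself, the recoding would follow the same `groupAndRename()` pattern used for income.

```r
# Hypothetical ages, including rare values at both ends
age <- c(8, 9, 10, 10, 11, 12, 13)

# Clamp to the range 9..11 (bottom coding at 9, top coding at 11)
age_tb <- pmin(pmax(age, 9), 11)

# Label the end categories so the recoding is visible to data users
age_rec <- factor(age_tb, levels = 9:11,
                  labels = c("9 or younger", "10", "11 or older"))

table(age_rec)
```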
|
||
##### **Note: Undoing things** | ||
|
||
```{r} | ||
# Important note: if the results are reassigned to the same sdcMicro object, it is possible to undo the last step in the SDC process using: | ||
# sdcInitial <- undolast(sdcInitial) | ||
# This can be helpful when tuning parameters. The results of the last step, however, will be lost after undoing that step. | ||
# We can also choose to assign results to a new sdcMicro object, using: | ||
# sdc1 <- functionName(sdcInitial), especially if you anticipate creating multiple SDC problems to test out. Otherwise, you can delete the object and re-run the code when needed | ||
``` | ||
|
||
Let's see if those steps lowered the risk of re-identification of subjects. | ||
|
||
```{r} | ||
? | ||
``` | ||
|
||
Only a tiny improvement compared to the original dataset. Let's try something else. | ||
|
||
##### Time for a more powerful technique. Let's use the k-anonymization function! | ||
|
||
```{r} | ||
#Local suppression to obtain k-anonymity | ||
? | ||
# Set the parameter so that at least 5 observations share the same attributes in the dataset. | ||
# Alternatively, we could have set the order of importance for each key variable | ||
#sdcInitial <- kAnon(sdcInitial, importance=c(9,5,6,7,8,4,3,1,2), k=c(5)) | ||
``` | ||
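A small, self-contained sketch of local suppression on hypothetical toy data, assuming the sdcMicro package is installed. It uses k = 2 because the toy set has only eight records; `toy`, `sdc_toy`, and `fk_after` are illustrative names and not part of the worksheet.

```r
library(sdcMicro)

# Hypothetical toy data: two categorical key variables
toy <- data.frame(
  sex = factor(c("F", "F", "F", "M", "M", "M", "F", "M")),
  age = factor(c(9, 9, 10, 10, 10, 11, 9, 10))
)

# Set up the SDC problem with both columns as key variables
sdc_toy <- createSdcObj(dat = toy, keyVars = c("sex", "age"))

# Local suppression: replace values with NA until k-anonymity holds
sdc_toy <- kAnon(sdc_toy, k = 2)

# Frequency counts after suppression
fk_after <- sdc_toy@risk$individual[, "fk"]
sum(fk_after < 2)

# Number of suppressed values per key variable
colSums(is.na(sdc_toy@manipKeyVars))
```

The two unique records (a 10-year-old girl and an 11-year-old boy) are the ones that need a suppressed value; everyone else already has a look-alike.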
|
||
More on importance (pg. 50): <https://cran.r-project.org/web/packages/sdcMicro/sdcMicro.pdf> | ||
|
||
Time to check it again: | ||
|
||
```{r} | ||
? | ||
``` | ||
|
||
Alright! We managed to lower the risk of re-identification from 81% to about 10%, and now we have 0 observations violating 5-anonymity! We can tell our clients that we used some recoding, but suppression via k-anonymity was necessary to improve the privacy level of this dataset. | ||
|
||
#### Q3. What would be the utility and information loss after implementing anonymization strategies? | ||
|
||
##### Time to measure the utility and information loss for the anonymized dataset. | ||
|
||
```{r} | ||
#First we retrieve the total number of suppressions for each categorical key variable | ||
? | ||
``` | ||
|
||
```{r} | ||
#We can also compare the number of NAs before and after our interventions | ||
# Store the names of all categorical key variables in a vector | ||
namesKeyVars <- names(sdcInitial@manipKeyVars) | ||
# Matrix to store the number of missing values (NA) before and after anonymization | ||
NAcount <- matrix(NA, nrow = 2, ncol = length(namesKeyVars)) | ||
colnames(NAcount) <- c(paste0('NA', namesKeyVars)) # column names | ||
rownames(NAcount) <- c('initial', 'treated') # row names | ||
# NA count in all key variables (NOTE: only those coded NA are counted) | ||
for(i in 1:length(namesKeyVars)) { | ||
NAcount[1, i] <- sum(is.na(sdcInitial@origData[,namesKeyVars[i]])) | ||
NAcount[2, i] <- sum(is.na(sdcInitial@manipKeyVars[,i]))} | ||
# Show results | ||
NAcount | ||
``` | ||
|
||
Based on the results, we can tell PC Principal and the Mayor that the suppression greatly reduced the level of detail about the income and the race of the students. We could continue by removing other less relevant variables, exploring other functions in this package, or even considering different ways of recoding those variables. But let's call it a day and export the anonymized dataset we produced. | ||
|
||
##### Creating a new random number to replace the student ID | ||
|
||
```{r} | ||
## Adding a new randomized ID-variable | ||
? | ||
``` | ||
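The idea behind replacing the student ID can be sketched in base R: assign each record a random number that carries no information about the original ID, then drop the original. The seed and the `students` data frame are hypothetical; sdcMicro also offers a `createNewID()` helper for working on the sdcMicro object directly.

```r
# Hypothetical seed, fixed only so the sketch is reproducible
set.seed(2023)

# Four student IDs copied from southpark-sdc.csv
students <- data.frame(
  stu_id = c(8206630976, 6555504757, 5737953702, 5705942436)
)

# Assign a random permutation of 1..n as the new ID
students$new_id <- sample(seq_len(nrow(students)))

# Drop the original identifier
students$stu_id <- NULL

students
```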
|
||
##### Exporting the anonymized dataset | ||
|
||
```{r} | ||
writeSafeFile(obj=sdcInitial, format="csv", randomizeRecords="no", col.names=TRUE, sep=",", dec=".", fileOut="southpark-anon.csv") | ||
``` |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,101 @@ | ||
zip,stu_id,ssn,name,dob,age,sex,race,ethn,snap,income,learn_dis,phys_dis,math_sc,read_sc | ||
80220,8206630976,998126245,Stan Marsh,10/19/2012,10,Male,White,Non-hispanic,0,"200,000-249,999",0,0,299,300 | ||
80220,6555504757,807281100,Kyle Broflovski,05/26/2012,10,Male,White,Non-hispanic,0,"100,000-149,999",0,0,209,209 | ||
80220,5737953702,890807948,Kenny McCormick,03/12/2011,11,Male,White,Non-hispanic,1,"10,000-24,999",0,0,200,201 | ||
80220,5705942436,991920659,Eric Cartman,07/01/2012,10,Male,White,Non-hispanic,0,"75,000-99,999",0,0,211,215 | ||
80220,2809004240,921479968,Butters Scotch,11/11/2012,10,Male,White,Non-hispanic,0,"75,000-99,999",0,0,224,230 | ||
80220,4486369132,804989533,Clyde Donovan,04/10/2012,10,Male,White,Non-hispanic,0,"75,000-99,999",0,0,213,227 | ||
80220,4038126650,854569146,Wendy Testaburger,12/04/2013,9,Female,White,Non-hispanic,0,"100,000-149,999",0,0,204,210 | ||
80221,6008064113,761499326,Bebe Stevens,01/01/2013,10,Female,White,Non-hispanic,0,"75,000-99,999",0,0,202,214 | ||
80220,8307803951,925072083,Tolkien Williams,05/25/2012,10,Male,Black,Non-hispanic,0,"500,000+",0,0,202,222 | ||
80220,3787379332,772439783,Timmy Burch,11/25/2012,10,Male,White,Non-hispanic,0,"75,000-99,999",1,1,205,225 | ||
80221,6685370248,693123835,Jimmy Valmer,06/20/2011,11,Male,White,Non-hispanic,0,"75,000-99,999",0,1,211,206 | ||
80221,6730800673,947344677,Craig Tucker,03/05/2012,10,Male,White,Non-hispanic,0,"75,000-99,999",0,0,205,226 | ||
80220,7961994919,795573368,Tweek Tweak,09/08/2013,9,Male,White,Non-hispanic,0,"75,000-99,999",1,0,225,190 | ||
80220,4109750140,784443358,Karen McCormick,01/31/2014,9,Female,White,Non-hispanic,1,"10,000-24,999",0,0,220,208 | ||
80222,5809626852,727780211,Scott Malkinson,02/28/2013,9,Male,White,Non-hispanic,0,"75,000-99,999",0,0,310,280 | ||
80220,3022294345,931425223,Kevin Stoley,06/02/2013,9,Male,White,Non-hispanic,0,"75,000-99,999",0,0,217,203 | ||
80221,2503282093,923511748,Ike Broflovksi,10/14/2012,10,Male,White,Non-hispanic,0,"75,000-99,999",0,0,225,227 | ||
80222,3120649456,915859337,Firkle Smith,12/16/2012,10,Male,White,Non-hispanic,0,"100,000-149,999",0,0,223,214 | ||
80222,8281247724,747094897,Pete Thelman,10/24/2013,9,Male,White,Non-hispanic,0,"100,000-149,999",0,0,204,208 | ||
80220,7009901765,731342745,Bradley Biggle,02/13/2013,10,Male,White,Non-hispanic,0,"100,000-149,999",0,0,220,200 | ||
80222,3454129258,950392557,Charlotte Knobs,05/10/2013,9,Female,White,Non-hispanic,0,"75,000-99,999",1,0,205,209 | ||
80220,6940797462,712886703,Jenny Simons,03/17/2012,10,Female,White,Non-hispanic,1,"100,000-149,999",0,0,215,215 | ||
80221,6498370605,730143577,Sophie Gray,11/25/2012,10,Female,White,Non-hispanic,0,"200,000-249,999",0,0,209,205 | ||
80220,7411380937,745820080,Damien Thorn,08/01/2013,9,Male,White,Non-hispanic,0,"100,000-149,999",0,0,204,223 | ||
80221,4858260462,889675717,Jason White,08/09/2013,9,Male,White,Non-hispanic,0,"75,000-99,999",0,0,214,226 | ||
80221,3179780954,826397725,David Rodriguez,12/14/2013,9,Male,White,Hispanic,0,"100,000-149,999",0,0,208,229 | ||
80220,5414029866,742465554,Red McArthu,09/09/2013,9,Female,White,Non-hispanic,0,"100,000-149,999",0,1,221,202 | ||
80221,8032142324,676029102,Sally Turner,04/05/2012,10,Female,White,Non-hispanic,0,"200,000-249,999",0,0,206,229 | ||
80220,6371437335,861512602,Allie Nelson,02/25/2012,10,Female,White,Non-hispanic,0,"200,000-249,999",0,0,222,229 | ||
80222,2441678663,981476167,Kelly-Ann Barlow,04/18/2012,10,Female,White,Non-hispanic,0,"200,000-249,999",0,0,224,215 | ||
80220,2946755760,817924686,Larry Feegan,03/20/2012,10,Male,White,Non-hispanic,0,"100,000-149,999",0,0,200,221 | ||
80220,6597334829,914272742,Shelly Marsh,01/09/2010,13,Female,White,Non-hispanic,0,"100,000-149,999",0,0,219,229 | ||
80221,8687651665,992929130,Kay Chi,02/13/2015,8,Male,Asian,Non-hispanic,0,"75,000-99,999",0,0,218,202 | ||
80222,7044694117,861295557,Lee Roberts,09/03/2013,9,Male,Asian,Non-hispanic,0,"100,000-149,999",0,0,217,225 | ||
80222,2383266993,890054995,Donna Base,07/15/2012,10,Female,Black,Non-hispanic,0,"200,000-249,999",0,0,214,207 | ||
80220,5842799162,809048300,Rose River,05/02/2011,11,Female,White,Non-hispanic,0,"100,000-149,999",0,0,219,220 | ||
80222,5511548259,874008998,George Kuala,08/12/2012,10,Male,NA,Non-hispanic,0,"75,000-99,999",1,0,211,214 | ||
80221,8777913067,731193948,Jamal Campos,04/04/2012,10,Male,White,Hispanic,0,"200,000-249,999",0,0,202,226 | ||
80221,8721700078,680534365,Henry Fords,03/02/2011,11,Male,White,Non-hispanic,0,"100,000-149,999",1,0,211,217 | ||
80221,6139178090,676782088,Amelia Papimidous,10/02/2012,10,Female,White,Non-hispanic,0,"100,000-149,999",0,0,208,228 | ||
80220,8065237237,893247941,Tom Battle,03/05/2011,11,Male,White,Non-hispanic,0,"100,000-149,999",0,0,221,207 | ||
80222,4139401352,720273141,Fatima Ali,05/24/2011,11,Female,White,Non-hispanic,0,"75,000-99,999",1,1,202,225 | ||
80220,6118666721,890255724,Ahmed Khan,09/06/2011,11,Male,Asian,Non-hispanic,0,"75,000-99,999",0,0,205,210 | ||
80221,6482073290,902205262,Maria Rodriguez,12/15/2011,11,Female,White,Hispanic,0,"100,000-149,999",0,0,200,215 | ||
80220,5103331881,801617194,Kim Lee,02/08/2012,11,Female,Asian,Non-hispanic,0,"75,000-99,999",0,0,215,207 | ||
80220,5662616805,946404891,Thomas Smith,06/29/2012,10,Male,White,Non-hispanic,0,"75,000-99,999",0,0,200,205 | ||
80221,4026604192,736706846,Jasmine Davis,11/18/2012,10,Female,Black,Non-hispanic,0,"75,000-99,999",0,1,222,206 | ||
80220,5114724288,951436962,Ahmed Mohammed,01/26/2013,10,Male,White,Non-hispanic,1,"100,000-149,999",1,0,218,207 | ||
80222,3604059408,974991555,Sofia García,04/12/2013,9,Female,White,Non-hispanic,0,"75,000-99,999",0,0,224,223 | ||
80220,8309202481,782831373,Wei Chen,07/23/2013,9,Male,Asian,Non-hispanic,0,"100,000-149,999",1,0,205,200 | ||
80220,8572729673,840029061,David Brown,10/05/2011,11,Male,White,Non-hispanic,0,"75,000-99,999",0,0,207,217 | ||
80221,2358285712,947687323,Aaliyah Jackson,01/14/2012,11,Female,Black,Non-hispanic,0,"100,000-149,999",0,0,216,216 | ||
80220,6039967434,725131243,Omar Hassan,03/27/2012,10,Male,White,Non-hispanic,0,"75,000-99,999",0,0,213,228 | ||
80221,7152325605,853935170,Isabella Sanchez,06/17/2012,10,Female,White,Hispanic,0,"75,000-99,999",0,0,203,217 | ||
80221,3237419728,679200682,Min Lee,09/08/2012,10,Female,Asian,Non-hispanic,0,"100,000-149,999",0,0,207,205 | ||
80222,6408366982,962422496,Matthew Taylor,12/21/2012,10,Male,White,Non-hispanic,0,"100,000-149,999",1,0,200,220 | ||
80221,6075761054,983690557,Nia Wilson,02/19/2013,9,Female,White,Non-hispanic,0,"100,000-149,999",1,0,220,229 | ||
80222,3362479883,814225395,Tariq Ahmed,05/10/2013,9,Male,White,Non-hispanic,0,"100,000-149,999",0,0,203,226 | ||
80222,2271343893,985949515,Juan Gonzalez,08/25/2013,9,Male,White,Hispanic,0,"100,000-149,999",0,0,213,206 | ||
80220,7915099255,688180331,Yuna Kim,11/07/2011,11,Female,Asian,Non-hispanic,0,"200,000-249,999",0,0,201,219 | ||
80222,5693148881,751760341,William Jones,01/20/2012,11,Male,White,Non-hispanic,0,"75,000-99,999",0,0,214,227 | ||
80220,6503772597,941474697,Leah Harris,04/08/2012,10,Female,White,Non-hispanic,0,"75,000-99,999",0,0,225,204 | ||
80221,4880530272,850574744,Muhammad Ali,07/02/2012,10,Male,White,Non-hispanic,0,"200,000-249,999",0,0,212,214 | ||
80222,7764246104,682988160,Rosa Martinez,09/15/2012,10,Female,White,Hispanic,0,"200,000-249,999",0,0,208,220 | ||
80222,6436583685,869234973,Jie Zhang,12/30/2012,10,Female,Asian,Non-hispanic,0,"75,000-99,999",0,0,203,221 | ||
80221,2353134813,728912982,Benjamin Wilson,02/12/2013,10,Male,White,Non-hispanic,0,"75,000-99,999",0,0,202,223 | ||
80222,8006712610,980850357,Madison Johnson,05/03/2013,9,Female,Black,Non-hispanic,0,"75,000-99,999",0,0,217,214 | ||
80220,7952143247,910514155,Samir Bakr,08/16/2013,9,Male,White,Non-hispanic,0,"75,000-99,999",0,0,207,230 | ||
80222,5709468005,913828533,Ana Maria,11/05/2012,10,Female,White,Hispanic,0,"75,000-99,999",0,0,222,226 | ||
80222,6867660951,992438714,Kenji Nakamura,01/24/2013,10,Male,Asian,Non-hispanic,0,"75,000-99,999",0,0,223,201 | ||
80221,3870154420,815924622,Andrew Johnson,04/14/2013,9,Male,White,Non-hispanic,1,"200,000-249,999",0,0,211,214 | ||
80220,7797506673,844679624,Destiny Wilson,07/08/2013,9,Female,Black,Non-hispanic,0,"200,000-249,999",0,0,204,221 | ||
80220,2494395308,722181895,Tarek Farouk,10/22/2011,11,Male,White,Non-hispanic,0,"200,000-249,999",0,0,223,207 | ||
80221,8546373623,850279771,Carlos Hernandez,12/10/2011,11,Male,White,Hispanic,0,"200,000-249,999",0,0,201,204 | ||
80222,8656992107,718288946,Min-ji Park,03/05/2012,10,Female,Asian,Non-hispanic,0,"200,000-249,999",0,0,206,227 | ||
80222,6958413848,962473136,Jacob Smith,06/19/2012,10,Male,White,Non-hispanic,0,"100,000-149,999",0,0,218,208 | ||
80222,5523349842,801869852,Shayla Adams,09/06/2012,10,Female,Black,Non-hispanic,0,"100,000-149,999",0,0,200,201 | ||
80221,6595472746,951090549,Ahmed Ibrahim,12/01/2012,10,Male,White,Non-hispanic,0,"200,000-249,999",0,0,203,226 | ||
80220,7489643955,915131594,Isabel Rodriguez,02/28/2013,9,Female,White,Hispanic,0,"100,000-149,999",0,0,219,214 | ||
80220,2823018885,680803261,Xian Chen,05/16/2013,9,Male,Asian,Non-hispanic,0,"75,000-99,999",0,1,210,216 | ||
80221,7451127312,773234402,John Brown,08/31/2013,9,Male,White,Non-hispanic,0,"200,000-249,999",0,0,211,210 | ||
80220,7677217980,855443490,Amara Jones,11/17/2011,11,Female,Black,Non-hispanic,0,"200,000-249,999",0,0,213,214 | ||
80221,4783679758,902383995,Saeed Al-Saud,02/03/2012,11,Male,NA,Non-hispanic,0,"75,000-99,999",0,0,215,230 | ||
80221,2817826392,702406777,Juanita Lopez,04/23/2012,10,Female,White,Non-hispanic,0,"100,000-149,999",0,0,219,218 | ||
80221,7654249506,943837518,Yuyu Lee,07/15/2012,10,Female,Asian,Non-hispanic,0,"200,000-249,999",0,0,213,213 | ||
80220,2935358711,848394421,Benjamin Turner,10/08/2012,10,Male,White,Non-hispanic,0,"100,000-149,999",1,0,201,219 | ||
80220,4934388712,993842853,Lauren Wilson,12/25/2012,10,Female,Black,Non-hispanic,0,"100,000-149,999",1,0,205,203 | ||
80220,5360939504,988758591,Tariq Mustafa,03/09/2013,9,Male,NA,Non-hispanic,0,"200,000-249,999",0,0,213,220 | ||
80221,7221658825,842610748,Carlos Martinez,06/01/2013,9,Male,White,Non-hispanic,0,"100,000-149,999",0,0,200,204 | ||
80222,2777782104,677272490,Tomomi Nakamura,08/20/2013,9,Female,Asian,Non-hispanic,0,"200,000-249,999",0,0,210,214 | ||
80220,2657793427,927243456,Michael Davis,11/12/2012,10,Male,White,Non-hispanic,0,"100,000-149,999",0,0,224,203 | ||
80220,3246294874,841276165,Alberta Lowe,06/02/2011,11,Female,White,Non-hispanic,0,"100,000-149,999",0,0,211,219 | ||
80222,2522484954,692145969,Macy Brown,10/14/2013,9,Female,Black,Non-hispanic,0,"100,000-149,999",0,0,205,207 | ||
80222,6226479455,769066928,Joe Baker,12/16/2012,10,Male,White,Non-hispanic,0,"200,000-249,999",0,0,212,216 | ||
80220,8567366395,929068451,Samuel Tetris,10/24/2012,10,Male,White,Non-hispanic,0,"100,000-149,999",0,0,215,215 | ||
80220,3511819696,845874181,Paul Stage,02/13/2011,12,Male,White,Non-hispanic,0,"100,000-149,999",0,0,215,228 | ||
80221,8430191132,709355630,Lee King,05/10/2012,10,Male,Asian,Non-hispanic,0,"200,000-249,999",0,0,208,208 | ||
80220,4516023907,719087380,Benjamin Power,03/03/2012,10,Male,White,Non-hispanic,0,"200,000-249,999",0,0,213,210 | ||
80220,7032553463,723535202,Debra Smith,06/06/2012,10,Female,White,Non-hispanic,0,"100,000-149,999",0,0,201,217 | ||
80220,4986378999,720104857,Morgan Ellins,10/02/2011,11,Female,White,Non-hispanic,0,"100,000-149,999",0,0,209,213 |