Identifiers
De-identified data does not contain identifying variables
Reproducibility
All scripts run from the master after adding the correct folder path to line(s) X (and XX)
The master script is organized in a way that allows you to understand the general tasks being performed in the code
The master script tracks which scripts create and use which files
The data sets created by the reviewer are exactly the same as those shared by the coder
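One way to check that the reviewer's data sets match the coder's is to compare the two files cell by cell. The following is a minimal sketch, assuming both data sets have been exported to CSV; the function names `load_rows` and `compare_datasets` are illustrative and not part of any existing tool.

```python
import csv
import io


def load_rows(csv_text):
    """Parse CSV text into (header, data rows). Empty input gives two empty lists."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return [], []
    return rows[0], rows[1:]


def compare_datasets(shared_csv, reproduced_csv):
    """Return a list of differences; an empty list means the data sets are identical."""
    shared_header, shared_rows = load_rows(shared_csv)
    repro_header, repro_rows = load_rows(reproduced_csv)

    problems = []
    if shared_header != repro_header:
        problems.append("column names or order differ")
    if len(shared_rows) != len(repro_rows):
        problems.append(
            f"row counts differ: {len(shared_rows)} vs {len(repro_rows)}"
        )
    # Compare rows pairwise, in file order, over the rows both files have.
    for i, (a, b) in enumerate(zip(shared_rows, repro_rows), start=1):
        if a != b:
            problems.append(f"row {i} differs")
    return problems
```

The comparison is order-sensitive on purpose: if the sort order differs between runs, that is itself a reproducibility problem worth flagging.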
Code organization and readability
Code names are informative
It is clear in the code why tasks are being executed
The code structure facilitates understanding of the tasks
Code uses white space to improve readability
There is extensive use of comments to explain the code
The code is efficient (tasks are executed in the simplest way possible, loops are used when needed rather than repeating lines, pre-defined functions are used)
Common tasks are abstracted and automated (e.g. using functions or macros)
Clean data set checks (pre-publication)
The data does not include direct identifiers
The data set has a clearly labeled, uniquely and fully identifying ID variable
The level of observation of the data set is clear from the dataset name, ID variables and documentation
Variables have informative labels or an accompanying dictionary
Categorical variables have clear and informative value labels
No modification is made from the raw to the clean data other than correcting problems
No raw variables are processed (winsorized, for example)
Variables can be easily traced back to the original questionnaire
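Two of the checks above, the ID variable being unique and fully identifying and the absence of direct identifiers, can be partly automated. The sketch below assumes the data set is loaded as a list of dictionaries (one per observation). The name hints in `DIRECT_IDENTIFIER_HINTS` and the variable names used in the example are hypothetical; a name-based scan only suggests candidates and does not replace a manual review.

```python
# Hypothetical substrings that often appear in names of identifying variables.
DIRECT_IDENTIFIER_HINTS = ("name", "phone", "address", "email", "gps")


def check_id_variable(rows, id_var):
    """Return problems with the ID variable: missing values or duplicates."""
    problems = []
    seen = set()
    for i, row in enumerate(rows, start=1):
        value = row.get(id_var)
        if value is None or value == "":
            problems.append(f"row {i}: missing {id_var}")
        elif value in seen:
            problems.append(f"row {i}: duplicate {id_var} {value!r}")
        else:
            seen.add(value)
    return problems


def flag_possible_identifiers(column_names):
    """Return column names that contain one of the identifier hints."""
    return [
        col for col in column_names
        if any(hint in col.lower() for hint in DIRECT_IDENTIFIER_HINTS)
    ]
```

For example, `check_id_variable(rows, "hh_id")` returns an empty list only when every observation has a non-missing, distinct `hh_id`.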
Data cleaning tasks
Are new variables being created in the cleaning do-files?
Are any changes being made to observation values in the cleaning do-files?
Check merges: Are any observations dropped? If so, is there a clear justification for that? If any observations didn't match, is that explained in the comments?
Are missing values coded consistently? Are extended missing values used?
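The merge and missing-value questions above can be supported by a short diagnostic. This is a sketch under stated assumptions: the keys of the two data sets being merged are available as lists, values are stored as strings, and the set `MISSING_CODES` is a hypothetical example of codes a project might use. The function names are illustrative.

```python
from collections import Counter

# Hypothetical missing-value codes; replace with the project's own convention.
MISSING_CODES = {"", ".", "NA", "-88", "-99"}


def merge_report(master_ids, using_ids):
    """Summarize which keys match and which appear on only one side of a merge."""
    master, using = set(master_ids), set(using_ids)
    return {
        "matched": sorted(master & using),
        "master_only": sorted(master - using),
        "using_only": sorted(using - master),
    }


def missing_code_counts(rows, variable):
    """Count how often each missing-value code appears in one variable."""
    return Counter(
        row[variable] for row in rows if row[variable] in MISSING_CODES
    )
```

Any key listed under `master_only` or `using_only` should be explained in the comments of the cleaning script, and more than one code showing up in `missing_code_counts` for the same kind of missingness suggests inconsistent coding.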
Data cleaning code review checklist
Data source/survey round
Date
List of files to be checked [Add names or links]
Master script
Clean dataset(s)
Cleaning scripts
Identifiers
Reproducibility
Code organization and readability
Clean data set checks (pre-publication)
Data cleaning tasks