All chapters: organizational cleanup #473

Merged
merged 65 commits into from
Aug 5, 2020
1764a94
Title and preamble
bbdaniels Jul 8, 2020
8a69b37
Update all chapter names and filenames
bbdaniels Jul 8, 2020
87eb178
Update repo name
bbdaniels Jul 13, 2020
d82ff54
Update name and notes
bbdaniels Jul 13, 2020
24d7fc2
Rewrite introduction to chapters
bbdaniels Jul 13, 2020
8c46be2
Update homepage
bbdaniels Jul 13, 2020
bcbbf54
Title
bbdaniels Jul 13, 2020
ede778f
Fix PDF render
bbdaniels Jul 15, 2020
727f386
Fix URL in preamble
bbdaniels Jul 21, 2020
824e4f7
Book title in conclusion
bbdaniels Jul 21, 2020
c24daea
Accept suggestion
bbdaniels Jul 22, 2020
a6ff30b
Accept suggestion
bbdaniels Jul 22, 2020
467e183
Accept suggestion
bbdaniels Jul 22, 2020
7c84fe4
Accept suggestion
bbdaniels Jul 22, 2020
0d1f33b
preamble - fix commit URL too long new repo name
kbjarkefur Jul 22, 2020
32471e3
Squash git commit bug
bbdaniels Jul 22, 2020
e3643a2
preamble - reorder copy right page
kbjarkefur Jul 22, 2020
5b2b11f
intro - typo
kbjarkefur Jul 22, 2020
7a2e772
Accept suggestion
bbdaniels Jul 27, 2020
d3464a3
Accept suggestion
bbdaniels Jul 27, 2020
cb973ab
Accept suggestion
bbdaniels Jul 27, 2020
e39ae00
Accept suggestion
bbdaniels Jul 27, 2020
c3fef57
Full book title
bbdaniels Jul 27, 2020
8fbed14
Merge remote-tracking branch 'origin/bbd-org-cleanup' into bbd-org-cl…
bbdaniels Jul 27, 2020
73b2b54
Accept suggestion
bbdaniels Jul 28, 2020
2591d15
Rewrite intro header
bbdaniels Jul 28, 2020
7fa9994
Intro first section
bbdaniels Jul 28, 2020
1da6b81
Reproducible and code
bbdaniels Jul 28, 2020
e3ae70c
Move code pieces to Stata Appendix
bbdaniels Jul 28, 2020
b8b3fca
Develop data section
bbdaniels Jul 28, 2020
d68ff1d
Accept suggestion
bbdaniels Aug 2, 2020
9a16622
Accept suggestion
bbdaniels Aug 2, 2020
0507aaf
Accept suggestion
bbdaniels Aug 2, 2020
b476dc0
Accept suggestion
bbdaniels Aug 2, 2020
3576d11
Add pillars
bbdaniels Aug 3, 2020
b759a02
Merge remote-tracking branch 'origin/bbd-org-cleanup' into bbd-org-cl…
bbdaniels Aug 3, 2020
c1fa7ed
No staffing specs
bbdaniels Aug 3, 2020
610221a
First moves
bbdaniels Aug 3, 2020
715a23b
Rewrite first section
bbdaniels Aug 3, 2020
d4db01c
Second section
bbdaniels Aug 3, 2020
ee935e0
Section title
bbdaniels Aug 3, 2020
9402738
data quality
bbdaniels Aug 3, 2020
d52802e
Rewrite part of last section
bbdaniels Aug 3, 2020
75c62fe
Add research design appendix
bbdaniels Aug 3, 2020
aaa05ee
Move aux files to aux folder
bbdaniels Aug 3, 2020
129b79e
Better filenames
bbdaniels Aug 3, 2020
dd799e9
Fixes
bbdaniels Aug 3, 2020
0c19bae
Restructure
bbdaniels Aug 3, 2020
6bedc77
Shuffling and shaping
bbdaniels Aug 3, 2020
ed8652f
Accept suggestion
bbdaniels Aug 4, 2020
c397966
Accept suggestion
bbdaniels Aug 4, 2020
ab5d827
Accept suggestion
bbdaniels Aug 4, 2020
3c9d056
Accept suggestion
bbdaniels Aug 4, 2020
55d429d
Move para to end
bbdaniels Aug 4, 2020
94cf975
Add DIME stats
bbdaniels Aug 4, 2020
d19173b
Accept changes
bbdaniels Aug 4, 2020
6797519
Accept suggestion
bbdaniels Aug 4, 2020
61bd392
Reframe
bbdaniels Aug 4, 2020
1ac076c
Merge branch 'bbd-org-cleanup' of https://github.com/worldbank/dime-d…
bbdaniels Aug 4, 2020
ea45d7e
Newline
bbdaniels Aug 4, 2020
2685c92
Context?
bbdaniels Aug 4, 2020
752fd32
Clearer
bbdaniels Aug 4, 2020
358cc59
[intro] typo in \cite{}
kbjarkefur Aug 5, 2020
be3a9b3
Accept suggestion
bbdaniels Aug 5, 2020
ce0ac96
Accept suggestion
bbdaniels Aug 5, 2020
121 changes: 65 additions & 56 deletions chapters/introduction.tex → chapters/0-introduction.tex
@@ -1,11 +1,11 @@
\begin{fullwidth}
Welcome to \textit{Data for Development Impact}.
Welcome to \textit{Development Research in Practice}.
This book is intended to teach all users of development data
how to handle data effectively, efficiently, and ethically.
An empirical revolution has changed the face of research economics rapidly over the last decade.
%had to remove cite {\cite{angrist2017economic}} because of full page width
Today, especially in the development subfield, working with raw data --
whether collected through surveys or acquired from ``big'' data sources like sensors, satellites, or call data records --
whether collected through surveys or acquired from ``big'' data sources
like sensors, satellites, or call data records --
is a key skill for researchers and their staff.
At the same time, the scope and scale of empirical research projects is expanding:
more people are working on the same data over longer timeframes.
@@ -55,19 +55,6 @@ \section{Doing credible research at scale}
DIME Analytics was created to take advantage of the concentration and scale of research at DIME to develop and test solutions,
to ensure high quality data collection and research across the DIME portfolio,
and to make training and tools publicly available to the larger community of development researchers.
\textit{Data for Development Impact} compiles the ideas, best practices and software tools Analytics
has developed while supporting DIME's global impact evaluation portfolio.

The \textbf{DIME Wiki} is one of our flagship products, a free online collection of our resources and best practices.\sidenote{
\url{https://dimewiki.worldbank.org}}
This book complements the DIME Wiki by providing a structured narrative of the data workflow for a typical research project.
We will not give a lot of highly specific details in this text,
but we will point you to where they can be found.\sidenote{Like this:
\url{https://dimewiki.worldbank.org/Primary_Data_Collection}}
Each chapter focuses on one task, providing a primarily narrative account of:
what you will be doing; where in the workflow this task falls;
when it should be done; and how to implement it according to best practices.

We will use broad terminology throughout this book to refer to research team members:
\textbf{principal investigators (PIs)} who are responsible for
the overall design and stewardship of the study;
@@ -76,10 +63,27 @@ \section{Doing credible research at scale}
and \textbf{research assistants (RAs)} who are responsible for
handling data processing and analytical tasks.

\textit{Development Research in Practice} compiles the ideas, best practices and software tools
that the DIME Analytics team
has developed while supporting DIME's global impact evaluation portfolio.
Each chapter in this book focuses on one task, providing a primarily narrative account of:
what you will be doing; where in the workflow this task falls;
when it should be done; and how to implement it according to best practices.

\section{Adopting reproducible tools}
We will not always give a lot of highly specific implementation details in this text,
but will often point you to where they can be found on the \textbf{DIME Wiki}.\sidenote{Like this:
\url{https://dimewiki.worldbank.org/Primary_Data_Collection}}
The DIME Wiki is one of DIME Analytics' flagship products,
a free online collection of our resources and best practices.\sidenote{
\url{https://dimewiki.worldbank.org}}
This book complements the DIME Wiki by providing a structured narrative
of the data workflow for a typical research project.
The Wiki, by contrast, provides unstructured but detailed information
on how to complete each task, and links to further practical resources.

\section{Adopting reproducible practices through code}

We assume througout all of this book
We assume throughout all of this book
that you are going to do nearly all of your data work through code.
It may be possible to perform all relevant tasks
through the user interface in some statistical software,
@@ -99,29 +103,30 @@ \section{Adopting reproducible tools}
We believe that this must change somewhat:
in particular, we think that development practitioners
must begin to think about their code and programming workflows
just as methodologically as they think about their research workflows.
just as methodologically as they think about their research workflows,
and think of code and data as research outputs, just as manuscripts and briefs are.

Most tools have a learning and adaptation process,
meaning you will become most comfortable with each tool
only by using it in real-world work.
To support your process of learning reproducible tools and workflows,
will reference free and open-source tools wherever possible,
we reference free and open-source tools wherever possible,
and point to more detailed instructions when relevant.
Stata, as a proprietary software, is the notable exception here
due to its current popularity in development economics.\sidenote{
\url{https://aeadataeditor.github.io/presentation-20191211/\#9}}
This book also includes
the DIME Analytics Stata Style Guide
This book also includes, as an appendix,
the \textbf{DIME Analytics Stata Style Guide}
that we use in our work, which provides
some new standards for coding so that code styles
standards for coding in Stata so that code styles
can be harmonized across teams for easier understanding and reuse of code.
Stata has relatively few resources of this type available,
and the ones that we have created and shared here
and the one that we have created and shared here
we hope will be an asset to all its users.


\section{Writing reproducible code in a collaborative environment}
Throughout the book, we refer to the importance of good coding practices.
Throughout this book, we refer to the importance of good coding practices.
These are the foundation of reproducible and credible data work,
and a core part of the new data science of development research.
Code today is no longer a means to an end (such as a research paper),
@@ -130,7 +135,7 @@ \section{Writing reproducible code in a collaborative environment}
As this is fundamental to the remainder of the book's content,
we provide here a brief introduction to \textbf{``good'' code} and \textbf{process standardization}.

``Good'' code has two elements: (1) it is correct, i.e. it doesn't produce any errors,
``Good'' code has two elements: (1) it is correct, in that it doesn't produce any errors,
and (2) it is useful and comprehensible to someone who hasn't seen it before
(or even yourself a few weeks, months or years later).
Many researchers have been trained to code correctly.
@@ -155,8 +160,9 @@ \section{Writing reproducible code in a collaborative environment}
\textbf{structure}, \textbf{syntax}, and \textbf{style}.
We always tell people to ``code as if a stranger would read it''
(from tomorrow, that stranger could be you!).
The \textbf{structure} is the environment your code lives in:
good structure means that it is easy to find individual pieces of code that correspond to tasks.
The \textbf{structure} is the environment and file organization your code lives in:
good structure means that it is easy to find individual pieces of code
that correspond to specific tasks and outputs.
Good structure also means that functional blocks are sufficiently independent from each other
that they can be shuffled around, repurposed, and even deleted without damaging other portions.
The \textbf{syntax} is the literal language of your code.
@@ -166,7 +172,8 @@ \section{Writing reproducible code in a collaborative environment}
to figure out what a code chunk is trying to do.
\textbf{Style}, finally, is the way that the non-functional elements of your code convey its purpose.
Elements like spacing, indentation, and naming (or lack thereof) can make your code much more
(or much less) accessible to someone who is reading it for the first time and needs to understand it quickly and correctly.
(or much less) accessible to someone who is reading it for the first time
and needs to understand it quickly and correctly.

As you gain experience in coding
and get more confident with the way you implement these suggestions,
@@ -182,7 +189,6 @@ \section{Writing reproducible code in a collaborative environment}
What would happen if more observations were added to the dataset?
Can my code be made more efficient or easier to understand?

\subsection{Code examples}
For some implementation portions where precise code is particularly important,
we will provide minimal code examples either in the book or on the DIME Wiki.
All code guidance is software-agnostic, but code examples are provided in Stata.
@@ -201,43 +207,46 @@ \subsection{Code examples}
but you should reference Stata help-files by writing \texttt{help [command]}
whenever you do not understand the command that is being used.
We hope that these snippets will provide a foundation for your code style.
Providing some standardization to Stata code style is also a goal of this team;
we provide our guidance on this in the Stata Style Guide in the Appendix.
Providing some standardization to Stata code style is also a goal of this team.

\section{Outline of this book}

This book covers each stage of an empirical research project, from design to publication.
We start with ethical principles to guide empirical research,
focusing on research transparency and the right to privacy.
In Chapter 1, we outline a set of practices that help to ensure
research participants are appropriately protected and
research consumers can be confident in the conclusions reached.
Chapter 2 will teach you to structure your data work to be efficient,
collaborative and reproducible.
It discusses the importance of planning data work at the outset of the research project --
focusing on research reproducibility, transparency, and credibility.
In Chapter 1, we outline a set of practices that help to ensure that
research consumers can be confident in the conclusions reached,
and research work can be assumed and verified to be reliable.
Chapter 2 will teach you to structure your data work for collaborative research,
while ensuring the privacy and security of research participants.
It discusses the importance of planning the tools that will be used;
lays the groundwork to structure the research project at its outset --
long before any data is acquired -- and provides suggestions for collaborative workflows and tools.
In Chapter 3, we turn to research design,
focusing specifically on how to measure treatment effects
and structure data for common experimental and quasi-experimental research methods.
We provide an overview of research designs frequently used for
causal inference, and consider implications for data structure.
Chapter 4 concerns sampling and randomization:
how to implement both simple and complex designs reproducibly,
and how to use power calculations and randomization inference
to critically and quantitatively assess
sampling and randomization to make optimal choices when planning studies.

Chapter 5 covers data acquisition. We start with
In Chapter 3, we turn to establishing a measurement framework,
focusing specifically on how to translate research design to a data work plan
and how to implement both simple and complex randomized designs in a reproducible manner.

Chapter 4 covers data acquisition. We start with
the legal and institutional frameworks for data ownership and licensing,
dive in depth on collecting high-quality survey data,
and finally discuss secure data handling during transfer, sharing, and storage.
Chapter 6 teaches reproducible and transparent workflows for data processing and analysis,
and provides guidance on de-identification of personally-identified data,
focusing on how to organize data work so that it is easy to code the desired analysis.
In Chapter 7, we turn to publication. You will learn
Chapter 5 teaches workflows for data processing.
It details how to construct ``tidy'' data at the appropriate units of analysis,
how to ensure uniquely identified datasets, and
how to routinely incorporate data quality checks into the workflow.
It also provides guidance on de-identification and cleaning of personally-identified data,
focusing on how to understand and structure data
so that it is ready for indicator construction and analytical work.
Chapter 6 discusses data analysis.
It begins with data construction, or the creation of new variables
from the raw data acquired or collected in the field.
It also introduces core principles for writing analytical code
and creating, exporting, and storing research outputs such as figures and tables reproducibly with dynamic documents.
In Chapter 7, we turn to publication.
This chapter discusses
how to effectively collaborate on technical writing,
how and why to publish data,
and guidelines for preparing functional and informative replication packages.
and guidelines for preparing functional and informative reproducibility packages.


While adopting the workflows and mindsets described in this book requires an up-front cost,
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
4 changes: 2 additions & 2 deletions chapters/conclusion.tex
@@ -1,4 +1,4 @@
We hope you have enjoyed \textit{Data for Development Impact: The DIME Analytics Resource Guide}.
We hope you have enjoyed \textit{Development Research in Practice: The DIME Analytics Data Handbook}.
Our aim was to teach you to handle data more efficiently, effectively, and ethically.
We laid out a complete vision of the tasks of a modern researcher,
from planning a project's data governance to publishing code and data
@@ -41,4 +41,4 @@
and come back to it anytime you need more information.
We wish you all the best in your work
and will love to hear any input you have on ours!\sidenote{
You can share your comments and suggestion on this book through \url{https://worldbank.github.io/d4di}.}
You can share your comments and suggestion on this book through \url{https://worldbank.github.io/dime-data-handbook}.}
10 changes: 5 additions & 5 deletions chapters/notes.tex
@@ -1,6 +1,6 @@
This is a draft peer review edition of
\textit{Data for Development Impact:
The DIME Analytics Resource Guide}.
\textit{Development Research in Practice:
The DIME Analytics Data Handbook}.
This version of the book has been substantially revised
since the first release in June 2019
with feedback from readers and other experts.
@@ -12,9 +12,9 @@
This book is intended to remain a living product
that is written and maintained in the open.
The raw code and edit history are online at:
\url{https://github.com/worldbank/d4di}.
\url{https://github.com/worldbank/dime-data-handbook}.
You can get a PDF copy at:
\url{https://worldbank.github.com/d4di}.
\url{https://worldbank.github.com/dime-data-handbook}.
The website also includes the most updated instructions
for providing feedback, as well as
a log of errata and updates that have been made to the content.
@@ -27,7 +27,7 @@ \subsection{Feedback}
We encourage feedback and corrections
so that we can improve the contents of the book
in future editions. Please visit
\url{https://worldbank.github.com/d4di/feedback} to
\url{https://worldbank.github.com/dime-data-handbook/feedback} to
see different options on how to provide feedback.
You can also email us at \url{[email protected]}
with input or comments, and we will be very thankful.
12 changes: 7 additions & 5 deletions chapters/preamble.tex
@@ -108,8 +108,8 @@
% BOOK META-INFORMATION
%----------------------------------------------------------------------------------------

\title{Data for \\ \noindent Development Impact: \\ \bigskip
\noindent The DIME Analytics \\ \noindent Resource Guide} % Title of the book
\title{Development \\ \noindent Research \\ \noindent in Practice: \\ \bigskip
\noindent The DIME Analytics \\ \noindent Data Handbook} % Title of the book

\author{Kristoffer Bj{\"a}rkefur \\ \noindent Lu{\'i}za Cardoso de Andrade \\ \noindent Benjamin Daniels \\ \noindent Maria Jones \\} % Author

@@ -126,7 +126,7 @@

%Set this user input
\newcommand{\gitfolder}{.git} %relative path to .git folder from .tex doc
\newcommand{\reponame}{worldbank/d4di} % Name of account and repo be set in URL
\newcommand{\reponame}{worldbank/dime-data-handbook} % Name of account and repo be set in URL

%Based on this https://tex.stackexchange.com/questions/455396/how-to-include-the-current-git-commit-id-and-branch-in-my-document
\CatchFileDef{\headfull}{\gitfolder/HEAD.}{} %Get path to head file for checked out branch
@@ -176,15 +176,17 @@

\bigskip\par\smallcaps{Published by \thanklesspublisher}

\par\smallcaps{\url{https://worldbank.github.com/d4di}}
\par\smallcaps{\url{https://worldbank.github.com/dime-data-handbook}}

\par Released under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.

\url{https://creativecommons.org/licenses/by/4.0}

\par\textit{First printing, \monthyear}

\par Compiled from: \commiturl
\par Compiled from commit: \newline
\vspace{-0.5cm}
\commiturl
\end{fullwidth}

%----------------------------------------------------------------------------------------
24 changes: 12 additions & 12 deletions manuscript.tex
@@ -23,9 +23,9 @@
%----------------------------------------------------------------------------------------

\cleardoublepage
\chapter{Introduction: Data for development impact} % The asterisk leaves out this chapter from the table of contents
\chapter{Introduction: Development research in practice} % The asterisk leaves out this chapter from the table of contents

\input{chapters/introduction.tex}
\input{chapters/0-introduction.tex}

%----------------------------------------------------------------------------------------
% CHAPTER 1
@@ -35,16 +35,16 @@ \chapter{Introduction: Data for development impact} % The asterisk leaves out th
\chapter{Chapter 1: Reproducibility, transparency, and credibility}
\label{ch:1}

\input{chapters/1a-reproducibility.tex}
\input{chapters/1-reproducibility.tex}

%----------------------------------------------------------------------------------------
% CHAPTER 2
%----------------------------------------------------------------------------------------

\chapter{Chapter 2: Collaborating on code and data}
\chapter{Chapter 2: Setting the stage for collaboration}
\label{ch:2}

\input{chapters/planning-data-work.tex}
\input{chapters/2-collaboration.tex}

%----------------------------------------------------------------------------------------
% CHAPTER 3
@@ -54,29 +54,29 @@ \chapter{Chapter 2: Collaborating on code and data}
\chapter{Chapter 3: Establishing a measurement framework}
\label{ch:3}

\input{chapters/sampling-randomization-power.tex}
\input{chapters/3-measurement.tex}


%----------------------------------------------------------------------------------------
% CHAPTER 4
%----------------------------------------------------------------------------------------


\chapter{Chapter 4: Acquiring data}
\chapter{Chapter 4: Acquiring development data}
\label{ch:4}

\input{chapters/data-collection.tex}
\input{chapters/4-data-collection.tex}



%----------------------------------------------------------------------------------------
% CHAPTER 5
%----------------------------------------------------------------------------------------

\chapter{Chapter 5: Cleaning data}
\chapter{Chapter 5: Cleaning and processing research data}
\label{ch:5}

\input{chapters/data-processing.tex}
\input{chapters/5-data-processing.tex}

%----------------------------------------------------------------------------------------
% CHAPTER 6
@@ -85,13 +85,13 @@ \chapter{Chapter 5: Cleaning data}
\chapter{Chapter 6: Analyzing research data}
\label{ch:6}

\input{chapters/data-analysis.tex}
\input{chapters/6-data-analysis.tex}

%----------------------------------------------------------------------------------------
% CHAPTER 7
%----------------------------------------------------------------------------------------

\chapter{Chapter 7: Publishing collaborative research}
\chapter{Chapter 7: Publishing research outputs}
\label{ch:7}

\input{chapters/7-publication.tex}
Binary file removed mkdocs/docs/bookpdf/Data-for-Development-Impact.pdf
Binary file not shown.