From d87dfc2118df0c7205fb3070b00655897cdd40df Mon Sep 17 00:00:00 2001
From: Alex Lehner
Date: Thu, 16 Feb 2023 09:45:03 -0600
Subject: [PATCH] Update start-simple.md

I did not like the enumeration, let's do bullet points instead! (sorry, I did not find anything else)
---
 texts/start-simple.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/texts/start-simple.md b/texts/start-simple.md
index 550290f..27b1e2c 100644
--- a/texts/start-simple.md
+++ b/texts/start-simple.md
@@ -3,8 +3,8 @@ How this looks in practice will depend on how exploratory you are in your approa
 **A more ambitious but extremely useful approach to building models gradually is to work with fake data.** This is particularly apt when we are blinded to part of the data. We do this by assuming some "data generating process" (DGP), which is a function that generates datasets (typically with some noise!) and then programming your analytics to work with your DGP outputs.
 
 This approach has many advantages, including
-1. testing performance of your methods (e.g., bias, precision, statistical power)
-1. being able to program your analysis without waiting for data
-1. checking your understanding of how your chosen statistical methods work "under the hood"
+* testing performance of your methods (e.g., bias, precision, statistical power)
+* being able to program your analysis without waiting for data
+* checking your understanding of how your chosen statistical methods work "under the hood"
 
-In cases where you know something about the problem, you can code a DGP without first looking at the data. Typically, however, you will use some existing data to design your DGP. For example, if studying an intervention meant to reduce child mortality in a country, you can typically find census data with geographical and temporal variation for that country. This information will allow you to simulate a realistic data structure (via resampling, adding noise, or just visually checking that your DGP matches census data) against which you can test your methods. Note as well that programming a DGP first will often help you spot problems in data from your research project. In our example, suppose the project-collected data for an indicator has a distribution which does not resemble that of the same indicator on the census (e.g., it has much higher mean or much lower variation). Then you may need to ask yourself if this indicator is defined consistently with other data sources, in what way your sample is representative, or hypothesise some necessary model adjustments.
\ No newline at end of file
+In cases where you know something about the problem, you can code a DGP without first looking at the data. Typically, however, you will use some existing data to design your DGP. For example, if studying an intervention meant to reduce child mortality in a country, you can typically find census data with geographical and temporal variation for that country. This information will allow you to simulate a realistic data structure (via resampling, adding noise, or just visually checking that your DGP matches census data) against which you can test your methods. Note as well that programming a DGP first will often help you spot problems in data from your research project. In our example, suppose the project-collected data for an indicator has a distribution which does not resemble that of the same indicator on the census (e.g., it has much higher mean or much lower variation). Then you may need to ask yourself if this indicator is defined consistently with other data sources, in what way your sample is representative, or hypothesise some necessary model adjustments.
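The fake-data workflow described in the patched text (assume a DGP, program the analysis against its outputs, then measure bias, precision and power) can be sketched in a few lines of Python. This is a minimal, hypothetical example: `simulate_dataset`, `estimate_effect` and `check_method` are illustrative names, and the baseline mortality rate, effect size and noise level are made-up assumptions, not values taken from any census. The estimator is a plain difference in means with a normal-approximation test, chosen only because it is easy to follow.

```python
import random
import statistics


def simulate_dataset(n_areas, true_effect, baseline=0.05, noise_sd=0.01, rng=None):
    """Hypothetical DGP: mortality rate per area, shifted by true_effect if treated.

    All parameter defaults are illustrative assumptions. Rates are not clipped
    to [0, 1]; a realistic DGP would model them more carefully.
    """
    if rng is None:
        rng = random.Random()
    data = []
    for i in range(n_areas):
        treated = i % 2  # alternate assignment: half treated, half control
        rate = baseline + true_effect * treated + rng.gauss(0.0, noise_sd)
        data.append((treated, rate))
    return data


def estimate_effect(data):
    """Difference in mean rate (treated minus control) and its standard error."""
    treated = [y for t, y in data if t == 1]
    control = [y for t, y in data if t == 0]
    diff = statistics.fmean(treated) - statistics.fmean(control)
    # Standard error of a difference in means, allowing unequal variances.
    se = (statistics.variance(treated) / len(treated)
          + statistics.variance(control) / len(control)) ** 0.5
    return diff, se


def check_method(n_sims=1000, n_areas=40, true_effect=-0.01, seed=1):
    """Apply the estimator to many fake datasets and summarise its performance."""
    rng = random.Random(seed)
    estimates = []
    rejections = 0
    for _ in range(n_sims):
        diff, se = estimate_effect(simulate_dataset(n_areas, true_effect, rng=rng))
        estimates.append(diff)
        # Two-sided test at roughly the 5% level using a normal critical value.
        if abs(diff / se) > 1.96:
            rejections += 1
    return {
        "bias": statistics.fmean(estimates) - true_effect,
        "sd_of_estimates": statistics.stdev(estimates),  # precision
        "power": rejections / n_sims,
    }


if __name__ == "__main__":
    print(check_method())
```

Because the true effect is fixed by the DGP, every summary can be checked against a known answer. Setting `true_effect=0.0` turns the "power" figure into an estimate of the false-positive rate, which is one quick way to confirm the test behaves as expected before any real data arrive.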