
Commit

Merge branch 'main' of github.com:langcog/experimentology
mikabr committed Dec 18, 2023
2 parents 3a08e41 + 8fefaf7 commit 041476a
Showing 3 changed files with 7 additions and 7 deletions.
4 changes: 2 additions & 2 deletions 001-experiments.qmd
@@ -72,7 +72,7 @@ The answer is **randomization**. If you randomly split a large roomful of people

![If you randomly split a large group of people into groups, the groups will, on average, be equal in every way.](images/experiments/money4-drawing.png){#fig-experiments-money4 .column-margin}

So, here's our simple experimental design: we randomly assign some people to a money group and some people to a no-money control group! Then we measure happiness. The basic logic of randomization is that, if money causes happiness, we should see more happiness -- on average -- in the money group.^[You may already be protesting that this experiment could be done better. Maybe we could measure happiness before and after randomization, to increase precision. Maybe we need to give a small amount of money to participants in the control condition to make sure that participants in both conditions interact with an experimenter and hence that the conditions are as similar as possible. We agree! These are important parts of experimental design, and we'll touch on them in subsequent chapters.]
So, here's our simple experimental design: we randomly assign some people to a money group and some people to a no-money control group! (We sometimes call these groups **conditions**). Then we measure the happiness of people in both groups. The basic logic of randomization is that, if money causes happiness, we should see more happiness -- on average -- in the money group.^[You may already be protesting that this experiment could be done better. Maybe we could measure happiness before and after randomization, to increase precision. Maybe we need to give a small amount of money to participants in the control condition to make sure that participants in both conditions interact with an experimenter and hence that the conditions are as similar as possible. We agree! These are important parts of experimental design, and we'll touch on them in subsequent chapters.]

Randomization is a powerful tool, but there is a caveat: it doesn’t work every time. *On average*, randomization will ensure that your money and no-money groups will be equal with respect to confounds like number of friends, educational attainment, and sleep quality. But just as you can flip a coin and sometimes get heads 9 out of 10 times, sometimes you use randomization and still get more highly educated people in one condition than the other. When you randomize, you guarantee that, on average, all confounds are controlled. Hence, there is no systematic bias in your estimate from these confounds. But there will still be some noise from random variation.
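
To make this concrete, here is a minimal R sketch (ours, not part of the chapter's own code; the sample size, the confound, and the variable names are made up for illustration). It shows that a random split balances a confound on average, even though any single split is a little off.

```r
# A minimal sketch (ours, not the chapter's code): does random assignment
# balance a confound like number of friends across conditions?
set.seed(42)

n <- 200
n_friends <- rpois(n, lambda = 5)   # a confound we can't control directly

# one random split into a money and a no-money group
money <- sample(rep(c(TRUE, FALSE), each = n / 2))
mean(n_friends[money]) - mean(n_friends[!money])   # usually small, rarely exactly zero

# across many random splits, the imbalance in friends averages out to ~0
diffs <- replicate(1000, {
  split <- sample(rep(c(TRUE, FALSE), each = n / 2))
  mean(n_friends[split]) - mean(n_friends[!split])
})
mean(diffs)   # close to zero: no systematic bias
sd(diffs)     # but any single split still carries some random imbalance
```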

@@ -142,7 +142,7 @@ There are three things to notice about this simulation, however. First, the nois
Finally, although the small experiments are individually very noisy, the *average effect* across all of the small experiments is still very close to the true effect. This last point illustrates what we mean when we say that randomized experiments remove confounds. Even though friendship is still an important factor determining happiness in our simulation, the average effect across experiments is correct and each individual estimate is unbiased.
:::
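
A rough sketch of this kind of simulation (with assumed effect sizes, not the book's actual code) shows the same pattern: individual small experiments bounce around, but their average lands near the true effect.

```r
# A rough sketch (assumed numbers, not the book's simulation): many small
# randomized experiments in which money truly adds 1 happiness point, while
# friendship also matters but isn't under our control.
set.seed(1)

run_experiment <- function(n_per_group, true_effect = 1) {
  friends   <- rpois(2 * n_per_group, lambda = 5)
  money     <- sample(rep(c(1, 0), each = n_per_group))
  happiness <- 0.5 * friends + true_effect * money + rnorm(2 * n_per_group)
  mean(happiness[money == 1]) - mean(happiness[money == 0])
}

estimates <- replicate(500, run_experiment(n_per_group = 10))
range(estimates)   # individual small experiments are all over the place...
mean(estimates)    # ...but their average is close to the true effect of 1
```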

In sum, randomization is a remarkably simple and effective way of holding everything constant besides a manipulated variable. In doing so, randomization allows experimental psychologists to make unbiased estimates of causal relationships. Importantly, randomization works both when you do have control of every aspect of the experiment -- like when you are baking a cake -- and even when you don't -- like when you are doing experiments with people.
In sum, randomization is a remarkably simple and effective way of holding everything constant besides a manipulated variable. In doing so, randomization allows experimental psychologists to make unbiased estimates of causal relationships. Importantly, randomization works both when you do have control of every aspect of the experiment -- like when you are baking a cake -- and even when you don't -- like when you are doing experiments with people.^[There's an important caveat to this discussion: you don't always have to randomize *people*. You can use an experimental design called a **within-participants** design, in which the same people are in multiple conditions. This type of design has a different set of unknown confounds, this time centering around **time** rather than the person. So, to get around them, you have to randomize the order in which your manipulation is delivered. This randomization works very well for some kinds of manipulations, but not so well for others. For example, the money-happiness experiment we've been talking about won't work very well in this format. We'll talk more about these kinds of designs in @sec-design.]
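
As a toy illustration of order randomization in a within-participants design (our sketch; the condition labels and the number of participants are made up), each participant gets both conditions, with the order shuffled independently per person.

```r
# A toy sketch: randomize the order of conditions separately for each
# participant in a within-participants design (labels are illustrative).
set.seed(7)
conditions <- c("A", "B")
orders <- t(replicate(6, sample(conditions)))   # one shuffled order per participant
orders
```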


## Generalizability
10 changes: 5 additions & 5 deletions 006-inference.qmd
@@ -27,13 +27,13 @@ The move to "go test for significance" before visualizing your data and trying t

In this chapter, we will describe NHST, the conventional method that many students still learn (and many scientists still use) as their primary method for engaging with data. All practicing experimentalists need to understand NHST, both to read the literature and also to apply this method in appropriate situations. For example, NHST may be a reasonable tool for testing whether an intervention leads to a difference between a treatment condition and an appropriate control, although it still doesn't tell you about the size of the intervention effect! But we will also try to contextualize NHST as a very special case of a broader set of modeling and inference strategies. Further, we will continue to flesh out our account of how some of the pathologies of NHST have been a driver of the replication crisis.

![Clarifying the distinctions between Bayesian and Frequentist paradigms and the ways that they approach inference and estimation. For many settings, we think the estimation mindset is more useful. Adapted from @kruschke2018.](images/inference/krushke2.png){#fig-inference-krushke .column-margin}
![Clarifying the distinctions between Bayesian and Frequentist paradigms and the tools they offer for measurement and hypothesis testing. For many settings, we think the measurement mindset is more useful. Adapted from @kruschke2018.](images/inference/krushke3.png){#fig-inference-krushke .column-margin}

What should replace NHST? @fig-inference-krushke shows one way of organizing different inferential approaches. There has been a recent move towards the use of Bayes Factors to quantify the evidence in support of different candidate hypotheses. Bayes Factors can help answer questions like (2). We introduce these tools, and believe that they have broader applicability than the NHST framework and should be known by students. On the other hand, Bayes Factors are not a panacea. They have many of the same problems as NHST when they are applied dichotomously.
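
As a concrete illustration (a sketch assuming the `BayesFactor` R package and simulated data; this is not an example drawn from the chapter itself), a Bayes Factor for a simple two-group comparison can be computed like this:

```r
# A sketch using the BayesFactor package (data are simulated, not real).
library(BayesFactor)

set.seed(123)
treatment <- rnorm(20, mean = 0.5)   # hypothetical treatment-group scores
control   <- rnorm(20, mean = 0)     # hypothetical control-group scores

# The resulting BF10 quantifies evidence: values > 1 favor a group difference,
# values < 1 favor the null of no difference.
ttestBF(x = treatment, y = control)
```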

Instead of dichotomous frequentist or Bayesian inference, we advocate for **estimation** and **modeling** strategies, which are more suited towards questions (3) and (4) [@cumming2014;@kruschke2018]. The goal of these strategies is to yield an accurate and precise estimate of the relationships underlying observed variation in the data.
Instead of dichotomous frequentist or Bayesian hypothesis testing, we advocate for a **measurement** strategy, which is more suited towards questions (3) and (4) [@cumming2014;@kruschke2018]. The goal of these strategies is to yield an accurate and precise estimate of the relationships underlying observed variation in the data.

This isn't a statistics book and we won't attempt to teach the full array of important statistical concepts that will allow students to build good models of a broad array of datasets. (Sorry!).^[If you're interested in going deeper, here are two books that have been really influential for us. The first is @gelman2006b and its successor @gelman2020, which teach regression and multi-level modeling from the perspective of data description. The second is @mcelreath2018, a course on building Bayesian models of the causal structure of your data. Honestly, neither is an easy book to sit down and read (unless you are the kind of person who reads statistics books on the subway for fun) but both really reward detailed study. We encourage you to get together a reading group and go through the exercises together. It'll be well worth while in its impact on your statistical and scientific thinking.] But we do want you to be able to reason about inference and modeling. In this chapter, we'll start by making some inferences about our tea-tasting example from the last chapter, using this example to build up intuitions about inference and estimation. Then in @sec-models, we'll start to look at more sophisticated models and how they can be fit to real datasets.
This isn't a statistics book and we won't attempt to teach the full array of important statistical concepts that will allow students to build good models of a broad array of datasets. (Sorry!).^[If you're interested in going deeper, here are two books that have been really influential for us. The first is @gelman2006b and its successor @gelman2020, which teach regression and multi-level modeling from the perspective of data description. The second is @mcelreath2018, a course on building Bayesian models of the causal structure of your data. Honestly, neither is an easy book to sit down and read (unless you are the kind of person who reads statistics books on the subway for fun) but both really reward detailed study. We encourage you to get together a reading group and go through the exercises together. It'll be well worth while in its impact on your statistical and scientific thinking.] But we do want you to be able to reason about inference and modeling. In this chapter, we'll start by making some inferences about our tea-tasting example from the last chapter, using this example to build up intuitions about hypothesis testing and inference. Then in @sec-models, we'll start to look at more sophisticated models and how they can be fit to real datasets.

![Sampling distribution for the treatment effect in the tea-tasting experiment, given many different repetitions of the same experiment, each with N=9 per group. Circles represent average treatment effects from different individual experiments, while the thick line represents the form of the underlying distribution.](images/inference/sampling-small.png){#fig-inference-sampling-small .column-margin}
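
A sampling distribution like the one in the figure can be approximated with a short simulation (ours, with an assumed true effect and noise level rather than the book's own values): run a two-group "experiment" with N=9 per group many times and collect the difference in mean ratings.

```r
# A minimal sketch (assumed effect size and noise, not the book's code):
# the sampling distribution of a treatment effect with N = 9 per group.
set.seed(2023)

one_experiment <- function(n = 9, true_effect = 1, sd = 2) {
  treatment <- rnorm(n, mean = true_effect, sd = sd)
  control   <- rnorm(n, mean = 0,           sd = sd)
  mean(treatment) - mean(control)
}

effects <- replicate(10000, one_experiment())
hist(effects, breaks = 50,
     main = "Simulated sampling distribution (N = 9 per group)",
     xlab = "Estimated treatment effect")
```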

@@ -165,7 +165,7 @@ As we saw before, the larger the sample size, the smaller the standard error. Th

![Example distribution of treatment effects under the null model for a larger experiment.](images/inference/p-region-bigger.png){#fig-inference-null-model2 .column-margin}

The more participants in the experiment, the tighter the null distribution becomes, and hence the smaller the region in which we should expect a null treatment effect to fall. Because our expectation based on the null becomes more precise, we will be able to reject the null based on smaller treatment effects. In inference of this type, as with estimation, our goals matter. If we're merely testing a hypothesis out of curiosity, perhaps we don't want to measure too many cups of tea. But if we were designing the tea strategy for a major cafe chain, the stakes would be higher and a more precise estimate might be necessary; in that case, maybe we'd want to do a more extensive experiment!
The more participants in the experiment, the tighter the null distribution becomes, and hence the smaller the region in which we should expect a null treatment effect to fall. Because our expectation based on the null becomes more precise, we will be able to reject the null based on smaller treatment effects. In this type of hypothesis testing, as with estimation, our goals matter. If we're merely testing a hypothesis out of curiosity, perhaps we don't want to measure too many cups of tea. But if we were designing the tea strategy for a major cafe chain, the stakes would be higher; in that case, maybe we'd want to do a more extensive experiment!
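
To see the null region shrink, here is a rough sketch (ours; the noise level and the two sample sizes are assumed) comparing the central 95% of null treatment effects for a small and a larger experiment.

```r
# A rough sketch: under the null (no true effect), estimated treatment effects
# cluster more tightly around zero as the per-group sample size grows.
set.seed(99)

null_effects <- function(n_per_group, sd = 2, reps = 10000) {
  replicate(reps, {
    treatment <- rnorm(n_per_group, mean = 0, sd = sd)
    control   <- rnorm(n_per_group, mean = 0, sd = sd)
    mean(treatment) - mean(control)
  })
}

# central 95% "null region" for a small vs. a larger experiment
quantile(null_effects(9),  probs = c(0.025, 0.975))
quantile(null_effects(72), probs = c(0.025, 0.975))
```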

::: {.callout-note title="code"}
We can do a more systematic simulation of the null regions for different sample sizes by simply adding a parameter to our simulation.
@@ -526,7 +526,7 @@ Credible intervals are nice because they don't give rise to many of the inferenc

## Chapter summary: Inference

Inference tools help you move from characteristics of the sample to characteristics of the population. This move is a critical part of generalization from research data. But we hope we've convinced you that inference doesn't have to mean making a binary decision about the presence or absence of an effect. A strategy that seeks to estimate the size of a particular effects and then make inferences about the precision of that estimate is often much more helpful as a building block for theory. As we move towards estimating causal effects in more complex experimental designs, this estimation process will require more sophisticated models. Towards that goal, the next chapter provides some guidance for how to build such models.
Inference tools help you move from characteristics of the sample to characteristics of the population. This move is a critical part of generalization from research data. But we hope we've convinced you that inference doesn't have to mean making a binary decision about the presence or absence of an effect. A strategy that seeks to measure the size of a particular effect and then make inferences about the precision of that estimate is often much more helpful as a building block for theory. As we move towards estimating causal effects in more complex experimental designs, the process will require more sophisticated models. Towards that goal, the next chapter provides some guidance for how to build such models.

::: {.callout-note title="discussion questions"}
1. Can you write the definition of a $p$-value and a Bayes Factor without looking them up? Try this out -- what parts of the definitions did you get wrong?
Binary file added images/inference/krushke3.png
