From dbee72488769c1b92c0493f78d662ffa06952f3e Mon Sep 17 00:00:00 2001
From: debbieyuster
Date: Thu, 26 Dec 2019 22:28:26 -0500
Subject: [PATCH 1/2] Update 030-Variation.Rmd

---
 030-Variation.Rmd | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/030-Variation.Rmd b/030-Variation.Rmd
index ca15e48..2b78c4a 100644
--- a/030-Variation.Rmd
+++ b/030-Variation.Rmd
@@ -43,7 +43,7 @@ Sometimes it's sensible to translate the levels of a categorical variable into
 
 Without a sense of *inbetweenness* of levels, it's arbitrary to assign numbers to the various levels. Except in one situation.
 
-Often, categorical variables have only two levels. Yes or no. Dead or alive. Accepted or rejected. Treatment and control. Such variables are sometimes called *binary* (like the 0/1 of computer bits) or *dicotomous* or *binomial* (meaning, having two names) or even *two-level*. In the previous chapter, we called them *indicator* variables.
+Often, categorical variables have only two levels. Yes or no. Dead or alive. Accepted or rejected. Treatment and control. Such variables are sometimes called *binary* (like the 0/1 of computer bits) or *dichotomous* or *binomial* (meaning, having two names) or even *two-level*. In the previous chapter, we called them *indicator* variables.
 
 When dealing with an indicator variable, there's no level to be inbetween; there are only two levels and the idea of "in between" requires at least three distinct things. So we can easily agree, regardless of our opinions about how the world works, that the difference is zero between labels that are the same (say, *yes* and *yes* or between *no* and *no*). And when the labels are different (say, *yes* and *no*) we just need to assign a non-zero number to the difference.
 

From b37c4d6e1d45e3305703c4fd8980db2a15d8859e Mon Sep 17 00:00:00 2001
From: debbieyuster
Date: Thu, 26 Dec 2019 22:34:31 -0500
Subject: [PATCH 2/2] Update 040-Modeling-Variation.Rmd

---
 040-Modeling-Variation.Rmd | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/040-Modeling-Variation.Rmd b/040-Modeling-Variation.Rmd
index 8a76d48..6269958 100644
--- a/040-Modeling-Variation.Rmd
+++ b/040-Modeling-Variation.Rmd
@@ -8,7 +8,7 @@ library(splines)
 
 The point of statistics is to understand how things vary. For instance, human height varies from one person to another. Some of that variation is associated with the sex of the person: women *tend to be* slightly shorter than men. Some of the variation in height relates to genes and genetic variation, some to differing nutrition and general health, etc.
 
-Statistical models attempt to use the variation in explanatory variables -- sex, genetic traits -- to account for the variation in a response variable. To offer a contemporary example, some automobiles are involved in fatal accidents and some (the vast majority, thankfully!) are not. It varies. What's behind the variation? It could be the weather conditions at the time. It also be human driver fatique, inebriation, incompetence, distraction, etc. It could also be characteristics of the vehicle itself: size, weight, maneuvrability, breaking power, physical wear, automatic breaking, etc. And a lot of the variation is a matter of chance: for instance, the arrival of another car at an intersection at a particular instant.
+Statistical models attempt to use the variation in explanatory variables -- sex, genetic traits -- to account for the variation in a response variable. To offer a contemporary example, some automobiles are involved in fatal accidents and some (the vast majority, thankfully!) are not. It varies. What's behind the variation? It could be the weather conditions at the time. It could also be human driver fatigue, inebriation, incompetence, distraction, etc. It could also be characteristics of the vehicle itself: size, weight, maneuverability, braking power, physical wear, automatic braking, etc. And a lot of the variation is a matter of chance: for instance, the arrival of another car at an intersection at a particular instant.
 
 ## Statistical models
 
@@ -131,7 +131,7 @@ gf_jitter(as.numeric(sex == "F") ~ mother, data = Galton,
   gf_labs(y = "Child's sex", x = "Mother's height (inches)")
 ```
 
-Again, the model output is numeric, in the form of the probability that the child is female. The model suggests that 60-inch tall mothers are slightly less likely to bear girls and 68-inch tall mothers. Common sense suggests that a baby's sex is not influenced by the mother's height. Correspondingly, the model output is around 50% regardless of the mother's height.
+Again, the model output is numeric, in the form of the probability that the child is female. The model suggests that 60-inch tall mothers are slightly less likely to bear girls than 68-inch tall mothers. Common sense suggests that a baby's sex is not influenced by the mother's height. Correspondingly, the model output is around 50% regardless of the mother's height.
 
 Perhaps you're surprised to see that there is any slope at all to the function. Don't be surprised yet, because we haven't shown that such a statement is justified by the data: we have a setting for inference but have not yet carried out the inference calculations to tell us if the statement is justified.
 
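For readers following the quoted passages, here is a minimal R sketch of the two ideas they describe: encoding a two-level (indicator) variable as 0/1, and modeling the probability that a child is female from the mother's height. It is not part of the patches above. It assumes the `Galton` data frame from the mosaicData package (the same data the quoted `gf_jitter()` call uses), and it picks logistic regression via `glm()` as the model; that choice is an assumption, since the book's own model-fitting code does not appear in these hunks.

```r
# Minimal sketch, not the book's code; assumes the mosaicData package is installed.
library(mosaicData)   # provides Galton: family, father, mother, sex, height, ...

# Indicator (two-level) variable: 1 if the child is female, 0 otherwise
Galton$girl <- as.numeric(Galton$sex == "F")

# One way to model the probability of "girl" from the mother's height:
# logistic regression (chosen here only for illustration)
mod <- glm(girl ~ mother, data = Galton, family = binomial)

# Model output for 60-inch and 68-inch mothers; per the quoted passage,
# both values should come out close to 50%
predict(mod, newdata = data.frame(mother = c(60, 68)), type = "response")
```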