fixed minor typos #1

Open: wants to merge 2 commits into base branch master
030-Variation.Rmd (2 changes: 1 addition, 1 deletion)
@@ -43,7 +43,7 @@ Sometimes it's sensible to translate the levels of a categorical variable into

Without a sense of *inbetweenness* of levels, it's arbitrary to assign numbers to the various levels. Except in one situation.

- Often, categorical variables have only two levels. Yes or no. Dead or alive. Accepted or rejected. Treatment and control. Such variables are sometimes called *binary* (like the 0/1 of computer bits) or *dicotomous* or *binomial* (meaning, having two names) or even *two-level*. In the previous chapter, we called them *indicator* variables.
+ Often, categorical variables have only two levels. Yes or no. Dead or alive. Accepted or rejected. Treatment and control. Such variables are sometimes called *binary* (like the 0/1 of computer bits) or *dichotomous* or *binomial* (meaning, having two names) or even *two-level*. In the previous chapter, we called them *indicator* variables.

When dealing with an indicator variable, there's no level to be inbetween; there are only two levels and the idea of "in between" requires at least three distinct things. So we can easily agree, regardless of our opinions about how the world works, that the difference is zero between labels that are the same (say, *yes* and *yes* or between *no* and *no*). And when the labels are different (say, *yes* and *no*) we just need to assign a non-zero number to the difference.

040-Modeling-Variation.Rmd (4 changes: 2 additions, 2 deletions)
@@ -8,7 +8,7 @@ library(splines)

The point of statistics is to understand how things vary. For instance, human height varies from one person to another. Some of that variation is associated with the sex of the person: women *tend to be* slightly shorter than men. Some of the variation in height relates to genes and genetic variation, some to differing nutrition and general health, etc.

- Statistical models attempt to use the variation in explanatory variables -- sex, genetic traits -- to account for the variation in a response variable. To offer a contemporary example, some automobiles are involved in fatal accidents and some (the vast majority, thankfully!) are not. It varies. What's behind the variation? It could be the weather conditions at the time. It also be human driver fatique, inebriation, incompetence, distraction, etc. It could also be characteristics of the vehicle itself: size, weight, maneuvrability, breaking power, physical wear, automatic breaking, etc. And a lot of the variation is a matter of chance: for instance, the arrival of another car at an intersection at a particular instant.
+ Statistical models attempt to use the variation in explanatory variables -- sex, genetic traits -- to account for the variation in a response variable. To offer a contemporary example, some automobiles are involved in fatal accidents and some (the vast majority, thankfully!) are not. It varies. What's behind the variation? It could be the weather conditions at the time. It could also be human driver fatigue, inebriation, incompetence, distraction, etc. It could also be characteristics of the vehicle itself: size, weight, maneuverability, braking power, physical wear, automatic braking, etc. And a lot of the variation is a matter of chance: for instance, the arrival of another car at an intersection at a particular instant.

## Statistical models

@@ -131,7 +131,7 @@ gf_jitter(as.numeric(sex == "F") ~ mother, data = Galton,
gf_labs(y = "Child's sex", x = "Mother's height (inches)")
```

- Again, the model output is numeric, in the form of the probability that the child is female. The model suggests that 60-inch tall mothers are slightly less likely to bear girls and 68-inch tall mothers. Common sense suggests that a baby's sex is not influenced by the mother's height. Correspondingly, the model output is around 50% regardless of the mother's height.
+ Again, the model output is numeric, in the form of the probability that the child is female. The model suggests that 60-inch tall mothers are slightly less likely to bear girls than 68-inch tall mothers. Common sense suggests that a baby's sex is not influenced by the mother's height. Correspondingly, the model output is around 50% regardless of the mother's height.

Perhaps you're surprised to see that there is any slope at all to the function. Don't be surprised yet, because we haven't shown that such a statement is justified by the data: we have a setting for inference but have not yet carried out the inference calculations to tell us if the statement is justified.
