Lesson15.Rmd

---
title: "Lesson 15: Review for Exam 2"
output: 
  html_document:
    theme: cerulean
    toc: true
    toc_float: false
---

<script type="text/javascript">
 function showhide(id) {
    var e = document.getElementById(id);
    e.style.display = (e.style.display == 'block') ? 'none' : 'block';
 }
</script>


This is not designed to be a comprehensive review.  There may be items on the exam that are not covered in this review.  Similarly, there may be items in this review that are not tested on this exam.  You are strongly encouraged to review the readings, homework exercises, and other activities from Units 1 and 2 as you prepare for the exam. In particular, you should go over the [Lesson 8: Review for Exam 1](Lesson08.html). 

<!-- Use the <span style="font-variant:small-caps"> -->
<!-- [[Index]]</span> to review definitions of important terms. 
Needs to be built-->

## Unit 2 Lesson Outcomes 

<a href="javascript:showhide('oc')"><span style="font-size:8pt;">Show/Hide Outcomes</span></a>
<div id="oc" style="display:none;">
The expectation on the exam is the following outcomes. You should be able to do:

- All of the Outcomes from [Lesson 08 (Unit 1)](Lesson08.html)

- Confidence Intervals: 
    + Calculate and interpret a confidence interval given a confidence level and a given parameter.
    + Explain how the margin of error changes with the sample size and the level of confidence. 
    + Identify a point estimate and margin of error for a given confidence interval.
    + Show the appropriate connections between the numerical and graphical summaries that support a confidence interval. 
    + Check the requirements for a confidence interval.
    + Determining correct confidence interval for a given scenario

- Calculate a confidence interval for 
    + A single mean with $\sigma$ known.
    + A single mean with $\sigma$ unknown.
    + The mean of differences with dependent samples.
    + Difference of two means of independent samples.

- Hypothesis Testing
    + State the null and alternative hypothesis for the chosen test. 
    + Calculate the test-statistic, degrees of freedom and p-value of the hypothesis test.
    + Assess the statistical significance by comparing the p-value to the α-level.
    + Check the requirements for the hypothesis test.
    + Show the appropriate connections between the numerical and graphical summaries that support a hypothesis test. 
    + Draw a correct conclusion for a hypothesis test.
    + Interpret a Type I and II error.
    + Determining correct hypothesis test for a given scenario. 

- Conduct a Hypothesis Test for
    + A single mean with σ known.
    + A single mean with σ unknown.
    + Difference of two means with dependent samples.
    + The mean of differences with independent samples.
    + Several means (ANOVA).
</div>
<br>

## Unit 2 Lesson Summaries

Here are the summaries for each lesson in Unit 2. Reviewing these key points from each lesson will help you in your preparation for the exam.

<a href="javascript:showhide('ls')"><span style="font-size:8pt;">Show/Hide Summaries</span></a>
<div id="ls">
<br>
<div class="RecapHeading">Lesson 09 Recap</div>
<div class="Summary">

- The **null hypothesis ($H_0$)** is the foundational assumption about a population and represents the status quo. It is a statement of equality ($=$). The **alternative hypothesis ($H_a$)** is a different assumption about a population and is a statement of inequality ($<$, $>$, or $\ne$). Using a **hypothesis test**, we determine whether it is more likely that the null hypothesis or the alternative hypothesis is true.

- The **$P$-value** is the probability of getting a test statistic at least as extreme as the one you got, assuming $H_0$ is true. A $P$-value is calculated by finding the area under the normal distribution curve that is more extreme (farther away from the mean) than the z-score. The alternative hypothesis tells us whether we look at both tails or only one.

- The **level of significance ($\alpha$)** is the standard for determining whether or not the null hypothesis should be rejected. Typical values for $\alpha$ are $0.05$, $0.10$, and $0.01$. If the $P$-value is less than $\alpha$ we reject the null. If the $P$-value is not less than $\alpha$ we fail to reject the null.

- A **Type I error** is committed when we reject a null hypothesis that is, in reality, true. A **Type II error** is committed when we fail to reject a null hypothesis that is, in reality, not true. The value of $\alpha$ is the probability of committing a Type I error.
<br>
</div>
<br>

<div class="RecapHeading">Lesson 10 Recap</div>
<div class="Summary">

- The **margin of error** gives an estimate of the variability of responses. It is calculated as $\displaystyle{m=z^*\frac{\sigma}{\sqrt{n}}}$ where $z^*$ represents a calculated z-score corresponding to a particular confidence level.

- A **confidence interval** is an interval estimator used to give a range of plausible values for a parameter. The width of a confidence interval depends on the chosen confidence level (and its corresponding value of $z^*$) as well as the sample size ($n$). This is the equation for calculating confidence intervals:
$$\displaystyle{\left(\bar x-z^*\frac{\sigma}{\sqrt{n}},~\bar x+z^*\frac{\sigma}{\sqrt{n}}\right)}$$

- The **sample size formula** allows us to estimate the number of observations required to obtain a specific margin of error. $\displaystyle{n=\left(\frac{z^*\sigma}{m}\right)^2}$
<br>
</div>
<br>

<div class="RecapHeading">Lesson 11 Recap</div>
<div class="Summary">

- In practice we rarely know the true standard deviation $\sigma$ and will therefore be unable to calculate a z-score. **Student's t-distribution** gives us a new test statistic, $t$, that is calculated using the sample standard deviation ($s$) instead.
$$ \displaystyle{ t = \frac {\bar x - \mu} {s / \sqrt{n}} } $$

- The $t$-distribution is similar to a normal distribution in that it is bell-shaped and symmetrical, but the exact shape of the $t$-distribution depends on the **degrees of freedom ($df$)**. 
$$df=n-1$$

- You will use Excel to carry out hypothesis testing and create confidence intervals involving $t$-distributions.
<br>
</div>
<br>

<div class="RecapHeading">Lesson 12 Recap</div>
<div class="Summary">

- The key characteristic of **dependent samples** (or **matched pairs**) is that knowing which subjects will be in group 1 determines which subjects will be in group 2.

- We use slightly different variables when conducting inference using dependent samples:

    Group 1 values: $x_1$&nbsp;&nbsp;Group 2 values: $x_2$&nbsp;&nbsp;Differences: $d$&nbsp;&nbsp;Population mean: $\mu_d$&nbsp;&nbsp;Sample mean: $\bar d$&nbsp;&nbsp;Sample standard deviation: $s_d$

- When conducting hypothesis tests using dependent samples, the null hypothesis is always $\mu_d=0$, indicating that there is no change between the first population and the second population. The alternative hypothesis can be left-tailed ($<$), right-tailed($>$), or two-tailed($\ne$).
<br>
</div>
<br>

<div class="RecapHeading">Lesson 13 Recap</div>
<div class="Summary">

- In contrast to dependent samples, two samples are independent if knowing which subjects are in group 1 tells you nothing about which subjects will be in group 2. With **independent samples**, there is no pairing between the groups.

- When conducting inference using independent samples we use $\bar x_1$, $s_1$, and $n_1$ for the mean, standard deviation, and sample size, respectively, of group 1. We use the symbols $\bar x_2$, $s_2$, and $n_2$ for group 2.

- When working with independent samples it is important to graphically illustrate each sample separately. Combining the groups to create a single graph is not appropriate.

- When conducting hypothesis tests using independent samples, the null hypothesis is always $\mu_1=\mu_2$, indicating that there is no difference between the two populations. The alternative hypothesis can be left-tailed ($<$), right-tailed($>$), or two-tailed($\ne$).

- Whenever zero is contained in the confidence interval of the difference of the true means we conclude that there is no significant difference between the two populations.
<br>
</div>
<br>

<div class="RecapHeading">Lesson 14 Recap</div>
<div class="Summary">

**ANOVA** is used to compare the means for several groups. The hypotheses for the test are always:
$$
\begin{align}
H_0: & ~ \textrm{All the means are equal} \\
H_a: & ~ \textrm{At least one of the means differs}
\end{align}
$$

- For ANOVA testing we use an **$F$-distribution**, which is right-skewed. The $P$-value of an ANOVA test is always the area to the right of the $F$-statistic.

- We can conduct ANOVA testing when the following three requirements are satisfied:
    1. The data come from a simple random sample.
    2. The data are normally distributed within each group.
        - This is considered met unless one or more of the groups has a *strongly* skewed distribution.
    3. The variance is constant.
        - This is satisfied when the largest variance is not more than four times the smallest variance.
<br>
</div>
<br>
</div>
<br>

## Navigation

<center>
| **Previous Reading** | **This Reading** | **Next Reading** |
| :------------------: | :--------------: | :--------------: |
| [Lesson 14: <br> Inference for Several Means (ANOVA)](Lesson14.html) | Lesson 15: <br> Review for Exam 2 | [Lesson 16: <br> Describing Categorical Data: Proportions; Sampling Distribution of a Sample Proportion](Lesson16.html) |
</center>