From this section, students are expected to be able to:
- Describe real-world examples of questions that can be answered with the statistical inference methods presented in this course (e.g., estimation, hypothesis testing).
- Name common population parameters (mean, proportion, median, variance, standard deviation, and correlation) that are often estimated using sample data, and write computer scripts to calculate estimates of these parameters.
- Define the following terms in relation to statistical inference: population, sample, population parameters, estimate, sampling distribution, sample distribution.
- Write an R script to draw random samples from a finite population (e.g., census data).
- Write an R script to reveal a sampling distribution from a finite population.
From this section, students are expected to be able to:
- Compare and contrast quantitative and categorical variables.
- Explain random and representative sampling and how they can influence estimation.
- Define random variables and explain how they relate to sampling.
- Define standard error and explain its purpose.
- Compare and contrast population distribution, sample distribution and an estimator's sampling distribution.
- Explain what a sampling distribution is, list its properties, and describe its purpose in statistical inference.
From this section, students are expected to be able to:
- Explain why the sampling distribution is not known or available in practice.
- Define bootstrapping.
- Write a computer script to create a bootstrap distribution to approximate a sampling distribution.
- Contrast the bootstrap sampling distribution with an assumed sampling distribution.
- Estimate the standard error of an estimator using bootstrapping.
From this section, students are expected to be able to:
- Define and calculate sample quantiles.
- Define a confidence interval and explain why we want to generate one.
- Explain how the bootstrap sampling distribution can be used to create confidence intervals.
- Write a computer script to calculate confidence intervals for a population parameter using bootstrapping.
- Effectively visualize point estimates and confidence intervals.
- Interpret and explain results from confidence intervals.
- Discuss the potential limitations of these methods.
This week, students will write a midterm exam and begin working on their projects. From this section, students are also expected to be able to:
- Propose parameters that are useful, given the type of data.
- Propose parameters that are useful, given a question.
- Choose an appropriate way to present estimator uncertainty, given a question, by comparing and contrasting the usefulness of confidence intervals vs. standard error.
From this section, students are expected to be able to:
- Give an example of a question you could answer with a hypothesis test.
- Differentiate composite vs. simple hypotheses.
- Given an inferential question, formulate null and alternative hypotheses to be used in a hypothesis test.
- Identify the steps and components of a basic hypothesis test ("there is only one hypothesis test").
- Write computer scripts to perform hypothesis testing via simulation, randomization, and bootstrapping approaches, and interpret the output.
- Describe the relationship between confidence intervals and hypothesis testing.
- Discuss the potential limitations of this simulation approach to hypothesis testing.
Week 7: Confidence Intervals (of means and proportions) Based on the Assumption of Normality or the Central Limit Theorem
From this section, students are expected to be able to:
- Describe the Law of Large Numbers.
- Describe a normal distribution.
- Explain the Central Limit Theorem and its role in constructing confidence intervals.
- Write a computer script to calculate confidence intervals based on the assumption of normality / the Central Limit Theorem.
- Discuss the potential limitations of these methods.
- Decide whether to use asymptotic theory or bootstrapping to compute estimator uncertainty.
From this section, students are expected to be able to:
- Describe a t-distribution and its relationship with the normal distribution.
- Use results from the assumption of normality or the Central Limit Theorem to perform estimation and hypothesis testing.
- Compare and contrast how estimation and hypothesis testing differ between simulation- and resampling-based approaches and approaches based on the assumption of normality or the Central Limit Theorem.
- Write a computer script to perform hypothesis testing based on results from the assumption of normality or the Central Limit Theorem.
- Discuss the potential limitations of these methods.
From this section, students are expected to be able to:
- Define type I and type II errors.
- Describe responsible use and reporting of p-values from hypothesis tests.
- Discuss how these errors are linked to a "reproducibility crisis".
- Measure how these errors are amplified when multiple hypothesis tests are performed (the multiple comparisons problem).
From this section, students are expected to be able to:
- Run a simple one-way ANOVA, without knowing the details of the test.
- Apply a false discovery rate (FDR) or Bonferroni correction to control errors when performing multiple hypothesis tests.
- Explain the value of presenting an entire distribution as a prediction.
- Estimate a population distribution using simulation (if time permits). Example: wedding planning, https://www.tomasbeuzen.com/post/party-planning-probability/
- Calculate, interpret, and visualize prediction intervals.
This week is designed for independent study: students work on a project that aims to answer an inferential question using the material they have learned in weeks 1-11.