Merge pull request #424 from UBC-DSCI/dev

update master with dev
UBC-DSCI · Feb 5, 2022 · 8239813
2 parents bb5742b + aebdc0e
commit 8239813

Showing 18 changed files with 143 additions and 98 deletions.
diff --git a/Dockerfile b/Dockerfile
@@ -102,7 +102,6 @@ RUN sed -i 's/256MiB/4GiB/' /etc/ImageMagick-6/policy.xml
 RUN sed -i 's/512MiB/4GiB/' /etc/ImageMagick-6/policy.xml
 RUN sed -i 's/1GiB/4GiB/' /etc/ImageMagick-6/policy.xml
 
-
 # install version of tinytex with fixed index double-compile (no release for this yet, so install from commit hash)
 RUN Rscript -e "remove.packages('xfun')"
 RUN Rscript -e "devtools::install_github('yihui/[email protected]')"

diff --git a/acknowledgements.Rmd b/acknowledgements.Rmd
@@ -19,7 +19,7 @@ Rohan Alexander, Isabella Ghement, Virgilio Gómez Rubio, Albert Kim, Adam Loy,
 The book was improved substantially by their insights.
 We would like to give special thanks to Jim Zidek
 for his support and encouragement throughout the process, and to
-Roger Peng for graciously offering to write the foreword.
+Roger Peng for graciously offering to write the Foreword.
 
 Finally, we owe a debt of gratitude to all of the students of DSCI 100 over the past
 few years. They provided invaluable feedback on the book and worksheets; 

diff --git a/build_html.sh b/build_html.sh
@@ -1,2 +1,2 @@
 # Script to generate HTML book
-docker run --rm -m 5g -v $(pwd):/home/rstudio/introduction-to-datascience ubcdsci/intro-to-ds:v0.21.0 /bin/bash -c "cd /home/rstudio/introduction-to-datascience; Rscript _build_html.r"
+docker run --rm -m 5g -v $(pwd):/home/rstudio/introduction-to-datascience ubcdsci/intro-to-ds:v0.22.0 /bin/bash -c "cd /home/rstudio/introduction-to-datascience; Rscript _build_html.r"
diff --git a/build_pdf.sh b/build_pdf.sh
@@ -25,7 +25,7 @@ cp -r data/ pdf/data
 cp -r img/ pdf/img
 
 # Build the book with bookdown
-docker run --rm -m 5g -v $(pwd):/home/rstudio/introduction-to-datascience ubcdsci/intro-to-ds:v0.21.0 /bin/bash -c "cd /home/rstudio/introduction-to-datascience/pdf; Rscript _build_pdf.r"
+docker run --rm -m 5g -v $(pwd):/home/rstudio/introduction-to-datascience ubcdsci/intro-to-ds:v0.22.0 /bin/bash -c "cd /home/rstudio/introduction-to-datascience/pdf; Rscript _build_pdf.r"
 
 # clean files in pdf dir
 rm -rf pdf/references.bib

diff --git a/classification1.Rmd b/classification1.Rmd
@@ -1415,9 +1415,14 @@ wkflw_plot
 ## Exercises
 
 Practice exercises for the material covered in this chapter 
-can be found in the accompanying [worksheet](https://github.com/UBC-DSCI/data-science-a-first-intro-worksheets/blob/main/worksheet_classification1/worksheet_classification1.ipynb).
-The worksheet tries to provide automated feedback 
-and help guide you through the problems. 
-To make sure this functionality works as intended, 
-please follow the instructions for computer setup needed to run the worksheets 
-found in Chapter \@ref(move-to-your-own-machine).
+can be found in the accompanying 
+[worksheets repository](https://github.com/UBC-DSCI/data-science-a-first-intro-worksheets#readme)
+in the "Classification I: training and predicting" row.
+You can launch an interactive version of the worksheet in your browser by clicking the "launch binder" button.
+You can also preview a non-interactive version of the worksheet by clicking "view worksheet."
+If you instead decide to download the worksheet and run it on your own machine,
+make sure to follow the instructions for computer setup
+found in Chapter \@ref(move-to-your-own-machine). This will ensure that the automated feedback
+and guidance that the worksheets provide will function as intended.
+
+
diff --git a/classification2.Rmd b/classification2.Rmd
@@ -1386,12 +1386,17 @@ fwd_sel_accuracies_plot
 ## Exercises
 
 Practice exercises for the material covered in this chapter 
-can be found in the accompanying [worksheet](https://github.com/UBC-DSCI/data-science-a-first-intro-worksheets/blob/main/worksheet_classification2/worksheet_classification2.ipynb).
-The worksheet tries to provide automated feedback 
-and help guide you through the problems. 
-To make sure this functionality works as intended, 
-please follow the instructions for computer setup needed to run the worksheets 
-found in Chapter \@ref(move-to-your-own-machine).
+can be found in the accompanying 
+[worksheets repository](https://github.com/UBC-DSCI/data-science-a-first-intro-worksheets#readme)
+in the "Classification II: evaluation and tuning" row.
+You can launch an interactive version of the worksheet in your browser by clicking the "launch binder" button.
+You can also preview a non-interactive version of the worksheet by clicking "view worksheet."
+If you instead decide to download the worksheet and run it on your own machine,
+make sure to follow the instructions for computer setup
+found in Chapter \@ref(move-to-your-own-machine). This will ensure that the automated feedback
+and guidance that the worksheets provide will function as intended.
+
+
 
 ## Additional resources
 - The [`tidymodels` website](https://tidymodels.org/packages) is an excellent

diff --git a/clustering.Rmd b/clustering.Rmd
@@ -1103,12 +1103,15 @@ elbow_plot
 ## Exercises
 
 Practice exercises for the material covered in this chapter 
-can be found in the accompanying [worksheet](https://github.com/UBC-DSCI/data-science-a-first-intro-worksheets/blob/main/worksheet_clustering/worksheet_clustering.ipynb).
-The worksheet tries to provide automated feedback 
-and help guide you through the problems. 
-To make sure this functionality works as intended, 
-please follow the instructions for computer setup needed to run the worksheets 
-found in Chapter \@ref(move-to-your-own-machine).
+can be found in the accompanying 
+[worksheets repository](https://github.com/UBC-DSCI/data-science-a-first-intro-worksheets#readme)
+in the "Clustering" row.
+You can launch an interactive version of the worksheet in your browser by clicking the "launch binder" button.
+You can also preview a non-interactive version of the worksheet by clicking "view worksheet."
+If you instead decide to download the worksheet and run it on your own machine,
+make sure to follow the instructions for computer setup
+found in Chapter \@ref(move-to-your-own-machine). This will ensure that the automated feedback
+and guidance that the worksheets provide will function as intended.
 
 ## Additional resources
 - Chapter 10 of *An Introduction to Statistical

diff --git a/inference.Rmd b/inference.Rmd
@@ -1169,14 +1169,15 @@ statistical techniques you may learn about in the future!
 ## Exercises
 
 Practice exercises for the material covered in this chapter 
-can be found in the two accompanying worksheets
-([first worksheet](https://github.com/UBC-DSCI/data-science-a-first-intro-worksheets/blob/main/worksheet_inference1/worksheet_inference1.ipynb) 
-and [second worksheet](https://github.com/UBC-DSCI/data-science-a-first-intro-worksheets/blob/main/worksheet_inference2/worksheet_inference2.ipynb)).
-The worksheets try to provide automated feedback 
-and help guide you through the problems. 
-To make sure this functionality works as intended, 
-please follow the instructions for computer setup needed to run the worksheets 
-found in Chapter \@ref(move-to-your-own-machine).
+can be found in the accompanying 
+[worksheets repository](https://github.com/UBC-DSCI/data-science-a-first-intro-worksheets#readme)
+in the two "Statistical inference" rows.
+You can launch an interactive version of each worksheet in your browser by clicking the "launch binder" button.
+You can also preview a non-interactive version of each worksheet by clicking "view worksheet."
+If you instead decide to download the worksheets and run them on your own machine,
+make sure to follow the instructions for computer setup
+found in Chapter \@ref(move-to-your-own-machine). This will ensure that the automated feedback
+and guidance that the worksheets provide will function as intended.
 
 ## Additional resources
 

diff --git a/intro.Rmd b/intro.Rmd
@@ -528,7 +528,7 @@ image_read("img/ggplot_function.jpeg") |>
   image_crop("1625x1900")
 ```
 
-```{r barplot-mother-tongue, fig.width=5, fig.height=3, warning=FALSE, fig.cap = "Bar plot of the ten Aboriginal languages most often reported by Canadian residents as their mother tongue."}
+```{r barplot-mother-tongue, fig.width=5, fig.height=3, warning=FALSE, fig.cap = "Bar plot of the ten Aboriginal languages most often reported by Canadian residents as their mother tongue. Note that this visualization is not done yet; there are still improvements to be made."}
 ggplot(ten_lang, aes(x = language, y = mother_tongue)) +
   geom_bar(stat = "identity")
 ```
@@ -567,7 +567,7 @@ words (e.g. `"Mother Tongue (Number of Canadian Residents)"`) as arguments to
 layers to format the plot further, and we will explore these in Chapter
 \@ref(viz).
 
-(ref:barplot-mother-tongue-labs) Bar plot of the ten Aboriginal languages most often reported by Canadian residents as their mother tongue with x and y labels.
+(ref:barplot-mother-tongue-labs) Bar plot of the ten Aboriginal languages most often reported by Canadian residents as their mother tongue with x and y labels. Note that this visualization is not done yet; there are still improvements to be made.
 
 ```{r barplot-mother-tongue-labs, fig.width=5, fig.height=3.6, warning=FALSE, fig.cap = "(ref:barplot-mother-tongue-labs)", fig.pos = "H", out.extra=""}
 ggplot(ten_lang, aes(x = language, y = mother_tongue)) +
@@ -583,7 +583,7 @@ currently making it difficult to read the different language names.
 One solution is to rotate the plot such that the bars are horizontal rather than vertical.
 To accomplish this, we will swap the x and y coordinate axes:
 
-```{r barplot-mother-tongue-flipped, fig.width=5, fig.height=3, fig.pos = "H", out.extra="", warning=FALSE, fig.cap = "Horizontal bar plot of the ten Aboriginal languages most often reported by Canadian residents as their mother tongue."}
+```{r barplot-mother-tongue-flipped, fig.width=5, fig.height=3, fig.pos = "H", out.extra="", warning=FALSE, fig.cap = "Horizontal bar plot of the ten Aboriginal languages most often reported by Canadian residents as their mother tongue. There are no more serious issues with this visualization, but it could be refined further."}
 ggplot(ten_lang, aes(x = mother_tongue, y = language)) +
   geom_bar(stat = "identity") +
   xlab("Mother Tongue (Number of Canadian Residents)") +
@@ -704,9 +704,12 @@ knitr::include_graphics("img/help-filter.png")
 ## Exercises
 
 Practice exercises for the material covered in this chapter 
-can be found in the accompanying [worksheet](https://github.com/UBC-DSCI/data-science-a-first-intro-worksheets/blob/main/worksheet_intro/worksheet_intro.ipynb).
-The worksheet tries to provide automated feedback 
-and help guide you through the problems. 
-To make sure this functionality works as intended, 
-please follow the instructions for computer setup needed to run the worksheets 
-found in Chapter \@ref(move-to-your-own-machine).
+can be found in the accompanying 
+[worksheets repository](https://github.com/UBC-DSCI/data-science-a-first-intro-worksheets#readme)
+in the "R and the tidyverse" row.
+You can launch an interactive version of the worksheet in your browser by clicking the "launch binder" button.
+You can also preview a non-interactive version of the worksheet by clicking "view worksheet."
+If you instead decide to download the worksheet and run it on your own machine,
+make sure to follow the instructions for computer setup
+found in Chapter \@ref(move-to-your-own-machine). This will ensure that the automated feedback
+and guidance that the worksheets provide will function as intended.
diff --git a/pdf/latex/before_body.tex b/pdf/latex/before_body.tex
@@ -18,5 +18,7 @@
 %\includegraphics{images/dedication.pdf}
 \end{center}
 
+\cleardoublepage\newpage\thispagestyle{empty}\null
+
 \setlength{\abovedisplayskip}{-5pt}
 \setlength{\abovedisplayshortskip}{-5pt}
diff --git a/preface-text.Rmd b/preface-text.Rmd
@@ -53,11 +53,11 @@ to help you practice the concepts you will learn. We strongly recommend that you
 work through the worksheet when you finish reading each chapter 
 before moving on to the next chapter. All of the worksheets
 are available at 
-[https://ubc-dsci.github.io/data-science-a-first-intro-worksheets](https://ubc-dsci.github.io/data-science-a-first-intro-worksheets);
+[https://github.com/UBC-DSCI/data-science-a-first-intro-worksheets#readme](https://github.com/UBC-DSCI/data-science-a-first-intro-worksheets#readme);
 the "Exercises" section at the end of each chapter points you to the right worksheet for that chapter.
-The worksheets are designed to provide automated feedback and help guide you through the problems.
-To make sure that functionality works as intended, make sure to follow the setup directions 
-in Chapter \@ref(move-to-your-own-machine) regarding downloading the worksheets.
-
-
-
+For each worksheet, you can either launch an interactive version of the worksheet in your browser by clicking the "launch binder" button,
+or preview a non-interactive version of the worksheet by clicking "view worksheet."
+If you instead decide to download the worksheet and run it on your own machine,
+make sure to follow the instructions for computer setup
+found in Chapter \@ref(move-to-your-own-machine). This will ensure that the automated feedback
+and guidance that the worksheets provide will function as intended.
diff --git a/reading.Rmd b/reading.Rmd
@@ -971,8 +971,8 @@ knitr::include_graphics("img/sg1.png")
 ```
 
 If we then click the size of an apartment listing, SelectorGadget shows us
-the `span` selector, and highlights much of the page; this indicates that the
-`span` selector is not specific enough to just capture apartment sizes (Figure \@ref(fig:sg3)). 
+the `span` selector, and highlights many of the lines on the page; this indicates that the
+`span` selector is not specific enough to capture only apartment sizes (Figure \@ref(fig:sg3)). 
 
 ```{r sg3, echo = FALSE, message = FALSE, warning = FALSE, fig.cap = "Using the SelectorGadget on a Craigslist webpage to obtain a CCS selector useful for obtaining apartment sizes.", fig.retina = 2, out.width="100%"}
 knitr::include_graphics("img/sg3.png")
@@ -994,7 +994,6 @@ The selector gadget returns them to us as a comma-separated list (here
 R if we are using more than one CSS selector.
 
 **Stop! Are you allowed to scrape that website?**
-
 *Before* scraping \index{web scraping!permission} data from the web, you should always check whether or not
 you are *allowed* to scrape it! There are two documents that are important
 for this: the `robots.txt` file and the Terms of Service
@@ -1242,12 +1241,15 @@ data you are requesting and how frequently you are making requests.
 ## Exercises
 
 Practice exercises for the material covered in this chapter 
-can be found in the accompanying [worksheet](https://github.com/UBC-DSCI/data-science-a-first-intro-worksheets/blob/main/worksheet_reading/worksheet_reading.ipynb).
-The worksheet tries to provide automated feedback 
-and help guide you through the problems. 
-To make sure this functionality works as intended, 
-please follow the instructions for computer setup needed to run the worksheets 
-found in Chapter \@ref(move-to-your-own-machine).
+can be found in the accompanying 
+[worksheets repository](https://github.com/UBC-DSCI/data-science-a-first-intro-worksheets#readme)
+in the "Reading in data locally and from the web" row.
+You can launch an interactive version of the worksheet in your browser by clicking the "launch binder" button.
+You can also preview a non-interactive version of the worksheet by clicking "view worksheet."
+If you instead decide to download the worksheet and run it on your own machine,
+make sure to follow the instructions for computer setup
+found in Chapter \@ref(move-to-your-own-machine). This will ensure that the automated feedback
+and guidance that the worksheets provide will function as intended.
 
 ## Additional resources
 - The [`readr` documentation](https://readr.tidyverse.org/) 

diff --git a/regression1.Rmd b/regression1.Rmd
@@ -93,16 +93,18 @@ is that we are now predicting numerical variables instead of categorical variabl
 
 \newpage
 
-> **Note:** You can usually tell whether a \index{categorical variable}\index{numerical variable}
-> variable is numerical or categorical&mdash;and therefore whether you
-> need to perform regression or classification&mdash;by taking two response variables X and Y from your
-> data, and asking the question, "is response variable X *more* than response variable Y?"
-> If the variable is categorical, the question will make no sense (Is blue more than red?
-> Is benign more than malignant?). If the variable is numerical, it will make sense
-> (Is 1.5 hours more than 2.25 hours? Is \$500,000 more than \$400,000?).
-> Be careful when applying this heuristic, though: sometimes categorical variables will be encoded as
-> numbers in your data (e.g., "1" represents "benign", and "0" represents "malignant"). In these cases
-> you have to ask the question about the *meaning* of the labels ("benign" and "malignant"), not their values ("1" and "0"). 
+> **Note:** You can usually tell whether a\index{categorical variable}\index{numerical variable} variable is numerical or
+> categorical&mdash;and therefore whether you need to perform regression or
+> classification&mdash;by taking two response variables X and Y from your data,
+> and asking the question, "is response variable X *more* than response
+> variable Y?" If the variable is categorical, the question will make no sense.
+> (Is blue more than red?  Is benign more than malignant?) If the variable is
+> numerical, it will make sense. (Is 1.5 hours more than 2.25 hours? Is
+> \$500,000 more than \$400,000?) Be careful when applying this heuristic,
+> though: sometimes categorical variables will be encoded as numbers in your
+> data (e.g., "1" represents "benign", and "0" represents "malignant"). In
+> these cases you have to ask the question about the *meaning* of the labels
+> ("benign" and "malignant"), not their values ("1" and "0"). 
 
 ## Exploring a data set
 
@@ -868,9 +870,14 @@ regression has both strengths and weaknesses. Some are listed here:
 ## Exercises
 
 Practice exercises for the material covered in this chapter 
-can be found in the accompanying [worksheet](https://github.com/UBC-DSCI/data-science-a-first-intro-worksheets/blob/main/worksheet_regression1/worksheet_regression1.ipynb).
-The worksheet tries to provide automated feedback 
-and help guide you through the problems. 
-To make sure this functionality works as intended, 
-please follow the instructions for computer setup needed to run the worksheets 
-found in Chapter \@ref(move-to-your-own-machine).
+can be found in the accompanying 
+[worksheets repository](https://github.com/UBC-DSCI/data-science-a-first-intro-worksheets#readme)
+in the "Regression I: K-nearest neighbors" row.
+You can launch an interactive version of the worksheet in your browser by clicking the "launch binder" button.
+You can also preview a non-interactive version of the worksheet by clicking "view worksheet."
+If you instead decide to download the worksheet and run it on your own machine,
+make sure to follow the instructions for computer setup
+found in Chapter \@ref(move-to-your-own-machine). This will ensure that the automated feedback
+and guidance that the worksheets provide will function as intended.
+
+
diff --git a/regression2.Rmd b/regression2.Rmd
@@ -895,12 +895,17 @@ that will serve you well when moving to more advanced books on the topic.
 ## Exercises
 
 Practice exercises for the material covered in this chapter 
-can be found in the accompanying [worksheet](https://github.com/UBC-DSCI/data-science-a-first-intro-worksheets/blob/main/worksheet_regression2/worksheet_regression2.ipynb).
-The worksheet tries to provide automated feedback 
-and help guide you through the problems. 
-To make sure this functionality works as intended, 
-please follow the instructions for computer setup needed to run the worksheets 
-found in Chapter \@ref(move-to-your-own-machine).
+can be found in the accompanying 
+[worksheets repository](https://github.com/UBC-DSCI/data-science-a-first-intro-worksheets#readme)
+in the "Regression II: linear regression" row.
+You can launch an interactive version of the worksheet in your browser by clicking the "launch binder" button.
+You can also preview a non-interactive version of the worksheet by clicking "view worksheet."
+If you instead decide to download the worksheet and run it on your own machine,
+make sure to follow the instructions for computer setup
+found in Chapter \@ref(move-to-your-own-machine). This will ensure that the automated feedback
+and guidance that the worksheets provide will function as intended.
+
+
 
 ## Additional resources
 - The [`tidymodels` website](https://tidymodels.org/packages) is an excellent