Commit

Merge pull request #424 from UBC-DSCI/dev
update master with dev
trevorcampbell authored Feb 5, 2022
2 parents bb5742b + aebdc0e commit 8239813
Showing 18 changed files with 143 additions and 98 deletions.
1 change: 0 additions & 1 deletion Dockerfile
@@ -102,7 +102,6 @@ RUN sed -i 's/256MiB/4GiB/' /etc/ImageMagick-6/policy.xml
RUN sed -i 's/512MiB/4GiB/' /etc/ImageMagick-6/policy.xml
RUN sed -i 's/1GiB/4GiB/' /etc/ImageMagick-6/policy.xml


# install version of tinytex with fixed index double-compile (no release for this yet, so install from commit hash)
RUN Rscript -e "remove.packages('xfun')"
RUN Rscript -e "devtools::install_github('yihui/[email protected]')"
2 changes: 1 addition & 1 deletion acknowledgements.Rmd
@@ -19,7 +19,7 @@ Rohan Alexander, Isabella Ghement, Virgilio Gómez Rubio, Albert Kim, Adam Loy,
The book was improved substantially by their insights.
We would like to give special thanks to Jim Zidek
for his support and encouragement throughout the process, and to
Roger Peng for graciously offering to write the foreword.
Roger Peng for graciously offering to write the Foreword.

Finally, we owe a debt of gratitude to all of the students of DSCI 100 over the past
few years. They provided invaluable feedback on the book and worksheets;
2 changes: 1 addition & 1 deletion build_html.sh
@@ -1,2 +1,2 @@
# Script to generate HTML book
docker run --rm -m 5g -v $(pwd):/home/rstudio/introduction-to-datascience ubcdsci/intro-to-ds:v0.21.0 /bin/bash -c "cd /home/rstudio/introduction-to-datascience; Rscript _build_html.r"
docker run --rm -m 5g -v $(pwd):/home/rstudio/introduction-to-datascience ubcdsci/intro-to-ds:v0.22.0 /bin/bash -c "cd /home/rstudio/introduction-to-datascience; Rscript _build_html.r"
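
Both build scripts touched by this commit hard-code the image tag, so each version bump has to be repeated in two files. As a minimal sketch only, the tag could be read from a single variable. `IMAGE_TAG`, `IMAGE`, `MOUNT`, and `build_cmd` are hypothetical names that do not exist in the repository, and the function only prints the command rather than running Docker:

```shell
#!/usr/bin/env bash
# Hypothetical sketch: IMAGE_TAG, IMAGE, MOUNT, and build_cmd are illustrative
# names, not part of the repository.
IMAGE_TAG="${IMAGE_TAG:-v0.22.0}"   # default to the tag used in this commit
IMAGE="ubcdsci/intro-to-ds:${IMAGE_TAG}"
MOUNT="/home/rstudio/introduction-to-datascience"

# Print (do not run) the docker command.
# $1: subdirectory below the mount point ("" or "/pdf"); $2: R build script.
build_cmd() {
  printf 'docker run --rm -m 5g -v %s:%s %s /bin/bash -c "cd %s%s; Rscript %s"\n' \
    "$(pwd)" "$MOUNT" "$IMAGE" "$MOUNT" "$1" "$2"
}

build_cmd "" "_build_html.r"
build_cmd "/pdf" "_build_pdf.r"
```

Setting `IMAGE_TAG` in the environment before running the sketch would switch both printed commands to another image tag at once.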
2 changes: 1 addition & 1 deletion build_pdf.sh
@@ -25,7 +25,7 @@ cp -r data/ pdf/data
cp -r img/ pdf/img

# Build the book with bookdown
docker run --rm -m 5g -v $(pwd):/home/rstudio/introduction-to-datascience ubcdsci/intro-to-ds:v0.21.0 /bin/bash -c "cd /home/rstudio/introduction-to-datascience/pdf; Rscript _build_pdf.r"
docker run --rm -m 5g -v $(pwd):/home/rstudio/introduction-to-datascience ubcdsci/intro-to-ds:v0.22.0 /bin/bash -c "cd /home/rstudio/introduction-to-datascience/pdf; Rscript _build_pdf.r"

# clean files in pdf dir
rm -rf pdf/references.bib
17 changes: 11 additions & 6 deletions classification1.Rmd
@@ -1415,9 +1415,14 @@ wkflw_plot
## Exercises

Practice exercises for the material covered in this chapter
can be found in the accompanying [worksheet](https://github.com/UBC-DSCI/data-science-a-first-intro-worksheets/blob/main/worksheet_classification1/worksheet_classification1.ipynb).
The worksheet tries to provide automated feedback
and help guide you through the problems.
To make sure this functionality works as intended,
please follow the instructions for computer setup needed to run the worksheets
found in Chapter \@ref(move-to-your-own-machine).
can be found in the accompanying
[worksheets repository](https://github.com/UBC-DSCI/data-science-a-first-intro-worksheets#readme)
in the "Classification I: training and predicting" row.
You can launch an interactive version of the worksheet in your browser by clicking the "launch binder" button.
You can also preview a non-interactive version of the worksheet by clicking "view worksheet."
If you instead decide to download the worksheet and run it on your own machine,
make sure to follow the instructions for computer setup
found in Chapter \@ref(move-to-your-own-machine). This will ensure that the automated feedback
and guidance that the worksheets provide will function as intended.


17 changes: 11 additions & 6 deletions classification2.Rmd
@@ -1386,12 +1386,17 @@ fwd_sel_accuracies_plot
## Exercises

Practice exercises for the material covered in this chapter
can be found in the accompanying [worksheet](https://github.com/UBC-DSCI/data-science-a-first-intro-worksheets/blob/main/worksheet_classification2/worksheet_classification2.ipynb).
The worksheet tries to provide automated feedback
and help guide you through the problems.
To make sure this functionality works as intended,
please follow the instructions for computer setup needed to run the worksheets
found in Chapter \@ref(move-to-your-own-machine).
can be found in the accompanying
[worksheets repository](https://github.com/UBC-DSCI/data-science-a-first-intro-worksheets#readme)
in the "Classification II: evaluation and tuning" row.
You can launch an interactive version of the worksheet in your browser by clicking the "launch binder" button.
You can also preview a non-interactive version of the worksheet by clicking "view worksheet."
If you instead decide to download the worksheet and run it on your own machine,
make sure to follow the instructions for computer setup
found in Chapter \@ref(move-to-your-own-machine). This will ensure that the automated feedback
and guidance that the worksheets provide will function as intended.



## Additional resources
- The [`tidymodels` website](https://tidymodels.org/packages) is an excellent
15 changes: 9 additions & 6 deletions clustering.Rmd
@@ -1103,12 +1103,15 @@ elbow_plot
## Exercises

Practice exercises for the material covered in this chapter
can be found in the accompanying [worksheet](https://github.com/UBC-DSCI/data-science-a-first-intro-worksheets/blob/main/worksheet_clustering/worksheet_clustering.ipynb).
The worksheet tries to provide automated feedback
and help guide you through the problems.
To make sure this functionality works as intended,
please follow the instructions for computer setup needed to run the worksheets
found in Chapter \@ref(move-to-your-own-machine).
can be found in the accompanying
[worksheets repository](https://github.com/UBC-DSCI/data-science-a-first-intro-worksheets#readme)
in the "Clustering" row.
You can launch an interactive version of the worksheet in your browser by clicking the "launch binder" button.
You can also preview a non-interactive version of the worksheet by clicking "view worksheet."
If you instead decide to download the worksheet and run it on your own machine,
make sure to follow the instructions for computer setup
found in Chapter \@ref(move-to-your-own-machine). This will ensure that the automated feedback
and guidance that the worksheets provide will function as intended.

## Additional resources
- Chapter 10 of *An Introduction to Statistical
17 changes: 9 additions & 8 deletions inference.Rmd
@@ -1169,14 +1169,15 @@ statistical techniques you may learn about in the future!
## Exercises

Practice exercises for the material covered in this chapter
can be found in the two accompanying worksheets
([first worksheet](https://github.com/UBC-DSCI/data-science-a-first-intro-worksheets/blob/main/worksheet_inference1/worksheet_inference1.ipynb)
and [second worksheet](https://github.com/UBC-DSCI/data-science-a-first-intro-worksheets/blob/main/worksheet_inference2/worksheet_inference2.ipynb)).
The worksheets try to provide automated feedback
and help guide you through the problems.
To make sure this functionality works as intended,
please follow the instructions for computer setup needed to run the worksheets
found in Chapter \@ref(move-to-your-own-machine).
can be found in the accompanying
[worksheets repository](https://github.com/UBC-DSCI/data-science-a-first-intro-worksheets#readme)
in the two "Statistical inference" rows.
You can launch an interactive version of each worksheet in your browser by clicking the "launch binder" button.
You can also preview a non-interactive version of each worksheet by clicking "view worksheet."
If you instead decide to download the worksheets and run them on your own machine,
make sure to follow the instructions for computer setup
found in Chapter \@ref(move-to-your-own-machine). This will ensure that the automated feedback
and guidance that the worksheets provide will function as intended.

## Additional resources

21 changes: 12 additions & 9 deletions intro.Rmd
@@ -528,7 +528,7 @@ image_read("img/ggplot_function.jpeg") |>
image_crop("1625x1900")
```

```{r barplot-mother-tongue, fig.width=5, fig.height=3, warning=FALSE, fig.cap = "Bar plot of the ten Aboriginal languages most often reported by Canadian residents as their mother tongue."}
```{r barplot-mother-tongue, fig.width=5, fig.height=3, warning=FALSE, fig.cap = "Bar plot of the ten Aboriginal languages most often reported by Canadian residents as their mother tongue. Note that this visualization is not done yet; there are still improvements to be made."}
ggplot(ten_lang, aes(x = language, y = mother_tongue)) +
geom_bar(stat = "identity")
```
@@ -567,7 +567,7 @@ words (e.g. `"Mother Tongue (Number of Canadian Residents)"`) as arguments to
layers to format the plot further, and we will explore these in Chapter
\@ref(viz).

(ref:barplot-mother-tongue-labs) Bar plot of the ten Aboriginal languages most often reported by Canadian residents as their mother tongue with x and y labels.
(ref:barplot-mother-tongue-labs) Bar plot of the ten Aboriginal languages most often reported by Canadian residents as their mother tongue with x and y labels. Note that this visualization is not done yet; there are still improvements to be made.

```{r barplot-mother-tongue-labs, fig.width=5, fig.height=3.6, warning=FALSE, fig.cap = "(ref:barplot-mother-tongue-labs)", fig.pos = "H", out.extra=""}
ggplot(ten_lang, aes(x = language, y = mother_tongue)) +
@@ -583,7 +583,7 @@ currently making it difficult to read the different language names.
One solution is to rotate the plot such that the bars are horizontal rather than vertical.
To accomplish this, we will swap the x and y coordinate axes:

```{r barplot-mother-tongue-flipped, fig.width=5, fig.height=3, fig.pos = "H", out.extra="", warning=FALSE, fig.cap = "Horizontal bar plot of the ten Aboriginal languages most often reported by Canadian residents as their mother tongue."}
```{r barplot-mother-tongue-flipped, fig.width=5, fig.height=3, fig.pos = "H", out.extra="", warning=FALSE, fig.cap = "Horizontal bar plot of the ten Aboriginal languages most often reported by Canadian residents as their mother tongue. There are no more serious issues with this visualization, but it could be refined further."}
ggplot(ten_lang, aes(x = mother_tongue, y = language)) +
geom_bar(stat = "identity") +
xlab("Mother Tongue (Number of Canadian Residents)") +
@@ -704,9 +704,12 @@ knitr::include_graphics("img/help-filter.png")
## Exercises

Practice exercises for the material covered in this chapter
can be found in the accompanying [worksheet](https://github.com/UBC-DSCI/data-science-a-first-intro-worksheets/blob/main/worksheet_intro/worksheet_intro.ipynb).
The worksheet tries to provide automated feedback
and help guide you through the problems.
To make sure this functionality works as intended,
please follow the instructions for computer setup needed to run the worksheets
found in Chapter \@ref(move-to-your-own-machine).
can be found in the accompanying
[worksheets repository](https://github.com/UBC-DSCI/data-science-a-first-intro-worksheets#readme)
in the "R and the tidyverse" row.
You can launch an interactive version of the worksheet in your browser by clicking the "launch binder" button.
You can also preview a non-interactive version of the worksheet by clicking "view worksheet."
If you instead decide to download the worksheet and run it on your own machine,
make sure to follow the instructions for computer setup
found in Chapter \@ref(move-to-your-own-machine). This will ensure that the automated feedback
and guidance that the worksheets provide will function as intended.
2 changes: 2 additions & 0 deletions pdf/latex/before_body.tex
@@ -18,5 +18,7 @@
%\includegraphics{images/dedication.pdf}
\end{center}

\cleardoublepage\newpage\thispagestyle{empty}\null

\setlength{\abovedisplayskip}{-5pt}
\setlength{\abovedisplayshortskip}{-5pt}
14 changes: 7 additions & 7 deletions preface-text.Rmd
@@ -53,11 +53,11 @@ to help you practice the concepts you will learn. We strongly recommend that you
work through the worksheet when you finish reading each chapter
before moving on to the next chapter. All of the worksheets
are available at
[https://ubc-dsci.github.io/data-science-a-first-intro-worksheets](https://ubc-dsci.github.io/data-science-a-first-intro-worksheets);
[https://github.com/UBC-DSCI/data-science-a-first-intro-worksheets#readme](https://github.com/UBC-DSCI/data-science-a-first-intro-worksheets#readme);
the "Exercises" section at the end of each chapter points you to the right worksheet for that chapter.
The worksheets are designed to provide automated feedback and help guide you through the problems.
To make sure that functionality works as intended, make sure to follow the setup directions
in Chapter \@ref(move-to-your-own-machine) regarding downloading the worksheets.



For each worksheet, you can either launch an interactive version of the worksheet in your browser by clicking the "launch binder" button,
or preview a non-interactive version of the worksheet by clicking "view worksheet."
If you instead decide to download the worksheet and run it on your own machine,
make sure to follow the instructions for computer setup
found in Chapter \@ref(move-to-your-own-machine). This will ensure that the automated feedback
and guidance that the worksheets provide will function as intended.
20 changes: 11 additions & 9 deletions reading.Rmd
@@ -971,8 +971,8 @@ knitr::include_graphics("img/sg1.png")
```
If we then click the size of an apartment listing, SelectorGadget shows us
the `span` selector, and highlights much of the page; this indicates that the
`span` selector is not specific enough to just capture apartment sizes (Figure \@ref(fig:sg3)).
the `span` selector, and highlights many of the lines on the page; this indicates that the
`span` selector is not specific enough to capture only apartment sizes (Figure \@ref(fig:sg3)).
```{r sg3, echo = FALSE, message = FALSE, warning = FALSE, fig.cap = "Using the SelectorGadget on a Craigslist webpage to obtain a CSS selector useful for obtaining apartment sizes.", fig.retina = 2, out.width="100%"}
knitr::include_graphics("img/sg3.png")
@@ -994,7 +994,6 @@ The selector gadget returns them to us as a comma-separated list (here
R if we are using more than one CSS selector.
**Stop! Are you allowed to scrape that website?**
*Before* scraping \index{web scraping!permission} data from the web, you should always check whether or not
you are *allowed* to scrape it! There are two documents that are important
for this: the `robots.txt` file and the Terms of Service
@@ -1242,12 +1241,15 @@ data you are requesting and how frequently you are making requests.
## Exercises
Practice exercises for the material covered in this chapter
can be found in the accompanying [worksheet](https://github.com/UBC-DSCI/data-science-a-first-intro-worksheets/blob/main/worksheet_reading/worksheet_reading.ipynb).
The worksheet tries to provide automated feedback
and help guide you through the problems.
To make sure this functionality works as intended,
please follow the instructions for computer setup needed to run the worksheets
found in Chapter \@ref(move-to-your-own-machine).
can be found in the accompanying
[worksheets repository](https://github.com/UBC-DSCI/data-science-a-first-intro-worksheets#readme)
in the "Reading in data locally and from the web" row.
You can launch an interactive version of the worksheet in your browser by clicking the "launch binder" button.
You can also preview a non-interactive version of the worksheet by clicking "view worksheet."
If you instead decide to download the worksheet and run it on your own machine,
make sure to follow the instructions for computer setup
found in Chapter \@ref(move-to-your-own-machine). This will ensure that the automated feedback
and guidance that the worksheets provide will function as intended.
## Additional resources
- The [`readr` documentation](https://readr.tidyverse.org/)
39 changes: 23 additions & 16 deletions regression1.Rmd
@@ -93,16 +93,18 @@ is that we are now predicting numerical variables instead of categorical variabl

\newpage

> **Note:** You can usually tell whether a \index{categorical variable}\index{numerical variable}
> variable is numerical or categorical—and therefore whether you
> need to perform regression or classification—by taking two response variables X and Y from your
> data, and asking the question, "is response variable X *more* than response variable Y?"
> If the variable is categorical, the question will make no sense (Is blue more than red?
> Is benign more than malignant?). If the variable is numerical, it will make sense
> (Is 1.5 hours more than 2.25 hours? Is \$500,000 more than \$400,000?).
> Be careful when applying this heuristic, though: sometimes categorical variables will be encoded as
> numbers in your data (e.g., "1" represents "benign", and "0" represents "malignant"). In these cases
> you have to ask the question about the *meaning* of the labels ("benign" and "malignant"), not their values ("1" and "0").
> **Note:** You can usually tell whether a\index{categorical variable}\index{numerical variable} variable is numerical or
> categorical—and therefore whether you need to perform regression or
> classification—by taking two response variables X and Y from your data,
> and asking the question, "is response variable X *more* than response
> variable Y?" If the variable is categorical, the question will make no sense.
> (Is blue more than red? Is benign more than malignant?) If the variable is
> numerical, it will make sense. (Is 1.5 hours more than 2.25 hours? Is
> \$500,000 more than \$400,000?) Be careful when applying this heuristic,
> though: sometimes categorical variables will be encoded as numbers in your
> data (e.g., "1" represents "benign", and "0" represents "malignant"). In
> these cases you have to ask the question about the *meaning* of the labels
> ("benign" and "malignant"), not their values ("1" and "0").
## Exploring a data set

@@ -868,9 +870,14 @@ regression has both strengths and weaknesses. Some are listed here:
## Exercises

Practice exercises for the material covered in this chapter
can be found in the accompanying [worksheet](https://github.com/UBC-DSCI/data-science-a-first-intro-worksheets/blob/main/worksheet_regression1/worksheet_regression1.ipynb).
The worksheet tries to provide automated feedback
and help guide you through the problems.
To make sure this functionality works as intended,
please follow the instructions for computer setup needed to run the worksheets
found in Chapter \@ref(move-to-your-own-machine).
can be found in the accompanying
[worksheets repository](https://github.com/UBC-DSCI/data-science-a-first-intro-worksheets#readme)
in the "Regression I: K-nearest neighbors" row.
You can launch an interactive version of the worksheet in your browser by clicking the "launch binder" button.
You can also preview a non-interactive version of the worksheet by clicking "view worksheet."
If you instead decide to download the worksheet and run it on your own machine,
make sure to follow the instructions for computer setup
found in Chapter \@ref(move-to-your-own-machine). This will ensure that the automated feedback
and guidance that the worksheets provide will function as intended.


17 changes: 11 additions & 6 deletions regression2.Rmd
@@ -895,12 +895,17 @@ that will serve you well when moving to more advanced books on the topic.
## Exercises

Practice exercises for the material covered in this chapter
can be found in the accompanying [worksheet](https://github.com/UBC-DSCI/data-science-a-first-intro-worksheets/blob/main/worksheet_regression2/worksheet_regression2.ipynb).
The worksheet tries to provide automated feedback
and help guide you through the problems.
To make sure this functionality works as intended,
please follow the instructions for computer setup needed to run the worksheets
found in Chapter \@ref(move-to-your-own-machine).
can be found in the accompanying
[worksheets repository](https://github.com/UBC-DSCI/data-science-a-first-intro-worksheets#readme)
in the "Regression II: linear regression" row.
You can launch an interactive version of the worksheet in your browser by clicking the "launch binder" button.
You can also preview a non-interactive version of the worksheet by clicking "view worksheet."
If you instead decide to download the worksheet and run it on your own machine,
make sure to follow the instructions for computer setup
found in Chapter \@ref(move-to-your-own-machine). This will ensure that the automated feedback
and guidance that the worksheets provide will function as intended.



## Additional resources
- The [`tidymodels` website](https://tidymodels.org/packages) is an excellent