diff --git a/content/worksheets/worksheet_b02.ipynb b/content/worksheets/worksheet_b02.ipynb index b51ec312..6f2bdc74 100644 --- a/content/worksheets/worksheet_b02.ipynb +++ b/content/worksheets/worksheet_b02.ipynb @@ -80,7 +80,20 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "deletable": false, + "editable": false, + "nbgrader": { + "cell_type": "code", + "checksum": "3e3bd34313b0495214eaa25a27ed9e14", + "grade": false, + "grade_id": "cell-6884214b97e12c4d", + "locked": true, + "schema_version": 3, + "solution": false, + "task": false + } + }, "outputs": [], "source": [ "# An unlocked code chunk." @@ -93,7 +106,7 @@ "editable": false, "nbgrader": { "cell_type": "markdown", - "checksum": "410e5d233709a1a21cc6741e9ddd9fed", + "checksum": "84d708a28db6692de6d7b09b44d8a79d", "grade": false, "grade_id": "cell-f13d49ed5f0c8271", "locked": true, @@ -108,7 +121,11 @@ "\n", "## Question 1\n", "\n", - "There's that famous sentence about the \"quick fox\" that contains all letters of the alphabet, although we don't quite remember the sentence. Obtain a vector of all sentences from the `stringr::sentences` dataset containing the word `\"fox\"`. Store the resulting vector in a variable named `answer1`." + "There's that famous sentence about the \"quick fox\" that contains all letters of the alphabet, although we don't quite remember the sentence. Obtain a vector of all sentences from the `stringr::sentences` dataset containing the word `\"fox\"`. 
Store the resulting vector in a variable named `answer1`.\n", + "\n", + "```\n", + "answer1 <- str_subset(FILL_THIS_IN, FILL_THIS_IN)\n", + "```" ] }, { @@ -119,7 +136,7 @@ "lines_to_next_cell": 0, "nbgrader": { "cell_type": "code", - "checksum": "343e80475608ac9f841bae6d3148a281", + "checksum": "7ff223abe6b9643640add72b73b3ff7f", "grade": false, "grade_id": "cell-a58e8d689deb018c", "locked": false, @@ -130,9 +147,6 @@ }, "outputs": [], "source": [ - "# answer1 <- str_subset(FILL_THIS_IN, FILL_THIS_IN)\n", - "\n", - "# FILL_THIS_IN\n", "# your code here\n", "fail() # No Answer - remove if you provide an answer\n", "print(answer1)" @@ -244,7 +258,7 @@ "editable": false, "nbgrader": { "cell_type": "markdown", - "checksum": "5950813ad356a797ccec2d24ace7e63e", + "checksum": "2b6845228d706a2975015ffe299f2480", "grade": false, "grade_id": "cell-7552bac49a0bfb48", "locked": true, @@ -256,7 +270,11 @@ "source": [ "## Question 3\n", "\n", - "With stringr, we can substitute parts of a string, too. Replace the word \"fox\" from `answer1` with \"giraffe\" using `str_replace()`, and store the result in a variable named `answer3`." + "With stringr, we can substitute parts of a string, too. 
Replace the word \"fox\" from `answer1` with \"giraffe\" using `str_replace()`, and store the result in a variable named `answer3`.\n", + "\n", + "```\n", + "answer3 <- str_replace(answer1, pattern = FILL_THIS_IN, replacement = FILL_THIS_IN)\n", + "```" ] }, { @@ -266,7 +284,7 @@ "deletable": false, "nbgrader": { "cell_type": "code", - "checksum": "44be23331f27f04ea4fbd42f5d124443", + "checksum": "415773f3320f5d1abe56deb8dd50b09d", "grade": false, "grade_id": "cell-57e67c68f704efe5", "locked": false, @@ -277,8 +295,6 @@ }, "outputs": [], "source": [ - "# answer3 <- str_replace(answer1, pattern = FILL_THIS_IN, replacement = FILL_THIS_IN)\n", - "\n", "# your code here\n", "fail() # No Answer - remove if you provide an answer\n", "print(answer3)" @@ -393,9 +409,9 @@ "editable": false, "nbgrader": { "cell_type": "markdown", - "checksum": "fce188d2a341b8c87380bcd185287203", + "checksum": "206fbf7d348088c61f99857ff0f4dd46", "grade": false, - "grade_id": "cell-4279ce4715735a66", + "grade_id": "cell-80604f3d738356db", "locked": true, "schema_version": 3, "solution": false, @@ -403,9 +419,7 @@ } }, "source": [ - "## Part 2: Manipulating character columns in a tibble\n", - "\n", - "Consider the wedding dataset on the UBC-STAT/stat545.stat.ubc.ca GitHub repository:" + "Now let's practice working with character columns in a tibble. Consider the wedding dataset on the UBC-STAT/stat545.stat.ubc.ca GitHub repository:" ] }, { @@ -438,9 +452,9 @@ "editable": false, "nbgrader": { "cell_type": "markdown", - "checksum": "690080f1b8e2800d1200c4dd6c3d76b4", + "checksum": "24d2f59a0262bc2a05596d9327e37fef", "grade": false, - "grade_id": "cell-5b1e786c558fb481", + "grade_id": "cell-5f11b4a5fd8423b5", "locked": true, "schema_version": 3, "solution": false, @@ -448,40 +462,7 @@ } }, "source": [ - "### Question 5\n", - "\n", - "Split the `name` column into two columns, named `first` and `last` containing the first and last names (which are currently separated by a space). 
Store the resulting tibble in a variable called `answer5`.\n", - "\n", - "**Hint**:\n", - "\n", - "- Use `tidyr::separate()`.\n", - "- Want a challenge? Try the same exercise, using `str_split()`. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "deletable": false, - "nbgrader": { - "cell_type": "code", - "checksum": "4b28a8c13ff8cc87730e20b7a4ebfebe", - "grade": false, - "grade_id": "cell-b364738c4c61a736", - "locked": false, - "schema_version": 3, - "solution": true, - "task": false - } - }, - "outputs": [], - "source": [ - "# answer5 <- wedding %>% \n", - "# separate(FILL_THIS_IN, into = FILL_THIS_IN, sep = FILL_THIS_IN)\n", - "\n", - "# your code here\n", - "fail() # No Answer - remove if you provide an answer\n", - "print(answer5)" + "Back in Worksheet A-4, we used `tidyr::separate()` to split the `name` column into two columns, named `first` and `last` containing the first and last names (which are currently separated by a space): " ] }, { @@ -492,11 +473,10 @@ "editable": false, "nbgrader": { "cell_type": "code", - "checksum": "1e3f3d0b736b1ab2f9618a174475630f", - "grade": true, - "grade_id": "cell-1de02515f91c4afd", + "checksum": "8ff55b6843028d0053c7dc8fe19ce6f6", + "grade": false, + "grade_id": "cell-4c1655e2ac698c4e", "locked": true, - "points": 1, "schema_version": 3, "solution": false, "task": false @@ -504,12 +484,9 @@ }, "outputs": [], "source": [ - "test_that(\"Question 5\", {\n", - " expect_identical(\n", - " digest(unclass(select(answer5, first, last))), \n", - " \"8fa9b7d74019d7998c6e3b5e63a1d08c\"\n", - ")\n", - "})" + "wedding_fl <- wedding %>% \n", + " separate(name, into = c(\"first\", \"last\"), sep = \" \")\n", + "head(wedding_fl)" ] }, { @@ -519,9 +496,9 @@ "editable": false, "nbgrader": { "cell_type": "markdown", - "checksum": "b9da0010e3c9e22a8ede30ad1e217cb2", + "checksum": "f17d7dd1d1ffef5fa5450da924f8c997", "grade": false, - "grade_id": "cell-c7b56cff833b6a70", + "grade_id": "cell-a119b9f4940ba4f0", 
"locked": true, "schema_version": 3, "solution": false, @@ -529,27 +506,19 @@ } }, "source": [ - "### Question 6\n", - "\n", - "Using the answer to the previous question, do the opposite operation: combine the `last` and `first` columns into a new column named `name`, that has their name in the form \"last, first\". Store the resulting tibble in a variable named `answer6`.\n", + "## Question 5\n", "\n", - "**Hint**:\n", + "Make a new column named `greeting` whose entries follow this format: \n", + "`\"Hello there, [FIRST_NAME_HERE] from party [PARTY_NUMBER_HERE]!\"` Store the resulting tibble in a variable named `answer5`. \n", "\n", - "- Use the `tidyr::unite()` function.\n", - "- Want a challenge? Try the same exercise, using `str_c()` -- an important step for understanding the difference between the `sep` and `collapse` arguments.\n", + "```\n", + "answer5 <- wedding_fl %>%\n", + " mutate(greeting = str_c(FILL_THIS_IN))\n", + "```\n", "\n", - "Starter code:\n", + "*Hint 1*: The `str_c()` function can take in any number of character vectors of the same length, stack them up side by side, and glue them together. \n", "\n", - "```\n", - "# Using tidyr:\n", - "answer6 <- answer5 %>% \n", - " unite(col = FILL_THIS_IN, FILL_THIS_IN, FILL_THIS_IN, sep = FILL_THIS_IN)\n", - "# Challenge:\n", - "answer6 <- answer5 %>% \n", - " mutate(name = str_c(FILL_THIS_IN)) %>% \n", - " select(-first, -last) %>%\n", - " select(party, name, everything())\n", - "```" + "*Hint 2*: `str_c()` can recycle values. For example, if you pass in a character vector of length 1 (say `\"Apple\"`) and a character vector of length 3 (say `c(\"Pie\", \"Crisp\", \"Crumble\")`) with `sep = \" \"`, then it returns `c(\"Apple Pie\", \"Apple Crisp\", \"Apple Crumble\")`. 
" ] }, { @@ -559,9 +528,9 @@ "deletable": false, "nbgrader": { "cell_type": "code", - "checksum": "301158a4022e42348f37b94b2f84ddab", + "checksum": "a6cb91a5057f25b9d4b1dfb5a9116726", "grade": false, - "grade_id": "cell-eedbe7a076ab42c4", + "grade_id": "cell-77e7d9db69cf47a8", "locked": false, "schema_version": 3, "solution": true, @@ -570,12 +539,9 @@ }, "outputs": [], "source": [ - "# answer6 <- answer5 %>% \n", - "# unite(col = FILL_THIS_IN, FILL_THIS_IN, FILL_THIS_IN, sep = FILL_THIS_IN)\n", - "\n", "# your code here\n", "fail() # No Answer - remove if you provide an answer\n", - "print(answer6)" + "print(answer5, width=Inf)" ] }, { @@ -586,9 +552,9 @@ "editable": false, "nbgrader": { "cell_type": "code", - "checksum": "d5e0b084bda73c67eda868e21ef48d04", + "checksum": "eacb8f1ec2148c3ec581220a0a88cde8", "grade": true, - "grade_id": "cell-1e3e417be0d6be03", + "grade_id": "cell-b560283049b996c8", "locked": true, "points": 1, "schema_version": 3, @@ -598,8 +564,11 @@ }, "outputs": [], "source": [ - "test_that(\"Question 6\", {\n", - " expect_identical(digest(answer6$name), \"86be22dab9696ef3e2e5c235ce7914a3\")\n", + "test_that(\"Question 5\", {\n", + " expect_identical(\n", + " digest(unclass(select(answer5, party, first, greeting))), \n", + " \"428d7ce6c81771e8ac3cd6a64e31a614\"\n", + ")\n", "})" ] }, @@ -610,7 +579,7 @@ "editable": false, "nbgrader": { "cell_type": "markdown", - "checksum": "51025fa80477bd0772bee8a8753c09ff", + "checksum": "85763ee0cfdb499d0244f3b399671175", "grade": false, "grade_id": "cell-2252c504c7171940", "locked": true, @@ -620,25 +589,27 @@ } }, "source": [ - "## Question 7\n", + "## Question 6\n", "\n", - "Still using the tibble with the first and last names separated into their own columns, make a tibble with one row per party, with columns named `people` and `wedding_status`:\n", + "Make a tibble with one row per party, with columns named `people` and `wedding_status`:\n", "\n", "- `people`: contains the first names of everyone in the 
party, separated by commas (and a space: `\", \"`).\n", "- `wedding_status`: should be `\"CONFIRMED\"` if all their wedding status entries are `\"CONFIRMED\"`, and `\"PENDING\"` otherwise. \n", "\n", - "Store the resulting tibble in a variable named `answer7`.\n", + "Store the resulting tibble in a variable named `answer6`.\n", "\n", "Starter code:\n", "\n", "```\n", - "answer7 <- answer5 %>% \n", + "answer6 <- wedding_fl %>% \n", " group_by(party) %>% \n", " summarise(\n", - " people = str_c(FILL_THIS_IN),\n", + " people = str_flatten(FILL_THIS_IN),\n", " wedding_status = if_else(FILL_THIS_IN, \"CONFIRMED\", \"PENDING\")\n", " )\n", - "```" + "```\n", + "\n", + "*Hint*: The `str_flatten()` function concatenates a vector of characters into a single string, with an optional argument that lets you put things between vector elements before concatenating them. " ] }, { @@ -648,7 +619,7 @@ "deletable": false, "nbgrader": { "cell_type": "code", - "checksum": "ca06772c9278c514b4d0bf1fd3737979", + "checksum": "0705450fe354c78bd8a8179ddc38bb11", "grade": false, "grade_id": "cell-684b49cb1e966bbe", "locked": false, @@ -661,7 +632,7 @@ "source": [ "# your code here\n", "fail() # No Answer - remove if you provide an answer\n", - "print(answer7)" + "print(answer6)" ] }, { @@ -672,7 +643,7 @@ "editable": false, "nbgrader": { "cell_type": "code", - "checksum": "bf00c8ed2291913ef5775c54e33fc068", + "checksum": "c3f300c8ee05619f159de5ccb7d3b8ff", "grade": true, "grade_id": "cell-02b42dee1ed42a07", "locked": true, @@ -684,9 +655,9 @@ }, "outputs": [], "source": [ - "test_that(\"Question 7\", {\n", + "test_that(\"Question 6\", {\n", " expect_identical(\n", - " digest(unclass(select(answer7, people, wedding_status))), \n", + " digest(unclass(select(answer6, people, wedding_status))), \n", " \"cf18af7e44c899c48e53b83614739b86\"\n", ")\n", "})" @@ -699,9 +670,9 @@ "editable": false, "nbgrader": { "cell_type": "markdown", - "checksum": "844532f9d7c2fa6ad6ce27c750cc7bed", + "checksum": 
"a09cc6db4238c791b56cd0b9f742d459", "grade": false, - "grade_id": "cell-188406f9e4766f85", + "grade_id": "cell-96aea45598f83a67", "locked": true, "schema_version": 3, "solution": false, @@ -709,7 +680,7 @@ } }, "source": [ - "# Part 3: Exploring Regular Expressions (regex)" + "*Good to know*: If you wanted to be more grammatically correct, you could have replaced `str_flatten()` with the `str_flatten_comma()` function to enforce English grammar rules for listing off items. For example, this function would allow you to separate two names with `\" and \"`. " ] }, { @@ -719,9 +690,9 @@ "editable": false, "nbgrader": { "cell_type": "markdown", - "checksum": "443db9c86a4afc06f75c68636a1a5bf7", + "checksum": "020035694bcc6a8e63070053cec70ea5", "grade": false, - "grade_id": "cell-23fe9cddb2cfe2bb", + "grade_id": "cell-7e40062d918cda9c", "locked": true, "schema_version": 3, "solution": false, @@ -729,40 +700,27 @@ } }, "source": [ - "## Question 8\n", - "\n", - "Select individuals in the `wedding` tibble whose first name starts between \"A\" and \"Em\" inclusive. 
Store the resulting tibble in a variable named `answer8`.\n", - "\n", - "Starter code:\n", - "\n", - "```\n", - "answer8 <- wedding %>% \n", - " filter(FILL_THIS_IN(name, \"FILL_THIS_IN\")) %>% \n", - " arrange(name)\n", - "```" + "# Part 2: Introduction to Regular Expressions" ] }, { - "cell_type": "code", - "execution_count": null, + "cell_type": "markdown", "metadata": { "deletable": false, + "editable": false, "nbgrader": { - "cell_type": "code", - "checksum": "a271eae2c82ed880f4c589dc0eb1e93b", + "cell_type": "markdown", + "checksum": "b2395642e70fb960336faac7e33a7926", "grade": false, - "grade_id": "cell-1e2d94b35a9a495f", - "locked": false, + "grade_id": "cell-966c52fd3f31334e", + "locked": true, "schema_version": 3, - "solution": true, + "solution": false, "task": false } }, - "outputs": [], "source": [ - "# your code here\n", - "fail() # No Answer - remove if you provide an answer\n", - "print(answer8)" + "Regular expressions -- or \"Regex\" for short -- express a pattern in text that can be passed into `stringr` functions to do powerful things. 
Let's start by learning the basics of how to write these patterns with the helpful `str_view()` function:" ] }, { @@ -773,11 +731,10 @@ "editable": false, "nbgrader": { "cell_type": "code", - "checksum": "7634ba56213c7b46fe5d30f17c905592", - "grade": true, - "grade_id": "cell-c9b809048a680c8e", + "checksum": "3af804462e4d6fe91fa462242ded86b8", + "grade": false, + "grade_id": "cell-6f500fd32a7773d0", "locked": true, - "points": 1, "schema_version": 3, "solution": false, "task": false @@ -785,9 +742,7 @@ }, "outputs": [], "source": [ - "test_that(\"Question 8\", {\n", - " expect_identical(digest(sort(answer8$name)), \"6bbf440d3cca5b2e4b670b48f7bddc14\")\n", - "})" + "str_view(fruit, \"melon\")" ] }, { @@ -797,9 +752,9 @@ "editable": false, "nbgrader": { "cell_type": "markdown", - "checksum": "91daeb51888762631e76004627aed55c", + "checksum": "0a2a2666bdb43ddd2a2945ce54ba117c", "grade": false, - "grade_id": "cell-f9ca00fbe3607872", + "grade_id": "cell-3479e82ec0ff8e1b", "locked": true, "schema_version": 3, "solution": false, @@ -807,7 +762,9 @@ } }, "source": [ - "Let's use the countries from the gapminder dataset:" + "The `str_view()` function took in a character vector containing fruits, and highlighted the entries matching the regular expression pattern `\"melon\"`. This is the simplest type of regex pattern, and it means to look for an exact match. 
\n", + "\n", + "Let's learn more about the language of regex patterns using the countries in the gapminder data set: " ] }, { @@ -818,9 +775,9 @@ "editable": false, "nbgrader": { "cell_type": "code", - "checksum": "f58dc58098dead3bd6c629b28f30b6a2", + "checksum": "94bbd21ff5e0b237ac8ed5aa752755a7", "grade": false, - "grade_id": "cell-b3b967ca0f246559", + "grade_id": "cell-dc0f3d9aac8aa10b", "locked": true, "schema_version": 3, "solution": false, @@ -840,9 +797,9 @@ "editable": false, "nbgrader": { "cell_type": "markdown", - "checksum": "2ad9a67afb327f9f22da041490a8ac02", + "checksum": "e50dba57a589b9d78f46d8e7fd8edf10", "grade": false, - "grade_id": "cell-cedf173918d151e8", + "grade_id": "cell-05c7d18faf298c4c", "locked": true, "schema_version": 3, "solution": false, @@ -850,13 +807,15 @@ } }, "source": [ - "## Question 9: The \"any character\"\n", + "## Question 7: \"any\" characters\n", + "\n", + "The \".\" character when used in a regular expression means \"any single character\". \n", "\n", - "Use `str_subset()` to find all countries in the gapminder data set with the following pattern: \"i\", followed by any single character, followed by \"a\". Store the result in a vector named `answer9`.\n", + "Use the `str_subset()` function to find all countries in the gapminder data set with the following pattern: \"i\", followed by any single character, followed by \"a\". Store the result in a vector named `answer7`. \n", "\n", - "Note that Italy is not on the list, because regex is case-sensitive.\n", + "Note that Italy will not be on the list, because regex is case-sensitive.\n", "\n", - "Explore further: use `str_view_all()` to get a visual of what's being matched, and where. This is especially useful for debugging!" + "*Good to know*: You can specify \"any single character in this list of characters\" or \"any single character except those in this list of characters\" using square brackets. 
For example, `\"[abc]\"` for \"a, b, or c\" and `\"[^abc]\"` for \"anything but a, b, or c\". " ] }, { @@ -866,7 +825,7 @@ "deletable": false, "nbgrader": { "cell_type": "code", - "checksum": "0163a0f617ef827fd86c9e48fa09712b", + "checksum": "43c1465962c38516805283f2524c1a29", "grade": false, "grade_id": "cell-22a801393523e273", "locked": false, @@ -877,12 +836,12 @@ }, "outputs": [], "source": [ - "# answer9 <- str_subset(countries, pattern = \"FILL_THIS_IN\")\n", - "# str_view_all(countries, pattern = \"FILL_THIS_IN\", match = TRUE)\n", + "# str_view(countries, pattern = \"FILL_THIS_IN\")\n", + "# answer7 <- str_subset(countries, pattern = \"FILL_THIS_IN\")\n", "\n", "# your code here\n", "fail() # No Answer - remove if you provide an answer\n", - "print(answer9)" + "print(answer7)" ] }, { @@ -893,7 +852,7 @@ "editable": false, "nbgrader": { "cell_type": "code", - "checksum": "73f124818cf8700ca6c09dde709c6a3e", + "checksum": "466006269940c9e27ecde1e9175b8c2b", "grade": true, "grade_id": "cell-c835b50877ea9090", "locked": true, @@ -905,8 +864,8 @@ }, "outputs": [], "source": [ - "test_that(\"Question 9\", {\n", - " expect_identical(digest(answer9), \"fdf1c0b93db219fb32d927700cab3c4e\")\n", + "test_that(\"Question 7\", {\n", + " expect_identical(digest(answer7), \"fdf1c0b93db219fb32d927700cab3c4e\")\n", "})" ] }, @@ -917,9 +876,9 @@ "editable": false, "nbgrader": { "cell_type": "markdown", - "checksum": "b9239c039f346b80339aff9f2528d3a5", + "checksum": "ff3c68e4d717f276012daa49f3371360", "grade": false, - "grade_id": "cell-0e3d3b29201e9ad6", + "grade_id": "cell-b406fa6a0014ed7c", "locked": true, "schema_version": 3, "solution": false, @@ -927,9 +886,13 @@ } }, "source": [ - "## Question 10\n", + "## Question 8: the \"escape\" \n", + "\n", + "Uh oh! But what if I wanted to literally search for countries with a period in the name? I can't use the regex `\".\"`, since that'll match \"any single character\". 
I need to \"escape the period\" to indicate that I really mean to search for the character \".\", and don't mean to use the character \".\" in its special regex meaning. We can escape the period by adding `\\\\` in front of it.\n", + "\n", + "\"Escape the period\" to make a vector of all countries with at least one period in their name. Store the result in a vector named `answer8`.\n", "\n", - "Canada isn't the only country with three interspersed \"a\"'s. Find all countries with a similar pattern, storing the result in a vector named `answer10`." + "*Good to know*: If you've used regex outside of R, you might be surprised to see that we need to add `\\\\` rather than `\\`. This is because `\\` itself is a special character in R strings that needs to be escaped with `\\`. " ] }, { @@ -937,12 +900,11 @@ "execution_count": null, "metadata": { "deletable": false, - "lines_to_next_cell": 2, "nbgrader": { "cell_type": "code", - "checksum": "12cb6dd9f5ea82aa90df48da319d6ece", + "checksum": "e4f3162bd13035017073d6af649cccaf", "grade": false, - "grade_id": "cell-ff62684172be86e4", + "grade_id": "cell-6837e2f60c9ab577", "locked": false, "schema_version": 3, "solution": true, @@ -951,12 +913,12 @@ }, "outputs": [], "source": [ - "# answer10 <- str_subset(countries, pattern = \"FILL_THIS_IN\")\n", - "# str_view_all(countries, pattern = \"FILL_THIS_IN\", match = TRUE)\n", + "# str_view(countries, pattern = \"FILL_THIS_IN\")\n", + "# answer8 <- str_subset(countries, pattern = \"FILL_THIS_IN\")\n", "\n", "# your code here\n", "fail() # No Answer - remove if you provide an answer\n", - "print(answer10)" + "print(answer8)" ] }, { @@ -967,9 +929,9 @@ "editable": false, "nbgrader": { "cell_type": "code", - "checksum": "1554f462b8516f16ac6f288486c78cee", + "checksum": "9ff5e4f15af81aa0873e8ede9bd912e9", "grade": true, - "grade_id": "cell-b5dc4690ed847190", + "grade_id": "cell-900364bf4a541ab3", "locked": true, "points": 1, "schema_version": 3, @@ -979,8 +941,8 @@ }, "outputs": [], 
"source": [ - "test_that(\"Question 10\", {\n", - " expect_identical(digest(answer10), \"4751851d94825e74a6569abdf9759209\")\n", + "test_that(\"Question 8\", {\n", + " expect_identical(digest(answer8), \"4c500f226f5abbe540ef2506a4644375\")\n", "})" ] }, @@ -991,9 +953,9 @@ "editable": false, "nbgrader": { "cell_type": "markdown", - "checksum": "55c2c7fd0178de82dfda7de607b852bf", + "checksum": "3edf434fe4d40c6595849dee1404eafb", "grade": false, - "grade_id": "cell-26ccd1349f2c43c6", + "grade_id": "cell-344db5435da777df", "locked": true, "schema_version": 3, "solution": false, @@ -1001,11 +963,14 @@ } }, "source": [ - "## Question 11: The escape\n", + "## Question 9: Position indicators\n", + "\n", + "Use:\n", "\n", - "What if I wanted to literally search for countries with a period in the name? \"Escape the period\" to make a vector of all countries with at least one period in their name. Store the result in a vector named `answer11`.\n", + "- `^` to correspond to the __beginning__ of a string.\n", + "- `$` to correspond to the __end__ of a string.\n", "\n", - "Explore further: use `str_view_all()` to get a visual of what's being matched, and where. This is especially useful for debugging!" + "Find all countries that end in \"land\". Store the result in a vector named `answer9`." 
] }, { @@ -1015,9 +980,9 @@ "deletable": false, "nbgrader": { "cell_type": "code", - "checksum": "35b049d0dfe7e6e26f1fdeaa93b4a37e", + "checksum": "ceda041f5916a78df57dbd986402766e", "grade": false, - "grade_id": "cell-6837e2f60c9ab577", + "grade_id": "cell-35f0de5542c6d0d0", "locked": false, "schema_version": 3, "solution": true, @@ -1026,12 +991,12 @@ }, "outputs": [], "source": [ - "# answer11 <- str_subset(countries, pattern = \"FILL_THIS_IN\")\n", - "# str_view_all(countries, pattern = \"FILL_THIS_IN\", match = TRUE)\n", + "# str_view(countries, \"FILL_THIS_IN\")\n", + "# answer9 <- str_subset(countries, \"FILL_THIS_IN\")\n", "\n", "# your code here\n", "fail() # No Answer - remove if you provide an answer\n", - "print(answer11)" + "print(answer9)" ] }, { @@ -1042,9 +1007,9 @@ "editable": false, "nbgrader": { "cell_type": "code", - "checksum": "b2aba9e21e431460456cb37c047cf441", + "checksum": "e7efaffe70e8b80378d0f403ed2b7086", "grade": true, - "grade_id": "cell-900364bf4a541ab3", + "grade_id": "cell-7dfee270fe57717c", "locked": true, "points": 1, "schema_version": 3, @@ -1054,8 +1019,8 @@ }, "outputs": [], "source": [ - "test_that(\"Question 11\", {\n", - " expect_identical(digest(answer11), \"4c500f226f5abbe540ef2506a4644375\")\n", + "test_that(\"Question 9\", {\n", + " expect_identical(digest(answer9), \"692ee00b59194cea743c5ac3bf2302ae\")\n", "})" ] }, @@ -1066,9 +1031,9 @@ "editable": false, "nbgrader": { "cell_type": "markdown", - "checksum": "090d7888026eb5a9d059346d30733ab9", + "checksum": "427b99e13b0ff4a8926a159facdb10b8", "grade": false, - "grade_id": "cell-25fa31dfede7bb7d", + "grade_id": "cell-df3ce76905844afa", "locked": true, "schema_version": 3, "solution": false, @@ -1076,9 +1041,15 @@ } }, "source": [ - "## Question 12: Groups\n", + "## Question 10: Quantifiers/Repetition\n", + "\n", + "The handy ones are:\n", + "\n", + "- `*` for 0 or more\n", + "- `+` for 1 or more\n", + "- `?` for 0 or 1\n", "\n", - "Find all countries with three non-vowel 
letters next to each other (don't count spaces, commas, and periods). Store the resulting vector in a variable named `answer12`. " + "Find all countries that have any number of \"o\"'s (but at least 1), following an \"r\". Store the resulting vector in a variable named `answer10`." ] }, { @@ -1089,9 +1060,9 @@ "lines_to_next_cell": 2, "nbgrader": { "cell_type": "code", - "checksum": "52be584845c84b48ac09b704fa2dc744", + "checksum": "26188621e13f94c518458453aa402cc0", "grade": false, - "grade_id": "cell-1475c044057a2436", + "grade_id": "cell-36d6437e624d72f5", "locked": false, "schema_version": 3, "solution": true, @@ -1100,12 +1071,12 @@ }, "outputs": [], "source": [ - "# answer12 <- str_subset(countries, pattern = \"FILL_THIS_IN\")\n", - "# str_view_all(countries, pattern = \"FILL_THIS_IN\", match = TRUE)\n", + "# str_view(countries, \"FILL_THIS_IN\")\n", + "# answer10 <- str_subset(countries, \"FILL_THIS_IN\")\n", "\n", "# your code here\n", "fail() # No Answer - remove if you provide an answer\n", - "print(answer12)" + "print(answer10)" ] }, { @@ -1116,9 +1087,9 @@ "editable": false, "nbgrader": { "cell_type": "code", - "checksum": "b0d6b76e244dfc9bb46f32a223034966", + "checksum": "13c94a6ec52633a7a7903b508463a04a", "grade": true, - "grade_id": "cell-8d59e7d3cdd07728", + "grade_id": "cell-2755ef7fa0a525de", "locked": true, "points": 1, "schema_version": 3, @@ -1128,8 +1099,8 @@ }, "outputs": [], "source": [ - "test_that(\"Question 12\", {\n", - " expect_identical(digest(answer12), \"3bc834e20e4109423e850f263d1c0cee\")\n", + "test_that(\"Question 10\", {\n", + " expect_identical(digest(answer10), \"fa31d9cfe634b9a841cabdf9e31c0eeb\")\n", "})" ] }, @@ -1140,7 +1111,7 @@ "editable": false, "nbgrader": { "cell_type": "markdown", - "checksum": "596539e0506987efafc8ff66ecc2396d", + "checksum": "368570bde9da73971064ef703d96c890", "grade": false, "grade_id": "cell-d80fd1a2aa56bff1", "locked": true, @@ -1150,7 +1121,7 @@ } }, "source": [ - "## Question 13: \"Or\" and 
Precedence\n", + "## Question 11: \"Or\" and Precedence\n", "\n", "Use `|` to denote \"or\". \"And\" is implied otherwise, and has precedence. Use parentheses to be deliberate with precedence.\n", "\n", @@ -1166,7 +1137,7 @@ "lines_to_next_cell": 2, "nbgrader": { "cell_type": "code", - "checksum": "448a14d61f3f8ad8db82aa01f41e4cba", + "checksum": "1dd39a81f5b12e9062d94ed5a9d99797", "grade": false, "grade_id": "cell-894e6b100d9b24e0", "locked": true, @@ -1179,9 +1150,9 @@ "source": [ "bbb <- c(\"bear\", \"beer\", \"bar\")\n", "cat(\"'bee' or 'ar':\")\n", - "str_view_all(bbb, pattern = \"bee|ar\")\n", + "str_view(bbb, pattern = \"bee|ar\")\n", "cat(\"'e' or 'a':\")\n", - "str_view_all(bbb, pattern = \"be(e|a)r\") " + "str_view(bbb, pattern = \"be(e|a)r\") " ] }, { @@ -1191,7 +1162,7 @@ "editable": false, "nbgrader": { "cell_type": "markdown", - "checksum": "a7951c2e88742b71a256bc299e2153b3", + "checksum": "68cf4aaad12072e8c31438c414d09189", "grade": false, "grade_id": "cell-825d29c47d308e2c", "locked": true, @@ -1201,7 +1172,7 @@ } }, "source": [ - "Now, find all countries that have either \"o\" twice in a row or \"e\" twice in a row (no changeover allowed). Store the resulting vector in a variable named `answer13`." + "Now, find all countries that have either \"o\" twice in a row or \"e\" twice in a row (\"oe\" and \"eo\" are not allowed). Store the resulting vector in a variable named `answer11`." 
] }, { @@ -1212,7 +1183,7 @@ "lines_to_next_cell": 2, "nbgrader": { "cell_type": "code", - "checksum": "c75e9f260f39e4ab7f330fecbee8ded3", + "checksum": "6d789d764f67002e7555234a4978e2d6", "grade": false, "grade_id": "cell-7e58b1b7884a641e", "locked": false, @@ -1223,11 +1194,12 @@ }, "outputs": [], "source": [ - "# answer13 <- str_subset(countries, \"FILL_THIS_IN\")\n", + "# str_view(countries, \"FILL_THIS_IN\")\n", + "# answer11 <- str_subset(countries, \"FILL_THIS_IN\")\n", "\n", "# your code here\n", "fail() # No Answer - remove if you provide an answer\n", - "print(answer13)" + "print(answer11)" ] }, { @@ -1238,7 +1210,7 @@ "editable": false, "nbgrader": { "cell_type": "code", - "checksum": "6bc39e2eafb55df0ff1274865b3c1495", + "checksum": "f43062757692db28aac16dfb3e4f467f", "grade": true, "grade_id": "cell-9e7786647b5e1c8c", "locked": true, @@ -1250,8 +1222,8 @@ }, "outputs": [], "source": [ - "test_that(\"Question 13\", {\n", - " expect_identical(digest(answer13), \"558af24f1d19b86ffc6b74541aef9f9b\")\n", + "test_that(\"Question 11\", {\n", + " expect_identical(digest(answer11), \"558af24f1d19b86ffc6b74541aef9f9b\")\n", "})" ] }, @@ -1262,9 +1234,9 @@ "editable": false, "nbgrader": { "cell_type": "markdown", - "checksum": "1c8d83bf7991b760b8589dc1a6ee97e3", + "checksum": "acd23335e6d31b84b9566ff2e30368d1", "grade": false, - "grade_id": "cell-03d4f05e17b28896", + "grade_id": "cell-c77eca26543c2632", "locked": true, "schema_version": 3, "solution": false, @@ -1272,44 +1244,11 @@ } }, "source": [ - "## Question 14\n", - "\n", - "Task: what letters are used in the first sentence of the `stringr::sentences` dataset? Make a vector of all the unique letters in the sentence (in lowercase), and store it in a variable called `answer14`. 
Don't forget to remove non-letters, which are either a space or a period.\n", + "## Question 12: Groups\n", "\n", - "Hint:\n", + "You can use parentheses not only to specify precedence, but also to indicate groups that you can refer to later by their group number. \n", "\n", - "```\n", - "answer14 <- sentences[1] %>% \n", - " str_remove_all(\"FILL_THIS_IN\") %>% \n", - " FILL_THIS_IN() %>% \n", - " str_split(FILL_THIS_IN) %>% \n", - " .[[1]] %>% \n", - " unique()\n", - "```" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "deletable": false, - "lines_to_next_cell": 2, - "nbgrader": { - "cell_type": "code", - "checksum": "dbbec46757dc78b206b8b78e5a5e125d", - "grade": false, - "grade_id": "cell-cfeb5f91ce0d581b", - "locked": false, - "schema_version": 3, - "solution": true, - "task": false - } - }, - "outputs": [], - "source": [ - "# your code here\n", - "fail() # No Answer - remove if you provide an answer\n", - "print(answer14)" + "Example using a's and b's: matching all instances of a character sandwiched between two identical characters:" ] }, { @@ -1320,11 +1259,10 @@ "editable": false, "nbgrader": { "cell_type": "code", - "checksum": "e0c9f4b0e2afaef40b185a538232e67e", - "grade": true, - "grade_id": "cell-b3c3272abf48f2ad", + "checksum": "e9004e4f1ecc1b0b514ca82f774a64fc", + "grade": false, + "grade_id": "cell-c7b58e02356d6bdb", "locked": true, - "points": 1, "schema_version": 3, "solution": false, "task": false @@ -1332,10 +1270,8 @@ }, "outputs": [], "source": [ - "\n", - "test_that(\"Question 14\", {\n", - " expect_identical(digest(sort(answer14)), \"d586631001ba6d44947a09efecc4f960\")\n", - "})" + "ab <- c(\"aaa\", \"aab\", \"aba\", \"baa\", \"abb\", \"bab\", \"bba\", \"bbb\")\n", + "str_view(ab, pattern=\"(.)(.)\\\\1\")" ] }, { @@ -1345,9 +1281,9 @@ "editable": false, "nbgrader": { "cell_type": "markdown", - "checksum": "e5c390f13febc20ea9217746e0a655f4", + "checksum": 
"ae6859a1ecd33072ef70cb188383504b", "grade": false, - "grade_id": "cell-df3ce76905844afa", + "grade_id": "cell-c9ccf764fb47c488", "locked": true, "schema_version": 3, "solution": false, @@ -1355,43 +1291,7 @@ } }, "source": [ - "## Question 15: Quantifiers/Repetition\n", - "\n", - "The handy ones are:\n", - "\n", - "- `*` for 0 or more\n", - "- `+` for 1 or more\n", - "- `?` for 0 or 1\n", - "\n", - "See list at https://r4ds.had.co.nz/strings.html#repetition\n", - "\n", - "Find all countries that have any number of \"o\"'s (but at least 1), following an \"r\". Store the resulting vector in a variable named `answer15`." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "deletable": false, - "lines_to_next_cell": 2, - "nbgrader": { - "cell_type": "code", - "checksum": "bdd834120fed1d52fd0e3903f62df757", - "grade": false, - "grade_id": "cell-36d6437e624d72f5", - "locked": false, - "schema_version": 3, - "solution": true, - "task": false - } - }, - "outputs": [], - "source": [ - "# answer15 <- str_subset(countries, \"FILL_THIS_IN\")\n", - "\n", - "# your code here\n", - "fail() # No Answer - remove if you provide an answer\n", - "print(answer15)" + "Example: matching all instances of a character followed by two identical characters:" ] }, { @@ -1402,11 +1302,10 @@ "editable": false, "nbgrader": { "cell_type": "code", - "checksum": "e123db3e9a26d13092730b9c43bc8034", - "grade": true, - "grade_id": "cell-2755ef7fa0a525de", + "checksum": "a01265a3ff1709becff530eda9ba8062", + "grade": false, + "grade_id": "cell-f86af2dbcd3966ea", "locked": true, - "points": 1, "schema_version": 3, "solution": false, "task": false @@ -1414,9 +1313,7 @@ }, "outputs": [], "source": [ - "test_that(\"Question 15\", {\n", - " expect_identical(digest(answer15), \"fa31d9cfe634b9a841cabdf9e31c0eeb\")\n", - "})" + "str_view(ab, pattern=\"(.)(.)\\\\2\")" ] }, { @@ -1426,9 +1323,9 @@ "editable": false, "nbgrader": { "cell_type": "markdown", - "checksum": 
"679f5c28a0d8bfb4ebd9b2b9c1babbe0", + "checksum": "a714696895beafdec6e91973a5db7656", "grade": false, - "grade_id": "cell-0afbed6bfd2bdbc9", + "grade_id": "cell-b55551b61d558e4d", "locked": true, "schema_version": 3, "solution": false, @@ -1436,9 +1333,7 @@ } }, "source": [ - "## Question 16\n", - "\n", - "Find all countries that have either \"o\" or \"e\", twice in a row (with a changeover allowed, such as \"oe\" or \"eo\"). Store the resulting vector in a variable named `answer16`." + "Your task: Find all countries that have the same letter repeated twice (like \"Greece\", which has \"ee\"). Store the result in a vector named `answer12`." ] }, { @@ -1446,12 +1341,11 @@ "execution_count": null, "metadata": { "deletable": false, - "lines_to_next_cell": 2, "nbgrader": { "cell_type": "code", - "checksum": "9d9a06bfd16391b67222f5d640b05ff4", + "checksum": "89622420de7e027ee13cce625e9c3944", "grade": false, - "grade_id": "cell-65af3008d6fb9213", + "grade_id": "cell-7be4b6cfc12e36b5", "locked": false, "schema_version": 3, "solution": true, @@ -1460,11 +1354,12 @@ }, "outputs": [], "source": [ - "# answer16 <- str_subset(countries, \"FILL_THIS_IN\")\n", + "# str_view(countries, \"FILL_THIS_IN\")\n", + "# answer12 <- str_subset(countries, \"FILL_THIS_IN\")\n", "\n", "# your code here\n", "fail() # No Answer - remove if you provide an answer\n", - "print(answer16)" + "print(answer12)" ] }, { @@ -1475,9 +1370,9 @@ "editable": false, "nbgrader": { "cell_type": "code", - "checksum": "cc8608ef87c04418676fec184fd0ec26", + "checksum": "477c63ccafb621870001849fd660a041", "grade": true, - "grade_id": "cell-7e0ce46a9dedac69", + "grade_id": "cell-3250e4c8720f31b7", "locked": true, "points": 1, "schema_version": 3, @@ -1487,8 +1382,8 @@ }, "outputs": [], "source": [ - "test_that(\"Question 16\", {\n", - " expect_identical(digest(answer16), \"f64216702a5b71dfb2b4ae0661d084cb\")\n", + "test_that(\"Question 12\", {\n", + " expect_identical(digest(answer12), 
\"3531a88f6935e86d4ff1054504182875\")\n", "})" ] }, @@ -1499,9 +1394,9 @@ "editable": false, "nbgrader": { "cell_type": "markdown", - "checksum": "2098c84a62682b66a3cc8ba224a9e919", + "checksum": "722c972b9def63562f4c332a788e83c6", "grade": false, - "grade_id": "cell-344db5435da777df", + "grade_id": "cell-79b27633b808bacd", "locked": true, "schema_version": 3, "solution": false, @@ -1509,64 +1404,32 @@ } }, "source": [ - "## Question 17: Position indicators\n", - "\n", - "Use:\n", - "\n", - "- `^` to correspond to the __beginning__ of a string.\n", - "- `$` to correspond to the __end__ of a string.\n", - "\n", - "Find all countries that end in \"land\". Store the result in a vector named `answer17`." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "deletable": false, - "nbgrader": { - "cell_type": "code", - "checksum": "c1b4091ad6dc6a82c79478144809a4b6", - "grade": false, - "grade_id": "cell-35f0de5542c6d0d0", - "locked": false, - "schema_version": 3, - "solution": true, - "task": false - } - }, - "outputs": [], - "source": [ - "# answer17 <- str_subset(countries, \"FILL_THIS_IN\")\n", - "\n", - "# your code here\n", - "fail() # No Answer - remove if you provide an answer\n", - "print(answer17)" + "# Part 3: Stringr with regular expressions" ] }, { - "cell_type": "code", - "execution_count": null, + "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { - "cell_type": "code", - "checksum": "d80a8274200dae0f183d5733e23b4529", - "grade": true, - "grade_id": "cell-7dfee270fe57717c", + "cell_type": "markdown", + "checksum": "3d588fd85beb0354834e4b863fb778ac", + "grade": false, + "grade_id": "cell-8d9dd521604370f0", "locked": true, - "points": 1, "schema_version": 3, "solution": false, "task": false } }, - "outputs": [], "source": [ - "test_that(\"Question 17\", {\n", - " expect_identical(digest(answer17), \"692ee00b59194cea743c5ac3bf2302ae\")\n", - "})" + "Now that you have your bearings with stringr 
and with regular expressions, let's practice putting them together in (semi)realistic scenarios. \n", + "\n", + "Useful links: \n", + "- [Posit Strings cheatsheet](https://github.com/rstudio/cheatsheets/blob/main/strings.pdf) covers stringr on page 1 and regular expressions on page 2.\n", + "- [Regexlearn.com](https://regexlearn.com/) for another regular expressions tutorial (general, not specific to R). \n", + "- [Regexr](https://regexr.com/) is very helpful, especially when constructing more complex regular expressions." ] }, { @@ -1576,9 +1439,9 @@ "editable": false, "nbgrader": { "cell_type": "markdown", - "checksum": "cc691efdd00ecf40411270d85fba433a", + "checksum": "eb8f2712003a0ff19f897df8f32b5b4e", "grade": false, - "grade_id": "cell-da95f6d7bd91757d", + "grade_id": "cell-eaebbc8f707f7c7a", "locked": true, "schema_version": 3, "solution": false, @@ -1586,9 +1449,17 @@ } }, "source": [ - "## Question 18\n", + "## Question 13\n", + "\n", + "Select individuals in the wedding tibble whose first name starts between \"A\" and \"Em\" inclusive, and sort them in alphabetical order by first name. Store the resulting tibble in a variable named `answer13`.\n", + "\n", + "Starter code:\n", "\n", - "Find all countries that start with \"Ca\". Store the result in a vector named `answer18`." 
+ "```\n", + "answer13 <- wedding %>% \n", + " filter(FILL_THIS_IN(name, \"FILL_THIS_IN\")) %>% \n", + " arrange(name)\n", + "```" ] }, { @@ -1598,9 +1469,9 @@ "deletable": false, "nbgrader": { "cell_type": "code", - "checksum": "81a2736ca26104013e33e5f2ea82b4b2", + "checksum": "c1cf1c4a5e44804a670091467e5597f7", "grade": false, - "grade_id": "cell-715328be4a8115ac", + "grade_id": "cell-1e2d94b35a9a495f", "locked": false, "schema_version": 3, "solution": true, @@ -1609,11 +1480,9 @@ }, "outputs": [], "source": [ - "# answer18 <- str_subset(countries, pattern = \"FILL_THIS_IN\")\n", - "\n", "# your code here\n", "fail() # No Answer - remove if you provide an answer\n", - "print(answer18)" + "print(answer13)" ] }, { @@ -1624,9 +1493,9 @@ "editable": false, "nbgrader": { "cell_type": "code", - "checksum": "c1332933ef73eeef1a30bb840df157e5", + "checksum": "549a0e461b1a516682ec454beadff66c", "grade": true, - "grade_id": "cell-e437b5dfe4af255e", + "grade_id": "cell-c9b809048a680c8e", "locked": true, "points": 1, "schema_version": 3, @@ -1636,8 +1505,8 @@ }, "outputs": [], "source": [ - "test_that(\"Question 18\", {\n", - " expect_identical(digest(answer18), \"649cb10a94daec6fe36112c82e659b39\")\n", + "test_that(\"Question 13\", {\n", + " expect_identical(digest(sort(answer13$name)), \"6bbf440d3cca5b2e4b670b48f7bddc14\")\n", "})" ] }, @@ -1648,9 +1517,9 @@ "editable": false, "nbgrader": { "cell_type": "markdown", - "checksum": "355ba9295bc97c98e00b77b6202e33a2", + "checksum": "78ae70c88a73e019cc59132f42052baf", "grade": false, - "grade_id": "cell-f46eabf657a76141", + "grade_id": "cell-659a9254e1ba185a", "locked": true, "schema_version": 3, "solution": false, @@ -1658,9 +1527,9 @@ } }, "source": [ - "## Question 19\n", + "## Question 14\n", "\n", - "Find all countries that only contain letters. Hint for making the regex: the word should start with a letter, continue as a letter, and end as a letter. Store the result in a vector named `answer19`." 
+ "Add a column called `prop_vowels` to the `wedding_fl` tibble that contains the proportion of vowels in each first name. For example, \"Emaan\" has 3 vowels and 5 letters, so the proportion of vowels is 3/5 = 60\\%. Store the resulting tibble in a variable named `answer14`. " ] }, { @@ -1670,9 +1539,9 @@ "deletable": false, "nbgrader": { "cell_type": "code", - "checksum": "53a7367473493ee04fac5036ddefb12c", + "checksum": "68b676d4669af30d64cb2744f41ab3ac", "grade": false, - "grade_id": "cell-ab8eda58f3d1e1d7", + "grade_id": "cell-8b60c0985702eea5", "locked": false, "schema_version": 3, "solution": true, @@ -1681,11 +1550,9 @@ }, "outputs": [], "source": [ - "# answer19 <- str_subset(countries, \"FILL_THIS_IN\")\n", - "\n", "# your code here\n", "fail() # No Answer - remove if you provide an answer\n", - "head(answer19)" + "print(answer14)" ] }, { @@ -1696,9 +1563,9 @@ "editable": false, "nbgrader": { "cell_type": "code", - "checksum": "3a07c7cc6409f5ab87768d9bcd40a0b0", + "checksum": "97dc0c8a28d0e119af91cc907ce90672", "grade": true, - "grade_id": "cell-18188f0c252ada00", + "grade_id": "cell-a6cd23444c761b38", "locked": true, "points": 1, "schema_version": 3, @@ -1708,8 +1575,15 @@ }, "outputs": [], "source": [ - "test_that(\"Question 19\", {\n", - " expect_identical(digest(answer19), \"43b7b3e81361aa5d00367c2ff01ab240\")\n", + "test_that(\"Question 14\", {\n", + " expect_identical(\n", + " answer14 %>% \n", + " select(first, prop_vowels) %>% \n", + " arrange(first) %>%\n", + " mutate(prop_vowels = round(prop_vowels, digits = 3)) %>% \n", + " digest(), \n", + " \"f37d71c09ea60921ddd3959252db3f14\"\n", + " )\n", "})" ] }, @@ -1720,9 +1594,9 @@ "editable": false, "nbgrader": { "cell_type": "markdown", - "checksum": "00f7fbe27e0a7466c5fc7314bada39d3", + "checksum": "34e3d2d4a77cfe181e2f8308be42cffb", "grade": false, - "grade_id": "cell-c77eca26543c2632", + "grade_id": "cell-01cb2b580c312841", "locked": true, "schema_version": 3, "solution": false, @@ -1730,96 
+1604,20 @@ } }, "source": [ - "## Question 20: Groups\n", + "## Question 15\n", "\n", - "You can use parentheses not only to specify precendence, but also to indicate groups that you can refer to later using integers to refer to the group number. \n", + "Task: what letters are used in the first sentence of the `stringr::sentences` dataset? Make a vector of all the unique letters in the sentence (in lowercase), and store it in a variable called `answer15`. Don't forget to remove non-letters, which are either a space or a period.\n", "\n", - "Example using a's and b's: matching all instances of a character sandwiched between the same two characters:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "deletable": false, - "editable": false, - "nbgrader": { - "cell_type": "code", - "checksum": "018432fcb5eb9f1daea35c46e80d4905", - "grade": false, - "grade_id": "cell-c7b58e02356d6bdb", - "locked": true, - "schema_version": 3, - "solution": false, - "task": false - } - }, - "outputs": [], - "source": [ - "ab <- c(\"aaa\", \"aab\", \"aba\", \"baa\", \"abb\", \"bab\", \"bba\", \"bbb\")\n", - "str_view_all(ab, pattern=\"(.)(.)\\\\1\")" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "deletable": false, - "editable": false, - "nbgrader": { - "cell_type": "markdown", - "checksum": "ae6859a1ecd33072ef70cb188383504b", - "grade": false, - "grade_id": "cell-c9ccf764fb47c488", - "locked": true, - "schema_version": 3, - "solution": false, - "task": false - } - }, - "source": [ - "Example: matching all instances of a character followed by two identical characters:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "deletable": false, - "editable": false, - "nbgrader": { - "cell_type": "code", - "checksum": "f7f9996b5c968e252cafb316e34b999b", - "grade": false, - "grade_id": "cell-f86af2dbcd3966ea", - "locked": true, - "schema_version": 3, - "solution": false, - "task": false - } - }, - "outputs": [], - "source": 
[ - "str_view_all(ab, pattern=\"(.)(.)\\\\2\")" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "deletable": false, - "editable": false, - "nbgrader": { - "cell_type": "markdown", - "checksum": "2a05893dd01dad657de7bfe66c7c64e4", - "grade": false, - "grade_id": "cell-b55551b61d558e4d", - "locked": true, - "schema_version": 3, - "solution": false, - "task": false - } - }, - "source": [ - "Your task: Find all countries that have the same letter repeated twice (like \"Greece\", which has \"ee\"). Store the result in a vector named `answer20`." + "Hint:\n", + "\n", + "```\n", + "answer15 <- sentences[1] %>% \n", + " str_remove_all(\"FILL_THIS_IN\") %>% \n", + " FILL_THIS_IN() %>% \n", + " str_split(FILL_THIS_IN) %>% \n", + " .[[1]] %>% \n", + " unique()\n", + "```" ] }, { @@ -1829,9 +1627,9 @@ "deletable": false, "nbgrader": { "cell_type": "code", - "checksum": "6d5d30f02f22c6a589390472457d04c4", + "checksum": "0cd60d94bf66fa327f56254a7b8d0ac3", "grade": false, - "grade_id": "cell-7be4b6cfc12e36b5", + "grade_id": "cell-1c604d815b14c39f", "locked": false, "schema_version": 3, "solution": true, @@ -1840,11 +1638,9 @@ }, "outputs": [], "source": [ - "# answer20 <- str_subset(countries, \"FILL_THIS_IN\")\n", - "\n", "# your code here\n", "fail() # No Answer - remove if you provide an answer\n", - "print(answer20)" + "print(answer15)" ] }, { @@ -1855,9 +1651,9 @@ "editable": false, "nbgrader": { "cell_type": "code", - "checksum": "2f97359b22e2e135fed335b10dd05a83", + "checksum": "ee236652ce0edced5939b912d56c33c6", "grade": true, - "grade_id": "cell-3250e4c8720f31b7", + "grade_id": "cell-a221cbb3ad7d2176", "locked": true, "points": 1, "schema_version": 3, @@ -1867,8 +1663,8 @@ }, "outputs": [], "source": [ - "test_that(\"Question 20\", {\n", - " expect_identical(digest(answer20), \"3531a88f6935e86d4ff1054504182875\")\n", + "test_that(\"Question 15\", {\n", + " expect_identical(digest(sort(answer15)), \"d586631001ba6d44947a09efecc4f960\")\n", "})" ] }, @@ -1879,9 
+1675,9 @@ "editable": false, "nbgrader": { "cell_type": "markdown", - "checksum": "ba70647b1063716a9859ed8bd2fbdd2b", + "checksum": "e9e5d880ba17c0c5ac4521d378046445", "grade": false, - "grade_id": "cell-fc5fdfd2a45ca5a0", + "grade_id": "cell-21cd29c7f490f51b", "locked": true, "schema_version": 3, "solution": false, @@ -1889,34 +1685,9 @@ } }, "source": [ - "## Question 21\n", + "## Question 16\n", "\n", - "Find all countries that end in two vowels (not including \"y\"). Store the result in a vector named `answer21`." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "deletable": false, - "nbgrader": { - "cell_type": "code", - "checksum": "e224c00df40b0beffbaf707bd1cf7b42", - "grade": false, - "grade_id": "cell-a70f4f7292285c77", - "locked": false, - "schema_version": 3, - "solution": true, - "task": false - } - }, - "outputs": [], - "source": [ - "# answer21 <- str_subset(countries, \"FILL_THIS_IN\")\n", - "\n", - "# your code here\n", - "fail() # No Answer - remove if you provide an answer\n", - "print(answer21)" + "Here is a tibble with made-up names and telephone numbers: " ] }, { @@ -1927,11 +1698,10 @@ "editable": false, "nbgrader": { "cell_type": "code", - "checksum": "a004df9358426b6579ea4e0ec64b0fc9", - "grade": true, - "grade_id": "cell-9d6f26cfc4615bd7", + "checksum": "53495b064006c15cb28665740de8e002", + "grade": false, + "grade_id": "cell-74076a735448e3a9", "locked": true, - "points": 1, "schema_version": 3, "solution": false, "task": false @@ -1939,9 +1709,15 @@ }, "outputs": [], "source": [ - "test_that(\"Question 21\", {\n", - " expect_identical(digest(answer21), \"627ee4e6c6f8977349ed20734a52c1c0\")\n", - "})" + "contact <- tibble(name = c(\"Kayden Lavoie\", \n", + " \"Ethan Fortin\", \n", + " \"Emma Davis\", \n", + " \"Aliyah Chan\"), \n", + " phone = c(\"604-971-9949\", \n", + " \"6046182277\", \n", + " \"(778)881-5831\", \n", + " \"604-544-2554\"))\n", + "print(contact)" ] }, { @@ -1951,9 +1727,9 @@ "editable": 
false, "nbgrader": { "cell_type": "markdown", - "checksum": "8b3189cd3c01a7dc0fba1fe4c58e0f95", + "checksum": "bdc0870b86e0f7d741736549d2ff3a7c", "grade": false, - "grade_id": "cell-106aa3db7d84ad47", + "grade_id": "cell-69fc1804815e256b", "locked": true, "schema_version": 3, "solution": false, @@ -1961,9 +1737,9 @@ } }, "source": [ - "## Question 22\n", + "Unfortunately, these four people have entered in their phone numbers in different formats. Let's fix that, in the spirit of routine data cleaning. Change the `phone` column to have all four phone numbers match Aliyah Chan's format. Store the resulting tibble in a variable called `answer16`. \n", "\n", - "Find all countries that start with two non-vowels (don't count \"y\" as a vowel). Store the result in a vector named `answer22`." + "*Hint*: `mutate()`, `str_remove_all()`, `separate()`, and `unite()`. " ] }, { @@ -1973,9 +1749,9 @@ "deletable": false, "nbgrader": { "cell_type": "code", - "checksum": "4a39f6ab09a629c4f0e5500dd3803edc", + "checksum": "b715e1ab4567216f4a69104505824706", "grade": false, - "grade_id": "cell-75725c3de590a7d6", + "grade_id": "cell-753ecdad7449dc0d", "locked": false, "schema_version": 3, "solution": true, @@ -1984,11 +1760,9 @@ }, "outputs": [], "source": [ - "# answer22 <- str_subset(countries, \"FILL_THIS_IN\")\n", - "\n", "# your code here\n", "fail() # No Answer - remove if you provide an answer\n", - "print(answer22)" + "print(answer16)" ] }, { @@ -1999,9 +1773,9 @@ "editable": false, "nbgrader": { "cell_type": "code", - "checksum": "4dd656b9c4daf22bc3d071fa6751f77c", + "checksum": "791588fcdda0d94ef176b900a57ff459", "grade": true, - "grade_id": "cell-91b6d34ac7348626", + "grade_id": "cell-b9d9087649c148b5", "locked": true, "points": 1, "schema_version": 3, @@ -2011,33 +1785,11 @@ }, "outputs": [], "source": [ - "test_that(\"Question 22\", {\n", - " expect_identical(digest(answer22), \"2256308acab2196cc90529e7410881e0\")\n", + "test_that(\"Question 16\", {\n", + " 
expect_identical(answer16 %>% arrange(name) %>% digest(), \n", + " \"83e461c86c45469beecbbd011c6b11e6\")\n", "})" ] - }, - { - "cell_type": "markdown", - "metadata": { - "deletable": false, - "editable": false, - "nbgrader": { - "cell_type": "markdown", - "checksum": "4ac896b9f00f3a902338ef7c3d176f86", - "grade": false, - "grade_id": "cell-1e76794d76776fd1", - "locked": true, - "schema_version": 3, - "solution": false, - "task": false - } - }, - "source": [ - "## More Practice\n", - "\n", - "Want more interactive practice? Check out this [regex crossword](https://regexcrossword.com/challenges/beginner/puzzles/1).\n", - "\n" - ] } ], "metadata": { @@ -2057,7 +1809,7 @@ "mimetype": "text/x-r-source", "name": "R", "pygments_lexer": "r", - "version": "4.1.2" + "version": "4.3.1" } }, "nbformat": 4, diff --git a/content/worksheets/worksheet_b02.ipynb.zip b/content/worksheets/worksheet_b02.ipynb.zip index b2743456..50c19d96 100644 Binary files a/content/worksheets/worksheet_b02.ipynb.zip and b/content/worksheets/worksheet_b02.ipynb.zip differ