Some comments after watching the trial run #9

Open
15 of 20 tasks
ismayc opened this issue Apr 3, 2018 · 2 comments

Comments

ismayc commented Apr 3, 2018

First, let me say how great it is that you've shared this for evaluation and comments from everyone in the company. Bold move, but it's awesome how open to feedback you are on this. I hope you find this helpful:

  • I usually say that the library() function loads the package from the library of all the R packages installed.
  • It might be a little confusing to call it titanic since there is a built-in Titanic dataset in R. Maybe passengers_on_titanic? And show the power of tab complete.
  • I think instead of hopping in immediately, I'd frame it as the important questions you'd like to be able to solve and then show how dplyr can help you do so with simplicity.
  • It's worth mentioning that you are on a Mac so the shortcut is Cmd + Shift but it is Ctrl + Shift on Windows.
  • I like to mention that %>% can be read "and then" so that you can really read dplyr code as a sentence.
  • Make sure to note that filter() chooses only the rows that match that condition. So there are now 577 rows out of the 891 rows. It doesn't reduce the number of columns.
  • I really like the typos and walking through how to fix the errors that arise. I hope you also do this in the live session, both intentionally and unintentionally 😄.
  • I'd reiterate that mutate() can be used to create new columns but also to modify existing columns, much as a mutation changes something that already exists in biology. It's not exactly right but a nice way to provide context for the verb choice.
  • ?ifelse is really helpful in that it tells you that yes is the second argument and no is the third argument. That's how I remember. It might also be better to use if_else instead of ifelse for consistency's sake.
  • More technically, I like to think about the aes() function as a mapping of the aesthetics of the plot to the variables in the data. I've found this helpful for beginners to be able to read their code off as well just like with the %>%. Anytime someone wants to do a mapping of one of the variables to a plot's aesthetics it has to be wrapped inside of the aes() function. Students frequently will wrap things like color = "black" in aes() as well and this usually comes about because they think everything has to go in aes().
  • I'd just run ggplot(titanic, aes(x = Sex)) so that viewers see the blank canvas that has been created and then do a +. Worth noting that the code won't run if you put + to begin a line maybe too?
  • You can also add position = "fill" to geom_bar() to show the percentages instead of raw counts. Not sure if that is what you are after here though.
  • I'd also pause to carefully read over the ggplot() code to discuss how it can be read in sentence form just like dplyr code can. "We take the data as titanic and we map Age to the x axis and Fare to the y axis, adding points on as the layer of the plot." Telling the story helps beginners put this all together.
  • Being able to map color to a variable with legend automatically generated is what makes ggplot particularly awesome. You could also show that color = "black" by default in geom_bar() but you can set it otherwise to map to values of a variable.
  • I'd be explicit in writing out color instead of col as well since beginners frequently read col as "column" and it's a point of confusion.
  • Technically the aesthetic is alpha that corresponds to transparency.
  • I know that you don't have a ton of time here, but when you add aes() into ggplot() you are assigning aesthetic mappings at a global level for all layers to follow. Students are frequently amazed to learn that the mapping argument exists in any of the geom_* functions, so you can specify the mappings for each layer individually instead of at the global level across all layers.
  • The ~ is particularly useful when you want to create multiple plots across multiple variables, y ~ x for instance, to create a 2D grid.
  • summarize() also works because Hadley is extremely friendly to everyone 😃 .
  • Love that you spent a brief moment to go over what the median is.
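
To make a few of the points above concrete, here is a rough sketch. It assumes dplyr and ggplot2 are installed, and it uses a small made-up data frame (named passengers_on_titanic, as suggested above) rather than the real data, so the row counts will not match the 577 of 891 from the video.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy rows for illustration only, NOT the real Titanic data
passengers_on_titanic <- data.frame(
  Sex      = c("male", "female", "male", "female", "male"),
  Age      = c(22, 38, 35, 27, 54),
  Fare     = c(7.25, 71.28, 8.05, 11.13, 51.86),
  Survived = c(0, 1, 0, 1, 0),
  stringsAsFactors = FALSE
)

# Read %>% as "and then": take the passengers, and then keep the male rows
males <- passengers_on_titanic %>%
  filter(Sex == "male")

nrow(males)                                  # fewer rows than before
ncol(males) == ncol(passengers_on_titanic)   # same number of columns

# mutate() adds (or modifies) a column; if_else(condition, true, false)
labelled <- passengers_on_titanic %>%
  mutate(Outcome = if_else(Survived == 1, "survived", "died"))

# ggplot() alone draws the blank canvas; layers are added with a trailing +
canvas <- ggplot(passengers_on_titanic, aes(x = Age, y = Fare))

# Mapping a variable to an aesthetic goes inside aes() and generates a legend
mapped <- canvas + geom_point(aes(color = Sex))

# Setting a constant goes outside aes(); alpha controls transparency
fixed <- canvas + geom_point(color = "black", alpha = 0.5)
```

Printing canvas, mapped, and fixed one after another shows the progression from empty axes to a layer colored by Sex to a layer with constant color and transparency.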

ismayc commented Apr 3, 2018

A few more:

  • If you have time I'd show what happens when you do filter(Sex = "male") and make a mention that there is a tidyverse error guide too.
  • Love your use of multiple filter()s chained together and then piped on to group_by(). Really slick and intuitive!
  • Maybe make a note that meanFare and Survivors are the names of the variables in the newly created aggregated data set.
  • n() returns the total number in each group because of the use of group_by(). It might be worth introducing n() outside of this since it's a little tricky for beginners to get. If you want to be really bold, you can use mean(Survived) to show that's the same as sum(Survived)/n() because Survived is 1 or 0, but I'll leave that up to your discretion.
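
A minimal sketch of the last two ideas, again assuming dplyr is installed and using a made-up data frame rather than the real data. The column names meanFare and Survivors follow the ones mentioned above; passengers, rate_from_sum, and rate_from_mean are names I've invented for illustration.

```r
library(dplyr)

# Hypothetical toy rows for illustration only, NOT the real Titanic data
passengers_on_titanic <- data.frame(
  Sex      = c("male", "female", "male", "female", "male", "female"),
  Fare     = c(7.25, 71.28, 8.05, 11.13, 51.86, 26.55),
  Survived = c(0, 1, 1, 1, 0, 0),
  stringsAsFactors = FALSE
)

# A single = inside filter() is treated as a named argument, so dplyr stops
# with an error; tryCatch() captures the message instead of halting the script
filter_error <- tryCatch(
  filter(passengers_on_titanic, Sex = "male"),
  error = function(e) conditionMessage(e)
)

# n() outside group_by() counts every row in the data frame
passengers_on_titanic %>%
  summarize(passengers = n())

# After group_by(), n() counts the rows within each group instead
by_sex <- passengers_on_titanic %>%
  group_by(Sex) %>%
  summarize(
    meanFare       = mean(Fare),
    Survivors      = sum(Survived),
    passengers     = n(),
    rate_from_sum  = sum(Survived) / n(),
    rate_from_mean = mean(Survived)   # same value, since Survived is 1 or 0
  )

by_sex
```

The names on the left of each = in summarize() become the columns of the aggregated result, and the two rate columns agree row by row.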

Awesome stuff, @hugobowne! Can't wait for the reception of this.

@hugobowne

thanks, @ismayc, super helpful. I've incorporated a bunch of these into the Rmd and will attempt to remember the rest. wonderful!
