First, let me say how great it is that you've shared this for evaluation and comments from everyone in the company. Bold move, and it's awesome how open to feedback you are on this. I hope you find this helpful:
I usually say that the library() function loads the package FROM the library of all the R packages installed.
It might be a little confusing to call it titanic since there is a built-in Titanic dataset in R. Maybe passengers_on_titanic? And show the power of tab complete.
I think instead of hopping in immediately, I'd frame it as the important questions you'd like to be able to solve and then show how dplyr can help you do so with simplicity.
It's worth mentioning that you are on a Mac so the shortcut is Cmd + Shift but it is Ctrl + Shift on Windows.
I like to mention that %>% can be read "and then" so that you can really read dplyr code as a sentence.
Make sure to note that filter() chooses only the rows that match that condition. So there are now 577 rows out of the 891 rows. It doesn't reduce the number of columns.
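Here's a rough sketch of how I'd show both points at once. The passengers data frame below is a made-up toy stand-in (not the real data), so the row counts are tiny:

```r
library(dplyr)

# Toy stand-in for the Titanic passenger data (values are made up).
passengers <- tibble(
  Sex      = c("male", "female", "male", "female", "male"),
  Age      = c(22, 38, 35, 27, 54),
  Fare     = c(7.25, 71.28, 8.05, 11.13, 51.86),
  Survived = c(0, 1, 0, 1, 0)
)

# Read aloud: take passengers, and then keep the rows where Sex is "male".
males <- passengers %>%
  filter(Sex == "male")

nrow(males)  # 3: only the matching rows are kept
ncol(males)  # 4: the columns are untouched
```

Printing nrow() and ncol() before and after the filter() makes the "rows, not columns" point hard to miss.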
I really like the typos and walking through how to fix the errors that arise. I hope you also do this in the live session, both intentionally and unintentionally 😄.
I'd reiterate that mutate() can be used to create new columns but also to modify existing columns, much as a mutation changes something that already exists in the biological sense. It's not exactly right, but it's a nice way to provide context for the verb choice.
?ifelse is really helpful in that it tells you that yes is the second argument and no is the third argument. That's how I remember. It might also be better to use if_else instead of ifelse for consistency's sake.
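Something like this is what I have in mind, again on a made-up toy data frame. The FareBand column and the 50 cutoff are just placeholders I invented for illustration:

```r
library(dplyr)

# Toy data (values are made up); one Fare is missing on purpose.
passengers <- tibble(
  Sex  = c("male", "female", "male"),
  Fare = c(7.25, 71.28, NA)
)

passengers <- passengers %>%
  mutate(
    # New column: condition first, then the "true" value, then the "false" value.
    FareBand = if_else(Fare > 50, "high", "low"),
    # Existing column: the same verb overwrites Sex in place.
    Sex = if_else(Sex == "male", "M", "F")
  )
```

if_else() from dplyr is stricter than base ifelse() about the true and false values having compatible types, which tends to surface mistakes earlier. A missing condition stays missing in the result.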
More technically, I like to think about the aes() function as a mapping of the aesthetics of the plot to the variables in the data. I've found this helpful for beginners to be able to read their code off as well just like with the %>%. Anytime someone wants to do a mapping of one of the variables to a plot's aesthetics it has to be wrapped inside of the aes() function. Students frequently will wrap things like color = "black" in aes() as well and this usually comes about because they think everything has to go in aes().
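A minimal sketch of the mapping-versus-setting distinction, using a made-up toy data frame:

```r
library(ggplot2)

# Toy data (values are made up).
passengers <- data.frame(
  Sex  = c("male", "female", "male", "female"),
  Age  = c(22, 38, 35, 27),
  Fare = c(7.25, 71.28, 8.05, 11.13)
)

# Mapping: color depends on a variable, so it goes inside aes().
mapped <- ggplot(passengers, aes(x = Age, y = Fare)) +
  geom_point(aes(color = Sex))

# Setting: color is one fixed value, so it stays outside aes().
fixed <- ggplot(passengers, aes(x = Age, y = Fare)) +
  geom_point(color = "black")
```

Showing the two side by side, and then what happens when "black" is wrapped in aes() by mistake (a legend with a single entry labelled black), usually makes it click.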
I'd just run ggplot(titanic, aes(x = Sex)) so that viewers see the blank canvas that has been created and then do a +. Worth noting that the code won't run if you put + to begin a line maybe too?
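Roughly like this, with a made-up toy data frame standing in for the real one:

```r
library(ggplot2)

# Toy data (values are made up).
passengers <- data.frame(
  Sex = c("male", "female", "male", "male", "female")
)

# Step 1: only the canvas. The x axis is set up but nothing is drawn yet.
canvas <- ggplot(passengers, aes(x = Sex))

# Step 2: add a layer. The + has to END the line it continues.
bars <- canvas +
  geom_bar()

# If the + started the second line instead, R would treat the first line
# as a complete expression and then fail on the line beginning with +.
```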
You can also add position = "fill" to geom_bar() to show proportions instead of raw counts. Not sure if that is what you are after here though.
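For reference, a small sketch on made-up toy data. I'm assuming a fill variable is mapped as well, since that's where position = "fill" earns its keep:

```r
library(ggplot2)

# Toy data (values are made up).
passengers <- data.frame(
  Sex      = c("male", "female", "male", "female", "male", "female"),
  Survived = c(0, 1, 1, 0, 0, 1)
)

# Each bar is stretched to the same height, so the segments show
# the share of each Survived value within each Sex.
share_plot <- ggplot(passengers, aes(x = Sex, fill = factor(Survived))) +
  geom_bar(position = "fill")
```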
I'd also pause to carefully read over the ggplot() code to discuss how it can be read in sentence form just like dplyr code can. "We take the data as titanic and we map Age to the x axis and Fare to the y axis, adding points on as the layer of the plot." Telling the story helps beginners put this all together.
Being able to map color to a variable, with the legend generated automatically, is what makes ggplot particularly awesome. You could also show that color can be set to a fixed value such as "black" in geom_bar(), or instead be mapped to the values of a variable.
I'd be explicit in writing out color instead of col as well since beginners frequently read col as "column" and it's a point of confusion.
Technically, the aesthetic that corresponds to transparency is alpha.
I know that you don't have a ton of time here, but when you add aes() into ggplot() you are assigning aesthetic mappings on a global scale for all layers to follow. Students are frequently amazed to learn that the mapping argument exists in every one of the geom_* functions, so you can specify mappings for each layer individually instead of at the global level across all layers.
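A quick sketch of global versus per-layer mappings on made-up toy data. The choice of geom_smooth() as the second layer is mine, just to have something that inherits the global mapping:

```r
library(ggplot2)

# Toy data (values are made up).
passengers <- data.frame(
  Sex  = c("male", "female", "male", "female", "male", "female"),
  Age  = c(22, 38, 35, 27, 54, 19),
  Fare = c(7.25, 71.28, 8.05, 11.13, 51.86, 30.00)
)

p <- ggplot(passengers, aes(x = Age, y = Fare)) +       # global: x and y
  geom_point(mapping = aes(color = Sex)) +              # this layer only: color
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)  # inherits x and y only
```

Because color is mapped only in geom_point(), the points are colored by Sex while the fitted line is drawn once for all passengers. Moving color = Sex up into ggplot() would give one line per group instead.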
The ~ is particularly useful when you want to create multiple plots across multiple variables: y ~ x, for instance, creates a 2D grid.
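A sketch of the two-variable case, assuming the ~ here is the faceting formula and using facet_grid() on made-up toy data:

```r
library(ggplot2)

# Toy data (values are made up); every Sex/Survived combination appears.
passengers <- data.frame(
  Sex      = c("male", "female", "male", "female", "male", "female"),
  Survived = c(0, 1, 1, 0, 0, 1),
  Age      = c(22, 38, 35, 27, 54, 19),
  Fare     = c(7.25, 71.28, 8.05, 11.13, 51.86, 30.00)
)

# Rows of the grid come from the left of ~, columns from the right.
grid_plot <- ggplot(passengers, aes(x = Age, y = Fare)) +
  geom_point() +
  facet_grid(Survived ~ Sex)
```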
summarize() also works because Hadley is extremely friendly to everyone 😃 .
Love that you spent a brief moment to go over what the median is.
If you have time I'd show what happens when you do filter(Sex = "male") and make a mention that there is a tidyverse error guide too.
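For the live session, something like this would let you show the error without stopping the script. The toy data is made up, and I'm capturing the message rather than quoting it, since the exact wording depends on the dplyr version:

```r
library(dplyr)

# Toy data (values are made up).
passengers <- tibble(
  Sex      = c("male", "female", "male"),
  Survived = c(0, 1, 0)
)

# A single = makes Sex look like a named argument, so filter() refuses it.
message_text <- tryCatch(
  passengers %>% filter(Sex = "male"),
  error = function(e) conditionMessage(e)
)

# The comparison operator == is what was intended.
males <- passengers %>% filter(Sex == "male")
```

Printing message_text shows what dplyr reports for the single = version.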
Love your use of multiple filter()s chained together with "and then" and followed by group_by(). Really slick and intuitive!
Maybe make a note that meanFare and Survivors are the names of the variables in the newly created aggregated data set.
n() returns the total number in each group because of the use of group_by(). It might be worth introducing n() outside of this since it's a little tricky for beginners to get. If you want to be really bold, you can use mean(Survived) to show that's the same as sum(Survived)/n() because Survived is 1 or 0, but I'll leave that up to your discretion.
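Here's a small sketch of that equivalence on made-up toy data. meanFare and Survivors follow your naming, while Passengers, RateFromSum and RateFromMean are names I invented for the example:

```r
library(dplyr)

# Toy data (values are made up).
passengers <- tibble(
  Sex      = c("male", "female", "male", "female", "male"),
  Fare     = c(7.25, 71.28, 8.05, 11.13, 51.86),
  Survived = c(0, 1, 1, 0, 0)
)

by_sex <- passengers %>%
  group_by(Sex) %>%
  summarise(
    meanFare     = mean(Fare),           # column names in the new, aggregated data
    Survivors    = sum(Survived),
    Passengers   = n(),                  # rows in the current group
    RateFromSum  = sum(Survived) / n(),
    RateFromMean = mean(Survived)        # same value, because Survived is 0 or 1
  )
```

The result has one row per Sex, and the two rate columns agree.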
Awesome stuff, @hugobowne! Can't wait for the reception of this.