layout | title | author | changes | minutes |
---|---|---|---|---|
topic |
ggplot2 |
adapted from Ista Zahn https://github.com/izahn/workshops/edit/master/R/Rgraphics/Rgraphics.md |
changed the examples and the introduction |
30 |
- learn to plot data using ggplot2
R has very simple and valid set of commands for making plots, however more recently a plotting system named ggplot have been developed to improve graphical representations. Have a look at the [ggplot2 website]](http://ggplot2.org/) for more details and instruction for download and install. ggplot2 was developed by Hadley Wickham (ggplot2, Use R, DOI 10.1007/978-0-387-98141_1, © Springer Science+Business Media, LLC 2009) according to the grammar of graphics
(Wilkinson, 2005):
ggplot2 is designed to work in a layered fashion, starting with a layer showing the raw data then adding layers of annotation and statistical summaries. [..]
Using ggplot2 has many advantages that will be clear through the practical part of this tutorial.
The basic idea: independently specify plot building blocks and combine them to create just about any kind of graphical display you want. Building blocks of a graph include:
- data
- aesthetic mapping
- geometric object
- statistical transformations
- scales
- coordinate system
- position adjustments
- faceting
The ggplot()
function is used to initialize the basic graph structure, then we add to it. The structure of a ggplot looks like this:
ggplot(data = <default data set>,
aes(x = <default x axis variable>,
y = <default y axis variable>,
... <other default aesthetic mappings>),
... <other plot defaults>) +
geom_<geom type>(aes(size = <size variable for this geom>,
... <other aesthetic mappings>),
data = <data for this point geom>,
stat = <statistic string or function>,
position = <position string or function>,
color = <"fixed color specification">,
<other arguments, possibly passed to the _stat_ function) +
scale_<aesthetic>_<type>(name = <"scale label">,
breaks = <where to put tick marks>,
labels = <labels for tick marks>,
... <other options for the scale>) +
theme(plot.background = element_rect(fill = "gray"),
... <other theme elements>)
Don't be afraid, you will understand this by the end of the tutorial! The basic idea is that you specify different parts of the plot, and add them together using the +
operator.
We will use a data set that comes with the basic version of R named iris
. Check in the data set page info bout rock. To visualize it type the data set name on your R prompt and have a look at the table:
iris
head(iris[1:4, ]) # *first four rows, all columns*
Also to have an idea of the data.frame structure type:
str(iris)
'data.frame': 150 obs. of 5 variables:
$ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
$ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
$ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
$ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
$ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
As you see the last column is the species. To find out how many different species there are:
levels(iris$Species)
[1] "setosa" "versicolor" "virginica"
Compared to base graphics, ggplot2
- is more verbose for simple / canned graphics
- is less verbose for complex / custom graphics
- does not have methods (data should always be in a
data.frame
) - uses a different system for adding plot elements
Base graphics histogram example:
hist(iris$Sepal.Length)
ggplot2
histogram example:
library(ggplot2)
ggplot(iris, aes(x =Sepal.Length )) +
geom_histogram()
Base wins!
Base colored scatter plot example:
plot(Sepal.Length ~ Sepal.Width, data=subset(iris, Species=="setosa"))
points(Sepal.Length ~ Sepal.Width, data=subset(iris, Species=="versicolor"), col="red")
legend(3.8, 4.7,
c("setosa", "versicolor"), title="Species",
col=c("black", "red"),
pch=c(1, 1))
ggplot2
colored scatter plot example:
ggplot(subset(iris, Species %in% c("setosa", "versicolor")),
aes(x=Sepal.Length, y=Sepal.Width, color=Species))+
geom_point()
ggplot2
wins!
If we don't have to subset it can be even easier:
ggplot(iris,
aes(x=Sepal.Length, y=Sepal.Width, color=Species))+
geom_point()
In ggplot land aesthetic means "something you can see". Examples include:
- position (i.e., on the x and y axes)
- color ("outside" color)
- fill ("inside" color)
- shape (of points)
- linetype
- size
Each type of geom accepts only a subset of all aesthetics--refer to the geom help pages to see what mappings each geom accepts. Aesthetic mappings are set with the aes()
function.
Geometric objects are the actual marks we put on a plot. Examples include:
- points (
geom_point
, for scatter plots, dot plots, etc) - lines (
geom_line
, for time series, trend lines, etc) - boxplot (
geom_boxplot
, for, well, boxplots!)
A plot must have at least one geom; there is no upper limit. You can add a geom to a plot using the +
operator
You can get a list of available geometric objects using the code below:
help.search("geom_", package = "ggplot2")
or simply type geom_<tab>
in any good R IDE (such as Rstudio or ESS) to see a list of functions starting with geom_
.
Now that we know about geometric objects and aesthetic mapping, we can make a ggplot. geom_point
requires mappings for x and y, all others are optional.
ggplot(iris,
aes(y = Petal.Length , x = Sepal.Length )) +
geom_point()
Now try also
ggplot(iris,
aes(y = Petal.Length , x = Sepal.Length )) +
geom_point( aes(color=Species) )
We can also assign the plot to an object
myplot <- ggplot(iris, aes(y = Petal.Length , x = Sepal.Length ))
Each geom
accepts a particualar set of mappings--for example geom_text()
accepts a labels
mapping.
myplot +
geom_text(aes(label=Species), size = 3)
Also try this:
myplot + geom_text(aes(label=Species, color=Species), size = 3)
Note that variables are mapped to aesthetics with the aes()
function, while fixed aesthetics are set outside the aes()
call. This sometimes leads to confusion, as in this example:
myplot +
geom_point(aes(size = 2),# incorrect! 2 is not a variable
color="red") # this is fine -- all points red
Other aesthetics are mapped in the same way as x and y in the previous example.
myplot +
geom_point(aes(color=Species, shape = Species))