Skip to content

Latest commit

 

History

History
263 lines (188 loc) · 7.77 KB

d1l5.ggplot2.md

File metadata and controls

263 lines (188 loc) · 7.77 KB
layout title author changes minutes
topic
ggplot2
changed the examples and the introduction
30

Learning Objectives

  • learn to plot data using ggplot2

Using ggplot2: graphs in layers


R has very simple and valid set of commands for making plots, however more recently a plotting system named ggplot have been developed to improve graphical representations. Have a look at the [ggplot2 website]](http://ggplot2.org/) for more details and instruction for download and install. ggplot2 was developed by Hadley Wickham (ggplot2, Use R, DOI 10.1007/978-0-387-98141_1, © Springer Science+Business Media, LLC 2009) according to the grammar of graphics (Wilkinson, 2005):

ggplot2 is designed to work in a layered fashion, starting with a layer showing the raw data then adding layers of annotation and statistical summaries. [..]

Using ggplot2 has many advantages that will be clear through the practical part of this tutorial.

What Is The Grammar Of Graphics?

The basic idea: independently specify plot building blocks and combine them to create just about any kind of graphical display you want. Building blocks of a graph include:

  • data
  • aesthetic mapping
  • geometric object
  • statistical transformations
  • scales
  • coordinate system
  • position adjustments
  • faceting

The structure of a ggplot

The ggplot() function is used to initialize the basic graph structure, then we add to it. The structure of a ggplot looks like this:

  ggplot(data = <default data set>,
         aes(x = <default x axis variable>,
             y = <default y axis variable>,
             ... <other default aesthetic mappings>),
         ... <other plot defaults>) +

         geom_<geom type>(aes(size = <size variable for this geom>,
                        ... <other aesthetic mappings>),
                    data = <data for this point geom>,
                    stat = <statistic string or function>,
                    position = <position string or function>,
                    color = <"fixed color specification">,
                    <other arguments, possibly passed to the _stat_ function) +

    scale_<aesthetic>_<type>(name = <"scale label">,
                       breaks = <where to put tick marks>,
                       labels = <labels for tick marks>,
                       ... <other options for the scale>) +

    theme(plot.background = element_rect(fill = "gray"),
          ... <other theme elements>)

Don't be afraid, you will understand this by the end of the tutorial! The basic idea is that you specify different parts of the plot, and add them together using the + operator.

Example Data: iris

We will use a data set that comes with the basic version of R named iris. Check in the data set page info bout rock. To visualize it type the data set name on your R prompt and have a look at the table:

  iris
  head(iris[1:4, ])  # *first four rows, all columns*

Also to have an idea of the data.frame structure type:

str(iris)
'data.frame':	150 obs. of  5 variables:
$ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
$ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
$ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
$ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
$ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...

As you see the last column is the species. To find out how many different species there are:

levels(iris$Species)
[1] "setosa"     "versicolor" "virginica"

ggplot2 VS Base Graphics

Compared to base graphics, ggplot2

  • is more verbose for simple / canned graphics
  • is less verbose for complex / custom graphics
  • does not have methods (data should always be in a data.frame)
  • uses a different system for adding plot elements

ggplot2 VS Base for simple graphs

Base graphics histogram example:

  hist(iris$Sepal.Length)

ggplot2 histogram example:

  library(ggplot2)
  ggplot(iris, aes(x =Sepal.Length )) +
    geom_histogram()

Base wins!

ggplot2 Base graphics VS ggplot for more complex graphs:

Base colored scatter plot example:

  plot(Sepal.Length ~ Sepal.Width, data=subset(iris, Species=="setosa"))
  points(Sepal.Length ~ Sepal.Width, data=subset(iris, Species=="versicolor"), col="red")
  legend(3.8, 4.7,
         c("setosa", "versicolor"), title="Species",
         col=c("black", "red"),
         pch=c(1, 1))

ggplot2 colored scatter plot example:

  ggplot(subset(iris, Species %in% c("setosa", "versicolor")),
         aes(x=Sepal.Length, y=Sepal.Width, color=Species))+
    geom_point()

ggplot2 wins!

If we don't have to subset it can be even easier:

  ggplot(iris,
         aes(x=Sepal.Length, y=Sepal.Width, color=Species))+
    geom_point()

Geometric Objects And Aesthetics

Aesthetic Mapping

In ggplot land aesthetic means "something you can see". Examples include:

  • position (i.e., on the x and y axes)
  • color ("outside" color)
  • fill ("inside" color)
  • shape (of points)
  • linetype
  • size

Each type of geom accepts only a subset of all aesthetics--refer to the geom help pages to see what mappings each geom accepts. Aesthetic mappings are set with the aes() function.

Geometic Objects (geom)

Geometric objects are the actual marks we put on a plot. Examples include:

  • points (geom_point, for scatter plots, dot plots, etc)
  • lines (geom_line, for time series, trend lines, etc)
  • boxplot (geom_boxplot, for, well, boxplots!)

A plot must have at least one geom; there is no upper limit. You can add a geom to a plot using the + operator

You can get a list of available geometric objects using the code below:

  help.search("geom_", package = "ggplot2")

or simply type geom_<tab> in any good R IDE (such as Rstudio or ESS) to see a list of functions starting with geom_.

Points (Scatterplot)

Now that we know about geometric objects and aesthetic mapping, we can make a ggplot. geom_point requires mappings for x and y, all others are optional.

  ggplot(iris,
         aes(y = Petal.Length , x = Sepal.Length )) +
    geom_point()

Now try also

  ggplot(iris,
         aes(y = Petal.Length , x = Sepal.Length )) +
    geom_point( aes(color=Species) )

We can also assign the plot to an object

myplot <- ggplot(iris, aes(y = Petal.Length , x = Sepal.Length ))

Text (Label Points)

Each geom accepts a particualar set of mappings--for example geom_text() accepts a labels mapping.

  myplot  +
    geom_text(aes(label=Species), size = 3)

Also try this:

myplot  + geom_text(aes(label=Species, color=Species), size = 3)

Aesthetic Mapping VS Assignment

Note that variables are mapped to aesthetics with the aes() function, while fixed aesthetics are set outside the aes() call. This sometimes leads to confusion, as in this example:

  myplot +
    geom_point(aes(size = 2),# incorrect! 2 is not a variable
               color="red") # this is fine -- all points red

Mapping Variables To Other Aesthetics

Other aesthetics are mapped in the same way as x and y in the previous example.

 myplot +
    geom_point(aes(color=Species, shape = Species))