-
Notifications
You must be signed in to change notification settings - Fork 1
/
Part2-Fig2Recreation.R
207 lines (207 loc) · 8.76 KB
/
Part2-Fig2Recreation.R
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
#' ---
#' title: Figure 2
#' output: html_document
#' ---
#'
#' Reading in Data and Example Analysis
#' ------------------------
#'
#' In this exercise, we will learn how to read in data as an object in R, find and load
#' packages into our library, and then re-create Figure 2 from the 2017 paper by Morini et al.
#' paper "Control of white mold of dry bean and residual activity of fungicides applied
#' by chemigation."
#'
#' We can use the `read.table()` function to read these data in to R. It's
#' important to remember that while in R, these data are simply a copy kept *in
#' memory*, not on the disk, so we don't have to worry too much about
#' accidentally deleting the data :).
#'
#' So, how do we actually USE the `read.table()` function? A good first step to
#' figuring out how you can use a function is to look at it's help page. The way
#' you can do that is by typing either help("function_name") or ?function_name.
#'
#'
#'
# Type ?read.table and answer these three questions:
#
# 1. What does it do? (Description)
# 2. What are the first three arguments and their defaults? (Usage/Arguments)
# 3. What does it return? (Value)
#'
#'
#' In order to read our data into R, we will need to provide three things:
#'
#' 1. The path to the data set : Results2017_combined-class.csv
#' 2. If the first row are column names : yes
#' 3. The separator for each cell in the data : comma
#'
#' Now that we have these elements, we can read our data into a variable, which
#' we can call "res" because it is short for "results". Once we do this, we should
#' check the dimensions to make sure that we have all of the data.
#'
#' Now we can read in the data and inspect to make sure it is what we expect.
res <- read.csv("Results2017_combined-class.csv", head = TRUE, stringsAsFactors = FALSE)
head(res)
dim(res)
#' This shows the data in columns named DAI, Number, Your.name, Area, and
#' Treatment.
#'
#'
# Thinking before doing: Your data and the analysis
# --------------------------------------------
#
# The example data presented here were collected by students in the 2017
# PLPT802 class and should represent the same data that this year's class collected.
# This data consists of six columns of data: DAI, Number, Your.name, Area, Treatment, and Block.
# With these data, we want to answer the following questions:
#
# 1. Can we convert lesion areas to a measurement that is relative to the control?
# 2. What do we want to plot in order to demonstrate the residual disease control over
# time by treatment?
#
# To answer these questions, we will need to manipulate the layout of the data and will also
# need to write a function to perform a calculation and then plot the results of the
# calculations over time. Not all data manipulations will be described in detail here and
# are shown so that you can re-run this analysis using data collected in class. We will
# do this using the following four steps:
#
# 1. Create a function
# 2. Spread the data in different columns by treatment
# 3. Calculate percent disease control
# 4. Summarize the average lesion area in a plot
#
#
#' Preparing Data and Packages
#' ===========================
#'
#' We will re-create the figure using three packages dplyr, tidyr, and ggplot2. These packages
#' add flexibility to data and figure manipulations. We will not go into these in detail. Refer
#' to the tab in this website called "Other Exercises" to get started learning more.
#'
#+ package_loading, warning = FALSE, message = FALSE
library("dplyr")
library("tidyr")
library("ggplot2")
#' Reordering factors
#' ------------------
#'
#' The factors in the treatments will be out of order when plotting because it's
#' always alphabetical by default (and 1 comes before 2). Here we are reordering
#' the factors:
#'
unique(res$Treatment) # The correct order
res <- mutate(res, Treatment = factor(Treatment, levels = unique(Treatment)))
levels(res$Treatment)
#'
#' Calculating Percent Disease Control
#' ====================================
#'
#' Percent disease control is "estimated as the difference in lesion area of the
#' control and treatment, divided by lesion area of the control and expressed as
#' percent." Because we will have to make this calculation many times, it's
#' best to write a function for it.
#'
#' Step 1: Create a function
#' --------------------------
#'
#' Our function will take in two numbers, the lesion area of the control and
#' the lesion area of the treatment.
percent_control <- function(control, treatment){
res <- (control - treatment)/control # estimate the disease control
return(res*100) # express as percent
}
#'
#' We can show that this works:
percent_control(control = 10, treatment = 5)
#'
#' Step 2: Spread the data in different columns by treatment
#' ---------------------------------------------------------
#'
#' Our data were recorded in a "tidy" fashion in which we had one observation
#' per row. In order to calculate the percent control, we'll need to rehshape
#' our data so that we have one column per treatment such that each row will
#' represent a single block per day after application. We'll use the *tidyr*
#' function `spread()` to do this.
blocks <- res %>%
select(DAI, Block, Treatment, Area) %>% # We don't need name or number here
spread(Treatment, Area) # make new columns from treatment, and fill with Area
blocks
#'
#' Step 3: Calculate percent disease control
#' ---------------------------------
#'
#' Now that we have reshaped our data, we can manipulate each treatment column
#' to give us the percent control.
# Note: the backtics "`" allow R to recognize a variable with spaces.
percents <- blocks %>%
mutate(`2.53 mm water` = percent_control(control, `2.53 mm water`)) %>%
mutate(`5.07 mm water` = percent_control(control, `5.07 mm water`)) %>%
mutate(`10.13 mm water` = percent_control(control, `10.13 mm water`)) %>%
mutate(control = percent_control(control, 0))
percents
#' Because figure 2 plotted the average value, we want to summarize our data in
#' averages. To do this, we need to convert our data back to tidy format by
#' using the *tidyr* function `gather()`:
percents <- percents %>%
gather(key = Treatment, value = Area, -DAI, -Block) %>%
mutate(Treatment = factor(Treatment, levels = unique(Treatment))) # reset factor
percents
#'
#' Step 4: Summarize the average lesion area
#' -----------------------------------------
#'
#' We can summarize the average area per DAI and Treatment, which will allow us
#' to plot the data in the manner of Morini *et al.* 2017.
avgs <- percents %>%
group_by(DAI, Treatment) %>%
summarize(meanArea = mean(Area)) %>%
ungroup()
avgs
#'
#' Now that we have our averages, we can plot it using *ggplot2*
#+ fig.width = 6, fig.height = 4
ggplot(avgs, aes(x = DAI, y = meanArea, group = Treatment)) +
geom_point(aes(pch = Treatment), size = 3) + # plot the points
stat_smooth(aes(lty = Treatment), method = "lm", se = FALSE, color = "black") + # plot the regression
theme_classic() + # change the appearance
ylim(0, 100) + # set the limits on the y axis
theme(text = element_text(size = 14)) + # increase the text size
labs(list( # Set the labels
x = "Days after fluazinam application",
y = "Percent disease control",
pch = "Irrigation levels",
lty = "Irrigation levels"))
#'
#' Additional visualizations
#' ==========================
#'
#' When we plot the averages, we inadvertently hide the data. Of course, if we
#' tried to plot all the data in one graph, it would look a bit messy:
#+ fig.width = 6, fig.height = 4
ggplot(percents, aes(x = DAI, y = Area, group = Treatment)) +
geom_point(aes(pch = Treatment), size = 3) + # plot the points
stat_smooth(aes(lty = Treatment), method = "lm", se = FALSE, color = "black") + # plot the regression
theme_classic() + # change the appearance
ylim(0, 100) + # set the limits on the y axis
theme(text = element_text(size = 14)) + # increase the text size
labs(list( # Set the labels
x = "Days after fluazinam application",
y = "Percent disease control",
pch = "Irrigation levels",
lty = "Irrigation levels"))
#'
#' With ggplot2, we can spread the data out into "facets":
#+ fig.width = 8, fig.height = 3
ggplot(percents, aes(x = DAI, y = Area, group = Treatment)) +
geom_point(aes(pch = Treatment), size = 2) + # plot the points
stat_smooth(aes(lty = Treatment), method = "lm", se = FALSE, color = "black") + # plot the regression
theme_classic() + # change the appearance
ylim(0, 100) + # set the limits on the y axis
theme(text = element_text(size = 14)) + # increase the text size
facet_wrap(~Treatment, nrow = 1) +
theme(aspect.ratio = 1) +
labs(list( # Set the labels
x = "Days after fluazinam application",
y = "Percent disease control",
pch = "Irrigation levels",
lty = "Irrigation levels"))