% Basic introduction to R
Some online resources:
http://www.introductoryr.co.uk/R_Resources_for_Beginners.html
http://www.statmethods.net/index.html
http://www.ats.ucla.edu/stat/r/
http://www.r-tutor.com/r-introduction
####**R as a calculator**
R has basic operators to add, subtract, multiply and divide. R also provides built-in constants, such as pi, that are referred to by name.
```{r, R as a basic calculator}
7+2
5/2
6*9
1000-89
12/pi
```
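Beyond the four basic operations, R also has operators for powers and integer arithmetic. The short sketch below is an addition to the examples above; the numbers are chosen only for illustration.

```{r}
2^3          # exponentiation
7 %% 2       # remainder (modulo) of 7 divided by 2
7 %/% 2      # integer division
sqrt(16)     # square root
(7 + 2) * 3  # parentheses control the order of operations
```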
####**Creating Objects**
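This first step enables us to start creating data vectors, or more simply a list of information of interest. Objects are created with the assignment operator <-, and R is case-sensitive, so X and x are different objects.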
```{r, Creating objects in R}
X <- 43
X
x <- 23
x
ls() ### To see that we have both "X" and "x"
Y <- c(3,2,6)
Y
Y2 <- c(3,X,x)
Y2
Y3 <- c(Y,Y2)
Y3
```
####**Sequences of numbers**
The following code illustrates different ways to define a sequence of numbers as a vector.
```{r}
Set1 <- c(1,2,3,4,5,6,7,8,9)
Set2 <- 1:9
Set3 <- seq(1,10)
Set4 <- seq(1,10,0.5)
### Set4 comes from the following code, more formally defined: seq(from=1,to=10,by=0.5)
Set4
Set1
Set2
Set3
```
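As a further sketch, seq() can also be told how many values to return instead of the step size, and rep() builds a sequence by repetition. The object names Set5 to Set7 are hypothetical and chosen so that they do not overwrite Set1 to Set4.

```{r}
Set5 <- seq(1, 10, length.out = 4)  # 4 equally spaced values from 1 to 10
Set5
Set6 <- seq(10, 1, by = -3)         # a decreasing sequence
Set6
Set7 <- rep(c(1, 2), times = 3)     # repeat the whole vector 3 times
Set7
```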
####**Vectors of data and information**
```{r, Vectors}
X <- c(4,2,7)
X2 <- seq(3,6)
# Combining vectors of information: cbind (column) or rbind (row)
X <- c(1,2,3)
Y <- c(7,8,9)
Z <- cbind(X,Y)
Z
Z2 <- rbind(X,Y)
Z2
# Understanding how R interprets different types of vectors
example=c(1,2,3,4,5,6,7,8,9,10)
class(example) #numeric
new.example=c("A","B","C","D")
class(new.example) #character
```
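A vector holds a single type of data, so mixing numbers and text changes how R interprets the whole vector. The sketch below uses hypothetical object names to show this conversion and how to reverse it.

```{r}
mixed.example <- c(1, 2, "A")        # the numbers are converted to character
mixed.example
class(mixed.example)
logical.example <- c(TRUE, FALSE, TRUE)
class(logical.example)
number.strings <- c("3", "10", "25")
as.numeric(number.strings) + 1       # convert text back to numbers before calculating
```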
####**Matrices and arrays**
Matrices and arrays are ways to organize data of a single type into a collection of entries. A matrix has two dimensions (rows and columns), while an array can have additional dimensions (for example, subgroups of information).
```{r}
# Matrix
Mat1 <- matrix(1,ncol=3,nrow=4) #Matrix of 1's, with 4 rows and 3 columns
Mat1
#Let's use X and Y previously defined to make a matrix of data
Z <- c(X,Y)
Z
Mat2 <- matrix(Z,ncol=2,byrow=FALSE)
Mat2
class(Mat2)
```
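The examples above only build matrices. As a minimal sketch of an array, the following creates a hypothetical object Arr1 with three dimensions and extracts parts of it.

```{r}
# Hypothetical object: 24 values arranged as 4 rows x 3 columns x 2 layers
Arr1 <- array(1:24, dim = c(4, 3, 2))
dim(Arr1)
Arr1[, , 2]    # the second layer is itself a 4 x 3 matrix
Arr1[2, 3, 1]  # row 2, column 3, layer 1
```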
####**Creating a matrix using "plant pathological" data**
```{r, Example}
disease=matrix(c(1,1,1,1,2,2,2,2,5,15,35,55,11,30,61,75),ncol=2,nrow=8)
colnames(disease)=c("Trt","Sev")
disease
is.matrix(disease)
```
####**Evaluating a vector - basic methods**
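Basic calculations on a vector include mean(), median(), var() and summary(). Notice that summary() provides the basic five-number summary (minimum, first quartile, median, third quartile, maximum) plus the mean, six values in total.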
```{r, Calculations made on a vector}
X <- rnorm(20,mean=5,sd=2)
#rnorm() generates a random vector of 20 observations, each drawn from a normal distribution with mean=5 and standard deviation=2
length(X)
mean(X)
median(X)
var(X)
summary(X)
```
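Because rnorm() draws random numbers, the results above change every time the code is run. One way to make them repeatable is to set the random seed first. In this sketch the seed value 123 and the object names are arbitrary choices for illustration.

```{r}
set.seed(123)                         # arbitrary seed value
draw1 <- rnorm(20, mean = 5, sd = 2)
set.seed(123)                         # resetting the same seed ...
draw2 <- rnorm(20, mean = 5, sd = 2)
identical(draw1, draw2)               # ... reproduces the same 20 values
summary(draw1)
```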
####**Matrix calculations**
The same functions can be applied to matrices, but some of them behave differently in two dimensions. For example, mean() and median() treat the matrix as a single vector of all its entries, whereas var() works on the columns and returns the variances of the columns together with the covariances between them.
```{r}
Mat2 <- matrix(Z,ncol=2,byrow=FALSE)
Mat2
length(Mat2) # Total number of entries (rows x columns)
dim(Mat2) # Rows and columns
mean(Mat2) # Mean of all entries together. Can you see how this was calculated?
median(Mat2)
var(Mat2) # Notice now we are working in 2 dimensions for this calculation: the variance-covariance matrix of the columns
summary(Mat2) # By column
## Another example (much larger)
Mat3 <- matrix(seq(1,50),ncol=2,byrow=TRUE)
Mat3
head(Mat3) # gives the first 6 rows by default
head(Mat3, n=10)
tail(Mat3) # gives the last 6 rows by default
```
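To see how the matrix results relate to the vector calculations, the sketch below uses a hypothetical matrix MatDemo with the same values as Mat2 and compares the entries of var() with the variance and covariance of the individual columns.

```{r}
# Hypothetical 3 x 2 matrix with the same values as Mat2
MatDemo <- matrix(c(1, 2, 3, 7, 8, 9), ncol = 2)
mean(MatDemo)                     # mean of all six entries
colMeans(MatDemo)                 # one mean per column
var(MatDemo)                      # 2 x 2 variance-covariance matrix
var(MatDemo[, 1])                 # variance of column 1: the [1,1] entry above
cov(MatDemo[, 1], MatDemo[, 2])   # covariance: the [1,2] entry above
```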
####**Working with matrices - operations**
```{r, Matrices and general operators}
Mat2
5*Mat2
5+Mat2
Mat2[,1] <- Mat2[,1] + 100 # Changing just the first column of Mat2
Mat2
```
####**Data frames**
More commonly, we will work with a data set created in another program, for example Excel. In this case, we are working with a data frame that can hold mixed information, such as numeric data, alphanumeric text, etc. Nonetheless, we can handle this information in R as in the previous examples. In this part, we will take a matrix and turn it into a data.frame; afterwards, we will see general examples of reading data from a file.
```{r, Data frames}
disease
GreatData <- data.frame(disease)
GreatData
names(GreatData)
# Renaming columns
names(GreatData) <- c('Variety', 'Severity')
GreatData
GreatData$Variety
GreatData$Severity
```
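The following is a minimal sketch of a data frame with mixed column types and of reading data from a file. It assumes the data have been saved as a CSV file (a common export format from Excel); here a temporary file is written first so that the example is self-contained. All object names are hypothetical.

```{r}
# Hypothetical data frame mixing character and numeric columns
MixedData <- data.frame(Variety  = c("A", "A", "B", "B"),
                        Severity = c(5, 15, 11, 30))
str(MixedData)                               # shows the type of each column
# Write to a temporary CSV file, then read it back as if it came from another program
tmp.file <- tempfile(fileext = ".csv")
write.csv(MixedData, tmp.file, row.names = FALSE)
FromFile <- read.csv(tmp.file)
FromFile
FromFile$Severity[FromFile$Variety == "B"]   # severities for variety B only
```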
####**Integrating Functions**
In R, as well as in many other programming languages, we can combine (nest) functions to simplify our code and reduce the number of lines.
```{r, Integrating functions - days after planting and 2 disease assessments}
dap<-c(7,14,21,28,35,42,49)
dis1<-c(0,5,7,25,55,60,75)
dis2<-c(3,14,33,50,65,75,78)
progress<-data.frame(cbind(dap,dis1,dis2)) #We combined data.frame() and cbind()
progress
class(progress)
```
####**Lists**
Understanding how R interprets and formats your data is critical, since at times you will need to extract specific components of an output for further analyses. A list can hold components of different types and lengths.
```{r, Lists}
L28 <- list(c(1,2,3),1000,seq(1,2,.1))
L28
L28[[3]] # third component of list
L28[[3]][4] # fourth entry in third component of list
```
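The difference between single and double brackets is easy to miss. As a sketch, using a hypothetical list built like L28: single brackets return a smaller list, while double brackets return the component itself.

```{r}
# Hypothetical list, built like L28 above
ListDemo <- list(c(1, 2, 3), 1000, seq(1, 2, .1))
class(ListDemo[3])     # single brackets return a list of length 1
class(ListDemo[[3]])   # double brackets return the component itself
length(ListDemo)       # number of components
lengths(ListDemo)      # number of entries in each component
```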
####**Illustration of a list of information**
```{r, General example illustrating list of information}
field.work=list(loc="Janesville",year=2010,field="Soybean",trts=c("A","B","C"),assess=c(7,14,21,28,35,42))
field.work
names(field.work)
field.work$field
```
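Many R functions return their output as a named list, so the same tools apply. The sketch below uses lm() only as a convenient example of a function whose output is a list; the object names are hypothetical and the values repeat dap and dis1 from above.

```{r}
# Hypothetical data (same values as dap and dis1 above)
days <- c(7, 14, 21, 28, 35, 42, 49)
sev  <- c(0, 5, 7, 25, 55, 60, 75)
fit <- lm(sev ~ days)        # lm() is used here only because it returns a list
is.list(fit)
names(fit)                   # components available in the output
fit$coefficients             # extract one component by name
fit$coefficients[["days"]]   # and one entry within that component
```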
####**Logical operators**
```{r, Logical operators}
8 < 10 # Try this
8 == 10 # The double equal signs are used for logical statements
8 != 10 # The exclamation point means ‘not’
X <- 1:10
X < 8
X < 8 & X > 3 # The ampersand means ‘and’, both must be true
X < 3 | X > 8 # The ‘|’ means ‘or’, either must be true
sum(X < 8)/10 * 100 # Percentage of the 10 values that are less than 8 (each TRUE counts as 1)
```
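Logical vectors can be used in calculations because TRUE is treated as 1 and FALSE as 0. The sketch below, with a hypothetical vector holding the same values as X, shows a few base R functions that summarize a logical condition.

```{r}
# Hypothetical vector with the same values as X above
vals <- 1:10
sum(vals < 8)                # number of values below 8
mean(vals < 8) * 100         # percentage of values below 8
which(vals < 3 | vals > 8)   # positions where the condition is TRUE
any(vals > 10)               # is at least one value above 10?
all(vals > 0)                # are all values positive?
```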
####**Character Data**
```{r}
A1 <- c('Severity', 'Yield')
A1[1]
A2 <- paste('Disease', 'Severity')
A2
A3 <- paste('B', 1:10, sep='') # specifies no space between the characters
A3
A4 <- paste('B', 1:10, sep='-') # a dash goes between the characters
A4
D1 <- 'Mississippi'
substring(D1, 1,4) # takes letters 1 through 4
C1 <- paste('B', 1:10, sep=' ')
C1
substring(C1,1,1)
substring(C1,1,2)
substring(C1,1,3)
substring(C1,1,4) #Where is the difference with the previous example?
```
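When labels differ in length, counting characters helps explain what substring() returns. This sketch uses hypothetical labels built like C1 and also shows how to recover the numeric part of each label.

```{r}
# Hypothetical labels, built like C1 above
labels.demo <- paste('B', 1:10, sep=' ')
nchar(labels.demo)                      # number of characters in each label
substring(labels.demo, 3)               # from character 3 to the end: the number part
as.numeric(substring(labels.demo, 3))   # convert that text back to numbers
```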
####**Indices for Selecting Subsets**
```{r, Indices and subsets}
# Suppose we have a set of labels for experimental units
D5n <- rep(1:10,3)
D5c <- c(rep('A',10), rep('B',10), rep('C',10))
D5 <- paste(D5c,D5n,sep='')
D5
# We can make an index to select only those in treatment A
Aindex <- substring(D5,1,1) == 'A'
# Suppose this is the corresponding list of yields
Yield <- 1:30
# We can apply the logical index to select only those yields corresponding to treatment A
Yield[Aindex]
```
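A logical index can be counted, located, negated or passed to other functions. The following sketch rebuilds the labels and yields under hypothetical names so that it runs on its own.

```{r}
# Hypothetical labels and yields, built like D5 and Yield above
unit.labels <- paste(rep(c('A', 'B', 'C'), each = 10), rep(1:10, 3), sep = '')
unit.yield  <- 1:30
is.A <- substring(unit.labels, 1, 1) == 'A'
sum(is.A)                 # number of experimental units in treatment A
which(is.A)               # their positions in the vector
mean(unit.yield[is.A])    # mean yield for treatment A
unit.yield[!is.A]         # '!' negates the index: yields for treatments B and C
```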
####**Loops**
Many times we are interested in repeating some calculations. In R, there are many methods to do this, including the use of loops. We will also see some other methods that improve programming performance when we are interested in repeating a specific function many times.
```{r, Loops}
K1 <- c(4,2,8,5)
L1 <- c(1,3,4,2)
M1 <- 0*1:4 # This is the object where we will place the answer to our query
M1
# This loop finds the maximum of K1 and L1 at each position
for (j in 1:4){
M1[j] <- max(K1[j],L1[j])
}
M1
```
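For this particular task a loop is not strictly needed. As a sketch, the base R function pmax() computes the element-wise maximum in one call. The loop is repeated here with numeric() and seq_along(), which are additions to the example above, and all object names are hypothetical.

```{r}
# Hypothetical vectors with the same values as K1 and L1 above
k.demo <- c(4, 2, 8, 5)
l.demo <- c(1, 3, 4, 2)
m.loop <- numeric(length(k.demo))   # pre-allocate the result vector
for (j in seq_along(k.demo)) {      # seq_along() adapts to the vector length
  m.loop[j] <- max(k.demo[j], l.demo[j])
}
m.vec <- pmax(k.demo, l.demo)       # vectorized: element-wise maximum, no loop
m.vec
identical(m.loop, m.vec)            # both approaches give the same answer
```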
####**Apply a function**
Although loops work well for some tasks, they can be inefficient for large operations. In R, we can instead take advantage of the functions apply(), lapply() and tapply().
Using the help() pages, we can see that:
apply() = returns a vector, array or list of values obtained by applying a function to the margins (rows or columns) of an array or matrix
lapply() = returns a list of the same length as X, where each element is the result of applying a function to the corresponding element of X
tapply() = applies a function to each cell of a ragged array, meaning that the function is applied to each non-empty group of values given by a unique combination of the levels of certain factors
```{r, apply, lapply, tapply}
#apply() - works on rows or columns
group1<-rnorm(10,5,2)
group2<-rnorm(10,10,5)
group3<-rnorm(10,15,7)
example.apply<-cbind(group1,group2,group3)
apply(example.apply, MARGIN=2, mean)
apply(example.apply, MARGIN=2, sd)
apply(example.apply, MARGIN=2, function (x) sd(x)/mean(x))
#lapply() - works on a list
L28 <- list(c(1,2,3),1000,seq(1,2,.1))
L28
lapply(L28,mean)
#tapply() - works to summarize information by some defined factor
trt.group<-rep(c("A","B","C","D","E"), each=2) # named trt.group to avoid masking the base function factor()
tapply(example.apply[,1], trt.group, mean) #Summarizing the first column, by factor
tapply(example.apply[,1], trt.group, sd) #Summarizing the first column, by factor
tapply(example.apply[,1], trt.group, function (x) sd(x)/mean(x)) #CV by factor
```
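Since the groups above are random, it can be hard to check the results by hand. The sketch below repeats the same ideas on small, known values. The object names are hypothetical, and sapply() is an addition here: it behaves like lapply() but simplifies the result to a vector when possible.

```{r}
# Hypothetical 2 x 3 matrix with known values
small.mat <- matrix(1:6, nrow = 2)    # columns are (1,2), (3,4), (5,6)
apply(small.mat, MARGIN = 1, sum)     # MARGIN=1 works by row
apply(small.mat, MARGIN = 2, sum)     # MARGIN=2 works by column
# sapply() on a list built like L28
sapply(list(c(1, 2, 3), 1000, seq(1, 2, .1)), mean)
# tapply() with a hypothetical grouping vector
obs   <- c(2, 4, 6, 8, 10, 12)
group <- rep(c("A", "B", "C"), each = 2)
tapply(obs, group, mean)
```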