Intro_R.Rmd

% Basic introduction to R

Some online resources:

http://www.introductoryr.co.uk/R_Resources_for_Beginners.html

http://www.statmethods.net/index.html

http://www.ats.ucla.edu/stat/r/

http://www.r-tutor.com/r-introduction

####**R as a calculator**

R has basics operations to add, subtract, multiply and divide. Also, R defines certain calculations, such as pi, based on alpha-numeric nomenclature.

```{r, R as a basic calculator}
7+2
5/2
6*9
1000-89
12/pi
```

####**Creating Objects**
This first step enables us to start creating data vectors, or more simply a list of information of interest.

```{r, Creating objects in R}
X <- 43 
X
x <- 23 
x
ls() ### To see that we have both "X" and "x"

Y <- c(3,2,6) 
Y
Y2 <- c(3,X,x)
Y2
Y3 <- c(Y,Y2)
Y3 

```

####**Sequences of numbers**

The following code illustrates different examples of how one can define a list (vector) of information.
```{r}
Set1 <- c(1,2,3,4,5,6,7,8,9)
Set2 <- 1:9
Set3 <- seq(1,10)
Set4 <- seq(1,10,0.5)
### Set4 comes from the following code, more formally defined: seq(from=1,to=10,by=0.5)
Set4

Set1
Set2
Set3

```

####**Vectors of data and information**

```{r, Vectors}
X <- c(4,2,7)
X2 <- seq(3,6)

# Combining vectors of information: cbind (column) or rbind (row)
X <- c(1,2,3)
Y <- c(7,8,9)
Z <- cbind(X,Y)
Z
Z2 <- rbind(X,Y)
Z2

# Understanding how R interprets different types of vectors
example=c(1,2,3,4,5,6,7,8,9,10)
class(example) #numeric

new.example=c("A","B","C","D")
class(new.example) #character

```

####**Matrices and arrays**
Matrices and arrays are ways to organize data into a collection of data entries (rows and columns, along with subgroups of information). 

```{r}
# Matrix
Mat1 <- matrix(1,ncol=3,nrow=4) #Matrix of 1's, with 4 rows and 3 columns
Mat1

#Let's use X and Y previously defined to make a matrix of data
Z <- c(X,Y)
Z
Mat2 <- matrix(Z,ncol=2,byrow=F)
Mat2
class(Mat2)

```

####**Creating a matrix using "plant pathological" data**

```{r, Example}
disease=matrix(c(1,1,1,1,2,2,2,2,5,15,35,55,11,30,61,75),ncol=2,nrow=8)
colnames(disease)=c("Trt","Sev")
disease
is.matrix(disease)
```

**Evaluating a vector - basic methods**
Basic calculations like:
mean()
median()
var()
summary() - Notice here we are provided with a basic 5 (6) number summary.

```{r, Calculations made on a matrix}
X <- rnorm(20,mean=5,sd=2) 
#rnorm() generates a random vector of 20 observations, each from a mean=5 with a standard deviation of 2

length(X)
mean(X)
median(X)
var(X)
summary(X)

```

####**Matrix calculations**

The same functions can be applied to matrices, but it is important to understand that with some of the functions, for example var(), the calculation is based on the columns and comparisons (covariance as one idea) between them. 

```{r}
Mat2 <- matrix(Z,ncol=2,byrow=F)
Mat2

length(Mat2) # Less straightforward in 2 dimensions
dim(Mat2) # Rows and columns

mean(Mat2) # Can you see how this was calculated?
median(Mat2)

var(Mat2) # Notice now we are working in 2 dimensions for this calculation - variances-covariances-correlations
summary(Mat2) # By column

## Another example (much larger)
Mat3 <- matrix(seq(1,50),ncol=2,byrow=T)
Mat3

head(Mat3) # gives the first 6 rows by default
head(Mat3, n=10)

tail(Mat3) # gives the last 6 rows by default

```

####**Working with matrices - operations**

```{r, Matrices and general operators}
Mat2

5*Mat2
5+Mat2

Mat2[,1] <- Mat2[,1] + 100 # Changing just the first column of Mat2
Mat2

```

####**Data frames**

More commonly, we will employ a database created in another program, for example Excel. In this case, we are working with a data frame that has mixed information, such as alpha-numerica, data, etc. Nonetheless, we can handle this information in R like the previously examples. In this part, we will take a matrix and turn this into a data.frame, but after, we will see generically, examples of introducing the data from file.

```{r, Data frames}
disease 

GreatData <- data.frame(disease)
GreatData
names(GreatData)

# Renaming columns
names(GreatData) <-c('Variety', 'Severity')
GreatData
GreatData$Variety
GreatData$Severity
```

**Integrating Functions**
In R, as well as in many other programming languages, we can combine functions to simply the number of lines of code.
```{r, Integrating functions - days after planting and 2 disease assessments}
dap<-c(7,14,21,28,35,42,49)
dis1<-c(0,5,7,25,55,60,75)
dis2<-c(3,14,33,50,65,75,78)

progress<-data.frame(cbind(dap,dis1,dis2)) #We combined data.frame() and cbind()
progress
class(progress)

```

####**Lists**

Understanding how R interprets and formats your data is critical since at times you will need to identify specific components of an output for further analyses.

```{r, Lists}
L28 <- list(c(1,2,3),1000,seq(1,2,.1))
L28

L28[[3]] # third component of list

L28[[3]][4] # fourth entry in third component of list

```

####**Illustration of a list of information**

```{r, General example illustrating list of information}
field.work=list(loc="Janesville",year=2010,field="Soybean",trts=c("A","B","C"),assess=c(7,14,21,28,35,42))

field.work
names(field.work)
field.work$field

```

####**Logical operators**

```{r, Logical operators}
8 < 10 # Try this
8 == 10 # The double equal signs are used for logical statements
8 != 10 # The exclamation point means ‘not’
X <- 1:10
X < 8
X < 8 & X > 3 # The ampersand means ‘and’, both must be true
X < 3 | X > 8 # The ‘|’ means ‘or’, either must be true
sum(X < 8)/10 * 100 

```

####**Character Data**

```{r}
A1 <- c('Severity', 'Yield') 
A1[1]
A2 <- paste('Disease', 'Severity')
A2
A3 <- paste('B', 1:10, sep='') # specifies no space between between the characters
A3
A4 <- paste('B', 1:10, sep='-') # a dash goes between the characters
A4

D1 <- 'Mississippi'
substring(D1, 1,4) # takes letters 1 through 4

C1 <- paste('B', 1:10, sep=' ') 
C1

substring(C1,1,1)
substring(C1,1,2)
substring(C1,1,3)
substring(C1,1,4) #Where is the difference with the previous example?
```

**Indices for Selecting Subsets**
```{r, Indices and subsets}
# Suppose we have a set of labels for experimental units
D5n <- rep(1:10,3) 
D5c <- c(rep('A',10), rep('B',10), rep('C',10))
D5 <- paste(D5c,D5n,sep='')
D5

# We can make an index to select only those in treatment A
Aindex <- substring(D5,1,1) == 'A'
# Suppose this is the corresponding list of yields
Yield <- 1:30
# We can apply the logical index to select only those yields corresponding to treatment A
Yield[Aindex]

```

####**Loops**

Many times we are interested in repeating some calculations. In R, there are many methods to do this, including the use of loops. We will also see some other methods that improve programming performance when we are interested in repeating a specific function many times.

```{r, Loops}

K1 <- c(4,2,8,5)
L1 <- c(1,3,4,2)
M1 <- 0*1:4  # This in the object where we will place the answer to our query
M1

# This loop finds the maximum of K1 and L1 at each position
for (j in 1:4){
  M1[j] <- max(K1[j],L1[j])
}

M1

```

####**Apply a function**

While loops work well for some functions and programming, they are rather inefficient for large operations. In R, we can take advantage of the functions: apply(), lapply() and tapply().

Using the help() options, we can see that:

apply() = returns a vector or array or list of values obtained by applying a function to margins of an array or matrix

lapply() = returns a list of the same length of X, where each element is the result of applying a function to the corresponding element of X

tapply() = apply a function to each cell a ragged array, meaning that the function is applied to each, non-empty group of values given by a unique combination of the levels of certain factors

```{r, apply, lapply, tapply}

#apply() - works on rows or columns

group1<-rnorm(10,5,2)
group2<-rnorm(10,10,5)
group3<-rnorm(10,15,7)

example.apply<-cbind(group1,group2,group3)
apply(example.apply, MARGIN=2, mean)
apply(example.apply, MARGIN=2, sd)
apply(example.apply, MARGIN=2, function (x) sd(x)/mean(x))

#lapply() - works on a list
L28 <- list(c(1,2,3),1000,seq(1,2,.1))
L28
lapply(L28,mean)

#tapply() - works to summarize information by some defined factor
factor<-rep(c("A","B","C","D","E"), each=2)
tapply(example.apply[,1], factor, mean) #Summarizing the first column, by factor
tapply(example.apply[,1], factor, sd) #Summarizing the first column, by factor
tapply(example.apply[,1], factor, function (x) sd(x)/mean(x)) #CV by factor

```