---
title: "Decision Trees in R"
output:
html_document: default
html_notebook: default
---
### This is an article on how to implement tree-based learning techniques in R for predictive modelling.
Tree-based methods involve stratifying or segmenting the predictor space (the space of $X_1, X_2, \dots, X_p$) into a number of simple regions. They do this by generating a set of *splitting rules*, and the resulting techniques for segmenting and stratifying the data into regions $R_j$ are called __Decision Trees__. Decision trees are used in both regression and classification problems.
These statistical learning techniques are easy to understand and simple to interpret.
The rules generated are of the form
$$R_j = \{X \mid X_1 \in A_1 \ \cap\ X_2 \in A_2 \ \cap\ \dots\ \cap\ X_p \in A_p\} \;\Longrightarrow\; \hat{y}_{R_j}$$
where each $A_k$ is a condition (an interval or a set of levels) on predictor $X_k$: if an observation satisfies the intersection of conditions defining region $R_j$, it receives the prediction $\hat{y}_{R_j}$ for that region.
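For instance, a single such rule from a fitted classification tree might look like the following (a purely hypothetical example; the variables and cutpoint are not taken from the model fitted below):
$$R_1 = \{X \mid \text{Price} < 90 \ \cap\ \text{ShelveLoc} = \text{Good}\} \;\Longrightarrow\; \text{Yes}$$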
-----------
#### Loading the required Packages
```{r,warning=FALSE,message=FALSE}
require(ISLR) #package containing data
require(ggplot2)
require(tree)
#Using the Carseats data set
attach(Carseats)
?Carseats
```
Carseats is a simulated data set containing sales of child car seats at 400 different stores.
```{r}
#Checking the distribution of Sales
ggplot(Carseats, aes(x = Sales)) +
  geom_histogram(color = "black", fill = "purple", alpha = 0.6, bins = 30) +
  labs(x = "Unit Sales in Thousands", y = "Frequency")
```
As the histogram suggests, Sales is approximately normally distributed, with the highest frequency around 8 (i.e. roughly 8,000 unit sales).
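A quick numeric summary (added here just as a sanity check on the histogram) shows where the centre of the distribution lies:
```{r}
#Five-number summary of Sales (in thousands of units)
summary(Carseats$Sales)
```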
```{r}
#Making a factor variable from Sales: "Yes" if unit sales exceed 8 (thousand), "No" otherwise
#(an explicit factor() call ensures tree() treats this as a classification problem)
HighSales <- factor(ifelse(Sales <= 8, "No", "Yes"))
head(HighSales)
#Adding the new variable to the data frame
Carseats <- data.frame(Carseats, HighSales)
```
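A quick look at the resulting class balance (an added check, not part of the original code):
```{r}
#Number of stores in each HighSales class
table(Carseats$HighSales)
```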
----------
## Fitting a Binary Classification Tree
Now we are going to fit a __tree__ to the Carseats data to predict whether or not a store has high sales. The __tree()__ function uses a *__top-down, greedy__* approach to grow the tree, also known as *__recursive binary splitting__*. It is greedy because, at each step, it chooses the split that is best at that particular point, rather than looking ahead and choosing a split that would lead to a better tree at some later step.
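To make the idea of "the best split at this particular step" concrete, here is a minimal sketch (not part of the tree package; the helper `split_deviance` and the Price = 100 cutpoint are made up for illustration) that scores a single candidate split using the deviance criterion that tree() minimises at each node:
```{r}
#Deviance of the two child nodes produced by splitting one predictor at a cutpoint.
#tree() performs this kind of search internally over all predictors and cutpoints.
split_deviance <- function(response, predictor, cutpoint) {
  node_dev <- function(y) {
    counts <- table(y)
    p <- counts / length(y)
    -2 * sum(counts * log(p + 1e-12))  # -2 * sum_k n_k * log(p_k)
  }
  left  <- response[predictor <  cutpoint]
  right <- response[predictor >= cutpoint]
  node_dev(left) + node_dev(right)
}
#Example: total deviance if the root node were split at Price = 100
split_deviance(Carseats$HighSales, Carseats$Price, 100)
```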
```{r}
#We will use the tree() function to fit a Decision Tree
?tree
#Excluding the Sales attribute, since HighSales was derived from it
#The split argument sets the splitting criterion: "deviance" (the default) or "gini"
CarTree <- tree(HighSales ~ . - Sales, data = Carseats, split = "deviance")
CarTree #Prints the splits at each node and the predicted response at the terminal nodes
#The numbers in parentheses are the proportions of "No" and "Yes" within each node
#Summary of the Decision Tree
summary(CarTree)
```
The summary of the model lists the variables actually used for splitting, the number of terminal nodes, and the residual mean deviance. The default splitting criterion is the __deviance__; the alternative is the __Gini index__, which is a measure of node *purity*.
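For reference (these are the standard textbook forms, not taken from the summary output above), for a node $m$ with class counts $n_{mk}$ and class proportions $\hat{p}_{mk}$, the two criteria can be written as
$$D_m = -2\sum_{k} n_{mk}\,\log \hat{p}_{mk} \qquad \text{(node deviance)}$$
$$G_m = \sum_{k} \hat{p}_{mk}\,(1 - \hat{p}_{mk}) \qquad \text{(Gini index)}$$
and in both cases smaller values indicate a purer node.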
---------------
### Plotting the Decision Tree
```{r}
plot(CarTree)
#Adding split labels to the nodes; pretty = 1 abbreviates factor level names
text(CarTree, pretty = 1)
```
This tree is quite complicated and hard to read because of the large number of splits and variables included in the predictor space. The leaf (terminal) nodes contain the predicted response, i.e. __Yes / No__.
---------------
### Splitting the data into Training and Test Sets
```{r}
set.seed(1001)
#A training sample of 250 examples sampled without replacement
train<-sample(1:nrow(Carseats), 250)
#Fitting another Model
tree1<-tree(HighSales ~ .-Sales , data = Carseats, subset = train)
summary(tree1)
#Plotting
plot(tree1);text(tree1)
```
The new tree is somewhat different and more detailed, but it is still quite hard to interpret because of the many splits.
__Predicting on Test Set__
```{r}
#Predicting the Class labels for Test set
pred<-predict(tree1, newdata = Carseats[-train,],type = "class")
head(pred)
#Confusion Matrix to check number of Misclassifications
with(Carseats[-train,],table(pred,HighSales))
#Misclassification Error Rate on Test Set
mean(pred!=Carseats[-train,]$HighSales)
```
The __diagonal__ entries are the correctly classified test examples, whereas the __off-diagonal__ entries are the misclassified ones. The misclassification error rate on the test set is about 26%.
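The same error rate can also be recovered directly from the confusion matrix as the share of off-diagonal cells (a small added check, not part of the original code):
```{r}
cm <- with(Carseats[-train, ], table(pred, HighSales))
#Misclassification rate = 1 - (sum of diagonal) / (total test examples)
1 - sum(diag(cm)) / sum(cm)
```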
The tree above was grown to its full depth and may include variables and splits that do not generalise well, which can degrade test performance. We will now use 10-fold __cross-validation__ to *prune* the tree.
------
## Pruning The tree using Cross Validation
```{r}
#10-fold cross-validation (the default K in cv.tree)
#Performing cost-complexity pruning, guided by the misclassification rate
cv.tree1 <- cv.tree(tree1, FUN = prune.misclass)
cv.tree1
plot(cv.tree1)
#The CV misclassification count ($dev) is lowest at a size of about 15 terminal nodes
prune.tree1 <- prune.misclass(tree1, best = 15)
plot(prune.tree1);text(prune.tree1)
```
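As a small addition, the optimal size can also be picked programmatically instead of being read off the plot; the exact optimum depends on the random CV folds, so it may differ slightly between runs:
```{r}
#Smallest tree size attaining the minimum cross-validated misclassification count
best.size <- min(cv.tree1$size[cv.tree1$dev == min(cv.tree1$dev)])
best.size
```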
__Testing the pruned Tree on the Test Set__
```{r}
pred1<-predict(prune.tree1 , Carseats[-train,],type="class")
#Confusion Matrix
with(Carseats[-train,],table(pred1,HighSales))
#Misclassification Rate
ErrorPrune<-mean(pred1!=Carseats[-train,]$HighSales)
ErrorPrune
#Error reduced to 25 %
```
------------
## Conclusion
As we can see from the performance on the test set, the pruned tree does not perform much better: the error rate dropped by only about one percentage point, from roughly 26% to 25%.
Pruning did, however, give us a __simpler__ tree with *fewer splits and a subset of the predictors*, which is easier to interpret and understand.
Single trees usually do not give strong performance on test sets; a single tree is often described as a __weak learner__.
Ensemble techniques such as __Random Forests, Bagging and Boosting__ improve performance considerably by training many trees on resampled versions of the training data and then *__combining (averaging or voting over)__* their predictions to form a much stronger predictor.
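As a rough sketch of what such an ensemble looks like in code (it assumes the randomForest package is installed and is not evaluated as part of this article), one could fit a random forest on the same training set and compare its test error with the single tree:
```{r, eval=FALSE}
require(randomForest)
#500 trees, each grown on a bootstrap sample of the training data,
#with predictions combined by majority vote
rf <- randomForest(HighSales ~ . - Sales, data = Carseats, subset = train,
                   ntree = 500, importance = TRUE)
rf.pred <- predict(rf, newdata = Carseats[-train, ])
#Test misclassification rate of the forest
mean(rf.pred != Carseats[-train, ]$HighSales)
```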
Hope you guys liked the article; make sure to share and like it.