---
title: "Boosting in R"
output:
html_document: default
html_notebook: default
---
## Boosting
Random Forests are used to reduce the variance of Trees by averaging them: they grow big, bushy trees and then average them to get rid of the variance.
__Boosting__, on the other hand, grows smaller, simpler trees and goes after the *__Bias__*. The idea in Boosting is to convert a *__Weak Learner__* into a *__Strong Learner__* by taking a *weighted average* of many models, each one fit on the harder examples using information from the previous model.
Harder examples here means the training examples that the previous model misclassified or, more generally, predicted poorly.
Boosting is a sequential method: each tree added to the mix is there to improve the performance of the previous collection of trees. A minimal hand-rolled sketch of this idea is given below.
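To make the sequential idea concrete, here is a minimal hand-rolled sketch of boosting for regression, written only for illustration (the package-based fit follows in the next section); the tree depth, learning rate and number of rounds below are arbitrary choices rather than tuned values.
```{r, warning=FALSE, message=FALSE}
# A rough sketch of boosting by hand: each small tree is fit to the residuals
# (the "hard" part) left by the current ensemble, and its predictions are
# added in with a small learning rate.
require(rpart)
require(MASS)
lambda <- 0.01                                      # learning rate (shrinkage)
boost.pred <- rep(mean(Boston$medv), nrow(Boston))  # start from the mean prediction
for (b in 1:100) {
  resid.b <- Boston$medv - boost.pred               # residuals of the current ensemble
  tree.b <- rpart(resid.b ~ . - medv, data = Boston,
                  control = rpart.control(maxdepth = 2))  # a small, simple tree
  boost.pred <- boost.pred + lambda * predict(tree.b, Boston)
}
mean((Boston$medv - boost.pred)^2)                  # training MSE after 100 boosting rounds
```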
-----
### Implementing Gradient Boosting in R using the gbm package
The 'gbm' package implements Gradient Boosted Models in R.
```{r,warning=FALSE,message=FALSE}
require(gbm)
require(MASS)
```
First we split the Boston Housing data into a training half and a test half, then build the boosted trees on the training set.
```{r}
set.seed(1)
train <- sample(1:nrow(Boston), nrow(Boston)/2)   # indices of the training half
Boston.boost <- gbm(medv ~ ., data = Boston[train,], distribution = "gaussian", n.trees = 10000,
                    shrinkage = 0.01, interaction.depth = 4)
Boston.boost
summary(Boston.boost)  # gives a table of variable importance and a variable importance plot
```
The above boosted model is a Gradient Boosted Model which generates 10000 trees, and the shrinkage parameter $\lambda = 0.01$ acts as a sort of __Learning Rate__. The next parameter is the interaction depth, which is the number of *splits* performed in each tree, so here each tree is a small tree with only 4 splits.
The summary of the model gives a *__Feature Importance Plot__*, and the 2 most important features, which explain most of the variance in the data set, are 'lstat' and 'rm'. A sketch of tuning the number of trees by cross-validation follows.
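As a sketch of how the number of trees could be tuned for a given shrinkage value, 'gbm' can be refit with cross-validation and the optimal iteration read off with 'gbm.perf'; the 5 folds and the smaller tree count below are illustrative choices, not settings used elsewhere in this document.
```{r}
# Refit with 5-fold cross-validation (illustrative settings) and pick the
# CV-optimal number of trees; gbm.perf also plots the training and CV error curves.
Boston.boost.cv <- gbm(medv ~ ., data = Boston[train,], distribution = "gaussian",
                       n.trees = 5000, shrinkage = 0.01, interaction.depth = 4,
                       cv.folds = 5)
best.iter <- gbm.perf(Boston.boost.cv, method = "cv")
best.iter
```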
-----
### Let's plot the Partial Dependence Plots
The Partial Dependence Plots show the marginal relationship between each variable and the response, i.e. how the response depends on that variable once the other variables are averaged out.
```{r}
plot(Boston.boost,i="lstat") #Plot of Response variable with lstat variable
#Inverse relation with lstat variable ie
plot(Boston.boost,i="rm")
#as the average number of rooms increases the the price increases
```
In the above plots, the y-axis shows the response values and the x-axis shows the variable values. So 'medv' is inversely related to the 'lstat' variable, while the 'rm' variable is directly related to 'medv'.
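As a small extension (a sketch, not part of the original analysis), 'plot.gbm' can also draw the joint partial dependence of two variables at once, which helps when predictors interact:
```{r}
# Joint partial dependence of medv on lstat and rm
# (plot.gbm accepts up to three variables in i.var)
plot(Boston.boost, i.var = c("lstat", "rm"))
```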
------
### Prediction on Test Set
We will compute the test error as a function of the number of trees.
```{r}
n.trees <- seq(from = 100, to = 10000, by = 100)  # number of trees: a vector of 100 values
# Generating a prediction matrix with one column per value of n.trees
predmatrix <- predict(Boston.boost, Boston[-train,], n.trees = n.trees)
dim(predmatrix)  # dimensions of the prediction matrix
# Calculating the mean squared test error for each number of trees
test.error <- with(Boston[-train,], apply((predmatrix - medv)^2, 2, mean))
head(test.error)
# Plotting the test error against the number of trees
plot(n.trees, test.error, pch = 19, col = "blue", xlab = "Number of Trees", ylab = "Test Error", main = "Performance of Boosting on Test Set")
# Adding the minimum-error line from the Random Forests model
# (test.err is assumed to hold the Random Forest test errors from a separate Random Forests fit)
abline(h = min(test.err), col = "red")
legend("topright", c("Minimum Test error Line for Random Forests"), col = "red", lty = 1, lwd = 1)
```
Boosting outperforms Random Forests on the same test dataset, achieving a lower mean squared test error. A quick numeric check follows.
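As a quick numeric check, using only the quantities computed above, we can read off the minimum boosting test error and the number of trees at which it occurs:
```{r}
min(test.error)                 # smallest mean squared test error over the grid
n.trees[which.min(test.error)]  # number of trees that achieves it
```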
-----
### Conclusion
In the above plot we can see that if Boosting is tuned properly, by selecting appropriate values for the shrinkage parameter $\lambda$ and for the number of splits per tree, it can outperform Random Forests most of the time.
Both methods are excellent ensembling techniques that reduce overfitting and improve the performance of statistical models.