-
Notifications
You must be signed in to change notification settings - Fork 25
/
GAM.Rmd
124 lines (64 loc) · 3.41 KB
/
GAM.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
---
title: "Generalized Additive Models"
author: "Anish Singh Walia"
date: "June 27, 2017"
output: html_document
---
$\text{This article is going to talk about Generalized Additive Models and their implementation in R.}$
This is also a famous and very flexible technique of fitting and Modelling Non Linear Functions which are more flexible and fits data well.
In this technique we simply add __Non linear Functions__ on different variables to the Regression equation.
$\text{That Non linear function can be anything - Cubic Spline , natural Spline ,Smoothing Splines and even polynomial function}$
$$f(x) = y_i = \alpha \ + \ f_1(x_1) \ + f_2(x_2) + \ .... + \ f_p(x_p) \ + \epsilon_i $$
$$\text{where} \ f_p(x_p) \ \text {is a Non Linear function on} \ x_p \ variables.$$
Requiring the __'gam'__ package which helps in fitting *__Generalized Additive Models__*.
```{r,message=FALSE,warning=FALSE , message=FALSE, warning=FALSE}
#requiring the Package
require(gam)
#ISLR package contains the 'Wage' Dataset
require(ISLR)
attach(Wage) #Mid-Atlantic Wage Data
?Wage # To search more on the dataset
gam1<-gam(wage~s(age,df=6)+s(year,df=6)+education ,data = Wage)
#in the above function s() is the shorthand for fitting smoothing splines in gam() function
summary(gam1)
#Plotting the Model
par(mfrow=c(1,3))
plot(gam1,se = TRUE)
```
In the above Plots the Y-axis contains the Non Linear functions and x-axis contains the Predictors used in the Model and the dashed lines Represent the __Standard Error bands__.The Whole Model is *__Additive__* in nature.
$$\textbf {The Curvy plots shows that the functions are Non linear in nature}$$
---
### We can also fit a Logistic Regression Model using gam()
```{r}
#logistic Regression Model
gam2<-gam(I(wage >250) ~ s(age,df=4) + s(year,df=4) +education , data=Wage,family=binomial)
plot(gam2,se=T)
```
####So we are plotting the logit of Probabilities of each variable as a saperate function but on the whole additive in nature.
---
###Now we can also check if we need Non linear Terms for Year variable or not?
```{r}
#fitting the Additive Regression Model which is linear in Year
gam3<-gam(I(wage >250) ~ s(age,df=4)+ year + education , data =Wage, family = binomial)
plot(gam3)
#anova() function to test the goodness of fit and choose the best Model
#Using Chi-squared Non parametric Test due to Classification Problem and categorial Target
anova(gam2,gam3,test = "Chisq")
```
$$\text {The plot for the Year is a straight Line i.e it is Linear function in Year.}$$
As the above Test indicates that Model with __Non linear terms for Year__ is not Significant.So we can neglect that Model.
###Now we can also fit a Additive Model using lm() function
```{r}
lm1<-lm(wage ~ ns(age,df=4) + ns(year,df=4)+ education , data = Wage)
#ns() is function used to fit a Natural Spline
lm1
#Now plotting the Model
plot.gam(lm1,se=T)
#Hence the Results are same
```
####So by using the lm() function too we can fit a Genaralized Additive Model.
---
## Conclusion
####Hence GAMs are a very nice technique and method to Model Non linearities and Learn complex function other than just Linear functions.They are easily interpretable too.
####And the most basic idea behind learning Non Linearities is to transform the Data and the variables which can capture and Learn and make sense of something more complicated than just a linear relationship.
$$\text {Because the truth is not always "Linear"}$$