-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathresidue.Rnw
201 lines (175 loc) · 7.54 KB
/
residue.Rnw
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
%!Rnw root = Geyser-Analysis.Rnw
%!TeX root = Geyser-Analysis.Rnw
\section{Model implementation and residues}
\label{sec:models}
In the following we show the implementation of each of the used models, the residues after the fit, the strategy for obtaining the initial parameters and the values of the estimated parameters.
\subsection{Logistic interpolation with regressor: duration}
This model is just a standard implementation of a logistic function interpolating between two plateaus. The parameters of the exponential function, which determine the transition region and speed are fixed and taken from the fit of the duration dependent waiting time distribution (c.f.\ Appendix~\ref{sec:density}). Thus the implementation takes the form:
<<log1_print>>=
lg1.wait <- createlog(dwait2.p1[1:2])
getAnywhere(lg1.wait)
@
As initial parameters for the plateaus we chose the mean of the two peaks in the waiting time distribution (again, c.f.\ Appendix~\ref{sec:details}). The result is shown in Table~\ref{tab:log1}. The residue of the fit is depicted in Figure~\ref{fig:residue1} (left) on page~\pageref{fig:residue1}. All in all there is not much structure in the residues except for a few outliers which are easily identified with the outliers in the waiting time distribution.
\begin{table}[htbp]
\centering
\sisetup{
table-number-alignment = right,
table-figures-integer = 2,
table-figures-decimal = 2
}
\begin{tabular}{r
S[table-auto-round]
S[table-auto-round]}
\toprule
& {Param1} & {Param2}\\
\midrule
Initial & \Sexpr{round(log1.p0[1],2)} & \Sexpr{round(log1.p0[2],2)}\\
Terminal & \Sexpr{round(log1.p1[1],2)} & \Sexpr{round(log1.p1[2],2)}\\
\bottomrule
\end{tabular}
\caption{Parameters of the logistic interpolation model}
\label{tab:log1}
\end{table}
\subsection{Piecewise linear model with regressor: duration}
This model is very close to the official predictive model. The only difference is, that it respects the two clusters in the wating time duration plot. Thus we get two linear models, one for each cluster. Implemented this looks as follows:
<<plm1_print, eval=FALSE, echo=TRUE>>=
<<plm1_dur_function>>
@
As the model is adapted from the official model. We use the parameters of it as initial parameters for our fit. The result of our regression can be found in Table~\ref{tab:plm1} and the residues are in Figure~\ref{fig:residue1} (right). Again we see almost no structure, except for the same outliers as in the model before.
\begin{table}[htbp]
\centering
\sisetup{
table-number-alignment = right,
table-figures-integer = 2,
table-figures-decimal = 2
}
\begin{tabular}{r
S[table-auto-round]
S[table-auto-round]
S[table-auto-round]
S[table-auto-round]}
\toprule
& {Param1} & {Param2} & {Param3} & {Param4}\\
\midrule
Initial & \Sexpr{round(plm1.p0[1],2)} & \Sexpr{round(plm1.p0[2],2)}
& \Sexpr{round(plm1.p0[3],2)} & \Sexpr{round(plm1.p0[4],2)}\\
Terminal & \Sexpr{round(plm1.p1[1],2)} & \Sexpr{round(plm1.p1[2],2)}
& \Sexpr{round(plm1.p1[3],2)} & \Sexpr{round(plm1.p1[4],2)}\\
\bottomrule
\end{tabular}
\caption{Parameters of the piecwise linear (dur) model}
\label{tab:plm1}
\end{table}
\begin{figure}[htbp]
\setkeys{Gin}{width=0.45\textwidth}
\centering
<<log1_plot, fig = TRUE>>=
plot(g$waiting[-1] - lg1.wait(g$duration[-l], log1.p1),
ylab = "Residue")
@
\hspace{1cm}
<<plm1_plot, fig=TRUE>>=
plot(g$waiting[-1] - plm1.dur(g$duration[-l], plm1.p1), ylab = "Residue")
@
\caption{Residuals of Model 1 (left) and Model 2 (right)}
\label{fig:residue1}
\end{figure}
\subsection{Linear model with predictors: duration \& waiting time}
This model is a slight adaption of the official predictive model, too. Instead of taking the clustering into account we add the last waiting time as a regressor. This leads to the following implementation:
<<lm1_print, eval=FALSE, echo=TRUE>>=
<<lm1_durwait_function>>
@
Because of the close relation to the park model, we use its parameters as inital parameters. For the waiting time coefficient we use 0 as we suspect it will only act as a small correction. The results can be found in Table~\ref{tab:lm1} and the residues are depicted in Figure~\ref{fig:residue2} on page~\pageref{fig:residue2}. As before, there is almost no structure in the residues, except for the known outliers.
\begin{table}[htbp]
\centering
\sisetup{
table-number-alignment = right,
table-format = -2.2
}
\begin{tabular}{r
S[table-auto-round]
S[table-auto-round]
S[table-auto-round]}
\toprule
& {Param1} & {Param2} & {Param3} \\
\midrule
Initial & \Sexpr{round(lm1.p0[1],2)} & \Sexpr{round(lm1.p0[2],2)}
& \Sexpr{round(lm1.p0[3],2)}\\
Terminal & \Sexpr{round(lm1.p1[1],2)} & \Sexpr{round(lm1.p1[2],2)}
& \Sexpr{round(lm1.p1[3],2)}\\
\bottomrule
\end{tabular}
\caption{Parameters of the linear (dur+wait) model}
\label{tab:lm1}
\end{table}
\subsection{Piecewise linear model with predictors: duration \& waiting time}
This model is a refinement of the previous one. Again we use duration and waiting time as predictors, yet this time we apply a linear model to each of the three identified clusters separately. This increases the number of parameters to 9 and the code takes the following form:
<<plm2_print, eval=FALSE, echo=TRUE>>=
<<plm2_durwait_function>>
@
Because of the close relation to the park model. We use its parameters as inital parameters. For the different waiting times we use 0, as we suspect only a small corrective effect. The result is summarized in Table~\ref{tab:plm2} and the residues can be found in Figure~\ref{fig:residue2} (right). In this case there is a little more structure than in the other cases and we have the same outliers. Additionally we see that we miss more eruptions than in the other models. Together with the high number of necessary parameters, these are indicators against the use of this model.
\begin{table}[htbp]
\centering
\sisetup{
table-number-alignment = right,
table-format = -2.2
}
\begin{tabular}{r
S[table-auto-round]
S[table-auto-round]
S[table-auto-round]
S[table-auto-round]}
\toprule
& {Param1} & {Param2} & {Param3} & {Param4}\\
\midrule
Initial & \Sexpr{round(plm2.p0[1],2)} & \Sexpr{round(plm2.p0[2],2)}
& \Sexpr{round(plm2.p0[3],2)} & \Sexpr{round(plm2.p0[4],2)}\\
Terminal & \Sexpr{round(plm2.p1[1],2)} & \Sexpr{round(plm2.p1[2],2)}
& \Sexpr{round(plm2.p1[3],2)} & \Sexpr{round(plm2.p1[4],2)}\\
\bottomrule
\end{tabular}
\vphantom{h}
\vspace{0.3cm}
\vphantom{h}
\begin{tabular}{S[table-auto-round]
S[table-auto-round]
S[table-auto-round]
S[table-auto-round]
S[table-auto-round]}
\toprule
{Param5} & {Param6} & {Param7} & {Param8} & {Param9}\\
\midrule
\Sexpr{round(plm2.p0[5],2)} & \Sexpr{round(plm2.p0[6],2)}
& \Sexpr{round(plm2.p0[7],2)} & \Sexpr{round(plm2.p0[8],2)}
& \Sexpr{round(plm2.p0[9],2)}\\
\Sexpr{round(plm2.p1[5],2)} & \Sexpr{round(plm2.p1[6],2)}
& \Sexpr{round(plm2.p1[7],2)} & \Sexpr{round(plm2.p1[8],2)}
& \Sexpr{round(plm2.p1[9],2)}\\
\bottomrule
\end{tabular}
\caption{Parameters of the piecwise linear (dur+wait) model}
\label{tab:plm2}
\end{table}
\begin{figure}[htbp]
\setkeys{Gin}{width=0.45\textwidth}
\centering
<<lm1_plot, fig = TRUE>>=
plot(g$waiting[-1] - lm1.durwait(data.frame(d = g$duration[-l],
w = g$waiting[-l]),
lm1.p1), ylab="Residue")
@
\hspace{1cm}
<<plm2_plot, fig=TRUE>>=
res <- c()
y <- g$waiting[-1]
for(i in 1:length(y)) {
tmp <- y[i] - plm2.durwait(list(d = g$duration[i],
w = g$waiting[i]),
plm2.p1)
res <- append(x = res, values = tmp)
}
plot(res, ylab = "Residue")
@
\caption{Residuals of Model 3 (left) and Model 4 (right)}
\label{fig:residue2}
\end{figure}