\chapter{Anomaly Detection Using Recurrent Neural Networks}
\label{ch:results}
\section{Introduction}
Chapter \ref{ch:ad} introduced anomaly detection in time series, and Chapter \ref{ch:rnn} introduced recurrent neural networks as time series modelers.
%
This chapter outlines and tests a procedure for finding anomalies using RNNs with the goal of mitigating many of the problems associated with anomaly detection that were discussed in Chapters \ref{ch:ad} and \ref{ch:adtechnique}.
%
As much as possible, the same procedure is applied to each test time series in order to assess the procedure's generality.
The procedure outline describes:
%
\begin{description}
%
\item[sampling:] the time series used for training and its associated sampling process
%
\item[recurrent autoencoders:] the specific form of the RNN
%
\item[training:] the training algorithm used on the RNN
%
\item[Bayesian optimization:] the search for optimized parameters for the procedure
%
\end{description}
%
After describing the procedure, the anomaly scores of the test time series are evaluated for their ability to detect anomalies.
\section{Sampling: Sliding Windows}
\label{sec:sampling}
Several univariate time series \emph{with} anomalies were chosen or generated to test the anomaly detection process. They are named as follows:
\begin{description}
\item[spike:] two generated sequences that are simply evenly-spaced `spikes' of the same height.
\item[sine:] a generated sinusoidal signal.
\item[power demand:] a building's electrical power usage data%
\footnote{\url{http://www.cs.ucr.edu/~eamonn/discords/power_data.txt}} referenced in the influential HOT-SAX paper \cite{Keogh2005} on anomaly detection in time series. The data was scaled down by dividing it by its median (to help in the training process).
\item[electrocardiogram (ECG):] a signal from the human heart taken from the PhysioNet \cite{PhysioNet} database. Finding anomalies in ECG data has been used as a benchmark in several studies \cite{Malhotra2015,Keogh2005,Cheboli2010,li2012dimensionality,Wei2005,Chuah2007,Jones2014}.
\item[polysomnography ECG (PSG-ECG):] human ECG data \cite{PhysioNet} taken during a sleep study%
\footnote{University College Dublin Sleep Apnea Database, \url{http://www.physionet.org/physiobank/database/ucddb/}, doi:10.13026/C26C7D, study id. ucddb002}%
.
\end{description}
\noindent
%
The normal behaviour, as well as anomalous behaviour, of these time series can be seen in the plots presented in Section \ref{sec:results}.%
%todo: along with an explanation of why it was chosen
While there is only one `source' time series to train on (in each test), the RNN needs to see many `samples' from the source time series.
%
The sliding window procedure, described in Section \ref{sec:adsample}, is used to get samples of some window width.
%
Additionally, since RNNs do not need a fixed window, sliding windows are obtained for more than one window width.
%
Furthermore, the samples are collected into `mini-batches' (of the same window width) to be more compatible with the training procedure.
%
The window width is incremented (width skip) from some minimum until it is not possible to obtain a mini-batch (so the largest window will be close to the length of the time series).
Table \ref{tbl:winspec} specifies the relevant sizes and increments for the sampling process.
%
The values were adjusted manually until a `reasonable' number of samples were found.
%
However, the minimum window length was chosen such that meaningful dynamics were captured (regardless of the scale of the anomaly).
%todo: why extra line ???
\begin{table}[H]
\centering
\caption[Time series sample specifications]{Time series sample specifications} %todo: what case?
\csvreader[
% need p b/c the below \ifthenelse 'filters' make the tbl cell ugly
tabular=|p{.75in}||c|c|c|c|c||c|
,table head=
\hline
\bfseries series
& length
& min. win.
& slide skip
& width skip
& batch size
& $\Rightarrow$ \bfseries samples
\\ \hline \hline
,late after line=\\\hline
]%
{tbls/sampling.csv}%
{
batch_size=\bs
,length=\ln
,min_winsize=\mw
,name=\nm
,nsamples=\ns
,slide_jump=\ss
,winsize_jump=\wss
}%
{
%these act like filters. so you have to explicitly get them
\ifthenelse{ \equal{\nm}{spikelv}}{\noindent spike-1}{}
\ifthenelse{ \equal{\nm}{spikereg}}{\noindent spike-2}{}
\ifthenelse{ \equal{\nm}{sin}}{\noindent sine}{}
\ifthenelse{ \equal{\nm}{power}}{\noindent power}{}
\ifthenelse{ \equal{\nm}{ecg}}{\noindent ECG}{}
\ifthenelse{ \equal{\nm}{sleep}}{\noindent PSG-ECG}{}
& \ln & \mw & \ss & \wss & \bs & \ns
}
\label{tbl:winspec}
\end{table}
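To make the sampling procedure concrete, the following is a minimal sketch (using \textsf{NumPy}; the function and parameter names are illustrative, not those of the actual implementation) of how sliding windows of increasing width can be collected into mini-batches as described above.
\begin{verbatim}
import numpy as np

def sample_windows(x, min_win, slide_skip, width_skip, batch_size):
    # collect sliding windows of increasing width into mini-batches
    batches = []
    width = min_win
    while True:
        # all windows of the current width, slid across the series
        starts = range(0, len(x) - width + 1, slide_skip)
        windows = [x[s:s + width] for s in starts]
        if len(windows) < batch_size:
            break  # a full mini-batch can no longer be formed
        # group windows of the same width into mini-batches
        for i in range(0, len(windows) - batch_size + 1, batch_size):
            batches.append(np.stack(windows[i:i + batch_size]))
        width += width_skip
    return batches
\end{verbatim}
Each returned mini-batch is an array of \texttt{batch\_size} windows that all share a single width, consistent with the requirement that a mini-batch contains windows of the same width.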
\section{RNN Setup: Autoencoder}
The RNN needs to learn what normal time series behavior is.
%
So an autoencoder is used, which can learn expected behavior by setting the target, $\vc{y}$, equal to the input, $\vc{x}$.
%
The loss function is the MSE (Equation \ref{eqn:mse}).
%
Furthermore, to prevent the RNN from learning a trivial identity function, Gaussian noise is added to the input, with standard deviation equal to 0.75 times the standard deviation of the whole time series:
%
\begin{equation*}
\tilde{\x} = \x
+ \mathcal{N}(0,(0.75\sigma_{\mathrm{std}}(\x))^2)
\end{equation*}
%
Note that the comparison in the loss function is between the uncorrupted signal, $\x$, and the output of the network, $\vc{o}$, given the corrupted input, $\tilde{\x}$: $L(\vc{o}(\tilde{\x}),\x)$.
With this setup, a denoising autoencoder, the data-generating distribution, $p(\x)$, is implicitly learned \cite{Bengio2013}.
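As an illustration, the corruption and loss computation can be sketched as follows (a minimal \textsf{NumPy} sketch with illustrative names; in practice the corruption and loss are handled inside the training pipeline).
\begin{verbatim}
import numpy as np

def corrupt(x, noise_factor=0.75, rng=np.random):
    # add Gaussian noise whose standard deviation is a fraction
    # (here 0.75) of the standard deviation of the whole series
    sigma = noise_factor * np.std(x)
    return x + rng.normal(0.0, sigma, size=x.shape)

def mse_loss(output, target):
    # MSE between the network output o(x_tilde) and the
    # *uncorrupted* target x
    return np.mean((output - target) ** 2)
\end{verbatim}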
\section{Training: RMSprop}
\label{sec:training}
SGD with RMSprop \cite{Tieleman2012} for parameter updates has been demonstrated to provide results that are similar to more sophisticated second-order methods but with significantly less computational cost \cite{Dauphin}.
%
Another benefit of RMSprop is that it is designed to work with mini-batch learning.
%
Since the training data is highly redundant (from sliding windows), it is expected that computing the gradient (to update parameters) from a subset of the data (mini-batch) is more efficient than computing the gradient for all the data.
For each derivative in the gradient, RMSprop keeps an exponentially weighted moving average of the derivative's magnitude, which is used to normalize the derivative by its root mean square.
%
More specifically, the (overall) RNN training procedure is outlined in the following box.
\begin{algorithm}{Training Procedure}
\begin{algorithmic}
\STATE \STATE
\STATE \COMMENT{obtain mini-batches (as Section \ref{sec:sampling})}
\STATE $\mathcal{X} \gets \mathrm{sampling}(\x)$
\STATE \STATE
\STATE \COMMENT{randomly split mini-batches into training and validation sets}
\STATE $\mathcal{X}_t \gets \mathrm{choose(75\%,\mathcal{X})}$
\STATE $\mathcal{X}_v \gets \mathcal{X} - \mathcal{X}_t$
\STATE \STATE
\STATE \COMMENT{initialize}
\STATE $\vc{\theta} \gets \vc{0}$
\COMMENT{initial RNN parameters are 0}
\STATE \COMMENT{set training parameters}
\STATE $\alpha = 10^{-4}$
\COMMENT{learning rate}
\STATE $h = 14$
\COMMENT {RMS `halflife'}
\STATE $\gamma \gets e^{\frac{-\ln{2}}{h}}$
\STATE $r = 10^{-8}$
\COMMENT{RMS regularizer}
\STATE patience = 5
\COMMENT{stopping criterion parameter}
\STATE min\_improvement = $.005$
\COMMENT{stopping criterion parameter}
\STATE \STATE
%\STATE $i \gets 1$
\FOR{epoch $i$}
\STATE{ $\mathcal{X}_t \gets \mathrm{shuffle}(\mathcal{X}_t)$}
\FOR{mini-batch $M$ in $\mathcal{X}_t$ }
\FOR{parameter $p$ in $\vc{\theta}$}
\STATE{} \COMMENT{per-parameter, update $p$ according to RMSprop \cite{Graves2013b}
(indices to $p$ omitted)}
\STATE{
\begin{flalign*}
f_{i+1}
&\gets
\gamma f_i
+
(1 - \gamma)
\frac{\partial{L}}{\partial{p}} &&
\\
g_{i+1}
&\gets
\gamma g_i
+
(1 - \gamma)
\left(
\frac{\partial{L}}{\partial{p}}
\right)^2 &&
\\
p_{i+1}
&\gets
p_i
-
\frac{\alpha}{
\sqrt{g_{i+1}
- f_{i+1}^2
+ r}
}
\frac{\partial{L}}{\partial{p}} &&
\end{flalign*}
}
\ENDFOR
\ENDFOR
\STATE \COMMENT{compute average loss on validation set}
\STATE{
$
v_i
\gets
\frac{1}{|\mathcal{X}_v|}
\sum_{\x_v \in \mathcal{X}_v}
L(\x_v,\vc{o})
$
}
\STATE \COMMENT{stop after no sufficient improvement for more than `patience' epochs}
\IF{
$v_{min}-v_{i}> v_{min} \cdot \mathrm{min\_improvement}$
}
\STATE{$v_{min} \gets v_i$}
\STATE{$i_{min} \gets i$}
\ENDIF
\IF{$i - i_{min} > \mathrm{patience}$}
\STATE{STOP} \COMMENT{return $\vc{\theta}$}
\ENDIF
\STATE{$i \gets i+1$}
\ENDFOR
%\STATE \STATE
\end{algorithmic}
\end{algorithm}
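To illustrate, the per-parameter update above can be written compactly; the following is a minimal \textsf{NumPy} sketch that mirrors the three RMSprop equations, operating element-wise on a whole parameter array (the names are illustrative and not those of the library implementation used).
\begin{verbatim}
import numpy as np

def rmsprop_step(p, grad, f, g, alpha=1e-4, halflife=14, r=1e-8):
    # one RMSprop update mirroring the training procedure:
    #   p, grad -- parameter array and its gradient dL/dp
    #   f, g    -- running averages of the gradient and squared gradient
    gamma = np.exp(-np.log(2.0) / halflife)  # decay from the `halflife' h
    f = gamma * f + (1.0 - gamma) * grad
    g = gamma * g + (1.0 - gamma) * grad ** 2
    p = p - alpha * grad / np.sqrt(g - f ** 2 + r)
    return p, f, g
\end{verbatim}
The sketch only makes the update equations explicit; the full training loop also handles the mini-batch iteration and the early-stopping criterion shown in the box.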
%todo: use s=h in lstm
The \textsf{Theanets} \cite{Johnson2015} implementation%
\footnote{With every calculation of $\vc{o}$, the RNN states are initialized to 0.}
of RNNs and RMSprop was used.
%
\textsf{Theanets} is based on the mathematical expression compiler, \textsf{Theano} \cite{Bergstra2010}.
%
Gradients were computed using \textsf{Theano}'s automatic differentiation feature instead of explicitly-defined backpropagation \cite{Rumelhart1986}.
%\section{Hyper-parameter Optimization}
%In the previous section,
While the training procedure optimizes $\vc{\theta}$, there are other parameters that could be adjusted to minimize the loss.
%
The number of layers, $l$, and the `size' of each layer, $n$ (the dimension of the state vector $\vc{s}$, which also equals the cell state memory dimension), were chosen as `hyper-parameters' to be tuned by a Bayesian optimization process.
%
Bayesian optimization is suited to optimizing RNN training because
1) it tries to minimize the number of calls to the (expensive) objective function, which, in this case, is the training procedure, and
2) it accounts for the stochasticity of the objective; in this case, the training/validation split is random and the training data is shuffled on each epoch.
The \textsf{Spearmint} \cite{snoek2012practical} package was used to drive the hyper-parameter optimization process.
%
\textsf{Spearmint} takes a parameter search space and a maximum number of optimization iterations.
%
The process saves the RNN parameters after every optimization iteration.
%
The optimization was stopped once the RNN with the minimum validation loss was subjectively able to detect anomalies%
%
\footnote{
Further optimization may have been possible with increasing RNN size but loss reduction diminished relative to the associated increase in computational expense.
}%
%
.
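A hedged sketch of the objective that such an optimizer repeatedly evaluates is given below; \texttt{train\_rnn} and \texttt{save\_parameters} are hypothetical placeholders for the training procedure of Section \ref{sec:training} and the per-iteration parameter saving, and the exact \textsf{Spearmint} configuration format is not shown.
\begin{verbatim}
def objective(params):
    # Bayesian optimization objective: train an RNN with the proposed
    # hyper-parameters and return the validation loss to be minimized.
    layers = int(params['layers'])  # number of layers, l
    size = int(params['size'])      # layer size, n (state dimension)
    # train_rnn and save_parameters are hypothetical placeholders for
    # the training procedure and the per-iteration parameter saving
    val_loss, theta = train_rnn(layers=layers, size=size)
    save_parameters(theta, layers, size)
    return val_loss
\end{verbatim}
Such a function would be evaluated once per optimization iteration with a new point from the search space.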
For every test series, Figure \ref{fig:bo} shows the results of the Bayesian optimization process and the training progress of the best network.
%
In the training progress figures, the validation and training losses are close throughout the training process.
%
This is expected since the training and validation sets are highly redundant, and the RNN, being largely invariant to translation and sequence length, likely treats them as almost identical.
%todo: results are fine eventhough it might be underfit
%todo: spike-2 legend
\begin{figure}[!h]
\centering %3figs a pg. these are the gen figs
\begin{subfigure}[t]{\textwidth} %hmm it's the same width as the pdf
% todo: check the previous pdfs includegraphics
\includegraphics[width=.5\textwidth]{figs/bo_spikelv.pdf}
\includegraphics[width=.5\textwidth]{figs/trn_spikelv.pdf}
\caption{spike-1}
\end{subfigure}%
\end{figure}
\begin{figure}[!hp]
\ContinuedFloat
\begin{subfigure}[t]{\textwidth}
\includegraphics[width=.5\textwidth]{figs/bo_spikereg.pdf}
\includegraphics[width=.5\textwidth]{figs/trn_spikereg.pdf}
\caption{spike-2}
\end{subfigure}%
\end{figure}
\begin{figure}[!hp]
\ContinuedFloat
\begin{subfigure}[t]{\textwidth}
\includegraphics[width=.5\textwidth]{figs/bo_sin.pdf}
\includegraphics[width=.5\textwidth]{figs/trn_sin.pdf}
\caption{sine}
\end{subfigure}%
\end{figure}
\begin{figure}[!hp]
\ContinuedFloat
\begin{subfigure}[t]{\textwidth}
\includegraphics[width=.5\textwidth]{figs/bo_power.pdf}
\includegraphics[width=.5\textwidth]{figs/trn_power.pdf}
\caption{power}
\end{subfigure}%
\end{figure}
\begin{figure}[!h]
\ContinuedFloat
\begin{subfigure}[t]{\textwidth}
\includegraphics[width=.5\textwidth]{figs/bo_ecg.pdf}
\includegraphics[width=.5\textwidth]{figs/trn_ecg.pdf}
\caption{ECG}
\end{subfigure}%
\end{figure}
\begin{figure}[!h]
\ContinuedFloat
\begin{subfigure}[t]{\textwidth}
\includegraphics[width=.5\textwidth]{figs/bo_sleep.pdf}
\includegraphics[width=.5\textwidth]{figs/trn_sleep.pdf}
\caption{PSG-ECG}
\end{subfigure}%
\caption[Training results]{Bayesian optimized validation loss (left) and training progress of best RNN (right).
%
Each point in the Bayesian optimization plot is the average validation loss, since more than one training session could have been launched with the same parameters.
%
If so, the points have a 95\% confidence interval bar through them.
}
\label{fig:bo} %OBVIOUSLY label should be AFTER caption!!!
\end{figure}
%todo: make figures centered by adding whitespace to the right
\section{Results and Discussion}
\label{sec:results}
The optimized RNN has learned an expectation of typical series behaviour.
%
So, the `reconstruction error' is the difference between the test input and the RNN's output.
%
Therefore, the square of the reconstruction error can be used as an anomaly score \cite{Erfani2014}.
%
This section will show how the various calculations of error can be used to detect point anomalies \emph{as well as} discord anomalies introduced in Section \ref{sec:anomalytypes}.
%
However, the particular (objective) anomaly detection technique will not be discussed;
only the anomaly score that would be used in such a technique is presented.
%
Nonetheless, the argument is made for the ability of autoencoding RNNs to detect anomalies.
Although the order of the anomaly scores should not be important to the anomaly detection technique, sequences of error calculations can be made in two ways for a test series: 1) the MSE of a sliding window and 2) the squared error of every point in the test series where the whole sequence is input to the optimal RNN.
%
While the MSE from a sliding window can detect both point anomalies and discords, it is preferable not to have to specify a window size.
%
Ideally, high anomaly scores for points close together signal a discord while a lone high anomaly score signals a point anomaly.
%
But both types of calculations are evaluated in this discussion.
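To make the two error calculations concrete, the following is a minimal \textsf{NumPy} sketch that computes both the point-wise squared errors and the windowed MSE from a test series and the corresponding RNN output (names are illustrative).
\begin{verbatim}
import numpy as np

def anomaly_scores(x_test, o, window=None):
    # squared reconstruction error between the test series and the
    # RNN output o (the whole test sequence given as input)
    sq_err = (o - x_test) ** 2
    if window is None:
        return sq_err                 # error calculation type 2
    # moving average of squared errors = windowed MSE; shift the result
    # by window // 2 to align scores with the window centres
    kernel = np.ones(window) / window
    return np.convolve(sq_err, kernel, mode='valid')  # type 1
\end{verbatim}
Shifting the windowed scores by half the window width aligns them with the window centres, as in the plots below.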
Error calculations are presented graphically in Figure \ref{fig:err} for each series (see Figure caption after the last subfigure).
%
The errors are summarized by their maximum, a kernel density estimate, and the 95th percentile mark.
%
Together with the error plot, the efficacy of the RNN in distinguishing anomalies can be evaluated;
%
normal data should correspond to the bulk of the error calculations while anomalies should correspond to extreme high values.
It should be noted that, as the window size is increased, fractionally fewer points are included in the normal 95th percentile (and vice versa).
%
The kernel density estimate might change significantly as a result.
What follows is a discussion of the ability of the (optimal) RNN to detect anomalies referencing subfigures of Figure \ref{fig:err}.
\noindent {\bfseries{spike-1}} \nopagebreak
\begin{figure}[H]
\centering
\begin{subfigure}[t]{\textwidth}
\centering
\includegraphics[]{figs/er_spikelv.pdf}
\caption{spike-1}
\end{subfigure}%
\end{figure}
This series can be considered discrete with a point anomaly.
%
The point anomaly takes on an improbable value but is not extreme.
%
Clearly, the anomalies are distinguished in all error calculations.
Note the error increases as soon as the sliding window (from the left) encounters the anomaly.
%
This observation can be used to analyze the other series.
\noindent {\bfseries{spike-2}} \nopagebreak
\begin{figure}[H]
\ContinuedFloat
\begin{subfigure}[t]{\textwidth}
\centering
\includegraphics[]{figs/er_spikereg.pdf}
\caption{spike-2}
\end{subfigure}%
\end{figure}
This spike series tests whether the error becomes significant where a spike is out of place, i.e., a discord.
%
The anomaly is clearly detected in the windowed error plot but not in the point error plot.
\noindent {\bfseries{sine}} \nopagebreak
\begin{figure}[H]
\ContinuedFloat
\begin{subfigure}[t]{\textwidth}
\centering
\includegraphics[]{figs/er_sin.pdf}
\caption{sine}
\end{subfigure}%
\end{figure}
This series tests discord detection because no single point is anomalous.
%
For the windowed errors, the anomalous and normal regions are distinguished by a range of values for each.
%
However, there is a transition region with errors higher or lower than those of the anomalous region.
%
In the point error plot, the extreme values correspond to extreme values in the test plot.
%
So, given these results, the windowed error can be attributed more to the extreme values of the test series than to its anomalous behavior.
%
In any case the RNN found a tight expectation for typical values.
%kde bad
%\pagebreak
\noindent {\bfseries{power}} \nopagebreak
\begin{figure}[H]
\ContinuedFloat
\begin{subfigure}[t]{\textwidth}
\centering
\includegraphics[]{figs/er_power.pdf}
\caption{power}
\end{subfigure}%
\end{figure}
The power series normally shows a demand profile over a five-day work week.
%
The windowed error distribution has a long tail because high errors were given to normal areas of the test series (not shown within the plot, possibly due to some drift in the series).
%
In fact, unlike the second windowed error plot, the first does not contain the highest error.
%
Nonetheless, both windowed errors distinguished the anomalous region.
%
But, obviously, the point errors cannot be used to detect the anomalous region.
\noindent {\bfseries{ECG}} \nopagebreak
\begin{figure}[H]
\ContinuedFloat
\begin{subfigure}[t]{\textwidth}
\centering
\includegraphics[]{figs/er_ecg.pdf}
\caption{ECG}
\end{subfigure}%
\end{figure}
The ECG is a challenging series because, while there is a repeating element, the element does not repeat with precisely the same period.
%
There is also noise in the signal that has to be distinguished from its significant features.
In the point error plot, the most prominent values correspond to points on the test series that are simply lower than a baseline.
%
The same cannot be said about the error plot with the smallest window;
%
although the areas with extreme values are the most prominent, areas that have different behavior, but not extreme in value, are distinguished as well.
%
For example, a particularly subtle anomaly was detected within the highest 5 percentile at around $t=1380$ despite the fact that the more anomalous \emph{region} contributed extreme values.
%
The same could be said about the anomaly at around $t=1470$.
%
This window size was chosen to be about the size of the repeating unit since the effect of these anomalies on the error is diluted for larger windows.
%
%Still, an argument can be made that the larger windows captured the anomalous region's behavior, as opposed to just its extreme values, because a significant portion of the error corresponding to the anomalous region lies within the 95 percentile (and hence within the normal range of error fluctuations for normal data). %weak :S
The larger windows detected the most anomalous region but it is not apparent whether the extreme error values were due to deviant values or deviant behavior.
\noindent {\bfseries{PSG-ECG}} \nopagebreak
\begin{figure}[H]
\ContinuedFloat
\begin{subfigure}[t]{\textwidth}
\centering
\includegraphics[]{figs/er_sleep.pdf}
\caption{PSG-ECG}
\end{subfigure}%
\caption[Anomaly scores of series]{Anomaly scores of series.
%
In the top pane, a portion of the input series is shown with an anomaly.
%
While the focus is on the anomaly, enough normal data is shown for subjective comparison.
%
This is accompanied by a kernel density estimate, plotted on the ordinate (vertically), to evaluate the normality of the data \emph{by value} as opposed to \emph{behavior}.
%
The lower panes show series of squared error, $\epsilon$, where each pane is associated with a sliding window size.
%
The window size is represented by a highlighted vertical span in the error plot as well as a corresponding box in the input series plot to give a sense of scale to the anomaly relative to the window.
%
The error series (error calculation type 1) is generated by sliding a window over the test series and computing the MSE within each window.
%
The locations of the MSE points correspond with the center of the sliding window.
%
However, error plots with just a vertical line as the `window' show individual squared errors (error calculation type 2).
%
Furthermore, a kernel density estimate for all the errors (including ones outside the plot) is plotted (vertically) on the ordinate.
%
The 95th percentile (the boundary of the highest 5 percent of errors) and the maximum of all the errors are also marked on the ordinate.
}
\label{fig:err}
\end{figure}
In this ECG, the point errors do not reveal much about the anomalous region because every spike in the error plot corresponds to a spike in the ECG, normal or otherwise.
%
However, the anomalous region is clearly detected in both windowed error plots.
%
Again, the error in the anomalous region is well beyond the normal range of error fluctuations, even though the test series values are only slightly above average (as can be seen from the density estimate).
%todo:replace: indices to ^X^X FOR p omitted
%todo: 'was programmed to save' ...just write: process saves
\section{Conclusions}
Given the results, which are influenced by the described training process and RNN architecture, some general statements can be made about how well the RNNs can find anomalies.
\begin{itemize}
\item
Using the point errors, extreme, but not necessarily rare, values may be found.
\item
The windowed errors are a versatile way of finding both point and discord anomalies at multiple scales.
%
However, this means that the window size must be on about the same scale as the anomaly.
%
Otherwise, the effect of the anomaly is not pronounced.
%
Furthermore, the minimum window size must be long enough to contain meaningful dynamics.
\item
Following from the previous two points, windowed errors are more reliable for finding anomalies.
\item
The training data can have some anomalous data.
\item
The RNNs were insensitive to variations in sequence length and translation.
\item
With minor variation, the same process can be used to find anomalies in a variety of series.
\end{itemize}
In general, subject to effective training, the RNNs `learned' the typical \emph{behavior} of the series which includes information about their typical values as well as their ordering.
%
In conclusion, the autoencoding recurrent neural network can be used to detect anomalies in sequences.
%todo . theano implememtation particularly slow suspected
%todo. include tbls and probabilitydfs in repo in final
%%% Local Variables:
%%% mode: latex
%%% TeX-command-extra-options: "-shell-escape"
%%% TeX-master: "thesis"
%%% End: