This is the material for reproducing the experiments for the paper "Building
Fixed-Size Spatiotemporal Models for Evolving Data Streams". All experiments
were performed on an Ubuntu 18.04.1 Linux system using Python 3.6.9. Python
package requirements can be installed with pip using

```
pip3 install -r requirements.txt
```

Furthermore, for some experiments, git has to be installed.

Code for our proposed method is contained in the `tpSDOs` directory as a
Python module and is installed by issuing the above command.
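To verify that the module is available after installation, a quick import check can be used. This is a minimal sketch, assuming the importable package name matches the `tpSDOs` directory name:

```python
# Sanity check: import the installed module and show where it was loaded from.
import tpSDOs
print(tpSDOs.__file__)
```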
- Change to the `poc` directory.

  ```
  cd poc
  ```

- Remove the file `results.csv`, which contains our obtained results.

  ```
  rm results.csv
  ```

- Run the proof-of-concept implementation for several fractions of temporal outliers. Alternatively, you can specify the desired fraction of temporal outliers as a script parameter (see the example after this list).

  ```
  python3 run.py
  ```

- Results are appended to the `results.csv` file and can be plotted using

  ```
  python3 plot.py
  ```
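For example, to run the proof of concept for a single temporal-outlier fraction of 0.05, a call of the following form can be used (the exact argument format is an assumption; consult `run.py` for the values it accepts):

```
python3 run.py 0.05
```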
- Change to the `outlier` directory.

  ```
  cd outlier
  ```

- Download `kddcup.data.gz` from http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html to the `outlier` directory and extract it.

  ```
  wget http://kdd.ics.uci.edu/databases/kddcup99/kddcup.data.gz && gzip -d kddcup.data.gz
  ```

- Perform feature extraction for the KDD Cup'99 dataset, creating the `kddcup.npz` file (a sketch for inspecting this file follows the list).

  ```
  python3 kddcup.py
  ```

- Run all outlier detection algorithms for KDD Cup'99.

  ```
  python3 run.py kddcup rshash swknn swrrct loda swlof tpsdose
  ```

- Results will be appended to `results.csv`. The `results.csv` file contained in this archive shows our obtained results. Results for our proposed method are named `tpsdose`.
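To check that feature extraction succeeded, the arrays stored in the `.npz` file can be listed with NumPy; the same applies analogously to the `swan.npz` file created below. Which arrays are stored is determined by `kddcup.py` (respectively `swan.py`), so no specific keys are assumed here:

```python
import numpy as np

# List every array stored in the feature file together with its shape and dtype.
data = np.load("kddcup.npz")
for name in data.files:
    print(name, data[name].shape, data[name].dtype)
```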
- Change to the `outlier` directory.

  ```
  cd outlier
  ```

- Download the files `partition1_instances.tar.gz`, `partition2_instances.tar.gz`, `partition3_instances.tar.gz`, `partition4_instances.tar.gz` and `partition5_instances.tar.gz` from https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/EBCFKM and place them in the `outlier` directory.

- Extract the downloaded files to obtain the directories `partition1`, `partition2`, `partition3`, `partition4` and `partition5` within the `outlier` directory.

  ```
  tar xf partition1_instances.tar.gz
  tar xf partition2_instances.tar.gz
  tar xf partition3_instances.tar.gz
  tar xf partition4_instances.tar.gz
  tar xf partition5_instances.tar.gz
  ```

- Clone the https://bitbucket.org/gsudmlab/swan_features/src/master/ repository to `gsudmlab-swan_features`. The repository is used for feature extraction.

  ```
  git clone https://bitbucket.org/gsudmlab/swan_features gsudmlab-swan_features
  ```

- Ensure you are using the version with commit hash 56eb7cb, which we used for our experiments.

  ```
  git -C gsudmlab-swan_features checkout 56eb7cb
  ```

- Perform feature extraction for the SWAN-SF dataset, creating the `swan.npz` file.

  ```
  python3 swan.py
  ```

- Run all outlier detection algorithms for SWAN-SF.

  ```
  python3 run.py swan rshash swknn swrrct loda swlof tpsdose
  ```

- Results will be appended to `results.csv`. The `results.csv` file contained in this archive shows our obtained results. Results for our proposed method are named `tpsdose` (see the sketch after this list for a quick way to inspect them).
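Since both experiments append to the same file, the collected rows can be inspected with the standard `csv` module. This sketch assumes the method name appears as its own field in each row; the actual column layout is defined by `run.py`:

```python
import csv

# Show only the rows produced by the proposed method; rows for the
# baselines are named after the respective algorithm.
with open("results.csv", newline="") as f:
    for row in csv.reader(f):
        if "tpsdose" in row:
            print(row)
```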
Due to issues related to confidentiality, security and privacy, we unfortunately
cannot make this dataset publicly available. The following steps therefore
apply to an arbitrary network capture file `capture.pcap`.

For this experiment, an installation of Go (golang) is additionally required.
For step 7, an installation of `tshark` is also required.
- Install go-flows from https://github.com/CN-TU/go-flows.

  ```
  go get github.com/CN-TU/go-flows/...
  ```

- Change to the `m2m` directory.

  ```
  cd m2m
  ```

- Perform flow extraction based on the feature specifications in `CAIA.json`, creating the file `capture.csv` from `capture.pcap`.

  ```
  go-flows run features CAIA.json export csv capture.csv source libpcap capture.pcap
  ```

- Process flow information using the proposed algorithm, obtaining the file `results.pickle` (see the sketch after this list for loading it).

  ```
  python3 process.py
  ```

- Plot the obtained outlier scores and the amount of sampled data points per day.

  ```
  python3 plot_scores.py
  python3 plot_sampling.py
  ```

- For each observer, plot the magnitude spectrum, 1h temporal plots and 24h temporal plots into the directories `fts`, `temporal_1h` and `temporal_24h`, respectively.

  ```
  python3 analyze.py
  ```

- For each observer, extract a PCAP file containing the network traffic corresponding to the respective observer into the `pcaps` directory.

  ```
  python3 extract.py
  ```
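The `results.pickle` file (also used in the darkspace experiment below) can be loaded with the standard `pickle` module. The structure of the stored object is defined by `process.py`, so nothing beyond its type is assumed here:

```python
import pickle

# Load the processing results written by process.py and show what was stored.
with open("results.pickle", "rb") as f:
    results = pickle.load(f)
print(type(results))
```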
Please note that the used dataset has a size of approximately 2 TB and
processing the data takes several weeks. To only plot the results we obtained,
you can use the existing `results.pickle` file and skip to step 6.
- Change to the `darkspace` directory.

  ```
  cd darkspace
  ```

- Obtain the 'Patch Tuesday' dataset from https://www.caida.org/catalog/datasets/telescope-patch-tuesday_dataset/ and place the `ucsd_network_telescope.anon.*.flowtuple.cors.gz` files in the `darkspace` directory.

- Obtain the legacy Corsaro software from https://github.com/CAIDA/corsaro, build the `cors2ascii` tool and place it in your system's PATH.

- Extract flow information in AGM format.

  ```
  ( for file in ucsd_network_telescope.anon.*.flowtuple.cors.gz ; do cors2ascii $file ; done ) | python3 cors2agm.py >agm.csv
  ```

- Process flow information, obtaining the file `results.pickle`.

  ```
  python3 process.py
  ```

- Create frequency plots and temporal plots from `results.pickle`.

  ```
  python3 plot.py
  ```

- Plots can be found in the `fts` and `temporal_1w` directories.