This is the material for reproducing the experiments for the paper "Building
Fixed-Size Spatiotemporal Models for Evolving Data Streams". All experiments
were performed on an Ubuntu 18.04.1 Linux system using Python 3.6.9. Python
package requirements can be installed with pip using

```
pip3 install -r requirements.txt
```

Furthermore, for some experiments, git has to be installed.

Code for our proposed method is contained in the `tpSDOs` directory as a
Python module and is installed by issuing the above command.
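To verify that the module is available after installation, a quick import check can be used. This is a minimal sketch, assuming the importable package name matches the `tpSDOs` directory name:

```python
# Sanity check: import the installed module and show where it was loaded from.
import tpSDOs
print(tpSDOs.__file__)
```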
- Change to the `poc` directory.

  ```
  cd poc
  ```

- Remove the file `results.csv`, which contains our obtained results.

  ```
  rm results.csv
  ```

- Run the proof-of-concept implementation for several fractions of temporal outliers. Alternatively, you can specify the desired fraction of temporal outliers as a script parameter (see the example after this list).

  ```
  python3 run.py
  ```

- Results are appended to the `results.csv` file and can be plotted using

  ```
  python3 plot.py
  ```
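For example, to run the proof of concept for a single temporal-outlier fraction of 0.05, a call of the following form can be used (the exact argument format is an assumption; consult `run.py` for the values it accepts):

```
python3 run.py 0.05
```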
- Change to the `outlier` directory.

  ```
  cd outlier
  ```

- Download `kddcup.data.gz` from http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html to the `outlier` directory and extract it.

  ```
  wget http://kdd.ics.uci.edu/databases/kddcup99/kddcup.data.gz && gzip -d kddcup.data.gz
  ```

- Perform feature extraction for the KDD Cup'99 dataset, creating the `kddcup.npz` file (a sketch for inspecting this file follows the list).

  ```
  python3 kddcup.py
  ```

- Run all outlier detection algorithms for KDD Cup'99.

  ```
  python3 run.py kddcup rshash swknn swrrct loda swlof tpsdose
  ```

- Results will be appended to `results.csv`. The `results.csv` file contained in this archive shows our obtained results. Results for our proposed method are named `tpsdose`.
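To check that feature extraction succeeded, the arrays stored in the `.npz` file can be listed with NumPy; the same applies analogously to the `swan.npz` file created below. Which arrays are stored is determined by `kddcup.py` (respectively `swan.py`), so no specific keys are assumed here:

```python
import numpy as np

# List every array stored in the feature file together with its shape and dtype.
data = np.load("kddcup.npz")
for name in data.files:
    print(name, data[name].shape, data[name].dtype)
```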
- Change to the `outlier` directory.

  ```
  cd outlier
  ```

- Download the files `partition1_instances.tar.gz`, `partition2_instances.tar.gz`, `partition3_instances.tar.gz`, `partition4_instances.tar.gz` and `partition5_instances.tar.gz` from https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/EBCFKM and place them in the `outlier` directory.

- Extract the downloaded files to obtain the directories `partition1`, `partition2`, `partition3`, `partition4` and `partition5` within the `outlier` directory.

  ```
  tar xf partition1_instances.tar.gz
  tar xf partition2_instances.tar.gz
  tar xf partition3_instances.tar.gz
  tar xf partition4_instances.tar.gz
  tar xf partition5_instances.tar.gz
  ```

- Clone the https://bitbucket.org/gsudmlab/swan_features/src/master/ repository to `gsudmlab-swan_features`. The repository is used for feature extraction.

  ```
  git clone https://bitbucket.org/gsudmlab/swan_features gsudmlab-swan_features
  ```

- Ensure you are using the version with commit hash 56eb7cb, which we used for our experiments.

  ```
  git -C gsudmlab-swan_features checkout 56eb7cb
  ```

- Perform feature extraction for the SWAN-SF dataset, creating the `swan.npz` file.

  ```
  python3 swan.py
  ```

- Run all outlier detection algorithms for SWAN-SF.

  ```
  python3 run.py swan rshash swknn swrrct loda swlof tpsdose
  ```

- Results will be appended to `results.csv`. The `results.csv` file contained in this archive shows our obtained results. Results for our proposed method are named `tpsdose` (see the sketch after this list for a quick way to inspect them).
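Since both experiments append to the same file, the collected rows can be inspected with the standard `csv` module. This sketch assumes the method name appears as its own field in each row; the actual column layout is defined by `run.py`:

```python
import csv

# Show only the rows produced by the proposed method; rows for the
# baselines are named after the respective algorithm.
with open("results.csv", newline="") as f:
    for row in csv.reader(f):
        if "tpsdose" in row:
            print(row)
```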
Due to issues related to confidentiality, security and privacy, we unfortunately
cannot make this dataset publicly available. The following steps therefore
apply to an arbitrary network capture file `capture.pcap`.

For this experiment, an installation of Go (golang) is additionally required.
For step 7, an installation of `tshark` is also required.
- Install go-flows from https://github.com/CN-TU/go-flows.

  ```
  go get github.com/CN-TU/go-flows/...
  ```

- Change to the `m2m` directory.

  ```
  cd m2m
  ```

- Perform flow extraction based on the feature specifications in `CAIA.json`, creating the file `capture.csv` from `capture.pcap`.

  ```
  go-flows run features CAIA.json export csv capture.csv source libpcap capture.pcap
  ```

- Process flow information using the proposed algorithm, obtaining the file `results.pickle` (see the sketch after this list for loading it).

  ```
  python3 process.py
  ```

- Plot the obtained outlier scores and the amount of sampled data points per day.

  ```
  python3 plot_scores.py
  python3 plot_sampling.py
  ```

- For each observer, plot the magnitude spectrum, 1h temporal plots and 24h temporal plots into the directories `fts`, `temporal_1h` and `temporal_24h`, respectively.

  ```
  python3 analyze.py
  ```

- For each observer, extract a PCAP file containing the network traffic corresponding to the respective observer into the `pcaps` directory.

  ```
  python3 extract.py
  ```
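The `results.pickle` file (also used in the darkspace experiment below) can be loaded with the standard `pickle` module. The structure of the stored object is defined by `process.py`, so nothing beyond its type is assumed here:

```python
import pickle

# Load the processing results written by process.py and show what was stored.
with open("results.pickle", "rb") as f:
    results = pickle.load(f)
print(type(results))
```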
Please note that the used dataset has a size of approximately 2 TB and
processing the data takes several weeks. To only plot the results we obtained,
you can use the existing `results.pickle` file and skip to step 6.
- Change to the `darkspace` directory.

  ```
  cd darkspace
  ```

- Obtain the 'Patch Tuesday' dataset from https://www.caida.org/catalog/datasets/telescope-patch-tuesday_dataset/ and place the `ucsd_network_telescope.anon.*.flowtuple.cors.gz` files in the `darkspace` directory.

- Obtain the legacy Corsaro software from https://github.com/CAIDA/corsaro, build the `cors2ascii` tool and place it in your system's PATH.

- Extract flow information in AGM format.

  ```
  ( for file in ucsd_network_telescope.anon.*.flowtuple.cors.gz ; do cors2ascii $file ; done ) | python3 cors2agm.py >agm.csv
  ```

- Process flow information, obtaining the file `results.pickle`.

  ```
  python3 process.py
  ```

- Create frequency plots and temporal plots from `results.pickle`.

  ```
  python3 plot.py
  ```

- Plots can be found in the `fts` and `temporal_1w` directories.