Skip to content

Fontanjo/TS_generation_benchmark_public

Repository files navigation

Time Series Generation Benchmark

This code provides a tool to generate synthetic time series using some of the most common techniques. The sources of the codes used are listed here

This repository only contains the sharable (public) part of the code. Since the concatenation tool to reconstruct the results of GANs is absent, the code might not work properly with GAN-based generations

Prerequisites | Build | Execution | Results | Examples | Plot metrics | BasicGAN 3072 Note | Sources | Datasets form


Prerequisites

  • Ubuntu 20.04 (Focal Fossa).

  • Clone this resository

  • Python3.6* and pip for python 3

*How to install python3.6 for Ubuntu 20.04


Build

  • Install the needed libraries and create the virtual environments (takes several minutes)
$ sudo sh install_all.sh

Execution

  • Change the parameters in parameters.json if needed, then run with
$ python3.6 main.py

Arguments

--algorithm --dataset --nb_epochs --batch_size --TimeGAN_seq_len --Kalman_filter --Compute_metrics --Show_plot
DBA 'datasets/Original_Data/BeetleFly_TEST.csv' int > 0 int > 0 int > 0 1 (-> apply) 1 (-> compute) 1 (-> show)
InfoGAN 'datasets/Original_Data/Coffee_TEST.csv' 0 (-> do not apply) 0 (-> do not compute) 0 (-> do not show)
TimeGAN 'datasets/Original_Data/Ham_TEST.csv'
AnomaliesInjection 'datasets/Original_Data/Lighting7_TEST.csv'
AR 'datasets/Original_Data/Alabama_weather_6k_8k.csv'
'datasets/Original_Data/Currency2.csv'

Results

For each new dataset, a separate folder is created in the 'results' folder. Inside it, the files 'precision.csv' and 'runtime.csv' groups the statistics for all the generation techniques used with the given dataset. 'data' contains a csv file with the output of each technique, and 'plots' contains a png image with the plot of each technique.

Example after running each of the 5 algorithms on "BeetleFly_TEST.csv" dataset:

./results/
├── BeetleFly_TEST
│   ├── data
│   │   ├── BeetleFly_TEST_AnomaliesInjection.csv
│   │   ├── BeetleFly_TEST_AR.csv
│   │   ├── BeetleFly_TEST_DBA.csv
│   │   ├── BeetleFly_TEST_InfoGAN.csv
│   │   └── BeetleFly_TEST_TimeGAN.csv
│   ├── plots
│   │   ├── BeetleFly_TEST_AnomaliesInjection.png
│   │   ├── BeetleFly_TEST_AR.png
│   │   ├── BeetleFly_TEST_DBA.png
│   │   ├── BeetleFly_TEST_InfoGAN.png
│   │   └── BeetleFly_TEST_TimeGAN.png
│   ├── precision.csv
│   └── runtime.csv
└── placeholder.txt

Execution examples

  1. Generate 5 time series of length 100 using AnomaliesInjection algorithm, with dataset "Coffee_TEST.csv" as input:
$ python3.6 main.py --dataset 'datasets/Original_Data/Coffee_TEST.csv' --algorithm AnomaliesInjection --length 5 --nb_series 100
  1. Run the InfoGAN algorithm on the "Currency2" dataset, for 300 epochs and with a batch of size 200. Then apply the Kalman filter and compute the metrics:
$ python3.6 main.py --dataset 'datasets/Original_Data/Currency2.csv' --algorithm InfoGAN --nb_epochs 300 --batch_size 200 --Kalman_filter 1 --Compute_metrics 1 --Show_plot 0
  1. Run both DBA and AnomaliesInjection on the "BeetleFly_TEST" dataset. For the other parameters, use the values specifies in parameters.json:
$ python3.6 main.py --dataset 'datasets/Original_Data/BeetleFly_TEST.csv' --algorithm DBA AnomaliesInjection

Other Parameters

The parameters for the synthetic data generation are stored in the file parameters.json, in the main folder (Time_Series_Generation_Benchmark). This can therefore be modified in order to adapt the generation. Some parameter, for example those for AnomaliesInjection, can only be set through this file, and not directly when the code is runned

(It is to notice that the tsgen implementation also use a parameters.json file, but this is partially overwritten when the code is runned, therefore it might not be usefull to modify it)

Parameters specific for AutoregressiveModel:

Parameter name Usage Possible values
AR_lag_window The lag window to use with the AR model Integer number > 0, 0 for default (1/4 of the ts length)

Parameters specific for Kalman filter:

Parameter name Usage Possible values
Kalman_remove_initial Number of points to remove from the start of the time series after applying filter Integer number >= 0

Parameters specific for AnomaliesInjection:

Parameter name Usage Possible values
AnomaliesInjection_nb_modifications The number of anomalies to insert in the dataset Integer number > 0, or -1 to use default value (average of 1 anomaly per each ts in the dataset)
AnomaliesInjection_multiple_modification_per_ts Determines if more than 1 anomaly can be inserted in the same ts 1 for True, 0 for False. If 0, AnomaliesInjection_nb_modifications should be smaller than the number of ts in the dataset
AnomaliesInjection_seed Seed for the random generation Integer number > 0
AnomaliesInjection_max_nb_extreme Maximal number of extreme points (spikes) for each extreme anomaly Integer number > 0
AnomaliesInjection_min/max_shift/trend/variance The minimal/maximal length of each shift/trend/variance anomaly Integer number > 0. For each anomaly type, the max value should be strictly bigger than the min value
AnomaliesInjection_extreme/shift/trend/variance_factor The "intensity" of each extreme/shift/trend/variance anomaly Integer number > 0
AnomaliesInjection_probability_extreme/shift/trend/variance The probability of each anomaly to be of type extreme/shift/trend/variance Float number >= 0. If the sum of the 4 probabilities is different than 1, they will be reascaled to ensure this property. If they are all 0, default probabilities (0.25 each) are used

Anomalies Injection Examples

In an "extreme" anomaly, a point is modified to have a much bigger/smaller value that the original one, thus resulting in a spike when the time series is plotted.

In a "shift" anomaly, all the the records in a given interval are shifted by a given value, which is equal in every point. The result is that a part of the time series is shifted up or down.

In a "trend" anomaly, a trend is inserted at a given point in the time series. In other words, an increasing (or decreasing) sequence of values is added to a portion of the time series. For example, a time series [1,1,1,1,1,1,1,1,1,1] might become [1,1,1,2,3,4,4,4,4,4]. Notice that after the trend part ([2,3,4]), all the values are modified in order to continue "directly" from the last point (in this example, they are all increased by 3).

In a variance anomaly, the variance of a random interval is augmented. Visually, this results in something similar to a "vibration".


Plot metrics

Once the data has been generated, plot metrics with:

$ python3.6 plot_metrics.py

It is possible to select only specific datasets by indicating their result folder:

$ python3.6 plot_metrics.py --dataset results/BeetleFly_TEST results/Coffee_TEST results/Currency2

BasicGAN 3072 Note

The code provides an implementation for BasicGAN as well. However, this do not work with any of the included datasets.
It requires a dataset with 3072 time series of length 3072.

It has been included for consistency with the paper, but its runtime is not direclty measured and the metrics should be extracted manually from the output.


Sources

The codes used in this repo are adapted versions of:


Datasets Form

col_name1, col_name2, col_name3, ..., col_namej
val11, val21, val31, ..., val1j
val21, val22, val32, ..., val2j
..., ..., ..., ..., ...
vali1, vali2, vali3, ..., valij

In particular, the values should be separated by a ',', and each of the j time series should have the same length

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published