When working with large datasets in time series Machine Learning projects, it is common to apply different time series transformations in order to improve the forecasting model. In this example, an Estimation of Distribution Algorithm (EDA) is used to find the optimum transformations of the time series.
This approach is implemented as part of the EDAspy package.
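Before going through the example, the following is a minimal sketch, under our own simplifying assumptions, of how a binary EDA searches for a good subset of variables: candidate inclusion masks are sampled from a probability vector, scored with a cost function, and the best fraction of each generation is used to update the vector. The names toy_eda and the toy cost below are purely illustrative and are not part of EDAspy; the real EDA_ts_fts additionally chooses which transformation to apply to each selected variable.
import numpy as np

def toy_eda(cost, n_vars, size_gen=15, alpha=0.7, max_it=50, seed=0):
    # Minimal binary EDA: sample, rank, keep the best fraction, update probabilities
    rng = np.random.default_rng(seed)
    probs = np.full(n_vars, 0.5)  # inclusion probability per variable
    best_ind, best_cost = None, float('inf')
    for _ in range(max_it):
        gen = rng.random((size_gen, n_vars)) < probs  # candidate inclusion masks
        costs = np.array([cost(ind) for ind in gen])
        # keep the best alpha fraction and update the probability vector
        elite = gen[np.argsort(costs)[:max(1, int(alpha * size_gen))]]
        probs = elite.mean(axis=0)
        if costs.min() < best_cost:
            best_cost, best_ind = costs.min(), gen[np.argmin(costs)]
    return best_ind, best_cost

# toy cost: reward including the first three variables and excluding the rest
best, best_c = toy_eda(lambda ind: 3 - ind[:3].sum() + ind[3:].sum(), n_vars=9)
print(best, best_c)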
# loading essential libraries first
import pandas as pd
import statsmodels.api as sm
from statsmodels.tsa.api import VAR
import matplotlib.pyplot as plt
from sklearn.metrics import mean_absolute_error
# EDAspy libraries
from EDAspy.timeseries import EDA_ts_fts as EDA
from EDAspy.timeseries import TSTransformations
# import some data
mdata = sm.datasets.macrodata.load_pandas().data
df = mdata.iloc[:, 2:12]
df.head()
|   | realgdp | realcons | realinv | realgovt | realdpi | cpi | m1 | tbilrate | unemp | pop |
|---|---------|----------|---------|----------|---------|-----|-----|----------|-------|-----|
| 0 | 2710.349 | 1707.4 | 286.898 | 470.045 | 1886.9 | 28.98 | 139.7 | 2.82 | 5.8 | 177.146 |
| 1 | 2778.801 | 1733.7 | 310.859 | 481.301 | 1919.7 | 29.15 | 141.7 | 3.08 | 5.1 | 177.830 |
| 2 | 2775.488 | 1751.8 | 289.226 | 491.260 | 1916.4 | 29.35 | 140.5 | 3.82 | 5.3 | 178.657 |
| 3 | 2785.204 | 1753.7 | 299.356 | 484.052 | 1931.3 | 29.37 | 140.0 | 4.33 | 5.6 | 179.386 |
| 4 | 2847.699 | 1770.5 | 331.722 | 462.199 | 1955.5 | 29.54 | 139.6 | 3.50 | 5.2 | 180.007 |
variables = list(df.columns)
variable_y = 'pop' # pop is the variable we want to forecast
variables = list(set(variables) - {variable_y}) # array of variables to select among transformations
variables
['realinv',
'tbilrate',
'realgdp',
'unemp',
'm1',
'cpi',
'realdpi',
'realcons',
'realgovt']
Build the dataset with all the desired transformations. If further transformations are desired, they must be added to the dataset following these steps (a hedged example is sketched after the next code block):
- Add the transformation postfix to the list of transformations
- Add the transformed variable to the dataset, named as the original variable name plus the postfix
TSTransf = TSTransformations(df)
transformations = ['detrend', 'smooth', 'log']  # postfix to variables, to denote the transformation

# build the transformations
for var in variables:
    transformation = TSTransf.de_trending(var)
    df[var + 'detrend'] = transformation

for var in variables:
    transformation = TSTransf.smoothing(var, window=10)
    df[var + 'smooth'] = transformation

for var in variables:
    transformation = TSTransf.log(var)
    df[var + 'log'] = transformation
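For example, a hypothetical first-difference transformation could be added with plain pandas, following the same postfix convention. This snippet is purely illustrative and is not used in the results reported below; 'diff' is not one of the built-in EDAspy transformations, and the fillna(0) call simply avoids the leading NaN produced by differencing.
# Hypothetical extra transformation, built with pandas rather than EDAspy:
# first differences, registered under the 'diff' postfix.
transformations.append('diff')
for var in variables:
    df[var + 'diff'] = df[var].diff().fillna(0)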
Define the cost function, which computes the Mean Absolute Error (MAE) of a VAR forecast of the target variable.
def cost_function(variables_list, nobs=20, maxlags=15, forecastings=10):
    """
    variables_list: list of variables without the variable_y
    nobs: how many observations to reserve for validation
    maxlags: maximum number of previous lags used to predict
    forecastings: number of observations to predict

    return: MAE of the prediction compared with the real validation data
    """
    data = df[variables_list + [variable_y]]
    df_train, df_test = data[0:-nobs], data[-nobs:]

    model = VAR(df_train)
    results = model.fit(maxlags=maxlags, ic='aic')

    lag_order = results.k_ar
    array = results.forecast(df_train.values[-lag_order:], forecastings)

    variables_ = list(data.columns)
    position = variables_.index(variable_y)

    validation = [array[i][position] for i in range(len(array))]
    mae = mean_absolute_error(validation, df_test[variable_y][-forecastings:])

    return mae
We take the original variables, without any time series transformation, and try to forecast the target variable using the cost function defined above. This value is stored to be compared with the optimum solution found by the EDA.
mae_pre_eda = cost_function(variables)
print('MAE without using EDA:', mae_pre_eda)
MAE without using EDA: 5.091478009948458
Initialization of the vector of statistics. Each variable has a 50% probability of being chosen.
vector = pd.DataFrame(columns=list(variables))
vector.loc[0] = 0.5
Run the algorithm. The code will print some further information during execution.
eda = EDA(max_it=50, dead_it=5, size_gen=15, alpha=0.7, vector=vector,
array_transformations=transformations, cost_function=cost_function)
best_ind, best_MAE = eda.run(output=True)
[iteration: 0 ] 5.155357972950526
[iteration: 1 ] 5.246309408804405
[iteration: 2 ] 4.103723942647372
[iteration: 3 ] 4.710694858969362
[iteration: 4 ] 4.473166010597248
[iteration: 5 ] 4.913511145463962
[iteration: 6 ] 4.244706428986632
[iteration: 7 ] 3.847353187863837
[iteration: 8 ] 4.450659158760351
[iteration: 9 ] 4.723562306804138
[iteration: 10 ] 4.723562306804138
[iteration: 11 ] 4.723562306804138
# some plots
hist = eda.historic_best

relative_plot = []
mx = 999999999
for value in hist:
    # keep track of the best (lowest) MAE found so far
    mx = min(mx, value)
    relative_plot.append(mx)
print('Solution:', best_ind, '\nMAE post EDA: %.2f' % best_MAE, '\nMAE pre EDA: %.2f' % mae_pre_eda)
plt.figure(figsize = (14,6))
ax = plt.subplot(121)
ax.plot(list(range(len(hist))), hist)
ax.title.set_text('Local cost found')
ax.set_xlabel('iteration')
ax.set_ylabel('MAE')
ax = plt.subplot(122)
ax.plot(list(range(len(relative_plot))), relative_plot)
ax.title.set_text('Best global cost found')
ax.set_xlabel('iteration')
ax.set_ylabel('MAE')
plt.show()
Solution: ['realinvsmooth', 'tbilratelog', 'realgdpdetrend', 'm1detrend', 'cpibasic', 'realconssmooth', 'realgovtbasic']
MAE post EDA: 3.85
MAE pre EDA: 5.09
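As a hedged follow-up, not part of the original example, the selected combination can be re-evaluated with the same cost function. We assume here that the 'basic' postfix in the solution denotes an untransformed variable, so stripping it recovers the raw column name.
# Assumption: 'basic' marks an untransformed variable, so strip the postfix
# to map the EDA's naming back to the dataframe columns.
selected = [name[:-len('basic')] if name.endswith('basic') else name for name in best_ind]
print('MAE of the selected combination:', cost_function(selected))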