Mutli Step Forecast with Horizon Column Indicator #238

LoicRaillon · 2023-10-12T08:56:48Z

Description

Currently mlforecast support the recursive and direct multi steps forecasting strategies. A good trade off between the two strategies is add a column which indicates the forecasting horizon and process the data accordingly as explained here. This method should have similar performances to the direct strategy but only use one model.

Use case

No response

jmoralez · 2024-01-23T17:19:48Z

Hey @LoicRaillon, thanks for the suggestion. In this case there would be duplicate rows with the features with an additional column "gap" that indicates how far ahead the target is, right? I ran a quick test and it doesn't perform very well on the M4, feel free to correct anything I may have done wrong.

import copy

import lightgbm as lgb
import pandas as pd
from datasetsforecast.m4 import M4, M4Info
from mlforecast import MLForecast
from utilsforecast.losses import smape

# load data
group = 'Hourly'
await M4.async_download('data', group=group)
df, *_ = M4.load(directory='data', group=group)
df['ds'] = df['ds'].astype('int')

# split
info = M4Info[group]
horizon = info.horizon
n_series = df['unique_id'].nunique()
valid = df.groupby('unique_id').tail(horizon)
train = df.drop(valid.index)

# build training set with gap column
fcst = MLForecast(
    models=[],
    freq=1,
    lags=[24],
)
prep = fcst.preprocess(train, max_horizon=horizon)
cols = prep.columns
is_target = cols.str.startswith('y')
long = prep.melt(
    id_vars=cols[~is_target],
    value_vars=cols[is_target],
    var_name='gap',
    value_name='y',
)
long = long[long['y'].notnull()].copy()
long['gap'] = long['gap'].str.replace('y', '').astype('int32')

# train model with gap
features = long.columns.drop(['unique_id', 'ds', 'y'])
X = long[features].values
y = long['y'].values
model = lgb.LGBMRegressor().fit(X, y)

# ugly setup to produce the next feature values
fcst.ts._ga = copy.copy(fcst.ts.ga)
fcst.ts._idxs = None
fcst.ts._static_features = fcst.ts.static_features_
fcst.ts._predict_setup()
next_feats = fcst.ts._get_features_for_next_step()
X_test = pd.concat([next_feats.assign(gap=i) for i in range(horizon)])

# evaluate with gap
eval_df = valid.copy()
eval_df['with_gap'] = model.predict(X_test).reshape(-1, n_series).ravel(order='F')

# train recursive model
fcst2 = MLForecast(
    models={'recursive': lgb.LGBMRegressor()},
    freq=1,
    lags=[24],
)
fcst2.fit(train)

# evaluate recursive
eval_df['recursive'] = fcst2.predict(horizon)['recursive'].values

# results
smape(eval_df, models=models)[models].mean().map('{:.1%}'.format)
# with_gap     24.2%
# recursive     8.6%
# dtype: object

davide-burba · 2024-01-29T08:16:43Z

Agree it would be a nice-to-have 3rd strategy to predict multi-horizon.
From my experience this approach is quite helpful. Here's a blog article describing this approach: https://medium.com/towards-data-science/forecast-multiple-horizons-an-example-with-weather-data-8d5fa4321e07?sk=5972cde6b06f20c3f9e679ba9e26b991

(Disclaimer: I wrote it)

LoicRaillon · 2024-01-29T08:52:40Z

@jmoralez Thanks for taking time on this issue. I'll try to double check your findings this week.

davide-burba · 2024-06-19T17:07:02Z

@jmoralez thank you for the code snippet.

I think the reason why you get poor performances with the code you posted is linked to the choice of the metric (smape). I tried to compute the mse or the mae with your code, and the forecast with gap actually leads to better scores.
In particular the mse (which is also the default objective of ligthgbm) of the recursive method is almost double the mse of the gap method.

To reproduce:

# results
from utilsforecast.losses import mse, mae

models = ["with_gap","recursive"]
print("--- mse ---")
print(mse(eval_df, models=models)[models].mean().map('{:.1e}'.format))
print("--- mae ---")
print(mae(eval_df, models=models)[models].mean().map('{:.1e}'.format))
# --- mse ---
# with_gap     5.7e+07
# recursive    1.0e+08
# dtype: object
# --- mae ---
# with_gap     1.2e+03
# recursive    1.3e+03
# dtype: object

I'm not sure why the smape is much worse for the gap method, but I heard multiple times that smape is not considered as a good metric. One reason for this is that smape is not really symmetric, since it penalizes more under-predictions than over-predictions, as described here

jmoralez · 2024-06-19T17:24:49Z

I don't remember very well the scales of the series in the M4, but you should probably weigh the series in some way for MAE and MSE, since if you just take the mean like that you're basically measuring the error of the series with large values and ignoring the rest. I know MAPE/SMAPE get a lot of hate but they're useful in cases like this to equally weigh the errors across all series.

jmoralez · 2024-06-19T20:49:24Z

Here's an updated example that scales the errors by the errors of the seasonal naive:

import lightgbm as lgb
import pandas as pd
from datasetsforecast.m4 import M4, M4Info
from mlforecast import MLForecast
from sklearn.base import BaseEstimator
from utilsforecast.evaluation import evaluate
from utilsforecast.losses import mae, mse

class SeasonalNaive(BaseEstimator):
    def __init__(self, season_length):
        self.season_length = season_length

    def fit(self, X, y):
        return self

    def predict(self, X):
        return X[f'lag{self.season_length}']

group = 'Hourly'
await M4.async_download('data', group=group)
df, *_ = M4.load(directory='data', group=group)
df['ds'] = df['ds'].astype('int')

info = M4Info[group]
horizon = info.horizon
n_series = df['unique_id'].nunique()
valid = df.groupby('unique_id').tail(horizon).copy()
train = df.drop(valid.index)

lags = [1, 24, 24 * 7]
fcst = MLForecast(
    models=[],
    freq=1,
    lags=lags,
)
prep = fcst.preprocess(train, max_horizon=horizon)
cols = prep.columns
is_target = cols.str.startswith('y')
long = prep.melt(
    id_vars=cols[~is_target],
    value_vars=cols[is_target],
    var_name='gap',
    value_name='y',
)
long = long[long['y'].notnull()].copy()
long['gap'] = long['gap'].str.replace('y', '').astype('int32')

features = long.columns.drop(['unique_id', 'ds', 'y'])
X = long[features].values
y = long['y'].values
model = lgb.LGBMRegressor(verbosity=-1).fit(X, y)

with fcst.ts._maybe_subset(None), fcst.ts._backup():
    fcst.ts._predict_setup()
    next_feats = fcst.ts._get_features_for_next_step()
X_test = pd.concat([next_feats.assign(gap=i) for i in range(horizon)])

eval_df = valid.copy()
eval_df['with_gap'] = model.predict(X_test).reshape(-1, n_series).ravel(order='F')

fcst2 = MLForecast(
    models={
        'recursive': lgb.LGBMRegressor(verbosity=-1),
        'seasonal_naive': SeasonalNaive(season_length=24),
    },
    freq=1,
    lags=lags,
)
fcst2.fit(train)
eval_df = eval_df.merge(fcst2.predict(horizon), on=['unique_id', 'ds'])
evals = evaluate(eval_df, metrics=[mae, mse])
models = ['with_gap', 'recursive', 'seasonal_naive']
for model in models:
    evals[model] = evals[model] / evals['seasonal_naive']
print(evals.groupby('metric')[models].mean().to_markdown(floatfmt=',.1f'))

metric	with_gap	recursive	seasonal_naive
mae	42.2	6.3	1.0
mse	4,008.4	112.4	1.0

Also a plot for people who like visual evaluations:

from utilsforecast.plotting import plot_series

plot_series(train, eval_df, max_insample_length=24*3, palette='plasma')

I'm not against this method though, I'd just like to see it beat the recursive or direct approaches to be sure that it's worth the trouble of implementing it here.

LoicRaillon added enhancement feature labels Oct 12, 2023

jmoralez added the awaiting response label Jan 23, 2024

github-actions bot removed the awaiting response label Jan 29, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Mutli Step Forecast with Horizon Column Indicator #238

Mutli Step Forecast with Horizon Column Indicator #238

LoicRaillon commented Oct 12, 2023

jmoralez commented Jan 23, 2024

davide-burba commented Jan 29, 2024

LoicRaillon commented Jan 29, 2024

davide-burba commented Jun 19, 2024 •

edited

Loading

jmoralez commented Jun 19, 2024

jmoralez commented Jun 19, 2024 •

edited

Loading

Mutli Step Forecast with Horizon Column Indicator #238

Mutli Step Forecast with Horizon Column Indicator #238

Comments

LoicRaillon commented Oct 12, 2023

Description

Use case

jmoralez commented Jan 23, 2024

davide-burba commented Jan 29, 2024

LoicRaillon commented Jan 29, 2024

davide-burba commented Jun 19, 2024 • edited Loading

jmoralez commented Jun 19, 2024

jmoralez commented Jun 19, 2024 • edited Loading

davide-burba commented Jun 19, 2024 •

edited

Loading

jmoralez commented Jun 19, 2024 •

edited

Loading