The required input format is a dataframe with at least the following columns:

* unique_id with a unique identifier for each time series
* ds with the datestamp
* y with the values of the series

Every other column is considered a static feature unless stated otherwise in TimeSeries.fit.

series = generate_daily_series(20, n_static_features=2)
series
      unique_id          ds           y  static_0  static_1
0         id_00  2000-01-01    7.404529        27        53
1         id_00  2000-01-02   35.952624        27        53
2         id_00  2000-01-03   68.958353        27        53
3         id_00  2000-01-04   84.994505        27        53
4         id_00  2000-01-05  113.219810        27        53
...         ...         ...         ...       ...       ...
4869      id_19  2000-03-25  400.606807        97        45
4870      id_19  2000-03-26  538.794824        97        45
4871      id_19  2000-03-27  620.202104        97        45
4872      id_19  2000-03-28   20.625426        97        45
4873      id_19  2000-03-29  141.513169        97        45

4874 rows × 5 columns
For simplicity we'll just take one time series here.

Utility class for storing and transforming time series data.

The TimeSeries class takes care of defining the transformations to be performed (lags, lag_transforms and date_features). The transformations can be computed using multithreading if num_threads > 1.

The transformations are stored as a dictionary where the key is the name of the transformation (the name of the column in the dataframe with the computed features), which is built using build_transform_name, and the value is a tuple where the first element is the lag the transformation is applied to, followed by the function and the function's arguments.

Note that for lags we define the transformation as the identity function applied to its corresponding lag. This is because _transform_series takes the lag as an argument and shifts the array before computing the transformation.

Add the features to data and save the required information for the predictions step.

If not all features are static, specify which ones are in static_features. If you don't want to drop rows with null values after the transformations, set dropna=False. If keep_last_n is not None then that number of observations is kept across all series for updates.

The series values are stored as a GroupedArray in an attribute ga. If the data type of the series values is an int it is converted to np.float32, because lags generate np.nans and thus require a float data type.

You can also specify keep_last_n in TimeSeries.fit_transform, which means that after computing the features for training we want to keep only the last n samples of each time series for computing the updates. This saves both memory and time, since the updates are performed by running the transformation functions on all time series again and keeping only the last value (the update).

If you have very long time series and your updates only require a small sample, it's recommended that you set keep_last_n to the minimum number of samples required to compute the updates, which in this case is 15 since we have a rolling mean of size 14 over the lag 2 and in the first update the lag 2 becomes the lag 1. This is because in the first update the lag 1 is the last value of the series (or the lag 0), the lag 2 is the lag 1 and so on.
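The flow_config used in the snippet below isn't defined in this extract. Based on the expected features that are checked afterwards (lag7, lag14, rolling means of sizes 7 and 14 over the lag 2, and dayofweek/month/year date features), a plausible reconstruction looks like this (a sketch; rolling_mean is assumed to come from the window-ops package):

import numpy as np
from window_ops.rolling import rolling_mean

# reconstructed configuration matching the expected features below (assumption)
flow_config = dict(
    freq='D',
    lags=[7, 14],
    lag_transforms={2: [(rolling_mean, 7), (rolling_mean, 14)]},
    date_features=['dayofweek', 'month', 'year'],
)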
keep_last_n = 15

ts = TimeSeries(**flow_config)
df = ts.fit_transform(series, id_col='unique_id', time_col='ds', target_col='y', keep_last_n=keep_last_n)
ts._uids = ts.uids.tolist()
ts._idxs = np.arange(len(ts.ga))
ts._predict_setup()

expected_lags = ['lag7', 'lag14']
expected_transforms = ['rolling_mean_lag2_window_size7',
                       'rolling_mean_lag2_window_size14']
expected_date_features = ['dayofweek', 'month', 'year']

test_eq(ts.features, expected_lags + expected_transforms + expected_date_features)
test_eq(ts.static_features_.columns.tolist() + ts.features, df.columns.drop(['ds', 'y']).tolist())
# we dropped 2 rows because of the lag 2 and 13 more to have the window of size 14
test_eq(df.shape[0], series.shape[0] - (2 + 13) * ts.ga.ngroups)
test_eq(ts.ga.data.size, ts.ga.ngroups * keep_last_n)
TimeSeries.fit_transform requires that the y column doesn’t have any null values. This is because the transformations could propagate them forward, so if you have null values in the y column you’ll get an error.
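A minimal guard before fitting (a sketch):

# drop rows with missing targets before calling fit_transform (sketch)
series = series[series['y'].notnull()]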
The DistributedMLForecast class is a high level abstraction that encapsulates all the steps in the pipeline (preprocessing, fitting the model and computing predictions) and applies them in a distributed way.
+
The things that you need in order to use DistributedMLForecast (as opposed to MLForecast) are:

* You need to set up a cluster. We currently support dask and spark (ray is on the roadmap).
* Your data needs to be a distributed collection. We currently support dask and spark dataframes.
* You need to use a model that implements distributed training in your framework of choice, e.g. SynapseML for LightGBM in spark.
Here we define a client that connects to a dask.distributed.LocalCluster; however, it could be any other kind of cluster.
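The setup mirrors the one shown in the distributed quickstart further down; a minimal sketch:

from dask.distributed import Client, LocalCluster

cluster = LocalCluster(n_workers=2, threads_per_worker=1)  # change this to use a remote cluster
client = Client(cluster)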
Data setup
+
For dask, the data must be a dask.dataframe.DataFrame. You need to make sure that each time series is only in one partition and it is recommended that you have as many partitions as you have workers. If you have more partitions than workers make sure to set num_threads=1 to avoid having nested parallelism.

The required input format is the same as for MLForecast, except that it's a dask.dataframe.DataFrame instead of a pandas.DataFrame.
+
+
series = generate_daily_series(100, n_static_features=2, equal_ends=True, static_as_categorical=False)
npartitions = 10
partitioned_series = dd.from_pandas(series.set_index('unique_id'), npartitions=npartitions)  # make sure we split by the id_col
partitioned_series = partitioned_series.map_partitions(lambda df: df.reset_index())
partitioned_series['unique_id'] = partitioned_series['unique_id'].astype(str)  # can't handle categoricals atm
partitioned_series
+
+
Dask DataFrame Structure:

               unique_id              ds        y static_0 static_1
npartitions=10
id_00             object  datetime64[ns]  float64    int64    int64
id_10                ...             ...      ...      ...      ...
...                  ...             ...      ...      ...      ...
id_89                ...             ...      ...      ...      ...
id_99                ...             ...      ...      ...      ...

Dask Name: assign, 5 graph layers
+
+
+
+
+
Models
+
In order to perform distributed forecasting, we need to use a model that is able to train in a distributed way using dask. The current implementations are in DaskLGBMForecast and DaskXGBForecast which are just wrappers around the native implementations.
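A sketch of how these wrappers are typically instantiated; the import paths are assumptions and may vary across mlforecast versions:

# assumed import paths; check your mlforecast version
from mlforecast.distributed.models.dask.lgb import DaskLGBMForecast
from mlforecast.distributed.models.dask.xgb import DaskXGBForecast

models = [DaskXGBForecast(random_state=0), DaskLGBMForecast(random_state=0)]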
dynamic_dfs
    Future values of the dynamic features, e.g. prices.

before_predict_callback : typing.Optional[typing.Callable], default=None
    Function to call on the features before computing the predictions. This function will take the input dataframe that will be passed to the model for predicting and should return a dataframe with the same structure. The series identifier is on the index.

after_predict_callback : typing.Optional[typing.Callable], default=None
    Function to call on the predictions before updating the targets. This function will take a pandas Series with the predictions and should return another one with the same structure. The series identifier is on the index.

new_df : typing.Optional[~AnyDataFrame], default=None
    Series data of new observations for which forecasts are to be generated. This dataframe should have the same structure as the one used to fit the model, including any features and time series data. If new_df is not None, the method will generate forecasts for the new observations.

horizon : typing.Optional[int], default=None

new_data : typing.Optional[~AnyDataFrame], default=None

Returns : AnyDataFrame
    Predictions for each series and timestep, with one column per model.
+
Once we have our fitted models we can compute the predictions for the next 7 timesteps.
+
+
preds = fcst.predict(7)
preds
+
+
Dask DataFrame Structure:

               unique_id              ds  DaskXGBForecast  DaskLGBMForecast
npartitions=10
id_00             object  datetime64[ns]          float64           float64
id_10                ...             ...              ...               ...
...                  ...             ...              ...               ...
id_89                ...             ...              ...               ...
id_99                ...             ...              ...               ...

Dask Name: map, 17 graph layers
+
+
+
+
2. Preprocess and train
+
If we only want to perform the preprocessing step we call preprocess with our data. This is useful if we want to inspect the data the model will be trained on. If we do this we must manually train our models and add a local version of them to the models_ attribute.
By default the predict method repeats the static features and updates the transformations and the date features. If you have dynamic features like prices or a calendar with holidays you can pass them as a list to the dynamic_dfs argument of DistributedMLForecast.predict, which will call pd.DataFrame.merge on each of them in order.
+
Here’s an example:
+
Suppose that we have a product_id column and we have a catalog for prices based on that product_id and the date.
This dataframe will be passed to DistributedMLForecast.fit (or DistributedMLForecast.preprocess), however since the price is dynamic we have to tell that method that only static_0 and product_id are static and we’ll have to update price in every timestep, which basically involves merging the updated features with the prices catalog.
So in order to update the price in each timestep we just call DistributedMLForecast.predict with our forecast horizon and pass the prices catalog as a dynamic dataframe.
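A sketch of that call, where prices_catalog is the hypothetical prices dataframe described above:

preds = fcst.predict(7, dynamic_dfs=[prices_catalog])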
If you want to do something like scaling the predictions you can define a function and pass it to DistributedMLForecast.predict as described in Custom predictions.
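For example, a callback that scales every prediction by 10% could be passed like this (a sketch):

def scale_predictions(predictions):
    """Hypothetical example: increase all predictions by 10%."""
    return 1.1 * predictions

preds = fcst.predict(7, after_predict_callback=scale_predictions)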
Perform time series cross validation. Creates n_windows splits where each window has h test periods, trains the models, computes the predictions and merges the actuals.
df : AnyDataFrame
    Series data in long format.

n_windows : int
    Number of windows to evaluate.

h : int
    Number of test periods in each window.

id_col : str, default='unique_id'
    Column that identifies each series.

time_col : str, default='ds'
    Column that identifies each timestep, its values can be timestamps or integers.

target_col : str, default='y'
    Column that contains the target.

step_size : typing.Optional[int], default=None
    Step size between each cross validation window. If None it will be equal to h.

static_features : typing.Optional[typing.List[str]], default=None
    Names of the features that are static and will be repeated when forecasting.

dropna : bool, default=True
    Drop rows with missing values produced by the transformations.

keep_last_n : typing.Optional[int], default=None
    Keep only these many records from each series for the forecasting step. Can save time and memory if your features allow it.

refit : bool, default=True
    Retrain model for each cross validation window. If False, the models are trained at the beginning and then used to predict each window.

before_predict_callback : typing.Optional[typing.Callable], default=None
    Function to call on the features before computing the predictions. This function will take the input dataframe that will be passed to the model for predicting and should return a dataframe with the same structure. The series identifier is on the index.

after_predict_callback : typing.Optional[typing.Callable], default=None
    Function to call on the predictions before updating the targets. This function will take a pandas Series with the predictions and should return another one with the same structure. The series identifier is on the index.

input_size : typing.Optional[int], default=None
    Maximum training samples per series in each window. If None, will use an expanding window.
For spark, the data must be a pyspark DataFrame. You need to make sure that each time series is only in one partition (which you can do using repartitionByRange, for example) and it is recommended that you have as many partitions as you have workers. If you have more partitions than workers make sure to set num_threads=1 to avoid having nested parallelism.

The required input format is the same as for MLForecast, i.e. it should have at least an id column, a time column and a target column.
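A sketch of that setup, assuming an existing SparkSession named spark; the parquet path is hypothetical:

numPartitions = 4
series = spark.read.parquet('path/to/series')  # hypothetical path
partitioned_series = series.repartitionByRange(numPartitions, 'unique_id')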
In order to perform distributed forecasting, we need to use a model that is able to train in a distributed way using spark. The current implementations are in SparkLGBMForecast and SparkXGBForecast which are just wrappers around the native implementations.
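A sketch of instantiating these wrappers; the import paths are assumptions and may vary across mlforecast versions:

# assumed import paths; check your mlforecast version
from mlforecast.distributed.models.spark.lgb import SparkLGBMForecast
from mlforecast.distributed.models.spark.xgb import SparkXGBForecast

models = [SparkLGBMForecast(), SparkXGBForecast()]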
For ray, the data must be a ray DataFrame. It is recommended that you have as many partitions as you have workers. If you have more partitions than workers make sure to set num_threads=1 to avoid having nested parallelism.
+
The required input format is the same as for MLForecast, i.e. it should have at least an id column, a time column and a target column.
+
+
series = generate_daily_series(100, n_static_features=2, equal_ends=True, static_as_categorical=False)
# we need noncategory unique_id
series['unique_id'] = series['unique_id'].astype(str)
ray_series = ray.data.from_pandas(series)
+
+
+
+
Models
+
The ray integration allows you to include lightgbm (RayLGBMRegressor) and xgboost (RayXGBRegressor).
Wrapper of lightgbm.dask.DaskLGBMRegressor that adds a model_ property that contains the fitted booster and is sent to the workers in the forecasting step.
Wrapper of xgboost.dask.DaskXGBRegressor that adds a model_ property that contains the fitted model and is sent to the workers in the forecasting step.
objective
    Specify the learning task and the corresponding learning objective or a custom objective function to be used (see note below).

booster : typing.Optional[str], default=None

tree_method : typing.Optional[str], default=None

n_jobs : typing.Optional[int], default=None
    Number of parallel threads used to run xgboost. When used with other Scikit-Learn algorithms like grid search, you may choose which algorithm to parallelize and balance the threads. Creating thread contention will significantly slow down both algorithms.

gamma : typing.Optional[float], default=None
    (min_split_loss) Minimum loss reduction required to make a further partition on a leaf node of the tree.

min_child_weight : typing.Optional[float], default=None
    Minimum sum of instance weight (hessian) needed in a child.

max_delta_step : typing.Optional[float], default=None
    Maximum delta step we allow each tree's weight estimation to be.

subsample : typing.Optional[float], default=None
    Subsample ratio of the training instance.

sampling_method : typing.Optional[str], default=None
    Sampling method. Used only by the GPU version of the hist tree method.
    - uniform: select random training instances uniformly.
    - gradient_based: select random training instances with higher probability when the gradient and hessian are larger. (cf. CatBoost)

colsample_bytree : typing.Optional[float], default=None
    Subsample ratio of columns when constructing each tree.

colsample_bylevel : typing.Optional[float], default=None
    Subsample ratio of columns for each level.

colsample_bynode : typing.Optional[float], default=None
    Subsample ratio of columns for each split.

reg_alpha : typing.Optional[float], default=None
    L1 regularization term on weights (xgb's alpha).

reg_lambda : typing.Optional[float], default=None
    L2 regularization term on weights (xgb's lambda).

scale_pos_weight : typing.Optional[float], default=None
    Balancing of positive and negative weights.

base_score : typing.Optional[float], default=None
    The initial prediction score of all instances, global bias.

interaction_constraints
    Constraints for interaction representing permitted interactions. The constraints must be specified in the form of a nested list, e.g. [[0, 1], [2, 3, 4]], where each inner list is a group of indices of features that are allowed to interact with each other. See :doc:`tutorial </tutorials/feature_interaction_constraint>` for more information.

importance_type : typing.Optional[str], default=None

device : typing.Optional[str], default=None
    .. versionadded:: 2.0.0
    Device ordinal, available options are cpu, cuda, and gpu.

validate_parameters : typing.Optional[bool], default=None
    Give warnings for unknown parameters.

enable_categorical : bool, default=False
    .. versionadded:: 1.5.0
    .. note:: This parameter is experimental.
    Experimental support for categorical data. When enabled, cudf/pandas.DataFrame should be used to specify the categorical data type. Also, the JSON/UBJSON serialization format is required.

feature_types : typing.Optional[typing.Sequence[str]], default=None
    .. versionadded:: 1.7.0
    Used for specifying feature types without constructing a dataframe. See :py:class:`DMatrix` for details.

max_cat_to_onehot : typing.Optional[int], default=None
    .. versionadded:: 1.6.0
    .. note:: This parameter is experimental.
    A threshold for deciding whether XGBoost should use one-hot encoding based splits for categorical data. When the number of categories is less than the threshold then one-hot encoding is chosen, otherwise the categories will be partitioned into children nodes. Also, enable_categorical needs to be set to have categorical feature support. See :doc:`Categorical Data </tutorials/categorical>` and :ref:`cat-param` for details.

max_cat_threshold : typing.Optional[int], default=None
    .. versionadded:: 1.7.0
    .. note:: This parameter is experimental.
    Maximum number of categories considered for each split. Used only by partition-based splits for preventing over-fitting. Also, enable_categorical needs to be set to have categorical feature support. See :doc:`Categorical Data </tutorials/categorical>` and :ref:`cat-param` for details.

multi_strategy : typing.Optional[str], default=None
    .. versionadded:: 2.0.0
    .. note:: This parameter is work-in-progress.
    The strategy used for training multi-target models, including multi-target regression and multi-class classification. See :doc:`/tutorials/multioutput` for more information.
    - one_output_per_tree: One model for each target.
    - multi_output_tree: Use multi-target trees.

eval_metric
    Metric used for monitoring the training result and early stopping. It can be a string or list of strings as names of predefined metrics in XGBoost (see doc/parameter.rst), one of the metrics in :py:mod:`sklearn.metrics`, or any other user defined metric that looks like sklearn.metrics.
    If a custom objective is also provided, then the custom metric should implement the corresponding reverse link function.
    Unlike the scoring parameter commonly used in scikit-learn, when a callable object is provided, it's assumed to be a cost function and by default XGBoost will minimize the result during early stopping.
    For advanced usage of early stopping, like directly choosing to maximize instead of minimize, see :py:obj:`xgboost.callback.EarlyStopping`.
    See :doc:`Custom Objective and Evaluation Metric </tutorials/custom_metric_obj>` for more.
    .. note:: This parameter replaces eval_metric in the :py:meth:`fit` method. The old one receives un-transformed predictions regardless of whether a custom objective is being used.
    .. code-block:: python

        from sklearn.datasets import load_diabetes
        from sklearn.metrics import mean_absolute_error

        X, y = load_diabetes(return_X_y=True)
        reg = xgb.XGBRegressor(
            tree_method="hist",
            eval_metric=mean_absolute_error,
        )
        reg.fit(X, y, eval_set=[(X, y)])

early_stopping_rounds : typing.Optional[int], default=None
    .. versionadded:: 1.6.0
    - Activates early stopping. The validation metric needs to improve at least once in every early_stopping_rounds round(s) to continue training. Requires at least one item in eval_set in :py:meth:`fit`.
    - If early stopping occurs, the model will have two additional attributes: :py:attr:`best_score` and :py:attr:`best_iteration`. These are used by the :py:meth:`predict` and :py:meth:`apply` methods to determine the optimal number of trees during inference. If users want to access the full model (including trees built after early stopping), they can specify the iteration_range in these inference methods. In addition, other utilities like model plotting can also use the entire model.
    - If you prefer to discard the trees after best_iteration, consider using the callback function :py:class:`xgboost.callback.EarlyStopping`.
    - If there's more than one item in eval_set, the last entry will be used for early stopping. If there's more than one metric in eval_metric, the last metric will be used for early stopping.
    .. note:: This parameter replaces early_stopping_rounds in the :py:meth:`fit` method.

callbacks
    List of callback functions that are applied at the end of each iteration. It is possible to use predefined callbacks by using the :ref:`Callback API <callback_api>`.
    .. note:: States in callback are not preserved during training, which means callback objects can not be reused for multiple training sessions without reinitialization or deepcopy.
    .. code-block:: python

        for params in parameters_grid:
            # be sure to (re)initialize the callbacks before each run
            callbacks = [xgb.callback.LearningRateScheduler(custom_rates)]
            reg = xgboost.XGBRegressor(**params, callbacks=callbacks)
            reg.fit(X, y)

kwargs : typing.Any
    Keyword arguments for the XGBoost Booster object. Full documentation of parameters can be found :doc:`here </parameter>`. Attempting to set a parameter via the constructor args and **kwargs dict simultaneously will result in a TypeError.
    .. note:: **kwargs is unsupported by scikit-learn. We do not guarantee that parameters passed via this argument will interact properly with scikit-learn.
Wrapper of lightgbm.ray.RayLGBMRegressor that adds a model_ property that contains the fitted booster and is sent to the workers in the forecasting step.
objective
    Specify the learning task and the corresponding learning objective or a custom objective function to be used (see note below).

kwargs : typing.Any
    Keyword arguments for the XGBoost Booster object. Full documentation of parameters can be found :doc:`here </parameter>`. Attempting to set a parameter via the constructor args and **kwargs dict simultaneously will result in a TypeError.
    .. note:: **kwargs is unsupported by scikit-learn. We do not guarantee that parameters passed via this argument will interact properly with scikit-learn.
    .. note:: Custom objective function
        A custom objective function can be provided for the objective parameter. In this case, it should have the signature objective(y_true, y_pred) -> grad, hess:

        y_true: array_like of shape [n_samples]
            The target values.
        y_pred: array_like of shape [n_samples]
            The predicted values.
        grad: array_like of shape [n_samples]
            The value of the gradient for each sample point.
        hess: array_like of shape [n_samples]
            The value of the second derivative for each sample point.
Wrapper of synapse.ml.lightgbm.LightGBMRegressor that adds an extract_local_model method to get a local version of the trained model and broadcast it to the workers.
+
+
+
SparkLGBMForecast
+
+
SparkLGBMForecast ()
+
+
Initialize self. See help(type(self)) for accurate signature.
Wrapper of xgboost.spark.SparkXGBRegressor that adds an extract_local_model method to get a local version of the trained model and broadcast it to the workers.
SparkXGBRegressor is a PySpark ML estimator. It implements the XGBoost regression algorithm based on the XGBoost python library, and it can be used in PySpark Pipeline and PySpark ML meta algorithms like :py:class:`~pyspark.ml.tuning.CrossValidator`, :py:class:`~pyspark.ml.tuning.TrainValidationSplit` and :py:class:`~pyspark.ml.classification.OneVsRest`.
+
SparkXGBRegressor automatically supports most of the parameters in :py:class:xgboost.XGBRegressor constructor and most of the parameters used in :py:meth:xgboost.XGBRegressor.fit and :py:meth:xgboost.XGBRegressor.predict method.
+
To enable GPU support, set device to cuda or gpu.
+
SparkXGBRegressor doesn't support setting base_margin explicitly, but supports another param called base_margin_col. See the doc below for more details.

SparkXGBRegressor doesn't support the validate_features and output_margin params.

SparkXGBRegressor doesn't support setting the nthread xgboost param; instead, the nthread param for each xgboost worker will be set equal to the spark.task.cpus config value.
We’ll take a look at our series to get ideas for transformations and features.
+
+
fig = plot_series(df, max_insample_length=24*14)
+
+
+
We can use the MLForecast.preprocess method to explore different transformations. It looks like these series have a strong seasonality on the hour of the day, so we can subtract the value from the same hour in the previous day to remove it. This can be done with the mlforecast.target_transforms.Differences transformer, which we pass through target_transforms.
+
+
from mlforecast import MLForecast
from mlforecast.target_transforms import Differences

fcst = MLForecast(
    models=[],  # we're not interested in modeling yet
    freq=1,  # our series have integer timestamps, so we'll just add 1 in every timestep
    target_transforms=[Differences([24])],
)
prep = fcst.preprocess(df)
prep
+
+
+
+
+
+
+
+
+
     unique_id    ds     y
24        H196    25   0.3
25        H196    26   0.3
26        H196    27   0.1
27        H196    28   0.2
28        H196    29   0.2
...        ...   ...   ...
4027      H413  1004  39.0
4028      H413  1005  55.0
4029      H413  1006  14.0
4030      H413  1007   3.0
4031      H413  1008   4.0

3936 rows × 3 columns
+
+
+
+
This has subtracted the lag 24 from each value; we can see what our series look like now.
+
+
fig = plot_series(prep)
+
+
+
+
+
Adding features
+
+
Lags
+
It looks like the seasonality is gone; we can now try adding some lag features.
y        1.000000
lag1     0.622531
lag24   -0.234268
Name: y, dtype: float64
+
+
+
+
+
Lag transforms
+
Lag transforms are defined as a dictionary where the keys are the lags and the values are lists of functions that transform an array. These must be numba jitted functions (so that computing the features doesn’t become a bottleneck). There are some implemented in the window-ops package but you can also implement your own.
+
If the function takes two or more arguments you can either:

* supply a tuple where the function comes first, followed by the values of its additional arguments, e.g. (rolling_mean, 48)
* define a new function that calls the original one with those arguments fixed

You can see that both approaches get to the same result; you can use whichever one you feel most comfortable with.
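For instance, a rolling mean of size 48 over the lag 24 can be specified either way (a sketch using the window-ops package):

from numba import njit
from window_ops.rolling import rolling_mean

@njit
def rolling_mean_48(x):
    # same as (rolling_mean, 48), but with the window size fixed in the function
    return rolling_mean(x, window_size=48)

fcst = MLForecast(
    models=[],
    freq=1,
    lag_transforms={24: [(rolling_mean, 48), rolling_mean_48]},
)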
+
+
+
Date features
+
If your time column is made of timestamps then it might make sense to extract features like week, dayofweek, quarter, etc. You can do that by passing a list of strings with pandas time/date components. You can also pass functions that will take the time column as input, as we’ll show here.
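In this dataset the time column is integer-valued, so the hour_index function used in the cross-validation example below can be defined as a simple modulo (a sketch):

def hour_index(times):
    """Hour of day for integer timestamps with hourly frequency (sketch)."""
    return times % 24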
If you want to do some transformation to your target before computing the features and then re-apply it after predicting you can use the target_transforms argument, which takes a list of transformations. You can find the implemented ones in mlforecast.target_transforms or you can implement your own as described in the target transformations guide.
+
+
from mlforecast.target_transforms import LocalStandardScaler
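A sketch of combining it with the differencing used above:

fcst = MLForecast(
    models=[],
    freq=1,
    target_transforms=[Differences([24]), LocalStandardScaler()],
)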
Once you’ve decided the features, transformations and models that you want to use you can use the MLForecast.fit method instead, which will do the preprocessing and then train the models. The models can be specified as a list (which will name them by using their class name and an index if there are repeated classes) or as a dictionary where the keys are the names you want to give to the models, i.e. the name of the column that will hold their predictions, and the values are the models themselves.
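For example, a dict lets you name the prediction columns yourself (a sketch; the quantile configuration is illustrative):

fcst = MLForecast(
    models={
        'avg': lgb.LGBMRegressor(),
        'q75': lgb.LGBMRegressor(objective='quantile', alpha=0.75),
    },
    freq=1,
    lags=[24],
)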
After you’ve trained a forecast object you can save it and load it to use later using pickle or cloudpickle. If by the time you want to use it you already know the following values of the target you can use the MLForecast.ts.update method to incorporate these, which will allow you to use these new values when computing predictions.
+
+
If no new values are provided for a serie that’s currently stored, only the previous ones are kept.
+
If new series are included they are added to the existing ones.
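A sketch of the update flow; new_values is a hypothetical dataframe with the same unique_id, ds, y schema:

fcst.ts.update(new_values)
preds = fcst.predict(7)  # predictions now start after the new values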
In order to get an estimate of how well our model will perform when predicting future data we can perform cross validation, which consists of training a few models independently on different subsets of the data, using them to predict a validation set and measuring their performance.
+
Since our data depends on time, we make our splits by removing the last portions of the series and using them as validation sets. This process is implemented in MLForecast.cross_validation.
+
+
fcst = MLForecast(
    models=lgb.LGBMRegressor(**lgb_params),
    freq=1,
    target_transforms=[Differences([24])],
    lags=[1, 24],
    lag_transforms={
        1: [expanding_mean],
        24: [(rolling_mean, 48)],
    },
    date_features=[hour_index],
)
cv_result = fcst.cross_validation(
    df,
    n_windows=4,  # number of models to train/splits to perform
    window_size=48,  # length of the validation set in each window
)
cv_result
You can quickly try different features and evaluate them this way. We can try removing the differencing and using an exponentially weighted average of the lag 1 instead of the expanding mean.
In the same spirit of estimating our model’s performance, LightGBMCV allows us to train a few LightGBM models on different partitions of the data. The main differences with MLForecast.cross_validation are:
+
+
It can only train LightGBM models.
+
It trains all models simultaneously and gives us per-iteration averages of the errors across the complete forecasting window, which allows us to find the best iteration.
As you can see this gives us the error by iteration (controlled by the eval_every argument) and performs early stopping (which can be configured with early_stopping_evals and early_stopping_pct). If you set compute_cv_preds=True the out-of-fold predictions are computed using the best iteration found and are saved in the cv_preds_ attribute.
You can use this class to quickly try different configurations of features and hyperparameters. Once you’ve found a combination that works you can train a model with those features and hyperparameters on all the data by creating an MLForecast object from the LightGBMCV one as follows:
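A sketch of that conversion, where cv is the fitted LightGBMCV object:

fcst = MLForecast.from_cv(cv)
fcst.fit(df)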
Instructions to install the package from different sources.
Released versions
+
+
PyPI
+
+
Latest release
+
To install the latest release of mlforecast from PyPI you just have to run the following in a terminal:
+
pip install mlforecast
+
+
+
Specific version
+
If you want a specific version you can include a filter, for example:
+
+
pip install "mlforecast==0.3.0" to install the 0.3.0 version
+
pip install "mlforecast<0.4.0" to install any version prior to 0.4.0
+
+
+
+
+
Conda
+
+
Latest release
+
The mlforecast package is also published to conda-forge, which you can install by running the following in a terminal:
+
conda install -c conda-forge mlforecast
+
Note that this happens about a day after the release is published to PyPI, so you may have to wait to get the latest release.
+
+
+
Specific version
+
If you want a specific version you can include a filter, for example:
+
+
conda install -c conda-forge "mlforecast==0.3.0" to install the 0.3.0 version
+
conda install -c conda-forge "mlforecast<0.4.0" to install any version prior to 0.4.0
+
+
+
+
+
Distributed training
+
If you want to perform distributed training you can use either dask, ray or spark. Once you know which framework you want to use you can include its extra:
+
+
dask: pip install "mlforecast[dask]"
+
ray: pip install "mlforecast[ray]"
+
spark: pip install "mlforecast[spark]"
+
+
+
+
+
Development version
+
If you want to try out a new feature that hasn’t made it into a release yet you have the following options:
+
+
Install from github: pip install git+https://github.com/Nixtla/mlforecast
+
Clone and install:
+
+
git clone https://github.com/Nixtla/mlforecast
+
pip install mlforecast
+
+
+
which will install the version from the current main branch.
Minimal example of distributed training with MLForecast
Main concepts
+
The main component for distributed training with mlforecast is the DistributedMLForecast class, which abstracts away:
+
+
Feature engineering and model training through DistributedMLForecast.fit
+
Feature updates and multi step ahead predictions through DistributedMLForecast.predict
+
+
+
+
Setup
+
In order to perform distributed training you need a dask cluster. In this example we’ll use a local cluster but you can replace it with any other type of remote cluster and the processing will take place there.
+
+
from dask.distributed import Client, LocalCluster

cluster = LocalCluster(n_workers=2, threads_per_worker=1)  # change this to use a remote cluster
client = Client(cluster)
+
+
+
+
Data format
+
The data is expected to be a dask dataframe in long format, that is, each row represents an observation of a single serie at a given time, with at least three columns:
+
+
id_col: column that identifies each serie.
+
target_col: column that has the series values at each timestamp.
+
time_col: column that contains the time the series value was observed. These are usually timestamps, but can also be consecutive integers.
+
+
You need to make sure that each serie is only in a single partition. You can do so by setting the id_col as the index in dask or with repartitionByRange in spark.
+
Here we present an example with synthetic data.
+
+
import dask.dataframe as dd
from mlforecast.utils import generate_daily_series

series = generate_daily_series(100, with_trend=True)
series
+
+
+
+
+
+
+
+
+
      unique_id          ds          y
0         id_00  2000-01-01   0.497650
1         id_00  2000-01-02   1.554489
2         id_00  2000-01-03   2.734311
3         id_00  2000-01-04   4.028039
4         id_00  2000-01-05   5.366009
...         ...         ...        ...
26998     id_99  2000-06-25  34.165302
26999     id_99  2000-06-26  28.277320
27000     id_99  2000-06-27  29.450129
27001     id_99  2000-06-28  30.241885
27002     id_99  2000-06-29  31.576907

27003 rows × 3 columns
+
+
+
+
Here we can see that the index goes from id_00 to id_99, which means we have 100 different series stacked together.
+
We also have the ds column that contains the timestamps, in this case with a daily frequency, and the y column that contains the series values in each timestamp.
+
In order to perform distributed processing and training we need to have these in a dask dataframe; this is typically done by loading them directly in a distributed way, for example with dd.read_parquet.
+
+
series_ddf = dd.from_pandas(series.set_index('unique_id'), npartitions=2)  # make sure we split by id
series_ddf = series_ddf.map_partitions(lambda part: part.reset_index())  # we can't have an index
series_ddf['unique_id'] = series_ddf['unique_id'].astype('str')  # categoricals aren't supported at the moment
series_ddf
+
+
Dask DataFrame Structure:

              unique_id              ds        y
npartitions=2
id_00            object  datetime64[ns]  float64
id_49               ...             ...      ...
id_99               ...             ...      ...

Dask Name: assign, 5 graph layers
+
+
+
We now have a dask dataframe with two partitions which will be processed independently in each machine and their outputs will be combined to perform distributed training.
We can see that the series have a clear trend, so we can take the first difference, i.e. take each value and subtract the previous value. This can be achieved by passing an mlforecast.target_transforms.Differences([1]) instance to target_transforms.
+
We can then train a LightGBM model using the value from the same day of the week at the previous week (lag 7) as a feature; this is done by passing lags=[7]. A sketch of this follows below.
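Putting this together, training and forecasting could look like the following sketch (the DaskLGBMForecast import path is an assumption and may vary by version):

from mlforecast.distributed import DistributedMLForecast
from mlforecast.distributed.models.dask.lgb import DaskLGBMForecast  # assumed path
from mlforecast.target_transforms import Differences

fcst = DistributedMLForecast(
    models=[DaskLGBMForecast()],
    freq='D',
    lags=[7],
    target_transforms=[Differences([1])],
)
fcst.fit(series_ddf)
preds = fcst.predict(7)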
The main component of mlforecast is the MLForecast class, which abstracts away:
+
+
Feature engineering and model training through MLForecast.fit
+
Feature updates and multi step ahead predictions through MLForecast.predict
+
+
+
+
Data format
+
The data is expected to be a pandas dataframe in long format, that is, each row represents an observation of a single serie at a given time, with at least three columns:
+
+
id_col: column that identifies each serie.
+
target_col: column that has the series values at each timestamp.
+
time_col: column that contains the time the series value was observed. These are usually timestamps, but can also be consecutive integers.
+
+
Here we present an example using the classic Box & Jenkins airline data, which measures monthly totals of international airline passengers from 1949 to 1960. Source: Box, G. E. P., Jenkins, G. M. and Reinsel, G. C. (1976) Time Series Analysis, Forecasting and Control. Third Edition. Holden-Day. Series G.
+
+
import pandas as pd
from utilsforecast.plotting import plot_series
Here the unique_id column has the same value for all rows because this is a single time series; you can have multiple time series by stacking them together and having a column that differentiates them.
+
We also have the ds column that contains the timestamps, in this case with a monthly frequency, and the y column that contains the series values in each timestamp.
+
+
+
Modeling
+
+
fig = plot_series(df)
+
+
+
We can see that the serie has a clear trend, so we can take the first difference, i.e. take each value and subtract the value at the previous month. This can be achieved by passing an mlforecast.target_transforms.Differences([1]) instance to target_transforms.
+
We can then train a linear regression using the value from the same month at the previous year (lag 12) as a feature; this is done by passing lags=[12].
fcst = MLForecast(
    models=LinearRegression(),
    freq='MS',  # our series has a monthly frequency
    lags=[12],
    target_transforms=[Differences([1])],
)
fcst.fit(df)
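Once fitted, forecasting the next year is a single call (a sketch):

preds = fcst.predict(12)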
In this example, we'll implement time series cross-validation to evaluate the model's performance.
Prerequisites

This tutorial assumes basic familiarity with MLForecast. For a minimal example visit the Quick Start.
+
+
+
+
+
Introduction
+
Time series cross-validation is a method for evaluating how a model would have performed in the past. It works by defining a sliding window across the historical data and predicting the period following it.
+
+
MLForecast has an implementation of time series cross-validation that is fast and easy to use. This implementation makes cross-validation an efficient operation, which makes it less time-consuming. In this notebook, we'll use it on a subset of the M4 Competition hourly dataset.
+
Outline:
+
+
Install libraries
+
Load and explore data
+
Train model
+
Perform time series cross-validation
+
Evaluate results
+
+
+
+
+
+
+
+Tip
+
+
+
+
You can use Colab to run this Notebook interactively
+
+
+
+
+
Install libraries
+
We assume that you have MLForecast already installed. If not, check this guide for instructions on how to install MLForecast.
+
Install the necessary packages with pip install mlforecast.
+
+
# pip install mlforecast lightgbm

import pandas as pd
from utilsforecast.plotting import plot_series
from mlforecast import MLForecast  # required to instantiate MLForecast object and use cross-validation method
+
+
+
+
Load and explore the data
+
As stated in the introduction, we'll use the M4 Competition hourly dataset. We'll first import the data from a URL using pandas.
+
+
Y_df = pd.read_csv('https://datasets-nixtla.s3.amazonaws.com/m4-hourly.csv')  # load the data
Y_df.head()
+
+
+
+
+
+
+
+
+
  unique_id  ds      y
0        H1   1  605.0
1        H1   2  586.0
2        H1   3  586.0
3        H1   4  559.0
4        H1   5  511.0
+
The input to MLForecast is a data frame in long format with three columns: unique_id, ds and y:
+
+
The unique_id (string, int, or category) represents an identifier for the series.
+
The ds (datestamp or int) column should be either an integer indexing time or a datestamp in format YYYY-MM-DD or YYYY-MM-DD HH:MM:SS.
+
The y (numeric) represents the measurement we wish to forecast.
+
+
The data in this example already has this format, so no changes are needed.
+
We can plot the time series we’ll work with using the following function.
Notice that in each cutoff period, we generated a forecast for the next 24 hours using only the data before said period.
+
+
+
Evaluate results
+
We can now compute the accuracy of the forecast using an appropriate accuracy metric. Here we'll use the Root Mean Squared Error (RMSE). To do this, we can use utilsforecast, a Python library developed by Nixtla that includes a function to compute the RMSE.
+
+
from utilsforecast.losses import rmse
+
+
We’ll compute the rmse per time series and cutoff. To do this we’ll concatenate the id and the cutoff columns, then we will take the mean of the results.
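A sketch of that computation; cv_df is the hypothetical cross-validation result and the model column name ('LGBMRegressor') is assumed from the default naming:

cv_df['id_cutoff'] = cv_df['unique_id'] + '_' + cv_df['cutoff'].astype(str)
per_window = rmse(cv_df, models=['LGBMRegressor'], id_col='id_cutoff')  # RMSE per series and cutoff
print('RMSE:', per_window['LGBMRegressor'].mean())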
Define your own functions to be used as date features
+
from mlforecast import MLForecast
from mlforecast.utils import generate_daily_series
+
+
The date_features argument of MLForecast can take pandas date attributes as well as functions that take a pandas DatetimeIndex and return a numeric value. The name of the function is used as the name of the feature, so please use unique and descriptive names.
+
+
series = generate_daily_series(1, min_length=6, max_length=6)

def even_day(dates):
    """Day of month is even"""
    return dates.day % 2 == 0

def month_start_or_end(dates):
    """Date is month start or month end"""
    return dates.is_month_start | dates.is_month_end

def is_monday(dates):
    """Date is monday"""
    return dates.dayofweek == 0
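These functions can then be passed through date_features (a sketch):

fcst = MLForecast(
    models=[],
    freq='D',
    date_features=[even_day, month_start_or_end, is_monday],
)
fcst.preprocess(series)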
series = generate_daily_series(
    100, equal_ends=True, n_static_features=2
).rename(columns={'static_1': 'product_id'})
series.head()
+
+
+
+
+
+
+
+
+
  unique_id          ds           y  static_0  product_id
0     id_00  2000-10-05   39.811983        79          45
1     id_00  2000-10-06  103.274013        79          45
2     id_00  2000-10-07  176.574744        79          45
3     id_00  2000-10-08  258.987900        79          45
4     id_00  2000-10-09  344.940404        79          45
+
In mlforecast the required columns are the series identifier, time and target. Any extra columns you have, like static_0 and product_id here, are considered to be static and are replicated when constructing the features for the next timestamp. You can disable this by passing static_features to MLForecast.preprocess or MLForecast.fit, which will only keep the columns you define there as static. Keep in mind that all features in your input dataframe will be used for training, so you'll have to provide the future values of exogenous features to MLForecast.predict through the X_df argument.
+
Consider the following example. Suppose that we have a prices catalog for each id and date.
This dataframe will be passed to MLForecast.fit (or MLForecast.preprocess). However, since the price is dynamic we have to tell that method that only static_0 and product_id are static.
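A sketch of the full flow; series_with_prices and prices_catalog are the hypothetical merged training data and future-prices dataframes:

fcst = MLForecast(models=[lgb.LGBMRegressor()], freq='D', lags=[7])
fcst.fit(series_with_prices, static_features=['static_0', 'product_id'])
preds = fcst.predict(7, X_df=prices_catalog)  # future prices for the horizon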
Train one model to predict each step of the forecasting horizon
By default mlforecast uses the recursive strategy, i.e. a model is trained to predict the next value and if we’re predicting several values we do it one at a time and then use the model’s predictions as the new target, recompute the features and predict the next step.
+
There’s another approach where if we want to predict 10 steps ahead we train 10 different models, where each model is trained to predict the value at each specific step, i.e. one model predicts the next value, another one predicts the value two steps ahead and so on. This can be very time consuming but can also provide better results. If you want to use this approach you can specify max_horizon in MLForecast.fit, which will train that many models and each model will predict its corresponding horizon when you call MLForecast.predict.
def avg_smape(df):
    """Computes the SMAPE by series and then averages it across all series."""
    full = df.merge(valid)
    return (
        evaluate(full, metrics=[smape])
        .drop(columns='metric')
        .set_index('unique_id')
        .squeeze()
    )
horizon = 24
# the following will train 24 models, one for each horizon
individual_fcst = fcst.fit(train, max_horizon=horizon)
individual_preds = individual_fcst.predict(horizon)
avg_smape_individual = avg_smape(individual_preds).rename('individual')
# the following will train a single model and use the recursive strategy
recursive_fcst = fcst.fit(train)
recursive_preds = recursive_fcst.predict(horizon)
avg_smape_recursive = avg_smape(recursive_preds).rename('recursive')
# results
print('Average SMAPE per method and series')
avg_smape_individual.to_frame().join(avg_smape_recursive).applymap('{:.1%}'.format)
Get access to the input features and predictions in each forecasting horizon
If you want to do something to the input before predicting or something to the output before it gets used to update the target (and thus the next features that rely on lags), you can pass a function to run at any of these times.
Saving the features that are sent as input to the model in each timestamp can be helpful, for example to estimate SHAP values. This can be easily achieved with the SaveFeatures callback.
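A sketch of wiring the callback into predict:

from mlforecast.callbacks import SaveFeatures

save_features_cbk = SaveFeatures()
fcst.predict(2, before_predict_callback=save_features_cbk)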
Once we’ve called predict we can just retrieve the features.
+
+
save_features_cbk.get_features()
+
+
+
+
+
+
+
+
+
  unique_id      lag1
0      id_0  4.155930
0      id_0  5.281643
+
After predicting
+
When predicting with the recursive strategy (the default) the predictions for each timestamp are used to update the target and recompute the features. If you want to do something to these predictions before that happens you can use the after_predict_callback argument of MLForecast.predict.
+
+
Increasing predictions values
+
Suppose we know that our model always underestimates and we want to prevent that from happening by making our predictions 10% higher. We can achieve that with the following:
+
+
def increase_predictions(predictions):
    """Increases all predictions by 10%"""
    return 1.1 * predictions
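We can then pass this function through after_predict_callback (a sketch):

preds = fcst.predict(2, after_predict_callback=increase_predictions)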
By default all series seen during training will be forecasted with the predict method. If you’re only interested in predicting a couple of them you can use the ids argument.
+
+
fcst.predict(1, ids=['id_0', 'id_4'])
+
+
+
+
+
+
+
+
+
  unique_id          ds       lgb
0      id_0  2000-08-10  3.728396
1      id_4  2001-01-08  3.331394
+
Note that the ids must've been seen during training; if you try to predict an id that wasn't there you'll get an error.
In this example, we'll implement prediction intervals.
Prerequisites

This tutorial assumes basic familiarity with MLForecast. For a minimal example visit the Quick Start.
+
+
+
+
+
Introduction
+
When we generate a forecast, we usually produce a single value known as the point forecast. This value, however, doesn’t tell us anything about the uncertainty associated with the forecast. To have a measure of this uncertainty, we need prediction intervals.
+
A prediction interval is a range of values that the forecast can take with a given probability. Hence, a 95% prediction interval should contain a range of values that include the actual future value with probability 95%. Probabilistic forecasting aims to generate the full forecast distribution. Point forecasting, on the other hand, usually returns the mean or the median of said distribution. However, in real-world scenarios, it is better to forecast not only the most probable future outcome, but many alternative outcomes as well.
+
With MLForecast you can train sklearn models to generate point forecasts. It also takes advantage of conformal prediction to generate the same point forecasts and add prediction intervals to them. By the end of this tutorial, you'll have a good understanding of how to add probabilistic intervals to sklearn models for time series forecasting. Furthermore, you'll also learn how to generate plots with the historical data, the point forecasts, and the prediction intervals.
+
+
+
+
+
+
+Important
+
+
+
+
Although the terms are often confused, prediction intervals are not the same as confidence intervals.
+
+
+
+
+
+
+
+
+Warning
+
+
+
+
In practice, most prediction intervals are too narrow since models do not account for all sources of uncertainty. A discussion about this can be found here.
+
+
+
Outline:
+
+
Install libraries
+
Load and explore the data
+
Train models
+
Plot prediction intervals
+
+
+
+
+
+
+
+Tip
+
+
+
+
You can use Colab to run this Notebook interactively
+
+
+
+
+
Install libraries
+
Install the necessary packages using pip install mlforecast utilsforecast
+
+
+
Load and explore the data
+
For this example, we’ll use the hourly dataset from the M4 Competition. We first need to download the data from a URL and then load it as a pandas dataframe. Notice that we’ll load the train and the test data separately. We’ll also rename the y column of the test data as y_test.
+
+
import pandas as pd
from utilsforecast.plotting import plot_series
Since the goal of this notebook is to generate prediction intervals, we’ll only use the first 8 series of the dataset to reduce the total computational time.
+
+
n_series = 8
uids = train['unique_id'].unique()[:n_series]  # select first n_series of the dataset
train = train.query('unique_id in @uids')
test = test.query('unique_id in @uids')
+
+
We can plot these series using the plot_series function from the utilsforecast library. This function has multiple parameters, and the required ones to generate the plots in this notebook are explained below.
+
+
df: A pandas dataframe with columns [unique_id, ds, y].
+
forecasts_df: A pandas dataframe with columns [unique_id, ds] and models.
+
plot_random: bool = True. Plots the time series randomly.
+
models: List[str]. A list with the models we want to plot.
+
level: List[float]. A list with the prediction intervals we want to plot.
+
engine: str = matplotlib. It can also be plotly. plotly generates interactive plots, while matplotlib generates static plots.
# Create a list of models and instantiation parameters
models = [
    KNeighborsRegressor(),
    Lasso(),
    LinearRegression(),
    MLPRegressor(),
    Ridge(),
]
+
+
To instantiate a new MLForecast object, we need the following parameters:
+
+
models: The list of models defined in the previous step.
+
+
target_transforms: Transformations to apply to the target before computing the features. These are restored at the forecasting step.
+
lags: Lags of the target to use as features.
+
+
+
mlf = MLForecast(
    models=[Ridge(), Lasso(), LinearRegression(), KNeighborsRegressor(), MLPRegressor(random_state=0)],
    target_transforms=[Differences([1])],
    lags=[24 * (i + 1) for i in range(7)],
)
+
+
Now we’re ready to generate the point forecasts and the prediction intervals. To do this, we’ll use the fit method, which takes the following arguments:
+
+
data: Series data in long format.
+
id_col: Column that identifies each series. In our case, unique_id.
+
time_col: Column that identifies each timestep, its values can be timestamps or integers. In our case, ds.
+
target_col: Column that contains the target. In our case, y.
+
prediction_intervals: A PredictionIntervals class. The class takes two parameters: n_windows and h. n_windows represents the number of cross-validation windows used to calibrate the intervals and h is the forecast horizon. The strategy will adjust the intervals for each horizon step, resulting in different widths for each step.
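A sketch of the fit call; PredictionIntervals is assumed to live in mlforecast.utils and the constructor argument names may vary slightly across versions:

from mlforecast.utils import PredictionIntervals

mlf.fit(
    train,
    prediction_intervals=PredictionIntervals(n_windows=10, h=48),
)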
After fitting the models, we will call the predict method to generate forecasts with prediction intervals. The method takes the following arguments:
+
+
horizon: An integer that represent the forecasting horizon. In this case, we’ll forecast the next 48 hours.
+
level: A list of floats with the confidence levels of the prediction intervals. For example, level=[95] means that the range of values should include the actual future value with probability 95%.
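A sketch of that call:

levels = [50, 80, 95]
forecasts = mlf.predict(48, level=levels)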
test = test.merge(forecasts, how='left', on=['unique_id', 'ds'])
+
+
+
+
Plot prediction intervals
+
To plot the point and the prediction intervals, we’ll use the plot_series function again. Notice that now we also need to specify the model and the levels that we want to plot.
From these plots, we can conclude that the uncertainty around each forecast varies according to the model that is being used. For the same time series, one model can predict a wider range of possible future values than others.
Since mlforecast uses a single global model it can be helpful to apply some transformations to the target to ensure that all series have similar distributions. They can also help remove trend for models that can’t deal with it out of the box.
+
+
Data setup
+
For this example we'll use a single series from the M4 dataset.
+
+
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from datasetsforecast.m4 import M4
from sklearn.base import BaseEstimator

from mlforecast import MLForecast
from mlforecast.target_transforms import Differences, LocalStandardScaler
We see that our series is random noise now. Suppose we also want to standardize it, i.e. make it have a mean of 0 and variance of 1. We can add the LocalStandardScaler transformation after these differences.

Now that we've captured the components of the series (trend + seasonality), we could try forecasting it with a model that always predicts 0, which will basically project the trend and seasonality.
There are some transformations that don’t require to learn any parameters, such as applying logarithm for example. These can be easily defined using the GlobalSklearnTransformer, which takes a scikit-learn compatible transformer and applies it to all series. Here’s an example on how to define a transformation that applies logarithm to each value of the series + 1, which can help avoid computing the log of 0.
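A sketch of the log1p transformation described above:

import numpy as np
from sklearn.preprocessing import FunctionTransformer
from mlforecast.target_transforms import GlobalSklearnTransformer

sk_log1p = FunctionTransformer(func=np.log1p, inverse_func=np.expm1)
fcst = MLForecast(
    models=[],
    freq=1,
    target_transforms=[GlobalSklearnTransformer(sk_log1p)],
)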
In order to implement your own target transformation you have to define a class that inherits from mlforecast.target_transforms.BaseTargetTransform (this takes care of setting the column names as the id_col, time_col and target_col attributes) and implement the fit_transform and inverse_transform methods. Here’s an example on how to define a min-max scaler.
+
+
from mlforecast.target_transforms import BaseTargetTransform

class LocalMinMaxScaler(BaseTargetTransform):
    """Scales each series to be in the [0, 1] interval."""
    def fit_transform(self, df: pd.DataFrame) -> pd.DataFrame:
        self.stats_ = df.groupby(self.id_col)[self.target_col].agg(['min', 'max'])
        df = df.merge(self.stats_, on=self.id_col)
        df[self.target_col] = (df[self.target_col] - df['min']) / (df['max'] - df['min'])
        df = df.drop(columns=['min', 'max'])
        return df

    def inverse_transform(self, df: pd.DataFrame) -> pd.DataFrame:
        df = df.merge(self.stats_, on=self.id_col)
        for col in df.columns.drop([self.id_col, self.time_col, 'min', 'max']):
            df[col] = df[col] * (df['max'] - df['min']) + df['min']
        df = df.drop(columns=['min', 'max'])
        return df
+
+
And now you can pass an instance of this class to the target_transforms argument.
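A sketch of using the custom scaler:

fcst = MLForecast(
    models=[],
    freq=1,
    target_transforms=[LocalMinMaxScaler()],
)
prep = fcst.preprocess(df)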
Transfer learning refers to the process of pre-training a flexible model on a large dataset and using it later on other data with little to no training. It is one of the most outstanding 🚀 achievements in Machine Learning and has many practical applications.
+
For time series forecasting, the technique allows you to get lightning-fast predictions ⚡ bypassing the tradeoff between accuracy and speed (more than 30 times faster than our already fast AutoARIMA for a similar accuracy).
+
This notebook shows how to generate a pre-trained model to forecast new time series never seen by the model.
+
Table of Contents
+
+
Installing MLForecast
+
Load M3 Monthly Data
+
Instantiate NeuralForecast core, Fit, and save
+
Use the pre-trained model to predict on AirPassengers
The M3 class will automatically download the complete M3 dataset and process it.
+
It returns three DataFrames: Y_df contains the values for the target variables, X_df contains exogenous calendar features and S_df contains static features for each time series. For this example we will only use Y_df.

If you want to use your own data just replace Y_df. Be sure to use a long format and have a similar structure to our dataset.
In this tutorial we are only using 1_000 series to speed up computations. Remove the filter to use the whole dataset.
+
+
fig = plot_series(Y_df_M3)
+
+
+
+
+
Model Training
+
Using the MLForecast.fit method you can train a set of models on your dataset. You can modify the hyperparameters of the model to get a better accuracy; in this case we will use the default hyperparameters of lgb.LGBMRegressor.
+
+
models = [lgb.LGBMRegressor(verbosity=-1)]
+
+
The MLForecast object has the following parameters:
+
+
models: a list of sklearn-like (fit and predict) models.
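A minimal sketch of the remaining setup, assuming monthly frequency ('MS') and illustrative, untuned lag choices:

from mlforecast import MLForecast

fcst = MLForecast(
    models=models,
    freq='MS',             # M3 monthly data
    lags=[1, 12],          # illustrative lags, not tuned
    date_features=['month'],
)
fcst.fit(Y_df_M3)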
Now we can transfer the trained model to forecast AirPassengers with the MLForecast.predict method; we just have to pass the new dataframe through the new_data argument.
+
+
Y_df = pd.read_csv('https://datasets-nixtla.s3.amazonaws.com/air-passengers.csv', parse_dates=['ds'])

# We define the train df.
Y_train_df = Y_df[Y_df.ds <= '1959-12-31']  # 132 train
Y_test_df = Y_df[Y_df.ds > '1959-12-31']    # 12 test
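A minimal sketch of the transfer step, using the new_data argument named above (the horizon is passed positionally because recent mlforecast versions renamed these arguments to h and new_df):

Y_hat_df = fcst.predict(12, new_data=Y_train_df)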
In this example we will show how to perform electricity load forecasting using MLForecast with several models. We also compare them against the prophet library.
+
Introduction
+
Some time series are sampled at very high frequency. These data generally exhibit multiple seasonalities. For example, hourly data may exhibit repeated patterns every day (every 24 observations) or every week (every 24 * 7 observations). This is the case for electricity load. Electricity load may vary hourly, e.g., during the evenings electricity consumption may be expected to increase. But the electricity load also varies by week: perhaps on weekends there is an increase in electrical activity.
+
In this example we will show how to model the two seasonalities of the time series to generate accurate forecasts in a short time. We will use hourly PJM electricity load data. The original data can be found here.
+
+
+
Libraries
+
In this example we will use the following libraries:
+
+
mlforecast. Accurate and ⚡️ fast forecasting with classical machine learning models.
PJM Interconnection LLC (PJM) is a regional transmission organization (RTO) in the United States. It is part of the Eastern Interconnection grid operating an electric transmission system serving all or parts of Delaware, Illinois, Indiana, Kentucky, Maryland, Michigan, New Jersey, North Carolina, Ohio, Pennsylvania, Tennessee, Virginia, West Virginia, and the District of Columbia. The hourly power consumption data comes from PJM’s website and are in megawatts (MW).
+
+
Let’s take a look at the data.
+
+
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from utilsforecast.plotting import plot_series

data_url = 'https://raw.githubusercontent.com/panambY/Hourly_Energy_Consumption/master/data/PJM_Load_hourly.csv'
df = pd.read_csv(data_url, parse_dates=['Datetime'])
df.columns = ['ds', 'y']
df.insert(0, 'unique_id', 'PJM_Load_hourly')
df['ds'] = pd.to_datetime(df['ds'])
df = df.sort_values(['unique_id', 'ds']).reset_index(drop=True)
print(f'Shape of the data {df.shape}')
df.tail()
+
+
Shape of the data (32896, 3)

       unique_id        ds                   y
32891  PJM_Load_hourly  2001-12-31 20:00:00  36392.0
32892  PJM_Load_hourly  2001-12-31 21:00:00  35082.0
32893  PJM_Load_hourly  2001-12-31 22:00:00  33890.0
32894  PJM_Load_hourly  2001-12-31 23:00:00  32590.0
32895  PJM_Load_hourly  2002-01-01 00:00:00  31569.0

fig = plot_series(df)
+
+
+
We clearly observe that the time series exhibits seasonal patterns. Moreover, the time series contains 32,896 observations, so it is necessary to use very computationally efficient methods to deploy them in production.
+
We are going to split our series in order to create a train and test set. The model will be tested using the last 24 hours of the time series.
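A minimal sketch of the split, using the df_train and df_last_24_hours names referenced later in this tutorial:

df_train = df[:-24]          # everything except the last 24 hours
df_last_24_hours = df[-24:]  # held-out test set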
First we must visualize the seasonalities of the series. As mentioned before, the electricity load presents seasonalities every 24 hours (daily) and every 24 * 7 hours (weekly). Therefore, we will use [24, 24 * 7] as the seasonalities for the model. In order to analyze how they affect our series we are going to use the Differences method.
+
+
from mlforecast import MLForecast
+from mlforecast.target_transforms import Differences
+
+
We can use the MLForecast.preprocess method to explore different transformations. It looks like these series have a strong seasonality on the hour of the day, so we can subtract the value from the same hour in the previous day to remove it. This can be done with the mlforecast.target_transforms.Differences transformer, which we pass through target_transforms.
+
In order to analyze the effect of each difference, individually and combined, we are going to plot them against the original series. We can use the following function for that.
Since the seasonalities are present at 24 hours (daily) and 24 * 7 hours (weekly), we are going to subtract them from the series using Differences([24, 24*7]) and plot the results.
As we can see, when we extract the 24-hour (daily) difference in PJM_Load_hourly_24, the series seems to stabilize, since the peaks look more uniform in comparison with the original series PJM_Load_hourly.

When we extract the 24 * 7 (weekly) difference in PJM_Load_hourly_168, we can see there is more periodicity in the peaks in comparison with the original series.

Finally, we can see the result of subtracting both differences combined in PJM_Load_hourly_all_diff.

For modeling we are going to use both differences for the forecasting, so we set the target_transforms argument of the MLForecast object to [Differences([24, 24*7])]. If we wanted to include a yearly difference we would need to add the term 24 * 365.
+
+
fcst = MLForecast(
    models=[],  # we're not interested in modeling yet
    freq='H',   # our series have hourly frequency
    target_transforms=[Differences([24, 24*7])],
)
prep = fcst.preprocess(df_train)
prep
+
+

       unique_id        ds                   y
192    PJM_Load_hourly  1998-04-09 02:00:00  831.0
193    PJM_Load_hourly  1998-04-09 03:00:00  918.0
194    PJM_Load_hourly  1998-04-09 04:00:00  760.0
195    PJM_Load_hourly  1998-04-09 05:00:00  849.0
196    PJM_Load_hourly  1998-04-09 06:00:00  710.0
...    ...              ...                  ...
32867  PJM_Load_hourly  2001-12-30 20:00:00  3417.0
32868  PJM_Load_hourly  2001-12-30 21:00:00  3596.0
32869  PJM_Load_hourly  2001-12-30 22:00:00  3501.0
32870  PJM_Load_hourly  2001-12-30 23:00:00  3939.0
32871  PJM_Load_hourly  2001-12-31 00:00:00  4235.0

32680 rows × 3 columns
+
+
+
+
+
fig = plot_series(prep)
+
+
+
+
+
Model Selection with Cross-Validation
+
We can test many models simultaneously using MLForecast cross_validation. We can import lightgbm and scikit-learn models and try different combinations of them, alongside different target transformations (such as the ones we created previously) and historical variables.
You can see an in-depth tutorial on how to use the MLForecast cross-validation methods here.
We can create a benchmark Naive model that uses the electricity load of the last hour as its prediction (the lag1 feature), as shown in the next cell. You can create your own models and try them with MLForecast using the same structure.
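The original cell is not shown here; a minimal sketch consistent with the description (a scikit-learn-like estimator that simply returns the lag1 feature) would be:

from sklearn.base import BaseEstimator

class Naive(BaseEstimator):
    """Predicts the previous hour's load, which MLForecast provides as the lag1 feature."""
    def fit(self, X, y):
        return self

    def predict(self, X):
        return X['lag1']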
Now let’s try different models from the scikit-learn library: Lasso, LinearRegression, Ridge, KNN, MLP and Random Forest, alongside LightGBM. You can add any model to the comparison by adding it to the models dictionary, as shown below.
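A minimal sketch of such a dictionary (the dictionary keys are assumed names, chosen to match the lgbm and lasso columns used later in this tutorial):

import lightgbm as lgb
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor

models = {
    'naive': Naive(),
    'lgbm': lgb.LGBMRegressor(verbosity=-1),
    'lasso': Lasso(),
    'lin_reg': LinearRegression(),
    'ridge': Ridge(),
    'knn': KNeighborsRegressor(),
    'mlp': MLPRegressor(),
    'rf': RandomForestRegressor(),
}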
Then we can instantiate the MLForecast class with the models we want to try, alongside target_transforms, lags, lag_transforms, and date_features. All these features are applied to every model we selected.
+
In this case we use the 1st, 12th and 24th lag, which are passed as a list. Potentially you could pass a range.
+
lags=[1,12,24]
+
Lag transforms are defined as a dictionary where the keys are the lags and the values are lists of functions that transform an array. These must be numba jitted functions (so that computing the features doesn’t become a bottleneck). There are some implemented in the window-ops package but you can also implement your own.
+For this example we applied an expanding mean to the first lag, and a rolling mean to the 24th lag.
For using the date features you need to be sure that your time column is made of timestamps. Then it might make sense to extract features like week, dayofweek, quarter, etc. You can do that by passing a list of strings with pandas time/date components. You can also pass functions that will take the time column as input, as we’ll show here.
+Here we add month, hour and dayofweek features:
+
date_features=['month', 'hour', 'dayofweek']
+
+
+
from window_ops.expanding import expanding_mean
from window_ops.rolling import rolling_mean

mlf = MLForecast(
    models=models,
    freq='H',  # our series have hourly frequency
    target_transforms=[Differences([24, 24*7])],
    lags=[1, 12, 24],  # lags to be used as features
    lag_transforms={
        1: [expanding_mean],
        24: [(rolling_mean, 48)],
    },
    date_features=['month', 'hour', 'dayofweek'],
)
+
+
Now we use the cross_validation method to train and evaluate the models. It receives the following arguments:

df: the training data
h: the forecast horizon
n_windows: the number of folds we want to predict
+
You can also specify the names of the id, time and target columns:

id_col: column that identifies each series (default unique_id)
time_col: column that identifies each timestep, its values can be timestamps or integers (default ds)
target_col: column that contains the target (default y)
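A minimal sketch of the call under these defaults (the number of windows here is illustrative):

crossvalidation_df = mlf.cross_validation(
    df=df_train,
    h=24,         # forecast the next 24 hours in each fold
    n_windows=4,  # illustrative number of folds
)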
Visually examining the forecasts can give us some idea of how the model is behaving, yet in order to assess the performance we need to evaluate it through metrics. For that we use the utilsforecast library, which contains many useful metrics and an evaluate function.
+
+
from utilsforecast.losses import mae, rmse, mape, smape
from utilsforecast.evaluation import evaluate

# Metrics to be used for evaluation
metrics = [
    mae,
    rmse,
    mape,
    smape,
]
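A minimal sketch of evaluating the cross-validation results with these metrics (the cutoff column is dropped so that only model columns are scored):

evaluation = evaluate(
    crossvalidation_df.drop(columns='cutoff'),
    metrics=metrics,
)
evaluation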
We can see that the lgbm model has top performance in most metrics, followed by the lasso regression. Both models perform considerably better than the naive baseline.
+
+
+
Test Evaluation
+
Now we are going to evaluate their performance on the test set. We will use both of them to forecast the test period alongside some prediction intervals. For that we can use the PredictionIntervals function in mlforecast.utils.
You can see an in-depth tutorial of Probabilistic Forecasting here.
Now we’re ready to generate the point forecasts and the prediction intervals. To do this, we’ll use the fit method, which takes the following arguments:
+
+
df: Series data in long format.
+
id_col: Column that identifies each series. In our case, unique_id.
+
time_col: Column that identifies each timestep, its values can be timestamps or integers. In our case, ds.
+
target_col: Column that contains the target. In our case, y.
+
+
The PredictionIntervals function is used to compute prediction intervals for the models using Conformal Prediction. The function takes the following arguments:

n_windows: the number of cross-validation windows used to calibrate the intervals
h: the forecast horizon
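A minimal sketch of fitting the two best models with conformal intervals (the model selection and window count are illustrative):

from mlforecast.utils import PredictionIntervals

final_models = {
    'lgbm': lgb.LGBMRegressor(verbosity=-1),
    'lasso': Lasso(),
}
mlf = MLForecast(
    models=final_models,
    freq='H',
    target_transforms=[Differences([24, 24*7])],
    lags=[1, 12, 24],
)
mlf.fit(
    df_train,
    prediction_intervals=PredictionIntervals(n_windows=4, h=24),
)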
Now that the model has been trained we are going to forecast the next 24 hours using the predict method, so we can compare the predictions to our test data. Additionally, we are going to create prediction intervals at levels [90, 95].
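A minimal sketch of that call:

forecasts = mlf.predict(h=24, level=[90, 95])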
The predict method returns a DataFrame with the predictions for each model (lasso and lgbm) alongside the interval bounds. The upper bound is indicated by the keyword hi, the lower bound by the keyword lo, and the level by the number in the column name.
+
+
test = df_last_24_hours.merge(forecasts, how='left', on=['unique_id', 'ds'])
test.head()
+

   unique_id        ds                   y        lgbm          lasso         lgbm-lo-95    lgbm-lo-90    lgbm-hi-90    lgbm-hi-95    lasso-lo-95   lasso-lo-90   lasso-hi-90   lasso-hi-95
0  PJM_Load_hourly  2001-12-31 01:00:00  29001.0  28847.573176  29124.085976  28544.593464  28567.603130  29127.543222  29150.552888  28762.752269  28772.604275  29475.567677  29485.419682
1  PJM_Load_hourly  2001-12-31 02:00:00  28138.0  27862.589195  28365.330749  27042.311414  27128.839888  28596.338503  28682.866977  27528.548959  27619.065224  29111.596275  29202.112539
2  PJM_Load_hourly  2001-12-31 03:00:00  27830.0  27044.418960  27712.161676  25596.659896  25688.230426  28400.607493  28492.178023  26236.955369  26338.087102  29086.236251  29187.367984
3  PJM_Load_hourly  2001-12-31 04:00:00  27874.0  26976.104125  27661.572733  25249.961527  25286.024722  28666.183529  28702.246724  25911.133521  25959.815715  29363.329750  29412.011944
4  PJM_Load_hourly  2001-12-31 05:00:00  28427.0  26694.246238  27393.922370  25044.220845  25051.548832  28336.943644  28344.271631  25751.547897  25762.524815  29025.319924  29036.296843

Now we can evaluate the metrics and performance on the test set.
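A minimal sketch, reusing the evaluate helper and metrics from before (the models argument keeps the interval columns from being scored as models):

test_evaluation = evaluate(
    test,
    metrics=metrics,
    models=['lgbm', 'lasso'],
)
test_evaluation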
We can see that the lasso regression performed slightly better than LightGBM on the test set. Additionally, we can also plot the forecasts alongside their prediction intervals. For that we can use the plot_series method available in utilsforecast.plotting.

We can plot one or many models at once, alongside their confidence intervals.
One of the most widely used models for time series forecasting is Prophet. This model is known for its ability to model different seasonalities (weekly, daily, yearly). We will use this model as a benchmark to see if lgbm with MLForecast adds value for this time series.
+
+
from prophet import Prophet
+from time import time
print(f'lgbm with MLForecast has a speedup of {time_prophet/time_lgbm:.2f} compared with prophet')
+
+
lgbm with MLForecast has a speedup of 22.21 compared with prophet
+
+
+
We can see that lgbm with MLForecast achieved metrics at least twice as good as Prophet, as seen in the improvement column above, and was much faster.
In this example we will show how to perform electricity load forecasting on the ERCOT (Texas) market for detecting daily peaks.
+
Introduction
+
Predicting peaks in different markets is useful. In the electricity market, consuming electricity at peak demand is penalized with higher tariffs. When an individual or company consumes electricity when it is most demanded, regulators call that a coincident peak (CP).
+
In the Texas electricity market (ERCOT), the peak is the monthly 15-minute interval when the ERCOT Grid is at a point of highest capacity. The peak is caused by all consumers’ combined demand on the electrical grid. The coincident peak demand is an important factor used by ERCOT to determine final electricity consumption bills. ERCOT registers the CP demand of each client for 4 months, between June and September, and uses this to adjust electricity prices. Clients can therefore save on electricity bills by reducing the coincident peak demand.
+
In this example we will train a LightGBM model on historic load data to forecast day-ahead peaks of September 2022. Multiple seasonality is traditionally present in electricity data sampled at high frequency. Demand exhibits daily and weekly seasonality, with clear patterns for specific hours of the day such as 6:00pm vs 3:00am or for specific days such as Sunday vs Friday.
+
First, we will load ERCOT historic demand, then we will use the MLForecast.cross_validation method to fit the LightGBM model and forecast daily load during September. Finally, we show how to use the forecasts to detect the coincident peak.
+
Outline
+
+
Install libraries
+
Load and explore the data
+
Fit LightGBM model and forecast
+
Peak detection
+
+
+
+
+
+
+
+Tip
+
+
+
+
You can use Colab to run this Notebook interactively
+
+
+
+
+
Libraries
+
We assume you have MLForecast already installed. Check this guide for instructions on how to install MLForecast.
+
Install the necessary packages using pip install mlforecast.
+
Also we have to install LightGBM using pip install lightgbm.
+
+
+
Load Data
+
The input to MLForecast is always a data frame in long format with three columns: unique_id, ds and y:
+
+
The unique_id (string, int or category) represents an identifier for the series.
+
The ds (datestamp or int) column should be either an integer indexing time or a datestamp ideally like YYYY-MM-DD for a date or YYYY-MM-DD HH:MM:SS for a timestamp.
+
The y (numeric) represents the measurement we wish to forecast. We will rename the original columns to match this format.
+
+
First, read the 2022 historic total demand of the ERCOT market. We processed the original data (available here), by adding the missing hour due to daylight saving time, parsing the date to datetime format, and filtering columns of interest.
+
+
import numpy as np
+import pandas as pd
+from utilsforecast.plotting import plot_series
We observe that the time series exhibits seasonal patterns. Moreover, the time series contains 6,552 observations, so it is necessary to use computationally efficient methods to deploy them in production.
+
+
+
Fit and Forecast LightGBM model
+
Import the MLForecast class and the models you need.
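The import cell is not shown here; a minimal sketch consistent with the instantiation below would be:

import lightgbm as lgb
from mlforecast import MLForecast
from mlforecast.target_transforms import Differences

# First we instantiate the model(s) to use
models = [lgb.LGBMRegressor()]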
target_transforms: Transformations to apply to the target before computing the features. These are restored at the forecasting step.
+
lags: Lags of the target to use as features.
+
+
+
# Instantiate MLForecast class as mlf
+mlf = MLForecast(
+ models=models,
+ freq='H',
+ target_transforms=[Differences([24])],
+ lags=range(1, 25)
+)
+
+
+
+
+
+
+
+Tip
+
+
+
+
In this example, we are only using differences and lags to produce features. See the full documentation to see all available features.
+
+
+
The cross_validation method allows the user to simulate multiple historic forecasts, greatly simplifying pipelines by replacing for loops with fit and predict methods. This method re-trains the model and forecasts each window. See this tutorial for an animation of how the windows are defined.
+
Use the cross_validation method to produce all the daily forecasts for September. To produce daily forecasts set the forecasting horizon window_size as 24. In this example we are simulating deploying the pipeline during September, so set the number of windows as 30 (one for each day). Finally, the step size between windows is 24 (equal to the window_size). This ensures we only produce one forecast per day.
+
Additionally,
+
+
id_col: identifies each time series.
+
time_col: identifies the temporal column of the time series.
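A minimal sketch of the call described above (argument names follow this guide; recent mlforecast versions rename window_size to h):

crossvalidation_df = mlf.cross_validation(
    Y_df,
    n_windows=30,    # one window per day of September
    window_size=24,  # daily forecast horizon
    step_size=24,    # move forward one day between windows
)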
When using cross_validation make sure the forecasts are produced at the desired timestamps. Check the cutoff column, which specifies the last timestamp before the forecasting window.
+
+
+
+
+
Peak Detection
+
Finally, we use the forecasts in crossvalidation_df to detect the daily hourly demand peaks. For each day, we set the detected peaks as the highest forecasts. In this case, we want to predict one peak (npeaks=1); depending on your setting and goals, this parameter might change. For example, the number of peaks can correspond to how many hours a battery can be discharged to reduce demand.
+
+
npeaks = 1  # Number of peaks
+
+
For the ERCOT 4CP detection task we are interested in correctly predicting the highest monthly load. Next, we filter the day in September with the highest hourly demand and predict the peak.
+
+
crossvalidation_df = crossvalidation_df.reset_index()[['ds', 'y', 'LGBMRegressor']]
max_day = crossvalidation_df.iloc[crossvalidation_df['y'].argmax()].ds.day  # day with the maximum load
cv_df_day = crossvalidation_df.query('ds.dt.day == @max_day')
max_hour = cv_df_day['y'].argmax()  # observed peak hour
peaks = cv_df_day['LGBMRegressor'].argsort().iloc[-npeaks:].values  # predicted peaks
+
+
In the following plot we see how the LightGBM model is able to correctly detect the coincident peak for September 2022.
In this example we only include September. However, MLForecast and LightGBM can correctly predict the peaks for the 4 months of 2022. You can try this by increasing the n_windows parameter of cross_validation or filtering the Y_df dataset.
+
+
+
+
+
Next steps
+
MLForecast and LightGBM in particular are good benchmarking models for peak detection. However, it might be useful to explore further and newer forecasting algorithms or perform hyperparameter optimization.
The objective of the following article is to provide a step-by-step guide on building prediction intervals for forecasting models using mlforecast.
+
During this walkthrough, we will become familiar with the main MLForecast class and some relevant methods such as MLForecast.fit, MLForecast.predict and MLForecast.cross_validation, among others.
The target of our prediction is something unknown (otherwise we wouldn’t be making a prediction), so we can think of it as a random variable. For example, the total sales for the next month could have different possible values, and we won’t know what the exact value will be until we get the actual sales at the end of the month. Until next month’s sales are known, this is a random amount.
+
By the time the next month draws near, we usually have a pretty good idea of possible sales values. However, if we are forecasting sales for the same month next year, the possible values can vary much more. In most forecasting cases, the variability associated with what we are forecasting reduces as we get closer to the event. In other words, the further back in time we make the prediction, the more uncertainty there is.
+
We can imagine many possible future scenarios, each yielding a different value for what we are trying to forecast.
+
When we obtain a forecast, we are estimating the middle of the range of possible values the random variable could take. Often, a forecast is accompanied by a prediction interval giving a range of values the random variable could take with relatively high probability. For example, a 95% prediction interval contains a range of values which should include the actual future value with probability 95%.
+
Rather than plotting individual possible futures, we usually show these prediction intervals instead.
+
When we generate a forecast, we usually produce a single value known as the point forecast. This value, however, doesn’t tell us anything about the uncertainty associated with the forecast. To have a measure of this uncertainty, we need prediction intervals.
+
A prediction interval is a range of values that the forecast can take with a given probability. Hence, a 95% prediction interval should contain a range of values that include the actual future value with probability 95%. Probabilistic forecasting aims to generate the full forecast distribution. Point forecasting, on the other hand, usually returns the mean or the median of said distribution. However, in real-world scenarios, it is better to forecast not only the most probable future outcome, but many alternative outcomes as well.
+
The problem is that some time series models provide forecast distributions, but others only provide point forecasts. How can we then estimate the uncertainty of predictions?
+
+
+
Forecasts and prediction intervals
+
There are at least four sources of uncertainty in forecasting using time series models:
+
+
The random error term;
+
The parameter estimates;
+
The choice of model for the historical data;
+
The continuation of the historical data generating process into the future.
+
+
When we produce prediction intervals for time series models, we generally only take into account the first of these sources of uncertainty. It would be possible to account for 2 and 3 using simulations, but that is almost never done because it would take too much time to compute. As computing speeds increase, it might become a viable approach in the future.
+
Even if we ignore the model uncertainty and the DGP uncertainty (sources 3 and 4), and just try to allow for parameter uncertainty as well as the random error term (sources 1 and 2), there are no closed form solutions apart from some simple special cases. See the full article by Rob J. Hyndman.
+
+
Forecast distributions
+
We use forecast distributions to express the uncertainty in our predictions. These probability distributions describe the probability of observing different future values using the fitted model. The point forecast corresponds to the mean of this distribution. Most time series models generate forecasts that follow a normal distribution, which implies that we assume that possible future values follow a normal distribution. However, later in this section we will look at some alternatives to normal distributions.
+
+
Importance of Confidence Interval Prediction in Time Series:
+
+
Uncertainty Estimation: The confidence interval provides a measure of the uncertainty associated with time series predictions. It enables variability and the range of possible future values to be quantified, which is essential for making informed decisions.
+
Precision evaluation: By having a confidence interval, the precision of the predictions can be evaluated. If the interval is narrow, it indicates that the forecast is more accurate and reliable. On the other hand, if the interval is wide, it indicates greater uncertainty and less precision in the predictions.
+
Risk management: The confidence interval helps in risk management by providing information about possible future scenarios. It allows identifying the ranges in which the real values could be located and making decisions based on those possible scenarios.
+
Effective communication: The confidence interval is a useful tool for communicating predictions clearly and accurately. It allows the variability and uncertainty associated with the predictions to be conveyed to the stakeholders, avoiding a wrong or overly optimistic interpretation of the results.
+
+
Therefore, confidence interval prediction in time series is essential to understand and manage uncertainty, assess the accuracy of predictions, and make informed decisions based on possible future scenarios.
+
+
+
+
Prediction intervals
+
A prediction interval gives us a range in which we expect \(y_t\) to lie with a specified probability. For example, if we assume that the distribution of future observations follows a normal distribution, a 95% prediction interval for the forecast of step h would be represented by the range
+
\[\hat{y}_{T+h|T} \pm 1.96 \hat\sigma_h,\]
+
Where \(\hat\sigma_h\) is an estimate of the standard deviation of the h -step forecast distribution.
+
More generally, a prediction interval can be written as
+
\[\hat{y}_{T+h|T} \pm c \hat\sigma_h\]
+
In this context, the term “multiplier c” is associated with the probability of coverage. In this article, intervals of 80% and 95% are typically calculated, but any other percentage can be used. The table below shows the values of c corresponding to different coverage probabilities, assuming a normal forecast distribution.
+

Percentage   Multiplier
50           0.67
55           0.76
60           0.84
65           0.93
70           1.04
75           1.15
80           1.28
85           1.44
90           1.64
95           1.96
96           2.05
97           2.17
98           2.33
99           2.58

+
Prediction intervals are valuable because they reflect the uncertainty in the predictions. If we only generate point forecasts, we cannot assess how accurate those forecasts are. However, by providing prediction intervals, the amount of uncertainty associated with each forecast becomes apparent. For this reason, point forecasts may lack significant value without the inclusion of corresponding forecast intervals.
+
+
+
One-step prediction intervals
+
When making a prediction for a future step, it is possible to estimate the standard deviation of the forecast distribution using the standard deviation of the residuals, which is calculated by

\[\hat\sigma = \sqrt{\frac{1}{T-K-M}\sum_{t=1}^{T} e_t^2}, \qquad (1)\]

where \(K\) is the number of parameters estimated in the forecasting method, and \(M\) is the number of missing values in the residuals. (For example, \(M=1\) for a naive forecast, because we can’t forecast the first observation.)
+
+
+
Multi-step prediction intervals
+
A typical feature of forecast intervals is that they tend to increase in length as the forecast horizon lengthens. As we move further out in time, there is greater uncertainty associated with the prediction, resulting in wider prediction intervals. In general, \(\sigma_h\) tends to increase as \(h\) increases (although there are some nonlinear forecasting methods that do not follow this property).

To generate a prediction interval, it is necessary to have an estimate of \(\sigma_h\). As mentioned above, for one-step forecasts (\(h=1\)), equation (1) provides a good estimate of the standard deviation of the forecast, \(\sigma_1\). However, for multi-step forecasts, a more complex calculation method is required. These calculations assume that the residuals are uncorrelated with each other.
+
+
+
Benchmark methods
+
For the four benchmark methods, it is possible to mathematically derive the forecast standard deviation under the assumption of uncorrelated residuals. If \(\hat{\sigma}_h\) denotes the standard deviation of the \(h\)-step forecast distribution, and \(\hat{\sigma}\) is the residual standard deviation given by (1), then we can use the expressions shown in the next table.
+

Method                     h-step forecast standard deviation
Mean forecasts             \(\hat\sigma_h = \hat\sigma\sqrt{1 + 1/T}\)
Naïve forecasts            \(\hat\sigma_h = \hat\sigma\sqrt{h}\)
Seasonal naïve forecasts   \(\hat\sigma_h = \hat\sigma\sqrt{k+1}\)
Drift forecasts            \(\hat\sigma_h = \hat\sigma\sqrt{h(1+h/T)}\)

(For the seasonal naïve method, \(k\) is the integer part of \((h-1)/m\), where \(m\) is the seasonal period.)

Note that when \(h=1\) and \(T\) is large, these all give the same approximate value \(\hat{\sigma}\).
+
+
+
Prediction intervals from bootstrapped residuals
+
When a normal distribution for the residuals is an unreasonable assumption, one alternative is to use bootstrapping, which only assumes that the residuals are uncorrelated with constant variance. We will illustrate the procedure using a naïve forecasting method.
+
A one-step forecast error is defined as \(e_t = y_t - \hat{y}_{t|t-1}\). For a naïve forecasting method, \(\hat{y}_{t|t-1} = y_{t-1}\), so we can rewrite this as \[y_t = y_{t-1} + e_t.\]
+
Assuming future errors will be similar to past errors, when \(t>T\) we can replace \(e_{t}\) by sampling from the collection of errors we have seen in the past (i.e., the residuals). So we can simulate the next observation of a time series using
+
\[y^*_{T+1} = y_{T} + e^*_{T+1}\]
+
where \(e^*_{T+1}\) is a randomly sampled error from the past, and \(y^*_{T+1}\) is the possible future value that would arise if that particular error value occurred. We use a * to indicate that this is not the observed \(y_{T+1}\) value, but one possible future that could occur. Adding the new simulated observation to our data set, we can repeat the process to obtain
+
\[y^*_{T+2} = y_{T+1}^* + e^*_{T+2},\]
+
where \(e^*_{T+2}\) is another draw from the collection of residuals. Continuing in this way, we can simulate an entire set of future values for our time series.
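This procedure is straightforward to simulate; a minimal sketch for the naïve method (the function name and defaults are illustrative):

import numpy as np

def bootstrap_paths(y, h, n_paths=1000, seed=0):
    """Simulate future paths by resampling naive one-step errors."""
    rng = np.random.default_rng(seed)
    residuals = np.diff(y)  # naive errors: e_t = y_t - y_{t-1}
    paths = np.empty((n_paths, h))
    for i in range(n_paths):
        level = y[-1]
        for t in range(h):
            level += rng.choice(residuals)  # y*_{T+t} = y*_{T+t-1} + e*
            paths[i, t] = level
    return paths

# Prediction intervals then come from the quantiles of the simulated paths, e.g. 95%:
# np.percentile(paths, [2.5, 97.5], axis=0)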
+
+
+
Conformal Prediction
+
Multi-quantile losses and statistical models can provide prediction intervals, but the problem is that these are uncalibrated, meaning that the actual frequency of observations falling within the interval does not align with the confidence level associated with it. For example, a calibrated 95% prediction interval should contain the true value 95% of the time in repeated sampling. An uncalibrated 95% prediction interval, on the other hand, might contain the true value only 80% of the time, or perhaps 99% of the time. In the first case, the interval is too narrow and underestimates the uncertainty, while in the second case, it is too wide and overestimates the uncertainty.
+
Statistical methods also assume normality. Here, we talk about another method called conformal prediction that doesn’t require any distributional assumptions.
+
Conformal prediction intervals use cross-validation on a point forecaster model to generate the intervals. This means that no prior probabilities are needed, and the output is well calibrated. No additional training is needed, and the model is treated as a black box. The approach is compatible with any model.
+
mlforecast now supports Conformal Prediction on all available models.
+
+
+
+
Installing mlforecast
+
+
using pip:
+
+
pip install mlforecast
+
+
using conda:
+
+
conda install -c conda-forge mlforecast
+
+
+
+
+
Loading libraries and data
+
+
# Data handling and processing
# ==============================================================================
import numpy as np
import pandas as pd
import scipy.stats as stats

# Date (time) handling
# ==============================================================================
import time
from datetime import datetime, timedelta

# Statistical tests and seasonal decomposition
# ==============================================================================
from statsmodels.tsa.stattools import adfuller
import statsmodels.api as sm
import statsmodels.tsa.api as smt
from statsmodels.tsa.seasonal import seasonal_decompose

# Plotting
# ==============================================================================
from utilsforecast.plotting import plot_series
Plot some series using the plot_series function from utilsforecast.plotting. This function plots 8 random series from the dataset and is useful for basic EDA.
+
+
fig = plot_series(df)
+
+
+
+
The Augmented Dickey-Fuller Test
+
An Augmented Dickey-Fuller (ADF) test is a type of statistical test that determines whether a unit root is present in time series data. Unit roots can cause unpredictable results in time series analysis. A null hypothesis is formed in the unit root test to determine how strongly time series data is affected by a trend. By accepting the null hypothesis, we accept the evidence that the time series data is not stationary. By rejecting the null hypothesis, or accepting the alternative hypothesis, we accept the evidence that the time series data is generated by a stationary process. Such a process is also known as trend-stationary. The values of the ADF test statistic are negative; more negative ADF values indicate a stronger rejection of the null hypothesis.
+
Augmented Dickey-Fuller Test is a common statistical test used to test whether a given time series is stationary or not. We can achieve this by defining the null and alternate hypothesis.
+
+
Null Hypothesis: Time Series is non-stationary. It gives a time-dependent trend.
+
Alternate Hypothesis: Time Series is stationary. In other words, the series doesn’t depend on time.
+
ADF or t Statistic < critical values: Reject the null hypothesis, time series is stationary.
+
ADF or t Statistic > critical values: Failed to reject the null hypothesis, time series is non-stationary.
+
+
+
def augmented_dickey_fuller_test(series, column_name):
    print(f'Dickey-Fuller test results for columns: {column_name}')
    dftest = adfuller(series, autolag='AIC')
    dfoutput = pd.Series(
        dftest[0:4],
        index=['Test Statistic', 'p-value', 'No Lags Used', 'Number of observations used'],
    )
    for key, value in dftest[4].items():
        dfoutput[f'Critical Value ({key})'] = value
    print(dfoutput)
    if dftest[1] <= 0.05:
        print("Conclusion:====>")
        print("Reject the null hypothesis")
        print("The data is stationary")
    else:
        print("Conclusion:====>")
        print("The null hypothesis cannot be rejected")
        print("The data is not stationary")
+
+
+
augmented_dickey_fuller_test(df["y"],'Ads')
+
+
Dickey-Fuller test results for columns: Ads
+Test Statistic -1.076452e+01
+p-value 2.472132e-19
+No Lags Used 3.900000e+01
+Number of observations used 1.028000e+04
+Critical Value (1%) -3.430986e+00
+Critical Value (5%) -2.861821e+00
+Critical Value (10%) -2.566920e+00
+dtype: float64
+Conclusion:====>
+Reject the null hypothesis
+The data is stationary
+
+
+
+
+
Autocorrelation plots
+
+
Autocorrelation Function
+
Definition 1. Let \(\{x_t; 1 \le t \le n\}\) be a time series sample of size \(n\) from \(\{X_t\}\).

1. \(\bar x = \sum_{t=1}^n \frac{x_t}{n}\) is called the sample mean of \(\{X_t\}\).
2. \(c_k = \sum_{t=1}^{n-k} (x_{t+k} - \bar x)(x_t - \bar x)/n\) is known as the sample autocovariance function of \(\{X_t\}\).
3. \(r_k = c_k / c_0\) is said to be the sample autocorrelation function of \(\{X_t\}\).
+
Note the following remarks about this definition:
+
+
Like most literature, this guide uses ACF to denote the sample autocorrelation function as well as the autocorrelation function. What is denoted by ACF can easily be identified in context.
+
Clearly \(c_0\) is the sample variance of \(\{X_t\}\). Besides, \(r_0 = c_0/c_0 = 1\) and for any integer \(k\), \(|r_k| \le 1\).
+
When we compute the ACF of any sample series with a fixed length \(n\), we cannot put too much confidence in the values of \(r_k\) for large k’s, since fewer pairs of \((x_{t +k }, x_t )\) are available for calculating \(r_k\) as \(k\) is large. One rule of thumb is not to estimate \(r_k\) for \(k > n/3\), and another is \(n ≥ 50, k ≤ n/4\). In any case, it is always a good idea to be careful.
+
We also compute the ACF of a nonstationary time series sample by Definition 1. In this case, however, the ACF or \(r_k\) very slowly or hardly tapers off as \(k\) increases.
+
Plotting the ACF \((r_k)\) against lag \(k\) is easy but very helpful in analyzing time series sample. Such an ACF plot is known as a correlogram.
+
If \(\{X_t\}\) is stationary with \(E(X_t)=0\) and \(\rho_k = 0\) for all \(k \neq 0\), that is, it is a white noise series, then the sampling distribution of \(r_k\) is asymptotically normal with mean 0 and variance \(1/n\). Hence, there is about a 95% chance that \(r_k\) falls in the interval \([-1.96/\sqrt{n}, 1.96/\sqrt{n}]\).
+
+
Now we can give a summary that (1) if the time series plot of a time series clearly shows a trend or/and seasonality, it is surely nonstationary; (2) if the ACF \(r_k\) very slowly or hardly tapers off as lag \(k\) increases, the time series should also be nonstationary.
In time series analysis to forecast new values, it is very important to know past data. More formally, we can say that it is very important to know the patterns that values follow over time. There can be many reasons that cause our forecast values to fall in the wrong direction. Basically, a time series consists of four components. The variation of those components causes the change in the pattern of the time series. These components are:
+
+
Level: This is the primary value that averages over time.
+
Trend: The trend is the value that causes increasing or decreasing patterns in a time series.
+
Seasonality: This is a cyclical event that occurs in a time series for a short time and causes short-term increasing or decreasing patterns in a time series.
+
Residual/Noise: These are the random variations in the time series.
+
+
Combining these components over time leads to the formation of a time series. Most time series consist of level and noise/residual and trend or seasonality are optional values.
+
If seasonality and trend are part of the time series, then there will be effects on the forecast value. As the pattern of the forecasted time series may be different from the previous time series.
+
The combination of the components in a time series can be of two types:

Additive

Multiplicative
+
Additive time series
+
If the components of the time series are added together to make the time series, then the time series is called additive. By visualization, we can say that the time series is additive if the increasing or decreasing pattern of the time series is similar throughout the series. The mathematical function of any additive time series can be represented by: \[y(t) = Level + Trend + Seasonality + Noise\]
+
+
+
Multiplicative time series
+
If the components of the time series are multiplied together, then the time series is called multiplicative. For visualization, if the time series has exponential growth or decline with time, then it can be considered a multiplicative time series. The mathematical function of the multiplicative time series can be represented as
+
\[y(t) = Level * Trend * seasonality * Noise\]
+
+
Additive
+
+
a = seasonal_decompose(df["y"], model="additive", period=24).plot()
a.savefig('../../figs/prediction_intervals_in_forecasting_models__seasonal_decompose_aditive.png', bbox_inches='tight')
plt.close()
+
+
+
+
+
Multiplicative
+
+
b = seasonal_decompose(df["y"], model="multiplicative", period=24).plot()
b.savefig('../../figs/prediction_intervals_in_forecasting_models__seasonal_decompose_multiplicative.png', bbox_inches='tight')
plt.close()
+
+
+
+
+
+
+
Split the data into training and testing
+
Let’s divide our data into two sets:

1. Data to train our model.
2. Data to test our model.

For the test data we will use the last 500 hours to test and evaluate the performance of our model.
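A minimal sketch of the split, using the train and test names plotted below:

train = df[:-500]  # training data
test = df[-500:]   # last 500 observations held out for evaluation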
Now let’s plot the training data and the test data.
+
+
fig = plot_series(train,test)
+
+
+
+
+
Modeling with mlforecast
+
+
Building Model
+
We define the model that we want to use; for our example we are going to use the XGBoost model.
+
+
import xgboost as xgb

model1 = [xgb.XGBRegressor()]
+
+
We can use the MLForecast.preprocess method to explore different transformations.
+
Although the series we are working with is stationary (see the Dickey-Fuller test above), for the sake of practice and instruction in this guide we will apply a difference to our series. We will do this using the target_transforms parameter, passing mlforecast.target_transforms.Differences.
It is important to take this into account when using the target_transforms=[Differences([1])] parameter: if the series is stationary, one difference is enough; if the series is not stationary, we may need more than one difference so that the series becomes constant over time, that is, constant in mean and in variance.
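The instantiation cell is not shown here; a minimal sketch consistent with the preprocess call below (the frequency is assumed from the half-hourly timestamps):

from mlforecast import MLForecast
from mlforecast.target_transforms import Differences

mlf = MLForecast(
    models=model1,
    freq='30min',  # assumed from the half-hourly timestamps
    target_transforms=[Differences([1])],
)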
+
+
prep = mlf.preprocess(df)
+prep
+

       ds                   y        unique_id
1      2014-07-01 00:30:00  -2717.0  1
2      2014-07-01 01:00:00  -1917.0  1
3      2014-07-01 01:30:00  -1554.0  1
4      2014-07-01 02:00:00  -836.0   1
5      2014-07-01 02:30:00  -947.0   1
...    ...                  ...      ...
10315  2015-01-31 21:30:00  951.0    1
10316  2015-01-31 22:00:00  1051.0   1
10317  2015-01-31 22:30:00  1588.0   1
10318  2015-01-31 23:00:00  -718.0   1
10319  2015-01-31 23:30:00  -303.0   1

10319 rows × 3 columns
+
+
+
+
This has subtracted the lag 1 from each value; we can see what our series looks like now.
+
+
fig = plot_series(prep)
+
+
+
+
+
Adding features
+
+
Lags
+
Looks like the seasonality is gone, we can now try adding some lag features.
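The feature-engineering cell is not shown here; a minimal sketch that reproduces the correlations below (the lag choices are assumed from the output):

mlf = MLForecast(
    models=model1,
    freq='30min',
    target_transforms=[Differences([1])],
    lags=[1, 24],
)
prep = mlf.preprocess(df)
prep.drop(columns=['unique_id', 'ds']).corr()['y']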
y        1.000000
lag1     0.663082
lag24    0.155366
Name: y, dtype: float64
+
+
+
+
+
+
Lag transforms
+
Lag transforms are defined as a dictionary where the keys are the lags and the values are lists of functions that transform an array. These must be numba jitted functions (so that computing the features doesn’t become a bottleneck). There are some implemented in the window-ops package but you can also implement your own.
+
If the function takes two or more arguments you can either supply a tuple with the function followed by its extra arguments (as in (rolling_mean, 48) earlier), or define a new function that fixes those arguments.
You can see that both approaches get to the same result, you can use whichever one you feel most comfortable with.
+
+
+
Date features
+
If your time column is made of timestamps then it might make sense to extract features like week, dayofweek, quarter, etc. You can do that by passing a list of strings with pandas time/date components. You can also pass functions that will take the time column as input, as we’ll show here.
It’s important to note that we can only use this method if we assume that the residuals of our validation predictions are normally distributed. To see if this is the case, we will use a PP-plot and test its normality with the Anderson-Darling, Kolmogorov-Smirnov, and D’Agostino K^2 tests.
+
The PP-plot (probability-probability plot) plots the data sample against the normal distribution in such a way that, if the data are normally distributed, the points will form a straight line.
+
The three normality tests determine how likely a data sample is from a normally distributed population using p-values. The null hypothesis for each test is that “the sample came from a normally distributed population”. This means that if the resulting p-values are below a chosen alpha value, then the null hypothesis is rejected. Thus there is evidence to suggest that the data comes from a non-normal distribution. For this article, we will use an Alpha value of 0.01.
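A minimal sketch of these tests with scipy (the function name is illustrative, and residuals is assumed to be a numpy array of validation residuals):

from scipy import stats

def normality_tests(residuals, alpha=0.01):
    # Anderson-Darling (compares the statistic against critical values)
    ad = stats.anderson(residuals, dist='norm')
    # Kolmogorov-Smirnov on standardized residuals
    ks = stats.kstest((residuals - residuals.mean()) / residuals.std(), 'norm')
    # D'Agostino K^2
    k2 = stats.normaltest(residuals)
    print(f'Anderson-Darling statistic: {ad.statistic:.4f}')
    print(f'Kolmogorov-Smirnov p-value: {ks.pvalue:.4f}')
    print(f"D'Agostino K^2 p-value:     {k2.pvalue:.4f}")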
Now let’s visualize the result of our forecast alongside the historical data of our time series, and also draw the confidence interval we obtained when making the prediction at 95% confidence.
The confidence interval is a range of values that has a high probability of containing the true value of a variable. In machine learning time series models, the confidence interval is used to estimate the uncertainty in the predictions.
+
One of the main benefits of using the confidence interval is that it allows users to understand the accuracy of the predictions. For example, if the confidence interval is very wide, it means that the prediction is less accurate. Conversely, if the confidence interval is very narrow, it means that the prediction is more accurate.
+
Another benefit of the confidence interval is that it helps users make informed decisions. For example, if a prediction is within the confidence interval, it means that it is likely to come true. Conversely, if a prediction is outside the confidence interval, it means that it is less likely to come true.
+
In general, the confidence interval is an important tool for machine learning time series models. It helps users understand the accuracy of the forecasts and make informed decisions.
+
+
+
+
References
+
+
Changquan Huang and Alla Petukhina (2022). Applied Time Series Analysis and Forecasting with Python. Springer.