Forecasting the Unknown #146
Hi @Jackwannsee, I am not a maintainer of this library, but I did write a couple of notebooks for making forward predictions using it. Please let me know if you find them useful. |
@Jackwannsee Indeed -- many of the notebooks are focussed on prediction of timestamps in the test dataset. This is done so that we can view and evaluate the performance of the model (i.e. make some comparison to ground truth data). The two notebooks @fayvor pointed out show how to use the forecasting pipeline to produce forecasts for the future timestamps. |
Thank you for your responses. @fayvor I have run your notebooks locally using my synthetic dataset and was able to get the results I desired as outlined in my initial comment. Following further exploration I have a few more questions:
(.venv) PS C:\Users\z0050j3w\Documents\Code\TTM\V0.2.10> python .\fine-tuning.py
Data lengths: train = 1, validate = 667, test = 667
Number of params before freezing backbone 805280
Number of params after freezing the backbone 289696
Using learning rate = 0.001
0%| | 0/50
Data lengths: train = 1, validate = 667, test = 667
Number of params before freezing backbone 805280
Number of params after freezing the backbone 289696
Using learning rate = 0.001
0%| | 0/50
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\Users\z0050j3w\.pyenv\pyenv-win\versions\3.10.11\lib\multiprocessing\spawn.py", line 116, in spawn_main
exitcode = _main(fd, parent_sentinel)
File "C:\Users\z0050j3w\.pyenv\pyenv-win\versions\3.10.11\lib\multiprocessing\spawn.py", line 125, in _main
prepare(preparation_data)
File "C:\Users\z0050j3w\.pyenv\pyenv-win\versions\3.10.11\lib\multiprocessing\spawn.py", line 236, in prepare
_fixup_main_from_path(data['init_main_from_path'])
File "C:\Users\z0050j3w\.pyenv\pyenv-win\versions\3.10.11\lib\multiprocessing\spawn.py", line 287, in _fixup_main_from_path
main_content = runpy.run_path(main_path,
File "C:\Users\z0050j3w\.pyenv\pyenv-win\versions\3.10.11\lib\runpy.py", line 289, in run_path
return _run_module_code(code, init_globals, run_name,
File "C:\Users\z0050j3w\.pyenv\pyenv-win\versions\3.10.11\lib\runpy.py", line 96, in _run_module_code
_run_code(code, mod_globals, init_globals,
File "C:\Users\z0050j3w\.pyenv\pyenv-win\versions\3.10.11\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "C:\Users\z0050j3w\Documents\Code\TTM\V0.2.10\fine-tuning.py", line 176, in <module>
finetune_forecast_trainer.train()
File "C:\Users\z0050j3w\Documents\Code\TTM\V0.2.10\.venv\lib\site-packages\transformers\trainer.py", line 2052, in train
return inner_training_loop(
File "C:\Users\z0050j3w\Documents\Code\TTM\V0.2.10\.venv\lib\site-packages\transformers\trainer.py", line 2345, in _inner_training_loop
for step, inputs in enumerate(epoch_iterator):
File "C:\Users\z0050j3w\Documents\Code\TTM\V0.2.10\.venv\lib\site-packages\accelerate\data_loader.py", line 547, in __iter__
dataloader_iter = self.base_dataloader.__iter__()
File "C:\Users\z0050j3w\Documents\Code\TTM\V0.2.10\.venv\lib\site-packages\torch\utils\data\dataloader.py", line 440, in __iter__
return self._get_iterator()
File "C:\Users\z0050j3w\Documents\Code\TTM\V0.2.10\.venv\lib\site-packages\torch\utils\data\dataloader.py", line 388, in _get_iterator
return _MultiProcessingDataLoaderIter(self)
File "C:\Users\z0050j3w\Documents\Code\TTM\V0.2.10\.venv\lib\site-packages\torch\utils\data\dataloader.py", line 1038, in __init__
w.start()
File "C:\Users\z0050j3w\.pyenv\pyenv-win\versions\3.10.11\lib\multiprocessing\process.py", line 121, in start
self._popen = self._Popen(self)
File "C:\Users\z0050j3w\.pyenv\pyenv-win\versions\3.10.11\lib\multiprocessing\context.py", line 224, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "C:\Users\z0050j3w\.pyenv\pyenv-win\versions\3.10.11\lib\multiprocessing\context.py", line 336, in _Popen
return Popen(process_obj)
File "C:\Users\z0050j3w\.pyenv\pyenv-win\versions\3.10.11\lib\multiprocessing\popen_spawn_win32.py", line 45, in __init__
prep_data = spawn.get_preparation_data(process_obj._name)
File "C:\Users\z0050j3w\.pyenv\pyenv-win\versions\3.10.11\lib\multiprocessing\spawn.py", line 154, in get_preparation_data
_check_not_importing_main()
File "C:\Users\z0050j3w\.pyenv\pyenv-win\versions\3.10.11\lib\multiprocessing\spawn.py", line 134, in _check_not_importing_main
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
if __name__ == '__main__':
freeze_support()
...
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.
0%| | 0/50 [00:00<?, ?it/s]
Please note that the script that produced this error message is not 1:1 with the original notebook. If necessary (which I don't believe is the case), I can provide this code. All the best, |
Hi @Jackwannsee, Great to hear you were able to get the forecast results! I'll do my best to answer your questions, but @wgifford will probably have additional insight.
I hope this helps! |
@fayvor Thank you for your answer! Following your response I will focus on using I'm excited to see what the future holds with regard to fine-tuning, as I am planning to write my thesis on the topic of Time Series Forecasting. All the best, |
My recommendation for a Windows user would be to use Microsoft's really excellent Windows Subsystem for Linux (WSL). It integrates extremely well with VS Code too. It gives you a near-native Linux experience on Windows (with NVIDIA driver support too if you want it). See https://learn.microsoft.com/en-us/windows/wsl/install |
@Jackwannsee Just checking in -- were you able to resolve the error? You may need to add a section in your script like:
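The snippet originally posted here did not survive, so what follows is only a minimal sketch of the idiom that the traceback itself asks for, not the maintainer's exact code. It assumes the fine-tuning logic is moved into a function; `run_finetuning` and `main` are hypothetical names, and the body is a placeholder for building the trainer and calling `finetune_forecast_trainer.train()`.

```python
def run_finetuning() -> str:
    # Hypothetical placeholder: build the datasets and trainer here and call
    # finetune_forecast_trainer.train(), as in the failing script.
    return "training finished"


def main() -> str:
    return run_finetuning()


if __name__ == "__main__":
    # On Windows, worker processes are started with "spawn", which re-imports
    # this file under a different module name. The guard keeps the workers
    # from re-running the training code at import time.
    print(main())
```

With the guard in place, the DataLoader worker processes can import the script without starting training again. Setting `dataloader_num_workers=0` in `TrainingArguments` may also avoid the problem, since no worker processes are started in that case.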
|
@wgifford Thanks for checking in! I have a working script running on Windows in a Python virtual environment using
NOTE: The code is messy, but I hope it helps someone get started...
import os
import pandas as pd
from sklearn.metrics import mean_squared_error as mse
from tsfm_public import (
    TinyTimeMixerForPrediction,
    TimeSeriesForecastingPipeline,
    TimeSeriesPreprocessor,
)
DATA_PATH = None  # Replace None with path to the dataset you are working with
OUTPUT_PATH = None  # Replace None with the path where results are to be stored
timestamp_column = "Timestamp"  # Name of the time column in the pandas df
target_columns = ["Name of Target"]  # List of target column names in the pandas df
target_column = "Name of Target"  # Name of the single target column in the pandas df
frequency = "15min"
context_lengths = [512, 1024, 1536]  # Check hugging face documentation
prediction_length = 96
# Each entry: (repository, revision, label, context length)
models = [
    ("ibm-granite/granite-timeseries-ttm-r1", "main", "TTM R1 cl=512", 512),
    ("ibm-granite/granite-timeseries-ttm-r1", "1024_96_v1", "TTM R1 cl=1024", 1024),
    ("ibm-granite/granite-timeseries-ttm-r2", "main", "TTM R2 cl=512", 512),
    ("ibm-granite/granite-timeseries-ttm-r2", "1024-96-r2", "TTM R2 cl=1024", 1024),
    ("ibm-granite/granite-timeseries-ttm-r2", "1536-96-r2", "TTM R2 cl=1536", 1536),
    # Assumed from the revision name: context length 1024, forecast length 192.
    # Only the first prediction_length steps are kept below.
    ("ibm-granite/granite-timeseries-ttm-r2", "1024-192-r2", "TTM R2 cl=1024 fl=192", 1024),
]
column_specifiers = {
    "timestamp_column": timestamp_column,
    "id_columns": [],
    "target_columns": target_columns,
    "control_columns": [],
}
input_df = pd.read_csv(
    DATA_PATH,
    parse_dates=[timestamp_column],
)
# Hold out the last 96 timestamps, so that you can test how well the model performs using metrics such as RMSE or MSE
df_actual_values = input_df[[target_column]].tail(prediction_length).reset_index(drop=True)
input_df.drop(input_df.tail(prediction_length).index, inplace=True)
# Looping over the models with different parameters
for repo, revision, label, context_length in models:
    tsp = TimeSeriesPreprocessor(
        **column_specifiers,
        context_length=context_length,
        prediction_length=prediction_length,
        scaling=False,
        scaler_type="standard",
    )
    zeroshot_model = TinyTimeMixerForPrediction.from_pretrained(
        repo,
        revision=revision,
        num_input_channels=len(target_columns),  # Number of input columns.
    )
    pipeline = TimeSeriesForecastingPipeline(
        zeroshot_model,
        device="cpu",  # Specify your local GPU or CPU.
        feature_extractor=tsp,
        inverse_scale_outputs=False,
    )
    historical = input_df.iloc[-context_length:].copy()
    zeroshot_forecast = pipeline(historical)  # Make a forecast on the target column given the input data.
    # The following code isn't pretty but it works \_O_/
    # Because of the output format, the following code is used to transform it into a pandas df
    initial_date = pd.to_datetime(zeroshot_forecast[timestamp_column][0])  # Extract the initial date
    # Extract y_prediction and clean up the formatting
    y_prediction_list = str(zeroshot_forecast[f"{target_column}_prediction"][0]).replace("[", "").replace("]", "").split()
    y_prediction_list = [x.rstrip(",") for x in y_prediction_list]
    y_prediction = list(map(float, y_prediction_list))[:prediction_length]
    new_timestamps = [initial_date + pd.to_timedelta(frequency) * (i + 1) for i in range(len(y_prediction))]
    forecasting_results = pd.DataFrame({
        timestamp_column: new_timestamps,
        "forecast": y_prediction,
    })
    forecasting_results["actual"] = df_actual_values[target_column]
    # Change this path to suit your working environment. The revision is part of
    # the file name so that models sharing a context length do not overwrite each other.
    forecasting_results.to_csv(os.path.join(OUTPUT_PATH, f"forecasting_results_{repo[-2:]}_{revision}_{context_length}.csv"))
    print("\nData Set:", DATA_PATH)
    print("Model:", label)
    print("Context Length", context_length)
    print("Forecast Horizon", prediction_length)
    print("Mean Squared Error: ", mse(forecasting_results["actual"], forecasting_results["forecast"]))
# mse is sklearn's mean_squared_error (imported above); you can also write your own function |
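For anyone who prefers not to depend on scikit-learn for the metric, the "write your own function" option mentioned above could look roughly like this. It is a minimal pure-Python sketch; the name `mse` simply matches the call in the script, and the sample numbers are made up for illustration.

```python
from typing import Sequence


def mse(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean squared error between two equally long sequences."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    if len(actual) == 0:
        raise ValueError("cannot compute MSE of empty sequences")
    squared_errors = [(a - f) ** 2 for a, f in zip(actual, forecast)]
    return sum(squared_errors) / len(squared_errors)


# Made-up example values: errors are 0, 0 and 2, so the MSE is 4 / 3.
print(mse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```

The function accepts plain lists as well as pandas Series, since it only iterates over the values pairwise.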
Hello, let me preface this by saying thanks to you and your team for the great work, congratulations.
I am a student experimenting with time series forecasting using deep learning methods. Following experimentation, I have noticed that the forecasts are not for future/unknown timestamps but rather for timestamps that are already in the data. As can be seen in the image, the actual values are set to 0 and the prediction is able to quite accurately predict those timestamps.
The timeseries_data.csv used for this is synthetic, following a predictable path. Observations have a 1 hour interval, with the first observation occurring at 01/01/2012 00:00 and the last observation occurring at 04/14/2012 03:00, for a total of 2500 observations.
My code (seen below) is based on an IBM tutorial. In the IBM tutorial it is also visible that the forecast is not predicting future timestamps; rather, the actual values are being set to 0.
Hence, my question is whether it is possible to forecast timestamps that aren't in the data set and if so how. I have gone through the notebooks in this repo alongside other resources but haven't been able to figure it out.
Please advise.
Thanks, Jack.
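To make "timestamps that aren't in the data set" concrete, here is a small sketch of which timestamps a forward forecast on this dataset would cover. It uses only the figures quoted in the question (hourly data, 2500 observations starting 01/01/2012 00:00); the 96-step horizon is an assumption chosen for illustration, and the sketch does not call the library itself.

```python
from datetime import datetime, timedelta

# Figures taken from the question: hourly data, 2500 observations.
first_observation = datetime(2012, 1, 1, 0, 0)
step = timedelta(hours=1)
num_observations = 2500

# The last observed timestamp follows from the start, the step and the count.
last_observation = first_observation + step * (num_observations - 1)

# Assumption: a 96-step forecast horizon, used here only as an example.
horizon = 96
future_timestamps = [last_observation + step * (i + 1) for i in range(horizon)]

print(last_observation)       # 2012-04-14 03:00:00
print(future_timestamps[0])   # 2012-04-14 04:00:00
print(future_timestamps[-1])  # 2012-04-18 03:00:00
```

A forecast for the unknown future would attach its predicted values to timestamps like these, which start one step after the last row of the CSV.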