add cutoff parameter for train
MoritzM00 committed Nov 14, 2024
1 parent 08add1e commit d2526d4
Showing 3 changed files with 49 additions and 43 deletions.
82 changes: 42 additions & 40 deletions dvc.lock
@@ -102,8 +102,8 @@ stages:
nfiles: 7
- path: src/probafcst/pipeline/train.py
hash: md5
md5: 8c38b7144ddc190301187bcbccab381c
size: 1003
md5: 7d5682f17d005a1324ebe0272cd2bb08
size: 1039
params:
params.yaml:
data.energy:
@@ -117,7 +117,8 @@ stages:
- 0.75
- 0.975
train.energy:
selected: xgboost
selected: quantreg
cutoff: '2021-11-14'
benchmark:
n_weeks: 75
quantreg:
@@ -131,8 +132,8 @@ stages:
outs:
- path: models/energy_model.pkl
hash: md5
md5: 45a947eb418dca8f2508f4448d4a843d
size: 13806992
md5: 096054215456ac5e6c7e7858efbda652
size: 2764139
train@bikes:
cmd: python src/probafcst/pipeline/train.py --target bikes
deps:
@@ -147,8 +148,8 @@ stages:
nfiles: 7
- path: src/probafcst/pipeline/train.py
hash: md5
md5: 8c38b7144ddc190301187bcbccab381c
size: 1003
md5: 5b991e4c34d3f7c3c0e4e3153ed4d50c
size: 1091
params:
params.yaml:
data.bikes:
@@ -162,7 +163,8 @@ stages:
- 0.75
- 0.975
train.bikes:
selected: xgboost
selected: quantreg
cutoff: '2021-11-14'
benchmark:
n_weeks: 125
quantreg:
@@ -176,8 +178,8 @@ stages:
outs:
- path: models/bikes_model.pkl
hash: md5
md5: 5a18f0021519963b0e0a5666e915e982
size: 2007180
md5: f072e90794c21a2066cd284dea8e6ff8
size: 113630
train@no2:
cmd: uv run python src/probafcst/pipeline/train.py --target no2
deps:
@@ -209,12 +211,12 @@ stages:
deps:
- path: models/bikes_model.pkl
hash: md5
md5: 5a18f0021519963b0e0a5666e915e982
size: 2007180
md5: f072e90794c21a2066cd284dea8e6ff8
size: 113630
- path: models/energy_model.pkl
hash: md5
md5: 45a947eb418dca8f2508f4448d4a843d
size: 13806992
md5: 096054215456ac5e6c7e7858efbda652
size: 2764139
- path: src/probafcst//plotting.py
hash: md5
md5: 482a42cf8b0b9196d98b0d8e772d83d2
@@ -242,16 +244,16 @@ stages:
outs:
- path: output/bikes_forecast.png
hash: md5
md5: e3c07dd456fa87dc28835445d5b42140
size: 63372
md5: f9734857202bee86f68c717c96eb60ad
size: 55660
- path: output/energy_forecast.png
hash: md5
md5: 0030d0ed824c242ea65fe72b70a452d0
size: 70485
md5: a1cf54f1601b2d029c3bc43f48906615
size: 69911
- path: output/submission.csv
hash: md5
md5: 171f4c2aa910e3ef29a5348aa0fc0998
size: 1609
md5: a45024b08f350c39c893886b337f8686
size: 1564
eval@energy:
cmd: python src/probafcst/pipeline/evaluate.py --target energy
deps:
@@ -261,8 +263,8 @@ stages:
size: 1325417
- path: models/energy_model.pkl
hash: md5
md5: 45a947eb418dca8f2508f4448d4a843d
size: 13806992
md5: 096054215456ac5e6c7e7858efbda652
size: 2764139
- path: src/probafcst//backtest.py
hash: md5
md5: 30b4d59467717a9db02aef2dc16a3ff5
@@ -291,20 +293,20 @@ stages:
outs:
- path: output/energy_eval_results.csv
hash: md5
md5: 51ccd4ba00e7c3c7f9fcde93e2095037
size: 4826
md5: 28a62f592b9a3f1f24dcae579de45e8f
size: 4768
- path: output/energy_metrics.json
hash: md5
md5: 7ef50c23f2abc167c17d15ba6cee5002
size: 180
md5: ae2bf7575652ca03bcc7d51420580213
size: 181
- path: output/energy_pinball_losses.png
hash: md5
md5: 9352193e2e8a015b2bda9545b951346d
size: 14698
md5: 228cc9e24fde90a4f586a8ed3d4eea9e
size: 13423
- path: output/eval_plots/energy/
hash: md5
md5: f96a28c200f6bd1cdd999bbac202f432.dir
size: 430647
md5: 6f406e5b7ef1f7794a0812d817c3d2d4.dir
size: 369507
nfiles: 3
eval@bikes:
cmd: python src/probafcst/pipeline/evaluate.py --target bikes
@@ -315,8 +317,8 @@ stages:
size: 63848
- path: models/bikes_model.pkl
hash: md5
md5: 5a18f0021519963b0e0a5666e915e982
size: 2007180
md5: f072e90794c21a2066cd284dea8e6ff8
size: 113630
- path: src/probafcst//backtest.py
hash: md5
md5: 30b4d59467717a9db02aef2dc16a3ff5
@@ -345,18 +347,18 @@ stages:
outs:
- path: output/bikes_eval_results.csv
hash: md5
md5: 09e1b262562c73716c096dfcd0c1fce7
size: 19368
md5: fee8a6e27b155999c721a50ad07497cb
size: 19482
- path: output/bikes_metrics.json
hash: md5
md5: 5bd84f0c97eeaff71318d72810a4c736
size: 182
md5: 66958376e69a5f37010da45b502e4fc2
size: 183
- path: output/bikes_pinball_losses.png
hash: md5
md5: 19ced770898caac13014027764ef69e9
size: 21920
md5: 34067e344f4e2b98105e6ae80ea0d746
size: 19683
- path: output/eval_plots/bikes/
hash: md5
md5: f25902814a56909f3be4f081b57e1264.dir
size: 227088
md5: 750cdffd89426e605df50ebced92dd24.dir
size: 205991
nfiles: 3
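
Each file entry in dvc.lock above pairs an md5 digest with a size in bytes, so a changed pair signals that the tracked file changed. As a rough illustration only, the sketch below computes such a pair for a single file, assuming a plain MD5 over the raw bytes. `file_fingerprint` is a hypothetical helper, not a DVC function, and DVC's own hashing (in particular the `.dir` digests for directories) may differ in detail.

```python
import hashlib
import os
import tempfile


def file_fingerprint(path, chunk_size=1 << 20):
    """Return (md5 hex digest, size in bytes) for one file, read in chunks.

    Hypothetical helper for illustration; not part of DVC.
    """
    digest = hashlib.md5()
    size = 0
    with open(path, "rb") as fh:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
            size += len(chunk)
    return digest.hexdigest(), size


# Demo on a throwaway file rather than a real pipeline artifact.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
    demo_path = tmp.name

demo_md5, demo_size = file_fingerprint(demo_path)
os.remove(demo_path)
```

Reading in chunks keeps memory use flat for large artifacts such as the model pickles listed above.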
6 changes: 4 additions & 2 deletions params.yaml
@@ -29,7 +29,8 @@ data:

train:
bikes:
selected: xgboost
selected: quantreg
cutoff: '2021-11-14' # use only data from this date onwards for training the final model
benchmark:
n_weeks: 125
quantreg:
@@ -43,7 +44,8 @@ train:


energy:
selected: xgboost
selected: quantreg
cutoff: '2021-11-14'
benchmark:
n_weeks: 75
quantreg:
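
The cutoff values above are quoted, so a YAML loader would normally hand them over as plain strings rather than date objects. A minimal sketch of validating such a value before it is used follows. `parse_cutoff` is a hypothetical helper, and the dict is a hand-written stand-in for the loaded `train.bikes` section, not the project's actual params object.

```python
from datetime import date


def parse_cutoff(train_params):
    """Return the cutoff as a date, or None when no cutoff is configured.

    Hypothetical helper; raises ValueError on a malformed date string.
    """
    raw = train_params.get("cutoff")
    if raw is None:
        return None
    return date.fromisoformat(raw)


# Stand-in for the loaded train.bikes section of params.yaml.
bikes_params = {"selected": "quantreg", "cutoff": "2021-11-14"}
cutoff = parse_cutoff(bikes_params)
```

Failing early on a malformed date is usually cheaper than discovering it as an empty training set later in the pipeline.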
4 changes: 3 additions & 1 deletion src/probafcst/pipeline/train.py
@@ -28,7 +28,9 @@ def train(target):
y = pd.read_parquet(data_path).asfreq(params.data[target].freq)

forecaster = get_model(params=params.train[target], quantiles=params.quantiles)
forecaster.fit(y)

# Fit only on observations from the cutoff date onwards (see params.yaml)
y_subset = y.loc[params.train[target].cutoff :]
forecaster.fit(y_subset)

# Save the model
model_dir = Path(params.model_dir)
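
With a DatetimeIndex, label-based slicing through `.loc` includes both endpoints, and the side of the colon on which the cutoff appears decides which part of the series is kept. A minimal sketch on toy data (the frame and column name are made up for illustration, not taken from the project):

```python
import pandas as pd

# Five daily observations around the cutoff date.
idx = pd.date_range("2021-11-12", periods=5, freq="D")
y = pd.DataFrame({"value": range(5)}, index=idx)
cutoff = "2021-11-14"

up_to_cutoff = y.loc[:cutoff]  # rows dated on or before the cutoff
from_cutoff = y.loc[cutoff:]   # rows dated on or after the cutoff
```

Both slices contain the cutoff day itself, so it is worth checking that the chosen direction matches the intent stated in params.yaml.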