
Added MASE metric and y_train parameter to objectives #4221

Merged: 45 commits merged from 4217_add_mase into main on Jul 18, 2023

Conversation

remyogasawara (Collaborator)

Resolves #4217

codecov bot commented Jun 30, 2023

Codecov Report

Merging #4221 (c6f52e4) into main (4d20d58) will increase coverage by 0.1%.
The diff coverage is 100.0%.

@@           Coverage Diff           @@
##            main   #4221     +/-   ##
=======================================
+ Coverage   99.7%   99.7%   +0.1%     
=======================================
  Files        349     349             
  Lines      38281   38309     +28     
=======================================
+ Hits       38162   38190     +28     
  Misses       119     119             
| Impacted Files | Coverage Δ |
|---|---|
| evalml/objectives/__init__.py | 100.0% <ø> (ø) |
| evalml/tests/data_checks_tests/test_data_checks.py | 100.0% <ø> (ø) |
| evalml/objectives/cost_benefit_matrix.py | 100.0% <100.0%> (ø) |
| evalml/objectives/fraud_cost.py | 100.0% <100.0%> (ø) |
| evalml/objectives/lead_scoring.py | 100.0% <100.0%> (ø) |
| evalml/objectives/objective_base.py | 100.0% <100.0%> (ø) |
| evalml/objectives/standard_metrics.py | 100.0% <100.0%> (ø) |
| evalml/pipelines/pipeline_base.py | 98.5% <100.0%> (ø) |
| ...ata_checks_tests/test_invalid_target_data_check.py | 100.0% <100.0%> (ø) |
| ...alml/tests/objective_tests/test_fraud_detection.py | 100.0% <100.0%> (ø) |

... and 2 more


def objective_function(self, y_true, y_predicted, X=None, sample_weight=None):
    """Objective function for mean absolute scaled error for time series regression."""
    y_train = y_true
@remyogasawara (Collaborator, Author) commented Jul 5, 2023:

MASE takes training data as a parameter. Since we don't currently pass training data through to objectives, we just used the actual data (y_true) in place of the training data for now, but I wanted to see what others think.
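
For context, a minimal sketch of the non-seasonal MASE computation (illustrative only; the PR delegates to sktime rather than implementing this by hand). The denominator is built from y_train, which is why substituting y_true changes the score:

```python
import numpy as np

def mase_sketch(y_true, y_pred, y_train):
    """Illustrative MASE: forecast MAE scaled by the in-sample MAE of a
    one-step naive forecast computed on the training series."""
    y_true, y_pred, y_train = map(np.asarray, (y_true, y_pred, y_train))
    mae_forecast = np.mean(np.abs(y_true - y_pred))
    mae_naive = np.mean(np.abs(np.diff(y_train)))  # |y_t - y_{t-1}| over y_train
    return mae_forecast / mae_naive
```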

Contributor:

This brings up a very interesting question. A quick test shows that the results do change based on what you pass in as the y_train argument, meaning the scores we report with this substitution are going to be less correct.

Looking through where we score pipelines, I don't think it would be too much extra lift to add the ability to pass y_train through. We already have access to it in the upper-level time series pipeline scoring functions, since it's pretty common for time series metrics in general to require the historical data.

I'm curious what others think though - should we update our objective functions to have access to y_train, or should we keep it as is like this?
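
A rough sketch of the kind of pass-through being proposed (the class and parameter names here are hypothetical, not necessarily what was ultimately merged):

```python
class TimeSeriesObjectiveSketch:
    """Hypothetical objective that needs historical data for scoring."""

    def objective_function(self, y_true, y_predicted, y_train=None, X=None, sample_weight=None):
        if y_train is None:
            raise ValueError("This objective requires y_train (the historical target).")
        # ... compute the metric, using y_train for scaling ...
        return 0.0

    def score(self, y_true, y_predicted, y_train=None, X=None):
        # The time series pipeline's scoring methods already hold the
        # historical target, so they would simply forward it here.
        return self.objective_function(y_true, y_predicted, y_train=y_train, X=X)
```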

Collaborator:

I think we should update it to support MASE properly, but let's work on it in a follow-up issue. @remyogasawara can you file the follow-up in EvalML, basically stating the requirements that @eccabay wrote above?

@eccabay (Contributor) left a comment:

This metric is a tricky one - I think there are a few details we need to discuss before it can be merged in! Curious for other team members' input on this as well.

if isinstance(y_train, pd.Series):
    y_train = y_train.to_numpy()
mase = MeanAbsoluteScaledError()
return mase(y_true, y_predicted, y_train=y_train) * 100
Contributor:

According to the documentation, another parameter we may want to consider leveraging is sp. With our Decomposer.determine_periodicity function (or, if the pipeline has a Decomposer, the decomposer's saved sp parameter), we should have the information necessary to pass it through. The question remains how, since we would need information from both X and y_train, but it could be done. From brief testing, adding the sp argument gives us lower values (i.e. "better" scores), and those scores would once again be more accurate, since we use sp during training.

I'm curious what other people think about the need to include sp as well.
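
As a hedged sketch of what wiring sp through might look like, assuming sktime's MeanAbsoluteScaledError is constructed with the seasonal period (how the sp value actually reaches the objective, whether from Decomposer.determine_periodicity or a fitted Decomposer's parameter, is the open question above; the function below is illustrative, not the merged code):

```python
import pandas as pd
from sktime.performance_metrics.forecasting import MeanAbsoluteScaledError

def seasonal_mase(y_true, y_predicted, y_train, sp=1):
    # With sp > 1, the error is scaled by the in-sample MAE of a seasonal
    # naive forecast rather than the one-step naive forecast.
    if isinstance(y_train, pd.Series):
        y_train = y_train.to_numpy()
    mase = MeanAbsoluteScaledError(sp=sp)
    return mase(y_true, y_predicted, y_train=y_train)
```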

Collaborator:

I agree we should support this! @remyogasawara can you file another issue for this so we can prioritize after decomp is added for multiseries?


@eccabay (Contributor) commented Jul 6, 2023:

One other small thing I forgot to request: can you add this (as well as SMAPE, which I forgot before 😅) to the API reference?

@remyogasawara marked this pull request as draft July 8, 2023 00:26
@remyogasawara changed the title from "Add MASE metric" to "Add MASE metric and y_train parameter to objectives" Jul 14, 2023
@remyogasawara changed the title from "Add MASE metric and y_train parameter to objectives" to "Added MASE metric and y_train parameter to objectives" Jul 14, 2023
@remyogasawara marked this pull request as ready for review July 15, 2023 02:15
@jeremyliweishih (Collaborator) left a comment:

LGTM good work!

@eccabay (Contributor) left a comment:

Looks good, great job being thorough with your changes! Just a few small nitpicks, but otherwise this looks ready to go

Comment on lines 1074 to 1076
if (isinstance(y_train, pd.DataFrame) and (y_train.values == 0).all()) or (
    isinstance(y_train, pd.Series) and (y_train == 0).all()
):
Contributor:

I don't think you need to split it up this way - y_train.values should work regardless of whether it's a dataframe or a series!
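
A small sketch of the simplification being suggested, assuming y_train is already a pandas DataFrame or Series at this point:

```python
import pandas as pd

def target_is_all_zero(y_train) -> bool:
    # .values exists on both DataFrame and Series, so one branch covers both.
    return bool((y_train.values == 0).all())
```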

@remyogasawara merged commit 09bd86e into main Jul 18, 2023
@remyogasawara deleted the 4217_add_mase branch July 18, 2023 19:54

Successfully merging this pull request may close these issues.

Time series regression: add MASE/MAPE/SMAPE as metrics
3 participants