Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

std_score could give nan instead of ValueError if standard_deviation = 0? #272

Open
jagerber48 opened this issue Nov 17, 2024 · 4 comments

Comments

@jagerber48
Copy link
Contributor

Right now the std_score function raises a ValueError if it is called on an AffineScalarFunc with zero standard deviation. Should it return nan, possibly with a warning instead?

except ZeroDivisionError:

@newville
Copy link
Member

@jagerber48 I don't have a strong opinion.

I probably lean towards: NaNs are more painful as their effect is not noticed where it originates but in later code where it is used.

It might be reasonable for this function to drop try/except and just be

    return (value - self._nominal_value) / self.std_dev

I think aZeroDivisionError raised from this would be clearer than a ValueError, which would seem to imply that the value passed in for value was in error, say, value was a string or None, or a dict or something. FWIW, that would raise a ValueError, not explicitly caught here.

@jagerber48
Copy link
Contributor Author

@newville thanks for the feedback. I'm thinking about a case where there is an array of UFloat or something. There could be a chance that 1/1000 has a zero std_dev. In those cases it might be preferable to return nan instead of raising an exception (thus halting processing the 999/1000 "good" values. I know you said you don't have strong opinions, but does thinking about this case change your opinion at all?

I also don't know how prevalently this function is used. I've never used it. I just came across this detail as I was working on the linear combination refactor PR and posted here to track it.

@newville
Copy link
Member

@jagerber48 If value or self._nominal_value is an numpy array, and self.std_dev is a scaler 0, the result will be an ndarray of np.inf. If self.std_dev is an array and one element is 0, that element in the result with be np.inf. That seems OK, or at least "normal numpy behavior".

I would say that IMHO:
1. Inf is slightly better than NaN for divide-by-zero.
2. raising ZeroDivisionError is slightly better than returning Inf. But Numpy disagrees, and following them makes sense.
3. raising ValueError is kind of confusing, given the existence of ZeroDivisionError.

But also: not a strong preference. And I also never use this function ;).

@jagerber48
Copy link
Contributor Author

Taking your comments into review:

  • I agree inf is more appropriate than nan.
  • Yes, personally I'd tend to go the numpy direction and set inf rather than raise ZeroDivisionError
  • Yes, I also agree that it makes sense to raise the ZeroDivisionError instead of converting it to a ValueError.

I'll probably leave this discussion here for a while since there are higher priorities. Curious to hear others' thoughts in the meantime.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants