-
-
Notifications
You must be signed in to change notification settings - Fork 487
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add estimate_sigma function to threshold features #395
base: main
Are you sure you want to change the base?
Conversation
Codecov Report
@@ Coverage Diff @@
## master #395 +/- ##
=========================================
- Coverage 84.52% 84.43% -0.1%
=========================================
Files 22 22
Lines 3600 3616 +16
Branches 627 630 +3
=========================================
+ Hits 3043 3053 +10
- Misses 489 493 +4
- Partials 68 70 +2
Continue to review full report at Codecov.
|
|
||
if distribution.lower() == 'gaussian': | ||
# 75th quantile of the underlying, symmetric noise distribution | ||
# denom = scipy.stats.norm.ppf(0.75) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In addition to a string name, you could accept any object with a ppf
method here perhaps (assuming that works, didn't read the paper)? Just check with if hasattr(distribution, 'ppf')
.
The bigger issue here seems to be the hardcoding of the 75th quantile. Why is that not a keyword?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like both choices originate from specific suggestions by the author, I agree that the best solution would be to have any distribution or quantile as valid inputs, as that could prove useful. The issue I saw is the use of scipy.stats in the original code, because PyWavelets doesn't rely on scipy (right?) I'm not sure what the best thing to do there is, including scipy for just this is overkill, perhaps a couple hand coded options for now? Or for just Gaussian with a variable percent input, shouldn't be too hard to get going without scipy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Btw, nice suggestion for finding if a distribution supports a method, seeing the example it seems obvious now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed we don't want to rely on scipy
. That's why I suggested checking for a ppf
method; users can still pass in scipy.stats
distributions but we don't need to import from scipy anywhere so we don't rely on scipy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A concise explanation of the rationale for use of the 75th quantile in MAD is given on the following wikipedia page:
https://en.wikipedia.org/wiki/Median_absolute_deviation
Perhaps it is worth adding that to the References? I don't remember if it was spelled out explicitly in Donoho's paper
I also agree about not relying on scipy
here. scipy
was already a dependency of scikit-image
and a reviewer of my PR there preferred to call the ppf
method explicitly instead of hard-coding the value, but that doesn't mean we have to do the same here.
The idea to check for a ppf
method is nice.
Another approach to deal with other noise distributions such as Poisson noise is to first apply a variance stabilizing transform prior to calling this Gaussian-based MAD method. An example implementation of this type of approach for use with Rician noise is available here, but it is not under compatible license terms.
if distribution.lower() == 'gaussian': | ||
if hasattr(distribution, 'ppf'): | ||
if not kwargs: | ||
kwargs = {'q': 0.75} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I originally thought it would be good to auto-fill the arguments with 75th quantile in the case there are no args passed in to keep up the defaults, but this may be a bad idea in case someone wants to use a ppf method which takes no arguments (as this would result in passing in an unwanted keyword argument).
Hi @micha2718l. Over in #394, @rgommers suggested renaming to |
I agree that
|
@rgommers @grlee77 Hi All! Sorry for the multi-year delay on this, I had some time and was reminded of this work. Would someone with access please enable the CI workflow here to help with tests (I don't think it existed in this state when I first started on this)/maybe it doesn't need to be enabled, if not then let me know what the next steps need to be. |
Hi @micha2718l, thanks for getting back to this. I have just hit the button to have CI run again. Note that this is needed on each new push. To avoid the problem, please feel free to submit a separate PR with for example a typo fix or trivial rewording - I'd be happy to merge such a PR straight away, and once you have a commit in the default branch, GitHub will trigger CI on future pushes to this PR without needing approval. |
Hi, any ideas when it will be released in a stable version ? |
This PR adds a feature to estimate the noise variance of a signal. This is useful in many thresholding algorithms such as 'VisuShrink' and 'RiskShrink'. Below is an example using this function to find a good threshold value for a contrived data set with noise added. This shows how choosing a threshold too high or too low will not perform as well.
Inspired by Issue #394
There are remaining details to work out yet, I don't know what the most correct thing to do on line 319 of _thresholding.py, as we do not have scipy as a dependency.
I am sure there are other little things as well as a better test set and docs, and a final decision on the name of the function and where it will live.
Middle right figure shows the correct threshold applied, top is too high and bottom is too low.