Add resample distributions function for probability matching #390

Merged: 13 commits into master on Jul 22, 2024

Conversation

@RubenImhoff (Contributor) commented Jul 18, 2024

First attempt to add a resample distributions function for probability matching, see also issue #353.

  • Add resampling function to postprocessing/probmatching.py (see the sketch after this list).
  • Use the resampled distribution instead of the R_pm_blended variable in blending/steps.py.
  • If probmatching_method is cdf or mean in blending/steps.py, add resampling prior to using the probability matching as an optional functionality.
  • Add tests.
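
To illustrate the idea, here is a minimal, hypothetical sketch of rank-based probabilistic resampling (not necessarily the implementation merged into postprocessing/probmatching.py; it assumes both inputs have the same size):

    import numpy as np

    def resample_distributions(first_array, second_array, probability_first_array):
        """Combine two distributions by probabilistic resampling (sketch)."""
        n = first_array.size
        # Sort both distributions in descending order so that equal ranks
        # (e.g., the largest value of each array) line up.
        sorted_first = np.sort(first_array, axis=None)[::-1]
        sorted_second = np.sort(second_array, axis=None)[::-1]
        # For each rank, draw from the first distribution with the given
        # probability and from the second distribution otherwise.
        take_first = np.random.binomial(n=1, p=probability_first_array, size=n)
        resampled = np.where(take_first == 1, sorted_first, sorted_second)
        # Return the combined sample, again sorted in descending order.
        return np.sort(resampled)[::-1]

In the blending, probability_first_array would be the blending weight of the extrapolation component, so the resampled target distribution shifts progressively from the radar toward the NWP distribution with lead time.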

@RubenImhoff added the enhancement (New feature or request) label on Jul 18, 2024
@RubenImhoff requested review from dnerini and aitaten on July 18, 2024 14:37
codecov bot commented Jul 18, 2024

Codecov Report

Attention: Patch coverage is 98.33333% with 1 line in your changes missing coverage. Please review.

Project coverage is 83.81%. Comparing base (d77fe73) to head (a46e62a).
Report is 1 commit behind head on master.

Files | Patch % | Lines
pysteps/postprocessing/probmatching.py | 92.30% | 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #390      +/-   ##
==========================================
+ Coverage   83.67%   83.81%   +0.13%     
==========================================
  Files         159      160       +1     
  Lines       12649    12783     +134     
==========================================
+ Hits        10584    10714     +130     
- Misses       2065     2069       +4     
Flag | Coverage Δ
unit_tests | 83.81% <98.33%> (+0.13%) ⬆️


@dnerini (Member) left a comment

Looking good! Do we have any example result? Is this working?

(Two inline review comment threads on pysteps/blending/steps.py; outdated, resolved.)
@RubenImhoff (Contributor, Author)

Apologies for the ugly plots, but this was the quick and dirty way to test three versions:

This is a blended forecast up to 12 hours ahead (1 member) for 16 July 2024 in the Netherlands. The left column shows a forecast without the resampling and also without the smoothing as introduced by @sidekock yesterday. The middle column does both the smoothing and the resampling prior to the probability matching, and the right column is a check with only smoothing and no resampling. Overall, I would say that the smoothing already makes a difference on the edges of the radar domain (@sidekock gave a better example in his PR), but the resampling clearly results in less suppression of the extremes in the NWP (you kind of see a dissipation at the longer lead times in the left and right columns, but not in the middle one). I think this is what we are looking for, although one example is of course a bit limited.

[Image: resampling_test]

@RubenImhoff marked this pull request as ready for review on July 19, 2024 15:39
@RubenImhoff (Contributor, Author)

I think the PR is ready to go, but definitely feel free to give it a last thorough review. :)

@dnerini (Member) left a comment

I added some changes directly to the code and left a few comments that I think would be better to clarify before we merge.

(Two inline review comment threads on pysteps/postprocessing/probmatching.py; outdated, resolved.)
@RubenImhoff (Contributor, Author)

Looks good like this to me, @dnerini, thanks for the help!

@dnerini merged commit d577a98 into master on Jul 22, 2024 (10 checks passed)
@dnerini deleted the resample_distributions_for_prob_matching branch on July 22, 2024 11:51
@dnerini (Member) commented Jul 22, 2024

@ladc @RubenImhoff @sidekock
The effect of this new approach seems to be fairly strong; see, for example, the example in the gallery:

[Images: gallery example output, v1.10 (left) vs. latest (right)]

@aitaten (Contributor) commented Jul 22, 2024

Isn't that what we were looking for? I think the problem in the left example (60, 90 minutes) is that the mismatched precipitation brings in lower precipitation values than the actual distributions of either the nowcast or the NWP, and this resampling guarantees a similar distribution. In the right example, the values look similar at all lead times, whereas in the left one it is clear that the smoothing (loss of high values) happens at the middle lead times, where I assume the weight is closer to 0.5. Just let me know what you think 😄

@dnerini (Member) commented Jul 22, 2024

Totally agree with you; I just wanted to make it clear that this has a significant impact on the output.

@RubenImhoff (Contributor, Author)

Thanks for sharing, @dnerini! Based on the input radar data (see below), I'd agree that the changes seem to improve the distribution. Nevertheless, it would be nice to see some more examples at some point (I'm sure some will come up from both KNMI and RMI soon).
[Image: input radar data]

@ladc (Contributor) commented Jul 22, 2024

Yes, I agree that this is actually better, since you don't lose the extremes as much. Thanks for the examples. It might also fix a problem I had with the control member, but I still need to run some tests; sorry for the delay.

@ladc (Contributor) commented Jul 24, 2024

Applying it to the sample dataset for the hackathon, I get a strange result, but maybe I'm using a bad combination of parameter settings. Maybe because the NWP intensities are so high, the nowcast intensity increases immediately at timestep 1 (ensemble member 0 and the mean are the images on the top right and bottom left, respectively). Any idea what's going on?
[Image: control_ens_mean_obs]

Top left is the "control run", by the way.

@RubenImhoff (Contributor, Author) commented Jul 24, 2024

Hi @ladc,
Thanks for sharing this example! Did you also turn on the smoothing by @sidekock? I think the top right and bottom left do, in principle, look quite good, but I agree that it is weird to have such an intensity jump. It almost gives the feeling that the weight for NWP compared to the nowcast is too high, but that should be handled correctly, I think.

Something else that could play a role is the following code in blending/steps.py:

  if probmatching_method is not None and resample_distribution:
      # Deal with missing values: fill nans in the NWP field with its
      # minimum, and fill nans in the extrapolation field (outside the
      # radar domain) with the NWP values.
      arr1 = R_pm_ep[t_index]
      arr2 = precip_models_pm_temp[j]
      arr2 = np.where(np.isnan(arr2), np.nanmin(arr2), arr2)
      arr1 = np.where(np.isnan(arr1), arr2, arr1)
      # Resample the target distribution, using the blending weight of
      # cascade level 2 as the probability of drawing from the
      # extrapolation component.
      R_pm_resampled = probmatching.resample_distributions(
          first_array=arr1,
          second_array=arr2,
          probability_first_array=weights_pm_normalized[0],
      )

We fill the 'outside radar domain' extrapolation values with the NWP values so that there are no nans (which I initially thought was a good idea). But maybe that skews the distribution in the wrong direction? So maybe we should change that back to actually allowing nans in the extrapolation component?
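
For illustration, a nan-aware variant could look like the following (an untested sketch, assuming that only the distribution of R_pm_resampled matters downstream, so the invalid pixels can simply be excluded; variable names follow the snippet above):

    if probmatching_method is not None and resample_distribution:
        arr1 = R_pm_ep[t_index]
        arr2 = precip_models_pm_temp[j]
        arr2 = np.where(np.isnan(arr2), np.nanmin(arr2), arr2)
        # Keep only the pixels where the extrapolation component is valid,
        # so that the target distribution is not pushed up by NWP values
        # from outside the radar domain.
        valid = np.isfinite(arr1)
        R_pm_resampled = probmatching.resample_distributions(
            first_array=arr1[valid],
            second_array=arr2[valid],
            probability_first_array=weights_pm_normalized[0],
        )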

@RubenImhoff (Contributor, Author)

@ladc, would the above be worth testing for your case?

@ladc (Contributor) commented Jul 24, 2024

Thanks @RubenImhoff. I didn't turn on the smoothing; this was on purpose.
The main issue for me is indeed the intensity jump. Could this be because the NWP has very high intensities, but as these are not present in the blended forecast, the latter is unrealistically scaled up overall?
Do you suggest not filling anything outside the radar mask (so that we have nans in the final forecast)? Or just for the resampling?

@RubenImhoff (Contributor, Author)

I think just the resampling for now. This would at least keep the radar distribution lower in a case like this (because you push the distribution up by filling the nans with the values of the NWP forecast outside the radar domain).

@aitaten (Contributor) commented Jul 24, 2024

Hi! Sorry to jump into this discussion late. I have a small question: how does it look before doing the resampling/cdf matching? I am not sure whether the problem is coming from the resampling or from the merging of the cascades with different weights in different areas. @ladc, I did run your example data; maybe you could update or send me the snippet for recreating the figure so I can have a more detailed look. :)

@ladc (Contributor) commented Jul 24, 2024

Hi @aitaten, you can find the plot_nwc.py and plot_radar.py scripts here (apologies for the messy code):
https://github.com/ladc/rmi-scripts

@ladc (Contributor) commented Jul 27, 2024

I tried not filling the nan values (or rather, not including them in the probability matching), but I was a bit rushed, so it's possible that the code contains errors:

[Image: control_ens_mean_obs]

Warning: I will be fully offline in the coming week and still on vacation until 14 August.

The code is in my branch mentioned in #370.

@aitaten (Contributor) commented Jul 29, 2024

For me it is not working either... I had to change a couple of things in the merged code (#370) because noise = None was throwing an error and not creating any output. After making sure the noise was 0, I obtained (as expected) the smoothing of the unpredictable scales. I am not sure how you manage it without losing the small scales, @ladc 😞. Here is my example:
[Image: forecast_202107041600]

@RubenImhoff (Contributor, Author)

@ladc, your second attempt already looks better to me! Should I change the code back to accepting nans outside the radar domain? Then we're good to go, I hope.

@RubenImhoff (Contributor, Author)

And given @ladc's vacation, what do the others think?

@dnerini (Member) commented Jul 29, 2024

> And given @ladc's vacation, what do the others think?

One remaining issue might be that using the weights from cascade level 2 gives too much credit to NWP already at the first lead time, while you would expect a more progressive transition from the radar distribution...

@RubenImhoff (Contributor, Author)

Based on the figure we made for the QJRMS paper, that should not be too much of an issue at that scale level:

[Images: blending-weight figures from the QJRMS paper]

(the lead time is in hours, ranging from 0 to 12 h)

@RubenImhoff (Contributor, Author)

@dnerini, any idea how to fix this nicely? Right now, the nans outside the radar domain are filled with the NWP values, which changes the distribution of the extrapolation cascade data. We could go back to our earlier version where nans were allowed, or maybe we can think of a more elegant solution that handles the nans outside the radar domain without changing the distribution?

@dnerini (Member) commented Aug 2, 2024

I wonder if we could simply "blend in" the NWP values outside the radar domain, which would also improve the seamlessness at t0. In practice, we could try replacing the radar mask with zeros and see if that is enough. Alternatively, one could treat the part outside the radar domain separately and use a simple blending function to introduce the NWP values. Both approaches should avoid the jump at t0 while the resampling approach stays the same. What do you think?
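
As a sketch of the second idea (the function name, arguments, and the linear ramp are illustrative assumptions, not pysteps API):

    import numpy as np

    def fill_outside_radar_domain(precip_extrap, precip_nwp, radar_mask,
                                  lead_time, ramp_steps=6):
        """Blend NWP values into the area outside the radar domain,
        starting from zero at t0 to avoid a jump (hypothetical sketch)."""
        # Linear blending weight: 0 at lead time 0, 1 from ramp_steps onward.
        w = min(lead_time / ramp_steps, 1.0)
        # Inside the radar domain (radar_mask is True) keep the extrapolation
        # values; outside, introduce the weighted NWP values instead of nans.
        return np.where(radar_mask, precip_extrap, w * precip_nwp)

The first suggestion (replacing the radar mask with zeros) corresponds to the w = 0 end of this ramp.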

@RubenImhoff (Contributor, Author)

That would be worth a try. Would we introduce any statistical problems in the resampling if we insert zeros (or np.nanmin(precip_radar)) in the extrapolation cascade outside the radar domain? If not, that might be an easy solution while keeping the code nearly the same.
