Masked data is returned as 0.0 after gridding, how can these pixels be identified if zeros exist in input data? #51

Closed
JSAnandEOS opened this issue Jul 4, 2019 · 7 comments


@JSAnandEOS

So I'm using the "conservative_normed" algorithm provided in the "masking" branch of xESMF to grid some MODIS GPP data to a lower spatial resolution (fine to coarse). Being a land-only product, the ocean pixels are invalid and so need to be masked. After masking and running xESMF on the data these regions now appear as zeroes, as expected.

My problem is that valid zero values also exist in the input data over regions with no vegetation (e.g. deserts). Therefore, in the resulting array I can't readily tell which pixels are invalid and which contain real data. How do I get around this? Is there any way to output a mask showing which pixels contain real data and which are invalid (i.e. no data was binned at all)? Thanks.

@JiaweiZhuang
Owner

the ocean pixels are invalid and so need to be masked.

in the resulting array I can't readily tell which pixels are invalid, and which contain real data. How do I get around this?

Do you mean that the input data (on the source grid) are all NaNs over the ocean region? In that case, the output data will also be NaNs over the ocean, by default. You don't need to apply additional masking. In many cases, "masking" just means "setting NaN to zeros" (#22 (comment)), which might not be what you actually want.

If your input data do not even cover the ocean region (i.e. a regional grid only over land), but the output grid is global, then the undefined ocean region will have zeros instead of NaNs, by default. To flip this behavior, see #15 (comment).

@JSAnandEOS
Author

Do you mean that the input data (on the source grid) are all NaNs over the ocean region?

In addition to the ocean, there are also certain areas where, for whatever reason (say, cloud cover), the data are invalid, so these regions have to be excluded from the gridding as well. I have currently set these to NaNs too. They are different from areas where the data are zero (e.g. deserts), because those values are still valid.

You don't need to apply additional masking. In many cases, "masking" just means "setting NaN to zeros" (#22 (comment)), which might not be what you actually want.

I had originally wanted to use conservative gridding with NaNs and zero values, but I encountered the same problem as #22, where large sections of coastal regions were missing in the final gridded dataset, despite having non-zero input data near those regions. The discussion about "conservative_normed" suggested that I needed to do both masking and setting unwanted areas to NaNs in order to deal with both coastal regions and areas with invalid data.
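
The coastal loss described above can be reproduced with a toy weight matrix. The sketch below is only an illustration: the weights and source values are made up, and the renormalisation is done by hand to show the idea of dividing by the valid fraction. It is not xESMF's own implementation of "conservative_normed".

```python
import numpy as np
import scipy.sparse as sp

# Hypothetical weights: 2 destination cells, 4 source cells.
# Destination 0 lies fully over land; destination 1 straddles the coast.
weights = sp.csr_matrix(np.array([
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
]))

# Made-up source values; the last cell is ocean and is marked NaN.
source = np.array([2.0, 4.0, 6.0, np.nan])

# Plain weighted sum: a single NaN source cell makes the whole
# coastal destination cell NaN.
plain = weights @ source

# Hand-rolled renormalisation: zero out invalid cells, then divide by
# the fraction of each destination cell covered by valid source cells.
valid = ~np.isnan(source)
filled = np.where(valid, source, 0.0)
coverage = weights @ valid.astype(float)
normed = (weights @ filled) / coverage

print(plain)
print(normed)
```

In this toy case the coastal cell is lost in `plain` but recovers the value of its one valid source cell in `normed`. A destination cell with zero coverage would still need separate handling.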

@JiaweiZhuang
Owner

If I understand correctly, then you need to

  1. Use "conservative_normed" with additional masks for NaN values when building the regridder, just as you are doing now.
  2. Then, after building the regridder, apply the trick from #15 (comment) ("Value of cells in the new grid that are outside the old grid's domain") so that "real zeros" and "mask-generated zeros" can be distinguished.

Does this produce what you expected?

@JSAnandEOS
Author

JSAnandEOS commented Jul 8, 2019

If I understand you correctly, the regridding should be done like so:

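import numpy as np
import scipy.sparse
import xesmf as xe


def add_matrix_NaNs(regridder):
    """Put a NaN weight in every output row that has no weights at all."""
    X = regridder.A
    M = scipy.sparse.csr_matrix(X)
    # Number of stored weights in each row (one row per output cell).
    num_nonzeros = np.diff(M.indptr)
    # An empty row means no input cell contributes; a NaN weight makes it NaN.
    M[num_nonzeros == 0, 0] = np.nan
    regridder.A = scipy.sparse.coo_matrix(M)
    return regridder


def regrid(ds_in, ds_out, dr_in, method='conservative_normed'):
    regridder = xe.Regridder(ds_in, ds_out, method, periodic=True, reuse_weights=False)
    regridder = add_matrix_NaNs(regridder)
    dr_out = regridder(dr_in)
    # Delete the weight file written to disk when the regridder was built.
    regridder.clean_weight_file()
    return dr_out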

Is this correct?

@JiaweiZhuang
Owner

Yes, this should mark undefined regions as NaNs while keeping real zeros untouched. However, it is a very niche edge case, so I am not entirely sure it is correct. Let me know if it works.
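
One way to check the behaviour without building a regridder is to try the same trick on a small hand-made matrix. This is a sketch under assumptions: the weights and source values are hypothetical, and the regridding step is modelled as a plain sparse matrix-vector product.

```python
import numpy as np
import scipy.sparse as sp

# Hypothetical weights: 3 destination cells, 2 source cells.
# Destination cell 1 receives no contribution at all (empty row).
weights = sp.csr_matrix(np.array([
    [1.0, 0.0],
    [0.0, 0.0],
    [0.0, 1.0],
]))

# Made-up source values; source[0] is a real, valid zero.
source = np.array([0.0, 5.0])

# Before the fix: the empty row and the real zero both come out as 0.0.
before = weights @ source

# The trick: store a NaN weight in column 0 of every empty row.
# (scipy may emit a SparseEfficiencyWarning here; it is harmless.)
empty_rows = np.flatnonzero(np.diff(weights.indptr) == 0)
patched = weights.copy()
patched[empty_rows, 0] = np.nan

# After the fix: only the empty row becomes NaN.
after = patched @ source

print(before)
print(after)
```

The real zero in destination cell 0 is unchanged, while the cell with no source coverage becomes NaN because NaN times any finite number is NaN.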

@JSAnandEOS
Author

I apologise for the late reply, but I am pleased to report that this solution works. Thanks!

@JiaweiZhuang
Owner

Great! Just note that 0.2.0 deprecates regridder.A in favor of regridder.weights (792e228)
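
For code that no longer touches regridder.A, the fix can be written as a function of the weight matrix alone. The sketch below is a hypothetical helper (the name `add_nan_rows` is not part of xESMF) and assumes the weights are available as a SciPy sparse matrix. It builds a new COO matrix rather than assigning into an existing one.

```python
import numpy as np
import scipy.sparse as sp


def add_nan_rows(weights):
    """Return a COO copy of `weights` with a NaN entry in every empty row.

    Hypothetical helper, not part of xESMF.
    """
    coo = sp.coo_matrix(weights)
    # Count stored entries per row (one row per destination cell).
    counts = np.bincount(coo.row, minlength=coo.shape[0])
    empty = np.flatnonzero(counts == 0)
    # Append one NaN weight in column 0 for each empty row.
    row = np.concatenate([coo.row, empty])
    col = np.concatenate([coo.col, np.zeros(empty.size, dtype=coo.col.dtype)])
    data = np.concatenate([coo.data, np.full(empty.size, np.nan)])
    return sp.coo_matrix((data, (row, col)), shape=coo.shape)


# Toy check: the second destination cell has no weights.
toy = sp.coo_matrix(np.array([[1.0, 0.0], [0.0, 0.0]]))
result = add_nan_rows(toy).tocsr() @ np.array([0.0, 1.0])
print(result)
```

With 0.2.0 this would presumably be applied as `regridder.weights = add_nan_rows(regridder.weights)`. That assignment is an assumption based on the comment above and has not been checked against other releases.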

I'd like to have a simpler option in the main branch to set different mask-handling behavior, so that users don't need this ad-hoc fix. But given the subtlety of masking, it probably requires more study. I don't have a clear timeline right now.
