LI and FCI readers do not work with dask distributed scheduler #2815
Comments
Both readers use satpy/satpy/readers/fci_l1c_nc.py Lines 470 to 472 in 367016e
and satpy/satpy/readers/li_base_nc.py Lines 339 to 344 in 367016e
but the FCI code does not use the index map (the code in question is never reached). The LI use case for |
If I change satpy/satpy/readers/netcdf_utils.py Lines 331 to 333 in 12854de
then the code completes successfully (but no longer reads lazily). So the problem may lie not in the file handler but in the dask graphs themselves, which would be unpicklable because they contain open NetCDF variables.
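One way to test this suspicion is to try pickling each entry of the graph separately and see which ones fail. The following is only a minimal sketch: `find_unpicklable` is a hypothetical helper (not part of dask or satpy), the graph is assumed to be a plain mapping from keys to tasks, and a `threading.Lock` stands in for an open NetCDF variable.

```python
import pickle
import threading


def find_unpicklable(graph):
    """Return {key: error description} for graph entries that pickle rejects.

    `graph` is assumed to be a plain mapping of key -> task (hypothetical
    helper for illustration only).
    """
    failures = {}
    for key, task in graph.items():
        try:
            pickle.dumps(task)
        except Exception as err:  # pickle raises TypeError or PicklingError
            failures[key] = f"{type(err).__name__}: {err}"
    return failures


if __name__ == "__main__":
    # Toy graph: the lock is a stand-in for an object that cannot be pickled.
    toy_graph = {
        "ok": (sum, [1, 2, 3]),
        "bad": (id, threading.Lock()),
    }
    for key, reason in find_unpicklable(toy_graph).items():
        print(key, "->", reason)
```

For a real dask collection, something like `find_unpicklable(dict(arr.__dask_graph__()))` might be a starting point, assuming the graph entries can be handed to `pickle.dumps` directly. Note that distributed uses its own serialization layer, so a failure here is a hint rather than proof.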
But unpicklable dask objects are still computable:

    import netCDF4
    import dask.array as da
    from dask.distributed import Client, LocalCluster


    def main():
        cluster = LocalCluster()
        # Creating the client registers the cluster as the default scheduler.
        client = Client(cluster)
        nc = netCDF4.Dataset("/media/nas/x21308/MTG_test_data/LI/x/W_XX-EUMETSAT-Darmstadt,IMG+SAT,MTI1+LI-2-AF--FD--CHK-BODY---NC4E_C_EUMT_20240613045114_L2PF_OPE_20240613045030_20240613045100_N__T_0030_0002.nc")
        nc["x"]
        # Wrap the open netCDF4 variable directly in a dask array.
        v = da.from_array(nc["x"])
        print(v.compute())


    if __name__ == "__main__":
        main()

so the presence of this variable by itself shouldn't prevent the dask graph from being computed.
dask distributed is supposed to contain a custom pickler class to handle exactly such cases, and/or use cloudpickle, which can handle objects that are not otherwise picklable. This pull request explicitly refers to the data variable case (from h5netcdf, but probably the same for NetCDF4).
NB: The satpy/satpy/readers/netcdf_utils.py Lines 72 to 78 in 367016e
|
Maybe we should use |
Describe the bug
When using the dask distributed `LocalCluster`, computing LI or FCI data gives corrupted data. Attempting to save the datasets to disk fails with several exceptions, with the root cause being that a `_netCDF4.Variable` is not picklable. This might affect other readers as well.
To Reproduce
For FCI:
And for LI accumulated products:
Expected behavior
I expect that the print statements result in the correct value, namely the same value as when I comment out the cluster usage. Furthermore, I expect both scripts to produce a simple image.
Actual results
With the distributed scheduler for FCI:
If I comment out the `save_datasets` call, the code completes with:
The value is wrong (it shouldn't even be a float). For reference, when I use the regular scheduler, I get the (probably correct) value of 1025 (an integer).
If I load calibrated data, all data are NaN.
I get similar problems with LI, but not with IASI L2 CDR. All three use file handlers deriving from `NetCDF4FsspecFileHandler`, but only FCI and LI use `cache_handle=True`, which seems to trigger the problem. I did not try MWS (which also uses `cache_handle=True`).
Environment Info:
Additional context
I ran into this problem when working on #2686.
It would seem that references to the file handler object end up in the dask graph, which should probably be avoided. They fail to be pickled, because they contain references to NetCDF4 objects.
In #1546, a similar problem was solved for VIIRS Compact.
I don't know yet if the hypothesis is correct and if so, how those references end up in the dask graph.
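To make the hypothesis concrete, here is a plain-Python sketch of the general pattern, with no satpy or netCDF4 involved. It assumes an open file object as a stand-in for a cached NetCDF4 handle. `EagerHandle`, `LazyHandle`, and `is_picklable` are hypothetical names for illustration and not how satpy implements its file handlers. An object holding an open handle cannot be pickled, whereas one that stores only the path and reopens on demand can.

```python
import os
import pickle
import tempfile


class EagerHandle:
    """Keeps an open file object as an attribute, so pickling fails."""

    def __init__(self, path):
        self.path = path
        self.fh = open(path, "rb")

    def close(self):
        self.fh.close()


class LazyHandle:
    """Stores only the path; opens on demand and drops the handle on pickle."""

    def __init__(self, path):
        self.path = path
        self._fh = None

    def read(self):
        if self._fh is None:
            self._fh = open(self.path, "rb")
        self._fh.seek(0)
        return self._fh.read()

    def close(self):
        if self._fh is not None:
            self._fh.close()
            self._fh = None

    def __getstate__(self):
        # Ship only the path to workers; each copy reopens the file itself.
        state = self.__dict__.copy()
        state["_fh"] = None
        return state


def is_picklable(obj):
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
    except Exception:
        return False
    return True


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    os.write(fd, b"payload")
    os.close(fd)
    eager, lazy = EagerHandle(path), LazyHandle(path)
    lazy.read()  # opens the file, so lazy now also holds a live handle
    print("eager picklable:", is_picklable(eager))
    print("lazy picklable:", is_picklable(lazy))
    eager.close()
    lazy.close()
    os.remove(path)
```

If the hypothesis holds, anything reachable from the dask graph would need to behave like the second class. The cost is that each worker reopens the file rather than sharing a cached handle.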