Add VolcanicAsh product to VIIRS EDR reader #3050
Conversation
Codecov Report

All modified and coverable lines are covered by tests ✅

```
@@ Coverage Diff @@
##             main    #3050   +/-   ##
=======================================
  Coverage   96.11%   96.11%
=======================================
  Files         383      383
  Lines       55673    55685    +12
=======================================
+ Hits        53511    53523    +12
  Misses       2162     2162
```
Pull Request Test Coverage Report for Build 13180056023

💛 - Coveralls
Could you provide a link to one of these files or provide the
I don't have the files at hand, so if I remember tomorrow, I'll put the file in Slack, for example. Now, what I remember of the files 🤔 Both the
I think it is treated like a mapping/dict, so order doesn't matter, but I do know that, CF-standard-wise, coordinate variables aren't allowed to have fill values. But this xarray error (I assume that's what's causing the failure?) seems to happen before that. I assume the variable of that name uses the dimension with that same name?
The error arises at

The dump:
Opening with XArray v2025.1.1: In [13]: xr.open_dataset(fname)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[13], line 1
----> 1 xr.open_dataset(fname)
File ~/mambaforge/envs/py312/lib/python3.12/site-packages/xarray/backends/api.py:679, in open_dataset(filename_or_obj, engine, chunks, cache, decode_cf, mask_and_scale, decode_times, decode_timedelta, use_cftime, concat_characters, decode_coords, drop_variables, inline_array, chunked_array_type, from_array_kwargs, backend_kwargs, **kwargs)
667 decoders = _resolve_decoders_kwargs(
668 decode_cf,
669 open_backend_dataset_parameters=backend.open_dataset_parameters,
(...)
675 decode_coords=decode_coords,
676 )
678 overwrite_encoded_chunks = kwargs.pop("overwrite_encoded_chunks", None)
--> 679 backend_ds = backend.open_dataset(
680 filename_or_obj,
681 drop_variables=drop_variables,
682 **decoders,
683 **kwargs,
684 )
685 ds = _dataset_from_backend_dataset(
686 backend_ds,
687 filename_or_obj,
(...)
697 **kwargs,
698 )
699 return ds
File ~/mambaforge/envs/py312/lib/python3.12/site-packages/xarray/backends/netCDF4_.py:681, in NetCDF4BackendEntrypoint.open_dataset(self, filename_or_obj, mask_and_scale, decode_times, concat_characters, decode_coords, drop_variables, use_cftime, decode_timedelta, group, mode, format, clobber, diskless, persist, auto_complex, lock, autoclose)
679 store_entrypoint = StoreBackendEntrypoint()
680 with close_on_error(store):
--> 681 ds = store_entrypoint.open_dataset(
682 store,
683 mask_and_scale=mask_and_scale,
684 decode_times=decode_times,
685 concat_characters=concat_characters,
686 decode_coords=decode_coords,
687 drop_variables=drop_variables,
688 use_cftime=use_cftime,
689 decode_timedelta=decode_timedelta,
690 )
691 return ds
File ~/mambaforge/envs/py312/lib/python3.12/site-packages/xarray/backends/store.py:59, in StoreBackendEntrypoint.open_dataset(self, filename_or_obj, mask_and_scale, decode_times, concat_characters, decode_coords, drop_variables, use_cftime, decode_timedelta)
45 encoding = filename_or_obj.get_encoding()
47 vars, attrs, coord_names = conventions.decode_cf_variables(
48 vars,
49 attrs,
(...)
56 decode_timedelta=decode_timedelta,
57 )
---> 59 ds = Dataset(vars, attrs=attrs)
60 ds = ds.set_coords(coord_names.intersection(vars))
61 ds.set_close(filename_or_obj.close)
File ~/mambaforge/envs/py312/lib/python3.12/site-packages/xarray/core/dataset.py:747, in Dataset.__init__(self, data_vars, coords, attrs)
744 if isinstance(coords, Dataset):
745 coords = coords._variables
--> 747 variables, coord_names, dims, indexes, _ = merge_data_and_coords(
748 data_vars, coords
749 )
751 self._attrs = dict(attrs) if attrs else None
752 self._close = None
File ~/mambaforge/envs/py312/lib/python3.12/site-packages/xarray/core/dataset.py:460, in merge_data_and_coords(data_vars, coords)
456 coords = create_coords_with_default_indexes(coords, data_vars)
458 # exclude coords from alignment (all variables in a Coordinates object should
459 # already be aligned together) and use coordinates' indexes to align data_vars
--> 460 return merge_core(
461 [data_vars, coords],
462 compat="broadcast_equals",
463 join="outer",
464 explicit_coords=tuple(coords),
465 indexes=coords.xindexes,
466 priority_arg=1,
467 skip_align_args=[1],
468 )
File ~/mambaforge/envs/py312/lib/python3.12/site-packages/xarray/core/merge.py:705, in merge_core(objects, compat, join, combine_attrs, priority_arg, explicit_coords, indexes, fill_value, skip_align_args)
700 prioritized = _get_priority_vars_and_indexes(aligned, priority_arg, compat=compat)
701 variables, out_indexes = merge_collected(
702 collected, prioritized, compat=compat, combine_attrs=combine_attrs
703 )
--> 705 dims = calculate_dimensions(variables)
707 coord_names, noncoord_names = determine_coords(coerced)
708 if compat == "minimal":
709 # coordinates may be dropped in merged results
File ~/mambaforge/envs/py312/lib/python3.12/site-packages/xarray/core/variable.py:3073, in calculate_dimensions(variables)
3071 for dim, size in zip(var.dims, var.shape, strict=True):
3072 if dim in scalar_vars:
-> 3073 raise ValueError(
3074 f"dimension {dim!r} already exists as a scalar variable"
3075 )
3076 if dim not in dims:
3077 dims[dim] = size
ValueError: dimension 'Det_QF_Size' already exists as a scalar variable

Opening with netCDF4:

In [9]: fname = "JRR-VolcanicAsh_v3r0_j01_s202501290934247_e202501290935492_c202501291707471.nc"
In [10]: with netCDF4.Dataset(fname, "r") as fid:
...: print(fid["Det_QF_Size"])
...:
<class 'netCDF4.Variable'>
int32 Det_QF_Size()
long_name: Detection QF Size
_FillValue: -999
units: 1
unlimited dimensions:
current shape = ()
filling on
Oh yeah, this file is poorly designed. We should maybe contact the creators. Especially since the global attributes suggest CF compliance. Maybe this doesn't go against CF because the variable with the same name as the dimension does not actually use that dimension, but come on. This is just confusing.
I tried running the file through a CF checker (a Python package; the web ones wouldn't let me upload a file that large) and it didn't complain. I've sent a message to someone at SSEC to see if they know who makes these files, to ask for a change. The drop solution seems to be the way to go for now.
This PR adds the VolcanicAsh product to the `viirs_edr` reader. The files I have (v3r0) can't be opened unless the `Det_QF_Size` variable is dropped; I think this is because the same name is also used for a dimension. To handle this, I added the possibility to drop variables defined in the reader YAML. The testing is a bit of a kludge, as the only way I could think of was to make `Det_QF_Size` a 2D variable instead of the scalar it is, and check that it is not in the available dataset listing. The error coming from XArray without `drop_variables` is `ValueError: dimension 'Det_QF_Size' already exists as a scalar variable`.
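For illustration, the YAML-driven drop described above might look roughly like the sketch below. Everything here is hypothetical: the `drop_variables` key name, its placement under the file type, the handler path, and the file pattern are assumptions for illustration, not the PR's actual schema.

```yaml
# Hypothetical sketch of a viirs_edr file-type entry with a variable drop;
# key names and the file pattern are illustrative only.
file_types:
  jrr_volcanicash:
    file_reader: !!python/name:satpy.readers.viirs_edr.VIIRSJRRFileHandler
    file_patterns:
      - 'JRR-VolcanicAsh_{version}_{platform_shortname}_s{start_time}_e{end_time}_c{creation_time}.nc'
    drop_variables:
      - Det_QF_Size
```

The idea is that the file handler passes the listed names through to `xarray.open_dataset(..., drop_variables=...)` so the problematic scalar never reaches the `Dataset` constructor.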