Summary of NEXRAD reader issues #258

Closed · 7 tasks done
ghiggi opened this issue Jan 30, 2025 · 4 comments
Labels
bug Something isn't working

Comments

ghiggi commented Jan 30, 2025

With this issue I would like to list new and already reported bugs that affect the NEXRAD reader, together with an MCVE for each that may help to solve them.

Regarding NEXRAD files on the AWS S3 bucket (see the listing sketch after this list):

  • files ending with .Z appear to be corrupted and can't be opened.
  • files ending with .001 can be opened and appear to be copies of another file without the .001 extension, although they were uploaded at a later stage.
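
A minimal sketch for locating such files, assuming anonymous access to the public noaa-nexrad-level2 bucket (the date/station prefix and the glob patterns are illustrative):

import fsspec

# Anonymous access to the public NEXRAD Level 2 bucket
fs = fsspec.filesystem("s3", anon=True)

# Illustrative date/station prefix; adjust as needed
prefix = "noaa-nexrad-level2/2010/01/01/KABR"
z_files = fs.glob(f"{prefix}/*.Z")      # apparently corrupted files
dup_files = fs.glob(f"{prefix}/*.001")  # apparent duplicates of same-named files
print(z_files, dup_files)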

Problems


Bug related to Dask task tokenization

import os
import fsspec
import xradar as xd
import xarray as xr

local_tmp_dir = "/tmp"
fs_args = {}
_ = fs_args.setdefault("anon", True)
fs = fsspec.filesystem("s3", **fs_args)

s3_filepath = "noaa-nexrad-level2/2024/09/01/FOP1/FOP120240901_000347_V06"
local_filepath = os.path.join(local_tmp_dir, os.path.basename(s3_filepath))
fs.download(s3_filepath, local_tmp_dir)

ds = xr.open_dataset(local_filepath, group="sweep_0", engine="nexradlevel2")
dsc = ds.chunk()
# Load one variable - IMPORTANT: skipping this step makes the next line work
dsc["DBZH"].load()
# Load the entire dataset (fails after the per-variable load above)
dsc.load()
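
A diagnostic sketch to probe the tokenization hypothesis, assuming the failure stems from distinct lazy variables hashing to the same Dask token (VRADH is an assumed second variable name; any other variable in the dataset works):

from dask.base import tokenize

# If two distinct lazy variables produce the same token, Dask can
# deduplicate their tasks and load the wrong data.
token_dbzh = tokenize(dsc["DBZH"].data)
token_vradh = tokenize(dsc["VRADH"].data)  # VRADH: assumed variable name
print(token_dbzh == token_vradh)  # True would support the hypothesis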

[FIXED #261] Bug related to conflicting sizes for dimension

  • Previously unreported bug.
  • This error does not occur in the pyart reader!
  • In the MCVE below, we first decompress the file.
  • Error: ValueError: conflicting sizes for dimension 'azimuth': length 360 on 'azimuth' and length 358 on {'azimuth': 'DBZH', 'range': 'DBZH'}
import os
import fsspec
import xradar as xd
import subprocess

local_tmp_dir = "/tmp"
fs_args = {}
_ = fs_args.setdefault("anon", True)
fs = fsspec.filesystem("s3", **fs_args)

s3_filepath = "noaa-nexrad-level2/2010/01/01/KABR/KABR20100101_000618_V03.gz"
local_filepath = os.path.join(local_tmp_dir, os.path.basename(s3_filepath))
fs.download(s3_filepath, local_tmp_dir)
subprocess.run(["gzip", "-d", local_filepath], check=True)

dt = xd.io.open_nexradlevel2_datatree(local_filepath.replace(".gz", "")) 
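
A small independent check of the ray-count mismatch, using pyart's per-sweep ray bookkeeping (a sketch; it assumes pyart can open the same decompressed file):

import pyart

# A sweep holding 358 rays would explain the 360-vs-358 size conflict
radar = pyart.io.read_nexrad_archive(local_filepath.replace(".gz", ""))
print(radar.rays_per_sweep["data"])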

[FIXED #267] Bug related to reading MSG5 data

  • Previously unreported bug.
  • Should also solve NEXRADLevel2 Reader Error for Older Files #256.
  • This error does not occur in the pyart reader!
  • In the MCVE below, we first decompress the file.
  • Error: ValueError: unpack requires a buffer of 46 bytes
import os
import fsspec
import xradar as xd
import subprocess

local_tmp_dir = "/tmp"
fs_args = {}
_ = fs_args.setdefault("anon", True)
fs = fsspec.filesystem("s3", **fs_args)

s3_filepath = "noaa-nexrad-level2/1996/07/01/KFSX/KFSX19960701_044028.gz"
s3_filepath = "noaa-nexrad-level2/2002/01/01/KABX/KABX20020101_001308.gz"
s3_filepath = "noaa-nexrad-level2/2005/08/28/KLIX/KLIX20050828_180149.gz"
local_filepath = os.path.join(local_tmp_dir, os.path.basename(s3_filepath))
fs.download(s3_filepath, local_tmp_dir)
subprocess.run(["gzip", "-d", local_filepath], check=True)

dt = xd.io.open_nexradlevel2_datatree(local_filepath.replace(".gz", "")) 
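
For context, the error message comes from struct.unpack receiving a buffer shorter than the declared header layout; a minimal, self-contained illustration (the 46-byte format string is hypothetical, chosen only to reproduce the message):

import struct

fmt = ">23H"        # hypothetical 46-byte big-endian header layout
buf = b"\x00" * 40  # a record shorter than the declared header size
try:
    struct.unpack(fmt, buf)
except struct.error as err:
    print(err)  # unpack requires a buffer of 46 bytes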


[FIXED #266] Bug related to BZ2 data decompression

  • Previously unreported bug.
  • This error does not occur in the pyart reader!
  • It might be related to a wrong definition of the record number
  • Tracing: self.get_data_header() --> self.init_next_record() --> self.init_record(self.record_number + 1)
  • Error: OSError: Invalid data stream
import os
import fsspec
import xradar as xd

local_tmp_dir = "/tmp"
fs_args = {}
_ = fs_args.setdefault("anon", True)
fs = fsspec.filesystem("s3", **fs_args)

s3_filepath = "noaa-nexrad-level2/2021/12/20/KLIX/KLIX20211220_161656_V06"
s3_filepath = "noaa-nexrad-level2/2021/12/20/KLIX/KLIX20211220_160243_V06"
local_filepath = os.path.join(local_tmp_dir, os.path.basename(s3_filepath))
fs.download(s3_filepath, local_tmp_dir)

dt = xd.io.open_nexradlevel2_datatree(local_filepath)
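
A minimal sketch of the record layout involved, assuming the standard NEXRAD Level 2 archive structure (a 24-byte volume header followed by LDM records, each prefixed with a 4-byte big-endian size and BZ2-compressed); an off-by-one in the record number would make the reader decompress from the wrong offset and raise exactly this kind of error:

import bz2
import struct

with open(local_filepath, "rb") as f:
    f.read(24)  # skip the 24-byte volume header
    (size,) = struct.unpack(">i", f.read(4))  # record size; may be negative for the last record
    record = f.read(abs(size))
    data = bz2.decompress(record)  # raises OSError if the offset is wrong
    print(len(data))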

[FIXED #240] Reader performance

  • The xradar reader is at least 10 times slower than the pyart reader.
  • The xradar reader currently reopens the file for each sweep!
  • As a workaround for the xradar reader bugs, for my research work I developed a simple function converting pyart.Radar
    objects to an xradar xr.DataTree structure; the overhead of creating the xr.DataTree object is just 1-2 seconds
    (a sketch of such a converter follows the benchmark below).
    Would it make sense to add a to_datatree method to pyart.Radar objects?
import os
import time
import fsspec
import pyart
import xradar as xd

local_tmp_dir = "/tmp"
fs_args = {}
_ = fs_args.setdefault("anon", True)
fs = fsspec.filesystem("s3", **fs_args)

s3_filepath = "noaa-nexrad-level2/2024/09/01/FOP1/FOP120240901_000347_V06"
local_filepath = os.path.join(local_tmp_dir, os.path.basename(s3_filepath))
fs.download(s3_filepath, local_tmp_dir)

t_i = time.time()
dt = xd.io.open_nexradlevel2_datatree(local_filepath, chunks=None)
t_f = time.time()
print(t_f - t_i)  # > 30 s
 
t_i = time.time()
radar_obj = pyart.io.read_nexrad_archive(local_filepath)
t_f = time.time()
print(t_f - t_i) # 3-4 s
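
A minimal sketch of the pyart.Radar -> xr.DataTree conversion mentioned above (not the actual function used; coordinate and metadata handling is simplified, and xr.DataTree requires a recent xarray):

import xarray as xr

def pyart_to_datatree(radar):
    """Hypothetical converter: one xr.Dataset per sweep."""
    sweeps = {}
    for i in range(radar.nsweeps):
        sl = radar.get_slice(i)
        ds = xr.Dataset(
            {name: (("azimuth", "range"), field["data"][sl])
             for name, field in radar.fields.items()},
            coords={
                "azimuth": ("azimuth", radar.azimuth["data"][sl]),
                "elevation": ("azimuth", radar.elevation["data"][sl]),
                "time": ("azimuth", radar.time["data"][sl]),
                "range": ("range", radar.range["data"]),
            },
        )
        sweeps[f"sweep_{i}"] = ds
    return xr.DataTree.from_dict(sweeps)

dt_from_pyart = pyart_to_datatree(radar_obj)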

Different time/azimuth coordinate order compared to pyart

  • The 1-day shift in the time coordinate has been resolved in Bugs in NEXRAD L2 reader #180.
  • pyart enforces 'time' to be sorted, which means the first azimuth in the data array is the first one scanned.
  • xradar enforces the 'azimuth' coordinate values to be sorted, which means that the first azimuth of the data array
    is not necessarily the first azimuth that was scanned by the radar!
  • This MCVE just checks the agreement between the coordinates after taking the coordinate shift into account.
import numpy as np
import os
import fsspec
import pyart
import xarray as xr
import xradar as xd

# Retrieve example file
local_tmp_dir = "/tmp"
fs_args = {}
_ = fs_args.setdefault("anon", True)
fs = fsspec.filesystem("s3", **fs_args)

s3_filepath = "noaa-nexrad-level2/2024/09/01/FOP1/FOP120240901_000347_V06"
local_filepath = os.path.join(local_tmp_dir, os.path.basename(s3_filepath))
fs.download(s3_filepath, local_tmp_dir)

# Read with pyart and xradar
dt = xd.io.open_nexradlevel2_datatree(local_filepath, chunks=None)
radar_obj = pyart.io.read_nexrad_archive(local_filepath)

# Retrieve time (with second accuracy)
da_sweep_time_pyart = xr.DataArray(radar_obj.time["data"][radar_obj.get_slice(0)], dims="azimuth")
da_sweep_time_pyart.attrs["units"] = radar_obj.time["units"] 
ds_sweep_time_pyart = da_sweep_time_pyart.to_dataset(name="time")
ds_sweep_time_pyart = xr.decode_cf(ds_sweep_time_pyart, decode_times=True)
pyart_time = ds_sweep_time_pyart["time"].data.astype("M8[s]")
xradar_time = dt["sweep_0"]["time"].data.astype("M8[s]")

# Check min and max values are correct  
pyart_time_bounds = pyart_time.min(), pyart_time.max()
xradar_time_bounds = xradar_time.min(), xradar_time.max()
assert pyart_time_bounds == xradar_time_bounds  # WRONG

# Check time is monotonically increasing 
assert np.all(np.diff(pyart_time) > 0)  # OK
assert np.all(np.diff(xradar_time) > 0) # ERROR

# Identify the index where time "restarts" in xradar
idx_bad = np.argmin(np.diff(xradar_time))
print(xradar_time[0])
print(xradar_time[idx_bad: idx_bad + 5])

# Assert that the sorted times are the same (with second accuracy)
sorted_pyart_time = np.sort(pyart_time)
sorted_xradar_time = np.sort(xradar_time)
assert np.all((sorted_pyart_time - sorted_xradar_time).astype(int) == 0)
 
# Assert that the sorted azimuths are the same
xradar_azimuth = dt["sweep_0"]["azimuth"].data
pyart_azimuth = radar_obj.get_azimuth(0)
sorted_pyart_azimuth = np.sort(pyart_azimuth)
sorted_xradar_azimuth = np.sort(xradar_azimuth)
np.testing.assert_allclose(sorted_pyart_azimuth, sorted_xradar_azimuth)

# Identify the first scanned azimuth
idx_first_scan = np.argmin(xradar_time)
print(xradar_azimuth[idx_first_scan])
print(pyart_azimuth[0])
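
If pyart-like scan order is wanted, a possible post-hoc fix is to reorder the sweep by time (a sketch; whether xradar should do this by default is a separate design question):

# Reorder rays so the first azimuth is the first one scanned,
# matching pyart's time-sorted convention
ds_scan_order = dt["sweep_0"].to_dataset().sortby("time")
assert np.all(np.diff(ds_scan_order["time"].data.astype("M8[s]")) >= 0)
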
@kmuehlbauer (Collaborator)

Thanks @ghiggi for creating this tracking issue. Let's open dedicated issues for the different errors/bugs.

@aladinor (Member) commented Feb 4, 2025

Thanks @ghiggi for reporting this.

Reader performance
The xradar reader is at least 10 times slower than the pyart reader.
The xradar reader currently reopens the file for each sweep!
As a workaround for the xradar reader bugs, for my research work I developed a simple function converting pyart.Radar
objects to an xradar xr.DataTree structure; the overhead of creating the xr.DataTree object is just 1-2 seconds.
Would it make sense to add a to_datatree method to pyart.Radar objects?

I guess you are using xradar=0.8.0. In later PRs we handled this to avoid reopening the file for each sweep (#240). You might need to install that specific branch or wait until a new release.

@aladinor (Member) commented Feb 4, 2025

In this error,

Bug related to BZ2 data decompression
Error: OSError: Invalid data stream

I think this happens because pyart uses a function called prepare_for_read that allows reading different formats, including .gz files, and also supports data streaming; a minimal sketch of such a helper is shown below. I will create a new issue to handle this.
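
A minimal sketch of what a prepare_for_read-style helper does, based on magic-byte sniffing (the details of pyart's actual implementation may differ):

import bz2
import gzip

def prepare_for_read_sketch(filename):
    """Return a file-like object, transparently unwrapping gzip/bz2."""
    f = open(filename, "rb")
    magic = f.read(3)
    f.seek(0)
    if magic.startswith(b"\x1f\x8b"):  # gzip magic bytes
        return gzip.GzipFile(fileobj=f)
    if magic.startswith(b"BZh"):       # bz2 magic bytes
        return bz2.BZ2File(f)
    return f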

@kmuehlbauer (Collaborator)

Separate issues have been opened to track one issue at a time. I'm closing this now. Please open single issues; it's an additional maintenance burden to interlink and keep track of everything. Thanks!
