Summary of NEXRAD reader issues #258

Closed · 7 tasks done
ghiggi opened this issue Jan 30, 2025 · 4 comments
Labels
bug Something isn't working

Comments

ghiggi commented Jan 30, 2025

With this issue I would like to list new and already reported bugs that affect the NEXRAD reader, together with an MCVE for each that may help to solve them.

Regarding NEXRAD files on the AWS S3 bucket (see the listing sketch after this list):

  • files ending with .Z appear to be corrupted and can't be opened.
  • files ending with .001 can be opened and appear to be copies of another file without the .001 extension, although they were uploaded at a later stage.
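
A minimal sketch for locating such files, assuming anonymous access to the public noaa-nexrad-level2 bucket (the date/station prefix and the glob patterns are illustrative):

import fsspec

# Anonymous access to the public NEXRAD Level 2 bucket
fs = fsspec.filesystem("s3", anon=True)

# Illustrative date/station prefix; adjust as needed
prefix = "noaa-nexrad-level2/2010/01/01/KABR"
z_files = fs.glob(f"{prefix}/*.Z")      # apparently corrupted files
dup_files = fs.glob(f"{prefix}/*.001")  # apparent duplicates of same-named files
print(z_files, dup_files)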

Problems


Bug related to Dask task tokenization

import os
import fsspec
import xradar as xd
import xarray as xr

local_tmp_dir = "/tmp"
fs_args = {}
_ = fs_args.setdefault("anon", True)
fs = fsspec.filesystem("s3", **fs_args)

s3_filepath = "noaa-nexrad-level2/2024/09/01/FOP1/FOP120240901_000347_V06"
local_filepath = os.path.join(local_tmp_dir, os.path.basename(s3_filepath))
fs.download(s3_filepath, local_tmp_dir)

ds = xr.open_dataset(local_filepath, group="sweep_0", engine="nexradlevel2")
dsc = ds.chunk()
# Load one variable - IMPORTANT: skipping this step makes the next line work
dsc["DBZH"].load()
# Load the entire dataset (fails after the per-variable load above)
dsc.load()
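
A diagnostic sketch to probe the tokenization hypothesis, assuming the failure stems from distinct lazy variables hashing to the same Dask token (VRADH is an assumed second variable name; any other variable in the dataset works):

from dask.base import tokenize

# If two distinct lazy variables produce the same token, Dask can
# deduplicate their tasks and load the wrong data.
token_dbzh = tokenize(dsc["DBZH"].data)
token_vradh = tokenize(dsc["VRADH"].data)  # VRADH: assumed variable name
print(token_dbzh == token_vradh)  # True would support the hypothesis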

[FIXED #261] Bug related to conflicting sizes for dimension

  • Previously unreported bug.
  • This error does not occur in the pyart reader!
  • In the MCVE below, we first decompress the file.
  • Error: ValueError: conflicting sizes for dimension 'azimuth': length 360 on 'azimuth' and length 358 on {'azimuth': 'DBZH', 'range': 'DBZH'}
import os
import fsspec
import xradar as xd
import subprocess

local_tmp_dir = "/tmp"
fs_args = {}
_ = fs_args.setdefault("anon", True)
fs = fsspec.filesystem("s3", **fs_args)

s3_filepath = "noaa-nexrad-level2/2010/01/01/KABR/KABR20100101_000618_V03.gz"
local_filepath = os.path.join(local_tmp_dir, os.path.basename(s3_filepath))
fs.download(s3_filepath, local_tmp_dir)
subprocess.run(["gzip", "-d", local_filepath], check=True)

dt = xd.io.open_nexradlevel2_datatree(local_filepath.replace(".gz", "")) 
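
A small independent check of the ray-count mismatch, using pyart's per-sweep ray bookkeeping (a sketch; it assumes pyart can open the same decompressed file):

import pyart

# A sweep holding 358 rays would explain the 360-vs-358 size conflict
radar = pyart.io.read_nexrad_archive(local_filepath.replace(".gz", ""))
print(radar.rays_per_sweep["data"])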

[FIXED #267] Bug related to reading MSG5 data

  • Previously unreported bug.
  • Should also solve NEXRADLevel2 Reader Error for Older Files #256.
  • This error does not occur in the pyart reader!
  • In the MCVE below, we first decompress the file.
  • Error: ValueError: unpack requires a buffer of 46 bytes
import os
import fsspec
import xradar as xd
import subprocess

local_tmp_dir = "/tmp"
fs_args = {}
_ = fs_args.setdefault("anon", True)
fs = fsspec.filesystem("s3", **fs_args)

s3_filepath = "noaa-nexrad-level2/1996/07/01/KFSX/KFSX19960701_044028.gz"
s3_filepath = "noaa-nexrad-level2/2002/01/01/KABX/KABX20020101_001308.gz"
s3_filepath = "noaa-nexrad-level2/2005/08/28/KLIX/KLIX20050828_180149.gz"
local_filepath = os.path.join(local_tmp_dir, os.path.basename(s3_filepath))
fs.download(s3_filepath, local_tmp_dir)
subprocess.run(["gzip", "-d", local_filepath], check=True)

dt = xd.io.open_nexradlevel2_datatree(local_filepath.replace(".gz", "")) 
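
For context, the error message comes from struct.unpack receiving a buffer shorter than the declared header layout; a minimal, self-contained illustration (the 46-byte format string is hypothetical, chosen only to reproduce the message):

import struct

fmt = ">23H"        # hypothetical 46-byte big-endian header layout
buf = b"\x00" * 40  # a record shorter than the declared header size
try:
    struct.unpack(fmt, buf)
except struct.error as err:
    print(err)  # unpack requires a buffer of 46 bytes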


[FIXED #266] Bug related to BZ2 data decompression

  • Previously unreported bug.
  • This error does not occur in the pyart reader!
  • It might be related to a wrong definition of the record number
  • Tracing: self.get_data_header() --> self.init_next_record() --> self.init_record(self.record_number + 1)
  • Error: OSError: Invalid data stream
import os
import fsspec
import xradar as xd

local_tmp_dir = "/tmp"
fs_args = {}
_ = fs_args.setdefault("anon", True)
fs = fsspec.filesystem("s3", **fs_args)

s3_filepath = "noaa-nexrad-level2/2021/12/20/KLIX/KLIX20211220_161656_V06"
s3_filepath = "noaa-nexrad-level2/2021/12/20/KLIX/KLIX20211220_160243_V06"
local_filepath = os.path.join(local_tmp_dir, os.path.basename(s3_filepath))
fs.download(s3_filepath, local_tmp_dir)

dt = xd.io.open_nexradlevel2_datatree(local_filepath)
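
A minimal sketch of the record layout involved, assuming the standard NEXRAD Level 2 archive structure (a 24-byte volume header followed by LDM records, each prefixed with a 4-byte big-endian size and BZ2-compressed); an off-by-one in the record number would make the reader decompress from the wrong offset and raise exactly this kind of error:

import bz2
import struct

with open(local_filepath, "rb") as f:
    f.read(24)  # skip the 24-byte volume header
    (size,) = struct.unpack(">i", f.read(4))  # record size; may be negative for the last record
    record = f.read(abs(size))
    data = bz2.decompress(record)  # raises OSError if the offset is wrong
    print(len(data))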

[FIXED #240] Reader performance

  • The xradar reader is at least 10 times slower than the pyart reader.
  • The xradar reader currently reopens the file for each sweep!
  • As a workaround for the xradar reader bugs, for my research work I developed a simple function converting pyart.Radar
    objects to an xradar xr.DataTree structure; the overhead of creating the xr.DataTree object is just 1-2 seconds
    (a sketch of such a converter follows the benchmark below).
    Would it make sense to add a to_datatree method to pyart.Radar objects?
import os
import time
import fsspec
import pyart
import xradar as xd

local_tmp_dir = "/tmp"
fs_args = {}
_ = fs_args.setdefault("anon", True)
fs = fsspec.filesystem("s3", **fs_args)

s3_filepath = "noaa-nexrad-level2/2024/09/01/FOP1/FOP120240901_000347_V06"
local_filepath = os.path.join(local_tmp_dir, os.path.basename(s3_filepath))
fs.download(s3_filepath, local_tmp_dir)

t_i = time.time()
dt = xd.io.open_nexradlevel2_datatree(local_filepath, chunks=None)
t_f = time.time()
print(t_f - t_i)  # > 30 s
 
t_i = time.time()
radar_obj = pyart.io.read_nexrad_archive(local_filepath)
t_f = time.time()
print(t_f - t_i) # 3-4 s
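
A minimal sketch of the pyart.Radar -> xr.DataTree conversion mentioned above (not the actual function used; coordinate and metadata handling is simplified, and xr.DataTree requires a recent xarray):

import xarray as xr

def pyart_to_datatree(radar):
    """Hypothetical converter: one xr.Dataset per sweep."""
    sweeps = {}
    for i in range(radar.nsweeps):
        sl = radar.get_slice(i)
        ds = xr.Dataset(
            {name: (("azimuth", "range"), field["data"][sl])
             for name, field in radar.fields.items()},
            coords={
                "azimuth": ("azimuth", radar.azimuth["data"][sl]),
                "elevation": ("azimuth", radar.elevation["data"][sl]),
                "time": ("azimuth", radar.time["data"][sl]),
                "range": ("range", radar.range["data"]),
            },
        )
        sweeps[f"sweep_{i}"] = ds
    return xr.DataTree.from_dict(sweeps)

dt_from_pyart = pyart_to_datatree(radar_obj)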

Different time/azimuth coordinate order compared to pyart

  • The 1-day shift in the time coordinate has been resolved in Bugs in NEXRAD L2 reader #180.
  • pyart enforces 'time' to be sorted, which means the first azimuth in the data array is the first one scanned.
  • xradar enforces the 'azimuth' coordinate values to be sorted, which means that the first azimuth of the data array
    is not necessarily the first azimuth that was scanned by the radar!
  • This MCVE just checks the agreement between the coordinates after taking the coordinate shift into account.
import numpy as np
import os
import fsspec
import pyart
import xarray as xr
import xradar as xd

# Retrieve example file
local_tmp_dir = "/tmp"
fs_args = {}
_ = fs_args.setdefault("anon", True)
fs = fsspec.filesystem("s3", **fs_args)

s3_filepath = "noaa-nexrad-level2/2024/09/01/FOP1/FOP120240901_000347_V06"
local_filepath = os.path.join(local_tmp_dir, os.path.basename(s3_filepath))
fs.download(s3_filepath, local_tmp_dir)

# Read with pyart and xradar
dt = xd.io.open_nexradlevel2_datatree(local_filepath, chunks=None)
radar_obj = pyart.io.read_nexrad_archive(local_filepath)

# Retrieve time (with second accuracy)
da_sweep_time_pyart = xr.DataArray(radar_obj.time["data"][radar_obj.get_slice(0)], dims="azimuth")
da_sweep_time_pyart.attrs["units"] = radar_obj.time["units"] 
ds_sweep_time_pyart = da_sweep_time_pyart.to_dataset(name="time")
ds_sweep_time_pyart = xr.decode_cf(ds_sweep_time_pyart, decode_times=True)
pyart_time = ds_sweep_time_pyart["time"].data.astype("M8[s]")
xradar_time = dt["sweep_0"]["time"].data.astype("M8[s]")

# Check min and max values are correct  
pyart_time_bounds = pyart_time.min(), pyart_time.max()
xradar_time_bounds = xradar_time.min(), xradar_time.max()
assert pyart_time_bounds == xradar_time_bounds  # WRONG

# Check time is monotonically increasing 
assert np.all(np.diff(pyart_time) > 0)  # OK
assert np.all(np.diff(xradar_time) > 0) # ERROR

# Identify the index where time "restarts" in xradar
idx_bad = np.argmin(np.diff(xradar_time))
print(xradar_time[0])
print(xradar_time[idx_bad: idx_bad + 5])

# Assert that the sorted times are the same (with second accuracy)
sorted_pyart_time = np.sort(pyart_time)
sorted_xradar_time = np.sort(xradar_time)
assert np.all((sorted_pyart_time - sorted_xradar_time).astype(int) == 0)
 
# Assert that the sorted azimuths are the same
xradar_azimuth = dt["sweep_0"]["azimuth"].data
pyart_azimuth = radar_obj.get_azimuth(0)
sorted_pyart_azimuth = np.sort(pyart_azimuth)
sorted_xradar_azimuth = np.sort(xradar_azimuth)
np.testing.assert_allclose(sorted_pyart_azimuth, sorted_xradar_azimuth)

# Identify the first scanned azimuth
idx_first_scan = np.argmin(xradar_time)
print(xradar_azimuth[idx_first_scan])
print(pyart_azimuth[0])
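
If pyart-like scan order is wanted, a possible post-hoc fix is to reorder the sweep by time (a sketch; whether xradar should do this by default is a separate design question):

# Reorder rays so the first azimuth is the first one scanned,
# matching pyart's time-sorted convention
ds_scan_order = dt["sweep_0"].to_dataset().sortby("time")
assert np.all(np.diff(ds_scan_order["time"].data.astype("M8[s]")) >= 0)
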
@kmuehlbauer (Collaborator)

Thanks @ghiggi for creating this tracking issue. Let's open dedicated issues for the different errors/bugs.

@aladinor (Member) commented Feb 4, 2025

Thanks @ghiggi for reporting this.

Reader performance
The xradar reader is at least 10 times slower than the pyart reader.
The xradar reader currently reopens the file for each sweep!
As a workaround for the xradar reader bugs, for my research work I developed a simple function converting pyart.Radar
objects to an xradar xr.DataTree structure; the overhead of creating the xr.DataTree object is just 1-2 seconds.
Would it make sense to add a to_datatree method to pyart.Radar objects?

I guess you are using xradar=0.8.0. In later PRs we handled this to avoid reopening the file for each sweep (#240). You might need to install that specific branch or wait until a new release.

@aladinor (Member) commented Feb 4, 2025

In this error,

Bug related to BZ2 data decompression
Error: OSError: Invalid data stream

I think this happens because pyart uses a function called prepare_for_read that allows reading different formats, including .gz files, and also supports data streaming; a minimal sketch of such a helper is shown below. I will create a new issue to handle this.
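
A minimal sketch of what a prepare_for_read-style helper does, based on magic-byte sniffing (the details of pyart's actual implementation may differ):

import bz2
import gzip

def prepare_for_read_sketch(filename):
    """Return a file-like object, transparently unwrapping gzip/bz2."""
    f = open(filename, "rb")
    magic = f.read(3)
    f.seek(0)
    if magic.startswith(b"\x1f\x8b"):  # gzip magic bytes
        return gzip.GzipFile(fileobj=f)
    if magic.startswith(b"BZh"):       # bz2 magic bytes
        return bz2.BZ2File(f)
    return f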

@kmuehlbauer (Collaborator)

Separate issues have been opened to track one issue at a time. I'm closing this now. Please open single issues; it's an additional maintenance burden to interlink and keep track of everything. Thanks!
