Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multiple forecast time steps in ECMWF IFS files not supported #41

Open
LJ-Young opened this issue Nov 22, 2024 · 4 comments
Open

Multiple forecast time steps in ECMWF IFS files not supported #41

LJ-Young opened this issue Nov 22, 2024 · 4 comments

Comments

@LJ-Young
Copy link

I tried to read a .grib file from ECMWF forecasting products downloaded from https://apps.ecmwf.int/datasets/data/tigge/levtype=sfc/type=fc/. I opened the file with python xarray:

>>> import xarray as xr
>>> import cfgrib
>>> data=xr.open_dataset('test.grib',engine='cfgrib')
>>> data
<xarray.Dataset>
Dimensions:     (time: 62, step: 4, latitude: 91, longitude: 180)
Coordinates:
  * time        (time) datetime64[ns] 2024-01-01 ... 2024-01-31T12:00:00
  * step        (step) timedelta64[ns] 00:00:00 06:00:00 12:00:00 18:00:00
    meanSea     float64 ...
  * latitude    (latitude) float64 90.0 88.0 86.0 84.0 ... -86.0 -88.0 -90.0
  * longitude   (longitude) float64 0.0 2.0 4.0 6.0 ... 352.0 354.0 356.0 358.0
    valid_time  (time, step) datetime64[ns] ...
Data variables:
    msl         (time, step, latitude, longitude) float32 ...
Attributes:
    GRIB_edition:            2
    GRIB_centre:             ecmf
    GRIB_centreDescription:  European Centre for Medium-Range Weather Forecasts
    GRIB_subCentre:          0
    Conventions:             CF-1.7
    institution:             European Centre for Medium-Range Weather Forecasts
    history:                 2024-11-22T18:00 GRIB to CDM+CF via cfgrib-0.9.1...

It gave right results and showed the dimension of the mean sea level pressure (msl) is (time, step, latitude, longitude)

However, when I used Julia to read it with GRIBDatasets:

julia> using GRIBDatasets
julia> ds=GRIBDataset("test.grib")
Dataset: test.grib
Group: /

Dimensions
   lon = 180
   lat = 91
   valid_time = 126

Variables
  lon   (180)
    Datatype:    Float64 (Float64)
    Dimensions:  lon
    Attributes:
     units                = degrees_east
     long_name            = longitude
     standard_name        = longitude

  lat   (91)
    Datatype:    Float64 (Float64)
    Dimensions:  lat
    Attributes:
     units                = degrees_north
     long_name            = latitude
     standard_name        = latitude

  valid_time   (126)
    Datatype:    Dates.DateTime (Int64)
    Dimensions:  valid_time
    Attributes:
     units                = seconds since 1970-01-01T00:00:00
     calendar             = proleptic_gregorian
     long_name            = time
     standard_name        = time

  msl   (180 × 91 × 126)
    Datatype:    Union{Missing, Float64} (Float64)
    Dimensions:  lon × lat × valid_time
    Attributes:
     units                = Pa
     long_name            = Mean sea level pressure
     standard_name        = air_pressure_at_mean_sea_level

Global attributes
  edition              = 2
  source               = test.grib
  centreDescription    = European Centre for Medium-Range Weather Forecasts
  centre               = ecmf
  subCentre            = 0
  Conventions          = CF-1.7

The dimension of msl changed into (lon × lat × valid_time), however the length of "valid_time" did not match with "time" and "step" obtained from xarray.

The ECMWF forcasting file was uploaded as attachment test.zip. I would like to ask how can I deal with this issue with GRIBDatasets?

@lupemba
Copy link
Contributor

lupemba commented Nov 22, 2024

@LJ-Young

Have you tried to inspect the data to actually check if something is missing? I am not super familiar with the notion of "steps" but it looks like it is just referring too 4 different times of day (00:00:00 06:00:00 12:00:00 18:00:00).

If you open the file with GRIBDatasets and list the times, then you can see that there is 4 timestamps for each day in January + 2 timestamps in February.

ds = GRIBDataset(raw"C:\Users\Kok\Downloads\test\test.grib")
ds["valid_time"][:]

Output

126-element Vector{Dates.DateTime}:
 2024-01-01T00:00:00
 2024-01-01T06:00:00
 2024-01-01T12:00:00
 2024-01-01T18:00:00
 2024-01-02T00:00:00
 2024-01-02T06:00:00
 2024-01-02T12:00:00
 2024-01-02T18:00:00
 2024-01-03T00:00:00
 2024-01-03T06:00:00
 ⋮
 2024-01-30T00:00:00
 2024-01-30T06:00:00
 2024-01-30T12:00:00
 2024-01-30T18:00:00
 2024-01-31T00:00:00
 2024-01-31T06:00:00
 2024-01-31T12:00:00
 2024-01-31T18:00:00
 2024-02-01T00:00:00
 2024-02-01T06:00:00

Getting the msl for one time stamp also looks fine considering that the unit is Pa

ds["msl"][:,:,3]

output

180×91 Matrix{Union{Missing, Float64}}:
 1.02034e5       1.01557e5  1.01074e5  1.00929e5  …  98586.8  99002.4  99328.3  99328.8  98903.7    
 1.02034e5  101548.0        1.01047e5  1.00838e5     98582.4  99013.8  99340.3  99313.5  98903.7    
 1.02034e5       1.01539e5  1.0102e5   1.00752e5     98583.9  99026.3  99354.9  99302.8  98903.7    
 1.02034e5       1.01531e5  1.00993e5  1.00665e5     98587.3  99035.0  99365.2  99291.9  98903.7    
 ⋮                                                ⋱                                          ⋮      
 1.02034e5       1.01596e5  1.01183e5  1.01268e5     98627.4  98973.2  99231.0  99283.2  98903.7    
 1.02034e5       1.01586e5  1.01155e5  1.01192e5     98616.4  98991.8  99261.5  99299.7  98903.7    
 1.02034e5       1.01577e5  1.01128e5  1.01109e5     98609.4  99000.3  99289.5  99312.0  98903.7
 1.02034e5       1.01567e5  1.01102e5  1.01021e5     98602.2  98996.4  99310.9  99320.8  98903.7 

@benelsen
Copy link

It appears to me that the file has 2 messages per valid_time and only one ends up in the GRIBDataset.

i.e. the file has steps 0h, 6h, 12h, 18h for every data time (00:00, 12:00). So we have 2 data points starting with 2024-01-01 12:00. One from step 12h of the 00:00 data, and one from step 0h of the 12:00 data.

$ grib_ls test.grib -m        
test.grib
origin      date        time        step        levtype     param       expver      class       model       type        stream      
ecmf        20240101    0000        0           sfc         151         prod        ti          glob        fc          oper       
ecmf        20240101    0000        6           sfc         151         prod        ti          glob        fc          oper       
ecmf        20240101    0000        12          sfc         151         prod        ti          glob        fc          oper       
ecmf        20240101    0000        18          sfc         151         prod        ti          glob        fc          oper       
ecmf        20240101    1200        0           sfc         151         prod        ti          glob        fc          oper       
ecmf        20240101    1200        6           sfc         151         prod        ti          glob        fc          oper       
ecmf        20240101    1200        12          sfc         151         prod        ti          glob        fc          oper       
ecmf        20240101    1200        18          sfc         151         prod        ti          glob        fc          oper       
(…)

I'm unclear which one ends up in the GRIBDataset, so only downloading steps 0h and 6h for each data (00:00, 12:00) would make sure only the most recent data point ends up in the GRIB.

@LJ-Young
Copy link
Author

@lupemba "step" means forecasting step. For example, for "time" 2024-01-01T00:00:00, this test.grib file provides 4 forecasting msl fields ("step" 0h, 6h, 12h and 18h) into the future. Theoretically, the msl obtained from GRIBDatasets should be four-dimensional (lon x lat x time x step), or the "valid_time" length should be 64×4=256, but the valid_time length here is 126, which means that some data is overwritten during the reading process of GRIBDatasets (as @benelsen mentioned).

@tcarion
Copy link
Member

tcarion commented Dec 4, 2024

Hi @LJ-Young, thank you for reporting the issue!

This typically falls into a missing feature of GRIBDatasets. Currently, it only considers the valid_time as time dimension, as mentionned here and here.

That's something that should be extended. I don't have much time lately, but I should be able to work on it beginning of next year. Of course, any help is always welcome :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants