GRIB collection design #732

JohnLCaron · 2021-06-20T20:33:33Z

JohnLCaron
Jun 20, 2021
Collaborator

These are the types of GRIB Collections, with explanations and issues.

SRC

SRC (Single Runtime Collection) is the ideal for model data. It requires that all the data for a model run is in one file (PartitionType = file) or in one directory (PartitionType = directory).

In the GribCollection ncx, the time coordinate is a time2D (1 X ntimes) orthogonal:

time2D: time4 runtime=reftime nruns=1 ntimes=18 isOrthogonal=true isRegular=false
  All time values= 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, (n=18)
  time:   1,   2,   3,   4,   5,   6,   7,   8,   9,  10,  11,  12,  13,  14,  15,  16,  17,  18, (18)

In the Netcdf representation, the variables have a 1D time coordinate with a scalar reference time:

float Precipitation(time4=18, y=1105, x=1649);
  :coordinates = "reftime time4 y x ";

double reftime;
double time4(time4=18);

MRUTC

MRUTC (Multiple Runtime Unique Time Collection) has multiple reference times, but a single unique time coordinate for each. It's likely that the only case we've seen is time offset == 0, where reference time == valid time. We could rename this case "Observation Collection", or OBS.

In the GribCollection ncx, the time coordinate is a time2D (nruns X 1) orthogonal:

time1 runtime=reftime nruns=30 ntimes=1 isOrthogonal=true isRegular=false
Time offsets: (1 minutes) ref=2020-10-27T00:00:38Z 
     0

In the Netcdf representation, the variables have a 1D time coordinate with an auxiliary 1D reference time with values identical to the runtime:

float MESH_altitude_above_msl(time1=30, altitude_above_msl=1, lat=3500, lon=7000);
  :coordinates = "reftime1 time1 altitude_above_msl lat lon ";

double time1(time1=30);   
double reftime1(time1=30);

MRUTP

MRUTP is identical to an MRUTC. It occurs when one collects MRUTCs. The times are unique. The number of times can get quite large, and the times may be irregular. In the MRMS, they are approximately every 2 minutes, and each variable has its own time coordinate.

An MRUTP may also get created by collecting SRCs with a single forecast time, such as "analysis" datasets like Global_0p5deg_ana. These are represented identically to OBS datasets, but have many fewer times (123 vs 22K), which are shared among variables.

Open question: why does the GFS-Global_0p5deg_ana.ncx4 runtime have units of seconds, not hours?

TwoD

TwoD (2D time coordinates) collections are generated from collections of SRCs. Variables have a runtime and usually a 1D offsetTime coordinate, and an auxiliary 2D time coordinate in the Netcdf representation:

float Convective_Available_Potential_Energy_surface(reftime=62, validtime2Offset=11, y=65, x=93);
  :coordinates = "reftime validtime2 validtime2Offset y x "; 
  
double reftime(reftime=62);
  :standard_name = "forecast_reference_time";
  
double validtime2Offset(validtime2Offset=11);
  :standard_name = "forecast_period";   

double validtime2(reftime=62, validtime2Offset=11);
  :standard_name = "time";

And the twoD time coordinate has NaNs where needed:

TwoD/validtime1 = 
  {
    {0.0, 1.0, 3.0, 6.0, 7.0, 8.0, 10.0, 17.0, 18.0, 21.0, 23.0, 26.0, 27.0, 28.0, 29.0, 30.0, 31.0, 36.0, 38.0, 39.0, 40.0, 41.0, 42.0, 43.0, 46.0, 47.0, 48.0},
    {5.0, 6.0, 7.0, 8.0, 11.0, 13.0, 14.0, 15.0, 17.0, 19.0, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN},
    {2.0, 3.0, 5.0, 13.0, 17.0, 19.0, 20.0, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN},
    {3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 14.0, 15.0, 17.0, 19.0, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN},
    {4.0, 7.0, 8.0, 13.0, 14.0, 15.0, 16.0, 17.0, 21.0, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN},
    ....
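
The NaN padding shown above can be sketched as follows. This is only an illustration, not the netcdf-java implementation: the function name `pad_offsets` and the sample values are hypothetical, and it assumes each run's times are available as a plain list.

```python
import math

def pad_offsets(times_per_run):
    """Pad ragged per-run time lists with NaN so they form a rectangular 2D array.

    Hypothetical helper: the row width is the longest list over all runs,
    and shorter rows are filled out with NaN on the right.
    """
    width = max((len(row) for row in times_per_run), default=0)
    return [list(row) + [math.nan] * (width - len(row)) for row in times_per_run]

# Made-up ragged input: three runs with 4, 2 and 3 times respectively.
ragged = [
    [0.0, 1.0, 3.0, 6.0],
    [5.0, 6.0],
    [2.0, 3.0, 5.0],
]
padded = pad_offsets(ragged)
for row in padded:
    print(row)
```

Every row ends up with the length of the longest run, which is why a single run with many forecast times widens the whole coordinate.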

There are variations in the GribCollection ncx4 representation:

TwoD orthogonal

(Multiple runtimes with identical time offsets for each) is the ideal for collections of model runs. These are stored in the ncx4 as two 1D coordinates, runtime and offsetTime.
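
A minimal sketch of why two 1D coordinates are enough in the orthogonal case: assuming runtimes and offsets are expressed in the same unit (hours here, which is an assumption), every valid time is just runtime plus offset. The function name and the sample values are hypothetical.

```python
def valid_times_orthogonal(runtimes, offsets):
    """Build the full 2D valid-time array from two 1D coordinates.

    Assumes valid time = runtime + offset, with both in the same unit.
    """
    return [[run + off for off in offsets] for run in runtimes]

# Made-up example: three runs 6 hours apart, each with the same three offsets.
runtimes = [0, 6, 12]   # hours since some epoch (assumed unit)
offsets = [0, 3, 6]     # forecast hours, identical for every run

valid = valid_times_orthogonal(runtimes, offsets)
for row in valid:
    print(row)
```

Only `len(runtimes) + len(offsets)` values need to be stored, and the 2D array can be rebuilt whenever it is needed.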

TwoD regular

(Multiple runtimes with identical time offsets for each "runtime minute of day") is next best for collections of model runs. The offsetTime is stored in the ncx4 as M 1D coordinates, where M is the number of runs per day. These are identical for all days.

TwoD time2D

(Multiple runtimes with irregular time offsets across days) is the general case for collections of model runs. The offsetTime is stored in the ncx4 as N 1D coordinates, where N is the number of runs overall. When N becomes large, the size of these coordinates can start to limit what can be stored in memory. These are a good candidate for a meta-collection.
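
The three variations can be told apart with a check along these lines. This is a sketch under stated assumptions, not the logic the GRIB indexer actually uses: `classify` is a hypothetical name, and the input is assumed to be a dict mapping each runtime (a `datetime`) to its list of offsets.

```python
from datetime import datetime

def classify(runs):
    """Classify a collection of runs as 'orthogonal', 'regular' or 'time2D'.

    runs: dict mapping a datetime runtime to a sequence of time offsets.
    """
    # Orthogonal: every run has exactly the same offsets.
    if len({tuple(offs) for offs in runs.values()}) <= 1:
        return "orthogonal"

    # Regular: runs starting at the same minute of the day share their offsets.
    by_minute = {}
    for runtime, offs in runs.items():
        minute_of_day = runtime.hour * 60 + runtime.minute
        by_minute.setdefault(minute_of_day, set()).add(tuple(offs))
    if all(len(patterns) == 1 for patterns in by_minute.values()):
        return "regular"

    # Anything else needs per-run offsets.
    return "time2D"

# Made-up collections for illustration.
orthogonal_runs = {
    datetime(2021, 6, 20, 0): [0, 3, 6],
    datetime(2021, 6, 20, 12): [0, 3, 6],
}
regular_runs = {
    datetime(2021, 6, 20, 0): [0, 3, 6],
    datetime(2021, 6, 20, 12): [0, 3],
    datetime(2021, 6, 21, 0): [0, 3, 6],
    datetime(2021, 6, 21, 12): [0, 3],
}
irregular_runs = {
    datetime(2021, 6, 20, 0): [0, 3, 6],
    datetime(2021, 6, 21, 0): [0, 6],
}

print(classify(orthogonal_runs), classify(regular_runs), classify(irregular_runs))
```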

The offset time coordinate is not put into the Netcdf, just the 1D reftime and the 2D validtime:

float DZDT_P0_L100_GLC0_isobaric(reftime=82, time1=27, isobaric=1, y=1059, x=1799);
  :coordinates = "reftime validtime1 isobaric y x ";

double reftime(reftime=82);
double validtime1(reftime=82, time1=27);

JohnLCaron · 2021-06-22T13:34:34Z

JohnLCaron
Jun 22, 2021
Collaborator Author

TwoD time2D vs TwoD regular.

In both cases, the netcdf time coordinate variable has to be 2D, and the variables using it have two time coordinates, for example:

float Weather_string_surface(reftime1=1480, time=65, y=1377, x=2145);

where 65 is the maximum number of forecast times for any reference time. The coordinate is stored with NaNs to indicate where those are missing:

 {1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0, 20.0, 21.0, 22.0, 23.0, 24.0, 25.0, 26.0, 27.0, 28.0, 29.0, 30.0, 31.0, 32.0, 33.0, 34.0, 35.0, 36.0, 39.0, 42.0, 45.0, 48.0, 51.0, 54.0, 57.0, 60.0, 63.0, 66.0, 69.0, 72.0, 78.0, 84.0, 90.0, 96.0, 102.0, 108.0, 114.0, 120.0, 126.0, 132.0, 138.0, 144.0, 150.0, 156.0, 162.0, 168.0, NaN},
 {60.0, 120.0, 180.0, 240.0, 300.0, 360.0, 420.0, 480.0, 540.0, 600.0, 660.0, 720.0, 780.0, 840.0, 900.0, 960.0, 1020.0, 1080.0, 1140.0, 1200.0, 1260.0, 1320.0, 1380.0, 1440.0, 1500.0, 1560.0, 1620.0, 1680.0, 1740.0, 1800.0, 1860.0, 1920.0, 1980.0, 2040.0, 2100.0, 2160.0, 2340.0, 2520.0, 2700.0, 2880.0, 3060.0, 3240.0, 3420.0, 3600.0, 3780.0, 3960.0, 4140.0, 4320.0, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN},
...

The advantage of regular is that we can store just the regular pattern in the ncx, which applies to all reference times. When it's not regular, we have to store the offsets for every reftime. This gets very large as the number of reference times grows.

So for regular, we have to store N + M*H values, and for time2D we have to store N x M values in the ncx4, where

N = number of reference times
M = number of forecast times per reference time
H = number of reference times per day
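
To get a feel for the difference, the two counts can be worked through with the dimensions from the Weather_string_surface example (N = 1480, M = 65). The value H = 4 is a made-up assumption for illustration only; the actual number of runs per day for that dataset is not stated here.

```python
def stored_values_regular(n_reftimes, m_offsets, h_runs_per_day):
    """Values stored for 'regular': N reference times plus M offsets for each of H daily runs."""
    return n_reftimes + m_offsets * h_runs_per_day

def stored_values_time2d(n_reftimes, m_offsets):
    """Values stored for 'time2D': M offsets for each of the N reference times."""
    return n_reftimes * m_offsets

N = 1480  # number of reference times (from the example variable)
M = 65    # maximum forecast times per reference time (from the example variable)
H = 4     # runs per day: hypothetical value, not taken from the dataset

regular = stored_values_regular(N, M, H)
time2d = stored_values_time2d(N, M)
print(regular, time2d)
```

The regular count grows only by one value per additional reference time, while the time2D count grows by M values.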

For regular, we can generate the time2D coordinate on the fly for Netcdf. For Grids, we never have to instantiate the coordinates unless the user asks for them explicitly.
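
Generating the 2D coordinate on the fly for the regular case might look like the sketch below. It assumes the stored pattern is keyed by the runtime's minute of day and that missing entries are padded with NaN; `expand_regular` and the sample pattern are hypothetical, not the actual ncx4 layout.

```python
import math
from datetime import datetime

def expand_regular(runtimes, pattern_by_minute):
    """Expand a per-minute-of-day offset pattern into a NaN-padded 2D coordinate.

    runtimes: list of datetime reference times.
    pattern_by_minute: dict mapping minute of day to that run's offsets.
    """
    width = max(len(offs) for offs in pattern_by_minute.values())
    rows = []
    for runtime in runtimes:
        offs = pattern_by_minute[runtime.hour * 60 + runtime.minute]
        rows.append(list(offs) + [math.nan] * (width - len(offs)))
    return rows

# Made-up pattern: the 00Z run has three offsets, the 12Z run has two.
pattern = {0: [1.0, 2.0, 3.0], 12 * 60: [1.0, 2.0]}
runtimes = [
    datetime(2021, 6, 20, 0),
    datetime(2021, 6, 20, 12),
    datetime(2021, 6, 21, 0),
    datetime(2021, 6, 21, 12),
]

time2d = expand_regular(runtimes, pattern)
for row in time2d:
    print(row)
```

The pattern dictionary stays the same size no matter how many days of runs are added; only the list of runtimes grows.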

In principle, the time period of regularity doesn't have to be one day, but in practice we haven't seen any other case.

So it's worth doing when possible.
