[OpenDrift] Request for smaller chunks #46

Open
zachsa opened this issue Sep 6, 2023 · 5 comments

@zachsa
Collaborator

zachsa commented Sep 6, 2023

Thanks very much for including the JSON kerchunk output. If it's not too much trouble, please can you specify smaller chunks in the NetCDF file.

Looking at the output at https://mnemosyne.somisana.ac.za/somisana/opendrift/20230904/test_east_coast_blowout, I see in the JSON file that the smallest retrievable chunk is 3.7MB. If I remember correctly, when specifying chunk sizes for the tier3 output there is a trade-off between write speed and chunk size (more chunks = longer to create the NetCDF file).
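For a variable of shape S stored in chunks of shape C, the file holds ∏⌈S_i / C_i⌉ chunks of ∏C_i × itemsize bytes each, so smaller chunks mean fewer bytes per request but many more chunks to write and index. A minimal sketch of that arithmetic (the [time, depth, latitude, longitude] shape below is hypothetical, not the real tier3 grid):

```python
import math


def chunk_layout(shape, chunks, itemsize=4):
    """Return (number of chunks, bytes per full chunk) for one variable.

    shape and chunks are per-dimension lengths in the same order;
    itemsize is bytes per value (4 for float32).
    """
    if len(shape) != len(chunks):
        raise ValueError("shape and chunks must have the same number of dimensions")
    # Partial chunks at the edges still count as chunks, hence ceil
    n_chunks = math.prod(math.ceil(n / c) for n, c in zip(shape, chunks))
    chunk_bytes = math.prod(chunks) * itemsize
    return n_chunks, chunk_bytes


# Hypothetical float32 variable dimensioned [time, depth, latitude, longitude]
shape = (120, 30, 200, 300)
for chunks in [(120, 30, 200, 300), (24, 1, 200, 300), (1, 1, 200, 300)]:
    n, size = chunk_layout(shape, chunks)
    print(f"chunks={chunks}: {n} chunks of {size / 1e6:.2f} MB")
```

With these made-up dimensions, going from one chunk to `(24, 1, 200, 300)` cuts the smallest read from 864 MB to 5.76 MB, but the writer then has to produce 150 chunks instead of 1.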

It's easy to do: https://github.com/SAEON/somisana/blob/stable/toolkit/cli/applications/croco/regrid_tier3/__init__.py#L258-L289

```python
# Excerpt: data_out is an xarray Dataset, output is the target path and
# log is the toolkit's logging helper (all defined in the linked module)

# Explicitly set chunk sizes of some dimensions
chunksizes = {
    "time": 24,
    "depth": 1,
}

# For data_vars, set chunk sizes for each dimension
# This is either the override specified in "chunksizes"
# or the length of the dimension
default_chunksizes = {dim: len(data_out[dim]) for dim in data_out.dims}

encoding = {
    var: {
        "dtype": "float32",
        "chunksizes": [chunksizes.get(dim, default_chunksizes[dim]) for dim in data_out[var].dims],
    }
    for var in data_out.data_vars
}

# These are coordinate variables, not data_vars, so the comprehension above
# never gave them "chunksizes"; these entries only set their dtypes
encoding["time"] = {"dtype": "i4"}
encoding["latitude"] = {"dtype": "float32"}
encoding["longitude"] = {"dtype": "float32"}
encoding["depth"] = {"dtype": "float32"}

log("Generating NetCDF data")
write_op = data_out.to_netcdf(
    output,
    encoding=encoding,
    mode="w",
    compute=False,
)
# compute=False returns a dask Delayed; the file is only written on compute()
write_op.compute()
```

Here is an example of the output: https://mnemosyne.somisana.ac.za/somisana/algoa-bay/5-day-forecast/202309/20230906_hourly_avg_t3.kerchunk.json

I can make a GET request to retrieve all salt values at a particular depth/time for all lat/longs.

(The salt variable has dimensions [time, depth, latitude, longitude]: "salt/.zattrs": "{\"_ARRAY_DIMENSIONS\":[\"time\",\"depth\",\"latitude\",\"longitude\"],\"long_name\":\"averaged salinity\",\"standard_name\":\"sea_water_salinity\",\"units\":\"PSU\"}")
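A sketch of how such a request can be located from the kerchunk JSON, assuming the version-1 reference layout (`{"version": 1, "refs": {...}}`, chunk entries stored as `[url, offset, length]`) and Zarr's default `.` separator in chunk keys. The reference entries, URL and grid size below are made up for illustration; the chunk shape mirrors the tier3 settings (24 time steps, 1 depth level, whole latitude/longitude).

```python
def chunk_key(var, index, chunks, sep="."):
    """Zarr chunk key (e.g. "salt/0.1.0.0") of the chunk holding element `index`."""
    return var + "/" + sep.join(str(i // c) for i, c in zip(index, chunks))


def range_request(refs, var, index, chunks):
    """Return (url, headers) for an HTTP Range request fetching that chunk.

    Assumes a version-1 kerchunk reference set whose chunk entries are
    [url, offset, length]; inline data and templated URLs are not handled.
    """
    key = chunk_key(var, index, chunks)
    entry = refs["refs"][key]
    if not isinstance(entry, list) or len(entry) != 3:
        raise ValueError(f"{key} is not a [url, offset, length] reference")
    url, offset, length = entry
    # HTTP byte ranges are inclusive at both ends
    return url, {"Range": f"bytes={offset}-{offset + length - 1}"}


# Hypothetical references for salt chunked as [time=24, depth=1, lat=all, lon=all]
refs = {
    "version": 1,
    "refs": {
        "salt/0.0.0.0": ["https://example.org/forecast_t3.nc", 30000, 1200],
        "salt/0.1.0.0": ["https://example.org/forecast_t3.nc", 31200, 1180],
    },
}

# Time step 5, depth level 1, whole horizontal grid (hypothetical 120 x 200)
url, headers = range_request(refs, "salt", (5, 1, 0, 0), (24, 1, 120, 200))
```

The bytes returned are the chunk as stored, so they still need decoding with the compressor and filters listed in `salt/.zarray`.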

@GilesFearon
Collaborator

OK, so you'd like smaller chunks in the trajectories.nc file? That's the raw output file written by the opendrift model, so I'd have to edit the opendrift source code to do that, which is not ideal. Alternatively, we could generate another postprocessed file where the chunk size is specified?
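If the postprocessed-file route is taken, the pattern from the tier3 snippet could be reused. A minimal sketch, assuming the OpenDrift data variables are dimensioned `(trajectory, time)`; the helper name `trajectory_encoding` and the one-time-step-per-chunk choice are illustrative, not an agreed design:

```python
def trajectory_encoding(var_dims, dim_sizes, chunks):
    """Build a to_netcdf-style encoding dict with explicit "chunksizes".

    var_dims:  {variable name: tuple of dimension names}
    dim_sizes: {dimension name: length}
    chunks:    {dimension name: chunk length}; unlisted dimensions are
               stored whole, and chunks are capped at the dimension length.
    """
    return {
        var: {
            "chunksizes": tuple(
                min(chunks.get(dim, dim_sizes[dim]), dim_sizes[dim]) for dim in dims
            ),
            # Ask for chunked rather than contiguous storage explicitly
            "contiguous": False,
        }
        for var, dims in var_dims.items()
        if dims  # scalar variables have nothing to chunk
    }


# Example sizes (5000 particles, 187 output times); only two variables shown
var_dims = {"lon": ("trajectory", "time"), "lat": ("trajectory", "time")}
dim_sizes = {"trajectory": 5000, "time": 187}

# One chunk per output time step, holding every particle at that time
encoding = trajectory_encoding(var_dims, dim_sizes, {"time": 1})
```

With xarray, `var_dims` could come from `{v: ds[v].dims for v in ds.data_vars}` and `dim_sizes` from `dict(ds.sizes)`, with the result passed as `ds.to_netcdf("trajectories_chunked.nc", encoding=encoding)`. Chunking along `time` suits drawing all particles at one instant; chunking along `trajectory` would instead suit fetching individual tracks.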

@zachsa
Collaborator Author

zachsa commented Sep 6, 2023

I didn't realize that was the output from opendrift. I haven't looked inside the .nc file yet, so I don't know that much about what I'm trying to render (I guess I should do this first).

Is this conceptually similar to the croco output where the raw netcdf file is difficult to use?

@GilesFearon
Collaborator

The opendrift output is a lot more intuitive than the croco output...

@zachsa
Collaborator Author

zachsa commented Sep 14, 2023

Right, I see it's not as straightforward as 'where will the particle be at t = x'.

Questions!
(1) Are these variables all produced by opendrift, or are some of them passed through from the model output?
(2) Does opendrift work with raw croco output? If so, how does it work with other models?

```
dimensions:
	trajectory = 5000 ;
	time = 187 ;
variables:
	int trajectory(trajectory) ;
	double time(time) ;
	int status(trajectory, time) ;
	int moving(trajectory, time) ;
	float age_seconds(trajectory, time) ;
	int origin_marker(trajectory, time) ;
	float lon(trajectory, time) ;
	float lat(trajectory, time) ;
	float z(trajectory, time) ;
	float wind_drift_factor(trajectory, time) ;
	float current_drift_factor(trajectory, time) ;
	float terminal_velocity(trajectory, time) ;
	float mass_oil(trajectory, time) ;
	float viscosity(trajectory, time) ;
	float density(trajectory, time) ;
	float bulltime(trajectory, time) ;
	float interfacial_area(trajectory, time) ;
	float mass_dispersed(trajectory, time) ;
	float mass_evaporated(trajectory, time) ;
	float mass_biodegraded(trajectory, time) ;
	float fraction_evaporated(trajectory, time) ;
	float water_fraction(trajectory, time) ;
	float oil_film_thickness(trajectory, time) ;
	float diameter(trajectory, time) ;
	float x_sea_water_velocity(trajectory, time) ;
	float y_sea_water_velocity(trajectory, time) ;
	float x_wind(trajectory, time) ;
	float y_wind(trajectory, time) ;
	float upward_sea_water_velocity(trajectory, time) ;
	float sea_surface_wave_significant_height(trajectory, time) ;
	float sea_surface_wave_stokes_drift_x_velocity(trajectory, time) ;
	float sea_surface_wave_stokes_drift_y_velocity(trajectory, time) ;
	float sea_surface_wave_period_at_variance_spectral_density_maximum(trajectory, time) ;
	float sea_surface_wave_mean_period_from_variance_spectral_density_second_frequency_moment(trajectory, time) ;
	float sea_ice_area_fraction(trajectory, time) ;
	float sea_ice_x_velocity(trajectory, time) ;
	float sea_ice_y_velocity(trajectory, time) ;
	float sea_water_temperature(trajectory, time) ;
	float sea_water_salinity(trajectory, time) ;
	float sea_floor_depth_below_sea_level(trajectory, time) ;
	float ocean_vertical_diffusivity(trajectory, time) ;
	float land_binary_mask(trajectory, time) ;
	float ocean_mixed_layer_thickness(trajectory, time) ;
```
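These dimensions may account for the 3.7MB chunk noted at the start of the thread. Assuming the blowout run's file has the same sizes and that each float32 (4-byte) variable is written as a single chunk, which has not been checked against the file itself:

```math
\begin{aligned}
\text{whole variable, one chunk:} &\quad 5000 \times 187 \times 4\ \text{B} = 3\,740\,000\ \text{B} \approx 3.7\ \text{MB} \\
\text{one time step per chunk:} &\quad 5000 \times 1 \times 4\ \text{B} = 20\,000\ \text{B} = 20\ \text{kB}
\end{aligned}
```

The second layout would need 187 chunks per variable instead of one.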

@GilesFearon
Collaborator

GilesFearon commented Sep 14, 2023

To answer the question 'where will the particle be at t = x', you would get that from the lon, lat variables (and z if you wanted the depth). All variables have dimensions (trajectory, time), where trajectory corresponds to the particles, so you would be able to extract the location of particle p at time t.
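With the arrays held as nested lists indexed `[trajectory][time]`, both lookups are plain indexing. This is a sketch with made-up values; it also assumes positions are NaN where a particle has no valid value (for example after deactivation), which should be confirmed against the actual file. With xarray, the single-particle lookup would be `ds.lon.isel(trajectory=p, time=t)`.

```python
import math


def particle_position(lon, lat, z, p, t):
    """Return (lon, lat, z) of particle p at time index t, or None if invalid.

    lon, lat and z are nested sequences indexed [trajectory][time],
    matching the (trajectory, time) layout of the OpenDrift output.
    """
    point = (lon[p][t], lat[p][t], z[p][t])
    return None if any(math.isnan(v) for v in point) else point


def snapshot(lon, lat, t):
    """All valid (lon, lat) pairs at time index t, i.e. one time column."""
    return [
        (lon[p][t], lat[p][t])
        for p in range(len(lon))
        if not (math.isnan(lon[p][t]) or math.isnan(lat[p][t]))
    ]


# Hypothetical output: 3 particles x 2 time steps; particle 2 is invalid at t=1
nan = float("nan")
lon = [[25.0, 25.1], [25.2, 25.3], [25.4, nan]]
lat = [[-34.0, -34.1], [-34.2, -34.3], [-34.4, nan]]
z = [[0.0, -1.0], [0.0, -0.5], [0.0, nan]]

position = particle_position(lon, lat, z, 1, 1)
particles_at_t1 = snapshot(lon, lat, 1)
```

A snapshot reads one time column across all trajectories, so a file chunked along `time` would let a client fetch it without downloading every particle's full history.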

To answer the other questions:

  1. Some of the variables are indeed read directly from the ocean model data, but interpolated to the particle locations, e.g. x_sea_water_velocity. Others are run-time inputs, e.g. wind_drift_factor, and others are computed during the opendrift run, e.g. mass_evaporated.
  2. Yes, it works with the raw croco output files, but only because the opendrift developers have been kind enough to provide a dedicated 'reader' for roms files (which thankfully works on croco files too). If you want to run it with ocean data on a regular grid, you have to use a different 'reader'; they provide a few out-of-the-box readers which work really well.
