Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Storing GCPs in xarray #376

Closed
snowman2 opened this issue Jul 7, 2021 · 9 comments · Fixed by #460
Closed

ENH: Storing GCPs in xarray #376

snowman2 opened this issue Jul 7, 2021 · 9 comments · Fixed by #460
Labels
enhancement New feature or request

Comments

@snowman2
Copy link
Member

snowman2 commented Jul 7, 2021

Forward from: #339

Changes that will likely be needed:

  • rio.write_gcps() - write an array of GCPS to the grid_mapping coordinate attrs. I am thinking json would be the best way to go.
  • rio.get_gcps() - read GCPS from json in the grid_mapping coordinate attrs and convert to rasterio GCP objects.
  • rio.reproject() - pass in GCPS automatically where applicable.
  • open_rasterio() - write GCPS and don't generate coordinates or transform.
  • rio.to_raster() - write GCPS to raster.

Original GCPs:

[GroundControlPoint(row=0.0, col=0.0, x=-1.3712442422120068, y=48.818122749035496, z=1.9444833537563682, id='1', info=''), GroundControlPoint(...]

Coordinate Structure

Dimensions: ( gcp: ...)
Coordinates:
  gcp_x (gcp): ...
  gcp_y (gcp): ...
  gcp_z (gcp): ...
  gcp_row (gcp): ...
  gcp_col (gcp): ...
  gcp_id (gcp): ...
  gcp_info (gcp): ...

JSON Structure

Grid mapping coordinate:

Attributes:
    crs_wkt: ....
    gcps: '[{"row": 0.0, "col": 0.0, "x": -1.3712442422120068, "y": 48.818122749035496, "z": 1.9444833537563682, "id": "1","info": ""}, {...]'

cc @mraspaud

@snowman2 snowman2 added the enhancement New feature or request label Jul 8, 2021
@mraspaud
Copy link
Contributor

Sorry for the late reaction, early vacation here in Sweden.
Anyway, I've been doing some thinking about how to store GCPs in xarray.

The TL;DR summary: the JSON structure you propose is the easiest and most pragmatic solution.

The idea I had was to use the existing coordinates mechanism that xarray provides, like this:

import dask.array as da
import xarray as xr
import numpy as np
data = xr.DataArray(da.ones((40, 40)), coords=dict(y=range(40), x=range(40)), dims=['y', 'x'])
lons = da.random.random((10, 10))
lats = da.random.random((10, 10))
z = da.random.random((10, 10))
x_sub = np.arange(0, 40, 4)
y_sub = np.arange(0, 40, 4)
xrlons = xr.DataArray(lons, dims=['y', 'x'], coords=dict(y=y_sub, x=x_sub))
data.coords['lons'] = xrlons
data

that prints out:

<xarray.DataArray 'ones-9a83ffe72ccafae3ea3b86da58d3e667' (y: 40, x: 40)>
dask.array<ones, shape=(40, 40), dtype=float64, chunksize=(40, 40), chunktype=numpy.ndarray>
Coordinates:
  * y        (y) int64 0 1 2 3 4 5 6 7 8 9 10 ... 30 31 32 33 34 35 36 37 38 39
  * x        (x) int64 0 1 2 3 4 5 6 7 8 9 10 ... 30 31 32 33 34 35 36 37 38 39
    lons     (y, x) float64 dask.array<chunksize=(40, 40), meta=np.ndarray>

as xarray fills up the non-existing x and y points in lons and fills it with nans.

From there, we can fill the nans using linear interpolation with the builtin interpolate_na:

data.coords['lons'].interpolate_na('y', fill_value="extrapolate").interpolate_na('x', fill_value="extrapolate").values

However there are a some issues:

  • the gcps arrays are inflated with nans to the size of the data array, which might create inefficiencies
  • the difficulty to store id and info in this manner
  • when gcps and data points are not aligned, gcp data not ending up on data coordinate points is dropped at the moment
  • no builtin function to do 2d interpolation here

So if we don't want to go to the risky business of guessing what the user want or provide a bloated interface to handle all the particularities of different data formats, I think your JSON proposal is the best for now.

@snowman2
Copy link
Member Author

I think your JSON proposal is the best for now.

Sounds good to me. @mraspaud is this something you would be interested in submitting a PR for?

@mraspaud
Copy link
Contributor

I'm under quite some time pressure right now, so I can't make any promises... I'll report here if I start working on it.

@mraspaud
Copy link
Contributor

mraspaud commented Feb 2, 2022

Just for reference, I was just wondering how it would look like as geojson:

{ "type": "FeatureCollection",
    "features": [
      { "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [-1.3712442422120068, 48.818122749035496, 1.9444833537563682]},
        "properties": {"id": "1",
                       "col": 0,
                       "row": 0,
                       "info": "the first gcp"}
      },
      { "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [103.0, 1.5, 12.0]},
        "properties": {"id": "2",
                       "col": 5,
                       "row": 0,
                       "info": "the second gcp"}                      
      }
...
   ]
}

So more verbose, but I don't know if that would be useful in any way.

@snowman2
Copy link
Member Author

snowman2 commented Feb 2, 2022

I like the idea of storing it using a standard format 👍

@mraspaud
Copy link
Contributor

mraspaud commented Feb 4, 2022

Is it worth adding the geojson package as a dependency?
https://github.com/jazzband/geojson

@snowman2
Copy link
Member Author

snowman2 commented Feb 4, 2022

Is it worth adding the geojson package as a dependency?

My initial impression is no because the geometry types are simple and consistent. However, it isn't a large package, so if it has a sizeable reduction/simplification of the code then it is probably fine. One thing to be careful about is the precision enforced by geojson.

@snowman2
Copy link
Member Author

snowman2 commented Feb 4, 2022

Actually, the precision enforcement from geojson seems to be unavoidable. You can set the precision, but that seems kinda clunky. I am currently thinking we shouldn't use it for GCPs. Thoughts?

@mraspaud
Copy link
Contributor

mraspaud commented Feb 4, 2022

You are right about geojson fiddling with the precision. I'm supposing we will not dump/load the geojson string, as we use this as an internal representation of the gcps only, so I would rather keep the full precision. Hence we might need to implement our own geojson conversion, but it should be small.

Relevant part of the geojson spec:
https://datatracker.ietf.org/doc/html/rfc7946#section-11.2

I'll be opening a draft PR shortly so we can discuss implementation details there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants