-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Stackstac.stack provides shifted pixel coordinnates compare to qgis/rioxarray #207
Comments
There's not a single convention for whether coordinates should refer to the center of a pixel versus to its top-left corner (or even somewhere else): https://gis.stackexchange.com/questions/122670/is-there-a-standard-for-the-coordinates-of-pixels-in-georeferenced-rasters
In my personal experience with geospatial tools, top-left-corner is far more common/conventional, but I'm sure it's different for others depending on what tools you're used to. To me, rioxarray is the notable exception in the Python geospatial world, in that it intentionally shifts the coordinates by a half-pixel offset to align with pixel centers. My understanding is that this is meant to match xarray's model, where coordinates often refer to the center of pixels (I can't find a link to this in the docs though). However, I also understood that this convention came from xarray's origins working with weather and climate data (where I think "pixel is point" is more common, which expects the half-pixel offset), not that it's a requirement for representing coordinates in xarray. Anyway, I chose to make top-left corner the default for stackstac because it seemed more consistent to me with most geospatial conventions broadly, rather matching one particular library (rioxarray) but introducing confusion for everyone else. But the As for Ultimately both of these are just the defaults that I picked, thinking they'd be the least confusing (for reasons explained above). I'd be very happy to hear others weigh in on whether these defaults also seem good to them. As for actually changing them, that would be a subtle and significant breaking change for most stackstac users, so I'd definitely be conservative about making any changes. But if there are good reasons to change either of them, it still would be worth doing. |
First, thanks a lot for your support, that's amazing, plus this package in general is making my life so much easier. To be honest I thought using stackstac with rioxarray is what most of us do but I might be taking false assumptions. If it is the case I would say xy_coords="center" would be more suited as default, all scripts I've seen have missed this specificity and it could create issue down the line with cliping for example. Regarding snap bounds, I love the functionnality, but in the context of sentinel 2 above it shifts the whole scene by a pixel. Which also can be confusing for most users and again I haven't seen much script online editing such value so people might also be getting surprised. In my case, I'm happy I'm aware, and I hope having this issue created might help people facing similar issue. You definitively have a better view on if it should be changed or not. |
@gjoseph92 I would like to disagree with your assessment regarding lack of convention for coordinates in xarray. Here is a simple notebook example you can try out: import xarray as xr
import numpy as np
mode = "center"
N = 3
X = Y = np.linspace(0, N-1, N)
if mode == "center":
X = X + 0.5
Y = Y + 0.5
pix = np.zeros((len(Y), len(X)))
pix[1, 1] = 1
xr.DataArray(pix, coords={'x': X, 'y': Y}).plot.imshow(size=2); But without adding The coordinates returned by rioxarray and odc-stac are not like that because of GDAL, they are like that because that's what xarray library expects to see. |
Thanks for jumping in here @Kirill888. That's a good point. Unfortunately, it looks like indexing also assumes pixel-center coordinates: import xarray as xr
import numpy as np
N = 3
X = Y = np.linspace(0, N-1, N)
pix = np.zeros((len(Y), len(X)))
pix[1, 1] = 1
arr_tl = xr.DataArray(pix, coords={'y': Y, 'x': X})
arr_c = xr.DataArray(pix, coords={'y': Y + 0.5, 'x': X + 0.5}) arr_c.sel(x=0.9, y=1, method='nearest')
# <xarray.DataArray ()>
# array(0.)
# Coordinates:
# y float64 1.5
# x float64 0.5
arr_tl.sel(x=0.9, y=1, method='nearest')
# <xarray.DataArray ()>
# array(1.)
# Coordinates:
# y float64 1.0
# x float64 1.0 You'd expect (0.9, 1) to fall within a 0 pixel. That's a strong argument for changing the default. One place I can think of so far as being confusing is when you're getting coordinates back out of xarray. For example, maybe you're interested in the points with NDVI greater than some threshold to pass into some other analysis. flat = arr_c.stack(px=["x", "y"], create_index=False)
flat[flat > 0].to_dataframe(name='value')
# y x value
# px
# 0 1.5 1.5 1.0 Using pixel-center coordinates gives you, of course, the coordinates of the pixel centers. If you're not aware of this xarray convention and are used to conventions in other geospatial tools, or want to pass these coordinates into another tool, you might end up with a half-pixel offset. Top-left labeled coordinates avoids this. flat = arr_tl.stack(px=["x", "y"], create_index=False)
print(flat[flat > 0].to_dataframe(name='value'))
# y x value
# px
# 0 1.0 1.0 1.0 I think the indexing issue might outweigh getting coordinates out, though. |
looks like xarray is consistent with pixel coordinate notation 🥳
It's a common problem when dealing with image analysis in general. Image can be viewed as a multi-dimensional array (usually 2 or 3 dims), It's not an easy problem to resolve when you have a lot of code that assumes certain convention, possibly different in different places, in a library with a long history, my request to document pixel coordinate convention remains open... I personally feel that "top left corner of the top left pixel is |
Default stackstac.stack parameters are (in my experience) for all stac catalogs I tried providing incorrect pixel coordinates.
First here is an example showing up the shift created by stackstac.stack default parameters, should we default 'snap_bounds' to False & 'xy_coords' to "center" instead?
Let's gather some data
Then do some plots
Please confirm it's not only me. It seems like the convention default is confusing right?
The text was updated successfully, but these errors were encountered: