Handle reading multi-band datasets #146
- Think it's piped through in `to_dask` and readers
- Needs tests
- Should `bands` list even go in the asset table? It's kinda redundant. But maybe no worse than inlining it in every task.
- Custom chunks still feels awkward.

`prepare` obvs doesn't actually figure out band counts from STAC yet. `to_coords` will also need to be updated to handle this.
@@ -321,7 +331,25 @@ def prepare_items(
)

# Phew, we figured out all the spatial stuff! Now actually store the information we care about.
asset_table[item_i, asset_i] = (asset["href"], asset_bbox_proj)

bands: Optional[Sequence[int]] = None
FYI, this is where we'd actually figure out band counts from STAC metadata
self.dtype = dtype
self.ndim = len(bands) if bands is not None else 1
Oof, `ndim` is not the right name for this, since this is just the length of one dimension... it was a late night.
return norm, asset_table_band_chunks


def process_multiband_chunks(
There's probably an easier/more legible way to implement this function, but this turned out to be the crux of implementing multi-band. We really don't want two different chunks to source data from the same multi-band asset (different bands going to different chunks), because that would no longer be a blockwise operation (from asset table -> full array). It would therefore require rewriting the asset table and opening the same dataset twice, and would probably perform badly anyway when bands are interleaved. It seemed easier to just validate that this situation doesn't occur than to support it.
If you have an asset with lots of bands which aren't stored interleaved, I could definitely see wanting different chunks per band. I'm just not sure how common that is. It would be helpful to know if this seems like a common use-case @TomAugspurger.
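To make the constraint concrete, here is a rough sketch of that kind of validation. It is not the code in this PR: `group_assets_by_band_chunk` and its two arguments are hypothetical names, and it assumes every asset contributes at least one band and that assets are laid out back to back along the band axis.

```python
from itertools import accumulate
from typing import List, Sequence


def group_assets_by_band_chunk(
    asset_band_counts: Sequence[int], band_chunks: Sequence[int]
) -> List[List[int]]:
    """Assign each asset to the single band-chunk that contains all its bands.

    Hypothetical helper: raises ValueError if any asset's bands would be
    split across two chunks, which would break the blockwise mapping.
    """
    if sum(asset_band_counts) != sum(band_chunks):
        raise ValueError("band chunks must cover exactly the total number of bands")

    # Exclusive end offset of each chunk along the band axis.
    chunk_ends = list(accumulate(band_chunks))
    groups: List[List[int]] = [[] for _ in band_chunks]

    chunk_i = 0
    start = 0
    for asset_i, n_bands in enumerate(asset_band_counts):
        stop = start + n_bands
        # Advance to the chunk that contains this asset's first band.
        while start >= chunk_ends[chunk_i]:
            chunk_i += 1
        # The asset's last band must land in that same chunk.
        if stop > chunk_ends[chunk_i]:
            raise ValueError(
                f"asset {asset_i} (bands {start}:{stop}) straddles the chunk "
                f"boundary at band {chunk_ends[chunk_i]}"
            )
        groups[chunk_i].append(asset_i)
        start = stop
    return groups
```

With three assets holding 1, 3 and 1 bands, band chunks of `(1, 3, 1)` or `(4, 1)` are accepted, while `(2, 3)` is rejected because it would cut the 3-band asset in two.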
These docs describe the three methods for organizing multiple bands, two of which use interleaving: https://desktop.arcgis.com/en/arcmap/10.3/manage-data/raster-and-images/bil-bip-and-bsq-raster-files.htm
And a shorter synopsis of each: https://www.l3harrisgeospatial.com/docs/enviimagefiles.html
BIP is common for hyperspectral datasets like ASTER.
BIL I think is the most common format. Landsat comes in BIL.
BSQ seems to be less common. Legacy sats like SPOT were distributed in BSQ: https://www.loc.gov/preservation/digital/formats/fdd/fdd000306.shtml but I'm not certain how common BSQ is.
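For anyone less familiar with the three layouts, a small NumPy illustration of why interleaving matters for per-band reads. This is only an in-memory analogy (the array shapes and variable names are made up for the example), not how any particular file format or GDAL driver reads data.

```python
import numpy as np

bands, rows, cols = 3, 4, 5

# BSQ (band sequential): each band is stored as one whole plane.
bsq = np.arange(bands * rows * cols, dtype="uint16").reshape(bands, rows, cols)
# BIL (band interleaved by line): for each row, one line per band.
bil = np.ascontiguousarray(bsq.transpose(1, 0, 2))  # (rows, bands, cols)
# BIP (band interleaved by pixel): for each pixel, all band values together.
bip = np.ascontiguousarray(bsq.transpose(1, 2, 0))  # (rows, cols, bands)

# Pull out band 0 from each layout: same values, different memory access.
band0_bsq = bsq[0]
band0_bil = bil[:, 0, :]
band0_bip = bip[:, :, 0]

# Strides are (bytes to next row, bytes to next column) for the 2D band.
for name, arr in [("BSQ", band0_bsq), ("BIL", band0_bil), ("BIP", band0_bip)]:
    print(name, arr.strides, "contiguous:", arr.flags["C_CONTIGUOUS"])
```

Only the BSQ band is one contiguous block. In BIL each row of the band is contiguous but rows are separated by the other bands' lines, and in BIP every pixel of the band is separated by the other bands' values, which is why splitting bands of an interleaved asset across chunks means touching the same bytes repeatedly.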
Logic for reading multi-band assets on the dask side. Actually figuring out the number of bands from STAC metadata will be a separate PR. So even after this is done and merged, multiband assets won't be supported yet, but we'll be a lot closer. I'm hoping that someone else (@TomAugspurger?) could handle the STAC side of things.
This is incomplete and untested; it won't actually work right now. Just wanted to share progress towards #62.