Handle reading multi-band datasets #146

gjoseph92 · 2022-04-23T00:23:59Z

Logic for reading multi-band assets on the dask side. Actually figuring out the number of bands from STAC metadata will be a separate PR, so even after this is done and merged, multi-band assets won't be supported yet, but we'll be a lot closer. I'm hoping that someone else (@TomAugspurger?) could handle the STAC side of things.

This is incomplete and untested; it won't actually work right now. Just wanted to share progress towards #62.

- Think it's piped through in `to_dask` and readers
- Needs tests
- Should `bands` list even go in the asset table? It's kinda redundant. But maybe no worse than inlining it in every task.
- Custom chunks still feels awkward. `prepare` obvs doesn't actually figure out band counts from STAC yet. `to_coords` will also need to be updated to handle this.

gjoseph92 · 2022-04-23T00:25:13Z

stackstac/prepare.py

@@ -321,7 +331,25 @@ def prepare_items(
                    )

            # Phew, we figured out all the spatial stuff! Now actually store the information we care about.
-            asset_table[item_i, asset_i] = (asset["href"], asset_bbox_proj)
+
+            bands: Optional[Sequence[int]] = None


FYI, this is where we'd actually figure out band counts from STAC metadata
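
As a rough illustration of what that step might look like, here is a minimal sketch. It assumes band metadata is present on the asset under `raster:bands` or `eo:bands` (fields from the STAC raster and eo extensions), and that band indices are 1-based as in GDAL/rasterio. The helper name `bands_from_asset` is hypothetical and not part of this PR; how stackstac will actually do this is left to the follow-up PR.

```python
from typing import Any, Mapping, Optional, Sequence


def bands_from_asset(asset: Mapping[str, Any]) -> Optional[Sequence[int]]:
    """Hypothetical helper: guess band indices from a STAC asset dict.

    Returns None when the metadata says nothing about bands, which matches
    the `bands = None` default in the diff above.
    """
    # Assumption: each extension field holds one entry per band in the file.
    for key in ("raster:bands", "eo:bands"):
        band_meta = asset.get(key)
        if band_meta:
            # Assumption: 1-based indices, as GDAL/rasterio use.
            return list(range(1, len(band_meta) + 1))
    return None
```

An asset with three `raster:bands` entries would give `[1, 2, 3]`, and an asset with neither field would give `None`.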

gjoseph92 · 2022-04-23T00:25:44Z

stackstac/reader_protocol.py

        self.dtype = dtype
+        self.ndim = len(bands) if bands is not None else 1


Oof, `ndim` is not the right name for this, since this is just the length of one dimension... it was a late night.

gjoseph92 · 2022-04-23T00:31:31Z

stackstac/to_dask.py

+    return norm, asset_table_band_chunks
+
+
+def process_multiband_chunks(


There's probably an easier/more legible way to implement this function, but this turned out to be the crux of implementing multi-band. We really don't want two different chunks to source data from the same multi-band asset (different bands going to different chunks), because that would no longer be a blockwise operation (from asset table -> full array). It would therefore require rewriting the asset table and opening the same dataset twice, and it would probably perform badly anyway when bands are interleaved. It seemed easier to just validate that this situation doesn't occur than to support it.

If you have an asset with lots of bands which aren't stored interleaved, I could definitely see wanting different chunks per band. I'm just not sure how common that is. It would be helpful to know if this seems like a common use-case @TomAugspurger.
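
To make the validation idea above concrete, here is a minimal sketch. It is not the body of `process_multiband_chunks`, which isn't shown here. The function name `validate_band_chunks` and its two arguments are assumptions for illustration: a list of how many bands each asset contributes, and dask-style chunk sizes along the band axis.

```python
from itertools import accumulate
from typing import Sequence


def validate_band_chunks(
    asset_band_counts: Sequence[int], band_chunks: Sequence[int]
) -> None:
    """Raise ValueError if any chunk boundary falls inside an asset's bands.

    asset_band_counts: bands contributed by each asset, in output order.
    band_chunks: chunk sizes along the band axis (dask-style).
    """
    if sum(band_chunks) != sum(asset_band_counts):
        raise ValueError("band chunks do not add up to the total band count")

    # Band offsets at which one asset ends and the next one begins.
    asset_edges = set(accumulate(asset_band_counts))

    # Every chunk must end exactly where some asset ends; otherwise two
    # chunks would need to read different bands of the same asset.
    for edge in accumulate(band_chunks):
        if edge not in asset_edges:
            raise ValueError(
                f"chunk boundary at band {edge} splits a multi-band asset"
            )
```

With one 3-band asset followed by one 1-band asset, chunks of `(3, 1)` or `(4,)` pass, while `(2, 2)` raises because the first chunk would end in the middle of the 3-band asset.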

These docs describe the three methods for organizing multiple bands, two of which use interleaving: https://desktop.arcgis.com/en/arcmap/10.3/manage-data/raster-and-images/bil-bip-and-bsq-raster-files.htm

And a shorter synopsis of each: https://www.l3harrisgeospatial.com/docs/enviimagefiles.html

- BIP is common for hyperspectral datasets like ASTER.
- BIL is, I think, the most common format. Landsat comes in BIL.
- BSQ seems to be less common. Legacy sats like SPOT were distributed in BSQ (https://www.loc.gov/preservation/digital/formats/fdd/fdd000306.shtml), but I'm not certain how common BSQ is.
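
For anyone less familiar with the three layouts, a small sketch of the order in which values land on disk may help. It uses a made-up 2-band, 2x2 image in plain Python lists; the value names and helper functions are illustrative only and unrelated to stackstac's code.

```python
# A tiny image indexed as pixels[band][row][col].
pixels = [
    [["r00", "r01"], ["r10", "r11"]],  # band 0
    [["g00", "g01"], ["g10", "g11"]],  # band 1
]


def bsq(img):
    """Band sequential: all of band 0, then all of band 1 (no interleaving)."""
    return [v for band in img for row in band for v in row]


def bil(img):
    """Band interleaved by line: for each row, that row from every band."""
    n_rows = len(img[0])
    return [v for r in range(n_rows) for band in img for v in band[r]]


def bip(img):
    """Band interleaved by pixel: for each pixel, its value in every band."""
    n_rows, n_cols = len(img[0]), len(img[0][0])
    return [
        band[r][c]
        for r in range(n_rows)
        for c in range(n_cols)
        for band in img
    ]
```

In BSQ each band is one contiguous run, so reading a single band touches one region of the file. In BIL and BIP the bands are mixed together, so reading one band means skipping over the others, which is why splitting an interleaved asset's bands across chunks is likely to be slow.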

rbavery Sep 8, 2022
