Handle reading multi-band datasets #146

Draft
wants to merge 1 commit into
base: main
gjoseph92
Owner

Logic for reading multi-band assets on the dask side. Figuring out the number of bands from STAC metadata will be a separate PR, so even after this is merged, multi-band assets won't be supported yet, but we'll be a lot closer. I'm hoping that someone else (@TomAugspurger?) could handle the STAC side of things.

This is incomplete and untested; it won't actually work right now. Just wanted to share progress towards #62.

- Think it's piped through in `to_dask` and readers
- Needs tests
- Should the `bands` list even go in the asset table? It's kinda redundant, but maybe no worse than inlining it in every task.
- Custom chunks still feels awkward.

`prepare` obvs doesn't actually figure out band counts from STAC yet. `to_coords` will also need to be updated to handle this.
@@ -321,7 +331,25 @@ def prepare_items(
)

# Phew, we figured out all the spatial stuff! Now actually store the information we care about.
asset_table[item_i, asset_i] = (asset["href"], asset_bbox_proj)

bands: Optional[Sequence[int]] = None
Owner Author

FYI, this is where we'd actually figure out band counts from STAC metadata
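As a rough illustration of what that follow-up step might look like (not part of this PR): the sketch below guesses a band list from an asset dict by counting entries under the `raster:bands` or `eo:bands` keys defined by the STAC extensions. The helper name `guess_band_indices` and the choice of 1-based indices are assumptions made for this example, not anything the PR defines.

```python
from typing import Any, Mapping, Optional, Sequence


def guess_band_indices(asset: Mapping[str, Any]) -> Optional[Sequence[int]]:
    """Hypothetical helper: infer band indices for one STAC asset.

    Returns None when the asset looks single-band (or has no band metadata),
    mirroring the `bands: Optional[Sequence[int]] = None` default above.
    """
    # Assumption: prefer `raster:bands`, fall back to `eo:bands`.
    for key in ("raster:bands", "eo:bands"):
        bands = asset.get(key)
        if isinstance(bands, list) and len(bands) > 1:
            # Assumption: 1-based indices, as GDAL-style readers use.
            return list(range(1, len(bands) + 1))
    return None


rgb_asset = {
    "href": "https://example.com/rgb.tif",  # placeholder URL
    "eo:bands": [{"name": "red"}, {"name": "green"}, {"name": "blue"}],
}
print(guess_band_indices(rgb_asset))
```

Whether an `eo:bands` entry count reliably matches the number of bands stored in the file is exactly the kind of question the separate STAC PR would need to settle.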

self.dtype = dtype
self.ndim = len(bands) if bands is not None else 1
Owner Author

Oof, `ndim` is not the right name for this, since this is just the length of one dimension... it was a late night.

return norm, asset_table_band_chunks


def process_multiband_chunks(
Owner Author

There's probably an easier/more legible way to implement this function, but this turned out to be the crux of implementing multi-band. We really don't want two different chunks to source data from the same multi-band asset (different bands to different chunks), because this would no longer be a blockwise operation (from asset table -> full array). It would therefore require rewriting the asset table and opening the same dataset twice, and would probably perform badly anyway when bands are interleaved. It seemed easier to just validate that this situation doesn't occur than to support it.

If you have an asset with lots of bands which aren't stored interleaved, I could definitely see wanting different chunks per band. I'm just not sure how common that is. It would be helpful to know if this seems like a common use-case @TomAugspurger.
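To make the constraint concrete, here is a simplified sketch of that kind of validation. It is not the PR's `process_multiband_chunks`; the function name `validate_band_chunks` and its arguments are made up for illustration. It assumes each asset contributes a known number of bands along the band axis, and it rejects any chunking whose chunk boundaries fall inside an asset.

```python
from itertools import accumulate
from typing import Sequence


def validate_band_chunks(
    asset_band_counts: Sequence[int], band_chunks: Sequence[int]
) -> None:
    """Hypothetical check: no chunk boundary may split one asset's bands.

    asset_band_counts: number of bands each asset contributes, in order.
    band_chunks: chunk sizes along the band axis.
    """
    total = sum(asset_band_counts)
    if sum(band_chunks) != total:
        raise ValueError(
            f"Chunks cover {sum(band_chunks)} bands, but assets provide {total}."
        )
    # Positions along the band axis where one asset ends and the next begins.
    asset_edges = set(accumulate(asset_band_counts))
    for edge in accumulate(band_chunks):
        if edge not in asset_edges:
            raise ValueError(
                f"Chunk boundary at band {edge} splits a multi-band asset."
            )


# Assets with 1, 3, and 1 bands: asset edges fall at positions 1, 4, and 5.
validate_band_chunks([1, 3, 1], [1, 3, 1])  # one chunk per asset: fine
validate_band_chunks([1, 3, 1], [4, 1])     # chunk spans two whole assets: fine
try:
    validate_band_chunks([1, 3, 1], [2, 2, 1])  # boundary at 2 splits the 3-band asset
except ValueError as err:
    print(err)
```

A chunk may cover several whole assets, but never part of one, which is what keeps the mapping from asset table to array blockwise.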

Copy link

These docs describe the three methods for organizing multiple bands, two of which use interleaving: https://desktop.arcgis.com/en/arcmap/10.3/manage-data/raster-and-images/bil-bip-and-bsq-raster-files.htm

And a shorter synopsis of each: https://www.l3harrisgeospatial.com/docs/enviimagefiles.html
- BIP is common for hyperspectral datasets like ASTER.
- BIL, I think, is the most common format. Landsat comes in BIL.
- BSQ seems to be less common. Legacy sats like SPOT were distributed in BSQ: https://www.loc.gov/preservation/digital/formats/fdd/fdd000306.shtml but I'm not certain how common BSQ is.
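For reference, a small sketch of how the three layouts place a pixel in a flat, uncompressed, untiled file, and why reading a single band from an interleaved file is scattered. The function names are illustrative only, and real formats (tiled or compressed GeoTIFFs, for instance) are more complicated than this model.

```python
from typing import List


def bsq_offset(band: int, row: int, col: int, n_bands: int, n_rows: int, n_cols: int) -> int:
    # Band sequential: each band is stored whole, one after another.
    return (band * n_rows + row) * n_cols + col


def bil_offset(band: int, row: int, col: int, n_bands: int, n_rows: int, n_cols: int) -> int:
    # Band interleaved by line: for each row, one line per band.
    return (row * n_bands + band) * n_cols + col


def bip_offset(band: int, row: int, col: int, n_bands: int, n_rows: int, n_cols: int) -> int:
    # Band interleaved by pixel: all bands of a pixel are adjacent.
    return (row * n_cols + col) * n_bands + band


def contiguous_runs(offsets: List[int]) -> int:
    """Count how many separate contiguous reads cover the given offsets."""
    ordered = sorted(offsets)
    return sum(1 for i, o in enumerate(ordered) if i == 0 or o != ordered[i - 1] + 1)


# Toy image: 3 bands, 2 rows, 4 columns. Read all of band 0 in each layout.
shape = dict(n_bands=3, n_rows=2, n_cols=4)
for name, fn in [("BSQ", bsq_offset), ("BIL", bil_offset), ("BIP", bip_offset)]:
    offsets = [fn(0, r, c, **shape) for r in range(2) for c in range(4)]
    print(name, contiguous_runs(offsets))
```

In this toy model, one band is a single contiguous read in BSQ, one read per row in BIL, and one read per pixel in BIP, which is the intuition behind per-band chunks only being attractive for non-interleaved assets.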

2 participants