Evaluate MODIS HDF4 data for use with titiler-cmr #25

Open
abarciauskas-bgse opened this issue Jul 18, 2024 · 7 comments

@abarciauskas-bgse

Earth.gov, which is an instance of VEDA, has requested that we include MODIS data. This dataset, as it exists in Earthdata Cloud, was correctly identified as a candidate for titiler-cmr. However, I am coming to the conclusion that it won't work because the files are in HDF4. My understanding is that no virtual file system (s3fs, vsicurl, vsis3) will work: the HDF4 library does not implement any abstraction for IO, so files must be read from a local file system using the underlying C library.

The only approach that would work is to download entire files and then read and tile them from local storage, which definitely seems like a bad idea. I did test that this works for at least one file, reading with xarray (with the rasterio driver), GDAL, or rasterio. I just want to check that the above conclusion is correct so we can advise the VEDA/earth.gov leads that we may want to take this opportunity to create a cloud-optimized version of this dataset, though that will of course take more time.
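For reference, the download-then-read fallback described above could look roughly like the sketch below. It is not the exact test that was run: the helper names (`local_copy_path`, `read_hdf4_via_local_copy`) and the example URL are hypothetical, `fs` is assumed to be an already-authenticated fsspec filesystem, and rasterio is assumed to be backed by a GDAL build that includes the HDF4 driver.

```python
import tempfile
from pathlib import Path, PurePosixPath
from typing import Optional


def local_copy_path(remote_url: str, cache_dir: str) -> Path:
    """Map a remote granule URL to a file path inside a local cache directory."""
    return Path(cache_dir) / PurePosixPath(remote_url).name


def read_hdf4_via_local_copy(remote_url: str, fs, cache_dir: Optional[str] = None):
    """Download the whole granule, then list its subdatasets with rasterio.

    Assumption: `fs` is an authenticated fsspec filesystem (for example s3fs).
    """
    # Imported lazily; requires a GDAL build that includes the HDF4 driver.
    import rasterio

    cache_dir = cache_dir or tempfile.mkdtemp()
    local_path = local_copy_path(remote_url, cache_dir)
    if not local_path.exists():
        # Full-file download: HDF4 cannot be read through byte-range requests.
        fs.get(remote_url, str(local_path))
    with rasterio.open(local_path) as src:
        # HDF4 granules expose their science datasets as GDAL subdatasets.
        return list(src.subdatasets)
```

Every request pays for a full-file transfer before any pixel is read, which is why this pattern is a poor fit for a dynamic tiler.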

A few other notes:

  • There is a driver for working with HDF4 files, https://github.com/NCAR/pynio, but it is in "maintenance mode". I don't think it will work, as I think it also assumes local file system storage.
  • There is a vsipreload library that uses MODIS as an example of enabling virtual file IO. However, it appears to have operating system requirements (Linux glibc ONLY), so I would need to launch a virtual instance to test it out.

cc @vincentsarago @wildintellect @sharkinsspatial

@wildintellect

@abarciauskas-bgse which MODIS products? There are many, and many are already on Worldview. Why wouldn't we just tap into the existing web services?

I'm also not so sure about the claim that it won't work with s3fs. Do you mean it won't work at all, or simply that it performs poorly?

This is a MODIS product; @chuckwondo and I tested this yesterday. Note that we used the h5netcdf driver to read the data, not the netcdf4 library.

import xarray as xr
import fsspec

sample_file = 's3://lp-prod-protected/VJ114IMG.002/VJ114IMG.A2024198.0942.002.2024198150736/VJ114IMG.A2024198.0942.002.2024198150736.nc'
s3_fsspec = fsspec.filesystem("s3", profile="maap-data-reader")
test = xr.open_dataset(s3_fsspec.open(sample_file), engine="h5netcdf", phony_dims='sort')

https://github.com/orgs/MAAP-Project/discussions/1031#discussioncomment-10067291

@chuckwondo

FYI, when phony_dims is required, I recommend using "access", not "sort". Using "access" applies the phony dims only upon access of particular arrays, whereas "sort" applies phony dims across the entire hierarchy (if I understand correctly), which means reading all metadata through the hierarchy whether you need it or not.
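Applied to the snippet earlier in the thread, that recommendation might look like the following sketch. The wrapper function `open_with_phony_dims` is hypothetical; the `xr.open_dataset` call mirrors the one already shown, with only the `phony_dims` value changed, and `fs` is assumed to be an authenticated fsspec filesystem.

```python
def open_with_phony_dims(url: str, fs, phony_dims: str = "access"):
    """Open a remote netCDF4/HDF5 file through fsspec with the h5netcdf engine.

    Assumption: `fs` is an authenticated fsspec filesystem, as in the
    earlier snippet (fsspec.filesystem("s3", ...)).
    """
    # Validate first so a typo fails fast, before any remote read happens.
    if phony_dims not in ("access", "sort"):
        raise ValueError(f"phony_dims must be 'access' or 'sort', got {phony_dims!r}")

    # Imported lazily so the helper can be defined without xarray installed.
    import xarray as xr

    return xr.open_dataset(fs.open(url), engine="h5netcdf", phony_dims=phony_dims)
```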

@abarciauskas-bgse

@wildintellect thanks for looking at this issue.

The MODIS product I am evaluating is MCD12Q1 v061: MODIS/Terra+Aqua Land Cover Type Yearly L3 Global 500 m SIN Grid. I just checked, and using the h5netcdf engine with phony dims also does not work for this dataset: https://gist.github.com/abarciauskas-bgse/8d967af117793bead9395020d8c22c48

You can see the error is

ValueError: b'\x0e\x03\x13\x01\x00\x10\x00\x00' is not the signature of a valid netCDF4 file

which is raised from this line: https://github.com/pydata/xarray/blob/main/xarray/backends/h5netcdf_.py#L161. This indicates to me that the product is not a valid NetCDF file. I think this is expected, since the file driver is HDF4/Hierarchical Data Format Release 4, whereas for the VIIRS product it is netCDF/Network Common Data Format.
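The first four bytes in that error message, `\x0e\x03\x13\x01`, are the HDF4 magic number, which is consistent with this reading. A minimal sketch of a signature check follows; the function name `sniff_format` and the returned labels are made up for illustration, and the check is simplified to look only at the start of the file.

```python
HDF4_MAGIC = b"\x0e\x03\x13\x01"
HDF5_MAGIC = b"\x89HDF\r\n\x1a\n"  # netCDF-4 files are HDF5 files
NETCDF3_MAGICS = (b"CDF\x01", b"CDF\x02", b"CDF\x05")


def sniff_format(header: bytes) -> str:
    """Guess a container format from the first bytes of a file.

    Simplification: only offset 0 is checked. An HDF5 signature can also
    sit at later offsets (512, 1024, ...) when a user block is present.
    """
    if header.startswith(HDF5_MAGIC):
        return "HDF5"
    if header.startswith(HDF4_MAGIC):
        return "HDF4"
    if header.startswith(NETCDF3_MAGICS):
        return "netCDF-3"
    return "unknown"


# The bytes reported in the ValueError above:
print(sniff_format(b"\x0e\x03\x13\x01\x00\x10\x00\x00"))
```

In practice the header can be fetched with a single small range request, so the format can be checked before attempting a full open.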

There is a very similar MODIS product on Worldview: https://go.nasa.gov/3zPxN9O. Unfortunately, that product only goes through 2019 and appears to have been decommissioned (MCD12Q1 v006 MODIS/Terra+Aqua Land Cover Type Yearly L3 Global 500 m SIN Grid), with users pointed to the newer product, which is the same one we are investigating here.

Aside: from a bit more reading of the user guide, it appears this is an annual product where each granule represents a different spatial extent.

@wildintellect

@abarciauskas-bgse interesting. If it's an annual product, then there probably aren't that many granules, which means the best option might be to convert the data format.

@abarciauskas-bgse

Well, it's 315 granules a year (I think) over about 20 years, so roughly 6,300 (the actual number is 6,930). But I was implying in that comment that I do think creating annual COGs or a Zarr dataset would be interesting.
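As a quick check of those figures, assuming the 315-granules-per-year estimate holds, the actual total divides evenly into 22 years rather than 20:

```python
granules_per_year = 315   # estimate quoted above ("I think")
reported_total = 6930     # actual granule count quoted above

estimated_total = granules_per_year * 20
implied_years, remainder = divmod(reported_total, granules_per_year)

print(estimated_total)           # 6300
print(implied_years, remainder)  # 22 0
```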

@abarciauskas-bgse

To wrap up this ticket, I will write up a list of options to discuss with the VEDA and earth.gov leads.

@abarciauskas-bgse abarciauskas-bgse self-assigned this Jul 24, 2024
@maxrjones