This repository has been archived by the owner on Sep 26, 2023. It is now read-only.

Add new VEDA dataset: Annual land cover maps for 2001 and 2020 #226

Closed
7 of 10 tasks
dfelikson opened this issue Nov 9, 2022 · 42 comments

@dfelikson
Collaborator

dfelikson commented Nov 9, 2022

  • Identify the point of contact and ensure someone is providing them updates: @dfelikson and @j08lue

  • Identify data location:

    • s3://veda-data-store-staging/EIS/COG/coastal-flooding-and-slr/MODIS_LC_2001_BD.cog.tif
    • s3://veda-data-store-staging/EIS/COG/coastal-flooding-and-slr/MODIS_LC_2020_BD.cog.tif
  • Number of items: 2

  • Notes: This will be displayed with a slider so the user can contrast the two maps.

  • Verify that files are valid COGs (e.g. with rio cogeo validate): @dfelikson verified with rio cogeo validate

  • Gather STAC collection metadata

    • id: modis-annual-ld-2001-2020
    • title: Annual land cover maps for 2001 and 2020
    • description: The annual land cover maps of 2001 and 2020 were derived from the combined Moderate Resolution Imaging Spectroradiometer (MODIS) Annual Land Cover Type dataset (MCD12Q1 V6, dataset link: https://lpdaac.usgs.gov/products/mcd12q1v006/). The source product provides global land cover types at yearly intervals (2001-2020) at 500-meter resolution under six different land cover classification schemes. Of these six schemes, the International Geosphere–Biosphere Programme (IGBP) land cover classification was selected and further simplified to the dominant land cover classes (water, urban, cropland, native vegetation) for the two years to illustrate changes in land use and land cover across the country (Bangladesh).
    • license: Creative Commons Zero (CC0-1.0)
    • temporal interval:
      • MODIS_LC_2001_BD.cog.tif: {"start_datetime": "2001-01-01T00:00:00+00:00", "end_datetime": "2001-12-31T23:59:00+00:00"}
      • MODIS_LC_2020_BD.cog.tif: {"start_datetime": "2020-01-01T00:00:00+00:00", "end_datetime": "2020-12-31T23:59:00+00:00"}
    • whether it is periodic on the dashboard (periodic = regular time series of layers without gaps): false
    • the dashboard time density: none
  • Review and follow https://github.com/NASA-IMPACT/cloud-optimized-data-pipelines/blob/main/OPERATING.md

  • Open PR for publishing those datasets to the Staging API:

  • Notify QA / move ticket to QA state

  • Once approved, merge and close.
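The temporal intervals in the metadata above can be derived from the year embedded in each filename instead of being typed by hand, which guards against copy-paste slips between the two items. A minimal sketch, assuming the filenames keep the `MODIS_LC_<year>_BD` pattern; `FILENAME_RE` and `annual_interval` are hypothetical names and not the regex or code the ingestion pipeline actually uses:

```python
import re
from datetime import datetime, timezone

# Hypothetical filename pattern (assumption); the real ingestion regex lives in
# the step-function JSON referenced later in this thread.
FILENAME_RE = re.compile(r"MODIS_LC_(?P<year>\d{4})_BD(?:_v\d+)?(?:\.cog)?\.tif$")


def annual_interval(filename: str) -> dict:
    """Return start/end datetimes covering the calendar year in the filename."""
    match = FILENAME_RE.search(filename)
    if match is None:
        raise ValueError(f"unexpected filename: {filename}")
    year = int(match.group("year"))
    start = datetime(year, 1, 1, 0, 0, tzinfo=timezone.utc)
    end = datetime(year, 12, 31, 23, 59, tzinfo=timezone.utc)
    # isoformat() yields e.g. "2001-01-01T00:00:00+00:00", the format used above.
    return {"start_datetime": start.isoformat(), "end_datetime": end.isoformat()}


for name in ["MODIS_LC_2001_BD.cog.tif", "MODIS_LC_2020_BD.cog.tif"]:
    print(name, annual_interval(name))
```

The same function accepts the later `_v2` and plain `.tif` variants of the names discussed in this thread.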

Resources on metadata

If not already familiar with these conventions for generating STAC collection and item metadata:
- Collections: NASA-IMPACT/veda-backend#29 and STAC version 1.0 specification for collections
- Items: NASA-IMPACT/veda-backend#28 and STAC version 1.0 specification for items
- NOTE: The delta-backend instructions are specific to datasets for the climate dashboard, however not all datasets are going to be a part of the visual layers for the dashboard so you can ignore the instructions that are specific to "dashboard" extension, "item_assets" in the collection and "cog_default" asset type in the item.

@anayeaye
Contributor

anayeaye commented Nov 9, 2022

@dfelikson can these files be reprocessed/re-uploaded with the .tif extension instead of .COG? gdalinfo is able to read the data, but these files are not working seamlessly with our raster services. I think the problem is just the extension 🤞.

https://staging-raster.delta-backend.com/cog/viewer?url=s3://veda-data-store-staging/EIS/COG/coastal-flooding-and-slr/MODIS_LC_2001_BD.COG

https://staging-raster.delta-backend.com/cog/validate?url=s3://veda-data-store-staging/EIS/COG/coastal-flooding-and-slr/MODIS_LC_2001_BD.COG

@dfelikson
Collaborator Author

@anayeaye - done! Files were renamed to *.cog.tif.

@j08lue
Contributor

j08lue commented Nov 9, 2022

The usual way is to have just .tif, not .cog.tif, but as long as the ending is right, it works.

vlulla added a commit that referenced this issue Nov 9, 2022
@vlulla vlulla self-assigned this Nov 10, 2022
@vlulla
Contributor

vlulla commented Nov 14, 2022

@dfelikson is there any specific cover image you want to use for this dataset? Also, are there specific colors you would like to use for the various classes in this dataset? The Netlify preview link shows the colors I selected for the preview.

@j08lue j08lue added this to the EIS Coastal Risk discovery milestone Nov 14, 2022
@dfelikson
Collaborator Author

@vlulla - the data isn't showing up for me on the netlify preview for some reason. I'll get back to you about the cover image.

@vlulla
Contributor

vlulla commented Nov 15, 2022

@dfelikson since the data extent is so small, you will have to click the marker on Bangladesh to zoom to a level where the data shows up. Do you see the marker in/near Bangladesh?

@dfelikson
Collaborator Author

Thanks @vlulla! That worked and I see the data. Here's the info I got for this dataset:

  • Classes are numbered as follows: 0 = missing cells (some of the cells have missing data), 100 = Native vegetation, 200 = Open water, 300 = Urban areas, 400 = Cropland
  • Suggested colors: cropland = light green; native vegetation = dark green; open water = some blue tone; missing data = white or super light grey; urban areas = red

Could you help us adjust the colors? And is there a way to label the colors using the descriptions above?
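The class codes and suggested colors above map naturally onto a categorical colormap plus a legend that carries the class descriptions. A minimal sketch, assuming a TiTiler-style `colormap` query parameter that takes a JSON object of `{value: [r, g, b, a]}`; the hex values are illustrative picks for the described tones, and `LAND_COVER_CLASSES`, `build_colormap`, and `build_legend` are hypothetical names:

```python
import json
from urllib.parse import quote

# Class codes as listed above; hex colors are illustrative choices (assumption)
# for "white or super light grey", "dark green", "blue", "red", "light green".
LAND_COVER_CLASSES = {
    0: ("Missing data", "#f0f0f0"),
    100: ("Native vegetation", "#1b5e20"),
    200: ("Open water", "#1e88e5"),
    300: ("Urban areas", "#e53935"),
    400: ("Cropland", "#a5d6a7"),
}


def hex_to_rgba(hex_color: str, alpha: int = 255) -> list:
    """Convert '#rrggbb' to an [r, g, b, a] list of 0-255 integers."""
    value = hex_color.lstrip("#")
    return [int(value[i:i + 2], 16) for i in (0, 2, 4)] + [alpha]


def build_colormap(classes: dict) -> dict:
    """Map each class value (string key, as JSON requires) to an RGBA list."""
    return {str(code): hex_to_rgba(color) for code, (_, color) in classes.items()}


def build_legend(classes: dict) -> list:
    """Pair each class value and color with its human-readable label."""
    return [
        {"value": code, "label": label, "color": color}
        for code, (label, color) in sorted(classes.items())
    ]


colormap = build_colormap(LAND_COVER_CLASSES)
# Assumption: the tiler accepts the JSON-encoded colormap as a query parameter.
query = "colormap=" + quote(json.dumps(colormap))
print(query)
print(build_legend(LAND_COVER_CLASSES))
```

Keeping labels and colors in one table means the legend and the rendered tiles cannot drift apart.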

@vlulla
Contributor

vlulla commented Nov 15, 2022

@dfelikson I ran into lots of issues when I tried setting the colors you mentioned. Attached is a gif demonstrating some issues I found while trying to view the COG with the color scheme you recommended (it shows me scrolling in my local session, and the behavior suggests some issues with the overviews).

[Animated GIF: cog-issues-20221115]

So I asked my more experienced colleagues, @moradology and @ividito, to help me with this, and after a couple of hours of deep dive they concluded that there appear to be some issues with the COG itself. Below is what we did to check for underlying issues with the COG; @moradology and @ividito can fill in any important details from our explorations that I have missed. Here's what we tried:

  1. Running the command rio cogeo validate MODIS_LC_2001_BD.cog.tif yields this:

    ~/Downloads % rio cogeo validate MODIS_LC_2001_BD.cog.tif
    /Users/vlulla/mambaforge/lib/python3.10/site-packages/rasterio/__init__.py:304: NotGeoreferencedWarning: Dataset has no geotransform, gcps, or rpcs. The identity matrix will be returned.
      dataset = DatasetReader(path, driver=driver, sharing=sharing, **kwargs)
    The following warnings were found:
    - The file is greater than 512xH or 512xW, it is recommended to include internal overviews
    
    The following errors were found:
    - The file is greater than 512xH or 512xW, but is not tiled
    - The offset of the main IFD should be < 300. It is 2721096 instead
    - The offset of the first bloc
  2. So, we tried gdalinfo MODIS_LC_2001_BD.cog.tif, which produced this:

    Driver: GTiff/GeoTIFF
    Files: MODIS_LC_2001_BD.cog.tif
           MODIS_LC_2001_BD.cog.tif.aux.xml
    Size is 1037, 1312
    Image Structure Metadata:
      INTERLEAVE=BAND
    Subdatasets:
      SUBDATASET_1_NAME=GTIFF_DIR:1:MODIS_LC_2001_BD.cog.tif
      SUBDATASET_1_DESC=Page 1 (1037P x 1312L x 1B)
      SUBDATASET_2_NAME=GTIFF_DIR:2:MODIS_LC_2001_BD.cog.tif
      SUBDATASET_2_DESC=Page 2 (518P x 656L x 1B)
      SUBDATASET_3_NAME=GTIFF_DIR:3:MODIS_LC_2001_BD.cog.tif
      SUBDATASET_3_DESC=Page 3 (259P x 328L x 1B)
    Corner Coordinates:
    Upper Left  (    0.0,    0.0)
    Lower Left  (    0.0, 1312.0)
    Upper Right ( 1037.0,    0.0)
    Lower Right ( 1037.0, 1312.0)
    Center      (  518.5,  656.0)
    Band 1 Block=1037x496 Type=UInt16, ColorInterp=Gray

From this it appears that the tif is not a cloud optimized geotiff. Can you please verify that the command rio cogeo validate MODIS_LC_2001_BD.cog.tif states that it's a valid cog?
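For context on the "offset of the main IFD" error above: a COG places its Image File Directories (IFDs) near the start of the file so a client can learn the layout from one small range request, and rio cogeo flags main-IFD offsets of 300 bytes or more. A minimal stdlib-only sketch (the function names are hypothetical) that reads that offset directly from a TIFF or BigTIFF header, which also makes it easy to compare two copies of the "same" file:

```python
import struct


def main_ifd_offset(header: bytes) -> int:
    """Return the byte offset of the first (main) IFD from a TIFF/BigTIFF header."""
    byte_order = header[:2]
    if byte_order == b"II":
        endian = "<"  # little-endian
    elif byte_order == b"MM":
        endian = ">"  # big-endian
    else:
        raise ValueError("not a TIFF file")
    (magic,) = struct.unpack(endian + "H", header[2:4])
    if magic == 42:  # classic TIFF: 4-byte offset in bytes 4-7
        return struct.unpack(endian + "I", header[4:8])[0]
    if magic == 43:  # BigTIFF: 8-byte offset in bytes 8-15
        return struct.unpack(endian + "Q", header[8:16])[0]
    raise ValueError(f"unknown TIFF magic number {magic}")


def read_main_ifd_offset(path: str) -> int:
    """Read just the first 16 bytes of a file and return its main IFD offset."""
    with open(path, "rb") as f:
        return main_ifd_offset(f.read(16))
```

Running `read_main_ifd_offset("MODIS_LC_2001_BD.cog.tif")` on the downloaded copy should report the same 2721096 that rio cogeo complained about; a file written with a COG layout would return a small value.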

@j08lue
Contributor

j08lue commented Nov 16, 2022

@vlulla
Contributor

vlulla commented Nov 16, 2022

I see. I am still unsure why the complete map does not show up with the defined colors for the 2020 COG when I click, and zoom to, the marker (the colors appear only in patches). I have to zoom in quite a bit before the fully colored map shows up. It's the same thing as in the gif above, but I can show it at our meeting today too. Maybe this is a separate issue?

@dfelikson
Collaborator Author

dfelikson commented Nov 16, 2022

@j08lue and @vlulla - I can't reproduce this issue. I ran rio cogeo validate MODIS_LC_2001_BD.cog.tif and I see: /home/jovyan/MODIS_LC_2001_BD.cog.tif is a valid cloud optimized GeoTIFF with no other warnings or errors. I am also able to visualize this file in QGIS, zoom in and out, without any of the data disappearing. I'm using rio cogeo version 3.3.0.

Is it possible that this is an issue with the way that the map visualization interpolates the data when zooming in/out?

@vlulla
Contributor

vlulla commented Nov 16, 2022

Interesting. For me, rio cogeo --version reports version 3.5.0, and I still get the error I listed above. @dfelikson I'm curious, can you please describe how you set up your Python environment?

@j08lue
Contributor

j08lue commented Nov 16, 2022

Maybe you, @dfelikson, are running the test on the original ("master") copy of the file, while you, @vlulla, downloaded it from S3?

It could be that the file got broken during upload. That would also explain why the online validation fails: https://staging-raster.delta-backend.com/cog/validate?url=s3://veda-data-store-staging/EIS/COG/coastal-flooding-and-slr/MODIS_LC_2001_BD.cog.tif

@dfelikson
Collaborator Author

@j08lue - good thinking. I uploaded a new file here: s3://veda-data-store-staging/EIS/COG/coastal-flooding-and-slr/MODIS_LC_2001_BD_v2.cog.tif. @vlulla or @j08lue - could you take a look at this new version and see if it passes validation for you?

vlulla added a commit to NASA-IMPACT/veda-config that referenced this issue Nov 16, 2022
Colors based on recommendation listed in discussion
at NASA-IMPACT/veda-data-pipelines#226
@j08lue
Contributor

j08lue commented Nov 16, 2022

Wait, my mistake!

Actually, even the first version of the dataset seems to pass COG validation online. Caching or so tricked me - sorry about that - https://staging-raster.delta-backend.com/cog/validate?url=s3://veda-data-store-staging/EIS/COG/coastal-flooding-and-slr/MODIS_LC_2001_BD.cog.tif actually returns this:

{
  "Path": "s3://veda-data-store-staging/EIS/COG/coastal-flooding-and-slr/MODIS_LC_2001_BD.cog.tif",
  "Driver": "GTiff",
  "COG": true,
  "Compression": null,
  "ColorSpace": null,
  "COG_errors": null,
  "COG_warnings": null,
  "Profile": {
    "Bands": 1,
    "Width": 1037,
    "Height": 1312,
    "Tiled": true,
    "Dtype": "uint16",
    "Interleave": "BAND",
    "AlphaBand": false,
    "InternalMask": false,
    "Nodata": 65535,
    "ColorInterp": ["gray"],
    "ColorMap": false,
    "Scales": [1],
    "Offsets": [0]
  },
  "GEO": {
    "CRS": "EPSG:4326",
    "BoundingBox": [88.02591469087191, 20.742099910319755, 92.68367943903164, 26.63504817414382],
    "Origin": [88.02591469087191, 26.63504817414382],
    "Resolution": [0.004491576420597609, -0.004491576420597609],
    "MinZoom": 6,
    "MaxZoom": 8
  },
  "Tags": {
    "Image Metadata": {"AREA_OR_POINT": "Area"},
    "Image Structure": {"INTERLEAVE": "BAND", "LAYOUT": "COG"}
  },
  "Band Metadata": {
    "Band 1": {
      "Description": null,
      "ColorInterp": "gray",
      "Offset": 0,
      "Scale": 1,
      "Metadata": {}
    }
  },
  "IFD": [
    {"Level": 0, "Width": 1037, "Height": 1312, "Blocksize": [512, 512], "Decimation": 0},
    {"Level": 1, "Width": 518, "Height": 656, "Blocksize": [512, 512], "Decimation": 2},
    {"Level": 2, "Width": 259, "Height": 328, "Blocksize": [512, 512], "Decimation": 4}
  ]
}

and the same for v2: https://staging-raster.delta-backend.com/cog/validate?url=s3://veda-data-store-staging/EIS/COG/coastal-flooding-and-slr/MODIS_LC_2001_BD_v2.cog.tif

So the file was not broken online and it is tiled, too (Tiled: true). The block size 512x512 is quite big for a raster only 1037x1312 pixels in size, but that should not be an issue and is the default for rio cogeo.
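To put the block-size remark in numbers: the sketch below recomputes the IFD sizes from the validate output and counts how many 512x512 tiles each level needs. It assumes floor division for the power-of-two overviews, which happens to reproduce the 518x656 and 259x328 sizes reported above (other tools may round up); `overview_levels` is a hypothetical helper:

```python
import math


def overview_levels(width, height, blocksize=512, levels=2):
    """List (level, width, height, factor, tiles) for the full-resolution image
    and each power-of-two overview, using floor division for the sizes."""
    rows = []
    for level in range(levels + 1):
        factor = 2 ** level  # the validate output reports 0 for level 0, i.e. no decimation
        w, h = width // factor, height // factor
        tiles = math.ceil(w / blocksize) * math.ceil(h / blocksize)
        rows.append((level, w, h, factor, tiles))
    return rows


for row in overview_levels(1037, 1312):
    print(row)
```

Level 0 needs a 3x3 grid of 512-pixel blocks (1536x1536 pixels) to hold 1037x1312 pixels, so a fair share of each edge block is padding; that is harmless, just somewhat wasteful for a raster this small.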

@vlulla
Contributor

vlulla commented Nov 16, 2022

@dfelikson we have ingested the v2 tif that you shared. It appears to be a valid cog. I have also made the color changes that you recommended. However, there appears to be some rendering issue with the map when I use these custom colors. I have briefly described the issue at https://github.com/NASA-IMPACT/delta-config/issues/141 .

Additionally, while exploring the validate endpoint, @ividito and I observed that validation fails when we use the staging endpoint (https://staging-raster.delta-backend.com/cog/validate?url=s3://veda-data-store-staging/EIS/COG/coastal-flooding-and-slr/MODIS_LC_2001_BD_v2.cog.tif) but passes when we use the dev endpoint (https://dev-raster.delta-backend.com/cog/validate?url=s3://veda-data-store-staging/EIS/COG/coastal-flooding-and-slr/MODIS_LC_2001_BD_v2.cog.tif).

We are unsure what is causing this issue, and with the demo so close we were very reluctant to modify anything at this late stage. We will definitely look into this strange behavior after the demo.

cc @j08lue

@j08lue
Contributor

j08lue commented Nov 16, 2022

I should stop guessing. Can we look at the tiler logs and see what is happening? First of all, do the requests fail or return transparent images?

Just one note: for categorical maps, you need to make sure to use mode (or nearest-neighbor) resampling when generating the tiles from the original data, and also when generating overviews when you make the COG.

rio cogeo uses nearest by default for overviews (--overview-resampling, https://cogeotiff.github.io/rio-cogeo/CLI/), and so does TiTiler (&resampling=nearest, https://developmentseed.org/titiler/endpoints/cog/). So unless you override that somewhere, that is not the issue...
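Why the resampling method matters for categorical data: averaging-style methods blend class codes into numbers that belong to no class, and a colormap keyed on 0/100/200/300/400 has no color for them, so they plausibly render as gaps. A minimal pure-Python sketch that downsamples 2x2 blocks three ways; `downsample`, `mode_reducer`, and `nearest_reducer` are hypothetical names, and "nearest" is simplified here to "keep the top-left pixel":

```python
from collections import Counter
from statistics import mean


def downsample(grid, reducer):
    """Downsample a 2-D list by 2 in each direction, reducing each 2x2 block."""
    out = []
    for r in range(0, len(grid) - 1, 2):
        row = []
        for c in range(0, len(grid[0]) - 1, 2):
            block = [grid[r][c], grid[r][c + 1], grid[r + 1][c], grid[r + 1][c + 1]]
            row.append(reducer(block))
        out.append(row)
    return out


def mode_reducer(block):
    # Most common class; ties go to the value encountered first.
    return Counter(block).most_common(1)[0][0]


def nearest_reducer(block):
    # Simplification: keep the top-left sample of each block.
    return block[0]


# 100 = native vegetation, 200 = open water, 300 = urban, 400 = cropland
land_cover = [
    [100, 100, 200, 400],
    [100, 300, 400, 400],
]

print(downsample(land_cover, mean))            # blended codes that are not classes
print(downsample(land_cover, mode_reducer))    # always a real class
print(downsample(land_cover, nearest_reducer)) # also a real class, but less representative
```

The mean yields 150 and 350, neither of which is a class code, while mode keeps the dominant class of each block (100 and 400).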

@vlulla
Contributor

vlulla commented Nov 17, 2022

@j08lue
Contributor

j08lue commented Nov 21, 2022

@dfelikson to fix the issue with gaps in the rendered image, we need to re-compute the overviews in the two files, this time with a resampling method that preserves the classes, like mode or nearest.

E.g.

rio cogeo create --overview-resampling "mode" <file.tif> <file-fixed.tif>

Maybe we could simply replace the ingested COGs with new ones; then no changes to the ingested records are required. But new files are fine, too. Perhaps name them like the original files, but without the .cog part, e.g. s3://veda-data-store-staging/EIS/COG/coastal-flooding-and-slr/MODIS_LC_2001_BD.tif.

moradology pushed a commit that referenced this issue Nov 22, 2022
@j08lue
Contributor

j08lue commented Dec 1, 2022

@dfelikson kindly shared the land cover files with us here. We can perhaps transform them to COGs and upload them ourselves to see whether things work then?

@vlulla
Contributor

vlulla commented Dec 1, 2022

These are the original tif and the COGs in QGIS. I don't understand why there are values of 15 in the original dataset; I thought this dataset was supposed to contain only multiples of 100, which represent the land cover categories. It appears that this value of 15 causes the rendering issue, even in QGIS, but only for the COG tif (the original tif shows no gaps).

Original tif

[Screenshot: original tif in QGIS]

COG tif

[Screenshot: COG tif in QGIS]

Modified COG tif

[Screenshot: modified COG tif in QGIS]

@dfelikson
Collaborator Author

Thanks for helping look into this, @vlulla! I don't know why there are values of 15 in the original dataset, either. The original creator (Augusto Getirana) is out of office until the 5th but I'll ask him once he's back.

@nbiswasuw

Hi @dfelikson and @vlulla,
Thanks for pointing it out.
I am sorry that you had to go through this because of the unintended value in the raster. Value 15 in the classification raster denotes permanent snow/ice. There is no snow/ice in Bangladesh, so I left 15 out of the classification scheme when reclassifying. After some investigation, I found that there are only 2 pixels with value 15 in both rasters.

As a solution, you can either merge them into class 100 or ignore them. Please let me know if you want me to do it on my side.
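The first option, folding the stray snow/ice code into class 100, amounts to a value remap applied before the COG is built. A minimal sketch on a flat list of pixel values, assuming the `REMAP` table below (hypothetical; with rasterio or numpy the same mapping would be applied to the band array):

```python
# Hypothetical remap table (assumption): fold IGBP class 15 (permanent snow/ice)
# into class 100 (native vegetation), one of the two options suggested above.
REMAP = {15: 100}


def reclassify(values, remap=REMAP):
    """Return (remapped copy of a flat list of pixel values, number of pixels changed)."""
    remapped = [remap.get(v, v) for v in values]
    changed = sum(1 for old, new in zip(values, remapped) if old != new)
    return remapped, changed


pixels = [0, 100, 15, 200, 300, 400, 15]
fixed, n_changed = reclassify(pixels)
print(fixed, n_changed)
```

Reporting the number of changed pixels gives a quick sanity check against the two stray pixels found above.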

@vlulla
Contributor

vlulla commented Dec 5, 2022

Hi @nbiswasuw and @dfelikson,

While I could fix this for the land cover dataset, I am reluctant to make these changes myself, for these reasons:

  • A one-off fix seems fragile to me, because the workflow that generated this classification might produce other incompatible values (possibly for other classes in the future?).
  • And, more importantly, I do not know how/if the validation steps we are currently considering for figuring out what is acceptable data to ingest can capture (or flag?) issues of this nature. I don't think this is a validation issue as much as it is a conceptual issue.

It was only by chance that I stumbled upon this data issue when I opened this land cover image in QGIS after a lot of exasperation! I'm not sure I'll be this lucky again. Therefore, can you please fix this issue on your end?
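On the question of whether validation could flag issues like this: a pre-ingest check that compares the values actually present in a categorical raster against its documented legend would have caught the stray 15. A minimal sketch over a flat sequence of pixel values (reading the band, e.g. with rasterio's `read(1)`, is outside the sketch); `ALLOWED_VALUES` and `unexpected_values` are hypothetical names, and the nodata value 65535 comes from the validate output earlier in this thread:

```python
from collections import Counter

# Class codes agreed in this thread plus the file's nodata value (65535).
ALLOWED_VALUES = {0, 100, 200, 300, 400}
NODATA = 65535


def unexpected_values(pixels, allowed=ALLOWED_VALUES, nodata=NODATA):
    """Return a Counter of pixel values that are neither an allowed class nor nodata."""
    counts = Counter(pixels)
    return Counter({v: n for v, n in counts.items() if v not in allowed and v != nodata})


sample = [0, 100, 100, 15, 200, 65535, 300, 400, 15]
problems = unexpected_values(sample)
if problems:
    print("unexpected class values:", dict(problems))
```

This only needs the legend to be declared alongside the dataset, which the colormap discussion above already provides.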

@nbiswasuw

OK, thanks for your reply, @vlulla.
I have made changes to the original geotiff, do you want me to share the latest geotiff or the COG? Which one will be the most convenient for you?

@vlulla
Contributor

vlulla commented Dec 6, 2022

@nbiswasuw it would be great if you could share both the GeoTIFF and the COG. I'd like to use your COG to verify that my cogification is not creating any unexpected issues. Thanks in advance.

@nbiswasuw

Would you mind grabbing the datasets from this link?

@vlulla
Contributor

vlulla commented Dec 6, 2022

I downloaded the GeoTIFFs and the COG and am surprised to see that there are still rendering issues with the COG. When I zoom in, the classes fill in, but when I zoom to the layer the image does not render correctly. It appears that my earlier suspicion was wrong, sorry! rio cogeo validate MODIS_LC_2020.v3.COG reports that it's a valid cloud optimized GeoTIFF, and I am completely stumped as to what could be causing these issues. Do both of these images render correctly in ArcMap?

COG

[Screenshot: COG in QGIS]

Geotiff

[Screenshot: GeoTIFF in QGIS]

@nbiswasuw

This is what I am seeing in ArcGIS.

@vlulla
Contributor

vlulla commented Dec 6, 2022

From the layer name it appears that this is the regular geotiff. The geotiff renders correctly for me too...it's the cog that has the rendering issue. Can you please verify that the cog renders correctly on arcgis? Sorry for the hassle.

@j08lue
Contributor

j08lue commented Dec 6, 2022

Re-posting this, just in case: rio cogeo create --overview-resampling "mode" <file.tif> <file-fixed.tif>

Btw, you do not need any .cog.tif extension. .tif is fine (actually nicer for our asset names in STAC), also for COGs.

@nbiswasuw

I can see the COG in ArcGIS as well, without any issue.

@vlulla
Contributor

vlulla commented Dec 7, 2022

Interesting. It appears that creating a cog with --overview-resampling "mode" did the trick! Thanks for the reminder @j08lue ! I will try to ingest these images after I get answers to a couple of questions (primarily naming and workflow related) from the other members of the team.

@vlulla
Contributor

vlulla commented Dec 7, 2022

After speaking with @ividito and @smohiudd, it appears that the easiest and quickest fix for this rendering issue might be to replace the old COGs in the S3 bucket with the newer ones, using the same names. This way the dashboard will pick up the newer COGs and hopefully the rendering issues will be resolved. Attached are the COGs, which render correctly in QGIS; I'm hoping they work on the dashboard too. And, even though you already know it, the paths (see footnote 1) for these COGs are:

s3://veda-data-store-staging/EIS/COG/coastal-flooding-and-slr/MODIS_LC_2001_BD_v2.cog.tif

s3://veda-data-store-staging/EIS/COG/coastal-flooding-and-slr/MODIS_LC_2020_BD.cog.tif

So @dfelikson, do you want to try out this simpler method and see if this solves our problem? If so, here's the zip with the cogs created using @j08lue 's recommendation: bd-cogs-20221207.zip

Footnotes

  1. The regex used for this ingestion https://github.com/NASA-IMPACT/veda-data-pipelines/blob/main/data/step_function_inputs/bangladesh-landcover-2001-2020.json

@dfelikson
Collaborator Author

@vlulla - just to make sure I understand correctly ... you'd like me to take the COGs from bd-cogs-20221207.zip and upload them to s3://veda-data-store-staging/EIS/COG/coastal-flooding-and-slr/ using the same filenames as before, correct?

I can try to do that but last time I tried overwriting the files on the S3 bucket, I don't think it actually overwrote with my new files. That's why I used "_v2" in the filename for the LC COG.

If you confirm, I'll try to re-upload and we'll see if it actually overwrites this time ...

@vlulla
Contributor

vlulla commented Dec 8, 2022

@dfelikson you are correct. If uploading these new cogs with the same name[s] does not resolve the rendering issue, then you can upload a _v3 suffixed COG and we'll make the necessary modifications in the jsons and reingest them.

@dfelikson
Collaborator Author

Alright - MODIS_LC_2001_BD_v2.cog.tif and MODIS_LC_2020_BD.cog.tif in the S3 bucket s3://veda-data-store-staging/EIS/COG/coastal-flooding-and-slr/ have been replaced with the versions from bd-cogs-20221207.zip. When I do aws s3 ls, the timestamps are from today so, hopefully, the files were actually overwritten in the bucket.
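A newer timestamp in `aws s3 ls` shows that an upload happened; comparing content is a stronger check that the bucket now holds the new bytes. A minimal sketch, assuming single-part uploads without SSE-KMS encryption, for which the S3 ETag is the MD5 of the object; the ETag can be fetched with `aws s3api head-object --bucket ... --key ...`. `local_md5` and `etag_matches` are hypothetical helpers:

```python
import hashlib


def local_md5(path, chunk_size=8 * 1024 * 1024):
    """MD5 hex digest of a local file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def etag_matches(local_hex, etag):
    """Compare a local MD5 hex digest with an S3 ETag.

    Returns None for multipart-upload ETags (they contain a '-'), because
    those are not a plain MD5 of the object and cannot be compared this way.
    """
    etag = etag.strip('"')
    if "-" in etag:
        return None
    return etag == local_hex
```

Even when the object itself has been replaced, any cache in front of the tiler may keep serving old tiles for a while, so a matching ETag and a stale-looking map are not necessarily contradictory.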

@vlulla - take a look when you have the chance.

@vlulla
Contributor

vlulla commented Dec 8, 2022

Holy smokes @dfelikson that seems to have done it! Dashboard link (ctrl+click OR cmd+click to open in new tab). Thanks everyone!

Anyway, it is not at all clear to me what I/we did that made this work. The smallest example I can come up with that captures the strangeness of this whole undertaking is this:

$ gdal_translate -of COG -co OVERVIEW_RESAMPLING=mode -co COMPRESS=LZW MODIS_LC_2001.v3.tif tst-with-mode.tif
$ gdal_translate -of COG                              -co COMPRESS=LZW MODIS_LC_2001.v3.tif tst-without-mode.tif
$ ## Nothing's different...but somehow there is a rendering issue in qgis (but not in arcgis) and dashboard rendering
$ diff <(gdalinfo tst-with-mode.tif) <(gdalinfo tst-without-mode.tif)
$ gdal_translate --version
GDAL 3.5.3, released 2022/10/21

It is very likely that I am missing something obvious but I don't know what that is. Anyhow, now that this is done can we close this issue?
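A plausible explanation for the with/without-mode difference, not confirmed in this thread: according to the GDAL COG driver documentation, when no resampling method is specified, overviews of non-paletted images are built with CUBIC resampling (NEAREST is the default only for paletted images). gdalinfo does not appear to report which method built the overviews, which would explain the empty `diff`. Cubic convolution is a weighted sum that includes negative weights, so at a class boundary it produces values that are not class codes and can even overshoot the inputs. The sketch below assumes a Keys cubic kernel with a = -0.5 (the common Catmull-Rom choice); GDAL's implementation details may differ, and the function names are hypothetical:

```python
def keys_cubic_weight(x, a=-0.5):
    """Keys cubic convolution kernel; a = -0.5 is the common Catmull-Rom choice."""
    x = abs(x)
    if x <= 1:
        return (a + 2) * x**3 - (a + 3) * x**2 + 1
    if x < 2:
        return a * x**3 - 5 * a * x**2 + 8 * a * x - 4 * a
    return 0.0


def cubic_midpoints(samples):
    """Interpolate halfway between interior neighbours from the 4 surrounding samples."""
    out = []
    for i in range(1, len(samples) - 2):
        window = samples[i - 1:i + 3]
        # Distances from the midpoint (i + 0.5) to samples i-1, i, i+1, i+2.
        weights = [keys_cubic_weight(d) for d in (1.5, 0.5, -0.5, -1.5)]
        out.append(sum(w * v for w, v in zip(weights, window)))
    return out


# A sharp edge between native vegetation (100) and cropland (400).
edge = [100, 100, 100, 400, 400, 400]
print(cubic_midpoints(edge))  # → [81.25, 250.0, 418.75]
```

None of the three interpolated values is a class code, and 81.25 and 418.75 fall outside the input range entirely, which is consistent with the patchy rendering seen until the overviews were rebuilt with mode.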

@nbiswasuw

Great to see that it is working finally.

@nbiswasuw

Just a suggestion: at this zoom level the map disappears. Is it possible to keep it visible down to the zoom level where we can see the whole country?

Thanks.
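On the zoom question: the validate output earlier in this thread reports "MinZoom": 6 and "MaxZoom": 8 for this file. The rough estimate below (a back-of-the-envelope calculation under stated assumptions, not TiTiler's actual algorithm) shows where numbers like these come from: at about 500 m per pixel the native resolution lands near zoom 8, and the whole 1037x1312 raster fits in roughly one 256-pixel tile near zoom 6. If the dashboard hides a layer below its reported MinZoom, that would explain why it vanishes when zoomed out to the whole country; this is not confirmed here. It assumes a 256-pixel Web Mercator tile grid and equator pixel sizes, and rounding to the nearest zoom is an assumption that happens to reproduce the reported values:

```python
import math

EARTH_CIRCUMFERENCE_M = 2 * math.pi * 6378137  # Web Mercator equator length
TILE_SIZE = 256


def zoom_for_resolution(meters_per_pixel):
    """Fractional Web Mercator zoom whose equator pixel size matches the input."""
    return math.log2(EARTH_CIRCUMFERENCE_M / (TILE_SIZE * meters_per_pixel))


# Values from the validate output above: degrees per pixel and raster size.
res_deg = 0.004491576420597609
width, height = 1037, 1312
res_m = res_deg * EARTH_CIRCUMFERENCE_M / 360  # roughly 500 m at the equator

max_zoom = round(zoom_for_resolution(res_m))
# Zoom at which the whole raster fits in about one 256-pixel tile.
min_zoom = round(zoom_for_resolution(res_m * max(width, height) / TILE_SIZE))
print(round(res_m, 3), max_zoom, min_zoom)
```

Keeping the layer visible at lower zooms would therefore likely mean letting the tiler or dashboard request tiles below the file's native MinZoom (where the overviews still cover the country) rather than changing the data itself.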

@j08lue
Contributor

j08lue commented Dec 12, 2022

I'll take the liberty to close this issue and track further improvements in a separate one.

@j08lue j08lue closed this as completed Dec 12, 2022