diff --git a/05-access-data.md b/05-access-data.md index f0fee95d..f1f6b604 100644 --- a/05-access-data.md +++ b/05-access-data.md @@ -39,7 +39,7 @@ new images. Current sensor resolutions and satellite revisit periods are such that terabytes of data products are added daily to the corresponding collections. Such datasets cannot be made accessible to users via full-catalog download. Space agencies and other data providers often offer access to their data catalogs through interactive Graphical User Interfaces (GUIs), -see for instance the [Copernicus Open Access Hub portal](https://scihub.copernicus.eu/dhus/#/home) for the Sentinel +see for instance the [Copernicus Browser](https://browser.dataspace.copernicus.eu) for the Sentinel missions. Accessing data via a GUI is a nice way to explore a catalog and get familiar with its content, but it represents a heavy and error-prone task that should be avoided if carried out systematically to retrieve data. @@ -81,7 +81,10 @@ and the list of assets. What kind of data do the assets represent? ![Views of the Earth Search STAC endpoint](fig/E05/STAC-browser-exercise.jpg){alt="earth-search stac catalog views"} -1. 7 subcatalogs are available, including a catalog for Landsat Collection 2, Level-2 and Sentinel-2 Level 2A (see left screenshot in the figure above). +1. 7 sub-catalogs are available. In the STAC nomenclature, these are actually "collections", i.e. catalogs with +additional information about the elements they list: spatial and temporal extents, license, providers, etc. +Among the available collections, we have Landsat Collection 2, Level-2 and Sentinel-2 Level 2A (see left screenshot in +the figure above). 2. When you select the Sentinel-2 Level 2A collection, and randomly choose one of the items from the list, you should find yourself on a page similar to the right screenshot in the figure above. On the left side you will find a list of the available assets: overview images (thumbnail and true color images), metadata files and the "real" @@ -97,7 +100,7 @@ functionality of searching its items. For the Earth Search STAC catalog the API api_url = "https://earth-search.aws.element84.com/v1" ``` -You can query a STAC API endpoint from Python using the `pystac_client` library: +You can query a STAC API endpoint from Python using the `pystac_client` library. To do so we will import `Client` from `pystac_client`: ```python from pystac_client import Client @@ -105,8 +108,8 @@ from pystac_client import Client client = Client.open(api_url) ``` -In the following, we ask for scenes belonging to the `sentinel-2-l2a` collection. This dataset includes Sentinel-2 -data products pre-processed at level 2A (bottom-of-atmosphere reflectance) and saved in Cloud Optimized GeoTIFF (COG) +Next, we ask for scenes belonging to the `sentinel-2-l2a` collection. This dataset includes Sentinel-2 +data products pre-processed at level 2A (bottom-of-atmosphere reflectance). This data is stored as Cloud Optimized GeoTIFF (COG) format: ```python @@ -128,35 +131,73 @@ by a high-resolution raster can directly access the lower-resolution versions of on the downloading time. More information on the COG format can be found [here](https://www.cogeo.org). ::: -We also ask for scenes intersecting a geometry defined using the `shapely` library (in this case, a point): +In order to get data for a specific location you can add longitude latitude coordinates (World Geodetic System 1984 EPSG:4326) in your request. 
To do so, we use the `shapely` library to define a geometrical point.
+Below we have included a center point for the island of Rhodes, which is the location of interest for our case study (longitude: 27.95, latitude: 36.20). You can change it to a region of your preference.
 
 ```python
 from shapely.geometry import Point
-point = Point(4.89, 52.37)  # AMS coordinates
+point = Point(27.95, 36.20)  # Coordinates of a point on Rhodes
 ```
 
 Note: at this stage, we are only dealing with metadata, so no image is going to be downloaded yet. But even metadata can
-be quite bulky if a large number of scenes match our search! For this reason, we limit the search result to 10 items:
+be quite bulky if a large number of scenes match our search! For this reason, we restrict the search to scenes that intersect our point (by setting the parameter `intersects`) and that belong to the chosen collection (by setting the parameter `collections`). More information about the possible search parameters can be found in the [STAC search method](https://pystac-client.readthedocs.io/en/stable/api.html#pystac_client.Client.search) documentation.
+
+We now set up our search of satellite images in the following way:
 
 ```python
 search = client.search(
     collections=[collection],
     intersects=point,
-    max_items=10,
 )
 ```
 
-We submit the query and find out how many scenes match our search criteria (please note that this output can be different as more data is added to the catalog):
+Now we submit the query to find out how many scenes match our search criteria (please note that your output can differ, as more data may have been added to the catalog since this episode was created):
+
+```python
+print(search.matched())
+```
+
+```output
+611
+```
+
+You will notice that more than 500 scenes match our search criteria. We are, however, interested in the period right before and after the wildfire on Rhodes. In the following exercise you will therefore add a time filter to
+our search criteria to narrow down the search to images of that period.
+
+:::challenge
+## Exercise: Search satellite scenes using a time filter
+
+Search for all the available Sentinel-2 scenes in the `sentinel-2-l2a` collection that satisfy the following
+criteria:
+- intersect the previously defined point;
+- have been recorded between the 1st of July 2023 and the 31st of August 2023 (the timespan in which the wildfire took place).
+
+**Hint:** the time period can be set via the `datetime` input argument of `client.search` (see the [docs](https://pystac-client.readthedocs.io/en/stable/api.html#pystac_client.Client.search)).
+
+How many scenes are available?
+
+::::solution
 
 ```python
+search = client.search(
+    collections=[collection],
+    intersects=point,
+    datetime='2023-07-01/2023-08-31'
+)
 print(search.matched())
 ```
 
 ```output
-840
+12
 ```
 
-Finally, we retrieve the metadata of the search results:
+This means that 12 scenes satisfy the search criteria.
+::::
+:::
+
+
+Now that we have added a time filter, we retrieve the metadata of the search results:
 
 ```python
 items = search.item_collection()
@@ -169,11 +210,11 @@ print(len(items))
 ```
 
 ```output
-10
+12
 ```
 
-which is consistent with the maximum number of items that we have set in the search criteria.
We can iterate over -the returned items and print these to show their IDs: +which is consistent with the number of scenes matching our search results as found with `search.matched()`. We can +iterate over the returned items and print these to show their IDs: ```python for item in items: @@ -181,22 +222,36 @@ for item in items: ``` ```output - - - - - - - - - - + + + + + + + + + + + + ``` Each of the items contains information about the scene geometry, its acquisition time, and other metadata that can be -accessed as a dictionary from the `properties` attribute. +accessed as a dictionary from the `properties` attribute. To see which information each item contains you can have a look at the [item documentation of pystac](https://pystac.readthedocs.io/en/latest/api/item.html). + + +Let us inspect the metadata associated with the first item of the search results. Let us first look at collection date of the first item:: +```python +item = items[0] +print(item.datetime) +``` + +```output +2023-08-27 09:00:21.327000+00:00 +``` + +Let us now add the geometry and other properties as well. -Let's inspect the metadata associated with the first item of the search results: ```python item = items[0] print(item.datetime) @@ -205,54 +260,67 @@ print(item.properties) ``` ```output -2023-07-01 10:46:30.262000+00:00 -{'type': 'Polygon', 'coordinates': [[[5.233744523520149, 53.228684673408296], [6.141754296879459, 53.20819279121764], [6.071664488869862, 52.22257539160585], [4.80943323800081, 52.2486879358387], [5.233744523520149, 53.228684673408296]]]} -{'created': '2023-07-02T01:49:17.191Z', 'platform': 'sentinel-2a', 'constellation': 'sentinel-2', 'instruments': ['msi'], 'eo:cloud_cover': 99.952936, 'proj:epsg': 32631, 'mgrs:utm_zone': 31, 'mgrs:latitude_band': 'U', 'mgrs:grid_square': 'FU', 'grid:code': 'MGRS-31UFU', 'view:sun_azimuth': 154.716674921261, 'view:sun_elevation': 58.4960054056685, 's2:degraded_msi_data_percentage': 0.0346, 's2:nodata_pixel_percentage': 33.00232, 's2:saturated_defective_pixel_percentage': 0, 's2:dark_features_percentage': 0, 's2:cloud_shadow_percentage': 0.030847, 's2:vegetation_percentage': 0, 's2:not_vegetated_percentage': 0.004947, 's2:water_percentage': 0.011271, 's2:unclassified_percentage': 0, 's2:medium_proba_clouds_percentage': 5.838514, 's2:high_proba_clouds_percentage': 94.035202, 's2:thin_cirrus_percentage': 0.07922, 's2:snow_ice_percentage': 0, 's2:product_type': 'S2MSI2A', 's2:processing_baseline': '05.09', 's2:product_uri': 'S2A_MSIL2A_20230701T103631_N0509_R008_T31UFU_20230701T200058.SAFE', 's2:generation_time': '2023-07-01T20:00:58.000000Z', 's2:datatake_id': 'GS2A_20230701T103631_041904_N05.09', 's2:datatake_type': 'INS-NOBS', 's2:datastrip_id': 'S2A_OPER_MSI_L2A_DS_2APS_20230701T200058_S20230701T104159_N05.09', 's2:granule_id': 'S2A_OPER_MSI_L2A_TL_2APS_20230701T200058_A041904_T31UFU_N05.09', 's2:reflectance_conversion_factor': 0.967641353116838, 'datetime': '2023-07-01T10:46:30.262000Z', 's2:sequence': '0', 'earthsearch:s3_path': 's3://sentinel-cogs/sentinel-s2-l2a-cogs/31/U/FU/2023/7/S2A_31UFU_20230701_0_L2A', 'earthsearch:payload_id': 'roda-sentinel2/workflow-sentinel2-to-stac/7b1a81ed3fb8d763a0cecf8d9edd4d4a', 'earthsearch:boa_offset_applied': True, 'processing:software': {'sentinel2-to-stac': '0.1.0'}, 'updated': '2023-07-02T01:49:17.191Z'} +2023-08-27 09:00:21.327000+00:00 +{'type': 'Polygon', 'coordinates': [[[27.290401625602243, 37.04621863329741], [27.23303872472207, 36.83882218126937], [27.011145718480538, 36.05673246264742], [28.21878905911668, 
36.05053734221328], [28.234426643135546, 37.04015200857309], [27.290401625602243, 37.04621863329741]]]}
+{'created': '2023-08-27T18:15:43.106Z', 'platform': 'sentinel-2a', 'constellation': 'sentinel-2', 'instruments': ['msi'], 'eo:cloud_cover': 0.955362, 'proj:epsg': 32635, 'mgrs:utm_zone': 35, 'mgrs:latitude_band': 'S', 'mgrs:grid_square': 'NA', 'grid:code': 'MGRS-35SNA', 'view:sun_azimuth': 144.36354987218, 'view:sun_elevation': 59.06665363921, 's2:degraded_msi_data_percentage': 0.0126, 's2:nodata_pixel_percentage': 12.146327, 's2:saturated_defective_pixel_percentage': 0, 's2:dark_features_percentage': 0.249403, 's2:cloud_shadow_percentage': 0.237454, 's2:vegetation_percentage': 6.073786, 's2:not_vegetated_percentage': 18.026696, 's2:water_percentage': 74.259061, 's2:unclassified_percentage': 0.198216, 's2:medium_proba_clouds_percentage': 0.613614, 's2:high_proba_clouds_percentage': 0.341423, 's2:thin_cirrus_percentage': 0.000325, 's2:snow_ice_percentage': 2.3e-05, 's2:product_type': 'S2MSI2A', 's2:processing_baseline': '05.09', 's2:product_uri': 'S2A_MSIL2A_20230827T084601_N0509_R107_T35SNA_20230827T115803.SAFE', 's2:generation_time': '2023-08-27T11:58:03.000000Z', 's2:datatake_id': 'GS2A_20230827T084601_042718_N05.09', 's2:datatake_type': 'INS-NOBS', 's2:datastrip_id': 'S2A_OPER_MSI_L2A_DS_2APS_20230827T115803_S20230827T085947_N05.09', 's2:granule_id': 'S2A_OPER_MSI_L2A_TL_2APS_20230827T115803_A042718_T35SNA_N05.09', 's2:reflectance_conversion_factor': 0.978189079756816, 'datetime': '2023-08-27T09:00:21.327000Z', 's2:sequence': '0', 'earthsearch:s3_path': 's3://sentinel-cogs/sentinel-s2-l2a-cogs/35/S/NA/2023/8/S2A_35SNA_20230827_0_L2A', 'earthsearch:payload_id': 'roda-sentinel2/workflow-sentinel2-to-stac/af0287974aaa3fbb037c6a7632f72742', 'earthsearch:boa_offset_applied': True, 'processing:software': {'sentinel2-to-stac': '0.1.1'}, 'updated': '2023-08-27T18:15:43.106Z'}
+```
+
+If we want to access a single metadata value, for instance the EPSG code of the projected coordinate system, we can select it from the `properties` dictionary by its key:
+
+```python
+item = items[0]
+print(item.properties['proj:epsg'])
+```
+
+
 
 :::challenge
 ## Exercise: Search satellite scenes using metadata filters
 
-Search for all the available Sentinel-2 scenes in the `sentinel-2-l2a` collection that satisfy the following
-criteria:
-- intersect a provided bounding box (use ±0.01 deg in lat/lon from the previously defined point);
-- have been recorded between 20 March 2020 and 30 March 2020;
-- have a cloud coverage smaller than 10% (hint: use the `query` input argument of `client.search`).
-How many scenes are available? Save the search results in GeoJSON format.
+Let's add a filter on the cloud cover to select only the scenes with less than 1% cloud cover. How many scenes now
+match our search?
+
+**Hint:** generic metadata filters can be implemented via the `query` input argument of `client.search`, which requires the
+following syntax (see [docs](https://pystac-client.readthedocs.io/en/stable/usage.html#query-extension)):
+`query=['<property><operator><value>']`.
 ::::solution
 
-```python
-bbox = point.buffer(0.01).bounds
-```
 ```python
 search = client.search(
     collections=[collection],
-    bbox=bbox,
-    datetime="2020-03-20/2020-03-30",
-    query=["eo:cloud_cover<15"]
+    intersects=point,
+    datetime='2023-07-01/2023-08-31',
+    query=['eo:cloud_cover<1']
 )
 print(search.matched())
 ```
 
 ```output
-4
+11
 ```
 
+::::
+:::
+
+Once we are happy with our search, we save the search results in a file:
+
 ```python
 items = search.item_collection()
-items.save_object("search.json")
+items.save_object("rhodes_sentinel-2.json")
 ```
 
-::::
-:::
+
+This creates a file in GeoJSON format, which we will reuse here and in the next episodes. Note that the file only contains the metadata of the scenes that meet our search criteria; it does not include the data itself.
 
 ## Access the assets
 
 So far we have only discussed metadata - but how can one get to the actual images of a satellite scene (the "assets" in the STAC nomenclature)? These can be reached via links that are made available through the item's attribute `assets`.
+Let's focus on the last item in the collection: this is the oldest in time, and it thus corresponds to an image taken
+before the wildfires.
 
 ```python
-assets = items[0].assets  # first item's asset dictionary
+assets = items[-1].assets  # last item's asset dictionary
 print(assets.keys())
 ```
 
@@ -307,22 +375,39 @@ wvp-jp2: Water vapour (WVP)
 
 Among the others, assets include multiple raster data files (one per optical band, as acquired by the multi-spectral
 instrument), a thumbnail, a true-color image ("visual"), instrument metadata and scene-classification information
-("SCL"). Let's get the URL links to the actual asset:
+("SCL"). Let's get the URL link to the thumbnail, which gives us a glimpse of the Sentinel-2 scene:
 
 ```python
 print(assets["thumbnail"].href)
 ```
 
 ```output
-https://sentinel-cogs.s3.us-west-2.amazonaws.com/sentinel-s2-l2a-cogs/31/U/FU/2020/3/S2A_31UFU_20200328_0_L2A/thumbnail.jpg
+https://sentinel-cogs.s3.us-west-2.amazonaws.com/sentinel-s2-l2a-cogs/35/S/NA/2023/7/S2A_35SNA_20230708_0_L2A/thumbnail.jpg
 ```
 
 This can be used to download the corresponding file:
 
-![Overview of the true-color image ("thumbnail")](fig/E05/STAC-s2-preview.jpg){alt="thumbnail of the sentinel-2 scene"}
+![Overview of the true-color image ("thumbnail") before the wildfires on Rhodes](fig/E05/STAC-s2-preview-before.jpg){alt="thumbnail of the sentinel-2 scene before the wildfires"}
+
+For comparison, we can check out the thumbnail of the most recent scene of the sequence considered (i.e. the first item
+in the item collection), which was taken after the wildfires:
+
+```python
+print(items[0].assets["thumbnail"].href)
+```
+
+```output
+https://sentinel-cogs.s3.us-west-2.amazonaws.com/sentinel-s2-l2a-cogs/35/S/NA/2023/8/S2A_35SNA_20230827_0_L2A/thumbnail.jpg
+```
+
+![Overview of the true-color image ("thumbnail") after the wildfires on Rhodes](fig/E05/STAC-s2-preview-after.jpg){alt="thumbnail of the sentinel-2 scene after the wildfires"}
+
+From the thumbnails alone we can already observe some dark spots on the island of Rhodes at the bottom right of the image!
+
+In order to open the high-resolution satellite images and investigate the scenes in more detail, we will use the `rioxarray` library. Note that this library can work with both local and remote raster data.
For now, we will only take a sneak peek at the [to_raster function](https://corteva.github.io/rioxarray/stable/rioxarray.html#rioxarray.raster_array.RasterArray.to_raster) of this library; we will learn more about it in the next episode.
+
+Now let us focus on the near-infrared (NIR) band by selecting the `nir` item from the assets dictionary and retrieving its Hypertext Reference (i.e. its URL) via the `.href` attribute.
 
-Remote raster data can be directly opened via the `rioxarray` library. We will
-learn more about this library in the next episodes.
 ```python
 import rioxarray
 nir_href = assets["nir"].href
@@ -331,13 +416,13 @@ print(nir)
 ```
 
 ```output
-<xarray.DataArray (band: 1, y: 10980, x: 10980)>
+<xarray.DataArray (band: 1, y: 10980, x: 10980)> Size: 241MB
 [120560400 values with dtype=uint16]
 Coordinates:
-  * band         (band) int64 1
-  * x            (x) float64 6e+05 6e+05 6e+05 ... 7.098e+05 7.098e+05 7.098e+05
-  * y            (y) float64 5.9e+06 5.9e+06 5.9e+06 ... 5.79e+06 5.79e+06
-    spatial_ref  int64 0
+  * band         (band) int32 4B 1
+  * x            (x) float64 88kB 5e+05 5e+05 5e+05 ... 6.098e+05 6.098e+05
+  * y            (y) float64 88kB 4.1e+06 4.1e+06 4.1e+06 ... 3.99e+06 3.99e+06
+    spatial_ref  int32 4B 0
 Attributes:
     AREA_OR_POINT:       Area
     OVR_RESAMPLING_ALG:  AVERAGE
@@ -346,23 +431,33 @@
     add_offset:          0.0
 ```
 
-We can then save the data to disk:
+Now we want to save the data to our local machine using the [to_raster](https://corteva.github.io/rioxarray/stable/rioxarray.html#rioxarray.raster_array.RasterArray.to_raster) function:
 
 ```python
 # save whole image to disk
 nir.rio.to_raster("nir.tif")
 ```
 
-Since that might take a while, given there are over 10000 x 10000 = a hundred million pixels in the 10 meter NIR band, you can take a smaller subset before downloading it. Becuase the raster is a COG, we can download just what we need!
+That might take a while, given there are over 10000 x 10000 = a hundred million pixels in the 10-meter NIR band.
+But we can take a smaller subset before downloading it. Because the raster is a COG, we can download just what we need!
 
-Here, we specify that we want to download the first (and only) band in the tif file, and a slice of the width and height dimensions.
+In order to do that, we use rioxarray's [clip_box](https://corteva.github.io/rioxarray/stable/examples/clip_box.html) method, which lets us define a bounding box for the area we want:
+
+```python
+nir_subset = nir.rio.clip_box(
+    minx=560900,
+    miny=3995000,
+    maxx=570900,
+    maxy=4015000
+)
+```
+
+Next, we save the subset using `to_raster` again.
 
 ```python
-# save portion of an image to disk
-nir[0,1500:2200,1500:2200].rio.to_raster("nir_subset.tif")
+nir_subset.rio.to_raster("nir_subset.tif")
 ```
 
-The difference is 155 Megabytes for the large image vs about 1 Megabyte for the subset.
+The difference is 241 Megabytes for the full image vs less than 10 Megabytes for the subset.
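+
+As a quick check, we can reopen the file we have just written. This is a minimal sketch which assumes that the `nir_subset.tif` file created above sits in the current working directory; since `rioxarray` can read local files as well as remote ones, the same function we used for the remote COG works here too:
+
+```python
+import rioxarray
+
+# Re-open the locally saved subset and inspect its size and extent
+nir_local = rioxarray.open_rasterio("nir_subset.tif")
+print(nir_local.shape)
+print(nir_local.rio.bounds())
+```
+
+The shape and bounds should match the bounding box we passed to `clip_box`, confirming that only the selected area was written to disk.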
:::challenge ## Exercise: Downloading Landsat 8 Assets diff --git a/fig/E05/STAC-s2-preview-after.jpg b/fig/E05/STAC-s2-preview-after.jpg new file mode 100644 index 00000000..363cc073 Binary files /dev/null and b/fig/E05/STAC-s2-preview-after.jpg differ diff --git a/fig/E05/STAC-s2-preview-before.jpg b/fig/E05/STAC-s2-preview-before.jpg new file mode 100644 index 00000000..ae105588 Binary files /dev/null and b/fig/E05/STAC-s2-preview-before.jpg differ diff --git a/md5sum.txt b/md5sum.txt index 48aa8fda..9a21d2ce 100644 --- a/md5sum.txt +++ b/md5sum.txt @@ -8,7 +8,7 @@ "episodes/02-intro-vector-data.md" "48082eb809ec7caa0c3005fd6cdd5dae" "site/built/02-intro-vector-data.md" "2023-07-27" "episodes/03-crs.md" "85d370c723c2a6e8b39e853d25363aef" "site/built/03-crs.md" "2023-07-27" "episodes/04-geo-landscape.md" "04937f2c15667adbe259b68e76ac0d46" "site/built/04-geo-landscape.md" "2023-07-27" -"episodes/05-access-data.md" "3f4586875eaa4c175f5ed7cd6730b6be" "site/built/05-access-data.md" "2023-09-12" +"episodes/05-access-data.md" "26ffc068f2e92aee2683c5255f54e6ae" "site/built/05-access-data.md" "2024-04-09" "episodes/06-raster-intro.md" "686d165b48bea5469d3eb73f0f54a421" "site/built/06-raster-intro.md" "2023-08-23" "episodes/07-vector-data-in-python.md" "01a7e0adfcc44f6a1ad5c8ede060473d" "site/built/07-vector-data-in-python.md" "2023-08-08" "episodes/08-crop-raster-data.md" "87861d050911db9082000d4788140766" "site/built/08-crop-raster-data.md" "2023-07-27"