esciencecenter-digital-skills · fnattino · Jun 25, 2024 · Jun 19, 2024 · Jun 19, 2024 · Jun 19, 2024
diff --git a/episodes/05-access-data.md b/episodes/05-access-data.md
@@ -24,9 +24,9 @@
 typically provided in the form of geospatial raster data, with the measurements in each grid cell ("pixel") being
 associated to accurate geographic coordinate information.
 
-In this episode we will explore how to access open satellite data using Python. In particular,  we will
+In this episode we will explore how to access open satellite data using Python. In particular, we will
 consider [the Sentinel-2 data collection that is hosted on Amazon Web Services (AWS)](https://registry.opendata.aws/sentinel-2-l2a-cogs).
-This dataset consists of multi-band optical images acquired by the two satellite constellations of
+This dataset consists of multi-band optical images acquired by the constellation of two satellites from
 [the Sentinel-2 mission](https://sentinel.esa.int/web/sentinel/missions/sentinel-2) and it is continuously updated with
 new images.
 
@@ -66,7 +66,7 @@
 ::: challenge
 ## Exercise: Discover a STAC catalog
 Let's take a moment to explore the Earth Search STAC catalog, which is the catalog indexing the Sentinel-2 collection
 that is hosted on AWS. We can interactively browse this catalog using the STAC browser at [this link](https://radiantearth.github.io/stac-browser/#/external/earth-search.aws.element84.com/v1).

 1. Open the link in your web browser. Which (sub-)catalogs are available?
 2. Open the Sentinel-2 Level 2A collection, and select one item from the list. Each item corresponds to a satellite
@@ -96,20 +96,20 @@
 api_url = "https://earth-search.aws.element84.com/v1"
 ```
 
-You can query a STAC API endpoint from Python using the [`pystac_client` library](https://pystac-client.readthedocs.io/en/stable/api.html#pystac_client). 
-To do so we will first import `Client` from `pystac_client` and use the [method open from the Client object](https://pystac-client.readthedocs.io/en/stable/quickstart.html):
+You can query a STAC API endpoint from Python using the [`pystac_client` library](https://pystac-client.readthedocs.io).
+To do so we will first import `Client` from `pystac_client` and use the [method `open` from the Client object](https://pystac-client.readthedocs.io/en/stable/quickstart.html):
 
 ```python
 from pystac_client import Client
 
 client = Client.open(api_url)
 ```
 
-For this episode we will focus at scenes belonging to the `sentinel-2-l2a` collection. 
-This dataset is useful for our case and includes Sentinel-2 data products pre-processed at level 2A (bottom-of-atmosphere reflectance). 
+For this episode we will focus at scenes belonging to the `sentinel-2-l2a` collection.
+This dataset is useful for our case and includes Sentinel-2 data products pre-processed at level 2A (bottom-of-atmosphere reflectance).
 
-In order to see which collections are available in the provided `api_url` the 
-[`get_collections`](https://pystac-client.readthedocs.io/en/stable/api.html#pystac_client.Client.get_collections) method can be used on the Client object. 
+In order to see which collections are available in the provided `api_url` the
+[`get_collections`](https://pystac-client.readthedocs.io/en/stable/api.html#pystac_client.Client.get_collections) method can be used on the Client object.
 
 ```python
 collections = client.get_collections()
@@ -136,7 +136,7 @@
 As said, we want to focus to the `sentinel-2-l2a` collection. To do so, we set this collection into a variable:
 
 ```python
-collection_sentinel_2_l2a = "sentinel-2-l2a"  
+collection_sentinel_2_l2a = "sentinel-2-l2a"
 ```
 
 The data in this collection is stored in the Cloud Optimized GeoTIFF (COG) format and as JPEG2000 images. In this episode we will focus at COGs, as these offer useful functionalities for our purpose.
@@ -153,20 +153,20 @@
 COGs typically include multiple lower-resolution versions of the original image, called "overviews", which can also be
 accessed independently. By providing this "pyramidal" structure, users that are not interested in the details provided
 by a high-resolution raster can directly access the lower-resolution versions of the same image, significantly saving
 on the downloading time. More information on the COG format can be found [here](https://www.cogeo.org).
 :::
 
-In order to get data for a specific location you can add longitude latitude coordinates (World Geodetic System 1984 EPSG:4326) in your request. 
+In order to get data for a specific location you can add longitude latitude coordinates (World Geodetic System 1984 EPSG:4326) in your request.
 In order to do so we are using the `shapely` library to define a geometrical point.
-Below we have included a center point for the island of Rhodes, which is the location of interest for our case study (i.e. Longitude: 27.95 | Latitude 36.20).
+Below we have included a point on the island of Rhodes, which is the location of interest for our case study (i.e. Longitude: 27.95 | Latitude 36.20).
 
 ```python
 from shapely.geometry import Point
 point = Point(27.95, 36.20)  # Coordinates of a point on Rhodes
 ```
 
 Note: at this stage, we are only dealing with metadata, so no image is going to be downloaded yet. But even metadata can
-be quite bulky if a large number of scenes match our search! For this reason, we limit the search by the intersection of the point (by setting the parameter `intersects`) and assign the collection (by setting the parameter `collections`).  More information about the possible parameters to be set can be found in the `pystac_client` documentation for the [Client's search method](https://pystac-client.readthedocs.io/en/stable/api.html#pystac_client.Client.search).
+be quite bulky if a large number of scenes match our search! For this reason, we limit the search by the intersection of the point (by setting the parameter `intersects`) and assign the collection (by setting the parameter `collections`).  More information about the possible parameters to be set can be found in the `pystac_client` documentation for the [Client's `search` method](https://pystac-client.readthedocs.io/en/stable/api.html#pystac_client.Client.search).
 
 We now set up our search of satellite images in the following way:
 
@@ -287,7 +287,7 @@
 {'created': '2023-08-27T18:15:43.106Z', 'platform': 'sentinel-2a', 'constellation': 'sentinel-2', 'instruments': ['msi'], 'eo:cloud_cover': 0.955362, 'proj:epsg': 32635, 'mgrs:utm_zone': 35, 'mgrs:latitude_band': 'S', 'mgrs:grid_square': 'NA', 'grid:code': 'MGRS-35SNA', 'view:sun_azimuth': 144.36354987218, 'view:sun_elevation': 59.06665363921, 's2:degraded_msi_data_percentage': 0.0126, 's2:nodata_pixel_percentage': 12.146327, 's2:saturated_defective_pixel_percentage': 0, 's2:dark_features_percentage': 0.249403, 's2:cloud_shadow_percentage': 0.237454, 's2:vegetation_percentage': 6.073786, 's2:not_vegetated_percentage': 18.026696, 's2:water_percentage': 74.259061, 's2:unclassified_percentage': 0.198216, 's2:medium_proba_clouds_percentage': 0.613614, 's2:high_proba_clouds_percentage': 0.341423, 's2:thin_cirrus_percentage': 0.000325, 's2:snow_ice_percentage': 2.3e-05, 's2:product_type': 'S2MSI2A', 's2:processing_baseline': '05.09', 's2:product_uri': 'S2A_MSIL2A_20230827T084601_N0509_R107_T35SNA_20230827T115803.SAFE', 's2:generation_time': '2023-08-27T11:58:03.000000Z', 's2:datatake_id': 'GS2A_20230827T084601_042718_N05.09', 's2:datatake_type': 'INS-NOBS', 's2:datastrip_id': 'S2A_OPER_MSI_L2A_DS_2APS_20230827T115803_S20230827T085947_N05.09', 's2:granule_id': 'S2A_OPER_MSI_L2A_TL_2APS_20230827T115803_A042718_T35SNA_N05.09', 's2:reflectance_conversion_factor': 0.978189079756816, 'datetime': '2023-08-27T09:00:21.327000Z', 's2:sequence': '0', 'earthsearch:s3_path': 's3://sentinel-cogs/sentinel-s2-l2a-cogs/35/S/NA/2023/8/S2A_35SNA_20230827_0_L2A', 'earthsearch:payload_id': 'roda-sentinel2/workflow-sentinel2-to-stac/af0287974aaa3fbb037c6a7632f72742', 'earthsearch:boa_offset_applied': True, 'processing:software': {'sentinel2-to-stac': '0.1.1'}, 'updated': '2023-08-27T18:15:43.106Z'}
 ```
 
-Now if we want to access one item in the dictionary, for instance the EPSG code of the projected coordinate system, you need to access the item in the dictionary as usual. For instance:
+You can access items from the `properties` dictionary as usual in Python. For instance, for the EPSG code of the projected coordinate system:
 
 ```python
 print(item.properties['proj:epsg'])
@@ -329,7 +329,7 @@
 items.save_object("rhodes_sentinel-2.json")
 ```
 
-This creates a file in GeoJSON format, which we will reuse here and in the next episodes. Note that this file contains the metadata of the files that meet out criteria. It does not include the data itself, only their metadata.
+This creates a file in GeoJSON format, which we will reuse here and in the next episodes. Note that this file contains the metadata of the files that meet out criteria. It does not include the data itself, only their metadata and links to where the data files can be accessed.
 
 ## Access the assets
 
@@ -423,11 +423,9 @@
 
 From the thumbnails alone we can already observe some dark spots on the island of Rhodes at the bottom right of the image!
 
-In order to open the high-resolution satellite images and investigate the scenes in more detail, we will be using the `rioxarray` library. Note that this library can both work with local and remote raster data. At this moment we will only take a sneak peek at the [to_raster function](https://corteva.github.io/rioxarray/stable/rioxarray.html#rioxarray.raster_array.RasterArray.to_raster) of this library. We will learn more about it in the next episode.
+In order to open the high-resolution satellite images and investigate the scenes in more detail, we will be using the [`rioxarray` library](https://corteva.github.io/rioxarray). Note that this library can both work with local and remote raster data. At this moment, we will only take a sneak peek of the functionality of this library. We will learn more about it in the next episode.
 
-Now let us focus on the near ´red´ band by accessing the item `red` from the assets dictionary and get the Hypertext Reference (also known as URL) attribute using `.href` after the item selection.
-
-Remote raster data can be directly opened via the `rioxarray` library. We will learn more about this library in the next episodes.
+Now let us focus on the red band by accessing the item `red` from the assets dictionary and get the Hypertext Reference (also known as URL) attribute using `.href` after the item selection.
 
 For now we are using [rioxarray to open the raster file](https://corteva.github.io/rioxarray/stable/rioxarray.html#rioxarray-open-rasterio).
 
@@ -454,7 +452,7 @@
     add_offset:          0.0
 ```
 
-Now we want to save the data to our local machine using the [to_raster](https://corteva.github.io/rioxarray/stable/rioxarray.html#rioxarray.raster_array.RasterArray.to_raster) function:
+Now we want to save the data to our local machine using the [to_raster](https://corteva.github.io/rioxarray/stable/rioxarray.html#rioxarray.raster_array.RasterArray.to_raster) method:
 
 ```python
 # save whole image to disk
@@ -464,7 +462,7 @@
 That might take a while, given there are over 10000 x 10000 = a hundred million pixels in the 10-meter NIR band.
 But we can take a smaller subset before downloading it. Because the raster is a COG, we can download just what we need!
 
-In order to do that, we are using rioxarray´s [clip_box](https://corteva.github.io/rioxarray/stable/examples/clip_box.html) with which you can set a bounding box defining the area you want.
+In order to do that, we are using rioxarray´s [`clip_box`](https://corteva.github.io/rioxarray/stable/examples/clip_box.html) with which you can set a bounding box defining the area you want.
 
 ```python
 red_subset = red.rio.clip_box(
@@ -552,7 +550,7 @@
 The authentication procedure for dataset with restricted access might differ depending on the data provider. For the
 NASA CMR, follow these steps in order to access data using Python:

 * Create a NASA Earthdata login account [here](https://urs.earthdata.nasa.gov);
 * Set up a netrc file with your credentials, e.g. by using [this script](https://git.earthdata.nasa.gov/projects/LPDUR/repos/daac_data_download_python/browse/EarthdataLoginSetup.py);
 * Define the following environment variables:


diff --git a/episodes/06-raster-intro.md b/episodes/06-raster-intro.md
@@ -25,9 +25,9 @@
 
 The Python package we will use throughout this episode to handle raster data is [`rioxarray`](https://corteva.github.io/rioxarray/stable/). This package is based on the popular [`rasterio`](https://rasterio.readthedocs.io/en/latest/) (which is build upon the GDAL library) for working with raster data and [`xarray`](https://xarray.pydata.org/en/stable/) for working with multi-dimensional arrays.
 
-`Rioxarray` extends `xarray` by providing top-level functions like the [`open_rasterio`](https://corteva.github.io/rioxarray/html/rioxarray.html#rioxarray-open-rasterio) function to open raster datasets. Furthermore, it adds a set of methods to the main objects of the `xarray` package like the [`Dataset`](https://docs.xarray.dev/en/stable/generated/xarray.Dataset.html) and the [`DataArray`](https://docs.xarray.dev/en/stable/generated/xarray.DataArray.html#xarray.DataArray). These methods are made available via the `rio` accessor and become available from `xarray` objects after importing `rioxarray`. Since a lot of the functions, methods and attributes do not orginate from rioxarray, but come from the other packages mentioned and are accessible through the accessor, the documentation is in some cases limited and requires a little puzzling. It is therefore recommended to foremost focus at the notebook´s functionality to use tab and go through the various functionalities. In addition, every function or method offers the opportunity to add a questionmark `?` to see the various options. 
+`rioxarray` extends `xarray` by providing top-level functions like the [`open_rasterio`](https://corteva.github.io/rioxarray/html/rioxarray.html#rioxarray-open-rasterio) function to open raster datasets. Furthermore, it adds a set of methods to the main objects of the `xarray` package like the [`Dataset`](https://docs.xarray.dev/en/stable/generated/xarray.Dataset.html) and the [`DataArray`](https://docs.xarray.dev/en/stable/generated/xarray.DataArray.html#xarray.DataArray). These methods are made available via the `rio` accessor and become available from `xarray` objects after importing `rioxarray`. Since a lot of the functions, methods and attributes do not originate from `rioxarray`, but come from the other packages mentioned and are accessible through the accessor, the documentation is in some cases limited and requires a little puzzling. It is therefore recommended to foremost focus at the notebook´s functionality to use tab completion and go through the various functionalities. In addition, adding a question mark `?` after every function or method offers the opportunity to see the various options.
 
-For instance if you want to understand the options for rioxarray´s open_rasterio call:
+For instance if you want to understand the options for rioxarray´s `open_rasterio` function:
 
 ```python
 rioxarray.open_rasterio?
@@ -137,12 +137,12 @@
 ## Getting the data

 We will continue from the results of the satellite image search that we have carried out in an exercise from
 [the previous episode](05-access-data.md). We will load data using the created metadata file and use one scene from the search results as an example to demonstrate data loading and visualization. To do so you can use the `rhodes_sentinel-2.json` you created or download and use the created JSON file [here](../data/stac_json/rhodes_sentinel-2.json).

 In case you would like to work with raster data for this lesson without downloading data on-the-fly, you can download the raster data using this [link](https://figshare.com/ndownloader/files/36028100). Save the `geospatial-python-raster-dataset.tar.gz` file in your current working directory, and extract the archive file by double-clicking on it or by running the
 following command in your terminal `tar -zxvf geospatial-python-raster-dataset.tar.gz`.
 
-If you use choose to download the data you can skip the following part and continue with 
+If you use choose to download the data you can skip the following part and continue with
 
 **Load a Raster and View Attributes**.
 
@@ -159,7 +159,7 @@
 ```python
 print(len(file_items))
 ```
 To check the items themselves you can make a for loop on the items (as we did in episode 5). To look at specific parameters of the item you can have a look [here](https://pystac.readthedocs.io/en/latest/api/item.html). Since we are interested in the most recent image let us print the `datetime` and `id`:

 ```python
 for item in file_items:
@@ -266,7 +266,7 @@

 The output tells us that we are looking at an `xarray.DataArray`, with `1` band, `10980` rows, and `10980` columns. We can also see the number of pixel values in the `DataArray`, and the type of those pixel values, which is unsigned integer (or `uint16`). The `DataArray` also stores different values for the coordinates of the `DataArray`. When using `rioxarray`, the term coordinates refers to spatial coordinates like `x` and `y` but also the `band` coordinate. Each of these sequences of values has its own data type, like `float64` for the spatial coordinates and `int64` for the `band` coordinate.

 This `DataArray` object also has a couple of attributes that are accessed like `.rio.crs`, `.rio.nodata`, and `.rio.bounds()` (in jupyter you can browse through these attributes by using `tab` for auto completion or have a look at the documentation [here](https://corteva.github.io/rioxarray/stable/rioxarray.html#rioxarray-rio-accessors)), which contains the metadata for the file we opened. Note that many of the metadata are accessed as attributes without `()`, however since `bounds()` is a method (i.e. a function in an object) it requires these parentheses this is also the case for `.rio.resolution()`.

 ```python
 print(rhodes_red.rio.crs)
@@ -366,7 +366,7 @@

 ![Raster plot using vmin 100 and vmax 2000](fig/E06/rhodes_red_80_B04_vmin100_vmax2000.png){alt="raster plot with robust setting"}

 More options can be consulted [here](https://docs.xarray.dev/en/v2024.02.0/generated/xarray.plot.imshow.html). You will notice that these parameters are part of the `imshow` method from the plot function. Since plot originates from matplotlib and is so widely used, your python environment helps you to interpret the parameters without having to specify the method. It is a service to help you, but can be confusing when teaching it. We will explain more about this below.

 :::


diff --git a/episodes/08-crop-raster-data.md b/episodes/08-crop-raster-data.md
@@ -26,7 +26,7 @@
 [Episode 5: Access satellite imagery using Python](05-access-data.md).

 If you would like to work with the data for this lesson without downloading data on-the-fly, you can download the
 raster data using this [link](https://figshare.com/ndownloader/files/36028100). Save the `geospatial-python-raster-dataset.tar.gz`
 file in your current working directory, and extract the archive file by double-clicking on it or by running the
 following command in your terminal `tar -zxvf geospatial-python-raster-dataset.tar.gz`. Use the file `geospatial-python-raster-dataset/search.json`
 (instead of `search.json`) to get started with this lesson.
@@ -38,11 +38,9 @@
 
 ### Data loading
 
-### Data loading
-
-First, we will load the visual image of Sentinel-2 over Rhodes Island, which we downloaded and stored in `data/sentinel2/visual.tif`. 
+First, we will load the visual image of Sentinel-2 over Rhodes Island, which we downloaded and stored in `data/sentinel2/visual.tif`.
 
-We can open this asset with `rioxarray`, and specify the overview level, since this is a Cloud-Optimized GeoTIFF (COG) file. As explained in episode 6 raster images can be quite big, therefore we decided to resample the data using ´rioxarray's´ overview parameter and set it to `overview_level=1`.  
+We can open this asset with `rioxarray`, and specify the overview level, since this is a Cloud-Optimized GeoTIFF (COG) file. As explained in episode 6 raster images can be quite big, therefore we decided to resample the data using ´rioxarray's´ overview parameter and set it to `overview_level=1`.
 
 ```python
 import rioxarray
@@ -96,7 +94,7 @@
 array([27.7121001 , 35.87837949, 28.24591124, 36.45725024])
 ```
 
-The bounding box is composed of the `[minx, miny, maxx, maxy]` values of the raster. Comparing these values with the raster image, we can identify that the magnitude of the bounding box coordinates does not match the coordinates of the raster image. This is because the two datasets have different coordinate reference systems (CRS). This will cause problems when cropping the raster image, therefore we first need to align the CRS-s of the two datasets 
+The bounding box is composed of the `[minx, miny, maxx, maxy]` values of the raster. Comparing these values with the raster image, we can identify that the magnitude of the bounding box coordinates does not match the coordinates of the raster image. This is because the two datasets have different coordinate reference systems (CRS). This will cause problems when cropping the raster image, therefore we first need to align the CRS-s of the two datasets
 
 Considering the raster image has larger data volume than the vector data, we will reproject the vector data to the CRS of the raster data. We can use the `to_crs` method: