Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Minor fixes #119

Merged
merged 10 commits into from
Jun 25, 2024
Merged
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
38 changes: 18 additions & 20 deletions episodes/05-access-data.md
Original file line number Diff line number Diff line change
Expand Up @@ -24,9 +24,9 @@
typically provided in the form of geospatial raster data, with the measurements in each grid cell ("pixel") being
associated to accurate geographic coordinate information.

In this episode we will explore how to access open satellite data using Python. In particular, we will
In this episode we will explore how to access open satellite data using Python. In particular, we will
consider [the Sentinel-2 data collection that is hosted on Amazon Web Services (AWS)](https://registry.opendata.aws/sentinel-2-l2a-cogs).
This dataset consists of multi-band optical images acquired by the two satellite constellations of
This dataset consists of multi-band optical images acquired by the constellation of two satellites from
[the Sentinel-2 mission](https://sentinel.esa.int/web/sentinel/missions/sentinel-2) and it is continuously updated with
new images.

Expand Down Expand Up @@ -66,7 +66,7 @@
::: challenge
## Exercise: Discover a STAC catalog
Let's take a moment to explore the Earth Search STAC catalog, which is the catalog indexing the Sentinel-2 collection
that is hosted on AWS. We can interactively browse this catalog using the STAC browser at [this link](https://radiantearth.github.io/stac-browser/#/external/earth-search.aws.element84.com/v1).

Check warning on line 69 in episodes/05-access-data.md

View workflow job for this annotation

GitHub Actions / Build markdown source files if valid

[uninformative link text]: [this link](https://radiantearth.github.io/stac-browser/#/external/earth-search.aws.element84.com/v1)

1. Open the link in your web browser. Which (sub-)catalogs are available?
2. Open the Sentinel-2 Level 2A collection, and select one item from the list. Each item corresponds to a satellite
Expand Down Expand Up @@ -96,20 +96,20 @@
api_url = "https://earth-search.aws.element84.com/v1"
```

You can query a STAC API endpoint from Python using the [`pystac_client` library](https://pystac-client.readthedocs.io/en/stable/api.html#pystac_client).
To do so we will first import `Client` from `pystac_client` and use the [method open from the Client object](https://pystac-client.readthedocs.io/en/stable/quickstart.html):
You can query a STAC API endpoint from Python using the [`pystac_client` library](https://pystac-client.readthedocs.io).
To do so we will first import `Client` from `pystac_client` and use the [method `open` from the Client object](https://pystac-client.readthedocs.io/en/stable/quickstart.html):

```python
from pystac_client import Client

client = Client.open(api_url)
```

For this episode we will focus at scenes belonging to the `sentinel-2-l2a` collection.
This dataset is useful for our case and includes Sentinel-2 data products pre-processed at level 2A (bottom-of-atmosphere reflectance).
For this episode we will focus at scenes belonging to the `sentinel-2-l2a` collection.
This dataset is useful for our case and includes Sentinel-2 data products pre-processed at level 2A (bottom-of-atmosphere reflectance).

In order to see which collections are available in the provided `api_url` the
[`get_collections`](https://pystac-client.readthedocs.io/en/stable/api.html#pystac_client.Client.get_collections) method can be used on the Client object.
In order to see which collections are available in the provided `api_url` the
[`get_collections`](https://pystac-client.readthedocs.io/en/stable/api.html#pystac_client.Client.get_collections) method can be used on the Client object.

```python
collections = client.get_collections()
Expand All @@ -136,7 +136,7 @@
As said, we want to focus to the `sentinel-2-l2a` collection. To do so, we set this collection into a variable:

```python
collection_sentinel_2_l2a = "sentinel-2-l2a"
collection_sentinel_2_l2a = "sentinel-2-l2a"
```

The data in this collection is stored in the Cloud Optimized GeoTIFF (COG) format and as JPEG2000 images. In this episode we will focus at COGs, as these offer useful functionalities for our purpose.
Expand All @@ -153,20 +153,20 @@
COGs typically include multiple lower-resolution versions of the original image, called "overviews", which can also be
accessed independently. By providing this "pyramidal" structure, users that are not interested in the details provided
by a high-resolution raster can directly access the lower-resolution versions of the same image, significantly saving
on the downloading time. More information on the COG format can be found [here](https://www.cogeo.org).

Check warning on line 156 in episodes/05-access-data.md

View workflow job for this annotation

GitHub Actions / Build markdown source files if valid

[uninformative link text]: [here](https://www.cogeo.org)
:::

In order to get data for a specific location you can add longitude latitude coordinates (World Geodetic System 1984 EPSG:4326) in your request.
In order to get data for a specific location you can add longitude latitude coordinates (World Geodetic System 1984 EPSG:4326) in your request.
In order to do so we are using the `shapely` library to define a geometrical point.
Below we have included a center point for the island of Rhodes, which is the location of interest for our case study (i.e. Longitude: 27.95 | Latitude 36.20).
Below we have included a point on the island of Rhodes, which is the location of interest for our case study (i.e. Longitude: 27.95 | Latitude 36.20).

```python
from shapely.geometry import Point
point = Point(27.95, 36.20) # Coordinates of a point on Rhodes
```

Note: at this stage, we are only dealing with metadata, so no image is going to be downloaded yet. But even metadata can
be quite bulky if a large number of scenes match our search! For this reason, we limit the search by the intersection of the point (by setting the parameter `intersects`) and assign the collection (by setting the parameter `collections`). More information about the possible parameters to be set can be found in the `pystac_client` documentation for the [Client's search method](https://pystac-client.readthedocs.io/en/stable/api.html#pystac_client.Client.search).
be quite bulky if a large number of scenes match our search! For this reason, we limit the search by the intersection of the point (by setting the parameter `intersects`) and assign the collection (by setting the parameter `collections`). More information about the possible parameters to be set can be found in the `pystac_client` documentation for the [Client's `search` method](https://pystac-client.readthedocs.io/en/stable/api.html#pystac_client.Client.search).

We now set up our search of satellite images in the following way:

Expand Down Expand Up @@ -287,7 +287,7 @@
{'created': '2023-08-27T18:15:43.106Z', 'platform': 'sentinel-2a', 'constellation': 'sentinel-2', 'instruments': ['msi'], 'eo:cloud_cover': 0.955362, 'proj:epsg': 32635, 'mgrs:utm_zone': 35, 'mgrs:latitude_band': 'S', 'mgrs:grid_square': 'NA', 'grid:code': 'MGRS-35SNA', 'view:sun_azimuth': 144.36354987218, 'view:sun_elevation': 59.06665363921, 's2:degraded_msi_data_percentage': 0.0126, 's2:nodata_pixel_percentage': 12.146327, 's2:saturated_defective_pixel_percentage': 0, 's2:dark_features_percentage': 0.249403, 's2:cloud_shadow_percentage': 0.237454, 's2:vegetation_percentage': 6.073786, 's2:not_vegetated_percentage': 18.026696, 's2:water_percentage': 74.259061, 's2:unclassified_percentage': 0.198216, 's2:medium_proba_clouds_percentage': 0.613614, 's2:high_proba_clouds_percentage': 0.341423, 's2:thin_cirrus_percentage': 0.000325, 's2:snow_ice_percentage': 2.3e-05, 's2:product_type': 'S2MSI2A', 's2:processing_baseline': '05.09', 's2:product_uri': 'S2A_MSIL2A_20230827T084601_N0509_R107_T35SNA_20230827T115803.SAFE', 's2:generation_time': '2023-08-27T11:58:03.000000Z', 's2:datatake_id': 'GS2A_20230827T084601_042718_N05.09', 's2:datatake_type': 'INS-NOBS', 's2:datastrip_id': 'S2A_OPER_MSI_L2A_DS_2APS_20230827T115803_S20230827T085947_N05.09', 's2:granule_id': 'S2A_OPER_MSI_L2A_TL_2APS_20230827T115803_A042718_T35SNA_N05.09', 's2:reflectance_conversion_factor': 0.978189079756816, 'datetime': '2023-08-27T09:00:21.327000Z', 's2:sequence': '0', 'earthsearch:s3_path': 's3://sentinel-cogs/sentinel-s2-l2a-cogs/35/S/NA/2023/8/S2A_35SNA_20230827_0_L2A', 'earthsearch:payload_id': 'roda-sentinel2/workflow-sentinel2-to-stac/af0287974aaa3fbb037c6a7632f72742', 'earthsearch:boa_offset_applied': True, 'processing:software': {'sentinel2-to-stac': '0.1.1'}, 'updated': '2023-08-27T18:15:43.106Z'}
```

Now if we want to access one item in the dictionary, for instance the EPSG code of the projected coordinate system, you need to access the item in the dictionary as usual. For instance:
You can access items from the `properties` dictionary as usual in Python. For instance, for the EPSG code of the projected coordinate system:

```python
print(item.properties['proj:epsg'])
Expand Down Expand Up @@ -329,7 +329,7 @@
items.save_object("rhodes_sentinel-2.json")
```

This creates a file in GeoJSON format, which we will reuse here and in the next episodes. Note that this file contains the metadata of the files that meet out criteria. It does not include the data itself, only their metadata.
This creates a file in GeoJSON format, which we will reuse here and in the next episodes. Note that this file contains the metadata of the files that meet out criteria. It does not include the data itself, only their metadata and links to where the data files can be accessed.

## Access the assets

Expand Down Expand Up @@ -423,11 +423,9 @@

From the thumbnails alone we can already observe some dark spots on the island of Rhodes at the bottom right of the image!

In order to open the high-resolution satellite images and investigate the scenes in more detail, we will be using the `rioxarray` library. Note that this library can both work with local and remote raster data. At this moment we will only take a sneak peek at the [to_raster function](https://corteva.github.io/rioxarray/stable/rioxarray.html#rioxarray.raster_array.RasterArray.to_raster) of this library. We will learn more about it in the next episode.
In order to open the high-resolution satellite images and investigate the scenes in more detail, we will be using the [`rioxarray` library](https://corteva.github.io/rioxarray). Note that this library can both work with local and remote raster data. At this moment, we will only take a sneak peek of the functionality of this library. We will learn more about it in the next episode.
fnattino marked this conversation as resolved.
Show resolved Hide resolved

Now let us focus on the near ´red´ band by accessing the item `red` from the assets dictionary and get the Hypertext Reference (also known as URL) attribute using `.href` after the item selection.

Remote raster data can be directly opened via the `rioxarray` library. We will learn more about this library in the next episodes.
Now let us focus on the red band by accessing the item `red` from the assets dictionary and get the Hypertext Reference (also known as URL) attribute using `.href` after the item selection.

For now we are using [rioxarray to open the raster file](https://corteva.github.io/rioxarray/stable/rioxarray.html#rioxarray-open-rasterio).

Expand All @@ -454,7 +452,7 @@
add_offset: 0.0
```

Now we want to save the data to our local machine using the [to_raster](https://corteva.github.io/rioxarray/stable/rioxarray.html#rioxarray.raster_array.RasterArray.to_raster) function:
Now we want to save the data to our local machine using the [to_raster](https://corteva.github.io/rioxarray/stable/rioxarray.html#rioxarray.raster_array.RasterArray.to_raster) method:

```python
# save whole image to disk
Expand All @@ -464,7 +462,7 @@
That might take a while, given there are over 10000 x 10000 = a hundred million pixels in the 10-meter NIR band.
But we can take a smaller subset before downloading it. Because the raster is a COG, we can download just what we need!

In order to do that, we are using rioxarray´s [clip_box](https://corteva.github.io/rioxarray/stable/examples/clip_box.html) with which you can set a bounding box defining the area you want.
In order to do that, we are using rioxarray´s [`clip_box`](https://corteva.github.io/rioxarray/stable/examples/clip_box.html) with which you can set a bounding box defining the area you want.

```python
red_subset = red.rio.clip_box(
Expand Down Expand Up @@ -552,7 +550,7 @@
The authentication procedure for dataset with restricted access might differ depending on the data provider. For the
NASA CMR, follow these steps in order to access data using Python:

* Create a NASA Earthdata login account [here](https://urs.earthdata.nasa.gov);

Check warning on line 553 in episodes/05-access-data.md

View workflow job for this annotation

GitHub Actions / Build markdown source files if valid

[uninformative link text]: [here](https://urs.earthdata.nasa.gov)
* Set up a netrc file with your credentials, e.g. by using [this script](https://git.earthdata.nasa.gov/projects/LPDUR/repos/daac_data_download_python/browse/EarthdataLoginSetup.py);
* Define the following environment variables:

Expand Down
6 changes: 3 additions & 3 deletions episodes/06-raster-intro.md
Original file line number Diff line number Diff line change
Expand Up @@ -25,9 +25,9 @@

The Python package we will use throughout this episode to handle raster data is [`rioxarray`](https://corteva.github.io/rioxarray/stable/). This package is based on the popular [`rasterio`](https://rasterio.readthedocs.io/en/latest/) (which is build upon the GDAL library) for working with raster data and [`xarray`](https://xarray.pydata.org/en/stable/) for working with multi-dimensional arrays.

`Rioxarray` extends `xarray` by providing top-level functions like the [`open_rasterio`](https://corteva.github.io/rioxarray/html/rioxarray.html#rioxarray-open-rasterio) function to open raster datasets. Furthermore, it adds a set of methods to the main objects of the `xarray` package like the [`Dataset`](https://docs.xarray.dev/en/stable/generated/xarray.Dataset.html) and the [`DataArray`](https://docs.xarray.dev/en/stable/generated/xarray.DataArray.html#xarray.DataArray). These methods are made available via the `rio` accessor and become available from `xarray` objects after importing `rioxarray`. Since a lot of the functions, methods and attributes do not orginate from rioxarray, but come from the other packages mentioned and are accessible through the accessor, the documentation is in some cases limited and requires a little puzzling. It is therefore recommended to foremost focus at the notebook´s functionality to use tab and go through the various functionalities. In addition, every function or method offers the opportunity to add a questionmark `?` to see the various options.
`rioxarray` extends `xarray` by providing top-level functions like the [`open_rasterio`](https://corteva.github.io/rioxarray/html/rioxarray.html#rioxarray-open-rasterio) function to open raster datasets. Furthermore, it adds a set of methods to the main objects of the `xarray` package like the [`Dataset`](https://docs.xarray.dev/en/stable/generated/xarray.Dataset.html) and the [`DataArray`](https://docs.xarray.dev/en/stable/generated/xarray.DataArray.html#xarray.DataArray). These methods are made available via the `rio` accessor and become available from `xarray` objects after importing `rioxarray`. Since a lot of the functions, methods and attributes do not originate from `rioxarray`, but come from the other packages mentioned and are accessible through the accessor, the documentation is in some cases limited and requires a little puzzling. It is therefore recommended to foremost focus at the notebook´s functionality to use tab completion and go through the various functionalities. In addition, adding a question mark `?` after every function or method offers the opportunity to see the various options.

For instance if you want to understand the options for rioxarray´s open_rasterio call:
For instance if you want to understand the options for rioxarray´s `open_rasterio` function:

```python
rioxarray.open_rasterio?
Morrizzzzz marked this conversation as resolved.
Show resolved Hide resolved
Expand Down Expand Up @@ -137,12 +137,12 @@
## Getting the data

We will continue from the results of the satellite image search that we have carried out in an exercise from
[the previous episode](05-access-data.md). We will load data using the created metadata file and use one scene from the search results as an example to demonstrate data loading and visualization. To do so you can use the `rhodes_sentinel-2.json` you created or download and use the created JSON file [here](../data/stac_json/rhodes_sentinel-2.json).

Check warning on line 140 in episodes/06-raster-intro.md

View workflow job for this annotation

GitHub Actions / Build markdown source files if valid

[missing file]: [here](../data/stac_json/rhodes_sentinel-2.json) [uninformative link text]: [here](../data/stac_json/rhodes_sentinel-2.json)

In case you would like to work with raster data for this lesson without downloading data on-the-fly, you can download the raster data using this [link](https://figshare.com/ndownloader/files/36028100). Save the `geospatial-python-raster-dataset.tar.gz` file in your current working directory, and extract the archive file by double-clicking on it or by running the

Check warning on line 142 in episodes/06-raster-intro.md

View workflow job for this annotation

GitHub Actions / Build markdown source files if valid

[uninformative link text]: [link](https://figshare.com/ndownloader/files/36028100)
following command in your terminal `tar -zxvf geospatial-python-raster-dataset.tar.gz`.

If you use choose to download the data you can skip the following part and continue with
If you use choose to download the data you can skip the following part and continue with

**Load a Raster and View Attributes**.

Expand All @@ -159,7 +159,7 @@
```python
print(len(file_items))
```
To check the items themselves you can make a for loop on the items (as we did in episode 5). To look at specific parameters of the item you can have a look [here](https://pystac.readthedocs.io/en/latest/api/item.html). Since we are interested in the most recent image let us print the `datetime` and `id`:

Check warning on line 162 in episodes/06-raster-intro.md

View workflow job for this annotation

GitHub Actions / Build markdown source files if valid

[uninformative link text]: [here](https://pystac.readthedocs.io/en/latest/api/item.html)

```python
for item in file_items:
Expand Down Expand Up @@ -266,7 +266,7 @@

The output tells us that we are looking at an `xarray.DataArray`, with `1` band, `10980` rows, and `10980` columns. We can also see the number of pixel values in the `DataArray`, and the type of those pixel values, which is unsigned integer (or `uint16`). The `DataArray` also stores different values for the coordinates of the `DataArray`. When using `rioxarray`, the term coordinates refers to spatial coordinates like `x` and `y` but also the `band` coordinate. Each of these sequences of values has its own data type, like `float64` for the spatial coordinates and `int64` for the `band` coordinate.

This `DataArray` object also has a couple of attributes that are accessed like `.rio.crs`, `.rio.nodata`, and `.rio.bounds()` (in jupyter you can browse through these attributes by using `tab` for auto completion or have a look at the documentation [here](https://corteva.github.io/rioxarray/stable/rioxarray.html#rioxarray-rio-accessors)), which contains the metadata for the file we opened. Note that many of the metadata are accessed as attributes without `()`, however since `bounds()` is a method (i.e. a function in an object) it requires these parentheses this is also the case for `.rio.resolution()`.

Check warning on line 269 in episodes/06-raster-intro.md

View workflow job for this annotation

GitHub Actions / Build markdown source files if valid

[uninformative link text]: [here](https://corteva.github.io/rioxarray/stable/rioxarray.html#rioxarray-rio-accessors)

```python
print(rhodes_red.rio.crs)
Expand Down Expand Up @@ -366,7 +366,7 @@

![Raster plot using vmin 100 and vmax 2000](fig/E06/rhodes_red_80_B04_vmin100_vmax2000.png){alt="raster plot with robust setting"}

More options can be consulted [here](https://docs.xarray.dev/en/v2024.02.0/generated/xarray.plot.imshow.html). You will notice that these parameters are part of the `imshow` method from the plot function. Since plot originates from matplotlib and is so widely used, your python environment helps you to interpret the parameters without having to specify the method. It is a service to help you, but can be confusing when teaching it. We will explain more about this below.

Check warning on line 369 in episodes/06-raster-intro.md

View workflow job for this annotation

GitHub Actions / Build markdown source files if valid

[uninformative link text]: [here](https://docs.xarray.dev/en/v2024.02.0/generated/xarray.plot.imshow.html)

:::

Expand Down
8 changes: 3 additions & 5 deletions episodes/08-crop-raster-data.md
Original file line number Diff line number Diff line change
Expand Up @@ -26,7 +26,7 @@
[Episode 5: Access satellite imagery using Python](05-access-data.md).

If you would like to work with the data for this lesson without downloading data on-the-fly, you can download the
raster data using this [link](https://figshare.com/ndownloader/files/36028100). Save the `geospatial-python-raster-dataset.tar.gz`

Check warning on line 29 in episodes/08-crop-raster-data.md

View workflow job for this annotation

GitHub Actions / Build markdown source files if valid

[uninformative link text]: [link](https://figshare.com/ndownloader/files/36028100)
file in your current working directory, and extract the archive file by double-clicking on it or by running the
following command in your terminal `tar -zxvf geospatial-python-raster-dataset.tar.gz`. Use the file `geospatial-python-raster-dataset/search.json`
(instead of `search.json`) to get started with this lesson.
Expand All @@ -38,11 +38,9 @@

### Data loading

### Data loading

First, we will load the visual image of Sentinel-2 over Rhodes Island, which we downloaded and stored in `data/sentinel2/visual.tif`.
First, we will load the visual image of Sentinel-2 over Rhodes Island, which we downloaded and stored in `data/sentinel2/visual.tif`.

We can open this asset with `rioxarray`, and specify the overview level, since this is a Cloud-Optimized GeoTIFF (COG) file. As explained in episode 6 raster images can be quite big, therefore we decided to resample the data using ´rioxarray's´ overview parameter and set it to `overview_level=1`.
We can open this asset with `rioxarray`, and specify the overview level, since this is a Cloud-Optimized GeoTIFF (COG) file. As explained in episode 6 raster images can be quite big, therefore we decided to resample the data using ´rioxarray's´ overview parameter and set it to `overview_level=1`.

```python
import rioxarray
Expand Down Expand Up @@ -96,7 +94,7 @@
array([27.7121001 , 35.87837949, 28.24591124, 36.45725024])
```

The bounding box is composed of the `[minx, miny, maxx, maxy]` values of the raster. Comparing these values with the raster image, we can identify that the magnitude of the bounding box coordinates does not match the coordinates of the raster image. This is because the two datasets have different coordinate reference systems (CRS). This will cause problems when cropping the raster image, therefore we first need to align the CRS-s of the two datasets
The bounding box is composed of the `[minx, miny, maxx, maxy]` values of the raster. Comparing these values with the raster image, we can identify that the magnitude of the bounding box coordinates does not match the coordinates of the raster image. This is because the two datasets have different coordinate reference systems (CRS). This will cause problems when cropping the raster image, therefore we first need to align the CRS-s of the two datasets

Considering the raster image has larger data volume than the vector data, we will reproject the vector data to the CRS of the raster data. We can use the `to_crs` method:

Expand Down
Loading
Loading