Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update index and ep05 #106

Merged
merged 8 commits into from
Apr 30, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -1,3 +1,6 @@
# notebooks update narrative NleSc
notebooks/

# sandpaper files
episodes/*html
site/*
Expand Down
28 changes: 5 additions & 23 deletions episodes/01-intro-raster-data.md
Original file line number Diff line number Diff line change
Expand Up @@ -15,39 +15,21 @@ exercises: 5
- Describe the strengths and weaknesses of storing data in raster format.
- Distinguish between continuous and categorical raster data and identify types of datasets that would be stored in each format.
:::


## Introduction

This episode introduces the two primary types of geospatial
data: rasters and vectors. After briefly introducing these
data types, this episode focuses on raster data, describing
some major features and types of raster data.
This episode introduces the two primary types of geospatial data: rasters and vectors. After briefly introducing these data types, this episode focuses on raster data, describing some major features and types of raster data.

## Data Structures: Raster and Vector

The two primary types of geospatial data are raster
and vector data. Raster data is stored as a grid of values which are rendered on a
map as pixels. Each pixel value represents an area on the Earth's surface. Vector data structures represent specific features on the
Earth's surface, and
assign attributes to those features. Vector data structures
will be discussed in more detail in [the next episode](02-intro-vector-data.md).
The two primary types of geospatial data are raster and vector data. Raster data is stored as a grid of values which are rendered on a map as pixels. Each pixel value represents an area on the Earth's surface. Vector data structures represent specific features on the Earth's surface, and assign attributes to those features. Vector data is stored as points, lines or polygons and can contain multiple attributes for every feature. Vector data will be discussed in more detail in [the next episode](02-intro-vector-data.md).

This workshop will focus on how to work with both raster and vector
data sets, therefore it is essential that we understand the
basic structures of these types of data and the types of data
that they can be used to represent.
This lesson will focus on how to work with both raster and vector data sets, therefore it is essential to understand the basic structures of these data models and the types of data that they can be used to represent.

### About Raster Data

Raster data is any pixelated (or gridded) data where each pixel is associated
with a specific geographic location. The value of a pixel can be
continuous (e.g. elevation) or categorical (e.g. land use). If this sounds
familiar, it is because this data structure is very common: it's how
we represent any digital image. A geospatial raster is only different
from a digital photo in that it is accompanied by spatial information
that connects the data to a particular location. This includes the
raster's extent and cell size, the number of rows and columns, and
its coordinate reference system (or CRS).
Raster data is any pixelated (or gridded) data where each pixel, also called a cell, has the same size and is associated with a specific geographic location. The value of a pixel or cell can be continuous, typically stored as a float number, (e.g. temperature, precipitation, elevation) or categorical, stored as an integer (e.g. land-use types, building footprints, roads). If this sounds familiar, it is because this data structure is very common: it's how we represent any digital image. A geospatial raster is only different from a digital photo in that it is accompanied by spatial information that connects the data to a particular location. This includes the raster's extent and cell size, the number of rows and columns, and its coordinate reference system (or CRS).

![Raster Concept (Source: National Ecological Observatory Network (NEON))](fig/E01/raster_concept.png){alt="raster concept"}

Expand Down
54 changes: 33 additions & 21 deletions episodes/05-access-data.md
Original file line number Diff line number Diff line change
Expand Up @@ -66,7 +66,7 @@
::: challenge
## Exercise: Discover a STAC catalog
Let's take a moment to explore the Earth Search STAC catalog, which is the catalog indexing the Sentinel-2 collection
that is hosted on AWS. We can interactively browse this catalog using the STAC browser at [this link](https://radiantearth.github.io/stac-browser/#/external/earth-search.aws.element84.com/v1).

Check warning on line 69 in episodes/05-access-data.md

View workflow job for this annotation

GitHub Actions / Build markdown source files if valid

[uninformative link text]: [this link](https://radiantearth.github.io/stac-browser/#/external/earth-search.aws.element84.com/v1)

1. Open the link in your web browser. Which (sub-)catalogs are available?
2. Open the Sentinel-2 Level 2A collection, and select one item from the list. Each item corresponds to a satellite
Expand All @@ -89,34 +89,39 @@
:::

When opening a catalog with the STAC browser, you can access the API URL by clicking on the "Source" button on the top
right of the page. By using this URL, we have access to the catalog content and, if supported by the catalog, to the
right of the page. By using this URL, you have access to the catalog content and, if supported by the catalog, to the
functionality of searching its items. For the Earth Search STAC catalog the API URL is:

```python
api_url = "https://earth-search.aws.element84.com/v1"
```

You can query a STAC API endpoint from Python using the [`pystac_client` library](https://pystac-client.readthedocs.io/en/stable/api.html#pystac_client). To do so we will first import `Client` from `pystac_client` and use the [method open from the Client class](https://pystac-client.readthedocs.io/en/stable/quickstart.html):
You can query a STAC API endpoint from Python using the [`pystac_client` library](https://pystac-client.readthedocs.io/en/stable/api.html#pystac_client).
To do so we will first import `Client` from `pystac_client` and use the [method open from the Client object](https://pystac-client.readthedocs.io/en/stable/quickstart.html):

```python
from pystac_client import Client

client = Client.open(api_url)
```

For this episode we will focus at scenes belonging to the `sentinel-2-l2a` collection. This dataset is useful for our case and includes Sentinel-2 data products pre-processed at level 2A (bottom-of-atmosphere reflectance).
For this episode we will focus at scenes belonging to the `sentinel-2-l2a` collection.
This dataset is useful for our case and includes Sentinel-2 data products pre-processed at level 2A (bottom-of-atmosphere reflectance).

In order to see which collections are available in the provided `api_url` the [`get_collections`](https://pystac-client.readthedocs.io/en/stable/api.html#pystac_client.Client.get_collections) method can be used on the Client class.
In order to see which collections are available in the provided `api_url` the
[`get_collections`](https://pystac-client.readthedocs.io/en/stable/api.html#pystac_client.Client.get_collections) method can be used on the Client object.

```python
collections = client.get_collections()
```

To print the collections we can make a for loop doing:

```python
for collection in collections:
print(collection)
```

```output
<CollectionClient id=cop-dem-glo-30>
<CollectionClient id=naip>
Expand All @@ -134,8 +139,7 @@
collection_sentinel_2_l2a = "sentinel-2-l2a"
```

The data in this collection is stored in the Cloud Optimized GeoTIFF (COG) format and as JPEG2000 images. In this episode we will focus at COGs, as these offer useful functionalities for our purpose.

The data in this collection is stored in the Cloud Optimized GeoTIFF (COG) format and as JPEG2000 images. In this episode we will focus at COGs, as these offer useful functionalities for our purpose.

::: callout
## Cloud Optimized GeoTIFFs
Expand All @@ -149,10 +153,11 @@
COGs typically include multiple lower-resolution versions of the original image, called "overviews", which can also be
accessed independently. By providing this "pyramidal" structure, users that are not interested in the details provided
by a high-resolution raster can directly access the lower-resolution versions of the same image, significantly saving
on the downloading time. More information on the COG format can be found [here](https://www.cogeo.org).

Check warning on line 156 in episodes/05-access-data.md

View workflow job for this annotation

GitHub Actions / Build markdown source files if valid

[uninformative link text]: [here](https://www.cogeo.org)
:::

In order to get data for a specific location you can add longitude latitude coordinates (World Geodetic System 1984 EPSG:4326) in your request. In order to do so we are using the `shapely` library to define a geometrical point.
In order to get data for a specific location you can add longitude latitude coordinates (World Geodetic System 1984 EPSG:4326) in your request.
In order to do so we are using the `shapely` library to define a geometrical point.
Below we have included a center point for the island of Rhodes, which is the location of interest for our case study (i.e. Longitude: 27.95 | Latitude 36.20).

```python
Expand Down Expand Up @@ -196,6 +201,8 @@

How many scenes are available?

:::: solution

```python
search = client.search(
collections=[collection_sentinel_2_l2a],
Expand All @@ -213,13 +220,15 @@
::::
:::

Now that we have added a time filter, we retrieve the metadata of the search results:
Now that we have added a time filter, we retrieve the metadata of the search results by calling the method `item_collection`:

```python
items = search.item_collection()
```

The variable `items` is an `ItemCollection` object. We can check its size by:
The variable `items` is an `ItemCollection` object. More information can be found at the [pystac documentation](https://pystac.readthedocs.io/en/latest/api/pystac.html#pystac.ItemCollection)

Now let us check the size using `len`:

```python
print(len(items))
Expand Down Expand Up @@ -253,9 +262,10 @@
```

Each of the items contains information about the scene geometry, its acquisition time, and other metadata that can be
accessed as a dictionary from the `properties` attribute. To see which information each item contains you can have a look at the [item documentation of pystac](https://pystac.readthedocs.io/en/latest/api/item.html).
accessed as a dictionary from the `properties` attribute. To see which information Item objects can contain you can have a look at the [pystac documentation](https://pystac.readthedocs.io/en/latest/api/pystac.html#pystac.Item).

Let us inspect the metadata associated with the first item of the search results. Let us first look at the collection date of the first item::

Let us inspect the metadata associated with the first item of the search results. Let us first look at collection date of the first item::
```python
item = items[0]
print(item.datetime)
Expand All @@ -277,7 +287,7 @@
{'created': '2023-08-27T18:15:43.106Z', 'platform': 'sentinel-2a', 'constellation': 'sentinel-2', 'instruments': ['msi'], 'eo:cloud_cover': 0.955362, 'proj:epsg': 32635, 'mgrs:utm_zone': 35, 'mgrs:latitude_band': 'S', 'mgrs:grid_square': 'NA', 'grid:code': 'MGRS-35SNA', 'view:sun_azimuth': 144.36354987218, 'view:sun_elevation': 59.06665363921, 's2:degraded_msi_data_percentage': 0.0126, 's2:nodata_pixel_percentage': 12.146327, 's2:saturated_defective_pixel_percentage': 0, 's2:dark_features_percentage': 0.249403, 's2:cloud_shadow_percentage': 0.237454, 's2:vegetation_percentage': 6.073786, 's2:not_vegetated_percentage': 18.026696, 's2:water_percentage': 74.259061, 's2:unclassified_percentage': 0.198216, 's2:medium_proba_clouds_percentage': 0.613614, 's2:high_proba_clouds_percentage': 0.341423, 's2:thin_cirrus_percentage': 0.000325, 's2:snow_ice_percentage': 2.3e-05, 's2:product_type': 'S2MSI2A', 's2:processing_baseline': '05.09', 's2:product_uri': 'S2A_MSIL2A_20230827T084601_N0509_R107_T35SNA_20230827T115803.SAFE', 's2:generation_time': '2023-08-27T11:58:03.000000Z', 's2:datatake_id': 'GS2A_20230827T084601_042718_N05.09', 's2:datatake_type': 'INS-NOBS', 's2:datastrip_id': 'S2A_OPER_MSI_L2A_DS_2APS_20230827T115803_S20230827T085947_N05.09', 's2:granule_id': 'S2A_OPER_MSI_L2A_TL_2APS_20230827T115803_A042718_T35SNA_N05.09', 's2:reflectance_conversion_factor': 0.978189079756816, 'datetime': '2023-08-27T09:00:21.327000Z', 's2:sequence': '0', 'earthsearch:s3_path': 's3://sentinel-cogs/sentinel-s2-l2a-cogs/35/S/NA/2023/8/S2A_35SNA_20230827_0_L2A', 'earthsearch:payload_id': 'roda-sentinel2/workflow-sentinel2-to-stac/af0287974aaa3fbb037c6a7632f72742', 'earthsearch:boa_offset_applied': True, 'processing:software': {'sentinel2-to-stac': '0.1.1'}, 'updated': '2023-08-27T18:15:43.106Z'}
```

If we want to access one item in the dictionary, for instance the EPSG code of the projected coordinate system, you need to access the item in the dictionary as usual. For instance:
Now if we want to access one item in the dictionary, for instance the EPSG code of the projected coordinate system, you need to access the item in the dictionary as usual. For instance:

```python
print(item.properties['proj:epsg'])
Expand Down Expand Up @@ -382,7 +392,7 @@
wvp-jp2: Water vapour (WVP)
```

Among the others, assets include multiple raster data files (one per optical band, as acquired by the multi-spectral
Among the other data files, assets include multiple raster data files (one per optical band, as acquired by the multi-spectral
instrument), a thumbnail, a true-color image ("visual"), instrument metadata and scene-classification information
("SCL"). Let's get the URL link to the thumbnail, which gives us a glimpse of the Sentinel-2 scene:

Expand Down Expand Up @@ -417,11 +427,15 @@

Now let us focus on the near ´red´ band by accessing the item `red` from the assets dictionary and get the Hypertext Reference (also known as URL) attribute using `.href` after the item selection.

Remote raster data can be directly opened via the `rioxarray` library. We will learn more about this library in the next episodes.

For now we are using [rioxarray to open the raster file](https://corteva.github.io/rioxarray/stable/rioxarray.html#rioxarray-open-rasterio).

```python
import rioxarray
nir_href = assets["red"].href
nir = rioxarray.open_rasterio(nir_href)
print(nir)
red_href = assets["red"].href
red = rioxarray.open_rasterio(red_href)
print(red)
```

```output
Expand All @@ -444,7 +458,7 @@

```python
# save whole image to disk
nir.rio.to_raster("red.tif")
red.rio.to_raster("red.tif")
```

That might take a while, given there are over 10000 x 10000 = a hundred million pixels in the 10-meter NIR band.
Expand All @@ -453,7 +467,7 @@
In order to do that, we are using rioxarray´s [clip_box](https://corteva.github.io/rioxarray/stable/examples/clip_box.html) with which you can set a bounding box defining the area you want.

```python
nir_subset = nir.rio.clip_box(
red_subset = red.rio.clip_box(
minx=560900,
miny=3995000,
maxx=570900,
Expand All @@ -463,14 +477,12 @@
Next, we save the subset using `to_raster` again.

```python
nir_subset.rio.to_raster("red_subset.tif")
red_subset.rio.to_raster("red_subset.tif")
```

The difference is 241 Megabytes for the full image vs less than 10 Megabytes for the subset.




:::challenge
## Exercise: Downloading Landsat 8 Assets
In this exercise we put in practice all the skills we have learned in this episode to retrieve images from a different
Expand Down Expand Up @@ -540,7 +552,7 @@
The authentication procedure for dataset with restricted access might differ depending on the data provider. For the
NASA CMR, follow these steps in order to access data using Python:

* Create a NASA Earthdata login account [here](https://urs.earthdata.nasa.gov);

Check warning on line 555 in episodes/05-access-data.md

View workflow job for this annotation

GitHub Actions / Build markdown source files if valid

[uninformative link text]: [here](https://urs.earthdata.nasa.gov)
* Set up a netrc file with your credentials, e.g. by using [this script](https://git.earthdata.nasa.gov/projects/LPDUR/repos/daac_data_download_python/browse/EarthdataLoginSetup.py);
* Define the following environment variables:

Expand Down
36 changes: 23 additions & 13 deletions index.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,19 +2,29 @@
site: sandpaper::sandpaper_site
---

Data Carpentry’s teaching is hands-on, so participants are encouraged to use
their own computers to ensure the proper setup of tools for an efficient
workflow. To most effectively use these materials, please make sure to download
the data and install everything before working through this lesson.

The data used in this lesson includes optical satellite images from
[the Copernicus Sentinel-2 mission][sentinel-2] and public geographical datasets
from [the dedicated distribution platform of the Dutch government][pdok].
These are real-world data sets that entail sufficient complexity to teach many
aspects of data analysis and management. They have been selected to allow students
to focus on the core ideas and skills being taught while offering the chance
to encounter common challenges with geospatial data.
## Data Carpentries
[Data Carpentry’s](https://datacarpentry.org/) teaching is hands-on. Participants are encouraged to use their own computers to ensure the proper setup of tools for an efficient workflow. To most effectively use these materials, please make sure to download the data and install everything before working through this lesson.

## Geospatial Raster and Vector Data with Python
In this lesson you will learn how to work with geospatial data and how to process these with python. Python is one of the most popular programming languages for data science and analytics, with a large and steadily growing community in the field of Earth and Space Sciences. The lesson is meant for participants with a working basic knowledge of Python and allow them to to familiarize with the world of geospatial raster and vector data. (If you are unfamiliar to python we recommend you to follow [this course](https://swcarpentry.github.io/python-novice-inflammation/) or have a look [here](https://greenteapress.com/thinkpython2/thinkpython2.pdf) ). In the *Introduction to Geospatial Raster and Vector Data with Python* lesson you will be introduced to a set of tools from the Python ecosystem and learn how these can be used to carry out geospatial data analysis tasks. In particular, you will learn to work with satellite images (i.e. [the Copernicus Sentinel-2 mission][sentinel-2] ) and open topographical geo-datasets (i.e. [OpenStreetmap][osm]). You will learn how these spatial datasets can be accessed, explored, manipulated and visualized using Python.

## Case study - Wildfires
As a case study for this lesson we will focus on wildfires. According to the IPCC assessment report, the wildfire seasons are lengthening as a result of changes in temperature and increasing drought conditions [IPCC](https://www.ipcc.ch/report/ar6/wg2/about/frequently-asked-questions/keyfaq1/). To analyse the impact of these wildfires, we will focus on the wildfire that occured on the Greek island [Rhodes in the summer of 2023](https://news.sky.com/story/wildfires-on-rhodes-force-hundreds-of-holidaymakers-to-flee-their-hotels-12925583), which led to the evacuation of [19.000 people](https://en.wikipedia.org/wiki/2023_Greece_wildfires).

The analysis that you are going to work on is to estimate which built-up areas were affected by these wildfires. Furthermore, we will analyse which vegetation and land-use types have been affected the most by the wildfire in order to get an understanding of which areas are more vulnerable to wildfires. The latter will generate insights which can be used as input for wildfire mitigation strategies.

The data used in this lesson includes optical satellite images from [the Copernicus Sentinel-2 mission][sentinel-2] and topgraphical data from [OpenStreetmap][osm]. These datasets are real-world open data sets that entail sufficient complexity to teach many aspects of data analysis and management. The datasets have been selected to allow participants to focus on the core ideas and skills being taught while offering the chance to encounter common challenges with geospatial data. Furthermore, we have selected datasets which are available anywhere on earth.

*Note, that the analyses presented in this lesson are developed for educational purposes. Therefore in some occasions the analysis steps have been simplified and assumptions have been made.*

## Python libraries used in this lesson
The main python libraries that are used in this lesson are:
- [geopandas](https://geopandas.org/en/stable/)
- [rioxarray](https://github.com/corteva/rioxarray)
- [xarray-spatial](https://xarray-spatial.org/)
- [dask](https://www.dask.org/)
- [pystac](https://pystac.readthedocs.io/en/stable/)

[sentinel-2]: https://sentinel.esa.int/web/sentinel/missions/sentinel-2
[pdok]: https://www.pdok.nl
[osm]: https://www.openstreetmap.org/#map=14/45.2935/18.7986
[workbench]: https://carpentries.github.io/sandpaper-docs
Loading