Skip to content

Commit

Permalink
differences for PR #106
Browse files Browse the repository at this point in the history
  • Loading branch information
actions-user committed Apr 30, 2024
1 parent 707cb4d commit 572df43
Show file tree
Hide file tree
Showing 7 changed files with 330 additions and 160 deletions.
28 changes: 5 additions & 23 deletions 01-intro-raster-data.md
Original file line number Diff line number Diff line change
Expand Up @@ -15,39 +15,21 @@ exercises: 5
- Describe the strengths and weaknesses of storing data in raster format.
- Distinguish between continuous and categorical raster data and identify types of datasets that would be stored in each format.
:::


## Introduction

This episode introduces the two primary types of geospatial
data: rasters and vectors. After briefly introducing these
data types, this episode focuses on raster data, describing
some major features and types of raster data.
This episode introduces the two primary types of geospatial data: rasters and vectors. After briefly introducing these data types, this episode focuses on raster data, describing some major features and types of raster data.

## Data Structures: Raster and Vector

The two primary types of geospatial data are raster
and vector data. Raster data is stored as a grid of values which are rendered on a
map as pixels. Each pixel value represents an area on the Earth's surface. Vector data structures represent specific features on the
Earth's surface, and
assign attributes to those features. Vector data structures
will be discussed in more detail in [the next episode](02-intro-vector-data.md).
The two primary types of geospatial data are raster and vector data. Raster data is stored as a grid of values which are rendered on a map as pixels. Each pixel value represents an area on the Earth's surface. Vector data structures represent specific features on the Earth's surface, and assign attributes to those features. Vector data is stored as points, lines or polygons and can contain multiple attributes for every feature. Vector data will be discussed in more detail in [the next episode](02-intro-vector-data.md).

This workshop will focus on how to work with both raster and vector
data sets, therefore it is essential that we understand the
basic structures of these types of data and the types of data
that they can be used to represent.
This lesson will focus on how to work with both raster and vector data sets, therefore it is essential to understand the basic structures of these data models and the types of data that they can be used to represent.

### About Raster Data

Raster data is any pixelated (or gridded) data where each pixel is associated
with a specific geographic location. The value of a pixel can be
continuous (e.g. elevation) or categorical (e.g. land use). If this sounds
familiar, it is because this data structure is very common: it's how
we represent any digital image. A geospatial raster is only different
from a digital photo in that it is accompanied by spatial information
that connects the data to a particular location. This includes the
raster's extent and cell size, the number of rows and columns, and
its coordinate reference system (or CRS).
Raster data is any pixelated (or gridded) data where each pixel, also called a cell, has the same size and is associated with a specific geographic location. The value of a pixel or cell can be continuous, typically stored as a float number, (e.g. temperature, precipitation, elevation) or categorical, stored as an integer (e.g. land-use types, building footprints, roads). If this sounds familiar, it is because this data structure is very common: it's how we represent any digital image. A geospatial raster is only different from a digital photo in that it is accompanied by spatial information that connects the data to a particular location. This includes the raster's extent and cell size, the number of rows and columns, and its coordinate reference system (or CRS).

![Raster Concept (Source: National Ecological Observatory Network (NEON))](fig/E01/raster_concept.png){alt="raster concept"}

Expand Down
299 changes: 209 additions & 90 deletions 05-access-data.md

Large diffs are not rendered by default.

119 changes: 89 additions & 30 deletions 06-raster-intro.md
Original file line number Diff line number Diff line change
Expand Up @@ -17,48 +17,107 @@ exercises: 30
- Visualize single/multi-band raster data.
:::

Raster datasets have been introduced in [Episode 1: Introduction to Raster Data](01-intro-raster-data.md). Here, we introduce the fundamental principles, packages and metadata/raster attributes for working with raster data in Python. We will also explore how Python handles missing and bad data values.

[`rioxarray`](https://corteva.github.io/rioxarray/stable/) is the Python package we will use throughout this lesson to work with raster data. It is based on the popular [`rasterio`](https://rasterio.readthedocs.io/en/latest/) package for working with rasters and [`xarray`](https://xarray.pydata.org/en/stable/) for working with multi-dimensional arrays.
`rioxarray` extends `xarray` by providing top-level functions (e.g. the `open_rasterio` function to open raster datasets) and by adding a set of methods to the main objects of the `xarray` package (the `Dataset` and the `DataArray`). These additional methods are made available via the `rio` accessor and become available from `xarray` objects after importing `rioxarray`.
:::callout
## Raster Data

We will also use the [`pystac`](https://github.com/stac-utils/pystac) package to load rasters from the search results we created in the previous episode.
In the [first episode](01-intro-raster-data.md) of this course we provided an introduction on what Raster datasets are and how these divert from vector data. In this episode we will dive more into raster data and focus on how to work with them. We introduce fundamental principles, python packages, metadata and raster attributes for working with this type of data. In addition, we will explore how Python handles missing and bad data values.

The Python package we will use throughout this episode to handle raster data is [`rioxarray`](https://corteva.github.io/rioxarray/stable/). This package is based on the popular [`rasterio`](https://rasterio.readthedocs.io/en/latest/) package for working with raster data and [`xarray`](https://xarray.pydata.org/en/stable/) for working with multi-dimensional arrays.

`Rioxarray` extends `xarray` by providing top-level functions like the [`open_rasterio`](https://corteva.github.io/rioxarray/html/rioxarray.html#rioxarray-open-rasterio) function to open raster datasets. Furthermore, it adds a set of methods to the main objects of the `xarray` package like the [`Dataset`](https://docs.xarray.dev/en/stable/generated/xarray.Dataset.html) and the [`DataArray`](https://docs.xarray.dev/en/stable/generated/xarray.DataArray.html#xarray.DataArray). These methods are made available via the `rio` accessor and become available from `xarray` objects after importing `rioxarray`.

*[comment_mdk]: what do we mean with the "rio accessor"? Furthermore, it is unclear in this text what the advantages are of dataset and dataarray, why are these interesting for us? We should explain that.*

In addition, we will use the [`pystac`](https://github.com/stac-utils/pystac) package to load rasters from the search results that were created in [episode 5](05-access-data.md).

:::callout
## Introduce the Raster Data
We'll continue from the results of the satellite image search that we have carried out in an exercise from
[a previous episode](05-access-data.md). We will load data starting from the `search.json` file,
using one scene from the search results as an example to demonstrate data loading and visualization.

If you would like to work with the data for this lesson without downloading data on-the-fly, you can download the
raster data ahead of time using this [link](https://figshare.com/ndownloader/files/36028100). Save the `geospatial-python-raster-dataset.tar.gz`
file in your current working directory, and extract the archive file by double-clicking on it or by running the
following command in your terminal `tar -zxvf geospatial-python-raster-dataset.tar.gz`. Use the file `geospatial-python-raster-dataset/search.json`
(instead of `search.json`) to get started with this lesson.

This can be useful if you need to download the data ahead of time to work through the lesson offline, or if you want
to work with the data in a different GIS.
:::

## Load a Raster and View Attributes
In the previous episode, we searched for Sentinel-2 images, and then saved the search results to a file: `search.json`. This contains the information on where and how to access the target images from a remote repository. We can use the function `pystac.ItemCollection.from_file()` to load the search results as an `Item` list.
## Getting the data

We will continue from the results of the satellite image search that we have carried out in an exercise from
[the previous episode](05-access-data.md). We will load data using the created metadata file and use one scene from the search results as an example to demonstrate data loading and visualization. To do so you can use the `rhodes_sentinel-2.json` you created or download and use the created JSON file [here](../data/stac_json/rhodes_sentinel-2.json).

In case you would like to work with raster data for this lesson without downloading data on-the-fly, you can download the raster data using this [link](https://figshare.com/ndownloader/files/36028100). Save the `geospatial-python-raster-dataset.tar.gz` file in your current working directory, and extract the archive file by double-clicking on it or by running the
following command in your terminal `tar -zxvf geospatial-python-raster-dataset.tar.gz`.

*[comment mdk]: update zip file [link](https://figshare.com/ndownloader/files/36028100)*
If you use choose to download the data you can skip the following part and continue at chapter XXXX.

When you use the `rhodes_sentinel-2.json` you will remember that this file contains information on where and how to access the target images from a remote repository. To load the saved search results as an `Item` list we will use [`pystac.ItemCollection.from_file()`](https://pystac.readthedocs.io/en/stable/api/item_collection.html). Through this, we are instructing Python to use the `from_file` method of the `ItemCollection` class in the `pystac` module to load data from a specified JSON file.

```python
import pystac
items = pystac.ItemCollection.from_file("search.json")
file_items = pystac.ItemCollection.from_file("../data/stac_json/rhodes_sentinel-2.json")
```
In the search results, we have eleven `Item` type objects, corresponding to the Sentinel-2 scenes parameters we set in episode 5 (which were all images from July 8th till the 27th of August 2023 that have less than 1% cloud coverage). In this episode we will focus on the oldest item in our collection, since that would show the situation before the wildfire.

Like when reading the data from the api we can perform the same actions on the collection since it is stored to the class ItemColleciton. For instance, you can check the number of items in the item collection, you can use `len` or to check the items themselved include a for loop.

```python
print(len(file_items))
```
To check the items themselves you can make a for loop on the items (as we did in episode 5). To look at specific parameters of the item you can have a look [here](https://pystac.readthedocs.io/en/latest/api/item.html). Since we are interested in the most recent image let us print the `datetime` and `id`:

```python
for item in file_items:
print(item.id)
print(item.datetime)
```

You will notice that the item collection `S2A_35SNA_20230708_0_L2A` would be the oldest since it is printed as last.

```python
print(file_items[-1])
```

```output
<Item id=S2A_35SNA_20230708_0_L2A>
```

To access an actual raster image from that date we look at the item´s attribute `assets` which is accessible through a dictionary. This contains all the various datasets that have been collected at that specific date.

```python
assets = file_items[-1].assets
print(assets.keys())
```

```output
dict_keys(['aot', 'blue', 'coastal', 'granule_metadata', 'green', 'nir', 'nir08', 'nir09', 'red', 'rededge1', 'rededge2', 'rededge3', 'scl', 'swir16', 'swir22', 'thumbnail', 'tileinfo_metadata', 'visual', 'wvp', 'aot-jp2', 'blue-jp2', 'coastal-jp2', 'green-jp2', 'nir-jp2', 'nir08-jp2', 'nir09-jp2', 'red-jp2', 'rededge1-jp2', 'rededge2-jp2', 'rededge3-jp2', 'scl-jp2', 'swir16-jp2', 'swir22-jp2', 'visual-jp2', 'wvp-jp2']
```

To analyse the burned areas, we are interested in the `red` band of the satellite scene. In [episode 9](/episodes/09-raster-calculations.md) we will further explain why the characteristics of that band are interesting in relation to wildfires.

To select the red band we need to get the URL / `href` (Hypertext Referenc) and put it into a variable.

```python
red_href = assets["red"].href
```

```python
print(red_href)
```

## Load a Raster and View Attributes

Now we can load `red` band using the function [`rioxarray.open_rasterio()`](), via the variable we created.

In the search results, we have 2 `Item` type objects, corresponding to 4 Sentinel-2 scenes from March 26th and 28th in 2020. We will focus on the first scene `S2A_31UFU_20200328_0_L2A`, and load band `nir09` (central wavelength 945 nm). We can load this band using the function `rioxarray.open_rasterio()`, via the Hypertext Reference `href` (commonly referred to as a URL):
```python
import rioxarray
raster_ams_b9 = rioxarray.open_rasterio(items[0].assets["nir09"].href)
red = rioxarray.open_rasterio(red_href)
```

By calling the variable name in the jupyter notebook we can get a quick look at the shape and attributes of the data.
In case you used the downloaded data you can do.

```python
raster_ams_b9
import rioxarray
red = rioxarray.open_rasterio("../data/stac_json/red.tif")
```


By calling the variable name we can get a quick look at the shape and attributes of the data.
```python
red
```

```output
Expand All @@ -82,11 +141,11 @@ The output tells us that we are looking at an `xarray.DataArray`, with `1` band,
This `DataArray` object also has a couple of attributes that are accessed like `.rio.crs`, `.rio.nodata`, and `.rio.bounds()`, which contain the metadata for the file we opened. Note that many of the metadata are accessed as attributes without `()`, but `bounds()` is a method (i.e. a function in an object) and needs parentheses.

```python
print(raster_ams_b9.rio.crs)
print(raster_ams_b9.rio.nodata)
print(raster_ams_b9.rio.bounds())
print(raster_ams_b9.rio.width)
print(raster_ams_b9.rio.height)
print(red.rio.crs)
print(red.rio.nodata)
print(red.rio.bounds())
print(red.rio.width)
print(red.rio.height)
```

```output
Expand Down
Binary file added fig/E05/STAC-s2-preview-after.jpg
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added fig/E05/STAC-s2-preview-before.jpg
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
36 changes: 23 additions & 13 deletions index.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,19 +2,29 @@
site: sandpaper::sandpaper_site
---

Data Carpentry’s teaching is hands-on, so participants are encouraged to use
their own computers to ensure the proper setup of tools for an efficient
workflow. To most effectively use these materials, please make sure to download
the data and install everything before working through this lesson.

The data used in this lesson includes optical satellite images from
[the Copernicus Sentinel-2 mission][sentinel-2] and public geographical datasets
from [the dedicated distribution platform of the Dutch government][pdok].
These are real-world data sets that entail sufficient complexity to teach many
aspects of data analysis and management. They have been selected to allow students
to focus on the core ideas and skills being taught while offering the chance
to encounter common challenges with geospatial data.
## Data Carpentries
[Data Carpentry’s](https://datacarpentry.org/) teaching is hands-on. Participants are encouraged to use their own computers to ensure the proper setup of tools for an efficient workflow. To most effectively use these materials, please make sure to download the data and install everything before working through this lesson.

## Geospatial Raster and Vector Data with Python
In this lesson you will learn how to work with geospatial data and how to process these with python. Python is one of the most popular programming languages for data science and analytics, with a large and steadily growing community in the field of Earth and Space Sciences. The lesson is meant for participants with a working basic knowledge of Python and allow them to to familiarize with the world of geospatial raster and vector data. (If you are unfamiliar to python we recommend you to follow [this course](https://swcarpentry.github.io/python-novice-inflammation/) or have a look [here](https://greenteapress.com/thinkpython2/thinkpython2.pdf) ). In the *Introduction to Geospatial Raster and Vector Data with Python* lesson you will be introduced to a set of tools from the Python ecosystem and learn how these can be used to carry out geospatial data analysis tasks. In particular, you will learn to work with satellite images (i.e. [the Copernicus Sentinel-2 mission][sentinel-2] ) and open topographical geo-datasets (i.e. [OpenStreetmap][osm]). You will learn how these spatial datasets can be accessed, explored, manipulated and visualized using Python.

## Case study - Wildfires
As a case study for this lesson we will focus on wildfires. According to the IPCC assessment report, the wildfire seasons are lengthening as a result of changes in temperature and increasing drought conditions [IPCC](https://www.ipcc.ch/report/ar6/wg2/about/frequently-asked-questions/keyfaq1/). To analyse the impact of these wildfires, we will focus on the wildfire that occured on the Greek island [Rhodes in the summer of 2023](https://news.sky.com/story/wildfires-on-rhodes-force-hundreds-of-holidaymakers-to-flee-their-hotels-12925583), which led to the evacuation of [19.000 people](https://en.wikipedia.org/wiki/2023_Greece_wildfires).

The analysis that you are going to work on is to estimate which built-up areas were affected by these wildfires. Furthermore, we will analyse which vegetation and land-use types have been affected the most by the wildfire in order to get an understanding of which areas are more vulnerable to wildfires. The latter will generate insights which can be used as input for wildfire mitigation strategies.

The data used in this lesson includes optical satellite images from [the Copernicus Sentinel-2 mission][sentinel-2] and topgraphical data from [OpenStreetmap][osm]. These datasets are real-world open data sets that entail sufficient complexity to teach many aspects of data analysis and management. The datasets have been selected to allow participants to focus on the core ideas and skills being taught while offering the chance to encounter common challenges with geospatial data. Furthermore, we have selected datasets which are available anywhere on earth.

*Note, that the analyses presented in this lesson are developed for educational purposes. Therefore in some occasions the analysis steps have been simplified and assumptions have been made.*

## Python libraries used in this lesson
The main python libraries that are used in this lesson are:
- [geopandas](https://geopandas.org/en/stable/)
- [rioxarray](https://github.com/corteva/rioxarray)
- [xarray-spatial](https://xarray-spatial.org/)
- [dask](https://www.dask.org/)
- [pystac](https://pystac.readthedocs.io/en/stable/)

[sentinel-2]: https://sentinel.esa.int/web/sentinel/missions/sentinel-2
[pdok]: https://www.pdok.nl
[osm]: https://www.openstreetmap.org/#map=14/45.2935/18.7986
[workbench]: https://carpentries.github.io/sandpaper-docs
Loading

0 comments on commit 572df43

Please sign in to comment.