forked from Env-Data-Sci-FA23/Week-4-Geospatial
-
Notifications
You must be signed in to change notification settings - Fork 0
/
01_spatial_intro.Rmd
383 lines (238 loc) · 17.6 KB
/
01_spatial_intro.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
---
title: "Intro to Spatial Data in R"
author: "Caitlin Mothes"
date: "`r Sys.Date()`"
output: github_document
---
```{r setup, include= FALSE}
knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE, eval = FALSE)
```
## 1. Spatial Data Formats
**Vector Data**
- Locations (points)
- Coordinates, address, country, city
- Shapes (lines or polygons)
- Political boundaries, roads, building footprints, water bodies
**Raster Data**
- Images (matrix of cells organized by rows and columns)
- Satellite imagery, climate, landcover, elevation
![](spatial_formats.png){width="50%"}
## 2. Import and manipulate spatial data
There are a few new R packages we will need to work with spatial data, listed below with hyperlinks and decribed in more detail throughout this and other lessons.
- `sf` : working with vector data
- `terra` : working with raster data
- `tmap` : visualizing spatial data (i.e., making maps!)
- `tigris` : import vector data from the U.S. Census database (i.e., political boundaries, roads, etc.)
- `elevatr` : import elevation data
- `rgbif` (optional) : import species occurrence data from the GBIF database
- `soilDB` (optional) : import snow depth data from SNOTEL sites across the U.S.
We've already added these packages to a 'setup.R' script in this project directory, so you can use `source("setup.R")` at the beginning of each lesson if you want, otherwise you will need to install each new one manually with `install.packages()`.
```{r}
source("setup.R")
```
### 2.1 Vector Data
### [`tigris`](https://github.com/walkerke/tigris)
### **Polygons**
All the data we are working with in this lesson is confined to the state of Colorado. Let's start by pulling in political boundaries for Colorado counties with the `tigris` package, which returns a shapefile consisting of polygons for each county.
```{r}
# download county shapefile for the state of Colorado
co_counties <- counties(state = "CO")
```
The `tigris` package is one of many data retrieval R packages that uses API calls to pull in data from various online/open databases directly into your R session, without the need to separately download. When you close out your R session, these 'temp' files are erased, so it does not use up any of your local storage.
At the end of this lesson you will learn how to save shapefiles to your computer if you do in fact want to store and use them in the future (e.g., you manipulated a data set quite a bit and don't want to re-run the entire process every new R session).
### **Lines**
`tigris` has many other data sets in addition to political boundaries. Today let's work with another shapefile, importing roads for Larimer county, which returns a polyline dataset for all roads in Larimer County.
```{r}
co_roads <- roads(state = "CO", county = "Larimer")
```
### [`tmap`](https://r-tmap.github.io/tmap/)
Throughout this lesson we will be using the `tmap` package to produce quick static or interactive maps.
`tmap` allows for both static ("plot" mode) and interactive ("view" mode) mapping options, which you can set using the function `tmap_mode()` . For today we will be making quick interactive plots. Once you set the mode with `tmap_mode()`, every plot call to `tmap` after that produces a plot in that mode.
```{r}
tmap_mode("view")
```
Lets view our Colorado counties and Larimer County roads shapefiles. To make a "quick thematic map" in `tmap` you can use the `qtm()` function. You can also use `tm_shape()` plus the type of spatial layer (e.g., `tm_polygons()`) to add your layers to the map. Both methods below will produce the same exact map, and you may think why would you ever need to use the `tm_shape()` method since its more code? The answer may be rarely, but there are some cases where you can customize your maps better with `tm_shape()` that we will see later on.
Also notice that `tmap` uses `+` signs to tack on additional maps/elements similar to `ggplot2` code (i.e., no pipe!)
*Note: map rendering may take a few seconds because the roads layer is pretty large and detailed.*
```{r}
# Option 1: Using qtm()
qtm(co_counties)+
qtm(co_roads)
```
```{r}
# Option 2: Using tm_shape()
tm_shape(co_counties)+
tm_polygons()+
tm_shape(co_roads)+
tm_lines()
```
Mess around with this map a little bit. See that you can change the basemap, turn layers on and off, and click on features to see their attributes.
There are a ton of ways to customize these maps (more details on this in the spatial viz lesson!). For example, `co_counties` has an 'ALAND' variable, which represents the total land area of each county. To color by that variable we would use:
```{r}
qtm(co_counties, fill = "ALAND")
```
Let's inspect the spatial data sets a little more. What do you see when you run the following line of code?
```{r}
class(co_counties)
```
### [`sf`](https://r-spatial.github.io/sf/)
By default, the `tigris` package imports spatial data in `sf` format, which stands for 'simple features'. The `sf` package provides an easy and efficient way to work with vector data, and represents spatial features as a `data.frame` or `tibble` with a **geometry** column, and therefore also works well with `tidyverse` packages to perform manipulations like you would a data frame.
For example, we are going to do an exercise for the Poudre Canyon Highway, so we want to filter out the roads data set to only those features. Using your investigative geography skills and your interactive map, find the highway on your map and find out what the exact 'FULLNAME' attribute is, and use that to `filter()` the data set. Call the new roads feature `poudre_hwy`.
```{r}
poudre_hwy <- co_roads %>%
filter(FULLNAME == "Poudre Canyon Hwy")
qtm(poudre_hwy)
```
### Points
Most often when you are working with points, you start with an excel file or something similar that consists of the raw latitude and longitude. When you have spatial data that is not explicitly spatial yet or not in the `sf` format, you use the `st_as_sf()` function to transform it.
Lets work with a couple locations along the Poudre highway, making a small data frame of their coordinates:
```{r}
poudre_points <- data.frame(name = c("Mishawaka", "Rustic", "Blue Lake Trailhead"),
long = c(-105.35634, -105.58159, -105.85563),
lat = c(40.68752, 40.69687, 40.57960))
```
Right now, `poudre_points` is just a data frame (run `class(poudre_points)` to check). We need to convert it to a spatial (`sf`) object first in order to map and spatially analyze it.
Within the `st_as_sf()` function we need to specifying the longitude and latitude columns in our `poudre_points` data frame and the CRS (Coordinate Reference System). **Note** **that 'x' (longitude) always goes first followed by 'y' (latitude).** Otherwise it will map your points on the other side of the world.
```{r}
poudre_points_sf <- st_as_sf(poudre_points, coords = c("long", "lat"), crs = 4326)
qtm(poudre_hwy)+
qtm(poudre_points_sf)
```
Note the 4-digit number we assign for `crs`. This is an EPSG code, which is tied to a specific CRS called WGS84 and one of the most common reference systems coordinates are recorded in (often noted by the fact that the values are in decimal degrees). This is used by Google Earth, the U.S. Department of Defense and all GPS satellites (among others). A full list of EPSG codes and coordinate reference systems can be found [here](https://spatialreference.org/ref/epsg/). Note, there are A LOT. Probably the most common used in the U.S. are WGS84 (a global CRS) and NAD83 (used by many U.S. federal agencies).
### Coordinate Reference Systems
Probably the most important part of working with spatial data is the coordinate reference system (CRS) that is used. The CRS describes how and where your spatial data is located on Earth. There are numerous different CRS's depending on when and how the data was collected, the spatial location and extent it was collected, etc. In order to analyze and visualize spatial data, **all objects must be in the exact same CRS**.
We can check a spatial object's CRS by printing it the object name to the console, which will return a bunch of metadata about the object. You can specifically return the CRS for `sf` objects with `st_crs()`.
```{r}
# see the CRS in the header metadata:
co_counties
#return just the CRS (more detailed)
st_crs(co_counties)
```
You can check if two objects have the same CRS like this:
```{r}
st_crs(poudre_hwy) == st_crs(poudre_points_sf)
```
Uh oh, the CRS of our points and lines doesn't match. While `tmap` performs some on-the-fly transformations to map the two layers together, in order to do any analyses with these objects you'll need to re-project one of them. You can project one object's CRS to that of another with `st_transform` like this:
```{r}
# transform the CRS of poudre_points_sf to the CRS of poudre_hwy
poudre_points_prj <- st_transform(poudre_points_sf, st_crs(poudre_hwy))
# Now check that they match
st_crs(poudre_points_prj) == st_crs(poudre_hwy)
```
### 2.2 Raster Data
### [`elevatr`](https://github.com/jhollist/elevatr/)
Lets import some elevation data using the `elevatr` package. The function `get_elev_raster()` returns a raster digital elevation model (DEM) from the AWS Open Data Terrain Tiles. For this function you must supply a spatial object specifying the **extent** of the returned elevation raster and the resolution (specified by the zoom level `z`). We are importing elevation at \~ 1km resolution (more like 900 m), and we can use our `co_counties` object as the extent we want to download to, which will return elevation tiles for the state of Colorado.
*Note: 'extent' is the spatial bounding box of the data (represented by the x,y coordinates of the four corners inclusive of the entire spatial data)*
```{r}
co_elevation <- get_elev_raster(co_counties, z = 7)
```
```{r}
qtm(co_elevation)
```
By default, `tmap` uses a categorical symbology to color the cells by elevation. You can change that to a continuous palette like this (an example of when `tm_shape()` allows us to edit the map more):
```{r}
tm_shape(co_elevation)+
tm_raster(style = "cont", title = "Elevation (m)")
```
When we see this on a map, we see that it actually extends beyond Colorado due to how the Terrain Tiles are spatially organized.
Let's inspect this raster layer a little. By printing the object name to the console we see a bunch of metadata like resolution (cell/pixel size), extent, CRS, and file name.
```{r}
co_elevation
```
### `terra`
We use the `terra` package to work with raster data. For example, we only want to see elevation along the Poudre highway. We can use `crop` to crop the raster to the extent of our `poudre_hwy` spatial object using the `ext()` function to get the extent (i.e., bounding box) of our `poudre_hwy` object.
However...the following line of code **doesn't work:**
```{r}
# If we try this, we get an error
co_elevation_crop <- crop(co_elevation, ext(poudre_hwy))
```
This doesn't work because our `co_elevation` object is actually not in the proper format to work with the `terra` package. The `elevatr` package still uses the `raster` package to work with raster data, however this package is outdated and we want to stick with `terra` for this course and any future work you do with raster data.
```{r}
# note the data type of elevation is RasterLayer
class(co_elevation)
```
`terra` uses objects of a new class called `SpatRaster`. Converting a `RasterLayer` to a `SpatRaster` is quick using the `rast()` function.
```{r}
co_elevation <- rast(co_elevation)
```
Now check the class:
```{r}
class(co_elevation)
```
Now we can use `terra` functions, and re-run the `crop()` code we tried earlier:
```{r}
co_elevation_crop <- crop(co_elevation, ext(poudre_hwy))
```
Plot all our spatial layers together:
```{r}
qtm(co_elevation_crop) +
qtm(poudre_hwy) +
qtm(poudre_points_prj)
```
## 3. Reading and Writing Spatial Data
### 3.1 Writing spatial data
All of the spatial data we've worked with are only saved as objects in our environment. To save the data to disk, the `sf` and `terra` packages have functions to do so. You are not required to save these files, but if you want to follow along with these functions save the data to the 'data/' folder.
To save vector data with `sf`, use `write_sf()`
```{r}
write_sf(poudre_hwy, "data/poudre_hwy.shp")
write_sf(poudre_points_prj, "data/poudre_points.shp")
```
While you can give the file any name you want, note that you **must put '.shp' as the extension of the file**. While '.shp' stands for 'shapefile', if you run the code above you'll notice a bunch of other files are saved, having the same file name but different extensions. These are auxiliary files required to properly work with the .shp shapefile. **If you ever want to share or move a shapefile,** **you must zip all the auxiliary files and .shp file together**. Think of them as a package deal!
To save raster data with `terra` use `writeRaster()`
```{r}
writeRaster(co_elevation_crop, "data/poudre_elevation.tif")
```
Same as with the vector data, when saving raster data you **must add the '.tif' file extension** to the name. There are various formats raster data can be stored as (e.g., ASCII, ESRI Grid) but GeoTiffs are the most common and generally easiest to deal with in R.
### 3.2 .RData Files
Another way you can store data is saving your environmental variables as R Data objects. You may have already seen '.RData' files in your folders before if you ever click 'yes' when closing out of RStudio asks you to save your workspace. What this does is save everything in your environment to a file with a '.RData' extension in your project directory, and then every time you open your project it reloads everything that was in the environment. This however is often poor practice, as it prevents you from writing reproducible code and all those variables start racking up storage space on your computer. We recommend changing this setting by going to Global Options and under 'Workspace' set 'Save workspace to .RData on exit' to '**Never**'.
However, there are times you may want to save your variables as R files, such as when you have a set of variables you want to quickly re-load at the beginning of your session, or some files that are pretty large in size which is often the case with spatial data (R object files are much smaller). **You can save single *or* multiple variables to an .RData file, or single variables to an .RDS file**.
Since the `poudre_hwy` and `poudre_points_prj` were objects you created in this session, to avoid the need to recreate them you can save them to an .RData file with `save()` :
```{r}
save(poudre_hwy, poudre_points_prj, file = "data/poudre_spatial_objects.RData")
```
Note that you must add the 'file =' to your second argument.
Now to test out how .RData files work, remove them from your environment with `rm()` (*be careful with this function though, it is permanent!*) and load them back in with `load()`
```{r}
rm(poudre_hwy, poudre_points_prj)
```
See they are no longer in your Environment pane, but after you load the .RData file back in, it loads in those two objects with the same environmental names they were given when you saved them.
```{r}
load("data/poudre_spatial_objects.RData")
```
Note that `terra` objects don't properly save to .RData files, but there is a work around if you save a single `terra` object as an .RDS file with `saveRDS`. Here is that workflow, there is just a second step to 'unpack' the loaded .RDS object with `rast()`.
```{r}
saveRDS(co_elevation_crop, "data/poudre_elevation.RDS")
```
```{r}
readRDS("data/poudre_elevation.RDS") %>% rast()
```
Note that with .RDS files you must assign the loaded file to a new environmental variable (unlike with .RData that returns the objects with the exact names they had before).
### 3.3 Reading Spatial Data
To read in shapefiles, you use `read_sf()` . If you saved the `poudre_hwy` shapefile in the steps above, you can load it back into your environment like this:
```{r}
read_sf("data/poudre_hwy.shp")
```
Notice that when reading shapefiles into R you only specify the file with the '.shp' extension, and don't need to pay much attention to any of those auxiliary files. As long as all the other auxiliary files are saved in that same folder, it will read in the shapefile correctly, but if you are missing any then the .shp file becomes useless on its own.
To read in raster files you use the `rast()` function and file path with the appropriate file extension
```{r}
rast("data/poudre_elevation.tif")
```
**Remember when reading in files you will want to assign them to a new variable name with `<-` to keep them in your environment**.
## 4. Exercises
1. **Explore the use of `extract` from the `terra` package by running `?terra::extract`. (Note we need to specify `terra::` because 'extract' is a function name in multiple packages we may have loaded in our session).**
**How would you extract the elevation at each of the three points in `poudre_points_prj` ? (2 pts)**
```{r}
```
2. **Choose your favorite state (other than Colorado). For that state, carry out the following tasks: (8 pts)**
Import the county boundaries for your state:
```{r}
```
Import elevation for your state (using your new counties object as the extent/bounding box and set `z = 7`):
```{r}
```
Create an interactive map of your state counties and the elevation layer underneath (*note:* use `?qtm` to see the argument options for `fill =` to draw only the county borders, i.e. remove the fill color).
```{r}
```
Choose a single county within your state county object, and crop your elevation layer to the extent of that county (*note:* use `filter()` to create an object of just your selected county that you want to crop to). Follow the steps above we used to crop `co_elevation` to the poudre hwy.
```{r}
```