complex_workflow.Rmd

---
title: "A Not so Simple Workflow - SDM Version"
author:
  - name: "Nora Schlenker"
    institute: [uwiscgeog]
    correspondence: false
    email: nschlenker@wisc.edu
    orcid_id: 0000-0002-3693-5946
  - name: "Simon Goring"
    institute: [uwiscgeog,uwiscdsi]
    correspondence: true
    email: goring@wisc.edu
    orcid_id: 0000-0002-2700-4605
  - name: "Socorro Dominguez Vidaña"
    institute: [rhtdata]
    correspondence: false
    email: s.dominguez@ht-data.com
    orcid_id: 0000-0002-7926-4935
institute:
  - htdata:
    name: "HT Data"
  - uwiscgeog:
    name: "University of Wisconsin -- Madison: Department of Geography"
  - uwiscdsi:
    name: "University of Wisconsin -- Data Science Institute"
date: "`r Sys.Date()`"
output:
  html_document:
    code_folding: show
    fig_caption: yes
    keep_md: yes
    self_contained: yes
    theme: readable
    toc: yes
    toc_float: yes
    css: "text.css"
  pdf_document:
    pandoc_args: "-V geometry:vmargin=1in -V geometry:hmargin=1in"
dev: svg
highlight: tango
csl: 'https://bit.ly/3khj0ZL'
---

## Building Species Distribution Models from Pollen Data

This RMarkdown document will walk you through the process of:

1. Downloading pollen records from multiple sites -- [Loading Datasets](#loading-datasets)
2. Filtering for specific taxa and taxonomic harmonization -- [Data Filtering](#data-filtering)
3. Filtering and binning for specific time periods
4. Linking to environmental data
5. Performing simple SDMs for different time periods

This work-through is the companion to the Introduction to Neotoma [presentation](https://docs.google.com/presentation/d/1Fwp5yMAvIdgYpiC04xhgV7OQZ-olZIcUiGnLfyLPUt4/edit?usp=sharing).

## Disciplinary Knowledge

Before we start any analysis, we need to recognize that paleoecological data has some important considerations that need to be taken into account. Our "complex" workflow has been simplified to some degree, but, were we to work towards published results, these concepts need to be considered in our work.

### Taxonomies and Morphotaxa

Pollen "taxa" are identified by the structures of the pollen grains themselves, identified on microscope slides. The level of taxonomic resolution (e.g. species, genera, family) in pollen data varies depending on a number of factors, and, in some cases, does not directly align with standard taxonomies (such as the GBIF Backbone). These taxonomic issues generally fall along several main topics:

* **Taxonomic Resolution**: Many pollen taxa can only be identify to family or genus level due to similarity of pollen at finer taxonomic levels. An additional consideration is that although some pollen that can only be identify to a genus level may be able to be classified to a finer scale based if it is the only species within that genus within a geographic area, however, the pollen may only be identified to the genus level.
* **Analyst Experience**: The technology and skill available to the pollen counter can vary, both as a result of the overall experience and quality of equipment, but also based on where and what kinds of pollen they have identified in the past. An expert in western North American pollen taxa may have skill in this region, but might identify pollen to a lower taxonomic resolution when identifying pollen from Australia.
* **Naming conventions**: Just as analysts have differences in identification, the ways taxa are named varies by analyst and over time. For example, "Poaceae" and "Poaceae undiff." generally refer to the same pollen type but are recorded as different taxa within the database.
* **Uncertainty**: Neotoma uses a [well-documented system for indicating uncertainty](https://open.neotomadb.org/manual/taxonomy-related-tables-1.html#Taxa) in identification. Depending on the research question, we may accept more or less taxonomic uncertainty in our identification.

Due to this, a required step of working with pollen data is to harmonize taxon across sites. The level of resolution of your final taxon harmonization will depend on your research question and location of study.

### Chronologies and Ages

Chronologies underpin all of our paleoenvironmental analysis and interpretation. Neotoma's data model recognizes the importance of chronologies by providing a data model that supports multiple chronologies per record, and allows new analysts to add their own chronologies to existing records.

Some records in Neotoma are from the 1980s or earlier, and these records may have weak chronologies, relative to modern chronologies. They are built using linear interpolation, the radiocarbon dates are uncalibrated, and often, there is no uncertainty in the age predictions.

Our understanding of the relationship between ^14^C production in the upper atmosphere and its incorporation into organic material has improved over time, resulting in updates to the INTCAL curve. In addition, our chronology construction models have improved, fom simple linear interpolation to more complex Baysian models like BACON.

For this reason, researchers will often undertake chronology re-construction early on in their analysis to ensure there is a standardized approach to building the models.

### Age Uncertainty

Although each pollen sample is associated with an age (based on the age-depth model/chronology) it is also associated with uncertainty around that age. The amount of uncertainty varies depending on the number and type of dated material (i.e. radiocarbon from macrofossils vs. bulk carbon), the distance of a sample from dated material along the core, and the quality of the overall chronology. Age uncertainty means that pollen samples from two sites with the same estimated age do not neccessarily represent the same moment in time.

As statistical modelling has improved in the sciences, we have seen more rigorous methods for managing age uncertainty in analysis.

## Load Libraries

For this workshop we only need a few packages. We will download multiple records from Neotoma, filter by taxa and time period, and preform simple species distribution models.

We'll be using the R package `pacman` here, to automatically load and install packages. Note that we also use the package rpaleoclim to download the climate data, but since we have pre-downloaded the climate rasters for you, you will not need that package today.

```{r setup}
pacman::p_load(neotoma2, tidyverse, remotes, terra, sf, rpaleoclim)
```

## Loading Datasets

We worked through the process for finding and downloading records using `neotoma2` in the [previous workshop](https://open.neotomadb.org/Current_Workshop/simple_workflow.html). For this exercise we will be pulling European pollen records from Neotoma to build species distribution models (SDMs) for taxa at multiple time periods in the past.

To undertake this analysis we need to:

* Define a region of interest
* Download full records (sites, datasets & samples)
* Filter records for data quality (presence of chronologies, taxa, etc.)

### Define Spatial Domain {.tabset}

There are a large number of records in Europe. Downloading this volume of data from Neotoma can take a long time (~2hrs total) and is computationally expensive for the database itself. We have already completed the data download saved it to our data folder. Under the results you can take a look at all the sites we downloaded. As with the previous example, we can use `loc` parameter along with  [WKT](https://arthur-e.github.io/Wicket/sandbox-gmaps3.html) or [geoJSON](http://geojson.io/#map=2/20.0/0.0) objects, and native `sf` objects.

#### Code

```{r geteurope, message = FALSE, eval = TRUE}
europe <-  '{"type": "Polygon",
             "coordinates": [[
              [ -32.13,66.46],
              [-14.09,36.93],
              [30.16,34.71],
              [35.79,69.17],
              [-32.13,66.46]]]}'
europe_sf <- geojsonsf::geojson_sf(europe)

## The following lines of code are commented out because 
#  we've already run it for you to speed up the process. 
#  Downloading large amounts of data can take a long time.
#
# europe_downloads <- neotoma2::get_datasets(loc = europe,
#      datasettype = "pollen",
#      all_data = TRUE) %>%
#   neotoma2::filter(!is.na(age_range_young)) %>% 
#   get_downloads(all_data = TRUE)
# europe_samples <- samples(europe_downloads)

# saveRDS(europe_downloads, "data/europe_downloads.RDS")
# saveRDS(europe_samples, "data/europe_samples.RDS")

europe_downloads <- readRDS("data/europe_downloads.RDS")
europe_samples <- readRDS("data/europe_samples.RDS")

```

#### Sites Map

```{r geteuropeshow, eval=TRUE, echo = FALSE}

neotoma2::plotLeaflet(europe_downloads) %>% 
  leaflet::addPolygons(map = ., 
                       data = europe_sf, 
                       color = "green")

```

#### Data table

```{r downloadsCode, echo = FALSE}

head(europe_samples, n = 10) %>%
  dplyr::select(age,
    variablename,
    ecologicalgroup,
    value,
    units,
    sitename,
    lat,
    long) %>%
  DT::datatable(data = ., 
                options = list(scrollX = "100%", dom = 't'))
```

## Data Filtering

### Sample Filtering

When we look at pollen data we are looking at counts of pollen data, from a slide, for a particular analysis unit. Along with pollen there can also be measurements of charcoal, records of algal cycts, and other lab-based measurements. In addition, we often look only at terrestrial pollen types, since they generally reflect regional vegetation, as opposed to the pollen of aquatic types that would represent local site conditions.

Even though the datasets are pollen datasets, the `europe_samples` `data.frame` contains data for a number of different measured elements, including things like mandibles, scales and sporangia. In addition, these elements are measured using `r length(unique(europe_samples$units))` different measurement units (`unique(europe_samples$units)`) across `r length(unique(europe_samples$ecologicalgroup))` ecological groups (`unique(europe_samples$ecologicalgroup)`) including diatoms, trees and shrubs, rotifers and insects.

#### Standard Units and Elements

We often use pollen proportions in our analysis, and for this we need to find out how many total pollen grains were counted in each sample.  We also want to make sure we're looking at a common set of taxa across all sites. In this case we will filter for trees and shrubs (ecological group `TRSH`) and upland herbs (ecological group `UPHE`). We will also look for pollen (element type `pollen`) that is measured using the Number of Identified Specimens (units `NISP`).

This makes the filtering rather straightforward:

```{r cutsamples}
europe_cleaner <- europe_samples %>%
  filter(units == 'NISP' & 
         elementtype == 'pollen' &
         (ecologicalgroup %in% c('TRSH', 'UPHE')))
```

This initial filtering removes about `r round(nrow(europe_samples) - nrow(europe_cleaner), -3)` rows from our original dataset.

#### Standardizing Taxa

For the purposes of this analysis we will only be focusing on building models for *Corylus*, *Fagus* and *Picea*.  

Before we can start looking at our focal species we need to do some work with our sample table so that we have all the information we want. We need to do three things at this step: filtering the data to only include our focal taxa, doing taxon harmonization on our focal taxa, and calculating the pollen percent for each focal taxa for each pollen sample.

*Note:* Our taxon harmonization here is very simple and more thought should be put into taxon harmonization.

Our first task is to make a table that we will join with our samples table that tells us the total number of pollen per sample which we will use in our pollen percentage calculations. There are many decisions you can make at this step depending on your research question but here we are going to calculate pollen percent based on the total number of trees, shrubs, and herbaceous groups (ecologicalgroup = TRSH and UPHE). This is common for may studies that focus on these ecological groups but know that you can choose other combinations and this will effect how you interpret your results.  

Then we can combine taxon filtering and harmonization and pollen percent calculation all in one segment of code. Here we are selecting the taxa *Fagus*, *Picea*, and *Corylus* and harmonizing all variations of those taxa to the genus level. We can use the code below to look at the diversity of identifiers for these three taxa:

```r
europe_cleaner %>%
  filter(stringr::str_detect(variablename, "Fagus|Corylus|Picea")) %>%
  select(variablename) %>% distinct()
```

For our purposes it will be enough to simply standardize all taxa identified as some form of *Picea* to *Picea*, and the same for the *Corylus* and *Fagus* types.  This then lets us right some fairly clean code as follows:

##### Code

```{r replacetaxa, warning = FALSE, message = FALSE}
europe_cleaner <- europe_cleaner %>%
  mutate(variablename = replace(variablename, 
                                stringr::str_detect(variablename, "Fagus"),
                                "Fagus"),
         variablename = replace(variablename,
                                stringr::str_detect(variablename, "Picea"),
                                "Picea"),
         variablename = replace(variablename,
                                stringr::str_detect(variablename, "Corylus"),
                                "Corylus")) %>%
    group_by(datasetid, age, variablename, units, ecologicalgroup, sitename, lat, long) %>%
    summarise(value = sum(value)) %>% as.data.frame()
```

We have to `summarise` at the end to make sure that counts of taxon name variants within a single sample get lumped together properly.  Once we've run this code, we can test out the earlier code block to see if we are now down to three distinct taxa (and we ought to be).

**Note**: We're using `as.data.frame()` here to speed up operations below. Leaving this object as a `grouped_df` seem to seriously impact performance.

### Age Filtering

The age ranges for these records may be broad. Some European records go back over 100,000 years before present, and some samples (and datasets) appear to have no chronologies. We can check the ages using the columns `age`, `ageolder` and `ageyounger`. Note that many chronologies have been entered without uncertainty measures, in which case `ageolder` and `ageyounger` would be unreported. Otherwise these values represent ±1 standard deviation.

We will be building our SDMs for three critical time periods in the time since the last glaciation:

| time period | age range (ybp) | label |
| ----------- | --------------- | ----- |
| Modern      |  -76 - 150      | `Modern` |
| Younger Dryas | 11,700 - 12,900 | `YD` |
| Last Glacial Maximum | > 21,000 | `LGM` |

To prepare the data for analysis we will create a new column using `mutate`, based on our time bins, and filter out any records that are not within our defined periods of study:

#### Code
```{r agecleaning}
europe_ready <- europe_cleaner %>%
  mutate(timebins = case_when(age >= -72 & age <= 150 ~ "Modern",
                              age >= 11700 & age <= 12900 ~ "YD", 
                              age >= 20000 & age <= 22000 ~ "LGM",
                              .default = NA)) %>%
    filter(!is.na(timebins)) %>%
  as.data.frame()
```


###

Our data is now ready for further analysis. We've cleaned the taxonomy for our three focal taxa, ensured all data are of a similar type and measured with similar units, and we've made sure that our time bins are clearly defined.

*Note:* We do not attempt to update age-depth models and we filter out all samples that do not have age estimates when some of them may be able to be included under closer inspection. Additionally, we do not take into account time uncertainty which is another aspect that should be considered when filtering by time.

### Calculating Pollen Sums, Percentages, and Presence {.tabset}

From here we need to calculate the pollen sum for each sample in our set of samples. This means that we will get the total number of pollen grains counted for e.g., Kansjön at depth 445cm in the core. We can calculate both the pollen sum and the proportion at the same time by using `group_by()` and `ungroup()` to first sum the `value` fields (the counts) and then calculate the proportion from our new `totalpollen` field:

```{r}
europe_subset <- europe_ready %>%
  group_by(datasetid, age, sitename, lat, long, timebins) %>%
  mutate(totalpollen = sum(value)) %>%
  ungroup() %>%
  mutate(polprop = round(value / totalpollen, 2)) %>% #calculates pollen proportion of total pollen counted per sample
  mutate(presence =  case_when(polprop > .01 ~ 1,
                              .default = 0)) %>% #calculates presence of a taxa when it has >1% pollen proportion 
  as.data.frame()
```

#### Data Table

```{r filtertaxaShow, eval = TRUE, message = FALSE, echo = FALSE}
europe_subset[1:10,] %>% 
  DT::datatable(data = ., 
                options = list(scrollX = "100%", dom = 't'))
```

#### Samples by Time Period

```{r agecleaningshow, echo=FALSE, warning=FALSE, message=FALSE}
europe_ready %>% 
  select(variablename, timebins) %>%
  filter(variablename %in% c("Fagus", "Corylus", "Picea")) %>%
  mutate(timebins = factor(timebins, levels = c("Modern", "YD", "LGM")), 
         taxa = factor(variablename, levels = c("Fagus", "Corylus", "Picea"))) %>%
  ggplot(aes(timebins)) + geom_bar(stat = "count") + 
  geom_text(stat='count', aes(label=after_stat(count)), vjust=-.5) +
  facet_wrap(~taxa, ncol = 3) +
  ylab("Count of Samples") + xlab("Time Bins") + theme_minimal()
```

###

## Loading Climate Data {.tabset}

There is no climate data associated with pollen data downloaded from Neotoma but you can utilize climate data that is available online to intersect with the locations of pollen samples. For this we will be using climate data from [Paleoclim](http://www.paleoclim.org/) which compiles climate data from present to the Pliocene from various sources. We will be using their modern and LGM climate data from [CHELSA](https://chelsa-climate.org/) and Younger Dryas climate data from [Fordham  et al., 2017](https://nsojournals.onlinelibrary.wiley.com/doi/full/10.1111/ecog.03031). 

### Code

```{r downloadclims, message = FALSE, eval = TRUE}
# europe_clim_mod <- paleoclim("cur", "5m", region = terra::ext(europe_sf))
# europe_clim_yd <- paleoclim("yds", "5m", region = terra::ext(europe_sf))
# europe_clim_lgm <- paleoclim("lgm", "5m", region = terra::ext(europe_sf))
# 
# saveRDS(europe_clim_mod, "data/europe_clim_mod.RDS")
# saveRDS(europe_clim_yd, "data/europe_clim_yd.RDS")
# saveRDS(europe_clim_lgm, "data/europe_clim_lgm.RDS")

europe_clim_mod <- readRDS("data/europe_clim_mod.RDS")
europe_clim_yd <- readRDS("data/europe_clim_yd.RDS")
europe_clim_lgm <- readRDS("data/europe_clim_lgm.RDS")

```

### Modern

```{r downloadclimsShow, echo = FALSE, message = FALSE, eval = TRUE}
plot(europe_clim_mod[["bio_1"]]/10, main = "Modern Mean Annual Temp, Europe")

```

### Younger Dryas

```{r downloadclimsShow2, echo = FALSE, message = FALSE, eval = TRUE}
plot(europe_clim_yd[["bio_1"]]/10, main = "Younger Dryas Mean Annual Temp, Europe")

```

### Last Glacial Maximum

```{r downloadclimsShow3, echo = FALSE, message = FALSE, eval = TRUE}
plot(europe_clim_lgm[["bio_1"]]/10, main = "LGM Mean Annual Temp, Europe")

```

## Extracting Environmental Variables {.tabset}

Now that we have cleaned taxa data and have loaded in climate data we can extract environmental variables for each pollen sample based on the lat/long of the collection unit. Below is a function that will extract the [19 biocliamte variables](https://chelsa-climate.org/bioclim/) from the climate rasters.

### Extracting

```{r extractclims, message = FALSE, eval = TRUE}
#First lets save the projection information from our climate data to use later when creating our species points data
proj_data <- crs(europe_clim_mod)

extract_clims <- function(taxa_data, clim_data, time_text, taxa_text, projection = proj_data) {
  time_cut <- dplyr::filter(taxa_data, timebins == time_text)
  time_sf <- st_as_sf(time_cut, coords = c("long", "lat"), crs = projection)
  time_sf_vect <- vect(time_sf$geometry)
  time_sf_clims <- terra::extract(clim_data, time_sf_vect)
  time_sf_full <- cbind(time_sf, time_sf_clims)
  dplyr::filter(time_sf_full, variablename == taxa_text)

}

## Fagus
fagus_mod <- extract_clims(taxa_data = europe_subset, clim_data = europe_clim_mod, 
                             time_text = "Modern", taxa_text = "Fagus", projection = proj_data)
fagus_yd <- extract_clims(taxa_data = europe_subset, clim_data = europe_clim_yd, 
                             time_text = "YD", taxa_text = "Fagus", projection = proj_data)
fagus_lgm <- extract_clims(taxa_data = europe_subset, clim_data = europe_clim_lgm, 
                             time_text = "LGM", taxa_text = "Fagus", projection = proj_data)
```

### Fagus E-Space 1D

```{r extractclimsshow1, echo = FALSE, message = FALSE, eval = TRUE, warning = FALSE}

rbind(fagus_mod, fagus_yd, fagus_lgm) %>%
  select(timebins, polprop, bio_1, bio_12, bio_4, bio_15, geometry) %>%
  pivot_longer(-c(timebins, polprop, geometry)) %>%
  mutate(timebins_fact = factor(timebins, levels = c("Modern", "YD", "LGM"))) %>%
  ggplot(aes(value, polprop)) + geom_point() +
  facet_grid(timebins_fact~name, scales = "free", switch = "x") +
  xlab("") +
  ylab("Pollen Percent")+
  theme_bw() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

```

### Fagus E-Space 2D

```{r extractclimsshow2, echo = FALSE, message = FALSE, eval = TRUE, warning = FALSE}

rbind(fagus_mod, fagus_yd, fagus_lgm) %>%
  select(timebins, polprop, bio_1, bio_12, bio_4, bio_15, geometry) %>%
  mutate(timebins_fact = factor(timebins, levels = c("Modern", "YD", "LGM"))) %>%
  ggplot(aes(bio_1, bio_12, size = polprop)) + geom_point() +
  facet_wrap(~timebins_fact, ncol = 3, scales = "fixed", switch = "x") +
  xlab("Mean Annual Temperature") +
  ylab("Annual Precipitation")+
  theme_bw() +
  theme(axis.text.x = element_text(angle = 45))

```

## Exercise 1 - Predicting Modern Presence {.tabset}

For our first exercise we will be exploring how pollen data can be used to predict suitable climates for taxa under current conditions. For this first example we will compare two models, one which uses pollen proportion as input to predict pollen abundance and another which uses the presence only data to predict likelihood of occurrence using a binomial distribution. 

### Code

```{r exercise1, message = FALSE, eval = TRUE}
# Create a GLM using pollen proportion data using the gaussian distribution 
fagus_mod_glm <- glm(polprop ~ bio_1 + bio_12, data = fagus_mod, family = gaussian)
fagus_mod_glm_predict <- predict(c(europe_clim_mod[["bio_1"]], europe_clim_mod[["bio_12"]]), fagus_mod_glm)

# Create a GLM using presence only data using the binomial distribution 
fagus_mod_glm_pres <- glm(presence ~ bio_1 + bio_12, data = fagus_mod, family = binomial)
fagus_mod_glm_pres_predict <- predict(c(europe_clim_mod[["bio_1"]], europe_clim_mod[["bio_12"]]), fagus_mod_glm_pres)

```

### Proportion Model Fit
```{r excersise1show1, echo = FALSE, message = FALSE, eval = TRUE}

summary(fagus_mod_glm)


```

### Presence Model Fit
```{r excersise1show2, echo = FALSE, message = FALSE, eval = TRUE}

summary(fagus_mod_glm_pres)

```

### Maps
```{r excersise1show3, echo = FALSE, message = FALSE, eval = TRUE, out.width = "50%"}

plot(fagus_mod_glm_predict, main = "Predicted Fagus Pollen Percent, Modern")
plot(fagus_mod$geometry, add = TRUE)

plot(fagus_mod_glm_pres_predict, main = "Predicted Fagus Presence, Modern")
plot(fagus_mod$geometry[which(fagus_mod$presence == 1)], add = TRUE)


```

##

### Your Task
Using the example above create your own model of modern species presence either using one of our other taxa (*Corylus* or *Picea*), You can also experiment with using different combinations of environmental variables (we have all 19 bioclimatic variables) or even use a different model type!  

When you are finished you can share your results by pasting a screenshot of a figure to our [presentation](https://docs.google.com/presentation/d/1Fwp5yMAvIdgYpiC04xhgV7OQZ-olZIcUiGnLfyLPUt4/edit?usp=sharing) on the Exercise 1 slides. 

## Exercise 2 - Hindcasting Species Presence {.tabset}

Now lets take a look at how we can utilize the power of paleoecological data to create and/or validate distribution models. Here we will use the same model created above for Fagus presence during modern times and hindcast it using the climate data from the LGM. We will then create a new model using the pollon proportion data of pollen at the LGM and predict species presence at the LGM. We can then compare the two models. Although we are not doing this formally here you can also use taxa pollen presence in the past to formally validate models created with modern species-environment relationships. 

### Code

```{r exercise2, message = FALSE, eval = TRUE}
# Create a GLM using pollen proportion data using the gaussian distribution 
fagus_modtolgm_glm_predict <- predict(c(europe_clim_lgm[["bio_1"]], europe_clim_lgm[["bio_12"]]), fagus_mod_glm)

# Create a GLM using presence only data using the binomial distribution 
fagus_lgm_glm <- glm(polprop ~ bio_1 + bio_12, data = fagus_lgm, family = gaussian)
fagus_lgmtolgm_glm_predict <- predict(c(europe_clim_lgm[["bio_1"]], europe_clim_lgm[["bio_12"]]), fagus_lgm_glm)

```

### LGM Model Fit
```{r excersise2show1, echo = FALSE, message = FALSE, eval = TRUE}

summary(fagus_lgm_glm)

```

### Maps
```{r excersise2show3, echo = FALSE, message = FALSE, eval = TRUE, out.width = "50%"}

plot(fagus_modtolgm_glm_predict, main = "Predicted Fagus Pollen Proportion, Modern to LGM")
plot(fagus_lgm$geometry, add = TRUE)

plot(fagus_lgmtolgm_glm_predict, main = "Predicted Fagus Pollen Proportion, LGM to LGM")
plot(fagus_lgm$geometry, add = TRUE)


```

##

### Your Task
Using the example above create your own model of pollen proportions of one of our other taxa (*Corylus* or *Picea*) in one of our past time periods (Younger Dryas and Last Glacial Maximum). You can also experiment with using different combinations of environmental variables (we have all 19 bioclimatic variables) or even use a different model type! 

When you are finished you can share your results by pasting a screenshot of a figure to our [presentation](https://docs.google.com/presentation/d/1Fwp5yMAvIdgYpiC04xhgV7OQZ-olZIcUiGnLfyLPUt4/edit?usp=sharing) on the Exercise 2 slides. 


## Exercise 3 - Choose your own adventure!

Now that we have walked through getting your data from the Neotoma Paleoecology Database, correlating pollen samples with environmental variables, and explored a few ways that you can use the data you can now experiment with out types of questions we can answer with this data:

* Do SDMs created for past climates accurately predict where species are today?
* Do pollen percent or presence of taxa occupy different portions of climate space through time? 
* Many more!

Now it is your turn to come up with a question that can be answered by our data and create code to answer your question. When you are finished you can share your results by pasting a screenshot of a figure to our [presentation](https://docs.google.com/presentation/d/1Fwp5yMAvIdgYpiC04xhgV7OQZ-olZIcUiGnLfyLPUt4/edit?usp=sharing) on the Exercise 3 slides.

## Summary

From this notebook we have learned how to:

1. Downloading pollen records from multiple sites
2. Filtering for specific taxa and taxonomic harmonization
3. Filtering and binning for specific time periods
4. Linking to environmental data
5. Performing simple SDMs for different time periods

This approach is very simple workflow for taking records from many sites and combining them into species distribution models to compare species distributions at different periods of time.

### Other Helpful R Packages

These packages may be helpful to you for future paleoecological analysis:

* [RFossilpoll](https://hope-uib-bio.github.io/FOSSILPOL-website/) - package to help streamline the data downloading and filtering of paleoecological data from Neotoma and other resources

* [R-Ratepol](https://hope-uib-bio.github.io/R-Ratepol-package/) - package to calculate rate of change from community time series data

* [rpaleoclim](http://www.paleoclim.org/) - package to facilitate direct download of high resolution paleo-climate data