02-choosing_effect_size.Rmd

# Choosing an effect size

The effect size is the back bone of meta-analysis as all other aspects of this book will branch out from initial effect size calculations that we make. 

When we talk about effect sizes we make the distinction between **measurement effect sizes** that are calculated based on means and standard deviations for a pair of treatment and control sites (there will be many of these in a meta-analysis) and a **summary effect size** (there will be one of these in the typical meta-analysis).  The summary effect size is the statistical aggregation of many measurement effect sizes.

There are many helpful texts and reviews that lay out the benefits and costs to using the many effect sizes that can be found in meta-analytic research.  One of the most popular packages for ecological meta-analysis, `metafor` allows for calculation of at least 15 effect sizes just by changing a few letters of you R code.  Please see these resources for more information on effect size options [@koricheva_handbook_2013, @schwarzer_meta-analysis_2015]. 

Within the `metafor` package the `escalc` function handles calculating the measurement-level effect size calculations. Essentially the function takes in pairs data from both treatment (abbreviated by a "t") and control sites (abbreviated with a "c"). The paired data typically represents sample means ($\bar{x_t}$ and $\bar{x_c}$), standard deviations ($s_t$ and $s_c$), and sample sizes ($n_t$ and $n_c$). Using these paired data, `escalc` will output a single effect size and measurement of variance for each row (i.e., paired measurement in your database).  

Here's how the paired data appear in our example database. Take a look at the table below and scroll to the right. You'll find columns that contain all of the quantitative information that is necessary for calculating an effect size for each pair of measurements or rows in the database. The column `mean_contol` has the mean species richness at uninvaded sites, the next column, `sample_size_control` provides the number of control study sites, however the authors defined "study site" within their article. You will be able to locate $\bar{x}$, $s$, and $n$ for every invasive species included in the database.  When we are missing some necessary metric, this is most often standard deviation, we can use methods to impute missing values, however, these will be covered in a later chapter.  

```{r}
require(DT) #This package allows for scrollable tables in R Markdown
datatable(meta_analysis_data, fillContainer = TRUE)
```

Now, let's see how we can use these data and the `escalc` function in the `metafor` package to calculate measurement-level effect sizes.  

## Standardized Mean Difference (aka Hedges' g)  

The Standardized Mean Difference (SMD) was one of the first effect sizes used in published meta-analyses [@hedges_statistical_1985] and it remains widely used in both ecological meta-research [@Crystal-Ornelas2021-hs; @zavorka_negative_2018; @doherty_reptile_2020]. There are many ways to calculate the SMD for a dataset, the one we will use here, and the default for the `metafor` package is called Hedge's g. Hedges' g is a useful effect size in ecological meta-analysis because it statistically corrects for variance that may be introduced when sample sizes are small [@hedges_distribution_1981].  

Another benefit of using Hedges' g as an effect size is that by having a "standardized" effect we can synthesize data that were measured on a different *scales*. The term scales here depends on your research question and systems. For example, we would likely not want to directly pool the raw data on species richness from many different articles because the articles likely vary in their exact definition of how they quantified species richness (i.e., did authors measure richness of all native tree species or richness of native tree and herbaceous plants?). The Hedges' g effect size converts all of these richness measurements to a common unitless scale for each study, so that eventually pooling of the data together in a meta-analytic framework is more appropriate. The Hedges' g calculations in `metafor` also take into account sample size for treatment and control groups as well as measures of variation for both groups.  

Below, we include the code necessary to calculate Hedges' g  for the invasive tree dataset included in this book.  

```{r}
require(metafor) # Make sure the metafor package is loaded
SMD_effect_sizes <-
  escalc( # This is the function in metafor that allows us to calculate an effect size for each row in a database
    "SMD",
    # Specify the effect size we want to calculate. In this case SMD for Standardized mean difference
    m1i = mean_invaded,
    # mean richness at invaded sites
    n1i = sample_size_invaded,
    # invaded site sample size
    sd1i = SD_invaded,
    # invaded site SD
    m2i = mean_control,
    # mean richness at control sites
    n2i = sample_size_control,
    # control site sample size
    sd2i = SD_control,
    # control site SD
    data = meta_analysis_data # This is where the escalc function can find all the data for our meta-analysis
  )
```


Now, take a look at the last two columns in the new dataframe we've created. One of these `yi` represents the new effect size calculated for each row in the database.  The other column, `vi` is the variance for each effect size.  

```{r}
datatable(SMD_effect_sizes, fillContainer = TRUE)
```
  
See the `yi` value for the first row. This is a study [@ellis_surface-active_2000] that investigated how arthropod richness changes in different types of vegetation along rivers in New Mexico. They included sites with stands of introduced saltcedar (treatment) and sites without saltcedar (control) and counted up the number of species of arthropods at both sites.  The `yi` in this first row is the calculated value for Hedges' g for this study.  The value of -0.62 indicates that invasive sites had lower species richness than control sites.  We know this because the data assigned to "position #1" in our effect size calculations (see R code above) is all of data that has to do with invaded sites.  If we switch up the code and assign control sites and associated data to "position #1" in our effect size code above, we would expect a +0.62 effect size. The value for `vi` shows the variance calculated for this effect size estimate.  

The next chapters in this book all rely on the values for `yi` and `vi` that we just calculated to generate an overall, single, effect size to summarize findings from the many studies included in this database.  

```{r, echo = FALSE, fig.align="center", out.width="400px", fig.cap="Flowering tamarisk. Photo credit: Teun Spaans, CC BY-SA 3.0"}
knitr::include_graphics("https://upload.wikimedia.org/wikipedia/commons/4/44/Tamarix_gallica_bloemen.jpg")
```

## Response Ratio (aka Ratio of Means)  

The next effect size we'll discuss is called the Response Ratio (RR). It's another frequently used effect size in both ecological and medical meta-analysis [@koricheva_handbook_2013; ]. This effect size is most often used when the effects being compared both have positive signs or both have negative signs. For example, in our invasion impact dataset, but values from the invaded and control sites will have positive signs since richness values (the average number of species at a site or across several sites) can't be negative. 

Using our example data, an effect size is calculated for each paired richness measurements, and it is a ratio of the average richness at invaded sites in one article $\bar{x_i}$ over the average richness of sites without the focal invasive species $\bar{x_c}$. The mathematical representation of the response ratio is relatively straightforward:

<font size="5"> $RR = \frac{\bar{x_i}}{\bar{x_c}}$ </font>  

Because in statistics, fractions can present challenges for creating models based on a normal distribution, meta-analysts typically convert the RR fraction to the natural log of the RR prior to making any statistical calculations.

<font size="5"> $\ln(RR) = ln(\frac{\bar{x_i}}{\bar{x_c}})$ </font>  

As you will see in later calculations, a positive value for $\ln(RR)$ means that the numerator of the response ratio is larger than the denominator, and a negative value would indicate that the denominator is larger than the numerator. In the context of our example data, a positive $\ln(RR)$ suggest that invasive species increase richness where they are found, and negative $\ln(RR)$ suggests that invasive species decrease richness.  

One benefit of using the Response Ratio is that $\ln(RR)$ can quickly be backtransformed to the $RR$ to provide % increases or decreases in richness with invasive species.

You can find many examples of ecological meta-analyses that use the response ratio as an effect size from a wide variety of disciplines. For example there are meta-analyses using the Response Ratio that investigate how cover crops change biomass on farms [@thapa_biomass_2018], how bee density influences crop production [@rollin_impacts_2019], bat activity in wet vs. dry Australian landscapes [@blakey_importance_2018], and how changes in natural habitat affects reptile abundance [@doherty_reptile_2020].

```{r}
RR_effect_sizes <- escalc( # Function for calculating effect sizes.
    "ROM",
    # Specify the effect size we want to calculate. In this case ROM for the Ratio of Means or Response Ratio
    m1i = mean_invaded,
    # mean richness at invaded sites
    n1i = sample_size_invaded,
    # invaded site sample size
    sd1i = SD_invaded,
    # invaded site SD
    m2i = mean_control,
    # mean richness at control sites
    n2i = sample_size_control,
    # control site sample size
    sd2i = SD_control,
    # control site SD
    data = meta_analysis_data # This is where the escalc function can find all the data for our meta-analysis
  )
```

One limitation to using the $RR$ is that the response ratio cannot be calculated if the either mean effect of the control or treatment group is zero. However, some researchers suggest substituting the 0 value for a very small non-zero value (e.g., 0.1) so that the response ratio can still be used for the meta-analysis [@thapa_cover_2018]. However, another recent meta-analysis opted to use Hedges' g as an effect size precisely because some of their data had a mean effect size of 0 [@doherty_reptile_2020].  

Going forward, we illustrate meta-analytic models using the Response Ratio as an effect size for two main reason. First, we did not have richness values of zero in our dataset and so our data could seamlessly work with the response ratio. Second, we wanted to back-transform our $\ln(RR)$ values to provide land managers with an average percent increase or decrease in richness with the presence of invasive species.