n-a-gilbert/multispecies_data_integration

A multispecies hierarchical model to integrate count and distance sampling data

Data/code DOI: DOI

Please contact the first author for questions about the code or data: Neil A. Gilbert ([email protected])


Abstract

Integrated community models, an emerging framework in which multiple data sources for multiple species are analyzed simultaneously, offer opportunities to expand inferences beyond the single-species and single-data-source approaches common in ecology. We developed a novel integrated community model that combines distance sampling and single-visit count data; within the model, information is shared among data sources (via a joint likelihood) and among species (via a random effects structure) to estimate abundance patterns across a community. Parameters relating to abundance are shared between data sources, while the model specifies a separate observation process for each data source. Simulations demonstrated that the model provided unbiased estimates of abundance and detection parameters even when detection probabilities vary between the data types. Simulations also showed that the integrated community model tended to provide more accurate and more precise parameter estimates than alternative single-species and single-data-source models. We applied the model to datasets on 11 herbivore species from the Masai Mara National Reserve, Kenya, and found considerable interspecific variation in response to local wildlife management practices: five species showed higher abundances in a region with passive conservation enforcement (median across species: 4.5x higher), three species showed higher abundances in a region with active conservation enforcement (median: 3.9x higher), and the remaining three species showed no abundance differences between the two regions. Furthermore, the hierarchical structure of the model revealed that the community average abundance was slightly higher in the region with active conservation enforcement (posterior mean difference: 0.20 animals), but this difference was not statistically significant. Future applications of this modeling framework should consider the circumstances under which data integration is appropriate given assumptions about shared abundance patterns between data sources.
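
The abstract's central mechanism can be illustrated with a deliberately simplified Python sketch. In that mechanism, abundance parameters are shared between data sources through a joint likelihood, while each source keeps its own observation process. The sketch assumes, hypothetically, that each survey's count is Poisson with mean density × area offset × a known, source-specific detection probability. The actual model estimates detection (from the distance data, and through a latent detection function for the counts) and adds species random effects; none of that appears here. All function names are illustrative and are not taken from the repository.

```python
import math


def poisson_logpmf(k: int, mu: float) -> float:
    """Log of the Poisson probability mass function P(Y = k | mean mu)."""
    return k * math.log(mu) - mu - math.lgamma(k + 1)


def joint_loglik(density, p_ds, p_tc, y_ds, y_tc, area_ds, area_tc):
    """Joint log-likelihood for one species: the density is shared by both
    data sources, but each source has its own detection probability.

    Hypothetical simplification: each survey's count is
    Poisson(density * area offset * detection probability)."""
    ll = sum(poisson_logpmf(y, density * a * p_ds) for y, a in zip(y_ds, area_ds))
    ll += sum(poisson_logpmf(y, density * a * p_tc) for y, a in zip(y_tc, area_tc))
    return ll


def pooled_density_mle(p_ds, p_tc, y_ds, y_tc, area_ds, area_tc):
    """Closed-form maximizer of joint_loglik: total animals detected by both
    sources divided by the total detection-weighted surveyed area."""
    effort = p_ds * sum(area_ds) + p_tc * sum(area_tc)
    return (sum(y_ds) + sum(y_tc)) / effort


if __name__ == "__main__":
    # Toy values for illustration only; these are not data from the repository.
    y_ds, area_ds = [12, 8, 15], [2.0, 1.5, 2.5]
    y_tc, area_tc = [20, 25], [4.0, 5.0]
    d_hat = pooled_density_mle(0.6, 0.4, y_ds, y_tc, area_ds, area_tc)
    print(round(d_hat, 3))
```

In this toy setup, both sources inform the same density parameter. The estimate therefore pools detections from the two survey types, weighting each source by its detection-adjusted effort.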

Repository Directory

code: Contains code for preparing the case study data, running the case study model, and running the simulations.

data: Contains data for the case study.

  • Shapefiles: Various shapefiles.

    • DS: Shapefiles for distance sampling transects.
    • Transects: Shapefiles for transects where count data were collected.
    • reserve: Shapefile for reserve / management zone boundaries.
  • Herbivore Utilization Complete.csv: Unformatted distance sampling data.

  • count_data_v01.RData: Formatted count data. This .RData file contains one object:

    • transect_data. A dataframe with the following columns:

      | Variable name | Meaning |
      |---|---|
      | transect | Transect name |
      | sp_name | Common name of species |
      | date | Date of survey |
      | sp | Species id |
      | site | Site (transect) id |
      | rep | Visit id |
      | count | Count of the total number of individuals of a species observed on a survey |
      | area | Area offset for transect |
      | region | Binary variable indicating Mara (0) or Talek (1) region |
  • distance_sampling_data_v01.RData: Formatted distance sampling data. This .RData file contains four objects:

    • b. A scalar, the maximum distance to which animals are counted (1000 m).

    • mdpt. A vector, the distance (in m) from the transect line to the midpoint of each distance bin.

    • v. A scalar, the width (in m) of the distance bins.

    • final2. A dataframe with the following columns:

      | Variable name | Meaning |
      |---|---|
      | sp | Species id |
      | site | Site (transect) id |
      | rep | Visit id |
      | gs | Observed group size |
      | dclass | Distance class (1 through 40) of observed group |
      | ng | Observed number of groups for each species × site × rep combination |
      | area | Area offset for transect |
      | region | Binary variable indicating Mara (0) or Talek (1) region |
      | date | Date of survey |
      | sp_name | Common name of species |
  • tblPreyCensus_2012to2014.csv: Unformatted count data.
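
A short sketch can check how b, v, mdpt, and dclass relate. With b = 1000 m and 40 distance classes, equal-width bins would be 25 m wide, with midpoints at 12.5 m, 37.5 m, and so on up to 987.5 m. The variable names below mirror B_DS, V, and MIDPOINT from the Nimble data list. The detection part assumes a half-normal detection function evaluated at bin midpoints, with animals uniformly distributed over distance from the line. The README does not state which detection function the authors used, so `half_normal_bin_probs` is a hypothetical illustration.

```python
import math

# Values stated in the README: b = 1000 m and distance classes 1 through 40.
B_DS = 1000.0
NBINS = 40
# Assumption: equal-width bins, so the bin width v is b / NBINS (25 m).
V = B_DS / NBINS
# Midpoint of bin k (1-indexed) is (k - 0.5) * v.
MIDPOINT = [(k - 0.5) * V for k in range(1, NBINS + 1)]


def half_normal_bin_probs(sigma: float):
    """Hypothetical half-normal detection g(x) = exp(-x^2 / (2 sigma^2)),
    evaluated at bin midpoints, with animals assumed uniformly distributed
    over distance from the transect line.

    Returns (p, pi_cond): the overall probability of detecting an animal
    within b, and the probability of each distance class given detection."""
    g = [math.exp(-x * x / (2.0 * sigma * sigma)) for x in MIDPOINT]
    # Each bin holds V / B_DS of the animals; multiply by its detection prob.
    cell = [gk * V / B_DS for gk in g]
    p = sum(cell)
    return p, [c / p for c in cell]


if __name__ == "__main__":
    # sigma = 300 m is an arbitrary toy value, not an estimate from the study.
    p, pi_cond = half_normal_bin_probs(sigma=300.0)
    print(round(p, 3), [round(x, 3) for x in pi_cond[:3]])
```

In a binned model of this kind, the conditional class probabilities would be the cell probabilities for the observed dclass values. The overall p would thin abundance into the observed number of groups.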

figures: Contains figures and the code to create them.

results: Contains results files.

  • herbivore_case_study_results_v01.RData: Model output for the Mara herbivore case study. This .RData file contains four objects:

    • constants. A list of constants used in the Nimble model:

      | Variable name | Meaning |
      |---|---|
      | NSPECIES | Number of species |
      | NBINS | Number of distance bins (distance sampling data) |
      | NBINS_C | Number of distance bins for the latent detection function for count data |
      | NDISTANCES | Number of distance observations |
      | NSURVEYS | Number of distance sampling surveys |
      | NCOUNTS | Number of count surveys |
      | SP_GS | Species index for the distance data |
      | SP_NG | Species index for the abundance data (distance sampling) |
      | SP_TC | Species index for the count data |
      | REGION_NG | Region index for the abundance data |
      | REGION_TC | Region index for the count data |
      | REGION_GS | Region index for the distance data |
      | NREGION | Number of regions |
    • data. A list of data used in the Nimble model:

      | Variable name | Meaning |
      |---|---|
      | MIDPOINT | Distance to the midpoint of each distance bin |
      | DCLASS | Observed distance class |
      | B_DS | Maximum distance to which animals are counted for distance sampling |
      | B_TC | Maximum distance to which animals are counted for counts |
      | V | Width of distance bins |
      | yN_DS | Observed count of animals (distance sampling) |
      | yN_TC | Observed count of animals (counts) |
      | OFFSET_DS | Area offset for distance sampling transects |
      | OFFSET_TC | Area offset for count transects |
      | MASS | Body mass of each species |
    • out. A list of the MCMC chains with the posterior samples for model parameters.

    • model.code. Code for the Nimble model.

  • cc.RData: Simulation results for the community count-only model. This .RData file contains one dataframe named cc, with the following variables:

    | Variable name | Meaning |
    |---|---|
    | model | Model identifier, here "cc" (for count community) |
    | simrep | Simulation replicate |
    | param | Name of parameter |
    | sp | Species identifier |
    | nobs | Total number of individuals of that species counted across sites |
    | truth | True value of parameter |
    | mean | Posterior mean of parameter estimate |
    | sd | Posterior standard deviation of parameter estimate |
    | 2.5% | Lower bound of 95% credible interval for estimate |
    | 97.5% | Upper bound of 95% credible interval for estimate |
    | Rhat | Convergence diagnostic (R-hat) for parameter |
  • dc.RData: Simulation results for the community distance-sampling-only model. This .RData file contains one dataframe named dc, with the same variables as cc (see above).

  • ic.RData: Simulation results for the community integrated model. This .RData file contains one dataframe named ic, with the same variables as cc (see above).

  • cs.RData: Simulation results for the single-species count-only model. This .RData file contains one dataframe named cs, with the same variables as cc (see above).

  • ds.RData: Simulation results for the single-species distance-sampling-only model. This .RData file contains one dataframe named ds, with the same variables as cc (see above).

  • is.RData: Simulation results for the single-species integrated model. This .RData file contains one dataframe named is, with the same variables as cc (see above).
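
The README does not say how the simulation tables were summarized. As a hedged sketch, the function below computes common metrics from rows that use the column names listed for cc: bias, root-mean-square error, 95% credible-interval coverage, and mean interval width. It assumes the rows have already been exported from R (for example, to CSV) and grouped by model and parameter. The function name `summarize` and the Rhat cutoff of 1.1 are hypothetical choices, not the authors'.

```python
import math


def summarize(rows, max_rhat=1.1):
    """Summarize simulation rows for one model and one parameter.

    Each row is a dict with the README's column names 'truth', 'mean',
    '2.5%', '97.5%' and 'Rhat'. Rows with Rhat above max_rhat are dropped
    (the 1.1 cutoff is a common convention, assumed here)."""
    kept = [r for r in rows if r["Rhat"] <= max_rhat]
    if not kept:
        raise ValueError("no rows pass the Rhat filter")
    n = len(kept)
    errors = [r["mean"] - r["truth"] for r in kept]
    return {
        "n": n,
        # Average error of the posterior mean (bias).
        "bias": sum(errors) / n,
        # Root-mean-square error (accuracy).
        "rmse": math.sqrt(sum(e * e for e in errors) / n),
        # Share of 95% credible intervals that contain the true value.
        "coverage": sum(r["2.5%"] <= r["truth"] <= r["97.5%"] for r in kept) / n,
        # Mean credible-interval width (precision).
        "ci_width": sum(r["97.5%"] - r["2.5%"] for r in kept) / n,
    }


if __name__ == "__main__":
    # Toy rows for illustration only; these are not results from the repository.
    rows = [
        {"truth": 1.0, "mean": 1.2, "2.5%": 0.8, "97.5%": 1.6, "Rhat": 1.01},
        {"truth": 1.0, "mean": 0.9, "2.5%": 0.5, "97.5%": 1.3, "Rhat": 1.02},
        {"truth": 1.0, "mean": 1.5, "2.5%": 1.1, "97.5%": 1.9, "Rhat": 1.30},
    ]
    print(summarize(rows))
```

One way to examine the abstract's claims is to compare these summaries across cc, dc, ic, cs, ds, and is. Bias and RMSE reflect accuracy, and interval width reflects precision.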
