5. Multi-species occupancy models

library(camtrapR)
library(purrr)
library(DT)
library(knitr)
library(ggplot2)
library(sf)

## Linking to GEOS 3.11.2, GDAL 3.7.2, PROJ 9.3.0; sf_use_s2() is TRUE

Overview

Multi-species occupancy models are a powerful tool for combining information from multiple species to estimate both individual and community-level responses to environmental variables. These models are typically fit using Bayesian frameworks such as BUGS, JAGS, or Nimble. However, preparing the necessary input from camera trapping data, writing the model code, setting parameters to monitor, and providing initial values can be a daunting and time-consuming task, even for experienced users.

To simplify this process, a new extension of camtrapR automatically generates custom community occupancy models for analysis in JAGS or Nimble. This flexible workflow supports:

standard and Royle-Nichols occupancy models
continuous and categorical site covariates on detection and occupancy probability
continuous and categorical observation-level covariates on detection probability
fixed and random effects of all these covariates
independent effects of continuous site covariates between the different species
nested random effects
fixed, independent or random intercepts of detection and occupancy probability
data augmentation (fully open community or known maximum species richness)
species richness estimates
models in JAGS and Nimble
derived quantities, e.g. Bayesian p-values (overall and by species) and species richness estimates

The main function is communityModel(). It automates:

writing model code
setting parameters to monitor
setting initial values
bundling input data

The output is a commOccu object. The workflow provides methods for:

fitting models
plotting marginal effects (response curves)
plotting effect sizes
spatial predictions of species occupancy probabilities
spatial predictions of species richness

The output of fit() is an mcmc.list from the coda package and can be uses as such, so the summary(), plot(), etc. methods are available as well as the functions from bayesplot.

Unlike packages such as unmarked or ubms, the communityModel function does not currently support R formula syntax. Instead, covariates and their effect types are specified through dedicated function arguments. While this non-standard approach may change in the future to support formula syntax, it may be less confusing than the lme4 syntax for random effects.

Warning

A word of caution, please always check the model code and make sure you understand the code and agree with it. While the workflow is designed for convenience and does not require you to examine the model code, it is highly recommended that you do so. The model code contains many comments to facilitate understanding. This workflow still requires a proper understanding of Bayesian modelling frameworks and the concepts behind multi-species occupancy models.

Limitations

Please note that the workflow is still experimental and it is important to carefully review and critically evaluate the results. If you encounter any issues, do not hesitate to report them.

Currently, there are some limitations to the workflow:

priors can not be customized
models with categorical observation-level covariates cannot be compiled in Nimble and must be fit in JAGS instead
nested random effects and random effects of site covariates are still experimental, and the plot_* methods don’t work on these yet
likewise, the output of predict may not be correct for these
input checks are a bit limited and may not always lead to meaningful error messages. This will improve over time with user feedback.
Nimble does not yet seem to have an advantage in processing speed. That might require further customization
independent effects are only supported in continuous site covariates

Over time these will likely be worked on.

Parameter naming convention

The parameter names are assembled from standardized building blocks. The nomenclature is as follows:

Name	Refers to	Description
alpha	Submodel	detection submodel
beta	Submodel	occupancy submodel
0	Intercept	denotes the intercepts (alpha0, beta0)
fixed	Effect type	fixed effects (constant across species)
indep	Effect type	independent effects (separate and independent between species)
ranef	Effect type	random effects (of species and/or other categorical covariates)
cont	Covariate type	continuous covariates
categ	Covariate type	categorical covariates
mean	Hyperparameter	mean of random effect
sigma	Hyperparameter	standard deviation of random effect
tau	Hyperparameter	precision of random effect (used internally, not returned)

For example, a fixed intercept of occupancy (constant across species) is beta0, and a fixed intercept of detection probability is alpha0.

An occupancy probability intercept with a random effect of species is:

beta0.mean community mean of the occupancy probability intercept

beta0.sigma standard deviation of the community occupancy probability intercept.

beta0[1] occupancy probability intercept of species 1 (likewise for other species).

For effects of site covariates, the pattern is:

submodel.effectType.covariateType.CovariateName.hyperparameter

For example:

beta.ranef.cont.habitat.mean is the mean community effect of the continuous site covariate ‘habitat’ on occupancy probability.

beta.ranef.cont.habitat[1] is the effect of continuous site covariate ‘habitat’ on occupancy probability of species 1.

Site-occasion covariates are denoted by .obs after the submodel, e.g.:

alpha.obs.fixed.cont.effort is the fixed effect of the continuous observation-level covariate ‘effort’ on detection probability

Quick example

Below is a brief example using the sample data included in camtrapR. This data set contains only five species at three fictional camera trap stations and uses made-up covariates. As a result, the parameter estimates will not have any ecological significance. This example serves only as a technical demonstration of how community models fit into the camtrapR workflow.

Data preparation

First load the camera trap table and create a camera operation matrix:

 data("camtraps")

 camop_no_problem <- cameraOperation(CTtable      = camtraps,
                                     stationCol   = "Station",
                                     setupCol     = "Setup_date",
                                     retrievalCol = "Retrieval_date",
                                     hasProblems  = FALSE,
                                     dateFormat   = "dmy"
 )

First, we load the record table and create a list of detection histories. While all species are included in this example, you may choose to subset the record table beforehand. It is important to note the includeEffort = TRUE argument, which means that the output for each species will be a list containing both the detection history and effort matrix.

 data("recordTableSample")

 # list of detection histories
 DetHist_list <- lapply(unique(recordTableSample$Species), FUN = function(x) {
   detectionHistory(
     recordTable         = recordTableSample,
     camOp                = camop_no_problem,
     stationCol           = "Station",
     speciesCol           = "Species",
     recordDateTimeCol    = "DateTimeOriginal",
     species              = x,     # this gets modifies by lapply
     occasionLength       = 7,
     day1                 = "station",
     datesAsOccasionNames = FALSE,
     includeEffort        = TRUE,
     scaleEffort          = FALSE,
     timeZone             = "Asia/Kuala_Lumpur"
   )}
 )

 # assign species names to the list items
 names(DetHist_list) <- unique(recordTableSample$Species)

 # note, DetHist_list is a list containing a list for each species

Get the detection history of each species and put into a new list (thereby removing the effort matrix).

 ylist <- lapply(DetHist_list, FUN = function(x) x$detection_history)

Create some made-up covariates (only for demonstration):

 sitecovs <- camtraps[, c(1:3)]
 sitecovs$elevation <- c(300, 500, 600)
 sitecovs[, c(2:4)] <- scale(sitecovs[,-1])   # scale numeric covariates
 sitecovs$fact <- factor(c("A", "A", "B"))    # categorical covariate

Now we bundle the necessary data for communityModel in a list with 3 items:

ylist: the list of detection histories
siteCovs: data frame with site covariates
obsCovs: (named) list with observation level covariates. Requires at least the effort matrix.

 data_list <- list(ylist    = ylist,
                   siteCovs = sitecovs,
                   obsCovs  = list(effort = DetHist_list[[1]]$effort))  # is identical for all species

Now data_list is a list containing the detection histories, site covariates and occasion level covariates.

Please note that it is always necessary to provide an effort matrix. This ensures that the detection probability is set to 0 when there was no effort. This is achieved by creating a binary effort matrix based on the user-provided effort matrix. In the binary effort matrix, all cells with effort = NA are set to 0, while all cells with values are set to 1, regardless of the value of effort (as long as it is not NA). As a result, all occasions with effort = NA in the provided effort matrix will have a detection probability of 0.

This does not automatically include effort as a covariate on detection probability. To include effort as a covariate on detection probability, include “effort” in argument detCovsObservation (you can specify a fixed or random effect of effort on detection probability).

Fit model in JAGS

Now the interesting part. The function communityModel() creates a new class, “commOccu”. It is an object that contains all relevant information for running a community occupancy model. Here we will create a JAGS model (the default). For a model in Nimble, see further below.

# text file to save the model
 modelfile1 <- tempfile(fileext = ".txt")

 mod.jags <- communityModel(data_list,
                            occuCovs = list(fixed = "utm_y", ranef = "elevation"),
                            detCovsObservation = list(fixed = "effort"),
                            intercepts = list(det = "ranef", occu = "ranef"),
                            modelFile = modelfile1)

## Wrote model to C:\Users\NIEDBA~1\AppData\Local\Temp\Rtmpi03Mlp\file255c5441323f.txt

This call defined a model with a fixed effect of utm_y on occupancy probability of all species, and a species-specific (random) effect of elevation on occupancy probability. Furthermore, we specified effort as a covariate on detection probability (constant across species), and there are random effects of species on the intercepts of detection and occupancy probabilities.

Note that throughout this vignette we use a temporary file to save the model text. In real examples you’d probably want to use a persistent file.

One can also see the model code using (not shown here for brevity):

There is a very basic summary method for commOccu objects:

 summary(mod.jags)

## commOccu object for community occupancy model (in JAGS)
## 
## 5 species,  3 stations,  7 occasions
## 21 occasions with effort
## Number of detections (by species): 2 - 11 
## 
## Available site covariates:
##  Station, utm_y, utm_x, elevation, fact 
## 
## Used site covariates:
##  utm_y, elevation 
## 
## Available site-occasion covariates:
##  effort

The model is fit with fit().

 fit.jags <- fit(mod.jags,
                 n.iter = 5000,
                 n.burnin = 2500,
                 chains = 3)

The output of the fit() method is an mcmc.list from the coda package.

Summarize posterior estimates:

fit_summary <- summary(fit.jags)

Statistics of parameter estimates:

# Note, colors may not render correctly in dark themes in RStudio.
DT::datatable(round(fit_summary$statistics, 3))

Quantiles of parameter estimates:

DT::datatable(round(fit_summary$quantiles, 3))

Marginal effect plots (response curves) can be plotted with plot_effects(). Argument submodel defines whether the output is for the detection or occupancy part of the model:

 plot_effects(mod.jags,
              fit.jags,
              submodel = "state")

## $utm_y

## 
## $elevation

 plot_effects(mod.jags,
              fit.jags,
              submodel = "det")

## $effort

Likewise, we can plot effect sizes for easier comparison between species and for easily checking significance with plot_coef():

 plot_coef(mod.jags,
           fit.jags,
           submodel = "state",
           combine = T)

## 'combine' and 'ordered' can't both be TRUE. Setting 'ordered = FALSE'

 plot_coef(mod.jags,
           fit.jags,
           submodel = "det")

## $effort

There are further arguments for the significance levels, sorting species by parameter estimates, and for combining multiple plots.

Models in Nimble

Introduction

NIMBLE is a system for building and sharing analysis methods for statistical models, especially for hierarchical models and computationally-intensive methods. NIMBLE is built in R but compiles your models and algorithms using C++ for speed.

(https://r-nimble.org/)

It allows you to run the same models as BUGS or JAGS, but can be much faster in some situations.

Using Nimble for community occupancy models requires the packages “nimble” and “nimbleEcology”:

library(nimble)
library(nimbleEcology)

Compilation and Rtools

Using nimble only makes sense if one can compile the models (which runs them as fast C++ code). Compiling Nimble code requires Rtools.

You can obtain Rtools from: https://cran.r-project.org/bin/windows/Rtools/.

It then needs to be put in PATH, see here for how to do it: https://cran.r-project.org/bin/windows/Rtools/#putting-rtools-on-the-path.

Depending on the version and where you installed Rtools, you should see something like “C:/rtools40/usr/bin” or “C:/RBuildTools/4.2” somewhere in the output of the following line (for me it’s usually the first or second entry).

Sys.getenv("PATH")

Fit models in Nimble

We can fit the same model in Nimble using the exact same functions as for the JAGS model above. Only set nimble = TRUE.

 modelfile2 <- tempfile(fileext = ".txt")

 mod.nimble <- communityModel(data_list,
                              occuCovs = list(fixed = "utm_x", ranef = "utm_y"),
                              detCovsObservation = list(fixed = "effort"),
                              intercepts = list(det = "ranef", occu = "ranef"),
                              modelFile = modelfile2,
                              nimble = TRUE)      # set nimble = TRUE

## Wrote model to C:\Users\NIEDBA~1\AppData\Local\Temp\Rtmpi03Mlp\file255c7c552f72.txt

It is possible to fit uncompiled models, but it is usually extremely slow and should only be done for checking model structure and expected outputs, using very few iterations.

 fit.nimble.uncomp <- fit(mod.nimble,
                          n.iter = 10,
                          chains = 1)

# the notes and the error message above are harmless

To fit a compiled model, set compile = TRUE:

 fit.nimble.comp <- fit(mod.nimble,
                        n.iter = 20000,
                        n.burnin = 10000,
                        chains = 3,
                        compile = TRUE, 
)

The output as an mcmc.list, just like for the JAGS model above, and can be worked with just the same:

 # parameter summary statistics (not shown)
 summary(fit.nimble.comp)

Response curves (= marginal effect plots):

 plot_effects(object = mod.nimble,
              mcmc.list = fit.nimble.comp,
              submodel = "state")

## $utm_x

## 
## $utm_y

 plot_effects(object = mod.nimble,
              mcmc.list = fit.nimble.comp,
              submodel = "det")

## $effort

Effect size plots:

 plot_coef(mod.nimble,
           fit.nimble.comp,
           submodel = "state",
           combine = TRUE)

## 'combine' and 'ordered' can't both be TRUE. Setting 'ordered = FALSE'

 plot_coef(mod.nimble,
           fit.nimble.comp,
           submodel = "det")

## $effort

Traceplots (not shown)

 plot(fit.nimble.comp)

The estimates and marginal effects are not very pretty, and the model likely requires more iterations to achieve smooth estimates.

More complete example

The example above was very small and unrealistic. We can’t provide a large camera trapping data set in camtrapR, but we can simulate one using the simComm function from the AHMbook package. Using a simulated data set also means we can compare model estimates to the “true” parameters (which were used to create the simulated data).

Prepare data

First, install the AHMbook package if necessary:

install.packages("AHMbook")

library(AHMbook)

Simulate community with simComm(). Here we simulate a community with 20 species, 30 camera trap stations and 5 occasions.

set.seed(10)
com <- simComm(nspecies = 20, 
                nsites = 30,
                nreps = 5,
                mean.psi = 0.25,     # community mean of psi over all species in community (probability scale)  (0.25 = -1.1 on logit scale)
                sig.lpsi = 1,        # SD around intercept of logit(psi)
                mean.p = 0.25,       # community mean of detection probability over all species (0.25 = -1.1 on logit scale)
                sig.lp = 1,          # SD around intercept of logit(p)
                mu.beta.lpsi = 0.5,  # mean (community) effect of habitat on logit(psi)
                sig.beta.lpsi = 0.5, # SD around mean (community) effect of habitat on logit(psi)
                mu.beta.lp = -0.5,   # mean (community) effect of wind on logit(p)
                sig.beta.lp = 1)     # SD around mean (community) effect of wind on logit(p)

We need to wrangle the data a little to convert the data to input for communityModel(). This is only needed here because the data are generated by AHMbook::simComm().

# get continuous site covariate "habitat"
habitat <- com$habitat

# create (fake) categorical covariate (3 levels)
# setting "medium" as reference level
categ <- factor(ifelse(habitat < -0.5, "low", ifelse(habitat < 0.5, "medium", "high")), levels = c("medium", "low", "high"))

# get observation covariate
wind <- com$wind

# list of detection histories
ylist <- purrr::array_branch(com$y.all, 3)

# remove unobserved species  (we leave that to data augmentation)
ylist <- ylist[sapply(ylist, sum) >= 1]

# here, effort was constant at all stations and occasions. Effort is thus a matrix of all 1
effort <- matrix(1, nrow = nrow(ylist[[1]]), ncol =  ncol(ylist[[1]]))

# bundle data
input_AHM <- list(ylist = ylist, 
                  siteCovs = data.frame(habitat = habitat,
                                        categ   = categ),
                  obsCovs = list(effort = effort,
                                 wind = wind))

JAGS Model

Here’s a model in JAGS, with a species-specific effect of habitat on occupancy and wind on detection, and species-specific intercepts.

modelFile_jags <- tempfile(fileext = ".txt")

model1_jags <- communityModel(
  occuCovs = list(ranef = c("habitat")), 
  detCovsObservation = list(ranef = "wind"),
  intercepts = list(det = "ranef", occu = "ranef"),
  data_list = input_AHM,
  modelFile = modelFile_jags)

## Site covariates may not be scaled. Scaling is strongly recommended for numeric covariates.

## Wrote model to C:\Users\NIEDBA~1\AppData\Local\Temp\Rtmpi03Mlp\file255c64bf507.txt

out_ahm_jags <- fit(model1_jags, 
                    n.iter = 10000, 
                    n.burnin = 5000,
                    thin = 5,
                    chains = 3,
                    quiet = T
)

Summary statistics of model parameters:

DT::datatable(round(summary(out_ahm_jags)$statistics, 3))

Since we used simulated data we can compare model estimates with the truth. For example, the true effect of habitat on occupancy was mean = 0.5 (SD = 0.5). See beta.ranef.cont.habitat.mean and beta.ranef.cont.habitat.sigma in the table above for estimates.

The mean effect of wind on detection probability was -0.5 (SD = 1). See alpha.obs.ranef.cont.wind.mean and alpha.obs.ranef.cont.wind.sigma in the table above for estimates.

Bayesian p-values can be inspected to check for lack of fit (overall or by species). Lack of fit at significance level = 0.05 is indicated by Bayesian p-values below 0.025 or greater than 0.975. In the example above, the overall Bayesian p-value (Bpvalue) indicates no problems with lack of fit. Likewise, species-level Bayesian p-values (Bpvalue_species) indicate no lack of fit for any species. This is not surprising because the model uses simulated data.

Plot response curves (marginal effects)

plot_resp_jags_occu <- plot_effects(model1_jags, 
                                    out_ahm_jags)
plot_resp_jags_occu

## $habitat

plot_resp_jags_det <- plot_effects(model1_jags, 
                                   out_ahm_jags,
                                   submodel = "det")
plot_resp_jags_det

## $wind

Plot coefficient estimates:

plot_eff_jags_occu <- plot_coef(model1_jags, 
                                out_ahm_jags)
plot_eff_jags_occu

## $habitat

plot_eff_jags_det <- plot_coef(model1_jags, 
                               out_ahm_jags,
                               submodel = "det")
plot_eff_jags_det

## $wind

By default, species are sorted by effect size. Alternatively, you can have them sorted by species names. Set ordered = FALSE:

plot_eff_jags_occu2 <- plot_coef(model1_jags, 
                                out_ahm_jags,
                                ordered = FALSE)
plot_eff_jags_occu2

## $habitat

Nimble Model

Here’s the same model in Nimble:

modelFile_nimble <- tempfile(fileext = ".txt")

model1_nimble <- communityModel(
  occuCovs = list(ranef = c("habitat")),
  detCovsObservation = list(ranef = "wind"),
  intercepts = list(det = "ranef", occu = "ranef"),
  data_list = input_AHM,
  modelFile = modelFile_nimble,
  nimble = TRUE)

## Site covariates may not be scaled. Scaling is strongly recommended for numeric covariates.

## Wrote model to C:\Users\NIEDBA~1\AppData\Local\Temp\Rtmpi03Mlp\file255c2d792031.txt

We use some more iterations, since these Nimble models tend to require more iterations than equivalent JAGS models to achieve good estimates (numbers are still too low for publication quality though).

# compiled model run
out_ahm_nimble2 <- fit(model1_nimble, 
                       n.iter =20000,
                       n.burnin = 10000,
                       thin = 5,
                       chains = 3,
                       compile = TRUE,
                       quiet = T
)

## Defining model

##   [Note] Using 'y' (given within 'constants') as data.

## Building model

## Setting data and initial values

## Running calculate on model
##   [Note] Any error reports that follow may simply reflect missing values in model variables.

## Checking model sizes and dimensions

##   [Note] This model is not fully initialized. This is not an error.
##          To see which variables are not initialized, use model$initializeInfo().
##          For more information on model initialization, see help(modelInitialization).

## Compiling
##   [Note] This may take a minute.
##   [Note] Use 'showCompilerOutput = TRUE' to see C++ compilation details.
## Compiling
##   [Note] This may take a minute.
##   [Note] Use 'showCompilerOutput = TRUE' to see C++ compilation details.

## running chain 1...

## running chain 2...

## running chain 3...

DT::datatable(round(summary(out_ahm_nimble2)$statistics, 3))

Plot responses (marginal effects):

plot_resp_nimble_occu <- plot_effects(model1_nimble, 
                                      out_ahm_nimble2)
plot_resp_nimble_occu

## $habitat

plot_resp_nimble_det <- plot_effects(model1_nimble, 
                                     out_ahm_nimble2,
                                     submodel = "det")
plot_resp_nimble_det

## $wind

Plot coefficient estimates:

plot_eff_nimble_occu <- plot_coef(model1_nimble, 
                                  out_ahm_nimble2)
plot_eff_nimble_occu

## $habitat

plot_eff_nimble_det <- plot_coef(model1_nimble, 
                                 out_ahm_nimble2,
                                 submodel = "det")
plot_eff_nimble_det

## $wind

Missing values in detection histories

The examples so far did not have any missing data in the detection histories. Real-world data commonly have missing data though, e.g. due to camera malfunction or specific study designs. In single-species occupancy models (e.g. in unmarked or ubms) this is handled automatically, but requires a little more attention in these multi-species models (at least when fitting with Nimble). Let’s modify the above data set and introduce some NAs in the detection history and the effort matrix:

# introduce a few missing values in detection history of all species
index_na <-  sample(1:length(ylist[[1]]), 10) #
ylist_with_na <- lapply(ylist, FUN = function(x) {x[index_na] <- NA; x})


# set the cells with NA in detection history to NA in effort too
effort_with_na <- effort
effort_with_na[index_na] <- NA

# bundle model input
input_AHM_with_NA <- list(ylist = ylist_with_na, 
                          siteCovs = data.frame(habitat = habitat,
                                                categ   = categ),
                          obsCovs = list(effort = effort_with_na,
                                         wind = wind))

And fit models in JAGS and Nimble:

modelFile_jags_na <- tempfile(fileext = ".txt")

model1_jags_na <- communityModel(
  occuCovs = list(ranef = c("habitat")), 
  detCovsObservation = list(ranef = "wind"),
  intercepts = list(det = "ranef", occu = "ranef"),
  data_list = input_AHM_with_NA,
  modelFile = modelFile_jags_na)

## Site covariates may not be scaled. Scaling is strongly recommended for numeric covariates.

## Wrote model to C:\Users\NIEDBA~1\AppData\Local\Temp\Rtmpi03Mlp\file255c6a523ae0.txt

out_ahm_jags_na <- fit(model1_jags_na, 
                    n.iter = 5000, 
                    n.burnin = 2500,
                    thin = 5,
                    chains = 3,
                    quiet = T
)

Summary statistics of model parameters:

DT::datatable(round(summary(out_ahm_jags_na)$statistics, 3))

As can be seen, the JAGS model handles NAs well and parameter estimates are very similar to the model without missing data above.

Nimble models can technically also be fit with NAs present, but there will be numerous warnings and residuals as well as Bayesian p-values cannot be computed.

modelFile_nimble_na <- tempfile(fileext = ".txt")

model1_nimble_na <- communityModel(
  occuCovs = list(ranef = c("habitat")), 
  detCovsObservation = list(ranef = "wind"),
  intercepts = list(det = "ranef", occu = "ranef"),
  data_list = input_AHM_with_NA,
  modelFile = modelFile_nimble_na,
  nimble = T)

## Site covariates may not be scaled. Scaling is strongly recommended for numeric covariates.

## Wrote model to C:\Users\NIEDBA~1\AppData\Local\Temp\Rtmpi03Mlp\file255c395039c2.txt

When fitting the model there will be numerous warnings about values of data nodes y[,,] and determinisitc nodes res[,], R2[] and R3 being NA (not shown here).

out_ahm_nimble_na <- fit(model1_nimble_na, 
                    n.iter = 5000, 
                    n.burnin = 2500,
                    thin = 5,
                    chains = 3,
                    quiet = T
)

## Defining model

##   [Note] Using 'y' (given within 'constants') as data.

## Building model

## Setting data and initial values

## Running calculate on model
##   [Note] Any error reports that follow may simply reflect missing values in model variables.

## Checking model sizes and dimensions

##   [Note] This model is not fully initialized. This is not an error.
##          To see which variables are not initialized, use model$initializeInfo().
##          For more information on model initialization, see help(modelInitialization).

## Compiling
##   [Note] This may take a minute.
##   [Note] Use 'showCompilerOutput = TRUE' to see C++ compilation details.
## Compiling
##   [Note] This may take a minute.
##   [Note] Use 'showCompilerOutput = TRUE' to see C++ compilation details.

## running chain 1...

## running chain 2...

## running chain 3...

Parameter summaries cannot be computed and will return an error (not shown here).

DT::datatable(round(summary(out_ahm_nimble2)$statistics, 3))

To alleviate this, it is necessary (for Nimble models) to convert NAs in the detection history to 0, e.g. with:

# replace NAs with 0 in the ylist component of input_AHM (for all species at once)
input_AHM_with_NA$ylist <- lapply(input_AHM_with_NA$ylist, FUN = function(x) {
  x[is.na(x)] <- 0
  x
})

After converting NAs to 0, Nimble models can be run without warnings and summaries can be calculated.

To prevent the additional 0s from affecting the estimation of detection probabilities, information about occasions without effort is provided via the mandatory argument effortCov in the communityModel function. Use the $effort part of the output of detectionHistory() to provide the necessary information, and of course ensure that detectionHistory is run with includeEffort = T. All occasions with NA in the effort matrix will have their detection probabilities fixed at 0. This ensures that those occasions do not affect the model fit. Be aware that this does not include effort as a covariate on detection. Including effort as a covariate on detection would need to be done via detCovsObservation.

Further possibilies

This section demonstrates additional possibilities. The models here are (mostly) fit in JAGS, but can also be fit with Nimble.

Fixed effects

One can define fixed effects of covariates if the responses are thought to be constant across all species. This might make sense for the effect of sampling effort in camera trapping studies.

Priors

It is not yet possible to customize priors. The current defaults are:

dnorm(0, 0.05) : random effect means
dgamma(0.1, 0.1): random effect precision (tau)

Future versions may allow more flexibility. Please contact me if this is relevant for you. In the meantime feel free to adjust the model code in the txt file as needed.

Data augmentation

communityModel has an argument augmentation, which can implement fully open data augmentation or data augmentation up to a known maximum number of species.

We know there are 20 species in the simulated community example. We can inform the model that the community is known to consist of 20 species via: augmentation = c(maxknown = 20):

modelFile_jags_aug1 <- tempfile(fileext = ".txt")

model1_jags_aug1 <- communityModel(
  occuCovs           = list(ranef = c("habitat")), 
  detCovsObservation = list(ranef = "wind"),
  intercepts         = list(det = "ranef", occu = "ranef"),
  data_list          = input_AHM,
  augmentation       = c(maxknown = 20),
  modelFile          = modelFile_jags_aug1)

# short model run for demonstration
out_ahm_jags_aug1 <- fit(model1_jags_aug1, 
                         n.iter = 500, 
                         n.burnin = 250,
                         thin = 5,
                         chains = 3,
                         quiet = T
)

Here is another example with fully open data augmentation (metacommunity size is unknown). The model will estimate the probability for each species to be part of the metacommunity. Fully open data augmentation is specified with augmentation = c(full = 30) (for a maximum potential community size of 30 species in this example).

modelFile_jags_aug2 <- tempfile(fileext = ".txt")

model1_jags_aug2 <- communityModel(
  occuCovs = list(ranef = c("habitat")), 
  detCovsObservation = list(ranef = "wind"),
  intercepts = list(det = "ranef", occu = "ranef"),
  data_list = input_AHM,
  augmentation = c(full = 30),
  modelFile = modelFile_jags_aug2)

# short model run for demonstration
out_ahm_jags_aug2 <- fit(model1_jags_aug2, 
                         n.iter = 500, 
                         n.burnin = 250,
                         thin = 5,
                         chains = 3,
                         quiet = T
)

See parameters “omega” and “w[1]” to “w[30]” in the model output:

DT::datatable(round(summary(out_ahm_jags_aug2)$statistics, 3))

Higher numbers of species in data augmentation increase model run time.

Categorical covariates

Site covariates

Site covariates will be interpreted as categorical if they are factors in the input data frame. If a random effect of a categorical covariate is defined, there will be independent species-level random effects within each factor level of the categorical covariate. In other words, each factor level will have its own and independent random effect of species.

Here is an example with a categorical site covariate “categ” on occupancy:

modelFile_jags_categ1 <- tempfile(fileext = ".txt")

model_jags_categ1 <- communityModel(
  occuCovs = list(ranef = c("categ")),
  detCovsObservation = list(ranef = "wind"),
  intercepts = list(det = "ranef", occu = "ranef"),
  data_list = input_AHM,
  modelFile = modelFile_jags_categ1)

# short model run for demonstration
out_ahm_jags_categ1 <- fit(model_jags_categ1, 
                         n.iter = 500, 
                         n.burnin = 250,
                         thin = 5,
                         chains = 3,
                         quiet = T
)

Plot the results:

plot_effects(object = model_jags_categ1,
             mcmc.list = out_ahm_jags_categ1)

## $categ

plot_coef(object = model_jags_categ1,
          mcmc.list = out_ahm_jags_categ1)

## $categ

In the table below, see the arguments beta.ranef.categ.categ. The species effects have two indices: [species, factorLevel]. Accordingly, there are three estimates for mean and SD, respectively. All estimates for the first factor level are 0, since the intercept refers to the first factor level (i.e. reference level, which we defined as “medium” above).

DT::datatable(round(summary(out_ahm_jags_categ1)$statistics, 3))

Observation-level covariates

Currently categorical observation-level covariates are only fully supported in JAGS, not Nimble. Observation-level covariates are matrices and thus can’t be factors. Thus they need to be defined as character matrices.

# Create example categorical observation covariate
input_AHM$obsCovs$wind_categ <- ifelse(input_AHM$obsCovs$wind > 0, "windy", "calm")

# result is a character matrix:
str(input_AHM$obsCovs$wind_categ)

##  chr [1:30, 1:5] "calm" "calm" "windy" "windy" "calm" "calm" "windy" "calm" ...

Model example:

modelFile_jags_categ2 <- tempfile(fileext = ".txt")

model_jags_categ2 <- communityModel(
  occuCovs = list(fixed = c("habitat")), 
  detCovsObservation = list(ranef = c("wind_categ")),     # "wind_categ" now
  intercepts = list(det = "ranef", occu = "ranef"),
  data_list = input_AHM,
  modelFile = modelFile_jags_categ2)

out_ahm_jags_categ2 <- fit(model_jags_categ2, 
                         n.iter = 500, 
                         n.burnin = 250,
                         thin = 5,
                         chains = 3,
                         quiet = T,
                         compile = FALSE
)

Model estimates. As above, the estimates for the first factor level are 0. The relevant parameters here are called alpha.obs.ranef.categ.wind_categ.

DT::datatable(round(summary(out_ahm_jags_categ2)$statistics, 3))

Plot the results:

plot_effects(object = model_jags_categ2, 
             mcmc.list = out_ahm_jags_categ2,
             submodel = "det")

## $wind_categ

plot_coef(object = model_jags_categ2, 
          mcmc.list = out_ahm_jags_categ2,
          submodel = "det")

## $wind_categ

Random effects other than species

By default random effects are by species. Alternatively, you can specify random effects of a categorical site covariate2 on continuous covariate1 using the pattern:

occuCovs(ranef = c(“covariate1|covariate2”))

The same syntax can be used in

detCovs(ranef = c(“covariate1|covariate2”))

and

detCovs(ranef = c(“covariate1|covariate2”))

Here is an example, where the effect of “habitat” differs between levels of “categ”.

modelFile_jags_ranef <- tempfile(fileext = ".txt")

model1_jags_ranef <- communityModel(
  occuCovs = list(ranef = c("habitat|categ")), 
  detCovsObservation = list(ranef = "wind"),
  intercepts = list(det = "ranef", occu = "ranef"),
  data_list = input_AHM,
  modelFile = modelFile_jags_ranef)

# short model run for demonstration
out_ahm_jags_ranef <- fit(model1_jags_ranef, 
                         n.iter = 500, 
                         n.burnin = 250,
                         thin = 5,
                         chains = 3,
                         quiet = T
)

Search for the term “habitat_categ” in the table below. There are 2 estimates (for stations with “A” and “B”), plus the mean and sigma of the random effect (this is only for demonstration, 2 factor levels is of course not enough for applying random effects).

DT::datatable(round(summary(out_ahm_jags_ranef)$statistics, 3))

Currently random effects other than species can not be plotted with plot_effects and plot_coef.

Nested random effects

Consider a numeric covariate1 and categorical covariate2.

Use the keyword “+Species” to specify a species-specific random effect within each factor level of covariate2:

occuCovs(ranef = c(“covariate1|covariate2+Species”))

Here is an example, with independent random effect of “categ” on “habitat” for each species:

modelFile_jags_nested <- tempfile(fileext = ".txt")

model1_jags_nested <- communityModel(
  occuCovs = list(ranef = c("habitat|categ+Species")), 
  detCovsObservation = list(ranef = "wind"),
  intercepts = list(det = "ranef", occu = "ranef"),
  data_list = input_AHM,
  modelFile = modelFile_jags_nested)

# short model run for demonstration
out_ahm_jags_nested <- fit(model1_jags_nested, 
                         n.iter = 500, 
                         n.burnin = 250,
                         thin = 5,
                         chains = 3,
                         quiet = T
)

Search for the term “habitat_categ_Species” in the table below. Depending on the number of species and factor levels The number of parameter estimates is very high when using nested random effects.

DT::datatable(round(summary(out_ahm_jags_nested)$statistics, 3))

Currently nested random effects can not be plotted with plot_effects and plot_coef.

Species and site random effect on detection

Species and site random effect on detection probability can be helpful if Bayesian p-values indicate lack of fit (overall and / or for individual species). Such an effect can be added via argument speciesSiteRandomEffect. Setting it to TRUE adds a random effect “ran” to detection probability in the form:

logit.p[i,j,k] <- alpha0[i] + … + ran[i, j]

where i,j are species and station and occasion indices. ran is defined as:

ran[i,j] ~ dnorm(0, tau.ran)

with tau.ran being the precision of the normal distribution (precision = 1 / standard deviation).

Site is defined as the sampling location (e.g. camera trap station in this context).

Below is an example model:

modelFile_jags_species_station_ran <- tempfile(fileext = ".txt")

model1_jags_species_station_ran <- communityModel(
  occuCovs = list(ranef = c("habitat")), 
  detCovsObservation = list(ranef = "wind"),
  intercepts = list(det = "ranef", occu = "ranef"),
  speciesSiteRandomEffect = list(det = TRUE),
  data_list = input_AHM,
  modelFile = modelFile_jags_species_station_ran)

# short model run for demonstration
out_ahm_jags_species_station_ran <- fit(model1_jags_species_station_ran, 
                                        n.iter = 500, 
                                        n.burnin = 250,
                                        thin = 5,
                                        chains = 3,
                                        quiet = T
)

The model returns the standard deviation of this random effect as alpha.speciessite.ranef.sigma, which can be found in the table below.

DT::datatable(round(summary(out_ahm_jags_species_station_ran)$statistics, 3))

Quadratic covariate effects

The easiest way to include quadratic effects is by squaring the covariate and including it as a new covariate. The plot_* and predict methods will be able to identify the quadratic term via a required keyword (the suffix “_squared” by default).

Covariates should be scaled to prior to squaring (“habitat” in this examples is already scaled).

Create quadratic covariate:

input_AHM$siteCovs$habitat_squared <- input_AHM$siteCovs$habitat ^2

Create and fit model:

modelFile_jags_quadratic <- tempfile(fileext = ".txt")


model1_jags_quadratic <- communityModel(
  occuCovs = list(ranef = c("habitat", "habitat_squared")), 
  detCovsObservation = list(ranef = "wind"),
  intercepts = list(det = "ranef", occu = "ranef"),
  data_list = input_AHM,
  modelFile = modelFile_jags_quadratic,
  keyword_quadratic = "_squared")   # this is the default, added for clarity

# short model run for demonstration
out_ahm_jags_quadratic<- fit(model1_jags_quadratic, 
                         n.iter = 10000, 
                         n.burnin = 5000,
                         thin = 5,
                         chains = 3,
                         quiet = T
)

Thanks to to keyword “_squared” in the covariate name, plot_effects knows that “habitat” and “habitat_squared” are the same covariate and will combine the effects of “habitat” and “habitat_squared” in response curves. If a different keyword was used, set the parameter “keyword_quadratic” accordingly (it defaults to “_squared”).

plot_effects(model1_jags_quadratic, 
             out_ahm_jags_quadratic)

## $habitat

The simulated data have no quadratic relationship with covariates, hence the quadratic effects are rather weak in this example.

WAIC

Nimble models can return WAIC. Set argument fit(..., WAIC = TRUE).

out_ahm_nimble3 <- fit(model1_nimble, 
                       n.iter = 500,
                       n.burnin = 250,
                       thin = 5,
                       chains = 3,
                       compile = TRUE,
                       quiet = F, 
                       WAIC = TRUE
)

The message about “individual pWAIC values that are greater than 0.4” in the example is likely due to the short model run.

If WAIC = TRUE, the output will be a list with 2 elements. The first (“samples”) is the familiar mcmc.list, the second (“WAIC”) is a waicList.

mcmc.list is accessible with $samples:

# not printed here
summary(out_ahm_nimble3$samples)$statistics

WAIC:

out_ahm_nimble3$WAIC

## nimbleList object of type waicNimbleList
## Field "WAIC":
## [1] 1071.5
## Field "lppd":
## [1] -490.7924
## Field "pWAIC":
## [1] 44.95773

Species richness

Global

Global species richness estimates is returned in parameter “Nspecies” (see below).

By station

Species richness by station is returned in parameter “Nspecies_station” (see below).

Richness estimates for categorical covariates

This is only possible in JAGS models. Argument “richnessCategories” can be used to calculate separate species richness estimates for different levels of categorical site covariates. If it is not defined, there will only be global and by-station species richness estimates.

modelFile_jags_richness <- tempfile(fileext = ".txt")

model1_jags_richness <- communityModel(
  occuCovs = list(ranef = c("habitat")), 
  detCovsObservation = list(ranef = "wind"),
  intercepts = list(det = "ranef", occu = "ranef"),
  data_list = input_AHM,
  richnessCategories = "categ",
  modelFile = modelFile_jags_richness)

# short model run for demonstration
out_ahm_jags_rich <- fit(model1_jags_richness, 
                         n.iter = 500, 
                         n.burnin = 250,
                         thin = 5,
                         chains = 3,
                         quiet = T
)

Now search for “Nspecies” in the table below. You will see entries for three different parameters.

“Nspecies” is overall species richness,
“Nspecies_by_category[1]”, “…[2]” and “…[3]” are for the stations in category “medium” and “low”, “high” respectively.
“Nspecies_by_station[1]” to “…[30]” are species richness estimates for each sampling station.

DT::datatable(round(summary(out_ahm_jags_rich)$statistics, 3))

Spatial predictions

You can create spatial predictions based on the occupancy models created with communityModel, their model fits and covariate rasters.

This requires the terra package.

  library(terra)

For demonstration we will use a model that has both a continuous and a categorical covariate.

modelFile_jags_categ3 <- tempfile(fileext = ".txt")

model_jags_categ3 <- communityModel(
  occuCovs = list(ranef = c("habitat", 
                            "categ")),
  detCovsObservation = list(ranef = "wind"),
  intercepts = list(det = "ranef", occu = "ranef"),
  data_list = input_AHM,
  modelFile = modelFile_jags_categ3)

# short model run for demonstration
out_ahm_jags_categ3 <- fit(model_jags_categ3, 
                         n.iter = 500, 
                         n.burnin = 250,
                         thin = 5,
                         chains = 3,
                         quiet = T
)

Create a habitat raster. We’ll use the volcano raster for demonstration. The coordinates are not geographic though since this is only for demonstration.

data("volcano")
r_volcano <- scale(rast(volcano))

Create a categorical raster. Since it is not possible to have categorical raster (with data type “factor”), the raster must consist of integer numbers that refer to the factor levels (as defined in the model input).

levels(input_AHM$siteCovs$categ)    # show the order of factor levels, first one is reference

## [1] "medium" "low"    "high"

raster_categ <- rast(x = matrix(rep(c(1,2, 3), each = ncell(r_volcano) / 3), 
                                  ncol = ncol(r_volcano),
                                  nrow = nrow(r_volcano)))

# add raster to list
stack_prediction <- rast(list(habitat = r_volcano,
                                 categ = raster_categ))

plot(stack_prediction, asp = 1)

# raster categ, from west to east: 1 = medium, 2 = low, 3 = high

Species occupancy maps:

# species occupancy estimates
  predictions_psi <- predict(object    = model_jags_categ3, 
                             mcmc.list = out_ahm_jags_categ3,
                             x         = stack_prediction,
                             type      = "psi",
                             draws     = 1000)

## draws (1000) is greater than the number of available samples (150).

  plot(predictions_psi$mean, zlim = c(0,1), 
       col = hcl.colors(100), 
       maxnl = 9,   # plotting only the first 9 species for space reasons
       asp = 1)

Species richness maps:

# species richness estimates
  predictions_rich <- predict(object   = model_jags_categ3, 
                             mcmc.list = out_ahm_jags_categ3,
                             x         = stack_prediction,
                             type      = "richness")
  
  plot(predictions_rich, col = hcl.colors(100), asp = 1)

  # mean = mean species richness estimate
  # sd = standard deviation of richness estimate

In these examples we did not return confidence intervals. Doing so would be easy by setting intervals = "confidence".

Computation can take long on larger rasters and can require lots of RAM. Check RAM usage during computations. Parameter “batch” can help reduce memory load.

Percentage of area occupied (by species)

The predict function can be used to calculate Percentage of Area Occupied (PAO) for all species, optionally for a given area of interest (AOI).

For demonstration, let’s create a AOI as a simple feature (from the sf package) Normally you’d just load a shapefile or other spatial vector data set.

Sr1 = st_polygon(list(cbind(c(10, 10, 30, 30, 10),
                    c(40, 70, 80, 40, 40))))
aoi <- data.frame(name = "My AOI")
st_geometry(aoi) <- st_geometry(Sr1)

plot(r_volcano)
plot(vect(aoi), add = T)   # easier plotting via SpatVect object from terra

To use the vector layer as an AOI, we need to rasterize it. PAO will refer to all cells in the AOI raster that are not NA.

r_aoi <- rasterize(aoi, r_volcano)
plot(r_aoi)

Now the prediction of PAO:

  predictions_pao <- predict(object    = model_jags_categ3, 
                             mcmc.list = out_ahm_jags_categ3,
                             x         = stack_prediction,
                             type      = "pao", 
                             aoi       = r_aoi)

## No id variables; using all as measure variables
## No id variables; using all as measure variables

Output is a list with several components:

$pao_summary is a summary table with quantiles of predicted PAO for all species.

predictions_pao$pao_summary

##       Min. lower.ci.0.025 1st Qu. Median  Mean 3rd Qu. upper.ci.0.975  Max.
## sp1  0.027          0.053   0.132  0.243 0.308   0.450          0.797 0.919
## sp2  0.020          0.047   0.142  0.286 0.326   0.488          0.777 0.900
## sp3  0.013          0.026   0.115  0.271 0.312   0.495          0.780 0.826
## sp4  0.130          0.230   0.659  0.786 0.740   0.881          0.945 0.966
## sp5  0.031          0.049   0.129  0.227 0.288   0.431          0.669 0.741
## sp6  0.167          0.388   0.698  0.841 0.786   0.900          0.973 0.983
## sp7  0.063          0.084   0.278  0.421 0.443   0.604          0.820 0.910
## sp8  0.014          0.027   0.089  0.164 0.209   0.277          0.583 0.664
## sp9  0.043          0.099   0.289  0.404 0.448   0.643          0.826 0.867
## sp10 0.011          0.034   0.080  0.148 0.177   0.238          0.508 0.653
## sp11 0.083          0.157   0.590  0.731 0.687   0.823          0.911 0.973
## sp12 0.050          0.086   0.373  0.575 0.542   0.735          0.885 0.943
## sp13 0.017          0.026   0.069  0.141 0.199   0.294          0.627 0.701
## sp14 0.074          0.112   0.410  0.611 0.581   0.771          0.898 0.936
## sp15 0.047          0.067   0.201  0.389 0.400   0.587          0.796 0.856
## sp16 0.091          0.101   0.313  0.492 0.478   0.645          0.814 0.890
## sp18 0.006          0.030   0.170  0.375 0.378   0.567          0.839 0.914
## sp19 0.007          0.035   0.177  0.355 0.365   0.517          0.829 0.913
## sp20 0.030          0.061   0.232  0.454 0.446   0.657          0.834 0.917

$pao_matrix contains the PAO estimates for all species from all posterior draws (not shown here)

predictions_pao$pao_matrix

$pao_df is similar to $pao_matrix, but in a data frame in long format, ready for use in ggplot2.

  ggplot2::ggplot(predictions_pao$pao_df, aes(x = Species , y = PAO)) +
  geom_boxplot(outlier.shape = NA) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

$aoi is a raster layer (SpatRaster) showing the AOI (now with cells removed that are NA in the prediction rasters)

plot(predictions_pao$aoi)

bayesplot

We can also use the powerful bayesplot package for visualizing model estimates. One downside is that it won’t automatically insert the species names.

library(bayesplot)

Here is an example for density plots showing the species-level effects of habitat on occupancy:

mcmc_areas(out_ahm_nimble2, regex_pars = "beta.ranef.cont.habitat")

bayesplot has many other visualization options beyond this example. See the respective documentation, e.g. https://mc-stan.org/bayesplot/.

Model diagnostics

Output of fit() is a coda::mcmc.list, so plot() provides traceplots (not shown here for space reasons):

plot(out_ahm_nimble2)

The Gelman-Rubin convergence statistic (potential scale reduction factor) can be calculated with gelman.diag() from coda:

gd <- coda::gelman.diag(out_ahm_nimble2,  multivariate = FALSE)
gd

## Potential scale reduction factors:
## 
##                                 Point est. Upper C.I.
## alpha.obs.ranef.cont.wind[1]          1.00       1.00
## alpha.obs.ranef.cont.wind[2]          1.00       1.00
## alpha.obs.ranef.cont.wind[3]          1.00       1.00
## alpha.obs.ranef.cont.wind[4]          1.00       1.00
## alpha.obs.ranef.cont.wind[5]          1.00       1.00
## alpha.obs.ranef.cont.wind[6]          1.00       1.01
## alpha.obs.ranef.cont.wind[7]          1.00       1.00
## alpha.obs.ranef.cont.wind[8]          1.00       1.00
## alpha.obs.ranef.cont.wind[9]          1.00       1.00
## alpha.obs.ranef.cont.wind[10]         1.00       1.00
## alpha.obs.ranef.cont.wind[11]         1.00       1.00
## alpha.obs.ranef.cont.wind[12]         1.00       1.00
## alpha.obs.ranef.cont.wind[13]         1.00       1.00
## alpha.obs.ranef.cont.wind[14]         1.01       1.02
## alpha.obs.ranef.cont.wind[15]         1.00       1.00
## alpha.obs.ranef.cont.wind[16]         1.00       1.00
## alpha.obs.ranef.cont.wind[17]         1.00       1.00
## alpha.obs.ranef.cont.wind[18]         1.00       1.00
## alpha.obs.ranef.cont.wind[19]         1.00       1.01
## alpha.obs.ranef.cont.wind.mean        1.00       1.00
## alpha.obs.ranef.cont.wind.sigma       1.00       1.01
## alpha0[1]                             1.00       1.01
## alpha0[2]                             1.00       1.01
## alpha0[3]                             1.00       1.00
## alpha0[4]                             1.00       1.01
## alpha0[5]                             1.00       1.00
## alpha0[6]                             1.00       1.00
## alpha0[7]                             1.00       1.00
## alpha0[8]                             1.01       1.02
## alpha0[9]                             1.00       1.00
## alpha0[10]                            1.00       1.00
## alpha0[11]                            1.00       1.01
## alpha0[12]                            1.00       1.01
## alpha0[13]                            1.00       1.00
## alpha0[14]                            1.00       1.00
## alpha0[15]                            1.00       1.00
## alpha0[16]                            1.00       1.00
## alpha0[17]                            1.00       1.01
## alpha0[18]                            1.00       1.00
## alpha0[19]                            1.00       1.01
## alpha0.mean                           1.00       1.01
## alpha0.sigma                          1.00       1.01
## beta.ranef.cont.habitat[1]            1.00       1.01
## beta.ranef.cont.habitat[2]            1.00       1.01
## beta.ranef.cont.habitat[3]            1.00       1.00
## beta.ranef.cont.habitat[4]            1.00       1.00
## beta.ranef.cont.habitat[5]            1.01       1.02
## beta.ranef.cont.habitat[6]            1.00       1.01
## beta.ranef.cont.habitat[7]            1.00       1.00
## beta.ranef.cont.habitat[8]            1.00       1.00
## beta.ranef.cont.habitat[9]            1.00       1.01
## beta.ranef.cont.habitat[10]           1.00       1.00
## beta.ranef.cont.habitat[11]           1.00       1.00
## beta.ranef.cont.habitat[12]           1.00       1.01
## beta.ranef.cont.habitat[13]           1.00       1.00
## beta.ranef.cont.habitat[14]           1.00       1.00
## beta.ranef.cont.habitat[15]           1.00       1.01
## beta.ranef.cont.habitat[16]           1.00       1.00
## beta.ranef.cont.habitat[17]           1.00       1.00
## beta.ranef.cont.habitat[18]           1.00       1.00
## beta.ranef.cont.habitat[19]           1.00       1.00
## beta.ranef.cont.habitat.mean          1.00       1.00
## beta.ranef.cont.habitat.sigma         1.00       1.01
## beta0[1]                              1.00       1.00
## beta0[2]                              1.00       1.00
## beta0[3]                              1.01       1.02
## beta0[4]                              1.00       1.00
## beta0[5]                              1.00       1.00
## beta0[6]                              1.00       1.02
## beta0[7]                              1.00       1.00
## beta0[8]                              1.00       1.02
## beta0[9]                              1.00       1.00
## beta0[10]                             1.00       1.00
## beta0[11]                             1.00       1.01
## beta0[12]                             1.00       1.01
## beta0[13]                             1.00       1.01
## beta0[14]                             1.00       1.00
## beta0[15]                             1.00       1.00
## beta0[16]                             1.00       1.00
## beta0[17]                             1.00       1.01
## beta0[18]                             1.00       1.00
## beta0[19]                             1.00       1.01
## beta0.mean                            1.01       1.02
## beta0.sigma                           1.00       1.02

One can also use gelman.plot to create Gelman-Rubin-Brooks plots (evolution of Gelman-Rubin statistic over iterations):

# not shown here due to space reasons
coda::gelman.plot(out_ahm_nimble2,  multivariate = FALSE)

Royle-Nichols model

Instead of a standard occupancy model one can fit the occupancy model of Royle and Nichols (2003), which relates probability of detection of the species to the number of individuals available for detection at each site (instead of their occupancy status alone). Probability of occupancy is a derived parameter: the probability that at least one individual is available for detection at the site. It is achieve by setting the parameter “model” to “RN”:

modelFile_jags_RN <- tempfile(fileext = ".txt")

model1_jags_RN <- communityModel(
  model = "RN",
  occuCovs = list(ranef = c("habitat")),
  detCovsObservation = list(ranef = "wind"),
  intercepts = list(det = "ranef", occu = "ranef"),
  data_list = input_AHM,
  modelFile = modelFile_jags_RN)

## Site covariates may not be scaled. Scaling is strongly recommended for numeric covariates.

## Wrote model to C:\Users\NIEDBA~1\AppData\Local\Temp\Rtmpi03Mlp\file255c5ee657b.txt

out_ahm_jags_RN <- fit(model1_jags_RN, 
                    n.iter = 5000, 
                    n.burnin = 2500,
                    thin = 5,
                    chains = 3,
                    quiet = T
)

The models can be processed as shown before, e.g.:

Summary statistics of model parameters:

DT::datatable(round(summary(out_ahm_jags_RN)$statistics, 3))

Response curves (marginal effect plots).

# In the occupancy (state) submodel, one can show abundance (the latent quantity modelled) 
plot_effects(model1_jags_RN, 
             out_ahm_jags_RN,
             submodel = "state",
             response = "abundance")

## $habitat

# In the occupancy (state) submodel, one can also show occupancy (which is derived from abundance):
plot_effects(model1_jags_RN, 
             out_ahm_jags_RN,
             submodel = "state",
             response = "occupancy")

## $habitat

plot_effects(model1_jags_RN, 
             out_ahm_jags_RN,
             submodel = "det")

## $wind

Coefficient plots:

plot_coef(model1_jags_RN,
          out_ahm_jags_RN,
          ordered = FALSE)

## $habitat

plot_coef(model1_jags_RN,
          out_ahm_jags_RN,
          ordered = FALSE,
          submodel = "det")

## $wind

Predictions work as before, but a Royle-Nichols model allows for returning expected species abundance (lambda) via type = “abundance” and type = “lambda_array”. Occupancy predictions internally convert abundance to occupancy.

Occupancy predictions (type = “psi”)

predictions_psi_RN <- predict(object    = model1_jags_RN, 
                             mcmc.list = out_ahm_jags_RN,
                             x         = stack_prediction,
                             type      = "psi",
                             draws     = 200)
plot(predictions_psi_RN$mean, 
     col = hcl.colors(100),
       maxnl = 9)  # first 9 species only

Abundance predictions (type = “abundance”):

predictions_abundance_RN <- predict(object    = model1_jags_RN, 
                             mcmc.list = out_ahm_jags_RN,
                             x         = stack_prediction,
                             type      = "abundance",
                             draws     = 200)
plot(predictions_abundance_RN$mean, 
     col = hcl.colors(100),
     maxnl = 9)   # first 9 species only

Richness predictions (type = “richness”)

  predictions_richness_RN <- predict(object    = model1_jags_RN, 
                             mcmc.list = out_ahm_jags_RN,
                             x         = stack_prediction,
                             type      = "richness",
                             draws     = 200)
plot(predictions_richness_RN$mean, col = hcl.colors(100))

PAO predictions (type = “pao”):

  predictions_pao_RN <- predict(object    = model1_jags_RN, 
                             mcmc.list = out_ahm_jags_RN,
                             x         = stack_prediction,
                             type      = "pao",
                             draws     = 200)

## No id variables; using all as measure variables
## No id variables; using all as measure variables

DT::datatable(predictions_pao_RN$pao_summary)

Single-species occupancy models

Incidentally, the workflow above can be used for single-species models too. The code will be less streamlined and possibly slower than normal code for single-species models (due to the species loop which is unnecessary for single-species models), but it is possible.

To create a single-species occupancy model, simply pass a single detection history in the data_list$ylist of the input to communityModel.

Take care to only define fixed effects for covariates and the intercepts, not random effects of species. It is possible however to specify random effects of categorical site covariates on continuous site covariates as in the example below.

In this example we use species 1 from the community created with AHMbook::simComm.

input_AHM_sp1 <- list(ylist = ylist[1],    # ylist[1] is still a list, with 1 element
                      siteCovs = input_AHM$siteCovs,
                      obsCovs = input_AHM$obsCovs)

Create model:

modelFile_jags_sp1 <- tempfile(fileext = ".txt")

model_sp1_jags_ranef <- communityModel(
  occuCovs = list(ranef = c("habitat|categ")),    # response to habitat differs between categ via random effect
  detCovsObservation = list(fixed = "wind"),
  intercepts = list(det = "fixed", occu = "fixed"),
  data_list = input_AHM_sp1,
  modelFile = modelFile_jags_sp1)

Fit model:

fit_jags_sp1 <- fit(model_sp1_jags_ranef, n.iter = 1000, n.burnin = 500, thin = 5)

Print summary statistics:

DT::datatable(round(summary(fit_jags_sp1)$statistics, 3))

The plot_effects() and plot_coef() functions currently don’t work for this case, since single-species models are mostly a side-effect of the intended community occupancy workflow and not the main goal. There are other great options for Bayesian single-species occupancy models, e.g. in the packages ubms or wiqid, or unmarked for a frequentist implementation.

Juergen Niedballa (camtrapr@gmail.com)

2024-02-24

Overview

Warning

Limitations

Parameter naming convention

Quick example

Data preparation

Fit model in JAGS

Models in Nimble

Introduction

Compilation and Rtools

Fit models in Nimble

More complete example

Prepare data

JAGS Model

Nimble Model

Missing values in detection histories

Further possibilies

Fixed effects

Priors

Data augmentation

Categorical covariates

Site covariates

Observation-level covariates

Random effects other than species

Nested random effects

Species and site random effect on detection

Quadratic covariate effects

WAIC

Species richness

Global

By station

Richness estimates for categorical covariates

Spatial predictions

Percentage of area occupied (by species)

bayesplot

Model diagnostics

Royle-Nichols model

Single-species occupancy models