Occupancy models. Gurutzeta Guillera-Arroita University of Kent, UK National Centre for Statistical Ecology

Advances in Species distribution modelling in ecological studies and conservation, Pavia and Gran Paradiso, Italy, 12-18 Sept. 2011

Outline
Session 1: Introduction to occupancy modelling
  S1.1 Introduction
  S1.2 Statistical background
  S1.3 Single-season occupancy models
Session 2: Occupancy modelling in practice
  S2.1 Practical: single-season
  S2.2 Study design
Session 3: Occupancy modelling developments
  S3.1 Multiple-season occupancy model
  S3.2 Practical: multi-season
  S3.3 Further models

SESSION 1 Introduction to occupancy modelling S1.1 - Introduction

The study of wildlife populations
Why study wildlife populations?
Science (ecology): how our world works
Conservation: threatened species
Management: harvesting sustainably
Hypothesis testing and discrimination: are my observations consistent with a hypothesis? Which of several competing hypotheses is more likely?
Prediction: e.g. species distribution

The study of wildlife populations
Different state variables: abundance, occupancy, species richness.
Which state variable to use will depend on things like:
hypotheses to test
management objectives
biology of the species under consideration
resources available

Occupancy as a state variable
Occupancy: probability of a site being occupied by a species.
It depends on scale (the definition of "site").
[Figure: a study area divided into 4 sites, ψ1 = 2/4 = 0.5, and into 16 sites, ψ2 = 2/16 = 0.125]
Of interest in various areas of ecology: species geographic range, habitat relationships, metapopulation dynamics, large-scale monitoring, species interactions.

Occupancy as a state variable
Often well suited for surveying large areas.
Abundance is typically more difficult to estimate:
there are few cases where direct counts are possible;
some statistical techniques are available (e.g. closed-population mark-recapture, distance sampling);
in general these are time-consuming, resource-intensive, expensive and reliant on expert skills, and they rely on strong statistical assumptions.

Occupancy as a state variable
Occupancy may be a valid state variable in itself, although it is sometimes used as a surrogate for abundance.
Warning example: occupancy as a surrogate for abundance?
[Figure: an "ideal" species vs an "annoying" species]

Occupancy as habitat preference
If we relate occupancy (the probability of a site being occupied by the species) to covariates that describe the habitat, we obtain a model of habitat preferences (habitat suitability model, species distribution model, etc.).
We can then construct maps of the probability of occupancy, or of the species distribution.

Sampling issues
Spatial variation: in general, you cannot sample the whole area of interest. Sampling should follow the project objectives and be designed so that inference can be made about the locations not surveyed.
Imperfect detection: animals, or even the whole species, can go undetected at sites where they are present.

Sampling: imperfect detection
Alaotran gentle lemur (Hapalemur alaotrensis): Critically Endangered; only lives in the Alaotra marsh, Madagascar

Sampling: imperfect detection
Sumatran tiger (Panthera tigris sumatrae)

Sampling: imperfect detection
Effects of imperfect detection: underestimation of occupancy
(estimated occupancy) = (real occupancy) × (detectability)
Example: 17 of the 36 survey sites are occupied, so the proportion of occupied sites is 17/36 = 0.47. The species is seen in only 6 of the 17 occupied sites, so the naive estimate is 6/36 = 0.17 (= 0.47 × 6/17).
[Figure: 36 survey sites, with the species detected at 6 of them]
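
A minimal simulation sketch of this effect, assuming a constant per-survey detection probability p and k surveys per site. The values psi = 0.47, p = 0.2 and k = 2 are illustrative choices, not taken from the example; the naive estimate should settle near ψ(1 − (1 − p)^k) = 0.47 × 0.36 ≈ 0.17.

```python
import random

def simulate_naive_occupancy(n_sites, psi, p, k, seed=1):
    """Simulate true occupancy and k surveys per site.

    Returns (proportion of sites occupied, naive proportion of sites where
    the species was detected at least once)."""
    rng = random.Random(seed)
    occupied = [rng.random() < psi for _ in range(n_sites)]
    # A site counts as 'detected' only if occupied and seen in at least one survey
    detected = [occ and any(rng.random() < p for _ in range(k)) for occ in occupied]
    return sum(occupied) / n_sites, sum(detected) / n_sites

true_prop, naive = simulate_naive_occupancy(n_sites=10_000, psi=0.47, p=0.2, k=2)
expected_naive = 0.47 * (1 - (1 - 0.2) ** 2)  # psi * Pr(detected at least once)
print(f"true {true_prop:.2f}, naive {naive:.2f}, expected naive {expected_naive:.3f}")
```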

Sampling: imperfect detection
Effects of imperfect detection: obscure trends in occupancy
[Figure: true occupancy, detectability and observed occupancy (probability, 0 to 1) over years 1-14]

Sampling: imperfect detection
Effects of imperfect detection: obscure relationships of occupancy with covariates
[Figure: apparent logit(occupancy) against a habitat covariate, comparing the true relationship with the apparent relationships when p < 1, when p increases with habitat (p ~ +habitat) and when p decreases with habitat (p ~ -habitat) (MacKenzie et al. 2006)]

Sampling: imperfect detection
Note that the main problem is variation in detectability (from site to site, from survey to survey).
If detection probability is constant across space and time, it may have less impact on our inference.
Sometimes we can mitigate this variation with careful survey design.
But in any case, we will improve our estimates and inference if we model imperfect detection explicitly.
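
The following sketch illustrates how variation in detectability distorts an apparent habitat relationship. It assumes, hypothetically, logit(ψ) = habitat, logit(p) = -1 + b1 × habitat and k = 3 surveys, and computes the logit of the probability of detecting the species at least once, which is what a naive analysis effectively models.

```python
import math

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def logit(q):
    return math.log(q / (1.0 - q))

def apparent_logit(habitat, b1, k=3):
    """logit Pr(species detected at least once), with hypothetical
    logit(psi) = habitat and logit(p) = -1 + b1 * habitat."""
    psi = expit(habitat)
    p = expit(-1.0 + b1 * habitat)
    return logit(psi * (1.0 - (1.0 - p) ** k))

# Apparent slope between habitat = -2 and habitat = 2 (true slope of logit(psi) is 1)
slopes = {
    label: (apparent_logit(2.0, b1) - apparent_logit(-2.0, b1)) / 4.0
    for label, b1 in [("p constant", 0.0), ("p ~ +habitat", 1.0), ("p ~ -habitat", -1.0)]
}
for label, slope in slopes.items():
    print(f"{label}: apparent slope {slope:.2f}")
```

With p decreasing along the habitat gradient, the apparent relationship almost vanishes; with p increasing, it is exaggerated.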

Sampling: imperfect detection
The issue of detectability is not new: it is taken into account in long-established frameworks like mark-recapture and distance sampling.
[Table: mark-recapture data, one row per marked individual (ID), one column per year from 1984 to 2008; 1 = detected, 0 = not detected]
Year: 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008
ID 1: 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ID 3: 1 1 1 1 1 1 0 1 1 1 1 1 1 0 1 1 0 0 0 0 0 0 0 0 0
ID 45: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0
ID 66: 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ID 67: 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ID 68: 0 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0
ID 69: 0 1 1 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ID 85: 0 0 1 1 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ID 86: 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1
ID 87: 0 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0
ID 88: 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ID 89: 0 0 0 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0
ID 90: 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0
ID 91: 0 0 0 0 1 1 0 1 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0
ID 101: 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0

Sampling: imperfect detection
Various methods exist for accounting for imperfect detection when estimating occupancy. MacKenzie et al. (2002) provide the most general approach for dealing with this problem.

SESSION 1 Introduction to occupancy modelling S1.2 - Statistical background

Basic concepts
[Figure: REALITY vs MODEL; the model is described by parameters d, h, L, w]

Basic concepts
"all models are wrong, but some are useful" (Box, 1976)

Basic concepts
Parameter: characteristic of the system under study
Estimator: a rule for calculating an estimate of the parameter from observed data
Estimate: the output of the estimator for our data set
[Figure: DATA (e.g. 1.23, 100.734, 1,0,0,1,1,0,..., 0.03, 28.96) → ESTIMATOR of θ → ESTIMATE $\hat\theta$ = 2.56 (2.51, 2.67)]

Basic concepts
Bias, precision and accuracy
[Figure: four targets illustrating precise & biased, imprecise & biased, imprecise & unbiased, and precise & unbiased (accurate!) estimators]

Model description (e.g. seed germination)
Suppose we plant 6 seeds and record whether each one germinates successfully (outcome: 1, 0, 0, 1, 1, 1). How probable is it to obtain this outcome?
Assuming independence, and given θ = probability of success (Bernoulli trials):
$\Pr(\text{data} \mid \theta) = \theta(1-\theta)(1-\theta)\theta\theta\theta = \theta^4(1-\theta)^2$
e.g. if θ = 0.7, $\Pr(\text{data} \mid \theta) = 0.7^4 \times 0.3^2 = 0.0216$

Methods of inference
Estimates can be obtained from the probabilistic description of the model using different methods of inference:
Maximum likelihood (classic or frequentist framework)
Bayesian (Marc's modules)
Note: it is the inference, NOT the model, that is frequentist or Bayesian.
Central to both methods is the likelihood function.

Likelihood function
$L(\theta \mid \text{data}) = \Pr(\text{data} \mid \theta)$ (for discrete data)
A function of the model parameters (i.e. θ is the variable): how likely different parameter values are, given the data.
A simple change in point of view, with no mathematical change!
[Figure: $L(\theta \mid \text{data}) = \theta^4(1-\theta)^2$ plotted for θ from 0 to 1]

Likelihood function
The actual likelihood value is not important: it is the relative values that matter!
[Figure: $L(\theta \mid \text{data}) = \theta^4(1-\theta)^2$ plotted for θ from 0 to 1]

Maximum likelihood estimation
Find the parameter value that maximizes L: the maximum likelihood estimate (MLE), here $\hat\theta = 4/6 \approx 0.67$.
The wider the peak, the greater the uncertainty: the curvature of the function reflects the estimator variance.
Variances can be derived by inverting the information matrix (the negative of the matrix of second partial derivatives of the log-likelihood function).
[Figure: $L(\theta \mid \text{data}) = \theta^4(1-\theta)^2$ with its maximum marked]
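
A short sketch of the seed example: evaluating $L(\theta \mid \text{data}) = \theta^4(1-\theta)^2$, locating its maximum on a grid, and approximating the SE from the curvature of the log-likelihood at the MLE (the observed information). The grid and the finite-difference step are illustrative choices.

```python
import math

def likelihood(theta, n_trials=6, n_success=4):
    """Bernoulli likelihood L(theta | data) = theta^ns * (1 - theta)^(nt - ns)."""
    return theta ** n_success * (1.0 - theta) ** (n_trials - n_success)

print(round(likelihood(0.7), 4))  # 0.0216, as on the previous slide

# Evaluate L on a fine grid and keep the value with the highest likelihood
grid = [i / 1000 for i in range(1, 1000)]
theta_hat = max(grid, key=likelihood)

def loglik(theta):
    return math.log(likelihood(theta))

# Second derivative of the log-likelihood at the MLE (central finite difference)
h = 1e-4
d2 = (loglik(theta_hat + h) - 2 * loglik(theta_hat) + loglik(theta_hat - h)) / h**2
se = math.sqrt(-1.0 / d2)  # inverse of the observed information
print(theta_hat, round(se, 3))
```

For this one-parameter model the result can be checked against the familiar $\sqrt{\hat\theta(1-\hat\theta)/n}$.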

Maximum likelihood estimation
$L(\theta \mid \text{data}) = \theta^{n_s}(1-\theta)^{n_t-n_s}$, where $n_t$ = number of trials and $n_s$ = number of successes
[Figure: likelihood curves for $n_t = 6, n_s = 4$ and for $n_t = 25, n_s = 18$; the larger sample gives a narrower peak]

Likelihood function
Usually we work with the log-likelihood:
$L(\theta \mid \text{data}) = \theta^4(1-\theta)^2$
$\log L(\theta \mid \text{data}) = 4\log\theta + 2\log(1-\theta)$
[Figure: log-likelihood plotted for θ from 0 to 1]

Maximum likelihood estimation
Sometimes an explicit expression for the maximum can be found by differentiating the function and solving, e.g. for $l = \log L = n_s\log\theta + (n_t - n_s)\log(1-\theta)$:
$\frac{dl}{d\theta} = \frac{n_s - \theta n_t}{\theta(1-\theta)}$, and setting $\frac{dl}{d\theta} = 0$ gives $\hat\theta = \frac{n_s}{n_t}$
But usually we need to find the maximum numerically. In theory we could evaluate the function at all points, but this is not normally feasible because the function may have many parameters.
Instead, we use optimization algorithms (different methods exist).
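
For the larger seed sample ($n_t = 25$, $n_s = 18$), a numerical optimizer should recover the closed-form MLE $n_s/n_t = 0.72$. This sketch uses golden-section search, one simple optimization method among many; it assumes the log-likelihood is unimodal on (0, 1), which holds here.

```python
import math

def loglik(theta, n_trials, n_success):
    return n_success * math.log(theta) + (n_trials - n_success) * math.log(1.0 - theta)

def golden_section_max(f, lo, hi, tol=1e-8):
    """Maximize a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if f(c) > f(d):        # maximum lies in [a, d]
            b, d = d, c
            c = b - inv_phi * (b - a)
        else:                  # maximum lies in [c, b]
            a, c = c, d
            d = a + inv_phi * (b - a)
    return (a + b) / 2.0

n_t, n_s = 25, 18
theta_num = golden_section_max(lambda t: loglik(t, n_t, n_s), 1e-9, 1 - 1e-9)
print(round(theta_num, 6), n_s / n_t)  # numerical vs closed-form MLE
```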

Adding covariates
Often we are interested in allowing parameters to be a function of covariates (e.g. water, oxygen, temperature, light).
When dealing with parameters (like probabilities) that are restricted to a range (e.g. 0-1), we require a link function to transform linear relationships.

The logit link function
The logit link function constrains values to 0-1. It is not the only option, but a widely used one (logistic regression):
$\text{logit}(\theta_i) = \ln\frac{\theta_i}{1-\theta_i} = \beta_0 + \beta_1 c_{i1} + \beta_2 c_{i2} + \dots$
[Figure: θ plotted against logit(θ) from -10 to 10]

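The logit link function
The relationship of the variable with the covariates is:
$\text{logit}(\theta_i) = \ln\frac{\theta_i}{1-\theta_i} = \beta_0 + \beta_1 c_{i1} + \beta_2 c_{i2} + \dots$
$\theta_i = \frac{e^{\beta_0 + \beta_1 c_{i1} + \beta_2 c_{i2} + \dots}}{1 + e^{\beta_0 + \beta_1 c_{i1} + \beta_2 c_{i2} + \dots}} = \frac{1}{1 + e^{-(\beta_0 + \beta_1 c_{i1} + \beta_2 c_{i2} + \dots)}}$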
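
A minimal sketch of the logit link and its inverse; the coefficients and covariate values are made up for illustration.

```python
import math

def logit(theta):
    """Link function: maps a probability in (0, 1) to the real line."""
    return math.log(theta / (1.0 - theta))

def expit(eta):
    """Inverse link: theta = 1 / (1 + exp(-eta)), maps any real eta into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

# Linear predictor for one observation with two covariates (illustrative values)
beta = (0.5, 1.2, -0.8)   # beta0, beta1, beta2
c_i = (1.0, 0.3)          # c_i1, c_i2
eta = beta[0] + beta[1] * c_i[0] + beta[2] * c_i[1]
theta = expit(eta)
print(round(theta, 3), round(logit(theta), 6))  # logit(theta) recovers eta
```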

The logit link function
$\text{logit}(\theta_i) = \ln\frac{\theta_i}{1-\theta_i} = \beta_0 + \beta_1 x_i$
The intercept $\beta_0$ controls the position of the transition.
[Figure: response θ against covariate x for $\beta_0$ = 3, 0 and -3, with $\beta_1$ = 1]

The logit link function
$\text{logit}(\theta_i) = \ln\frac{\theta_i}{1-\theta_i} = \beta_0 + \beta_1 x_i$
The slope coefficient $\beta_1$ controls the slope ("rapidness") of the transition.
[Figure: response θ against covariate x for $\beta_0$ = 0, with $\beta_1$ = 0.4, 1 and 1.5]

The logit link function
$\text{logit}(\theta_i) = \ln\frac{\theta_i}{1-\theta_i} = \beta_0 + \beta_1 x_i$
The sign of $\beta_1$ tells us whether θ goes from high to low or vice versa (i.e. whether the covariate has a positive or negative effect on θ).
[Figure: response θ against covariate x for $\beta_0$ = 1, with $\beta_1$ = -1 and $\beta_1$ = +1]

The logit link function
$\text{logit}(\theta_i) = \ln\frac{\theta_i}{1-\theta_i} = \beta_0 + \beta_1 x_i$
Note that a large $\beta_1$ creates an abrupt transition, while $\beta_1 = 0$ means there is no relationship with the covariate (no transition).
[Figure: response θ against covariate x for $\beta_0$ = 0, with $\beta_1$ = 1, 20 and 0]

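The logit link function
Now the optimization is carried out on the β parameters.
Coming back to our seed example, let's assume one covariate, with value $x_i$ for pot i (i = 1, ..., 6):
$\Pr(\text{data} \mid \theta) = L(\theta \mid \text{data}) = \theta_1(1-\theta_2)(1-\theta_3)\theta_4\theta_5\theta_6$
$= \frac{1}{1+e^{-\beta_0-\beta_1 x_1}}\left(1-\frac{1}{1+e^{-\beta_0-\beta_1 x_2}}\right)\left(1-\frac{1}{1+e^{-\beta_0-\beta_1 x_3}}\right)\frac{1}{1+e^{-\beta_0-\beta_1 x_4}}\,\frac{1}{1+e^{-\beta_0-\beta_1 x_5}}\,\frac{1}{1+e^{-\beta_0-\beta_1 x_6}} = L(\beta_0, \beta_1 \mid \text{data})$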
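
A sketch of fitting $\beta_0$ and $\beta_1$ for the seed example by maximizing $L(\beta_0, \beta_1 \mid \text{data})$ with Newton-Raphson, a standard method for logistic regression. The germination outcomes follow the example (1, 0, 0, 1, 1, 1), but the covariate values $x_i$ are hypothetical. The inverse of the information matrix at the estimates gives their approximate variance-covariance matrix.

```python
import math

def expit(eta):
    return 1.0 / (1.0 + math.exp(-eta))

def log_lik(b0, b1, x, y):
    """log L(beta0, beta1 | data): independent Bernoulli outcomes, logit link."""
    total = 0.0
    for xi, yi in zip(x, y):
        theta = expit(b0 + b1 * xi)
        total += math.log(theta) if yi == 1 else math.log(1.0 - theta)
    return total

def score_and_info(b0, b1, x, y):
    """Gradient of the log-likelihood and the entries of the 2x2 information matrix."""
    g0 = g1 = i00 = i01 = i11 = 0.0
    for xi, yi in zip(x, y):
        theta = expit(b0 + b1 * xi)
        w = theta * (1.0 - theta)
        g0 += yi - theta
        g1 += (yi - theta) * xi
        i00 += w
        i01 += w * xi
        i11 += w * xi * xi
    return (g0, g1), (i00, i01, i11)

def fit_logistic(x, y, iters=25):
    """Newton-Raphson: beta <- beta + I^{-1} g. Returns estimates and covariance."""
    b0 = b1 = 0.0
    for _ in range(iters):
        (g0, g1), (i00, i01, i11) = score_and_info(b0, b1, x, y)
        det = i00 * i11 - i01 * i01
        b0 += (i11 * g0 - i01 * g1) / det
        b1 += (i00 * g1 - i01 * g0) / det
    _, (i00, i01, i11) = score_and_info(b0, b1, x, y)
    det = i00 * i11 - i01 * i01
    cov = [[i11 / det, -i01 / det], [-i01 / det, i00 / det]]  # inverse information
    return (b0, b1), cov

y = [1, 0, 0, 1, 1, 1]               # germination outcomes from the example
x = [1.0, 1.5, 0.5, 2.0, 0.8, 2.5]   # hypothetical covariate value for each pot
(b0_hat, b1_hat), cov_beta = fit_logistic(x, y)
print(round(b0_hat, 3), round(b1_hat, 3))
```

The covariate values were chosen so the outcomes are not perfectly separated by x; with perfect separation the MLE of $\beta_1$ would not be finite.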

Delta method
Can be used to derive SEs for transformed parameters, e.g. the SE of θ from the variance-covariance matrix of the βs (previous slide).
It is based on approximations and large-sample assumptions, and involves derivatives and matrix multiplications:
$Y = f(\beta)$, $\quad \text{var}(Y) \approx \frac{\partial Y}{\partial \beta}\,\Sigma_\beta\,\left(\frac{\partial Y}{\partial \beta}\right)^T$
For a nice explanation, see Appendix 2 in the MARK book.
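
A sketch of the delta method for $\theta = \text{expit}(\beta_0 + \beta_1 x_0)$: the gradient is $\theta(1-\theta)(1, x_0)$, so $\text{var}(\hat\theta) \approx g\,\Sigma_\beta\,g^T$. The β estimates and their variance-covariance matrix below are made-up values for illustration.

```python
import math

def expit(eta):
    return 1.0 / (1.0 + math.exp(-eta))

def delta_method_se(beta, cov, x0):
    """Approximate SE of theta = expit(beta0 + beta1 * x0) by the delta method."""
    theta = expit(beta[0] + beta[1] * x0)
    # Gradient of theta with respect to (beta0, beta1)
    grad = (theta * (1.0 - theta), theta * (1.0 - theta) * x0)
    var = sum(grad[a] * cov[a][b] * grad[b] for a in range(2) for b in range(2))
    return theta, math.sqrt(var)

# Illustrative (made-up) estimates and variance-covariance matrix of the betas
beta_hat = (-0.4, 0.9)
cov_beta = [[0.25, -0.05], [-0.05, 0.04]]
theta, se = delta_method_se(beta_hat, cov_beta, x0=1.0)
print(round(theta, 3), round(se, 3))
```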

Model comparison
[Figure: REALITY? approximated by MODEL 1 (parameters d, h, L, w), MODEL 2 (parameters z, h) and MODEL 3 (parameters z, x)]

Model selection
What is the support for model X, relative to others in the set, given the data?
Model selection indicates what inferences the data support, not what the full reality might be.
It is conditional on sample size: with more data, further effects could probably be found.

Model selection
Hypothesis-testing approaches: assess whether there is evidence to reject the null hypothesis, e.g. likelihood-ratio test.
Information-theoretic approaches (Anderson, 2008): rank models based on parsimony, a trade-off between underfitting and overfitting, e.g. Akaike information criterion (AIC).

Model selection: likelihood-ratio tests
A test to compare the fit of two nested models, i.e. the null model ($M_0$) is a special case of the alternative model ($M_A$), e.g. θ(.) vs θ(temp).
Test statistic D, based on the likelihood ratio:
$D = -2\log\frac{\text{likelihood}_{M_0}}{\text{likelihood}_{M_A}} = -2\,\text{loglik}_{M_0} + 2\,\text{loglik}_{M_A}$
Under the null hypothesis, asymptotically, $D \sim \chi^2_d$, where d is the difference in the number of parameters.
We can compute a p-value, or compare D to a critical value, to decide whether to reject $M_0$ (we can reject it if $D > \chi^2_{d;\alpha}$).

Model selection: likelihood-ratio tests
E.g. θ(.) vs θ(temp)
We fit θ(.) and obtain a maximum log-likelihood of -78.4.
We fit θ(temp) and obtain a maximum log-likelihood of -77.3.
$D = -2(-78.4) + 2(-77.3) = 156.8 - 154.6 = 2.2$ (test statistic)
$d = 2 - 1 = 1$ (difference in number of parameters)
Compare D with a $\chi^2$ distribution with d = 1 degree of freedom: $\chi^2_{1;0.05} = 3.84$, so $D < \chi^2_{d;\alpha}$.
We do not have evidence to reject the null model θ(.).
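
The test can be reproduced in a few lines. This sketch handles only d = 1, where the $\chi^2_1$ tail probability equals $\text{erfc}(\sqrt{D/2})$; other degrees of freedom would need a general chi-square CDF (e.g. from a statistics library).

```python
import math

def likelihood_ratio_test(loglik_null, loglik_alt, df):
    """Return (D, p-value) for two nested models.

    Only df = 1 is handled, via P(chi2_1 > D) = P(|Z| > sqrt(D)) = erfc(sqrt(D / 2))."""
    if df != 1:
        raise NotImplementedError("this sketch only covers df = 1")
    D = -2.0 * loglik_null + 2.0 * loglik_alt
    p_value = math.erfc(math.sqrt(D / 2.0))
    return D, p_value

D, p = likelihood_ratio_test(-78.4, -77.3, df=1)
print(round(D, 2), round(p, 3))  # p > 0.05: no evidence against theta(.)
```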

Model selection: AIC
Akaike Information Criterion: $\text{AIC} = -2L + 2K$
L is the maximum log-likelihood; K is the number of parameters in the model (penalty against overfitting).
The lower the AIC within the set, the better the model. E.g.
θ(.): L = -78.4, K = 1, AIC = -2(-78.4) + 2×1 = 158.8
θ(temp): L = -77.3, K = 2, AIC = -2(-77.3) + 2×2 = 158.6
(Burnham & Anderson 2002)

Model selection: AIC
It is the relative AIC value that matters. We usually talk about $\Delta\text{AIC} = \text{AIC} - \text{AIC}_{\text{best model}}$.
Rule of thumb:
ΔAIC 0-2: substantial support
4-7: considerably less support
>10: no support
(Burnham & Anderson 2002)
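
Computing AIC and ΔAIC for the two models in the θ(.) vs θ(temp) example (a minimal sketch):

```python
def aic(max_loglik, n_params):
    """Akaike Information Criterion: AIC = -2 log L + 2K."""
    return -2.0 * max_loglik + 2.0 * n_params

models = {"theta(.)": (-78.4, 1), "theta(temp)": (-77.3, 2)}
aics = {name: aic(ll, k) for name, (ll, k) in models.items()}
best = min(aics.values())
delta = {name: a - best for name, a in aics.items()}
for name in sorted(delta, key=delta.get):
    print(f"{name}: AIC = {aics[name]:.1f}, dAIC = {delta[name]:.1f}")
```

A ΔAIC of 0.2 means the data give both models substantial support.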

Model selection: AIC
Be careful with "pretending" variables, e.g. an occupancy study of a bird species in Italy:
Model A: ψ(elev + habitat)
Model B adds an irrelevant covariate: ψ(elev + habitat + rainfall in China)
The new covariate does not improve the fit (i.e. the same likelihood) but adds 1 extra parameter, i.e. 2 AIC units of penalty (AIC = -2L + 2K), so Model B is 2 AIC units from Model A.
By the rule of thumb, Model B is a "good" model, but the new variable is only pretending to be important.
This should not be taken as evidence that the new parameter is relevant!

Model selection: AIC
Small-sample adjustment: AIC may perform poorly if there are too many parameters for the sample size:
$\text{AIC}_c = \text{AIC} + \frac{2K(K+1)}{n - K - 1}$, where n is the effective sample size.
There is debate over what n is in occupancy models.
Adjustment for overdispersion: $\text{QAIC} = -2L/\hat{c} + 2K$
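
A sketch of the AICc and QAIC formulas. The effective sample size n = 30 is a hypothetical value, since n is debated for occupancy models. With it, the correction is enough to reverse the AIC ranking of θ(.) and θ(temp).

```python
def aicc(max_loglik, n_params, n):
    """AICc = AIC + 2K(K + 1) / (n - K - 1); n is the effective sample size."""
    k = n_params
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined when n <= K + 1")
    return -2.0 * max_loglik + 2.0 * k + 2.0 * k * (k + 1) / (n - k - 1)

def qaic(max_loglik, n_params, c_hat):
    """QAIC = -2 log L / c_hat + 2K, where c_hat is the overdispersion estimate."""
    return -2.0 * max_loglik / c_hat + 2.0 * n_params

n_eff = 30  # hypothetical effective sample size
print(round(aicc(-78.4, 1, n_eff), 2), round(aicc(-77.3, 2, n_eff), 2))
```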

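Model averaging and model selection uncertainty
Sometimes no single model is clearly best; we can then do multi-model inference using all the models in the set.
Model weights (ad hoc measures of model support):
$w_j = \frac{\exp(-\Delta\text{AIC}_j/2)}{\sum_{i=1}^{m}\exp(-\Delta\text{AIC}_i/2)}$
Averaged estimates and their standard errors:
$\hat\theta_A = \sum_{i=1}^{m} w_i\hat\theta_i$
$\widehat{SE}(\hat\theta_A) = \sum_{i=1}^{m} w_i\sqrt{\widehat{\text{Var}}(\hat\theta_i \mid M_i) + (\hat\theta_i - \hat\theta_A)^2}$
where the first term under the square root is the sampling variance conditional on model $M_i$ and the second reflects between-model uncertainty (Burnham & Anderson 2002).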
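
A sketch of Akaike weights and a model-averaged estimate with its unconditional SE, using the AIC values from the θ(.) vs θ(temp) example. The per-model estimates and SEs are hypothetical.

```python
import math

def akaike_weights(aics):
    best = min(aics)
    rel = [math.exp(-(a - best) / 2.0) for a in aics]
    total = sum(rel)
    return [r / total for r in rel]

def model_average(estimates, ses, weights):
    """Model-averaged estimate and unconditional SE (Burnham & Anderson 2002)."""
    theta_a = sum(w * t for w, t in zip(weights, estimates))
    se_a = sum(w * math.sqrt(se**2 + (t - theta_a) ** 2)
               for w, t, se in zip(weights, estimates, ses))
    return theta_a, se_a

w = akaike_weights([158.8, 158.6])
# Hypothetical per-model estimates of a real parameter (e.g. a probability) and SEs
theta_a, se_a = model_average([0.40, 0.46], [0.05, 0.06], w)
print([round(x, 3) for x in w], round(theta_a, 3), round(se_a, 3))
```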

Model averaging and model selection uncertainty
Considerations:
It is usually safer to model-average the real parameters (e.g. probabilities) rather than the betas (i.e. regression coefficients).
We need to make sure that the parameters averaged have the same interpretation: regression coefficients can be sensitive to the other covariates in the model if they are correlated.
Typically, all models in the set are used in averaging. If one or more models are removed, the model weights must be renormalized so that they sum to 1.

Goodness-of-fit
Model comparison identifies the best model in the set... but is it a good model at all?
GoF tests assess how well a model fits a set of observations, looking for evidence of lack of fit.
For small samples, tests may have low power to detect lack of fit.
Ideally, an assessment of GoF should always be carried out. This is an area that needs more development.

SESSION 1 Introduction to occupancy modelling S1.3 Single-season occupancy model

Key ideas
Aim: estimate species occupancy
Issue: imperfect detection of the species
Protocol: collect data so that we can model the detection process and therefore obtain unbiased occupancy estimates
(MacKenzie et al. 2002; Tyre et al. 2003)

Sampling protocol
s sampling units are surveyed (out of S; for now assume s ≪ S).
Data collected: detection/non-detection ("presence/absence").
Replicate surveys are carried out in each sampling unit.
The system is closed to occupancy changes during the sampling season.
[Figure: surveyed sampling units labelled with their detection histories, e.g. 1001, 1101, 0001, 0000]

Replication
Types of replication:
repeated visits at different points in time
simultaneous independent observers (or detection methods)
spatial replication within the site

Detection history
e.g. resulting data set (rows: sites; columns: surveys 1, 2, 3, ..., k)
Site 1: 1 0 0 ... 0
Site 2: 0 0 0 ... 0
Site 3: 1 0 1 ... 1
Site 4: 1 0 0 ... 1
Site 5: 0 0 0 ... 0
...
Site s: 1 1 0 ... 0

True or false absence?
[Figure: reality vs field observations for a set of sites, with observed detection histories such as 1010, 1101, 1000 and several 0000]

True or false absence?
[Figure: the same comparison, with each all-zero history (0000) marked "?": it may be a true absence or a false absence]

Probabilistic description (likelihood)
ψ = occupancy: probability that a site is occupied; (1 - ψ) = probability that a site is empty
p = detectability: probability that the species is detected at a site in a survey, given presence; (1 - p) = probability of not detecting the species, given presence
Model: based on the closure assumption and independence of replicate surveys

Probabilistic description (likelihood)
ψ = occupancy, Pr(site occupied); p = detectability, Pr(species detected | present)
e.g. $h_i$ = 1001: the species is present and was detected in two of the surveys (and missed in the other two)
$\Pr(h_i = 1001 \mid \psi, p) = \psi\,p(1-p)(1-p)p$

Probabilistic description (likelihood)
ψ = occupancy, Pr(site occupied); p = detectability, Pr(species detected | present)
e.g. $h_i$ = 0000: the species is present and was not detected in any survey, OR the species is not present at the site
$\Pr(h_i = 0000 \mid \psi, p) = \psi(1-p)(1-p)(1-p)(1-p) + (1-\psi)$

Probabilistic description (likelihood)
The likelihood is the product of the probabilistic statements for all surveyed sites:
$L(\psi, p \mid \mathbf{h}) = \Pr(\mathbf{h} \mid \psi, p) = \prod_{i=1}^{s}\Pr(h_i \mid \psi, p)$
It is maximized to obtain maximum-likelihood estimates (MLEs).

Probabilistic description (likelihood)
E.g. system with ψ = 0.5, p = 0.3
Sampling: s = 200, k = 5
Data: species detected at 79 sites, 147 detections in total
Naive ψ = 79/200 = 0.39
Estimates (SE): ψ-hat = 0.46 (0.044), p-hat = 0.32 (0.028)
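
A sketch of fitting the constant model ψ(.), p(.) to these summary data by maximizing the log-likelihood with a simple refining grid search (a stand-in for a proper optimizer). Without missing surveys, the likelihood depends only on the number of sites with detections and the total number of detections. The sketch uses k = 5 surveys per site, the number consistent with 79 detected sites, 147 detections and the estimates shown.

```python
import math

def occupancy_loglik(psi, p, s, k, s_detected, n_detections):
    """Log-likelihood of the constant model psi(.), p(.) with no missing surveys.

    Each site with detections contributes psi * p^d_i * (1 - p)^(k - d_i);
    each of the s - s_detected all-zero sites contributes psi*(1-p)^k + (1 - psi)."""
    ll = s_detected * math.log(psi)
    ll += n_detections * math.log(p) + (k * s_detected - n_detections) * math.log(1 - p)
    ll += (s - s_detected) * math.log(psi * (1 - p) ** k + 1 - psi)
    return ll

def refine_grid_max(f, start=(0.5, 0.5), half_width=0.49, step=0.01, rounds=3):
    """Maximize f(psi, p) over (0, 1)^2 with successively finer grids."""
    best = start
    for _ in range(rounds):
        n = int(round(half_width / step))
        grid = [(best[0] + i * step, best[1] + j * step)
                for i in range(-n, n + 1) for j in range(-n, n + 1)]
        grid = [(a, b) for a, b in grid if 0 < a < 1 and 0 < b < 1]
        best = max(grid, key=lambda ab: f(*ab))
        half_width, step = 2 * step, step / 10  # zoom in around the current best
    return best

data = dict(s=200, k=5, s_detected=79, n_detections=147)
psi_hat, p_hat = refine_grid_max(lambda a, b: occupancy_loglik(a, b, **data))
print(f"naive {79 / 200:.3f}, psi-hat {psi_hat:.3f}, p-hat {p_hat:.3f}")
```

SEs (not computed here) would come from the curvature of this log-likelihood at the maximum, as in the single-parameter seed example.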

Survey-specific detection probability
Modelled using a parameter for each survey occasion, e.g. $h_i$ = 1001: the species is present and was detected in the 1st and 4th surveys (and missed in the 2nd and 3rd):
$\Pr(h_i = 1001 \mid \psi, p_1, p_2, p_3, p_4) = \psi\,p_1(1-p_2)(1-p_3)p_4$

Missing observations
Some survey visits may be missed, e.g. due to weather conditions or other logistical difficulties. The model can readily cope with missing data.
e.g. $h_i$ = 10-0 ("-" = survey not carried out): the species is present and was detected in the 1st survey and not detected in the 2nd and 4th surveys (we cannot say anything about the 3rd survey, since it did not take place at this site)
$\Pr(h_i = 10{-}0 \mid \psi, p) = \psi\,p_1(1-p_2)(1-p_4)$
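
A sketch of the probability of a single detection history, covering the constant, survey-specific and missing-survey cases above. Representing a missed survey as None is a convention chosen for this sketch.

```python
import itertools

def history_prob(history, psi, p):
    """Pr(h | psi, p) for one site.

    history: list of 1 (detected), 0 (not detected) or None (survey not done);
    p: per-survey detection probabilities, same length as history."""
    pr_occupied = psi
    for y, p_j in zip(history, p):
        if y is None:
            continue  # a missed survey contributes no factor
        pr_occupied *= p_j if y == 1 else 1 - p_j
    # An all-zero (or all-missing) history is also consistent with an empty site
    if all(y != 1 for y in history):
        return pr_occupied + (1 - psi)
    return pr_occupied

print(history_prob([1, 0, 0, 1], 0.4, [0.3] * 4))              # psi p (1-p)(1-p) p
print(history_prob([1, 0, None, 0], 0.4, [0.2, 0.3, 0.4, 0.5]))  # psi p1 (1-p2)(1-p4)
# Probabilities of all 16 possible complete histories sum to 1
total = sum(history_prob(list(h), 0.4, [0.3] * 4)
            for h in itertools.product((0, 1), repeat=4))
print(round(total, 10))
```

The site likelihood is then the product of these terms over sites, with p computed per site and survey if covariates are used.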

Introducing covariates: site-specific
Occupancy and detection probability can be a function of site characteristics, e.g. habitat type, patch size, human disturbance, ...
E.g. logit link function:
$\text{logit}(\psi_i) = \alpha_0 + \alpha_1 A_i + \alpha_2 B_i + \dots$
$\text{logit}(p_{ij}) = \beta_0 + \beta_1 Q_i + \beta_2 R_i + \dots$
An extension of logistic regression to account for imperfect detection (species distribution model).

Introducing covariates: survey-specific
Detection probability can also be a function of survey-specific characteristics, e.g. observer, weather, ...
$\text{logit}(p_{ij}) = \beta_0 + \beta_1 Q_i + \beta_2 R_i + \beta_3 S_{ij} + \beta_4 T_{ij} + \dots$
Remember: in this model we assume no changes in occupancy during the survey season, so occupancy cannot be a function of survey-specific covariates.

Introducing covariates
Covariates can be continuous (e.g. elevation in m) or categorical (e.g. habitat type).
Standardizing continuous covariates to a meaningful scale can be useful and may avoid numerical problems, e.g. z-transform: $z = \frac{x - \bar{x}}{SD(x)}$

Introducing covariates
Categorical covariates are represented with dummy (binary) variables.
With m categories we need m - 1 dummy variables (e.g. 4 habitat types: 3 dummy variables indicating habitats 1, 2 and 3; habitat 4 as the reference):
Habitat  A  B  C
1        1  0  0
2        0  1  0
3        0  0  1
4        0  0  0
$\text{logit}(\psi_i) = \alpha_0 + \alpha_1 A_i + \alpha_2 B_i + \alpha_3 C_i$ (habitat 4 is taken as the reference here)
Alternatively, we can use one variable per category (and no intercept):
Habitat  A  B  C  D
1        1  0  0  0
2        0  1  0  0
3        0  0  1  0
4        0  0  0  1
$\text{logit}(\psi_i) = \alpha_1 A_i + \alpha_2 B_i + \alpha_3 C_i + \alpha_4 D_i$
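
A minimal sketch of both covariate preparations; the elevation values are hypothetical, and habitat 4 is the reference level as in the first table.

```python
import statistics

def z_transform(values):
    """Standardize: (x - mean) / SD, using the sample standard deviation."""
    mean, sd = statistics.mean(values), statistics.stdev(values)
    return [(v - mean) / sd for v in values]

def dummy_code(categories, levels, reference):
    """m - 1 binary columns, one per non-reference level."""
    columns = [lv for lv in levels if lv != reference]
    return columns, [[1 if c == col else 0 for col in columns] for c in categories]

elevation = [250.0, 400.0, 1200.0, 800.0]   # hypothetical elevations (m)
z = z_transform(elevation)
columns, rows = dummy_code([1, 2, 3, 4], levels=[1, 2, 3, 4], reference=4)
print(columns, rows)  # reproduces the A, B, C columns of the table
```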

Introducing covariates
Remember, not accounting for imperfect detection may obscure relationships of occupancy with covariates.
[Figure: apparent logit(occupancy) against a habitat covariate, comparing the true relationship with the apparent relationships when p < 1, when p increases with habitat (p ~ +habitat) and when p decreases with habitat (p ~ -habitat) (MacKenzie et al. 2006)]

Covariates and imperfect detection: an example
MacKenzie (2006) explores the effect of disregarding detectability in the analysis of resource use by pronghorns (Antilocapra americana).
256 locations in Wyoming surveyed during 2 consecutive winters (1980-81 and 1981-82).
4 covariates: Sg: sagebrush density (bushes/ha); Sl: slope; DW: distance to water (km); A: aspect
(Photo: J. Leupold)

Covariates and imperfect detection: an example
Analysis 1: simple logistic regression
Implicit assumptions: negligible probability of missing the species where it is present... or constant detectability (the results are then relative).
Summed AIC model weights: distance to water 86%; slope 52%; sagebrush density 35%; aspect 16%
(MacKenzie 2006)

Covariates and imperfect detection: an example
Analysis 2: single-season occupancy model
Each winter is used as a replicate (k = 2).
"Use" rather than occupancy (pronghorns may not be available for detection at the site at the time of the survey).
General model for detectability: p(sg+sl+dw+a)
Summed AIC model weights (ψ): slope 55%; sagebrush density 41%; distance to water 29%; aspect 6%
(MacKenzie 2006)

Covariates and imperfect detection: an example
Analysis 2b: single-season occupancy model, exploration of detectability
Model selection on detectability while keeping the general model for occupancy: ψ(sg+sl+dw+a)
Summed AIC model weights (p): distance to water 86%; aspect 37%; slope 29%; sagebrush density 27%
Model with constant detectability p(.): low support (ΔAIC = 2.8)
(MacKenzie 2006)

Conditional occupancy
Probability that a site is occupied, given that there were no detections:
$\Pr(\text{occupied} \mid \text{not detected}) = \frac{\Pr(\text{occupied} \,\&\, \text{not detected})}{\Pr(\text{not detected})} = \frac{\psi_i\prod_{j=1}^{k}(1-p_{ij})}{\psi_i\prod_{j=1}^{k}(1-p_{ij}) + (1-\psi_i)}$
E.g. ψ = 0.3, p = 0.7 and k = 3: $\frac{0.3(1-0.7)^3}{0.3(1-0.7)^3 + (1-0.3)} = 0.011$
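
The conditional occupancy calculation as a short function (a sketch that allows survey-specific $p_{ij}$):

```python
import math

def conditional_occupancy(psi, p):
    """Pr(occupied | no detections), with p a list of per-survey detection probabilities."""
    miss_all = math.prod(1.0 - p_j for p_j in p)   # Pr(no detections | occupied)
    return psi * miss_all / (psi * miss_all + 1.0 - psi)

print(round(conditional_occupancy(0.3, [0.7, 0.7, 0.7]), 3))  # 0.011
```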

Goodness of fit
MacKenzie & Bailey (2004) propose a GOF test based on the observed vs expected number of sites with each of the possible detection histories:
Observed ($O_h$): from the data set
Expected ($E_h$): based on the parameter estimates obtained in the analysis
Test statistic (Pearson's chi-square): $X^2 = \sum_h \frac{(O_h - E_h)^2}{E_h}$
Parametric bootstrapping (i.e. simulating histories) is used to determine whether the observed test statistic is unusually large.

Goodness of fit
e.g. simple case (no covariates, no missing data): s = 40, k = 3, ψ-hat = 0.6, p-hat = 0.3
$E_{000} = s\left(\hat\psi(1-\hat p)^3 + 1 - \hat\psi\right)$, $\quad E_{001} = s\,\hat\psi(1-\hat p)^2\hat p$, ...
History  E_h     O_h  (O_h - E_h)²/E_h
000      24.232  25   0.0243
001       3.528   2   0.6618
010       3.528   3   0.0790
011       1.512   2   0.1575
100       3.528   6   1.7321
101       1.512   1   0.1734
110       1.512   0   1.5120
111       0.648   1   0.1912
Total $X^2$ = 4.531
[Figure: distribution of the bootstrap $X^2$ values, with the observed value 4.531]
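
A sketch reproducing the expected counts and the Pearson statistic for this example. The parametric bootstrap is not shown: it would simulate many data sets from ψ-hat and p-hat, refit the model to each, and recompute $X^2$ to build the reference distribution.

```python
import itertools

def expected_counts(s, k, psi, p):
    """Expected number of sites with each detection history under psi(.), p(.)."""
    out = {}
    for hist in itertools.product((0, 1), repeat=k):
        d = sum(hist)
        pr = psi * p**d * (1.0 - p) ** (k - d)
        if d == 0:
            pr += 1.0 - psi  # all-zero history can also come from an empty site
        out["".join(map(str, hist))] = s * pr
    return out

observed = {"000": 25, "001": 2, "010": 3, "011": 2, "100": 6, "101": 1, "110": 0, "111": 1}
expected = expected_counts(s=40, k=3, psi=0.6, p=0.3)
chi2 = sum((observed[h] - expected[h]) ** 2 / expected[h] for h in expected)
print(round(chi2, 3))  # 4.531
```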

Goodness of fit
Null hypothesis: there is no lack of fit.
p-value < 0.05: evidence of lack of fit
p-value > 0.05: no evidence of lack of fit (≠ evidence of fit!)
Overdispersion parameter: $\hat c = X^2_{obs}/\bar{X}^2_B$, where $\bar{X}^2_B$ is the mean of the bootstrap statistics.
In general, it is recommended to assess the global model first and obtain $\hat c$.

Goodness of fit
Test performance:
Power to detect lack of fit caused by an incorrect structure for detection probability.
Failure to detect poor model fit caused by an incorrect structure for occupancy probabilities.
In general, low power for the sample sizes expected in many applications!

Key model assumptions
1. System closed to changes in occupancy
2. Independent detections
3. No false positives
4. No unmodelled heterogeneity
$\Pr(1001 \mid \psi, p) = \psi_i\,p_{i1}(1-p_{i2})(1-p_{i3})p_{i4}$

Assumptions: closure
If changes occur at random, the occupancy estimator remains unbiased, but ψ is interpreted as "use", and p is smaller because it involves two components:
Pr(species detected at site i during survey j) = ψ × p, where
ψ = Pr(species uses site i)
p = Pr(species present at site i during survey j | uses site i) × Pr(species detected during survey j | present at site i during survey j)
and the first factor of p equals 1 under closure.

Assumptions: closure
If there is emigration/immigration, ψ relates to the probability that the species is present at the site at the start/end of the season, and p needs to be allowed to change along the season.
Other non-random changes can cause bias (it is harder to interpret what the parameter estimates mean).
An issue to be considered during study design: is the season defined appropriately? Is the time between surveys suitable?

Assumptions: independence
If the outcome of one survey depends on the outcome of another survey, there is a lack of independence.
It can be induced by different mechanisms, e.g.:
the species is easier to detect at a site where it has already been detected ("trap response");
surveys are carried out close in time, so that the species is more likely to be detected if it was detected in the previous survey.
To tackle it: good design, modelling.

Assumptions: independence
Survey-specific covariate to account for trap response: it indicates the surveys that happen after the 1st detection at the site.
Site  History  Covariate (surveys 1 2 3 4)
1     1001     0 1 1 1
2     0000     0 0 0 0
3     0010     0 0 0 1
4     0101     0 0 1 1
...
s     0011     0 0 0 1
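
Building the trap-response covariate from a detection history (a minimal sketch; histories are given as strings of 0s and 1s):

```python
def trap_response_covariate(history):
    """1 for surveys that take place after the first detection at the site, else 0."""
    covariate, detected_before = [], False
    for y in history:
        covariate.append(1 if detected_before else 0)
        if y == "1":
            detected_before = True
    return covariate

for h in ["1001", "0000", "0010", "0101", "0011"]:
    print(h, trap_response_covariate(h))
```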

Assumptions: independence
Model to account for lack of closure and dependence between consecutive replicates (Hines et al. 2010)
Motivating example: sign surveys along transects (replicate = transect segment)
Two new parameters in the model:
θ = probability that the species is present at a replicate, given it was not present at the previous replicate
θ′ = probability that the species is present at a replicate, given it was present at the previous replicate

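Assumptions: independence
Hidden Markov model: the detection process at occupied sites is
[Figure: two hidden states, "present at a replicate" and "absent at a replicate"; absent → present with probability θ (remains absent with 1 - θ), present → present with probability θ′ (becomes absent with 1 - θ′); when present, the species is detected with probability p]
If θ = θ′: independence.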

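Assumptions: independence
e.g.
$\Pr(h_i = 0101) = \psi\big[\pi_1(1-p)\,\theta' p\,\theta'(1-p)\,\theta' p + \pi_1(1-p)\,\theta' p\,(1-\theta')\,\theta p + (1-\pi_1)\,\theta p\,\theta'(1-p)\,\theta' p + (1-\pi_1)\,\theta p\,(1-\theta')\,\theta p\big]$
The four terms correspond to the hidden states PPPP, PPAP, APPP and APAP (P = present, A = absent at the replicate).
$\pi_1$ = probability of starting in the present state (a function of the other parameters)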
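
A sketch of this likelihood using the forward algorithm for hidden Markov models, which sums over the hidden present/absent sequences instead of listing them. Setting $\pi_1$ to the stationary probability θ/(θ + 1 - θ′) is an assumption made here for "a function of the other parameters"; pass pi1 explicitly to override it.

```python
def correlated_detection_prob(history, psi, p, theta, theta_prime, pi1=None):
    """Pr(h) under the correlated-replicates model, via the forward algorithm.

    Hidden state at each replicate: A (absent) or P (present).
    theta = Pr(P | previous A), theta_prime = Pr(P | previous P)."""
    if pi1 is None:
        # Assumption: start from the stationary probability of being present
        pi1 = theta / (theta + 1.0 - theta_prime)

    def emit(present, y):
        # Detection is only possible in the present state
        if present:
            return p if y == 1 else 1.0 - p
        return 0.0 if y == 1 else 1.0

    f_abs = (1.0 - pi1) * emit(False, history[0])
    f_pres = pi1 * emit(True, history[0])
    for y in history[1:]:
        f_abs, f_pres = (
            (f_abs * (1.0 - theta) + f_pres * (1.0 - theta_prime)) * emit(False, y),
            (f_abs * theta + f_pres * theta_prime) * emit(True, y),
        )
    pr = psi * (f_abs + f_pres)
    if all(y == 0 for y in history):
        pr += 1.0 - psi  # an unoccupied site can only give an all-zero history
    return pr

# Check against the explicit four-term expression for h = 0101 (illustrative values)
psi, p, th, thp = 0.6, 0.5, 0.3, 0.8
pi1 = th / (th + 1 - thp)
explicit = psi * (pi1 * (1 - p) * thp * p * thp * (1 - p) * thp * p
                  + pi1 * (1 - p) * thp * p * (1 - thp) * th * p
                  + (1 - pi1) * th * p * thp * (1 - p) * thp * p
                  + (1 - pi1) * th * p * (1 - thp) * th * p)
print(correlated_detection_prob([0, 1, 0, 1], psi, p, th, thp), explicit)
```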

Assumptions: no false positives
With no false positives, a detection gives an unambiguous state, which allows the estimation of other parameters (like false negatives).
With false positives, occupancy could be severely overestimated.
In practice, false positives are usually less of an issue than false absences.
They can be accounted for in the modelling:
If there are no data on known false detections, they could be modelled with a finite mixture (Royle & Link 2006), but this approach has problems.
If there are auxiliary data on the false detection rate (e.g. expert prior information or genetic tests), the misidentification process can be modelled with a separate likelihood component (McClintock et al. 2010).
Explicit model of false positives when there are data from multiple detection methods, one of which has no uncertainty (Miller et al. 2011).

Assumptions: no unmodelled heterogeneity
The model assumes that ψ and p are constant or a function of covariates.
If heterogeneity remains in occupancy probability: parameter values are still valid as averages across the sites surveyed.
If heterogeneity remains in detection probability: it may induce negative bias in the occupancy estimator (underestimation).

Assumptions: incorporating heterogeneity in p
Models that incorporate heterogeneity in detectability:
finite mixtures
continuous mixtures (random effects)
abundance-induced heterogeneity (Royle-Nichols)
See Royle (2006) in Biometrics.

Assumption: incorporating heterogeneity in p
Finite mixtures
Assume that each site belongs to one of G groups, each with a different detection probability (group 1: $p_1$; group 2: $p_2$; ...; group G: $p_G$). Group membership is not known.
e.g. 2 groups, with $\pi_1$ = probability of belonging to group 1:
$\Pr(h_i = 101) = \psi\big[\pi_1\,p_1(1-p_1)p_1 + (1-\pi_1)\,p_2(1-p_2)p_2\big]$
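
A sketch of a history probability under a finite mixture of detection probabilities; the parameter values in the example call are illustrative.

```python
def mixture_history_prob(history, psi, p_groups, group_probs):
    """Pr(h) when an occupied site is in detection group g with prob. group_probs[g]."""
    pr_occupied = 0.0
    for p_g, w_g in zip(p_groups, group_probs):
        term = w_g
        for y in history:
            term *= p_g if y == 1 else 1.0 - p_g
        pr_occupied += term
    pr = psi * pr_occupied
    if all(y == 0 for y in history):
        pr += 1.0 - psi
    return pr

# Two groups: pi_1 = 0.3 with p_1 = 0.8, and 1 - pi_1 = 0.7 with p_2 = 0.2
print(mixture_history_prob([1, 0, 1], psi=0.6, p_groups=[0.8, 0.2], group_probs=[0.3, 0.7]))
```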

Assumption: incorporating heterogeneity in p
Continuous mixtures
Assume that $p_i$ is a random value drawn from a continuous distribution (e.g. beta, logit-normal, ...) and estimate the parameters of the distribution.
In some cases a closed-form expression exists (e.g. beta-binomial).
In general, easier to implement in the Bayesian framework.

Assumption: incorporating heterogeneity in p
Abundance-induced heterogeneity (Royle-Nichols model)
Differences in site abundance can induce heterogeneity in detection probability; Royle & Nichols (2003) propose a way to model this.
They link heterogeneity in detection probability to heterogeneity in abundance through
$p_{ij} = 1 - (1 - r_j)^{N_i}$
where $p_{ij}$ = species detection probability, $r_j$ = individual detection probability, and $N_i$ = number of individuals at the site.

Assumption: incorporating heterogeneity in p
Abundance-induced heterogeneity (Royle-Nichols model)
The number of individuals at the site, $N_i$, is unknown; it is modelled as a random variable with a given distribution.
Occupancy can be derived as $\Pr(N_i > 0) = 1 - \Pr(N_i = 0)$.
e.g. Poisson distribution: $\Pr(N = x) = \frac{\lambda^x e^{-\lambda}}{x!}$, so $\psi = 1 - e^{-\lambda}$
[Figure: Poisson probabilities of N = 0 to 10 for λ = 2 and λ = 6]

Assumption: incorporating heterogeneity in p
Abundance-induced heterogeneity (Royle-Nichols model)
The number of individuals at the site, $N_i$, is unknown; it is modelled as a random variable with a given distribution, and occupancy can be derived as $\Pr(N_i > 0) = 1 - \Pr(N_i = 0)$.
Discrete mixture over site abundance:
$\Pr(h_i = 101 \mid \lambda, r) = \sum_{N_i} p_{i1}(1-p_{i2})p_{i3}\Pr(N_i) = \sum_{N_i} p_{i1}(1-p_{i2})p_{i3}\frac{e^{-\lambda}\lambda^{N_i}}{N_i!}$, with $p_{ij} = 1 - (1-r_j)^{N_i}$
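
A sketch of the Royle-Nichols history probability with a Poisson prior on abundance. The infinite sum over N is truncated at n_max, an illustrative cut-off that should be set well above λ; the parameter values are illustrative too.

```python
import math

def rn_history_prob(history, lam, r, n_max=100):
    """Royle-Nichols Pr(h | lambda, r): mixture over N ~ Poisson(lambda),
    truncated at n_max; r[j] is the individual detection prob. in survey j."""
    total = 0.0
    for n in range(n_max + 1):
        term = math.exp(-lam) * lam**n / math.factorial(n)   # Pr(N = n)
        for y, r_j in zip(history, r):
            p_ij = 1.0 - (1.0 - r_j) ** n   # species detection prob. given N = n
            term *= p_ij if y == 1 else 1.0 - p_ij
        total += term
    return total

lam, r = 2.0, [0.3, 0.3, 0.3]
psi = 1.0 - math.exp(-lam)   # occupancy implied by the Poisson assumption
print(round(psi, 4), rn_history_prob([1, 0, 1], lam, r))
```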

Assumption: incorporating heterogeneity in p
Abundance-induced heterogeneity (Royle-Nichols model)
Model assumptions:
each individual is detected independently of the others at the site
all individuals are equally detectable
the number of individuals at each site does not change during the survey season ("closure")
abundance is the main source of heterogeneity in p

Assumption: incorporating heterogeneity in p
Abundance-induced heterogeneity (Royle-Nichols model)
We need to take care with the interpretation of abundance and with the model assumptions: heterogeneity in p from other sources can be picked up as abundance.
Perhaps more useful as a means to account for heterogeneity than as a method to estimate abundance.

Assumption: incorporating heterogeneity in p
Abundance-induced heterogeneity (Royle-Nichols model)
Example (Royle & Nichols 2003): North American Breeding Bird Survey route, S = 50 stops, K = 11 (over 30 days).
Poisson assumption on abundance at each stop (no covariates: constant λ).
Possible changes in breeding activity mean that (individual) detectability may vary over time: $\text{logit}(r) = \beta_0 + \beta_1\,\text{day} + \beta_2\,\text{day}^2$

Assumption: incorporating heterogeneity in p
Abundance-induced heterogeneity (Royle-Nichols model)
Example (Royle & Nichols 2003): hermit thrush and wood thrush
$\text{logit}(r) = \beta_0 + \beta_1\,\text{day} + \beta_2\,\text{day}^2$
Model M1: intercept only (constant detectability)
Model M2: $\beta_0$ and $\beta_1$ (linear change in detectability)
Model M3: all three parameters (quadratic change in detectability)
Model M0: MacKenzie et al. (2002) (no abundance-induced heterogeneity)
(Photos: D. Gordon Robertson, Steve Maslowski)

Finite population
So far we have assumed s ≪ S ("infinite population").
We estimate the probability of occupancy, an underlying characteristic of the population; the proportion of occupied sites is a realisation of this process.
[Figure: a very large population of S sites and our survey area of s ≪ S sites]

Finite population
The distinction between these two concepts is important when dealing with a finite population (s close to S), i.e. when we survey all the available habitat (our survey area s covers practically all of S).

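Finite population
The distinction between these two concepts is important when dealing with a finite population (s close to S): SEs will be too large if this is not accounted for, because by default we include the uncertainty derived from sampling from an infinite population ("binomial experiment").
The proportion of occupied sites can be calculated as
$\frac{1}{S}\left(s_D + \sum_{i=s_D+1}^{s}\psi_i^{c} + \sum_{i=s+1}^{S}\psi_i\right)$
where the $s_D$ sites with detections are numbered first, $\psi_i^{c}$ is the conditional occupancy of a surveyed site with no detections, and the last sum covers the unsurveyed sites.
Its SE can be derived using the delta method (MacKenzie et al. 2006).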
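
A sketch of this finite-population estimate: sites with detections count as occupied, surveyed sites without detections contribute their conditional occupancy, and unsurveyed sites contribute $\psi_i$. All inputs below are illustrative, and the delta-method SE is not shown.

```python
import math

def finite_population_proportion(psi_surveyed, p_surveyed, detected, psi_unsurveyed):
    """Estimated proportion of occupied sites among all S = s + (S - s) sites."""
    total = 0.0
    for psi_i, p_i, det in zip(psi_surveyed, p_surveyed, detected):
        if det:
            total += 1.0  # occupancy is known for sites with detections
        else:
            miss_all = math.prod(1.0 - p_j for p_j in p_i)
            total += psi_i * miss_all / (psi_i * miss_all + 1.0 - psi_i)
    total += sum(psi_unsurveyed)
    S = len(psi_surveyed) + len(psi_unsurveyed)
    return total / S

est = finite_population_proportion(
    psi_surveyed=[0.3] * 4,
    p_surveyed=[[0.7] * 3] * 4,
    detected=[True, True, False, False],
    psi_unsurveyed=[0.3],
)
print(round(est, 4))
```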