Occupancy models. Gurutzeta Guillera-Arroita University of Kent, UK National Centre for Statistical Ecology

1 Occupancy models. Gurutzeta Guillera-Arroita, University of Kent, UK; National Centre for Statistical Ecology. Advances in Species distribution modelling in ecological studies and conservation, Pavia and Gran Paradiso, Italy, Sept. 2011

2 Outline
Session 1: Introduction to occupancy modelling. S1.1 Introduction; S1.2 Statistical background; S1.3 Single-season occupancy model.
Session 2: Occupancy modelling in practice. S2.1 Practical: single-season; S2.2 Study design.
Session 3: Occupancy modelling developments. S3.1 Multiple-season occupancy model; S3.2 Practical: multi-season; S3.3 Further models.

3 SESSION 2: Occupancy modelling in practice. 2.1 Practical: single-season

4 Software packages to fit occupancy models, e.g. the R package unmarked

5 Practical: We will use the sample data sets that come with PRESENCE. To find them, go to the PRESENCE installation folder and look in the sample_data folder, e.g. C:\program files\presence\sample_data

6 Sample data set 1: Blue-ridge two-lined salamander (Eurycea wilderae). Habitat: temperate forests, rivers, freshwater springs. Endemic to the US; found in the southern Appalachians.

7 Sample data set 1: Blue-ridge two-lined salamander. Data set from a 2001 survey (Blue_Ridge_pg99.csv). Sampling protocol: s = 39 sites surveyed, k = 5 replicates per site. Each site is a 50 m transect with natural cover and coverboard stations. Each site was surveyed once every two weeks from April to mid-June, when the salamanders are believed to be most active.
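The single-season model fitted in these practicals combines ψ and the survey-specific detection probabilities into one likelihood per site. The following is a minimal Python sketch of the site likelihood for model ψ(.)p(t). It is not the PRESENCE or unmarked implementation. It assumes a 0/1 detection history per site, with `None` marking a missed survey, and the function names are hypothetical.

```python
import math

def site_likelihood(history, psi, p):
    """Likelihood of one site's detection history under model psi(.)p(t).

    history: list of 0/1 detections, one per survey (None marks a missed survey)
    psi:     occupancy probability
    p:       survey-specific detection probabilities, same length as history
    """
    # Probability of the observed sequence given that the site is occupied
    prob_if_occupied = 1.0
    for y, pj in zip(history, p):
        if y is None:
            continue  # a missed survey contributes no information
        prob_if_occupied *= pj if y == 1 else 1.0 - pj
    if any(y == 1 for y in history):
        # At least one detection: the site must be occupied
        return psi * prob_if_occupied
    # All-zero history: occupied but never detected, or truly unoccupied
    return psi * prob_if_occupied + (1.0 - psi)


def log_likelihood(histories, psi, p):
    """Log-likelihood of all sites, assuming sites are independent."""
    return sum(math.log(site_likelihood(h, psi, p)) for h in histories)


# Hypothetical histories for three sites with k = 5 surveys
histories = [[1, 0, 1, 0, 0], [0, 0, 0, 0, 0], [None, 1, 0, 0, 1]]
print(log_likelihood(histories, psi=0.6, p=[0.4] * 5))
```

Maximizing this log-likelihood over ψ and the p's, for example with a numerical optimizer, gives maximum likelihood estimates of the kind the software reports.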

8 Design matrix: used to relate the probabilities ψ and p to the regression coefficients (β parameters)

9 Design matrix: used to relate the probabilities ψ and p to the regression coefficients (β parameters). Columns: β parameters. Row: real parameter (ψ). Grid cells: values that define the regression equation, here $\mathrm{logit}(\psi) = a_1 \times 1 = a_1$.

10 Design matrix: used to relate the probabilities ψ and p to the regression coefficients (β parameters). Columns: β parameters; to incorporate covariates, we need to add more columns. Row: real parameter (ψ). Grid cells: values that define the regression equation, here $\mathrm{logit}(\psi) = a_1 \times 1 = a_1$.

11 Design matrix: used to relate the probabilities ψ and p to the regression coefficients (β parameters)

12 Design matrix: used to relate the probabilities ψ and p to the regression coefficients (β parameters). Columns: β parameters. Rows: real parameters ($p_1, \dots, p_5$). Grid cells: values that define the regression equations, here $\mathrm{logit}(p_1) = \mathrm{logit}(p_2) = \mathrm{logit}(p_3) = \mathrm{logit}(p_4) = \mathrm{logit}(p_5) = b_1$ (constant p).

13 Design matrix: used to relate the probabilities ψ and p to the regression coefficients (β parameters). Columns: β parameters; to incorporate covariates, we need to add more columns. Rows: real parameters ($p_1, \dots, p_5$). Grid cells: values that define the regression equations. Add more columns also for survey-specific p.

14 Design matrix: used to relate the probabilities ψ and p to the regression coefficients (β parameters). Survey-specific p: $\mathrm{logit}(p_1) = b_1$, $\mathrm{logit}(p_2) = b_2$, $\mathrm{logit}(p_3) = b_3$, $\mathrm{logit}(p_4) = b_4$, $\mathrm{logit}(p_5) = b_5$.
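A design matrix is a linear map from the β parameters to the logit scale. The Python sketch below uses hypothetical β values. It multiplies each row of a design matrix by the coefficient vector and back-transforms the result with the inverse logit. It covers the grouped model p(1-2,3-5) and the survey-specific model p(t), which has one column per survey.

```python
import math

def expit(x):
    """Inverse logit: maps a value on the logit scale to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

def real_parameters(design, betas):
    """Row-by-row product of the design matrix and betas, back-transformed to probabilities."""
    return [expit(sum(x * b for x, b in zip(row, betas))) for row in design]

# p(1-2,3-5): surveys 1-2 share b1, surveys 3-5 share b2
design_grouped = [[1, 0], [1, 0], [0, 1], [0, 1], [0, 1]]
# p(t): identity matrix, one beta per survey
design_survey = [[1 if i == j else 0 for j in range(5)] for i in range(5)]

print(real_parameters(design_grouped, [0.0, 1.0]))   # hypothetical b1 = 0, b2 = 1
print(real_parameters(design_survey, [-1.0, -0.5, 0.0, 0.5, 1.0]))
```

Adding a covariate amounts to adding a column to `design` and an entry to `betas`. The row-times-vector structure stays the same.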

15 Practical 1 - results. Part 1a: regression equations $\mathrm{logit}(p_1) = \mathrm{logit}(p_2) = b_1$ and $\mathrm{logit}(p_3) = \mathrm{logit}(p_4) = \mathrm{logit}(p_5) = b_2$.

16 Practical 1 - results. Part 1a: regression equations $\mathrm{logit}(p_1) = \mathrm{logit}(p_2) = b_1$ and $\mathrm{logit}(p_3) = \mathrm{logit}(p_4) = \mathrm{logit}(p_5) = b_2$. Back-transforming: $p_1 = p_2 = \frac{1}{1+\exp(-b_1)}$ and $p_3 = p_4 = p_5 = \frac{1}{1+\exp(-b_2)}$.

17 Practical 1 - results. Part 1a: model support. The model ψ(.)p(1-2,3-5) is 2.94 AIC units better than the constant model ψ(.)p(.). Likelihood-ratio test: since AIC = −2 log L + 2K, the test statistic is 2.94 + 2 × 1 = 4.94 > 3.84 (χ² critical value at α = 0.05 for 3 − 2 = 1 degree of freedom). The p-value is < 0.05, i.e. there is support to reject the null hypothesis ψ(.)p(.).

18 Practical 1 - results. Part 1b: regression equations $\mathrm{logit}(p_1) = b_1$, $\mathrm{logit}(p_2) = b_1 + b_2$, $\mathrm{logit}(p_3) = b_1 + b_3$, $\mathrm{logit}(p_4) = b_1 + b_4$, $\mathrm{logit}(p_5) = b_1 + b_5$. Note: survey 1 is the reference here.

19 Practical 1 - results. Part 1b: regression equations $\mathrm{logit}(p_1) = b_1$ and $\mathrm{logit}(p_j) = b_1 + b_j$ for $j = 2, \dots, 5$. [Table: estimates of $b_1, \dots, b_5$ and the corresponding $p_1, \dots, p_5$] Note: survey 1 is the reference here.
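Under the reference coding of Part 1b, $b_1$ is logit($p_1$) and each $b_j$ ($j > 1$) is the difference between logit($p_j$) and logit($p_1$). The small Python sketch below uses hypothetical detection probabilities. It converts between this coding and the survey-specific probabilities, which shows that both parameterizations describe the same model.

```python
import math

def logit(q):
    return math.log(q / (1.0 - q))

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def to_reference_coding(p):
    """Survey 1 as reference: b1 = logit(p1), bj = logit(pj) - logit(p1)."""
    b1 = logit(p[0])
    return [b1] + [logit(pj) - b1 for pj in p[1:]]

def from_reference_coding(b):
    """Inverse mapping: logit(p1) = b1 and logit(pj) = b1 + bj for j > 1."""
    return [expit(b[0])] + [expit(b[0] + bj) for bj in b[1:]]

p = [0.3, 0.3, 0.5, 0.6, 0.4]          # hypothetical survey-specific p
b = to_reference_coding(p)
print(b)                                # b2 = 0 because p2 equals p1
print(from_reference_coding(b))         # recovers p
```

A $b_j$ close to zero therefore signals that survey $j$ has roughly the same detection probability as survey 1.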

20 Practical 1 - results. Part 1c: regression equations $\mathrm{logit}(p_j) = b_j$ for $j = 1, \dots, 5$. [Table: estimates of $b_1, \dots, b_5$ and the corresponding $p_1, \dots, p_5$]

21 Practical 1 - results. Part 1c: model support. This model is 4.89 AIC units worse than the best model in the set, ψ(.)p(1-2,3-5). Likelihood-ratio test: the statistic is 2 × 3 − 4.89 = 1.11 < 7.81 (χ² critical value at α = 0.05 for 6 − 3 = 3 degrees of freedom). The p-value is > 0.05, i.e. there is no support to reject the null hypothesis ψ(.)p(1-2,3-5).
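Because AIC = −2 log L + 2K, the likelihood-ratio statistic for two nested models can be recovered from their AIC difference plus twice the number of extra parameters. The Python sketch below does this and computes the χ² upper-tail probability using only the standard library, via a closed-form recurrence that holds for integer degrees of freedom. Only AIC differences matter, so the simpler or the better model is placed at a reference value of 0.0. The function names are hypothetical.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1."""
    # Start from the df = 1 or df = 2 closed form, then apply the recurrence
    # Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
    if df % 2 == 1:
        q, k = math.erfc(math.sqrt(x / 2.0)), 1
    else:
        q, k = math.exp(-x / 2.0), 2
    while k < df:
        q += (x / 2.0) ** (k / 2.0) * math.exp(-x / 2.0) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return q

def lrt_from_aic(aic_simple, aic_complex, extra_params):
    """LRT statistic -2 log(L_simple / L_complex) from the AICs of two nested models."""
    return (aic_simple - aic_complex) + 2 * extra_params

# Part 1a: psi(.)p(1-2,3-5) is 2.94 AIC units better than psi(.)p(.), 1 extra parameter
stat_1a = lrt_from_aic(2.94, 0.0, 1)
print(round(stat_1a, 2), round(chi2_sf(stat_1a, 1), 3))
# Part 1c: psi(.)p(t) is 4.89 AIC units worse than psi(.)p(1-2,3-5), 3 extra parameters
stat_1c = lrt_from_aic(0.0, 4.89, 3)
print(round(stat_1c, 2), round(chi2_sf(stat_1c, 3), 3))
```

This relation assumes plain AIC. It does not hold for small-sample corrections such as AICc, because those change the penalty term.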

22 Sample data set 2: Mahoenui giant weta (Deinacrida mahoenui). Endemic to the King Country in New Zealand's North Island. Only 2 surviving populations (the main one in a 240-ha reserve). The weta use gorse plants as protection from predators and as a food source. Goats and cattle used to browse the gorse (and its foliage).

23 Sample data set 2: Mahoenui giant weta. Data set from a 2004 survey (Weta_pg116.xls). Sampling protocol: each site is a 3 m radius circular plot; s = 72 sites surveyed, k = 3-5 replicates per site within a 5-day period; 3 observers (each observer visited each site at least once). Interest is in the effect of browsing on ψ.

24 Practical 2 - results. Part 2a: $\mathrm{logit}(\psi_u) = a_1$, $\mathrm{logit}(\psi_b) = a_2$; ψ_u = 0.481, ψ_b = …

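25 Practical 2 - results. Part 2a: parameterization 1: $\mathrm{logit}(\psi_u) = a_1$, $\mathrm{logit}(\psi_b) = a_2$; ψ_u = 0.481, ψ_b = … Parameterization 2: $\mathrm{logit}(\psi_u) = a_1$, $\mathrm{logit}(\psi_b) = a_1 + a_2$ (so $a_2$ is now the effect of browsing on the logit scale); ψ_u = 0.481, ψ_b = …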

26 Practical 2 - results. Part 2b: set of models

27 Practical 2 - results. Part 2c: probabilities. ψ for a browsed site: … (0.121). p for a site surveyed on day 3 by observer 2: … (0.081).

28 Practical 2 - results. Part 2d: combined model weights for day in p = 0.91; combined model weights for observer in p = 0.73; combined model weights for browse in ψ = …
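The combined (cumulative) weight of a covariate is the sum of the Akaike weights of all models in the set that include it. The following is a minimal Python sketch with a hypothetical model set and hypothetical AIC values. These are not the weta results. Each model is represented as the set of covariate terms it uses.

```python
import math

def akaike_weights(aics):
    """w_i = exp(-delta_i / 2) / sum_j exp(-delta_j / 2), with delta_i = AIC_i - min AIC."""
    best = min(aics)
    rel = [math.exp(-(a - best) / 2.0) for a in aics]
    total = sum(rel)
    return [r / total for r in rel]

def cumulative_weight(models, weights, term):
    """Sum of the Akaike weights of the models that contain `term`."""
    return sum(w for terms, w in zip(models, weights) if term in terms)

# Hypothetical model set: each model is the set of covariates it uses
models = [
    {"psi:browse", "p:day", "p:observer"},
    {"psi:browse", "p:day"},
    {"p:day", "p:observer"},
    {"p:observer"},
]
aics = [260.1, 261.3, 262.0, 266.5]      # hypothetical values
w = akaike_weights(aics)
for term in ("p:day", "p:observer", "psi:browse"):
    print(term, round(cumulative_weight(models, w, term), 2))
```

Combined weights are only comparable between covariates if each covariate appears in a similar number of models in the set.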

29 Practical 2 - results. Part 2e: model averaging. Model-averaged ψ for browsed sites: … (0.127). Model-averaged ψ for unbrowsed sites: … (0.142).
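Model averaging weights each model's estimate by its Akaike weight. One common choice for the accompanying uncertainty is the unconditional standard error of Burnham & Anderson. It adds the spread of the estimates between models to each model's own sampling variance. A Python sketch with hypothetical inputs (the function name is also hypothetical):

```python
import math

def model_average(estimates, ses, weights):
    """Model-averaged estimate and unconditional SE.

    estimates, ses: per-model estimate of the same quantity and its SE
    weights:        Akaike weights (summing to 1)
    """
    avg = sum(w * e for w, e in zip(weights, estimates))
    # Unconditional SE: sum_i w_i * sqrt(se_i^2 + (estimate_i - avg)^2)
    se = sum(w * math.sqrt(s ** 2 + (e - avg) ** 2)
             for w, e, s in zip(weights, estimates, ses))
    return avg, se

# Hypothetical psi estimates for browsed sites from three models
print(model_average([0.35, 0.40, 0.48], [0.12, 0.13, 0.11], [0.5, 0.3, 0.2]))
```

Averaging ψ is meaningful here because ψ has the same interpretation in every model. Averaging regression coefficients would need more care.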

30 Worked examples in PRESENCE. If you want to explore these data sets further, PRESENCE includes a file with exercises and detailed explanations (Help > PRESENCE worked examples and exercises). The occupancy book also discusses the analysis of these data sets (MacKenzie et al. 2006, p. 99 and p. 116).

31 SESSION 2: Occupancy modelling in practice. 2.2 Study design

32 Think first! "To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of." (Ronald Fisher, ca. 1938). "The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data..." (John Tukey, 1986)

33 Think first! Lots of things to think about: why, what, how (Yoccoz et al. 2001). Why carry out this study? (i.e. articulate the objectives) What would be a suitable state variable? How should the study be done? Here we concentrate on the how, but note that it is related to the what and the why. A note on the what: do not artificially force yourself to use a particular framework or model. For example, if the raw data consist of counts, collapsing them to detection/non-detection will inevitably lose information. There is a whole literature on models for count data (e.g. N-mixture models).

34 Design issues in occupancy modelling: How to define a site? How to choose the sites to sample? What is our sampling season? How to obtain replication to estimate detectability? How far apart should replicate visits be? How to avoid heterogeneity? How to allocate survey effort between sites and replicates? How much total effort will be needed? What can I expect to obtain with my available resources?

35 How to define a site? Sometimes there is a natural definition (discrete units), e.g. ponds or habitat patches. In other cases it is more arbitrary (continuous landscape), e.g. plots within a forest.

36 How to define a site? Remember: occupancy is tied to the definition of a site! For the same landscape, $\psi_1 = 2/4 = 0.5$ with 4 large sites, but $\psi_2 = 2/16 = 0.125$ with 16 small sites. An occupancy estimate is meaningless if we do not know how the sampling site was defined.
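The dependence on site definition can be reproduced by aggregating the same presence map at two resolutions. In this Python sketch the landscape is made up to match the numbers above, and the function name is hypothetical. Two occupied cells out of 16 give 0.125. The same two presences seen on a coarser 2 × 2 grid give 2/4 = 0.5.

```python
def occupancy_proportion(presence, block):
    """Proportion of occupied cells after grouping a square grid into block x block cells.

    presence: square 0/1 grid; its side length must be a multiple of block.
    """
    n = len(presence)
    cells = [
        any(presence[i][j]
            for i in range(r, r + block)
            for j in range(c, c + block))
        for r in range(0, n, block)
        for c in range(0, n, block)
    ]
    return sum(cells) / len(cells)

# Hypothetical 4 x 4 landscape with the species present in two fine cells
landscape = [
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 1],
]
print(occupancy_proportion(landscape, 1))   # 16 small sites
print(occupancy_proportion(landscape, 2))   # 4 large sites
```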

37 How to define a site? There is no universal answer; it must be assessed case by case. Things to consider: At what scale do we want to measure occupancy? At what scale is an observed 0 or 1 meaningful? Is the species territorial? What is the size of its home range?

38 How to choose sites to sample? We usually aim to make inferences beyond the specific sites sampled, so sites should be selected in a way that allows results to be generalized. This requires a probabilistic sampling scheme, e.g. random sampling or stratified random sampling. Selecting sites based on knowledge of their occupancy status is in general not a good idea, unless those sites actually represent the population of interest; otherwise, estimates of occupancy for the entire population may be biased.

39 How to choose sites to sample? The best approach depends on the project objectives. For example, consider two possible objectives for an occupancy study: 1. compare occupancy between two specific habitat types in the area; 2. obtain an overall estimate of occupancy for the entire area. For objective 1, an efficient design would identify the areas within the reserve that correspond to the two habitat types and then randomly select sites within them. However, this design would not be appropriate for objective 2, because the sample would not be representative of the whole reserve.

40 What is our sampling season? It is the window of time in which the system is sampled. Sometimes there is a natural definition, e.g. the breeding season or the wet season, but it could be something different: e.g. if we survey once a day for 7 days, our sampling season is 1 week. We need to consider how the species moves, as this influences the biological quantity that the sampling captures. Remember the closure assumption: is there emigration or immigration? Are we interpreting ψ as actual occupancy or as use? If occupancy is used as a surrogate for population size, a shorter season is better (approximately a snapshot); if the interest is in use, a longer season is better.

41 How to obtain replication to estimate detectability? Options: repeated visits to each site at different points in time; multiple surveys within one visit (one observer carrying out several independent surveys, simultaneous independent observers, or simultaneous independent detection methods); spatial replication within the site.

42 How to obtain replication to estimate detectability? Which option is more appropriate depends on the biology of the species and on the factors that affect its detection. If p is constant, repeated surveys within a visit may be more efficient. But if p varies, e.g. from day to day, this approach may induce heterogeneity. Multiple visits allow each site to be surveyed under a range of conditions.

43 How far apart should my replicate visits be? The spacing should give our estimates a useful interpretation: e.g. visits far enough apart to ensure that detections are independent, but close enough together to avoid closure problems. Once again, we need to consider how the species moves. If ψ is interpreted as use, ensure that the species has had the chance to randomly enter or leave the site between visits. If ψ is interpreted as occupancy (a snapshot), keep visits closer together, so that the species is either present or absent at the site during the whole period.

44 How to avoid heterogeneity? Choose the site size so that there are no great differences in abundance between sites. Collect relevant information and include it as covariates, e.g. habitat information or meteorological data. Avoid always sampling the same sites under the same conditions: if possible, rotate!

45 How to avoid heterogeneity? E.g. monitoring of the Alaotran gentle lemur; sources of heterogeneity: observer, time of day, meteorological conditions. [Figure: day 1 survey schedule over time slots T1-T3, with the village marked] (Guillera-Arroita et al. 2010a)

46 How to avoid heterogeneity? E.g. monitoring of the Alaotran gentle lemur; sources of heterogeneity: observer, time of day, meteorological conditions. [Figure: day 2 survey schedule over time slots T1-T3, with the village marked] (Guillera-Arroita et al. 2010a)

47 How to allocate my survey effort? How should effort be allocated between sites and replicate surveys? Is it better to visit fewer sites and carry out more surveys at each one, or to cover more sites with fewer visits? [Figure: the same total effort spent as more sites × 3 surveys or as fewer sites × 6 surveys] Trade-off: more sites (s↑) reduces var(ψ̂). BUT with fewer replicates (k↓) we are more likely to miss the species at occupied sites, and p is less well estimated, which increases var(ψ̂).

48 How to allocate my survey effort? We can obtain guidelines by looking at the estimator properties for the constant occupancy model (no covariates). These are based on large-sample (asymptotic) approximations. The optimal allocation depends on the actual ψ and p, so we need some estimates of those! A sensitivity analysis can be a good idea. We also need an indication of the maximum number of surveys that can be conducted, and the level of acceptable precision (part of the objective).

49 How to allocate my survey effort? We want a good estimate of ψ, so let's look at the estimator variance for different designs (s, k). Assume a standard survey design (i.e. s sites, each visited k times). For this model there is an explicit expression: $\mathrm{var}(\hat\psi) = \frac{\psi}{s}\left[(1-\psi) + \frac{1-p^*}{p^* - kp(1-p)^{k-1}}\right]$, where $p^* = 1-(1-p)^k$ is the probability of detection in at least one visit. The term $(1-\psi)$ comes from the binomial experiment; the second term is the extra variance due to imperfect detection.
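The expression above is straightforward to evaluate directly. Below is a minimal Python transcription with hypothetical function names. It is valid for k ≥ 2: with a single survey, ψ and p cannot be separated, and the denominator of the second term becomes zero.

```python
def p_star(p, k):
    """Probability of at least one detection in k surveys of an occupied site."""
    return 1.0 - (1.0 - p) ** k

def asymptotic_var_psi(psi, p, s, k):
    """Asymptotic variance of psi-hat for model psi(.)p(.) in a standard design (s sites, k surveys)."""
    if k < 2:
        raise ValueError("need k >= 2 replicate surveys to separate psi and p")
    ps = p_star(p, k)
    binomial_part = 1.0 - psi                                          # perfect detection
    detection_part = (1.0 - ps) / (ps - k * p * (1.0 - p) ** (k - 1))  # cost of imperfect detection
    return psi / s * (binomial_part + detection_part)

print(asymptotic_var_psi(0.6, 0.3, 145, 6))   # close to the 0.05**2 = 0.0025 target
print(asymptotic_var_psi(0.6, 1.0, 145, 6))   # same design with perfect detection
```

With p = 1 the detection term vanishes, and the expression reduces to the binomial variance ψ(1 − ψ)/s.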

50 How to allocate my survey effort? Fixed total effort E = s k. [Figure: asymptotic variance of ψ̂ against the number of replicates k for E = 2000, with curves for ψ = 0.4, p = 0.3; ψ = 0.4, p = 0.6; and a ψ = 0.7 scenario]

51 How to allocate my survey effort? Note that the optimal replication (k) depends only on ψ and p, not on the total effort E. [Figure: asymptotic variance of ψ̂ against the number of replicates for E = 2000 and E = 1000, with the same (ψ, p) scenarios in both panels]

52 How to allocate my survey effort? [Table: optimal k as a function of ψ and p] Rare species: more sites, surveyed less intensively. Common species: fewer sites, surveyed more intensively (MacKenzie & Royle 2005).
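The optimal number of replicates can be found by a direct search. With s = E/k (allowing non-integer s as an approximation), the variance is ψk/E times a term that depends only on ψ, p and k, so E cancels from the comparison. The Python sketch below, with hypothetical function names, checks this numerically. For ψ = 0.6 and p = 0.3 it returns the k = 6 used in the example that follows.

```python
def asymptotic_var_psi(psi, p, s, k):
    """Asymptotic variance of psi-hat for model psi(.)p(.), standard design, k >= 2."""
    ps = 1.0 - (1.0 - p) ** k
    return psi / s * ((1.0 - psi) + (1.0 - ps) / (ps - k * p * (1.0 - p) ** (k - 1)))

def optimal_k(psi, p, total_effort, k_max=50):
    """Number of replicates minimising var(psi-hat) when s = E / k."""
    return min(range(2, k_max + 1),
               key=lambda k: asymptotic_var_psi(psi, p, total_effort / k, k))

for psi in (0.2, 0.6, 0.8):
    print(psi, optimal_k(psi, 0.3, 2000), optimal_k(psi, 0.3, 1000))
```

For a fixed p, the search returns fewer replicates for low ψ than for high ψ, which matches the rare-versus-common pattern stated above.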

53 How to allocate my survey effort? We have assumed a constant cost per survey (E = k s), but other scenarios are possible, e.g. repeat surveys at a site could be cheaper than the first. MacKenzie & Royle (2005) explore different cost scenarios: the results on optimal replication are reasonably robust to the effect of cost. We used the precision of the occupancy estimator as the design criterion. Guillera-Arroita et al. (2010b) explore other optimality criteria that incorporate the precision of the detection probability estimator, and broadly the same patterns arise.

54 How to allocate my survey effort? Once k is chosen, what about s? That depends on the precision we want to achieve! Study constraints: the maximum survey effort available and the minimum acceptable estimator precision. Two approaches: A) the best estimator with the given available effort; B) a good-enough estimator with the minimum possible effort.

55 How to allocate my survey effort? Approach A: the best estimator with the given available effort. Use all the effort: $s = E/k$. Check: do we achieve the minimum precision with this design? $\mathrm{var}(\hat\psi) = \frac{\psi}{s}\left[(1-\psi) + \frac{1-p^*}{p^* - kp(1-p)^{k-1}}\right]$, with $p^* = 1-(1-p)^k$.

56 How to allocate my survey effort? Approach B: a good-enough estimator with minimum effort. Derive s given ψ, p and k from the variance expression: $s = \frac{\psi}{\mathrm{var}(\hat\psi)}\left[(1-\psi) + \frac{1-p^*}{p^* - kp(1-p)^{k-1}}\right]$, with $p^* = 1-(1-p)^k$. Check: is the design (s, k) in line with the maximum effort available?

57 How to allocate my survey effort? E.g. we expect ψ ≈ 0.6 and p ≈ 0.3, and want to achieve SE(ψ̂) = 0.05. Optimal k = 6 replicates per site ($p^* = 0.88$).

58 How to allocate my survey effort? E.g. we expect ψ ≈ 0.6 and p ≈ 0.3, and want to achieve SE(ψ̂) = 0.05. Optimal k = 6 replicates per site ($p^* = 0.88$). $s = \frac{\psi}{\mathrm{var}(\hat\psi)}\left[(1-\psi) + \frac{1-p^*}{p^* - kp(1-p)^{k-1}}\right]$ and $E = k\,s$.

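59 How to allocate my survey effort? E.g. we expect ψ ≈ 0.6 and p ≈ 0.3, and want to achieve SE(ψ̂) = 0.05. Optimal k = 6 replicates per site ($p^* = 0.88$). $s = \frac{0.6}{0.05^2}\left[(1-0.6) + \frac{1-0.88}{0.88 - 6 \times 0.3 \times 0.7^{5}}\right] \approx 146$ sites, so $E = k\,s = 6 \times 146 = 876$ surveys. With perfect detection ($p = 1$): $s = \psi(1-\psi)/\mathrm{var}(\hat\psi) = 0.6 \times 0.4 / 0.05^2 = 96$ sites.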

60 How to allocate my survey effort? E.g. we expect ψ ≈ 0.6 and p ≈ 0.3, and want to achieve SE(ψ̂) = 0.05. What if we use k = 3 replicates per site? ($p^* = 0.66$)

61 How to allocate my survey effort? E.g. we expect ψ ≈ 0.6 and p ≈ 0.3, and want to achieve SE(ψ̂) = 0.05. What if we use k = 3 replicates per site? ($p^* = 0.66$) $s = \frac{0.6}{0.05^2}\left[(1-0.6) + \frac{1-0.66}{0.66 - 3 \times 0.3 \times 0.7^{2}}\right]$ and $E = k\,s$.
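Approach B amounts to solving the variance expression for s. The Python sketch below, with a hypothetical function name, computes the unrounded s for the worked example with k = 6 and k = 3, plus the perfect-detection case. Rounding s up gives the number of sites, and E = k s gives the total effort. Hand calculations that round intermediate values (e.g. p* = 0.88) can differ from this by about one site.

```python
import math

def sites_needed(psi, p, k, target_se):
    """Number of sites s giving var(psi-hat) = target_se**2 (asymptotic, model psi(.)p(.))."""
    ps = 1.0 - (1.0 - p) ** k
    bracket = (1.0 - psi) + (1.0 - ps) / (ps - k * p * (1.0 - p) ** (k - 1))
    return psi / target_se ** 2 * bracket

for k in (6, 3):
    s = sites_needed(0.6, 0.3, k, 0.05)
    n_sites = math.ceil(s)
    print(f"k={k}: s = {s:.1f} -> {n_sites} sites, E = k*s = {k * n_sites} surveys")

# Perfect detection (p = 1) removes the second term: s = psi (1 - psi) / var
print(f"p=1: s = {sites_needed(0.6, 1.0, 6, 0.05):.1f}")
```

With k = 3 the required effort is much larger than with k = 6. Too few replicates leave many occupied sites looking empty and make p harder to estimate.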

62 Non-standard designs. Other designs explored by MacKenzie & Royle (2005): Double sampling design: repeated surveys at a subset of sites, with the rest surveyed only once. In general it is not more efficient. Removal design: stop surveying a site at the first detection. Repetition helps (i) establish occupancy status and (ii) estimate p. The removal design can be slightly more efficient (especially for high p and high ψ), but it provides less flexibility for modelling and may be less robust to heterogeneity in p.

63 Further considerations. Simulations as a tool for design: the expressions and tables shown are based on approximations, which may break down if the sample size is small. They provide useful guidance, but it is recommended to verify the properties of the chosen design via simulation. Designing requires some idea of the parameter values, and pilot studies can provide helpful information. An optimal design is not necessarily a robust design: e.g. what if there is heterogeneity in p? In that case a larger number of replicates may be better.

64 An R script to evaluate the standard ψ(.)p(.) design. Call:
source("occdesign1sp.r")  # only needed once
myres <- evaldesign(psi = 0.5, p = 0.3, s = 30, k = 3, nits = 10000, doprint = 1, doplot = 1)
Output: information on estimator properties (bias, variance and MSE), and an intuitive plot showing the distribution of the estimator (Guillera-Arroita et al. 2010b).

65 An R script to evaluate the standard ψ(.)p(.) design: ψ = 0.8, p = 0.7, s = 100, k = 4. [Figure: simulated ψ̂ against p̂, with the simulated variance, MSE and SE of ψ̂ and the asymptotic variance; boundary estimates: 0%]

66 An R script to evaluate the standard ψ(.)p(.) design: ψ = 0.5, p = 0.7, s = 100, k = 4. [Figure: simulated ψ̂ against p̂, with the simulated variance, MSE and SE of ψ̂ and the asymptotic variance; boundary estimates: 0%]

67 An R script to evaluate the standard ψ(.)p(.) design: ψ = 0.5, p = 0.3, s = 100, k = 4. [Figure: simulated ψ̂ against p̂, with the simulated variance, MSE and SE of ψ̂ and the asymptotic variance; boundary estimates: 0%]

68 An R script to evaluate the standard ψ(.)p(.) design: ψ = 0.5, p = 0.3, s = 30, k = 4. [Figure: simulated ψ̂ against p̂, with the simulated variance, MSE and SE of ψ̂ and the asymptotic variance; boundary estimates: 2%]

69 An R script to evaluate the standard ψ(.)p(.) design: ψ = 0.2, p = 0.3, s = 30, k = 4. [Figure: simulated ψ̂ against p̂, with the simulated variance, MSE and SE of ψ̂ and the asymptotic variance; boundary estimates: 11%]

70 An R script to evaluate the standard ψ(.)p(.) design: ψ = 0.2, p = 0.3, s = 30, k = 3. [Figure: simulated ψ̂ against p̂, with the simulated variance, MSE and SE of ψ̂ and the asymptotic variance; boundary estimates: 18%]

71 An R script to evaluate the standard ψ(.)p(.) design. [Figure: two panels of simulated ψ̂ against p̂, for ψ = 0.5, p = 0.7, s = 100, k = 4 and for ψ = 0.5, p = 0.7, s = 200, k = …, each with the variance, MSE and SE of ψ̂]
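The R script itself is not reproduced here. As an illustration of the same idea, here is a hypothetical Python sketch whose function names are made up and are not those of occdesign1sp.r. It simulates data sets from ψ(.)p(.), fits each one by maximum likelihood, and summarises the estimator. It relies on the fact that, for this model in a standard design, the MLE has a near-closed form. Let s_D be the number of sites with detections and d the total number of detections. Then p̂ solves p/p* = d/(s_D k) and ψ̂ = s_D/(s p̂*). When that ψ̂ exceeds 1, ψ̂ is set to 1 and p̂ = d/(s k). In this sketch "boundary" is taken to mean ψ̂ = 1. Data sets with no detections at all are excluded, which is an assumption of the sketch.

```python
import random

def fit_psi_p(s, k, s_d, d):
    """MLE of (psi, p) for model psi(.)p(.) from sufficient statistics (k >= 2, s_d > 0)."""
    r = d / (s_d * k)

    def g(q):                               # p / p*, increasing from 1/k to 1
        return q / (1.0 - (1.0 - q) ** k)

    lo, hi = 1e-9, 1.0
    if r >= 1.0:
        p_hat = 1.0
    elif r <= g(lo):
        p_hat = lo
    else:
        for _ in range(60):                 # bisection for g(p) = r
            mid = (lo + hi) / 2.0
            if g(mid) < r:
                lo = mid
            else:
                hi = mid
        p_hat = (lo + hi) / 2.0
    psi_hat = s_d / (s * (1.0 - (1.0 - p_hat) ** k))
    if psi_hat >= 1.0:                      # boundary estimate
        return 1.0, d / (s * k)
    return psi_hat, p_hat


def evaluate_design(psi, p, s, k, nits=10000, seed=1):
    """Simulate nits data sets and summarise the sampling distribution of psi-hat."""
    rng = random.Random(seed)
    estimates, no_detections = [], 0
    for _ in range(nits):
        s_d = d = 0
        for _ in range(s):
            if rng.random() < psi:                           # site occupied
                y = sum(rng.random() < p for _ in range(k))  # detections at this site
                if y > 0:
                    s_d += 1
                    d += y
        if s_d == 0:
            no_detections += 1                               # psi-hat undefined: skipped
            continue
        estimates.append(fit_psi_p(s, k, s_d, d)[0])
    n = len(estimates)
    mean = sum(estimates) / n
    ps = 1.0 - (1.0 - p) ** k
    return {
        "bias": mean - psi,
        "var": sum((e - mean) ** 2 for e in estimates) / (n - 1),
        "mse": sum((e - psi) ** 2 for e in estimates) / n,
        "boundary": sum(e == 1.0 for e in estimates) / n,
        "asymptotic_var": psi / s * ((1.0 - psi) + (1.0 - ps) / (ps - k * p * (1.0 - p) ** (k - 1))),
        "no_detections": no_detections,
    }


print(evaluate_design(psi=0.2, p=0.3, s=30, k=3, nits=2000))
```

Comparing "var" with "asymptotic_var", and checking the share of boundary estimates, shows whether the asymptotic design formulas can be trusted at the planned sample size.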

72 GENPRES: a tool for occupancy survey design

73 Survey design exercise: using simulations for survey design for the rolling giraffe (Neogiraffa rotatoria). Survey: a visit to a 1 ha plot. Six plot visits can be carried out per day. Total survey effort allocated to the study: 60 days. Based on the findings of a previous similar study, we assume ψ ≈ 0.3-0.4 and p ≈ 0.4-0.6.

74 Survey design exercise (solutions): variance of ψ̂ based on 10000 simulations, for ψ = 0.3 and ψ = 0.4, p = 0.4, 0.5 and 0.6, and k = 2, 3 and 4. Total effort E = 6 × 60 = 360 surveys, so s = 180 (k = 2), 120 (k = 3) or 90 (k = 4). [Table: simulated var(ψ̂) for each combination]

75 Survey design exercise (solutions): k = 2 is not so good; k = 3 or 4 is a better compromise. If we choose k = 4, the most restrictive case (highest variance) is ψ = 0.4, p = 0.4, with var(ψ̂) = 0.0039. This is higher than the target (SE(ψ̂) = 0.05, i.e. var(ψ̂) = 0.0025). To achieve that target with this setup (ψ, p, k), we'll need s = 135 sites (using the formula), i.e. a total survey effort of s × k = 135 × 4 = 540 surveys, or 90 days at six surveys per day.
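The numbers in this solution can be cross-checked with the asymptotic formula. In the Python sketch below, whose helper names are hypothetical, the full budget of 360 surveys with k = 4 gives s = 90 sites. The resulting asymptotic variance is about 0.0038, a little below the simulated 0.0039 and well above the 0.0025 target. Solving for s then gives about 135.5 sites, and 135 sites × 4 surveys at six surveys per day take 90 days.

```python
def bracket(psi, p, k):
    """Term in square brackets of the asymptotic variance of psi-hat (model psi(.)p(.))."""
    ps = 1.0 - (1.0 - p) ** k
    return (1.0 - psi) + (1.0 - ps) / (ps - k * p * (1.0 - p) ** (k - 1))

def days_needed(s, k, surveys_per_day=6):
    """Field days required for s sites x k surveys at a fixed daily survey rate."""
    return s * k / surveys_per_day

psi, p, k, target_se = 0.4, 0.4, 4, 0.05
budget = 6 * 60                                   # 360 surveys in 60 days
s_budget = budget // k                            # 90 sites
var_budget = psi / s_budget * bracket(psi, p, k)  # asymptotic var(psi-hat)
s_target = psi / target_se ** 2 * bracket(psi, p, k)
print(f"s = {s_budget}: var = {var_budget:.4f} (target {target_se ** 2:.4f})")
print(f"sites for target: {s_target:.1f}; days for 135 sites: {days_needed(135, k):.0f}")
```

The small gap between the asymptotic and simulated variances at s = 90 is a reminder that the formula is a large-sample approximation.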
