Hierarchical models
Hierarchical models
  Represent processes and observations that span multiple levels (also known as multilevel models).
  [Diagram: regions $R_1$, $R_2$, $R_3$ above plots $N_1$ through $N_9$]
  $N_i$ = true abundance on a plot. Consider factors that govern abundance at the plot scale.
  $R_j$ = true abundance in a region. Consider factors that govern abundance at the regional scale.
  Consider processes important at each scale or at many scales.
Hierarchical models
  Add additional levels.
  [Diagram: $R_1$ above $N_1$, $N_2$, $N_3$, with ρ and λ labeled as state processes; each $N$ has observations $o_1, o_2, \ldots, o_n$, with p labeled as the observation process]
  Define parameters for each level.
  Hierarchical, because parameters at one level govern parameters at the lower level.
Two-level hierarchical model
  Level 1: $y_{ij} \sim N(\theta_i, \sigma_i^2)$, where i = sites and j = surveys.
  Level 2: $\theta_i \sim N(\theta, \sigma^2)$.
  Key idea: consider an attribute of a sample unit, $\theta_i$, as having been drawn from an underlying distribution. We don't estimate a separate $\theta_i$ for each sample unit; instead we estimate the parameters of the distribution from which the $\theta_i$ were drawn.
  The parameters of interest are $\theta$ and $\sigma^2$, which in this case are the mean and variance of the distribution of the $\theta_i$; we estimate these from data.
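As a rough illustration of the two levels, the Python sketch below simulates site means from the level 2 distribution and surveys from the level 1 distribution, then recovers θ and σ² with simple moment estimators. The function names, the single shared observation SD `sigma_obs` (the slide allows a site-specific variance), and the balanced design are assumptions made for illustration; moment estimators stand in here for the likelihood or Bayesian fitting a real analysis would more likely use.

```python
import random
import statistics


def simulate_two_level(n_sites, n_surveys, theta, sigma, sigma_obs, seed=1):
    """Simulate y_ij ~ N(theta_i, sigma_obs^2) with theta_i ~ N(theta, sigma^2)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_sites):
        theta_i = rng.gauss(theta, sigma)  # level 2: site mean
        data.append([rng.gauss(theta_i, sigma_obs) for _ in range(n_surveys)])  # level 1
    return data


def estimate_upper_level(data):
    """Moment estimates of theta and sigma^2 (assumes equal surveys per site)."""
    n_surveys = len(data[0])
    site_means = [statistics.fmean(row) for row in data]
    within_var = statistics.fmean(statistics.variance(row) for row in data)
    theta_hat = statistics.fmean(site_means)
    # Var(site mean) = sigma^2 + sigma_obs^2 / n_surveys, so subtract the second term.
    sigma2_hat = max(0.0, statistics.variance(site_means) - within_var / n_surveys)
    return theta_hat, sigma2_hat
```

Note that no individual site mean is reported: only the mean and variance of the distribution they came from.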
Two-level hierarchical model
  Key idea: estimate parameters of the upper-level distribution assumed to govern the processes that give rise to data observed at lower levels.
  Parameters from all levels are estimated simultaneously.
  This is important because uncertainty at one level affects inferences at other levels.
  Most alternative modeling frameworks do not allow us to model state and observation processes simultaneously.
    Modeling density with Program DISTANCE?
    Modeling abundance in Program MARK?
Hierarchical models
  Two common types:
    1) Latent variable models
    2) Mixed effects models
  [Diagram: $N_3$ (parameter λ) with observations $o_1, o_2, \ldots, o_n$ (parameter p)]
Hierarchical models in ecology
  Ecological process
    Model for describing state variables (latent or unobserved): abundance, occupancy, survival
    Parameters: λ, ψ, φ
    Site / individual covariates
  Observation process
    Model for describing the detection process
    Parameter: p
    Site / individual covariates
    Survey covariates
  Realized data: $y_1, y_2, y_3, \ldots, y_n$
Imperfect observations
Imperfect observations
  We wish to estimate the abundance of a species on a plot, $N_i$.
  Use a survey method that yields counts on plots, $C_i$; e.g., point counts, line transects, removals.
  The probability that we observe an individual that is present, β, is often < 1.
  The number of individuals counted is related to true abundance: $C = \beta N$, where β ranges from 0 to 1.
  Translate C into an estimate of abundance: $\hat{N} = C / \beta$.
  Example: count 5 quail on a plot; if β = 0.25, then $\hat{N} = 5 / 0.25 = 20$.
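The count-to-abundance conversion can be sketched in a few lines of Python. The function name and the input check are illustrative assumptions; the arithmetic is the quail example from the slide.

```python
def estimate_abundance(count, detection_prob):
    """Return the abundance estimate N_hat = C / beta."""
    if not 0 < detection_prob <= 1:
        raise ValueError("detection_prob must be in (0, 1]")
    return count / detection_prob


n_hat = estimate_abundance(5, 0.25)  # quail example: 5 counted, beta = 0.25
print(n_hat)  # 20.0
```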
Occupancy, single season
Presence-absence data
  Classifying a species as present or absent across space is the basis for studying biogeography (the study of distributions) and many types of habitat analyses.
  Changes in presence-absence status over time are the basis for patch dynamics and metapopulation dynamics.
  Problem: when the detection process is imperfect, we cannot distinguish non-detection from absence.
  Estimates of the area occupied will be biased.
What is occupancy?
  Occupancy: proportion of area, patches, or other sample units occupied by a species.
  Probability of occupancy: probability (ψ) that any given unit within a sampling frame is occupied.
  Single-season goal: estimate ψ when p < 1 during a single season.
  Multi-season goal: dynamics = colonization and extinction.
Changes in geographic range
  Has purple loosestrife spread across the Lake Erie basin?
  If so, how fast?
  Are eradication methods working?
Habitat relationships and resource selection
  Identify habitat features associated with selection.
  Classify presence-absence of a species on sample units, then assess with logistic regression.
  This approach does not account for false absences (imperfect detection).
Occupancy as a parameter
  Trade-offs:
    Not as sensitive as abundance to changes over time.
    [Figure: Years 1, 2, and 3 with abundances of 10, 5, and 1; ψ = 1]
    The value of ψ is a function of the size of sample units (sites).
    [Figure: two grids, ψ = 4/4 = 1.0 and ψ = 9/25 = 0.36]
Basic sampling scheme
  Select a sample of s units ("sites") from a larger set of S units (population).
  Survey each site K times and record whether the species of interest is detected or not (temporal replication).
  Resurvey all sites in the sample, even those where the species was detected previously; this forms the basis for estimating detection probability.
  Sampling can be direct (visual) or indirect (tracks).
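Occupancy: hierarchical structure
  [Diagram: Season 1 contains sites 1, 2, ..., S; site 1 has surveys 1, 2, ..., $K_1$ and site 2 has surveys 1, 2, ..., $K_2$; closure applies within the season]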
Encounter histories
  0, 1, 0, 1, 1
  1, 0, 1, 1, 1
  0, 0, 0, 0, 0
  [Figure legend: detection, no detection]
Encounter histories
  Survey results: 1 = detected, 0 = not detected
  Survey history for each site:
    Site ID  1  2  3  4
    A        0  1  1  0
    B        1  1  0  0
    C        0  0  0  0
    D        0  1  0  1
    E        1  1  0  0
    F        0  0  0  0
    G        1  1  0  1
  When surveys are complete, we have two types of sites:
    Detection: occupied
    No detection: not occupied, or occupied but not detected
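A minimal Python sketch of this classification, using the seven survey histories from the table. The dictionary layout and the function name are illustrative assumptions.

```python
# Encounter histories from the table: 1 = detected, 0 = not detected.
histories = {
    "A": [0, 1, 1, 0],
    "B": [1, 1, 0, 0],
    "C": [0, 0, 0, 0],
    "D": [0, 1, 0, 1],
    "E": [1, 1, 0, 0],
    "F": [0, 0, 0, 0],
    "G": [1, 1, 0, 1],
}


def split_by_detection(histories):
    """Separate sites with at least one detection from sites with none."""
    detected = [site for site, h in histories.items() if any(h)]
    never_detected = [site for site, h in histories.items() if not any(h)]
    return detected, never_detected


detected, never_detected = split_by_detection(histories)
print("Known occupied:", detected)
print("Unoccupied, or occupied but missed:", never_detected)
```

The data alone cannot say which of the two explanations applies to a site in the second group; that is what the occupancy model resolves probabilistically.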
Ideas underlying estimates
    Site  Survey 1  Survey 2  Survey 3  Survey 4
    1     0         1         1         0
    2     1         1         1         1
    3     1         0         0         0
    4     0         0         0         0
  If surveys were perfect, 0 0 0 0 would indicate true absence, so we could estimate ψ as the proportion of sites with ≥ 1 detection.
  Naïve estimate of ψ = 3/4 = 0.75
  If surveys are imperfect, estimate p from sites with ≥ 1 detection:
  p = (0.50 + 1.00 + 0.25) / 3 = 0.58
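The two back-of-the-envelope calculations can be reproduced with a short Python sketch. The function names are illustrative assumptions; the data are the four site histories from the table.

```python
surveys = [
    [0, 1, 1, 0],  # site 1
    [1, 1, 1, 1],  # site 2
    [1, 0, 0, 0],  # site 3
    [0, 0, 0, 0],  # site 4
]


def naive_occupancy(surveys):
    """Proportion of sites with at least one detection."""
    return sum(1 for h in surveys if any(h)) / len(surveys)


def ad_hoc_detection(surveys):
    """Mean per-site detection rate, using only sites with >= 1 detection."""
    rates = [sum(h) / len(h) for h in surveys if any(h)]
    return sum(rates) / len(rates)


print(naive_occupancy(surveys))              # 0.75
print(round(ad_hoc_detection(surveys), 2))   # 0.58
```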
Estimate ψ and p
  Use a model-based approach to estimate occupancy and detection parameters simultaneously.
  Consider two stochastic processes:
    Occupancy: a site is either occupied with probability ψ or unoccupied with probability 1 − ψ.
    Detection: if a site is unoccupied, the species cannot be detected; if a site is occupied, then at each survey there is some probability of detecting the species (p).
  Probabilities for a single survey:
    Species detected: ψp
    Species not detected: 1 − ψ (site unoccupied) or ψ(1 − p) (site occupied, species missed)
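Extending the single-survey probabilities to a full encounter history gives the likelihood contribution of one site. The Python sketch below assumes ψ and p are constant and surveys are independent given occupancy; the function name is hypothetical.

```python
def history_likelihood(history, psi, p):
    """Probability of one site's encounter history (list of 0/1 values).

    Assumes constant psi and p, and independent surveys given occupancy.
    """
    detections = sum(history)
    n_surveys = len(history)
    # Site occupied, with this particular pattern of detections and misses.
    occupied = psi * p**detections * (1 - p) ** (n_surveys - detections)
    if detections > 0:
        return occupied  # a detection rules out the unoccupied case
    # All zeros: either occupied and always missed, or truly unoccupied.
    return occupied + (1 - psi)


print(history_likelihood([0, 1, 1, 0], psi=0.5, p=0.5))  # 0.03125
print(history_likelihood([0, 0, 0, 0], psi=0.5, p=0.5))  # 0.53125
```

The all-zero history is the only one that receives the extra 1 − ψ term, which is how the model separates non-detection from absence.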
Binomial distribution
  Discrete distribution. Represents the outcome of a number of independent Bernoulli trials (events with two possible outcomes).
  Notation: Bin(n, p)
  Parameters: n = number of trials, p = probability of success on each trial
  [Figure: binomial distributions for n = 20 with p = 0.1 (blue), p = 0.5 (green), and p = 0.8 (red)]
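For reference, the binomial probability mass function can be computed directly. This Python sketch uses the n = 20 and p values from the figure; the function name is an illustrative assumption.

```python
import math


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


# The three distributions from the figure: n = 20 with p = 0.1, 0.5, 0.8.
for prob in (0.1, 0.5, 0.8):
    pmf = [binomial_pmf(k, 20, prob) for k in range(21)]
    mode = max(range(21), key=lambda k: pmf[k])
    print(f"p = {prob}: most likely number of successes = {mode}")
```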
Occupancy: single season
  Ecological process: $Z_i \sim \text{Bin}(1, \psi)$
    $Z_i$ = unobservable true occupancy (state)
    Bin = binomial distribution
    ψ = probability of occupancy
  Observation process: $y_{ij} \sim \text{Bin}(1, Z_i \, p)$
    $y_{ij}$ = observed outcome
    Bin = binomial distribution
    $Z_i$ = unobservable true state of occupancy
    p = probability of detection
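One way to get a feel for the two processes is to simulate them. In this Python sketch the function name, argument names, and seed are assumptions; the structure follows the two distributions directly, with constant ψ and p.

```python
import random


def simulate_occupancy(n_sites, n_surveys, psi, p, seed=0):
    """Simulate Z_i ~ Bin(1, psi) and y_ij ~ Bin(1, Z_i * p)."""
    rng = random.Random(seed)
    # Ecological process: latent occupancy state of each site.
    z = [1 if rng.random() < psi else 0 for _ in range(n_sites)]
    # Observation process: detection is possible only where Z_i = 1.
    y = [[1 if rng.random() < zi * p else 0 for _ in range(n_surveys)] for zi in z]
    return z, y


z, y = simulate_occupancy(n_sites=100, n_surveys=4, psi=0.6, p=0.3, seed=42)
true_prop = sum(z) / len(z)
naive_prop = sum(1 for row in y if any(row)) / len(y)
print(f"true proportion occupied = {true_prop}, naive estimate = {naive_prop}")
```

Because unoccupied sites can only produce zeros, the naive proportion can never exceed the true proportion occupied in a simulated data set.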
Logistic regression
  Binary response, so represent the response (stochastic part) with a binomial distribution; the mean is a probability or proportion (p).
  Binomial distribution: $y \sim \text{Bin}(N, p)$, where y = observed outcome, N = number of trials, and p = Prob(occupancy) or Prob(detection).
  The link function is the logit (log odds): $\text{logit}(p) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots$
  Occupancy state: $\text{logit}(\psi_i)$, i = 1, ..., number of sites
  Observation process: $\text{logit}(p_{ij})$, j = 1, ..., number of visits per site
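The logit link and its inverse are short enough to write out. This Python sketch uses hypothetical function names; the inverse maps any value of the linear predictor back to a probability between 0 and 1.

```python
import math


def logit(prob):
    """Log odds of a probability strictly between 0 and 1."""
    return math.log(prob / (1 - prob))


def inv_logit(x):
    """Map a linear predictor value back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


print(inv_logit(0.0))  # 0.5
print(logit(0.5))      # 0.0
```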
Assumptions
  Species are never falsely detected when absent.
  Detection of a species at a site is independent of detecting the species at other sites.
  Sites are closed to changes in occupancy state during the survey period (no colonization or extinction).
  ψ and p are constant across sites, unless heterogeneity in parameters is explained by covariates.
Accounting for heterogeneity with covariates
  Consider additional factors to explain variation in ψ and p.
  ψ can be modeled as a function of site-level covariates.
    Covariates for ψ must remain constant during the survey period; e.g., plant community, patch size.
  p can be modeled as a function of:
    site-level covariates; e.g., vegetation cover
    survey-level covariates; e.g., cloud cover, air temperature, observer
Covariates
  Two types:
    Site-level covariates (for ψ and p)
    Observation-level covariates (for p)
            Surv.1  Surv.2  Surv.3  Surv.4  Buffel%  Time.1  Time.2  Time.3  Time.4
    Site 1  0       1       1       0       40       M       E       M       E
    Site 2  1       1       1       1       60       E       M       E       M
    Site 3  1       0       0       0       20       E       M       E       M
    Site 4  0       0       0       0       10       M       E       M       E
  M = morning, E = evening
Adding covariates
  Extend models with the generalized linear modeling framework, which allows us to model linear functions of covariates regardless of the distribution of the response.
  Ecological and observation processes: $y_{ij} \sim \text{Bin}(N_i, p_{ij})$
  $\text{logit}(p) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n$
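To make the linear predictor concrete, the Python sketch below turns a site-level covariate (percent buffel cover) and a survey-level covariate (morning or evening, coded M and E) into probabilities. All four coefficients are hypothetical values chosen only for illustration, not estimates from any data, and the function names are assumptions.

```python
import math


def inv_logit(x):
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical coefficients, for illustration only (not fitted values).
BETA0_PSI, BETA_BUFFEL = -1.0, 0.05   # occupancy intercept and buffel slope
BETA0_P, BETA_EVENING = 0.0, -0.5     # detection intercept and evening effect


def occupancy_prob(buffel_pct):
    """psi_i from a site-level covariate."""
    return inv_logit(BETA0_PSI + BETA_BUFFEL * buffel_pct)


def detection_probs(times):
    """p_ij for each survey from a survey-level covariate ('M' or 'E')."""
    return [inv_logit(BETA0_P + BETA_EVENING * (t == "E")) for t in times]


# A site with 40% buffel cover surveyed morning, evening, morning, evening.
print(occupancy_prob(40))
print(detection_probs(["M", "E", "M", "E"]))
```

The site-level covariate yields one ψ per site, while the survey-level covariate yields one p per visit.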
Run models to estimate parameters
  For estimates based on maximum likelihood methods:
    Code directly in R
    Use the unmarked package in R
  For estimates based on Bayesian methods:
    WinBUGS
    OpenBUGS
    JAGS
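To show what coding the maximum likelihood approach directly involves, here is a rough sketch in Python rather than R. It assumes constant ψ and p and replaces a numerical optimizer with a crude grid search, so it is illustrative only; the function names are hypothetical, and the example data are the four site histories from the "Ideas underlying estimates" slide.

```python
import math


def log_likelihood(histories, psi, p):
    """Log-likelihood of the constant single-season occupancy model."""
    total = 0.0
    for h in histories:
        d, k = sum(h), len(h)
        lik = psi * p**d * (1 - p) ** (k - d)  # occupied, this detection pattern
        if d == 0:
            lik += 1 - psi  # never detected: site may also be unoccupied
        total += math.log(lik)
    return total


def grid_mle(histories, step=0.01):
    """Grid-search stand-in for an optimizer; returns (psi_hat, p_hat)."""
    n = round(1 / step)
    grid = [i * step for i in range(1, n)]  # open interval, avoids log(0)
    return max(
        ((psi, p) for psi in grid for p in grid),
        key=lambda pair: log_likelihood(histories, *pair),
    )


histories = [[0, 1, 1, 0], [1, 1, 1, 1], [1, 0, 0, 0], [0, 0, 0, 0]]
psi_hat, p_hat = grid_mle(histories)
print(f"psi_hat = {psi_hat:.2f}, p_hat = {p_hat:.2f}")
```

With only four sites the estimates are very imprecise; the point is that ψ and p are estimated together from the same likelihood.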
Fitting models in unmarked
  Develop and fit a set of candidate models for the state variable (here, occupancy) and the detection process. In occu(), the first formula describes detection and the second describes occupancy:
    library(unmarked)
    robject <- occu(~detect ~occupancy, UMF)
    time.buff <- occu(~time ~buffel, goagumf)
    timedate.buffyear <- occu(~time + date ~buff + year, goagumf)
  Use model selection or frequentist methods to establish a model for inference.