Hierarchical models

Represent processes and observations that span multiple levels (also known as multilevel models).

[Diagram: regions $R_1$, $R_2$, $R_3$ and plots $N_1$ through $N_9$.]

$N_i$ = true abundance on a plot. Consider factors that govern abundance at the plot scale.
$R_j$ = true abundance in a region. Consider factors that govern abundance at the regional scale.
Consider processes important at each scale or at many scales.

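Add additional levels.

[Diagram: region $R_1$ above plots $N_1$, $N_2$, $N_3$, with parameters $\rho$ and $\lambda$ marked as state processes; each plot has observations $o_1, o_2, \dots, o_n$, with parameter $p$ marked as the observation process.]

Define parameters for each level.
The model is hierarchical because parameters at one level govern parameters at the lower level.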

Two-level hierarchical model

Level 1: $y_{ij} \sim N(\theta_i, \sigma_i^2)$, where $i$ indexes sites and $j$ indexes surveys.
Level 2: $\theta_i \sim N(\theta, \sigma^2)$.

Key idea: Consider an attribute of a sample unit, $\theta_i$, as having been drawn from an underlying distribution. We don't estimate a separate $\theta_i$ for each sample unit; instead we estimate the parameters of the distribution from which the $\theta_i$ were drawn.

The parameters of interest are $\theta$ and $\sigma^2$, which in this case are the mean and variance of the distribution of the $\theta_i$; we estimate these from data.
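The two levels can be illustrated with a small simulation. This is only a sketch under added assumptions that are not on the slide: the within-site variance is taken to be the same at every site, every site has the same number of surveys, and the true values (10, 2, 1) are invented for illustration. The estimates use the simple moment estimator for a one-way random-effects layout rather than a full likelihood or Bayesian fit.

```r
set.seed(1)
n_sites   <- 200
n_surveys <- 5
theta   <- 10   # level-2 mean (illustrative value)
sigma   <- 2    # level-2 standard deviation (illustrative value)
sigma_e <- 1    # level-1 standard deviation, assumed equal across sites

# Level 2: each site's attribute is drawn from the underlying distribution
theta_i <- rnorm(n_sites, mean = theta, sd = sigma)

# Level 1: repeated surveys at each site (rows = sites, columns = surveys)
y <- matrix(rnorm(n_sites * n_surveys,
                  mean = rep(theta_i, n_surveys), sd = sigma_e),
            nrow = n_sites)

# Estimate the level-2 parameters without estimating each theta_i separately
site_means <- rowMeans(y)
within_var <- mean(apply(y, 1, var))                     # estimates sigma_e^2
theta_hat  <- mean(site_means)                           # estimates theta
sigma2_hat <- var(site_means) - within_var / n_surveys   # estimates sigma^2

c(theta_hat = theta_hat, sigma2_hat = sigma2_hat)
```

The variance of the site means overstates $\sigma^2$ because it also contains survey-level noise, which is why the within-site variance divided by the number of surveys is subtracted.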

Key idea: Estimate parameters of the upper-level distribution assumed to govern the processes that give rise to data observed at lower levels.
Parameters from all levels are estimated simultaneously. This is important because uncertainty at one level affects inferences at other levels.
Most alternative modeling frameworks do not allow us to model state and observation processes simultaneously.
Modeling density with Program DISTANCE? Modeling abundance in Program MARK?

Two common types of hierarchical models:
1) Latent variable models
2) Mixed effects models

[Diagram: state $N_3$ with parameter $\lambda$; observations $o_1, o_2, \dots, o_n$ with parameter $p$.]

Hierarchical models in ecology

Ecological process: model for describing state variables (latent or unobserved), such as abundance, occupancy, and survival.
  Parameters: $\lambda$, $\psi$, $\phi$
  Covariates: site / individual
Observation process: model for describing the detection process.
  Parameter: $p$
  Covariates: site / individual, survey
Realized data: $y_1, y_2, y_3, \dots, y_n$

Imperfect observations

We wish to estimate the abundance of a species on a plot, $N_i$.
We use a survey method that yields counts on plots, $C_i$, e.g., point counts, line transects, removals, etc.
The probability that we observe an individual that is present, $\beta$, is often < 1.
The number of individuals counted is related to true abundance: $C = \beta N$, where $\beta$ ranges from 0 to 1.
Translate $C$ into an estimate of abundance: $\hat{N} = C / \beta$.
Example: Count 5 quail on a plot; if $\beta = 0.25$, then $\hat{N} = 5 / 0.25 = 20$.
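The correction can be checked with a short simulation. This sketch assumes each of the $N$ individuals is detected independently with probability $\beta$, so that counts are binomial; that distributional assumption and the variable names are additions for illustration.

```r
set.seed(42)
N        <- 20     # true abundance on the plot
beta_det <- 0.25   # probability of observing an individual that is present

# Simulate many repeat counts of the same plot
counts <- rbinom(10000, size = N, prob = beta_det)

# Raw counts underestimate abundance; dividing by beta corrects this on average
mean(counts)
N_hat <- counts / beta_det
mean(N_hat)

# The quail example from the text
quail_estimate <- 5 / 0.25
quail_estimate
```

Any single corrected count can still be far from the truth; the correction removes the bias, not the sampling variation.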

Occupancy, single season

Presence-absence data

Classifying a species as present or absent across space is the basis for studying biogeography (the study of distributions) and many types of habitat analyses.
Changes in presence-absence status over time are the basis for patch dynamics and metapopulation dynamics.
Problem: when the detection process is imperfect, we cannot distinguish non-detection from absence.
Estimates of the area occupied will be biased.

What is occupancy?

Occupancy: proportion of area, patches, or other sample units occupied by a species.
Probability of occupancy: probability ($\psi$) that any given unit within a sampling frame is occupied.
Single-season goal: estimate $\psi$ when $p < 1$ during a single season.
Multi-season goal: dynamics = colonization and extinction.

Changes in geographic range

Has purple loosestrife spread across the Lake Erie basin? If so, how fast? Are eradication methods working?

Habitat relationships and resource selection

Identify habitat features associated with selection.
Classify presence-absence of species on sample units, then assess with logistic regression.
This approach does not account for false absences (imperfect detection).

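Occupancy as a parameter

Trade-offs:
Not as sensitive as abundance to changes over time.
[Figure: abundance of 10, 5, and 1 in Year 1, Year 2, and Year 3, with $\psi = 1$.]
The value of $\psi$ is a function of the size of sample units (sites).
[Figure: two grids of sample units, one with $\psi = 4/4 = 1.0$ and one with $\psi = 9/25 = 0.36$.]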

Basic sampling scheme

Select a sample of $s$ units ("sites") from a larger set of $S$ units (the population).
Survey each site $K$ times and record whether the species of interest is detected or not (temporal replication).
Resurvey all sites in the sample, even those where the species was detected previously; this forms the basis for estimating detection probability.
Sampling can be direct (visual) or indirect (tracks).

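Occupancy: hierarchical structure

[Diagram: Season 1 contains sites 1, 2, ..., S; site 1 has surveys 1, 2, ..., $K_1$ and site 2 has surveys 1, 2, ..., $K_2$; the diagram is labeled "Closure".]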

Encounter histories

0, 1, 0, 1, 1   (detection)
1, 0, 1, 1, 1   (detection)
0, 0, 0, 0, 0   (no detection)

Survey results: 1 = detected, 0 = not detected.

Survey history for each site:

Site ID   1   2   3   4
A         0   1   1   0
B         1   1   0   0
C         0   0   0   0
D         0   1   0   1
E         1   1   0   0
F         0   0   0   0
G         1   1   0   1

When surveys are complete, we have two types of sites:
Detection: occupied.
No detection: either not occupied, or occupied but not detected.

Ideas underlying estimates

Site   Survey 1   Survey 2   Survey 3   Survey 4
1      0          1          1          0
2      1          1          1          1
3      1          0          0          0
4      0          0          0          0

If surveys were perfect, 0 0 0 0 would indicate true absence, so we could estimate $\psi$ as the proportion of sites with ≥ 1 detection.
Naïve estimate of $\psi$ = 3/4 = 0.75.
If surveys are imperfect, estimate $p$ from sites with ≥ 1 detection: $p = (0.50 + 1.00 + 0.25) / 3 = 0.58$.
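The arithmetic for the four-site table can be reproduced in a few lines of base R. The object names are chosen for illustration only.

```r
# Detection histories from the four-site table (rows = sites, columns = surveys)
y <- matrix(c(0, 1, 1, 0,
              1, 1, 1, 1,
              1, 0, 0, 0,
              0, 0, 0, 0),
            ncol = 4, byrow = TRUE)

# Sites with at least one detection
detected <- rowSums(y) > 0

# Naive occupancy: proportion of sites with >= 1 detection
psi_naive <- mean(detected)

# Naive detection: average per-site detection rate among those sites
p_naive <- mean(rowMeans(y[detected, , drop = FALSE]))

round(c(psi_naive = psi_naive, p_naive = p_naive), 2)
```

Both quantities are only starting points: the naive $\psi$ ignores occupied sites that were never detected, and the naive $p$ is computed only from sites that produced at least one detection.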

Estimate $\psi$ and $p$

Use a model-based approach to estimate occupancy and detection parameters simultaneously.
Consider two stochastic processes:
Occupancy: a site is either occupied with probability $\psi$ or unoccupied with probability $1 - \psi$.
Detection: if a site is unoccupied, the species cannot be detected; if a site is occupied, then at each survey there is some probability $p$ of detecting the species.
For a single survey:
Species detected: $\psi p$.
Species not detected: $1 - \psi$ (site unoccupied) or $\psi(1 - p)$ (site occupied but species missed).
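One way to make the two processes concrete is to write the likelihood of a whole detection history and maximize it. The sketch below is an illustration in base R, not the code used in the course: it assumes constant $\psi$ and $p$, uses the seven-site encounter histories (sites A to G) shown earlier, and the function name `negloglik` is invented here. A site with $d$ detections in $K$ surveys contributes $\psi p^d (1-p)^{K-d}$, and a site with no detections contributes an extra $1 - \psi$ for the possibility that it is unoccupied.

```r
# Encounter histories for sites A-G (rows = sites, columns = surveys)
y <- matrix(c(0, 1, 1, 0,
              1, 1, 0, 0,
              0, 0, 0, 0,
              0, 1, 0, 1,
              1, 1, 0, 0,
              0, 0, 0, 0,
              1, 1, 0, 1),
            ncol = 4, byrow = TRUE)

# Negative log-likelihood; parameters are on the logit scale so the
# optimizer can search over the whole real line
negloglik <- function(par, y) {
  psi <- plogis(par[1])
  p   <- plogis(par[2])
  K   <- ncol(y)
  d   <- rowSums(y)
  # Occupied-and-observed-history term, plus the unoccupied term
  # that applies only to all-zero histories
  lik <- psi * p^d * (1 - p)^(K - d) + (1 - psi) * (d == 0)
  -sum(log(lik))
}

fit <- optim(c(0, 0), negloglik, y = y)
psi_hat <- plogis(fit$par[1])
p_hat   <- plogis(fit$par[2])

c(psi_naive = mean(rowSums(y) > 0), psi_hat = psi_hat, p_hat = p_hat)
```

The estimated $\psi$ comes out above the naive proportion of sites with detections, because the model allows that some all-zero sites were occupied but missed.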

Binomial distribution

Discrete distribution. Represents the number of successes in a set of independent Bernoulli trials (events with two possible outcomes).
Notation: $\mathrm{Bin}(n, p)$.
Parameters: $n$ = number of trials, $p$ = probability of success on each trial.
[Figure: binomial distributions with $n = 20$ and $p = 0.1$ (blue), $p = 0.5$ (green), $p = 0.8$ (red).]
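The three distributions named on this slide can be tabulated with `dbinom()`. This is a small sketch to recover the shape of the plotted curves; the column labels are made up for readability.

```r
n <- 20
k <- 0:n   # possible numbers of successes

# One column of probabilities per value of p
pmf <- sapply(c(p0.1 = 0.1, p0.5 = 0.5, p0.8 = 0.8),
              function(p) dbinom(k, size = n, prob = p))

# Each column is a full probability distribution
colSums(pmf)

# Most likely number of successes under each p
modes <- k[apply(pmf, 2, which.max)]
modes

# Optional plot, using the colors named on the slide
matplot(k, pmf, type = "h", lty = 1, col = c("blue", "green", "red"),
        xlab = "Number of successes", ylab = "Probability")
```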

Occupancy: single season

Ecological process: $Z_i \sim \mathrm{Bin}(1, \psi)$
  $Z_i$ = unobservable true occupancy (state) of site $i$; $\psi$ = probability of occupancy.
Observation process: $y_{ij} \sim \mathrm{Bin}(1, Z_i\, p)$
  $y_{ij}$ = observed outcome at site $i$ on survey $j$; $Z_i$ = unobservable true state of occupancy; $p$ = probability of detection.
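The two lines of the model translate almost directly into simulation code. In this sketch the numbers of sites and surveys and the values of $\psi$ and $p$ are arbitrary choices for illustration.

```r
set.seed(7)
n_sites   <- 100
n_surveys <- 4
psi <- 0.6   # probability of occupancy (illustrative value)
p   <- 0.4   # probability of detection (illustrative value)

# Ecological process: latent occupancy state of each site
z <- rbinom(n_sites, size = 1, prob = psi)

# Observation process: detection is possible only where z = 1
y <- matrix(rbinom(n_sites * n_surveys, size = 1,
                   prob = rep(z * p, n_surveys)),
            nrow = n_sites)

# True versus apparent occupancy
c(true = mean(z), naive = mean(rowSums(y) > 0))
```

Multiplying $p$ by $Z_i$ is what ties the levels together: unoccupied sites can only yield zeros, while occupied sites yield a mix of zeros and ones.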

Logistic regression

The response is binary, so represent the response (stochastic part) with a binomial distribution; the mean is a probability or proportion ($p$).
The link function is the logit (log odds): $\mathrm{logit}(p) = \log\frac{p}{1 - p} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots$
Occupancy state: $\mathrm{logit}(\psi_i)$, where $i$ indexes sites.
Observation process: $\mathrm{logit}(p_{ij})$, where $j$ indexes visits to each site.
Binomial distribution: $y \sim \mathrm{Bin}(N, p)$, where $y$ is the observed outcome, $N$ is the number of trials, and $p$ is Prob(occupancy) or Prob(detection).
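In R the logit and its inverse are available as `qlogis()` and `plogis()`. The short sketch below shows the round trip and then turns a linear predictor into occupancy probabilities; the coefficient values and covariate values are hypothetical.

```r
p <- c(0.1, 0.5, 0.9)

# logit: probability -> log odds
eta <- qlogis(p)
all.equal(eta, log(p / (1 - p)))

# inverse logit: log odds -> probability
p_back <- plogis(eta)

# A linear predictor on the logit scale, back-transformed to a probability
beta0 <- -1      # hypothetical intercept
beta1 <- 0.05    # hypothetical slope
x <- c(10, 20, 40, 60)
psi <- plogis(beta0 + beta1 * x)
round(psi, 3)
```

Working on the logit scale keeps every fitted probability between 0 and 1 no matter what values the coefficients take.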

Assumptions

The species is never falsely detected when absent.
Detection of a species at a site is independent of detecting the species at other sites.
Sites are closed to changes in occupancy state during the survey period (no colonization or extinction).
$\psi$ and $p$ are constant across sites, unless heterogeneity in parameters is explained by covariates.

Accounting for heterogeneity with covariates

Consider additional factors to explain variation in $\psi$ and $p$.
$\psi$ can be modeled as a function of site-level covariates. Covariates for $\psi$ must remain constant during the survey period; e.g., plant community, patch size.
$p$ can be modeled as a function of:
  site-level covariates; e.g., vegetation cover
  survey-level covariates; e.g., cloud cover, air temperature, observer

Covariates

Two types:
Site-level covariates (for $\psi$ and $p$)
Observation-level covariates (for $p$)

         Surv.1  Surv.2  Surv.3  Surv.4  Buffel%  Time.1  Time.2  Time.3  Time.4
Site 1   0       1       1       0       40       M       E       M       E
Site 2   1       1       1       1       60       E       M       E       M
Site 3   1       0       0       0       20       E       M       E       M
Site 4   0       0       0       0       10       M       E       M       E

M = morning, E = evening

Adding covariates

Extend models with the generalized linear modeling framework, which allows us to model linear functions regardless of the distribution of the response.
Ecological and observation processes:
$y_{ij} \sim \mathrm{Bin}(N_i, p_{ij})$
$\mathrm{logit}(p) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n$
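A covariate version of the occupancy likelihood can be coded directly in base R. The sketch below is illustrative only: the data are simulated, the covariate names (`buffel_s` for a standardized site-level cover value, `evening` for a 0/1 survey-level indicator) and the true coefficients are invented, and `negloglik` is a hypothetical helper rather than part of any package.

```r
set.seed(11)
n_sites   <- 500
n_surveys <- 4

# Simulated covariates (assumptions for illustration)
buffel   <- runif(n_sites, 0, 100)                  # site level, percent cover
buffel_s <- (buffel - mean(buffel)) / sd(buffel)    # standardized for fitting
evening  <- matrix(rbinom(n_sites * n_surveys, 1, 0.5), nrow = n_sites)

# logit(psi_i) = 0.5 - 1 * buffel_s_i ; logit(p_ij) = 0 - 1 * evening_ij
psi <- plogis(0.5 - 1 * buffel_s)
p   <- plogis(0 - 1 * evening)
z   <- rbinom(n_sites, 1, psi)
y   <- matrix(rbinom(n_sites * n_surveys, 1, z * p), nrow = n_sites)

negloglik <- function(par) {
  psi <- plogis(par[1] + par[2] * buffel_s)
  p   <- plogis(par[3] + par[4] * evening)
  # Probability of each site's history given that the site is occupied
  cond  <- apply(p^y * (1 - p)^(1 - y), 1, prod)
  never <- rowSums(y) == 0
  -sum(log(psi * cond + (1 - psi) * never))
}

fit <- optim(rep(0, 4), negloglik, method = "BFGS")
est <- setNames(fit$par, c("psi_int", "psi_buffel", "p_int", "p_evening"))
round(est, 2)
```

Standardizing the site covariate before fitting is a design choice that tends to make numerical optimization better behaved; the coefficients are then interpreted per standard deviation of cover.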

Run models to estimate parameters

For estimates based on maximum likelihood methods:
  Code directly in R
  Use the unmarked package in R
For estimates based on Bayesian methods:
  WinBUGS
  OpenBUGS
  JAGS

Fitting models in unmarked

Develop and fit a set of candidate models for the state variable (here, occupancy) and the detection process. In occu(), the first formula describes detection and the second describes occupancy:

    robject <- occu(~detect ~occupancy, UMF)
    time.buff <- occu(~time ~buffel, goagumf)
    timedate.buffyear <- occu(~time + date ~buff + year, goagumf)

Use model selection or frequentist methods to establish a model for inference.
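The calls above need an unmarked data frame, which the slides do not show being built. The sketch below is one possible end-to-end workflow under stated assumptions: the data are simulated rather than the course's `goagumf` data set, `time` is coded 0 = morning and 1 = evening, `buffel` is standardized, and the package step runs only if unmarked is installed. The functions `unmarkedFrameOccu()`, `occu()`, `fitList()`, and `modSel()` are from the unmarked package.

```r
set.seed(3)
n_sites   <- 200
n_surveys <- 4

# Simulated stand-ins for the real data (assumptions for illustration)
buffel  <- runif(n_sites, 0, 100)                                    # site level
evening <- matrix(rbinom(n_sites * n_surveys, 1, 0.5), nrow = n_sites)  # survey level
z <- rbinom(n_sites, 1, plogis(1 - 0.03 * buffel))
y <- matrix(rbinom(n_sites * n_surveys, 1, z * plogis(0.3 - 0.8 * evening)),
            nrow = n_sites)

if (requireNamespace("unmarked", quietly = TRUE)) {
  library(unmarked)

  # Bundle detections, site covariates, and survey covariates
  umf <- unmarkedFrameOccu(
    y        = y,
    siteCovs = data.frame(buffel = as.numeric(scale(buffel))),
    obsCovs  = list(time = evening)
  )

  # Candidate models: detection formula first, occupancy formula second
  null.model <- occu(~1 ~1, data = umf)
  time.buff  <- occu(~time ~buffel, data = umf)

  # Coefficients are reported on the logit scale
  print(coef(time.buff))

  # Rank the candidate set by AIC
  print(modSel(fitList(null = null.model, time.buff = time.buff)))
}
```

With real data, `y` would be the sites-by-surveys detection matrix, and the covariate table shown earlier would supply `siteCovs` (Buffel%) and `obsCovs` (Time.1 to Time.4).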