SUPPLEMENT TO EXTENDING THE LATENT MULTINOMIAL MODEL WITH COMPLEX ERROR PROCESSES AND DYNAMIC MARKOV BASES


By Simon J. Bonner, Matthew R. Schofield, Patrik Noren, and Steven J. Price
University of Western Ontario, University of Otago, North Carolina State University, and University of Kentucky

Supplement A: Proof of Convergence

To prove that chains generated from Algorithm 2 of the manuscript converge to the correct distribution, we need to satisfy three conditions: 1) that Step 1 produces chains which converge to π(φ, p | z) for any z such that z = Bx for some x ∈ F_n, 2) that Step 2 produces chains which converge to π(x | n, φ, p, α) for any φ and p in the parameter space, and 3) that the joint posterior distribution satisfies the positivity condition of Robert and Casella (2010, pg. 345).

Sampling from π(φ, p | z) is equivalent to sampling from the posterior distribution for a simple CJS model. This is now standard (see, e.g., Link et al., 2002), so we assume that Condition 1 is satisfied. It is also simple to show that the positivity constraint is satisfied given that the prior distributions for φ and p are positive over all of (0, 1)^T × (0, 1)^(T-1), as assumed in Section 5 of the original manuscript. It remains to show that Condition 2 holds.

We assume here that F_n contains at least two elements. The fibre always contains at least one element with no errors, which we denote by x*. The entries of this element are

    x*_ν = n_ν if ν is observable, and x*_ν = 0 otherwise.

Cases in which F_n = {x*} arise when no errors could have occurred, for example, if no individuals were ever recaptured. These situations are easily identified, and there is no need to sample from the joint posterior of both x and θ in such cases since x = x* with probability one.

Some useful results that are easy to prove are:

1. that any configuration of the latent error histories within the fibre has positive probability under the conditional posterior for all values of the parameters in the parameter space,

Lemma 1. If x ∈ F_n then π(x | n, φ, p, α) > 0 for all values of φ and p in the parameter space.

2. that the local sets within the dynamic Markov basis are symmetric,

Lemma 2. Let x ∈ F_n. If b+ ∈ M_1(x) then -b+ ∈ M_2(x + b+), and if b- ∈ M_2(x) then -b- ∈ M_1(x + b-).

3. that all proposals remain inside F_n,

Lemma 3. Let x ∈ F_n. If b ∈ M(x) = M_1(x) ∪ M_2(x) then x + b ∈ F_n.

4. and that there is a unique element x* in F_n with no errors.

Lemma 4. Suppose that x ∈ F_n. Then e_t(x) = 0 for all t = 2, ..., T if and only if x_ν = n_ν when ν is observable and x_ν = 0 otherwise; that is, if and only if x = x*.

First we establish irreducibility. Proposition 1 implies that there is a path connecting any two elements in the fibre, while Proposition 2 implies that each step, and hence the entire path, has positive probability under the transition kernel. Together, these show that the chains are irreducible.

Proposition 1. For any distinct x_1, x_2 ∈ F_n there exists a sequence of moves b_1, ..., b_L such that:
1. b_L' ∈ M(x_1 + Σ_{l=1}^{L'-1} b_l) for all L' = 1, ..., L,
2. x_1 + Σ_{l=1}^{L'} b_l ∈ F_n for all L' = 1, ..., L - 1,
3. x_2 = x_1 + Σ_{l=1}^{L} b_l,
where we take x_1 + Σ_{l=1}^{0} b_l = x_1.

Proof. Our proof follows by (reverse) induction on the number of errors. Suppose that e_t(x_1) > 0 for some t. Then X_{2t}(x_1) and X_{3t}(x_1) are both nonempty, and there exists b_{11} ∈ M_2(x_1). Then e_t(x_1 + b_{11}) = e_t(x_1) - 1 and x_1 + b_{11} ∈ F_n by Lemma 3. Repeating this procedure L_1 = Σ_{t=2}^{T} e_t(x_1) times, we find b_{11}, ..., b_{1L_1} such that
1. b_{1L'} ∈ M_2(x_1 + Σ_{l=1}^{L'-1} b_{1l}) for L' = 1, ..., L_1,
2. x_1 + Σ_{l=1}^{L'} b_{1l} ∈ F_n for all L' = 1, ..., L_1,
3. e_t(x_1 + Σ_{l=1}^{L_1} b_{1l}) = 0 for all t.

It follows from Lemma 4 that x_1 + Σ_{l=1}^{L_1} b_{1l} = x*. By the same argument, there exist b_{21}, ..., b_{2L_2} such that
1. b_{2L'} ∈ M_2(x_2 + Σ_{l=1}^{L'-1} b_{2l}) for L' = 1, ..., L_2,
2. x_2 + Σ_{l=1}^{L'} b_{2l} ∈ F_n for all L' = 1, ..., L_2,
3. x_2 + Σ_{l=1}^{L_2} b_{2l} = x*.
Moreover, -b_{2,L_2-L'+1} ∈ M_1(x* + Σ_{l=1}^{L'-1} (-b_{2,L_2-l+1})) for all L' = 1, ..., L_2 by Lemma 2. Then the sequence b_1, ..., b_L, where L = L_1 + L_2, b_l = b_{1l} for l = 1, ..., L_1, and b_{L_1+l} = -b_{2,L_2-l+1} for l = 1, ..., L_2, satisfies the conditions of the proposition. Note that half of this argument suffices if either x_1 = x* or x_2 = x*.

Proposition 2. Let x ∈ F_n. If b ∈ M(x) then P(x^(k+1) = x + b | x^(k) = x) > 0.

Proof. Suppose that b ∈ M_1(x) and let x' = x + b. Then -b ∈ M_2(x') by Lemma 2. Direct calculation from equations (5) and (6) shows that both q(x' | x) > 0 and q(x | x') > 0. Combined with Lemma 1, it follows that r(x, x' | φ^(k), p^(k), α) (defined in Step 2, Substep iii of Algorithm 2) is positive, and hence that

    P(x^(k+1) = x' | x^(k) = x) = q(x' | x) r(x, x' | φ^(k), p^(k), α) > 0.

A similar argument shows that P(x^(k+1) = x + b | x^(k) = x) > 0 for all b ∈ M_2(x).

We establish aperiodicity by showing that there is positive probability of holding at x*.

Proposition 3. If x^(k) = x* then P(x^(k+1) = x* | x^(k) = x*) ≥ 1/2.

Proof. The set M_2(x*) is empty since there are no errors to remove from x*. However, Algorithm 2 still proposes to draw a move from M_2(x*) with probability 1/2. When this occurs we set x^(k+1) = x^(k), so that P(x^(k+1) = x* | x^(k) = x*) ≥ 1/2. This shows that x* is an aperiodic state and hence that the entire chain is aperiodic (Cinlar, 1975, pg. 125).

Since the fibre is finite, irreducibility and aperiodicity are sufficient to ensure that the chains have a unique stationary distribution which is also the limiting distribution (see Cinlar, 1975, Corollary 2.11). That this distribution is equal to the target distribution is guaranteed by the detailed balance
condition of the MH algorithm, which holds under Proposition 4 (Liu, 2004, pg. 111).

Proposition 4. If q(x' | x) > 0 then q(x | x') > 0 for all x, x' ∈ F_n.

Proof. Suppose that q(x' | x) > 0. Then either x' - x ∈ M_1(x) or x' - x ∈ M_2(x). If x' - x ∈ M_1(x) then x - x' ∈ M_2(x') by Lemma 2 and q(x | x') > 0. Similarly, if x' - x ∈ M_2(x) then x - x' ∈ M_1(x') and q(x | x') > 0.

This completes our proof that the Markov chains produced by Algorithm 2 have unique limiting distribution π(x, φ, p | n, α), so that realisations from the tail of a converged chain can be used to approximate properties of the joint posterior distribution of x, φ, and p.

Supplement B: Model M_tα

Here we show how the model described by Link et al. (2010) can be fit into the extended model using a dynamic Markov basis to sample from the posterior distribution. Model M_tα extends the standard closed population model with time-dependent capture probabilities by allowing individuals to be misidentified. Specifically, M_tα assumes that individuals are identified correctly with probability α on each capture, that errors do not duplicate observed marks in that one marked individual cannot be identified as another marked individual, and that the same error can never occur twice. This means that each error creates a new recorded history with a single non-zero entry. For example, suppose that T = 3 and that individual i is captured on the first occasion, recaptured and misidentified on the second occasion, and not captured on the third occasion. The true capture history for this individual, including errors, is 120, where the event 2 denotes that the individual was captured and misidentified. The individual would then contribute two recorded histories, 100 and 010, to the observed data set.

Extended Formulation. As with the CJS/BRE model, we cast model M_tα into the new framework by 1) identifying the sets of observable capture histories, latent error histories, and latent capture histories, 2) constructing the linear constraint matrices for the corresponding count vectors, and 3) identifying the components of the likelihood function. For an experiment with T occasions, the set of observable capture histories contains all 2^T - 1 histories in {0, 1}^T excluding the unobservable zero history, the set of latent error histories includes all 3^T histories in {0, 1, 2}^T, and the set of latent capture histories includes all 2^T histories in {0, 1}^T. The matrix A is defined
exactly as in Link et al. (2010): A_ij = 1 if the i-th observable history is observed from the j-th latent error history, and A_ij = 0 otherwise. Similarly, B_kj = 1 if the k-th latent capture history has the same pattern of captures as the j-th latent error history. Mathematically, let ω_i, ν_j, and ξ_k represent the i-th, j-th, and k-th observable history, latent error history, and latent capture history for some implicit orderings. Then

    A_ij = 1 if ω_it = I(ν_jt = 1) for all t = 1, ..., T, or if ω_i = δ_t and ν_jt = 2 for some t ∈ {1, ..., T}; A_ij = 0 otherwise.

    B_kj = 1 if ξ_kt = I(ν_jt = 1) + I(ν_jt = 2) for all t = 1, ..., T; B_kj = 0 otherwise.

Here δ_t represents the t-th column of the T × T identity matrix (i.e., the vector with a single 1 in the t-th entry). Finally, the two components of the likelihood function are

    π(z | p) = [N! / Π_{ξ∈Z} z_ξ!] Π_{ξ∈Z} [Π_{t=1}^{T} p_t^{I(ξ_t = 1)} (1 - p_t)^{I(ξ_t = 0)}]^{z_ξ}

and

    π(x | z, α) = [Π_{ξ∈Z} z_ξ! / Π_{ν∈X} x_ν!] Π_{ν∈X} [Π_{t=1}^{T} α^{I(ν_t = 1)} (1 - α)^{I(ν_t = 2)}]^{x_ν}.

Here N = Σ_{ξ∈Z} z_ξ represents the true population size. Note that the product of these two contributions exactly reconstructs the single likelihood function for M_tα defined by Link et al. (2010, eqns. 6 and 7).

Dynamic Moves. We can again generate a dynamic Markov basis by selecting from a set of moves which add or remove errors from the current configuration. An extra step is also needed on each iteration of the algorithm to update the count of the unobserved individuals. Let χ_et(x) = {ν : ν_t = e and x_ν > 0} be the set of latent error histories with event e on occasion t and positive counts in x. As with the CJS/BRE model, we define M(x) as the union of two sets: M_1(x) containing moves
which add errors and M_2(x) containing moves which remove errors. Each move in the dynamic Markov basis modifies the counts of three latent error histories.

Moves in M_1(x) are defined by sampling one history ν_0 ∈ χ_0t(x) for some t and are denoted by b(ν_0). The other two latent error histories, ν_1 and ν_2, are defined by setting ν_1 = δ_t and ν_2 = ν_0 + 2δ_t. The move b(ν_0) is then defined by setting

    b_ν(ν_0) = -1 if ν = ν_0 or ν = ν_1, +1 if ν = ν_2, and 0 otherwise.

Similarly, moves in M_2(x), denoted by b(ν_2), are defined by sampling a history ν_2 ∈ χ_2t(x) for some t and setting ν_1 = δ_t and

    ν_0s = 0 if s = t, and ν_0s = ν_2s otherwise, for s = 1, ..., T.

The move b(ν_2) is then defined by setting

    b_ν(ν_2) = -1 if ν = ν_2, +1 if ν = ν_0 or ν = ν_1, and 0 otherwise.

If we assume that the decisions to add or remove an error are each chosen with probability 1/2, and that histories are sampled uniformly from χ_0(x) = ∪_{t=1}^{T} χ_0t(x) or χ_2(x) = ∪_{t=1}^{T} χ_2t(x) when adding or removing an error respectively, then the proposal densities for the moves x' = x^(k-1) + b(ν_0) and x' = x^(k-1) + b(ν_2) are

    q(x' | x^(k-1)) = 1 / [2 #χ_0(x^(k-1)) #{t : ν_0t = 0}]

and

    q(x' | x^(k-1)) = 1 / [2 #χ_2(x^(k-1)) #{t : ν_2t = 2}],

respectively. As in the algorithm for the CJS/BRE model, we retain x^(k-1) with probability 1 if an empty set is encountered in one of these processes or if the selected move reduces any of the counts in x below zero. Details of the full algorithm for sampling from the posterior distribution of model M_tα are given in Algorithm 3.
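To make the move construction concrete, the following Python sketch (a toy with T = 3; the encoding and all function names are ours, not the authors') represents latent error histories as tuples over {0, 1, 2}, builds the moves b(ν_0) and b(ν_2), and checks the defining Markov-basis property: a move changes the latent counts x while leaving the recorded-history counts n = Ax unchanged.

```python
from collections import Counter

T = 3  # number of capture occasions (toy size; the framework is general)

def observed_records(nu):
    """Recorded histories generated by latent error history nu.

    Per the definition of A: one record with captures where nu_t == 1
    (dropped when it is the unobservable zero history), plus one singleton
    record delta_t for every misidentification (nu_t == 2).
    """
    recs = []
    base = tuple(1 if e == 1 else 0 for e in nu)
    if any(base):
        recs.append(base)
    for t, e in enumerate(nu):
        if e == 2:
            recs.append(tuple(1 if s == t else 0 for s in range(T)))
    return recs

def recorded_counts(x):
    """n = Ax: counts of observed records induced by latent counts x."""
    n = Counter()
    for nu, cnt in x.items():
        for rec in observed_records(nu):
            n[rec] += cnt
    return n

def add_error_move(nu0, t):
    """b(nu0): merge one nu0-individual and one delta_t-individual into a
    single individual that was captured but misidentified at occasion t."""
    assert nu0[t] == 0
    nu1 = tuple(1 if s == t else 0 for s in range(T))        # delta_t
    nu2 = tuple(2 if s == t else e for s, e in enumerate(nu0))
    return Counter({nu0: -1, nu1: -1, nu2: +1})

def remove_error_move(nu2, t):
    """b(nu2): the reverse move, splitting the error back out."""
    assert nu2[t] == 2
    nu0 = tuple(0 if s == t else e for s, e in enumerate(nu2))
    nu1 = tuple(1 if s == t else 0 for s in range(T))
    return Counter({nu2: -1, nu0: +1, nu1: +1})

# The example from the text: latent history (1, 2, 0) generates the two
# records 100 and 010.  Start from a configuration with no errors:
x = Counter({(1, 0, 0): 2, (0, 1, 0): 1})
b = add_error_move((1, 0, 0), 1)   # proposes latent history (1, 2, 0)
x_new = x + b                      # Counter '+' drops non-positive entries

# The move alters x but not the recorded counts n = Ax.
assert recorded_counts(x_new) == recorded_counts(x)
```

Because each move subtracts one copy of ν_0 and one copy of ν_1 = δ_t and adds one copy of ν_2 = ν_0 + 2δ_t (or the reverse), the induced change in Ax is zero by construction, which is exactly what the assertion verifies.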

Initialise p^(0) and α^(0). Initialise x^(0) so that n = Ax^(0), and set z^(0) = Bx^(0). Set k = 1.

1. Update p and α conditional on x^(k-1) and z^(k-1). Call the results p^(k) and α^(k).
2. Update x and z conditional on p^(k) and α^(k) as follows:
   i) With probability 1/2, sample b from M_1(x^(k-1)); if M_1(x^(k-1)) is empty then set x^(k) = x^(k-1) and continue to step v). Otherwise, sample b from M_2(x^(k-1)); if M_2(x^(k-1)) is empty then set x^(k) = x^(k-1) and continue to step v).
   ii) Set x_prop = x^(k-1) + b. If x_prop,j < 0 for any j = 1, ..., J, set x^(k) = x^(k-1) and continue to step v).
   iii) Calculate the Metropolis-Hastings acceptance probability:

       r(x^(k-1), x_prop | p^(k), α^(k)) = min{1, [π(x_prop | n, p^(k), α^(k)) / π(x^(k-1) | n, p^(k), α^(k))] × [q(x^(k-1) | x_prop) / q(x_prop | x^(k-1))]}

   iv) Set x^(k) = x_prop with probability r(x^(k-1), x_prop | p^(k), α^(k)), and x^(k) = x^(k-1) otherwise.
   v) Set z^(k) = Bx^(k).
3. Increment k and return to Step 1.

Algorithm 3: Proposed algorithm for sampling from the posterior distribution of Model M_tα using the dynamic Markov basis.

References

Cinlar, E. (1975). Introduction to Stochastic Processes. Prentice-Hall, Englewood Cliffs, NJ.

Link, W. A., Cam, E., Nichols, J. D. and Cooch, E. G. (2002). Of BUGS and birds: Markov chain Monte Carlo for hierarchical modeling in wildlife research. Journal of Wildlife Management 66 277-291.

Link, W. A., Yoshizaki, J., Bailey, L. L. and Pollock, K. H. (2010). Uncovering a latent multinomial: analysis of mark-recapture data with misidentification. Biometrics 66 178-185.

Liu, J. S. (2004). Monte Carlo Strategies in Scientific Computing. Springer, New York, NY.

Robert, C. P. and Casella, G. (2010). Monte Carlo Statistical Methods, 2nd ed. Springer-Verlag, New York.

Department of Statistical and Actuarial Sciences
University of Western Ontario
London, Ontario N6A 5B7
E-mail: sbonner6@uwo.ca

Department of Mathematics and Statistics
University of Otago
Dunedin 9054, New Zealand
E-mail: mschofield@maths.otago.ac.nz

Department of Mathematics
North Carolina State University
Box 8205
Raleigh, NC 27695
E-mail: pgnoren2@ncsu.edu

Department of Forestry
University of Kentucky
Lexington, Kentucky 40546-0073 USA
E-mail: steven.price@uky.edu
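Step 2 of Algorithm 3 is a standard Metropolis-Hastings update, and its accept/reject logic can be sketched generically. The following Python fragment is our illustration, not the authors' code: `propose` and `log_post` are hypothetical placeholders for the dynamic-Markov-basis proposal and the conditional posterior π(x | n, p, α), and the ratio is computed on the log scale for numerical stability.

```python
import math
import random

def mh_accept_prob(log_post_prop, log_post_cur, log_q_rev, log_q_fwd):
    """Acceptance probability from Substep iii) of Algorithm 3:
        r = min{1, [pi(x_prop)/pi(x_cur)] * [q(x_cur|x_prop)/q(x_prop|x_cur)]}
    computed on the log scale."""
    return min(1.0, math.exp(log_post_prop - log_post_cur + log_q_rev - log_q_fwd))

def mh_step(x_cur, propose, log_post, rng=random):
    """One update of Step 2.

    `propose(x_cur)` returns (x_prop, log_q_fwd, log_q_rev), or None when
    an empty move set or a negative count forces x^(k) = x^(k-1), as in
    Substeps i) and ii)."""
    out = propose(x_cur)
    if out is None:
        return x_cur           # hold at the current state
    x_prop, log_q_fwd, log_q_rev = out
    r = mh_accept_prob(log_post(x_prop), log_post(x_cur), log_q_rev, log_q_fwd)
    return x_prop if rng.random() < r else x_cur

# With a symmetric proposal (log_q_rev == log_q_fwd) this reduces to the
# plain Metropolis rule:
assert mh_accept_prob(math.log(2.0), 0.0, 0.0, 0.0) == 1.0
```

Note that holding at the current state when the chosen move set is empty is what delivers the aperiodicity used in Proposition 3.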