Application of Algebraic Statistics to Mark-Recapture Models with Misidentification

1 Application of Algebraic Statistics to Mark-Recapture Models with Misidentification

Dr. Simon Bonner 1, Dr. Matthew Schofield 2, Patrik Noren 3, and Dr. Ruriko Yoshida 1
1 Department of Statistics, University of Kentucky, USA
2 Department of Mathematics and Statistics, University of Otago, NZ
3 Institute of Science and Technology, AT
simon.bonner@uky.edu
Algebraic Statistics 2014, Illinois Institute of Technology

2 Mark-Recapture and Misidentification Bonner, Schofield, Noren, and Yoshida 2/22

7 Introduction to Mark-Recapture

Setup
Individuals are sampled from a population on T occasions. On each occasion:
- New individuals are marked
- Identities of marked individuals are recorded
- All individuals are returned to the population

Raw Data: ID, History
Summaries: n (counts of each observed history)
Model: $\pi(n \mid \theta)$, with posterior $\pi(\theta \mid n) \propto \pi(n \mid \theta)\,\pi(\theta)$

8 images.nationalgeographic.com/wpf/media-live/photos/000/007/cache/whale-shark_754_600x450.jpg

11 Mark-Recapture with Multiple Marks

ID     History
001    0L00R
002    L00L0
003    00R00
004    0LR0L
...

ID(L/R)   Obs. Hist.
001(L)    0L000
001(R)    0000R
002(L)    L00L0
003(R)    00R00
004(L)    0L00L
004(R)    00R00
...
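The way a two-sided true history gives rise to separate left-side and right-side observed histories can be sketched in a few lines. This is a minimal illustration, assuming histories use only the symbols 0, L and R (at most one side recorded per occasion); the function name `split_by_side` is hypothetical and not taken from the slides.

```python
def split_by_side(history: str) -> dict[str, str]:
    """Split a true history over {0, L, R} into per-side observed histories.

    Assumption: each occasion records at most one side, so an individual
    seen on both sides appears as two unlinked observed histories.
    """
    observed = {}
    for side in ("L", "R"):
        if side in history:
            # Keep captures of this side, blank out everything else.
            observed[side] = "".join(c if c == side else "0" for c in history)
    return observed


print(split_by_side("0L00R"))  # individual 001 in the table above
print(split_by_side("L00L0"))  # individual 002, only ever seen on the left
```

An individual seen on one side only yields a single observed history, while one seen on both sides yields two, which is why the observed counts no longer determine the true counts.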

http://en.wikipedia.org/wiki/file:ambystoma_opacumpcslxyb.

13 Model $M_{t\alpha}$

Model Assumptions
1. Individuals may be misidentified.
2. Errors occur independently with probability $\alpha$.
3. All errors are unique.

(Table: True vs. Observed histories)

15 Connection with Algebraic Statistics

Define:
- $H_{obs}$: set of observable histories ($|H_{obs}| = 2^T - 1$)
- $H_{true}$: set of true histories
and:
- $n$: observed counts of the histories in $H_{obs}$
- $x$: latent counts of the true histories in $H_{true}$

Assume that there is a linear map: $n = Ax$.

Given a model $\pi(x \mid \theta)$ we can make inference about $\theta$ by sampling from the joint posterior distribution:

$$\pi(x, \theta \mid n) \propto I(n = Ax)\,\pi(x \mid \theta)\,\pi(\theta)$$
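To make the linear map concrete, the sketch below builds a configuration matrix for a small misidentification model with T = 2. It rests on several assumptions that are not stated on the slides: latent histories are coded 0 (not captured), 1 (captured and correctly identified) and 2 (captured and misidentified), each misidentification creates a new "ghost" observed history with a single capture, and the all-zero latent history is excluded. The helper name `observed_histories` is hypothetical.

```python
from itertools import product

import numpy as np

T = 2


def observed_histories(latent: str) -> list[str]:
    """Observed histories produced by one individual with this latent history."""
    out = []
    # The individual's own record shows only correctly identified captures.
    own = "".join("1" if c == "1" else "0" for c in latent)
    if "1" in own:
        out.append(own)
    # Assumed: every error is unique, so each one yields a single-capture ghost.
    for t, c in enumerate(latent):
        if c == "2":
            out.append("0" * t + "1" + "0" * (T - t - 1))
    return out


latent_set = ["".join(h) for h in product("012", repeat=T) if set(h) != {"0"}]
observed_set = ["".join(h) for h in product("01", repeat=T) if "1" in h]

# Column j of A records how one individual with latent history j adds to n.
A = np.zeros((len(observed_set), len(latent_set)), dtype=int)
for j, latent in enumerate(latent_set):
    for obs in observed_histories(latent):
        A[observed_set.index(obs), j] += 1

x = np.ones(len(latent_set), dtype=int)  # one individual per latent history
n = A @ x
print(observed_set, n)
```

Here A has $2^T - 1 = 3$ rows, one per observable history, and many different latent vectors x map to the same n, which is the reason the posterior has to be sampled over the fiber $\{x : n = Ax\}$.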

17 MCMC Algorithm of Link et al. (2010)

Construct a (lattice) basis for $\ker(A)$: $b_1, \ldots, b_K$.
Define initial values $x^{(0)}$ s.t. $n = Ax^{(0)}$.
On iteration $j$:
1) Set $x_{curr} = x^{(j-1)}$.
2) For $k = 1, 2, \ldots$:
   i. Sample $c_k \in \{-D_k, \ldots, -1, 1, \ldots, D_k\}$ and set $x_{prop} = x_{curr} + c_k b_k$.
   ii. Set $x_{curr} = x_{prop}$ with probability:
       $$\alpha = \min\left(1, \frac{\pi(x_{prop} \mid \theta)}{\pi(x_{curr} \mid \theta)}\right)$$
3) Set $x^{(j)} = x_{curr}$.
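One sweep of this update can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the density is a toy multinomial kernel standing in for $\pi(x \mid \theta)$, proposals with negative counts are treated as having zero density, and the matrix, basis vector and names (`log_density`, `sweep`) are made up for the example.

```python
from math import lgamma, log

import numpy as np


def log_density(x, p):
    """Toy stand-in for log pi(x | theta): a multinomial kernel with cell probabilities p."""
    return sum(int(xi) * log(pi) - lgamma(int(xi) + 1) for xi, pi in zip(x, p))


def sweep(x, basis, D, p, rng):
    """One pass over the basis vectors, proposing x + c_k * b_k for each."""
    x_curr = x.copy()
    for b, d in zip(basis, D):
        # c_k is uniform on {-d, ..., -1, 1, ..., d}.
        c = int(rng.choice([-1, 1])) * int(rng.integers(1, d + 1))
        x_prop = x_curr + c * b
        if np.any(x_prop < 0):
            continue  # negative counts have zero density, so reject
        log_ratio = log_density(x_prop, p) - log_density(x_curr, p)
        if np.log(rng.uniform()) < min(0.0, log_ratio):
            x_curr = x_prop
    return x_curr


# Hypothetical toy problem: ker(A) is spanned by the single vector (1, -1, 1).
A = np.array([[1, 1, 0], [0, 1, 1]])
basis = [np.array([1, -1, 1])]
D = [2]
p = [0.2, 0.5, 0.3]

x = np.array([2, 3, 1])
n = A @ x
rng = np.random.default_rng(0)
for _ in range(200):
    x = sweep(x, basis, D, p, rng)
print(x, A @ x)
```

Because every proposal adds a multiple of a kernel vector, the constraint $n = Ax$ holds automatically. Whether the chain can reach every point of the fiber depends on the basis, which is what the Markov basis results address.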

19 MCMC Algorithm of Link et al. (2010)

Theorem 1a: A Markov Basis for Multiple Marks
There exists a lattice basis for the multiple marks model which is also a Markov basis.

Theorem 1b: A Markov Basis for Model $M_{t\alpha}$
There exists a lattice basis for model $M_{t\alpha}$ which is also a Markov basis.

www.fws.gov/midwest/greenbay/images/birds/kiwa/2012updates/malekiwa2012.

21 Misidentification 2: Band Read Error Model

Model Assumptions
1. Individuals may have false negative (2) or false positive (3) captures.
2. Errors may not occur on or before the occasion on which the individual is marked.
3. False negatives occur independently with probability $\alpha$.
4. False negatives and false positives cannot co-occur.

(Table: True vs. Observed histories)

22 Misidentification 2: Band Read Error Model

Computation: MB with 4ti2

Occasions (T)   # Lattice Basis   # Markov Basis
??????

24 Misidentification 2: Band Read Error Model

Theorem 2: A Markov basis for the BRE Model
Let $\omega, \nu \in H_{true}$ such that $\omega_t = 0$ and $\nu_t = 1$. Define $\omega'$ and $\nu'$ such that:

$$\omega'_s = \begin{cases} \omega_s & s \neq t \\ 2 & s = t \end{cases} \quad \text{and} \quad \nu'_s = \begin{cases} \nu_s & s \neq t \\ 3 & s = t \end{cases}$$

A Markov basis, $M$, is formed by all moves of the form:

$$x_{\omega} \leftarrow x_{\omega} - 1, \quad x_{\omega'} \leftarrow x_{\omega'} + 1, \quad x_{\nu} \leftarrow x_{\nu} - 1, \quad x_{\nu'} \leftarrow x_{\nu'} + 1$$

and their inverses.
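A single move of this kind can be written out and checked mechanically. The sketch below assumes that a false negative (2) is observed as 0 and a false positive (3) is observed as 1, which is an interpretation of the coding rather than something the slides state; the names `observe`, `bre_move` and `observed_change` are hypothetical, and `t` is 0-based.

```python
from collections import Counter

# Assumed observation map: false negatives look like 0, false positives like 1.
OBSERVED = {"0": "0", "1": "1", "2": "0", "3": "1"}


def observe(history: str) -> str:
    """Observed history corresponding to a true history."""
    return "".join(OBSERVED[c] for c in history)


def bre_move(omega: str, nu: str, t: int) -> dict[str, int]:
    """Changes to the latent counts x for true histories omega, nu at occasion t."""
    if omega[t] != "0" or nu[t] != "1":
        raise ValueError("need omega[t] == '0' and nu[t] == '1'")
    omega_prime = omega[:t] + "2" + omega[t + 1:]
    nu_prime = nu[:t] + "3" + nu[t + 1:]
    return {omega: -1, omega_prime: +1, nu: -1, nu_prime: +1}


def observed_change(move: dict[str, int]) -> dict[str, int]:
    """Net change in the observed counts n caused by a move (empty if none)."""
    delta = Counter()
    for history, step in move.items():
        delta[observe(history)] += step
    return {h: d for h, d in delta.items() if d != 0}


move = bre_move("1001", "1011", t=2)
print(move)
print(observed_change(move))
```

The move adds one false negative and one false positive on the same occasion, so the two error counts stay matched while the observed counts are unchanged.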

27 Simple Corruptions

We call an error in mark-recapture a simple corruption if it splits the true history for an individual into two or more observed histories.

Examples:
Multiple marks: 0LRL0 → 0L0L0, 00R00
Model $M_{t\alpha}$: (example not recoverable)

If all errors are simple corruptions then the configuration matrix has Hermite normal form (up to permuting the columns):

$$H = [\, I \;\; A \,]$$

where $A$ contains only the values 0, 1.

28 General Result

Theorem 1: General Result
Suppose that the configuration matrix has Hermite normal form (up to permuting the columns):

$$H = [\, I \;\; A \,]$$

where $A$ contains only the values 0, 1. Then there exists a lattice basis which is a Markov basis.

Proof
A Markov basis is formed by the columns of:

$$\begin{pmatrix} -A \\ I \end{pmatrix}$$
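The linear-algebra part of the proof is easy to check numerically: for a matrix of the form $[I \; A]$, the columns of the stacked matrix with $-A$ on top of an identity block lie in the kernel and are linearly independent. The block `A_block` below is an arbitrary 0/1 example chosen for illustration, not one of the models from the talk, and the check says nothing about the Markov basis property itself.

```python
import numpy as np

# Arbitrary 0/1 block, used only to illustrate the shape of the argument.
A_block = np.array([[1, 0, 1],
                    [0, 1, 1]])
r, k = A_block.shape

H = np.hstack([np.eye(r, dtype=int), A_block])   # H = [I  A], size r x (r + k)
B = np.vstack([-A_block, np.eye(k, dtype=int)])  # candidate basis, one move per column

print(H @ B)                      # zero matrix: every column of B is in ker(H)
print(np.linalg.matrix_rank(B))   # k: the columns are linearly independent
```

Since the kernel of H has dimension k and the identity block makes the columns of B generate every integer kernel vector, B is a lattice basis; the theorem's additional claim is that these same columns also connect every fiber.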

29 Band Read Error Model: Continuing Work

Theorem 3: Size of the Fiber
Ignoring the constraints matching the number of false negatives and false positives, we can compute the size of the fiber generated by $n$. This provides an upper bound on the true fiber:

$$|F_{A,n}| \leq \prod_{\omega} \binom{n_\omega + 2^{f_\omega} - 1}{2^{f_\omega} - 1}$$

where $\omega \in H_{true}$ and $f_\omega = T - \min\{t : \omega_t = 1\}$.
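Reading the bound as a product of binomial coefficients, one per history, each factor is a stars-and-bars count: $n_\omega$ individuals are distributed over $2^{f_\omega}$ latent histories, since each of the $f_\omega$ occasions after first capture can either carry an error or not. The sketch below evaluates the bound under that reading; the function name `fiber_bound` and the example counts are made up for illustration.

```python
from math import comb


def fiber_bound(n: dict[str, int], T: int) -> int:
    """Product over histories of C(n_w + 2**f_w - 1, 2**f_w - 1).

    Assumes each key of n is a 0/1 history of length T containing at least
    one capture, and that occasions are numbered 1..T.
    """
    bound = 1
    for omega, count in n.items():
        first_capture = omega.index("1") + 1  # 1-based occasion of first capture
        f = T - first_capture                 # occasions on which an error is possible
        cells = 2 ** f                        # compatible latent histories
        bound *= comb(count + cells - 1, cells - 1)
    return bound


# Hypothetical counts for T = 3.
print(fiber_bound({"010": 4, "100": 2}, T=3))
```

For history 010 there is one later occasion, so 4 individuals can be spread over 2 latent histories in 5 ways; for 100 there are two later occasions, so 2 individuals can be spread over 4 latent histories in 10 ways.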

30 Band Read Error Model: Continuing Work

Theorem 4: Connectivity of the Fiber Graph
Suppose n ω 1 for all ω ∈ H_obs. Then the fiber graph formed by the previously defined Markov basis, G(F_{A,n}, M), has connectivity:

κ(G(F_{A,n}, M)) = δ(G(F_{A,n}, M)) = T (2 2T 2 ) 2 T (2 T 1)+ (4T 1). 3

31 Band Read Error Model: Continuing Work

Conjecture 1
The Markov basis M is a Gröbner basis with respect to some term order.

Conjecture 2
The Markov basis M is a universal Gröbner basis.

32 Thank You!

33 References

Bonner, S. J. and Holmberg, J. (2013). Mark-recapture with multiple non-invasive marks. Biometrics, 69(3).

Link, W. A., Yoshizaki, J., Bailey, L. L., and Pollock, K. H. (2010). Uncovering a latent multinomial: Analysis of mark-recapture data with misidentification. Biometrics, 66(March).

Wright, J. A., Barker, R. J., Schofield, M. R., Frantz, A. C., Byrom, A. E., and Gleeson, D. M. (2009). Incorporating genotype uncertainty into mark-recapture-type models for estimating abundance using DNA samples. Biometrics, 65(3).

Yoshizaki, J., Brownie, C., Pollock, K. H., and Link, W. A. (2011). Modeling misidentification errors that result from use of genetic tags in capture-recapture studies. Environmental and Ecological Statistics, 18(1).
