Non-Parametric Bayesian Inference for Controlled Branching Processes Through MCMC Methods
M. González, R. Martínez, I. del Puerto, A. Ramos
Department of Mathematics, University of Extremadura
Spanish Branching Processes Group
18th International Conference on Computational Statistics (COMPSTAT 2008), Porto, August 2008
Contents
1. Controlled Branching Processes
2. Gibbs Sampler: Introducing the Method, Developing the Method, Simulated Example
3. Concluding Remarks
4. References
Branching Processes
Within the general context of stochastic models, branching process theory provides appropriate mathematical models for describing the probabilistic evolution of systems whose components (cells, particles, individuals in general), after a certain life period, reproduce and die. It can therefore be applied in several fields (biology, demography, ecology, epidemiology, genetics, medicine, ...).
Branching Processes: Example
A Galton-Watson process evolves as
$Z_0 = 1$, $Z_{n+1} = \sum_{j=1}^{Z_n} X_{nj}$.
Example path: $Z_0 = 1$, $Z_1 = 2$, $Z_2 = 7$, $Z_3 = 10$.
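The Galton-Watson recursion above can be sketched in a few lines. A minimal simulation, assuming an illustrative offspring distribution $P(X=0)=0.25$, $P(X=1)=0.5$, $P(X=2)=0.25$ (not from the talk); function names are ours:

```python
import random

def simulate_galton_watson(offspring_probs, generations, z0=1, seed=None):
    """Simulate Z_{n+1} = sum_{j=1}^{Z_n} X_{nj}, where the X_{nj} are
    i.i.d. with P(X = k) = offspring_probs[k]."""
    rng = random.Random(seed)
    ks = list(range(len(offspring_probs)))
    path = [z0]
    z = z0
    for _ in range(generations):
        # each of the Z_n current individuals produces an i.i.d. offspring count
        z = sum(rng.choices(ks, weights=offspring_probs, k=z)) if z > 0 else 0
        path.append(z)
    return path

path = simulate_galton_watson([0.25, 0.5, 0.25], generations=10, z0=1, seed=1)
print(path)
```

Once the population hits 0 it stays there: extinction is an absorbing state of the chain.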
Branching Processes: Main Results for Galton-Watson Branching Processes
Let $m = E[X_{01}]$ and $\sigma^2 = \mathrm{Var}[X_{01}]$.
Extinction problem:
- If $m \le 1$ the process dies out with probability 1.
- If $m > 1$ there exists a positive probability of non-extinction.
Further topics: asymptotic behaviour, statistical inference.
Branching Processes
Many monographs on the theory and applications of branching processes have been published:
- Harris, T. (1963). The Theory of Branching Processes. Springer-Verlag.
- Jagers, P. (1975). Branching Processes with Biological Applications. John Wiley and Sons, Inc.
- Asmussen, S. and Hering, H. (1983). Branching Processes. Birkhäuser, Boston.
- Athreya, K.B. and Jagers, P. (1997). Classical and Modern Branching Processes. Springer-Verlag.
- Kimmel, M. and Axelrod, D.E. (2002). Branching Processes in Biology. Springer-Verlag New York, Inc.
- Haccou, P., Jagers, P. and Vatutin, V. (2005). Branching Processes: Variation, Growth, and Extinction of Populations. Cambridge University Press.
Branching Processes
A Controlled Branching Process is a discrete-time stochastic population growth model in which the number of individuals with reproductive capacity in each generation is controlled by some function $\varphi$. This branching model is well suited to describing the probabilistic evolution of populations in which, for environmental, social or other reasons, there is a mechanism that establishes the number of progenitors taking part in each generation.
Branching Processes
Mathematically, a Controlled Branching Process $\{Z_n\}_{n \ge 0}$ is defined by
$Z_0 = N$, $Z_{n+1} = \sum_{i=1}^{\varphi(Z_n)} X_{ni}$, $n = 0, 1, \ldots$
where:
- $\{X_{ni} : i = 1, 2, \ldots;\ n = 0, 1, \ldots\}$ are i.i.d. random variables with offspring distribution $\{p_k : k \in S\}$, $m = E[X_{01}]$, $\sigma^2 = \mathrm{Var}[X_{01}]$.
- The control function $\varphi : \mathbb{R}^+ \to \mathbb{R}^+$ is assumed to be integer-valued for integer-valued arguments.
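The controlled recursion differs from the Galton-Watson one only in that $\varphi(Z_n)$, rather than $Z_n$, individuals reproduce. A minimal sketch, reusing the offspring distribution and control function of the talk's simulated example; function names are ours:

```python
import random

def phi(x):
    """Control function of the talk's simulated example:
    phi(x) = 7 if x <= 7; x if 7 < x <= 20; 20 if x > 20."""
    if x <= 7:
        return 7
    return x if x <= 20 else 20

def simulate_controlled_bp(offspring_probs, control, generations, z0, seed=None):
    """Simulate Z_{n+1} = sum_{i=1}^{phi(Z_n)} X_{ni} with i.i.d. offspring."""
    rng = random.Random(seed)
    ks = list(range(len(offspring_probs)))
    path = [z0]
    z = z0
    for _ in range(generations):
        progenitors = control(z)   # only phi(Z_n) individuals reproduce
        z = sum(rng.choices(ks, weights=offspring_probs, k=progenitors))
        path.append(z)
    return path

# offspring distribution of the talk's simulated example (m = 1.08)
p = [0.28398, 0.42014, 0.23309, 0.05747, 0.00531]
path = simulate_controlled_bp(p, phi, generations=15, z0=10, seed=2)
print(path)
```

With this control the number of progenitors never drops below 7 and never exceeds 20, which is what keeps the simulated paths bounded in the example later in the talk.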
Controlled Branching Processes: Properties
- $\{Z_n\}_{n \ge 0}$ is a homogeneous Markov chain.
- Duality extinction-explosion: $P(Z_n \to 0) + P(Z_n \to \infty) = 1$.
Main topics investigated:
- Extinction problem: Sevast'yanov and Zubkov (1974); Zubkov (1974); Molina, González and Mota (1998).
- Asymptotic behaviour (growth rates): Bagley (1986); Molina, González and Mota (1998); González, Molina, del Puerto (2002, 2003, 2004, 2005a,b).
Controlled Branching Processes: Main Topics Investigated
Statistical inference:
- Dion, J.P. and Essebbar, B. (1995). On the statistics of controlled branching processes. Lecture Notes in Statistics, 99, 14-21.
- González, M., Martínez, R. and del Puerto, I. (2004). Nonparametric estimation of the offspring distribution and the mean for a controlled branching process. Test, 13(2), 465-479.
- González, M., Martínez, R. and del Puerto, I. (2005). Estimation of the variance for a controlled branching process. Test, 14(1), 199-213.
- Sriram, T.N., Bhattacharya, A., González, M., Martínez, R. and del Puerto, I. (2007). Estimation of the offspring mean in a controlled branching process with a random control function. Stochastic Processes and their Applications, 117, 928-946.
Non-Parametric Framework
Offspring distribution: $p = \{p_k : k \in S\}$, with $S$ finite.
Sample: the entire family tree up to the current generation, $\{X_{ki} : i = 1, \ldots, \varphi(Z_k);\ k = 0, 1, \ldots, n\}$, or at least
$\mathcal{Z}_n = \{Z_j(k) : k \in S,\ j = 0, \ldots, n\}$, where
$Z_j(k) = \sum_{i=1}^{\varphi(Z_j)} I_{\{X_{ji} = k\}}$ = number of parents in the $j$th generation which generate exactly $k$ offspring.
Objective: make inference on $p$.
Likelihood Function
$L(p \mid \mathcal{Z}_n) \propto \prod_{j=0}^{n} \prod_{k \in S} p_k^{Z_j(k)}$
Conjugate class of distributions: the Dirichlet family.
Prior distribution: $p \sim \mathcal{D}(\alpha_k : k \in S)$.
Posterior distribution: $p \mid \mathcal{Z}_n \sim \mathcal{D}\big(\alpha_k + \sum_{j=0}^{n} Z_j(k) : k \in S\big)$.
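The conjugate Dirichlet update can be checked numerically. A sketch with NumPy, using made-up illustrative counts $Z_j(k)$ and the $\mathcal{D}(1/2, \ldots, 1/2)$ prior that appears later in the talk:

```python
import numpy as np

rng = np.random.default_rng(0)

S = [0, 1, 2, 3, 4]              # offspring support
alpha = np.full(len(S), 0.5)     # D(1/2, ..., 1/2) prior, as in the talk

# hypothetical aggregated counts: Z[j, k] = number of j-th generation
# progenitors with exactly k offspring (illustration data, not from the talk)
Z = np.array([[3, 4, 2, 1, 0],
              [2, 5, 3, 0, 1],
              [4, 6, 2, 2, 0]])

# conjugate update: p | Z_n ~ D(alpha_k + sum_j Z_j(k) : k in S)
posterior_alpha = alpha + Z.sum(axis=0)
print(posterior_alpha)           # [ 9.5 15.5  7.5  3.5  1.5]

# posterior mean and a Monte Carlo sample from the posterior
posterior_mean = posterior_alpha / posterior_alpha.sum()
draws = rng.dirichlet(posterior_alpha, size=1000)
print(posterior_mean)
```

Because the Dirichlet is conjugate to the multinomial likelihood above, the update is just "prior pseudo-counts plus observed counts", which is what makes the first Gibbs step below cheap.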
Setting Out the Problem
In real problems it is difficult to observe the entire family tree $\{X_{ki} : i = 1, 2, \ldots;\ k = 0, 1, \ldots, n\}$, or even the random variables $\mathcal{Z}_n = \{Z_j(k) : k \in S,\ j = 0, \ldots, n\}$.
Usual sample information: $\overline{Z}_n = \{Z_j : j = 0, \ldots, n\}$.
A solution: we introduce an algorithm to approximate the distribution $p \mid \overline{Z}_n$ using Markov Chain Monte Carlo methods.
Gibbs Sampler: Introducing the Method
Sample: $\overline{Z}_n = \{Z_j : j = 0, \ldots, n\}$.
The problem: approximate $p \mid \overline{Z}_n$.
Latent variables: $\mathcal{Z}_n = \{Z_j(k) : k \in S,\ j = 0, \ldots, n\}$.
Gibbs sampler: alternate between the conditionals $p \mid \overline{Z}_n, \mathcal{Z}_n$ and $\mathcal{Z}_n \mid \overline{Z}_n, p$.
Gibbs Sampler: Introducing the Method
First conditional distribution: $p \mid \overline{Z}_n, \mathcal{Z}_n$.
Since, for $j = 0, \ldots, n$, $\varphi(Z_j) = \sum_{k \in S} Z_j(k)$ and $Z_{j+1} = \sum_{k \in S} k Z_j(k)$, the latent counts determine the observed totals, so
$p \mid \overline{Z}_n, \mathcal{Z}_n = p \mid \mathcal{Z}_n \sim \mathcal{D}\big(\alpha_k + \sum_{j=0}^{n} Z_j(k) : k \in S\big)$.
Gibbs Sampler: Introducing the Method
Second conditional distribution: $\mathcal{Z}_n \mid \overline{Z}_n, p$.
$P(\mathcal{Z}_n \mid \overline{Z}_n, p) = \prod_{j=0}^{n} P(Z_j(k) : k \in S \mid Z_j, Z_{j+1}, p)$,
where $(Z_j(k) : k \in S) \mid Z_j, Z_{j+1}, p$ is obtained from a Multinomial$(\varphi(Z_j), p)$ distribution, normalized by taking into account the constraint $Z_{j+1} = \sum_{k \in S} k Z_j(k)$.
Gibbs Sampler: Introducing the Method
Second conditional distribution: $\mathcal{Z}_n \mid \overline{Z}_n, p$. Given $p$, the latent counts are generated sequentially along the chain:
$Z_0 \to (Z_0(k) : k \in S) \to Z_1 \to (Z_1(k) : k \in S) \to Z_2 \to \cdots \to Z_n \to (Z_n(k) : k \in S) \to Z_{n+1}$,
with $\varphi(Z_j) = \sum_{k \in S} Z_j(k)$ and $Z_{j+1} = \sum_{k \in S} k Z_j(k)$ at each step.
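The talk does not spell out how the constrained multinomial draw is produced in practice. One straightforward (if not the most efficient) option is rejection sampling from the unconstrained Multinomial$(\varphi(Z_j), p)$; a sketch under that assumption, with function names of our own:

```python
import numpy as np

def sample_constrained_counts(n_parents, z_next, p, rng, max_tries=100_000):
    """Draw (Z_j(k) : k in S) from Multinomial(n_parents, p) conditioned on
    sum_k k * Z_j(k) = z_next, by plain rejection sampling."""
    ks = np.arange(len(p))
    for _ in range(max_tries):
        counts = rng.multinomial(n_parents, p)
        if int(counts @ ks) == z_next:   # constraint Z_{j+1} = sum_k k * Z_j(k)
            return counts
    raise RuntimeError("rejection sampling failed; constraint too unlikely")

rng = np.random.default_rng(3)
# offspring probabilities of the talk's simulated example,
# renormalized because the quoted values sum to 0.99999
p = np.array([0.28398, 0.42014, 0.23309, 0.05747, 0.00531])
p = p / p.sum()

# e.g. Z_j = 10 progenitors (phi(10) = 10) producing Z_{j+1} = 12 offspring
counts = sample_constrained_counts(n_parents=10, z_next=12, p=p, rng=rng)
print(counts)
```

Rejection is exact (it samples from the normalized conditional distribution on the slide) but can be slow when the constraint is improbable under the current $p$; any exact sampler for the conditional would do.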
Gibbs Sampler: Developing the Method
Algorithm:
1. Fix $p^{(0)}$ and set $l = 1$.
2. Generate $\mathcal{Z}_n^{(l)} \sim \mathcal{Z}_n \mid \overline{Z}_n, p^{(l-1)}$.
3. Generate $p^{(l)} \sim p \mid \mathcal{Z}_n^{(l)}$.
4. Set $l = l + 1$ and return to step 2.
From a run of the sequence $\{p^{(l)}\}_{l \ge 0}$ we choose $Q + 1$ vectors $\{p^{(N)}, p^{(N+G)}, \ldots, p^{(N+QG)}\}$, where $G$ is a batch size. These vectors are considered independent samples from $p \mid \overline{Z}_n$ if $G$ and $N$ are large enough (Tierney (1994)). Since they could be affected by the initial state $p^{(0)}$, we apply the algorithm $T$ times, obtaining a final sample of length $T(Q + 1)$.
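Putting the two conditionals together, the whole scheme can be sketched as follows. This is a toy single-chain run, far shorter than the talk's, and it assumes a rejection step for the constrained multinomial draws (the talk does not specify the sampling mechanism); function names are ours:

```python
import numpy as np

def sample_tree_counts(path, phi, p, rng, max_tries=100_000):
    """Draw the latent counts Z_j(k), j = 0, ..., n-1, given the observed
    totals Z_0, ..., Z_n: reject draws from Multinomial(phi(Z_j), p) until
    the constraint Z_{j+1} = sum_k k * Z_j(k) holds."""
    ks = np.arange(len(p))
    counts = []
    for z, z_next in zip(path, path[1:]):
        for _ in range(max_tries):
            c = rng.multinomial(phi(z), p)
            if int(c @ ks) == z_next:
                counts.append(c)
                break
        else:
            raise RuntimeError("rejection step failed")
    return np.array(counts)

def gibbs_offspring_posterior(path, phi, alpha, iters, burn_in, gap, seed=None):
    """Gibbs sampler for p | Z_0, ..., Z_n: alternate the two conditional
    draws and keep every `gap`-th state after `burn_in` iterations."""
    rng = np.random.default_rng(seed)
    p = np.full(len(alpha), 1.0 / len(alpha))   # flat initial state p^(0)
    kept = []
    for l in range(1, iters + 1):
        Z = sample_tree_counts(path, phi, p, rng)   # Z_n^(l) | observed, p^(l-1)
        p = rng.dirichlet(alpha + Z.sum(axis=0))    # p^(l) | Z_n^(l)
        if l >= burn_in and (l - burn_in) % gap == 0:
            kept.append(p)
    return np.array(kept)

def phi(x):  # control function of the talk's simulated example
    return 7 if x <= 7 else (x if x <= 20 else 20)

# observed totals Z_0, ..., Z_15 from the talk's simulated example
path = [10, 12, 17, 13, 12, 12, 11, 10, 11, 14, 13, 15, 21, 24, 20, 19]
alpha = np.full(5, 0.5)          # D(1/2, ..., 1/2) prior, as in the talk

# toy run with far fewer iterations than used in the talk
samples = gibbs_offspring_posterior(path, phi, alpha,
                                    iters=200, burn_in=50, gap=10, seed=4)
print(samples.mean(axis=0))      # approximate posterior mean of p
```

In the talk the burn-in, batch size and number of chains are chosen with convergence diagnostics rather than fixed a priori as they are here.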
Gibbs Sampler: Simulated Example
Offspring distribution:
k:    0        1        2        3        4
p_k:  0.28398  0.42014  0.23309  0.05747  0.00531
Parameters: $m = 1.08$, $\sigma^2 = 0.7884$.
Control function: $\varphi(x) = 7$ if $x \le 7$; $x$ if $7 < x \le 20$; $20$ if $x > 20$.
[Figure: simulated data — paths of the controlled branching process and of a Galton-Watson branching process over 100 generations.]
Gibbs Sampler: Simulated Example
Observed data ($n = 15$):
n:    0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
Z_n: 10  12  17  13  12  12  11  10  11  14  13  15  21  24  20  19
Prior: $p \sim \mathcal{D}(1/2, \ldots, 1/2)$.
Selection of $N$, $G$, $Q$ and $T$: $N = 5000$, $G = 100$, $Q = 49$ and $T = 100$, based on:
- Gelman-Rubin-Brooks diagnostic plots.
- Estimated potential scale reduction factor.
- Autocorrelation values.
Gibbs Sampler: Simulated Example
[Figure: Gelman-Rubin-Brooks diagnostic plots (CODA package for R) for $p_0, \ldots, p_4$ — median and 97.5% shrink factors plotted against the last iteration in the chain, approaching 1 as the number of iterations grows.]
Gibbs Sampler: Simulated Example
Computation time: 60.10 s for each chain on an Intel(R) Core(TM)2 Duo CPU T7500 running at 2.20 GHz with 2038 MB RAM.
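In the talk the convergence diagnostics come from the CODA package in R (`gelman.diag`, `gelman.plot`). As a rough cross-check, the classic Gelman-Rubin shrink factor for a scalar parameter can be hand-rolled; a simplified sketch on synthetic chains, not the exact CODA computation:

```python
import numpy as np

def gelman_rubin(chains):
    """Classic Gelman-Rubin potential scale reduction factor for m parallel
    chains of length n (one scalar parameter); values near 1 suggest
    approximate convergence."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    B = n * chains.mean(axis=1).var(ddof=1)   # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()     # within-chain variance
    var_hat = (n - 1) / n * W + B / n         # pooled variance estimate
    return float(np.sqrt(var_hat / W))

rng = np.random.default_rng(5)

# two well-mixed chains drawn from the same distribution: factor near 1
mixed = rng.normal(0.0, 1.0, size=(2, 1000))
r_mixed = gelman_rubin(mixed)
print(r_mixed)

# two chains stuck around different modes: factor well above 1
stuck = np.stack([rng.normal(0.0, 1.0, 1000), rng.normal(3.0, 1.0, 1000)])
r_stuck = gelman_rubin(stuck)
print(r_stuck)
```

The diagnostic compares between-chain and within-chain variability, which is why the talk runs $T$ independent chains from different starting points.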
Gibbs Sampler: Simulated Example
Sample information: $\overline{Z}_n$, with $N = 5000$, $G = 100$, $Q = 49$ and $T = 100$ (sample size: 5000).
[Figure: estimated posterior densities of $p_0$, $p_3$ and the offspring mean $m$, each with its point estimate and 95% HPD interval.]
Algorithm's efficiency (offspring mean):
MEAN 1.0766931, SD 0.0579821, MCSE 0.0008200, TSSE 0.0008179.
Concluding Remarks
In a non-parametric Bayesian framework we can make inference on the offspring distribution of controlled branching processes, and consequently on the remaining offspring parameters, without observing the entire family tree: it is enough to observe the total number of individuals in each generation. We use an MCMC method (the Gibbs sampler), implemented in the statistical software and programming environment R, in order to give a "likely" reconstruction of the family trees.
References
- Bagley, J.H. (1986). On the almost sure convergence of controlled branching processes. Journal of Applied Probability, 23, 827-831.
- González, M., Molina, M. and del Puerto, I. (2002). On the class of controlled branching processes with random control functions. Journal of Applied Probability, 39(4), 804-815.
- González, M., Molina, M. and del Puerto, I. (2003). On the geometric growth in controlled branching processes with random control function. Journal of Applied Probability, 40(4), 995-1006.
- González, M., Molina, M. and del Puerto, I. (2004). Limiting distribution for subcritical controlled branching processes with random control function. Statistics and Probability Letters, 67(3), 277-284.
- González, M., Molina, M. and del Puerto, I. (2005a). Asymptotic behaviour of critical controlled branching processes with random control function. Journal of Applied Probability, 42(2), 463-477.
- González, M., Molina, M. and del Puerto, I. (2005b). On the L2-convergence of controlled branching processes with random control function. Bernoulli, 11(1), 37-46.
- Molina, M., González, M. and Mota, M. (1998). Some theoretical results about superadditive controlled Galton-Watson branching processes. Proceedings of the International Conference Prague Stochastics '98, 2, 413-418.
- Sevast'yanov, B.A. and Zubkov, A. (1974). Controlled branching processes. Theory of Probability and its Applications, 19, 14-24.
- Tierney, L. (1994). Markov chains for exploring posterior distributions. Annals of Statistics, 22, 1701-1762.
- Zubkov, A.M. (1974). Analogies between Galton-Watson processes and φ-branching processes. Theory of Probability and its Applications, 19, 309-331.
Thank you very much!