A latent Gaussian model for compositional data with structural zeroes


Adam Butler and Chris Glasbey
Biomathematics & Statistics Scotland, Edinburgh, UK

David Allcroft
Edinburgh, UK

Sarah Wanless
Centre for Ecology and Hydrology, Banchory, UK

Summary. Compositional data record the relative proportions of different components within a mixture, and arise frequently in fields such as geology, psychology and ecology. Standard statistical techniques for the analysis of compositional data assume that the data do not contain any proportions which are genuinely zero, but real datasets, such as a dataset on seabird diet that we will consider here, often do contain such structural zeroes. We propose a latent multivariate Gaussian model for the analysis of compositional data which contain zero values, and propose an iterative algorithm to simulate values from this model. Evaluation of the likelihood involves the calculation of intractable integrals, so inferences about the parameters of the model are instead based upon a recently proposed sequential Monte Carlo algorithm for approximate Bayesian computation.

Keywords: Latent Gaussian model; Compositional data; Unit-sum constraint; Zero proportion; Approximate Bayesian computation; Sequential Monte Carlo; Intractable likelihood; Dietary composition; Seabirds; Rissa tridactyla

1. Introduction

The analysis of seabird diet can yield crucial insights into the ecology of both seabirds and their prey species, and can thereby play a significant role in contributing to the conservation of marine ecosystems. The data which motivate this paper record the relative frequencies of three different prey types - lesser sandeel Ammodytes marinus aged less than one year (SE0), mature lesser sandeel (SE1), and other species - within samples of regurgitated material collected from four island colonies of the Black-legged Kittiwake Rissa tridactyla on the east coast of Britain during the study period (Bull et al., 2004). The data are plotted in Figure 1.

The Kittiwake data are compositional, in the sense that they record information about the relative frequencies associated with different components of a system - in this case the proportions associated with different prey types. Compositional data routinely arise in disciplines such as geology, economics and ecology, and it has long been recognised (Pearson, 1897) that they should not be analysed using standard statistical methods because of the intrinsic constraint that the proportions associated with the various components must sum to one. Aitchison (1986) demonstrated that this difficulty could effectively be overcome by analysing the log-ratios of the proportions rather than the proportions themselves, and proposed a suite of statistical methods based upon assuming a multivariate Gaussian distribution for these log-ratios.

Fig. 1. Kittiwake diet data, plotted using a ternary diagram. The ternary diagram displays the relative frequency associated with the three different prey types, with the vertices representing those individuals which consume only a single type and the edges representing those individuals that consume two of the three types. Data points are represented by red circles whose area is proportional to the number of observations associated with that point - the area of the black circle represents the contribution of a single observation. [Vertices of the triangle: SE0, SE1 and Other.]
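As an aside on how such a figure can be drawn, the short Python sketch below maps compositions onto the barycentric coordinates of a ternary diagram; the function name, plotting choices and example compositions are illustrative assumptions of ours, not material from the paper.

```python
import numpy as np
import matplotlib.pyplot as plt

def ternary_xy(props):
    """Map compositions (SE0, SE1, Other) on the unit simplex to 2-D coordinates,
    with the vertices of an equilateral triangle representing pure diets:
    SE0 at (0, 0), SE1 at (1, 0) and Other at (0.5, sqrt(3)/2)."""
    props = np.asarray(props, dtype=float)
    props = props / props.sum(axis=1, keepdims=True)   # enforce the unit-sum constraint
    se1, other = props[:, 1], props[:, 2]
    return se1 + 0.5 * other, (np.sqrt(3) / 2) * other

# Hypothetical example compositions (not the real Kittiwake data).
comps = np.array([[1.0, 0.0, 0.0], [0.5, 0.5, 0.0], [0.2, 0.3, 0.5]])
x, y = ternary_xy(comps)
plt.plot([0, 1, 0.5, 0], [0, 0, np.sqrt(3) / 2, 0], color="black")   # triangle outline
plt.scatter(x, y, facecolors="none", edgecolors="red")
for label, (lx, ly) in zip(["SE0", "SE1", "Other"], [(0, 0), (1, 0), (0.5, np.sqrt(3) / 2)]):
    plt.annotate(label, (lx, ly))
plt.axis("equal")
plt.axis("off")
plt.show()
```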

There are excellent theoretical and practical arguments for the use of the Aitchisonian approach when the compositional data lie entirely on the interior of the unit simplex, but the approach breaks down when the proportions associated with some or all of the components may be zero. Rigorous methods for dealing with the situation in which zero values arise solely through the rounding off of small values have been developed (Fry et al., 2000; Martín-Fernández et al., 2000, 2003), but there is as yet no established methodology for dealing with the possibility that zero proportions in the data may correspond to genuine absences of the component concerned ('structural zeroes', also known as 'essential zeroes'). The Kittiwake diet data contain a large number of zero proportions, and at least some of these data are likely to arise from instances in which an individual bird has simply failed to consume some of the available prey types. Hierarchical approaches which explicitly model the occurrence of structural zeroes are conceptually attractive (Aitchison and Kay, 2004), but because such models are relatively highly parameterised they are unlikely to be appropriate for situations in which - as here - the amount of data is fairly small and the proportion of zero values is fairly large.

Latent Gaussian models have been successfully used to deal with the presence of zero values in data on rainfall (Durban and Glasbey, 2001; Allcroft and Glasbey, 2003b), agriculture (Allcroft and Glasbey, 2003a) and nutritional science (Allcroft et al., 2006), and in this paper we argue that an analogous approach can be used for the analysis of compositional data that contain structural zeroes. More specifically, we propose a model in which compositional data X are assumed to arise from the Euclidean projection of a latent multivariate Gaussian variable Y onto the unit simplex, the geometric region within which compositional data must lie. The only unknown parameters within our proposed model are the mean vector µ and covariance matrix Σ of the latent variable. Evaluation of the likelihood involves the calculation of intractable integrals unless the number of components D is very small, so parameter estimates are instead derived using methods of approximate Bayesian computation (ABC; Beaumont et al., 2002; Marjoram et al., 2003; Plagnol and Tavaré, 2004), in which inferences are based upon repeated simulation from the model rather than upon explicit evaluation of the likelihood function. More specifically, we use a variant of the sequential Monte Carlo algorithm for ABC that was proposed by Sisson et al. (2006).

We introduce our proposed model in Section 2, and present an iterative algorithm for simulating realisations from this model. In Section 3 we outline a methodology for drawing inferences about the parameters of our model through the use of a sequential Monte Carlo algorithm for approximate Bayesian computation. We apply the model to simulated data in Section 4, and to the Kittiwake diet data in Section 5. We close the paper with a brief discussion.

2. Proposed latent Gaussian model

Consider a D-dimensional random variable Y. We will assume throughout this paper that Y is compositional, in the sense that Y ∈ [0, 1]^D and Y^T 1 = 1. These constraints imply that the D elements of Y each lie between zero and one and that the elements will always sum to one. They are clearly equivalent to an assumption that Y lies on the unit simplex
$$S_{D-1} = \{ y \in \mathbb{R}^D : y^T \mathbf{1} = 1,\; y \in [0, 1]^D \},$$
which is itself a subset of the hyperplane
$$H_{D-1} = \{ y \in \mathbb{R}^D : y^T \mathbf{1} = 1 \}.$$

We propose to model Y as a known transformation of a latent random variable whose support is the hyperplane H_{D-1}, with the latent random variable assumed to be multivariate Gaussian. More specifically, our proposed model is based upon assuming that
$$Y = g(Z),$$
where:

(a) g is the deterministic function which performs a Euclidean projection of Z ∈ H_{D-1} onto the unit simplex S_{D-1}, so that
$$g(z) = \{ y \in S_{D-1} : \| z - y \| \le \| z - y' \| \ \text{for all } y' \in S_{D-1} \};$$

(b) Z has a multivariate Gaussian distribution with mean vector µ and covariance matrix Σ, where µ^T 1 = 1 and Σ1 = 0.

Note that the constraints on µ and Σ are necessary and sufficient to ensure that the latent variable Z will always lie on the hyperplane H_{D-1}. The constraints imply that there are (D + 2)(D − 1)/2 free parameters within our model, corresponding to the number of parameters within a standard (D − 1)-dimensional multivariate Gaussian model.

2.1. Simulation from the model

The simulation of values from the proposed model relies upon (a) generating values z from a multivariate Gaussian distribution, which can easily be achieved using standard algorithms, and (b) calculating the Euclidean projection y = g(z) of z onto the unit simplex. We propose an iterative algorithm that will evaluate g(z) within D − 1 steps.

Let z_j denote the jth component of z. Without loss of generality we can assume, for notational convenience, that the values within z are arranged in ascending order, so that z_1 and z_D respectively represent the smallest and largest values within this vector. If z_1 > 0 then we must already have z ∈ S_{D-1}, in which case we set y = z. Otherwise, we loop through the values k = 1, ..., D − 1 and calculate
$$r_k = \sum_{l=1}^{k} z_l / (D - k),$$
until we reach a value of k for which z_{k+1} + r_k ≥ 0. We then stop and set y = (y_1, ..., y_D) to be
$$y_j = \begin{cases} 0 & \text{for } j \le k; \\ z_j + r_k & \text{for } j > k. \end{cases}$$
It is clear that this algorithm will always generate a point y that lies within the unit simplex S_{D-1}, so that y^T 1 = 1 and y ≥ 0.
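The simulation scheme of Section 2.1 can be written down compactly; the following Python sketch is our own reading of the projection algorithm and of steps (a) and (b) above, with hypothetical function names, and assumes that the input point already satisfies the unit-sum constraint.

```python
import numpy as np

def project_to_simplex(z):
    """Euclidean projection of a point z with sum(z) = 1 onto the unit simplex,
    following the iterative scheme described in Section 2.1."""
    z = np.asarray(z, dtype=float)
    order = np.argsort(z)                 # work with the components in ascending order
    zs = z[order]
    D = len(zs)
    if zs[0] > 0:                         # already on the simplex
        return z
    for k in range(1, D):                 # zero out the k smallest components
        r_k = zs[:k].sum() / (D - k)
        if zs[k] + r_k >= 0:              # stopping rule: remaining components stay non-negative
            y_sorted = np.concatenate([np.zeros(k), zs[k:] + r_k])
            break
    y = np.empty(D)
    y[order] = y_sorted                   # undo the sorting
    return y

def simulate_compositions(mu, Sigma, n, rng=None):
    """Draw n compositions Y = g(Z) with Z ~ N(mu, Sigma), where mu'1 = 1 and Sigma 1 = 0."""
    rng = np.random.default_rng() if rng is None else rng
    Z = rng.multivariate_normal(mu, Sigma, size=n)
    return np.array([project_to_simplex(z) for z in Z])
```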

If we let λ = r_k and γ = (γ_1, ..., γ_D), where
$$\gamma_j = \begin{cases} -z_j - r_k & \text{for } j \le k; \\ 0 & \text{for } j > k, \end{cases}$$
then it follows immediately that y = z + 1λ + γ, γ ≥ 0 and y^T γ = 0, and it therefore follows by the theory of Lagrangian multipliers (see, for example, Fletcher, 1981) that
$$\| z - y \| \le \| z - y' \| \ \text{for all } y' \in S_{D-1},$$
and so, by definition, that g(z) = y.

3. Inference

Let y = (y_1, ..., y_n) denote n i.i.d. data that are assumed to be realisations from a random variable Y whose distribution is given by the proposed latent Gaussian model with unknown parameters θ = (µ, Σ). We are interested in using the data y to draw statistical inferences about the values of the p elements of θ. The log-likelihood function is of the form
$$\log f(y \mid \theta) := \sum_{i=1}^{n} \log P(Y = y_i \mid \theta) = \sum_{i=1}^{n} \log P(Z \in h(y_i) \mid \theta) = \sum_{i=1}^{n} \log \int_{h(y_i)} \phi_D(z; \theta)\, dz,$$
where φ_D(z; θ) is the probability density function of a D-dimensional multivariate normal random variable with parameter vector θ and where h(y) = {z ∈ H_{D-1} : g(z) = y}. Note that g is not invertible and that h typically defines a subset of the hyperplane H_{D-1} rather than a single point. If y lies on the interior of the simplex S_{D-1}, however, then h(y) = {y}, so that h does indeed define a single point. When D = 2 or D = 3 it is possible to derive formulae which express the log-likelihood in terms of the standard Gaussian distribution function Φ, but for general D no such simplification is possible and the likelihood function is consequently intractable. The intractability arises from the fact that the geometry of the regions defined by the function h rapidly becomes complicated as D increases to even moderately large values.

3.1. Approximate Bayesian Computation

Methods of approximate Bayesian computation were developed to deal with situations such as this, in which the likelihood function for a model is intractable but it is relatively straightforward to generate values from the model via simulation. ABC algorithms are designed to select those parameter values which simulate data that have properties similar to those of the actual data. The degree of similarity between data y' simulated from the model and the actual data y is quantified using d(S(y'), S(y)), where d is a distance metric and where S(y) denotes a set of summary statistics calculated from y. Algorithm A (Fu and Li, 1997; Weiss and von Haeseler, 1998; Pritchard et al., 1999) gives a mechanism for generating N independent realisations θ^(1), ..., θ^(N) from the distribution of θ | d(S(y'), S(y)) < ε, where the threshold ε > 0 is fixed a priori.

Algorithm A:

A1 Set i = 1.
A2 Generate θ' ~ π(θ), where π(θ) is the prior distribution for θ.
A3 If π(θ') > 0 then generate y' ~ f(y | θ'); else go to A1.
A4 If d(S(y'), S(y)) < ε then set θ^(i) = θ'; else go to A1.
A5 If i < N then set i = i + 1 and go to A2.

If S(y) is a sufficient statistic for y then the distribution of θ | d(S(y'), S(y)) < ε will converge to the posterior distribution of θ | y as ε → 0. For all but the simplest models it will not be possible to derive a set of sufficient statistics for y, and we must instead select S based upon heuristic considerations. It is clearly essential that almost all of the information about θ which is contained within y should also be contained in S(y), since the limiting distribution of θ | d(S(y'), S(y)) < ε as ε → 0 may otherwise not be equal to the target distribution.

When D = 2 we take the summary statistics to be (1) the mean of y_1, (2) two times the variance of y_1, (3) the proportion of zeroes in y_1 and (4) the proportion of ones in y_1. When D = 3 we take the summary statistics to be the mean of y_1; the variances of y_1, y_2 and y_3, multiplied by two; the means of (y_1 − y_2)/2, (y_1 − y_3)/2 and (y_2 − y_3)/2; the proportions of zeroes in y_1, y_2 and y_3; and the proportions of ones in y_1, y_2 and y_3. These statistics were selected largely on the basis of trial and error, but appear to yield reasonable performance in practice. We have selected the elements of S such that each is constrained to lie on the interval [0, 1], thus avoiding the necessity for attributing different weights to the different elements of S within the distance metric d. We take d(S(y'), S(y)) to be equal to the mean of the absolute values of the elements of S(y') − S(y). The choice of ε depends largely upon computational considerations, since the acceptance rate of the algorithm decreases rapidly as we decrease the value of ε, and we discuss this in Sections 4 and 5.

π(θ) denotes the prior distribution of θ. In this paper we find it convenient to adopt a Bayesian approach, but we continue to regard this as an approximation to maximum likelihood inference and so adopt a uniform prior of the form
$$\pi(\theta) \propto \begin{cases} 1 & \text{if } \theta \in \Theta, \\ 0 & \text{otherwise,} \end{cases}$$
where Θ ⊂ R^p.
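The following Python sketch shows one way that the D = 2 summary statistics, the distance metric and the rejection sampler of Algorithm A could be implemented; the function names are ours, the projection for D = 2 reduces to clipping the latent value to [0, 1], and the prior ranges simply mirror those quoted later in Section 4.1.

```python
import numpy as np

def summary_stats(y1):
    """Two-component summary statistics: mean, twice the variance,
    proportion of zeroes and proportion of ones."""
    y1 = np.asarray(y1, dtype=float)
    return np.array([y1.mean(), 2.0 * y1.var(), np.mean(y1 == 0.0), np.mean(y1 == 1.0)])

def distance(s_sim, s_obs):
    """Mean absolute difference between the two summary vectors."""
    return np.mean(np.abs(s_sim - s_obs))

def simulate_y1(mu, sigma, n, rng):
    """Two-component model: the latent value is N(mu, sigma^2), and the Euclidean
    projection onto the simplex censors the first component at 0 and 1."""
    return np.clip(rng.normal(mu, sigma, size=n), 0.0, 1.0)

def abc_reject(y_obs, n_particles, eps, rng=None):
    """Rejection ABC in the spirit of Algorithm A, with uniform priors on mu and log sigma."""
    rng = np.random.default_rng() if rng is None else rng
    s_obs = summary_stats(y_obs)
    accepted = []
    while len(accepted) < n_particles:
        mu = rng.uniform(-10.0, 11.0)              # prior ranges as quoted in Section 4.1
        log_sigma = rng.uniform(-10.0, 10.0)
        y_sim = simulate_y1(mu, np.exp(log_sigma), len(y_obs), rng)
        if distance(summary_stats(y_sim), s_obs) < eps:
            accepted.append((mu, log_sigma))
    return np.array(accepted)
```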

3.2. A sequential algorithm for ABC

The justification for the basic ABC algorithm (Algorithm A) relies upon the threshold ε being sufficiently small that the distribution of θ | d(S(y'), S(y)) < ε is approximately equal to the distribution of θ | y. If ε is small and the prior distribution π(θ) relatively uninformative, however, then the algorithm will tend to have an extremely low acceptance rate, especially if the dimensionality p of the parameter space Θ is large. Marjoram et al. (2003) propose overcoming this by embedding the ABC criterion d(S(y'), S(y)) < ε within a Markov chain Monte Carlo (MCMC) framework, but Sisson et al. (2006) note that the resulting algorithm will often exhibit very poor mixing owing to the fact that excursions into the tails of the distribution are associated with a severe reduction in the acceptance rate. Preliminary analyses of the seabird diet data using the ABC-MCMC algorithm suggest that the effects of poor mixing are particularly acute in the context of our latent Gaussian model.

An alternative approach involves applying Algorithm A sequentially using a monotonically decreasing set of thresholds {e_0, e_1, ..., e_T}, where e_T = ε. Sequential Monte Carlo algorithms provide a powerful, and potentially highly efficient, set of methods for drawing inferences about an arbitrary target distribution, and the development of new sequential algorithms is an active area of statistical research (Robert, 2004). The algorithm that we use in this paper (Algorithm B) is a special case of the sequential Monte Carlo algorithm for ABC that was introduced by Sisson et al. (2006): specifically, it is the algorithm that is obtained by taking all particle weights to be equal to 1/N within the ABC-PRC algorithm that they propose. If q is symmetric and π(θ) ∝ 1 for θ ∈ Θ then it follows from the arguments in Sisson et al. (2006) that Algorithm B will generate independent realisations from the target distribution θ | d(S(y'), S(y)) < ε. [Note: the Sisson et al. preprint has now been revised, and no longer seems to include steps B8 and B9 in the algorithm below - we need to verify that the proof in that paper continues to hold.]

Algorithm B:

B1 Set i = 1.
B2 Generate θ' ~ π(θ).
B3 Generate y' ~ f(y | θ').
B4 If d(S(y'), S(y)) < e_0 then set θ_0^(i) = θ'; else return to B2.
B5 If i < N then set i = i + 1 and return to B2.
B6 Set t = 1 and i = 1.
B7 Sample θ' at random from the sequence {θ_{t−1}^(1), ..., θ_{t−1}^(N)}.
B8 Generate y' ~ f(y | θ').
B9 If d(S(y'), S(y)) ≥ e_t then return to B7.
B10 Generate θ'' ~ q(θ | θ').
B11 If π(θ'') > 0 then generate y'' ~ f(y | θ''); else return to B10.
B12 If d(S(y''), S(y)) < e_t then set θ_t^(i) = θ''; else return to B10.
B13 If i < N then set i = i + 1 and return to B7.
B14 If t < T then set t = t + 1, set i = 1 and return to B7.
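To make the control flow of Algorithm B concrete, the Python sketch below implements the equal-weights scheme for the two-component model. It is our own illustrative reading of the algorithm, not the authors' code: the helper functions repeat those of the earlier sketch, the prior ranges mirror those quoted in Section 4.1, and the threshold sequence and proposal scale are supplied by the caller.

```python
import numpy as np

# Helper functions for the two-component model (same definitions as in the earlier sketch).
def summary_stats(y1):
    y1 = np.asarray(y1, dtype=float)
    return np.array([y1.mean(), 2.0 * y1.var(), np.mean(y1 == 0.0), np.mean(y1 == 1.0)])

def distance(s_sim, s_obs):
    return np.mean(np.abs(s_sim - s_obs))

def simulate_y1(mu, sigma, n, rng):
    return np.clip(rng.normal(mu, sigma, size=n), 0.0, 1.0)

def abc_smc(y_obs, thresholds, n_particles, tau, rng=None):
    """Equal-weight sequential ABC in the spirit of Algorithm B, for the two-component
    model with theta = (mu, log sigma) and uniform priors on both parameters."""
    rng = np.random.default_rng() if rng is None else rng
    s_obs = summary_stats(y_obs)
    n = len(y_obs)

    def dist_for(theta):
        y_sim = simulate_y1(theta[0], np.exp(theta[1]), n, rng)
        return distance(summary_stats(y_sim), s_obs)

    def in_prior(theta):
        return -10.0 < theta[0] < 11.0 and -10.0 < theta[1] < 10.0

    # Steps B1-B5: build the initial population from the prior at threshold e_0.
    particles = []
    while len(particles) < n_particles:
        theta = np.array([rng.uniform(-10.0, 11.0), rng.uniform(-10.0, 10.0)])
        if dist_for(theta) < thresholds[0]:
            particles.append(theta)

    # Steps B6-B14: refresh the population at each successively smaller threshold e_t.
    for e_t in thresholds[1:]:
        new_particles = []
        while len(new_particles) < n_particles:
            theta_star = particles[rng.integers(len(particles))]     # B7: resample a particle
            if dist_for(theta_star) >= e_t:                           # B8-B9: reject poor particles
                continue
            while True:                                               # B10-B12: perturb until accepted
                theta_new = theta_star + rng.normal(0.0, tau, size=2)
                if in_prior(theta_new) and dist_for(theta_new) < e_t:
                    new_particles.append(theta_new)
                    break
        particles = new_particles
    return np.array(particles)
```

For example, with the threshold schedule discussed below and illustrative tuning values, a call might look like `abc_smc(y_obs, [2 * 0.1 / (t + 2) for t in range(99)], 1000, 0.05)`.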

The accuracy with which Algorithm B is able to provide an approximation to the target distribution θ | y will, as for Algorithm A, depend upon the threshold ε, the set of summary statistics S, the distance metric d, the number of simulations N and the prior distribution π. Algorithm B additionally requires us to specify a proposal distribution q(θ | θ') and a sequence of intermediate thresholds {e_0, e_1, ..., e_{T−1}}: these two choices have an impact upon the efficiency of the algorithm, but will not affect the accuracy with which the final sequence {θ_T^(1), ..., θ_T^(N)} provides an approximation to the posterior distribution of θ | y. In this paper we take q to be Gaussian, so that q(θ'' | θ') = φ_p(θ''; θ', τ²I), where the proposal standard deviation τ controls the rate at which we explore the parameter space. We take the intermediate thresholds to be of the form e_t = 2e_0/(t + 2), given a particular value for ε and for the initial threshold e_0.

4. Simulation study

We use a simulation study to explore the performance of the sequential Monte Carlo ABC algorithm in the two and three component cases, for which the likelihood is tractable and it is therefore possible to compare estimates obtained using ABC against those obtained using standard maximum likelihood.

4.1. Two components

When D = 2 we can simplify notation by restricting attention to a single component y, with latent variable z ~ N(µ, σ²), the second component then being equal to 1 − y. We simulate five datasets from our model, each of size n = 200 and each generated using the same seed for random number generation. The datasets are generated using a range of different parameter values:

2a: µ = 0.1, σ = 0.1;
2b: µ = 0, σ = 0.1;
2c: µ = −0.1, σ = 0.1;
2d: µ = 0.5, σ = 0.5;
2e: µ = 0.5, σ = 1.

The first three sets of parameters are associated with increasingly large probabilities of obtaining a zero proportion in component one (0.159 for 2a, 0.5 for 2b and 0.841 for 2c) but have a negligible probability (less than 0.001) of obtaining a zero proportion in component two. The last two sets of parameters are associated with non-negligible probabilities of obtaining zero proportions in either of the components (0.159 in each component for 2d, 0.309 in each component for 2e). We fit the model to each of the datasets by numerical maximum likelihood and using Algorithm B with ε = 1/500, N = 1000, e_0 = 1/10 and a fixed value of τ. The log-likelihood for the two component case is
$$\sum_{i=1}^{n} \log\left( I\{y_i = 0\}\,\Phi\left(\frac{-\mu}{\sigma}\right) + I\{y_i = 1\}\left[1 - \Phi\left(\frac{1-\mu}{\sigma}\right)\right] + I\{0 < y_i < 1\}\,\frac{1}{\sigma}\,\phi\left(\frac{y_i - \mu}{\sigma}\right)\right),$$
where φ and Φ respectively denote the density and distribution functions for a standard Gaussian random variable.
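A direct Python sketch of this two-component log-likelihood, suitable for numerical maximisation, is given below; the function name, the use of scipy and the illustrative dataset are assumptions on our part, not material from the paper.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

def loglik_two_component(params, y):
    """Log-likelihood for the D = 2 latent Gaussian model: observations equal to
    0 or 1 are censored values of the latent N(mu, sigma^2) variable."""
    mu, log_sigma = params
    sigma = np.exp(log_sigma)
    y = np.asarray(y, dtype=float)
    ll = np.sum(y == 0.0) * norm.logcdf(-mu / sigma)              # mass at zero
    ll += np.sum(y == 1.0) * norm.logsf((1.0 - mu) / sigma)       # mass at one
    interior = y[(y > 0.0) & (y < 1.0)]
    ll += np.sum(norm.logpdf(interior, loc=mu, scale=sigma))      # density on (0, 1)
    return ll

# Hypothetical usage: simulate a dataset and maximise over (mu, log sigma).
rng = np.random.default_rng(1)
y = np.clip(rng.normal(0.1, 0.1, size=200), 0.0, 1.0)
fit = minimize(lambda p: -loglik_two_component(p, y), x0=np.array([0.2, np.log(0.2)]))
print(fit.x)   # estimates of (mu, log sigma)
```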

Fig. 2. Results from fitting the proposed latent Gaussian model to three simulated datasets with D = 2 components and n = 200 observations each, using maximum likelihood (grey) and using Algorithm B with a sequence of thresholds from e_0 = 1/10 to ε = 1/500 (black). We show 2.5% (solid), 25% (dotted), 50% (thick solid), 75% (dotted) and 97.5% (solid) quantiles for the parameters µ and log σ. [Panels show datasets 2a, 2c and 2d, with one column for µ and one for log σ.]

Table 1. Number of evaluations per particle to fit the latent Gaussian model to simulated data with D = 2 components using Algorithm B. *: for 2e it was necessary to terminate the algorithm at ε = 1/250 because the number of evaluations becomes prohibitively large for smaller values of ε. [Table body: datasets 2a-2e against thresholds ε = 1/125, 1/250 and 1/500; entries not reproduced.]

The prior distributions for the ABC algorithm are taken to be µ ~ U(−10, 11) and log σ ~ U(−10, 10), which we regard as relatively uninformative. In Figure 2 results of the maximum likelihood and ABC analyses are shown for three of the datasets; results for 2b are qualitatively similar to those for 2a, and results for 2e are similar to those for 2d. The plots for all of the parameters and datasets suggest that the medians of the ABC samples are converging towards the maximum likelihood estimate as e_t tends towards zero, and that convergence is relatively rapid (with results changing minimally for e_t smaller than 1/300). The more extreme quantiles of the ABC samples (2.5%, 25%, 75%, 97.5%) become increasingly close to the corresponding quantiles of the distribution of the MLE as e_t becomes small, but for datasets 2d and 2e, and to a lesser extent 2a, they continue to systematically underestimate the uncertainty within the estimator even when ε is very small.

For dataset 2a we investigated the impact of changing the prior distributions - from π(µ) = U(−5, 5) to either π(µ) = U(−10, 10) or π(µ) = U(−2, 2), and from π(log σ) = U(−10, 2) to π(log σ) = U(−5, 1) - of changing the seed used for random number generation, and of reducing the number of particles N (from 1000 down to 200), but we found that these modifications all had a negligible impact upon the results obtained. We also attempted to run a standard MCMC algorithm using the analytic form of the likelihood but with the same prior distribution as for the ABC algorithm, and found that the results were very similar to those obtained via numerical maximum likelihood. We conclude that the underestimation of uncertainty by the ABC algorithm for the simulated datasets is a robust result, which probably results from inadequacies in the selection of test statistics S - although it is not apparent what the nature of this inadequacy might be.

We might expect that estimation would become increasingly difficult as the probability of obtaining a zero proportion becomes higher, since zero proportions are regarded as censored data within the context of our model, so it is somewhat surprising that in Figure 2 convergence appears to occur most slowly for the dataset (2a) in which the proportion of zero values is smallest. This counter-intuitive result is at least partly explained by the fact that the acceptance rate of the sequential ABC algorithm varies enormously between the different datasets - Algorithm B is fairly efficient for datasets 2a, 2b and 2c (in terms of requiring a low number of evaluations per particle: Table 1), but highly inefficient for datasets 2d and 2e. For the purposes of comparison, we also attempted to apply Algorithm A to dataset 2a using the same priors and test statistics as for Algorithm B. For a (relatively large) threshold of 1/125 we found that Algorithm A required 20 million evaluations in order to generate N = 995 accepted parameter values - i.e. an acceptance rate of 0.005%, or roughly 20,000 evaluations per particle - illustrating the extreme inefficiency of this algorithm.

Table 2. Number of evaluations per particle to fit the latent Gaussian model to dataset 2a using Algorithm B, for different values of the proposal standard deviation τ. [Table body: rows correspond to values of τ; columns give the number of evaluations at ε = 1/125 and at a second, smaller threshold; entries not reproduced.]

Sensitivity analyses using dataset 2a suggest that the efficiency - i.e. the acceptance rate - of Algorithm B is largely unaffected by the seed used for random number generation, by the number of particles N and by the choice of prior distribution (so long as this remains relatively uninformative). The efficiency is strongly dependent, however, upon the threshold ε and upon the standard deviation τ of the proposal distribution (Table 2). The acceptance rate is highest when the proposal standard deviation is taken to be very small - this reflects the fact that the acceptance rate declines very rapidly with e_t if the proposal standard deviation is taken to be even moderately large, since as e_t becomes small the set of plausible parameter values also becomes small. The risk of using a very small value for τ, however, is that we may fail to adequately explore the parameter space whilst e_t is relatively large, and may consequently become trapped in an area of the space that actually has low posterior probability when e_t becomes small. It would probably be most efficient to allow τ to depend upon t, so that τ_0, τ_1, ..., τ_{T−1} is a monotonically decreasing sequence, but we have not attempted to implement such a strategy here.

4.2. Three components

When D = 3 our model contains five unknown parameters - the mean µ_1 and standard deviation σ_1 of the first component, the mean µ_2 and standard deviation σ_2 of the second component, and the correlation ρ between the first and second components (note that the model can clearly also be parameterised in other ways). We simulate three datasets from our model, each of size n = 200 and each generated using the same seed for random number generation. We take µ_1 = 1/3, µ_2 = 1/3, ρ = 1/2, and

3a: σ_1 = σ_2 = 0.1;
3b: σ_1 = σ_2 = 0.5;
3c: σ_1 = σ_2 = 1;

so that the proportion of zero values is higher for dataset 3c than for 3b, and higher for 3b than for 3a. The log-likelihood function can be expressed in terms of the density and distribution functions of bivariate and univariate normal random variables (calculations not shown), and so is again straightforward to calculate. In Figure 3 we compare results obtained using maximum likelihood against those obtained using Algorithm B with N = 1000, τ = 0.05, e_0 = 1/2 and ε = 1/125. Results are shown for the parameters µ_1 and ρ, but qualitatively similar results are obtained for the remaining three parameters.
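As a concrete illustration of this parameterisation, the sketch below (our own construction, with a hypothetical function name) builds the constrained mean vector and covariance matrix of Section 2 from (µ_1, µ_2, σ_1, σ_2, ρ), using the fact that the third latent component equals one minus the sum of the first two; the example values echo dataset 3a.

```python
import numpy as np

def constrained_mvn_params(mu1, mu2, sigma1, sigma2, rho):
    """Build (mu, Sigma) for the D = 3 latent Gaussian model from the five free
    parameters, so that mu'1 = 1 and Sigma 1 = 0 hold by construction
    (the third latent component is Z3 = 1 - Z1 - Z2)."""
    mu = np.array([mu1, mu2, 1.0 - mu1 - mu2])
    c12 = rho * sigma1 * sigma2
    Sigma = np.array([
        [sigma1**2,         c12,               -sigma1**2 - c12],
        [c12,               sigma2**2,         -sigma2**2 - c12],
        [-sigma1**2 - c12,  -sigma2**2 - c12,  sigma1**2 + sigma2**2 + 2 * c12],
    ])
    return mu, Sigma

# Quick check with dataset-3a-style values: rows of Sigma sum to zero, mu sums to one.
mu, Sigma = constrained_mvn_params(1/3, 1/3, 0.1, 0.1, 0.5)
assert np.allclose(Sigma.sum(axis=1), 0.0) and np.isclose(mu.sum(), 1.0)
```

This (mu, Sigma) pair can be passed directly to the simulation sketch of Section 2.1 to generate three-component compositions.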

Fig. 3. Results from fitting the proposed latent Gaussian model to three simulated datasets with D = 3 components and n = 200 observations each, using maximum likelihood (grey) and using Algorithm B with a sequence of thresholds from e_0 = 1/2 to ε = 1/125 (black). We show 2.5% (solid), 25% (dotted), 50% (thick solid), 75% (dotted) and 97.5% (solid) quantiles for the parameters µ_1 and ρ. [Panels show datasets 3a, 3b and 3c, with one column for µ_1 and one for ρ.]

Table 3. Number of evaluations per particle to fit the latent Gaussian model to simulated data with D = 3 components using Algorithm B. [Table body: datasets 3a-3c against thresholds ε = 1/50, 1/100 and 1/125; entries not reproduced.]

We see that the median values of the ABC samples generally tend to converge towards the MLE in a reasonably smooth way as e_t becomes small, but that there are some parameters (e.g. ρ for dataset 3a) for which the ABC samples appear to be converging towards a value that is slightly different from this. More noticeably, the ABC procedure again tends to markedly underestimate the level of uncertainty within the MLEs, especially for those datasets (3b and 3c) in which the data exhibit a relatively high degree of variability. Note that we use a substantially larger value for ε here than in the two component case, essentially for computational reasons - the acceptance rate of the algorithm is much lower when D = 3 than when D = 2 (Table 3), and the rate drops very sharply between ε = 1/125 and ε = 1/150 (not shown). The lower efficiency is related to the higher dimensionality of the parameter space (p = 5 rather than p = 2).

5. Application to seabird diet data

We use numerical maximum likelihood to fit our model to the Kittiwake diet data, and in Figure 4 show the contours of the density associated with the maximum likelihood estimate. We see that the mean of the fitted density lies outside the unit simplex. Such behaviour is not precluded by the specification of our model, and serves only to indicate (1) that the prevalences associated with consumption differ quite substantially between the three diet types and (2) that the proportion of zero values within our data is relatively large.

We also fit our model to the data using ABC (Algorithm B), and in Figure 5 compare the results against those obtained via maximum likelihood. We see that for all five of the parameters the properties of the ABC samples have converged fairly well to those of the maximum likelihood estimates by the time that we reach a threshold of ε = 1/100, although convergence occurs much more slowly than for the simulated data of Section 4. It is not clear whether the ABC samples are underestimating the uncertainty of the MLEs in this case, partly because the quantiles of the ABC samples decay much less smoothly with e_t than they do for the simulated data. The sudden jumps in the properties of the ABC samples arise because of the discretised nature of the diet data - proportions are almost always rounded to the nearest 5%.

We have focused here upon using our model to provide a description of the full dataset, pooled across years and colonies, but ecologists are predominantly interested in comparing the effects of colony, year and date-within-year upon dietary composition (Bull et al., 2004). Our methodology could, in principle, easily be applied to subgroups of the data and extended to account for the effects of covariates, but preliminary results suggest that there are likely to be practical difficulties in doing this. We attempted to fit the model separately to data for the two groups of colonies - marine and estuarine - that are identified by Bull et al. (2004) as being associated with quite distinct patterns of feeding behaviour, but find (a) that the level of uncertainty in the MLEs is very large for the estuarine group and (b) that the ABC algorithms are unable to provide any kind of useful approximation to the distribution of the MLEs for either group.

Fig. 4. Contours of the probability density function associated with the maximum likelihood estimates obtained by fitting the model to Kittiwake diet data for all colonies. Raw data are also shown. [Ternary diagram with vertices SE0, SE1 and Other.]

Fig. 5. Results from fitting the proposed latent Gaussian model to the Kittiwake diet data. The model is fitted using maximum likelihood (grey) and using Algorithm B with a sequence of thresholds from e_0 = 1/2 to ε = 1/100 (black). We show 2.5% (solid), 25% (dotted), 50% (thick solid), 75% (dotted) and 97.5% (solid) quantiles for the five parameters of the model. We also plot the number of evaluations per particle as a function of e_t. [Panels: µ_1, µ_2, σ_1, σ_2, ρ, and evaluations per particle against ε.]

These problems probably result from the lack of data on the interior of the simplex, which could - at least in the limiting case in which there were no data on the interior - make the parameters of the model unidentifiable.

For this paper we have aggregated the diet data into three groups, whereas the raw data actually record composition in terms of seven groups - the 'other' category is broken down in terms of clupeids, small gadidae, planktonic crustacea, polychaetes and an unknown category. The overall approach to modelling and inference that we have presented is applicable for any number of components D. As D increases, however, the proportion of data points that lie on the interior of S_{D-1} will decrease - the Kittiwake diet data contain 22 observations on the interior of the simplex when D = 3 but only 3 observations if we create D = 4 components by regarding clupeids as a distinct class - again creating problems with estimation. The efficiency of the ABC algorithm will also decrease, because of an increase in the dimensionality of the parameter space.

6. Concluding remarks

In this paper we have outlined a novel methodology for analysing compositional data that contain structural zeroes. Our approach is specifically designed to deal with those situations in which the proportion of zero values within the data is reasonably large, since conventional approaches for the analysis of compositional data will not be appropriate in such circumstances, and we have successfully fitted the model to simulated and real datasets in which there are two or three components.

Our proposed model effectively regards those data points which contain zeroes as being partially censored, and estimation problems will consequently arise if the data contain a very high frequency of zero values - as for the estuarine group in the Kittiwake diet data - because of a lack of identifiability. Conversely, if the data contain no zero values then the model reduces to a multivariate normal distribution that is subject to sum constraints on the mean vector and covariance matrix. This latter remark indicates that great care should be taken when interpreting the parameters of our model, since it is well known (e.g. Aitchison, 1986) that covariances between variables which are subject to a unit-sum constraint do not have any natural interpretation in terms of the (in)dependence of those variables. It is also essential to verify that the fitted model does indeed provide a reasonable description of the observed data, and the development of appropriate diagnostic tools to assess model fit for latent Gaussian models is an area of ongoing research for us.

The proposed model presents a challenging problem for statistical inference since it involves the calculation of integrals over regions which, at least for general D, cannot explicitly be defined. We have adopted an approximate approach that is based upon simulation.
ABC methods were originally developed for use with specific forms of highly dependent, unreplicated data that arise in genetics and evolutionary biology (Beaumont et al., 2002; Leman et al., 2005; Thornton and Andolfatto, 2006), but in this paper we have demonstrated that such approximate methods can also successfully be used for analysing the kinds of replicated data that are frequently encountered in ecology and environmental science. The accuracy of ABC methods in providing a good approximation to the posterior distribution of θ | y depends upon making appropriate choices for the summary statistics S and the distance measure d - we have selected these in a somewhat arbitrary fashion, based largely upon trial and error, and the performance of the ABC algorithms could potentially be improved through the use of alternative choices for S and d.

The number of parameters within our model is quite large (e.g. p = 5 when D = 3) relative to previous situations in which ABC methods have been applied, and standard ABC algorithms based upon direct Monte Carlo simulation or Markov chain Monte Carlo are consequently subject to very low acceptance rates and poor mixing respectively. We have shown that a sequential Monte Carlo approach (Sisson et al., 2006) can be used to ensure that the ABC approach retains a reasonable level of efficiency even when the threshold ε is very small. The computational costs of running this algorithm could be reduced, possibly substantially, by more systematic selection of the proposal standard deviation τ and the sequence of intermediate thresholds {e_0, e_1, ..., e_{T−1}}. Finally, Algorithm B assigns equal weights to each particle in the sequence {θ_t^(1), ..., θ_t^(N)}, but we could potentially obtain improved inferences by allowing the weights to vary according to the level of support which each particle receives from the data and the prior, as in the general ABC-PRC algorithm presented by Sisson et al. (2006).

Acknowledgements

Funding for this work was provided by the Scottish Executive Environment and Rural Affairs Department. Ken McKinnon (Edinburgh University) provided helpful comments on the algorithm in Section 2.1. The data were collected as part of the UK Seabird Monitoring Programme, and were kindly provided to us by the Centre for Ecology and Hydrology.

References

Aitchison, J. (1986). The Statistical Analysis of Compositional Data. London: Chapman and Hall.

Aitchison, J. and J. W. Kay (2004). Possible solutions of some essential zero problems in compositional data analysis. In Compositional Data Analysis Workshop, October 2004, Girona, Spain.

Allcroft, D. J. and C. A. Glasbey (2003a). Analysis of crop lodging using a latent variable model. Journal of Agricultural Science 140.

Allcroft, D. J. and C. A. Glasbey (2003b). A latent Gaussian Markov random field model for spatio-temporal rainfall disaggregation. Applied Statistics 52.

Allcroft, D. J., C. A. Glasbey, and M. J. Paulo (2006). A latent Gaussian model for multivariate consumption data. To appear in Food Quality and Preference.

Beaumont, M. A., W. Zhang, and D. J. Balding (2002). Approximate Bayesian computation in population genetics. Genetics 162.

Bull, K., S. Wanless, D. A. Elston, F. Daunt, S. Lewis, and M. P. Harris (2004). Local-scale variability in the diet of Black-legged Kittiwakes Rissa tridactyla. Ardea 92(1).

Durban, M. and C. A. Glasbey (2001). Weather modelling using a multivariate latent Gaussian model. Agricultural and Forest Meteorology 109.

Fletcher, R. (1981). Practical Methods of Optimization. Chichester: Wiley and Sons.

Fry, J., T. R. L. Fry, and K. R. McLaren (2000). Compositional data analysis and zeros in micro data. Applied Economics 32(8).

Fu, Y. X. and W. H. Li (1997). Estimating the age of the common ancestor of a sample of DNA sequences. Molecular Biology and Evolution 14.

Leman, S. C., Y. Chen, J. E. Stajich, M. A. F. Noor, and M. K. Uyenoyama (2005). Likelihoods from summary statistics: recent divergence between species. Genetics 171(3).

Marjoram, P., J. Molitor, V. Plagnol, and S. Tavaré (2003). Markov chain Monte Carlo without likelihoods. Proceedings of the National Academy of Sciences 100(26).

Martín-Fernández, J. A., C. B. Barceló-Vidal, and V. Pawlowsky-Glahn (2000). Zero replacement in compositional data. In H. A. L. Kiers, J. P. Rasson, P. J. F. Groenen, and M. Schader (Eds.), Advances in Data Science and Classification: Proceedings of the 7th Conference of the International Federation of Classification Societies, University of Namur (Belgium). Berlin: Springer-Verlag.

Martín-Fernández, J. A., C. B. Barceló-Vidal, and V. Pawlowsky-Glahn (2003). Dealing with zeros and missing values in compositional data sets using nonparametric imputation. Mathematical Geology 35(3).

Pearson, K. (1897). Mathematical contributions to the theory of evolution: on a form of spurious correlation which may arise when indices are used in the measurement of organs. Proceedings of the Royal Society 60.

Plagnol, V. and S. Tavaré (2004). Approximate Bayesian computation and MCMC. In H. Niederreiter (Ed.), Monte Carlo and Quasi-Monte Carlo Methods 2002. Springer-Verlag.

Pritchard, J. K., M. T. Seielstad, A. Perez-Lezaun, and M. W. Feldman (1999). Population growth of human Y chromosomes: a study of Y chromosome microsatellites. Molecular Biology and Evolution 16.

Robert, C. P. (2004). Monte Carlo Statistical Methods. Springer Texts in Statistics. New York: Springer-Verlag.

Sisson, S. A., Y. Fan, and M. M. Tanaka (2006). Sequential Monte Carlo without likelihoods. Submitted.

Thornton, K. and P. Andolfatto (2006). Approximate Bayesian inference reveals evidence for a recent, severe bottleneck in a Netherlands population of Drosophila melanogaster. Genetics 172(3).

Weiss, G. and A. von Haeseler (1998). Inference of population history using a likelihood approach. Genetics 149.


More information

Label Switching and Its Simple Solutions for Frequentist Mixture Models

Label Switching and Its Simple Solutions for Frequentist Mixture Models Label Switching and Its Simple Solutions for Frequentist Mixture Models Weixin Yao Department of Statistics, Kansas State University, Manhattan, Kansas 66506, U.S.A. wxyao@ksu.edu Abstract The label switching

More information

A Note on Bayesian Inference After Multiple Imputation

A Note on Bayesian Inference After Multiple Imputation A Note on Bayesian Inference After Multiple Imputation Xiang Zhou and Jerome P. Reiter Abstract This article is aimed at practitioners who plan to use Bayesian inference on multiplyimputed datasets in

More information

An introduction to Approximate Bayesian Computation methods

An introduction to Approximate Bayesian Computation methods An introduction to Approximate Bayesian Computation methods M.E. Castellanos maria.castellanos@urjc.es (from several works with S. Cabras, E. Ruli and O. Ratmann) Valencia, January 28, 2015 Valencia Bayesian

More information

ASA Section on Survey Research Methods

ASA Section on Survey Research Methods REGRESSION-BASED STATISTICAL MATCHING: RECENT DEVELOPMENTS Chris Moriarity, Fritz Scheuren Chris Moriarity, U.S. Government Accountability Office, 411 G Street NW, Washington, DC 20548 KEY WORDS: data

More information

Bayesian Hierarchical Models

Bayesian Hierarchical Models Bayesian Hierarchical Models Gavin Shaddick, Millie Green, Matthew Thomas University of Bath 6 th - 9 th December 2016 1/ 34 APPLICATIONS OF BAYESIAN HIERARCHICAL MODELS 2/ 34 OUTLINE Spatial epidemiology

More information

Extreme Value Analysis and Spatial Extremes

Extreme Value Analysis and Spatial Extremes Extreme Value Analysis and Department of Statistics Purdue University 11/07/2013 Outline Motivation 1 Motivation 2 Extreme Value Theorem and 3 Bayesian Hierarchical Models Copula Models Max-stable Models

More information

A short introduction to INLA and R-INLA

A short introduction to INLA and R-INLA A short introduction to INLA and R-INLA Integrated Nested Laplace Approximation Thomas Opitz, BioSP, INRA Avignon Workshop: Theory and practice of INLA and SPDE November 7, 2018 2/21 Plan for this talk

More information

Parameter Estimation. William H. Jefferys University of Texas at Austin Parameter Estimation 7/26/05 1

Parameter Estimation. William H. Jefferys University of Texas at Austin Parameter Estimation 7/26/05 1 Parameter Estimation William H. Jefferys University of Texas at Austin bill@bayesrules.net Parameter Estimation 7/26/05 1 Elements of Inference Inference problems contain two indispensable elements: Data

More information

Approximate Bayesian Computation for the Stellar Initial Mass Function

Approximate Bayesian Computation for the Stellar Initial Mass Function Approximate Bayesian Computation for the Stellar Initial Mass Function Jessi Cisewski Department of Statistics Yale University SCMA6 Collaborators: Grant Weller (Savvysherpa), Chad Schafer (Carnegie Mellon),

More information

Markov Chain Monte Carlo

Markov Chain Monte Carlo Markov Chain Monte Carlo Recall: To compute the expectation E ( h(y ) ) we use the approximation E(h(Y )) 1 n n h(y ) t=1 with Y (1),..., Y (n) h(y). Thus our aim is to sample Y (1),..., Y (n) from f(y).

More information

Modelling trends in the ocean wave climate for dimensioning of ships

Modelling trends in the ocean wave climate for dimensioning of ships Modelling trends in the ocean wave climate for dimensioning of ships STK1100 lecture, University of Oslo Erik Vanem Motivation and background 2 Ocean waves and maritime safety Ships and other marine structures

More information

Hmms with variable dimension structures and extensions

Hmms with variable dimension structures and extensions Hmm days/enst/january 21, 2002 1 Hmms with variable dimension structures and extensions Christian P. Robert Université Paris Dauphine www.ceremade.dauphine.fr/ xian Hmm days/enst/january 21, 2002 2 1 Estimating

More information

Discussion of Maximization by Parts in Likelihood Inference

Discussion of Maximization by Parts in Likelihood Inference Discussion of Maximization by Parts in Likelihood Inference David Ruppert School of Operations Research & Industrial Engineering, 225 Rhodes Hall, Cornell University, Ithaca, NY 4853 email: dr24@cornell.edu

More information

STA 4273H: Sta-s-cal Machine Learning

STA 4273H: Sta-s-cal Machine Learning STA 4273H: Sta-s-cal Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 2 In our

More information

Mobile Robot Localization

Mobile Robot Localization Mobile Robot Localization 1 The Problem of Robot Localization Given a map of the environment, how can a robot determine its pose (planar coordinates + orientation)? Two sources of uncertainty: - observations

More information

STA414/2104 Statistical Methods for Machine Learning II

STA414/2104 Statistical Methods for Machine Learning II STA414/2104 Statistical Methods for Machine Learning II Murat A. Erdogdu & David Duvenaud Department of Computer Science Department of Statistical Sciences Lecture 3 Slide credits: Russ Salakhutdinov Announcements

More information

Metropolis-Hastings Algorithm

Metropolis-Hastings Algorithm Strength of the Gibbs sampler Metropolis-Hastings Algorithm Easy algorithm to think about. Exploits the factorization properties of the joint probability distribution. No difficult choices to be made to

More information

Pattern Recognition and Machine Learning. Bishop Chapter 11: Sampling Methods

Pattern Recognition and Machine Learning. Bishop Chapter 11: Sampling Methods Pattern Recognition and Machine Learning Chapter 11: Sampling Methods Elise Arnaud Jakob Verbeek May 22, 2008 Outline of the chapter 11.1 Basic Sampling Algorithms 11.2 Markov Chain Monte Carlo 11.3 Gibbs

More information

On the Optimal Scaling of the Modified Metropolis-Hastings algorithm

On the Optimal Scaling of the Modified Metropolis-Hastings algorithm On the Optimal Scaling of the Modified Metropolis-Hastings algorithm K. M. Zuev & J. L. Beck Division of Engineering and Applied Science California Institute of Technology, MC 4-44, Pasadena, CA 925, USA

More information

Testing Simple Hypotheses R.L. Wolpert Institute of Statistics and Decision Sciences Duke University, Box Durham, NC 27708, USA

Testing Simple Hypotheses R.L. Wolpert Institute of Statistics and Decision Sciences Duke University, Box Durham, NC 27708, USA Testing Simple Hypotheses R.L. Wolpert Institute of Statistics and Decision Sciences Duke University, Box 90251 Durham, NC 27708, USA Summary: Pre-experimental Frequentist error probabilities do not summarize

More information

Spatial point processes in the modern world an

Spatial point processes in the modern world an Spatial point processes in the modern world an interdisciplinary dialogue Janine Illian University of St Andrews, UK and NTNU Trondheim, Norway Bristol, October 2015 context statistical software past to

More information

Bayesian Estimation of DSGE Models 1 Chapter 3: A Crash Course in Bayesian Inference

Bayesian Estimation of DSGE Models 1 Chapter 3: A Crash Course in Bayesian Inference 1 The views expressed in this paper are those of the authors and do not necessarily reflect the views of the Federal Reserve Board of Governors or the Federal Reserve System. Bayesian Estimation of DSGE

More information

Forecasting Wind Ramps

Forecasting Wind Ramps Forecasting Wind Ramps Erin Summers and Anand Subramanian Jan 5, 20 Introduction The recent increase in the number of wind power producers has necessitated changes in the methods power system operators

More information

Principles of Bayesian Inference

Principles of Bayesian Inference Principles of Bayesian Inference Sudipto Banerjee and Andrew O. Finley 2 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. 2 Department of Forestry & Department

More information

Computational statistics

Computational statistics Computational statistics Markov Chain Monte Carlo methods Thierry Denœux March 2017 Thierry Denœux Computational statistics March 2017 1 / 71 Contents of this chapter When a target density f can be evaluated

More information

Reduction of Random Variables in Structural Reliability Analysis

Reduction of Random Variables in Structural Reliability Analysis Reduction of Random Variables in Structural Reliability Analysis S. Adhikari and R. S. Langley Department of Engineering University of Cambridge Trumpington Street Cambridge CB2 1PZ (U.K.) February 21,

More information

Bustamante et al., Supplementary Nature Manuscript # 1 out of 9 Information #

Bustamante et al., Supplementary Nature Manuscript # 1 out of 9 Information # Bustamante et al., Supplementary Nature Manuscript # 1 out of 9 Details of PRF Methodology In the Poisson Random Field PRF) model, it is assumed that non-synonymous mutations at a given gene are either

More information

Bayesian Inference for DSGE Models. Lawrence J. Christiano

Bayesian Inference for DSGE Models. Lawrence J. Christiano Bayesian Inference for DSGE Models Lawrence J. Christiano Outline State space-observer form. convenient for model estimation and many other things. Bayesian inference Bayes rule. Monte Carlo integation.

More information

Gaussian Process Approximations of Stochastic Differential Equations

Gaussian Process Approximations of Stochastic Differential Equations Gaussian Process Approximations of Stochastic Differential Equations Cédric Archambeau Centre for Computational Statistics and Machine Learning University College London c.archambeau@cs.ucl.ac.uk CSML

More information

Monte Carlo Integration using Importance Sampling and Gibbs Sampling

Monte Carlo Integration using Importance Sampling and Gibbs Sampling Monte Carlo Integration using Importance Sampling and Gibbs Sampling Wolfgang Hörmann and Josef Leydold Department of Statistics University of Economics and Business Administration Vienna Austria hormannw@boun.edu.tr

More information

Bayesian inference J. Daunizeau

Bayesian inference J. Daunizeau Bayesian inference J. Daunizeau Brain and Spine Institute, Paris, France Wellcome Trust Centre for Neuroimaging, London, UK Overview of the talk 1 Probabilistic modelling and representation of uncertainty

More information

Monte Carlo in Bayesian Statistics

Monte Carlo in Bayesian Statistics Monte Carlo in Bayesian Statistics Matthew Thomas SAMBa - University of Bath m.l.thomas@bath.ac.uk December 4, 2014 Matthew Thomas (SAMBa) Monte Carlo in Bayesian Statistics December 4, 2014 1 / 16 Overview

More information

Bayesian model selection for computer model validation via mixture model estimation

Bayesian model selection for computer model validation via mixture model estimation Bayesian model selection for computer model validation via mixture model estimation Kaniav Kamary ATER, CNAM Joint work with É. Parent, P. Barbillon, M. Keller and N. Bousquet Outline Computer model validation

More information

Bayesian Quadrature: Model-based Approximate Integration. David Duvenaud University of Cambridge

Bayesian Quadrature: Model-based Approximate Integration. David Duvenaud University of Cambridge Bayesian Quadrature: Model-based Approimate Integration David Duvenaud University of Cambridge The Quadrature Problem ˆ We want to estimate an integral Z = f ()p()d ˆ Most computational problems in inference

More information

MCMC Sampling for Bayesian Inference using L1-type Priors

MCMC Sampling for Bayesian Inference using L1-type Priors MÜNSTER MCMC Sampling for Bayesian Inference using L1-type Priors (what I do whenever the ill-posedness of EEG/MEG is just not frustrating enough!) AG Imaging Seminar Felix Lucka 26.06.2012 , MÜNSTER Sampling

More information

ABC random forest for parameter estimation. Jean-Michel Marin

ABC random forest for parameter estimation. Jean-Michel Marin ABC random forest for parameter estimation Jean-Michel Marin Université de Montpellier Institut Montpelliérain Alexander Grothendieck (IMAG) Institut de Biologie Computationnelle (IBC) Labex Numev! joint

More information

MINIMUM EXPECTED RISK PROBABILITY ESTIMATES FOR NONPARAMETRIC NEIGHBORHOOD CLASSIFIERS. Maya Gupta, Luca Cazzanti, and Santosh Srivastava

MINIMUM EXPECTED RISK PROBABILITY ESTIMATES FOR NONPARAMETRIC NEIGHBORHOOD CLASSIFIERS. Maya Gupta, Luca Cazzanti, and Santosh Srivastava MINIMUM EXPECTED RISK PROBABILITY ESTIMATES FOR NONPARAMETRIC NEIGHBORHOOD CLASSIFIERS Maya Gupta, Luca Cazzanti, and Santosh Srivastava University of Washington Dept. of Electrical Engineering Seattle,

More information

Bayesian Inference for DSGE Models. Lawrence J. Christiano

Bayesian Inference for DSGE Models. Lawrence J. Christiano Bayesian Inference for DSGE Models Lawrence J. Christiano Outline State space-observer form. convenient for model estimation and many other things. Preliminaries. Probabilities. Maximum Likelihood. Bayesian

More information

Estimation of Operational Risk Capital Charge under Parameter Uncertainty

Estimation of Operational Risk Capital Charge under Parameter Uncertainty Estimation of Operational Risk Capital Charge under Parameter Uncertainty Pavel V. Shevchenko Principal Research Scientist, CSIRO Mathematical and Information Sciences, Sydney, Locked Bag 17, North Ryde,

More information

Approximate Bayesian computation for the parameters of PRISM programs

Approximate Bayesian computation for the parameters of PRISM programs Approximate Bayesian computation for the parameters of PRISM programs James Cussens Department of Computer Science & York Centre for Complex Systems Analysis University of York Heslington, York, YO10 5DD,

More information

On Ranked Set Sampling for Multiple Characteristics. M.S. Ridout

On Ranked Set Sampling for Multiple Characteristics. M.S. Ridout On Ranked Set Sampling for Multiple Characteristics M.S. Ridout Institute of Mathematics and Statistics University of Kent, Canterbury Kent CT2 7NF U.K. Abstract We consider the selection of samples in

More information

Sequential Procedure for Testing Hypothesis about Mean of Latent Gaussian Process

Sequential Procedure for Testing Hypothesis about Mean of Latent Gaussian Process Applied Mathematical Sciences, Vol. 4, 2010, no. 62, 3083-3093 Sequential Procedure for Testing Hypothesis about Mean of Latent Gaussian Process Julia Bondarenko Helmut-Schmidt University Hamburg University

More information