An Application of MCMC Methods and of Mixtures of Laws for the Determination of the Purity of a Product.
Julien Cornebise & Myriam Maumy

Julien Cornebise: E.S.I-E-A, 72 avenue Maurice Thorez, IVRY SUR SEINE, France, and LSTA, Université Pierre et Marie Curie - Paris VI, 175 rue du Chevaleret, PARIS, France (cornebis@et.esiea.fr)

Myriam Maumy: IRMA, Université Louis Pasteur - Strasbourg I, 7 rue René Descartes, STRASBOURG Cedex, France (mmaumy@math.u-strasbg.fr)

Abstract

The aim of this article is to illustrate practical issues that may arise when applying MCMC methods to a real case. MCMC methods are used on a mixture-of-distributions model in order to determine honesty specifications for a certain kind of product. First, assuming a known number of components, the parameters of each component are estimated using Gibbs sampling. Convergence and label switching are taken into account. Then, the determination of the number of components is addressed, through model selection with Bayes factors.

Résumé

The aim of this article is to illustrate practical questions that can arise when MCMC methods are applied to a real case. These methods are used on a mixture-of-distributions model, in order to determine honesty specifications for a certain type of product. Working first with a fixed number of components, the parameters of each component are estimated with the Gibbs sampler. Convergence and label switching are taken into account. Finally, the determination of the number of components is considered, using Bayes factors.

1. The Problem and its Modelization

A particular kind of product, available in stores almost everywhere, is subject to different kinds of adulteration (or fraud). Some products are pure and thus have low glucose and xylose rates, while others have been chemically modified (adjunction of extra components), which reduces their quality and increases their glucose and xylose rates.
Given a set of 1002 observations of the couple (glucose rate, xylose rate), can we determine: (i) the number K of kinds of production (K − 1 different possible modifications, plus one for pure products) and their parameters (mean, standard deviation); (ii) the proportion of each kind of production; and (iii) a way to detect whether a product
is pure or not (e.g. the 99% quantile of the distribution of the glucose rate for the honest kind of production)? We consider here only the univariate case, i.e. glucose and xylose rates separately. The approach could later be generalised to the two-dimensional variable. We model the distribution of the glucose (resp. the xylose) as a mixture of normal laws. We consider that the T = 1002 observations of the sample come from K distinct populations (1 honest and K − 1 modified), each population k ∈ {1, ..., K} following a normal law of density f_k and of parameters θ_k = (μ_k, σ_k). The density of an observation x_i, 1 ≤ i ≤ T, is

[x_i | π, θ] = Σ_{k=1}^K π_k f_k(x_i | θ_k),

where f_k(· | θ_k) is the density of a normal law of parameters θ_k = (μ_k, σ_k), and π_k is the probability of belonging to population k, with Σ_{k=1}^K π_k = 1. The parameters of interest are θ = (θ_1, ..., θ_K) and π = (π_1, ..., π_K). The matter of choosing the value of K will be addressed later. The hypothesis of normality may (and should) be reconsidered in a second step (e.g. a log-normal distribution). Other parametrisations for mixtures of normal laws have been published, for example by Robert in Droesbeke et al. (2002); this one avoids some label-switching troubles (see below), but loses the natural physical interpretation, which is very useful here. The parameters θ_k of each law are considered as random variables with their own distribution, so the notation [·] is used both for the conditional density and for the parametrized density. The distribution of density [θ, π] is the prior distribution, and its parameters are the hyperparameters. They make it possible to take into account, mathematically, the prior knowledge of the experts of the field, if available (here, the potential information held by the chemists). The general aim of the study is to estimate a function F(θ, π) of the real values of the parameters.
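As a minimal sketch (not from the paper; the number of components, weights, and parameter values below are purely illustrative), the mixture density above can be evaluated numerically with the standard library only:

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma) at x (sigma is the standard deviation)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def mixture_pdf(x, pi, theta):
    """[x | pi, theta] = sum_k pi_k f_k(x | theta_k) for a normal mixture."""
    return sum(p * normal_pdf(x, mu, sigma) for p, (mu, sigma) in zip(pi, theta))

# Illustrative K = 2 mixture: one "pure" and one "modified" population.
pi = [0.7, 0.3]
theta = [(1.0, 0.2), (3.0, 0.5)]
density = mixture_pdf(2.0, pi, theta)
```

Since the weights sum to one and each component integrates to one, the mixture is itself a proper density.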
This can be done by calculating E[F(θ, π) | x] = ∫ F(θ, π) [θ, π | x] d(θ, π), where [θ, π | x] is the density of the posterior distribution, i.e. the distribution of the parameters conditionally on the observations x = (x_1, ..., x_T). It can be computed via the Bayes formula: [θ, π | x] = [x | θ, π] [θ, π] / [x], where [x] is the marginal density of the observations, which does not depend on the parameters and can thus be treated as a normalizing constant. The first question is therefore to choose the prior distribution of the parameters, [θ, π]. Before going any further, we introduce in the model above the hidden variable z_i, i ∈ {1, ..., T}, which is not observed and is thus called a latent variable. z_i ∈ {1, ..., K} indicates the population of origin of the observation x_i, and z = (z_1, ..., z_T). The z_i are i.i.d., with

[z_i = k | π] = π_k,    [x_i | θ, π, z_i = k] = N(x_i | μ_k, σ_k),

where N(· | μ, σ) is the density of a univariate normal law. The analysis of mixture distributions by MCMC methods has been the subject of many publications, for example Diebolt and Robert (1990), Richardson and Green (1997), Stephens (1997), and Marin et al. (to appear).

2. Choosing the Prior and its Hyperparameters

The choice of the prior distribution of the parameters is always a much discussed point
in any Bayesian analysis. Several cases may arise. Either the experts of the field have valuable information about the distribution of the parameters (for example, the mean of each component is approximately known), or they have no information at all (or it should be ignored, for blind validation purposes). In the latter case, we can either use an empirical prior, i.e. hyperparameters built from the data, or a non-informative prior, i.e. a prior carrying no information at all, or a poorly informative prior, i.e. one carrying as little information as possible. Moreover, the prior is often chosen in a closed-by-sampling, or conjugate, family, i.e. such that conditioning on the sample (passing from the prior to the posterior distribution) only results in a change of the hyperparameters, not in a change of family. This simplifies implementation. This part of the Bayesian analysis is the most subjective one. A sensitivity analysis should be done after the study, to test the stability of the results with respect to the choices of hyperparameters and prior. Here we have chosen the following model, which often arises, because each law belongs to a closed-by-sampling family:

π = (π_1, ..., π_K) ~ Di(a_1, ..., a_K),    μ_k | σ_k ~ N(m_k, σ_k / √c_k),    σ_k² ~ IG(α_k, β_k),

where Di is a Dirichlet law and IG an Inverse Gamma. The hyperparameters are a_k, m_k, c_k, α_k, and β_k, k ∈ {1, ..., K}. For a non-informative prior, we would need a uniform law for π, which can be obtained with a_1 = ... = a_K = 1; a Dirac density for σ_k², which can be obtained with limit values of α_k and β_k; and a flat density on R for each μ_k, which is more difficult to obtain with limit values of c_k. In practice, a non-informative prior is not so easy to deal with: its implementation under Matlab or Scilab, for example, is not always possible, due to infinite values of the parameters or to degenerate densities. Therefore, a poorly informative prior, or even an empirical prior, should be used. This is what we have done here.
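As an illustrative sketch (the hyperparameter values below are placeholders, not the paper's choices), one draw from this conjugate prior can be obtained with standard-library generators; the Dirichlet draw uses the classical normalized-Gamma construction, and the Inverse Gamma is drawn by inverting a Gamma variate:

```python
import math
import random

def sample_prior(a, m, c, alpha, beta, rng=random):
    """Draw (pi, mu, sigma2) once from the conjugate prior:
    pi ~ Di(a_1, ..., a_K), sigma_k^2 ~ IG(alpha_k, beta_k),
    mu_k | sigma_k ~ N(m_k, sigma_k / sqrt(c_k))."""
    K = len(a)
    # Dirichlet via normalized Gamma(a_k, 1) draws.
    g = [rng.gammavariate(a_k, 1.0) for a_k in a]
    pi = [gk / sum(g) for gk in g]
    # Inverse-Gamma(alpha, beta): if G ~ Gamma(alpha, 1), then beta / G ~ IG(alpha, beta).
    sigma2 = [beta[k] / rng.gammavariate(alpha[k], 1.0) for k in range(K)]
    # mu_k given sigma_k, with standard deviation sigma_k / sqrt(c_k).
    mu = [rng.gauss(m[k], math.sqrt(sigma2[k] / c[k])) for k in range(K)]
    return pi, mu, sigma2

# Illustrative hyperparameters for K = 3 components.
pi, mu, sigma2 = sample_prior(a=[1, 1, 1], m=[0.0] * 3, c=[1.0] * 3,
                              alpha=[3.0] * 3, beta=[2.0] * 3)
```

With a_1 = ... = a_K = 1, the Dirichlet draw is uniform over the simplex, matching the non-informative choice for π discussed above.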
We have chosen: for all k ∈ {1, ..., K}, a_k = 1, c_k = 1, m_k = x̄ (the empirical mean), α_k = K, and β_k = ((K − 1)/T) Σ_{i=1}^T (x_i − x̄)². Thus, the proportions are non-informative, the mean of each component equals the sample's, and the mean of σ_k² equals the empirical variance (E[σ_k²] = β_k / (α_k − 1) for the Inverse Gamma). For reasons that will become clear further on (related to label switching and Bayes factors), we have chosen the same hyperparameters for each of the K components: this maintains the symmetry of the density (and therefore of the likelihood). The choice of the prior is clearly the weakest, the most arguable, and the most argued point of the analysis. Each of the many approaches has its good and bad points. The approach chosen here is neither the most rigorous nor the purest, but it allows easy implementation. Further discussions on the choice of the prior can be found, for example, in Droesbeke et al. (2002) and Gelman et al. (2003).

3. Gibbs Sampling, Complete Conditional Laws

The evaluation of the expectation E[F(θ, π) | x] is hard to achieve, either analytically or numerically, due either to its complex expression or to its highly multidimensional
nature. MCMC methods are a good way to solve this problem. The principle of Monte Carlo methods is to use N independent realizations (θ^(i), π^(i)) of random variables following the posterior distribution [θ, π | x], and to approximate

∫ F(θ, π) [θ, π | x] d(θ, π) ≈ (1/N) Σ_{i=1}^N F(θ^(i), π^(i)).

Here F will first be the identity function, to estimate the parameters, and then the 99%-quantile function of the component corresponding to the pure product. In this last case, in order to lower the producer risk, some practitioners advise using the empirical 95%-quantile of the sampled values F(θ^(i), π^(i)), instead of their empirical mean. The Markov chain part of MCMC methods solves the problem of sampling from the posterior distribution. Different algorithms exist; the two main ones are Metropolis-Hastings and the Gibbs sampler. We have used the latter, which is quite straightforward to implement. Its aim is to sample from the joint posterior distribution of (θ, π). It is based on the complete conditional laws. The general principle is the following. Let θ = (θ_1, ..., θ_n) be the vector of the parameters of the model; here, θ = (θ, π) = (μ_1, σ_1, ..., μ_K, σ_K, π_1, ..., π_K). Let θ_(−i) = (θ_1, ..., θ_{i−1}, θ_{i+1}, ..., θ_n) be the vector θ without its i-th component. The density of the complete conditional law of θ_i is [θ_i | θ_(−i), x], whose closed form is often known. Let us assume that we know all the complete conditional laws [θ_i | θ_(−i), x], i ∈ {1, ..., n}, and take arbitrary initial values θ^(0) = (θ_1^(0), ..., θ_n^(0)). The Gibbs sampling algorithm consists in successively sampling

θ_1^(j) from [θ_1 | θ_2^(j−1), θ_3^(j−1), ..., θ_n^(j−1), x],
θ_2^(j) from [θ_2 | θ_1^(j), θ_3^(j−1), ..., θ_n^(j−1), x],
...,
θ_n^(j) from [θ_n | θ_1^(j), θ_2^(j), ..., θ_{n−1}^(j), x],

for j ∈ {1, ..., N + M}, where M is a number of iterations that will be discarded. These are called burn-in iterations, and correspond to the time before convergence.
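This scheme can be sketched for the normal mixture, using the standard complete conditional laws (those listed next in the text). This is a minimal, stdlib-only illustration, not the authors' code; the function name, initialization, and tuning values are our own assumptions.

```python
import math
import random

def gibbs_mixture(x, K, hyper, n_iter=200, burn_in=100, seed=0):
    """Gibbs sampler for a univariate K-component normal mixture under the
    conjugate prior of Section 2; returns post-burn-in draws of (pi, mu, sigma2)."""
    rng = random.Random(seed)
    a, c, m, alpha, beta = (hyper[k] for k in ("a", "c", "m", "alpha", "beta"))
    T = len(x)
    # Arbitrary initial values: means spread over the data range, common variance.
    mu = [min(x) + (max(x) - min(x)) * (k + 1) / (K + 1) for k in range(K)]
    xbar = sum(x) / T
    sigma2 = [sum((xi - xbar) ** 2 for xi in x) / T] * K
    pi = [1.0 / K] * K
    draws = []
    for it in range(n_iter):
        # 1. Allocations: [z_i = k | ...] proportional to pi_k N(x_i | mu_k, sigma_k).
        z = []
        for xi in x:
            w = [pi[k] * math.exp(-0.5 * (xi - mu[k]) ** 2 / sigma2[k])
                 / math.sqrt(sigma2[k]) for k in range(K)]  # up to a constant
            u, cum, zi = rng.random() * sum(w), 0.0, K - 1
            for k in range(K):
                cum += w[k]
                if u <= cum:
                    zi = k
                    break
            z.append(zi)
        n = [z.count(k) for k in range(K)]
        s = [sum(x[i] for i in range(T) if z[i] == k) for k in range(K)]
        # 2. Weights: pi | ... ~ Di(a_1 + n_1, ..., a_K + n_K), via Gamma draws.
        g = [rng.gammavariate(a[k] + n[k], 1.0) for k in range(K)]
        pi = [gk / sum(g) for gk in g]
        # 3. Means and variances, component by component.
        for k in range(K):
            post_mean = (m[k] * c[k] + s[k]) / (c[k] + n[k])
            mu[k] = rng.gauss(post_mean, math.sqrt(sigma2[k] / (c[k] + n[k])))
            shape = alpha[k] + (n[k] + 1) / 2.0
            rate = beta[k] + 0.5 * (c[k] * (mu[k] - m[k]) ** 2
                   + sum((x[i] - mu[k]) ** 2 for i in range(T) if z[i] == k))
            sigma2[k] = rate / rng.gammavariate(shape, 1.0)  # Inverse-Gamma draw
        if it >= burn_in:
            draws.append((list(pi), list(mu), list(sigma2)))
    return draws
```

On well-separated synthetic data, the post-burn-in means of the mu draws recover the component locations; on real data, label switching (Section 5) must be handled before reading the draws component-wise.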
It can be shown that θ^(M) = (θ_1^(M), ..., θ_n^(M)) converges in law to the joint posterior distribution [θ_1, ..., θ_n | x]. The following N values are then considered as a sample from this distribution, and can be used to approximate F(θ) by an empirical mean, as mentioned above. In our model, we have the following complete conditional laws (to simplify the notations, the list of all parameters but θ is denoted (θ)):

z_i | (z_i), x ~ Mu(p_{i1}, ..., p_{iK}),  with p_{ik} proportional to π_k N(x_i | μ_k, σ_k),
π | (π), x ~ Di(a_1 + n_1, ..., a_K + n_K),  where n_k = #{i : z_i = k},
μ_k | (μ_k), x ~ N( (m_k c_k + s_k) / (c_k + n_k), σ_k / √(n_k + c_k) ),  where s_k = Σ_{i : z_i = k} x_i,
σ_k² | (σ_k), x ~ IG( α_k + (n_k + 1)/2, β_k + (1/2) ( c_k (μ_k − m_k)² + Σ_{i : z_i = k} (x_i − μ_k)² ) ).

Now the sampling can take place, for example with M = 5000 burn-in iterations. Convergence issues are discussed below. Details on the Metropolis-Hastings algorithm and variants of the Gibbs sampler can be found in Droesbeke et al. (2002), in Richardson and Green (1997), or in Marin et al. (2004).

4. Convergence Issues
Since the historically first convergence diagnostic (known as the "thick pen" one, Gelfand and Hills (1990)), two main kinds of methods have existed to determine whether the sampler has reached convergence or not, i.e. whether the values of the current generation can be considered as a sample from the target distribution: single-chain convergence diagnostics, and multi-chain ones. Single-chain diagnostics have been clearly advised against by Gelman and Rubin (1992). The principle of multi-chain diagnostics is to run multiple independent chains from very different starting points, and to test whether the last values of each chain come from the same distribution or not. If that is the case, we can assume that convergence has been reached. Gelman and Rubin (1992) have proposed a method which is often used, based on the comparison of within-chain and between-chain variances. At the beginning of the sampling, the chains are much influenced by their starting points, and the between-chain variance is well above the within-chain one. When each chain has reached the target distribution, the ratio of between-chain to within-chain variance should be around 1. This method has been much improved since then, and some more subtle tests are available. It is quite straightforward to implement (we do not detail it here, to save space), and has proven efficient in many cases. But here we encounter an important problem that makes it almost impossible to use: the label-switching problem.

5. The Label Switching Problem

A particularity of mixtures of laws is that the density of the observations, as well as the likelihood and the joint posterior density of the parameters (which is the target distribution of the Gibbs sampler), is symmetric, i.e. invariant under permutation of the components. Therefore, this last density has up to K! duplications of each mode, and the sampler can move freely from one mode to another, thus permuting the components.
The absence of label switching means that the sampler is stuck in a local mode, maybe because the modes are well separated (e.g. when K is very low). The space of parameters is then not completely explored, which is not sound. When estimating a function F(θ, π) which is itself invariant under permutation of the components (e.g. estimating the density at a given point), label switching is not a problem. But when trying to estimate the parameters of each component, this label switching has to be undone. Two approaches come to mind: imposing an identification constraint during the sampling, or post-processing the sampled values to undo the permutation. The first solution constrains the exploration of the parameter space by the sampler, and thus may alter the results; see Celeux and Robert (2000), Marin et al. (2004), and Stephens (2000). Forcing the prior to be highly separated (using very different hyperparameters for the components) is not a good idea either: label switching arises anyway, and the resulting lack of correspondence between prior and component would corrupt any interpretation based on the prior (such as Bayes factors). We applied here the second solution. The post-processing must be done very carefully: ordering on μ_k, or on σ_k, or on π_k, is not a good idea. Some components may be close in
mean but not in variance, and vice versa. Some methods use a loss function and clustering algorithms (e.g. K-means with K! classes) to determine to which mode (i.e. permutation) each sampled vector of parameters belongs. The reader is invited to consult the three references above for more details. Label switching is, alas, incompatible with convergence diagnostics such as those mentioned in Section 4: comparing the variance between chains is meaningless when components can swap! Moreover, clustering assumes that convergence has been reached; it would thus be nonsense to use variance-based diagnostics on post-processed samples.

6. Choosing a Model: Bayes Factors

Until now, we have worked with a given number K of components. The question is now to choose between different models. Let M = {M_1, ..., M_K} be a finite set of possible models (each one with k components, up to K; K = 7 in our application) to explain the observations x_i, parametrized by θ_k. The best model among the K possible ones is the one with the highest posterior probability. The posterior probability of model M_k is obtained via the Bayes formula:

[M_k | x] = [M_k] [x | M_k] / Σ_{M_k' ∈ M} [M_k'] [x | M_k'],

where [M_k] is a prior probability of the validity of M_k, with Σ_{M_k ∈ M} [M_k] = 1 (e.g. [M_k] = 1/K for all k), and [x | M_k], defined by [x | M_k] = ∫ [x | θ_k] [θ_k] dθ_k, is the prior predictive law of x under model M_k. Let us consider M_1 and M_2. The ratio of posterior probabilities, [M_2 | x] / [M_1 | x] = ([M_2] [x | M_2]) / ([M_1] [x | M_1]), is the posterior odds in favor of M_2 compared to M_1. The ratio B_21 = [x | M_2] / [x | M_1], which updates the prior odds [M_2] / [M_1], is called the Bayes factor of model M_2 relative to M_1. Kass and Raftery (1995) suggest a scale for interpreting Bayes factors, which can serve as a first indication but is far from being universal. The evaluation of Bayes factors requires integration, which can be done with MCMC methods.
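Once the marginal likelihoods [x | M_k] are estimated (by whatever method), turning them into posterior model probabilities is a direct application of the formula above. A small sketch, working on the log scale for numerical stability (the marginal-likelihood values in the example are illustrative, not from the application):

```python
import math

def posterior_model_probs(log_marglik, log_prior=None):
    """[M_k | x] = [M_k][x | M_k] / sum_k' [M_k'][x | M_k'], computed from
    log marginal likelihoods with the log-sum-exp trick."""
    K = len(log_marglik)
    if log_prior is None:                      # uniform prior [M_k] = 1/K
        log_prior = [-math.log(K)] * K
    log_post = [lm + lp for lm, lp in zip(log_marglik, log_prior)]
    mx = max(log_post)                         # subtract max before exponentiating
    w = [math.exp(lp - mx) for lp in log_post]
    return [wi / sum(w) for wi in w]

# The Bayes factor B_21 comes from the same quantities:
# B_21 = [x | M_2] / [x | M_1] = exp(log_marglik[1] - log_marglik[0]).
probs = posterior_model_probs([-1520.3, -1498.7, -1501.2])  # illustrative values
```

Working with log marginal likelihoods is essential in practice: for T = 1002 observations, the likelihoods themselves underflow double precision.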
Nevertheless, sampling from the prior distribution [θ_k] would not be efficient, because the prior is often very flat. Kass and Raftery (1995) advise using another method, based on samples from the posterior distribution [θ_k | x], making once more use of the Bayes formula. It implies continuing the Gibbs sampler, setting each parameter one after the other to its estimate, as described in Chib (1995) and Carlin and Chib (1995). There too, label switching is a problem, and that is why other methods, such as reversible jump or birth-death processes, are more and more often used, as in Stephens (2000).

7. Possible Improvements, Conclusion

We have tried here to illustrate some basic techniques of MCMC Bayesian analysis. They are quite straightforward to implement, and easy to use in a first approach of the problem. Nevertheless, they suffer from some particular drawbacks, such as label switching, which hinders convergence diagnostics and even the straightforward estimation of the parameters of interest. That is why many more efficient developments have been achieved in recent years. However, this kind of algorithm still deals better with this real-case dataset than the other algorithms we have tried on it (such as the EM algorithm).
Many points in our work remain to be improved: the hypothesis of normality should be reconsidered (log-normality, maybe), as well as the choice of the prior, the sensitivity analysis, and the convergence diagnostics. This illustrates the difficulties that practitioners may face before benefiting from the powerful tools that MCMC Bayesian methods are.

The authors would like to thank Philippe Girard (Nestlé Research Center, Lausanne, Switzerland), Eric Parent (ENGREF, Paris, France), and Luc Perrault (Hydro-Québec, Varennes, Québec, Canada) for their help and support.

Bibliography

[1] B.P. Carlin and S. Chib (1995). Bayesian model choice via Markov chain Monte Carlo methods. J. R. Statist. Soc. B, 57.
[2] G. Celeux, M. Hurn, and C. Robert (2000). Computational and inferential difficulties with mixture posterior distributions. J. Am. Stat. Assoc., 95.
[3] S. Chib (1995). Marginal likelihood from the Gibbs output. J. Am. Stat. Assoc., 90.
[4] J. Diebolt and C.P. Robert (1990). Bayesian estimation of finite mixture distributions, Part I: theoretical aspects. Technical Report 110, LSTA, Université Paris VI.
[5] J.J. Droesbeke, J. Fine, and G. Saporta, editors (2002). Méthodes Bayésiennes en statistique. Technip.
[6] A.E. Gelfand, S.E. Hills, A. Racine-Poon, and A.F.M. Smith (1990). Illustration of Bayesian inference in normal data models using Gibbs sampling. J. Am. Stat. Assoc., 85.
[7] A. Gelman and D.B. Rubin (1992). Inference from iterative simulation using multiple sequences (with discussion). Statistical Science, 7.
[8] A. Gelman and D.B. Rubin (1992). A single series from the Gibbs sampler provides a false sense of security (with discussion). In J.M. Bernardo, J.O. Berger, A.P. Dawid, and A.F.M. Smith, editors, Bayesian Statistics 4. Oxford University Press.
[9] A. Gelman, J.B. Carlin, H.S. Stern, and D.B. Rubin (2003). Bayesian Data Analysis. Chapman and Hall.
[10] R.E. Kass and A.E. Raftery (1995). Bayes factors. J. Am. Stat. Assoc., 90.
[11] J.M. Marin, K. Mengersen, and C.P. Robert (to appear). Bayesian modelling and inference on mixtures of distributions. In D. Dey and C.R. Rao, editors, Handbook of Statistics 25. Elsevier-Sciences.
[12] S. Richardson and P.J. Green (1997). On Bayesian analysis of mixtures with an unknown number of components. J. R. Statist. Soc. B, 59(4).
[13] M. Stephens (1997). Bayesian Methods for Mixtures of Normal Distributions. PhD thesis, Magdalen College, Oxford.
[14] M. Stephens (2000). Bayesian analysis of mixtures with an unknown number of components: an alternative to reversible jump methods. Annals of Statistics.
[15] M. Stephens (2000). Dealing with label switching in mixture models. J. R. Stat. Soc. B, 62.
Christian P. Robert The Bayesian Choice From Decision-Theoretic Foundations to Computational Implementation Second Edition With 23 Illustrations ^Springer" Contents Preface to the Second Edition Preface
More informationSTAT 425: Introduction to Bayesian Analysis
STAT 425: Introduction to Bayesian Analysis Marina Vannucci Rice University, USA Fall 2017 Marina Vannucci (Rice University, USA) Bayesian Analysis (Part 2) Fall 2017 1 / 19 Part 2: Markov chain Monte
More informationSubmitted to the Brazilian Journal of Probability and Statistics
Submitted to the Brazilian Journal of Probability and Statistics A Bayesian sparse finite mixture model for clustering data from a heterogeneous population Erlandson F. Saraiva a, Adriano K. Suzuki b and
More informationBayesian Phylogenetics:
Bayesian Phylogenetics: an introduction Marc A. Suchard msuchard@ucla.edu UCLA Who is this man? How sure are you? The one true tree? Methods we ve learned so far try to find a single tree that best describes
More informationBLIND SEPARATION OF TEMPORALLY CORRELATED SOURCES USING A QUASI MAXIMUM LIKELIHOOD APPROACH
BLID SEPARATIO OF TEMPORALLY CORRELATED SOURCES USIG A QUASI MAXIMUM LIKELIHOOD APPROACH Shahram HOSSEII, Christian JUTTE Laboratoire des Images et des Signaux (LIS, Avenue Félix Viallet Grenoble, France.
More informationLecture 5: Spatial probit models. James P. LeSage University of Toledo Department of Economics Toledo, OH
Lecture 5: Spatial probit models James P. LeSage University of Toledo Department of Economics Toledo, OH 43606 jlesage@spatial-econometrics.com March 2004 1 A Bayesian spatial probit model with individual
More informationSAMPLING ALGORITHMS. In general. Inference in Bayesian models
SAMPLING ALGORITHMS SAMPLING ALGORITHMS In general A sampling algorithm is an algorithm that outputs samples x 1, x 2,... from a given distribution P or density p. Sampling algorithms can for example be
More informationBAYESIAN METHODS FOR VARIABLE SELECTION WITH APPLICATIONS TO HIGH-DIMENSIONAL DATA
BAYESIAN METHODS FOR VARIABLE SELECTION WITH APPLICATIONS TO HIGH-DIMENSIONAL DATA Intro: Course Outline and Brief Intro to Marina Vannucci Rice University, USA PASI-CIMAT 04/28-30/2010 Marina Vannucci
More informationBayesian nonparametrics
Bayesian nonparametrics 1 Some preliminaries 1.1 de Finetti s theorem We will start our discussion with this foundational theorem. We will assume throughout all variables are defined on the probability
More informationSTAT 499/962 Topics in Statistics Bayesian Inference and Decision Theory Jan 2018, Handout 01
STAT 499/962 Topics in Statistics Bayesian Inference and Decision Theory Jan 2018, Handout 01 Nasser Sadeghkhani a.sadeghkhani@queensu.ca There are two main schools to statistical inference: 1-frequentist
More informationMH I. Metropolis-Hastings (MH) algorithm is the most popular method of getting dependent samples from a probability distribution
MH I Metropolis-Hastings (MH) algorithm is the most popular method of getting dependent samples from a probability distribution a lot of Bayesian mehods rely on the use of MH algorithm and it s famous
More informationHeriot-Watt University
Heriot-Watt University Heriot-Watt University Research Gateway Prediction of settlement delay in critical illness insurance claims by using the generalized beta of the second kind distribution Dodd, Erengul;
More informationStat 516, Homework 1
Stat 516, Homework 1 Due date: October 7 1. Consider an urn with n distinct balls numbered 1,..., n. We sample balls from the urn with replacement. Let N be the number of draws until we encounter a ball
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 7 Approximate
More informationSupplement to A Hierarchical Approach for Fitting Curves to Response Time Measurements
Supplement to A Hierarchical Approach for Fitting Curves to Response Time Measurements Jeffrey N. Rouder Francis Tuerlinckx Paul L. Speckman Jun Lu & Pablo Gomez May 4 008 1 The Weibull regression model
More informationBayes: All uncertainty is described using probability.
Bayes: All uncertainty is described using probability. Let w be the data and θ be any unknown quantities. Likelihood. The probability model π(w θ) has θ fixed and w varying. The likelihood L(θ; w) is π(w
More informationItem Parameter Calibration of LSAT Items Using MCMC Approximation of Bayes Posterior Distributions
R U T C O R R E S E A R C H R E P O R T Item Parameter Calibration of LSAT Items Using MCMC Approximation of Bayes Posterior Distributions Douglas H. Jones a Mikhail Nediak b RRR 7-2, February, 2! " ##$%#&
More informationIntroduction to Markov Chain Monte Carlo & Gibbs Sampling
Introduction to Markov Chain Monte Carlo & Gibbs Sampling Prof. Nicholas Zabaras Sibley School of Mechanical and Aerospace Engineering 101 Frank H. T. Rhodes Hall Ithaca, NY 14853-3801 Email: zabaras@cornell.edu
More informationProbabilistic Graphical Models Lecture 17: Markov chain Monte Carlo
Probabilistic Graphical Models Lecture 17: Markov chain Monte Carlo Andrew Gordon Wilson www.cs.cmu.edu/~andrewgw Carnegie Mellon University March 18, 2015 1 / 45 Resources and Attribution Image credits,
More informationPenalized Loss functions for Bayesian Model Choice
Penalized Loss functions for Bayesian Model Choice Martyn International Agency for Research on Cancer Lyon, France 13 November 2009 The pure approach For a Bayesian purist, all uncertainty is represented
More informationStatistical Inference for Stochastic Epidemic Models
Statistical Inference for Stochastic Epidemic Models George Streftaris 1 and Gavin J. Gibson 1 1 Department of Actuarial Mathematics & Statistics, Heriot-Watt University, Riccarton, Edinburgh EH14 4AS,
More informationA Search and Jump Algorithm for Markov Chain Monte Carlo Sampling. Christopher Jennison. Adriana Ibrahim. Seminar at University of Kuwait
A Search and Jump Algorithm for Markov Chain Monte Carlo Sampling Christopher Jennison Department of Mathematical Sciences, University of Bath, UK http://people.bath.ac.uk/mascj Adriana Ibrahim Institute
More informationThe Mixture Approach for Simulating New Families of Bivariate Distributions with Specified Correlations
The Mixture Approach for Simulating New Families of Bivariate Distributions with Specified Correlations John R. Michael, Significance, Inc. and William R. Schucany, Southern Methodist University The mixture
More informationModel-based cluster analysis: a Defence. Gilles Celeux Inria Futurs
Model-based cluster analysis: a Defence Gilles Celeux Inria Futurs Model-based cluster analysis Model-based clustering (MBC) consists of assuming that the data come from a source with several subpopulations.
More informationMarkov-Chain Monte Carlo
Markov-Chain Monte Carlo CSE586 Computer Vision II Spring 2010, Penn State Univ. References Recall: Sampling Motivation If we can generate random samples x i from a given distribution P(x), then we can
More informationGibbs Sampling for (Coupled) Infinite Mixture Models in the Stick Breaking Representation
Gibbs Sampling for (Coupled) Infinite Mixture Models in the Stick Breaking Representation Ian Porteous, Alex Ihler, Padhraic Smyth, Max Welling Department of Computer Science UC Irvine, Irvine CA 92697-3425
More informationOutline. Binomial, Multinomial, Normal, Beta, Dirichlet. Posterior mean, MAP, credible interval, posterior distribution
Outline A short review on Bayesian analysis. Binomial, Multinomial, Normal, Beta, Dirichlet Posterior mean, MAP, credible interval, posterior distribution Gibbs sampling Revisit the Gaussian mixture model
More information16 : Approximate Inference: Markov Chain Monte Carlo
10-708: Probabilistic Graphical Models 10-708, Spring 2017 16 : Approximate Inference: Markov Chain Monte Carlo Lecturer: Eric P. Xing Scribes: Yuan Yang, Chao-Ming Yen 1 Introduction As the target distribution
More informationBayesian inference for multivariate extreme value distributions
Bayesian inference for multivariate extreme value distributions Sebastian Engelke Clément Dombry, Marco Oesting Toronto, Fields Institute, May 4th, 2016 Main motivation For a parametric model Z F θ of
More informationBayesian Inference and Decision Theory
Bayesian Inference and Decision Theory Instructor: Kathryn Blackmond Laskey Room 2214 ENGR (703) 993-1644 Office Hours: Tuesday and Thursday 4:30-5:30 PM, or by appointment Spring 2018 Unit 6: Gibbs Sampling
More informationPOSTERIOR ANALYSIS OF THE MULTIPLICATIVE HETEROSCEDASTICITY MODEL
COMMUN. STATIST. THEORY METH., 30(5), 855 874 (2001) POSTERIOR ANALYSIS OF THE MULTIPLICATIVE HETEROSCEDASTICITY MODEL Hisashi Tanizaki and Xingyuan Zhang Faculty of Economics, Kobe University, Kobe 657-8501,
More informationWhat is the most likely year in which the change occurred? Did the rate of disasters increase or decrease after the change-point?
Chapter 11 Markov Chain Monte Carlo Methods 11.1 Introduction In many applications of statistical modeling, the data analyst would like to use a more complex model for a data set, but is forced to resort
More informationMarkov Chain Monte Carlo (MCMC)
Markov Chain Monte Carlo (MCMC Dependent Sampling Suppose we wish to sample from a density π, and we can evaluate π as a function but have no means to directly generate a sample. Rejection sampling can
More informationMonte Carlo Integration using Importance Sampling and Gibbs Sampling
Monte Carlo Integration using Importance Sampling and Gibbs Sampling Wolfgang Hörmann and Josef Leydold Department of Statistics University of Economics and Business Administration Vienna Austria hormannw@boun.edu.tr
More informationPrinciples of Bayesian Inference
Principles of Bayesian Inference Sudipto Banerjee University of Minnesota July 20th, 2008 1 Bayesian Principles Classical statistics: model parameters are fixed and unknown. A Bayesian thinks of parameters
More informationDoing Bayesian Integrals
ASTR509-13 Doing Bayesian Integrals The Reverend Thomas Bayes (c.1702 1761) Philosopher, theologian, mathematician Presbyterian (non-conformist) minister Tunbridge Wells, UK Elected FRS, perhaps due to
More informationST 740: Markov Chain Monte Carlo
ST 740: Markov Chain Monte Carlo Alyson Wilson Department of Statistics North Carolina State University October 14, 2012 A. Wilson (NCSU Stsatistics) MCMC October 14, 2012 1 / 20 Convergence Diagnostics:
More informationStat 451 Lecture Notes Markov Chain Monte Carlo. Ryan Martin UIC
Stat 451 Lecture Notes 07 12 Markov Chain Monte Carlo Ryan Martin UIC www.math.uic.edu/~rgmartin 1 Based on Chapters 8 9 in Givens & Hoeting, Chapters 25 27 in Lange 2 Updated: April 4, 2016 1 / 42 Outline
More informationA generalization of the Multiple-try Metropolis algorithm for Bayesian estimation and model selection
A generalization of the Multiple-try Metropolis algorithm for Bayesian estimation and model selection Silvia Pandolfi Francesco Bartolucci Nial Friel University of Perugia, IT University of Perugia, IT
More informationWeakly informative reparameterisations for location-scale mixtures
Weakly informative reparameterisations for location-scale mixtures arxiv:1601.01178v4 [stat.me] 31 Jul 2017 Kaniav Kamary Université Paris-Dauphine, CEREMADE, and INRIA, Saclay Jeong Eun Lee Auckland University
More informationCSC 2541: Bayesian Methods for Machine Learning
CSC 2541: Bayesian Methods for Machine Learning Radford M. Neal, University of Toronto, 2011 Lecture 4 Problem: Density Estimation We have observed data, y 1,..., y n, drawn independently from some unknown
More informationBayesian Sparse Correlated Factor Analysis
Bayesian Sparse Correlated Factor Analysis 1 Abstract In this paper, we propose a new sparse correlated factor model under a Bayesian framework that intended to model transcription factor regulation in
More informationMixture models. Mixture models MCMC approaches Label switching MCMC for variable dimension models. 5 Mixture models
5 MCMC approaches Label switching MCMC for variable dimension models 291/459 Missing variable models Complexity of a model may originate from the fact that some piece of information is missing Example
More informationBayesian Nonparametric Regression for Diabetes Deaths
Bayesian Nonparametric Regression for Diabetes Deaths Brian M. Hartman PhD Student, 2010 Texas A&M University College Station, TX, USA David B. Dahl Assistant Professor Texas A&M University College Station,
More informationStat 535 C - Statistical Computing & Monte Carlo Methods. Lecture February Arnaud Doucet
Stat 535 C - Statistical Computing & Monte Carlo Methods Lecture 13-28 February 2006 Arnaud Doucet Email: arnaud@cs.ubc.ca 1 1.1 Outline Limitations of Gibbs sampling. Metropolis-Hastings algorithm. Proof
More informationAnalysis of the Gibbs sampler for a model. related to James-Stein estimators. Jeffrey S. Rosenthal*
Analysis of the Gibbs sampler for a model related to James-Stein estimators by Jeffrey S. Rosenthal* Department of Statistics University of Toronto Toronto, Ontario Canada M5S 1A1 Phone: 416 978-4594.
More informationBayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2016
Bayesian Networks: Construction, Inference, Learning and Causal Interpretation Volker Tresp Summer 2016 1 Introduction So far we were mostly concerned with supervised learning: we predicted one or several
More informationBayesian Methods in Multilevel Regression
Bayesian Methods in Multilevel Regression Joop Hox MuLOG, 15 september 2000 mcmc What is Statistics?! Statistics is about uncertainty To err is human, to forgive divine, but to include errors in your design
More informationBayesian Inference for Discretely Sampled Diffusion Processes: A New MCMC Based Approach to Inference
Bayesian Inference for Discretely Sampled Diffusion Processes: A New MCMC Based Approach to Inference Osnat Stramer 1 and Matthew Bognar 1 Department of Statistics and Actuarial Science, University of
More informationAdvanced Statistical Modelling
Markov chain Monte Carlo (MCMC) Methods and Their Applications in Bayesian Statistics School of Technology and Business Studies/Statistics Dalarna University Borlänge, Sweden. Feb. 05, 2014. Outlines 1
More informationLecture 6: Markov Chain Monte Carlo
Lecture 6: Markov Chain Monte Carlo D. Jason Koskinen koskinen@nbi.ku.dk Photo by Howard Jackman University of Copenhagen Advanced Methods in Applied Statistics Feb - Apr 2016 Niels Bohr Institute 2 Outline
More informationBayesian time series classification
Bayesian time series classification Peter Sykacek Department of Engineering Science University of Oxford Oxford, OX 3PJ, UK psyk@robots.ox.ac.uk Stephen Roberts Department of Engineering Science University
More informationStatistical Machine Learning Lecture 8: Markov Chain Monte Carlo Sampling
1 / 27 Statistical Machine Learning Lecture 8: Markov Chain Monte Carlo Sampling Melih Kandemir Özyeğin University, İstanbul, Turkey 2 / 27 Monte Carlo Integration The big question : Evaluate E p(z) [f(z)]
More informationVariational Bayesian Dirichlet-Multinomial Allocation for Exponential Family Mixtures
17th Europ. Conf. on Machine Learning, Berlin, Germany, 2006. Variational Bayesian Dirichlet-Multinomial Allocation for Exponential Family Mixtures Shipeng Yu 1,2, Kai Yu 2, Volker Tresp 2, and Hans-Peter
More informationIntroduction to Bayesian Methods
Introduction to Bayesian Methods Jessi Cisewski Department of Statistics Yale University Sagan Summer Workshop 2016 Our goal: introduction to Bayesian methods Likelihoods Priors: conjugate priors, non-informative
More information