Approximating mixture distributions using finite numbers of components
1 Approximating mixture distributions using finite numbers of components
Christian Röver and Tim Friede
Department of Medical Statistics, University Medical Center Göttingen
March 17, 2016
This project has received funding from the European Union's Seventh Framework Programme for research, technological development and demonstration under grant agreement number FP HEALTH.
2 Overview
- meta-analysis example
- general problem: mixture distributions
- discrete grid approximations
- design strategy & algorithm
- application to the meta-analysis problem
3-6 Meta analysis / Context: random-effects meta-analysis (build slides)
[Figure: forest plot of trials # 1 through # 7.]
- have: estimates y_i and standard errors σ_i
- want: combined estimate Θ̂
- nuisance parameter: the between-trial heterogeneity τ
7 Meta analysis / Context: random-effects meta-analysis
[Figure: forest plot with the marginal posterior p(θ) shown for the combined effect Θ.]
estimation: via the marginal posterior distribution of the parameter θ
8 Meta analysis / Context: random-effects meta-analysis
[Figure: joint posterior contours (50%, 90%, 95%, 99%) with the marginal posteriors p(τ) and p(θ).]
two parameters: parameter estimation with two unknowns works via joint & marginal posterior distributions
9 Meta analysis / Context: random-effects meta-analysis
here it is easy to derive one of the marginals, p(τ | y), and the conditional posteriors p(θ | τ, y):
  p(τ | y) = ... (some function of the y_i, σ_i, ...)
  p(θ | τ, y) = Normal(μ = f_1(τ), σ = f_2(τ))
but the main interest is in the other marginal, p(θ | y):
  p(θ | y) = ∫ p(θ, τ | y) dτ            (integrating out τ from the joint)
           = ∫ p(θ | τ, y) p(τ | y) dτ   (conditional × marginal)
⇒ p(θ | y) is a mixture distribution
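As a numerical illustration of this identity (not part of the slides): in R, such a mixture marginal can be evaluated by integrating the normal conditional against the mixing density. The particular choices of f1, f2 and the half-normal stand-in for p(τ | y) below are made up for the sketch, not the meta-analysis formulas.

f1 <- function(tau) 0.2                 # conditional mean mu = f1(tau) (toy choice)
f2 <- function(tau) sqrt(0.1 + tau^2)   # conditional sd sigma = f2(tau) (toy choice)
dtau <- function(tau) 2 * dnorm(tau, 0, 0.5)   # half-normal stand-in for p(tau | y)
# p(theta | y) = integral over tau of p(theta | tau, y) * p(tau | y):
dtheta <- function(theta) sapply(theta, function(t)
  integrate(function(tau) dnorm(t, f1(tau), f2(tau)) * dtau(tau), 0, Inf)$value)
dtheta(c(-0.5, 0.2, 1.0))   # mixture density evaluated at a few points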
10 Meta analysis / Context: random-effects meta-analysis
[Figure: the marginal posterior p(τ), the conditional posteriors p(θ | τ_i), and the marginal posterior p(θ | y).]
11 Meta analysis / Context: random-effects meta-analysis
[Figure: as before, with the conditional mean and the conditional mean ± sd traced as functions of τ.]
12 Meta analysis / Context: random-effects meta-analysis
[Figure: a single value τ_1 marked on the marginal posterior p(τ), the corresponding conditional posterior p(θ | τ = τ_1), and the marginal posterior p(θ | y).]
13-14 Meta analysis / Context: random-effects meta-analysis (build slides)
[Figure: four values τ_1, ..., τ_4 marked on the marginal posterior p(τ), the corresponding conditional posteriors p(θ | τ = τ_i), and the marginal posterior p(θ | y).]
15 Mixture distributions / The general problem
mixture distribution:
- a convex combination of component distributions
- a distribution whose parameters are random variables
a (conditional) distribution with density p(y | x), where the parameter x follows a distribution p(x); the marginal / mixture is
  p(y) = ∫_X p(y | x) dp(x)
or, for discrete x:
  p(y) = Σ_i p(y | x_i) p(x_i)
ubiquitous in many applications:
- Student-t distribution
- negative binomial distribution
- marginal distributions
- convolutions
- ...
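To make the Student-t example concrete (a standard identity, added here for illustration, not from the slides): a t distribution arises by mixing normals over a Gamma-distributed precision, which is easy to check by simulation in R.

# Student-t as a scale mixture of normals: if the precision
# lambda ~ Gamma(nu/2, rate = nu/2) and y | lambda ~ N(0, 1/lambda),
# then marginally y ~ t with nu degrees of freedom
set.seed(1)
nu <- 5; n <- 1e5
lambda <- rgamma(n, shape = nu / 2, rate = nu / 2)
y <- rnorm(n, mean = 0, sd = 1 / sqrt(lambda))
rbind(empirical   = quantile(y, c(0.05, 0.50, 0.95)),
      theoretical = qt(c(0.05, 0.50, 0.95), df = nu))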
16 Mixture distributions / How to approximate?
approximating the continuous mixture through a discrete set of points in τ:
  actual marginal:  p(θ) = ∫ p(θ | τ) p(τ) dτ
  approximation:    p(θ) ≈ Σ_i p(θ | τ_i) π_i
Questions:
- how to set up the discrete grid of points?
- how well can we approximate?
- do we have a handle on accuracy?
17 Mixture distributions / Motivation: discretizing a mixture
[Figure: the conditional posteriors p(θ | τ_i) for four values τ_1, ..., τ_4.]
Note: the conditional distributions p(θ | τ, y) are very different for τ_1 and τ_2, and rather similar for τ_3 and τ_4.
idea: may need fewer bins for larger τ values...?
... bin spacing based on similarity / dissimilarity of the conditionals?
18 Discretizing mixture distributions / Setting up a binning
need: a discretization of the mixing distribution p(x); domain of X: ℝ (or a subset)
define bin margins x_(1) < x_(2) < ... < x_(k-1) and bins
  X_i = {x : x ≤ x_(1)}             if i = 1
        {x : x_(i-1) < x ≤ x_(i)}   if 1 < i < k
        {x : x_(k-1) < x}           if i = k
reference points: x̃_1, ..., x̃_k, where x̃_i ∈ X_i
bin probabilities: π_i = P(x_(i-1) < X ≤ x_(i)) = P(X ∈ X_i)
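In code, the bin probabilities are simply differences of the mixing distribution's CDF at the margins; a minimal R illustration, with a made-up half-normal mixing distribution (the margins anticipate the numbers on slide 20, but the scale is arbitrary):

pmix <- function(x) 2 * pnorm(x, mean = 0, sd = 12) - 1   # CDF of a half-normal (toy)
margins <- c(10, 20, 30)                                  # x_(1), x_(2), x_(3)
pi.w <- diff(c(0, pmix(margins), 1))                      # one probability per bin
sum(pi.w)   # the bins cover the whole domain, so the weights sum to 1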
19 Discretizing mixture distributions / Setting up a binned mixture
actual distribution: p(x, y); discrete approximation: q(x, y)
same marginal (mixing distribution): q(x) = p(x), but binned conditionals:
  q(y | x) = p(y | x = x̃_i) for x ∈ X_i
q is similar to p, but instead of conditioning on the exact x, it conditions on the corresponding bin's reference point x̃_i
marginal:
  q(y) = ∫ q(y | x) q(x) dx = Σ_i π_i p(y | x̃_i)
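Evaluated at a point, the approximate marginal is a finite weighted sum; a small R helper along these lines, assuming normal conditionals as in the meta-analysis setting (function and argument names are mine):

# q(y) = sum_i pi_i * p(y | x.ref_i), for normal conditionals with
# (vectorized) mean function mu(x) and standard-deviation function sdev(x)
dbinned <- function(y, x.ref, pi.w, mu, sdev)
  sapply(y, function(yy) sum(pi.w * dnorm(yy, mu(x.ref), sdev(x.ref))))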
20 Discretizing mixture distributions / Setting up a binned mixture
in the previous example:
- bin margins: τ_(1) = 10, τ_(2) = 20, τ_(3) = 30
- reference points: τ̃_1 = 5, τ̃_2 = 15, τ̃_3 = 25, τ̃_4 = 35
- probabilities: π_1 = 0.34, π_2 = 0.44, π_3 = 0.15, π_4 = 0.07
[Figure: the resulting binned approximation of the marginal posterior p(θ | y).]
21 Similarity / dissimilarity of distributions / Kullback-Leibler divergence
The Kullback-Leibler divergence of two distributions with density functions p and q is defined as
  D_KL(p(θ) ‖ q(θ)) = ∫_Θ log( p(θ) / q(θ) ) p(θ) dθ = E_p[ log( p(θ) / q(θ) ) ]
The symmetrized KL divergence of two distributions is defined as
  D_s(p(θ) ‖ q(θ)) = D_KL(p(θ) ‖ q(θ)) + D_KL(q(θ) ‖ p(θ))
the symmetrized KL divergence...
- is symmetric: D_s(p(θ) ‖ q(θ)) = D_s(q(θ) ‖ p(θ))
- is non-negative: D_s(p(θ) ‖ q(θ)) ≥ 0
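For normal distributions, which is all that is needed for the conditionals p(θ | τ, y) here, both KL terms have a closed form; the following R helper (my own naming, added for illustration) also previews the interpretation on the next slide:

# symmetrized KL divergence between N(m1, s1^2) and N(m2, s2^2), using
# D_KL(N1 || N2) = log(s2/s1) + (s1^2 + (m1 - m2)^2) / (2 * s2^2) - 1/2
symKL <- function(m1, s1, m2, s2) {
  kl <- function(ma, sa, mb, sb)
    log(sb / sa) + (sa^2 + (ma - mb)^2) / (2 * sb^2) - 0.5
  kl(m1, s1, m2, s2) + kl(m2, s2, m1, s1)
}
symKL(0, 1, 0, 1)     # identical distributions: divergence 0
symKL(0, 1, 0.1, 1)   # two nearby unit normals: divergence 0.01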
22 Divergence / Interpretation
How to interpret divergences?
- a measure of discrepancy between distributions
- heuristically: the expected log-ratio of densities
- the relevant case here: p(x) ≈ q(x)
- D_KL(p(x) ‖ q(x)) = 0 for p = q
- D_KL(p(x) ‖ q(x)) = 0.01 corresponds to an (expected) 1% difference in densities
23 Divergence / Bin-wise maximum divergence: definition
consider: the divergence between the reference point and the other points within each bin
define
  d_i = max_{x ∈ X_i} { D_s(p(y | x) ‖ p(y | x̃_i)) } = max_{x ∈ X_i} { D_s(p(y | x) ‖ q(y | x)) }
the bin-wise maximum divergence is the worst-case discrepancy introduced within each bin
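Numerically, each d_i is a one-dimensional maximization over the bin; a minimal R sketch, reusing the symKL() helper from above and again assuming normal conditionals:

# worst-case divergence d_i within the bin (lo, hi], relative to the
# reference point x.ref; the maximum is usually attained at a bin margin
binMaxDiv <- function(lo, hi, x.ref, mu, sdev)
  optimize(function(x) symKL(mu(x.ref), sdev(x.ref), mu(x), sdev(x)),
           interval = c(lo, hi), maximum = TRUE)$objective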
24 Divergence / Bin-wise maximum divergence: example
[Figure: the reference points τ̃_1, ..., τ̃_4 and margins τ_(1), τ_(2), τ_(3) from the earlier example.]
recall: the actual parameters of the conditionals p(y | x) (in black) vs. the parameters of q(y | x) assumed through the binning (in red)
25-28 Divergence / Bin-wise maximum divergence: example (build slides)
[Figure: within the second bin (between τ_(1) and τ_(2), reference point τ̃_2), the divergence D_s(p(θ | τ) ‖ p(θ | τ̃_2)) is traced as τ moves across the bin, alongside the conditional density p(θ | τ), until its maximum is reached.]
determine the maximum d_i for each bin i (usually attained at a bin margin)
29 Bounding divergence / Idea
now consider the divergence of the true and approximate marginals p(y) and q(y) (not the conditionals!): what about D_s(p(y) ‖ q(y))?
having the individual bin-wise divergences d_i, we can show:
  D_s(p(y) ‖ q(y)) ≤ Σ_i π_i d_i ≤ max_i d_i
in other words: by bounding the bin-wise divergences (of the conditionals) we can bound the overall divergence (of the marginals)
⇒ the DIRECT (Divergence Restricted Conditional Tessellation) method
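The first inequality can be traced to the joint convexity of the KL divergence (and hence of D_s); the following derivation is my own sketch, not taken from the slides. Write both marginals as mixtures over the bins, with bin-averaged components p_i(y) = π_i^{-1} ∫_{X_i} p(y | x) dp(x):

D_s\bigl(p(y) \,\big\|\, q(y)\bigr)
  = D_s\Bigl(\sum_i \pi_i\, p_i(y) \,\Big\|\, \sum_i \pi_i\, p(y \mid \tilde{x}_i)\Bigr)
  \le \sum_i \pi_i\, D_s\bigl(p_i(y) \,\big\|\, p(y \mid \tilde{x}_i)\bigr)
  \le \sum_i \pi_i\, d_i ,

where the middle step is joint convexity, and the last step uses convexity of D_s in its first argument together with the definition of d_i; the final bound Σ_i π_i d_i ≤ max_i d_i holds since the π_i sum to 1.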
30-38 Discretizing mixtures / Sequential DIRECT algorithm (build slides)
[Figure: the divergence D_s(p(θ | τ) ‖ p(θ | τ̃_i)) as a function of τ, with the threshold δ marked; reference points and bin margins are placed alternately from left to right.]
1st reference point τ̃_1 at zero; first margin τ_(1) where the divergence reaches δ; then the next reference point τ̃_2, the next margin τ_(2), then τ̃_3, τ_(3), τ̃_4, τ_(4), and so on.
result: a binning with bounded divergence (≤ δ) per bin (when to stop?)
39 Discretizing mixtures / Sequential DIRECT algorithm (variations possible)
1. Specify δ > 0, 0 ≤ ε ≤ 1, and a starting reference point x̃_1 (e.g. the minimum possible value, or the ε/2-quantile). Define ε_1 ≥ 0 as ε_1 := P(X ≤ x̃_1). Set i = 1.
2. Set x = x̃_1. Obviously, D_s(p(y | x̃_1) ‖ p(y | x)) = 0. Now increase x as far as possible while ensuring that D_s(p(y | x̃_1) ‖ p(y | x)) ≤ δ. Use this point as the first bin margin: x_(1) = x. Compute π_1 = P(X ≤ x_(1)). Set i = i + 1.
3. Increase x until D_s(p(y | x_(i-1)) ‖ p(y | x)) = δ. Use this point as the next reference point: x̃_i = x.
4. Increase x again until D_s(p(y | x̃_i) ‖ p(y | x)) = δ. Use this point as the next bin margin: x_(i) = x.
5. Compute the bin weight π_i = P(x_(i-1) < X ≤ x_(i)).
6. If P(X > x_(i)) > (ε − ε_1), set i = i + 1 and proceed at step 3. Otherwise stop.
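To make the recipe concrete, here is an illustrative R implementation for the case of normal conditionals p(y | x) = N(μ(x), σ(x)²), reusing symKL() from above. This is a sketch under my own naming and simplifications (divergence assumed monotone in x, a finite search cap in the far tail, weights renormalized over the kept bins), not the authors' reference implementation:

directGrid <- function(pmix, qmix, mu, sdev, delta = 0.01, epsilon = 0.001) {
  # divergence between the conditionals at x and at a reference point
  div <- function(x, ref) symKL(mu(ref), sdev(ref), mu(x), sdev(x))
  cap   <- qmix(1 - epsilon / 4)  # finite far-tail search limit (simplification)
  x.ref <- qmix(epsilon / 2)      # step 1: starting reference point
  eps1  <- pmix(x.ref)            # probability mass below the starting point
  refs  <- x.ref
  margins <- numeric(0)
  repeat {
    # steps 2 & 4: push x outward until the divergence from the current
    # reference point reaches delta; that point is the next bin margin
    if (div(cap, x.ref) <= delta) break  # remaining range all within delta
    m <- uniroot(function(x) div(x, x.ref) - delta,
                 lower = x.ref, upper = cap)$root
    margins <- c(margins, m)
    # step 6: stop once the remaining tail probability is within budget
    if ((1 - pmix(m)) <= (epsilon - eps1)) break
    # step 3: next reference point, a divergence of delta beyond the margin
    x.ref <- if (div(cap, m) > delta)
      uniroot(function(x) div(x, m) - delta, lower = m, upper = cap)$root
    else cap
    refs <- c(refs, x.ref)
  }
  # step 5: bin weights from CDF differences; any ignored tail mass
  # (at most epsilon) is simply renormalized away in this sketch
  w <- diff(c(0, pmix(margins), 1))[seq_along(refs)]
  list(reference = refs, weight = w / sum(w))
}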
40-41 Discretizing mixtures / General algorithm
remaining issue: the ignored ε > 0 tail probability (usually: problems at the domain's margins)
only need to keep track of the reference ("support") points x̃_i and probabilities π_i
meta-analysis example: 35 support points required (δ = 0.01, ε = 0.001)
[Figure: the resulting grid overlaid on the joint posterior contours (50%, 90%, 95%, 99%).]
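For illustration, a toy run of the directGrid() sketch above, with a made-up half-normal mixing distribution and conditionals N(0, √(1 + τ²)); none of these numbers correspond to the slides' example:

g <- directGrid(pmix = function(x) 2 * pnorm(x, 0, 0.5) - 1,   # half-normal CDF
                qmix = function(p) qnorm((p + 1) / 2, 0, 0.5), # its quantile fn
                mu   = function(tau) 0 * tau,                  # conditional mean
                sdev = function(tau) sqrt(1 + tau^2))          # conditional sd
length(g$reference)  # number of support points for delta = 0.01, epsilon = 0.001
# approximate marginal density of theta as a finite normal mixture:
dapprox <- function(theta) sapply(theta, function(t)
  sum(g$weight * dnorm(t, 0, sqrt(1 + g$reference^2))))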
42 Conclusions
- the approximation makes it possible to compute density, quantiles, moments, ...
- the algorithm yields a quick-and-easy solution
- need to specify the error budget in terms of divergence δ and tail probability ε
- also works for discrete distributions (p(x) or p(y | x))
- meta-analysis application: implemented in the bayesmeta R package
- general procedure: C. Röver, T. Friede. Discrete approximation of a mixture distribution via restricted divergence. (submitted for publication.)
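For the meta-analysis application, the package mentioned above can be called roughly as follows; this is a hypothetical sketch with argument names as I recall them from the bayesmeta documentation, so check ?bayesmeta before relying on it:

# hypothetical call sketch -- verify against the package documentation
library("bayesmeta")
ma <- bayesmeta(y     = c(-0.31, -0.17, 0.12),  # made-up effect estimates
                sigma = c(0.22, 0.18, 0.30))    # made-up standard errors
ma  # the returned object summarizes the (grid-approximated) posterior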