Approximating mixture distributions using finite numbers of components

1 Approximating mixture distributions using finite numbers of components
Christian Röver and Tim Friede, Department of Medical Statistics, University Medical Center Göttingen, March 17, 2016
This project has received funding from the European Union's Seventh Framework Programme for research, technological development and demonstration under grant agreement number FP HEALTH

2 Overview
meta-analysis example; general problem: mixture distributions; discrete grid approximations; design strategy & algorithm; application to the meta-analysis problem

3–6 Meta-analysis. Context: random-effects meta-analysis
[Figure, built up over several slides: forest plot of studies #1–#7 with a combined estimate]
have: estimates y_i and standard errors σ_i
want: combined estimate Θ̂
nuisance parameter: between-trial heterogeneity τ

7 Meta-analysis. Context: random-effects meta-analysis
[Figure: forest plot of studies #1–#7 with the marginal posterior p(θ)]
estimation: via the marginal posterior distribution of the parameter θ

8 Meta-analysis. Context: random-effects meta-analysis
[Figure: joint posterior of the two parameters with 50% / 90% / 95% / 99% contours, plus the marginal posteriors p(τ) and p(θ)]
parameter estimation with two unknowns: joint & marginal posterior distributions

9 Meta-analysis. Context: random-effects meta-analysis
here it is easy to derive one of the marginals, p(τ|y), and the conditional posteriors p(θ|τ,y):
p(τ|y) = ... (a function of the y_i, σ_i, ...)
p(θ|τ,y) = Normal(µ = f_1(τ), σ = f_2(τ))
but the main interest is in the other marginal, p(θ|y):
p(θ|y) = ∫ p(θ,τ|y) dτ  (the joint)  = ∫ p(θ|τ,y) p(τ|y) dτ  (conditional × marginal)
p(θ|y) is a mixture distribution
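
The following is a minimal numerical sketch (in R, not the authors' code) of evaluating such a mixture marginal by direct integration; the conditional-mean and conditional-sd functions f1, f2 and the τ-marginal ptau are hypothetical placeholders, not the formulas of the meta-analysis model.

```r
## Evaluate p(theta | y) = Integral p(theta | tau, y) p(tau | y) dtau numerically,
## for a made-up example: conditionals Normal(f1(tau), f2(tau)), and a
## half-normal marginal for tau >= 0.  All three functions are placeholders.
f1   <- function(tau) 0.2 / (1 + tau^2)        # hypothetical conditional mean
f2   <- function(tau) sqrt(0.1^2 + tau^2 / 4)  # hypothetical conditional sd
ptau <- function(tau) 2 * dnorm(tau, 0, 15)    # hypothetical marginal p(tau | y)

ptheta <- function(theta) {
  # integrate the conditional density, weighted by p(tau | y), over tau
  sapply(theta, function(th)
    integrate(function(tau) dnorm(th, f1(tau), f2(tau)) * ptau(tau),
              lower = 0, upper = Inf)$value)
}

ptheta(c(-1, 0, 1))  # mixture density evaluated at a few points
```

Each evaluation requires a numerical integral over τ; the discretization discussed on the following slides replaces this integral by a finite sum.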

10–14 Meta-analysis. Context: random-effects meta-analysis
[Figure, built up over several slides: the marginal posterior p(τ); the conditional posteriors p(θ|τ_i), with conditional mean ± sd traced as a function of τ; grid points τ_1, τ_2, τ_3, τ_4; and the resulting marginal posterior p(θ|y)]

15 Mixture distributions. The general problem
a mixture distribution is a convex combination of component distributions, i.e. a distribution whose parameters are random variables:
a (conditional) distribution with density p(y|x), whose parameter x follows a distribution p(x)
the marginal / mixture is p(y) = ∫_X p(y|x) dP(x)
for discrete x: p(y) = Σ_i p(y|x_i) p(x_i)
ubiquitous in many applications: Student-t distribution, negative binomial distribution, marginal distributions, convolutions, ...
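
As a concrete instance of the discrete case, the Student-t distribution mentioned above is a scale mixture of normals, and even a naive equal-probability discretization of the mixing distribution gets close to the exact density. A sketch in R (the grid size k is an arbitrary choice):

```r
## Student-t as a scale mixture of normals: if W ~ chi^2(nu) and
## theta | W ~ Normal(0, sqrt(nu / W)), then marginally theta ~ t(nu).
## Discretize the mixing distribution into k equal-probability bins whose
## reference points are chi^2 quantiles.
nu   <- 4
k    <- 50                                  # number of mixture components
w    <- qchisq((1:k - 0.5) / k, df = nu)    # reference points
pi_w <- rep(1 / k, k)                       # equal bin probabilities

t_mix <- function(theta)                    # discrete-mixture approximation
  sapply(theta, function(th) sum(pi_w * dnorm(th, 0, sqrt(nu / w))))

theta <- c(0, 1, 2, 3)
rbind(exact = dt(theta, df = nu), approx = t_mix(theta))
```

The question addressed next is how to place such grid points more economically, with a controlled approximation error.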

16 Mixture distributions. How to approximate?
approximate the continuous mixture through a discrete set of points in τ:
actual marginal: p(θ) = ∫ p(θ|τ) p(τ) dτ
approximation: p(θ) ≈ Σ_i p(θ|τ_i) π_i
Questions: how to set up the discrete grid of points? how well can we approximate? do we have a handle on accuracy?

17 Mixture distributions. Motivation: discretizing a mixture
[Figure: conditional posteriors p(θ|τ_i) for grid points τ_1, ..., τ_4]
Note: the conditional distributions p(θ|τ,y) are very different for τ_1 and τ_2, but rather similar for τ_3 and τ_4.
idea: may need fewer bins for larger τ values? base the bin spacing on the similarity / dissimilarity of the conditionals?

18 Discretizing mixture distributions. Setting up a binning
need: a discretization of the mixing distribution p(x); domain of X: ℝ (or a subset)
define bin margins: x_(1) < x_(2) < ... < x_(k-1)
bins: X_i = {x : x ≤ x_(1)} if i = 1,  {x : x_(i-1) < x ≤ x_(i)} if 1 < i < k,  {x : x_(k-1) < x} if i = k
reference points: x̃_1, ..., x̃_k, where x̃_i ∈ X_i
bin probabilities: π_i = P(x_(i-1) < X ≤ x_(i)) = P(X ∈ X_i)
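
A small R sketch of this bookkeeping, continuing the hypothetical example from above: bin probabilities from given margins via the CDF of the mixing distribution. The half-normal CDF Ptau is a made-up placeholder; the actual p(τ|y) of the example is not available here, so the numbers will not match those on slide 20.

```r
## Bin probabilities pi_i from bin margins, via the mixing-distribution CDF.
Ptau <- function(tau) 2 * pnorm(tau, 0, 15) - 1   # hypothetical CDF of p(tau), tau >= 0

margins <- c(10, 20, 30)            # bin margins x_(1), ..., x_(k-1)
edges   <- c(0, margins, Inf)       # add the domain limits
pi_i    <- diff(Ptau(edges))        # P(x_(i-1) < X <= x_(i)) for each bin
pi_i                                # four bin weights, summing to 1
```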

19 Discretizing mixture distributions. Setting up a binned mixture
actual distribution: p(x, y); discrete approximation: q(x, y)
same marginal (mixing distribution): q(x) = p(x)
but binned conditionals: q(y|x) = p(y|x = x̃_i) for x ∈ X_i
q is similar to p, but instead of conditioning on the exact x, one conditions on the corresponding bin's reference point x̃_i
marginal: q(y) = ∫ q(y|x) q(x) dx = Σ_i π_i p(y|x̃_i)
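
The approximate marginal is then just a weighted sum of conditionals; a sketch continuing the hypothetical example (f1, f2 and pi_i as defined in the earlier sketches, reference points chosen by hand):

```r
## Binned-mixture marginal q(y) = sum_i pi_i * p(y | x~_i),
## with hypothetical Normal(f1, f2) conditionals.
refpoints <- c(5, 15, 25, 35)       # reference points x~_i (one per bin)
q_y <- function(y)
  sapply(y, function(yy) sum(pi_i * dnorm(yy, f1(refpoints), f2(refpoints))))

q_y(c(-1, 0, 1))   # compare with the exact mixture ptheta() from the earlier sketch
```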

20 Discretizing mixture distributions. Setting up a binned mixture
in the previous example:
bin margins: τ_(1) = 10, τ_(2) = 20, τ_(3) = 30
reference points: τ̃_1 = 5, τ̃_2 = 15, τ̃_3 = 25, τ̃_4 = 35
probabilities: π_1 = 0.34, π_2 = 0.44, π_3 = 0.15, π_4 = 0.07
[Figure: grid points τ_i, reference points τ̃_i, bin margins τ_(j), and the marginal posterior p(θ|y)]

21 Similarity / dissimilarity of distributions. Kullback-Leibler divergence
The Kullback-Leibler divergence of two distributions with density functions p and q is defined as
D_KL(p(θ) || q(θ)) = ∫_Θ log(p(θ)/q(θ)) p(θ) dθ = E_p(θ)[ log(p(θ)/q(θ)) ]
The symmetrized KL-divergence of two distributions is defined as
D_s(p(θ) || q(θ)) = D_KL(p(θ) || q(θ)) + D_KL(q(θ) || p(θ))
The symmetrized KL-divergence is symmetric, D_s(p(θ) || q(θ)) = D_s(q(θ) || p(θ)), and non-negative, D_s(p(θ) || q(θ)) ≥ 0.
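
For normal distributions, as in the meta-analysis conditionals, both divergences are available in closed form; a small R sketch (the closed-form normal KL is standard, the function names are ours):

```r
## Symmetrized KL divergence between N(m1, s1^2) and N(m2, s2^2),
## using the closed-form KL divergence for normal distributions.
kl_norm <- function(m1, s1, m2, s2)
  log(s2 / s1) + (s1^2 + (m1 - m2)^2) / (2 * s2^2) - 0.5

kl_sym <- function(m1, s1, m2, s2)
  kl_norm(m1, s1, m2, s2) + kl_norm(m2, s2, m1, s1)

kl_sym(0, 1, 0, 1)      # identical distributions: 0
kl_sym(0, 1, 0.1, 1.1)  # slightly different mean and sd: small positive value
```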

22 Divergence. Interpretation
How to interpret divergences? They are a measure of discrepancy between distributions; heuristically, the expected log ratio of densities.
The relevant case here is p(x) ≈ q(x):
D_KL(p(x) || q(x)) = 0 for p = q
D_KL(p(x) || q(x)) = 0.01 corresponds to an (expected) 1% difference in densities

23 Divergence. Bin-wise maximum divergence: definition
consider the divergence between the reference point and the other points within each bin;
define d_i = max_{x ∈ X_i} { D_s(p(y|x) || p(y|x̃_i)) } = max_{x ∈ X_i} { D_s(p(y|x) || q(y|x)) },
the bin-wise maximum divergence: the worst-case discrepancy introduced within each bin
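
A sketch of computing d_i for one bin numerically, continuing the hypothetical normal-conditional example (f1, f2 and kl_sym from the earlier sketches); the numerical search is one simple choice, not necessarily how the authors compute it:

```r
## Bin-wise maximum divergence d_i for the bin (lower, upper] with
## reference point ref, for conditionals p(y | tau) = N(f1(tau), f2(tau)).
binwise_max_div <- function(lower, upper, ref) {
  div <- function(tau) kl_sym(f1(tau), f2(tau), f1(ref), f2(ref))
  opt <- optimize(div, interval = c(lower, upper), maximum = TRUE)
  # also check the bin margins explicitly, since the maximum is often attained there
  max(opt$objective, div(lower), div(upper))
}

binwise_max_div(lower = 10, upper = 20, ref = 15)   # d_2 for the second bin
```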

24 Divergence. Bin-wise maximum divergence: example
[Figure: reference points τ̃_1, ..., τ̃_4 and bin margins τ_(1), τ_(2), τ_(3)]
recall: the actual parameters of the conditionals p(y|x) (in black) vs. the parameters of q(y|x) assumed through the binning (in red)

25–28 Divergence. Bin-wise maximum divergence: example
[Figure, built up over several slides: the second bin (τ_(1), τ_(2)] with reference point τ̃_2; the conditional density p(θ|τ) and the divergence D_s(p(θ|τ) || p(θ|τ̃_2)) as functions of τ]
determine the maximum d_i for each bin i (usually attained at a bin margin)

29 Bounding divergence. Idea
now consider the divergence of the true and approximate marginals p(y) and q(y) (not the conditionals!): what about D_s(p(y) || q(y))?
having the individual bin-wise divergences d_i, we can show:
D_s(p(y) || q(y)) ≤ Σ_i π_i d_i ≤ max_i d_i
in other words: by bounding the bin-wise divergences (of the conditionals) we can bound the overall divergence (of the marginals)
DIRECT (Divergence Restricted Conditional Tessellation) method
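
One way to motivate the first inequality (a sketch based on convexity of the divergence; not necessarily the argument used in the referenced manuscript) is that both marginals are mixtures with the same weights π_i:

```latex
\[
p(y) = \sum_i \pi_i \, p_i(y), \qquad
q(y) = \sum_i \pi_i \, p(y \mid \tilde{x}_i), \qquad
p_i(y) = \tfrac{1}{\pi_i}\int_{X_i} p(y \mid x)\, p(x)\, \mathrm{d}x ,
\]
\[
D_s\bigl(p(y) \,\big\|\, q(y)\bigr)
 \;\le\; \sum_i \pi_i \, D_s\bigl(p_i(y) \,\big\|\, p(y \mid \tilde{x}_i)\bigr)
 \;\le\; \sum_i \pi_i \, d_i
 \;\le\; \max_i d_i .
\]
```

The first step uses convexity of D_s in both arguments; the second uses that p_i(y) is itself a mixture of conditionals p(y|x) over x ∈ X_i, each within divergence d_i of p(y|x̃_i), so convexity again bounds D_s(p_i(y) || p(y|x̃_i)) by d_i; the last step holds since the π_i sum to (at most) one.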

30–38 Discretizing mixtures. Sequential DIRECT algorithm
[Figure, built up over several slides: the divergence D_s(p(θ|τ) || p(θ|τ̃_i)) as a function of τ, with the divergence budget δ marked; the 1st reference point τ̃_1 is placed at zero, then the first bin margin τ_(1), the next reference point τ̃_2, the next margin τ_(2), then τ̃_3, τ_(3), τ̃_4, τ_(4), ..., delineating the 1st, 2nd, 3rd, 4th, ... bins]
result: a binning with bounded divergence (≤ δ) per bin (when to stop?)

39 Discretizing mixtures. Sequential DIRECT algorithm (variations possible)
1. Specify δ > 0, 0 ≤ ε ≤ 1, and a starting reference point x̃_1 (e.g. the minimum possible value, or the ε/2-quantile). Define ε_1 ≥ 0 as ε_1 := P(X < x̃_1). Set i = 1.
2. Set x = x̃_1. Obviously, D_s(p(y|x̃_1) || p(y|x)) = 0. Now increase x as far as possible while ensuring that D_s(p(y|x̃_1) || p(y|x)) ≤ δ. Use this point as the first bin margin: x_(1) = x. Compute π_1 = P(X ≤ x_(1)). Set i = i + 1.
3. Increase x until D_s(p(y|x_(i-1)) || p(y|x)) = δ. Use this point as the next reference point: x̃_i = x.
4. Increase x again until D_s(p(y|x̃_i) || p(y|x)) = δ. Use this point as the next bin margin: x_(i) = x.
5. Compute the bin weight π_i = P(x_(i-1) < X ≤ x_(i)).
6. If P(X > x_(i)) > (ε − ε_1), set i = i + 1 and proceed at step 3. Otherwise stop.
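
Below is a sketch implementation of these steps in R, continuing the hypothetical example (f1, f2, Ptau and kl_sym from the earlier sketches). It is not the bayesmeta implementation; it assumes the divergence grows monotonically as x moves away from a reference point, and it simply renormalizes away the ignored ε tail, which is one of the possible variations.

```r
## Sequential construction of reference points and bin weights, with divergence
## budget delta per bin and ignored tail probability epsilon.
direct_grid <- function(delta = 0.01, epsilon = 0.001, x_max = 200) {
  div <- function(a, b) kl_sym(f1(a), f2(a), f1(b), f2(b))
  # smallest x > from at which the divergence from 'from' reaches delta
  # (assumes the divergence is increasing in x; x_max is a crude search bound)
  step_to <- function(from)
    uniroot(function(x) div(from, x) - delta, lower = from, upper = x_max)$root
  refs    <- 0                      # step 1: first reference point at zero
  margins <- step_to(refs[1])       # step 2: first bin margin
  while (1 - Ptau(margins[length(margins)]) > epsilon) {     # step 6: stopping rule
    refs    <- c(refs, step_to(margins[length(margins)]))    # step 3: next reference point
    margins <- c(margins, step_to(refs[length(refs)]))       # step 4: next bin margin
  }
  w <- diff(c(0, Ptau(margins)))    # step 5: bin weights
  w <- w / sum(w)                   # fold the ignored epsilon tail back in by renormalizing
  list(refs = refs, weights = w)
}

g <- direct_grid()
length(g$refs)    # number of support points for this toy example
sum(g$weights)    # weights sum to 1
```

The resulting reference points and weights are all that needs to be stored; the approximate marginal is then evaluated exactly as in the q_y() sketch above.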

40 Discretizing mixtures. General algorithm
remaining issue: the ignored ε > 0 tail probability (usually: problems at the domain's margins)
only the reference points x̃_i and probabilities π_i need to be kept track of
meta-analysis example: 35 reference ('support') points required (δ = 0.01, ε = 0.001)
[Figure: joint posterior with 50% / 90% / 95% / 99% contours and the resulting support points]

42 Conclusions
the approximation allows one to compute densities, quantiles, moments, ...
the algorithm yields a quick-and-easy solution
one needs to specify the error budget in terms of the divergence δ and the tail probability ε
also works for discrete distributions (p(x) or p(y|x))
meta-analysis application: implemented in the bayesmeta R package
general procedure: C. Röver, T. Friede. Discrete approximation of a mixture distribution via restricted divergence. (Submitted for publication.)
