Bayesian model selection: methodology, computation and applications

1 Bayesian model selection: methodology, computation and applications David Nott Department of Statistics and Applied Probability National University of Singapore Statistical Genomics Summer School Program

2 Outline Examples Bayesian statistics Bayesian model selection Marginal likelihood computation Predictive model selection Conclusion

3 What is model selection? For us, a statistical model describes a (random) process by which the data might have been generated. Often there will be several plausible models to choose from, and hence uncertainty about the data generating process. Sometimes we won't believe that any of the models under consideration generated the data. The goal of model selection is to choose from a collection of models the one that is best for a given purpose.

4 Example: gene expression arrays Gene expression arrays give a measure, for different tissue samples, of the level of gene expression for thousands of genes simultaneously. Over the page I've randomly selected 30 genes from a microarray experiment and plotted the gene expression values for those 30 genes.

5 [Figure: differential expression values for the 30 randomly chosen genes]

6 Example: gene expression arrays A row corresponds to a gene. There are 10 dots in each row (10 microarrays doing 10 comparisons of tissue samples). In this experiment, we are comparing brain tissue in two strains of mice (Cotsapas et al., 2003). Data for gene g has mean µ_g (assume normality, say). Is µ_g = 0 or µ_g ≠ 0? A model selection problem.

7 Rainfall-runoff models Used by hydrologists for simulating processes such as streamflow in response to a rainfall event. The choice of model for a given application is a difficult problem. "One is left with the view that the state of water resources modelling is like an economy subject to inflation: that there are too many models chasing (as yet) too few applications; that there are too many modellers chasing too few ideas; and that the response is to print ever-increasing quantities of paper, thereby devaluing the currency..." (Robin Clarke, 1974).

8 [Diagram: schematic of a rainfall-runoff model with inputs P and E, surface storages S_1, S_2, S_3 over partial areas A_1, A_2, A_3, a baseflow store BS, and flow components Q_s, Q_r and Q_b]

9 Rainfall-runoff models The diagram shows a representation of a runoff model (the Australian Water Balance Model (AWBM), Boughton (2004)) that relates rainfall and other variables to the discharge of a stream. The number of storages shown in the diagram is a variable to be chosen by the modeller. How should the number of storages be chosen in the best way for prediction, interpretation, etc.?

10 Nonparametric regression using linear combinations of basis terms I simulated n = 600 response values from the following model: y_i = f(z_i) + ε_i, i = 1, ..., 600, where the errors ε_i are independent N(0, σ²) with a fixed error variance, f(z_i) is the mean function and the predictors z_i = (z_{i1}, z_{i2}) are generated uniformly on the unit square [0, 1]².

11 [Figure: perspective plot of the mean function f(z) over (z_1, z_2) for the simulated data set]

12 Nonparametric regression using linear combinations of basis terms The mean function f(z) = f(z_1, z_2) is f(z) = 1 + N(µ_1, Σ_1, z) + N(µ_2, Σ_2, z), where N(µ, Σ, z) denotes a bivariate normal density with mean µ and covariance matrix Σ evaluated at z. Here we choose µ_1 = (0.25, 0.75)^T, µ_2 = (0.75, 0.25)^T, and Σ_1 and Σ_2 are fixed 2 × 2 covariance matrices.
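
A minimal simulation sketch of this setup in Python. The covariance matrices Σ_1, Σ_2 and the error variance are not recoverable from the slide, so the values used below are illustrative assumptions only.

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(1)

n = 600
z = rng.uniform(size=(n, 2))              # predictors on the unit square [0, 1]^2

mu1, mu2 = np.array([0.25, 0.75]), np.array([0.75, 0.25])
# The covariance values were not recoverable from the slide; these diagonal
# matrices and the error s.d. below are illustrative placeholders only.
Sigma1 = np.diag([0.05, 0.05])
Sigma2 = np.diag([0.05, 0.05])

def f(z):
    """Mean function: 1 plus two bivariate normal density bumps."""
    return (1.0
            + multivariate_normal(mu1, Sigma1).pdf(z)
            + multivariate_normal(mu2, Sigma2).pdf(z))

sigma_eps = 0.3
y = f(z) + rng.normal(scale=sigma_eps, size=n)
```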

13 Nonparametric regression using linear combinations of basis terms Suppose I didn't know the mean function f(z) beforehand and want to estimate it. One way to do this is to adopt a flexible representation for f(z) as a linear combination of a large number of basis functions: f(z) ≈ Σ_{j=1}^K β_j h_j(z).

14 Nonparametric regression using linear combinations of basis terms Here the h_j(z) are the basis terms and the β_j are unknown coefficients. To estimate the unknown coefficients we fit a linear model: y_i = Σ_{j=1}^K β_j h_j(z_i) + ε_i. Some kind of variable selection (estimating some of the β_j as exactly zero) might be done to prevent overfitting, since K is large.

15 Nonparametric regression using linear combinations of basis terms One choice of basis (there are many): {1, z_1, z_2, ||z − ρ_1||² log||z − ρ_1||, ..., ||z − ρ_s||² log||z − ρ_s||}, where ρ_i, i = 1, ..., s, are a collection of so-called knot points. The knots are points in the predictor space. For a fairly rich set of knots we can obtain good approximations to the mean function.
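
A minimal sketch of building this type of radial basis and fitting the coefficients by least squares. The knot grid, the placeholder surface and the noise level are assumptions; in practice a shrinkage or variable selection step would replace the plain least squares fit.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 600
z = rng.uniform(size=(n, 2))                        # predictors on the unit square
# Placeholder smooth surface plus noise, standing in for the simulated data above.
y = 1.0 + np.sin(2 * np.pi * z[:, 0]) * np.cos(2 * np.pi * z[:, 1]) \
        + rng.normal(scale=0.3, size=n)

# Knot locations on a regular grid -- an arbitrary illustrative choice.
knots = np.array([(a, b) for a in np.linspace(0.1, 0.9, 5)
                         for b in np.linspace(0.1, 0.9, 5)])

def tps_basis(z, knots):
    """Columns: 1, z1, z2, and ||z - rho||^2 log||z - rho|| for each knot rho."""
    r = np.linalg.norm(z[:, None, :] - knots[None, :, :], axis=2)
    radial = np.where(r > 0, r**2 * np.log(np.where(r > 0, r, 1.0)), 0.0)
    return np.column_stack([np.ones(len(z)), z, radial])

H = tps_basis(z, knots)                             # n x K design matrix, K = 3 + 25
beta_hat, *_ = np.linalg.lstsq(H, y, rcond=None)    # plain least squares fit
fitted = H @ beta_hat
```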

16 Purposes of model selection The three examples of model selection problems that I've presented so far illustrate different purposes of model selection. In the gene expression example, interpretation is all-important (which genes are different between the two tissue samples?). In the nonparametric regression example, our purpose is purely to predict well; there is no particular interpretation associated with the basis functions which are selected. In the hydrology example, there are elements of both prediction and interpretation to the problem.

17 Purposes of model selection A key idea in model selection of any kind is that one must consider what the model is to be used for. We won't treat this idea very formally in this talk, but it is important. The idea of this talk is to review the Bayesian approach to model selection. I'll give a brief review of Bayesian statistics first.

18 Bayesian statistics Bayesian statistics is distinguished by the use of probability for quantifying all kinds of uncertainty. Set of unknowns θ to learn about, data y. Specify a full probability model for the data and unknowns, p(y, θ) = p(θ) p(y | θ), where p(θ) is called the prior distribution and p(y | θ) is the likelihood function. The prior codes in probabilistic form what we know about the unknowns before observing data, and gives the opportunity for use of prior knowledge.

19 Bayesian statistics Conditioning on the observed data in this model we get Bayes rule: p(θ | y) ∝ p(θ) p(y | θ). Here p(θ | y) is the posterior distribution expressing what we know about θ given the data y. Inference is from the posterior distribution. In summarizing the posterior (by calculating probabilities or expectations) an integration over the parameter space is needed.

20 Example: AWBM [Diagram: the AWBM schematic again, with storages S_1, S_2, S_3, areas A_1, A_2, A_3, inputs P and E, baseflow store BS, and flows Q_s, Q_r and Q_b]

21 Example: AWBM [Excerpt from a journal article comparing MCMC sampling schemes for the AWBM: the posterior summaries obtained under the different schemes were similar, with approximately equal posterior means and comparable posterior quartiles, but the schemes differed in how many times the proposed flow and the likelihood had to be evaluated per iteration; in terms of computation time and simplicity the AM and MHBU algorithms were superior. The article's Figure 3 shows the posterior and prior distributions for the K parameter.]

22 Predictive inference Suppose predictions of future data ỹ are required. Predictive inference is based on p(ỹ | y) = ∫ p(ỹ | θ) p(θ | y) dθ. In Bayesian inference predictive distributions are often the basis for informal methods of model criticism and even model selection. How we go about model criticism usually depends on what the model will be used for.
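
A minimal sketch of simulating from a posterior predictive distribution by composition (draw θ from the posterior, then a new observation from p(ỹ | θ)). It uses a toy conjugate normal-mean model so the posterior is available in closed form; the data and prior variance are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setting: y_i | mu ~ N(mu, 1) with prior mu ~ N(0, sigma0_sq), so the
# posterior for mu is available in closed form.
y = rng.normal(loc=0.4, size=50)          # made-up data
sigma0_sq = 10.0                          # assumed prior variance
post_var = 1.0 / (len(y) + 1.0 / sigma0_sq)
post_mean = post_var * y.sum()

# Sample p(y_new | y) by composition: mu ~ p(mu | y), then y_new ~ p(y_new | mu).
mu_draws = rng.normal(post_mean, np.sqrt(post_var), size=10_000)
ynew_draws = rng.normal(mu_draws, 1.0)

print(ynew_draws.mean(), ynew_draws.var())   # predictive mean and variance
```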

23 Bayesian model comparison Now let's consider model selection in the Bayesian framework. At the end of this material there will be a question for the audience... Consider a collection of models M = {M_1, ..., M_k}. Denoting the data by y, write p(y | θ_i, M_i) for the likelihood function for model M_i, where we have written θ_i for the set of unknown parameters in M_i. Write p(θ_i | M_i) for the prior distribution on the parameters in model M_i, i = 1, ..., k.

24 Bayesian model comparison In Bayesian statistics uncertainty about unknowns is treated probabilistically. So we need a prior distribution on the unknown model, which will be updated to a posterior distribution based on the data. Write p(M_i) for the prior probability of model M_i, i = 1, ..., k. Now we apply Bayes rule to obtain

25 Bayesian model comparison p(M_i | y) ∝ p(M_i) p(y | M_i)  (1), where the so-called marginal likelihood p(y | M_i) for model M_i is obtained as p(y | M_i) = ∫ p(y | θ_i, M_i) p(θ_i | M_i) dθ_i. Normalizing (1) so that the distribution sums to one we obtain p(M_i | y) = p(M_i) p(y | M_i) / Σ_j p(M_j) p(y | M_j).

26 Bayesian model comparison Note that the posterior odds of model M_i relative to model M_j are p(M_i | y) / p(M_j | y) = [p(M_i) p(y | M_i)] / [p(M_j) p(y | M_j)]. From this, the ratio of the posterior odds to the prior odds is [p(M_i | y)/p(M_j | y)] / [p(M_i)/p(M_j)] = p(y | M_i) / p(y | M_j), which is called the Bayes factor for model M_i relative to model M_j. If all models are assigned equal probability in the prior, then the Bayes factor is simply the ratio of the posterior probabilities of the models compared.
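
A minimal sketch of turning marginal likelihoods into posterior model probabilities and a Bayes factor. The log marginal likelihood values below are made-up numbers, and the log-sum-exp step is just for numerical stability.

```python
import numpy as np

def posterior_model_probs(log_marglik, prior_probs):
    """p(M_i | y) from log p(y | M_i) and p(M_i), computed on the log scale."""
    log_post = np.log(prior_probs) + np.asarray(log_marglik)
    log_post -= log_post.max()                 # log-sum-exp trick for stability
    w = np.exp(log_post)
    return w / w.sum()

# Hypothetical log marginal likelihoods for three models, equal prior probabilities.
log_ml = [-104.2, -101.7, -103.1]
print(posterior_model_probs(log_ml, np.ones(3) / 3))

# Bayes factor of model 2 versus model 1; with equal priors it equals the posterior odds.
print(np.exp(log_ml[1] - log_ml[0]))
```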

27 Bayesian model averaging We discussed predictive inference before. How do we do this in the presence of model uncertainty? Let Δ be a quantity to be predicted (a future response, say), and let p(Δ | M_i, y) be the predictive distribution under model M_i; we talked about these kinds of predictive distributions before. The predictive distribution incorporating model uncertainty (Bayesian model averaging) is p(Δ | y) = Σ_i p(Δ | M_i, y) p(M_i | y). Model-specific predictive distributions are weighted according to the posterior model probabilities.
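
A minimal sketch of drawing from a model-averaged predictive distribution; both the model-specific predictive draws and the posterior model probabilities below are hypothetical placeholders.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical model-specific predictive draws for the quantity being predicted
# (two normal predictives, purely for illustration) and hypothetical p(M_i | y).
pred_draws = {"M1": rng.normal(0.0, 1.0, size=5000),
              "M2": rng.normal(0.6, 1.2, size=5000)}
post_prob = {"M1": 0.3, "M2": 0.7}

# Draw from the model-averaged predictive: pick a model with probability
# p(M_i | y), then take a draw from that model's predictive distribution.
models = list(pred_draws)
labels = rng.choice(models, size=5000, p=[post_prob[m] for m in models])
bma_draws = np.array([rng.choice(pred_draws[m]) for m in labels])

print(bma_draws.mean())   # model-averaged predictive mean
```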

28 A simple example Consider testing on a normal mean. Observed data y_1, ..., y_n, independent N(µ, 1). We want to compare the two models M_1 = {µ = 0} and M_2 = {µ ≠ 0}. This might be a reasonable way to formulate model selection for one gene in my gene expression example.

29 Example: testing on a normal mean In model M_1 there are no unknown parameters and the marginal likelihood is p(y | M_1) = (2π)^{-n/2} exp(-½ Σ_{i=1}^n y_i²). In model M_2, we need to specify a prior distribution on µ. We take µ | M_2 ~ N(0, σ_0²). Question for the audience: what happens as σ_0² → 0 and as σ_0² → ∞?

30 Example: testing on a normal mean It is easily shown that p(y | M_2) = (2π)^{-n/2} (nσ_0² + 1)^{-1/2} exp(-½ Σ_{i=1}^n y_i²) exp( n²ȳ² / (2(n + 1/σ_0²)) ).

31 Example: testing on a normal mean Comparing the expressions for p(y | M_1) and p(y | M_2), the factor (2π)^{-n/2} exp(-½ Σ_{i=1}^n y_i²) is common to both, and hence these terms cancel when we compute the Bayes factor of model M_2 relative to model M_1, which is p(y | M_2) / p(y | M_1) = (nσ_0² + 1)^{-1/2} exp( n²ȳ² / (2(n + 1/σ_0²)) ).

32 Example: testing on a normal mean Note that as σ_0² → 0 this Bayes factor → 1, and as σ_0² → ∞ the Bayes factor → 0. In other words, if the models have equal prior probability, p(M_2 | y) → 0.5 as σ_0² → 0 and p(M_2 | y) → 0 as σ_0² → ∞, regardless of the data. This example illustrates that it is not acceptable to thoughtlessly use vague proper priors in Bayesian model selection, as this will tend to favour the simplest model.
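
A minimal numerical check of this behaviour, using the Bayes factor expression above; the simulated data and the grid of prior variances are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
y = rng.normal(loc=0.0, size=100)          # data simulated under M1 (mu = 0)
n, ybar = len(y), y.mean()

def bf_21(sigma0_sq):
    """Bayes factor of M2 (mu ~ N(0, sigma0_sq)) against M1 (mu = 0)."""
    return (n * sigma0_sq + 1.0) ** -0.5 * np.exp(
        n**2 * ybar**2 / (2.0 * (n + 1.0 / sigma0_sq)))

for s in [1e-6, 1e-2, 1.0, 1e2, 1e6]:
    print(f"sigma0^2 = {s:8.0e}   BF_21 = {bf_21(s):.4f}")
# BF_21 tends to 1 as sigma0^2 -> 0 and to 0 as sigma0^2 -> infinity.
```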

33 Bayesian model comparison Note that the marginal likelihood p(y | M_i) = ∫ p(y | θ_i, M_i) p(θ_i | M_i) dθ_i can be regarded as a predictive density for y before y is observed (if we haven't observed any data yet, the posterior on θ_i is just the prior p(θ_i | M_i), so our definition of the predictive density of y just reduces to the marginal likelihood above).

34 Bayesian model comparison Looking at the marginal likelihood this way, you can see why the behaviour in our example happens. If we have a very tight prior around zero for µ in model M_2, the models are nearly the same (not much difference between setting µ = 0 in model M_1 and having a very tight prior around zero in model M_2), and so the Bayes factor is close to 1.

35 Bayesian model comparison With a diffuse prior on µ in model M_2 the prior predictive density is very spread out (since our prior allows the mean to be anywhere), so the value of the prior predictive density is bound to be smaller than for model M_1. Finding good default choices for priors in model comparison is an active area of current research. Fortunately it is not hard to find a good default choice for many common model selection problems.

36 Methods for calculating marginal likelihoods Once priors are specified, is everything easy? Not quite. How does one compute for model M_i the marginal likelihood p(y | M_i) = ∫ p(y | θ_i, M_i) p(θ_i | M_i) dθ_i?

37 Methods for calculating marginal likelihoods In my example I could do this analytically. For complex models θ_i is high-dimensional, and p(y | M_i) is defined by an integral over the (possibly high-dimensional) parameter space. Hard.

38 Methods for calculating marginal likelihoods Obvious idea: recall that p(y | M_i) = ∫ p(y | θ_i, M_i) p(θ_i | M_i) dθ_i and observe that this is an expectation with respect to the prior. Simulate θ_i^(1), ..., θ_i^(s) from p(θ_i | M_i), then use (1/s) Σ_{j=1}^s p(y | θ_i^(j), M_i) to estimate p(y | M_i). Bad idea: the variance is large, as the prior is very spread out compared to the likelihood. More sophisticated methods are available.
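
A minimal sketch of this naive prior-sampling estimator on the toy conjugate normal-mean model, where the exact marginal likelihood is known for comparison; the data, the (deliberately diffuse) prior variance and the number of draws are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)
y = rng.normal(loc=0.5, size=50)          # made-up data
sigma0_sq = 100.0                         # deliberately diffuse prior for mu

def log_lik(mu):
    """log p(y | mu) for a vector of mu values."""
    return norm.logpdf(y[:, None], loc=mu, scale=1.0).sum(axis=0)

# Naive estimator: average the likelihood over draws from the prior.
mu_prior = rng.normal(0.0, np.sqrt(sigma0_sq), size=20_000)
log_phat = np.logaddexp.reduce(log_lik(mu_prior)) - np.log(len(mu_prior))

# Exact log marginal likelihood for this conjugate toy model, for comparison.
n, ybar = len(y), y.mean()
log_exact = (-0.5 * n * np.log(2 * np.pi) - 0.5 * np.log(n * sigma0_sq + 1)
             - 0.5 * (y**2).sum() + n**2 * ybar**2 / (2 * (n + 1 / sigma0_sq)))
print(log_phat, log_exact)   # the naive estimate is noisy and typically too low
```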

39 Methods for calculating marginal likelihoods In what follows let's just consider a single model M. We won't show conditioning on M explicitly, so that if M has parameter θ we write p(θ), p(y | θ), p(y) for the prior, likelihood and marginal likelihood. Markov chain Monte Carlo methods which sample on the model and parameter space jointly are one common approach to calculating marginal likelihoods (Green, 1995, Biometrika; Carlin and Chib, 1995, JRSSB). These methods are not always easy to apply, even for experts.

40 Methods for calculating marginal likelihoods An alternative uses methods based on the so-called candidate's formula. Rearranging Bayes rule, p(y) = p(θ) p(y | θ) / p(θ | y). This holds for every θ. Suppose θ̂ is some estimate of the posterior mode. If we can estimate p(θ̂ | y), then substituting into the candidate's formula gives an estimate of p(y) (since calculating p(θ̂) and p(y | θ̂) is usually easy). If we can just estimate the posterior density at a point, we can estimate the marginal likelihood!
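
A minimal sketch of the candidate's formula idea on the toy normal-mean model: posterior draws (here exact draws standing in for MCMC output) are used to estimate the posterior density at a point with a kernel density estimate, which is then plugged into the formula; the data and prior variance are made up.

```python
import numpy as np
from scipy.stats import norm, gaussian_kde

rng = np.random.default_rng(5)
y = rng.normal(loc=0.5, size=50)          # made-up data
sigma0_sq = 4.0                           # assumed prior variance for mu
n = len(y)

# "MCMC output": here exact posterior draws for the conjugate toy model.
post_var = 1.0 / (n + 1.0 / sigma0_sq)
post_mean = post_var * y.sum()
draws = rng.normal(post_mean, np.sqrt(post_var), size=20_000)

theta_hat = draws.mean()                                  # a high-density point
log_post_at_hat = np.log(gaussian_kde(draws)(theta_hat)[0])

# Candidate's formula: log p(y) = log p(theta) + log p(y | theta) - log p(theta | y).
log_prior = norm.logpdf(theta_hat, 0.0, np.sqrt(sigma0_sq))
log_lik = norm.logpdf(y, theta_hat, 1.0).sum()
print(log_prior + log_lik - log_post_at_hat)              # estimate of log p(y)
```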

41 Methods for calculating marginal likelihoods The Laplace approximation arises from the candidate's formula by choosing θ̂ to be the posterior mode and using a normal approximation to p(θ | y) with mean θ̂ and a covariance H^{-1} based on derivatives of the log posterior at the mode, giving p(y) ≈ (2π)^{p/2} |H|^{-1/2} p(θ̂) p(y | θ̂), where p is the number of parameters. Further simplifying the Laplace approximation to log p(y) leads to the famous BIC criterion.
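
A minimal one-parameter sketch of the Laplace approximation: the posterior mode is found numerically and H (the negative second derivative of the log posterior at the mode) is approximated by finite differences. The toy normal-mean model and its data are again illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(6)
y = rng.normal(loc=0.5, size=50)          # made-up data
sigma0_sq = 4.0                           # assumed prior variance for mu

def neg_log_post(mu):
    """Negative log of prior times likelihood (unnormalized posterior)."""
    return -(norm.logpdf(mu, 0.0, np.sqrt(sigma0_sq))
             + norm.logpdf(y, mu, 1.0).sum())

mu_hat = minimize_scalar(neg_log_post).x  # posterior mode

# H = second derivative of the negative log posterior at the mode (finite differences).
h = 1e-4
H = (neg_log_post(mu_hat + h) - 2 * neg_log_post(mu_hat) + neg_log_post(mu_hat - h)) / h**2

p = 1                                     # number of parameters
log_laplace = 0.5 * p * np.log(2 * np.pi) - 0.5 * np.log(H) - neg_log_post(mu_hat)
print(log_laplace)                        # Laplace approximation to log p(y)
```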

42 Methods for calculating marginal likelihoods There are more sophisticated ways in which one can use the candidate's formula approach (Chib, 1995, JASA; Chib and Jeliazkov, 2001, JASA). Bridge estimation (Meng and Wong, 1996, Statistica Sinica): for estimating p(y) (not the most general setting), suppose we have some density r(θ) with no unknown normalizing constant, and let t(θ) be any function of θ such that 0 < ∫ t(θ) r(θ) p(θ) p(y | θ) dθ < ∞.

43 Methods for calculating marginal likelihoods Then p(y) = ∫ p(θ) p(y | θ) t(θ) r(θ) dθ / ∫ r(θ) t(θ) p(θ | y) dθ. To see this, just write p(y) p(θ | y) = p(θ) p(y | θ), multiply both sides by r(θ) t(θ) and then integrate. The denominator can be estimated from a Monte Carlo sample from p(θ | y) (this is obtainable using standard Bayesian computational methods). The numerator can be estimated from a sample from r(θ).
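
A minimal sketch of this ratio identity with t(θ) = 1 and r(θ) taken as a normal density matched to the posterior draws; as before, the toy conjugate normal-mean model with made-up data is used so the answer can be compared with the exact marginal likelihood from the earlier slides.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(7)
y = rng.normal(loc=0.5, size=50)          # made-up data
sigma0_sq = 4.0                           # assumed prior variance for mu
n = len(y)

# Posterior draws (exact here; in general these would be MCMC output).
post_var = 1.0 / (n + 1.0 / sigma0_sq)
post_mean = post_var * y.sum()
post_draws = rng.normal(post_mean, np.sqrt(post_var), size=20_000)

# Take t(theta) = 1 and r(theta) a normal density matched to the posterior draws.
r_mean, r_sd = post_draws.mean(), post_draws.std()
r_draws = rng.normal(r_mean, r_sd, size=20_000)

def log_joint(mu):
    """log p(mu) + log p(y | mu) for a vector of mu values."""
    return (norm.logpdf(mu, 0.0, np.sqrt(sigma0_sq))
            + norm.logpdf(y[:, None], mu, 1.0).sum(axis=0))

# Numerator: E_r[p(theta)p(y|theta)]; denominator: E_{theta|y}[r(theta)].
log_num = np.logaddexp.reduce(log_joint(r_draws)) - np.log(len(r_draws))
log_den = np.logaddexp.reduce(norm.logpdf(post_draws, r_mean, r_sd)) - np.log(len(post_draws))
print(log_num - log_den)                  # estimate of log p(y)
```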

44 Predictive model selection So far we've used what is sometimes called the "fully probabilistic" approach to model selection. Some of the challenges in implementing this approach have caused some Bayesians to look for alternatives. The difficulties with the conventional approach: one must assign a prior distribution on the model space, and this may be difficult to do when you don't believe in any of the models; model comparison may be sensitive to the priors on the model parameters; the marginal likelihood is hard to calculate; and the goals of model improvement may be better served by examining less formal diagnostics that illuminate how the model doesn't fit. There are of course ways of responding to these criticisms by those who advocate the traditional Bayesian approach, but I won't go into this debate here.

45 Predictive model selection There are numerous alternatives to the traditional Bayesian approach to model selection. Most are motivated from the point of view of wanting to predict well rather than choosing the "true" model. Model selection is often viewed as a two-step decision problem: first a model is chosen, and then the chosen model is used to make a prediction. The optimal choice of model is the one that minimizes some specified measure of predictive loss.

46 Predictive model selection Popular predictive approaches: Bayesian variants of cross-validation (Geisser and Eddy, 1979, JASA; Bernardo and Smith, 1994); DIC (Spiegelhalter et al., 2002, JRSSB); posterior Bayes factors (Aitkin, 1991, JRSSB); less formal approaches based on simulations of replicate data sets from posterior predictive distributions (posterior predictive checks, Gelman, Meng and Stern, 1996, Statistica Sinica); and many others.
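
As one concrete instance, here is a minimal sketch of the cross-validation predictive (Geisser-Eddy) idea on the toy conjugate normal-mean model, where each leave-one-out predictive density p(y_i | y_{-i}) is available in closed form; for more complex models these quantities are usually estimated from a single MCMC run rather than by refitting. The data and prior variance are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(8)
y = rng.normal(loc=0.5, size=50)          # made-up data
sigma0_sq = 4.0                           # assumed prior variance for mu

def log_cpo(i):
    """log p(y_i | y_{-i}) in the conjugate normal-mean toy model."""
    y_rest = np.delete(y, i)
    m = len(y_rest)
    post_var = 1.0 / (m + 1.0 / sigma0_sq)
    post_mean = post_var * y_rest.sum()
    # Leave-one-out predictive for y_i is N(post_mean, post_var + 1).
    return norm.logpdf(y[i], post_mean, np.sqrt(post_var + 1.0))

lpml = sum(log_cpo(i) for i in range(len(y)))   # log pseudo-marginal likelihood
print(lpml)
```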

47 Conclusion Even if you are not a Bayesian, thinking about the Bayesian way to approach a problem is often very illuminating for understanding all the sources of information available. One must also consider the purpose for which a model is constructed: prediction is a very different goal from attempting to choose the "true" model. Don't forget about background knowledge, sensitivity analysis and diagnostics when model building. References to my own research on my website: standj
