Bayesian model selection: methodology, computation and applications
1 Bayesian model selection: methodology, computation and applications David Nott Department of Statistics and Applied Probability National University of Singapore Statistical Genomics Summer School Program
2 Outline: Examples; Bayesian statistics; Bayesian model selection; Marginal likelihood computation; Predictive model selection; Conclusion.
3 What is model selection? For us, a statistical model describes a (random) process by which the data might have been generated. Often there will be several plausible models to choose from, and hence uncertainty about the data generating process. Sometimes we won't believe that any of the models under consideration generated the data. The goal of model selection is to choose from a collection of models the one best suited to a given purpose.
4 Example: gene expression arrays Gene expression arrays are able to give a measure, for different tissue samples, of the level of gene expression for thousands of genes simultaneously. Over the page I've randomly selected 30 genes from a microarray experiment and plotted gene expression values for the 30 genes.
5 [Figure: gene expression values for 30 randomly chosen genes, with differential expression indicated]
6 Example: gene expression arrays A row corresponds to a gene. There are 10 dots in each row (10 microarrays doing 10 comparisons of tissue samples). In this experiment, we are comparing brain tissue in two strains of mice (Cotsapas et al., 2003). Data for gene g has mean µ_g (assume normality, say). Is µ_g = 0 or µ_g ≠ 0? A model selection problem.
7 Rainfall-runoff models Used by hydrologists for simulating processes such as streamflow in response to a rainfall event. The choice of model for a given application is a difficult problem. "One is left with the view that the state of water resources modelling is like an economy subject to inflation; that there are too many models chasing (as yet) too few applications; that there are too many modellers chasing too few ideas; and that the response is to print ever-increasing quantities of paper, thereby devaluing the currency..." (Robin Clarke, 1974).
8 [Figure: schematic of a rainfall-runoff model with storages S_1, S_2, S_3 of areas A_1, A_2, A_3, inputs P and E, a baseflow store BS, and flow components Q_s, Q_r and Q_b]
9 Rainfall-runoff models The diagram shows a representation of a runoff model (the Australian Water Balance Model (AWBM), Boughton (2004)) that relates rainfall and other variables to discharge of a stream. The number of storages shown in the graph is a variable to be chosen by the modeller. How to choose the number of storages in the best way for prediction, interpretation, etc.?
10 Nonparametric regression using linear combinations of basis terms I simulated n = 600 response values from the model y_i = f(z_i) + ε_i, i = 1, ..., 600, where the errors ε_i are independent N(0, σ²), f(z) is the mean function and the predictors z_i = (z_i1, z_i2) are generated uniformly on the unit square [0, 1]².
11 [Figure: perspective plot of the mean function f(z) for the simulated data set, over the unit square (z_1, z_2)]
12 Nonparametric regression using linear combinations of basis terms The mean function f(z) = f(z_1, z_2) is f(z) = 1 + N(µ_1, Σ_1, z) + N(µ_2, Σ_2, z), where N(µ, Σ, z) denotes a bivariate normal density with mean µ and covariance matrix Σ, evaluated at z. Here we choose µ_1 = (0.25, 0.75)ᵀ, µ_2 = (0.75, 0.25)ᵀ, and Σ_1, Σ_2 are fixed 2 × 2 covariance matrices.
13 Nonparametric regression using linear combinations of basis terms Suppose I didn't know beforehand the mean function f(z) and that I want to estimate it. One way to do this is to adopt a flexible representation for f(z) in terms of a linear combination of a large number of basis functions: f(z) ≈ Σ_{j=1}^K β_j h_j(z).
14 Nonparametric regression using linear combinations of basis terms Here the h_j(z) are the basis terms and the β_j are unknown coefficients. To estimate the unknown coefficients we fit a linear model: y_i = Σ_{j=1}^K β_j h_j(z_i) + ε_i. Some kind of variable selection (estimating some of the β_j as exactly zero) might be done to prevent overfitting, since K is large.
15 Nonparametric regression using linear combinations of basis terms One choice of basis (there are many): {1, z_1, z_2, ‖z − ρ_1‖² log ‖z − ρ_1‖, ..., ‖z − ρ_s‖² log ‖z − ρ_s‖}, where ρ_i, i = 1, ..., s, are a collection of so-called knot points. The knots are points in the predictor space. For a fairly rich set of knots we can obtain good approximations to the mean function.
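As a concrete sketch of this basis, the following builds the thin-plate-type terms over a knot grid and fits the linear model by least squares. The knot layout, the particular mean surface (two smooth bumps) and the noise level are my own toy choices, not taken from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate data in the spirit of the slides: a smooth two-bump surface plus noise
# (my own f and noise level, for illustration only).
n = 600
z = rng.uniform(size=(n, 2))
f = (1 + np.exp(-20 * np.sum((z - [0.25, 0.75]) ** 2, axis=1))
       + np.exp(-20 * np.sum((z - [0.75, 0.25]) ** 2, axis=1)))
y = f + rng.normal(scale=0.1, size=n)

# Knot points on a grid in the predictor space.
knots = np.array([(a, b) for a in np.linspace(0.1, 0.9, 5)
                          for b in np.linspace(0.1, 0.9, 5)])

def tps_basis(z, knots):
    # Basis {1, z1, z2, ||z - rho||^2 log ||z - rho||} for each knot rho.
    r = np.linalg.norm(z[:, None, :] - knots[None, :, :], axis=2)
    radial = np.where(r > 0, r ** 2 * np.log(np.maximum(r, 1e-12)), 0.0)
    return np.column_stack([np.ones(len(z)), z, radial])

H = tps_basis(z, knots)
beta, *_ = np.linalg.lstsq(H, y, rcond=None)  # least squares fit of the linear model
fhat = H @ beta
print("basis dimension K =", H.shape[1])
```

With a 5 × 5 knot grid, K = 28 here; richer knot sets give better approximations but increase the need for the variable selection mentioned above.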
16 Purposes of model selection The three examples of model selection problems that I've presented so far illustrate different purposes of model selection. In the gene expression example, interpretation is all-important (which genes are different between the two tissue samples?). In the nonparametric regression example, our purpose is purely to predict well; there is no particular interpretation associated with the basis functions which are selected. In the hydrology example, there are elements of both prediction and interpretation to the problem.
17 Purposes of model selection A key idea in model selection of any kind is that one must consider what the model is to be used for. We won't treat this idea very formally in this talk, but it is important. The idea of this talk is to review the Bayesian approach to model selection. I'll give a brief review of Bayesian statistics first.
18 Bayesian statistics Bayesian statistics is distinguished by the use of probability for quantifying all kinds of uncertainty. Set of unknowns θ to learn about, data y. Specify a full probability model for the data and unknowns, p(y, θ) = p(θ)p(y | θ). p(θ) is called the prior distribution and p(y | θ) is the likelihood function. The prior codes in probabilistic form what we know about the unknowns before observing data, and gives the opportunity for use of prior knowledge.
19 Bayesian statistics Conditioning on the observed data in this model we get Bayes' rule: p(θ | y) ∝ p(θ)p(y | θ). Here p(θ | y) is the posterior distribution expressing what we know about θ given the data y. Inference is from the posterior distribution. In summarizing the posterior (by calculating probabilities or expectations) an integration over the parameter space is needed.
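A minimal numerical sketch of Bayes' rule (the model, prior and data below are my own toy choices, not from the talk): discretize θ on a grid, multiply prior by likelihood pointwise, and normalize so the result integrates to one.

```python
import numpy as np
from scipy.stats import norm

# Toy setup: y_i ~ N(theta, 1) with prior theta ~ N(0, 4).
rng = np.random.default_rng(1)
y = rng.normal(loc=1.0, scale=1.0, size=50)

theta = np.linspace(-5, 5, 2001)
dtheta = theta[1] - theta[0]

# Bayes' rule: posterior proportional to prior times likelihood.
log_post = norm.logpdf(theta, 0, 2) + norm.logpdf(y[:, None], theta, 1).sum(axis=0)
post = np.exp(log_post - log_post.max())
post /= post.sum() * dtheta          # normalization = the integral over theta

# Conjugate closed form for comparison: posterior mean = n*ybar / (n + 1/4).
n, ybar = len(y), y.mean()
exact_mean = n * ybar / (n + 0.25)
grid_mean = (theta * post).sum() * dtheta
print(grid_mean, exact_mean)
```

The grid approximation makes the "integration over the parameter space" explicit; in higher dimensions this is exactly what becomes hard and motivates Monte Carlo methods.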
20 Example: AWBM [Figure: the AWBM schematic again, with storages S_1, S_2, S_3, areas A_1, A_2, A_3, inputs P and E, baseflow store BS, and flows Q_s, Q_r, Q_b]
21 Example: AWBM [Figure: page excerpt from a journal article comparing MCMC sampling schemes (AM, MHBU and MHSS) for calibrating the AWBM; the posterior summaries were similar across schemes, with the AM and MHBU algorithms superior in computation time and simplicity; the excerpt includes a plot of the posterior and prior distributions for the K parameter]
22 Predictive inference Suppose predictions of future data ỹ are required. Predictive inference is based on p(ỹ | y) = ∫ p(ỹ | θ)p(θ | y)dθ. In Bayesian inference predictive distributions are often the basis for informal methods of model criticism and even model selection. How we go about model criticism usually depends on what the model will be used for.
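The predictive integral is usually evaluated by composition sampling: draw θ from the posterior, then draw ỹ given θ. A sketch for the conjugate normal-mean setup (all numerical choices are my own toy values):

```python
import numpy as np

# y_i ~ N(theta, 1) with prior theta ~ N(0, 4); toy data.
rng = np.random.default_rng(2)
y = rng.normal(1.0, 1.0, size=50)

n, ybar = len(y), y.mean()
post_var = 1.0 / (n + 0.25)          # posterior variance of theta
post_mean = n * ybar * post_var      # posterior mean of theta

# Composition sampling: theta ~ p(theta | y), then y_new ~ p(y_new | theta).
theta_draws = rng.normal(post_mean, np.sqrt(post_var), size=100_000)
y_new = rng.normal(theta_draws, 1.0)

# Exact predictive here is N(post_mean, 1 + post_var); the draws should match it.
print(y_new.mean(), y_new.var())
```

The extra spread of the draws relative to the data noise (variance 1 + post_var rather than 1) is the parameter uncertainty propagated through the predictive integral.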
23 Bayesian model comparison Now let's consider model selection in the Bayesian framework. At the end of this material there will be a question for the audience... Consider a collection of models M = {M_1, ..., M_k}. Denoting the data by y, write p(y | θ_i, M_i) for the likelihood function for model M_i, where we have written θ_i for the set of unknown parameters in M_i. Write p(θ_i | M_i) for the prior distribution on the parameters in model M_i, i = 1, ..., k.
24 Bayesian model comparison In Bayesian statistics uncertainty about unknowns is treated probabilistically. So we need a prior distribution on the unknown model, which will be updated to a posterior distribution based on the data. Write p(M_i) for the prior probability of model M_i, i = 1, ..., k. Now we apply Bayes' rule to obtain
25 Bayesian model comparison p(M_i | y) ∝ p(M_i)p(y | M_i) (1), where the so-called marginal likelihood p(y | M_i) for model M_i is obtained as p(y | M_i) = ∫ p(y | θ_i, M_i)p(θ_i | M_i)dθ_i. Normalizing (1) so that the distribution sums to one we obtain p(M_i | y) = p(M_i)p(y | M_i) / Σ_j p(M_j)p(y | M_j).
26 Bayesian model comparison Note that the posterior odds of model M_i relative to model M_j are p(M_i | y)/p(M_j | y) = [p(M_i)p(y | M_i)] / [p(M_j)p(y | M_j)]. From this the ratio of the posterior to prior odds is [p(M_i | y)/p(M_j | y)] / [p(M_i)/p(M_j)] = p(y | M_i)/p(y | M_j), which is called the Bayes factor for model M_i relative to model M_j. If all models are assigned equal probability in the prior, then the Bayes factor is simply the ratio of posterior probabilities of the models compared.
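In practice the marginal likelihoods live on the log scale, and the normalization above is done with a log-sum-exp to avoid underflow. A sketch (the log marginal likelihood values below are made up for illustration):

```python
import numpy as np

def model_probs(log_ml, log_prior=None):
    # Posterior model probabilities p(M_i | y) from log marginal likelihoods
    # and (optionally) log prior model probabilities.
    log_ml = np.asarray(log_ml, dtype=float)
    if log_prior is None:                 # equal prior probabilities p(M_i)
        log_prior = np.zeros_like(log_ml)
    w = log_ml + log_prior
    w -= w.max()                          # log-sum-exp stabilization
    p = np.exp(w)
    return p / p.sum()

# Hypothetical log marginal likelihoods for three competing models.
probs = model_probs([-1040.2, -1037.8, -1041.5])
print(probs)
```

With equal prior probabilities, the ratio of any two entries of `probs` is exactly the corresponding Bayes factor, as stated above.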
27 Bayesian model averaging We discussed predictive inference before. How do we do this in the presence of model uncertainty? Let ∆ be a quantity to be predicted (a future response, say). Let p(∆ | M_i, y) be the predictive distribution under model M_i. We talked about these kinds of predictive distributions before. Predictive distribution incorporating model uncertainty (Bayesian model averaging): p(∆ | y) = Σ_i p(∆ | M_i, y)p(M_i | y). Model-specific predictive distributions are weighted according to the posterior model probabilities.
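The mixture above can be simulated directly: draw a model index with probability p(M_i | y), then draw the prediction from that model's predictive distribution. A sketch with entirely hypothetical numbers (three models whose predictive distributions are taken to be N(m_i, 1)):

```python
import numpy as np

rng = np.random.default_rng(3)

post_model_probs = np.array([0.2, 0.7, 0.1])   # hypothetical p(M_i | y)
predictive_means = np.array([0.0, 1.0, 2.5])   # hypothetical model-specific means
S = 200_000

# Draw a model, then draw from that model's predictive: a draw from p(Delta | y).
idx = rng.choice(3, size=S, p=post_model_probs)
draws = rng.normal(predictive_means[idx], 1.0)

# The BMA predictive mean is the probability-weighted average of model means.
print(draws.mean(), post_model_probs @ predictive_means)
```

Note the BMA predictive is a genuine mixture: it can be multimodal and is more spread out than any single model's predictive, reflecting model uncertainty.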
28 A simple example Consider testing on a normal mean. Observed data y_1, ..., y_n, independent N(µ, 1). We want to compare the two models M_1 = {µ = 0} and M_2 = {µ ≠ 0}. This might be a reasonable way to formulate model selection for one gene in my gene expression example.
29 Example: testing on a normal mean In model M_1 there are no unknown parameters and the marginal likelihood is p(y | M_1) = (2π)^{-n/2} exp(−(1/2) Σ_{i=1}^n y_i²). In model M_2, we need to specify a prior distribution on µ. We take µ | M_2 ∼ N(0, σ₀²). Question for the audience: what happens as σ₀² → 0 and as σ₀² → ∞?
30 Example: testing on a normal mean It is easily shown that p(y | M_2) = (2π)^{-n/2} (nσ₀² + 1)^{-1/2} exp( n²ȳ² / (2(n + 1/σ₀²)) ) exp(−(1/2) Σ_{i=1}^n y_i²).
31 Example: testing on a normal mean Comparing the expressions for p(y | M_1) and p(y | M_2), the factor (2π)^{-n/2} exp(−(1/2) Σ_{i=1}^n y_i²) is common to both and hence these terms cancel when we compute the Bayes factor of model M_2 relative to model M_1, which is p(y | M_2)/p(y | M_1) = (nσ₀² + 1)^{-1/2} exp( n²ȳ² / (2(n + 1/σ₀²)) ).
32 Example: testing on a normal mean Note that as σ₀² → 0 this Bayes factor → 1. As σ₀² → ∞ the Bayes factor → 0. In other words, if the models have equal prior probability, p(M_2 | y) → 0.5 as σ₀² → 0 and p(M_2 | y) → 0 as σ₀² → ∞, regardless of the data. This example illustrates that it is not acceptable to thoughtlessly use vague proper priors in Bayesian model selection, as this will tend to favour the simplest model.
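The limiting behaviour is easy to check numerically using the closed-form Bayes factor from the previous slide (the simulated data set here is my own toy choice):

```python
import numpy as np

rng = np.random.default_rng(4)
y = rng.normal(0.3, 1.0, size=50)   # data with a modest nonzero mean
n, ybar = len(y), y.mean()

def bf_21(sigma0_sq):
    # Bayes factor of M2 (mu ~ N(0, sigma0^2)) versus M1 (mu = 0).
    return (n * sigma0_sq + 1.0) ** -0.5 * np.exp(
        n ** 2 * ybar ** 2 / (2.0 * (n + 1.0 / sigma0_sq)))

for s in [1e-12, 1.0, 1e6, 1e12]:
    print(s, bf_21(s))
```

As the prior variance grows, the (nσ₀² + 1)^{-1/2} penalty eventually dominates whatever evidence the data provide, pushing the Bayes factor toward 0 regardless of ȳ (the Bartlett/Lindley phenomenon).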
33 Bayesian model comparison Note that the marginal likelihood p(y | M_i) = ∫ p(y | θ_i, M_i)p(θ_i | M_i)dθ_i can be regarded as a predictive density for y before y is observed (if we haven't observed any data yet, the posterior on θ_i is just the prior p(θ_i | M_i), so our definition of the predictive density of y reduces to the marginal likelihood above).
34 Bayesian model comparison Looking at the marginal likelihood this way, you can see why the behaviour in our example happens. If we have a very tight prior around zero for µ in model M_2, the models are nearly the same (not much difference between setting µ = 0 in model M_1 and having a very tight prior around zero in model M_2), and so the Bayes factor is close to 1.
35 Bayesian model comparison With a diffuse prior on µ in model M_2, the prior predictive density is very spread out (since our prior allows the mean to be anywhere), so the value of the prior predictive density is bound to be smaller than for model M_1. Finding good default choices for priors in model comparison is an active area of current research. Fortunately it is not hard to find a good default choice for many common model selection problems.
36 Methods for calculating marginal likelihoods Once priors are specified, everything is easy. Not quite. How does one compute for model M_i the marginal likelihood p(y | M_i) = ∫ p(y | θ_i, M_i)p(θ_i | M_i)dθ_i?
37 Methods for calculating marginal likelihoods In my example I could do this analytically. For complex models θ_i is high-dimensional, and p(y | M_i) is defined by an integral over the (possibly high-dimensional) parameter space. Hard.
38 Methods for calculating marginal likelihoods Obvious idea: recall that p(y | M_i) = ∫ p(y | θ_i, M_i)p(θ_i | M_i)dθ_i and observe that this is an expectation with respect to the prior. Simulate θ_i^{(1)}, ..., θ_i^{(s)} from p(θ_i | M_i), then use (1/s) Σ_{j=1}^s p(y | θ_i^{(j)}, M_i) to estimate p(y | M_i). Bad idea: the variance is large, as the prior is very spread out compared to the likelihood. More sophisticated methods are available.
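A sketch of why the naive prior Monte Carlo estimator is poor, using the normal-mean example where the exact answer is known (the data, prior variance and number of draws are my own choices; a deliberately vague prior makes the problem visible):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(5)
n, sigma0_sq = 30, 100.0             # deliberately vague prior on mu
y = rng.normal(0.5, 1.0, size=n)
ybar = y.mean()

# Exact log marginal likelihood under M2 (mu ~ N(0, sigma0^2), y_i ~ N(mu, 1)).
exact = (-n / 2 * np.log(2 * np.pi) - 0.5 * np.log(n * sigma0_sq + 1)
         + n ** 2 * ybar ** 2 / (2 * (n + 1 / sigma0_sq)) - 0.5 * np.sum(y ** 2))

def naive_log_ml(s):
    # Average the likelihood over s draws from the prior, on the log scale.
    mu = rng.normal(0, np.sqrt(sigma0_sq), size=s)
    loglik = norm.logpdf(y[:, None], mu, 1).sum(axis=0)
    return np.log(np.mean(np.exp(loglik - loglik.max()))) + loglik.max()

estimates = [naive_log_ml(1000) for _ in range(20)]
print(exact, np.mean(estimates), np.std(estimates))
```

Most prior draws fall where the likelihood is negligible, so only a handful of draws dominate each average and the estimator is noisy even for this one-dimensional toy problem.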
39 Methods for calculating marginal likelihoods In what follows let's just consider a single model M. We won't show conditioning on M explicitly, so if M has parameter θ we write p(θ), p(y | θ), p(y) for the prior, likelihood and marginal likelihood. Markov chain Monte Carlo methods which sample on the model and parameter space jointly are one common approach to calculating marginal likelihoods (Green, 1995, Biometrika; Carlin and Chib, 1995, JRSSB). These methods are not always easy to apply, even for experts.
40 Methods for calculating marginal likelihoods An alternative uses methods based on the so-called candidate's formula. Rearranging Bayes' rule, p(y) = p(θ)p(y | θ) / p(θ | y). This holds for every θ. Suppose θ̂ is some estimate of the mode. If we can estimate p(θ̂ | y), then substituting into the candidate's formula gives an estimate of p(y) (since calculating p(θ̂) and p(y | θ̂) is usually easy). If we can just estimate the posterior density at a point, we can estimate the marginal likelihood!
41 Methods for calculating marginal likelihoods The Laplace approximation arises from the candidate's formula by choosing θ̂ to be the posterior mode, and using a normal approximation to p(θ | y) with mean θ̂ and a covariance H⁻¹ based on derivatives of the log posterior at the mode, giving p(y) ≈ (2π)^{p/2} |H|^{-1/2} p(θ̂)p(y | θ̂), where p is the number of parameters. Further simplifying the Laplace approximation to log p(y) leads to the famous BIC criterion.
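For the normal-mean example the posterior is exactly Gaussian, so the Laplace approximation reproduces the marginal likelihood exactly; this makes it a convenient correctness check. A sketch with toy data (sample size and prior variance are my own choices):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(6)
n, sigma0_sq = 40, 4.0
y = rng.normal(0.8, 1.0, size=n)
ybar = y.mean()

# Posterior mode and curvature: H is the negative Hessian of the log posterior.
H = n + 1.0 / sigma0_sq
theta_hat = n * ybar / H

# Laplace approximation: (2*pi)^{p/2} |H|^{-1/2} p(theta_hat) p(y | theta_hat), p = 1.
log_prior = norm.logpdf(theta_hat, 0, np.sqrt(sigma0_sq))
log_lik = norm.logpdf(y, theta_hat, 1).sum()
log_laplace = 0.5 * np.log(2 * np.pi) - 0.5 * np.log(H) + log_prior + log_lik

# Exact log marginal likelihood from the closed form earlier in the talk.
log_exact = (-n / 2 * np.log(2 * np.pi) - 0.5 * np.log(n * sigma0_sq + 1)
             + n ** 2 * ybar ** 2 / (2 * (n + 1 / sigma0_sq)) - 0.5 * np.sum(y ** 2))
print(log_laplace, log_exact)
```

In non-Gaussian posteriors the agreement is only approximate, with error typically shrinking as the sample size grows.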
42 Methods for calculating marginal likelihoods There are more sophisticated ways in which one can use the candidate's formula approach (Chib, 1995, JASA; Chib and Jeliazkov, 2001, JASA). Bridge estimation (Meng and Wong, 1996, Statistica Sinica): for estimating p(y) (not the most general setting), suppose we have some density r(θ) with no unknown normalizing constant, and let t(θ) be any function of θ such that 0 < ∫ t(θ)r(θ)p(θ)p(y | θ)dθ < ∞.
43 Methods for calculating marginal likelihoods Then p(y) = ∫ p(θ)p(y | θ)t(θ)r(θ)dθ / ∫ r(θ)t(θ)p(θ | y)dθ. To see this, just write p(y)p(θ | y) = p(θ)p(y | θ), multiply both sides by r(θ)t(θ) and integrate. The denominator can be estimated from a Monte Carlo sample from p(θ | y) (obtainable using standard Bayesian computational methods). The numerator can be estimated from a sample from r(θ).
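A sketch of the simplest version of this identity, taking t(θ) = 1, on the normal-mean example: then p(y) = E_r[p(θ)p(y | θ)] / E_{p(θ|y)}[r(θ)]. The choice of r as a slightly widened normal approximation to the posterior, and all numbers, are my own; in real problems the posterior sample would come from MCMC rather than the exact posterior used here.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(7)
n, sigma0_sq = 40, 4.0
y = rng.normal(0.8, 1.0, size=n)
ybar = y.mean()

H = n + 1.0 / sigma0_sq
post_mean, post_sd = n * ybar / H, H ** -0.5

def log_joint(mu):
    # log p(mu) + log p(y | mu)
    return (norm.logpdf(mu, 0, np.sqrt(sigma0_sq))
            + norm.logpdf(y[:, None], mu, 1).sum(axis=0))

S = 50_000
r_sd = 1.5 * post_sd                                 # r: widened posterior approximation
mu_r = rng.normal(post_mean, r_sd, size=S)           # sample from r
mu_post = rng.normal(post_mean, post_sd, size=S)     # posterior sample (exact here)

lj = log_joint(mu_r)
log_num = np.log(np.mean(np.exp(lj - lj.max()))) + lj.max()   # E_r[p(theta)p(y|theta)]
log_den = np.log(np.mean(norm.pdf(mu_post, post_mean, r_sd))) # E_post[r(theta)]
log_py_bridge = log_num - log_den

log_py_exact = (-n / 2 * np.log(2 * np.pi) - 0.5 * np.log(n * sigma0_sq + 1)
                + n ** 2 * ybar ** 2 / (2 * (n + 1 / sigma0_sq)) - 0.5 * np.sum(y ** 2))
print(log_py_bridge, log_py_exact)
```

Because r is close to the posterior, both Monte Carlo averages are low-variance, in contrast to the naive prior estimator earlier.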
44 Predictive model selection So far we've used what is sometimes called the "fully probabilistic" approach to model selection. Some of the challenges in implementing this approach have caused some Bayesians to look for alternatives. The difficulties with the conventional approach are: One must assign a prior distribution on the model space, and this may be difficult to do when you don't believe in any of the models. Model comparison may be sensitive to priors on the model parameters. The marginal likelihood is hard to calculate. The goals of model improvement may be better served by examining less formal diagnostics that illuminate how the model doesn't fit. There are of course ways of responding to these criticisms by those who advocate the traditional Bayesian approach, but I won't go into this debate here.
45 Predictive model selection There are numerous alternatives to the traditional Bayesian approach to model selection. Most are motivated from the point of view of wanting to predict well rather than choosing the "true" model. Often one views model selection as a two-step decision problem: first a model is chosen, and then the chosen model is used to make a prediction. The optimal choice of model is the one minimizing some specified measure of predictive loss.
46 Predictive model selection Popular predictive approaches: Bayesian variants of cross-validation (Geisser and Eddy, 1979, JASA; Bernardo and Smith, 1994); DIC (Spiegelhalter et al., 2002, JRSSB); posterior Bayes factors (Aitkin, 1991, JRSSB); less formal approaches based on simulations of replicate data sets from posterior predictive distributions (posterior predictive checks; Gelman, Meng and Stern, 1996, Statistica Sinica); many others.
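As one concrete instance of the cross-validation idea, a model can be scored by the product of leave-one-out predictive densities p(y_i | y_{-i}) (the Geisser-Eddy pseudo-marginal likelihood). Each factor, the conditional predictive ordinate, can be estimated from a single posterior sample via the identity CPO_i = 1 / E_post[1 / p(y_i | θ)]. A sketch on the normal-mean toy model (all numerical choices are mine):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(8)
n, sigma0_sq = 40, 4.0
y = rng.normal(0.8, 1.0, size=n)

H = n + 1.0 / sigma0_sq
post_mean, post_sd = n * y.mean() / H, H ** -0.5
theta = rng.normal(post_mean, post_sd, size=50_000)   # posterior draws (exact here)

# n x S matrix of log p(y_i | theta^{(s)}).
loglik = norm.logpdf(y[:, None], theta, 1)

# log CPO_i = -log mean_s exp(-loglik_is), computed with a stabilized log-sum-exp.
neg_ll = -loglik
m = neg_ll.max(axis=1, keepdims=True)
log_cpo = -(np.log(np.mean(np.exp(neg_ll - m), axis=1)) + m.ravel())

print("log pseudo-marginal likelihood:", log_cpo.sum())
```

The harmonic-mean form can be unstable for poorly fitting observations; more robust importance-weighting schemes exist, but this conveys the basic predictive scoring idea.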
47 Conclusion Even if you are not a Bayesian, thinking about the Bayesian way to approach a problem is often very illuminating for understanding all the sources of information available. One must also consider the purpose for which a model is constructed. Prediction is a very different goal to attempting to choose the "true" model. Don't forget about background knowledge, sensitivity analysis and diagnostics when model building. References to my own research on my website: standj
More informationUsing Model Selection and Prior Specification to Improve Regime-switching Asset Simulations
Using Model Selection and Prior Specification to Improve Regime-switching Asset Simulations Brian M. Hartman, PhD ASA Assistant Professor of Actuarial Science University of Connecticut BYU Statistics Department
More informationBayesian inference: what it means and why we care
Bayesian inference: what it means and why we care Robin J. Ryder Centre de Recherche en Mathématiques de la Décision Université Paris-Dauphine 6 November 2017 Mathematical Coffees Robin Ryder (Dauphine)
More informationBayesian data analysis in practice: Three simple examples
Bayesian data analysis in practice: Three simple examples Martin P. Tingley Introduction These notes cover three examples I presented at Climatea on 5 October 0. Matlab code is available by request to
More informationVCMC: Variational Consensus Monte Carlo
VCMC: Variational Consensus Monte Carlo Maxim Rabinovich, Elaine Angelino, Michael I. Jordan Berkeley Vision and Learning Center September 22, 2015 probabilistic models! sky fog bridge water grass object
More informationPart 1: Expectation Propagation
Chalmers Machine Learning Summer School Approximate message passing and biomedicine Part 1: Expectation Propagation Tom Heskes Machine Learning Group, Institute for Computing and Information Sciences Radboud
More informationeqr094: Hierarchical MCMC for Bayesian System Reliability
eqr094: Hierarchical MCMC for Bayesian System Reliability Alyson G. Wilson Statistical Sciences Group, Los Alamos National Laboratory P.O. Box 1663, MS F600 Los Alamos, NM 87545 USA Phone: 505-667-9167
More informationNew Insights into History Matching via Sequential Monte Carlo
New Insights into History Matching via Sequential Monte Carlo Associate Professor Chris Drovandi School of Mathematical Sciences ARC Centre of Excellence for Mathematical and Statistical Frontiers (ACEMS)
More informationBayesian Methods in Multilevel Regression
Bayesian Methods in Multilevel Regression Joop Hox MuLOG, 15 september 2000 mcmc What is Statistics?! Statistics is about uncertainty To err is human, to forgive divine, but to include errors in your design
More informationNonparametric Bayes Uncertainty Quantification
Nonparametric Bayes Uncertainty Quantification David Dunson Department of Statistical Science, Duke University Funded from NIH R01-ES017240, R01-ES017436 & ONR Review of Bayes Intro to Nonparametric Bayes
More informationWill Penny. DCM short course, Paris 2012
DCM short course, Paris 2012 Ten Simple Rules Stephan et al. Neuroimage, 2010 Model Structure Bayes rule for models A prior distribution over model space p(m) (or hypothesis space ) can be updated to a
More informationLecture 3. Univariate Bayesian inference: conjugate analysis
Summary Lecture 3. Univariate Bayesian inference: conjugate analysis 1. Posterior predictive distributions 2. Conjugate analysis for proportions 3. Posterior predictions for proportions 4. Conjugate analysis
More informationMetropolis-Hastings Algorithm
Strength of the Gibbs sampler Metropolis-Hastings Algorithm Easy algorithm to think about. Exploits the factorization properties of the joint probability distribution. No difficult choices to be made to
More informationMCMC algorithms for fitting Bayesian models
MCMC algorithms for fitting Bayesian models p. 1/1 MCMC algorithms for fitting Bayesian models Sudipto Banerjee sudiptob@biostat.umn.edu University of Minnesota MCMC algorithms for fitting Bayesian models
More informationShu Yang and Jae Kwang Kim. Harvard University and Iowa State University
Statistica Sinica 27 (2017), 000-000 doi:https://doi.org/10.5705/ss.202016.0155 DISCUSSION: DISSECTING MULTIPLE IMPUTATION FROM A MULTI-PHASE INFERENCE PERSPECTIVE: WHAT HAPPENS WHEN GOD S, IMPUTER S AND
More informationFrequentist-Bayesian Model Comparisons: A Simple Example
Frequentist-Bayesian Model Comparisons: A Simple Example Consider data that consist of a signal y with additive noise: Data vector (N elements): D = y + n The additive noise n has zero mean and diagonal
More informationLecture 2: From Linear Regression to Kalman Filter and Beyond
Lecture 2: From Linear Regression to Kalman Filter and Beyond Department of Biomedical Engineering and Computational Science Aalto University January 26, 2012 Contents 1 Batch and Recursive Estimation
More informationIntroduction to Bayesian Methods
Introduction to Bayesian Methods Jessi Cisewski Department of Statistics Yale University Sagan Summer Workshop 2016 Our goal: introduction to Bayesian methods Likelihoods Priors: conjugate priors, non-informative
More informationCSC 2541: Bayesian Methods for Machine Learning
CSC 2541: Bayesian Methods for Machine Learning Radford M. Neal, University of Toronto, 2011 Lecture 4 Problem: Density Estimation We have observed data, y 1,..., y n, drawn independently from some unknown
More informationGenerative Clustering, Topic Modeling, & Bayesian Inference
Generative Clustering, Topic Modeling, & Bayesian Inference INFO-4604, Applied Machine Learning University of Colorado Boulder December 12-14, 2017 Prof. Michael Paul Unsupervised Naïve Bayes Last week
More informationModule 22: Bayesian Methods Lecture 9 A: Default prior selection
Module 22: Bayesian Methods Lecture 9 A: Default prior selection Peter Hoff Departments of Statistics and Biostatistics University of Washington Outline Jeffreys prior Unit information priors Empirical
More informationOverview. Probabilistic Interpretation of Linear Regression Maximum Likelihood Estimation Bayesian Estimation MAP Estimation
Overview Probabilistic Interpretation of Linear Regression Maximum Likelihood Estimation Bayesian Estimation MAP Estimation Probabilistic Interpretation: Linear Regression Assume output y is generated
More informationMCMC notes by Mark Holder
MCMC notes by Mark Holder Bayesian inference Ultimately, we want to make probability statements about true values of parameters, given our data. For example P(α 0 < α 1 X). According to Bayes theorem:
More informationModel comparison. Christopher A. Sims Princeton University October 18, 2016
ECO 513 Fall 2008 Model comparison Christopher A. Sims Princeton University sims@princeton.edu October 18, 2016 c 2016 by Christopher A. Sims. This document may be reproduced for educational and research
More information1 Hypothesis Testing and Model Selection
A Short Course on Bayesian Inference (based on An Introduction to Bayesian Analysis: Theory and Methods by Ghosh, Delampady and Samanta) Module 6: From Chapter 6 of GDS 1 Hypothesis Testing and Model Selection
More informationNon-Parametric Bayes
Non-Parametric Bayes Mark Schmidt UBC Machine Learning Reading Group January 2016 Current Hot Topics in Machine Learning Bayesian learning includes: Gaussian processes. Approximate inference. Bayesian
More informationBayesian Regression Linear and Logistic Regression
When we want more than point estimates Bayesian Regression Linear and Logistic Regression Nicole Beckage Ordinary Least Squares Regression and Lasso Regression return only point estimates But what if we
More informationMonte Carlo Inference Methods
Monte Carlo Inference Methods Iain Murray University of Edinburgh http://iainmurray.net Monte Carlo and Insomnia Enrico Fermi (1901 1954) took great delight in astonishing his colleagues with his remarkably
More informationMachine Learning. Gaussian Mixture Models. Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall
Machine Learning Gaussian Mixture Models Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall 2012 1 The Generative Model POV We think of the data as being generated from some process. We assume
More informationBridge estimation of the probability density at a point. July 2000, revised September 2003
Bridge estimation of the probability density at a point Antonietta Mira Department of Economics University of Insubria Via Ravasi 2 21100 Varese, Italy antonietta.mira@uninsubria.it Geoff Nicholls Department
More informationGaussian Mixture Models
Gaussian Mixture Models Pradeep Ravikumar Co-instructor: Manuela Veloso Machine Learning 10-701 Some slides courtesy of Eric Xing, Carlos Guestrin (One) bad case for K- means Clusters may overlap Some
More informationBayesian Phylogenetics:
Bayesian Phylogenetics: an introduction Marc A. Suchard msuchard@ucla.edu UCLA Who is this man? How sure are you? The one true tree? Methods we ve learned so far try to find a single tree that best describes
More informationChoosing among models
Eco 515 Fall 2014 Chris Sims Choosing among models September 18, 2014 c 2014 by Christopher A. Sims. This document is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported
More informationBayesian Analysis (Optional)
Bayesian Analysis (Optional) 1 2 Big Picture There are two ways to conduct statistical inference 1. Classical method (frequentist), which postulates (a) Probability refers to limiting relative frequencies
More informationLecture 7 and 8: Markov Chain Monte Carlo
Lecture 7 and 8: Markov Chain Monte Carlo 4F13: Machine Learning Zoubin Ghahramani and Carl Edward Rasmussen Department of Engineering University of Cambridge http://mlg.eng.cam.ac.uk/teaching/4f13/ Ghahramani
More informationLinear Models A linear model is defined by the expression
Linear Models A linear model is defined by the expression x = F β + ɛ. where x = (x 1, x 2,..., x n ) is vector of size n usually known as the response vector. β = (β 1, β 2,..., β p ) is the transpose
More informationEstimating marginal likelihoods from the posterior draws through a geometric identity
Estimating marginal likelihoods from the posterior draws through a geometric identity Johannes Reichl Energy Institute at the Johannes Kepler University Linz E-mail for correspondence: reichl@energieinstitut-linz.at
More informationLecture 1 Bayesian inference
Lecture 1 Bayesian inference olivier.francois@imag.fr April 2011 Outline of Lecture 1 Principles of Bayesian inference Classical inference problems (frequency, mean, variance) Basic simulation algorithms
More informationSpatial Statistics Chapter 4 Basics of Bayesian Inference and Computation
Spatial Statistics Chapter 4 Basics of Bayesian Inference and Computation So far we have discussed types of spatial data, some basic modeling frameworks and exploratory techniques. We have not discussed
More informationBayesian Learning. HT2015: SC4 Statistical Data Mining and Machine Learning. Maximum Likelihood Principle. The Bayesian Learning Framework
HT5: SC4 Statistical Data Mining and Machine Learning Dino Sejdinovic Department of Statistics Oxford http://www.stats.ox.ac.uk/~sejdinov/sdmml.html Maximum Likelihood Principle A generative model for
More informationEmpirical Likelihood Based Deviance Information Criterion
Empirical Likelihood Based Deviance Information Criterion Yin Teng Smart and Safe City Center of Excellence NCS Pte Ltd June 22, 2016 Outline Bayesian empirical likelihood Definition Problems Empirical
More information