Bayesian Model Comparison
1 BS2 Statistical Inference, Lecture 11, Hilary Term 2009. February 26, 2009.
2 An integral of the form
$$I = \int_a^b e^{-\lambda g(y)}\, h(y)\, dy,$$
where $h(y)$ and $g(y)$ are smooth and $g$ has a local minimum at $y^* \in (a, b)$, can be approximated as
$$I = e^{-\lambda g(y^*)}\, h(y^*)\, \sqrt{\frac{2\pi}{\lambda g''(y^*)}}\left\{1 + O\!\left(\frac{1}{\lambda}\right)\right\}.$$
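A quick numerical illustration of this first-order approximation (a minimal sketch; the choices $g(y) = y^2/2$, $h(y) = 1/(1+y^2)$ and the interval $(-5, 5)$ are ours, not from the slides). The relative error should shrink roughly like $1/\lambda$:

```python
# Numerical check of the first-order Laplace approximation.
# Assumed example: g(y) = y^2/2 (minimum at y* = 0, g'' = 1), h(y) = 1/(1+y^2).
import numpy as np
from scipy.integrate import quad

def g(y): return 0.5 * y**2           # smooth, local minimum at y* = 0
def h(y): return 1.0 / (1.0 + y**2)   # smooth, positive weight

ystar, g2 = 0.0, 1.0                  # minimizer and g''(y*)

for lam in [5, 20, 100]:
    exact, _ = quad(lambda y: np.exp(-lam * g(y)) * h(y), -5, 5)
    laplace = np.exp(-lam * g(ystar)) * h(ystar) * np.sqrt(2 * np.pi / (lam * g2))
    print(f"lambda={lam:4d}  exact={exact:.6f}  laplace={laplace:.6f}  "
          f"rel.err={abs(exact - laplace) / exact:.2e}")
```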
3 A more accurate approximation is
$$I = e^{-\lambda g_\lambda(\tilde y_\lambda)} \sqrt{\frac{2\pi}{\lambda g_\lambda''(\tilde y_\lambda)}}\left\{1 + \frac{5\rho_3^2 - 3\rho_4}{24\lambda} + O(\lambda^{-2})\right\},$$
where now $\tilde y_\lambda$ minimizes
$$g_\lambda(y) = g(y) - \lambda^{-1}\log h(y),$$
and
$$\rho_3 = \frac{g_\lambda^{(3)}(\tilde y_\lambda)}{\{g_\lambda''(\tilde y_\lambda)\}^{3/2}}, \qquad \rho_4 = \frac{g_\lambda^{(4)}(\tilde y_\lambda)}{\{g_\lambda''(\tilde y_\lambda)\}^{2}}.$$
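A sanity check of the second-order term (our example, not from the slides): with $g(t) = t - \log t$ and $h \equiv 1$ we have $\int_0^\infty e^{-\lambda g(t)}\, dt = \Gamma(\lambda+1)/\lambda^{\lambda+1}$, and at $\tilde t = 1$ one gets $g'' = 1$, $\rho_3 = -2$, $\rho_4 = 6$, so the correction factor is $1 + 1/(12\lambda)$, recovering Stirling's series:

```python
# Numerical check (our example): the second-order Laplace formula applied to
# g(t) = t - log t, h = 1 reproduces Stirling's series for log Gamma.
import numpy as np
from scipy.special import gammaln

rho3, rho4 = -2.0, 6.0                       # g''' = -2, g'''' = 6 at t~ = 1
for lam in [2.0, 5.0, 20.0]:
    corr = 1 + (5 * rho3**2 - 3 * rho4) / (24 * lam)   # = 1 + 1/(12*lam)
    # log Gamma(lam+1) = (lam+1) log lam + log I, with I from the Laplace formula
    approx = (lam + 1) * np.log(lam) - lam + 0.5 * np.log(2 * np.pi / lam) + np.log(corr)
    print(lam, gammaln(lam + 1), approx)
```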
4 It holds approximately for large $n$ that the posterior distribution of $\theta$ is
$$\theta \mid X \sim N_d\{\hat\theta,\, j_n(\hat\theta)^{-1}\} = N_d\{\hat\theta,\, j(\hat\theta)^{-1}/n\}.$$
A more accurate approximation is obtained from the Laplace approximation to be
$$\pi^*(\theta) = \frac{\exp\{l(\theta)\}\pi(\theta)}{\int_\Theta \exp\{l(\theta)\}\pi(\theta)\, d\theta} = (2\pi/n)^{-d/2} \exp\{l(\theta) - l(\hat\theta)\}\, \frac{\pi(\theta)}{\pi(\hat\theta)}\, |j(\hat\theta)|^{1/2}\{1 + O(n^{-1})\}.$$
Note in particular the expression for the normalization constant
$$\int_\Theta f(x \mid \theta)\pi(\theta)\, d\theta = (2\pi/n)^{d/2}\, L(\hat\theta)\pi(\hat\theta)\, |j(\hat\theta)|^{-1/2}\{1 + O(n^{-1})\}.$$
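A numerical check of the normalization-constant formula (our example): for $n$ Bernoulli trials with $s$ successes and a uniform prior, the exact marginal likelihood is $B(s+1, n-s+1)$, while $\hat\theta = s/n$ and $j_n(\hat\theta) = n/\{\hat\theta(1-\hat\theta)\}$:

```python
# Laplace approximation (d = 1) to the Bernoulli marginal likelihood with a
# uniform prior, against the exact value log B(s+1, n-s+1). Our example.
import numpy as np
from scipy.special import betaln

def log_marginal_laplace(s, n):
    theta = s / n                                  # MLE
    loglik = s * np.log(theta) + (n - s) * np.log(1 - theta)
    j_n = n / (theta * (1 - theta))                # observed information j_n(theta_hat)
    # (2*pi)^{1/2} * L(theta_hat) * pi(theta_hat) * j_n^{-1/2}, with pi = 1
    return 0.5 * np.log(2 * np.pi) + loglik - 0.5 * np.log(j_n)

for n, s in [(20, 7), (200, 70), (2000, 700)]:
    print(n, betaln(s + 1, n - s + 1), log_marginal_laplace(s, n))
```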
5 We consider a number of competing models $M_j$, $j = 1, \dots, m$, for data $X$; for example, $M_1$ might specify that the expectation of a component $X_i$ of $X$ depends linearly on covariates $Y_i$, an alternative $M_2$ may specify that it has a quadratic dependence, whereas a third model $M_3$ might specify that the expectation does not depend on $Y_i$ at all. Associated with each of these models are parameter spaces $\Theta_j$ and prior distributions $\pi_j(\theta_j)$, as well as prior model probabilities $\pi_j$ for model $M_j$ being the correct description of affairs.
6 The posterior probability for model $M_j$ would then satisfy
$$\pi_j^* \propto \pi_j \int_{\Theta_j} f(x \mid \theta_j, M_j)\,\pi_j(\theta_j)\, d\theta_j,$$
i.e. it will as usual be proportional to the product of the marginal or integrated likelihood $L_j$ of model $M_j$ with the prior model probability $\pi_j$, where
$$L_j = f(x \mid M_j) = \int_{\Theta_j} f(x \mid \theta_j, M_j)\,\pi_j(\theta_j)\, d\theta_j.$$
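Turning marginal likelihoods and prior model probabilities into posterior model probabilities is a one-line normalization; a small sketch with made-up numbers (all values here are illustrative, not from the slides):

```python
# Posterior model probabilities pi_j* proportional to L_j * pi_j,
# computed stably on the log scale. All numbers are made up.
import numpy as np

log_L = np.array([-105.2, -103.8, -110.4])   # log marginal likelihoods L_j
prior = np.array([1/3, 1/3, 1/3])            # prior model probabilities pi_j
log_post = log_L + np.log(prior)
post = np.exp(log_post - log_post.max())     # subtract the max to avoid underflow
post /= post.sum()
print(post.round(3))
```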
7 Comparing two models yields
$$\frac{\pi_j^*}{\pi_k^*} = \frac{f(x \mid M_j)}{f(x \mid M_k)} \cdot \frac{\pi_j}{\pi_k} = \frac{\int_{\Theta_j} f(x \mid \theta_j, M_j)\,\pi_j(\theta_j)\, d\theta_j}{\int_{\Theta_k} f(x \mid \theta_k, M_k)\,\pi_k(\theta_k)\, d\theta_k} \cdot \frac{\pi_j}{\pi_k}.$$
The factor
$$B_{jk} = \frac{f(x \mid M_j)}{f(x \mid M_k)} = \frac{\int_{\Theta_j} f(x \mid \theta_j, M_j)\,\pi_j(\theta_j)\, d\theta_j}{\int_{\Theta_k} f(x \mid \theta_k, M_k)\,\pi_k(\theta_k)\, d\theta_k} = \frac{L_j}{L_k}$$
is known as the Bayes factor in favour of model $j$ over model $k$. Note that if the Bayesian model is taken to its full consequence, this is nothing but the usual likelihood ratio.
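As a concrete illustration (our example, not from the slides): for $n$ Bernoulli trials with $s$ successes, let $M_1$ fix the success probability at $1/2$ (so $L_1 = 2^{-n}$, with no free parameter) and let $M_2$ put a uniform prior on it (so $L_2 = B(s+1, n-s+1)$):

```python
# Hypothetical example: Bayes factor for M1 (theta = 1/2 fixed) against
# M2 (theta ~ Uniform(0,1)) with s successes in n Bernoulli trials.
import numpy as np
from scipy.special import betaln

def log_bayes_factor(s, n):
    log_L1 = n * np.log(0.5)              # marginal likelihood of M1
    log_L2 = betaln(s + 1, n - s + 1)     # integrated likelihood of M2
    return log_L1 - log_L2

print(np.exp(log_bayes_factor(s=12, n=20)))     # B_12 for a small sample
print(np.exp(log_bayes_factor(s=120, n=200)))   # same proportion, more data
```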
8 Recall that $\Sigma$ follows an inverse Wishart distribution if $K = \Sigma^{-1}$ follows a Wishart distribution, formally expressed as
$$\Sigma \sim \mathrm{IW}_d(\delta, \Psi) \iff K = \Sigma^{-1} \sim W_d(\delta + d - 1, \Psi^{-1}),$$
i.e. if the density of $K$ has the form
$$f(K \mid \delta, \Psi) \propto (\det K)^{\delta/2 - 1}\, e^{-\operatorname{tr}(\Psi K)/2}.$$
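A Monte Carlo sanity check of this correspondence (a sketch; we assume scipy's `invwishart` parametrization with degrees of freedom $\nu = \delta + d - 1$, so that inverting draws of $\Sigma$ should give Wishart draws with mean $\nu\,\Psi^{-1}$):

```python
# Sanity check of the IW/Wishart correspondence, assuming scipy's
# parametrization invwishart(df=nu, scale=Psi) with nu = delta + d - 1.
import numpy as np
from scipy.stats import invwishart

d, delta = 3, 5
Psi = np.eye(d)
nu = delta + d - 1                       # scipy degrees of freedom (assumed mapping)
rng = np.random.default_rng(0)
Sigmas = invwishart(df=nu, scale=Psi).rvs(size=2000, random_state=rng)
Ks = np.linalg.inv(Sigmas)               # K = Sigma^{-1} should be Wishart
print(Ks.mean(axis=0))                   # should be close to nu * Psi^{-1} = 7 * I
```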
9 The inverse Wishart distributions form a conjugate family for $\Sigma$. If the prior distribution of $\Sigma$ is $\mathrm{IW}_d(\delta, \Psi)$ and $W \mid \Sigma \sim W_d(n, \Sigma)$, the posterior density of $K$ is
$$f(K \mid \delta, \Psi, W) \propto (\det K)^{n/2}\, e^{-\operatorname{tr}(KW)/2}\, (\det K)^{\delta/2 - 1}\, e^{-\operatorname{tr}(\Psi K)/2} = (\det K)^{(n+\delta)/2 - 1}\, e^{-\operatorname{tr}\{(\Psi + W)K\}/2},$$
and hence the posterior distribution is simply $\mathrm{IW}_d(\delta + n, \Psi + W) = \mathrm{IW}_d(\delta^*, \Psi^*)$.
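Since conjugacy means the update only shifts hyperparameters, a sketch is essentially one line (the function name and the simulated scatter matrix are ours):

```python
# Conjugate update from the slide: IW_d(delta, Psi) prior plus
# W | Sigma ~ W_d(n, Sigma) gives the IW_d(delta + n, Psi + W) posterior.
import numpy as np

def iw_posterior(delta, Psi, W, n):
    """Return the posterior hyperparameters (delta*, Psi*)."""
    return delta + n, Psi + W

d, n = 3, 50
rng = np.random.default_rng(0)
X = rng.standard_normal((n, d))
W = X.T @ X                               # scatter matrix, W ~ W_d(n, I)
delta_star, Psi_star = iw_posterior(delta=5, Psi=np.eye(d), W=W, n=n)
print(delta_star)
print(Psi_star.round(2))
```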
10 To calculate the Bayes factor for independence we need the full form of the Wishart density for $K$:
$$f_d(K \mid \delta, \Psi) = c(d, \delta)^{-1}\, (\det \Psi)^{(\delta + d - 1)/2}\, (\det K)^{\delta/2 - 1}\, e^{-\operatorname{tr}(\Psi K)/2}.$$
The constant $c(d, \delta)$ is
$$c(d, \delta) = 2^{(\delta + d - 1)d/2}\, \pi^{d(d-1)/4} \prod_{i=1}^d \Gamma\{(\delta + d - i)/2\}.$$
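In practice one works with $\log c(d, \delta)$ to avoid overflow; a minimal sketch of the expression above using `scipy.special.gammaln`:

```python
# Log of the normalizing constant c(d, delta) of the Wishart density above.
import numpy as np
from scipy.special import gammaln

def log_c(d, delta):
    i = np.arange(1, d + 1)
    return ((delta + d - 1) * d / 2) * np.log(2) \
        + (d * (d - 1) / 4) * np.log(np.pi) \
        + gammaln((delta + d - i) / 2).sum()
```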
11 The marginal density of $W$ becomes
$$\begin{aligned}
f(W \mid \delta, \Psi) &= \int f(W \mid n, K)\, f(K \mid \delta, \Psi)\, dK\\
&= (\det W)^{(n-d-1)/2}\, c(d, n)^{-1} c(d, \delta)^{-1} (\det \Psi)^{(\delta+d-1)/2} \int (\det K)^{(n+\delta)/2 - 1}\, e^{-\operatorname{tr}\{K(W + \Psi)\}/2}\, dK\\
&= (\det W)^{(n-d-1)/2}\, c(d, n)^{-1} c(d, \delta)^{-1} (\det \Psi)^{(\delta+d-1)/2}\, \{\det(\Psi + W)\}^{-(\delta+n+d-1)/2}\, c(d, n + \delta)\\
&= (\det W)^{(n-d-1)/2}\, \frac{(\det \Psi)^{(\delta+d-1)/2}}{\{\det(\Psi + W)\}^{(\delta+n+d-1)/2}} \cdot \frac{c(d, n+\delta)}{c(d, n)\, c(d, \delta)}.
\end{aligned}$$
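The last line can be coded directly; a sketch reusing `log_c` from the block above (function names are ours):

```python
# Log marginal density log f(W | delta, Psi) for W | Sigma ~ W_d(n, Sigma)
# under the IW_d(delta, Psi) prior; reuses log_c from the sketch above.
import numpy as np

def log_f_W(W, n, delta, Psi):
    d = W.shape[0]
    _, ld_W = np.linalg.slogdet(W)        # log det W
    _, ld_Psi = np.linalg.slogdet(Psi)    # log det Psi
    _, ld_PW = np.linalg.slogdet(Psi + W)
    return (0.5 * (n - d - 1) * ld_W
            + 0.5 * (delta + d - 1) * ld_Psi
            - 0.5 * (delta + n + d - 1) * ld_PW
            + log_c(d, n + delta) - log_c(d, n) - log_c(d, delta))
```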
12 Consider now alternative models: $M_2$ with $\Sigma$ arbitrary, and $M_1$ with $\Sigma$ of block diagonal form, i.e. with
$$\Sigma = \begin{pmatrix} \Sigma_{11} & 0 \\ 0 & \Sigma_{22} \end{pmatrix}.$$
If the associated prior distributions are, for $M_2$, that $\Sigma \sim \mathrm{IW}_d(\delta, I_d)$, and, for $M_1$, that $\Sigma_{11} \sim \mathrm{IW}_r(\delta, I_r)$ and $\Sigma_{22} \sim \mathrm{IW}_s(\delta, I_s)$, we can now calculate the Bayes factor.
13 We get
$$\begin{aligned}
B_{12} &= \frac{f(W_{11} \mid \delta, I_r)\, f(W_{22} \mid \delta, I_s)}{f(W \mid \delta, I_d)}\\
&= \frac{(\det W_{11})^{(n-r-1)/2}\, (\det W_{22})^{(n-s-1)/2}}{(\det W)^{(n-d-1)/2}} \cdot \frac{\{\det(I_d + W)\}^{(\delta+n+d-1)/2}}{\{\det(I_r + W_{11})\}^{(\delta+n+r-1)/2}\, \{\det(I_s + W_{22})\}^{(\delta+n+s-1)/2}}\\
&\quad \times \frac{c(d, n)\, c(d, \delta)\, c(r, n+\delta)\, c(s, n+\delta)}{c(d, n+\delta)\, c(r, n)\, c(r, \delta)\, c(s, n)\, c(s, \delta)}.
\end{aligned}$$
Note the similarity between the first fraction and Wilks' $\Lambda$ for independence.
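With `log_f_W` from the sketch above, the Bayes factor for independence is three calls (the simulated data are ours; a positive $\log B_{12}$ favours the block-diagonal model $M_1$):

```python
# Log Bayes factor for independence of the first r against the last s
# coordinates, with identity scale matrices as on the slide.
import numpy as np

def log_B12_indep(W, n, delta, r):
    d = W.shape[0]
    s = d - r
    return (log_f_W(W[:r, :r], n, delta, np.eye(r))
            + log_f_W(W[r:, r:], n, delta, np.eye(s))
            - log_f_W(W, n, delta, np.eye(d)))

rng = np.random.default_rng(0)
n, r, s = 100, 2, 2
X = rng.standard_normal((n, r + s))   # blocks truly independent here
W = X.T @ X
print(log_B12_indep(W, n=n, delta=3, r=r))   # typically positive for such data
```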
14 In general the Bayes factor is difficult or impossible to calculate explicitly. Recall that for competing models $M_1$ and $M_2$ with parameters $\theta_1 \in \Theta_1 \subseteq \mathbb{R}^{d_1}$ and $\theta_2 \in \Theta_2 \subseteq \mathbb{R}^{d_2}$ and prior distributions $\pi_1, \pi_2$, the Bayes factor $B$ in favour of $M_1$ over $M_2$ is
$$B = \frac{f(x_1, \dots, x_n \mid M_1)}{f(x_1, \dots, x_n \mid M_2)} = \frac{\int_{\Theta_1} f(x \mid \theta_1, M_1)\,\pi_1(\theta_1)\, d\theta_1}{\int_{\Theta_2} f(x \mid \theta_2, M_2)\,\pi_2(\theta_2)\, d\theta_2}.$$
Recall the approximate expression obtained for the Bayesian marginal likelihood using Laplace's method:
$$\int_\Theta f(x \mid \theta)\pi(\theta)\, d\theta = (2\pi/n)^{d/2}\, L(\hat\theta)\pi(\hat\theta)\, |j(\hat\theta)|^{-1/2}\{1 + O(n^{-1})\}.$$
15 We then get
$$B = (2\pi)^{(d_1 - d_2)/2}\, n^{(d_2 - d_1)/2}\, \frac{L_1(\hat\theta_1)\,\pi_1(\hat\theta_1)\, |j_2(\hat\theta_2)|^{1/2}}{L_2(\hat\theta_2)\,\pi_2(\hat\theta_2)\, |j_1(\hat\theta_1)|^{1/2}}\{1 + O(n^{-1})\}.$$
To study the asymptotic behaviour of the Bayes factor we take logarithms and collect terms of similar order to get
$$\log B = n\{\bar l_n(\hat\theta_1) - \bar l_n(\hat\theta_2)\} + \frac{d_2 - d_1}{2}\log n + \log\{\pi_1(\hat\theta_1)/\pi_2(\hat\theta_2)\} + \frac{1}{2}\log\frac{|j_2(\hat\theta_2)|}{|j_1(\hat\theta_1)|} - \frac{d_2 - d_1}{2}\log(2\pi) + O(n^{-1}),$$
where $\bar l_n(\theta) = l(\theta)/n$ is the average log-likelihood.
16 The dominating terms are those on the first line, as all other terms are of smaller order as $n \to \infty$. Ignoring the latter we get
$$\log B \approx \{l(\hat\theta_1) - l(\hat\theta_2)\} - \frac{d_1 - d_2}{2}\log n.$$
The right-hand side is the Bayesian Information Criterion (BIC). It reflects that, for large $n$, the Bayes factor will favour the model with the highest maximized likelihood (the first term), but will also penalize the model having the largest number of parameters. The prior distributions $\pi_i$ do not enter the expression for BIC, which may or may not be seen as an advantage. Models with a high value of BIC would be preferred over models with a low value of BIC.
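A small simulated illustration (our example): comparing linear and quadratic regression models with the convention above, where the model with the higher per-model score $l(\hat\theta) - (d/2)\log n$ is preferred. With linear data, the quadratic model fits slightly better but pays a larger penalty:

```python
# BIC comparison in the slide's convention (higher is better) for nested
# polynomial regression models; the data-generating model is linear.
import numpy as np

rng = np.random.default_rng(1)
n = 200
x = rng.uniform(-1, 1, n)                          # covariate
y = 1.0 + 2.0 * x + rng.standard_normal(n)         # true mean is linear

def bic(degree):
    X = np.vander(x, degree + 1)                   # polynomial design matrix
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / n                     # MLE of the error variance
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    d = degree + 2                                 # mean parameters + variance
    return loglik - 0.5 * d * np.log(n)

print("linear    BIC:", bic(1))
print("quadratic BIC:", bic(2))                    # usually lower: penalized more
```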
17 One can get a more accurate approximation of the Bayes factor by adding the terms
$$-\frac{1}{2}\log\left|j_i(\hat\theta_i)\right| + \frac{d_i}{2}\log(2\pi),$$
but this correction is not increasing with $n$, so it is most commonly ignored. For the comparison of two models we get
$$\mathrm{BIC} = l(\hat\theta_1) - l(\hat\theta_2) - \frac{d_1 - d_2}{2}\log n = \log \mathrm{LR} - \frac{d_1 - d_2}{2}\log n.$$
Thus, in comparison with straight maximized likelihood, the simpler model gets preference by incurring a lower penalty.
18 In the nested case with $d_1 < d_2$, the deviance difference between the models is $D = -2\log \mathrm{LR}$, so
$$-2\,\mathrm{BIC} = D + (d_1 - d_2)\log n.$$
If the true value of the parameter satisfies $\theta_0 \in M_1 \subseteq M_2$, the deviance $D$ would under suitable regularity conditions be approximately $\chi^2(d_2 - d_1)$. The penalty term will thus dominate for large values of $n$, so the simpler model will eventually be chosen. In this sense, BIC will asymptotically choose the simplest model which is correct, often referred to as consistency of the BIC.
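A quick calculation (ours) of how fast the error probability vanishes when $M_1$ is true: the larger model is chosen only when $D > (d_2 - d_1)\log n$, which for $D \approx \chi^2(d_2 - d_1)$ has probability $P\{\chi^2(d_2 - d_1) > (d_2 - d_1)\log n\} \to 0$:

```python
# Probability that BIC picks the larger model when the simpler model M1
# is true, using D ~ chi^2(d2 - d1) against the penalty (d2 - d1) log n.
import numpy as np
from scipy.stats import chi2

d_diff = 3                                     # d2 - d1
for n in [10, 100, 10_000, 1_000_000]:
    print(n, chi2.sf(d_diff * np.log(n), df=d_diff))
```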