Stat 535 C - Statistical Computing & Monte Carlo Methods. Arnaud Doucet.


Stat 535 C - Statistical Computing & Monte Carlo Methods
Arnaud Doucet
Email: arnaud@cs.ubc.ca

Suggested Projects: www.cs.ubc.ca/~arnaud/projects.html
First assignment on the web this afternoon: capture/recapture. Additional articles have been posted.

Outline
Bayesian model selection.
Bayesian linear model and variable selection.
Extensions.

3.1 Summary of Last Lecture

One wants to compare two hypotheses, $H_0: \theta \sim \pi_0$ versus $H_1: \theta \sim \pi_1$; the prior is then
$$\pi(\theta) = \pi(H_0)\,\pi_0(\theta) + \pi(H_1)\,\pi_1(\theta)$$
where $\pi(H_0) + \pi(H_1) = 1$.

In a coin example one can have $\pi_0(\theta) = \mathcal{U}_{[1/2,1]}$ and $\pi_1(\theta) = \mathcal{U}_{[0,1/2)}$, or $\pi_0(\theta) = \delta_{\theta_0}(\theta)$ and $\pi_1(\theta) = \mathcal{U}_{[0,1)}$, or $\pi_0(\theta) = \mathcal{B}e(\alpha_0,\beta_0)$ and $\pi_1(\theta) = \mathcal{B}e(\alpha_1,\beta_1)$.

To compare $H_0$ versus $H_1$, we typically compute the Bayes factor, which partially eliminates the influence of the prior modelling (i.e. of $\pi(H_i)$):
$$B_{10}^{\pi} = \frac{\pi(x\mid H_1)}{\pi(x\mid H_0)} = \frac{\int f(x\mid\theta)\,\pi_1(\theta)\,d\theta}{\int f(x\mid\theta)\,\pi_0(\theta)\,d\theta}.$$
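To make the Bayes factor concrete, here is a minimal sketch for the coin example with Beta priors under each hypothesis, where the marginal likelihoods are available in closed form. The data (k, n) and the hyperparameter values below are purely illustrative assumptions, not from the lecture.

```python
import numpy as np
from scipy.special import betaln

def log_marginal(k, n, a, b):
    """log pi(x | H) for x = k heads out of n tosses, with theta ~ Beta(a, b).

    The binomial coefficient is omitted since it cancels in the Bayes factor.
    Closed form: int theta^k (1-theta)^(n-k) Be(theta; a, b) dtheta
                 = B(a + k, b + n - k) / B(a, b).
    """
    return betaln(a + k, b + n - k) - betaln(a, b)

# Illustrative data and hyperparameters (assumed, not from the slides).
k, n = 7, 10                  # 7 heads out of 10 tosses
a0, b0 = 20.0, 20.0           # pi_0: Beta prior concentrated around theta = 1/2
a1, b1 = 1.0, 1.0             # pi_1: flat Beta(1, 1) prior

log_B10 = log_marginal(k, n, a1, b1) - log_marginal(k, n, a0, b0)
print("B10 =", np.exp(log_B10))
```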

3.1 Summary of Last Lecture

You can also compute the posterior probabilities of $H_0$ and $H_1$:
$$\pi(H_0\mid x) = \frac{\pi(x\mid H_0)\,\pi(H_0)}{\pi(x)} = \frac{\pi(x\mid H_0)\,\pi(H_0)}{\pi(x\mid H_0)\,\pi(H_0) + \pi(x\mid H_1)\,\pi(H_1)}.$$
The posterior probabilities satisfy
$$\frac{\pi(H_1\mid x)}{\pi(H_0\mid x)} = \frac{\pi(x\mid H_1)}{\pi(x\mid H_0)}\,\frac{\pi(H_1)}{\pi(H_0)} = B_{10}^{\pi}\,\frac{\pi(H_1)}{\pi(H_0)}.$$

3.1 Summary of Last Lecture

Testing hypotheses in a Bayesian way is attractive... but be careful with vague priors!!!

Assume you have $X\mid\mu,\sigma^2 \sim \mathcal{N}(\mu,\sigma^2)$ where $\sigma^2$ is assumed known but $\mu$ (the parameter $\theta$) is unknown. We want to test $H_0: \mu = 0$ versus $H_1: \mu \sim \mathcal{N}(\xi,\tau^2)$. Then, for $\xi = 0$,
$$B_{10}^{\pi}(x) = \frac{\pi(x\mid H_1)}{\pi(x\mid H_0)} = \frac{\int \mathcal{N}(x;\mu,\sigma^2)\,\mathcal{N}(\mu;\xi,\tau^2)\,d\mu}{f(x\mid 0)} = \sqrt{\frac{\sigma^2}{\sigma^2+\tau^2}}\,\exp\!\left(\frac{\tau^2 x^2}{2\sigma^2(\sigma^2+\tau^2)}\right),$$
which goes to $0$ as $\tau^2 \to \infty$: the vaguer the prior under $H_1$, the more the Bayes factor favours $H_0$.
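A small numerical sketch of this effect (assuming $\xi = 0$ as above): the Bayes factor $B_{10}(x)$ collapses to zero as the prior variance $\tau^2$ under $H_1$ grows, even for an observation that looks incompatible with $H_0$.

```python
import numpy as np

def bayes_factor_10(x, sigma2, tau2):
    """B10(x) for H0: mu = 0 vs H1: mu ~ N(0, tau2), with X | mu ~ N(mu, sigma2).

    Under H0 the marginal of x is N(0, sigma2); under H1 it is N(0, sigma2 + tau2).
    """
    return np.sqrt(sigma2 / (sigma2 + tau2)) * np.exp(
        tau2 * x**2 / (2.0 * sigma2 * (sigma2 + tau2)))

x, sigma2 = 2.0, 1.0          # observation two (known) standard deviations away from 0
for tau2 in [1.0, 10.0, 1e3, 1e6]:
    print(f"tau2 = {tau2:>9.0f}   B10 = {bayes_factor_10(x, sigma2, tau2):.4f}")
# B10 -> 0 as tau2 -> infinity: the vague prior mechanically favours H0.
```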

3.2 Bayesian Polynomial Regression Example

In practice, you might have more than two potential models/hypotheses for your data.

Consider the following polynomial regression problem, where $D = \{x_i,y_i\}_{i=1}^{n}$ with $(x_i,y_i) \in \mathbb{R}\times\mathbb{R}$:
$$Y = \sum_{i=0}^{M} \beta_i X^i + \sigma V = \beta_{0:M}^{\mathsf{T}} f_M(X) + \sigma V, \quad V \sim \mathcal{N}(0,1),$$
where $f_M(X) = (1, X, \ldots, X^M)^{\mathsf{T}}$.

Here the problem is that if $M$ is too large then there will be overfitting.

3.2 Bayesian Polynomial Regression Example

As M increases, the model overfits.

[Figure: data and fitted polynomials for M = 0, 1, ..., 7.]

3.2 Bayesian Polynomial Regression Example

Candidate Bayesian models $H_M$ for $M \in \{0,\ldots,M_{\max}\}$. For the model $H_M$, we take the prior
$$\pi_M(\beta_{0:M},\sigma^2) = \pi_M(\beta_{0:M}\mid\sigma^2)\,\pi_M(\sigma^2) = \mathcal{N}\!\left(\beta_{0:M};0,\delta^2\sigma^2 I_{M+1}\right)\,\mathcal{IG}\!\left(\sigma^2;\tfrac{\nu_0}{2},\tfrac{\gamma_0}{2}\right).$$

We have the following Gaussian likelihood:
$$f(D\mid\beta_{0:M},\sigma^2) = \prod_{i=1}^{n} \mathcal{N}\!\left(y_i;\beta_{0:M}^{\mathsf{T}} f_M(x_i),\sigma^2\right).$$

3.2 Bayesian Polynomial Regression Example

Standard calculations yield
$$\pi_M(\beta_{0:M},\sigma^2\mid D) = \mathcal{N}\!\left(\beta_{0:M};\mu_M,\sigma^2\Sigma_M\right)\,\mathcal{IG}\!\left(\sigma^2;\tfrac{\nu_0+n}{2},\tfrac{\gamma_0 + \sum_{i=1}^{n} y_i^2 - \mu_M^{\mathsf{T}}\Sigma_M^{-1}\mu_M}{2}\right)$$
where
$$\mu_M = \Sigma_M\left(\sum_{i=1}^{n} y_i f_M(x_i)\right), \qquad \Sigma_M^{-1} = \delta^{-2} I_{M+1} + \sum_{i=1}^{n} f_M(x_i) f_M(x_i)^{\mathsf{T}},$$
knowing that
$$\mathcal{IG}(\sigma^2;\alpha,\beta) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\left(\frac{1}{\sigma^2}\right)^{\alpha+1}\exp\!\left(-\frac{\beta}{\sigma^2}\right).$$

3.2 Bayesian Polynomial Regression Example

The marginal likelihood/evidence is given by
$$\pi(D\mid H_M) = \int f(D\mid\beta_{0:M},\sigma^2)\,\pi_M(\beta_{0:M},\sigma^2)\,d\beta_{0:M}\,d\sigma^2 \propto \Gamma\!\left(\tfrac{\nu_0+n}{2}\right)\delta^{-(M+1)}\,|\Sigma_M|^{1/2}\left(\gamma_0 + \sum_{i=1}^{n} y_i^2 - \mu_M^{\mathsf{T}}\Sigma_M^{-1}\mu_M\right)^{-\frac{\nu_0+n}{2}},$$
the proportionality constant being independent of $M$. We can also compute
$$\pi(H_M\mid D) = \frac{\pi(D\mid H_M)\,\pi(H_M)}{\sum_{i=0}^{M_{\max}} \pi(D\mid H_i)\,\pi(H_i)}.$$
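The formulas above can be evaluated directly. Below is a minimal sketch (not the lecture's code) that computes $\log \pi(D\mid H_M)$ up to an $M$-independent constant for simulated data; the hyperparameter values $\delta^2$, $\nu_0$, $\gamma_0$ and the simulated example are assumptions for illustration only.

```python
import numpy as np
from scipy.special import gammaln

def log_evidence_poly(x, y, M, delta2=10.0, nu0=1.0, gamma0=1.0):
    """log pi(D | H_M) up to an additive constant that does not depend on M.

    Conjugate prior as on the slides: beta_{0:M} | sigma2 ~ N(0, delta2 * sigma2 * I),
    sigma2 ~ IG(nu0/2, gamma0/2). Hyperparameter values are illustrative.
    """
    n = len(y)
    F = np.vander(x, M + 1, increasing=True)           # rows are f_M(x_i) = (1, x_i, ..., x_i^M)
    Sigma_inv = np.eye(M + 1) / delta2 + F.T @ F       # Sigma_M^{-1}
    mu = np.linalg.solve(Sigma_inv, F.T @ y)           # mu_M
    quad = gamma0 + y @ y - mu @ (Sigma_inv @ mu)      # gamma0 + sum y_i^2 - mu^T Sigma^{-1} mu
    _, logdet_Sigma_inv = np.linalg.slogdet(Sigma_inv)
    return (gammaln(0.5 * (nu0 + n))
            - 0.5 * (M + 1) * np.log(delta2)           # delta^{-(M+1)}
            - 0.5 * logdet_Sigma_inv                   # |Sigma_M|^{1/2}
            - 0.5 * (nu0 + n) * np.log(quad))          # (gamma0 + ...)^{-(nu0+n)/2}

# Simulated data whose true polynomial order is M = 2 (illustrative).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
y = 1.0 - 2.0 * x + 3.0 * x**2 + 0.3 * rng.standard_normal(x.size)

log_ev = np.array([log_evidence_poly(x, y, M) for M in range(8)])
post = np.exp(log_ev - log_ev.max())
post /= post.sum()                                     # pi(H_M | D) under a uniform prior pi(H_M)
print(np.round(post, 3))
```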

3.2 Bayesian Polynomial Regression Example

[Figure: fitted polynomials for M = 0, 1, ..., 7, and the model evidence P(Y | M) plotted as a function of M.]

3.2 Bayesian Polynomial Regression Example

We have assumed here that $\delta^2$ was fixed. As $\delta^2 \to \infty$, the prior on $\beta_{0:M}$ becomes vague, but then, for any $M \geq 1$,
$$\lim_{\delta^2\to\infty} \pi(H_0\mid D) = 1$$
because
$$\frac{\pi(D\mid H_0)}{\pi(D\mid H_M)} = \frac{\delta^{-1}\,|\Sigma_0|^{1/2}\left(\gamma_0 + \sum_{i=1}^{n} y_i^2 - \mu_0^{\mathsf{T}}\Sigma_0^{-1}\mu_0\right)^{-\frac{\nu_0+n}{2}}}{\delta^{-(M+1)}\,|\Sigma_M|^{1/2}\left(\gamma_0 + \sum_{i=1}^{n} y_i^2 - \mu_M^{\mathsf{T}}\Sigma_M^{-1}\mu_M\right)^{-\frac{\nu_0+n}{2}}} \xrightarrow[\delta^2\to\infty]{} \infty.$$

Do not use vague priors for model selection!!! For a more robust model, treat $\delta^2$ as random and estimate it from the data. However, numerical methods are then necessary.
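To see the vague-prior effect numerically, one can reuse log_evidence_poly and the simulated (x, y) from the sketch above: as $\delta^2$ grows, the posterior mass drifts towards the smallest model regardless of what the data support. This snippet assumes those names are still in scope.

```python
# Assumes log_evidence_poly, x and y from the previous sketch are in scope.
import numpy as np

for delta2 in [1e0, 1e4, 1e8, 1e12]:
    log_ev = np.array([log_evidence_poly(x, y, M, delta2=delta2) for M in range(8)])
    post = np.exp(log_ev - log_ev.max())
    post /= post.sum()
    print(f"delta2 = {delta2:>9.0e}   pi(H_0 | D) = {post[0]:.3f}")
```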

3.3 Bayesian Model Choice: Example

In practice, you might have models of different natures for your data $x = (x_1,\ldots,x_T)$.

$\mathcal{M}_1$: Gaussian white noise, $X_n \overset{\text{iid}}{\sim} \mathcal{N}(0,\sigma^2_{\mathrm{WN}})$.

$\mathcal{M}_2$: an AR process of order $k_{\mathrm{AR}}$, $k_{\mathrm{AR}}$ being fixed, excited by white Gaussian noise $V_n \overset{\text{iid}}{\sim} \mathcal{N}(0,\sigma^2_{\mathrm{AR}})$:
$$X_n = \sum_{i=1}^{k_{\mathrm{AR}}} a_i X_{n-i} + V_n.$$

$\mathcal{M}_3$: $k_{\sin}$ sinusoids, $k_{\sin}$ being fixed, embedded in a white Gaussian noise sequence $V_n \overset{\text{iid}}{\sim} \mathcal{N}(0,\sigma^2_{\sin})$:
$$X_n = \sum_{j=1}^{k_{\sin}} \left(a_{c,j}\cos(\omega_j n) + a_{s,j}\sin(\omega_j n)\right) + V_n.$$

3.4 Bayesian Model for Model Choice

Generally speaking, you have a countable collection of models $\{\mathcal{M}_i\}$. For each model $\mathcal{M}_i$, you have a prior $\pi_i(\theta_i)$ on $\theta_i$ and a likelihood function $f_i(x\mid\theta_i)$. You attribute a prior probability $\pi(i)$ to each model $\mathcal{M}_i$.

The parameter space is $\Theta = \bigcup_i \{i\}\times\Theta_i$ and the prior on $\Theta$ is $\pi(i,\theta_i) = \pi(i)\,\pi_i(\theta_i)$.

3.4 Bayesian Model for Model Choice

In the polynomial regression example,
$$\Theta = \bigcup_{i=0}^{M_{\max}} \underbrace{\{i\}}_{\text{model indicator}} \times \underbrace{\mathbb{R}^{i+1}}_{\text{regression parameters } \beta_{0:i}} \times \underbrace{\mathbb{R}^{+}}_{\text{noise variance}}.$$

Remark: in all models, you have a noise variance to estimate. This parameter has a different interpretation for each model.

3.4 Bayesian Model for Model Choice

In the non-nested example, $\Theta = (\{1\}\times\Theta_1) \cup (\{2\}\times\Theta_2) \cup (\{3\}\times\Theta_3)$ where
$$\theta_1 = \sigma^2_{\mathrm{WN}},\quad \Theta_1 = \mathbb{R}^{+}; \qquad \theta_2 = (a_1,\ldots,a_{k_{\mathrm{AR}}},\sigma^2_{\mathrm{AR}}),\quad \Theta_2 = \mathbb{R}^{k_{\mathrm{AR}}}\times\mathbb{R}^{+};$$
$$\theta_3 = (a_{c,1},a_{s,1},\omega_1,\ldots,a_{c,k_{\sin}},a_{s,k_{\sin}},\omega_{k_{\sin}},\sigma^2_{\sin}),\quad \Theta_3 = \mathbb{R}^{2k_{\sin}}\times[0,\pi]^{k_{\sin}}\times\mathbb{R}^{+}.$$

Remark: in all models, you have a noise variance to estimate. This parameter has a different interpretation for each model.

Be careful: we do not select here $\Theta = \{1,2,3\}\times\Theta_1\times\Theta_2\times\Theta_3$.

3.4 Bayesian Model for Model Choice

The posterior is given by Bayes' rule:
$$\pi(k,\theta_k\mid x) = \frac{\pi(k)\,\pi_k(\theta_k)\,f_k(x\mid\theta_k)}{\sum_{k'} \pi(k') \int_{\Theta_{k'}} \pi_{k'}(\theta_{k'})\,f_{k'}(x\mid\theta_{k'})\,d\theta_{k'}}.$$
We can obtain the posterior model probabilities through
$$\pi(k\mid x) = \int_{\Theta_k} \pi(k,\theta_k\mid x)\,d\theta_k.$$
Once more, this is conceptually simple but it requires the calculation of many (possibly an infinite number of) integrals.

3.5 Bayesian Model Averaging

Assume you are doing some prediction of, say, $Y \sim g(y\mid\theta)$. Then, in light of $x$, we have
$$g(y\mid x) = \int g(y\mid\theta)\,\pi(\theta\mid x)\,d\theta = \sum_k \int_{\Theta_k} g_k(y\mid\theta_k)\,\pi(k,\theta_k\mid x)\,d\theta_k = \sum_k \underbrace{\pi(k\mid x)}_{\text{posterior proba. of model } k}\ \underbrace{\int_{\Theta_k} g_k(y\mid\theta_k)\,\pi(\theta_k\mid x,k)\,d\theta_k}_{\text{prediction from model } k}.$$

This is called Bayesian model averaging. All the models are taken into account to perform the prediction.

3.5 Bayesian Model Averaging

An alternative way to make predictions consists of selecting the "best" model, say the model $k_{\mathrm{best}}$ which has the highest posterior probability. The prediction is then performed according to
$$\int_{\Theta_{k_{\mathrm{best}}}} g_{k_{\mathrm{best}}}(y\mid\theta_{k_{\mathrm{best}}})\,\pi(\theta_{k_{\mathrm{best}}}\mid x,k_{\mathrm{best}})\,d\theta_{k_{\mathrm{best}}}.$$
This is computationally much simpler and cheaper. It can also be very misleading.
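Here is a minimal sketch contrasting the two strategies just described. The per-model predictive densities $g_k(y\mid x)$ are hypothetical placeholders, and the posterior model probabilities from the example of Section 3.6 are used purely as illustrative weights.

```python
import numpy as np

def normal_pdf(y, m, s2):
    return np.exp(-(y - m) ** 2 / (2.0 * s2)) / np.sqrt(2.0 * np.pi * s2)

# Hypothetical per-model predictive densities g_k(y | x) on a grid (illustrative).
y_grid = np.linspace(-6.0, 6.0, 601)
g_per_model = [normal_pdf(y_grid, 0.0, 1.0),
               normal_pdf(y_grid, 1.5, 0.5),
               normal_pdf(y_grid, 2.0, 2.0)]
post_model = np.array([0.02, 0.12, 0.86])        # pi(k | x), summing to 1

# Bayesian model averaging: g(y | x) = sum_k pi(k | x) * g_k(y | x).
g_bma = sum(w * g for w, g in zip(post_model, g_per_model))

# "Best model" alternative: keep only the highest-probability model's prediction.
g_best = g_per_model[int(np.argmax(post_model))]

dy = y_grid[1] - y_grid[0]
print("BMA predictive mean:       ", np.sum(y_grid * g_bma) * dy)
print("Best-model predictive mean:", np.sum(y_grid * g_best) * dy)
```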

3.6 Example

Consider the previous example: data simulated from a sum of three sinusoids with a very large additive noise.

Priors were selected for the three models: inverse-Gamma for $\sigma^2$ (white noise), normal/inverse-Gamma for the AR model, and normal/inverse-Gamma plus uniform (for the frequencies) for the sinusoids. We set $\pi(H_1) = \pi(H_2) = \pi(H_3) = \tfrac{1}{3}$.

We obtain $\pi(H_1\mid x) = 0.02$, $\pi(H_2\mid x) = 0.12$ and $\pi(H_3\mid x) = 0.86$.

If we start using very vague priors... $\pi(H_1\mid x) \approx 1$.

3.7 Bayesian Variable Selection Example

Consider the standard linear regression problem
$$Y = \sum_{i=1}^{p} \beta_i X_i + \sigma V, \quad V \sim \mathcal{N}(0,1).$$
Often you might have too many predictors, so this model will be inefficient. A standard Bayesian treatment of this problem consists of selecting only a subset of explanatory variables.

This is nothing but a model selection problem with $2^p$ possible models.

3.8 Bayesian Variable Selection Example

A standard way to write the model is
$$Y = \sum_{i=1}^{p} \gamma_i \beta_i X_i + \sigma V, \quad V \sim \mathcal{N}(0,1),$$
where $\gamma_i = 1$ if $X_i$ is included and $\gamma_i = 0$ otherwise. However, this suggests that $\beta_i$ is defined even when $\gamma_i = 0$.

A neater way to write such models is
$$Y = \sum_{\{i:\,\gamma_i=1\}} \beta_i X_i + \sigma V = \beta_{\gamma}^{\mathsf{T}} X_{\gamma} + \sigma V$$
where, for a vector $\gamma = (\gamma_1,\ldots,\gamma_p)$, $\beta_{\gamma} = \{\beta_i : \gamma_i = 1\}$, $X_{\gamma} = \{X_i : \gamma_i = 1\}$ and $n_{\gamma} = \sum_{i=1}^{p}\gamma_i$.

Prior distributions:
$$\pi_{\gamma}(\beta_{\gamma},\sigma^2) = \mathcal{N}\!\left(\beta_{\gamma};0,\delta^2\sigma^2 I_{n_{\gamma}}\right)\mathcal{IG}\!\left(\sigma^2;\tfrac{\nu_0}{2},\tfrac{\gamma_0}{2}\right) \quad\text{and}\quad \pi(\gamma) = \prod_{i=1}^{p}\pi(\gamma_i) = 2^{-p}.$$

3.8 Bayesian Variable Selection Example

An alternative way to think of it is to write
$$Y = \beta^{\mathsf{T}} X + \sigma V \quad\text{with}\quad \pi(\beta_1,\ldots,\beta_p) = \prod_{i=1}^{p}\pi(\beta_i),$$
but where the prior follows
$$\beta_i\mid\sigma^2 \sim \tfrac{1}{2}\,\delta_0 + \tfrac{1}{2}\,\mathcal{N}(0,\delta^2\sigma^2).$$
The regression coefficients follow a mixture model with a degenerate component.

3.9 Bayesian Variable Selection Example

For a fixed model $\gamma$ and $n$ observations $D = \{x_i,y_i\}_{i=1}^{n}$, we can determine the marginal likelihood and the posterior analytically:
$$\pi(D\mid\gamma) \propto \Gamma\!\left(\tfrac{\nu_0+n}{2}\right)\delta^{-n_{\gamma}}\,|\Sigma_{\gamma}|^{1/2}\left(\gamma_0 + \sum_{i=1}^{n} y_i^2 - \mu_{\gamma}^{\mathsf{T}}\Sigma_{\gamma}^{-1}\mu_{\gamma}\right)^{-\frac{\nu_0+n}{2}}$$
and
$$\pi_{\gamma}(\beta_{\gamma},\sigma^2\mid D) = \mathcal{N}\!\left(\beta_{\gamma};\mu_{\gamma},\sigma^2\Sigma_{\gamma}\right)\mathcal{IG}\!\left(\sigma^2;\tfrac{\nu_0+n}{2},\tfrac{\gamma_0 + \sum_{i=1}^{n} y_i^2 - \mu_{\gamma}^{\mathsf{T}}\Sigma_{\gamma}^{-1}\mu_{\gamma}}{2}\right)$$
where
$$\mu_{\gamma} = \Sigma_{\gamma}\left(\sum_{i=1}^{n} y_i x_{\gamma,i}\right), \qquad \Sigma_{\gamma}^{-1} = \delta^{-2} I_{n_{\gamma}} + \sum_{i=1}^{n} x_{\gamma,i} x_{\gamma,i}^{\mathsf{T}}.$$
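When $p$ is small enough to enumerate, the formula for $\pi(D\mid\gamma)$ can be evaluated for every subset. Below is a minimal sketch (illustrative data and hyperparameters, not the lecture's) that computes the log marginal likelihood for all $2^p$ models and ranks them by posterior probability; for large $p$ exhaustive enumeration is infeasible and numerical methods are needed.

```python
import numpy as np
from itertools import product
from scipy.special import gammaln

def log_evidence_gamma(X, y, gamma, delta2=10.0, nu0=1.0, gamma0=1.0):
    """log pi(D | gamma) up to a gamma-independent constant (conjugate prior as above)."""
    n = len(y)
    idx = np.flatnonzero(gamma)
    if idx.size == 0:                                   # empty model: no regressors at all
        return gammaln(0.5 * (nu0 + n)) - 0.5 * (nu0 + n) * np.log(gamma0 + y @ y)
    Xg = X[:, idx]
    Sigma_inv = np.eye(idx.size) / delta2 + Xg.T @ Xg   # Sigma_gamma^{-1}
    mu = np.linalg.solve(Sigma_inv, Xg.T @ y)           # mu_gamma
    quad = gamma0 + y @ y - mu @ (Sigma_inv @ mu)
    _, logdet_Sigma_inv = np.linalg.slogdet(Sigma_inv)
    return (gammaln(0.5 * (nu0 + n)) - 0.5 * idx.size * np.log(delta2)
            - 0.5 * logdet_Sigma_inv - 0.5 * (nu0 + n) * np.log(quad))

# Illustrative data: only the first two of p = 5 predictors matter.
rng = np.random.default_rng(1)
n, p = 100, 5
X = rng.standard_normal((n, p))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.5 * rng.standard_normal(n)

models = [np.array(g) for g in product([0, 1], repeat=p)]       # all 2^p models
log_ev = np.array([log_evidence_gamma(X, y, g) for g in models])
post = np.exp(log_ev - log_ev.max())
post /= post.sum()                                              # pi(gamma | D), uniform pi(gamma) = 2^{-p}
for g, pr in sorted(zip(models, post), key=lambda t: -t[1])[:3]:
    print(g, round(float(pr), 3))
```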

3.10 Conclusion

Bayesian model selection is a simple and principled way to do model selection. It appears in numerous applications.

Vague/improper priors have to be banned in the model selection context!!!

Bayesian model selection only allows us to compare models. It does not tell you whether any of the candidate models makes sense.

Except for simple problems, it is impossible to perform the calculations in closed form.