Bayesian Model Averaging


Bayesian Model Averaging
Hoff Chapter 9; Hoeting et al. 1999; Clyde & George 2004; Liang et al. 2008
October 24, 2017

Bayesian Model Choice

Models for the variable selection problem are based on a subset of the $X_1, \ldots, X_p$ variables. Encode models with a vector $\gamma = (\gamma_1, \ldots, \gamma_p)$, where $\gamma_j \in \{0, 1\}$ is an indicator for whether variable $X_j$ should be included in the model $M_\gamma$; $\gamma_j = 0 \Leftrightarrow \beta_j = 0$. Each value of $\gamma$ represents one of the $2^p$ models.

Under model $M_\gamma$:
$$Y \mid \alpha, \beta, \sigma^2, \gamma \sim N(1\alpha + X_\gamma \beta_\gamma, \sigma^2 I)$$
where $X_\gamma$ is the design matrix using the columns of $X$ where $\gamma_j = 1$ and $\beta_\gamma$ is the subset of $\beta$ that are non-zero.
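As an aside, a minimal base-R sketch (with hypothetical variable names X1, X2, X3) of how the $2^p$ models correspond to the indicator vectors $\gamma$:

p = 3
gamma = expand.grid(rep(list(0:1), p))    # each row is one indicator vector gamma
colnames(gamma) = paste0("X", seq_len(p))
nrow(gamma)                               # 2^p = 8 candidate models
gamma[5, ]                                # one model M_gamma: include the variables with gamma_j = 1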

Posterior Probabilities of Models

Posterior model probabilities:
$$p(M_j \mid Y) = \frac{p(Y \mid M_j)\, p(M_j)}{\sum_j p(Y \mid M_j)\, p(M_j)}$$

The marginal likelihood of a model is proportional to
$$p(Y \mid M_\gamma) = \int p(Y \mid \alpha, \beta_\gamma, \sigma^2)\, p(\beta_\gamma \mid \gamma, \sigma^2)\, p(\alpha, \sigma^2 \mid \gamma)\, d\beta_\gamma\, d\alpha\, d\sigma^2$$

Bayes factor:
$$BF[i : j] = p(Y \mid M_i) / p(Y \mid M_j)$$
$$\frac{P(M_i \mid Y)}{P(M_j \mid Y)} = \frac{p(Y \mid M_i)}{p(Y \mid M_j)} \times \frac{P(M_i)}{P(M_j)}$$
Posterior odds = Bayes factor $\times$ prior odds

Probability $\beta_j \neq 0$: $\sum_{M_j : \beta_j \neq 0} p(M_j \mid Y)$ (marginal posterior inclusion probability)
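A small numerical sketch (the marginal likelihoods below are made-up values, purely for illustration) of turning marginal likelihoods and prior probabilities into posterior model probabilities and Bayes factors:

marg.lik = c(M1 = 2.1e-8, M2 = 5.4e-8, M3 = 0.9e-8)   # hypothetical p(Y | M_j)
prior.prob = rep(1/3, 3)                               # uniform prior p(M_j)
post.prob = marg.lik * prior.prob / sum(marg.lik * prior.prob)
post.prob                                              # posterior p(M_j | Y)
BF12 = marg.lik[["M1"]] / marg.lik[["M2"]]             # Bayes factor BF[1:2]
BF12 * (prior.prob[1] / prior.prob[2])                 # posterior odds = BF x prior odds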

Prior Distributions

Bayesian model choice requires proper prior distributions on the regression coefficients (the exception is parameters that are included in all models); otherwise marginal likelihoods are ill-determined. Vague but proper priors may lead to paradoxes! Conjugate Normal-Gamma priors lead to closed-form expressions for marginal likelihoods; Zellner's g-prior is one of the most popular.

Zellner's g-prior within Models

Centered model: $Y = 1_n \alpha + X^c_\gamma \beta_\gamma + \epsilon$

Common parameters: $p(\alpha, \phi) \propto \phi^{-1}$

Model-specific parameters: $\beta_\gamma \mid \alpha, \phi, \gamma \sim N\!\left(0,\, g \phi^{-1} (X^{cT}_\gamma X^c_\gamma)^{-1}\right)$

The marginal likelihood of $M_\gamma$ is proportional to
$$p(Y \mid M_\gamma) = C\, (1 + g)^{(n - p_\gamma - 1)/2}\, \left(1 + g(1 - R^2_\gamma)\right)^{-(n-1)/2}$$
where $R^2_\gamma$ is the usual $R^2$ for model $M_\gamma$ and $C$ is a constant that is $p(Y \mid M_0)$ (the model with intercept alone).

Uniform distribution over the space of models: $p(M_\gamma) = 1/2^p$
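A sketch of this Bayes factor against the null model as an R function (the inputs in the example call are hypothetical, except that g = n = 41 matches the USair example below):

bf.gprior = function(R2, n, p.gamma, g) {
  # BF[gamma : 0] under Zellner's g-prior, using only n, p_gamma, R^2_gamma and g
  (1 + g)^((n - p.gamma - 1) / 2) * (1 + g * (1 - R2))^(-(n - 1) / 2)
}
bf.gprior(R2 = 0.5, n = 41, p.gamma = 3, g = 41)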

USair Data: Enumeration of All Models

# library(devtools)
# suppressMessages(install_github("merliseclyde/BAS"))
library(BAS)
poll.bma = bas.lm(log(so2) ~ temp + log(firms) + log(popn) +
                    wind + precip + rain,
                  data = usair,
                  prior = "g-prior", alpha = 41,   # g = n
                  n.models = 2^7,                  # enumerate (can omit)
                  modelprior = uniform(),
                  method = "deterministic")        # fast enumeration
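As a possible follow-up (a sketch assuming the standard BAS accessor functions), the enumeration can be inspected with:

summary(poll.bma)    # top models with posterior probabilities and inclusion probabilities
image(poll.bma)      # image of which predictors appear in the highest-probability models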

Problem with the g-prior with arbitrary g

The Bayes factor for comparing $M_\gamma$ to the null model: let $g$ be a fixed constant and take $n$ fixed. Let
$$F = \frac{R^2_\gamma / p_\gamma}{(1 - R^2_\gamma)/(n - 1 - p_\gamma)}$$
where $F$ is the usual $F$ statistic for comparing model $M_\gamma$ to $M_0$.

As $R^2_\gamma \to 1$, $F \to \infty$ and the LR test would reject $H_0$.

The Bayes factor would go to $(1 + g)^{(n - p_\gamma - 1)/2}$ as $F \to \infty$ (bounded for fixed $g$, $n$ and $p_\gamma$).

Bayes and frequentist would not agree in this limit: the information paradox.
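A numerical illustration of the paradox (hypothetical values of $R^2_\gamma$, with $n$, $p_\gamma$ as in the USair example and $g$ fixed at $n$): the Bayes factor increases with $R^2_\gamma$ but never exceeds the bound $(1+g)^{(n-p_\gamma-1)/2}$, while the $F$ statistic diverges.

n = 41; p.gamma = 3; g = 41
R2 = c(0.9, 0.99, 0.999, 0.9999)
BF = (1 + g)^((n - p.gamma - 1) / 2) * (1 + g * (1 - R2))^(-(n - 1) / 2)
Fstat = (R2 / p.gamma) / ((1 - R2) / (n - 1 - p.gamma))
cbind(R2, Fstat, BF, bound = (1 + g)^((n - p.gamma - 1) / 2))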

Resolution of the Paradox

Liang et al (2008) show that the paradox can be resolved with mixtures of g-priors:
$$p(\beta_\gamma \mid \phi) = \int_0^\infty N\!\left(\beta_\gamma;\, 0,\, g (X_\gamma^T X_\gamma)^{-1}/\phi\right) p(g)\, dg$$

$BF \to \infty$ as $R^2_\gamma \to 1$ requires that $E_g[(1 + g)^{(n - 1 - p_\gamma)/2}]$ diverges.

Priors on g that resolve the paradox:
- Zellner-Siow Cauchy prior: $1/g \sim \text{Gamma}(1/2, n/2)$
- Hyper-g: $p(g) \propto (1 + g)^{-a/2}$ for $2 < a \le 3$
- Hyper-g/n: $\frac{g/n}{1 + g/n} \sim \text{Beta}(1, a/2 - 1)$
- Robust prior (Bayarri et al, Annals of Statistics, 2012)

Example

library(BAS)
poll.zs = bas.lm(log(so2) ~ temp + log(firms) + log(popn) +
                   wind + precip + rain,
                 data = usair,
                 prior = "JZS",                   # Jeffreys-Zellner-Siow
                 n.models = 2^7,                  # enumerate (can omit)
                 modelprior = uniform(),
                 method = "deterministic")        # fast enumeration

Use prior = hyper-g with a = 3 for the hyper-g prior, or prior = hyper-g/n with a = 3 for hyper-g/n.
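A sketch of the hyper-g variant mentioned in the note above, assuming the current bas.lm interface, in which the hyperparameter a is passed as alpha and the hyper-g/n prior is spelled "hyper-g-n" (check ?bas.lm for the exact prior names):

poll.hg = bas.lm(log(so2) ~ temp + log(firms) + log(popn) +
                   wind + precip + rain,
                 data = usair,
                 prior = "hyper-g", alpha = 3,    # hyper-g prior with a = 3
                 modelprior = uniform(),
                 method = "deterministic")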

plot(poll.zs, which=4)

[Figure: marginal posterior inclusion probabilities (0 to 1) for the Intercept, temp, log(firms), log(popn), wind, precip and rain under the poll.zs fit.]

Bayesian Model Averaging

The posterior for $\mu = 1\alpha + X\beta$ is a mixture distribution
$$p(\mu \mid Y) = \sum_\gamma p(\mu \mid Y, M_\gamma)\, p(M_\gamma \mid Y)$$
with expectation expressed as a weighted average
$$E[\mu \mid Y] = 1\hat{\alpha} + X \sum_\gamma E[\beta \mid Y, M_\gamma]\, p(M_\gamma \mid Y)$$

Predictive distribution for $Y$:
$$p(y \mid Y) = \sum_\gamma p(y \mid Y, M_\gamma)\, p(M_\gamma \mid Y)$$

Posterior distribution of $\beta_j$:
$$p(\beta_j \mid Y) = p(\gamma_j = 0 \mid Y)\, \delta_0(\beta_j) + \sum_\gamma p(\beta_j \mid Y, M_\gamma)\, \gamma_j\, p(M_\gamma \mid Y)$$
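In BAS these model-averaged quantities can be obtained directly from the poll.zs fit on the earlier slide (a sketch; predict with estimator = "BMA" returns the model-averaged fit, and coef gives the mixture posterior summaries of each $\beta_j$):

fit.bma = predict(poll.zs, estimator = "BMA")   # E[mu | Y] averaged over models
beta.bma = coef(poll.zs)                        # posterior of beta_j: point mass at 0 plus continuous part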

Estimator

Find $\hat{\mu}$ that minimizes the posterior expected loss
$$E[(\mu - \hat{\mu})^T(\mu - \hat{\mu}) \mid Y]$$
The solution is the posterior mean under BMA:
$$E[\mu \mid Y] = 1\hat{\alpha} + X \sum_\gamma E[\beta \mid Y, M_\gamma]\, p(M_\gamma \mid Y)$$
If one model has probability 1, then BMA is equivalent to using the highest posterior probability model; otherwise it incorporates estimates from other models when there is substantial uncertainty.

Coefficients under BMA

beta.zs = coef(poll.zs)
beta.zs

##
##  Marginal Posterior Summaries of Coefficients:
##
##  Using BMA
##
##  Based on the top 64 models
##               post mean   post SD    post p(B != 0)
##  Intercept     3.153004   0.082496   1.000000
##  temp         -0.054355   0.019464   0.974978
##  log(firms)    0.188818   0.167961   0.742681
##  log(popn)    -0.031011   0.163396   0.330113
##  wind         -0.118926   0.082304   0.794158
##  precip        0.010030   0.010754   0.620004
##  rain          0.001675   0.003843   0.349521

Posterior of Coefficients under BMA

[Figure: marginal posterior densities of the coefficients for temp, log(firms), log(popn), wind, precip and rain under BMA.]

Credible Intervals for Coefficients under BMA

plot(confint(beta.zs, parm=2:7))

[Figure: credible intervals for the coefficients beta under BMA.]

Selection and Model Uncertainty

Select a model and $\hat{\mu}$ that minimize the posterior expected loss
$$E[(\mu - \hat{\mu})^T(\mu - \hat{\mu}) \mid Y]$$
BMA is the best estimator without selection. With selection, the best model and estimator is the posterior mean under the model that is closest to BMA under squared error loss:
$$(\hat{\mu}_{BMA} - \hat{\mu}_{M_\gamma})^T(\hat{\mu}_{BMA} - \hat{\mu}_{M_\gamma})$$
This best predictive model often contains more predictors than the HPM (highest probability model) or the Median Probability Model.

Best Predictive Model

# BPM
BPM = predict(poll.zs, estimator = "BPM")
BPM$bestmodel

## [1] 0 1 2 4 5 6

(poll.zs$namesx[attr(BPM$fit, 'model') + 1])[-1]

## [1] "temp"       "log(firms)" "wind"       "precip"

# HPM
HPM = predict(poll.zs, estimator = "HPM")
HPM$bestmodel

## [1] 0 1 2 4 5

[Figure: pairwise scatterplots of the fitted values under the HPM, MPM, BPM and BMA estimators; the fits are nearly identical, with pairwise correlations between about 0.991 and 1.]

Summary

- BMA has been shown in practice to have better out-of-sample predictions than selection (in many cases).
- It avoids selecting a single model and accounts for our uncertainty about the model.
- If one model dominates, BMA is very close to selection (asymptotically it will put probability one on the model that is closest to the true model).
- MCMC allows one to implement BMA without enumerating all models (see the sketch below).
- BMA depends on the prior on coefficients, variance and models (sensitivity to choice?).
- Mixtures of g-priors are preferred to the usual g-prior, but one can use g = n.
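When p is too large to enumerate all 2^p models, the same fit can be run by sampling models instead. A sketch, assuming bas.lm accepts method = "MCMC" with an MCMC.iterations argument and that the diagnostics helper is available (check ?bas.lm):

poll.mcmc = bas.lm(log(so2) ~ temp + log(firms) + log(popn) +
                     wind + precip + rain,
                   data = usair,
                   prior = "JZS",
                   modelprior = uniform(),
                   method = "MCMC", MCMC.iterations = 10000)
diagnostics(poll.mcmc)   # (if available) compare MCMC and renormalized estimates of probabilities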