Model Choice. Hoff Chapter 9. Dec 8, 2010


2 Topics
- Variable selection / model choice
- Stepwise methods
- Model selection criteria
- Model averaging

3 Variable Selection
Reasons for reducing the number of variables in the model:
- Philosophical: avoid redundant variables (they complicate interpretation); KISS; Occam's razor
- Practical: including unnecessary terms yields less precise estimates, particularly when the explanatory variables are highly correlated with each other
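A small simulation can illustrate the practical point about precision. This is a sketch on hypothetical simulated data: `y` depends only on `x1`, and `x2` is an unnecessary predictor constructed to be nearly collinear with `x1`. Adding `x2` should inflate the standard error of the `x1` coefficient.

```r
# Hypothetical simulated data: y depends on x1 only; x2 is redundant and
# nearly collinear with x1 (x2 = x1 + small noise).
set.seed(1)
n  <- 100
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)
y  <- 1 + 2 * x1 + rnorm(n)

fit_small <- lm(y ~ x1)        # correct model
fit_big   <- lm(y ~ x1 + x2)   # adds the unnecessary, correlated term

# Standard error of the x1 coefficient under each model
se_small <- summary(fit_small)$coefficients["x1", "Std. Error"]
se_big   <- summary(fit_big)$coefficients["x1", "Std. Error"]
round(c(small = se_small, big = se_big), 3)
```

With this setup the variance inflation factor for `x1` in the larger model is roughly 100, so its standard error grows by roughly a factor of 10 even though the estimate itself remains unbiased.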

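4 Variable Selection Procedures
- Stepwise regression (forward, backward, or stepwise): add/delete variables until all t-statistics are significant (easy, but not recommended)
- Use a model selection criterion to pick the best model:
  - $R^2$ (always picks the largest model)
  - Adjusted $R^2$
  - Mallows' $C_p$: $C_p = \mathrm{SSE}_m/\hat\sigma^2_{\mathrm{Full}} + 2p_m - n$
  - AIC (Akaike Information Criterion): for linear models with $\sigma^2$ known, AIC and $C_p$ differ only by a constant, so they rank models identically
  - BIC (Bayes Information Criterion): $\mathrm{BIC}(m) = n\log\hat\sigma^2_m + \log(n)\,p_m$, where $\hat\sigma^2_m = \mathrm{SSE}_m/n$
- These criteria trade off model complexity (the number of coefficients $p_m$) against goodness of fit ($\hat\sigma^2_m$)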
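The criteria above can be computed for every subset of a small simulated problem with base R. The data, the variable names `x1` to `x4`, and the choice of four candidate predictors are assumptions for illustration. BIC is written as $n\log(\mathrm{SSE}_m/n) + \log(n)\,p_m$, which differs from R's `BIC()` only by a constant that is the same for every model.

```r
# Hypothetical simulated data: 4 candidate predictors, only x1 and x2 matter
set.seed(42)
n <- 50
X <- matrix(rnorm(n * 4), n, 4, dimnames = list(NULL, paste0("x", 1:4)))
y <- 1 + 2 * X[, "x1"] - 1.5 * X[, "x2"] + rnorm(n)
dat <- data.frame(y, X)

full <- lm(y ~ ., data = dat)
sigma2_full <- summary(full)$sigma^2   # SSE_full / (n - p_full)

# Every non-empty subset of the predictors (15 models)
subsets <- unlist(lapply(1:4, function(k) combn(colnames(X), k, simplify = FALSE)),
                  recursive = FALSE)

crit <- do.call(rbind, lapply(subsets, function(vars) {
  fit <- lm(reformulate(vars, response = "y"), data = dat)
  sse <- sum(residuals(fit)^2)
  p   <- length(vars) + 1              # number of coefficients incl. intercept
  data.frame(model = paste(vars, collapse = "+"),
             R2    = summary(fit)$r.squared,
             adjR2 = summary(fit)$adj.r.squared,
             Cp    = sse / sigma2_full + 2 * p - n,
             AIC   = AIC(fit),
             BIC   = n * log(sse / n) + log(n) * p)
}))

crit[order(crit$BIC), ][1:5, ]         # five best models by BIC
```

By construction, `R2` is largest for the full model, and the full model's `Cp` equals its number of coefficients (5), which is why $R^2$ alone cannot be used to choose among nested models.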

5 Model Selection
Selecting a single model has the following problems:
- When the criteria suggest that several models are equally good, what should we report? Should we still pick only one model?
- What uncertainty do we report after selecting a model? Typical analyses ignore model uncertainty!

6 Bayesian Model Choice
Models for the variable selection problem are based on subsets of the variables $X_1, \ldots, X_p$.
Encode models with a vector $\gamma = (\gamma_1, \ldots, \gamma_p)$, where $\gamma_j \in \{0, 1\}$ indicates whether variable $X_j$ is included in model $M_\gamma$; $\gamma_j = 0 \iff \beta_j = 0$.
Each value of $\gamma$ represents one of the $2^p$ models. Under model $M_\gamma$:
$Y \mid \beta, \sigma^2, \gamma \sim N(X_\gamma \beta_\gamma, \sigma^2 I)$
where $X_\gamma$ is the design matrix using the columns of $X$ with $\gamma_j = 1$, and $\beta_\gamma$ is the subset of $\beta$ that is non-zero.
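The encoding can be made concrete by enumerating every $\gamma$ for a small hypothetical design matrix with $p = 3$ and extracting $X_\gamma$. The helper name `X_gamma` and the random design are illustrative assumptions.

```r
# Hypothetical design matrix with p = 3 predictors
set.seed(3)
n <- 10; p <- 3
X <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("X", 1:p)))

# Each row of `gammas` is one inclusion vector gamma = (gamma_1, ..., gamma_p)
gammas <- as.matrix(expand.grid(rep(list(0:1), p)))
colnames(gammas) <- colnames(X)
nrow(gammas)                 # 2^p = 8 models

# X_gamma keeps only the columns with gamma_j = 1
X_gamma <- function(gamma) X[, gamma == 1, drop = FALSE]
dim(X_gamma(c(1, 0, 1)))     # 10 rows, 2 columns (X1 and X3)
```

The all-zero row of `gammas` is the intercept-only model, for which `X_gamma` returns a matrix with no columns.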

7 Bayesian Model Averaging
Rather than use a single model, BMA uses all (or potentially very many) models, but weights model predictions by their posterior probabilities (a measure of how much each model is supported by the data).
- Posterior model probabilities: $p(M_k \mid Y) = \dfrac{p(Y \mid M_k)\, p(M_k)}{\sum_l p(Y \mid M_l)\, p(M_l)}$
- The marginal likelihood of a model is $p(Y \mid M_\gamma) = \iint p(Y \mid \beta_\gamma, \sigma^2)\, p(\beta_\gamma \mid \gamma, \sigma^2)\, p(\sigma^2 \mid \gamma)\, d\beta_\gamma\, d\sigma^2$
- Probability that $\beta_j \neq 0$: $\sum_{k:\, \beta_j \neq 0 \text{ in } M_k} p(M_k \mid Y)$
- Predictions: $\hat{Y} = \sum_k p(M_k \mid Y)\, \hat{Y}_{M_k}$
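A minimal sketch of the BMA bookkeeping, assuming the log marginal likelihoods and per-model predictions are already available (the numbers below are made up for illustration). Posterior probabilities are normalized on the log scale to avoid underflow; inclusion probabilities and the BMA prediction are then probability-weighted sums over models.

```r
# Posterior model probabilities from log marginal likelihoods (log-sum-exp trick)
post_model_probs <- function(log_marg, log_prior = rep(0, length(log_marg))) {
  a <- log_marg + log_prior
  w <- exp(a - max(a))       # subtract the max so exp() does not underflow
  w / sum(w)
}

# Hypothetical example: all 4 models over p = 2 predictors
gammas   <- rbind(c(0, 0), c(1, 0), c(0, 1), c(1, 1))
log_marg <- c(-120, -100, -118, -101)     # made-up values of log p(Y | M_k)
probs    <- post_model_probs(log_marg)    # uniform model prior (log_prior = 0)

# P(beta_j != 0 | Y): sum of probabilities of the models that include X_j
incl <- colSums(gammas * probs)

# BMA prediction at 2 new points: columns are the models' predictions (made up)
yhat_models <- cbind(c(1.0, 2.0), c(1.5, 2.5), c(1.1, 2.1), c(1.6, 2.4))
yhat_bma    <- drop(yhat_models %*% probs)
```

Here the second and fourth models carry almost all of the posterior mass, so the first predictor has inclusion probability close to 1 and the second about 0.27.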

8 Prior Distributions
- Bayesian model choice requires proper prior distributions on the regression coefficients.
- Vague but proper priors may lead to paradoxes!
- Conjugate Normal-Gamma priors lead to closed-form expressions for marginal likelihoods; Zellner's g-prior is the most popular.
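To see how a vague but proper prior can mislead, consider a hypothetical toy test that is simpler than regression: $y_i \sim N(\mu, 1)$ with $H_0: \mu = 0$ versus $H_1: \mu \sim N(0, \tau^2)$. The Bayes factor can be computed from the sample mean, and as $\tau^2$ grows it eventually favors $H_0$ even when the data clearly disagree with $\mu = 0$ (the behavior often referred to as Bartlett's paradox). The function name `bf10` and the numbers are illustrative.

```r
# Toy setting (hypothetical): y_i ~ N(mu, 1), known variance.
# The sample mean is sufficient: ybar ~ N(0, 1/n) under H0 and,
# integrating over mu ~ N(0, tau2), ybar ~ N(0, tau2 + 1/n) under H1.
bf10 <- function(ybar, n, tau2) {
  dnorm(ybar, 0, sqrt(tau2 + 1 / n)) / dnorm(ybar, 0, sqrt(1 / n))
}

n    <- 50
ybar <- 0.5            # z = ybar * sqrt(n), about 3.5: strong evidence against mu = 0
tau2 <- 10^(0:8)       # increasingly vague (but still proper) priors under H1
round(bf10(ybar, n, tau2), 4)
```

The Bayes factor in favor of $H_1$ is large for $\tau^2 = 1$ but decreases steadily and drops below 1 once the prior is vague enough, so the conclusion is driven by an essentially arbitrary prior scale.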

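9 Zellner's g-prior
Centered model: $Y = 1_n \alpha + X_c \beta + \epsilon$, where $X_c$ is the centered design matrix (every variable has had its mean subtracted).
- $p(\alpha) \propto 1$
- $p(\sigma^2) \propto 1/\sigma^2$
- $\beta_\gamma \mid \sigma^2, \gamma \sim N\big(0,\ g\sigma^2 (X_{c\gamma}^T X_{c\gamma})^{-1}\big)$
Take $g = n$. The marginal likelihood of $M_\gamma$ is
$p(Y \mid M_\gamma) = C\,(1 + g)^{(n - p_\gamma - 1)/2}\,\big(1 + g(1 - R_\gamma^2)\big)^{-(n-1)/2}$
where $C$ does not depend on $\gamma$, $R_\gamma^2$ is the usual $R^2$ for model $M_\gamma$, and $p_\gamma$ is its number of predictors.
This is a trade-off of model complexity versus goodness of fit.
Lastly, assign a uniform distribution to the space of models.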
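The g-prior marginal likelihood depends on the data only through $R^2_\gamma$, so it is cheap to evaluate. This sketch drops the common constant $C$, uses $g = n$ by default, and applies a uniform prior over four candidate models on simulated, hypothetical data; the function name `log_marg_gprior` is illustrative.

```r
# log p(Y | M_gamma) - log C under Zellner's g-prior (vectorized over models)
log_marg_gprior <- function(R2, p_gamma, n, g = n) {
  0.5 * (n - p_gamma - 1) * log(1 + g) - 0.5 * (n - 1) * log(1 + g * (1 - R2))
}

# Hypothetical data: x1 has an effect, x2 is pure noise
set.seed(7)
n  <- 40
x1 <- rnorm(n); x2 <- rnorm(n)
y  <- 1 + 0.8 * x1 + rnorm(n)

R2 <- c(null = 0,                                  # intercept-only model
        x1   = summary(lm(y ~ x1))$r.squared,
        x2   = summary(lm(y ~ x2))$r.squared,
        both = summary(lm(y ~ x1 + x2))$r.squared)
p_gamma  <- c(null = 0, x1 = 1, x2 = 1, both = 2)
log_marg <- log_marg_gprior(R2, p_gamma, n)

# Uniform prior over the 4 models: posterior probabilities are normalized marginals
post <- exp(log_marg - max(log_marg))
post <- post / sum(post)
round(post, 3)
```

The first term penalizes each extra predictor by $\tfrac12\log(1+g)$, while the second rewards a higher $R^2_\gamma$; the null model has log marginal exactly 0 on this scale.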

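10 USair Data
    library(BAS)
    # `pollution` is the USair data frame (n = 41 cities); g-prior with g = alpha = n, uniform prior over models
    poll.bma = bas.lm(log(so2) ~ temp + log(firms) + log(popn) + wind + precip + rain,
                      data=pollution, prior="g-prior", alpha=41, modelprior=uniform(),
                      n.models=2^7, update=50, initprobs="uniform")
    par(mfrow=c(2,2))   # four diagnostic plots in a 2 x 2 grid
    plot(poll.bma, ask=FALSE)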

11 Plots
[Figure: diagnostic plots from plot(poll.bma). Panels: Residuals vs Fitted (residuals against predictions under BMA); Model Probabilities (cumulative probability against model search order); Model Complexity (log(marginal) against model dimension); Inclusion Probabilities (marginal inclusion probability for Intercept, temp, log(firms), log(popn), wind, precip, rain).]

12 Model Space
    image(poll.bma)
[Figure: inclusion of Intercept, temp, log(firms), log(popn), wind, precip, rain in the top models, ordered by model rank and shaded by log posterior odds.]

13 Coefficients
    beta = coef(poll.bma)
    par(mfrow=c(2,3))
    plot(beta, subset=2:7, ask=FALSE)
[Figure: posterior distributions of the coefficients of temp, log(firms), log(popn), wind, precip, rain.]

14 Mortality & Pollution
- City-level data from the Statistical Sleuth
- Response: mortality
- Measures of HC, NOX, SO2
- Is pollution associated with mortality after adjusting for other socio-economic and meteorological factors?
- 15 predictor variables imply $2^{15} = 32{,}768$ possible models

15 Plots
[Figure: diagnostic plots from plot(mort.bma). Panels: Residuals vs Fitted (residuals against predictions under BMA); Model Probabilities (cumulative probability against model search order); Model Complexity (log(marginal) against model dimension); Inclusion Probabilities (marginal inclusion probability for Intercept, PRECIP, HUMIDITY, JANTEMP, JULYTEMP, OVER65, HOUSE, EDUC, SOUND, DENSITY, NONWHITE, WHITECOL, POOR, loghc, lognox, logso2).]

16 Model Space
[Figure: inclusion of Intercept, PRECIP, HUMIDITY, JANTEMP, JULYTEMP, OVER65, HOUSE, EDUC, SOUND, DENSITY, NONWHITE, WHITECOL, POOR, loghc, lognox, logso2 in the top models, ordered by model rank and shaded by log posterior odds.]

17 Coefficients
[Figure: posterior distributions of the coefficients of Intercept, PRECIP, HUMIDITY, JANTEMP, JULYTEMP, OVER65, HOUSE, EDUC.]

18 Coefficients
[Figure: posterior distributions of the coefficients of SOUND, DENSITY, NONWHITE, WHITECOL, POOR, loghc, lognox, logso2.]

19 Posterior Probabilities
What is the probability that there is no pollution effect? Sum the posterior model probabilities over all models that include none of the pollution variables:
    # one row per model, one 0/1 column per variable (column 1 = intercept)
    which.mat = list2matrix.which(mort.bma, 1:(2^15))
    # TRUE for models that exclude loghc, lognox and logso2 (columns 14:16)
    no.poll = rowSums(which.mat[, 14:16]) == 0
    sum(no.poll * mort.bma$postprobs)
- Posterior probability of no pollution effect: p = .011
- Posterior odds that there is an effect: (1 - p)/p = 88.6
- Prior odds that there is an effect: 7.9
- Bayes factor for a pollution effect: 88.6/7.9 = 11.2
- Bayes factor for NOX: .918/(1 - .918) = 11.2
- Bayes factors are not monotonic!
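The arithmetic on this slide is Bayes factor = posterior odds / prior odds. The sketch below reproduces the two quoted Bayes factors from the slide's numbers, and repeats the "sum over models without the pollution variables" step on a tiny, hypothetical inclusion matrix. For NOX it uses prior odds of 1 (a prior inclusion probability of 1/2, as under a uniform prior over models), which matches the slide's .918/(1 - .918) calculation. The helper names are illustrative.

```r
odds         <- function(prob) prob / (1 - prob)
bayes_factor <- function(post_odds, prior_odds) post_odds / prior_odds

# Any pollution effect: odds quoted on the slide
bf_pollution <- bayes_factor(88.6, 7.9)

# NOX alone: posterior inclusion probability .918, prior inclusion probability 1/2
bf_nox <- bayes_factor(odds(0.918), odds(0.5))

# Toy version of the which-matrix step (hypothetical values).
# Columns: Intercept, A, B; rows: the 4 models with their posterior probabilities
which_toy    <- rbind(c(1, 0, 0), c(1, 1, 0), c(1, 0, 1), c(1, 1, 1))
postprob_toy <- c(0.1, 0.4, 0.3, 0.2)
no_effect    <- rowSums(which_toy[, 2:3, drop = FALSE]) == 0
p_none_toy   <- sum(postprob_toy[no_effect])   # P(neither A nor B | Y)

round(c(pollution = bf_pollution, nox = bf_nox), 2)
```

Both Bayes factors come out near 11.2, even though "any pollution variable" is a larger hypothesis than "NOX included"; posterior probabilities respect that nesting, but Bayes factors need not.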

20 Problems
- If p is large, enumerating all $2^p$ models is difficult:
  - Gibbs sampler on $\gamma$: poor convergence/mixing when the predictors are highly correlated
  - Metropolis-Hastings algorithms: more flexibility
  - Stochastic search (no guarantee that the sampled models represent the posterior)
- In BMA all variables are included, but their coefficients are shrunk toward 0; an alternative is to use shrinkage methods
- Choice of prior distributions on $\beta$ and on $\gamma$
- Model averaging versus model selection: what are the objectives?
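When $2^p$ is too large to enumerate, a Metropolis-Hastings sampler on $\gamma$ is one option. This is a minimal sketch, not the BAS package's algorithm: each step proposes flipping one randomly chosen $\gamma_j$ (a symmetric move, often called MC3), the model prior is assumed uniform, and the target is the g-prior marginal likelihood with $g = n$ up to its common constant. The data and the function names `log_marg_gprior` and `mc3` are hypothetical.

```r
# g-prior log marginal likelihood of model gamma, up to a constant shared by all models
log_marg_gprior <- function(y, X, gamma, g = length(y)) {
  n <- length(y)
  p_gamma <- sum(gamma)
  R2 <- if (p_gamma == 0) 0 else
    summary(lm(y ~ X[, gamma == 1, drop = FALSE]))$r.squared
  0.5 * (n - p_gamma - 1) * log(1 + g) - 0.5 * (n - 1) * log(1 + g * (1 - R2))
}

# Metropolis-Hastings on gamma with single-coordinate flip proposals
mc3 <- function(y, X, n_iter = 1000, gamma0 = rep(0, ncol(X))) {
  p <- ncol(X)
  gamma <- gamma0
  cur <- log_marg_gprior(y, X, gamma)
  draws <- matrix(NA_real_, n_iter, p, dimnames = list(NULL, colnames(X)))
  for (iter in seq_len(n_iter)) {
    j <- sample.int(p, 1)             # pick one coordinate uniformly
    prop <- gamma
    prop[j] <- 1 - prop[j]            # propose adding or dropping X_j
    new <- log_marg_gprior(y, X, prop)
    # Symmetric proposal + uniform model prior: accept w.p. min(1, marginal ratio)
    if (log(runif(1)) < new - cur) {
      gamma <- prop
      cur <- new
    }
    draws[iter, ] <- gamma
  }
  draws
}

# Hypothetical data: only x1 matters
set.seed(11)
n <- 100
X <- matrix(rnorm(n * 5), n, 5, dimnames = list(NULL, paste0("x", 1:5)))
y <- 2 * X[, "x1"] + rnorm(n)

draws <- mc3(y, X, n_iter = 1000)
incl  <- colMeans(draws[-(1:100), ])  # drop burn-in; estimated inclusion probabilities
round(incl, 2)
```

Single-flip moves are simple and keep the proposal symmetric, but, as noted above, they can mix poorly when predictors are highly correlated, because swapping one correlated variable for another requires passing through a low-probability intermediate model.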
