Bayesian variable selection via penalized credible regions. Brian Reich, NCSU. Joint work with Howard Bondell and Ander Wilson.


1 Bayesian variable selection via penalized credible regions. Brian Reich, NC State. Joint work with Howard Bondell and Ander Wilson.

2 Motivation. Big p, small n problems are ubiquitous in modern applications. We propose a new approach that: is computationally efficient; has good theoretical properties; makes new connections between Bayesian and penalized regression methods; and performs well in high dimensions. In addition, we extend this approach to the challenging problem of selecting an appropriate set of confounders in a health-effects study.

3 Variable selection for linear regression. Linear regression model: $y_i \sim N(x_i^T\beta, \sigma^2)$, with $n$ observations and $p$ predictors. Only a subset of the predictors are relevant. There are many existing approaches for the $p \ll n$ case, including penalization and Bayesian methods. Fewer methods have been proposed for the ultra-high-dimensional case with $p \gg n$.

4 Penalization methods. Minimize $\sum_{i=1}^n (y_i - x_i^T\beta)^2 + \lambda J(\beta)$. LASSO: $J(\beta) = \sum_{j=1}^p |\beta_j|$. Adaptive LASSO: $J(\beta) = \sum_{j=1}^p |\beta_j| / |\hat\beta_j|^q$. Elastic Net, SCAD, OSCAR, ... $\lambda$ is chosen by AIC, BIC, CV, or GCV. Selection is achieved by setting some $\beta_j$ exactly to zero.
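As a concrete illustration of the penalized path and tuning step described above, here is a minimal R sketch using the lars package (the same package the talk later uses for the credible-region method). X and y are assumed to be a centered design matrix and response; all object names are illustrative.

```r
library(lars)

# Full LASSO solution path: coefficients enter and leave as lambda varies
fit <- lars(X, y, type = "lasso")

# Choose the penalty by 10-fold cross-validation over the path
cv <- cv.lars(X, y, K = 10, type = "lasso", plot.it = FALSE)
s  <- cv$index[which.min(cv$cv)]

# Coefficients at the CV-chosen point; some are exactly zero
beta <- coef(fit, s = s, mode = "fraction")
selected <- which(beta != 0)
```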

5 Bayesian variable selection. Candidate models are indexed by $\delta = (\delta_1, \dots, \delta_p)^T$, where $\delta_j = 1$ if $x_j$ is included in the model and $\delta_j = 0$ if $x_j$ is excluded. The most common prior over model space is $\delta_j \overset{iid}{\sim} \mathrm{Bern}(\pi)$. The regression coefficients included in the model typically have Gaussian priors.

6 Bayesian variable selection. Bayes' rule then gives the posterior of $\delta$. We can select the model with the highest posterior probability, or select variables with large marginal inclusion probability $P(\delta_j = 1 \mid y)$. When $p$ is large, computing the probability of every possible $\delta$ is impossible. Stochastic search variable selection (SSVS) can be used to estimate these probabilities by including the model indicator $\delta$ as an unknown parameter updated in MCMC.

7 Lindley's Paradox. Posterior model probabilities can be sensitive to priors. Consider $y \sim N(\beta_1, 1)$ to test $H_0: \beta_1 = 0$ vs. $H_1: \beta_1 \neq 0$. [Figure: (a) posterior probability in favor of the null and (b) 95% posterior credible set, each plotted against the prior standard deviation for $y = 0, 2, 3$.]

8 Other drawbacks. Bayesian variable selection using SSVS requires: a choice of prior on model space (inclusion probabilities); proper prior distributions for the model parameters (regression coefficients and variances); and a posterior threshold for the inclusion probabilities. MCMC convergence can be slow. Marginal inclusion probabilities may be poor under high correlation: highly correlated predictors can substitute for one another, so each appears in the sampled models only a small fraction of the time.

9 Posterior credible regions are less sensitive to the prior. [Same figure as slide 7: (a) posterior probability in favor of the null and (b) 95% posterior credible set versus the prior standard deviation for $y = 0, 2, 3$.]

10 Using credible regions for variable selection. Denote by $C_\alpha \subset \mathbb{R}^p$ the $(1-\alpha) \times 100\%$ credible region for $\beta$. We could consider any $\beta \in C_\alpha$ as a feasible estimate. With $p = 1$, we remove $x_1$ if $0 \in C_\alpha$. Generalizing to higher dimensions, we could select the element of $C_\alpha$ with the most zeros: the sparsest feasible value. The following examples plot $C_\alpha$ with $p = 2$ predictors.

11 Drop both $X_1$ and $X_2$. [Figure: elliptical credible region in the $(\beta_1, \beta_2)$ plane containing the origin.]

12 Keep $X_2$, drop $X_1$. [Figure: credible region intersecting the $\beta_1 = 0$ axis but not containing the origin.]

13 Keep both $X_1$ and $X_2$. [Figure: credible region touching neither axis, so no coordinate can be set to zero.]

14 Formalizing the approach. Fit the full model using, say, priors $\beta \sim N\big(0, \tfrac{\sigma^2}{\tau} I_p\big)$ and $\sigma^2 \sim \mathrm{IG}(0.01, 0.01)$. From this model fit, compute the elliptical credible region $C_\alpha = \{\beta : (\beta - \hat\beta)^T \Sigma^{-1} (\beta - \hat\beta) \le c_\alpha\}$, where $\hat\beta$ and $\Sigma$ are the posterior mean and covariance. If $\tau$ is fixed, $C_\alpha$ is the highest-density region and $\hat\beta$ and $\Sigma$ have closed forms (no MCMC). If $\tau$ has a prior, $C_\alpha$ is still a valid credible region, and a simple, short MCMC run gives $\hat\beta$ and $\Sigma$.
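For the fixed-$\tau$ case just described, the posterior is available in closed form. A minimal sketch, assuming a known $\sigma^2$ is plugged in for simplicity (the IG prior on $\sigma^2$ would instead be handled by a short MCMC run or the analytic marginal); names are illustrative:

```r
# Closed-form posterior for the full model with fixed tau:
# y ~ N(X beta, sigma2 I), beta ~ N(0, (sigma2 / tau) I_p)
posterior_beta <- function(X, y, tau, sigma2) {
  A     <- crossprod(X) + tau * diag(ncol(X))  # X'X + tau I
  A_inv <- solve(A)
  bhat  <- drop(A_inv %*% crossprod(X, y))     # posterior mean (ridge form)
  Sigma <- sigma2 * A_inv                      # posterior covariance
  list(bhat = bhat, Sigma = Sigma)
}
```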

15 Joint credible regions. All points within the region may be considered feasible parameter values. Among these, we seek a sparsest solution, $\beta^* = \arg\min_{\beta \in C_\alpha} \|\beta\|_0$, where $\|\beta\|_0$ is the number of non-zero elements of $\beta$. The chosen model for a given $\alpha$ is defined by the set of non-zero indices, $\mathcal{A}_{\alpha_n} = \{j : \beta^*_j \neq 0\}$.

16 Problems with searching for the sparsest solution. The solution may not be unique, and finding it requires a high-dimensional combinatorial search. Following Lv and Fan (2009), replace the $L_0$ norm by a smooth bridge between $L_0$ and $L_1$, $\sum_{j=1}^p \rho_a(|\beta_j|)$, where for $t \in [0, \infty)$, $$\rho_a(t) = \frac{(a+1)t}{a+t} = \left(\frac{t}{a+t}\right) I(t \neq 0) + \left(\frac{a}{a+t}\right) t.$$ This has limits $\rho_0(t) = I(t \neq 0)$ and $\rho_\infty(t) = t$; $a \approx 0$ is the most relevant for our setting.

17 Computation. This penalty function is non-convex, so we consider a local linear approximation about the posterior mean $\hat\beta$: $\rho_a(|\beta_j|) \approx \rho_a(|\hat\beta_j|) + \rho_a'(|\hat\beta_j|)\big(|\beta_j| - |\hat\beta_j|\big)$, with $\rho_a'(|\hat\beta_j|) = \frac{a(a+1)}{(a + |\hat\beta_j|)^2}$. The Lagrangian form then gives $$\beta^* = \arg\min_\beta\; (\beta - \hat\beta)^T \Sigma^{-1} (\beta - \hat\beta) + \lambda_\alpha \sum_{j=1}^p \frac{|\beta_j|}{(a + |\hat\beta_j|)^2}.$$ There is a one-to-one correspondence between $\lambda_\alpha$ and $\alpha$.

18 Computation. For $a \to 0$, the optimization becomes $$\beta^* = \arg\min_\beta\; (\beta - \hat\beta)^T \Sigma^{-1} (\beta - \hat\beta) + \lambda_\alpha \sum_{j=1}^p \frac{|\beta_j|}{\hat\beta_j^2}.$$ This has the adaptive LASSO form. In fact, with a flat prior for $\beta$, this is equivalent to the adaptive LASSO. The LARS algorithm gives the full path for varying $\alpha$.
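Because the criterion is an adaptive-LASSO problem in a transformed design, the whole path can be traced with lars. A sketch under the assumptions above, with bhat and Sigma taken from the full-model fit and all names illustrative:

```r
library(lars)

# Path for: minimize (beta - bhat)' Sigma^{-1} (beta - bhat)
#                    + lambda * sum_j |beta_j| / bhat_j^2
pcr_path <- function(bhat, Sigma) {
  R <- chol(solve(Sigma))       # R'R = Sigma^{-1}, so the quadratic term
  ystar <- drop(R %*% bhat)     # becomes ||ystar - Xstar gamma||^2
  w <- bhat^2                   # adaptive weights (the a -> 0 limit)
  Xstar <- R %*% diag(w)        # absorb the weights: beta_j = w_j * gamma_j
  fit <- lars(Xstar, ystar, type = "lasso",
              normalize = FALSE, intercept = FALSE)
  # back-transform every step of the path to the beta scale
  scale(coef(fit), center = FALSE, scale = 1 / w)
}
```

Each row of the returned matrix is one point on the path; its non-zero entries define the selected model at the corresponding $\alpha$.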

19 Selection consistency. There is a one-to-one correspondence between $\alpha_n$ and $C_n$, where the credible set is defined by $(\beta - \hat\beta)^T \Sigma^{-1} (\beta - \hat\beta) \le C_n$. Varying $C_n$ gives a sequence of models $\mathcal{A}^n_{\alpha_n}$. The true model is denoted $\mathcal{A}$. Theorem: Under general conditions, if $C_n \to \infty$ and $n^{-1} C_n \to 0$, then the credible-set method is consistent in variable selection, i.e., $P(\mathcal{A}^n_{\alpha_n} = \mathcal{A}) \to 1$. The result also holds for growing $p$, provided $p/n \to 0$.

20 Selection consistency. What about $p \gg n$? The posterior mean has the ridge regression form $\hat\beta = (X^T X + \tau I)^{-1} X^T y$. If $p/n$ does not tend to 0, it can be shown that $\hat\beta$ is not mean-square consistent, and thus the method is not selection consistent.

21 Selection consistency. Consider marginal posterior thresholding: rectangular credible regions rather than elliptical ones. Theorem: Let $\tau \to \infty$ with $\tau = O\big((n^2 \log p)^{1/3}\big)$; then the posterior thresholding approach is consistent in selection when the dimension satisfies $p = O(\exp\{n^c\})$ for some $0 \le c < 1$. This gives selection consistency for exponentially growing dimension, $p = o(\exp\{n\})$. The result also applies to ridge regression with ridge parameter $\tau$.
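One simple version of the marginal thresholding just described keeps variable $j$ when its marginal $(1-\alpha)$ credible interval excludes zero. A sketch assuming a Gaussian posterior summary (bhat, Sigma) from the full model; names are illustrative:

```r
# Rectangular (coordinatewise) credible regions: keep j when the marginal
# interval bhat_j +/- z * sd_j excludes zero
threshold_select <- function(bhat, Sigma, alpha = 0.05) {
  sds <- sqrt(diag(Sigma))
  z <- qnorm(1 - alpha / 2)
  which(abs(bhat) > z * sds)
}
```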

22 Simulation study. Linear regression model with $N(0, 1)$ errors; $n = 60$ observations and $p \in \{50, 500, 2000\}$ predictors. The predictors are also standard normal, with AR(1) correlation and lag-1 correlation $\rho \in \{0.5, 0.8\}$. $\beta = (0_{10}, B_1, 0_{20}, B_2, 0_{p-40})^T$, where $B_1$ and $B_2$ are vectors of length 5 with elements drawn uniformly on $(-5, 5)$. Results are based on 200 datasets for each of the 6 setups.
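A sketch of one draw from this design (illustrative names; the AR(1) predictors are generated by a stationary recursion so each column is marginally standard normal):

```r
gen_data <- function(n = 60, p = 500, rho = 0.5) {
  Z <- matrix(rnorm(n * p), n, p)
  X <- Z
  for (j in 2:p)                      # AR(1) across columns, lag-1 corr rho
    X[, j] <- rho * X[, j - 1] + sqrt(1 - rho^2) * Z[, j]
  B1 <- runif(5, -5, 5)
  B2 <- runif(5, -5, 5)
  beta <- c(rep(0, 10), B1, rep(0, 20), B2, rep(0, p - 40))
  y <- drop(X %*% beta) + rnorm(n)    # N(0, 1) errors
  list(X = X, y = y, beta = beta)
}
```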

23 Simulation study. We consider the ordering of predictors induced by: joint credible regions; marginal posterior thresholding; stochastic search (with various choices of prior); and the (adaptive) LASSO. Two measures of the reliability of the ordering: the ROC curve (sensitivity vs. specificity, related to type I error) and the PRC (precision-recall) curve (related to the false discovery rate).

24 Simulation study with p = 50 and n = 60. [Figure: ROC (sensitivity vs. 1 - specificity) and precision-recall curves for the credible-set method and SSVS with (fixed, fixed) and (random, random) hyperpriors; ρ = 0.5 (top) and ρ = 0.8 (bottom).]

25 Simulation study with p = 500 and n = 60. [Figure: ROC and precision-recall curves for the credible-set method and SSVS with (fixed, fixed) and (random, random) hyperpriors; ρ = 0.5 (top) and ρ = 0.8 (bottom).]

26 Results with p = 500, n = 60, and ρ = 0.8.

Model                   PRC area (SE)   CPU time (sec)
Joint regions           (0.007)         21
Marginal regions        (0.007)         21
SSVS (fixed, fixed)     (0.010)         1223
SSVS (random, fixed)    (0.009)         1223
SSVS (fixed, random)    (0.010)         1223
SSVS (random, random)   (0.009)         1223

27 Simulation study with p = 2000 and n = 60. [Figure: ROC and precision-recall curves comparing the credible-set method to the LASSO; ρ = 0.5 (top) and ρ = 0.8 (bottom).]

28 Real data analysis. Mouse gene expression (Lan et al., 2006): n = 60 arrays (31 female, 29 male mice) and p = 22,690 genes. Screening via marginal correlations reduces to the top 2000 genes. Fit with n = 55, leaving out 5 for testing; repeated over 100 random splits.

Response   Model           MSPE          Model size
PEPCK      Joint regions   1.63 (0.10)   24.7 (1.19)
PEPCK      Lasso           1.91 (0.13)   48.7 (1.15)
GPAT       Joint regions   3.44 (0.35)   8.56 (0.44)
GPAT       Lasso           3.52 (0.27)   53.8 (0.44)
SCD1       Joint regions   1.84 (0.19)   37.6 (0.55)
SCD1       Lasso           1.64 (0.17)   51.1 (0.94)

29 Extension to confounder adjustment. Consider the linear model $Y = \beta_0 + X\beta_x + Z\beta_z + \epsilon_y$. We are interested in the effect of the exposure $X$ on the outcome $Y$. $Z$ is a set of potential confounding variables that are not of direct interest; for example, $X$ is an environmental exposure and $Z$ contains other factors such as SES.

30 Confounder adjustment. Often there are many potential confounders. If confounders are omitted, $\hat\beta_x$ may be biased; including too many covariates increases the variance of $\hat\beta_x$. The ideal model includes all confounders (variables correlated with both $X$ and $Y$) and all additional explanatory variables (correlated with $Y$).

31 Previous approaches to confounder adjustment. Crainiceanu et al. (2008) proposed a two-step approach: fit an exposure model to identify variables correlated with $X$, $$X = \gamma_0 + Z\gamma_z + \epsilon_x, \quad \epsilon_x \sim N(0, \sigma_x^2 I), \qquad (1)$$ and an outcome model to identify additional covariates, $$Y = \beta_0 + X\beta_x + Z\beta_z + \epsilon_y, \quad \epsilon_y \sim N(0, \sigma_y^2 I). \qquad (2)$$ Bayesian adjustment for confounding (BAC; Wang et al., 2012) combines these two steps and simultaneously fits the two models using SSVS; covariates included in (1) are automatically included in (2).

32 Our approach to confounder adjustment. We separately fit the outcome and exposure models, $Y = \beta_0 + X\beta_x + Z\beta_z + \epsilon_y$ and $X = \gamma_0 + Z\gamma_z + \epsilon_x$. Let $C_\alpha^\beta$ and $C_\alpha^\gamma$ be $(1-\alpha) \times 100\%$ posterior credible regions for $\beta = (\beta_0, \beta_x, \beta_z)$ and $\gamma = (\gamma_0, \gamma_z)$. We wish to identify the set of variables $\mathcal{A} = \{j : \beta_j \neq 0\} \cup \{j : \gamma_j \neq 0\}$. The proposed estimate is $$\beta^* = \arg\min_{\beta \in C_\alpha^\beta,\; \gamma \in C_\alpha^\gamma}\; \epsilon\,(\beta_x - \hat\beta_x)^2 + |\mathcal{A}|,$$ where $\epsilon$ is fixed and small.

33 Penalized regression reformulation. Using an approximation similar to the regression case gives $$\beta^* = \arg\min_\beta\; (\beta - \hat\beta)^T \Sigma_Y^{-1} (\beta - \hat\beta) + \lambda_\beta \sum_{j=1}^p \frac{|\beta_j|}{\big(|\hat\beta_j| + |\hat\gamma_j|\big)^2}.$$ This resembles the adaptive LASSO with weights based on $|\hat\beta_j| + |\hat\gamma_j|$. These weights reduce the penalty on coefficients of variables that appear to be important in either the outcome or the exposure regression.
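Under the same lars-based setup as the earlier sketch, only the weights change. A sketch assuming bhat and Sigma_Y come from the outcome model and ghat from the exposure model, aligned so entry j refers to the same covariate in both fits:

```r
library(lars)

pcr_confounder_path <- function(bhat, ghat, Sigma_Y) {
  R <- chol(solve(Sigma_Y))         # R'R = Sigma_Y^{-1}
  w <- (abs(bhat) + abs(ghat))^2    # light penalty if important in either model
  fit <- lars(R %*% diag(w), drop(R %*% bhat), type = "lasso",
              normalize = FALSE, intercept = FALSE)
  scale(coef(fit), center = FALSE, scale = 1 / w)  # back to the beta scale
}
```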

34 Oracle properties (fixed p). Theorem: Under general conditions for the linear model, if $\lambda_n / \sqrt{n} \to 0$ and $\lambda_n \sqrt{n} \to \infty$, the penalized credible region confounder method is consistent in variable selection and asymptotically normal. This implies that the true model will be contained in the solution path obtained by varying $\lambda$ with probability tending to one.

35 Simulation. We use the design of Wang et al. (2012): $Y \sim N(W\beta, I)$, with $W = (X, Z_1, \dots, Z_{57})$ and rows $W_i$ independent $N(0, \Sigma)$. Here $\Sigma = \{\sigma_{jk}\}$ with diagonal $\sigma_{jj} = 1$ and $\sigma_{jk} = 0.7^{|j-k|}$ for $|j - k| = 1, \dots, 7$ (and 0 otherwise). $\beta_x = \beta_1 = \cdots = \beta_{14} = 0.1$ and $\beta_{15} = \cdots = \beta_{57} = 0$.

36 Simulation. We compare three priors of the form $\beta_j \sim N(0, \sigma_y^2 / \tau_y)$ (and analogously for $\gamma_j$). Flat: $\tau = 0$. Empirical Bayes: $\hat\tau_y = \hat\sigma_y^2 \big/ \big[(p+1)^{-1} \sum_{j=0}^p \hat\beta_j^2\big]$. Full Bayes: $\tau_y \sim \mathrm{Ga}(0.001, 0.001)$. $\sigma_y^2$ and $\sigma_x^2$ have Gamma(0.001, 0.001) priors. For the flat and empirical Bayes priors, no MCMC is required. All models with elliptical credible regions can be fit with the LARS package in R.
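A one-line sketch of the empirical Bayes choice above, where bhat is the fitted coefficient vector including the intercept and sigma2_hat an estimate of the error variance (names illustrative):

```r
# Empirical Bayes precision: tau_y = sigma_y^2 / [ (p+1)^{-1} sum_j bhat_j^2 ]
tau_eb <- function(bhat, sigma2_hat) sigma2_hat / mean(bhat^2)
```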

37 Simulation. We compare with: the true model, $Y = X\beta_x + \sum_{j=1}^{14} Z_j\beta_j + \epsilon$; the full model, $Y = X\beta_x + \sum_{j=1}^{57} Z_j\beta_j + \epsilon$; Bayesian model averaging (BMA) with noninformative priors on $\beta$; the Bayesian adjustment for confounding (BAC) method of Wang et al. (2012); and the adaptive LASSO.

38 Simulation Results - Bias. [Figure: mean bias of $\hat\beta_x$ at sample sizes from n = 60 to n = 500, comparing the true model, the full model, the credible-region method with flat, empirical Bayes, and Gamma priors, BAC, and the adaptive LASSO.]

39 Simulation Results - MSE. [Figure: mean MSE of $\hat\beta_x$ at sample sizes from n = 60 to n = 500, for the same methods as the bias figure.]

40 Simulation Results - Computing time. [Figure: mean CPU time in seconds at sample sizes from n = 60 to n = 500, for the same methods.]

41 Analysis of air pollution data. We estimate the effect of PM$_{2.5}$ on birth weight in Mecklenburg County, NC. Data from the State Center for Health Statistics include birth records and covariates. Potential confounders include: season of birth, mean temperature in the first trimester, mean dew point in the first trimester, and several maternal health and SES variables. We used the credible regions method with empirical Bayes priors.

42 Estimated PM effect. [Figure: estimated PM$_{2.5}$ coefficient by method (Cred Reg, ALASSO, OLS, OLS with PM only) for Hispanic, Black, and White mothers.] The five methods give different results. The credible region estimates are similar to OLS.

43 Confounder selection. [Figure: inclusion probabilities for four potential confounders (mean dew point in first trimester, mean temperature in first trimester, birth in fall, birth in spring), by method (Cred Reg, ALASSO, OLS, OLS with PM only) and by race/ethnicity (Hispanic, Black, White).]

44 Summary. We have proposed new approaches for variable and confounder selection. The methods are computationally efficient, theoretically justified, and have good empirical performance. References: Bondell and Reich. Consistent high-dimensional Bayesian variable selection via penalized credible regions. JASA. Wilson and Reich. Confounder selection via penalized credible regions. Submitted. Support: EPA R835228, NIH 5R01ES, NSF. Thanks!
