Supplement to Bayesian inference for high-dimensional linear regression under the mnet priors

Size: px

Start display at page:

Download "Supplement to Bayesian inference for high-dimensional linear regression under the mnet priors"

Duane Quinn
5 years ago
Views:

1 The Canadian Journal of Statistics Vol. xx No. yy 0?? Pages?? La revue canadienne de statistique Supplement to Bayesian inference for high-dimensional linear regression under the mnet priors Aixin Tan * and Jian Huang Department of Statistics and Actuarial Science University of Iowa. THE MNET PENALTY FUNCTION AND ITS SPECIAL CASES We mentioned five penalty functions in Table of the main text. They are the ridge the lasso the enet the mcp and the mnet penalty functions. The first four are indeed special cases of the mnet penalty function and Figure provides one way to display their relationship. λ λ mnet λ 0 ridge λ lasso enet mc mnet Figure : Relationship between the five penalty functions and the corresponding priors. FIGURE : Relationship between the five penalty functions and the corresponding priors. 3. The Bayesian mnet model Let Nt µ V denote the multivariate normal density with mean µ and variance covariance. CORRESPONDENCE matrix V evaluated at t. TheBETWEEN general form of THE a BVS BAYESIAN model is given MNET by PRIOR AND ITS PENALIZED REGRESSION fy β γσ COUNTERPART λwny X γ β γ σ I n 3a Besides the formal correspondence fβ γσ λwπ between the q j γ prior j f f λσ λσ β β j + j andγthe j δ 0 penalty function 3b p ; λ fσ λw σ 3c f λσ β fγ w j σ βj f c λ w γ exp. p σ ; λ 3d Here a q-dim binary vector γ γ...γ q 0 q : Γ indicates a selected set of predictors and β γ denotes the subvector of coefficients for the predictors selected by γ. Prior of the * Author BVSto model whomare correspondence specified in 3b may be 3d. addressed. aixin-tan@uiowa.edu In this BVS model 3c specifies a uniform prior on log σ a typical choice of non-informative prior for these parameters in linear regression models. And 3d specifies a prior on the indicator vector Statistical γ. A simple Society choice of Canada is the independent / Société statistique Bernoullidu distribution Canada with success probability c 0?? w. We consider w a hyperparameter and discuss CJS??? ways to specify its value in the next section. We mention that there are other dependent priors proposed for γ that could take advantage of known structure of the predictors when that information is available. See for example Chip-

2 AIXIN TAN AND JIAN HUANG Vol. xx No. yy the Bayesian posterior and the penalized regression PR solution are related in the following way. When conditional on a fixed σ λ the posterior distribution of β is given by πβ Y σ Y Xβ β λ exp σ p mn σ ; λ λ exp n [ ] Y Xβ σ + σ β n n p mn σ ; λ λ exp n [ Y Xβ σ + p mn β; λ σ n n λ n ] n the last equality follows from the fact that for any a b > 0 t a p mc b ; λ p mc t; aλ b a b and a λ t aλ b b t. Hence the mode of the above posterior density coincides with the solution to the mnet PR model with penalty parameter λ λ λ σ n λ n. n Recall that the mnet penalty function p mn ; λ reduces to the normal the lasso and the enet penalty functions under special choices of λ λ λ. Naturally we will refer to the corresponding priors in in these special cases as the normal the lasso and the enet priors respectively. Comparing to existing literature the way in which σ enters our definition of the lasso prior agrees with that of the double exponential prior of Park & Casella 008. And our definition of the enet prior and the mnet prior can be considered a natural extension to this lasso prior. But our definition of the enet prior is different from that of Hans 0 and Li & Lin 00. Specifically the conditional prior for β j in their Bayesian enet model is f en β j λ λ σ exp σ λ β j + λ βj hence the mode of its conditional posterior density π enβ Y σ λ is the naive enet solution with penalty parameter λ λ λ /n λ /n which unlike is free of σ. Hence their Bayesian enet model can not be re-parameterized to match ours. It is beyond the scope of this paper to study how the two versions of the Bayesian enet models compare as we will focus on models allowed within the Bmnet framework defined in sec. 3 of the paper. 3. QUANTITIES NEEDED IN THE BLOCK GIBBS SAMPLER Recall in sec. 4 of the main text integrals I I and I 3 are needed in updating β j γ j for j... p. We have I λ σ 0 λ σ 0 exp nt σ + XT j r jt σ λ t σ λ σ t exp A t + B t A n + λ σ and B XT j r j σ λ σ.

3 0?? 3 Note that A > 0 because /n corresponds to the parameter in the classical mcp or mnet penalty function and is always set below. Therefore I can be expressed in terms of the pdf and the cdf of certain normal distributions: B λ σ I exp exp A 0 A t B Φ 0 λ σ ; B A A A / φ 0; B A Φa b; µ v represents the probability that a Normal random variable with mean µ and variance v falls in the interval a b. Similarly we have Finally we have I 3 I λ σ 0 λ σ 0 Φ λ σ exp nt σ + XT j r jt σ + λ t σ λ σ t exp A t + B t λ σ 0 ; B A A A n + λ σ A / φ 0; B A A and B XT j r j σ + λ σ. + exp nt λ σ σ + XT j r jt σ λ λ λ σ exp λ exp λ λ σ [ Φ λ σ ; B 3 + exp λ σ + exp λ σ + Φ n + λ σ λ σ ; B 3 σ t t + XT j r j σ t t + B 3 t ] / φ 0; B 3 A n + λ σ and B 3 XT j r j σ. Further the conditional distribution of β j given the others can be expressed as a mixture of three truncated normal distributions B πβ j γ β j Y I TN A B I TN A ; λ σ A 0 ; 0 λ σ A + I 3 TN + B3 ; λ σ λ

4 4 AIXIN TAN AND JIAN HUANG Vol. xx No. yy TN µ v; a b stands for the truncated normal density with mean µ standard deviation v and support [a b]. 4. SUMMARY STATISTICS FOR THE SIMULATION STUDIES IN SEC. 5.. This section complements the graphical comparisons of different methods in sec. 5. with numerical ones. The tables below provide the median of the prediction mean squared error pmse the false discoveries FD the false negatives FN and the number of variables selected NVS of several methods under different combinations of correlation strength ρ and signal to noise ratio s. Recall from sec. 5 that at each ρ s 00 datasets are randomly generated each contains n 00 observations and q 50 predictors while the number of true predictors is 0. The bootstrap standard error is reported for the median pmse. Specifically we draw B 000 samples with replacement each of size 00 from the 00 pmse values obtained in the simulation study. We calculate their sample medians m m B and report their standard deviation as an approximation to the standard error of the median pmse. TABLE : Under two simulation setups of ρ 0.3 and ρ 0.9 with signal to noise ratio fixed at s this table provides median of the prediction mean squared error pmse the false discoveries FD the false negatives FN and the number of variables selected NVS of several methods. For each setup 00 datasets were randomly generated each contains n 00 observations and q 50 potential predictors while the number of true predictors is 0. Means of the 00 repetitions yield similar results and hence are not shown. ρ 0.3 pmse std. err FD FP NVS ρ 0.9 pmse std. err FD FP NVS

5 0?? 5 ρ 0.3 TABLE : Medians over 00 replications of various statistics for different methods at ρ.3 and.9. Signal to noise ratio is fixed at s 4. pmse std. err FD FP NVS ρ 0.9 pmse std. err FD FP NVS ρ 0.3 TABLE 3: Medians over 00 replications of various statistics for different methods at ρ.3 and.9. Signal to noise ratio is fixed at s 8. pmse std. err FD FN NVS ρ 0.9 pmse std. err FD FN NVS BIBLIOGRAPHY Hans C. 0. Elastic net regression modeling with the orthant normal prior. Journal of the American Statistical Association Li Q. & Lin N. 00. The Bayesian elastic net. Bayesian Analysis Park T. & Casella G The Bayesian LASSO. Journal of the American Statistical Association

Bayesian inference for high-dimensional linear regression under mnet priors

The Canadian Journal of Statistics Vol. xx, No. yy, 20??, Pages 1?? La revue canadienne de statistique 1 Bayesian inference for high-dimensional linear regression under mnet priors Aixin Tan 1 * and Jian