Bayesian Subset Regression for High Dimensional Generalized Linear Models and Its Asymptotic Properties


April 21, 2012

Abstract

In this talk, we present a new prior setting for high dimensional generalized linear models (GLMs), which leads to a Bayesian subset regression (BSR) whose maximum a posteriori (MAP) model coincides with the minimum extended BIC (EBIC) model. Under mild conditions, we establish consistency of the posterior. Further, we propose a variable screening procedure based on the marginal inclusion probability and show that this procedure shares the same sure screening and consistency properties as the existing sure independence screening (SIS) procedure. However, since the proposed procedure makes use of the joint information of all predictors, it generally outperforms SIS and its improved version in real applications. Our numerical results indicate that BSR can generally outperform the penalized likelihood methods, such as Lasso, elastic net, SIS and ISIS. The models selected by BSR tend to be sparser and, more importantly, of higher generalization ability. In addition, we find that the performance of the penalized likelihood methods tends to deteriorate as the number of predictors increases, while this effect is not significant for BSR.

The problem

Let D_n = {(x_i, y_i) : x_i ∈ R^{P_n}, y_i ∈ R, i = 1, ..., n} denote a data set of n pairs of P_n predictors and a response, where P_n can increase with the sample size n. Suppose that the data can be modeled by a generalized linear model (GLM):

$$\eta = g(\mu) = \beta_0 + \beta_1 x_1 + \cdots + \beta_{P_n} x_{P_n}, \qquad (1)$$

where Y has the density

$$f(y \mid \theta) = \exp\left\{ \frac{y\theta - b(\theta)}{a(\phi)} + c(y, \phi) \right\}. \qquad (2)$$

In the small-n-large-P setting, variable selection poses a great challenge for existing statistical methods.

Penalized Likelihood Approach

The penalized likelihood approach is to find a model, i.e., a subset of {x_1, ..., x_{P_n}}, that minimizes a penalized likelihood function of the form

$$-\sum_{i=1}^{n} \log f(y_i \mid \theta) + p_\lambda(\beta), \qquad (3)$$

where p_λ(·) is the penalty function, λ is a tunable scale parameter, and β = (β_0, β_1, ..., β_{P_n}) is the vector of regression coefficients.

Penalized Likelihood Approach (continued)

Different choices of p_λ(·) lead to different methods:
- Lasso (Tibshirani, 1996): the L_1 norm.
- Elastic net (Zou and Hastie, 2005): a linear combination of the L_1 and L_2 norms.
- SCAD (Fan and Li, 2001) and MCP (Zhang, 2009): concave penalties.

A nice property shared by these penalties is that they are singular at zero and thus induce sparsity in the model as λ increases. In practice, λ can be determined through a cross-validation procedure, as sketched below.
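The following is a minimal illustrative sketch (not the authors' code) of how λ might be tuned by cross-validation for an L1-penalized logistic regression, the Lasso instance of (3) for a binary-response GLM; the synthetic data, the scikit-learn API, and the fold and grid sizes are assumptions made purely for illustration.

```python
# Sketch: cross-validated choice of the penalty scale for an L1-penalized
# logistic regression (Lasso-type GLM fit).  Synthetic data for illustration.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV

X, y = make_classification(n_samples=200, n_features=1000, n_informative=8,
                           random_state=0)

# scikit-learn parameterizes the penalty by C = 1/lambda, so a small C means
# heavy penalization and a sparser fitted model.
fit = LogisticRegressionCV(Cs=20, penalty="l1", solver="liblinear", cv=5).fit(X, y)
print("chosen C (= 1/lambda):", fit.C_[0])
print("number of selected predictors:", np.count_nonzero(fit.coef_))
```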

Penalized Likelihood Approach: Marginal utility-based

Sure independence screening (SIS) (Fan and Lv, 2008; Fan and Song, 2010): SIS first ranks the predictors according to their marginal utility, i.e., each predictor is used alone to assess its usefulness for predicting the response; it then selects the subset of predictors whose marginal utility exceeds a predefined threshold, and finally refines the model using Lasso or SCAD. Fan and Song (2010) suggested measuring the marginal utility for GLMs using marginal regression coefficients or marginal likelihood ratios (with respect to the null model). Iterated SIS (ISIS) is an iterative version of SIS: it takes into account the predictors already chosen in previous SIS steps when measuring the marginal utility of the remaining predictors and, in general, improves on SIS.
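Below is a rough sketch of this screening-then-refining idea for a logistic GLM; the deviance-based utility, the top-d cutoff, and the scikit-learn calls are illustrative assumptions, not the exact procedure of Fan and Song (2010).

```python
# Sketch of SIS-style screening: rank predictors by a marginal likelihood-ratio
# utility (drop in deviance of the one-predictor fit vs. the null model),
# keep the top d, then refine with an L1-penalized (Lasso) logistic fit.
import numpy as np
from sklearn.linear_model import LogisticRegression

def marginal_utilities(X, y):
    n = len(y)
    p0 = np.full(n, y.mean())                               # null (intercept-only) fit
    null_dev = -2 * np.sum(y * np.log(p0) + (1 - y) * np.log(1 - p0))
    utils = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        m = LogisticRegression(C=1e6).fit(X[:, [j]], y)     # essentially unpenalized
        p = np.clip(m.predict_proba(X[:, [j]])[:, 1], 1e-12, 1 - 1e-12)
        utils[j] = null_dev + 2 * np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    return utils

def sis_then_lasso(X, y, d=50):
    keep = np.argsort(marginal_utilities(X, y))[::-1][:d]   # screening step
    refit = LogisticRegression(penalty="l1", C=0.5, solver="liblinear").fit(X[:, keep], y)
    return keep[np.flatnonzero(refit.coef_)]                # refined model
```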

Penalized Likelihood Approach: Information theory-based

Extended BIC (EBIC) criterion:

$$p_\lambda(\beta) = \frac{|\beta|}{2} \log n + \gamma |\beta| \log P_n, \qquad (4)$$

where |β| denotes the model size and γ > 0 is a tunable parameter. Under certain regularity conditions, Chen and Chen (2011) showed that EBIC is consistent if P_n = O(n^κ) and γ > 1 − 1/(2κ), where consistency means that the causal features are identified with probability tending to 1 by minimizing (3) as the sample size n → ∞.
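As a concrete reading of (4), the small helper below evaluates the EBIC of a fitted submodel (twice the penalized objective); the log-likelihood argument is assumed to come from whatever GLM fitter is in use, and the helper itself is only an illustrative sketch.

```python
import numpy as np

def ebic(loglik, model_size, n, p, gamma):
    """EBIC of a submodel with `model_size` selected predictors, i.e. twice the
    penalized objective (3)-(4): -2*loglik + k*log(n) + 2*gamma*k*log(p)."""
    return -2.0 * loglik + model_size * np.log(n) + 2.0 * gamma * model_size * np.log(p)

# Example: compare two candidate submodels of a GLM fitted on n=200, p=1000 data.
print(ebic(loglik=-95.0, model_size=5, n=200, p=1000, gamma=0.5))
print(ebic(loglik=-90.0, model_size=9, n=200, p=1000, gamma=0.5))
```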

Bayesian Methods

Bayesian Lasso: a Laplace-like prior is used for Bayesian variable selection, e.g., Figueiredo (2003), Bae and Mallick (2004), Park and Casella (2008). Normal-inverse-gamma prior: e.g., Hans et al. (2007), Bottolo and Richardson (2010). Theoretical study: Jiang (2006, 2007) showed that the true density (2) can be consistently estimated, based on the models sampled from the posterior distribution of a Bayesian variable selection procedure, as n → ∞. This property is called posterior consistency.

The prior

Let ξ_n ⊂ {x_1, x_2, ..., x_{P_n}} denote a subset model of the GLM containing k_n predictors. Let β_{ξ_n} be subject to the Gaussian prior

$$\pi(\beta_{\xi_n}) = (2\pi\sigma^2_{\xi_n})^{-k_n/2} \exp\left\{ -\frac{1}{2\sigma^2_{\xi_n}} \sum_{i=1}^{k_n} \beta_i^2 \right\}, \qquad (5)$$

where σ²_{ξ_n} denotes a pre-specified variance, and choose σ²_{ξ_n} for each model such that

$$\log \pi(\beta_{\xi_n}) = O_p(1), \qquad (6)$$

i.e., the prior information on β_{ξ_n} can be ignored for sufficiently large n. Under condition (A_2) (given later), setting

$$\sigma^2_{\xi_n} = \frac{1}{2\pi} e^{C_0/k_n}, \quad \text{for some positive constant } C_0, \qquad (7)$$

ensures that (6) holds for sufficiently large n.
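To see why (7) makes the prior negligible, note that the log-normalizing term of (5) then collapses to the constant −C_0/2; a tiny sketch of this calculation (with an illustrative value of C_0) is given below.

```python
import numpy as np

def log_gaussian_prior(beta_xi, C0=1.0):
    """Log of the Gaussian prior (5) under the variance choice (7),
    sigma^2 = exp(C0/k_n)/(2*pi).  The log-normalizing term equals -C0/2,
    so log pi(beta) stays O(1) whenever sum(beta^2) is bounded (cf. A2)."""
    beta_xi = np.asarray(beta_xi, dtype=float)
    k = beta_xi.size
    sigma2 = np.exp(C0 / k) / (2.0 * np.pi)
    return -0.5 * k * np.log(2.0 * np.pi * sigma2) - np.sum(beta_xi**2) / (2.0 * sigma2)

print(log_gaussian_prior([0.5, -0.3, 1.0]))   # stays O(1) as k_n varies
```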

The prior (continued)

Further, we let the model ξ_n be subject to the prior

$$\pi(\xi_n) = \nu_n^{k_n} (1 - \nu_n)^{P_n - k_n}, \qquad (8)$$

i.e., each variable has prior probability ν_n of being selected for the subset model, independently of the other variables.

The posterior

Let the prior probability ν_n in (8) take the form

$$\nu_n = \frac{1}{1 + \sqrt{2\pi}\, P_n^{\gamma}}, \qquad (9)$$

for some parameter γ. Then the posterior satisfies

$$\log \pi(\xi_n \mid D_n) \approx C + \log f(y_n \mid \hat\beta_{\xi_n}, \xi_n, X_n) - \frac{k_n}{2} \log(n) - k_n \gamma \log(P_n). \qquad (10)$$

Maximizing this posterior probability is equivalent to minimizing the EBIC given by

$$\mathrm{EBIC} = -2 \log f(y_n \mid \hat\beta_{\xi_n}, \xi_n, X_n) + k_n \log(n) + 2 k_n \gamma \log(P_n).$$

The posterior (continued)

To facilitate calculation of the MLE β̂_{ξ_n}, in practice one often needs to put an upper bound on the model size k_n, say k_n < K_n, where K_n may depend on the sample size n. With this bound, the prior for the model ξ_n becomes

$$\pi(\xi_n) \propto \nu_n^{k_n} (1 - \nu_n)^{P_n - k_n} I[k_n < K_n], \qquad (11)$$

and the posterior becomes

$$\log \pi(\xi_n \mid D_n) \approx \begin{cases} C + \log f(y_n \mid \hat\beta_{\xi_n}, \xi_n, X_n) - \frac{k_n}{2} \log(n) - k_n \gamma \log(P_n), & \text{if } k_n < K_n, \\ -\infty, & \text{otherwise.} \end{cases} \qquad (12)$$

Since this posterior leads to sampling of subset models without shrinkage of the regression coefficients, so that the Bayesian estimator of β_{ξ_n} reduces to the MLE of β_{ξ_n}, we call the procedure Bayesian subset regression (BSR).
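A minimal sketch of how the log posterior (12) would be scored for a candidate submodel follows; it assumes the fitted log-likelihood is supplied by an external GLM fitter and ignores the additive constant C.

```python
import numpy as np

def log_posterior_bsr(loglik, model_size, n, p, gamma, K_n):
    """Log posterior of a submodel up to the additive constant C in (12):
    the fitted log-likelihood penalized by (k/2)*log(n) + k*gamma*log(p),
    and -inf once the size bound k_n < K_n is violated."""
    if model_size >= K_n:
        return -np.inf
    return loglik - 0.5 * model_size * np.log(n) - gamma * model_size * np.log(p)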

Posterior Consistency: Conditions

We assume that the following conditions hold:

(A_1) (Predictor size) P_n ≻ n^δ for some δ > 0, where b_n ≻ a_n means lim_{n→∞} a_n/b_n = 0.

(A_2) (Sparsity) lim_{n→∞} Σ_{j=1}^{P_n} |β_j| < ∞.

Posterior Consistency

Theorem 1. Consider the GLM specified by (1) and (2), which satisfies conditions (A_1), (A_2) and |x_j| ≤ 1 for all j. Let ε_n be a sequence such that ε_n ∈ (0, 1] for each n and nε_n² ≻ log(P_n). Suppose the priors for the GLM are specified by (5) and (11) with the hyperparameter ν_n chosen as in (9), and that γ and K_n are chosen such that the following conditions hold:

(B_1) P_n ≤ e^{C_1 n^α} for some C_1 > 0 and some α ∈ (0, 1), for all sufficiently large n;

(B_2) Δ(r_n) = inf_{ξ_n: k_n = r_n} Σ_{j: j ∉ ξ_n} |β_j| ≤ e^{−C_2 r_n} for some constant C_2 > 0, where r_n = ⌈P_n ν_n⌉ denotes the up-rounded expectation of the prior (8);

(B_3) C_2^{−1} log(n) ≺ r_n ≤ K_n ≺ n^β for some β ∈ (0, q), where q = min(1−α, δ) for logistic, probit and normal linear regression, and q = min{1−α, δ, α/4} for Poisson and exponential regression with log-linear link functions.

Then for some c > 0 and for all sufficiently large n,

$$P\left\{ \pi[d(\hat f, f) > \epsilon_n \mid D_n] \ge e^{-0.5 c n \epsilon_n^2} \right\} \le e^{-0.5 c n \epsilon_n^2}, \qquad (13)$$

where P{·} denotes the probability measure for the data D_n, and d(·,·) denotes the Hellinger distance between f̂ and f.

Determination of γ

Theorem 1 suggests choosing γ ∈ (1 − 1/κ, 1) for κ > 1 and γ ∈ (0, 1) otherwise. In practice, we set

$$\gamma^* = \inf\left\{ \gamma : \arg\max_{|\xi_n|} \pi(|\xi_n| \mid D_n, \gamma) = |\xi_{\mathrm{map},\gamma}| \right\}, \qquad (14)$$

i.e., we find the minimum value of γ for which the mode of π(|ξ_n| | D_n, γ) is attained at the size of the MAP model. Both Theorem 1 and Chen and Chen (2011) suggest that the MAP model is the model that is most likely to be true. However, the MAP model depends on the value of γ. If γ is overly large, the equality arg max_{|ξ_n|} π(|ξ_n| | D_n, γ) = |ξ_{map,γ}| always holds since, in that case, only models of a single size are sampled from the posterior, but the resulting MAP model is likely to be a subset of the true model. On the other hand, if γ is too small, then we always have arg max_{|ξ_n|} π(|ξ_n| | D_n, γ) > |ξ_{map,γ}|. Rule (14) balances these two ends.
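The rule (14) reduces to a simple scan over a grid of γ values once posterior samples are available for each; the sketch below assumes such samples (e.g., from separate SAMC runs) are already at hand and that the grid itself is chosen by the user.

```python
# Sketch of rule (14): pick the smallest gamma on a grid for which the most
# frequent sampled model size equals the size of the MAP model at that gamma.
from collections import Counter

def select_gamma(runs):
    """`runs` maps gamma -> (list of sampled model sizes, size of MAP model),
    assumed to be collected from a posterior sampler run at each gamma."""
    for gamma in sorted(runs):
        sizes, map_size = runs[gamma]
        modal_size = Counter(sizes).most_common(1)[0][0]
        if modal_size == map_size:
            return gamma
    return None   # no gamma on the grid satisfies the rule

# Toy illustration with made-up sampler output
runs = {0.5: ([3, 4, 4, 5, 6], 3), 0.7: ([3, 3, 4, 3, 4], 3), 0.9: ([3, 3, 3, 3, 3], 3)}
print(select_gamma(runs))   # -> 0.7
```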

Algorithm Setting

(Target distribution) f(x) = ψ(x)/Z, where Z denotes the normalizing constant.

(Sample space partitioning) Partition the sample space according to a pre-specified function λ(x) into m subregions: E_1 = {x : λ(x) ≤ u_1}, E_2 = {x : u_1 < λ(x) ≤ u_2}, ..., E_{m−1} = {x : u_{m−2} < λ(x) ≤ u_{m−1}}, E_m = {x : λ(x) > u_{m−1}}, where −∞ < u_1 < ... < u_{m−1} < ∞. The function λ(·) can be any function of x, such as a component of x or the energy function −log ψ(x). Define the subregion weight g_i = ∫_{E_i} ψ(x) dx for i = 1, ..., m, and g = (g_1, ..., g_m).

Algorithm Setting (continued)

Let {a_t} be a positive, non-increasing sequence satisfying

$$\text{(i)}\ \sum_{t=1}^{\infty} a_t = \infty, \qquad \text{(ii)}\ \sum_{t=1}^{\infty} a_t^{\zeta} < \infty, \qquad (15)$$

for some ζ ∈ (1, 2). For example,

$$a_t = \left[ \frac{t_0}{\max(t_0, t)} \right]^{\tau}, \quad t = 1, 2, \ldots, \qquad (16)$$

for some specified value of t_0 > 1 and τ ∈ (1/2, 1]. Let π = (π_1, ..., π_m) be an m-vector with 0 < π_i < 1 and Σ π_i = 1; π is called the desired sampling distribution.

SAMC Algorithm

(Sampling) Draw a sample x_{t+1} with a single MH iteration having the invariant distribution

$$\hat p^{(t)}(x) = \frac{1}{Z_t} \sum_{i=1}^{m} \frac{\psi(x)}{\hat g_i^{(t)}}\, I(x \in E_i).$$

(Weight updating) Set

$$\log \hat g^{(t+1)} = \log \hat g^{(t)} + a_{t+1}(e_{t+1} - \pi),$$

where ĝ^{(t)} = (ĝ_1^{(t)}, ..., ĝ_m^{(t)}), e_{t+1} = (e_{t+1,1}, ..., e_{t+1,m}) and e_{t+1,i} = I(x_{t+1} ∈ E_i).
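For concreteness, here is a generic, self-contained SAMC sketch on a toy bimodal 1-D target with a Gaussian random-walk proposal; the target, partition levels, gain schedule and run length are illustrative assumptions, not the model-selection sampler used in the talk.

```python
# Generic SAMC sketch: energy-based partition, MH step with working density
# psi(x)/g_{J(x)}, and the stochastic-approximation weight update.
import numpy as np

rng = np.random.default_rng(0)

def psi(x):                          # unnormalized bimodal target (illustrative)
    return np.exp(-0.5 * (x + 3.0) ** 2) + np.exp(-0.5 * (x - 3.0) ** 2)

m = 20                               # number of subregions
u = np.linspace(0.0, 5.0, m - 1)     # partition the energy -log(psi) at these levels
pi = np.full(m, 1.0 / m)             # desired sampling distribution
log_g = np.zeros(m)                  # running log subregion weights
t0, tau = 100.0, 0.6                 # gain schedule a_t = (t0 / max(t0, t))^tau

def region(x):                       # index of the subregion E_i containing x
    return int(np.searchsorted(u, -np.log(psi(x))))

x = 0.0
for t in range(1, 50_001):
    a_t = (t0 / max(t0, t)) ** tau
    y = x + rng.normal(scale=1.0)    # symmetric random-walk proposal
    # MH ratio for the working density psi(x) / g_{J(x)}
    log_r = (np.log(psi(y)) - log_g[region(y)]) - (np.log(psi(x)) - log_g[region(x)])
    if np.log(rng.uniform()) < log_r:
        x = y
    e = np.zeros(m)
    e[region(x)] = 1.0               # indicator of the visited subregion
    log_g += a_t * (e - pi)          # weight-updating step

print("estimated log subregion weights:", np.round(log_g - log_g.max(), 2))
```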

Convergence

Theorem 2. As t → ∞, for any non-negative function ψ(x) with ∫_X ψ(x) dx < ∞, we have

$$\log \hat g_i^{(t)} \to \begin{cases} c + \log\left( \int_{E_i} \psi(x)\, dx \right) - \log(\pi_i + \nu), & \text{if } E_i \ne \emptyset, \\ -\infty, & \text{if } E_i = \emptyset, \end{cases} \qquad (17)$$

where ν = Σ_{j ∈ {i: E_i = ∅}} π_j / (m − m_0) and m_0 is the number of empty subregions. Let π̂_{ti} denote the realized sampling frequency of subregion E_i at iteration t. As t → ∞, π̂_{ti} converges to π_i + ν if E_i ≠ ∅ and to 0 otherwise.

Self-adjusting Mechanism

A remarkable feature of the SAMC algorithm is its self-adjusting mechanism: if a proposed move is rejected at an iteration, the weight of the subregion to which the current sample belongs is adjusted to a larger value, so the total rejection probability at the next iteration is reduced. This mechanism enables the algorithm to escape from local energy minima very quickly. The SAMC algorithm represents a significant advance for simulating complex systems with rugged energy landscapes.

Sure Variable Screening: the rule

Let M_* denote the set of causal features for a dataset D_n. Let e_n = (e_1, e_2, ..., e_{P_n}) denote the indicator vector of the P_n features; that is, e_j = 1 if x_j ∈ M_* and e_j = 0 otherwise. Let q_j denote the marginal inclusion probability of x_j, i.e.,

$$q_j = \sum_{\xi} e_{j|\xi}\, \pi(\xi \mid D_n),$$

where e_{j|ξ} indicates whether x_j is included in the model ξ. A conventional rule is to choose the features whose marginal inclusion probability exceeds a threshold value q̂; i.e., to set

$$\hat M_{\hat q} = \{j : q_j > \hat q,\ j = 1, \ldots, P_n\}$$

as an estimator of M_*.
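In practice the q_j's are estimated from the models visited by the posterior sampler; the sketch below assumes equally weighted samples of models (each a set of predictor indices), which is an illustrative simplification.

```python
# Sketch: estimate marginal inclusion probabilities q_j from sampled models
# and apply the threshold rule for variable screening.
import numpy as np

def inclusion_probabilities(sampled_models, p):
    q = np.zeros(p)
    for model in sampled_models:          # each model is a set of predictor indices
        q[list(model)] += 1.0
    return q / len(sampled_models)

def screen(sampled_models, p, q_hat=0.5):
    q = inclusion_probabilities(sampled_models, p)
    return np.flatnonzero(q > q_hat)      # estimate of M_* at threshold q_hat

# Toy illustration with three sampled models over p = 6 predictors
models = [{0, 2}, {0, 2, 5}, {0, 3}]
print(screen(models, p=6))                # -> [0 2]
```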

Sure Variable Screening: identifiability condition

Let A_{ε_n} = {ξ : d(f̂(y, x | ξ), f(y, x)) ≤ ε_n}, where d(·,·) denotes the Hellinger distance between f̂ and f. Define

$$\rho_j(\epsilon_n) = \sum_{\xi \in A_{\epsilon_n}} |e_j - e_{j|\xi}|\, \pi(\xi \mid D_n),$$

which measures, for feature j, the disagreement between the true model and the sampled models lying in the ε_n-neighborhood A_{ε_n}.

(C_1) (Identifiability of M_*) For all j = 1, 2, ..., P_n, ρ_j(ε_n) → 0 as n → ∞ and ε_n → 0.

Sure Variable Screening: model consistency

Theorem 3. Assume that the conditions of Theorem 1 (i.e., (A_1) and (A_2)) and condition (C_1) hold.

(i) For any δ_n > 0 and all sufficiently large n,

$$P\left( \max_{1 \le j \le P_n} |q_j - e_j| \ge 2\left(\delta_n + e^{-0.5 c n \epsilon_n^2}\right) \right) \le P_n e^{-0.5 c n \epsilon_n^2}.$$

(ii) (Sure screening) For all sufficiently large n,

$$P(M_* \subset \hat M_{\hat q}) \ge 1 - s_n e^{-0.5 c n \epsilon_n^2},$$

where s_n denotes the size of M_*, for some choice of q̂ ∈ (0, 1), preferably away from 0 and 1.

(iii) (Consistency) For all sufficiently large n,

$$P(M_* = \hat M_{0.5}) \ge 1 - P_n e^{-0.5 c n \epsilon_n^2}.$$

Lymph Example

This dataset consists of n = 148 samples: 100 node-negative cases (low risk for breast cancer) and 48 node-positive cases (high risk for breast cancer). After pre-screening, 4512 genes showing evidence of variation above the noise level were retained for further study. In addition, two clinical factors, estimated tumor size (in centimeters) and protein assay-based estrogen receptor status, each coded as a binary variable, were included as candidate predictors, bringing the total number of predictors to P_n = 4514. The subset data consist of only 34 genes, corresponding to the set of significant genes identified by SVS at an FDR level of 0.01, based on the marginal inclusion probabilities produced by SAMC in a run on the full data with γ = 0.85.

Lymph Example

Table 1. Comparison of BSR, Lasso, elastic net, SIS and ISIS on the subset and full lymph data: r_cv(%) is the minimum 1-CVMR (leave-one-out cross-validation misclassification rate), and MAP_min is the minimum-size MAP_i model with zero 1-CVMR, where MAP_i denotes the MAP model consisting of i features.

        Lasso          elastic net    SIS            ISIS           BSR MAP        BSR MAP_min
Data    |ξ|   r_cv%    |ξ|   r_cv%    |ξ|   r_cv%    |ξ|   r_cv%    |ξ|   r_cv%    |ξ|   r_cv%
Sub     28    2.03     31    2.70     7     16.9     7     7.43     8     2.70     10    0
Full    23    15.5     51    16.9     7     17.6     7     10.8     8     0.68     10    0

Lymph Example

Figure 1. Comparison of Lasso and BSR on the subset data (panel a) and the full data (panel b) of lymph: 1-CVMRs of the models selected by Lasso (white circles connected by a solid line) and the MAP_1 to MAP_16 models selected by BSR (black squares connected by a dotted line). Axes: number of variables versus CV misclassification rate.

Lymph Example

Figure 2. Plots of q-values for the genes selected by BSR, SIS, ISIS, Lasso and elastic net (panels a-e; axes: variable index versus q-value). The model shown in (a) is one of the MAP models found by BSR with γ = 0.85; it has a 1-CVMR of 2.7%. The models shown in (b), (c), (d) and (e) are those selected by SIS, ISIS, Lasso and elastic net, with 1-CVMRs of 17.6%, 10.8%, 15.5% and 16.9%, respectively.

A Simulated Example

This example mimics a case-control genetic association study (GAS). The response variable y, representing the disease status of a subject, takes the value 1 for a disease case and 0 for a control. The explanatory variables are generated as single nucleotide polymorphisms (SNPs) in the human genome. Each variable x_{ij}, the genotype of SNP j for subject i, takes the value 0, 1 or 2, where 0 stands for a homozygous site with the major allele, 1 for a heterozygous site, and 2 for a homozygous site with two copies of the minor allele. We independently generated 10 datasets with n_1 = n_2 = 500, P_n = 10000, k = 8, and (β_1, ..., β_8) = (0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3).
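A sketch of how such case-control SNP data might be simulated is given below; the minor-allele frequencies, intercept, population-pool size and oversample-then-subsample scheme are illustrative assumptions, since the talk does not specify them.

```python
# Sketch: generate case-control SNP data in the spirit of the simulated example
# (assumed details: MAFs ~ U(0.05, 0.5), intercept -6, retrospective sampling).
import numpy as np

rng = np.random.default_rng(1)
P, k = 10_000, 8
beta = np.array([0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3])    # causal SNP effects
n1 = n2 = 500                                                # cases and controls

maf = rng.uniform(0.05, 0.5, P)                              # assumed minor-allele freqs
pool = 5_000                                                 # population pool to subsample
X = rng.binomial(2, maf, size=(pool, P)).astype(np.int8)     # genotypes coded 0/1/2
eta = -6.0 + X[:, :k].astype(float) @ beta                   # first k SNPs are causal
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta)))              # disease status

cases = np.flatnonzero(y == 1)[:n1]                          # retrospective sampling;
controls = np.flatnonzero(y == 0)[:n2]                       # enlarge `pool` if short of cases
idx = np.concatenate([cases, controls])
X_cc, y_cc = X[idx], y[idx]
print(X_cc.shape, y_cc.mean())                               # (1000, 10000), 0.5
```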

A Simulated Example (continued)

Table 2. Comparison of BSR with Lasso, elastic net, SIS and ISIS on the simulated data: (a) results of SVS at an FDR level of 0.001; (b) average model size over the 10 datasets, with the standard deviation in parentheses.

              BSR (γ = 0.85)
Method        MAP           SVS (a)       Lasso          Elastic net    SIS       ISIS
Size (b)      8.2(0.25)     12.1(0.43)    44.9(20.48)    30.1(9.04)     36(0)     36(0)
fsr(%)        3.66          25.6          81.8           73.4           77.8      77.8
nsr(%)        12.22         0             0              0              0         0

More Gene Expression Examples

Table 3. Summary of the four gene expression datasets.

Dataset     Publication             n     P      Response
Lymph       Hans et al. (2007)      148   4514   positive/negative node
Colon       Alon et al. (1999)      62    2000   tumor/normal tissue
Leukemia    Golub et al. (1999)     72    3571   subtype of leukemia
Prostate    Singh et al. (2002)     102   6033   tumor/normal tissue

More Gene Expression Examples

Table 4. Posterior distribution π(|ξ| | D_n, γ = 0.99) obtained by BSR for the colon, leukemia and prostate data.

            Number of variables (|ξ|)
Dataset     1        2        3        4        5        6         7         8
Colon       0.001    0.387    0.201    0.260    0.118    0.029     0.004     <0.001
Leukemia    0.023    0.561    0.337    0.070    0.007    <0.001    <0.001    <0.001
Prostate    0        0.005    0.098    0.681    0.191    0.024     0.002     <0.001

More Gene Expression Examples

Table 5. Comparison of BSR, Lasso, elastic net, SIS and ISIS on three gene expression datasets: (a) minimum 1-CVMR (r_cv%) and the corresponding model size (|ξ|); (b) 1-CVMR (r_cv%) and the corresponding model size (|ξ|); (c) the zero-1-CVMR models among the MAP_i's sampled by SAMC.

            Lasso          elastic net    SIS            ISIS           BSR MAP        BSR MAP_min (c)
Dataset     |ξ|   r_cv%    |ξ|   r_cv%    |ξ|   r_cv%    |ξ|   r_cv%    |ξ|   r_cv%    |ξ|      r_cv%
Colon       15    12.9     26    11.3     3     12.9     3     12.9     3     4.84     4-7,9    0
Leukemia    16    4.17     37    2.78     4     4.17     2     0        2     0        2-10     0
Prostate    3     6.86     42    5.88     5     6.86     4     4.90     3     3.92     4-9      0

More Gene Expression Examples

Figure 3. Comparison of 1-CVMRs produced by Lasso (white circles connected by a solid line) and BSR (black squares connected by a dotted line) for (a) the colon data, (b) the leukemia data, and (c) the prostate data. Axes: model size versus CV misclassification rate.

CPU time analysis

Table 6. CPU times of BSR, measured in hours on a Dell desktop (3.0 GHz, 8 GB RAM); E_{D_n}(|ξ|) denotes the posterior expectation of |ξ|.

Dataset          n     P_n    γ      Iterations (×10^6)   E_{D_n}(|ξ|)   CPU (h)
Biliary stone    715   1748   0.75   5.05                 1.9            2.0
Lymph            148   4514   0.85   10.05                8.5            12.7
Colon            62    2000   0.99   5.05                 3.2            1.0
Leukemia         72    3571   0.99   5.05                 2.5            1.7
Prostate         102   6033   0.99   5.05                 4.1            1.5

CPU time analysis (continued)

After adjusting for the number of iterations, a linear regression of the CPU time on n, P_n and [E_{D_n}(|ξ|)]^3 indicates that the CPU time depends mainly on [E_{D_n}(|ξ|)]^3, and then on n and P_n, in order of significance. In fact, a simple linear regression of the CPU time on [E_{D_n}(|ξ|)]^3 alone has an R² of 95.1%. The cubic transformation of E_{D_n}(|ξ|) is used because the CPU time for inverting a matrix grows with the cube of the matrix size. This analysis shows that BSR (Bayesian subset regression) can be applied to high dimensional regression problems in reasonable CPU time.
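The cubic-cost relationship can be eyeballed directly from the Table 6 values; the quick sketch below regresses per-iteration CPU time on [E_{D_n}(|ξ|)]^3, where dividing out the iteration counts is an assumed simplification of the "adjusting for the number of iterations" step, so its R² need not match the 95.1% reported above.

```python
# Quick sketch of the cubic-cost regression using the Table 6 values:
# per-iteration CPU time versus the cubed posterior-mean model size.
import numpy as np

iters  = np.array([5.05, 10.05, 5.05, 5.05, 5.05])    # iterations (x 10^6)
e_size = np.array([1.9, 8.5, 3.2, 2.5, 4.1])          # E_{D_n}(|xi|)
cpu_h  = np.array([2.0, 12.7, 1.0, 1.7, 1.5])         # CPU hours

y = cpu_h / iters                                     # hours per 10^6 iterations
X = np.column_stack([np.ones_like(y), e_size ** 3])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
r2 = 1.0 - np.var(y - X @ coef) / np.var(y)
print("slope on [E(|xi|)]^3:", round(coef[1], 4), " R^2:", round(r2, 3))
```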

Conclusion

We propose a new prior setting for GLMs, which leads to a Bayesian subset regression with the MAP model coinciding with the minimum EBIC model. Under mild conditions, we establish consistency of the posterior. We propose a variable screening procedure based on the marginal inclusion probability and show that it shares the same theoretical properties of sure screening and consistency as the SIS procedure of Fan and Song (2010). Since the proposed procedure makes use of the joint information of all predictors, it generally outperforms SIS and its iterative extension ISIS. We make extensive comparisons of BSR with the popular penalized likelihood methods, including Lasso, elastic net, SIS and ISIS. Through the comparison study conducted on the subset and full lymph data, we find that the performance of the penalized likelihood methods tends to deteriorate as the dimension P_n grows. Given the stable performance of BSR on the subset and full data, we conclude that BSR is more suitable than these penalized likelihood methods for high dimensional variable selection, although the latter are computationally more attractive. Our further results on the simulated data, the real SNP data and the four gene expression datasets reinforce this conclusion.

Acknowledgement

Collaborators: Qifan Song and Kai Yu. Supported by an NSF grant and a KAUST grant.