Bayesian Subset Regression for High Dimensional Generalized Linear Models and Its Asymptotic Properties


1 Bayesian Subset Regression for High Dimensional Generalized Linear Models and Its Asymptotic Properties

April 21, 2012

2 Abstract

In this talk, we present a new prior setting for high dimensional generalized linear models (GLMs), which leads to a Bayesian subset regression (BSR) whose maximum a posteriori (MAP) model coincides with the minimum extended BIC (EBIC) model. Under mild conditions, we establish consistency of the posterior. Further, we propose a variable screening procedure based on the marginal inclusion probability and show that this procedure shares the sure screening and consistency properties of the existing sure independence screening (SIS) procedure. However, since the proposed procedure makes use of the joint information of all predictors, it generally outperforms SIS and its improved version in real applications. Our numerical results indicate that BSR can generally outperform penalized likelihood methods such as Lasso, elastic net, SIS and ISIS. The models selected by BSR tend to be sparser and, more importantly, of higher generalization ability. In addition, we find that the performance of the penalized likelihood methods tends to deteriorate as the number of predictors increases, while this effect is not significant for BSR.

3 The problem

Let $D_n = \{(x_i, y_i) : x_i \in \mathbb{R}^{P_n}, y_i \in \mathbb{R}, i = 1, \ldots, n\}$ denote a data set of $n$ pairs of $P_n$ predictors and a response, where $P_n$ can increase with the sample size $n$. Suppose that the data can be modeled by a generalized linear model (GLM):

$\eta = g(\mu) = \beta_0 + \beta_1 x_1 + \cdots + \beta_{P_n} x_{P_n},$   (1)

and $Y$ has the density

$f(y \mid \theta) = \exp\left\{ \dfrac{y\theta - b(\theta)}{a(\phi)} + c(y, \phi) \right\}.$   (2)

In the small-$n$-large-$P$ situation, variable selection poses a great challenge to existing statistical methods.

4 Penalized Likelihood Approach

The penalized likelihood approach is to find a model, i.e., a subset of $\{x_1, \ldots, x_{P_n}\}$, that minimizes a penalized likelihood function of the form

$-\sum_{i=1}^{n} \log f(y_i \mid \theta) + p_{\lambda}(\beta),$   (3)

where $p_{\lambda}(\cdot)$ is the penalty function, $\lambda$ is a tunable scale parameter, and $\beta = (\beta_0, \beta_1, \ldots, \beta_{P_n})$ is the vector of regression coefficients.

5 Penalized Likelihood Approach

Different choices of $p_{\lambda}(\cdot)$ lead to different methods:
Lasso (Tibshirani, 1996): the $L_1$ norm.
Elastic net (Zou and Hastie, 2005): a linear combination of the $L_1$ and $L_2$ norms.
SCAD (Fan and Li, 2001) and MCP (Zhang, 2009): concave penalties.
A nice property shared by these penalties is that they are singular at zero and thus can induce sparsity of the model by increasing $\lambda$. In practice, $\lambda$ can be determined through a cross-validation procedure.
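As a rough illustration of how such a penalized GLM is fitted in practice (a generic sketch using scikit-learn, not the software behind the talk; the synthetic `X` and `y` below are placeholders used only to make the snippet runnable):

```python
# Sketch: fitting an L1-penalized (Lasso-type) logistic GLM with lambda chosen
# by cross-validation.  In a real application X (n x P_n) and y would be given.
import numpy as np
from sklearn.linear_model import LogisticRegressionCV

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 500))                       # small-n-large-P toy data
y = (X[:, 0] - X[:, 1] + rng.normal(size=100) > 0).astype(int)

# Cs gives a grid of inverse penalty strengths (1/lambda); 5-fold CV picks one.
fit = LogisticRegressionCV(Cs=20, cv=5, penalty="l1", solver="liblinear",
                           max_iter=500).fit(X, y)
selected = np.flatnonzero(fit.coef_.ravel() != 0.0)   # sparse support induced by L1
print("selected predictors:", selected)
```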

6 Penalized Likelihood Approach: Marginal utility-based

Sure independence screening (SIS) (Fan and Lv, 2008; Fan and Song, 2010): SIS first ranks the predictors according to their marginal utility, namely, each predictor is used separately to predict the response in order to assess its usefulness; it then selects the subset of predictors whose marginal utility exceeds a predefined threshold, and finally refines the model using Lasso or SCAD. Fan and Song (2010) suggested measuring the marginal utility for GLMs using marginal regression coefficients or marginal likelihood ratios (with respect to the null model). Iterated SIS (ISIS) is an iterative version of SIS: it takes into account the predictors already chosen in previous SIS steps when measuring the marginal utility of the remaining predictors, and in general improves on SIS.
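A schematic sketch of the basic SIS-style screening step (my own illustration in the spirit of the procedure above, not the authors' implementation; the screening size `d` and the effectively unpenalized marginal fits are assumed choices):

```python
# Sketch of SIS-style screening for a logistic GLM: rank predictors by their
# marginal likelihood-ratio statistic, keep the top d, then refine with Lasso.
import numpy as np
from sklearn.linear_model import LogisticRegression

def marginal_lr(X, y):
    """Marginal likelihood-ratio statistic of each predictor against the null model."""
    n = len(y)
    p0 = y.mean()
    ll_null = n * (p0 * np.log(p0) + (1 - p0) * np.log(1 - p0))
    stats = np.empty(X.shape[1])
    for j in range(X.shape[1]):                       # one marginal fit per predictor
        m = LogisticRegression(penalty="l2", C=1e6,   # effectively unpenalized
                               max_iter=200).fit(X[:, [j]], y)
        p = m.predict_proba(X[:, [j]])[:, 1]
        stats[j] = 2 * (np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)) - ll_null)
    return stats

def sis(X, y, d):
    keep = np.argsort(marginal_lr(X, y))[::-1][:d]    # screening step
    lasso = LogisticRegression(penalty="l1", C=0.1,
                               solver="liblinear").fit(X[:, keep], y)
    return keep[np.flatnonzero(lasso.coef_.ravel() != 0.0)]   # refined model
```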

7 Penalized Likelihood Approach: Information theory-based

Extended BIC (EBIC) criterion:

$p_{\lambda}(\beta) = \dfrac{|\beta|}{2} \log n + \gamma |\beta| \log P_n,$   (4)

where $|\beta|$ denotes the model size, and $\gamma > 0$ is a tunable parameter. Under certain regularity conditions, Chen and Chen (2011) showed that EBIC is consistent if $P_n = O(n^{\kappa})$ and $\gamma > 1 - \frac{1}{2\kappa}$, where consistency means that the causal features can be identified, with probability tending to 1, by minimizing (3) as the sample size $n \to \infty$.
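A small sketch of evaluating the EBIC of a candidate subset model (illustrative only; the log-likelihood comes from an effectively unpenalized logistic fit on the selected columns, and the term $2\gamma k \log P_n$ corresponds to penalty (4) multiplied by two):

```python
# Sketch: EBIC of a candidate subset model for a logistic GLM.
# X, y are assumed given; `subset` is a non-empty list of column indices.
import numpy as np
from sklearn.linear_model import LogisticRegression

def ebic(X, y, subset, gamma):
    n, P = X.shape
    k = len(subset)
    m = LogisticRegression(penalty="l2", C=1e6,       # effectively the MLE
                           max_iter=500).fit(X[:, subset], y)
    p = m.predict_proba(X[:, subset])[:, 1]
    loglik = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    return -2.0 * loglik + k * np.log(n) + 2.0 * gamma * k * np.log(P)

# The minimum-EBIC model over a collection of candidate subsets:
# best = min(candidates, key=lambda s: ebic(X, y, s, gamma=0.9))
```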

8 Bayesian Methods

Bayesian Lasso: a Laplace-like prior is used for Bayesian variable selection, e.g., Figueiredo (2003), Bae and Mallick (2004), Park and Casella (2008).
Normal-inverse-gamma prior: e.g., Hans et al. (2007), Bottolo and Richardson (2010).
Theoretical study: Jiang (2006, 2007) showed that the true density (2) can be consistently estimated based on the models sampled from the posterior distribution through a Bayesian variable selection procedure as $n \to \infty$. This property is called posterior consistency.

9 The prior

Let $\xi_n = \{x_1, \ldots, x_{k_n}\} \subset \{x_1, x_2, \ldots, x_{P_n}\}$ denote a subset model of the GLM, where $x_i$, $i = 1, \ldots, k_n$, denote the predictors included in the subset model. Let $\beta_{\xi_n}$ be subject to the Gaussian prior

$\pi(\beta_{\xi_n}) = (2\pi\sigma_{\xi_n}^2)^{-k_n/2} \exp\left\{ -\dfrac{1}{2\sigma_{\xi_n}^2} \sum_{i=1}^{k_n} \beta_i^2 \right\},$   (5)

where $\sigma_{\xi_n}^2$ denotes a pre-specified variance, chosen for each model such that

$\log \pi(\beta_{\xi_n}) = O_p(1),$   (6)

i.e., the prior information on $\beta_{\xi_n}$ can be ignored for sufficiently large $n$. Under condition (A2) (given later), the choice

$\sigma_{\xi_n}^2 = \dfrac{1}{2\pi} e^{C_0/k_n}, \quad \text{for some positive constant } C_0,$   (7)

ensures that (6) holds for sufficiently large $n$.
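A minimal sketch, under the variance choice (7), showing numerically that the leading term of the log prior density stays bounded in the model size (the constant $C_0 = 1$ and the evaluation at $\beta = 0$ are illustrative choices):

```python
# Sketch: the Gaussian prior (5) with variance (7).  At beta = 0 the log prior
# equals -C0/2 for every model size k, illustrating condition (6).
import numpy as np

def log_prior(beta_xi, C0=1.0):
    k = len(beta_xi)
    sigma2 = np.exp(C0 / k) / (2.0 * np.pi)           # choice (7)
    return (-0.5 * k * np.log(2.0 * np.pi * sigma2)
            - 0.5 * np.sum(beta_xi ** 2) / sigma2)

for k in (5, 50, 500):
    print(k, log_prior(np.zeros(k)))                  # constant -C0/2 regardless of k
```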

10 The prior (continued)

Further, we let the model $\xi_n$ be subject to the prior

$\pi(\xi_n) = \nu_n^{k_n} (1 - \nu_n)^{P_n - k_n},$   (8)

i.e., each variable has a prior probability $\nu_n$, independent of the other variables, of being selected for the subset model.

11 The posterior

Let the prior probability $\nu_n$ in (8) take the form

$\nu_n = \dfrac{1}{1 + \sqrt{2\pi}\, P_n^{\gamma}},$   (9)

for some parameter $\gamma$. Then we have the posterior

$\log \pi(\xi_n \mid D_n) \approx C + \log f(y_n \mid \hat\beta_{\xi_n}, \xi_n, X_n) - \dfrac{k_n}{2} \log n - k_n \gamma \log P_n.$   (10)

Maximizing this posterior probability is equivalent to minimizing the EBIC given by

$\mathrm{EBIC} = -2 \log f(y_n \mid \hat\beta_{\xi_n}, \xi_n, X_n) + k_n \log n + 2 k_n \gamma \log P_n.$

12 The posterior (continued)

To facilitate calculation of the MLE $\hat\beta_{\xi_n}$, in practice one often needs to put an upper bound on the model size $k_n$, e.g., $k_n < K_n$, where $K_n$ may depend on the sample size $n$. With this bound, the prior for the model $\xi_n$ becomes

$\pi(\xi_n) \propto \nu_n^{k_n} (1 - \nu_n)^{P_n - k_n} I[k_n < K_n],$   (11)

and the posterior becomes

$\log \pi(\xi_n \mid D_n) \approx \begin{cases} C + \log f(y_n \mid \hat\beta_{\xi_n}, \xi_n, X_n) - \frac{k_n}{2} \log n - k_n \gamma \log P_n, & \text{if } k_n < K_n, \\ -\infty, & \text{otherwise.} \end{cases}$   (12)

Since this posterior leads to sampling of subset models without shrinkage of the regression coefficients, so that the Bayesian estimator of $\beta_{\xi_n}$ reduces to the MLE of $\beta_{\xi_n}$, we call the method Bayesian subset regression (BSR).
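A minimal sketch of the approximate log posterior (12), up to an additive constant, for a candidate subset with the size cap $K_n$ (the log-likelihood at the MLE is assumed to be computed elsewhere, e.g. as in the EBIC sketch above):

```python
# Sketch: approximate log posterior (12) of a subset model, up to a constant.
# `loglik_hat` is the GLM log-likelihood at the MLE for that subset.
import numpy as np

def log_posterior(loglik_hat, k, n, P, gamma, K):
    if k >= K:                                        # size cap: zero prior mass
        return -np.inf
    return loglik_hat - 0.5 * k * np.log(n) - gamma * k * np.log(P)

# Maximizing this over subsets is equivalent to minimizing the EBIC:
# -2 * log_posterior = -2 * loglik_hat + k*log(n) + 2*gamma*k*log(P) (+ const).
```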

13 Posterior Consistency: Conditions

We assume the following conditions hold:
(A1) (Predictor size) $P_n \succ n^{\delta}$ for some $\delta > 0$, where $b_n \succ a_n$ means $\lim_{n\to\infty} a_n / b_n = 0$.
(A2) (Sparsity) $\lim_{n\to\infty} \sum_{j=1}^{P_n} |\beta_j| < \infty$.

14 Posterior Consistency

Theorem 1. Consider the GLM specified by (1) and (2), which satisfies conditions (A1) and (A2) and for which $|x_j| \le 1$ for all $j$. Let $\epsilon_n$ be a sequence such that $\epsilon_n \in (0, 1]$ for each $n$ and $n\epsilon_n^2 \succ \log P_n$. Suppose the priors for the GLM are specified by (5) and (11) with the hyperparameter $\nu_n$ chosen as in (9), and $\gamma$ and $K_n$ are chosen such that the following conditions hold:
(B1) $P_n \le e^{C_1 n^{\alpha}}$ for some $C_1 > 0$ and some $\alpha \in (0, 1)$ for all large enough $n$;
(B2) $\Delta(r_n) = \inf_{\xi_n : k_n = r_n} \sum_{j : j \notin \xi_n} |\beta_j| \le e^{-C_2 r_n}$ for some constant $C_2 > 0$, where $r_n = \lceil P_n \nu_n \rceil$ denotes the up-rounded expectation of the prior (8);
(B3) $\frac{C_1}{2} \log n \prec r_n \prec K_n \prec n^{\beta}$ for some $\beta \in (0, q)$, where $q = \min(1 - \alpha, \delta)$ for logistic, probit and normal linear regression, and $q = \min\{1 - \alpha, \delta, \alpha/4\}$ for Poisson and exponential regression with log-linear link functions.
Then for some $c > 0$ and all sufficiently large $n$,

$P\left\{ \pi[d(\hat f, f) > \epsilon_n \mid D_n] \ge e^{-0.5 c n \epsilon_n^2} \right\} \le e^{-0.5 c n \epsilon_n^2},$   (13)

where $P\{\cdot\}$ denotes the probability measure for the data $D_n$, and $d(\cdot, \cdot)$ denotes the Hellinger distance between $\hat f$ and $f$.

15 Determination of γ

Theorem 1 suggests choosing $\gamma \in (1 - 1/\kappa, 1)$ for $\kappa > 1$, and $\gamma \in (0, 1)$ otherwise. In practice, we set

$\hat\gamma = \inf\left\{ \gamma : \arg\max_{k} \pi(|\xi_n| = k \mid D_n, \gamma) = |\xi_{\mathrm{map},\gamma}| \right\},$   (14)

which finds the minimum value of $\gamma$ for which the mode of $\pi(|\xi_n| \mid D_n, \gamma)$ is attained at the size of the MAP model. Both Theorem 1 and Chen and Chen (2011) suggest that the MAP model is the model most likely to be true. However, the MAP model depends on the value of $\gamma$. If $\gamma$ is overly large, then the equality $\arg\max_{k} \pi(|\xi_n| = k \mid D_n, \gamma) = |\xi_{\mathrm{map},\gamma}|$ always holds, since in this case only models of a single size are sampled from the posterior, but the resulting MAP model will likely be a subset of the true model. On the other hand, if $\gamma$ is too small, then we always have $\arg\max_{k} \pi(|\xi_n| = k \mid D_n, \gamma) > |\xi_{\mathrm{map},\gamma}|$. To balance between these two ends, we suggest rule (14).
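A schematic sketch of rule (14), assuming that for each candidate $\gamma$ a collection of subset models has already been sampled from $\pi(\xi_n \mid D_n, \gamma)$ (e.g. by SAMC); `samples[g]` is a hypothetical container of (subset, log_posterior) pairs drawn under $\gamma = g$:

```python
# Sketch of rule (14): pick the smallest gamma at which the posterior mode of
# the model size equals the size of the MAP model among the sampled models.
import numpy as np

def choose_gamma(samples):
    for g in sorted(samples):                              # increasing gamma grid
        sizes = np.array([len(s) for s, _ in samples[g]])
        mode_size = np.bincount(sizes).argmax()            # mode of |xi| under the posterior
        map_size = len(max(samples[g], key=lambda t: t[1])[0])   # |xi_map, gamma|
        if mode_size == map_size:
            return g
    return max(samples)                                    # fall back to the largest gamma
```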

16 Algorithm Setting

(Target distribution) $f(x) = \frac{1}{Z}\psi(x)$, where $Z$ denotes the normalizing constant.
(Sample space partitioning) Partition the sample space according to a pre-specified function $\lambda(x)$ into $m$ subregions: $E_1 = \{x : \lambda(x) \le u_1\}$, $E_2 = \{x : u_1 < \lambda(x) \le u_2\}$, ..., $E_{m-1} = \{x : u_{m-2} < \lambda(x) \le u_{m-1}\}$, $E_m = \{x : \lambda(x) > u_{m-1}\}$, where $-\infty < u_1 < \cdots < u_{m-1} < \infty$. The function $\lambda(\cdot)$ can be any function of $x$, such as a component of $x$, the energy function $-\log \psi(x)$, etc. Define the subregion weights $g_i = \int_{E_i} \psi(x)\,dx$ for $i = 1, \ldots, m$, and $g = (g_1, \ldots, g_m)$.
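A minimal sketch of the sample-space partition: given thresholds $u_1 < \cdots < u_{m-1}$ and a choice of $\lambda(x)$ (here the energy $-\log\psi(x)$, one of the options above; the standard normal $\psi$ and the bin grid are illustrative assumptions):

```python
# Sketch: mapping a sample x to its subregion E_1, ..., E_m via lambda(x) and
# the thresholds u_1 < ... < u_{m-1}.
import numpy as np

def subregion_index(x, psi, u):
    """Return the 0-based index i such that x falls in E_{i+1}."""
    lam = -np.log(psi(x))                       # energy function as lambda(x)
    return int(np.searchsorted(u, lam))         # 0..m-1, with m = len(u) + 1

# Example: a standard normal psi and m = 11 energy bins.
u = np.linspace(0.5, 5.5, 10)
psi = lambda x: np.exp(-0.5 * x ** 2)
print(subregion_index(0.1, psi, u), subregion_index(3.0, psi, u))
```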

17 Algorithm Setting (continued)

Let $\{a_t\}$ be a positive, non-increasing sequence satisfying

$\text{(i)}\ \sum_{t=1}^{\infty} a_t = \infty, \qquad \text{(ii)}\ \sum_{t=1}^{\infty} a_t^{\zeta} < \infty,$   (15)

for some $\zeta \in (1, 2)$. For example,

$a_t = \left(\dfrac{t_0}{\max(t_0, t)}\right)^{\tau}, \quad t = 1, 2, \ldots,$   (16)

for some specified value of $t_0 > 1$ and $\tau \in (\frac{1}{2}, 1]$. Let $\pi = (\pi_1, \ldots, \pi_m)$ be an $m$-vector with $0 < \pi_i < 1$ and $\sum_{i=1}^{m} \pi_i = 1$; $\pi$ is called the desired sampling distribution.

18 SAMC Algorithm

(Sampling) Draw a sample $x_{t+1}$ with a single MH iteration whose invariant distribution is

$\hat p^{(t)}(x) = \dfrac{1}{Z_t} \sum_{i=1}^{m} \dfrac{\psi(x)}{\hat g_i^{(t)}} I(x \in E_i).$

(Weight updating) Set

$\log \hat g^{(t+1)} = \log \hat g^{(t)} + a_{t+1} (e_{t+1} - \pi),$

where $\hat g^{(t)} = (\hat g_1^{(t)}, \ldots, \hat g_m^{(t)})$, $e_{t+1} = (e_{t+1,1}, \ldots, e_{t+1,m})$ and $e_{t+1,i} = I(x_{t+1} \in E_i)$.
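A compact sketch of the two SAMC steps on a toy one-dimensional target (an illustration of the algorithm above under assumed settings, not the code used for BSR): each iteration performs a single random-walk Metropolis-Hastings step against the weighted invariant distribution and then updates the log-weights with the gain sequence $a_t$.

```python
# Sketch of SAMC on a toy target psi(x) = exp(-0.5*x^2) with energy-based bins.
import numpy as np

rng = np.random.default_rng(1)
psi = lambda x: np.exp(-0.5 * x ** 2)
u = np.linspace(0.5, 5.5, 10)                    # energy thresholds, m = 11 subregions
m = len(u) + 1
region = lambda x: int(np.searchsorted(u, -np.log(psi(x))))

pi = np.full(m, 1.0 / m)                         # desired sampling distribution
log_g = np.zeros(m)                              # log-weights, one per subregion
t0, tau = 100.0, 0.7                             # gain sequence a_t = (t0/max(t0,t))^tau
x, freq = 0.0, np.zeros(m)

for t in range(1, 50001):
    a_t = (t0 / max(t0, t)) ** tau
    # --- single MH step with invariant distribution proportional to psi(x)/g_{J(x)} ---
    y = x + rng.normal(scale=1.0)
    i, j = region(x), region(y)
    log_ratio = np.log(psi(y)) - np.log(psi(x)) + log_g[i] - log_g[j]
    if np.log(rng.uniform()) < log_ratio:
        x = y
    # --- weight updating: log g <- log g + a_t * (e_t - pi) ---
    e = np.zeros(m)
    e[region(x)] = 1.0
    log_g += a_t * (e - pi)
    freq[region(x)] += 1

print(np.round(freq / freq.sum(), 3))            # sampling frequencies, roughly uniform
```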

19 Convergence

Theorem 2. As $t \to \infty$, for any non-negative function $\psi(x)$ with $\int_{\mathcal{X}} \psi(x)\,dx < \infty$, we have

$\log \hat g_i^{(t)} \to \begin{cases} c + \log\left(\int_{E_i} \psi(x)\,dx\right) - \log(\pi_i + \nu), & \text{if } E_i \ne \emptyset, \\ -\infty, & \text{if } E_i = \emptyset, \end{cases}$   (17)

where $\nu = \sum_{j \in \{i : E_i = \emptyset\}} \pi_j / (m - m_0)$ and $m_0$ is the number of empty subregions. Let $\hat\pi_{ti}$ denote the realized sampling frequency of the subregion $E_i$ at iteration $t$. As $t \to \infty$, $\hat\pi_{ti}$ converges to $\pi_i + \nu$ if $E_i \ne \emptyset$ and to 0 otherwise.

20 Self-adjusting Mechanism

A remarkable feature of the SAMC algorithm is its self-adjusting mechanism: if a proposed move is rejected at an iteration, then the weight of the subregion to which the current sample belongs is adjusted to a larger value, and the total rejection probability at the next iteration is reduced. This mechanism enables the algorithm to escape from local energy minima very quickly. The SAMC algorithm represents a significant advance for the simulation of complex systems whose energy landscapes are rugged.

21 Sure Variable Screening: the rule

Let $M$ denote the set of causal features for a dataset $D_n$. Let $e_n = (e_1, e_2, \ldots, e_{P_n})$ denote the indicator vector of the $P_n$ features; that is, $e_j = 1$ if $x_j \in M$ and $e_j = 0$ otherwise. Let $q_j$ denote the marginal inclusion probability of $x_j$, i.e.,

$q_j = \sum_{\xi} e_{j|\xi}\, \pi(\xi \mid D_n),$

where $e_{j|\xi}$ indicates whether $x_j$ is included in the model $\xi$. A conventional rule is to choose the features whose marginal inclusion probability exceeds a threshold value $\hat q$; i.e., setting

$\hat M_{\hat q} = \{ j : q_j > \hat q,\ j = 1, \ldots, P_n \}$

as an estimator of $M$.
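A sketch of the screening rule, under the assumption that a set of posterior model samples with (unnormalized) weights is available, e.g. from an SAMC run; the `models` structure below is a hypothetical container of (subset, weight) pairs:

```python
# Sketch: marginal inclusion probabilities q_j from sampled subset models and
# the screening rule M_hat = {j : q_j > q_thresh}.
import numpy as np

def inclusion_probabilities(models, P):
    q = np.zeros(P)
    total = sum(w for _, w in models)
    for subset, w in models:                     # accumulate weight of models containing j
        q[list(subset)] += w
    return q / total

def screen(models, P, q_thresh=0.5):
    q = inclusion_probabilities(models, P)
    return np.flatnonzero(q > q_thresh)          # estimated feature set M_hat

# Example with three sampled models over P = 6 predictors:
models = [({0, 2}, 0.5), ({0, 3}, 0.3), ({1, 2}, 0.2)]
print(inclusion_probabilities(models, 6), screen(models, 6))
```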

22 Sure Variable Screening: identifiability condition

Let

$A_{\epsilon_n} = \{ \xi : d(\hat f(y, x \mid \xi), f(y, x)) \le \epsilon_n \},$

where $d(\cdot, \cdot)$ denotes the Hellinger distance between $\hat f$ and $f$. Define

$\rho_j(\epsilon_n) = \sum_{\xi \in A_{\epsilon_n}} |e_j - e_{j|\xi}|\, \pi(\xi \mid D_n),$

which measures, for feature $j$, the distance between the true model and the sampled models within the $\epsilon_n$-neighborhood $A_{\epsilon_n}$.
(C1) (Identifiability of $M$) For all $j = 1, 2, \ldots, P_n$, $\rho_j(\epsilon_n) \to 0$ as $n \to \infty$ and $\epsilon_n \to 0$.

23 Sure Variable Screening: model consistency

Theorem 3. Assume the conditions of Theorem 1 (i.e., (A1) and (A2)) and condition (C1) hold.
(i) For any $\delta_n > 0$ and all sufficiently large $n$,

$P\left\{ \max_{1 \le j \le P_n} |q_j - e_j| \ge \delta_n + e^{-0.5 c n \epsilon_n^2} \right\} \le P_n e^{-0.5 c n \epsilon_n^2}.$

(ii) (Sure screening) For all sufficiently large $n$,

$P(M \subset \hat M_{\hat q}) \ge 1 - s_n e^{-0.5 c n \epsilon_n^2},$

where $s_n$ denotes the size of $M$, for some choice of $\hat q \in (0, 1)$, preferably away from 0 and 1.
(iii) (Consistency) For all sufficiently large $n$,

$P(M = \hat M_{0.5}) \ge 1 - P_n e^{-0.5 c n \epsilon_n^2}.$

24 Lymph Example

This dataset consists of $n = 148$ samples, with 100 node-negative cases (low risk for breast cancer) and 48 node-positive cases (high risk for breast cancer). After pre-screening, 4512 genes that show evidence of variation above the noise level were retained for further study. In addition, two clinical factors, estimated tumor size (in centimeters) and protein assay-based estrogen receptor status, were included as candidate predictors, each coded as a binary variable. This brings the total number of predictors to $P_n = 4514$ (4512 genes plus the two clinical factors). The subset data consist of only 34 genes, corresponding to the set of significant genes identified by SVS at an FDR level of 0.01, based on the marginal inclusion probabilities produced by SAMC in a run on the full data with $\gamma =$ .

25 Lymph Example

Table 1. Comparison of BSR, Lasso, elastic net, SIS and ISIS for the subset and full data of lymph: $r_{cv}$(%) refers to the minimum 1-CVMR (leave-one-out cross-validation misclassification rate), and MAP$_{\min}$ refers to the minimum-size MAP$_i$ model with zero 1-CVMR, where MAP$_i$ denotes the MAP model consisting of $i$ features. For each of Lasso, elastic net, SIS, ISIS and the BSR models (MAP and MAP$_{\min}$), the table reports the model size $|\xi|$ and $r_{cv}$(%) on the subset (Sub) and full (Full) data.

26 Lymph Example

[Figure: CV misclassification rate versus number of variables, panels (a) Subset data and (b) Full data.]

Figure 1. Comparison of Lasso and BSR for the subset data (a) and the full data (b) of lymph: 1-CVMRs of the models selected by Lasso (white circles connected by a solid line) and the MAP$_1$-MAP$_{16}$ models selected by BSR (black squares connected by a dotted line).

27 Lymph Example

[Figure: q-value versus variable index, panels (a) BSR, (b) SIS, (c) ISIS, (d) Lasso, (e) Elastic net.]

Figure 2. Plots of q-values for the genes selected by BSR, SIS, ISIS, Lasso and elastic net. The model shown in (a) is one of the MAP models found by BSR with $\gamma = 0.85$ and has a 1-CVMR of 2.7%. The models shown in (b), (c), (d) and (e) are selected by SIS, ISIS, Lasso and elastic net, and have 1-CVMRs of 17.6%, 10.8%, 15.5% and 16.9%, respectively.

28 A Simulated Example

This example mimics a case-control genetic association study (GAS). The response variable $y$, which represents the disease status of a subject, takes the value 1 for a disease case and 0 for a control. The explanatory variables are generated as single nucleotide polymorphisms (SNPs) in the human genome. Each variable $x_{ij}$, the genotype of SNP $j$ for subject $i$, takes values 0, 1 or 2, where 0 stands for a homozygous site with two copies of the major allele, 1 stands for a heterozygous site with one copy of the minor allele, and 2 stands for a homozygous site with two copies of the minor allele. We independently generated 10 datasets with $n_1 = n_2 = 500$, $P_n = 10000$, $k = 8$, and $(\beta_1, \ldots, \beta_8) = (0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3)$.
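A sketch of how such case-control SNP data could be simulated under a logistic GLM (my own illustration, not necessarily the authors' design: the minor-allele frequencies are assumed uniform on (0.05, 0.5), and disease status is drawn prospectively rather than with fixed numbers of cases and controls as in the talk):

```python
# Sketch: simulating SNP genotypes (0/1/2 minor-allele counts) and a binary
# disease status under a logistic model with the eight causal coefficients above.
import numpy as np

rng = np.random.default_rng(2)
n, P, k = 1000, 10000, 8
beta = np.array([0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3])

maf = rng.uniform(0.05, 0.5, size=P)              # assumed minor-allele frequencies
X = rng.binomial(2, maf, size=(n, P))             # genotypes under Hardy-Weinberg
eta = X[:, :k] @ beta                             # linear predictor from causal SNPs
eta -= eta.mean()                                 # center so cases and controls are balanced
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta)))   # disease status (case/control)
```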

29 A Simulated Example (continued)

Table 2. Comparison of BSR with Lasso, elastic net, SIS and ISIS for the simulated data: (a) results of SVS at an FDR level of 0.001; (b) average model size (over the 10 datasets) produced by each method, with the standard deviation in parentheses. The table also reports fsr(%) and nsr(%) for each method.

Method      MAP (BSR, γ = 0.85)   SVS (a)       Lasso          Elastic net   SIS      ISIS
Size (b)    8.2 (0.25)            12.1 (0.43)   44.9 (20.48)   30.1 (9.04)   36 (0)   36 (0)

30 More Gene Expression Examples

Table 3. Summary of the four gene expression datasets (sample size $n$, number of predictors $P$, and response).

Dataset    Publication           Response
Lymph      Hans et al. (2007)    positive/negative node
Colon      Alon et al. (1999)    tumor/normal tissue
Leukemia   Golub et al. (1999)   subtype of leukemia
Prostate   Singh et al. (2002)   tumor/normal tissue

31 More Gene Expression Examples

Table 4. Posterior distribution $\pi(|\xi| \mid D_n, \gamma = 0.99)$ of the model size obtained by BSR for the colon, leukemia and prostate data, with one row per dataset giving the posterior probability of each number of variables $|\xi|$.

32 More Gene Expression Examples

Table 5. Comparison of BSR, Lasso, elastic net, SIS and ISIS for the three gene expression datasets (colon, leukemia, prostate): (a) minimum 1-CVMR ($r_{cv}$%) and the corresponding model size $|\xi|$; (b) 1-CVMR ($r_{cv}$%) and the corresponding model size $|\xi|$; (c) zero 1-CVMR models among the MAP$_i$'s sampled by SAMC. For each dataset the table reports $|\xi|$ and $r_{cv}$% for Lasso, elastic net, SIS, ISIS, and the BSR MAP and MAP$_{\min}$ models.

33 More Gene Expression Examples

[Figure: CV misclassification rate versus model size, panels (a) Colon, (b) Leukemia, (c) Prostate.]

Figure 3. Comparison of 1-CVMRs produced by Lasso (white circles connected by a solid line) and BSR (black squares connected by a dotted line) for (a) the colon data, (b) the leukemia data, and (c) the prostate data.

34 CPU Time Analysis

Table 6. Summary of the CPU times of BSR: the CPU times are measured in hours on a Dell desktop (3.0 GHz, 8 GB RAM), and $E_{D_n}(|\xi|)$ denotes the posterior expectation of $|\xi|$. For each dataset (biliary stone, lymph, colon, leukemia, prostate) the table reports $n$, $P_n$, $\gamma$, the number of iterations ($\times 10^6$), $E_{D_n}(|\xi|)$ and the CPU time in hours.

35 CPU Time Analysis (continued)

After adjusting for the number of iterations, a linear regression of the CPU time on $n$, $P_n$ and $[E_{D_n}(|\xi|)]^3$ indicates that the CPU time depends mainly on $[E_{D_n}(|\xi|)]^3$, then on $n$ and $P_n$, in that order of significance. In fact, a simple linear regression of the CPU time on $[E_{D_n}(|\xi|)]^3$ alone has an $R^2$ of 95.1%. A cubic transformation of $E_{D_n}(|\xi|)$ is used here because the CPU time for inverting a matrix is known to be of cubic order in the matrix size. This analysis shows that BSR (Bayesian subset regression) can be applied to high dimensional regression problems within a reasonable CPU time.

36 Conclusions

We propose a new prior setting for GLMs, which leads to a Bayesian subset regression whose MAP model coincides with the minimum-EBIC model. Under mild conditions, we establish consistency of the posterior. We propose a variable screening procedure based on the marginal inclusion probability and show that the procedure shares the theoretical properties of sure screening and consistency with the SIS procedure proposed by Fan and Song (2010). Since the proposed procedure makes use of the joint information of all predictors, it generally outperforms SIS and its iterative extension ISIS. We make extensive comparisons of BSR with the popular penalized likelihood methods, including Lasso, elastic net, SIS and ISIS. Through a comparison study conducted on the subset and full data, we find that the performance of the penalized likelihood methods tends to deteriorate with the dimension $P_n$. Given the stable performance of BSR on the subset and full data, we conclude that BSR is more suitable than these penalized likelihood methods for high dimensional variable selection, although the latter are computationally more attractive. Our further results on the simulated data, real SNP data and four gene expression datasets make this conclusion more convincing.

37 Acknowledgement

Collaborators: Qifan Song and Kai Yu. Support: NSF grant and KAUST grant.
