A New Bayesian Variable Selection Method: The Bayesian Lasso with Pseudo Variables


1 A New Bayesian Variable Selection Method: The Bayesian Lasso with Pseudo Variables
Qi Tang (joint work with Kam-Wah Tsui and Sijian Wang)
Department of Statistics, University of Wisconsin-Madison
Feb. 8

2 Outline
Introduction of Bayesian Lasso
Bayesian Lasso with Pseudo Variables
Bayesian Group Lasso with Pseudo Variables
Conclusions and Future Work

3 Variable Selection
Why?
- Interpretation: principle of parsimony.
- Prediction: bias and variance tradeoff.
What if the number of variables is greater than the number of observations ($p > n$)? Shrinkage.
- Frequentist: loss + penalty. Examples: Ridge regression (Hoerl and Kennard, 1970), Lasso (Tibshirani, 1996).
- Bayesian: likelihood x shrinkage prior. Examples: Griffin and Brown (2005), Park and Casella (2008).

4 Notation
Consider a data set with one response variable, $p$ predictors and $n$ observations. Focus on linear models:
$y = X\beta + \epsilon$, $\epsilon \sim N(0, \sigma^2 I_n)$.
$y$ is the centered response; the $X_i$'s, the columns of $X$, are standardized to have mean 0 and unit $L_2$ norm.
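
This centering and standardization is easy to get subtly wrong (unit $L_2$ norm, not unit standard deviation), so a minimal NumPy sketch of exactly this preprocessing may help; the data here are simulated placeholders, not from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 8
X_raw = rng.standard_normal((n, p))
y_raw = X_raw @ rng.standard_normal(p) + rng.standard_normal(n)

# Center the response.
y = y_raw - y_raw.mean()

# Standardize each column of X to mean 0 and unit L2 norm.
X = X_raw - X_raw.mean(axis=0)
X = X / np.linalg.norm(X, axis=0)

assert np.allclose(X.mean(axis=0), 0)
assert np.allclose(np.linalg.norm(X, axis=0), 1)
```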

5 Bayesian Interpretation of Lasso
Lasso (Tibshirani, 1996):
$\min_\beta \{ \|y - X\beta\|^2 + \lambda \sum_{i=1}^p |\beta_i| \}$  (1)
Bayesian interpretation: consider the Bayesian model $y \sim N(X\beta, I_n)$ with the Laplacian prior $\beta_i \sim \frac{\lambda}{2} e^{-\lambda |\beta_i|}$. The solution of (1) can be interpreted as the posterior mode of $\beta$.
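
To make the correspondence concrete, here is a small numerical check (my own sketch, not from the slides): the negative log-posterior under $y \sim N(X\beta, I_n)$ with i.i.d. Laplace$(0, 1/\lambda)$ priors differs from the penalized criterion $\frac{1}{2}\|y - X\beta\|^2 + \lambda \sum_i |\beta_i|$ only by an additive constant, so the two share the same minimizer, the posterior mode. (The $\frac{1}{2}$ matches the Gaussian kernel and rescales $\lambda$ relative to (1).)

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, p, lam = 60, 5, 3.0
X = rng.standard_normal((n, p))
y = X @ np.array([3.0, 1.5, 0.0, 0.0, 2.0]) + rng.standard_normal(n)

def neg_log_posterior(b):
    # -log [ N(y | Xb, I_n) * prod_i Laplace(b_i | 0, 1/lam) ]
    loglik = stats.norm.logpdf(y, loc=X @ b, scale=1.0).sum()
    logprior = stats.laplace.logpdf(b, loc=0.0, scale=1.0 / lam).sum()
    return -(loglik + logprior)

def lasso_objective(b):
    # 0.5 * ||y - Xb||^2 + lam * sum |b_i|
    return 0.5 * np.sum((y - X @ b) ** 2) + lam * np.sum(np.abs(b))

# The two differ only by an additive constant, hence have the same minimizer.
bs = [rng.standard_normal(p) for _ in range(3)]
diffs = [neg_log_posterior(b) - lasso_objective(b) for b in bs]
print(np.allclose(diffs, diffs[0]))  # True
```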

6 Laplacian Priors
The Laplacian prior is more sparsity-promoting than the normal prior.

7 The Bayesian Lasso (Park and Casella, 2008)
Model: $y \mid (X, \beta, \sigma^2) \sim N(X\beta, \sigma^2 I_n)$.
Propose the conditional Laplacian prior $\beta_i \mid (\sigma^2, \lambda^2) \sim \frac{\lambda}{2\sigma} e^{-\lambda |\beta_i|/\sigma}$.
Rewrite the Laplacian prior as a scale mixture: $\beta_i \mid (\sigma^2, \gamma_i^2) \sim N(0, \sigma^2 \gamma_i^2)$ and $\gamma_i^2 \mid \sigma^2 \sim \text{Exp}(\lambda^2/2)$.  (2)
Empirical Bayes treatment of $\lambda$:
- Estimate $\lambda$ by the marginal maximum likelihood estimate $\hat\lambda$.
- Or assign a hyperprior that places high density at $\hat\lambda$.
Estimate $\beta_i$ by its posterior median.
Limitation: heavy computational load, and sparsity is NOT achieved.
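
The mixture representation (2) is what makes Gibbs sampling tractable: conditional on the $\gamma_i^2$'s the model is conjugate Gaussian. Below is a minimal sketch of the resulting sampler with $\lambda$ held fixed; the function name and burn-in handling are mine, and the full conditionals follow from standard conjugacy for this hierarchy rather than from the slide itself:

```python
import numpy as np

def bayesian_lasso_gibbs(X, y, lam=1.0, n_iter=5000, burn=1000, seed=0):
    """Gibbs sampler for the Bayesian Lasso via the scale mixture (2); lambda fixed."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    XtX, Xty = X.T @ X, X.T @ y
    gamma2 = np.ones(p)          # mixing variances gamma_i^2
    sigma2 = 1.0
    draws = np.empty((n_iter, p))
    for t in range(n_iter):
        # beta | rest ~ N(A^{-1} X'y, sigma2 * A^{-1}), with A = X'X + diag(1/gamma2)
        A = XtX + np.diag(1.0 / gamma2)
        L = np.linalg.cholesky(A)
        mean = np.linalg.solve(A, Xty)
        beta = mean + np.sqrt(sigma2) * np.linalg.solve(L.T, rng.standard_normal(p))
        # 1/gamma_i^2 | rest ~ Inverse-Gaussian(sqrt(lam^2 sigma2 / beta_i^2), lam^2)
        mu = np.sqrt(lam**2 * sigma2 / np.maximum(beta**2, 1e-12))
        gamma2 = 1.0 / rng.wald(mu, lam**2)
        # sigma2 | rest ~ Inv-Gamma((n-1)/2 + p/2, ||y - Xb||^2/2 + b' D^{-1} b / 2)
        resid = y - X @ beta
        rate = 0.5 * (resid @ resid + np.sum(beta**2 / gamma2))
        sigma2 = 1.0 / rng.gamma((n - 1) / 2 + p / 2, 1.0 / rate)
        draws[t] = beta
    return draws[burn:]
```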

8 Outline
Introduction of Bayesian Lasso
Bayesian Lasso with Pseudo Variables
Bayesian Group Lasso with Pseudo Variables
Conclusions and Future Work

9 Benefits of Our Method
- Avoid the computational burden of finding the marginal maximum likelihood estimate.
- Assign a prior to $\lambda^2$ that does not depend on the data.
- Achieve sparsity.

10 Intuition for Achieving Sparsity
Find an unimportant pseudo variable $Z$ to serve as a benchmark.
Augment the model: $y = \beta_z Z + X\beta + \epsilon$.
Criteria for $Z$:
- Orthogonal to $y$ (the true value of $\beta_z$ is 0).
- Orthogonal to the $X_i$'s (keeps the data structure intact).

11 Benchmark: Intercept!
$Z_{int} = (1/\sqrt{n}, \ldots, 1/\sqrt{n})^T$ ($n$ entries).
- Orthogonal to $y$ and to all the $X_i$'s.
- Does NOT depend on the specific observations.

12 Variable Selection
Regression model: $y = \beta_{int} Z_{int} + X\beta + \epsilon$.
Assign hierarchical priors and obtain the posterior distributions of $\beta_{int}$ and the $\beta_i$'s.
Measure the importance of $X_i$ by $d_i = P(|\beta_i| > |\beta_{int}| \mid y, X)$.
If $X_i$ is orthogonal to $y$ and to the other variables, then $d_i = 0.5$.
$X_i$ is selected as an important variable if $d_i > c$, where $c > 0.5$.
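
Given posterior draws of $(\beta_{int}, \beta_1, \ldots, \beta_p)$, the $d_i$'s reduce to Monte Carlo proportions. A minimal sketch (the function name and the convention that column 0 of `draws` holds the benchmark coefficient are mine; the draws could come from running the Gibbs sampler above on the augmented design `[Z_int, X]`):

```python
import numpy as np

def importance_measures(draws, c=0.9):
    """draws: (n_samples, 1 + p) array; column 0 holds beta_int, columns 1..p the beta_i."""
    beta_int = draws[:, 0]
    beta = draws[:, 1:]
    # d_i = posterior probability that |beta_i| exceeds the benchmark |beta_int|
    d = np.mean(np.abs(beta) > np.abs(beta_int)[:, None], axis=0)
    return d, d > c

# Example wiring with the pieces defined earlier:
# Z_int = np.full((n, 1), 1 / np.sqrt(n))
# draws = bayesian_lasso_gibbs(np.hstack([Z_int, X]), y, lam=1.0)
# d, selected = importance_measures(draws, c=0.9)
```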

13 Some Thoughts on Tuning c
1. Choose c such that the false discovery rate is controlled.
2. Find $\lim_{n \to \infty} d_i$ for an $X_i$ that is unimportant but weakly correlated with the important variables, and use it as a guideline for choosing c.

14 Illustration
Consider the following simulation setting (Tibshirani, 1996):
$y = X\beta + \epsilon$, $\beta = (3, 1.5, 0, 0, 2, 0, 0, 0)^T$.
$X = (X_1, \ldots, X_p)$, $X_i \sim N(0, 1)$, $\text{cor}(X_i, X_j) = 0.5^{|i-j|}$.
$\sigma^2 = 9$. $n = 20$.
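
A sketch of this data-generating process in NumPy, followed by the preprocessing from the Notation slide (seed and variable names are arbitrary; $n = 20$ is Tibshirani's value and is assumed here):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, sigma = 20, 8, 3.0
beta_true = np.array([3.0, 1.5, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0])

# AR(1)-type correlation: cor(X_i, X_j) = 0.5 ** |i - j|
idx = np.arange(p)
corr = 0.5 ** np.abs(idx[:, None] - idx[None, :])
X = rng.multivariate_normal(np.zeros(p), corr, size=n)
y = X @ beta_true + sigma * rng.standard_normal(n)

# Preprocess as on slide 4: center y, give X columns mean 0 and unit L2 norm.
y = y - y.mean()
X = X - X.mean(axis=0)
X = X / np.linalg.norm(X, axis=0)
```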

15 Posterior Distribution of $\beta_{int}$
$Z_{int}$ is a good benchmark for unimportant variables.

16 Changes in the Posterior Distributions of the $\beta_i$'s
For each $\beta_i$, the estimated posterior density is almost unaffected by adding $Z_{int}$.

17 Estimated d_i
$\hat d_i$: the proportion of posterior draws $(\beta_i, \beta_{int})$ satisfying $|\beta_i| > |\beta_{int}|$.
$\hat\beta_{i,PC}$: the posterior median of $\beta_i$ from Park and Casella's Bayesian Lasso method.
All unimportant variables have $\hat d_i$ close to 0.55. Posterior medians do NOT yield sparsity.
[Table: $\hat d_i$ and $\hat\beta_{i,PC}$ for each $\beta_i$; numeric entries not recovered.]

18 Variable Selection Results
Empirically, c = 0.9 yields good sparsity. When c = 0.6, the result is almost the same as the Lasso's.
Table: frequencies with which each variable is selected, for several values of c (including 0.9 and 0.6) and for the Lasso. [Numeric entries not recovered.]

19 Posteriors of the $\beta_i$'s after Adding $Z_{int}$
Lemma. Consider the regression model and priors
$y = X\beta + \epsilon$ with $\epsilon \sim N(0, \sigma^2 I_n)$,
$\beta_i \mid (\sigma^2, \lambda^2) \sim \frac{\lambda}{2\sigma} e^{-\lambda|\beta_i|/\sigma}$ for $i = 1, \ldots, p$.
Let $\pi_1$ and $\pi_2$ be the joint posteriors of the $\beta_i$'s conditional on $\sigma^2$ and $\lambda^2$ before and after adding $Z_{int}$, respectively. Then we have $\pi_1 = \pi_2$.

20 Outline
Introduction of Bayesian Lasso
Bayesian Lasso with Pseudo Variables
Bayesian Group Lasso with Pseudo Variables
Conclusions and Future Work

21 Motivation for the Group Selection Method
- Assayed genes or proteins are naturally grouped by biological role or biological pathway.
- It is desirable to first select important pathways (group selection), and then select important genes (within-group selection).
- Correlated important variables in the same group should all be selected; the Lasso tends to pick only a few of them.

22 Extra Notation
$g$: number of groups.
$k$: index of groups; $j$: index of variables inside groups. For example, $X_{k,j}$ is the $j$th variable in group $k$.
$p_k$: number of variables in group $k$.
Assume there is no overlap, that is, $p = \sum_{k=1}^g p_k$.

23 Current Lasso-Type Methods for Group Selection
Frequentist approaches:
- Designed for group selection only: Yuan and Lin (2006).
- Designed for both group selection and within-group selection: Ma and Huang (2007); Huang et al. (2009); Wang et al. (2009).
Bayesian approach: Raman et al. (2009).

24 Hierarchical Priors with Group Structure
Model: $y = \beta_{int} Z_{int} + \sum_{k=1}^g \sum_{j=1}^{p_k} \beta_{k,j} X_{k,j} + \epsilon$ with $\epsilon \sim N(0, \sigma^2 I_n)$.
$\beta_{k,j} \sim N(0, \gamma_k^2 \sigma^2 / p_k)$: variables in the same group are shrunk simultaneously; $\gamma_k^2$ measures the total variation of the $p_k$ coefficients $\beta_{k,1}, \ldots, \beta_{k,p_k}$.
$\gamma_k^2 \sim \text{Exp}(\lambda^2 / (2 p_k))$: treats the $\beta_{k,j}$ equally across groups, so that $E(\beta_{k,j})$ and $V(\beta_{k,j})$ do not depend on $k$ or $j$.
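
To see why this makes the prior moments group-size-free: with the rate parameterization, $E(\gamma_k^2) = 2p_k/\lambda^2$, so $V(\beta_{k,j}) = \sigma^2 E(\gamma_k^2)/p_k = 2\sigma^2/\lambda^2$ for every group, matching the $\text{Exp}(\lambda^2/2)$ case of (2) when $p_k = 1$. A quick Monte Carlo check of this identity (my own sketch, assuming the rate parameterization of the exponential):

```python
import numpy as np

rng = np.random.default_rng(3)
sigma2, lam = 1.0, 2.0

for p_k in (1, 3, 10):
    rate = lam**2 / (2 * p_k)
    gamma2 = rng.exponential(scale=1.0 / rate, size=200_000)  # gamma_k^2 ~ Exp(rate)
    beta = rng.normal(0.0, np.sqrt(gamma2 * sigma2 / p_k))    # beta_{k,j} | gamma_k^2
    print(p_k, beta.var())  # ~ 2 * sigma2 / lam**2 = 0.5 for every p_k
```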

25 Group Selection
Definition of an important group: a group with at least one important variable.
Selection of important groups: the $k$th group is selected if $\max_j \{P(|\beta_{k,j}| > |\beta_{int}|)\} > c$.

26 Within-Group Selection: More Benchmarks
Limitation of selecting variables against $\beta_{int}$ alone: unimportant variables in the important groups are less likely to be removed.
Solution: find a benchmark $Z_{k,ben}$ for each group with $p_k > 1$. The regression model becomes
$y = \beta_{int} Z_{int} + \sum_{k=1}^m \left( \beta_{k,ben} Z_{k,ben} + \sum_{j=1}^{p_k} \beta_{k,j} X_{k,j} \right) + \sum_{k=m+1}^g \beta_{k,1} X_{k,1} + \epsilon$,
where $m$ is the number of groups with size greater than 1 (groups are ordered so that the singleton groups come last).
How to make $Z_{k,ben}$ orthogonal to the other variables and benchmarks? Data augmentation!

27 An Example: Construction of Two More Benchmarks
Data: 7 observations and two groups, $\{X_{1,1}, X_{1,2}\}$ and $\{X_{2,1}, X_{2,2}\}$. $Z_{int}$ is orthogonal to $y$ and to the $X_{k,j}$'s.

Obs.      y, X_{1,1}, X_{1,2}, X_{2,1}, X_{2,2}    Z_int
1 to 7    (observed data)                          $1/\sqrt{7}$

28 Data Augmentation

Obs.      y, X_{k,j} columns    Z_{1,ben}      Z_{2,ben}      Z_int
1 to 7    (observed data)       $a_1$          $a_2$          $1/\sqrt{9}$
8         0                     $-a_1 b_1$     $a_2$          $1/\sqrt{9}$
9         0                     $0$            $-a_2 b_2$     $1/\sqrt{9}$

$a_1 = 1/\sqrt{56}$, $b_1 = 7$; $a_2 = 1/\sqrt{72}$, $b_2 = 8$.
$Z_{1,ben}$, $Z_{2,ben}$ and $Z_{int}$ are pairwise orthogonal and also orthogonal to the response and the predictors.

29 Geometric Interpretation of Data Augmentation
Adding one zero observation lifts the original data into an (n+1)-dimensional space.

30 Steps for Constructing Benchmarks
1. Append $m$ zero observations to the original data, where $m$ is the number of groups with $p_k > 1$.
2. Let $Z_{k,ben} = (\underbrace{a_k, \ldots, a_k}_{n+k-1}, -a_k b_k, \underbrace{0, \ldots, 0}_{m-k})^T$, where $a_k = [(n+k-1)(n+k)]^{-1/2}$ and $b_k = n+k-1$.
3. Let $Z_{int} = ((m+n)^{-1/2}, \ldots, (m+n)^{-1/2})^T$ ($m+n$ entries).
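
A direct transcription of these steps into NumPy, checked on the 7-observation, two-group example of slide 28 (my own sketch; the helper name is mine):

```python
import numpy as np

def build_benchmarks(X, y, m):
    """Append m zero observations and build Z_{k,ben} (k = 1..m) and Z_int.

    m is the number of groups with more than one variable; singleton groups
    get no extra benchmark. Returns the augmented (y, X) and the benchmarks.
    """
    n, p = X.shape
    X_aug = np.vstack([X, np.zeros((m, p))])        # m zero observations
    y_aug = np.concatenate([y, np.zeros(m)])
    Z = np.zeros((n + m, m))
    for k in range(1, m + 1):
        a_k = ((n + k - 1) * (n + k)) ** -0.5
        b_k = n + k - 1
        Z[: n + k - 1, k - 1] = a_k                 # first n+k-1 entries equal a_k
        Z[n + k - 1, k - 1] = -a_k * b_k            # one balancing entry, then zeros
    Z_int = np.full(n + m, (n + m) ** -0.5)
    return y_aug, X_aug, Z, Z_int

# Check on the slide-28 example: n = 7 observations, m = 2 multi-variable groups.
rng = np.random.default_rng(4)
X = rng.standard_normal((7, 4))
X = X - X.mean(axis=0); X = X / np.linalg.norm(X, axis=0)
y = rng.standard_normal(7); y = y - y.mean()
y_aug, X_aug, Z, Z_int = build_benchmarks(X, y, m=2)

cols = np.column_stack([Z, Z_int])
print(np.allclose(cols.T @ cols, np.eye(3)))                         # orthonormal benchmarks
print(np.allclose(cols.T @ X_aug, 0), np.allclose(cols.T @ y_aug, 0))  # orthogonal to data
```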

31 Group Selection and Within-Group Selection
Regression model:
$y = \beta_{int} Z_{int} + \sum_{k=1}^m \left( \beta_{k,ben} Z_{k,ben} + \sum_{j=1}^{p_k} \beta_{k,j} X_{k,j} \right) + \sum_{k=m+1}^g \beta_{k,1} X_{k,1} + \epsilon$.
Assign hierarchical priors with group structure and obtain the posterior distributions of the coefficients.
Group selection: the $k$th group is selected if $\max_j \{P(|\beta_{k,j}| > |\beta_{int}|)\} > c$.
Within-group selection: suppose group $k$ is selected; then $X_{k,j}$ is selected if $P(|\beta_{k,j}| > |\beta_{k,ben}|) > c$.
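
Both rules are again Monte Carlo proportions over posterior draws. A minimal sketch (the function name and data layout are mine; the draws are assumed to come from a grouped variant of the Gibbs sampler above, and the fallback to $\beta_{int}$ for singleton groups is my assumption, since the slides only define the within-group rule for multi-variable groups):

```python
import numpy as np

def group_and_within_selection(beta_int, beta_ben, beta_groups, c=0.7):
    """beta_int: (S,) draws; beta_ben: dict k -> (S,) draws for groups with p_k > 1;
    beta_groups: dict k -> (S, p_k) draws. Returns selected groups and variables."""
    selected_groups, selected_vars = [], {}
    for k, draws in beta_groups.items():
        # Group rule: max_j P(|beta_{k,j}| > |beta_int|) > c
        d = np.mean(np.abs(draws) > np.abs(beta_int)[:, None], axis=0)
        if d.max() > c:
            selected_groups.append(k)
            # Within-group rule: P(|beta_{k,j}| > |beta_{k,ben}|) > c
            # (assumption: fall back to beta_int when the group has no benchmark)
            ben = beta_ben[k] if k in beta_ben else beta_int
            w = np.mean(np.abs(draws) > np.abs(ben)[:, None], axis=0)
            selected_vars[k] = np.flatnonzero(w > c)
    return selected_groups, selected_vars
```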

32 Illustration
Consider $p = 20$, $g = 6$, and
$\beta = (1.5, 0.8, 0, 0, 0, 1.2, 0, 0, 0.8, 0, 1.2, 0, 0, 0, 0, 0, 0, 0, 0, 0.8)^T$,
partitioned into six groups G1 to G6. [The underbraces marking the group boundaries were not recovered; per slide 35, groups 1, 2, 3 and 6 are the important ones.]
$y = X\beta + \epsilon$; $X_{k,j} \sim N(0, 1)$; $\text{cov}(X_{k,i}, X_{k,j}) = 0.5^{|i-j|}$ for $k = 1, 2$; $\text{cov}(X_{k,j}, X_{k,l}) = 0$ for $k = 3, 4$ and $j \ne l$.
Signal-to-noise ratio is 3. $n = 100$. Let $c = 0.9$.

33 Posteriors of Variables in an Important Group
Important variables deviate further from the benchmark.

34 Posteriors of Variables in an Unimportant Group
All the unimportant variables are very close to the benchmark.

35 Group Selection Results
Table: the frequency with which each group is selected in 100 simulations.

Group        1   2   3   4   5   6
Important    Y   Y   Y   N   N   Y
Size, Selected: [numeric entries not recovered]

36 Within-Group Selection Results
Average false discovery rate: (0.011); average false negative rate: (0.003). Average number of selected variables: 6.13 (0.08); the true number is 6. (Numbers in parentheses are standard errors; the two average rates themselves were not recovered.)
Table: number of times each variable in group 1 is selected in 100 simulations.

Variable     X_{1,1}  X_{1,2}  X_{1,3}  X_{1,4}  X_{1,5}  X_{1,6}
True Coef., Selected: [numeric entries not recovered]

37 A Big p, Small n Example
Let $p = 200$ and $n = 100$. There are 40 groups, each consisting of 5 variables.
$\beta_{1,j}$'s in group 1: $(1.2, 0.8, 0, 0, 1.6)$.
$\beta_{2,j}$'s in group 2: $(1, 0.9, 1.1, 1.3, 0.8)$.
$\beta_{3,j}$'s in group 3: $(0.8, 0, 0, 0, 0)$.
$\beta_{k,j}$'s in groups 4 to 8 are all zero.
These 8 groups form a block, and the block is replicated 5 times to yield the coefficients of all 200 variables. There are 45 important variables and 155 unimportant variables.

38 Covariance Structure
Within each block, the $X_{k,j}$'s are generated from a multivariate normal with mean 0 and covariance structure
$\text{cov}(X_{k,i}, X_{m,j}) = \frac{1}{3}(0.5)^{|k-m|}$.
Variables in different blocks are uncorrelated. Signal-to-noise ratio is 3.
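
A sketch of this design (assuming, since the slide does not say, that each $X_{k,j}$ has unit variance and that the $\frac{1}{3}(0.5)^{|k-m|}$ formula gives the off-diagonal covariances within a block):

```python
import numpy as np
from scipy.linalg import block_diag

rng = np.random.default_rng(5)
groups_per_block, vars_per_group, n_blocks, n = 8, 5, 5, 100

# Group index (0..7) of each of the 40 variables in one block.
g_idx = np.repeat(np.arange(groups_per_block), vars_per_group)

# cov(X_{k,i}, X_{m,j}) = (1/3) * 0.5**|k-m| within a block.
block_cov = (1.0 / 3.0) * 0.5 ** np.abs(g_idx[:, None] - g_idx[None, :])
np.fill_diagonal(block_cov, 1.0)   # assumption: each X_{k,j} has variance 1

# Blocks are independent, so the full covariance is block diagonal.
cov = block_diag(*([block_cov] * n_blocks))   # 200 x 200
X = rng.multivariate_normal(np.zeros(cov.shape[0]), cov, size=n)
print(X.shape)   # (100, 200)
```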

39 Group Selection Results
When c = 0.7, unimportant groups are effectively removed: the group false discovery rate is 5.1% (0.8%) and the group false negative rate is 24.0% (0.4%). When c = 0.6, the false discovery rate is 23.9% (0.9%) and the false negative rate is 16.1% (0.5%).
Table: frequencies with which the first 8 groups are selected, based on 100 simulations.

Group               1   2   3   4   5   6   7   8
Important           Y   Y   Y   N   N   N   N   N
Selected (c = 0.7)  [numeric entries not recovered]
Selected (c = 0.6)  [numeric entries not recovered]

40 Within-Group Selection Results
Unimportant variables in group 1 (an important group) are effectively removed when c = 0.7. When c = 0.7, the average false discovery rate over all groups is 11% (0.8%) and the average false negative rate is 17% (0.2%); when c = 0.6, they are 33% (0.8%) and 12% (0.2%), respectively.
Table: frequencies with which the 5 variables in the first group are selected, based on 100 simulations.

$(k, j)$             (1,1)  (1,2)  (1,3)  (1,4)  (1,5)
True $\beta_{k,j}$    1.2    0.8    0      0      1.6
Selected (c = 0.7)   [numeric entries not recovered]
Selected (c = 0.6)   [numeric entries not recovered]

41 Outline
Introduction of Bayesian Lasso
Bayesian Lasso with Pseudo Variables
Bayesian Group Lasso with Pseudo Variables
Conclusions and Future Work

42 Conclusions
- The intercept is a good benchmark for unimportant variables.
- The Bayesian Lasso with pseudo variables achieves sparsity.
- The Bayesian group Lasso with pseudo variables achieves good group selection and within-group selection results.

43 Future Work
- Optimize the threshold.
- More numerical comparisons with other variable selection methods.
- Real data analysis.

44 Other Work
- Shao, J. & Tang, Q. Random Group Variance Estimators for Survey Data with Random Hot Deck Imputation. (Submitted)
- Tang, Q. & Qian, P.Z.G. Enhancing the Sample Average Approximation Method with U Designs. (In revision)

45 Acknowledgements (alphabetical)
Jun Shao, University of Wisconsin-Madison
Kam-Wah Tsui, University of Wisconsin-Madison
Peter Qian, University of Wisconsin-Madison
Sijian Wang, University of Wisconsin-Madison

46 Thank you!
