Stepwise Searching for Feature Variables in High-Dimensional Linear Regression


Qiwei Yao, Department of Statistics, London School of Economics
Joint work with: Hongzhi An (Chinese Academy of Sciences), Da Huang (Peking University), Cun-Hui Zhang (Rutgers University)

Regression with p >> n: (some) recent developments
- Algorithms: stepwise addition and deletion; new information criteria BICP and BICC
- Numerical results: simulation with independent and dependent regressors; comparison with Lasso
- Asymptotic results: consistency for BICP

Consider the linear model $y = X\beta + \varepsilon = (x_1, \ldots, x_p)\beta + \varepsilon$, where $X$ is an $n \times p$ design matrix and $\varepsilon \sim N(0, \sigma^2 I_n)$.
Assume $\beta \equiv \beta_n = (\beta_1, \ldots, \beta_p)'$ varies with $n$, and $p \equiv p_n \to \infty$ together with $n \to \infty$.
Let $d \equiv d_n = |I_n|$, where $I_n = \{1 \le i \le p : \beta_{n,i} \equiv \beta_i \ne 0\}$; $d_n$ may also diverge with $n$.
Sparsity: $d \ll p$.

Lasso estimator (Tibshirani 1996):
$\hat\beta_{\mathrm{lasso}} = \arg\min_\beta \big\{ \|y - X\beta\|^2 + \lambda \sum_{j=1}^p |\beta_j| \big\}$,
where $\lambda > 0$ is a constant. Due to the $L_1$ penalty, some $\hat\beta_j$ are shrunk to exactly 0 for large $\lambda$; therefore sparsity is achieved. For a given $\lambda$, the Lasso can be solved by quadratic programming.
Efron et al. (2004), LARS: solves the whole Lasso solution path (for all $\lambda > 0$) in the same order of computation as a single LS fit, and the $\hat\beta_j(\lambda)$ are piecewise linear in $\lambda$.
Adaptive Lasso: use weighted penalties $|\beta_j| / |\tilde\beta_j|^{\gamma}$, with $\tilde\beta_j$ an initial estimator, instead of $|\beta_j|$ to achieve the oracle properties (Zou 2006).
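A minimal illustration of the Lasso fit above using scikit-learn's coordinate-descent solver (my choice of tool, not part of the slides); scikit-learn minimises $\|y - X\beta\|^2/(2n) + \alpha\sum_j|\beta_j|$, so `alpha` plays the role of $\lambda$ up to that $1/(2n)$ scaling.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p, d = 100, 500, 5
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:d] = 3.0                                   # a sparse coefficient vector
y = X @ beta + rng.standard_normal(n)

# alpha corresponds to lambda in the slide up to scikit-learn's 1/(2n) factor
fit = Lasso(alpha=np.sqrt(2 * np.log(p) / n), max_iter=10000).fit(X, y)
print("selected:", np.flatnonzero(fit.coef_ != 0))
```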

Dantzig selector (Candes and Tao 2007): $\hat\beta_{DS}$ is the solution of the $\ell_1$-regularization problem
$\min_{\beta \in \mathbb{R}^p} \sum_{i=1}^p |\beta_i|$ subject to $\|X'(y - X\beta)\|_\infty \le \lambda_p \sigma$.
Usually we take $\lambda_p = \sqrt{2 \log p}$.
MSE($\hat\beta_{DS}$) is within a $\log p$-factor of the MSE of an oracle estimator.
The problem can be recast as a linear programming problem.
Approximately equivalent to the Lasso (Bickel, Ritov and Tsybakov 2008).
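A sketch of the linear-programming recast mentioned above, using `scipy.optimize.linprog` (my own illustrative formulation): write $\beta = u - v$ with $u, v \ge 0$, minimise $\sum_i(u_i + v_i)$, and impose the two-sided constraint $|X'(y - X\beta)| \le \lambda_p\sigma$ componentwise.

```python
import numpy as np
from scipy.optimize import linprog

def dantzig_selector(X, y, lam):
    """min ||beta||_1 subject to ||X'(y - X beta)||_inf <= lam, as an LP with beta = u - v."""
    n, p = X.shape
    G, Xty = X.T @ X, X.T @ y
    # |Xty - G(u - v)| <= lam, written as two blocks of linear inequalities
    A_ub = np.vstack([np.hstack([G, -G]),      #  G(u - v) <= lam + Xty
                      np.hstack([-G, G])])     # -G(u - v) <= lam - Xty
    b_ub = np.concatenate([lam + Xty, lam - Xty])
    res = linprog(c=np.ones(2 * p), A_ub=A_ub, b_ub=b_ub,
                  bounds=(0, None), method="highs")
    u, v = res.x[:p], res.x[p:]
    return u - v

rng = np.random.default_rng(1)
n, p = 50, 100
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:3] = 2.0
y = X @ beta + rng.standard_normal(n)
beta_ds = dantzig_selector(X, y, lam=np.sqrt(2 * np.log(p)))   # sigma = 1 assumed
print(np.flatnonzero(np.abs(beta_ds) > 1e-6))
```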

Sure independence (correlation) screening (Fan and Lv 2008):
(i) Marginal regression: choose the $p_0$ regressors which are individually most correlated with $y$, for some $d \ll p_0 \ll p$.
(ii) Apply other methods, such as the adaptive Lasso, SCAD or the Dantzig selector, to identify the sparse model among the $p_0$ candidate regressors.
Computationally the most efficient; applicable even when $p$ is huge.
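A small sketch of screening step (i) with plain numpy (function name and interface are mine); step (ii) would then apply a second-stage selector to the retained columns only.

```python
import numpy as np

def sis_screen(X, y, p0):
    """Step (i): keep the p0 regressors most correlated (in absolute value) with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    corr = np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    return np.argsort(corr)[::-1][:p0]          # indices of the p0 most correlated columns

# Step (ii): run adaptive Lasso / SCAD / Dantzig selector on X[:, keep] only, e.g.
# keep = sis_screen(X, y, p0=50)
```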

Recall the linear model $y = X\beta + \varepsilon = (x_1, \ldots, x_p)\beta + \varepsilon$, where $X$ is an $n \times p$ design matrix and $\varepsilon \sim N(0, \sigma^2 I_n)$. Assume $\beta \equiv \beta_n = (\beta_{n,1}, \ldots, \beta_{n,p})'$ varies with $n$, and $p \equiv p_n \to \infty$ together with $n \to \infty$. Let $d \equiv d_n = |I_n|$, where $I_n = \{1 \le i \le p : \beta_{n,i} \ne 0\}$.

Notation. For $J \subset \{1, \ldots, p\}$, put
$X_J$: the $n \times |J|$ matrix consisting of the columns of $X$ corresponding to the indices in $J$;
$\beta_J$: the $|J|$-vector consisting of the components of $\beta$ corresponding to the indices in $J$;
$P_J = X_J (X_J' X_J)^{-} X_J'$,  $L_{u,v}(J) = u'(I_n - P_J)v$, for $u, v \in \mathbb{R}^n$.
Then $L_{y,y}(J)$ is the SSR (sum of squared residuals) from the LS fitting $\hat y = X_J \hat\beta_J = P_J y$.

Algorithm, Stage I (forward addition):
1. Let $j_1 = \arg\min_{1 \le i \le p} L_{y,y}(\{i\})$ and $J_1 = \{j_1\}$. Put $\mathrm{BICP}_1 = \log\{L_{y,y}(J_1)/n\} + 2(\log p)/n$.
2. Continue with $k = 2, 3, \ldots$, provided $\mathrm{BICP}_k < \mathrm{BICP}_{k-1}$, where
$\mathrm{BICP}_k = \log\{L_{y,y}(J_k)/n\} + \frac{2k}{n}\log p$,  $J_k = J_{k-1} \cup \{j_k\}$, and
$j_k = \arg\max_{i \notin J_{k-1}} [L_{y,y}(J_{k-1}) - L_{y,y}(J_{k-1} \cup \{i\})] = \arg\max_{i \notin J_{k-1}} L^2_{y,x_i}(J_{k-1}) / L_{x_i,x_i}(J_{k-1})$.
3. Once $\mathrm{BICP}_k \ge \mathrm{BICP}_{k-1}$, let $\tilde k = k - 1$ and $\hat I_{n,1} = J_{\tilde k}$.
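A minimal numpy sketch of Stage I in the notation above, computing $L_{y,y}(J)$ by least squares rather than the sweep operator purely for readability (function names are mine).

```python
import numpy as np

def ssr(X, y, J):
    """L_{y,y}(J): residual sum of squares after projecting y on the columns in J."""
    if len(J) == 0:
        return float(y @ y)
    coef, *_ = np.linalg.lstsq(X[:, J], y, rcond=None)
    resid = y - X[:, J] @ coef
    return float(resid @ resid)

def forward_bicp(X, y):
    """Stage I: forward addition driven by BICP_k = log(L_yy(J_k)/n) + 2k log(p)/n."""
    n, p = X.shape
    J, bicp_prev = [], np.inf
    while len(J) < min(n - 2, p):
        # candidate giving the largest drop in SSR (step 2)
        cand = [i for i in range(p) if i not in J]
        best = min(cand, key=lambda i: ssr(X, y, J + [i]))
        k = len(J) + 1
        bicp = np.log(ssr(X, y, J + [best]) / n) + 2 * k * np.log(p) / n
        if bicp >= bicp_prev:            # step 3: stop as soon as BICP no longer decreases
            break
        J.append(best)
        bicp_prev = bicp
    return J, bicp_prev
```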

Algorithm, Stage II (backward deletion):
4. Let $\mathrm{BICP}^*_{\tilde k} = \mathrm{BICP}_{\tilde k}$ and $J^*_{\tilde k} = \hat I_{n,1}$.
5. Continue with $k = \tilde k - 1, \tilde k - 2, \ldots$, provided $\mathrm{BICP}^*_k \le \mathrm{BICP}^*_{k+1}$, where
$\mathrm{BICP}^*_k = \log\{L_{y,y}(J^*_k)/n\} + \frac{2k}{n}\log p$,  $J^*_k = J^*_{k+1} \setminus \{j^*_k\}$, and
$j^*_k = \arg\min_{i \in J^*_{k+1}} [L_{y,y}(J^*_{k+1} \setminus \{i\}) - L_{y,y}(J^*_{k+1})]$.
6. Once $\mathrm{BICP}^*_k > \mathrm{BICP}^*_{k+1}$, let $\hat k = k + 1$ and $\hat I_{n,2} = J^*_{\hat k}$.
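A matching sketch of Stage II; `J_start` and `bicp_start` stand for the Stage I output $\hat I_{n,1}$ and $\mathrm{BICP}_{\tilde k}$ (again a least-squares implementation of $L_{y,y}$, with names of my choosing).

```python
import numpy as np

def backward_bicp(X, y, J_start, bicp_start):
    """Stage II: delete, one at a time, the variable whose removal increases the SSR least,
    as long as BICP does not increase."""
    n, p = X.shape

    def ssr(J):                                   # L_{y,y}(J)
        if not J:
            return float(y @ y)
        coef, *_ = np.linalg.lstsq(X[:, J], y, rcond=None)
        r = y - X[:, J] @ coef
        return float(r @ r)

    J, bicp_prev = list(J_start), bicp_start
    while len(J) > 1:
        # least damaging deletion (step 5)
        drop = min(J, key=lambda i: ssr([j for j in J if j != i]))
        J_new = [j for j in J if j != drop]
        k = len(J_new)
        bicp = np.log(ssr(J_new) / n) + 2 * k * np.log(p) / n
        if bicp > bicp_prev:                      # step 6: stop when BICP would increase
            break
        J, bicp_prev = J_new, bicp
    return J
```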

Implementation: sweep operation.
Forward addition: set $L_0 = (X, y)'(X, y) = (l^0_{i,j})$, a $(p+1) \times (p+1)$ matrix. Adding one variable, say $x_i$, in the $k$-th step corresponds to transforming $L_{k-1} = (l^{k-1}_{i,j})$ into $L_k = (l^k_{i,j})$ by the sweep operation:
$l^k_{i,i} = 1/l^{k-1}_{i,i}$,
$l^k_{j,m} = l^{k-1}_{j,m} - l^{k-1}_{i,m} l^{k-1}_{j,i} / l^{k-1}_{i,i}$ for $j \ne i$ and $m \ne i$,
$l^k_{i,j} = l^{k-1}_{i,j} / l^{k-1}_{i,i}$ and $l^k_{j,i} = l^{k-1}_{j,i} / l^{k-1}_{i,i}$ for $j \ne i$.
Then $L_{y,y}(J_{k-1}) - L_{y,y}(J_{k-1} \cup \{i\}) = (l^{k-1}_{i,p+1})^2 / l^{k-1}_{i,i}$, for $i \notin J_{k-1}$.
Backward deletion: same as the above with $L_0 = L_{\tilde k}$ obtained in Stage I. For $k = \tilde k - 1, \tilde k - 2, \ldots$,
$L_{y,y}(J^*_{k+1} \setminus \{i\}) - L_{y,y}(J^*_{k+1}) = (l^{\tilde k - k - 1}_{i,p+1})^2 / l^{\tilde k - k - 1}_{i,i}$, for $i \in J^*_{k+1}$.
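A small numpy sketch of the sweep update as written on the slide (with the subtraction in the $j, m$ entry restored); sweeping an index of $L_0 = (X, y)'(X, y)$ adds that regressor to the fitted set, and the bottom-right entry of the swept matrix tracks the current SSR.

```python
import numpy as np

def sweep(L, i):
    """One sweep of the symmetric matrix L on index i, following the update above."""
    pivot = L[i, i]
    out = L - np.outer(L[:, i], L[i, :]) / pivot   # l_jm - l_ji * l_im / l_ii  (j, m != i)
    out[i, :] = L[i, :] / pivot                    # row i
    out[:, i] = L[:, i] / pivot                    # column i
    out[i, i] = 1.0 / pivot                        # diagonal entry
    return out

rng = np.random.default_rng(2)
n, p = 60, 5
X = rng.standard_normal((n, p))
y = 2.0 * X[:, 0] + rng.standard_normal(n)
Z = np.column_stack([X, y])
L = Z.T @ Z                                        # L_0 = (X, y)'(X, y)

print("SSR with no regressors:", L[p, p])          # y'y
for i in [0, 2]:                                   # add x_1, then x_3
    gain = L[i, p] ** 2 / L[i, i]                  # L_yy(J) - L_yy(J + {i})
    L = sweep(L, i)
    print("added column", i, "gain", round(gain, 3), "SSR now", round(L[p, p], 3))
```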

Remarks
1. $\mathrm{BICP}_k = \log\{L_{y,y}(J_k)/n\} + k \,\frac{2 \log p}{n}$ replaces the penalty $\frac{\log n}{n}$ in the standard BIC by $\frac{2 \log p}{n}$; it is designed for the cases with $p \approx n$ or $p > n$.
2. An alternative, BICC: $\mathrm{BICC}_k = \log\{L_{y,y}(J_k)/n + c_0\} + k \,\frac{\log n}{n}$, where $c_0 > 0$ is a constant.
Why insert $c_0$? For $k$ close to $n$, $L_{y,y}(J_k) \approx 0$, so
$\log\{L_{y,y}(J_{k-1})\} - \log\{L_{y,y}(J_k)\} \approx \frac{L_{y,y}(J_{k-1}) - L_{y,y}(J_k)}{L_{y,y}(J_k)}$
may be very large even when $L_{y,y}(J_{k-1}) - L_{y,y}(J_k)$ is negligible.
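The two criteria as small functions (the default value of `c0` below is purely illustrative; the slide only requires $c_0 > 0$).

```python
import numpy as np

def bicp(ssr_k, n, p, k):
    """BICP_k = log(L_yy(J_k)/n) + 2k log(p)/n."""
    return np.log(ssr_k / n) + 2 * k * np.log(p) / n

def bicc(ssr_k, n, p, k, c0=0.1):
    """BICC_k = log(L_yy(J_k)/n + c0) + k log(n)/n; c0 keeps the criterion
    stable when the SSR is nearly zero (k close to n)."""
    return np.log(ssr_k / n + c0) + k * np.log(n) / n
```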

Remarks (continued)
3. In the forward search, $x_i$ should be excluded from the further search if $L_{y,y}(J_{k-1}) - L_{y,y}(J_{k-1} \cup \{i\})$ is practically 0. This may improve computational efficiency.
4. When $p_n \ge n$, the true mean $\mu_n = X_{I_n}\beta_{I_n}$ may be represented as a linear combination of the columns of any full-rank $n \times n$ submatrix of $X$. In practice, we may start the forward search from the genuinely optimal regression subset with $j$ regressors, where $j \ge 1$ is a small integer. This should effectively eliminate the possibility that $\hat I_{n,1}$ ends up as a non-sparse set.

Simulation, Example 1. Consider the model $y = X\beta + \varepsilon$, with the $x_{ij}$ and $\varepsilon_i$ independent $N(0, 1)$.
Setting I: $n = 200$, $p = 1000$ or $2000$, and $d = 10$ or $25$.
Setting II: $n = 800$, $p = 10000$ or $20000$, and $d = 25$ or $40$.
The non-zero $\beta_i$ are of the form $(-1)^u \big(2.5\sqrt{2(\log p)/n} + |v|\big)$, where $v \sim N(0, 1)$ and $P(u = 1) = P(u = 0) = 0.5$.
Replication: 200 times.
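A sketch of one replication of Example 1 (the $+|v|$ term reflects my reading of the coefficient formula above; seed and helper names are mine).

```python
import numpy as np

def simulate_example1(n, p, d, seed=0):
    """One replication of Example 1: iid N(0,1) design and noise, d non-zero
    coefficients of size roughly 2.5*sqrt(2 log(p)/n) with random signs."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))
    beta = np.zeros(p)
    signs = rng.choice([-1.0, 1.0], size=d)                # (-1)^u with P(u=0)=P(u=1)=0.5
    beta[:d] = signs * (2.5 * np.sqrt(2 * np.log(p) / n)
                        + np.abs(rng.standard_normal(d)))  # + |v|, v ~ N(0,1) (my reading)
    y = X @ beta + rng.standard_normal(n)
    return X, y, beta

X, y, beta = simulate_example1(n=200, p=1000, d=10)
```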

[Figure: number of selected regressors in the forward and backward searches, with panels for BICP and BICC at (n, p) = (200, 1000), (200, 2000), (800, 10000) and (800, 20000).]

Comparison with Lasso. The Lasso estimator is defined as the minimizer of
$\frac{1}{2n}\|y - X\beta\|^2 + \lambda \sum_{j=1}^p |\beta_j|$, with $\lambda = \sqrt{2(\log p)/n}$.
We standardize the data so that $\|x_j\|^2 = n$ for all $j$.
For a fitted model, the relative error is defined as
$r = \frac{1}{d}\,(\text{number of wrongly selected variables} + \text{number of unselected true variables})$.
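The relative error as a small helper (names are mine).

```python
def relative_error(selected, true_set, d):
    """r = (wrongly selected + missed true variables) / d, as defined above."""
    selected, true_set = set(selected), set(true_set)
    wrong = len(selected - true_set)        # selected but not truly non-zero
    missed = len(true_set - selected)       # truly non-zero but not selected
    return (wrong + missed) / d
```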

[Table: for each setting of $(n, p, d)$ and each method (BICP, BICC, LASSO), the mean and STD of $|\hat d - d|$ and of the relative error $r$ over the 200 replications.]

BICP and LASSO comparable? BICP adds a new variable by performing an F-test:
$\mathrm{BICP}_k < \mathrm{BICP}_{k-1} \iff \log\Big(\frac{L_{y,y}(J_k)}{L_{y,y}(J_{k-1})}\Big) + \frac{2\log p}{n} < 0 \iff \frac{L_{y,y}(J_{k-1}) - L_{y,y}(J_k)}{L_{y,y}(J_k)} > e^{(2\log p)/n} - 1$,
i.e. $F_{1,\, n-k-1} > (n-k-1)\{e^{(2\log p)/n} - 1\} \approx 2\log p$.
LASSO selects $x_j$ by performing approximately a z-test (with $X_{-j}$ and $\beta_{-j}$ excluding the $j$-th regressor):
$\frac{1}{2n}\|y - X_{-j}\beta_{-j}\|^2 - \frac{1}{2n}\|y - X\beta\|^2 > \lambda|\beta_{n,j}|$
$\iff \frac{1}{n}\beta_{n,j}\, x_j'(y - X_{-j}\beta_{-j}) - \beta_{n,j}^2 \|x_j\|^2/(2n) > \lambda|\beta_{n,j}|$,
i.e. approximately $|x_j'(y - X_{-j}\beta_{-j})| > n\lambda$.

As $y - X_{-j}\beta_{-j} \approx \varepsilon \sim N(0, \sigma^2 I_n)$, $x_j'(y - X_{-j}\beta_{-j})$ is approximately $N(0, \sigma^2\|x_j\|^2)$. Hence the LASSO rule is approximately $\chi^2_1 > n^2\lambda^2/\{\|x_j\|^2\sigma^2\} = 2\log p$. Note $\sigma^2 = 1$ in our example. As $F_{1,q} \approx \chi^2_1$ for large $q$, the two methods are approximately comparable.
Remark. Methods which penalize $\log(\|y - X\beta\|^2)$ (such as BIC) are F-test based and do not require knowledge of $\sigma^2$. Methods which penalize $\|y - X\beta\|^2$ directly (such as LASSO) are z-test based and do require information on $\sigma^2$.
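A quick numerical check of the comparability claim, for one illustrative configuration of my choosing.

```python
import numpy as np

n, p, k = 200, 1000, 10
f_threshold = (n - k - 1) * (np.exp(2 * np.log(p) / n) - 1)   # BICP's implicit F-test cutoff
z_threshold = 2 * np.log(p)                                   # Lasso's implicit chi^2_1 cutoff
print(f_threshold, z_threshold)   # both are close to 2 log p, about 13.8 here
```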

Simulation, Example 2. Same setting as in Example 1 with added dependence: for $1 \le k \le n$ and $1 \le i \ne j \le d$,
$\mathrm{Corr}(X_{ki}, X_{kj}) = (-1)^{u_1}(0.5)^{|i-j|}$,  $\mathrm{Corr}(X_{ki}, X_{k,i+d}) = (-1)^{u_2}\rho$,  $\mathrm{Corr}(X_{ki}, X_{k,i+2d}) = (-1)^{u_3}(1 - \rho^2)^{1/2}$,
where $\rho \sim U[0.2, 0.8]$, and $u_1$, $u_2$ and $u_3$ are independent draws from the uniform distribution on the two points $\{0, 1\}$. The first $d$ regression variables have the non-zero coefficients.
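One constructive way to generate a design with the stated cross-correlations (this is my own sketch: the first block is built as an AR(1) series with parameter $(-1)^{u_1}0.5$, and columns $i+d$ and $i+2d$ are linear combinations of column $i$ and fresh noise; the paper may generate the design differently).

```python
import numpy as np

def simulate_example2(n, p, d, seed=0):
    """One replication of an Example 2-style design; assumes p >= 3d."""
    rng = np.random.default_rng(seed)
    rho = rng.uniform(0.2, 0.8)
    s1, s2, s3 = rng.choice([-1.0, 1.0], size=3)       # (-1)^{u_1}, (-1)^{u_2}, (-1)^{u_3}

    X = rng.standard_normal((n, p))
    a = s1 * 0.5                                       # AR(1) block: corr (s1*0.5)^{|i-j|}
    for j in range(1, d):
        X[:, j] = a * X[:, j - 1] + np.sqrt(1 - a**2) * rng.standard_normal(n)
    for j in range(d):                                 # linked blocks at lags d and 2d
        X[:, j + d] = s2 * (rho * X[:, j]
                            + np.sqrt(1 - rho**2) * rng.standard_normal(n))
        X[:, j + 2 * d] = s3 * (np.sqrt(1 - rho**2) * X[:, j]
                                + rho * rng.standard_normal(n))

    beta = np.zeros(p)
    beta[:d] = 2.5 * np.sqrt(2 * np.log(p) / n)        # signal as in Example 1
                                                       # (random signs / |v| term omitted)
    y = X @ beta + rng.standard_normal(n)
    return X, y, beta
```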

[Figure: number of selected regressors in the forward and backward searches for Example 2, with panels for BICP and BICC at (n, p) = (200, 1000), (200, 2000), (800, 10000) and (800, 20000).]

Asymptotic results: consistency.
Goal: $P\{\hat I_{n,1} \supseteq I_n = \hat I_{n,2}\} \to 1$.
Major difficulty: the number of candidate models is huge.
Key idea: find a series of collections $C_k$ ($k \ge 1$) of deterministic models such that
- the number of candidate models in $C_k$ diverges to $\infty$ not too fast;
- the models selected in the forward path are always in $C_k$, $k \ge 1$.
Note: the construction of $\{C_k\}$ is only for deriving the consistency; it is not required for practical implementation.

Heuristics for the forward search. Let $C_k$ be a collection of deterministic models of size $k$, satisfying two conditions:
1. Let $k^*$ be an integer for which $I_n \subset J$ for all $J \in C_{k^*-1}$, and
$P\{\exists\, k < k^* : J_{k-1} \in C_{k-1},\ I_n \not\subset J_{k-1},\ J_k \notin C_k\} \to 0$;
2. $P\{\exists\, k \le k^* : J_{k-1} \in C_{k-1},\ I_n \subset J_{k-1},\ \mathrm{BICP}_k < \mathrm{BICP}_{k-1}\} \to 0$.
Condition 1 sets $k^*$ as an upper bound for the size of the selected model, which may go to $\infty$, and furthermore $J_k \in C_k$ unless $I_n \subset J_{k-1}$.
Condition 2 requires the search to stop, with $\hat I_{n,1} = J_{k-1}$, as soon as $I_n \subset J_{k-1}$.

The stopping rule is effectively an F-test:
$\mathrm{BICP}_k \ge \mathrm{BICP}_{k-1} \iff \log\Big(\frac{L_{y,y}(J_k)}{L_{y,y}(J_{k-1})}\Big) + \frac{2\log p}{n} \ge 0 \iff \frac{L_{y,y}(J_{k-1}) - L_{y,y}(J_k)}{L_{y,y}(J_k)} \le e^{(2\log p)/n} - 1$.
For a deterministic model $J \supseteq I_n$ and $j \notin J$,
$F(J, j) = \frac{L_{y,y}(J) - L_{y,y}(J \cup \{j\})}{L_{y,y}(J \cup \{j\})/(n - |J| - 1)} \sim F_{1,\, n - |J| - 1}$.

Since $j_k$ is selected among $p - k + 1$ choices, we have
$P\{\exists\, k \le k^* : J_{k-1} \in C_{k-1},\ I_n \subset J_{k-1},\ \mathrm{BICP}_k < \mathrm{BICP}_{k-1}\}$
$\le \sum_{k=d_n+1}^{k^*} P\{J_{k-1} \in C_{k-1},\ I_n \subset J_{k-1},\ \mathrm{BICP}_k < \mathrm{BICP}_{k-1}\}$
$\le \sum_{k=d_n+1}^{k^*} P\big\{\max_{J \in C_{k-1}} \max_{j \notin J} F(J, j) > (n-k)(e^{(2\log p)/n} - 1)\big\}$
$\le \sum_{k=d_n+1}^{k^*} |C_{k-1}|\,(p - k + 1)\, P\{F_{1,\, n-k} > (n-k)(e^{(2\log p)/n} - 1)\}$.

Based on the tail property of the F-distribution, the RHS of the above is bounded by
$\sum_{k=d_n+1}^{k^*} \frac{2\,|C_{k-1}|\,(p - k + 1)\,\exp\{-(\log p)(1 - (k+1)/n)\}}{\sqrt{(1 - (k+1)/n)\,\pi \log p}}$,
which converges to 0 if
$k^*/p \to 0$,  $k^*(\log p)/n = O(1)$,  $\sum_{k=d_n+1}^{k^*} |C_{k-1}|\big/\sqrt{\log p} \to 0$.
The above condition needs to be relaxed, though it may easily hold if $k^*$ is bounded (as $n, p \to \infty$).
Chen and Chen (2008): extended BIC. Huang and Wang (2008): BICC.

Regularity conditions
1. The sparse Riesz condition:
$0 < c_* \le \lambda_{\min}(X_J' X_J/n) \le \lambda_{\max}(X_J' X_J/n) \le c^* < \infty$
for any $J \subset \{1, \ldots, p\}$ with $|J| \le d_n^*$, where $c_*, c^*$ are fixed constants and $d_n^* \to \infty$ (e.g. $d_n^* \log(p/d_n^*) = a_0 n$ for a small $a_0 > 0$).
This ensures that the sparse representation of the model is unique. If the underlying distribution of all the $p$ regressors is non-degenerate, any $n \times n$ submatrix of $X$ is full-rank with probability 1.

Let $\mu = X\beta = X_{I_n}\beta_{I_n}$, $d_n = \#\{j : \beta_{n,j} \ne 0,\ 1 \le j \le p\}$, and $\beta_* = \min_{\beta_{n,j} \ne 0} |\beta_{n,j}|$.
For a constant $\gamma \in (0, 1)$, the upper bound for the size of the estimated sets is defined as
$k^* = \big[\, d_n \log\{\|\mu\|^2/(n c_* \beta_*^2)\} \big/ \{c_*(1-\gamma)^2\} \,\big]$.
2. Conditions on $\beta_*$, $d_n$ and $p = p_n$: for a constant $\epsilon_0 > 0$,
$(1 + d_n)\log p_n = o(n)$,
$\beta_*^2 \ge \frac{2(1+\epsilon_0)^3 \sigma^2}{\gamma^2 c_*} \cdot \frac{3\log p_n}{n}$,
$\max\big\{ d_n + k^* - 2,\ d_n c^*\big/\{c_*^2(1-\gamma)^2\} \big\} < d_n^*$,
$(1 + d_n)\,\log(2 + d_n)\,\frac{\log p_n}{n}\,\log\Big(\frac{\|\mu\|^2}{n c_* \beta_*^2}\Big) \to 0$,
$k^*\big\{\log k^* + \log(d_n c^*)\big\} \le \frac{\log p_n}{c_*^2 (1-\gamma)^2}$.

3. Condition (adjustment) for BICP:
$\mathrm{BICP}_k = \log\{L_{y,y}(J_k)/n\} + \sum_{l=1}^{k} \frac{2(1+\eta_0)\log p}{n - l - 1}$,
where $\eta_0 \in \big(0,\ (1-\gamma)^2(1+\epsilon_0)^3/(\gamma^2 c_*) - 1\big)$ is a small constant.
Remark. For $k \le k^*$,
$\sum_{l=1}^{k} \frac{2(1+\eta_0)\log p}{n - l - 1} = \{1 + \eta_0 + o(1)\}\,\frac{2k\log p}{n}$.
The adjustment increases the penalty by a factor of $(1 + \eta_0)$.

Theorem. Under Conditions 1, 2 and 3,
$P\big\{ \hat I_{n,1} \supseteq I_n = \hat I_{n,2},\ \hat k \le \tilde k < k^* \big\} \to 1$.
Final remark. The consistency was proved for a slightly more aggressive BICP penalty, while the proof used conservative Bonferroni estimates of the multiple-testing errors at all stages. The simple BICP penalty $(2k\log p)/n$ is recommended in practice. We repeated the simulation for Example 1 with the adjusted BICP; the results are good, but not as good as those with the simple penalty $(2k\log p)/n$.

[Figure: number of selected regressors in the forward and backward searches, comparing the simple BICP ($\eta_0 = 0$) with the adjusted BICP ($\eta_0 = 0.1$) at (n, p) = (200, 1000), (200, 2000), (800, 10000) and (800, 20000).]
