A Bootstrap Lasso + Partial Ridge Method to Construct Confidence Intervals for Parameters in High-dimensional Sparse Linear Models


Jingyi Jessica Li
Department of Statistics, University of California, Los Angeles
Joint work with Hanzhong Liu (Tsinghua) and Xin Xu (Yale)

Table of contents
1. Introduction
2. Bootstrap Lasso+Partial Ridge
3. Theoretical Results
4. Simulation Results
5. Real Data Applications
6. Conclusions

Introduction

Sparse Linear Models

$Y = X\beta^0 + \epsilon$, where
- $\epsilon = (\epsilon_1, \ldots, \epsilon_n)^T$ is a vector of independent and identically distributed (i.i.d.) error random variables with mean 0 and variance $\sigma^2$
- $Y = (y_1, \ldots, y_n)^T \in \mathbb{R}^n$ is an $n$-dimensional response vector
- $X = (x_1^T, \ldots, x_n^T)^T = (X_1, \ldots, X_p) \in \mathbb{R}^{n \times p}$ is a deterministic or random design matrix, with $\frac{1}{n}\sum_{i=1}^n x_{ij} = 0$, $j = 1, \ldots, p$
- $\beta^0 \in \mathbb{R}^p$ is a vector of coefficients

High dimensionality: $n \ll p$
Sparsity: $s = \|\beta^0\|_0 \ll p$

Perspective 1: Sparse Point Estimation (Variable Selection)

Penalized Least Squares:
$$\hat\beta = \arg\min_{\beta \in \mathbb{R}^p} \frac{1}{2n}\|Y - X\beta\|_2^2 + \sum_{j=1}^p p_\lambda(\beta_j)$$

- Lasso: $p_\lambda(t) = \lambda|t|$ [Tibshirani, 1996]
- Bridge: $p_\lambda(t) = \lambda|t|^q$ for $0 < q \le 2$ [Frank and Friedman, 1993]
- SCAD: $p_\lambda'(t) = \lambda\left\{ I(t \le \lambda) + \frac{(a\lambda - t)_+}{(a-1)\lambda}\, I(t > \lambda) \right\}$ for some $a > 2$; often $a = 3.7$ [Fan and Li, 2001]
- MCP: $p_\lambda'(t) = (a\lambda - t)_+/a$
- and many others [Bühlmann and van de Geer, 2011, Fan and Lv, 2010]
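
For concreteness, here is a minimal R sketch (not from the talk) of the penalty functions and derivative formulas listed above; the parameter defaults ($a = 3.7$ for SCAD, $a = 3$ for MCP, $q = 0.5$ for Bridge) are illustrative choices.

```r
## Penalty (derivative) formulas from the slide, written for nonnegative t.
lasso_pen  <- function(t, lambda) lambda * abs(t)
bridge_pen <- function(t, lambda, q = 0.5) lambda * abs(t)^q
scad_deriv <- function(t, lambda, a = 3.7) {
  lambda * ifelse(t <= lambda, 1, pmax(a * lambda - t, 0) / ((a - 1) * lambda))
}
mcp_deriv  <- function(t, lambda, a = 3) pmax(a * lambda - t, 0) / a

## Quick visual comparison of the SCAD and MCP derivatives at lambda = 1.
t <- seq(0, 4, by = 0.01)
plot(t, scad_deriv(t, lambda = 1), type = "l", ylab = "penalty derivative")
lines(t, mcp_deriv(t, lambda = 1), lty = 2)
```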

Perspective 2: Statistical Inference

Question: How to construct confidence intervals and hypothesis tests for individual $\beta_j$'s?
Challenge: Inference is difficult for high-dimensional model parameters, because the limiting distribution of common estimators is complicated and hard to compute.

Review: The Lasso Estimator

$$\hat\beta_{\mathrm{Lasso}} = \arg\min_{\beta} \left\{ \frac{1}{2n}\|Y - X\beta\|_2^2 + \lambda_1 \|\beta\|_1 \right\}$$

- The limiting distribution of the Lasso is complicated [Knight and Fu, 2000], and the usual residual Bootstrap Lasso fails to estimate the limiting distribution and thus cannot be used to construct valid confidence intervals
- Various modifications of the Lasso have been proposed to form a valid inference procedure:
  - Bootstrap thresholded Lasso [Chatterjee and Lahiri, 2011]
  - Bootstrap Lasso+OLS [Liu and Yu, 2013]
  - De-sparsified (de-biased) Lasso methods [Zhang and Zhang, 2014, Van de Geer et al., 2014, Javanmard and Montanari, 2014]
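
A minimal R sketch of fitting the Lasso with the glmnet package (whose objective matches the display above); the toy data and the use of cv.glmnet to pick $\lambda_1$ are illustrative assumptions, not the talk's tuning procedure.

```r
library(glmnet)

set.seed(1)
n <- 200; p <- 500
X <- matrix(rnorm(n * p), n, p)
Y <- drop(X[, 1:5] %*% rep(1, 5)) + rnorm(n)   # toy data: 5 active predictors

cvfit      <- cv.glmnet(X, Y, alpha = 1)                 # alpha = 1 gives the Lasso
beta_lasso <- coef(cvfit, s = "lambda.min")[-1, 1]       # drop the intercept
S_hat      <- which(beta_lasso != 0)                     # Lasso-selected predictors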

Existing Inference Approaches

- Sample splitting based methods [Wasserman and Roeder, 2009]
- Bootstrap / resampling based methods
  - Perturbation resampling based method for a fixed p [Minnier et al., 2009]
  - Modified residual Bootstrap Lasso method for a fixed p [Chatterjee and Lahiri, 2011]
  - Residual Bootstrap adaptive Lasso for p growing at a polynomial rate [Chatterjee and Lahiri, 2013]
  - Residual Bootstrap method based on a two-stage estimator Lasso+OLS [Liu and Yu, 2013]
- De-sparsified (de-biased) Lasso methods
  - LDPE [Zhang and Zhang, 2014]
  - JM [Javanmard and Montanari, 2014]
- Other methods: post-selection inference [Berk et al., 2013, Lee et al., 2016], the knockoff filter [Barber and Candès, 2015], and others [Dezeure et al., 2014]

De-sparsified Lasso Methods

De-sparsified Lasso methods aim to remove the biases of the Lasso estimates and produce an asymptotically Normal estimate for each individual parameter.

Advantages:
- Do not rely on the beta-min condition: $\min_{j:\, \beta^0_j \neq 0} |\beta^0_j| \gg 1/\sqrt{n}$
- Theoretically proven benchmark for high-dimensional inference

Disadvantages:
- High computational cost
- Require good estimation of the precision matrix
- Require $s \log p / \sqrt{n} \to 0$ as $n \to \infty$ to remove the asymptotic bias
- Rely heavily on the sparse linear model assumption and may have poor performance for misspecified models
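
For background, the generic form of such a de-biased estimate (standard in Zhang and Zhang, 2014 and Van de Geer et al., 2014, not a formula taken from these slides) corrects the Lasso with an estimated precision matrix $\hat\Theta$, where $\hat\Sigma = X^T X/n$:

```latex
\hat{\beta}^{\mathrm{debias}} \;=\; \hat{\beta}_{\mathrm{Lasso}}
  \;+\; \frac{1}{n}\,\hat{\Theta}\, X^{T}\bigl(Y - X\hat{\beta}_{\mathrm{Lasso}}\bigr),
\qquad
\sqrt{n}\,\bigl(\hat{\beta}^{\mathrm{debias}}_j - \beta^0_j\bigr)
  \;\approx\; N\!\bigl(0,\; \sigma^2\,[\hat{\Theta}\hat{\Sigma}\hat{\Theta}^{T}]_{jj}\bigr).
```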

Bootstrap Lasso+OLS

A two-stage estimator Lasso+OLS:
- Use Lasso to select variables
- Use Ordinary Least Squares (OLS) to estimate the coefficients of the selected variables

Advantages:
- Canonical and simple statistical techniques
- Coverage probabilities and interval lengths comparable to the de-sparsified Lasso methods

Disadvantages:
- Requires hard sparsity ($\beta^0$ has at most $s$ ($s \ll n$) non-zero elements)
- Poor coverage probabilities for small but non-zero coefficients ($[0, 0]$ confidence intervals in extreme cases)
- Requires the beta-min condition

Bootstrap Lasso+Partial Ridge

Contribution 1: Hard Sparsity → Cliff-weak-sparsity

Definition (Cliff-weak-sparsity). $\beta^0$ satisfies cliff-weak-sparsity if its elements can be divided into two groups:
- the first group has $s$ ($s \ll n$) large elements with absolute values much larger than $1/\sqrt{n}$
- the second group contains $(p - s)$ small elements with absolute values much smaller than $1/\sqrt{n}$

Without loss of generality, we assume $\beta^0 = (\beta^0_1, \ldots, \beta^0_s, \beta^0_{s+1}, \ldots, \beta^0_p)^T$ with $|\beta^0_j| \gg 1/\sqrt{n}$ for $j = 1, \ldots, s$ and $|\beta^0_j| \ll 1/\sqrt{n}$ for $j = s+1, \ldots, p$. Let $S = \{1, \ldots, s\}$, and denote $\beta^0_S = (\beta^0_1, \ldots, \beta^0_s)$.

Contribution 2: Lasso+OLS → Lasso+Partial Ridge Estimator

Motivation: to increase the variance of our estimates for small coefficients whose corresponding predictors are missed by the Lasso.

A two-stage estimator Lasso+Partial Ridge (LPR):
- Use Lasso to select variables
- Use Partial Ridge to estimate the coefficients

Partial Ridge is defined to minimize the empirical $\ell_2$ loss with no penalty on the selected predictors but an $\ell_2$ penalty on the unselected predictors, so as to reduce the bias of the coefficient estimates of the selected predictors while increasing the variance of the coefficient estimates of the unselected predictors.

Formally, let $\hat S = \{j \in \{1, 2, \ldots, p\} : (\hat\beta_{\mathrm{Lasso}})_j \neq 0\}$ be the set of predictors selected by the Lasso; then we define the LPR estimator as
$$\hat\beta_{\mathrm{LPR}} = \arg\min_{\beta} \frac{1}{2n}\|Y - X\beta\|_2^2 + \frac{\lambda_2}{2} \sum_{j \notin \hat S} \beta_j^2.$$
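
A minimal R sketch of the Partial Ridge step, assuming centered X and Y, no intercept, and the $\lambda_2/2$ form of the penalty written above; the closed form follows from the normal equations. The helper name `partial_ridge` and the default $\lambda_2 = 1/n$ are illustrative choices, not the authors' HDCI implementation.

```r
## Ridge penalty only on predictors NOT selected by the Lasso:
## beta_hat = (X'X/n + lambda2 * D)^{-1} X'Y/n, with D_jj = 0 for j in S_hat, 1 otherwise.
partial_ridge <- function(X, Y, S_hat, lambda2 = 1 / nrow(X)) {
  n <- nrow(X); p <- ncol(X)
  d <- rep(1, p); d[S_hat] <- 0               # no shrinkage on selected predictors
  A <- crossprod(X) / n + lambda2 * diag(d)   # X'X/n + lambda2 * D
  b <- crossprod(X, Y) / n                    # X'Y/n
  drop(solve(A, b))                           # closed-form LPR coefficients
}

beta_lpr <- partial_ridge(X, Y, S_hat)        # S_hat from the Lasso fit sketched earlier
```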

Approach 1: Residual Bootstrap Lasso+Partial Ridge (rblpr)

For a deterministic design matrix X in a linear regression model, the residual Bootstrap is a standard method for constructing confidence intervals.

How to define the residuals: from the Lasso, the Lasso+OLS, or the LPR fit? Simulations suggest that the residuals obtained from the Lasso+OLS estimates approximate the true distribution of the errors $\epsilon_i$ the best.

Let $\hat\beta_{\mathrm{Lasso+OLS}}$ denote the Lasso+OLS estimator,
$$\hat\beta_{\mathrm{Lasso+OLS}} = \arg\min_{\beta:\ \beta_{\hat S^c} = 0} \frac{1}{2n}\|Y - X\beta\|_2^2,$$
where $\beta_{\hat S^c} = \{\beta_j : j \notin \hat S\}$. The residual vector is
$$\hat\epsilon = (\hat\epsilon_1, \ldots, \hat\epsilon_n)^T = Y - X\hat\beta_{\mathrm{Lasso+OLS}}.$$
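
A minimal sketch of the Lasso+OLS refit and the centered residuals used in the bootstrap below; `lasso_ols` is an illustrative helper, and `S_hat` is assumed to come from a Lasso fit such as the one sketched earlier.

```r
## OLS refit on the Lasso-selected columns; unselected coefficients are set to 0.
lasso_ols <- function(X, Y, S_hat) {
  beta <- rep(0, ncol(X))
  if (length(S_hat) > 0) {
    fit <- lm.fit(x = X[, S_hat, drop = FALSE], y = Y)
    beta[S_hat] <- fit$coefficients
  }
  beta
}

beta_lasso_ols <- lasso_ols(X, Y, S_hat)
eps_hat <- drop(Y - X %*% beta_lasso_ols)   # residual vector
eps_hat <- eps_hat - mean(eps_hat)          # centered residuals for resampling
```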

The rblpr Algorithm

Input: data (X, Y); confidence level $1 - \alpha$; number of replications B
Output: $(1-\alpha)$ confidence interval $[l_j, u_j]$ for $\beta^0_j$, $j = 1, \ldots, p$

Algorithm:
1. Compute the Lasso+OLS estimator $\hat\beta_{\mathrm{Lasso+OLS}}$ given data (X, Y)
2. Compute the residual vector $\hat\epsilon = (\hat\epsilon_1, \ldots, \hat\epsilon_n)^T = Y - X\hat\beta_{\mathrm{Lasso+OLS}}$
3. Re-sample from the empirical distribution of the centered residuals $\{\hat\epsilon_i - \bar{\hat\epsilon},\ i = 1, \ldots, n\}$, where $\bar{\hat\epsilon} = \frac{1}{n}\sum_{i=1}^n \hat\epsilon_i$, to form $\epsilon^* = (\epsilon^*_1, \ldots, \epsilon^*_n)^T$
4. Generate the residual Bootstrap response $Y_{\mathrm{rboot}} = X\hat\beta_{\mathrm{Lasso+OLS}} + \epsilon^*$
5. Compute the residual Bootstrap Lasso (rblasso) estimator
$$\hat\beta^*_{\mathrm{rblasso}} = \arg\min_{\beta} \left\{ \frac{1}{2n}\|Y_{\mathrm{rboot}} - X\beta\|_2^2 + \lambda_1\|\beta\|_1 \right\},$$
and define $\hat S_{\mathrm{rblasso}} = \{j \in \{1, 2, \ldots, p\} : (\hat\beta^*_{\mathrm{rblasso}})_j \neq 0\}$

The rblpr Algorithm (Cont'd)

6. Compute the residual Bootstrap LPR (rblpr) estimator based on $(X, Y_{\mathrm{rboot}})$:
$$\hat\beta^*_{\mathrm{rblpr}} = \arg\min_{\beta} \frac{1}{2n}\|Y_{\mathrm{rboot}} - X\beta\|_2^2 + \frac{\lambda_2}{2}\sum_{j \notin \hat S_{\mathrm{rblasso}}} \beta_j^2$$
7. Repeat steps 3-6 B times to obtain $\hat\beta^{*(1)}_{\mathrm{rblpr}}, \ldots, \hat\beta^{*(B)}_{\mathrm{rblpr}}$
8. For each $j = 1, \ldots, p$, compute the $\alpha/2$ and $(1 - \alpha/2)$ quantiles of $\{(\hat\beta^{*(b)}_{\mathrm{rblpr}})_j,\ b = 1, \ldots, B\}$ and denote them by $a_j$ and $b_j$, respectively
9. Output
$$l_j = (\hat\beta_{\mathrm{LPR}})_j + (\hat\beta_{\mathrm{Lasso+OLS}})_j - b_j, \qquad u_j = (\hat\beta_{\mathrm{LPR}})_j + (\hat\beta_{\mathrm{Lasso+OLS}})_j - a_j$$
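
Putting the pieces together, a hedged R sketch of the rblpr loop (steps 3-9), reusing `partial_ridge()` and `lasso_ols()` from the earlier sketches; re-running cv.glmnet inside every bootstrap replication is a simplification (and slow), not necessarily how the authors tune $\lambda_1$.

```r
rblpr_ci <- function(X, Y, B = 500, alpha = 0.05, lambda2 = 1 / nrow(X)) {
  n <- nrow(X)
  cvfit  <- cv.glmnet(X, Y, alpha = 1)
  S_hat  <- which(coef(cvfit, s = "lambda.min")[-1, 1] != 0)
  b_lols <- lasso_ols(X, Y, S_hat)                 # steps 1-2: Lasso+OLS and residuals
  b_lpr  <- partial_ridge(X, Y, S_hat, lambda2)    # LPR estimate on the original data
  eps    <- drop(Y - X %*% b_lols)
  eps    <- eps - mean(eps)

  boot <- replicate(B, {
    Y_star <- drop(X %*% b_lols) + sample(eps, n, replace = TRUE)      # steps 3-4
    cv_s   <- cv.glmnet(X, Y_star, alpha = 1)                          # step 5: rblasso
    S_star <- which(coef(cv_s, s = "lambda.min")[-1, 1] != 0)
    partial_ridge(X, Y_star, S_star, lambda2)                          # step 6: rblpr
  })                                               # p x B matrix of bootstrap estimates

  a <- apply(boot, 1, quantile, probs = alpha / 2)                     # step 8
  b <- apply(boot, 1, quantile, probs = 1 - alpha / 2)
  cbind(lower = b_lpr + b_lols - b,                                    # step 9
        upper = b_lpr + b_lols - a)
}
```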

Approach 2: Paired Bootstrap Lasso+Partial Ridge (pblpr)

For a random design matrix X in a linear regression model, the paired Bootstrap is a standard method for constructing confidence intervals.

In the paired Bootstrap, one generates a Bootstrap sample $\{(x^*_i, y^*_i),\ i = 1, \ldots, n\}$ from the empirical joint distribution of $\{(x_i, y_i),\ i = 1, \ldots, n\}$ and then computes the estimator based on the Bootstrap sample.

The pblpr Algorithm

Input: data (X, Y); confidence level $1 - \alpha$; number of replications B
Output: $(1-\alpha)$ confidence interval $[l_j, u_j]$ for $\beta^0_j$, $j = 1, \ldots, p$

Algorithm:
1. Generate a Bootstrap sample $(X_{\mathrm{pboot}}, Y_{\mathrm{pboot}}) = \{(x^*_i, y^*_i),\ i = 1, \ldots, n\}$ from the empirical distribution of $\{(x_i, y_i),\ i = 1, \ldots, n\}$
2. Compute the paired Bootstrap Lasso (pblasso) estimator
$$\hat\beta^*_{\mathrm{pblasso}} = \arg\min_{\beta} \left\{ \frac{1}{2n}\|Y_{\mathrm{pboot}} - X_{\mathrm{pboot}}\beta\|_2^2 + \lambda_1\|\beta\|_1 \right\},$$
and define $\hat S_{\mathrm{pblasso}} = \{j \in \{1, 2, \ldots, p\} : (\hat\beta^*_{\mathrm{pblasso}})_j \neq 0\}$

The pblpr Algorithm (Cont'd)

3. Compute the paired Bootstrap LPR (pblpr) estimator
$$\hat\beta^*_{\mathrm{pblpr}} = \arg\min_{\beta} \frac{1}{2n}\|Y_{\mathrm{pboot}} - X_{\mathrm{pboot}}\beta\|_2^2 + \frac{\lambda_2}{2}\sum_{j \notin \hat S_{\mathrm{pblasso}}} \beta_j^2$$
4. Repeat steps 1-3 B times to obtain $\hat\beta^{*(1)}_{\mathrm{pblpr}}, \ldots, \hat\beta^{*(B)}_{\mathrm{pblpr}}$
5. For each $j = 1, \ldots, p$, compute the $\alpha/2$ and $(1 - \alpha/2)$ quantiles of $\{(\hat\beta^{*(b)}_{\mathrm{pblpr}})_j,\ b = 1, \ldots, B\}$ and output them as $l_j$ and $u_j$
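
A corresponding R sketch of the pblpr loop, with the same caveats as the rblpr sketch; here the pair $(x_i, y_i)$ is resampled jointly and the percentile interval of step 5 is returned directly.

```r
pblpr_ci <- function(X, Y, B = 500, alpha = 0.05, lambda2 = 1 / nrow(X)) {
  n <- nrow(X)
  boot <- replicate(B, {
    idx  <- sample(n, n, replace = TRUE)                               # step 1
    Xb   <- X[idx, , drop = FALSE]; Yb <- Y[idx]
    cv_b <- cv.glmnet(Xb, Yb, alpha = 1)                               # step 2: pblasso
    S_b  <- which(coef(cv_b, s = "lambda.min")[-1, 1] != 0)
    partial_ridge(Xb, Yb, S_b, lambda2)                                # step 3: pblpr
  })
  t(apply(boot, 1, quantile, probs = c(alpha / 2, 1 - alpha / 2)))     # steps 4-5
}
```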

Theoretical Results

Model Selection Consistency of the Lasso under Cliff-weak-sparsity

Theorem (Model selection consistency of the Lasso). Under the cliff-weak-sparsity and other reasonable conditions, we have
$$P\Big( (\hat\beta_{\mathrm{Lasso}})_S =_s \beta^0_S,\ (\hat\beta_{\mathrm{Lasso}})_{S^c} = 0 \Big) = 1 - o\big(e^{-n^{c_2}}\big) \to 1$$
as $n \to \infty$, where $0 < c_2 < 1$ and $=_s$ denotes equality of signs.

Model Selection Consistency of rblasso

Let $P^*$ denote the conditional probability given the data (X, Y). The following theorem shows that the residual Bootstrap Lasso (rblasso) estimator also has sign consistency under the cliff-weak-sparsity and other appropriate conditions.

Theorem (Model selection consistency of rblasso). Under the cliff-weak-sparsity and other reasonable conditions, the residual Bootstrap Lasso estimator has sign consistency, i.e.,
$$P^*\Big( (\hat\beta^*_{\mathrm{rblasso}})_S =_s \beta^0_S,\ (\hat\beta^*_{\mathrm{rblasso}})_{S^c} = 0 \Big) = 1 - o_p\big(e^{-n^{c_2}}\big) \to 1$$
as $n \to \infty$, where $0 < c_2 < 1$.

Convergence in Distribution

By the above two theorems and under an orthogonality condition on the design matrix X, the residual Bootstrap LPR (rblpr) consistently estimates the distribution of $\hat\beta_{\mathrm{LPR}}$ and hence can be used to construct asymptotically valid confidence intervals for $\beta^0$.

Theorem. Under reasonable conditions and the orthogonality condition on X, for any $u \in \mathbb{R}^p$ with $\|u\|_2 = 1$ and $\max_{1 \le i \le n} |u^T x_i| = o(\sqrt{n})$, we have
$$d(L^*_n, L_n) \xrightarrow{P} 0,$$
where $L^*_n$ is the conditional distribution of $\sqrt{n}\, u^T(\hat\beta^*_{\mathrm{rblpr}} - \hat\beta_{\mathrm{Lasso+OLS}})$ given $\epsilon$, $L_n$ is the distribution of $\sqrt{n}\, u^T(\hat\beta_{\mathrm{LPR}} - \beta^0)$, and $d$ denotes the Kolmogorov-Smirnov distance (sup norm between the distribution functions).

Simulation Results

Simulation Setups: Generative Model 1

We consider two generative models for data simulation.

1. Linear regression model. The simulated data are drawn from the linear model
$$y_i = x_i^T \beta^0 + \epsilon_i, \quad \epsilon_i \sim N(0, \sigma^2), \quad i = 1, \ldots, n.$$
We fix $n = 200$ and $p = 500$. We generate the design matrix X in three scenarios (using the R package mvtnorm).

Simulation Setups: Three Scenarios to Generate X

Scenario 1 (Normal): $x_i \overset{\text{i.i.d.}}{\sim} N(0, \Sigma)$, $i = 1, \ldots, n$. We consider three types of $\Sigma$ [Dezeure et al., 2014]:
- Toeplitz: $\Sigma_{ij} = \rho^{|i-j|}$ with $\rho = 0.5, 0.9$
- Exponential decay: $(\Sigma^{-1})_{ij} = \rho^{|i-j|}$ with $\rho = 0.5, 0.9$
- Equal correlation: $\Sigma_{ij} = \rho$ with $\rho = 0.5, 0.9$

Scenario 2 ($t_2$): $x_i \overset{\text{i.i.d.}}{\sim} t_2(0, \Sigma)$, $i = 1, \ldots, n$, with the Toeplitz matrix $\Sigma_{ij} = \rho^{|i-j|}$, where $\rho = 0.5, 0.9$.

In Scenarios 1 and 2, we choose $\sigma$ such that the Signal-to-Noise Ratio $\mathrm{SNR} = \|X\beta^0\|_2^2/(n\sigma^2) = 10$.

Scenario 3 (fMRI data): the design matrix X is generated by random sampling without replacement from the real design matrix in the functional Magnetic Resonance Imaging (fMRI) data [Kay et al., 2008]. Every column of X is normalized to have zero mean and unit variance, and we choose $\sigma$ such that SNR = 1, 5, or 10.
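
A minimal R sketch of Scenario 1 with a Toeplitz covariance, using the mvtnorm package named on the slide; the seed and the column-centering step are illustrative.

```r
library(mvtnorm)

set.seed(2)
n <- 200; p <- 500; rho <- 0.5
Sigma <- rho^abs(outer(1:p, 1:p, "-"))       # Toeplitz: Sigma_ij = rho^|i-j|
X <- rmvnorm(n, mean = rep(0, p), sigma = Sigma)
X <- scale(X, center = TRUE, scale = FALSE)  # column-center, as in the model setup
```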

Simulation Setups: Two Cases to Generate $\beta^0$

Case 1 (hard sparsity): $\beta^0$ has 10 nonzero elements whose indices are randomly sampled without replacement from $\{1, 2, \ldots, p\}$ and whose values are generated from $U[1/3, 1]$, a uniform distribution on the interval $[1/3, 1]$. The remaining 490 elements are set to 0.

Case 2 (weak sparsity): the setup is similar to that of [Zhang and Zhang, 2014]. $\beta^0$ has 10 large elements whose indices are randomly sampled without replacement from $\{1, 2, \ldots, p\}$ and whose values are generated from a normal distribution $N(1, 0.001)$. The remaining 490 elements decay at the rate $1/(j+3)^2$, i.e., $\beta^0_j = 1/(j+3)^2$.
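
Continuing the previous sketch, the two coefficient settings might be generated as follows; treating 0.001 as the variance of the normal distribution and placing the decaying values at the non-selected positions are interpretive assumptions.

```r
s <- 10
idx <- sample(p, s)                                # positions of the 10 large coefficients

## Case 1 (hard sparsity): 10 values from U[1/3, 1], the rest exactly 0
beta_hard <- rep(0, p)
beta_hard[idx] <- runif(s, min = 1/3, max = 1)

## Case 2 (weak sparsity): 10 values from N(1, 0.001), the rest decay as 1/(j+3)^2
beta_weak <- 1 / (seq_len(p) + 3)^2
beta_weak[idx] <- rnorm(s, mean = 1, sd = sqrt(0.001))
```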

Simulation Setups

- X and $\beta^0$ are generated once and then kept fixed
- We simulate $Y = (y_1, \ldots, y_n)^T$ from the linear model by generating independent error terms, for 1000 replications
- We construct confidence intervals for each individual regression coefficient and compute their coverage probabilities and mean interval lengths

Simulation Setups: Generative Model 2

2. Misspecified linear model. Let X and $Y^f$ denote the design matrix (with $n = 1750$ and $p = 2000$) and the response from the fMRI data set. We first compute the Lasso+OLS estimator $\beta^f_{\mathrm{Lasso+OLS}}$ (selecting the tuning parameter $\lambda_1$ by 5-fold cross-validation on Lasso+OLS):
$$\beta^f_{\mathrm{Lasso}} = \arg\min_{\beta} \left\{ \frac{1}{2n}\|Y^f - X\beta\|_2^2 + \lambda_1\|\beta\|_1 \right\},$$
$$\beta^f_{\mathrm{Lasso+OLS}} = \arg\min_{\beta:\ \beta_j = 0,\ j \notin S} \frac{1}{2n}\|Y^f - X\beta\|_2^2,$$
where $S = \{j : (\beta^f_{\mathrm{Lasso}})_j \neq 0\}$ is the relevant predictor set.

Simulation Setups: Generative Model 2 (Cont'd)

Then we generate the simulated response $Y = (y_1, \ldots, y_n)^T$ from the following model:
$$y_i = E(y_i \mid x_i) + \epsilon_i, \quad \epsilon_i \sim N(0, \sigma^2),$$
where
$$E(y_i \mid x_i) = x_i^T \beta^f_{\mathrm{Lasso+OLS}} + \sum_{j=1}^{4} \alpha_j x_{ij}^2 + \sum_{1 \le j < k \le 4} \alpha_{jk} x_{ij} x_{ik},$$
and $\alpha_j$, $j = 1, \ldots, 4$, and $\alpha_{jk}$, $1 \le j < k \le 4$, are independently generated from a uniform distribution $U(0, 0.1)$.
- The values of the $\alpha_j$'s and $\alpha_{jk}$'s are generated once and then kept fixed
- We set $\sigma$ such that $\mathrm{SNR} = \sum_{i=1}^n E(y_i \mid x_i)^2/(n\sigma^2) = 1, 5$, or 10
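
A minimal R sketch of this response-generating step; `X`, `beta_f` (standing in for $\beta^f_{\mathrm{Lasso+OLS}}$), and `sigma` are assumed to be available from the previous step, and the names are illustrative.

```r
set.seed(3)
alpha_j  <- runif(4, 0, 0.1)                       # quadratic-term coefficients
alpha_jk <- runif(choose(4, 2), 0, 0.1)            # interaction-term coefficients

mu <- drop(X %*% beta_f)                           # linear part x_i' beta_f
for (j in 1:4) mu <- mu + alpha_j[j] * X[, j]^2    # quadratic terms in the first 4 predictors
k <- 0
for (j in 1:3) for (l in (j + 1):4) {              # pairwise interactions, j < l
  k <- k + 1
  mu <- mu + alpha_jk[k] * X[, j] * X[, l]
}
Y_mis <- mu + rnorm(nrow(X), sd = sigma)           # add N(0, sigma^2) noise
```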

Selection of the Partial Ridge Tuning Parameter $\lambda_2$

[Figure 1: The effects of $\lambda_2$ (compared values: 0.1/n, 0.5/n, 1/n, 5/n, 10/n) on coverage probabilities and mean confidence interval lengths. The predictors are generated from a Normal distribution in Scenario 1 with a Toeplitz covariance matrix and $\rho = 0.5$. The coefficient vector $\beta^0$ is hard sparse.]

pblpr (rblpr) vs. pblasso+OLS (rblasso+OLS)

[Figure 2: Coverage probabilities and interval lengths of pblpr, rblpr, pblasso+OLS, and rblasso+OLS under hard and weak sparsity with $\rho = 0.5, 0.9$. The design matrix is generated from a Normal distribution with a Toeplitz-type covariance matrix.]

pblpr vs. De-sparsified Lasso Methods

[Figure 3: Coverage probabilities and interval lengths of pblpr, LDPE, and JM under hard and weak sparsity with $\rho = 0.5, 0.9$. The design matrix is generated from a Normal distribution with a Toeplitz-type covariance matrix.]

pblpr vs. De-sparsified Lasso Methods

[Figure 4: Coverage probabilities and interval lengths of pblpr, LDPE, and JM under hard and weak sparsity with $\rho = 0.5, 0.9$. The design matrix is generated from a Normal distribution with an Equi.corr-type covariance matrix.]

pblpr vs. De-sparsified Lasso Methods

[Figure 5: Coverage probabilities and interval lengths of pblpr, LDPE, and JM under hard and weak sparsity with $\rho = 0.5, 0.9$. The design matrix is generated from a $t_2$ distribution with a Toeplitz-type covariance matrix.]

pblpr vs. De-sparsified Lasso Methods

[Figure 6: Comparison of pblpr, LDPE, and JM for hard sparsity and a Normal design matrix with a Toeplitz-type covariance matrix.]

pblpr vs. De-sparsified Lasso Methods

[Figure 7: Comparison of pblpr, LDPE, and JM for hard sparsity and a Normal design matrix with a Toeplitz-type covariance matrix.]

pblpr vs. De-sparsified Lasso Methods

[Figure 8: Comparison of pblpr, LDPE, and JM for weak sparsity and a Normal design matrix with a Toeplitz-type covariance matrix.]

pblpr vs. De-sparsified Lasso Methods

[Figure 9: Comparison of pblpr, LDPE, and JM for weak sparsity and a Normal design matrix with a Toeplitz-type covariance matrix.]

pblpr vs. De-sparsified Lasso Methods as SNR Changes

[Figure 10: Coverage probabilities and interval lengths of pblpr, LDPE, and JM as SNR varies (0.5, 1, 5, 10), for hard sparsity and a Normal design matrix with a Toeplitz-type covariance matrix.]

pblpr vs. De-sparsified Lasso Methods for the Misspecified Model

[Figure 11: Coverage probabilities and interval lengths of pblpr, LDPE, and JM at SNR = 1, 5, 10. The results are based on data simulated from the misspecified model.]

Real Data Applications

46 fmri Data The 95% confidence intervals constructed by pblpr, LDPE and JM cover 95.8%, 97% and 99.6% of the 500 components of β 0, respectively. pblpr LDPE JM Figure 12: Comparison of interval lengths produced by pblpr, LDPE and JM. The plot is generated using the ninth voxel as the response. 44

Conclusions

Contributions

1. Our proposed Bootstrap LPR method relaxes the beta-min condition required by the Bootstrap Lasso+OLS method.

2. We conduct comprehensive simulation studies to evaluate the finite-sample performance of the Bootstrap LPR method for both sparse linear models and misspecified models. Our main findings include:
- Compared with Bootstrap Lasso+OLS, Bootstrap LPR improves the coverage probabilities of 95% confidence intervals by about 50% on average for small but non-zero regression coefficients, at the price of a 15% heavier computational burden.
- Compared with two de-sparsified Lasso methods, LDPE and JM, Bootstrap LPR has comparably good coverage probabilities for large and small regression coefficients, and in some cases outperforms LDPE and JM by producing confidence intervals with more than 50% shorter interval lengths on average. Moreover, Bootstrap LPR is more than 30% faster than LDPE and JM and is robust to model misspecification.

Contributions (Cont'd)

3. We extend the model selection consistency of the Lasso from the hard sparsity case [Zhao and Yu, 2006, Wainwright, 2009], where the parameter $\beta^0$ is assumed to be exactly sparse ($\beta^0$ has $s$ ($s \ll n$) non-zero elements with absolute values larger than $1/\sqrt{n}$), to the more general cliff-weak-sparsity case. Under the irrepresentable condition and other reasonable conditions, we show that the Lasso correctly selects all the large elements of $\beta^0$ while shrinking all the small elements to zero.

4. We develop an R package HDCI that implements the Bootstrap Lasso, the Bootstrap Lasso+OLS, and our new Bootstrap LPR methods. This package makes these methods easily accessible to practitioners.

Paper

A Bootstrap Lasso + Partial Ridge Method to Construct Confidence Intervals for Parameters in High-dimensional Sparse Linear Models
by Hanzhong Liu, Xin Xu, and Jingyi Jessica Li
jli@stat.ucla.edu

References

Barber, R. F. and Candès, E. J. (2015). Controlling the false discovery rate via knockoffs. The Annals of Statistics, 43.
Berk, R., Brown, L., Buja, A., Zhang, K., and Zhao, L. (2013). Valid post-selection inference. The Annals of Statistics, 41.
Bühlmann, P. and van de Geer, S. (2011). Statistics for High-Dimensional Data: Methods, Theory and Applications. Springer.
Chatterjee, A. and Lahiri, S. N. (2011). Bootstrapping lasso estimators. Journal of the American Statistical Association, 106.
Chatterjee, A. and Lahiri, S. N. (2013). Rates of convergence of the adaptive lasso estimators to the oracle distribution and higher order refinements by the bootstrap. The Annals of Statistics, 41.
Dezeure, R., Bühlmann, P., Meier, L., and Meinshausen, N. (2014). High-dimensional inference: Confidence intervals, p-values and R-software hdi. Statistical Science, 30.
Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96(456).
Fan, J. and Lv, J. (2010). A selective overview of variable selection in high dimensional feature space. Statistica Sinica, 20.
Frank, L. E. and Friedman, J. H. (1993). A statistical view of some chemometrics regression tools. Technometrics, 35(2).
Javanmard, A. and Montanari, A. (2014). Confidence intervals and hypothesis testing for high-dimensional regression. The Journal of Machine Learning Research, 15.
Kay, K. N., Naselaris, T., Prenger, R. J., and Gallant, J. L. (2008). Identifying natural images from human brain activity. Nature, 452.
Knight, K. and Fu, W. J. (2000). Asymptotics for lasso-type estimators. The Annals of Statistics, 28.
Lee, J. D., Sun, D. L., Sun, Y., and Taylor, J. E. (2016). Exact post-selection inference, with application to the lasso. The Annals of Statistics, 44(3).
Liu, H. and Yu, B. (2013). Asymptotic properties of lasso+mls and lasso+ridge in sparse high-dimensional linear regression. Electronic Journal of Statistics, 7.
Minnier, J., Tian, L., and Cai, T. (2009). A perturbation method for inference on regularized regression estimates. Journal of the American Statistical Association, 106.
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society Series B, 58.
Van de Geer, S., Bühlmann, P., Ritov, Y., and Dezeure, R. (2014). On asymptotically optimal confidence regions and tests for high-dimensional models. The Annals of Statistics, 42(3).
Wainwright, M. J. (2009). Sharp thresholds for high-dimensional and noisy sparsity recovery using ℓ1-constrained quadratic programming (Lasso). IEEE Transactions on Information Theory, 55.
Wasserman, L. and Roeder, K. (2009). High dimensional variable selection. The Annals of Statistics, 37.
Zhang, C.-H. and Zhang, S. S. (2014). Confidence intervals for low dimensional parameters in high dimensional linear models. Journal of the Royal Statistical Society Series B, 76(1).
Zhao, P. and Yu, B. (2006). On model selection consistency of lasso. Journal of Machine Learning Research, 7.
