arxiv: v1 [stat.me] 30 Dec 2017


An ISIS screening approach involving threshold/partition for variable selection in linear regression

Yu-Hsiang Cheng and Tzee-Ming Huang and Su-Yun Huang

1. Introduction

In the last ten years, the problem of variable selection has received increasing attention, especially in the analysis of large-scale data sets such as gene data sets (e.g. [2]). In this paper, we consider the variable selection problem in the linear regression model

$$Y = \sum_{j=1}^{p} \beta_j X_j + \varepsilon,$$

where $Y$ is the response variable, $X_1, \ldots, X_p$ are $p$ predictors, and $\varepsilon$ is the error, which has mean zero. Suppose that we have $n$ independent observations for $(Y, X_1, \ldots, X_p)$. When the sample size $n$ is greater than $p$, it is easy to obtain an estimator for $\beta = (\beta_1, \ldots, \beta_p)^T$ using the least squares approach. However, when $p \gg n$, finding an estimator for $\beta$ is a challenging problem since the usual least squares approach fails. To overcome the estimation difficulty in the large $p$, small $n$ case, it is common to assume that only a few predictors are influential in the model ([4], [6]) and then perform variable selection.

Many variable selection methods for the linear regression model can be found in the literature. For some of these methods, variable selection and parameter estimation are performed simultaneously, such as forward regression, selection based on criteria like BIC or AIC, and LASSO and its variants. Alternatively, one can first perform variable screening without estimating the parameter vector $\beta$ to reduce the number of predictors, and then perform usual variable selection

methods. An example of a screening method is SIS (sure independence screening), proposed by Fan and Lv [1], where predictors with large absolute sample correlations with the response variable are selected. Another example, SIRS (sure independent ranking and screening), is proposed by Zhu et al. [5] and can be used for variable selection in many extensions of the linear regression model. When $p$ is very large, it is suitable to include the screening step as a part of the variable selection process so that the selection can be done efficiently.

Fan and Lv [1] also propose an extension of SIS called ISIS (iterative sure independence screening) to improve selection accuracy. In each iterative step of ISIS, SIS is performed to reduce the number of predictors, and then an additional variable selection method, such as adaptive LASSO or SCAD, is applied to the predictors obtained from SIS. Here the screening is based on the absolute sample correlation between a predictor and the residuals obtained from regressing the response on the predictors selected in the previous iteration. The iterative procedure stops when a pre-specified number of predictors has been selected. While ISIS performs much better than SIS according to the simulation results in [1], its implementation requires one to determine the number of predictors to be selected, as well as the tuning parameter(s) in the additional variable selection method, which may take extra effort. For instance, if one would like to use ISIS with LASSO and select the $\lambda$ parameter by cross-validation in each iteration, then this task can be computationally expensive.
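The basic SIS step described above, ranking predictors by their absolute sample correlation with the response and keeping the largest ones, can be sketched as follows. This is only a minimal illustration and not the implementation of [1]: the function names `sample_corr` and `sis_top_d`, the fixed cut-off `d`, and the convention of scoring a constant column as zero are assumptions introduced here.

```python
import math


def sample_corr(x, y):
    """Pearson sample correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0  # assumed convention: a constant column carries no signal
    return sxy / math.sqrt(sxx * syy)


def sis_top_d(columns, y, d):
    """Indices of the d predictors with the largest |corr(X_j, y)|."""
    scores = [abs(sample_corr(col, y)) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:d])


# Tiny usage example: three predictors, four observations.
y = [1.0, 2.0, 3.0, 4.0]
columns = [
    [1.0, 2.0, 3.0, 4.0],    # perfectly correlated with y
    [1.0, -1.0, 1.0, -1.0],  # weakly correlated with y
    [4.0, 3.0, 2.0, 1.5],    # strongly negatively correlated with y
]
print(sis_top_d(columns, y, 2))
```

The need to choose `d` in advance is exactly the tuning burden discussed above; a data-driven threshold removes that choice.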
To resolve this difficulty in the implementation of ISIS, we propose an iterative SIS procedure with a specific screening threshold in each iteration, in which the iteration stops automatically. The threshold in the first iteration is based on the $(1-\alpha)$ quantile $G^{-1}(1-\alpha \mid \tilde{Y}, F_1, \ldots, F_q)$, where $G(\cdot \mid \tilde{Y}, F_1, \ldots, F_q)$ is the cumulative distribution function (CDF) of the maximum of the absolute sample correlations between $\tilde{Y}$, the observation vector for the response variable, and the observations for $q$ variables that are independent of the response variable. The $q$ variables are independent of each other and have CDFs $F_1, \ldots, F_q$ respectively. When $p$ is not too large, we take $q = p$ and take $F_j$ to be $\hat{F}_{X_j}$, the empirical CDF of $X_j$, for $j = 1, \ldots, p$ in the first iteration.

If one uses the quantile $G^{-1}(1-\alpha \mid \tilde{Y}, \hat{F}_{X_1}, \ldots, \hat{F}_{X_p})$ as the screening threshold for SIS, then one compares the effect of a single predictor (measured by its absolute correlation with the response) with the maximum effect of $p$ unimportant predictors. If a predictor has a stronger effect than the maximum effect of unimportant predictors, then the predictor may be important. By using $G^{-1}(1-\alpha \mid \tilde{Y}, \hat{F}_{X_1}, \ldots, \hat{F}_{X_p})$ as the screening threshold, we expect that all unimportant predictors are screened out with probability at least $(1-\alpha)$, so there is a good chance of removing unimportant predictors effectively. In our proposed procedures, we use an approximation of $G^{-1}(1-\alpha \mid \tilde{Y}, \hat{F}_{X_1}, \ldots, \hat{F}_{X_p})$ as the screening threshold with $\alpha = 0.5$ instead of a very small value, so that predictors with weaker association with the response are not easily screened out. Zhu et al. [5] also use a similar approach to determine the threshold for

their SIRS method. They propose a predictor ranking score to measure variable contribution and use a random threshold that is the maximum score for unimportant variables, which include $d$ auxiliary variables that are generated independently of the data. Their $d$ auxiliary variables play the same role as our $q$ variables with CDFs $F_1, \ldots, F_q$ that are independent of the response variable.

When $p$ is very large, the quantile $G^{-1}(1-\alpha \mid \tilde{Y}, \hat{F}_{X_1}, \ldots, \hat{F}_{X_p})$ can be very large since it increases with $p$. As a result, we may fail to detect some important predictors when using an approximation of $G^{-1}(1-\alpha \mid \tilde{Y}, \hat{F}_{X_1}, \ldots, \hat{F}_{X_p})$ as the screening threshold in the first iteration. To solve this problem, we propose the second screening procedure, which includes a partitioning step that divides the predictors into smaller subgroups and performs screening on the subgroups. Simulation results show that the second screening procedure works better than the first one when $p$ is large.

The rest of the paper is organized as follows. In Section 2, we provide a theorem that justifies using $G^{-1}(1-\alpha \mid \tilde{Y}, \hat{F}_{X_1}, \ldots, \hat{F}_{X_p})$ as the screening threshold for SIS when $p$ is not very large, and describe our two screening procedures, which are designed for moderate $p$ and very large $p$ respectively. Simulation results are given in Section 3. The proof of the theorem is given in Section 5.

2. Screening based on maximum absolute correlation

In this section, we first provide in Section 2.1 a theorem that justifies using $G^{-1}(1-\alpha \mid \tilde{Y}, \hat{F}_{X_1}, \ldots, \hat{F}_{X_p})$ as the screening threshold in SIS without iteration. Next, we state in Section 2.2 our first screening procedure, which is an iterative SIS procedure where, in the first iteration, an approximation of $G^{-1}(0.5 \mid \tilde{Y}, \hat{F}_{X_1}, \ldots, \hat{F}_{X_p})$ is used as the screening threshold. Our first screening procedure works for $p = O(n^{\delta})$ for some $\delta \in (0,2)$.
When $p$ is very large, we propose the second screening procedure, which includes an additional partitioning step to divide the predictors into smaller subgroups. The second screening procedure is described in Section 2.3.

2.1. SIS using $G^{-1}(1-\alpha \mid \tilde{Y}, \hat{F}_{X_1}, \ldots, \hat{F}_{X_p})$ as threshold

In this section, we consider the case where $p = O(n^{\delta})$ for some $\delta \in (0,2)$ and all $p$ predictors are independent. We will first state Theorem 1, which justifies the use of $G^{-1}(1-\alpha \mid \tilde{Y}, \hat{F}_{X_1}, \ldots, \hat{F}_{X_p})$ as the screening threshold in SIS without iteration.

Theorem 1. Suppose that

$$\max_{1 \le j \le p} \left( \frac{1}{n} \sum_{i=1}^{n} \left( \frac{X_{i,j} - \mu_{j,n}}{\sigma_{j,n}} \right)^{6} \right) = O_p(1),$$

where $X_{i,j}$ is the $i$-th observation for $X_j$, $\mu_{j,n} = \sum_{i=1}^{n} X_{i,j}/n$ and $\sigma_{j,n}^2 = \sum_{i=1}^{n} (X_{i,j} - \mu_{j,n})^2/n$. Suppose that $p \le Cn^{\delta}$ for some constant $C > 0$ and $0 < \delta < 2$. Then for $t > 0$,

$$G(t \mid \tilde{Y}, \hat{F}_{X_1}, \ldots, \hat{F}_{X_p}) \to 1 \quad \text{in probability as } n \to \infty.$$

The proof of Theorem 1 is given in Section 5. Note that under the conditions in Theorem 1, if the number of important predictors is $O(1)$, then there exists $t > 0$ such that the event that the sample correlations between $\tilde{Y}$ and all important predictors exceed $t$, denoted by $A_n(t)$, occurs with probability tending to one as $n \to \infty$, and when $A_n(t)$ occurs, all the important predictors are selected if $G(t \mid \tilde{Y}, \hat{F}_{X_1}, \ldots, \hat{F}_{X_p}) > 1 - \alpha$. Therefore, under the conditions in Theorem 1, if the number of important predictors is $O(1)$, then all the important predictors are selected with probability tending to one as $n \to \infty$.

To evaluate $G^{-1}(1-\alpha \mid \tilde{Y}, \hat{F}_{X_1}, \ldots, \hat{F}_{X_p})$, we consider two approaches: bootstrap approximation and normal approximation. For the bootstrap approximation, for $j = 1, \ldots, p$, we first obtain $m$ bootstrap samples of size $n$ based on the observations for $X_j$, and compute the sample correlations with $\tilde{Y}$ to obtain $m$ maximum absolute correlations (one maximum over $j$ for each bootstrap replicate). We then compute the $(1-\alpha)$ quantile of the $m$ maximum absolute correlations and denote it by $G^{-1}_{boot}(1-\alpha \mid \tilde{Y}, \hat{F}_{X_1}, \ldots, \hat{F}_{X_p})$, which is used to approximate $G^{-1}(1-\alpha \mid \tilde{Y}, \hat{F}_{X_1}, \ldots, \hat{F}_{X_p})$.

When $n$ is large, we consider using the normal approximation when evaluating $G^{-1}(1-\alpha \mid \tilde{Y}, \hat{F}_{X_1}, \ldots, \hat{F}_{X_p})$. In this case, since the distribution of the sample correlation between an unimportant predictor and $\tilde{Y}$ can be approximated by $N(0, 1/n)$, the resulting approximation for $G^{-1}(1-\alpha \mid \tilde{Y}, \hat{F}_{X_1}, \ldots, \hat{F}_{X_p})$ is given by

$$z(n,p,\alpha) \overset{\mathrm{def}}{=} \frac{1}{\sqrt{n}} \, \Phi^{-1}\!\left( \frac{1 + (1-\alpha)^{1/p}}{2} \right),$$

where $\Phi$ denotes the CDF of $N(0,1)$.
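As a hedged illustration of the normal approximation, the sketch below evaluates the $(1-\alpha)$ quantile of the maximum of $p$ independent $|N(0,1/n)|$ variables by solving $(2\Phi(t\sqrt{n}) - 1)^p = 1 - \alpha$ for $t$. The function name `z_threshold` is introduced here for illustration, and the two-sided form of the quantile is an assumption derived from the definition of $G$ as the CDF of a maximum of absolute correlations. The tail probability $1-(1-\alpha)^{1/p}$ is computed with `expm1`/`log1p` so that it stays accurate when $p$ is very large.

```python
import math
from statistics import NormalDist


def z_threshold(n, p, alpha=0.5):
    """(1 - alpha) quantile of the maximum of p independent |N(0, 1/n)| draws."""
    if n < 1 or p < 1 or not 0.0 < alpha < 1.0:
        raise ValueError("need n >= 1, p >= 1 and 0 < alpha < 1")
    # tail = 1 - (1 - alpha)**(1/p), computed stably even when p is huge.
    tail = -math.expm1(math.log1p(-alpha) / p)
    # Phi^{-1}((1 + (1 - alpha)**(1/p)) / 2) equals -Phi^{-1}(tail / 2).
    return -NormalDist().inv_cdf(tail / 2.0) / math.sqrt(n)


# Thresholds for two (n, p) settings with n >= 200 and alpha = 0.5.
for n, p in [(200, 34000), (300, 75000)]:
    print(n, p, round(z_threshold(n, p), 4))
```

For fixed $n$ the threshold grows with $p$, which is the reason a partitioning step becomes attractive when $p$ is very large.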
Note that under the condition that $p \le Cn^{\delta}$ for $\delta \in (0,2)$, it can be shown that $\lim_{n \to \infty} z(n,p,\alpha) = 0$, so the sample correlation between $Y$ and an important predictor also exceeds $z(n,p,\alpha)$ with probability tending to one as $n \to \infty$, and we do not need to be concerned about screening out important predictors when using $z(n,p,\alpha)$ as the screening threshold.

2.2. Iterative screening when $p$ is $O(n^{\delta})$

When the predictors are correlated, it is possible that some important predictors are only weakly correlated with the response individually, but have a stronger association with the response jointly. In this case, performing iterative screening can help improve the variable selection ability of screening. This point is mentioned in both [1] and [5]. In this section, we describe our first screening procedure, which is an iterative SIS procedure. The algorithm is stated as

follows.

Algorithm 1 (for moderate $p$): Suppose that $X_1, \ldots, X_p$ are predictors and $Y$ is the response. Let $S_{all}$ denote the collection of all $p$ predictors.

(1) Calculate the sample correlation between the observations for $Y$ and those for $X_j$ and denote it by $\rho_j$ for $j = 1, \ldots, p$.
(2) For $j = 1, \ldots, p$, $X_j$ is considered an important predictor if $|\rho_j| >$ approximate $G^{-1}(1-\alpha \mid \tilde{Y}, \hat{F}_{X_1}, \ldots, \hat{F}_{X_p})$, where $\alpha \in (0,1)$ is pre-specified.
(3) Let $S_{sel}$ denote the set of the selected predictors. Let $S_{sel}^{c} = S_{all} \setminus S_{sel}$ be the set of predictors that are not yet selected. Suppose that there are $q$ predictors $Z_1, \ldots, Z_q$ in $S_{sel}^{c}$. Regress $Y$ on all the predictors in $S_{sel}$ based on the observations and let $y_{resid}$ denote the residuals. Calculate the sample correlation between $y_{resid}$ and $Z_j$ and denote it by $\rho_j$ for $j = 1, \ldots, q$.
(4) $Z_j$ is considered an important predictor if $|\rho_j| >$ approximate $G^{-1}(1-\alpha \mid y_{resid}, \hat{F}_{Z_1}, \ldots, \hat{F}_{Z_q})$, and we add $Z_j$ to $S_{sel}$. Here, for $j = 1, \ldots, q$, $\hat{F}_{Z_j}$ denotes the empirical CDF of $Z_j$.
(5) Carry out Steps (3) and (4) until no more important predictors can be obtained or the number of selected predictors is greater than or equal to $n$.
(6) The predictors selected are the predictors in $S_{sel}$.

Note. As mentioned in the previous section, $G^{-1}(1-\alpha \mid \tilde{Y}, \hat{F}_{X_1}, \ldots, \hat{F}_{X_p})$ is approximated using the bootstrap approximation when $n$ is small, and using the normal approximation $z(n,p,\alpha)$ when $n$ is large. The approximation of $G^{-1}(1-\alpha \mid y_{resid}, \hat{F}_{Z_1}, \ldots, \hat{F}_{Z_q})$ is obtained similarly. In our simulation studies, we use the normal approximation for $n \ge 200$ and the bootstrap approximation for $n < 200$.

2.3. Large $p$ case

When $p$ is very large, the screening threshold $G^{-1}(1-\alpha \mid \tilde{Y}, \hat{F}_{X_1}, \ldots, \hat{F}_{X_p})$ or $z(n,p,\alpha)$ can be too large to be appropriate. In this case, we can partition the predictors into smaller subgroups and perform screening on the subgroups.
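Steps (1) to (6) of Algorithm 1 in Section 2.2 can be sketched in code as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: it always uses a normal-approximation threshold (the text reserves this for $n \ge 200$), with the same assumed two-sided form as in the threshold sketch above; it includes an intercept in each regression; and it computes residuals by Gram-Schmidt orthogonalisation to stay within the standard library. The names `corr`, `z_threshold`, `residuals` and `algorithm1`, and the simulated data at the end, are hypothetical.

```python
import math
import random
from statistics import NormalDist


def corr(x, y):
    """Pearson sample correlation; 0.0 if either input is numerically constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx < 1e-12 or syy < 1e-12 else sxy / math.sqrt(sxx * syy)


def z_threshold(n, q, alpha):
    """Normal approximation to the (1 - alpha) quantile of the max |correlation|."""
    tail = -math.expm1(math.log1p(-alpha) / q)
    return -NormalDist().inv_cdf(tail / 2.0) / math.sqrt(n)


def residuals(y, cols):
    """Residuals of y regressed on cols plus an intercept (Gram-Schmidt)."""
    def center(v):
        m = sum(v) / len(v)
        return [a - m for a in v]

    def remove(v, b):
        coef = sum(a * c for a, c in zip(v, b)) / sum(c * c for c in b)
        return [a - coef * c for a, c in zip(v, b)]

    basis = []
    for col in cols:
        v = center(col)
        for b in basis:
            v = remove(v, b)
        if sum(a * a for a in v) > 1e-12:  # skip (near-)collinear columns
            basis.append(v)
    r = center(y)
    for b in basis:
        r = remove(r, b)
    return r


def algorithm1(columns, y, alpha=0.5):
    """Iterative screening with a fresh threshold for the q remaining predictors."""
    n, p = len(y), len(columns)
    selected, resid = [], list(y)  # first pass: correlations with y itself
    while len(selected) < n:       # Step (5): stop once n predictors are selected
        rest = [j for j in range(p) if j not in selected]
        if not rest:
            break
        thr = z_threshold(n, len(rest), alpha)
        new = [j for j in rest if abs(corr(columns[j], resid)) > thr]
        if not new:                # Step (5): no more important predictors
            break
        selected.extend(new)
        resid = residuals(y, [columns[j] for j in selected])
    return sorted(selected)


# Usage on simulated data (hypothetical sizes chosen to keep the run short).
rng = random.Random(0)
n, p = 200, 30
columns = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(p)]
y = [3.0 * columns[0][i] + 3.0 * columns[1][i] + rng.gauss(0.0, 1.0) for i in range(n)]
print(algorithm1(columns, y))
```

Because $\alpha = 0.5$ is deliberately permissive, a few unimportant predictors may be selected alongside the two influential ones in this example; the procedure is a screening step, not a final model.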
Specifically, $\{X_1, \ldots, X_p\}$ will be randomly partitioned several times into $k$ disjoint subsets of sizes $p_1, \ldots, p_k$ respectively, where $p_1, \ldots, p_k$ are approximately equal and each $p_i$ does not exceed $n^{1.99}$ for $i = 1, \ldots, k$. Below is our second iterative screening procedure, where the random partition will be carried out $T$ times. Here we assume $n \ge 200$, and the normal approximation is used when evaluating the screening

threshold.

Algorithm 2 (for large $p$): Suppose that $X_1, \ldots, X_p$ are predictors and $Y$ is the response.

(1) Let $\Omega = \{X_1, \ldots, X_p\}$. For $t = 1, \ldots, T$, carry out the tasks in (a) and (b) below:
  (a) Randomly partition $\Omega$ into $k$ disjoint subsets $\Pi_1, \ldots, \Pi_k$, where $\Pi_i$ is of size $p_i$ for $i = 1, \ldots, k$.
  (b) i. Take $S_{kernel} = \emptyset$, $S_{sel} = \emptyset$ and let $y_{resid}$ be the observations for $Y$.
      ii. For $i = 1, \ldots, k$, let $\Pi_i' = \Pi_i \setminus S_{kernel}$ and let $Z_{i,1}, \ldots, Z_{i,p_i'}$ denote the predictors in $\Pi_i'$. Calculate the sample correlation between $y_{resid}$ and $Z_{i,j}$ and denote it by $\rho_{i,j}$ for $j = 1, \ldots, p_i'$. Let $A_i = \{Z_{i,j} : |\rho_{i,j}| > z(n, p_i', \alpha),\ j \in \{1, \ldots, p_i'\}\}$. Calculate the adjusted R-squared when regressing $Y$ on the predictors in $A_i \cup S_{kernel}$. Add the predictors in $A_i$ to $S_{sel}$.
      iii. Let $A$ be the $A_i$ with the largest adjusted R-squared in Step ii among $A_1, \ldots, A_k$. Add the predictors in $A$ to $S_{kernel}$.
      iv. Take $y_{resid}$ to be the residuals obtained by regressing $Y$ on all the predictors in $S_{kernel}$.
      v. Repeat Steps ii to iv until no more predictors are added to $S_{sel}$, the largest adjusted R-squared does not increase, or the number of predictors in $S_{sel}$ exceeds $n$.
      vi. Take $S_t = S_{sel}$.
(2) The predictors selected are the predictors in $S_1 \cup \cdots \cup S_T$.

3. Simulation studies

In this section, we carry out two sets of simulation studies to check the performance of the screening procedures proposed in Sections 2.2 and 2.3 respectively. Throughout this section, the $p$ predictors $X_1, \ldots, X_p$ are generated from $N(0,1)$ with a specified covariance matrix $\Sigma$, and for each predictor, $n$ observations are generated. Here we consider $\Sigma \in \{\Sigma_1, \Sigma_2, \Sigma_3\}$, where $\Sigma_1$ is an identity matrix and the remaining covariance matrices correspond to two different dependence structures for the predictors, which will be described later.
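The random partition in Step (1)(a) of Algorithm 2 can be sketched as follows. The text only requires approximately equal subset sizes that do not exceed $n^{1.99}$; choosing the smallest such $k$ and dealing shuffled indices round-robin are assumptions made here for illustration, as is the function name `random_partition`.

```python
import math
import random


def random_partition(p, n, rng, exponent=1.99):
    """Randomly split indices 0..p-1 into k near-equal groups of size <= n**exponent."""
    max_size = max(1, math.floor(n ** exponent))
    k = math.ceil(p / max_size)  # assumed rule: smallest k that respects the cap
    shuffled = list(range(p))
    rng.shuffle(shuffled)
    # Dealing indices round-robin makes the group sizes differ by at most one.
    return [sorted(shuffled[i::k]) for i in range(k)]


# Usage: n = 200 and p = 68000, with T = 3 independent random partitions.
rng = random.Random(2017)
partitions = [random_partition(68000, 200, rng) for _ in range(3)]
print([len(group) for group in partitions[0]])
```

Each call draws a fresh partition, so repeating it $T$ times gives every predictor several chances to be screened against a different set of competitors.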
Suppose that $X = (X_1, \ldots, X_p)_{n \times p}$. The response $Y$ is given by

$$Y = X\beta + \varepsilon, \qquad (1)$$

where $\beta^T = (1 + \beta_1, \ldots, 1 + \beta_{10}, 0, \ldots, 0)$ is the coefficient vector and the $\beta_i$'s are generated from the uniform distribution on $[-0.5, 0.5]$. Elements in the error

vector $\varepsilon$ are generated independently from $N(0, \sigma^2)$, where the variance $\sigma^2$ is set such that

$$\frac{E\bigl(\mathrm{Var}(X\beta \mid \beta)\bigr)}{E\bigl(\mathrm{Var}(X\beta \mid \beta)\bigr) + \sigma^2} = r.$$

Note that when $\beta$ is not random, $r$ is the proportion of the variance of $Y$ explained by the linear model in (1), which is used in [5] to indicate the relative noise level in the model.

In order to evaluate the selection accuracy of the result $\hat{S}$ of a given variable selection method, we use the following accuracy measure:

$$\frac{\#(S_I \cap \hat{S})}{\#S_I}, \qquad (2)$$

where $\#S$ denotes the number of elements in a set $S$, $\hat{S}$ is the collection of the variables selected by the selection method, and $S_I$ is the collection of all influential variables. In our simulation studies, $S_I = \{X_1, \ldots, X_{10}\}$ and $\#S_I = 10$.

We will compare the performances of LASSO, adaptive LASSO, ISIS-SCAD and our screening approach. For the LASSO and adaptive LASSO approaches, the $L_1$ penalty is multiplied by a positive parameter $\lambda$, which can be determined using cross-validation. However, in our experience, using cross-validation to choose $\lambda$ for LASSO often leads to choosing too many predictors. In our simulation studies, the penalty parameter $\lambda$ for LASSO/adaptive LASSO is therefore chosen so that the number of predictors selected by LASSO/adaptive LASSO is approximately the same as (or slightly larger than) the number of predictors selected by our method, if our method selects at least 10 predictors. In cases where our method selects fewer than 10 predictors, the penalty parameter $\lambda$ for LASSO/adaptive LASSO is chosen so that the number of predictors selected by LASSO/adaptive LASSO is approximately 10 (or slightly larger). When performing ISIS-SCAD in the following studies, we use the function SIS in the R package SIS, and all default settings are used except that the method for tuning the regularization parameter is SCAD.

3.1. $p$ is $O(n^{\delta})$

In this section, we present simulation results for the case where $p$ is less than $n^2$.
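The accuracy measure in (2) and the calibration of $\sigma^2$ from $r$ described in the simulation set-up can be sketched as follows. The function names, the coding of $S_I = \{X_1, \ldots, X_{10}\}$ as indices 0 to 9, and the example inputs are hypothetical; the signal variance $E(\mathrm{Var}(X\beta \mid \beta))$ is taken as a given input rather than computed.

```python
def selection_accuracy(selected, influential):
    """Accuracy measure (2): share of influential variables that were selected."""
    influential = set(influential)
    if not influential:
        raise ValueError("the set of influential variables must be non-empty")
    return len(influential & set(selected)) / len(influential)


def noise_variance(signal_variance, r):
    """Solve signal / (signal + sigma^2) = r for sigma^2, with 0 < r < 1."""
    if not 0.0 < r < 1.0:
        raise ValueError("r must lie strictly between 0 and 1")
    return signal_variance * (1.0 - r) / r


# Usage: S_I = {X_1, ..., X_10} is coded as indices 0..9.
influential = range(10)
selected = [0, 1, 2, 3, 4, 5, 6, 8, 17, 42]  # hypothetical screening output
print(selection_accuracy(selected, influential))
print(noise_variance(signal_variance=10.0, r=0.8))
```

Selecting more predictors can only increase this accuracy measure, which is why the median number of selected predictors is reported next to it in the tables.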
We carry out Algorithm 1 in Section 2.2 for four combinations of $n$ and $p$: $(n, p) \in \{(100, 2000), (200, 34000), (300, 75000), (400, \,)\}$, where $p$ is close to $n^{1.97}$ in the latter three cases. In the proposed approach, we use different screening thresholds for different sample sizes. When the sample size $n$ is small ($n < 200$), we use the approximate quantile based on the bootstrap as the screening threshold. When $n$ is greater than or equal to 200, we use $z(n,p,\alpha)$ with $\alpha = 0.5$ as the screening threshold. We carry out 500 replications for each combination of $(n,p)$. Below we present results for $\Sigma = I_{p \times p}$, $\Sigma_2$ and $\Sigma_3$ respectively.

We first consider the case $\Sigma = I_{p \times p}$, where all the predictors are independent. Table 1 shows the averages of the accuracy measure in (2) based on 500 trials for

Algorithm 1, LASSO, adaptive LASSO and ISIS-SCAD. The medians of the numbers of selected predictors are also included in parentheses. Based on Table 1, we find that the performance of our Algorithm 1 is better than those of LASSO, adaptive LASSO and ISIS-SCAD for both $r$ values, especially when the sample size $n$ is small.

| r | Method | n = 100 | n = 200 | n = 300 | n = 400 |
|---|---|---|---|---|---|
| | Algorithm 1 | (12) | (12) | 1(12) | 1(12) |
| | LASSO | (12) | (12) | (12) | (12) |
| | ad. LASSO | (12) | (12) | (12) | (12) |
| | ISIS-SCAD | (21) | (37) | (52) | (66) |
| | Algorithm 1 | (13) | 1(12) | 1(11) | 1(12) |
| | LASSO | (13) | (12) | (11) | 1(12) |
| | ad. LASSO | (13) | (12) | (11) | (12) |
| | ISIS-SCAD | (21) | (37) | (52) | 1(66) |

Table 1. Screening results when $\Sigma = I_{p \times p}$

Next, we consider cases where the predictors are dependent. Two different covariance matrices for the predictors, $\Sigma_2$ and $\Sigma_3$, are considered. The matrix $\Sigma_2$ is a $p \times p$ matrix whose $(i,j)$-th element is $0.75^{|i-j|}$ for $1 \le i, j \le p$. The matrix $\Sigma_3$ is a $p \times p$ matrix whose $(i,j)$-th element is $\rho_1$ if $i \le 10$, $j \le 10$; $0.05$ if $11 \le i \le p$, $11 \le j \le p$; and $0$ otherwise. Covariance matrices of the same type as $\Sigma_2$ or $\Sigma_3$ are also considered in [5].

The simulation results of our method, LASSO, adaptive LASSO and ISIS-SCAD when $\Sigma = \Sigma_2$ are presented in Table 2. The results when $\Sigma = \Sigma_3$ with $\rho_1 = 0.5$ and $\rho_1 = 0.3$ are presented in Tables 3 and 4 respectively. Compared with the results for $\Sigma = I_{p \times p}$, all four methods can detect more influential predictors with $\Sigma = \Sigma_2$ or $\Sigma = \Sigma_3$ under the same $r$ level. From the results in Tables 2, 3 and 4, it is clear that our method performs better than LASSO, adaptive LASSO and ISIS-SCAD for both values of $r$. The performances of all four methods improve as $r$ or $n$ increases. It is worth noting that adaptive LASSO outperforms LASSO and ISIS-SCAD when $\Sigma = \Sigma_2$ or $\Sigma = \Sigma_3$ with $\rho_1 = 0.5$, which is quite different from the independent case ($\Sigma = I_{p \times p}$).

3.2. $p$ is large

In this section, the case $p \gg n$ is considered and Algorithm 2 is applied to perform predictor screening.
For comparison purposes, we also present the results for Algorithm 1 and ISIS-SCAD. The data generating processes are the same as

in Section 3.1 but with different parameter values. The covariance matrix $\Sigma$ can be $I_{p \times p}$, $\Sigma_2$ or $\Sigma_3$. We set $r = 0.8$ for $\Sigma = I_{p \times p}$ and $r = 0.3$ for $\Sigma = \Sigma_2$. When $\Sigma = \Sigma_3$, we consider $\rho_1 \in \{0.3, 0.5\}$ and set $r = 0.4$ for $\rho_1 = 0.3$ and $r = 0.3$ for $\rho_1 = 0.5$. The sample size $n$ is 200. Several values for $p$ are considered: $p = 34000 P_0$, where $P_0 \in \{1, 2, 4, 6, 8\}$. Note that here we choose smaller $r$ values than those in the previous simulation studies, because under this set-up we can observe clearly that the screening accuracy decreases as $p$ increases for Algorithm 1 and ISIS-SCAD, and further investigate whether the same pattern occurs for Algorithm 2.

For each $\Sigma$ and each $P_0 \in \{1, 2, 4, 6, 8\}$, we carry out ISIS-SCAD, Algorithm 1 and Algorithm 2 with $T \in \{1, 2, 3\}$ for 100 simulated data sets and compute the corresponding accuracy measures in (2). We also report the medians of the numbers of selected predictors in parentheses. The results are given in Tables 5 to 8.

| r | Method | n = 100 | n = 200 | n = 300 | n = 400 |
|---|---|---|---|---|---|
| | Algorithm 1 | (11) | (11) | (12) | (12) |
| | LASSO | (11) | (11) | (12) | (12) |
| | ad. LASSO | (11) | (11) | (12) | (12) |
| | ISIS-SCAD | (21) | (37) | (52) | (66) |
| | Algorithm 1 | (11) | (11) | (12) | (12) |
| | LASSO | (11) | (11) | (12) | (12) |
| | ad. LASSO | (11) | (11) | (12) | (12) |
| | ISIS-SCAD | (21) | (37) | (52) | (66) |

Table 2. Screening results when $\Sigma = \Sigma_2$

| r | Method | n = 100 | n = 200 | n = 300 | n = 400 |
|---|---|---|---|---|---|
| | Algorithm 1 | (11) | (11) | 1(11) | 1(11) |
| | LASSO | (11) | (11) | (11) | (11) |
| | ad. LASSO | (11) | (11) | (11) | (11) |
| | ISIS-SCAD | (21) | (37) | (52) | (66) |
| | Algorithm 1 | (11) | 1(11) | 1(11) | 1(11) |
| | LASSO | (11) | (11) | (11) | (11) |
| | ad. LASSO | (11) | (11) | (11) | (11) |
| | ISIS-SCAD | (21) | (37) | (52) | (66) |

Table 3. Screening results when $\Sigma = \Sigma_3$ and $\rho_1 = 0.5$

As shown in Tables 5 to 8, for ISIS-SCAD and Algorithm 1, the average screening accuracy tends to decrease as $P_0$ increases. For Algorithm 2 with $T = 3$, the average screening accuracy does not always decrease as $P_0$ increases, and the accuracy is better than that for ISIS-SCAD and Algorithm 1. Moreover, for Algorithm 2 with $T = 3$ and $P_0 \in \{2, 4, 6, 8\}$, the screening accuracy is not

a lot worse than that for Algorithm 1 with $P_0 = 1$, so the performance of Algorithm 2 is very satisfactory in the large $p$ case.

The effect of $T$ is also of interest. Based on the results in Tables 5 to 8, we find that the screening accuracy can be improved by increasing $T$. However, based on results not presented here, we find that this improvement is not significant when $T \ge 3$. Thus we suggest using $T = 3$.

| r | Method | n = 100 | n = 200 | n = 300 | n = 400 |
|---|---|---|---|---|---|
| | Algorithm 1 | (9.5) | (10.5) | (11) | (11) |
| | LASSO | (10) | (11) | (11) | (11) |
| | ad. LASSO | (10) | (10.5) | (11) | (11) |
| | ISIS-SCAD | (21) | (37) | (52) | (66) |
| | Algorithm 1 | (10) | (11) | (11) | 1(11) |
| | LASSO | (10) | (11) | (11) | (11) |
| | ad. LASSO | (10) | (11) | (11) | (11) |
| | ISIS-SCAD | (21) | (37) | (52) | (66) |

Table 4. Screening results when $\Sigma = \Sigma_3$ and $\rho_1 = 0.3$

| n = 200 | P_0 = 1 | P_0 = 2 | P_0 = 4 | P_0 = 6 | P_0 = 8 |
|---|---|---|---|---|---|
| ISIS-SCAD | 0.800(37) | 0.727(37) | 0.677(37) | 0.632(37) | 0.609(37) |
| Algorithm 1 | (11) | 0.829(10) | 0.802(10) | 0.758(10) | 0.740(10) |
| Algorithm 2, T = 1 | | (13) | 0.826(21.5) | 0.768(30) | 0.725(65.5) |
| Algorithm 2, T = 2 | | (16) | 0.866(30) | 0.848(53) | 0.807(117.5) |
| Algorithm 2, T = 3 | | (19.5) | 0.891(38) | 0.873(73) | 0.840(177.5) |

Table 5. Screening results for various $P_0$ and $T$ when $\Sigma = I_{p \times p}$ and $r = 0.8$

| n = 200 | P_0 = 1 | P_0 = 2 | P_0 = 4 | P_0 = 6 | P_0 = 8 |
|---|---|---|---|---|---|
| ISIS-SCAD | 0.558(37) | 0.587(37) | 0.537(37) | 0.541(37) | 0.518(37) |
| Algorithm 1 | (9) | 0.816(9) | 0.757(8) | 0.733(8.5) | 0.705(8) |
| Algorithm 2, T = 1 | | (10) | 0.821(13.5) | 0.814(20) | 0.808(32) |
| Algorithm 2, T = 2 | | (11) | 0.821(16) | 0.814(31) | 0.810(55) |
| Algorithm 2, T = 3 | | (11) | 0.821(21) | 0.814(39) | 0.811(83.5) |

Table 6. Screening results for various $P_0$ and $T$ when $\Sigma = \Sigma_2$ and $r = 0.3$

| n = 200 | P_0 = 1 | P_0 = 2 | P_0 = 4 | P_0 = 6 | P_0 = 8 |
|---|---|---|---|---|---|
| ISIS-SCAD | 0.705(37) | 0.716(37) | 0.688(37) | 0.669(37) | 0.654(37) |
| Algorithm 1 | (10) | 0.918(10) | 0.876(10) | 0.856(10) | 0.862(10) |
| Algorithm 2, T = 1 | | (11) | 0.928(14) | 0.930(20) | 0.946(25) |
| Algorithm 2, T = 2 | | (12.5) | 0.928(16) | 0.931(28) | 0.946(44.5) |
| Algorithm 2, T = 3 | | (13) | 0.928(19) | 0.931(37) | 0.946(61.5) |

Table 7. Screening results for various $P_0$ and $T$ when $\Sigma = \Sigma_3$, $\rho_1 = 0.5$ and $r = 0.3$

| n = 200 | P_0 = 1 | P_0 = 2 | P_0 = 4 | P_0 = 6 | P_0 = 8 |
|---|---|---|---|---|---|
| ISIS-SCAD | 0.848(37) | 0.831(37) | 0.810(37) | 0.797(37) | 0.792(37) |
| Algorithm 1 | (10) | 0.842(10) | 0.792(9) | 0.761(9) | 0.765(9) |
| Algorithm 2, T = 1 | | (11) | 0.877(14) | 0.865(19) | 0.877(29) |
| Algorithm 2, T = 2 | | (12) | 0.877(17) | 0.866(26) | 0.878(45.5) |
| Algorithm 2, T = 3 | | (13) | 0.877(21) | 0.866(34) | 0.879(68.5) |

Table 8. Screening results for various $P_0$ and $T$ when $\Sigma = \Sigma_3$, $\rho_1 = 0.3$ and $r = 0.4$

4. Concluding remarks

In this paper, we consider the problem of predictor screening in linear regression. We propose two iterative SIS-based screening procedures, Algorithm 1 and Algorithm 2, for predictor selection. For Algorithm 1, we provide a screening threshold for SIS in each iteration. The use of the proposed screening threshold is justified in Theorem 1 when $p = O(n^{\delta})$ for some $\delta \in (0,2)$. When $p$ is very large, we include in our procedure a step that partitions the predictors into smaller subgroups to perform screening, which gives our Algorithm 2. Simulation results show that Algorithm 1 works well when $p \approx n^{1.97}$, and that the performance of Algorithm 2 is also satisfactory when $p$ is very large. In particular, Algorithm 2 outperforms Algorithm 1 when $p$ is very large, which shows that including the partitioning step is important.

5. Proof of Theorem 1

In this section, we provide the proof of Theorem 1. Suppose that given $\tilde{Y}$, the variables $Z_{i,j}$, $1 \le i \le n$, $1 \le j \le p$, are independent, and $Z_{1,j}, \ldots, Z_{n,j}$ are IID from $\hat{F}_{X_j}$ for $j = 1, \ldots, p$. Let $\bar{Z}_j = \sum_{i=1}^{n} Z_{i,j}/n$ for $j = 1, \ldots, p$.
Suppose that the observation vector is $\tilde{Y} = (Y_1, \ldots, Y_n)$ and $\bar{Y} = \sum_{i=1}^{n} Y_i/n$. Let $X$ denote the

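$n \times p$ data matrix whose $(i,j)$-th component is $X_{i,j}$. Then

$$\begin{aligned}
G(t \mid \tilde{Y}, \hat{F}_{X_1}, \ldots, \hat{F}_{X_p})
&= P\left( \max_{1 \le j \le p} \frac{\left| \sum_{i=1}^{n} (Z_{i,j} - \bar{Z}_j)(Y_i - \bar{Y}) \right|}{\left( \sum_{i=1}^{n} (Z_{i,j} - \bar{Z}_j)^2 \right)^{1/2} \left( \sum_{i=1}^{n} (Y_i - \bar{Y})^2 \right)^{1/2}} \le t \,\middle|\, X, \tilde{Y} \right) \\
&= \prod_{j=1}^{p} P\left( \frac{\left| \sum_{i=1}^{n} (Z_{i,j} - \mu_{j,n})(Y_i - \bar{Y}) \right|}{\left( \sum_{i=1}^{n} (Z_{i,j} - \bar{Z}_j)^2 \right)^{1/2} \left( \sum_{i=1}^{n} (Y_i - \bar{Y})^2 \right)^{1/2}} \le t \,\middle|\, X, \tilde{Y} \right),
\end{aligned}$$

where for $j = 1, \ldots, p$, $\mu_{j,n} = \sum_{i=1}^{n} X_{i,j}/n$, and

$$\begin{aligned}
& P\left( \frac{\left| \sum_{i=1}^{n} (Z_{i,j} - \mu_{j,n})(Y_i - \bar{Y}) \right|}{\left( \sum_{i=1}^{n} (Z_{i,j} - \bar{Z}_j)^2 \right)^{1/2} \left( \sum_{i=1}^{n} (Y_i - \bar{Y})^2 \right)^{1/2}} \le t \,\middle|\, X, \tilde{Y} \right) \\
&\quad \ge \underbrace{P\left( \frac{\left| \sum_{i=1}^{n} (Z_{i,j} - \mu_{j,n})(Y_i - \bar{Y}) \right|}{\left( n\sigma_{j,n}^2 \sum_{i=1}^{n} (Y_i - \bar{Y})^2 \right)^{1/2}} \le t(1-\delta_1) \,\middle|\, \tilde{Y}, X \right)}_{I_j}
- \underbrace{P\left( \left( \frac{n\sigma_{j,n}^2}{\sum_{i=1}^{n} (Z_{i,j} - \bar{Z}_j)^2} \right)^{1/2} > \frac{1}{1-\delta_1} \,\middle|\, X \right)}_{II_j}
\end{aligned}$$

for $\delta_1 \in (0,1)$ and $\sigma_{j,n}^2 = \sum_{i=1}^{n} (X_{i,j} - \mu_{j,n})^2/n$. Let $\Phi$ denote the CDF of $N(0,1)$. Below we give bounds for $I_j$ and $II_j$ respectively by comparing them with probabilities related to $N(0,1)$.

We will first show that $\max_{1 \le j \le p} II_j = O_p(1/n^2)$. For $\delta_2 \in (0, 1 - (1-\delta_1)^2)$, we have

$$\begin{aligned}
II_j &= P\left( \left( \frac{n\sigma_{j,n}^2}{\sum_{i=1}^{n} (Z_{i,j} - \bar{Z}_j)^2} \right)^{1/2} > \frac{1}{1-\delta_1} \,\middle|\, X \right) \\
&\le P\left( \frac{\sum_{i=1}^{n} (Z_{i,j} - \mu_{j,n})^2}{n\sigma_{j,n}^2} < (1-\delta_1)^2 + \delta_2 \,\middle|\, X \right)
+ P\left( \frac{n(\bar{Z}_j - \mu_{j,n})^2}{n\sigma_{j,n}^2} > \delta_2 \,\middle|\, X \right).
\end{aligned}$$

From Theorem 2 in Osipov and Petrov [3], the right-hand side of this inequality can be expressed as

$$\Phi\left( \frac{\sqrt{n}\left( (1-\delta_1)^2 + \delta_2 - 1 \right)}{\sqrt{v_{j,n}}} \right) + 2\left( 1 - \Phi\left( \sqrt{n\delta_2} \right) \right) + a_{n,j} S_{n,1}$$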

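for $1 \le j \le p$, where

$$v_{j,n} = \mathrm{Var}\left( \frac{(Z_{i,j} - \mu_{j,n})^2}{\sigma_{j,n}^2} \,\middle|\, X \right) = \frac{1}{n} \sum_{i=1}^{n} \left( \frac{(X_{i,j} - \mu_{j,n})^2}{\sigma_{j,n}^2} - 1 \right)^2,$$

$$\max_{1 \le j \le p} |a_{n,j}| = O(1/n^2),$$

$$S_{n,1} = \max\left\{ \max_{1 \le j \le p} \frac{1}{n} \sum_{i=1}^{n} \left| \frac{(X_{i,j} - \mu_{j,n})^2}{\sigma_{j,n}^2} - 1 \right|^3, \ \max_{1 \le j \le p} \frac{1}{n} \sum_{i=1}^{n} \left| \frac{X_{i,j} - \mu_{j,n}}{\sigma_{j,n}} \right|^3 \right\}.$$

Since $\max_{1 \le j \le p} v_{j,n} = O_p(1)$ and $S_{n,1} = O_p(1)$, we have $\max_{1 \le j \le p} II_j = O_p(1/n^2)$.

To control $I_j$, write $W_j = \sum_{i=1}^{n} (Z_{i,j} - \mu_{j,n})(Y_i - \bar{Y}) \big/ \left( n\sigma_{j,n}^2 \sum_{i=1}^{n} (Y_i - \bar{Y})^2 \right)^{1/2}$, so that $I_j = P(|W_j| \le t(1-\delta_1) \mid \tilde{Y}, X)$, and note that

$$\begin{aligned}
& \left| I_j - \left( 1 - 2\Phi\left( -t(1-\delta_1)\sqrt{n} \right) \right) \right| \\
&\quad \le \left| P\left( W_j \le t(1-\delta_1) \,\middle|\, \tilde{Y}, X \right) - \Phi\left( t(1-\delta_1)\sqrt{n} \right) \right|
+ \left| P\left( W_j < -t(1-\delta_1) \,\middle|\, \tilde{Y}, X \right) - \Phi\left( -t(1-\delta_1)\sqrt{n} \right) \right| \\
&\quad \le 2C_1 \, \frac{\sum_{i=1}^{n} |X_{i,j} - \mu_{j,n}|^3/n}{\sigma_{j,n}^3} \cdot \frac{\sum_{i=1}^{n} |Y_i - \bar{Y}|^3}{\left( \sum_{i=1}^{n} |Y_i - \bar{Y}|^2 \right)^{3/2}} \cdot \frac{1}{\left( 1 + \sqrt{n}\, t(1-\delta_1) \right)^3} \\
&\quad \le \frac{2C_1 S_{n,1}}{\sqrt{n}} \cdot \frac{\sum_{i=1}^{n} |Y_i - \bar{Y}|^3/n}{\left( \sum_{i=1}^{n} |Y_i - \bar{Y}|^2/n \right)^{3/2}} \cdot \frac{1}{\left( 1 + \sqrt{n}\, t(1-\delta_1) \right)^3} \qquad (3)
\end{aligned}$$

for some absolute constant $C_1$. Here the inequality involving $C_1$ follows from Theorem 2 in [3]. Note that in (3), the quantity

$$S_{n,1} \, \frac{\sum_{i=1}^{n} |Y_i - \bar{Y}|^3/n}{\left( \sum_{i=1}^{n} (Y_i - \bar{Y})^2/n \right)^{3/2}} = O_p(1),$$

so

$$\max_{1 \le j \le p} \left| I_j - \left( 1 - 2\Phi\left( -t(1-\delta_1)\sqrt{n} \right) \right) \right| = O_p(1/n^2),$$

which, together with the fact that $\max_{1 \le j \le p} II_j = O_p(1/n^2)$ and $p \le Cn^{\delta}$ with $\delta < 2$, implies that

$$G(t \mid \tilde{Y}, \hat{F}_{X_1}, \ldots, \hat{F}_{X_p}) \ge \prod_{j=1}^{p} (I_j - II_j)$$

converges to one in probability as $n \to \infty$. The proof of Theorem 1 is complete.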

References

[1] Jianqing Fan and Jinchi Lv. Sure independence screening for ultrahigh dimensional feature space. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 70(5).
[2] Qianchuan He and Dan-Yu Lin. A variable selection method for genome-wide association studies. Bioinformatics, 27(1):1-8.
[3] L. V. Osipov and V. V. Petrov. On an estimate of the remainder term in the central limit theorem. Theory of Probability and its Applications, 12(2).
[4] Robert Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B, 58.
[5] Li-Ping Zhu, Lexin Li, Runze Li, and Lixing Zhu. Model-free feature screening for ultrahigh dimensional data. Journal of the American Statistical Association, 106(496).
[6] Hui Zou. The adaptive lasso and its oracle properties. Journal of the American Statistical Association, 101, 2006.


More information

Homogeneity Pursuit. Jianqing Fan

Homogeneity Pursuit. Jianqing Fan Jianqing Fan Princeton University with Tracy Ke and Yichao Wu http://www.princeton.edu/ jqfan June 5, 2014 Get my own profile - Help Amazing Follow this author Grace Wahba 9 Followers Follow new articles

More information

Properties of optimizations used in penalized Gaussian likelihood inverse covariance matrix estimation

Properties of optimizations used in penalized Gaussian likelihood inverse covariance matrix estimation Properties of optimizations used in penalized Gaussian likelihood inverse covariance matrix estimation Adam J. Rothman School of Statistics University of Minnesota October 8, 2014, joint work with Liliana

More information

Ranking-Based Variable Selection for high-dimensional data.

Ranking-Based Variable Selection for high-dimensional data. Ranking-Based Variable Selection for high-dimensional data. Rafal Baranowski 1 and Piotr Fryzlewicz 1 1 Department of Statistics, Columbia House, London School of Economics, Houghton Street, London, WC2A

More information

Variable Selection in Restricted Linear Regression Models. Y. Tuaç 1 and O. Arslan 1

Variable Selection in Restricted Linear Regression Models. Y. Tuaç 1 and O. Arslan 1 Variable Selection in Restricted Linear Regression Models Y. Tuaç 1 and O. Arslan 1 Ankara University, Faculty of Science, Department of Statistics, 06100 Ankara/Turkey ytuac@ankara.edu.tr, oarslan@ankara.edu.tr

More information

ISyE 691 Data mining and analytics

ISyE 691 Data mining and analytics ISyE 691 Data mining and analytics Regression Instructor: Prof. Kaibo Liu Department of Industrial and Systems Engineering UW-Madison Email: kliu8@wisc.edu Office: Room 3017 (Mechanical Engineering Building)

More information

Robust Variable Selection Through MAVE

Robust Variable Selection Through MAVE Robust Variable Selection Through MAVE Weixin Yao and Qin Wang Abstract Dimension reduction and variable selection play important roles in high dimensional data analysis. Wang and Yin (2008) proposed sparse

More information

Tuning Parameter Selection in Regularized Estimations of Large Covariance Matrices

Tuning Parameter Selection in Regularized Estimations of Large Covariance Matrices Tuning Parameter Selection in Regularized Estimations of Large Covariance Matrices arxiv:1308.3416v1 [stat.me] 15 Aug 2013 Yixin Fang 1, Binhuan Wang 1, and Yang Feng 2 1 New York University and 2 Columbia

More information

Semi-Penalized Inference with Direct FDR Control

Semi-Penalized Inference with Direct FDR Control Jian Huang University of Iowa April 4, 2016 The problem Consider the linear regression model y = p x jβ j + ε, (1) j=1 where y IR n, x j IR n, ε IR n, and β j is the jth regression coefficient, Here p

More information

Pre-Selection in Cluster Lasso Methods for Correlated Variable Selection in High-Dimensional Linear Models

Pre-Selection in Cluster Lasso Methods for Correlated Variable Selection in High-Dimensional Linear Models Pre-Selection in Cluster Lasso Methods for Correlated Variable Selection in High-Dimensional Linear Models Niharika Gauraha and Swapan Parui Indian Statistical Institute Abstract. We consider variable

More information

Biostatistics-Lecture 16 Model Selection. Ruibin Xi Peking University School of Mathematical Sciences

Biostatistics-Lecture 16 Model Selection. Ruibin Xi Peking University School of Mathematical Sciences Biostatistics-Lecture 16 Model Selection Ruibin Xi Peking University School of Mathematical Sciences Motivating example1 Interested in factors related to the life expectancy (50 US states,1969-71 ) Per

More information

Regression, Ridge Regression, Lasso

Regression, Ridge Regression, Lasso Regression, Ridge Regression, Lasso Fabio G. Cozman - fgcozman@usp.br October 2, 2018 A general definition Regression studies the relationship between a response variable Y and covariates X 1,..., X n.

More information

STAT 992 Paper Review: Sure Independence Screening in Generalized Linear Models with NP-Dimensionality J.Fan and R.Song

STAT 992 Paper Review: Sure Independence Screening in Generalized Linear Models with NP-Dimensionality J.Fan and R.Song STAT 992 Paper Review: Sure Independence Screening in Generalized Linear Models with NP-Dimensionality J.Fan and R.Song Presenter: Jiwei Zhao Department of Statistics University of Wisconsin Madison April

More information

Sparse survival regression

Sparse survival regression Sparse survival regression Anders Gorst-Rasmussen gorst@math.aau.dk Department of Mathematics Aalborg University November 2010 1 / 27 Outline Penalized survival regression The semiparametric additive risk

More information

FEATURE SCREENING IN ULTRAHIGH DIMENSIONAL

FEATURE SCREENING IN ULTRAHIGH DIMENSIONAL Statistica Sinica 26 (2016), 881-901 doi:http://dx.doi.org/10.5705/ss.2014.171 FEATURE SCREENING IN ULTRAHIGH DIMENSIONAL COX S MODEL Guangren Yang 1, Ye Yu 2, Runze Li 2 and Anne Buu 3 1 Jinan University,

More information

A Blockwise Descent Algorithm for Group-penalized Multiresponse and Multinomial Regression

A Blockwise Descent Algorithm for Group-penalized Multiresponse and Multinomial Regression A Blockwise Descent Algorithm for Group-penalized Multiresponse and Multinomial Regression Noah Simon Jerome Friedman Trevor Hastie November 5, 013 Abstract In this paper we purpose a blockwise descent

More information

ASYMPTOTIC PROPERTIES OF BRIDGE ESTIMATORS IN SPARSE HIGH-DIMENSIONAL REGRESSION MODELS

ASYMPTOTIC PROPERTIES OF BRIDGE ESTIMATORS IN SPARSE HIGH-DIMENSIONAL REGRESSION MODELS ASYMPTOTIC PROPERTIES OF BRIDGE ESTIMATORS IN SPARSE HIGH-DIMENSIONAL REGRESSION MODELS Jian Huang 1, Joel L. Horowitz 2, and Shuangge Ma 3 1 Department of Statistics and Actuarial Science, University

More information

Bayesian Grouped Horseshoe Regression with Application to Additive Models

Bayesian Grouped Horseshoe Regression with Application to Additive Models Bayesian Grouped Horseshoe Regression with Application to Additive Models Zemei Xu, Daniel F. Schmidt, Enes Makalic, Guoqi Qian, and John L. Hopper Centre for Epidemiology and Biostatistics, Melbourne

More information

A UNIFIED APPROACH TO MODEL SELECTION AND SPARS. REGULARIZED LEAST SQUARES by Jinchi Lv and Yingying Fan The annals of Statistics (2009)

A UNIFIED APPROACH TO MODEL SELECTION AND SPARS. REGULARIZED LEAST SQUARES by Jinchi Lv and Yingying Fan The annals of Statistics (2009) A UNIFIED APPROACH TO MODEL SELECTION AND SPARSE RECOVERY USING REGULARIZED LEAST SQUARES by Jinchi Lv and Yingying Fan The annals of Statistics (2009) Mar. 19. 2010 Outline 1 2 Sideline information Notations

More information

Shrinkage Tuning Parameter Selection in Precision Matrices Estimation

Shrinkage Tuning Parameter Selection in Precision Matrices Estimation arxiv:0909.1123v1 [stat.me] 7 Sep 2009 Shrinkage Tuning Parameter Selection in Precision Matrices Estimation Heng Lian Division of Mathematical Sciences School of Physical and Mathematical Sciences Nanyang

More information

Iterative Selection Using Orthogonal Regression Techniques

Iterative Selection Using Orthogonal Regression Techniques Iterative Selection Using Orthogonal Regression Techniques Bradley Turnbull 1, Subhashis Ghosal 1 and Hao Helen Zhang 2 1 Department of Statistics, North Carolina State University, Raleigh, NC, USA 2 Department

More information

Feature Screening in Ultrahigh Dimensional Cox s Model

Feature Screening in Ultrahigh Dimensional Cox s Model Feature Screening in Ultrahigh Dimensional Cox s Model Guangren Yang School of Economics, Jinan University, Guangzhou, P.R. China Ye Yu Runze Li Department of Statistics, Penn State Anne Buu Indiana University

More information

Linear model selection and regularization

Linear model selection and regularization Linear model selection and regularization Problems with linear regression with least square 1. Prediction Accuracy: linear regression has low bias but suffer from high variance, especially when n p. It

More information

Single Index Quantile Regression for Heteroscedastic Data

Single Index Quantile Regression for Heteroscedastic Data Single Index Quantile Regression for Heteroscedastic Data E. Christou M. G. Akritas Department of Statistics The Pennsylvania State University SMAC, November 6, 2015 E. Christou, M. G. Akritas (PSU) SIQR

More information

Cross-Validation with Confidence

Cross-Validation with Confidence Cross-Validation with Confidence Jing Lei Department of Statistics, Carnegie Mellon University UMN Statistics Seminar, Mar 30, 2017 Overview Parameter est. Model selection Point est. MLE, M-est.,... Cross-validation

More information

Bayesian linear regression

Bayesian linear regression Bayesian linear regression Linear regression is the basis of most statistical modeling. The model is Y i = X T i β + ε i, where Y i is the continuous response X i = (X i1,..., X ip ) T is the corresponding

More information

Feature Selection for Varying Coefficient Models With Ultrahigh Dimensional Covariates

Feature Selection for Varying Coefficient Models With Ultrahigh Dimensional Covariates Feature Selection for Varying Coefficient Models With Ultrahigh Dimensional Covariates Jingyuan Liu, Runze Li and Rongling Wu Abstract This paper is concerned with feature screening and variable selection

More information

In Search of Desirable Compounds

In Search of Desirable Compounds In Search of Desirable Compounds Adrijo Chakraborty University of Georgia Email: adrijoc@uga.edu Abhyuday Mandal University of Georgia Email: amandal@stat.uga.edu Kjell Johnson Arbor Analytics, LLC Email:

More information

Generalized Elastic Net Regression

Generalized Elastic Net Regression Abstract Generalized Elastic Net Regression Geoffroy MOURET Jean-Jules BRAULT Vahid PARTOVINIA This work presents a variation of the elastic net penalization method. We propose applying a combined l 1

More information

Chapter 1 Statistical Inference

Chapter 1 Statistical Inference Chapter 1 Statistical Inference causal inference To infer causality, you need a randomized experiment (or a huge observational study and lots of outside information). inference to populations Generalizations

More information

A direct formulation for sparse PCA using semidefinite programming

A direct formulation for sparse PCA using semidefinite programming A direct formulation for sparse PCA using semidefinite programming A. d Aspremont, L. El Ghaoui, M. Jordan, G. Lanckriet ORFE, Princeton University & EECS, U.C. Berkeley Available online at www.princeton.edu/~aspremon

More information

Statistical Inference

Statistical Inference Statistical Inference Liu Yang Florida State University October 27, 2016 Liu Yang, Libo Wang (Florida State University) Statistical Inference October 27, 2016 1 / 27 Outline The Bayesian Lasso Trevor Park

More information

Lecture 14: Variable Selection - Beyond LASSO

Lecture 14: Variable Selection - Beyond LASSO Fall, 2017 Extension of LASSO To achieve oracle properties, L q penalty with 0 < q < 1, SCAD penalty (Fan and Li 2001; Zhang et al. 2007). Adaptive LASSO (Zou 2006; Zhang and Lu 2007; Wang et al. 2007)

More information

Ultra High Dimensional Variable Selection with Endogenous Variables

Ultra High Dimensional Variable Selection with Endogenous Variables 1 / 39 Ultra High Dimensional Variable Selection with Endogenous Variables Yuan Liao Princeton University Joint work with Jianqing Fan Job Market Talk January, 2012 2 / 39 Outline 1 Examples of Ultra High

More information

A Confidence Region Approach to Tuning for Variable Selection

A Confidence Region Approach to Tuning for Variable Selection A Confidence Region Approach to Tuning for Variable Selection Funda Gunes and Howard D. Bondell Department of Statistics North Carolina State University Abstract We develop an approach to tuning of penalized

More information

Estimating subgroup specific treatment effects via concave fusion

Estimating subgroup specific treatment effects via concave fusion Estimating subgroup specific treatment effects via concave fusion Jian Huang University of Iowa April 6, 2016 Outline 1 Motivation and the problem 2 The proposed model and approach Concave pairwise fusion

More information

Robust methods and model selection. Garth Tarr September 2015

Robust methods and model selection. Garth Tarr September 2015 Robust methods and model selection Garth Tarr September 2015 Outline 1. The past: robust statistics 2. The present: model selection 3. The future: protein data, meat science, joint modelling, data visualisation

More information

Smoothly Clipped Absolute Deviation (SCAD) for Correlated Variables

Smoothly Clipped Absolute Deviation (SCAD) for Correlated Variables Smoothly Clipped Absolute Deviation (SCAD) for Correlated Variables LIB-MA, FSSM Cadi Ayyad University (Morocco) COMPSTAT 2010 Paris, August 22-27, 2010 Motivations Fan and Li (2001), Zou and Li (2008)

More information

Bayesian variable selection via. Penalized credible regions. Brian Reich, NCSU. Joint work with. Howard Bondell and Ander Wilson

Bayesian variable selection via. Penalized credible regions. Brian Reich, NCSU. Joint work with. Howard Bondell and Ander Wilson Bayesian variable selection via penalized credible regions Brian Reich, NC State Joint work with Howard Bondell and Ander Wilson Brian Reich, NCSU Penalized credible regions 1 Motivation big p, small n

More information

MS-C1620 Statistical inference

MS-C1620 Statistical inference MS-C1620 Statistical inference 10 Linear regression III Joni Virta Department of Mathematics and Systems Analysis School of Science Aalto University Academic year 2018 2019 Period III - IV 1 / 32 Contents

More information

ASYMPTOTIC PROPERTIES OF BRIDGE ESTIMATORS IN SPARSE HIGH-DIMENSIONAL REGRESSION MODELS

ASYMPTOTIC PROPERTIES OF BRIDGE ESTIMATORS IN SPARSE HIGH-DIMENSIONAL REGRESSION MODELS The Annals of Statistics 2008, Vol. 36, No. 2, 587 613 DOI: 10.1214/009053607000000875 Institute of Mathematical Statistics, 2008 ASYMPTOTIC PROPERTIES OF BRIDGE ESTIMATORS IN SPARSE HIGH-DIMENSIONAL REGRESSION

More information

Learning gradients: prescriptive models

Learning gradients: prescriptive models Department of Statistical Science Institute for Genome Sciences & Policy Department of Computer Science Duke University May 11, 2007 Relevant papers Learning Coordinate Covariances via Gradients. Sayan

More information

Permutation-invariant regularization of large covariance matrices. Liza Levina

Permutation-invariant regularization of large covariance matrices. Liza Levina Liza Levina Permutation-invariant covariance regularization 1/42 Permutation-invariant regularization of large covariance matrices Liza Levina Department of Statistics University of Michigan Joint work

More information

ABC random forest for parameter estimation. Jean-Michel Marin

ABC random forest for parameter estimation. Jean-Michel Marin ABC random forest for parameter estimation Jean-Michel Marin Université de Montpellier Institut Montpelliérain Alexander Grothendieck (IMAG) Institut de Biologie Computationnelle (IBC) Labex Numev! joint

More information

WEIGHTED QUANTILE REGRESSION THEORY AND ITS APPLICATION. Abstract

WEIGHTED QUANTILE REGRESSION THEORY AND ITS APPLICATION. Abstract Journal of Data Science,17(1). P. 145-160,2019 DOI:10.6339/JDS.201901_17(1).0007 WEIGHTED QUANTILE REGRESSION THEORY AND ITS APPLICATION Wei Xiong *, Maozai Tian 2 1 School of Statistics, University of

More information

Final Review. Yang Feng. Yang Feng (Columbia University) Final Review 1 / 58

Final Review. Yang Feng.   Yang Feng (Columbia University) Final Review 1 / 58 Final Review Yang Feng http://www.stat.columbia.edu/~yangfeng Yang Feng (Columbia University) Final Review 1 / 58 Outline 1 Multiple Linear Regression (Estimation, Inference) 2 Special Topics for Multiple

More information

Linear Regression Models. Based on Chapter 3 of Hastie, Tibshirani and Friedman

Linear Regression Models. Based on Chapter 3 of Hastie, Tibshirani and Friedman Linear Regression Models Based on Chapter 3 of Hastie, ibshirani and Friedman Linear Regression Models Here the X s might be: p f ( X = " + " 0 j= 1 X j Raw predictor variables (continuous or coded-categorical

More information

An algorithm for the multivariate group lasso with covariance estimation

An algorithm for the multivariate group lasso with covariance estimation An algorithm for the multivariate group lasso with covariance estimation arxiv:1512.05153v1 [stat.co] 16 Dec 2015 Ines Wilms and Christophe Croux Leuven Statistics Research Centre, KU Leuven, Belgium Abstract

More information

An Algorithm for Bayesian Variable Selection in High-dimensional Generalized Linear Models

An Algorithm for Bayesian Variable Selection in High-dimensional Generalized Linear Models Proceedings 59th ISI World Statistics Congress, 25-30 August 2013, Hong Kong (Session CPS023) p.3938 An Algorithm for Bayesian Variable Selection in High-dimensional Generalized Linear Models Vitara Pungpapong

More information

Direct Learning: Linear Regression. Donglin Zeng, Department of Biostatistics, University of North Carolina

Direct Learning: Linear Regression. Donglin Zeng, Department of Biostatistics, University of North Carolina Direct Learning: Linear Regression Parametric learning We consider the core function in the prediction rule to be a parametric function. The most commonly used function is a linear function: squared loss:

More information

Cross-Validation with Confidence

Cross-Validation with Confidence Cross-Validation with Confidence Jing Lei Department of Statistics, Carnegie Mellon University WHOA-PSI Workshop, St Louis, 2017 Quotes from Day 1 and Day 2 Good model or pure model? Occam s razor We really

More information

Selection of Smoothing Parameter for One-Step Sparse Estimates with L q Penalty

Selection of Smoothing Parameter for One-Step Sparse Estimates with L q Penalty Journal of Data Science 9(2011), 549-564 Selection of Smoothing Parameter for One-Step Sparse Estimates with L q Penalty Masaru Kanba and Kanta Naito Shimane University Abstract: This paper discusses the

More information

Sparse Linear Models (10/7/13)

Sparse Linear Models (10/7/13) STA56: Probabilistic machine learning Sparse Linear Models (0/7/) Lecturer: Barbara Engelhardt Scribes: Jiaji Huang, Xin Jiang, Albert Oh Sparsity Sparsity has been a hot topic in statistics and machine

More information

Confidence Intervals for Low-dimensional Parameters with High-dimensional Data

Confidence Intervals for Low-dimensional Parameters with High-dimensional Data Confidence Intervals for Low-dimensional Parameters with High-dimensional Data Cun-Hui Zhang and Stephanie S. Zhang Rutgers University and Columbia University September 14, 2012 Outline Introduction Methodology

More information

Machine Learning for OR & FE

Machine Learning for OR & FE Machine Learning for OR & FE Regression II: Regularization and Shrinkage Methods Martin Haugh Department of Industrial Engineering and Operations Research Columbia University Email: martin.b.haugh@gmail.com

More information

An automatic report for the dataset : affairs

An automatic report for the dataset : affairs An automatic report for the dataset : affairs (A very basic version of) The Automatic Statistician Abstract This is a report analysing the dataset affairs. Three simple strategies for building linear models

More information

ADAPTIVE LASSO FOR SPARSE HIGH-DIMENSIONAL REGRESSION MODELS

ADAPTIVE LASSO FOR SPARSE HIGH-DIMENSIONAL REGRESSION MODELS Statistica Sinica 18(2008), 1603-1618 ADAPTIVE LASSO FOR SPARSE HIGH-DIMENSIONAL REGRESSION MODELS Jian Huang, Shuangge Ma and Cun-Hui Zhang University of Iowa, Yale University and Rutgers University Abstract:

More information

Regression Shrinkage and Selection via the Lasso

Regression Shrinkage and Selection via the Lasso Regression Shrinkage and Selection via the Lasso ROBERT TIBSHIRANI, 1996 Presenter: Guiyun Feng April 27 () 1 / 20 Motivation Estimation in Linear Models: y = β T x + ɛ. data (x i, y i ), i = 1, 2,...,

More information

Grouping Pursuit in regression. Xiaotong Shen

Grouping Pursuit in regression. Xiaotong Shen Grouping Pursuit in regression Xiaotong Shen School of Statistics University of Minnesota Email xshen@stat.umn.edu Joint with Hsin-Cheng Huang (Sinica, Taiwan) Workshop in honor of John Hartigan Innovation

More information

Bi-level feature selection with applications to genetic association

Bi-level feature selection with applications to genetic association Bi-level feature selection with applications to genetic association studies October 15, 2008 Motivation In many applications, biological features possess a grouping structure Categorical variables may

More information

High-dimensional variable selection via tilting

High-dimensional variable selection via tilting High-dimensional variable selection via tilting Haeran Cho and Piotr Fryzlewicz September 2, 2010 Abstract This paper considers variable selection in linear regression models where the number of covariates

More information

Stat 5100 Handout #26: Variations on OLS Linear Regression (Ch. 11, 13)

Stat 5100 Handout #26: Variations on OLS Linear Regression (Ch. 11, 13) Stat 5100 Handout #26: Variations on OLS Linear Regression (Ch. 11, 13) 1. Weighted Least Squares (textbook 11.1) Recall regression model Y = β 0 + β 1 X 1 +... + β p 1 X p 1 + ε in matrix form: (Ch. 5,

More information

Linear Model Selection and Regularization

Linear Model Selection and Regularization Linear Model Selection and Regularization Recall the linear model Y = β 0 + β 1 X 1 + + β p X p + ɛ. In the lectures that follow, we consider some approaches for extending the linear model framework. In

More information

SOLVING NON-CONVEX LASSO TYPE PROBLEMS WITH DC PROGRAMMING. Gilles Gasso, Alain Rakotomamonjy and Stéphane Canu

SOLVING NON-CONVEX LASSO TYPE PROBLEMS WITH DC PROGRAMMING. Gilles Gasso, Alain Rakotomamonjy and Stéphane Canu SOLVING NON-CONVEX LASSO TYPE PROBLEMS WITH DC PROGRAMMING Gilles Gasso, Alain Rakotomamonjy and Stéphane Canu LITIS - EA 48 - INSA/Universite de Rouen Avenue de l Université - 768 Saint-Etienne du Rouvray

More information

The MNet Estimator. Patrick Breheny. Department of Biostatistics Department of Statistics University of Kentucky. August 2, 2010

The MNet Estimator. Patrick Breheny. Department of Biostatistics Department of Statistics University of Kentucky. August 2, 2010 Department of Biostatistics Department of Statistics University of Kentucky August 2, 2010 Joint work with Jian Huang, Shuangge Ma, and Cun-Hui Zhang Penalized regression methods Penalized methods have

More information

A New Combined Approach for Inference in High-Dimensional Regression Models with Correlated Variables

A New Combined Approach for Inference in High-Dimensional Regression Models with Correlated Variables A New Combined Approach for Inference in High-Dimensional Regression Models with Correlated Variables Niharika Gauraha and Swapan Parui Indian Statistical Institute Abstract. We consider the problem of

More information

A New Bayesian Variable Selection Method: The Bayesian Lasso with Pseudo Variables

A New Bayesian Variable Selection Method: The Bayesian Lasso with Pseudo Variables A New Bayesian Variable Selection Method: The Bayesian Lasso with Pseudo Variables Qi Tang (Joint work with Kam-Wah Tsui and Sijian Wang) Department of Statistics University of Wisconsin-Madison Feb. 8,

More information

P-Values for High-Dimensional Regression

P-Values for High-Dimensional Regression P-Values for High-Dimensional Regression Nicolai einshausen Lukas eier Peter Bühlmann November 13, 2008 Abstract Assigning significance in high-dimensional regression is challenging. ost computationally

More information

Prediction & Feature Selection in GLM

Prediction & Feature Selection in GLM Tarigan Statistical Consulting & Coaching statistical-coaching.ch Doctoral Program in Computer Science of the Universities of Fribourg, Geneva, Lausanne, Neuchâtel, Bern and the EPFL Hands-on Data Analysis

More information

High-dimensional covariance estimation based on Gaussian graphical models

High-dimensional covariance estimation based on Gaussian graphical models High-dimensional covariance estimation based on Gaussian graphical models Shuheng Zhou Department of Statistics, The University of Michigan, Ann Arbor IMA workshop on High Dimensional Phenomena Sept. 26,

More information

A Modern Look at Classical Multivariate Techniques

A Modern Look at Classical Multivariate Techniques A Modern Look at Classical Multivariate Techniques Yoonkyung Lee Department of Statistics The Ohio State University March 16-20, 2015 The 13th School of Probability and Statistics CIMAT, Guanajuato, Mexico

More information

Generalized Linear Models and Its Asymptotic Properties

Generalized Linear Models and Its Asymptotic Properties for High Dimensional Generalized Linear Models and Its Asymptotic Properties April 21, 2012 for High Dimensional Generalized L Abstract Literature Review In this talk, we present a new prior setting for

More information

Feature selection with high-dimensional data: criteria and Proc. Procedures

Feature selection with high-dimensional data: criteria and Proc. Procedures Feature selection with high-dimensional data: criteria and Procedures Zehua Chen Department of Statistics & Applied Probability National University of Singapore Conference in Honour of Grace Wahba, June

More information

Data analysis strategies for high dimensional social science data M3 Conference May 2013

Data analysis strategies for high dimensional social science data M3 Conference May 2013 Data analysis strategies for high dimensional social science data M3 Conference May 2013 W. Holmes Finch, Maria Hernández Finch, David E. McIntosh, & Lauren E. Moss Ball State University High dimensional

More information

High-dimensional regression with unknown variance

High-dimensional regression with unknown variance High-dimensional regression with unknown variance Christophe Giraud Ecole Polytechnique march 2012 Setting Gaussian regression with unknown variance: Y i = f i + ε i with ε i i.i.d. N (0, σ 2 ) f = (f

More information

Introduction The framework Bias and variance Approximate computation of leverage Empirical evaluation Discussion of sampling approach in big data

Introduction The framework Bias and variance Approximate computation of leverage Empirical evaluation Discussion of sampling approach in big data Discussion of sampling approach in big data Big data discussion group at MSCS of UIC Outline 1 Introduction 2 The framework 3 Bias and variance 4 Approximate computation of leverage 5 Empirical evaluation

More information

Regression tree methods for subgroup identification I

Regression tree methods for subgroup identification I Regression tree methods for subgroup identification I Xu He Academy of Mathematics and Systems Science, Chinese Academy of Sciences March 25, 2014 Xu He (AMSS, CAS) March 25, 2014 1 / 34 Outline The problem

More information

Inference For High Dimensional M-estimates. Fixed Design Results

Inference For High Dimensional M-estimates. Fixed Design Results : Fixed Design Results Lihua Lei Advisors: Peter J. Bickel, Michael I. Jordan joint work with Peter J. Bickel and Noureddine El Karoui Dec. 8, 2016 1/57 Table of Contents 1 Background 2 Main Results and

More information

Statistica Sinica Preprint No: SS R3

Statistica Sinica Preprint No: SS R3 Statistica Sinica Preprint No: SS-2015-0413.R3 Title Regularization after retention in ultrahigh dimensional linear regression models Manuscript ID SS-2015-0413.R3 URL http://www.stat.sinica.edu.tw/statistica/

More information

Regularization and Variable Selection via the Elastic Net

Regularization and Variable Selection via the Elastic Net p. 1/1 Regularization and Variable Selection via the Elastic Net Hui Zou and Trevor Hastie Journal of Royal Statistical Society, B, 2005 Presenter: Minhua Chen, Nov. 07, 2008 p. 2/1 Agenda Introduction

More information

The picasso Package for Nonconvex Regularized M-estimation in High Dimensions in R

The picasso Package for Nonconvex Regularized M-estimation in High Dimensions in R The picasso Package for Nonconvex Regularized M-estimation in High Dimensions in R Xingguo Li Tuo Zhao Tong Zhang Han Liu Abstract We describe an R package named picasso, which implements a unified framework

More information

An efficient ADMM algorithm for high dimensional precision matrix estimation via penalized quadratic loss

An efficient ADMM algorithm for high dimensional precision matrix estimation via penalized quadratic loss An efficient ADMM algorithm for high dimensional precision matrix estimation via penalized quadratic loss arxiv:1811.04545v1 [stat.co] 12 Nov 2018 Cheng Wang School of Mathematical Sciences, Shanghai Jiao

More information

ON VARYING-COEFFICIENT INDEPENDENCE SCREENING FOR HIGH-DIMENSIONAL VARYING-COEFFICIENT MODELS

ON VARYING-COEFFICIENT INDEPENDENCE SCREENING FOR HIGH-DIMENSIONAL VARYING-COEFFICIENT MODELS Statistica Sinica 24 (214), 173-172 doi:http://dx.doi.org/1.7/ss.212.299 ON VARYING-COEFFICIENT INDEPENDENCE SCREENING FOR HIGH-DIMENSIONAL VARYING-COEFFICIENT MODELS Rui Song 1, Feng Yi 2 and Hui Zou

More information

MSA220/MVE440 Statistical Learning for Big Data

MSA220/MVE440 Statistical Learning for Big Data MSA220/MVE440 Statistical Learning for Big Data Lecture 9-10 - High-dimensional regression Rebecka Jörnsten Mathematical Sciences University of Gothenburg and Chalmers University of Technology Recap from

More information

Lecture 6: Linear Regression (continued)

Lecture 6: Linear Regression (continued) Lecture 6: Linear Regression (continued) Reading: Sections 3.1-3.3 STATS 202: Data mining and analysis October 6, 2017 1 / 23 Multiple linear regression Y = β 0 + β 1 X 1 + + β p X p + ε Y ε N (0, σ) i.i.d.

More information