
Journal of Statistical Computation and Simulation, iFirst, 2011, 1-12

Group variable selection for data with dependent structures

Lingmin Zeng (a) and Jun Xie (b)

(a) Department of Biostatistics, MedImmune, One MedImmune Way, Gaithersburg, MD 20878, USA; (b) Department of Statistics, Purdue University, 250 N. University Street, West Lafayette, IN 47907, USA. Corresponding author: junxie@purdue.edu

(Received 19 April 2010; final version received 4 October 2010)

Variable selection methods have been widely used in the analysis of high-dimensional data, for example, gene expression microarray data and single nucleotide polymorphism data. A special feature of genomic data is that genes participating in a common metabolic pathway or sharing a similar biological function tend to have high correlations. The collinearity naturally embedded in these data requires special handling, which existing variable selection methods do not provide. In this paper, we propose a set of new methods to select variables in correlated data. The new methods follow the forward selection procedure of least angle regression (LARS) but conduct grouping and selecting at the same time. The methods are particularly suited to the case where no prior information on the group structure of the data is available. Simulations and real examples show that our proposed methods often outperform the existing variable selection methods, including LARS and elastic net, in terms of both reducing prediction error and preserving sparsity of representation.

Keywords: hard thresholding; least angle regression; variable selection

1. Introduction

Regression is a simple yet highly useful statistical method in data analysis. The goal of regression analysis is to discover the relationship between a response y and a set of predictors x_1, x_2, ..., x_p. Variable selection methods have long been used in regression analysis, for example forward selection, backward elimination, and best subset regression. The number of variables p in the traditional setting is typically 10 or at most a few dozen. Modern scientific technology, led by the microarray, has produced data dramatically above the conventional scale: p ranges from 1000 to 10,000 in gene expression microarray data, and up to 500,000 in single nucleotide polymorphism (SNP) data. To make things more complicated, the large numbers of variables in these biological data are dependent. For example, in the analysis of gene expression microarray data, it is well known that for genes that share a common biological function or participate in the same metabolic pathway, the pairwise correlations among them can be very high [1]. Traditional variable selection methods that select variables one by one may therefore miss important group effects on pathways.

Consequently, when traditional variable selection methods are applied to multiple data sets from a common biological system, the selected variables from the multiple studies may show little overlap. The vast dimension of genomic data poses a challenge to the traditional variable selection methods. Several promising techniques have been proposed to overcome the dimension limitation, including Lasso [2], LARS [3], and elastic net [4]. Lasso (least absolute shrinkage and selection operator) imposes an L1-norm bound on the regression coefficients while minimizing the residual sum of squares. Due to the nature of the L1 penalty, Lasso shrinks the regression coefficients towards 0 and produces some coefficients that are exactly 0, and hence implements variable selection. Efron et al. [3] proposed a less greedy forward variable selection algorithm named least angle regression (LARS). With a slight modification, LARS calculates all possible Lasso estimators but requires an order of magnitude less computing time than Lasso. The efficiency of the LARS algorithm has made it widely used for variable selection. However, for highly correlated variables, Lasso tends to select only one variable instead of the whole group. Zou and Hastie [4] proposed the elastic net method by combining L2 and L1 penalties on the regression coefficients. The convex function induced by the L2 penalty helps elastic net achieve a grouping effect in which highly correlated variables are in or out of the model together. More specifically, elastic net estimators of the regression coefficients of highly correlated variables tend to be equal (up to a change of sign). Analogously, Bondell and Reich [5] proposed a penalized regression method called OSCAR, with an octagonal-shaped constraint on the coefficients, to encourage selection of a group of highly correlated variables. Elastic net often outperforms Lasso in terms of prediction error on correlated data, and OSCAR has a comparable performance with elastic net [5]. However, it is unclear how groups are defined in either method.

Alternatively, for data with dependent structures, a naive approach is a two-step procedure: first cluster highly correlated variables into groups, then select among the groups. Park et al. [6] applied hierarchical clustering on gene expression data in the first step; for each cluster (group), they then averaged the genes and took the average as a supergene to fit a regression model. Yuan and Lin [7] developed group LARS and group Lasso, which extend LARS and Lasso to select groups of variables. However, group LARS (group Lasso) requires information on the group structure beforehand, so the method is in fact the second step of a two-step procedure. Another method is the supervised group Lasso proposed by Ma et al. [8], which is also a two-step procedure: after dividing genes into clusters using the K-means method, they first identify important genes within each group by Lasso and then select important clusters using the group Lasso. A limitation of the two-step procedures is that the first step of clustering analysis does not incorporate the information of the response variable y. In other words, grouping variables based on their correlations alone is not appropriate for the variable selection problem, which requires the joint information of the predictors and the response variable.

In this paper, we propose two new variable selection algorithms, glars and gridge, for data with dependent structures. Our methods conduct grouping and selecting at the same time and therefore work well when prior information on the group structure is not available. Similar to LARS, the proposed methods are forward procedures, which are computationally efficient and provide a piecewise linear solution path. In Section 2, we define a variable selection criterion in two-dimensional space and then specify the glars and gridge algorithms. We compare the proposed methods with ordinary least squares (OLS), ridge regression, LARS, and elastic net through simulations in Section 3. We study an example on prostate cancer in Section 4. In Section 5, we apply the methods to an example of genetic variation SNP data, which contains a few thousand variables (SNPs) with complicated dependence. We summarize and discuss future work in Section 6.

2. glars and gridge algorithms

Consider a linear regression model
$y = X\beta + \epsilon$,
where $y = (y_1, y_2, \ldots, y_n)^T$ is the response variable, $X = (x_1, x_2, \ldots, x_p)$ is the predictor matrix, and $\epsilon = (\epsilon_1, \epsilon_2, \ldots, \epsilon_n)^T$ is a vector of independent and identically distributed random errors with mean 0 and variance $\sigma^2$. There are n observations and p predictors, where p may be larger than n. We centre the response variable and standardize the column vectors of the predictor matrix, so there is no intercept in our model:

$\sum_{i=1}^{n} y_i = 0, \quad \sum_{i=1}^{n} x_{ij} = 0, \quad \sum_{i=1}^{n} x_{ij}^2 = 1 \quad \text{for } j = 1, 2, \ldots, p.$

2.1. Grouping definition

We start our methods with a grouping definition. Predictors form a group if they satisfy both of the following two criteria:
- they are highly correlated with the response variable (or the current residual);
- they are highly correlated with each other.

The correlation thresholds for the two criteria are data driven. The first threshold is suggested to be the 75th percentile of all correlations (in absolute value) between the current residual and the unselected predictors. The second criterion decides the group size; its threshold is chosen from a set of grids, for instance {0.9, 0.8, 0.7, 0.6}, by cross-validation (CV). In practice, the distribution of all pairwise correlations among the predictors gives an idea of the set of grids. We then decide on the threshold by CV, in addition to the tuning parameters introduced in Section 2.4. An important difference of the proposed methods from traditional selection procedures is that our variable selection criterion has two components and hence is defined by a region in two-dimensional space, i.e. $(t_1, t_2)$ in the algorithm specified next. In addition, the first requirement of selecting a variable highly correlated with the response variable is not affected by collinearity among the predictors.

2.2. glars algorithm

Before introducing the new glars algorithm, we first briefly review the LARS procedure. At the beginning of LARS, a predictor enters the model if its absolute correlation with the response is the largest among all predictors. The coefficient of this predictor grows in its OLS direction until another predictor has the same correlation with the current residual (i.e. equal angle). Next, the coefficients of both predictors move along their joint OLS direction until a third predictor has the same correlation with the current residual as the first two. The whole process continues until all predictors have entered the model. In each step, one variable is added to the model and the solution paths, which are the coefficient estimators as functions of the tuning parameter (defined later in Formula (1)), are extended in a piecewise linear fashion. After all variables enter the model, the whole LARS solution path is complete.

In the glars algorithm, we start as LARS does by selecting the predictor that has the largest correlation with the response. We call this predictor a leader element. We then build a group based on this leader element and the current residual according to the grouping definition. Note that both criteria in Section 2.1 have to be satisfied when selecting a variable. A minimal sketch of this group construction step is given below.
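The following Python sketch illustrates the grouping definition under our own naming conventions (functions and arguments such as `build_group`, `t1_quantile`, and `t2` are ours, not from the paper): given the current residual, it picks the leader predictor and collects the predictors that pass both correlation thresholds.

```python
import numpy as np

def build_group(X, residual, inactive, t2, t1_quantile=0.75):
    """Sketch of the grouping definition: X is n x p with standardized
    columns, residual is the current residual r^[k-1], inactive is a list
    of candidate column indices, t2 is the within-group correlation
    threshold chosen by cross-validation."""
    # Absolute inner products of each inactive predictor with the residual
    # (proportional to correlations, since columns are standardized).
    cors = np.abs(X[:, inactive].T @ residual)
    # Leader element: largest absolute correlation with the residual.
    leader = inactive[int(np.argmax(cors))]
    # First threshold: 75th percentile of the residual correlations.
    t1 = np.quantile(cors, t1_quantile)
    group = [leader]
    for j, c in zip(inactive, cors):
        if j == leader:
            continue
        # Both criteria: correlated with the residual and with the leader.
        if c > t1 and abs(X[:, j] @ X[:, leader]) > t2:
            group.append(j)
    return group
```

The thresholds mirror the paper's suggestions: the 75th percentile for t1 and a grid such as {0.9, 0.8, 0.7, 0.6} for t2.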

Once a group has been constructed, it is represented by a unique direction in $R^n$, namely the linear combination of the OLS directions of all variables in the group. Next, we choose another leader element, analogous to the equal-angle requirement of the LARS algorithm. A new group is formed, again following the grouping definition. We extend the solution paths in a piecewise linear format. The whole process continues until all predictors enter the model. The detailed algorithm is described below. Note that in Step 5 we adopt the idea of Park and Hastie [9] and replace the average squared correlation by the average absolute correlation to make the algorithm robust.

(1) Initialization: set the step index k = 1, $\beta^{[0]} = 0$, residual $r^{[0]} = Y$, active set $A_0 = \emptyset$, inactive set $A_0^C = \{X_1, X_2, \ldots, X_p\}$.

(2) Identify the leader predictor $x^*$ for the first group, where $x^* = \arg\max_{x_i} |x_i' r^{[k-1]}|$, $x_i \in A_{k-1}^C$.

(3) Construct the group $G_k$ with the leader predictor $x^*$ from Step 2 according to the two criteria: $|x_j' r^{[k-1]}| > t_1$ and $|x_j' x^*| > t_2$, $x_j \in A_{k-1}^C$, where $t_1$ is the 75th percentile of all correlations between $x_j$ and $r^{[k-1]}$, and $t_2 = t \in \{0.9, 0.8, 0.7, 0.6\}$. Set $A_k = A_{k-1} \cup G_k$, $A_k^C = A_{k-1}^C \setminus G_k$.

(4) Compute the current direction $\gamma$ with components $\gamma_{A_k} = (X_{A_k}' X_{A_k})^{-1} X_{A_k}' r^{[k-1]}$ and $\gamma_{A_k^C} = 0$, where $X_{A_k}$ denotes the matrix comprised of the columns of X corresponding to $A_k$.

(5) Calculate how far the glars algorithm progresses in direction $\gamma$. This divides into two small steps:
Find $x_j$ in $A_k^C$ which corresponds to the smallest $\alpha \in (0, 1]$ such that
$\|X_{G_j}'(r^{[k-1]} - \alpha X\gamma)\|_{L_1} / p_j = |x_j'(r^{[k-1]} - \alpha X\gamma)|$,
where $G_j$ is a group from $A_k$, $p_j$ is the number of variables in group $G_j$, and $\|\cdot\|_{L_1}$ represents the sum of absolute values.
As in Step 3, find the group with the leader predictor $x_j$ selected above and denote it by $X_{G_{j^*}}$. Recalculate $\alpha \in (0, 1]$ for this new group such that
$\|X_{G_j}'(r^{[k-1]} - \alpha X\gamma)\|_{L_1} / p_j = \|X_{G_{j^*}}'(r^{[k-1]} - \alpha X\gamma)\|_{L_1} / p_{j^*}$.
Update $\beta^{[k]} = \beta^{[k-1]} + \alpha\gamma$, $r^{[k]} = Y - X\beta^{[k]}$.

(6) Update k to k + 1, and set $A_k = A_{k-1} \cup G_{j^*}$, $A_k^C = A_{k-1}^C \setminus G_{j^*}$.

(7) If the number of variables in $A_k$ is smaller than min(p, n), return to Step 4. Otherwise, set $\gamma$, $\beta$, and r to be the OLS solutions and stop.
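As a concrete illustration, here is a hedged Python sketch of the direction computation and step-length search in Steps 4 and 5, under our own simplifications (a discrete grid search over alpha instead of the exact recalculation in Step 5; names such as `glars_step` are ours):

```python
import numpy as np

def avg_abs_cor(Xg, resid):
    """Average absolute inner product of a set of columns with a residual."""
    return np.mean(np.abs(Xg.T @ resid))

def glars_step(X, y, beta, active_groups, inactive,
               alphas=np.linspace(0.01, 1.0, 100)):
    """One glars iteration (sketch): move along the joint OLS direction of
    the active set until some inactive predictor catches up with the
    smallest active-group average absolute correlation with the residual."""
    active = [j for g in active_groups for j in g]
    resid = y - X @ beta
    # Step 4: OLS direction restricted to the active columns.
    gamma = np.zeros(X.shape[1])
    XA = X[:, active]
    gamma[active] = np.linalg.lstsq(XA, resid, rcond=None)[0]
    # Step 5 (grid version): find a small alpha at which an inactive
    # predictor matches some active group's average absolute correlation.
    for alpha in alphas:
        r_new = resid - alpha * (X @ gamma)
        group_level = min(avg_abs_cor(X[:, g], r_new) for g in active_groups)
        outside = np.abs(X[:, inactive].T @ r_new)
        if outside.max() >= group_level or alpha == alphas[-1]:
            leader = inactive[int(np.argmax(outside))]
            return beta + alpha * gamma, leader, alpha
```

The published algorithm then forms the new group around this leader and re-solves for alpha exactly; the grid search above merely conveys the idea.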

2.3. gridge algorithm

OLS performs poorly when the correlations among the predictors are high and/or the noise level is high. Since both LARS and glars move towards OLS directions in each step, they share this limitation. Ridge estimators, on the other hand, perform better in this situation. We therefore propose a gridge algorithm, which moves towards the ridge estimator direction in each step. The relationship between the ridge estimator $\hat\beta(\lambda)$ and the OLS estimator $\hat\beta$ satisfies

$\hat\beta(\lambda) = (X'X + \lambda I)^{-1} X'Y = (X'X(I + \lambda(X'X)^{-1}))^{-1} X'Y = (I + \lambda(X'X)^{-1})^{-1} \hat\beta = C\hat\beta,$

where $C = (I + \lambda(X'X)^{-1})^{-1}$ and $\lambda$ is the ridge parameter. The gridge algorithm is thus a simple modification of the glars algorithm. When a group is constructed, gridge represents the group by a unique direction obtained from the linear combination of the ridge directions of all variables in the group, and the variable coefficients move towards the ridge directions.

In our simulations, we notice that gridge outperforms other methods in terms of relative prediction errors (RPEs, defined in the simulations below). However, the method is limited by a relatively large number of false positives due to an over-grouping effect. In other words, although ridge shrinks small coefficients, it does not truncate them to zero for variable selection. We propose to add a hard threshold $\delta$ to the gridge estimators so that small (but nonzero) coefficients are removed, i.e. $\tilde\beta_j = \hat\beta_j I(|\hat\beta_j| > \delta)$. Based on simulations, we define the threshold as $\delta = \sigma\sqrt{\log(p)/n}$. Hence a smaller error term, a smaller number of predictors, or a larger sample size gives a smaller threshold. We name the modified gridge algorithm, with this hard threshold filtering, gridge_new. The simulation studies below show that gridge_new not only preserves low prediction errors but also greatly reduces false positives.

2.4. Tuning parameters

As LARS does, both glars and gridge produce the entire piecewise linear solution paths with all p (or n, when p > n) variables in the model. Groups of variables are selected when we stop the paths after a certain number of steps. The number of steps k is the tuning parameter. Equivalently, we may use as the tuning parameter the fraction of the L1 norm of the coefficients,

$s = \sum_{j \in \text{selected}} |\hat\beta_j| \Big/ \sum_{j} |\hat\beta_j|. \qquad (1)$

For glars, s (or k) is the only tuning parameter. We determine it by a standard five-fold CV. For gridge, there are two tuning parameters, the ridge parameter $\lambda$ in addition to s (or k). Hence we cross-validate over two dimensions: we first choose a grid for $\lambda$, say {0.01, 0.1, 1, 10, 100, 1000}; the gridge algorithm produces the entire solution paths for each $\lambda$; the parameter s (or k) is then selected by five-fold CV; at the end, we choose the $\lambda$ value that gives the smallest CV error.

3. Simulation studies

Simulation studies are used to compare the proposed glars and gridge with OLS, ridge regression, LARS, and elastic net. The simulated data are generated from the true model $y = X\beta + \sigma\epsilon$, $\epsilon \sim N(0, 1)$. Four examples, covering different scenarios, are presented in the following. Each example contains a training set and a test set. The tuning parameters are selected on the training set by five-fold CV. Variable selection methods are compared in terms of RPE [10] and selection accuracy on the test set. The RPE is defined as $\mathrm{RPE} = (\hat\beta - \beta)^T \Sigma (\hat\beta - \beta)/\sigma^2$, where $\Sigma$ is the population covariance matrix of X. Selection accuracy is illustrated by the numbers of true and false positives among the selected variables.
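The hard-threshold filter and the L1-fraction tuning parameter are simple to express; a minimal Python sketch follows, mirroring the formulas as written above (the helper names and the assumption that the coefficient estimates and noise level are available as `beta_hat` and `sigma` are ours):

```python
import numpy as np

def hard_threshold(beta_hat, sigma, n):
    """gridge_new filter (sketch): zero out coefficients whose magnitude
    falls below delta = sigma * sqrt(log(p) / n)."""
    p = len(beta_hat)
    delta = sigma * np.sqrt(np.log(p) / n)
    return np.where(np.abs(beta_hat) > delta, beta_hat, 0.0)

def l1_fraction(beta_hat, selected):
    """Tuning parameter s of Formula (1): the fraction of the L1 norm
    carried by the currently selected coefficients."""
    return np.sum(np.abs(beta_hat[selected])) / np.sum(np.abs(beta_hat))
```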

In each example, we replicate the simulation 100 times and report the median RPEs and median true/false positives. The four scenarios are as follows; a data-generation sketch for the grouped-covariate examples is given after this list.

(1) Example 1: there are 100 and 200 observations in the training and test sets, respectively. The true parameters are $\beta = (3, 1.5, 0, 0, 2, 0, 0, 0)$ and $\sigma = 3$. The pairwise correlation between $x_i$ and $x_j$ is set to be $\mathrm{corr}(x_i, x_j) = 0.5^{|i-j|}$, i, j = 1, ..., 8. This example creates a sparse model with a few large effects, and the predictors have first-order autoregressive correlations.

(2) Example 2 (adopted from [11]): we simulate 100 and 400 observations in the training and test sets, respectively. We set the true parameters as $\beta = (3, \ldots, 3, 1.5, \ldots, 1.5, 0, \ldots, 0)$, with 3 for the first 15 coefficients, 1.5 for the next 5, and 0 for the remaining 20, and $\sigma = 6$. The predictors are generated as
$x_i = Z + \epsilon_i^x$, $Z \sim N(0, 1)$, i = 1, ..., 15,
$x_i \sim N(0, 1)$, i.i.d., i = 16, ..., 40,
where the $\epsilon_i^x$ are independent identically distributed N(0, 0.01), i = 1, ..., 15. This example creates one group from the first 15 highly correlated covariates. The next five covariates are independent but provide signals on the response variable.

(3) Example 3: we simulate 100 and 400 observations in the training and test sets, respectively. We set the true parameters as $\beta = (3, \ldots, 3, 0, \ldots, 0)$, with 3 for the first 15 coefficients and 0 for the remaining 25, and $\sigma = 15$. The predictors are generated as
$x_i = Z_1 + \epsilon_i^x$, $Z_1 \sim N(0, 1)$, i = 1, ..., 5,
$x_i = Z_2 + \epsilon_i^x$, $Z_2 \sim N(0, 1)$, i = 6, ..., 10,
$x_i = Z_3 + \epsilon_i^x$, $Z_3 \sim N(0, 1)$, i = 11, ..., 15,
$x_i \sim N(0, 1)$, i.i.d., i = 16, ..., 40,
where the $\epsilon_i^x$ are independent identically distributed N(0, 0.01), i = 1, ..., 15. There are three equally important groups with five members in each, together with 25 noise variables.

(4) Example 4: we simulate 100 and 200 observations in the training and test sets, respectively. We set the true parameters as $\beta = (3, 3, 3, 0, 0, 3, 3, 3, 0, 0, 3, 3, 3, 0, 0, 0, \ldots, 0)$, with the final 0 repeated 25 times. The predictors and the error terms are the same as in Example 3. There are again three equally important groups with five members in each of them. However, in each group there are two noise variables, which have no effect on the response variable but are highly correlated with the other three important variables. There are 31 noise variables in total.
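As a concrete illustration, the following Python sketch generates data along the lines of Example 3 (three correlated groups of five plus independent noise covariates); it is our own illustration, not code from the paper:

```python
import numpy as np

def simulate_example3(n=100, sigma=15.0, seed=0):
    """Generate (X, y) roughly as in Example 3: three groups of five highly
    correlated covariates (columns 0-14) plus 25 independent noise columns."""
    rng = np.random.default_rng(seed)
    X = np.empty((n, 40))
    for g in range(3):                       # three latent group factors
        Z = rng.normal(size=n)
        for j in range(5):
            # within-group noise epsilon_i^x ~ N(0, 0.01), i.e. sd 0.1
            X[:, 5 * g + j] = Z + rng.normal(scale=0.1, size=n)
    X[:, 15:] = rng.normal(size=(n, 25))     # independent noise covariates
    beta = np.concatenate([np.full(15, 3.0), np.zeros(25)])
    y = X @ beta + sigma * rng.normal(size=n)
    return X, y, beta
```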

Table 1 summarizes the prediction results in terms of RPE; the methods with the smallest RPE are highlighted in each example. The OLS estimate, which includes all variables in the regression model, has the highest prediction error. The other variable selection methods improve performance, with smaller prediction errors in all the examples. Specifically, elastic net produces the smallest prediction errors in Example 1, where variables are correlated but not in group structures. gridge and gridge_new produce the smallest prediction errors in most of the examples; their reductions of RPE in Examples 2-4 are 23.1%, 12.5%, and 37.8%, respectively, compared with elastic net. We also notice that, while estimating the large coefficients close to the true coefficients, gridge tends to select more variables than elastic net, owing to its over-grouping effect from ridge. After we add a hard threshold to gridge, the gridge_new estimators achieve the best performance. Our first method, glars, gives prediction errors comparable with LARS. However, the coefficient estimators show that glars has a clear pattern of selecting groups of variables in Examples 2-4; that is, variables in a common group enter the model together.

Table 1. Median relative prediction errors (RPE) and their standard errors (SE) for the four simulated examples, based on 100 replications, for OLS, ridge, LARS, elastic net, glars, gridge, and gridge_new. The SEs of the RPEs are estimated by bootstrap with B = 1000 resampling iterations on the 100 RPEs. The bold numbers indicate the methods with the smallest prediction errors.

The grouping effect of glars and gridge is partially demonstrated in Table 2, which compares the different methods in terms of variable selection accuracy. Column TP represents the median number of nonzero coefficients correctly identified as nonzero, or true positives. Column FP represents the median number of zero coefficients incorrectly specified as nonzero, or false positives. The OLS and ridge estimators have large true positives but also large false positives. The LARS algorithm greatly lowers the false positives relative to OLS but misses many truly effective variables in Examples 2-4, due to its limitation in analysing data with dependent (group) structures. Elastic net and our proposed glars improve on OLS and LARS, especially in Examples 2-4, and glars is better than elastic net in Examples 2 and 4. gridge has larger false positives than glars and elastic net. After adding the threshold, gridge_new achieves the best results. In summary, our simulations show that gridge_new outperforms the existing variable selection methods and has a greater advantage for data with group structures.

Table 2. Median numbers of true positives (TP, nonzero coefficients) and false positives (FP, zero coefficients misspecified as nonzero coefficients) for the four simulations, based on 100 replications, for OLS, ridge, LARS, elastic net, glars, gridge, and gridge_new. The bold numbers indicate the methods with the largest true positives and the smallest false positives.
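For reference, the two evaluation criteria used in Tables 1 and 2 are straightforward to compute; a small Python sketch follows (helper names are ours; `Sigma` denotes the population covariance matrix of X):

```python
import numpy as np

def relative_prediction_error(beta_hat, beta, Sigma, sigma2):
    """RPE = (beta_hat - beta)' Sigma (beta_hat - beta) / sigma^2."""
    d = beta_hat - beta
    return float(d @ Sigma @ d) / sigma2

def true_false_positives(beta_hat, beta):
    """TP: true nonzero coefficients estimated as nonzero;
    FP: true zero coefficients estimated as nonzero."""
    est_nz, true_nz = beta_hat != 0, beta != 0
    tp = int(np.sum(est_nz & true_nz))
    fp = int(np.sum(est_nz & ~true_nz))
    return tp, fp
```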

4. Prostate cancer example

An example that has been analysed by many variable selection methods, including Lasso and elastic net, concerns prostate cancer and comes from a study by Stamey et al. [12]. There were 97 observations collected from men who were about to receive a radical prostatectomy. We want to discover the relationship between the level of prostate-specific antigen and several clinical endpoints. The response variable is log(prostate-specific antigen) (lpsa). The clinical endpoints are x1 log(cancer volume) (lcavol), x2 log(prostate weight) (lweight), x3 age, x4 log(benign prostatic hyperplasia amount) (lbph), x5 seminal vesicle invasion (svi), x6 log(capsular penetration) (lcp), x7 Gleason score (gleason), and x8 percentage of Gleason scores 4 or 5 (pgg45). The correlations between x7 (gleason) and x8 (pgg45) and between x1 (lcavol) and x6 (lcp) indicate a moderate collinearity in the data. We split the data into two parts: a training set with 67 observations and a test set with 30 observations. We choose the tuning parameters by 10-fold CV on the training set. After fitting the model, we calculate prediction errors, i.e. residual sums of squares, on the test data and compare our proposed methods with OLS, LARS, and elastic net. All predictors are centred and standardized to have mean 0 and standard deviation 1.

Figure 1. Solution paths of LARS, elastic net, glars, and gridge on the prostate cancer data (standardized coefficients plotted against |beta|/max|beta|).

Figure 1 gives the solution paths of LARS, elastic net, glars, and gridge. The orders in which the variables are selected into the models are specified in Table 3. The glars and gridge methods select two groups of variables, lcavol with lcp and gleason with pgg45, due to their high correlations.

Table 3. The order of covariates entering the model in the prostate cancer example.
LARS:        lcavol, svi, lweight, pgg45, lbph, gleason, age, lcp
Elastic net: lcavol, svi, lweight, pgg45, lbph, gleason, age, lcp
glars:       {lcavol, lcp}, svi, lweight, {gleason, pgg45}, lbph, age
gridge:      {lcavol, lcp}, svi, lweight, {lbph, gleason, pgg45}, age

Table 4 gives the coefficient estimators from the different methods. All methods suggest that the logarithm of cancer volume (lcavol), the logarithm of prostate weight (lweight), and seminal vesicle invasion (svi) are important variables in explaining the response variable of prostate-specific antigen. LARS and elastic net also select the logarithm of benign prostatic hyperplasia (lbph), gleason, and pgg45, whereas glars and gridge select lcp together with lcavol, and pgg45 together with gleason, because of the grouping effect. The gridge_new method gives the sparsest model, removing gleason and pgg45 after the hard thresholding.

Table 4. Coefficient estimators from OLS, LARS, elastic net, glars, gridge, and gridge_new for the prostate cancer example.

Table 5 demonstrates the prediction errors in the test set. Our proposed methods outperform the other existing methods by reducing prediction errors. This example also shows that gridge_new has the best performance, with the lowest prediction error and the most parsimonious model.

Table 5. The prediction errors by OLS, LARS, elastic net, glars, gridge, and gridge_new on the prostate cancer data.
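A hedged sketch of the evaluation protocol used in this example follows (standardize predictors, fit on the 67-observation training set, report the residual sum of squares on the 30-observation test set); the fitting routine is left abstract and all names are ours, so this is one reasonable implementation rather than the paper's code:

```python
import numpy as np

def test_rss(X_train, y_train, X_test, y_test, fit):
    """Standardize predictors using training statistics, centre the response,
    fit a coefficient vector with the supplied `fit` routine (e.g. glars or
    gridge tuned by 10-fold CV), and return the test residual sum of squares."""
    mu, sd = X_train.mean(axis=0), X_train.std(axis=0)
    ybar = y_train.mean()
    Xtr = (X_train - mu) / sd
    Xte = (X_test - mu) / sd
    beta = fit(Xtr, y_train - ybar)          # any selection/estimation method
    pred = ybar + Xte @ beta
    return np.sum((y_test - pred) ** 2)
```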

5. An example on SNP data analysis

SNPs are the most common genetic variations in the human genome, occurring once in every several hundred base pairs. An SNP is a position at which two alternative bases occur at appreciable frequency (>1%) in the human population. SNPs can serve as genetic markers for identifying disease genes by linkage studies in families, linkage disequilibrium in isolated populations, association analysis of patients and controls, and loss-of-heterozygosity studies in tumours [13]. With advances in technology, genome-wide association studies have become popular for detecting the specific DNA variants that contribute to human phenotypes and particularly human diseases. Natural variation in the baseline expression of many genes can be considered as a heritable trait. Morley et al. [14] collected microarray and SNP data to localize the DNA variants that contribute to the expression phenotypes. The data consist of 14 families with 56 unrelated individuals (the grandparents). There are 8500 genes on the microarray and 2756 SNP markers genotyped for each individual. The expression level of a given gene is the response variable and the 2756 SNP markers are the predictors in our model. We code each SNP as 0, 1, or 2 for the wild-type homozygous, heterozygous, and mutation (rare) homozygous genotypes, respectively, according to the genotype frequency. We first screen the data to exclude SNPs that have a call rate less than 95% or a minor allele frequency less than 2.5%; the number of SNPs is reduced to 1739 after screening. Then we select the 500 most variable SNPs as the potential predictors, where the variability of an SNP is measured by its sample variance. A sketch of this coding and screening step is given after the model setup below.

Consider gene ICAP-1A, which is the top gene in Morley et al.'s [14] Table 1, with the strongest linkage evidence. ICAP-1A, also named ITGB1BP1, encodes a protein binding to the beta 1 integrin cytoplasmic domain and may play a role during integrin-dependent cell adhesion. Here, we apply multiple linear regression and the proposed group selection methods to search for an optimal set of SNPs for gene ICAP-1A. Linear regression over SNPs coded as 0, 1, and 2 captures the linear relationship between the gene expression and the SNP genotypes, and hence is commonly used as a simple tool in SNP data analysis. We split the data into a training set with 42 observations and a test set with 14 observations. Model fitting and the choices of the tuning parameters are based on the training set. The first grouping criterion is set to be the 75th percentile of all correlations between x and y. The second grouping criterion requires the correlation with the leader element to be greater than 0.6. The prediction error (residual sum of squares) is evaluated on the test data.
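As promised above, here is a minimal Python sketch of the SNP coding and screening step; it is our own illustration, with the thresholds taken from the text and the genotype matrix assumed to be coded as minor-allele counts with missing calls stored as NaN:

```python
import numpy as np

def screen_snps(genotypes, call_rate_min=0.95, maf_min=0.025, n_keep=500):
    """genotypes: n x p array of minor-allele counts (0, 1, 2), with np.nan
    for missing calls. Drop SNPs with low call rate or low minor allele
    frequency, then keep the n_keep SNPs with the largest sample variance."""
    call_rate = 1.0 - np.mean(np.isnan(genotypes), axis=0)
    # Minor allele frequency from the non-missing genotype counts.
    maf = np.nanmean(genotypes, axis=0) / 2.0
    maf = np.minimum(maf, 1.0 - maf)
    keep = (call_rate >= call_rate_min) & (maf >= maf_min)
    variances = np.nanvar(genotypes[:, keep], axis=0)
    top = np.argsort(variances)[::-1][:n_keep]
    return np.where(keep)[0][top]      # column indices of the selected SNPs
```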

Table 6. Test prediction errors, numbers of selected SNPs, and tuning parameter s for LARS, elastic net (lambda2 = 0.01), glars, and gridge (lambda2 = 0.01) on the SNP data.

Table 6 shows that glars and gridge have lower prediction errors than LARS and elastic net, with about 30 SNPs selected for gene ICAP-1A. We notice that the R-squared of the LARS fitted model with 24 SNP covariates is 0.779, which supports the hypothesis of Morley et al.'s study [14] that gene expression phenotypes are controlled by genetic variants. On the other hand, the coefficient estimators of all SNP covariates are very small, on the scale of 0.01, suggesting small additive effects of multiple SNPs.

Table 7. The first 12 steps of predictors selected by Lasso, elastic net, glars, and gridge for the SNP data.
Step  LASSO  Elastic net  glars       gridge
1     x458   x458         x458        x458
2     x481   x481         x481        x481
3     x321   x321         x321, x131  x321, x131
4     x287   x287         x287        x287
5     x240   x240         x240        x240
6     x76    x76          x406        x406
7     x406   x131         x76         x76
8     x131   x406         x102        x102
9     x345   x345         x345        x345
10    x102   x102         x498        x498
11    x498   x498         x30, x27    x30, x27
12    x30    x30          x167        x167

Table 7 lists the first 12 steps in which the predictors are selected by each algorithm. For instance, variable x458 enters the model at the first step in all algorithms. At Step 3, glars and gridge depart from LARS and elastic net due to the grouping effect. The first group consists of two SNPs, x321 and x131, which are highly correlated. The two SNPs are on chromosome 3, about 14 kb apart, in an intergenic region; the two closest genes have no functional annotation. Another group, consisting of the two SNPs x30 and x27, is selected by glars and gridge at Step 11. These two SNPs are on chromosome 7, over 2.5 million base pairs apart. SNP x30 is in an intergenic region, whereas x27 resides in the gene FOXK1. According to the Swiss-Prot functional annotation, FOXK1 is a transcriptional regulator that binds upstream of the myoglobin gene. Our results suggest more SNP associations with a gene expression phenotype than the simple linkage analysis. For example, variables x240 and x76 are jointly selected as important covariates for the expression of ICAP-1A by all four variable selection methods, yet they are not significant in simple single-SNP regression analysis. These two SNPs are located on the same chromosome as ICAP-1A (chromosome 2), nearly 27 million and 220 million base pairs, respectively, away from ICAP-1A. Figure 2 shows their locations and regression plots. One of the SNPs is in the promoter region (about 200 base pairs upstream) of gene FAM82A1; the other is in the promoter region (about 300 base pairs upstream) of gene DNER. The significant effects of these two SNPs on gene ICAP-1A may suggest associations among the corresponding genes.

Figure 2. Regression of the expression phenotype of ICAP-1A on the two nearby SNPs (expression level, log2, against SNP genotype). The bottom left arrow points to the location of ICAP-1A.

6. Discussion

Traditional forward selection is a heuristic approach that does not guarantee an optimal solution. LARS, a less greedy version of traditional forward selection methods, is however shown by Efron et al. [3] to be closely related to Lasso, which possesses optimal properties under appropriate conditions [15-17]. Our proposed methods take advantage of the LARS procedure while aiming at group selection for dependent data. The proposed methods do not require prior information on the underlying group structures but construct groups along the selection procedure. Our grouping criteria consider the joint information of x and y and therefore fit the context of variable selection better than standard clustering on x alone. On the other hand, any prior information on the model or groups can easily be incorporated into the glars and gridge algorithms by manually selecting certain variables at specific steps. The current methods may be improved by exploring different thresholds (t1, t2) in the grouping definition. One of our future directions is to extend the proposed group selection methods to general regression models, where y may depend on x through a nonlinear function. The proposed methods offer fast approaches that are more appropriate than other variable selection algorithms for data with complicated dependent structures.

Acknowledgements

This work is supported by US National Science Foundation Grant DMS.

References

[1] M. Segal, K. Dahlquist, and B. Conklin, Regression approach for microarray data analysis, J. Comput. Biol. 10(6) (2003).
[2] R. Tibshirani, Regression shrinkage and selection via the lasso, JRSS B 58(1) (1996).
[3] B. Efron, I. Johnstone, T. Hastie, and R. Tibshirani, Least angle regression, Ann. Stat. 32(2) (2004).
[4] H. Zou and T. Hastie, Regularization and variable selection via the elastic net, JRSS B 67(2) (2005).
[5] H.D. Bondell and B.J. Reich, Simultaneous regression shrinkage, variable selection, and supervised clustering of predictors with OSCAR, Biometrics 64(1) (2008).
[6] M.Y. Park, T. Hastie, and R. Tibshirani, Averaged gene expression for regression, Biostatistics 8(2) (2006).
[7] M. Yuan and Y. Lin, Model selection and estimation in regression with grouped variables, JRSS B 68(1) (2006).
[8] S. Ma, X. Song, and J. Huang, Supervised group lasso with applications to microarray data analysis, BMC Bioinform. 8 (2007), 60.
[9] M.Y. Park and T. Hastie, Regularization path algorithms for detecting gene interactions, Technical Report, Department of Statistics, Stanford University.
[10] H. Zou, The adaptive lasso and its oracle properties, JASA 101(476) (2006).
[11] Z.J. Daye and X.J. Jeng, Shrinkage and model selection with correlated variables via weighted fusion, Technical Report, Department of Statistics, Purdue University.
[12] T. Stamey, J. Kabalin, J.I. Johnstone, F. Freiha, E. Redwine, and N. Young, Prostate specific antigen in the diagnosis and treatment of adenocarcinoma of the prostate, II: Radical prostatectomy treated patients, J. Urol. 141(5) (1989).
[13] D.G. Wang, J. Fan, C. Siao, A. Berno, P. Young, R. Sapolsky, G. Ghandour, N. Perkins, E. Winchester, J. Spencer, L. Kruglyak, L. Stein, L. Hsie, T. Topaloglou, E. Hubbell, E. Robinson, M.S. Morris, N. Shen, D. Kilburn, J. Rioux, C. Nusbaum, S. Rozen, T.J. Hudson, R. Lipshutz, M. Chee, and E.S. Lander, Large-scale identification, mapping, and genotyping of single-nucleotide polymorphisms in the human genome, Science 280(5366) (1998).
[14] M. Morley, C.M. Molony, T.M. Weber, J.L. Devlin, K.G. Ewens, R.S. Spielman, and V.G. Cheung, Genetic analysis of genome-wide variation in human gene expression, Nature 430(7001) (2004).
[15] K. Knight and W. Fu, Asymptotics for lasso-type estimators, Ann. Stat. 28(5) (2000).
[16] P. Zhao and B. Yu, On model selection consistency of lasso, J. Mach. Learn. Res. 7 (2006).
[17] C. Zhang and J. Huang, The sparsity and bias of the lasso selection in high-dimensional linear regression, Ann. Stat. 36(4) (2008).


More information

A Significance Test for the Lasso

A Significance Test for the Lasso A Significance Test for the Lasso Lockhart R, Taylor J, Tibshirani R, and Tibshirani R Ashley Petersen June 6, 2013 1 Motivation Problem: Many clinical covariates which are important to a certain medical

More information

Exploiting Covariate Similarity in Sparse Regression via the Pairwise Elastic Net

Exploiting Covariate Similarity in Sparse Regression via the Pairwise Elastic Net Exploiting Covariate Similarity in Sparse Regression via the Pairwise Elastic Net Alexander Lorbert, David Eis, Victoria Kostina, David M. Blei, Peter J. Ramadge Dept. of Electrical Engineering, Dept.

More information

Derivation of SPDEs for Correlated Random Walk Transport Models in One and Two Dimensions

Derivation of SPDEs for Correlated Random Walk Transport Models in One and Two Dimensions This article was downloaded by: [Texas Technology University] On: 23 April 2013, At: 07:52 Publisher: Taylor & Francis Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered

More information

Online publication date: 30 March 2011

Online publication date: 30 March 2011 This article was downloaded by: [Beijing University of Technology] On: 10 June 2011 Access details: Access Details: [subscription number 932491352] Publisher Taylor & Francis Informa Ltd Registered in

More information

PENALIZED METHOD BASED ON REPRESENTATIVES & NONPARAMETRIC ANALYSIS OF GAP DATA

PENALIZED METHOD BASED ON REPRESENTATIVES & NONPARAMETRIC ANALYSIS OF GAP DATA PENALIZED METHOD BASED ON REPRESENTATIVES & NONPARAMETRIC ANALYSIS OF GAP DATA A Thesis Presented to The Academic Faculty by Soyoun Park In Partial Fulfillment of the Requirements for the Degree Doctor

More information

Tutz, Binder: Boosting Ridge Regression

Tutz, Binder: Boosting Ridge Regression Tutz, Binder: Boosting Ridge Regression Sonderforschungsbereich 386, Paper 418 (2005) Online unter: http://epub.ub.uni-muenchen.de/ Projektpartner Boosting Ridge Regression Gerhard Tutz 1 & Harald Binder

More information

Least Absolute Gradient Selector: Statistical Regression via Pseudo-Hard Thresholding

Least Absolute Gradient Selector: Statistical Regression via Pseudo-Hard Thresholding Least Absolute Gradient Selector: Statistical Regression via Pseudo-Hard Thresholding Kun Yang Trevor Hastie April 0, 202 Abstract Variable selection in linear models plays a pivotal role in modern statistics.

More information

COMP 551 Applied Machine Learning Lecture 3: Linear regression (cont d)

COMP 551 Applied Machine Learning Lecture 3: Linear regression (cont d) COMP 551 Applied Machine Learning Lecture 3: Linear regression (cont d) Instructor: Herke van Hoof (herke.vanhoof@mail.mcgill.ca) Slides mostly by: Class web page: www.cs.mcgill.ca/~hvanho2/comp551 Unless

More information

Lecture 8: Fitting Data Statistical Computing, Wednesday October 7, 2015

Lecture 8: Fitting Data Statistical Computing, Wednesday October 7, 2015 Lecture 8: Fitting Data Statistical Computing, 36-350 Wednesday October 7, 2015 In previous episodes Loading and saving data sets in R format Loading and saving data sets in other structured formats Intro

More information

Linear regression methods

Linear regression methods Linear regression methods Most of our intuition about statistical methods stem from linear regression. For observations i = 1,..., n, the model is Y i = p X ij β j + ε i, j=1 where Y i is the response

More information

Model-Free Knockoffs: High-Dimensional Variable Selection that Controls the False Discovery Rate

Model-Free Knockoffs: High-Dimensional Variable Selection that Controls the False Discovery Rate Model-Free Knockoffs: High-Dimensional Variable Selection that Controls the False Discovery Rate Lucas Janson, Stanford Department of Statistics WADAPT Workshop, NIPS, December 2016 Collaborators: Emmanuel

More information

Linear Model Selection and Regularization

Linear Model Selection and Regularization Linear Model Selection and Regularization Recall the linear model Y = β 0 + β 1 X 1 + + β p X p + ɛ. In the lectures that follow, we consider some approaches for extending the linear model framework. In

More information

Some new ideas for post selection inference and model assessment

Some new ideas for post selection inference and model assessment Some new ideas for post selection inference and model assessment Robert Tibshirani, Stanford WHOA!! 2018 Thanks to Jon Taylor and Ryan Tibshirani for helpful feedback 1 / 23 Two topics 1. How to improve

More information

Exploratory quantile regression with many covariates: An application to adverse birth outcomes

Exploratory quantile regression with many covariates: An application to adverse birth outcomes Exploratory quantile regression with many covariates: An application to adverse birth outcomes June 3, 2011 eappendix 30 Percent of Total 20 10 0 0 1000 2000 3000 4000 5000 Birth weights efigure 1: Histogram

More information

Linear Methods for Regression. Lijun Zhang

Linear Methods for Regression. Lijun Zhang Linear Methods for Regression Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction Linear Regression Models and Least Squares Subset Selection Shrinkage Methods Methods Using Derived

More information

Priors on the Variance in Sparse Bayesian Learning; the demi-bayesian Lasso

Priors on the Variance in Sparse Bayesian Learning; the demi-bayesian Lasso Priors on the Variance in Sparse Bayesian Learning; the demi-bayesian Lasso Suhrid Balakrishnan AT&T Labs Research 180 Park Avenue Florham Park, NJ 07932 suhrid@research.att.com David Madigan Department

More information

Analysis Methods for Supersaturated Design: Some Comparisons

Analysis Methods for Supersaturated Design: Some Comparisons Journal of Data Science 1(2003), 249-260 Analysis Methods for Supersaturated Design: Some Comparisons Runze Li 1 and Dennis K. J. Lin 2 The Pennsylvania State University Abstract: Supersaturated designs

More information

Penalized methods in genome-wide association studies

Penalized methods in genome-wide association studies University of Iowa Iowa Research Online Theses and Dissertations Summer 2011 Penalized methods in genome-wide association studies Jin Liu University of Iowa Copyright 2011 Jin Liu This dissertation is

More information

University, Wuhan, China c College of Physical Science and Technology, Central China Normal. University, Wuhan, China Published online: 25 Apr 2014.

University, Wuhan, China c College of Physical Science and Technology, Central China Normal. University, Wuhan, China Published online: 25 Apr 2014. This article was downloaded by: [0.9.78.106] On: 0 April 01, At: 16:7 Publisher: Taylor & Francis Informa Ltd Registered in England and Wales Registered Number: 10795 Registered office: Mortimer House,

More information

An Algorithm for Bayesian Variable Selection in High-dimensional Generalized Linear Models

An Algorithm for Bayesian Variable Selection in High-dimensional Generalized Linear Models Proceedings 59th ISI World Statistics Congress, 25-30 August 2013, Hong Kong (Session CPS023) p.3938 An Algorithm for Bayesian Variable Selection in High-dimensional Generalized Linear Models Vitara Pungpapong

More information

A New Combined Approach for Inference in High-Dimensional Regression Models with Correlated Variables

A New Combined Approach for Inference in High-Dimensional Regression Models with Correlated Variables A New Combined Approach for Inference in High-Dimensional Regression Models with Correlated Variables Niharika Gauraha and Swapan Parui Indian Statistical Institute Abstract. We consider the problem of

More information

The Fourier transform of the unit step function B. L. Burrows a ; D. J. Colwell a a

The Fourier transform of the unit step function B. L. Burrows a ; D. J. Colwell a a This article was downloaded by: [National Taiwan University (Archive)] On: 10 May 2011 Access details: Access Details: [subscription number 905688746] Publisher Taylor & Francis Informa Ltd Registered

More information

The MNet Estimator. Patrick Breheny. Department of Biostatistics Department of Statistics University of Kentucky. August 2, 2010

The MNet Estimator. Patrick Breheny. Department of Biostatistics Department of Statistics University of Kentucky. August 2, 2010 Department of Biostatistics Department of Statistics University of Kentucky August 2, 2010 Joint work with Jian Huang, Shuangge Ma, and Cun-Hui Zhang Penalized regression methods Penalized methods have

More information

27: Case study with popular GM III. 1 Introduction: Gene association mapping for complex diseases 1

27: Case study with popular GM III. 1 Introduction: Gene association mapping for complex diseases 1 10-708: Probabilistic Graphical Models, Spring 2015 27: Case study with popular GM III Lecturer: Eric P. Xing Scribes: Hyun Ah Song & Elizabeth Silver 1 Introduction: Gene association mapping for complex

More information

Consistent high-dimensional Bayesian variable selection via penalized credible regions

Consistent high-dimensional Bayesian variable selection via penalized credible regions Consistent high-dimensional Bayesian variable selection via penalized credible regions Howard Bondell bondell@stat.ncsu.edu Joint work with Brian Reich Howard Bondell p. 1 Outline High-Dimensional Variable

More information

PLEASE SCROLL DOWN FOR ARTICLE. Full terms and conditions of use:

PLEASE SCROLL DOWN FOR ARTICLE. Full terms and conditions of use: This article was downloaded by: [Stanford University] On: 20 July 2010 Access details: Access Details: [subscription number 917395611] Publisher Taylor & Francis Informa Ltd Registered in England and Wales

More information

The American Statistician Publication details, including instructions for authors and subscription information:

The American Statistician Publication details, including instructions for authors and subscription information: This article was downloaded by: [National Chiao Tung University 國立交通大學 ] On: 27 April 2014, At: 23:13 Publisher: Taylor & Francis Informa Ltd Registered in England and Wales Registered Number: 1072954

More information

Variable Selection for High-Dimensional Data with Spatial-Temporal Effects and Extensions to Multitask Regression and Multicategory Classification

Variable Selection for High-Dimensional Data with Spatial-Temporal Effects and Extensions to Multitask Regression and Multicategory Classification Variable Selection for High-Dimensional Data with Spatial-Temporal Effects and Extensions to Multitask Regression and Multicategory Classification Tong Tong Wu Department of Epidemiology and Biostatistics

More information

Generalized Elastic Net Regression

Generalized Elastic Net Regression Abstract Generalized Elastic Net Regression Geoffroy MOURET Jean-Jules BRAULT Vahid PARTOVINIA This work presents a variation of the elastic net penalization method. We propose applying a combined l 1

More information

STRUCTURED VARIABLE SELECTION AND ESTIMATION

STRUCTURED VARIABLE SELECTION AND ESTIMATION The Annals of Applied Statistics 2009, Vol. 3, No. 4, 1738 1757 DOI: 10.1214/09-AOAS254 Institute of Mathematical Statistics, 2009 STRUCTURED VARIABLE SELECTION AND ESTIMATION BY MING YUAN 1,V.ROSHAN JOSEPH

More information

Regression Model In The Analysis Of Micro Array Data-Gene Expression Detection

Regression Model In The Analysis Of Micro Array Data-Gene Expression Detection Jamal Fathima.J.I 1 and P.Venkatesan 1. Research Scholar -Department of statistics National Institute For Research In Tuberculosis, Indian Council For Medical Research,Chennai,India,.Department of statistics

More information

Homogeneity Pursuit. Jianqing Fan

Homogeneity Pursuit. Jianqing Fan Jianqing Fan Princeton University with Tracy Ke and Yichao Wu http://www.princeton.edu/ jqfan June 5, 2014 Get my own profile - Help Amazing Follow this author Grace Wahba 9 Followers Follow new articles

More information

Bayesian Grouped Horseshoe Regression with Application to Additive Models

Bayesian Grouped Horseshoe Regression with Application to Additive Models Bayesian Grouped Horseshoe Regression with Application to Additive Models Zemei Xu, Daniel F. Schmidt, Enes Makalic, Guoqi Qian, and John L. Hopper Centre for Epidemiology and Biostatistics, Melbourne

More information