This article was downloaded on 27 July 2011. Published in the Journal of Statistical Computation and Simulation (Taylor & Francis); available online 25 May 2011. To cite this article: Lingmin Zeng & Jun Xie (2011): Group variable selection for data with dependent structures, Journal of Statistical Computation and Simulation.
Journal of Statistical Computation and Simulation, iFirst, 2011, 1-12

Group variable selection for data with dependent structures

Lingmin Zeng (a) and Jun Xie (b)

(a) Department of Biostatistics, MedImmune, One MedImmune Way, Gaithersburg, MD 20878, USA; (b) Department of Statistics, Purdue University, 250 N. University Street, West Lafayette, IN 47907, USA

(Received 19 April 2010; final version received 4 October 2010)

Variable selection methods have been widely used in the analysis of high-dimensional data, for example, gene expression microarray data and single nucleotide polymorphism data. A special feature of genomic data is that genes participating in a common metabolic pathway or sharing a similar biological function tend to be highly correlated. The collinearity naturally embedded in these data requires special handling, which existing variable selection methods do not provide. In this paper, we propose a set of new methods to select variables in correlated data. The new methods follow the forward selection procedure of least angle regression (LARS) but conduct grouping and selecting at the same time. The methods are designed for the setting where no prior information on the group structure of the data is available. Simulations and real examples show that our proposed methods often outperform existing variable selection methods, including LARS and elastic net, in terms of both reducing prediction error and preserving sparsity of representation.

Keywords: hard thresholding; least angle regression; variable selection

1. Introduction

Regression is a simple yet highly useful statistical method in data analysis. The goal of regression analysis is to discover the relationship between a response y and a set of predictors x_1, x_2, ..., x_p. Variable selection methods have long been used in regression analysis, for example forward selection, backward elimination, and best subset regression. The number of variables p in the traditional setting is typically 10 or at most a few dozen.
Modern scientific technology, led by the microarray, has produced data dramatically above the conventional scale. We have p = 1000 to 10,000 in gene expression microarray data, and p up to 500,000 in single nucleotide polymorphism (SNP) data. To make things more complicated, the large numbers of variables in these biological data are dependent. For example, in the analysis of gene expression microarray data, it is well known that the pairwise correlations among genes that share a common biological function or participate in the same metabolic pathway can be very high [1]. Traditional variable selection methods that select variables one by one may miss important group effects on pathways. Consequently, when traditional variable selection methods are applied to multiple data sets from a common biological system, the selected variables from the multiple studies may show little overlap. The vast dimension of genomic data poses a challenge to the traditional variable selection methods. Several promising techniques have been proposed to overcome the dimension limitation, including Lasso [2], LARS [3], and elastic net [4]. Lasso (least absolute shrinkage and selection operator) imposes an L1 norm bound on the regression coefficients while minimizing the residual sum of squares. Due to the nature of the L1 penalty, Lasso shrinks the regression coefficients towards 0 and produces some coefficients that are exactly 0, and hence implements variable selection. Efron et al. [3] proposed a less greedy forward variable selection algorithm named least angle regression (LARS). With a slight modification, LARS calculates all possible Lasso estimators but requires an order of magnitude less computing time than Lasso. The efficiency of the LARS algorithm has made it widely used in variable selection. However, for highly correlated variables, Lasso tends to select only one variable instead of the whole group. Zou and Hastie [4] proposed the elastic net method by combining L2 and L1 penalties on the regression coefficients. The convex function induced by the L2 penalty helps elastic net achieve a grouping effect, in that highly correlated variables are in or out of the model together. More specifically, elastic net estimators of the regression coefficients of highly correlated variables tend to be equal (up to a change of sign). Analogously, Bondell and Reich [5] proposed a penalized regression method called OSCAR, with an octagonal-shaped constraint on the coefficients, to encourage selection of groups of highly correlated variables. Elastic net often outperforms Lasso in terms of prediction error on correlated data.
(Corresponding author: junxie@purdue.edu)
OSCAR has comparable performance with elastic net [5]. However, it is unclear how groups are defined in either method. Alternatively, for data with dependent structures, a naive approach is a two-step procedure: first cluster highly correlated variables into groups, then select among the groups. Park et al. [6] applied hierarchical clustering to gene expression data in the first step. For each cluster (group), they then averaged the genes and took the average as a supergene to fit a regression model. Yuan and Lin [7] developed group LARS and group Lasso, which extend LARS and Lasso to select groups of variables. However, group LARS (group Lasso) requires information on the group structure beforehand; the method is in fact the second step of a two-step procedure. Another method is the supervised group Lasso proposed by Ma et al. [8], which is also a two-step procedure. After dividing genes into clusters using the K-means method, they first identify important genes within each group by Lasso and then select important clusters using the group Lasso. A limitation of the two-step procedures is that the first step of clustering analysis does not incorporate the information of the response variable y. In other words, grouping variables based on their correlations alone is not appropriate for the variable selection problem, which requires the joint information of the predictors and the response variable. In this paper, we propose two new variable selection algorithms, glars and gridge, for data with dependent structures. Our methods conduct grouping and selecting at the same time and therefore work well when prior information on group structures is not available. Similar to LARS, the proposed methods are forward procedures, which are computationally efficient and provide a piecewise linear solution path. In Section 2, we define a variable selection criterion in two-dimensional space and then specify the glars and gridge algorithms.
We compare the proposed methods with ordinary least squares (OLS), ridge regression, LARS, and elastic net through simulations in Section 3. We study an example on prostate cancer in Section 4. In Section 5, we apply the methods to an example of genetic variation SNP data, which contains a few thousand variables (SNPs) with complicated dependence. We summarize and discuss future work in Section 6.
2. glars and gridge algorithms

Consider a linear regression model

y = Xβ + ε,

where y = (y_1, y_2, ..., y_n)^T is the response variable, X = (x_1, x_2, ..., x_p) is the predictor matrix, and ε = (ε_1, ε_2, ..., ε_n)^T is a vector of independent and identically distributed random errors with mean 0 and variance σ². There are n observations and p predictors, where p may be larger than n. We centre the response variable and standardize the column vectors of the predictor matrix, so there is no intercept in our model:

Σ_{i=1}^n y_i = 0,  Σ_{i=1}^n x_{ij} = 0,  Σ_{i=1}^n x_{ij}² = 1,  for j = 1, 2, ..., p.

2.1. Grouping definition

We start our methods with a grouping definition. Predictors form a group if they satisfy both of the following criteria: they are highly correlated with the response variable (or the current residual); and they are highly correlated with each other. The correlation thresholds for the two criteria are data driven. The first threshold is suggested to be the 75th percentile of all correlations (in absolute value) between the current residual and the unselected predictors. The second criterion decides group size. Its threshold is chosen from a set of grids, for instance {0.9, 0.8, 0.7, 0.6}, by cross-validation (CV). In practice, the distribution of all pairwise correlations among predictors gives an idea of the set of grids. We then decide on the threshold by CV, in addition to the tuning parameters introduced in Section 2.4. An important difference of the proposed methods from traditional selection procedures is that our variable selection criterion has two components and hence is defined by a region in two-dimensional space, i.e. (t_1, t_2) in the algorithm specified next. In addition, the first requirement of selecting a variable highly correlated with the response variable is not affected by collinearity among predictors.

2.2. glars algorithm

Before introducing the new glars algorithm, we first briefly review the LARS procedure.
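The grouping definition above can be sketched in code. The following is an illustrative numpy sketch, not the authors' implementation; the function name, the tie handling, and the choice to include the leader in the percentile computation are our assumptions.

```python
import numpy as np

def form_group(X, r, active_mask, t2=0.7):
    """Sketch of the grouping definition (Section 2.1).

    X : (n, p) standardized predictor matrix; r : current residual;
    active_mask : boolean array, True for predictors already selected.
    t2 is the within-group correlation threshold chosen by CV.
    Returns the indices of the new group, leader first.
    """
    cors = np.abs(X.T @ r)            # |correlation| with current residual
    cors[active_mask] = -np.inf       # ignore variables already in the model
    leader = int(np.argmax(cors))     # leader element: largest |correlation|

    # first threshold: 75th percentile of |corr(x_j, r)| over unselected x_j
    t1 = np.percentile(np.abs(X[:, ~active_mask].T @ r), 75)

    group = [leader]
    for j in np.where(~active_mask)[0]:
        if j == leader:
            continue
        # both criteria: correlated with the residual AND with the leader
        if abs(X[:, j] @ r) > t1 and abs(X[:, j] @ X[:, leader]) > t2:
            group.append(int(j))
    return group
```

In this sketch, standardized columns make inner products act as correlations, so the two criteria reduce to the two threshold comparisons in the loop.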
At the beginning of LARS, a predictor enters the model if its absolute correlation with the response is the largest among all predictors. The coefficient of this predictor grows in its OLS direction until another predictor has the same correlation with the current residual (i.e. equal angle). Next, the coefficients of both predictors move along their OLS directions until a third predictor has the same correlation with the current residual as the first two. The whole process continues until all predictors enter the model. At each step, one variable is added to the model and the solution paths, which are the coefficient estimators as functions of the tuning parameter (defined later in Formula (1)), are extended in a piecewise linear fashion. After all variables enter the model, the LARS solution paths are complete. In the glars algorithm, we start as in LARS by selecting the predictor that has the largest correlation with the response. We call this predictor a leader element. We then build a group based on this leader element and the current residual according to the grouping definition. Note that both criteria in Section 2.1 have to be satisfied when selecting a variable. Once a group has been
constructed, it is represented by a unique direction in R^n, given by the linear combination of the OLS directions of all variables in the group. Next, we choose another leader element, analogous to the equal-angle requirement of the LARS algorithm. A new group is formed, again following the grouping definition. We extend the solution paths in a piecewise linear format. The whole process continues until all predictors enter the model. The detailed algorithm is described below. Note that in Step 5 we adopt the idea of Park and Hastie [9] and replace the average squared correlation by the average absolute correlation to make the algorithm robust.

(1) Initialization: set the step index k = 1, β^[0] = 0, residual r^[0] = Y, active set A_0 = ∅, inactive set A^C_0 = {X_1, X_2, ..., X_p}.
(2) Identify the leader predictor x* for the first group, where x* = argmax_{x_i} |x_i^T r^[k-1]|, x_i ∈ A^C_{k-1}.
(3) Construct the group G_k with the leader predictor x* from Step 2 according to the two criteria: |x_j^T r^[k-1]| > t_1 and |x_j^T x*| > t_2, x_j ∈ A^C_{k-1}, where t_1 is the 75th percentile of all correlations between x_j and r^[k-1], and t_2 = t ∈ {0.9, 0.8, 0.7, 0.6}. Set A_k = A_{k-1} ∪ G_k and A^C_k = A^C_{k-1} \ G_k.
(4) Compute the current direction γ with components γ_{A_k} = (X^T_{A_k} X_{A_k})^{-1} X^T_{A_k} r^[k-1] and γ_{A^C_k} = 0, where X_{A_k} denotes the matrix comprised of the columns of X corresponding to A_k.
(5) Calculate how far the glars algorithm progresses in direction γ. This divides into two small steps:
(a) Find the x_j in A^C_k that corresponds to the smallest α ∈ (0, 1] such that
‖X^T_{G_j} (r^[k-1] - αXγ)‖_{L1} / p_j = |x_j^T (r^[k-1] - αXγ)|,
where G_j is a group from A_k, p_j is the number of variables in group G_j, and ‖·‖_{L1} represents the sum of absolute values.
(b) As in Step 3, find the group with the leader predictor x_j selected above, and denote this new group by G_j'. Recalculate α ∈ (0, 1] for the selected new group such that
‖X^T_{G_j} (r^[k-1] - αXγ)‖_{L1} / p_j = ‖X^T_{G_j'} (r^[k-1] - αXγ)‖_{L1} / p_j',
where p_j' is the number of variables in G_j'. Update β^[k] = β^[k-1] + αγ and r^[k] = Y - Xβ^[k].
(6) Update k to k + 1, and set A_k = A_{k-1} ∪ G_j', A^C_k = A^C_{k-1} \ G_j'.
(7) If the number of variables in A_k is smaller than min(p, n), return to Step 4. Otherwise, set γ, β, and r to be the OLS solutions and stop.

2.3. gridge algorithm

OLS performs poorly when the correlations among the predictors are high and/or the noise level is high. Since both LARS and glars move towards OLS directions in each step, they share this limitation. Ridge estimators, on the other hand, perform better in this situation. We therefore propose a gridge algorithm, which moves towards the ridge estimator direction in each step. The relationship
between the ridge estimator β̂(λ) and the OLS estimator β̂ satisfies

β̂(λ) = (X^T X + λI)^{-1} X^T Y = (X^T X (I + λ(X^T X)^{-1}))^{-1} X^T Y = (I + λ(X^T X)^{-1})^{-1} β̂ = C β̂,

where C = (I + λ(X^T X)^{-1})^{-1} and λ is the ridge parameter. The gridge algorithm is thus a simple modification of the glars algorithm. When a group is constructed, gridge represents the group by a unique direction given by the linear combination of the ridge directions of all variables in the group. The variable coefficients move towards the ridge directions. In our simulations, we notice that gridge outperforms other methods in terms of relative prediction errors (RPEs, defined in the simulations below). However, the method is limited by a relatively large number of false positives due to an over-grouping effect. In other words, although ridge shrinks small coefficients, it does not truncate them to zero for variable selection. We propose to add a hard threshold δ to the gridge estimators so that small (but nonzero) coefficients are removed, i.e. β̃_j = β̂_j I(|β̂_j| > δ). Based on simulations, we define the threshold δ = σ log(p)/n. Hence a smaller error term, a smaller number of predictors, or a larger sample size gives a smaller threshold. We name the modified gridge algorithm, with this hard threshold filtering, gridge_new. The simulation studies below show that gridge_new not only preserves low prediction errors but also greatly reduces false positives.

2.4. Tuning parameters

As with LARS, both glars and gridge produce entire piecewise linear solution paths with all p (or n, when p > n) variables in the model. Groups of variables are selected when we stop the paths after a certain number of steps. The number of steps k is the tuning parameter. Equivalently, we may use as tuning parameter the fraction of the L1 norm of the coefficients,

s = Σ_{j ∈ selected} |β̂_j| / Σ_j |β̂_j|.   (1)

For glars, s (or k) is the only tuning parameter. We determine it by a standard five-fold CV.
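The ridge/OLS identity and the gridge_new hard threshold can be checked numerically. The sketch below uses arbitrary simulated data; the sizes n, p, the ridge parameter, and the assumption that σ is known are our choices for illustration only.

```python
import numpy as np

# Illustrative check of the ridge/OLS relationship and hard thresholding.
rng = np.random.default_rng(1)
n, p, lam = 50, 5, 2.0
X = rng.standard_normal((n, p))
y = X @ np.array([3.0, 1.5, 0.0, 0.0, 2.0]) + rng.standard_normal(n)

XtX = X.T @ X
beta_ols = np.linalg.solve(XtX, X.T @ y)

# ridge estimator computed directly ...
beta_ridge = np.linalg.solve(XtX + lam * np.eye(p), X.T @ y)
# ... and via the identity beta(lam) = (I + lam (X'X)^{-1})^{-1} beta_ols
C = np.linalg.inv(np.eye(p) + lam * np.linalg.inv(XtX))

# hard thresholding as in gridge_new: zero out coefficients below delta
sigma = 1.0                        # noise sd, assumed known here
delta = sigma * np.log(p) / n      # threshold as given in the text
beta_thresholded = beta_ridge * (np.abs(beta_ridge) > delta)
```

The two ridge computations agree to numerical precision, which is exactly the algebraic identity displayed above.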
For gridge, there are two tuning parameters, the ridge parameter λ in addition to s (or k). Hence we cross-validate in two dimensions. We first choose a grid for λ, say {0.01, 0.1, 1, 10, 100, 1000}. The gridge algorithm produces the entire solution paths for each λ. Then the parameter s (or k) is selected by five-fold CV. At the end, we choose the λ value that gives the smallest CV error.

3. Simulation studies

Simulation studies are used to compare the proposed glars and gridge with OLS, ridge regression, LARS, and elastic net. The simulated data are generated from the true model

y = Xβ + σε,  ε ~ N(0, 1).

Four examples are presented for different scenarios in the following. Each example contains a training set and a test set. The tuning parameters are selected on the training set by five-fold CV. Variable selection methods are compared in terms of RPE [10] and selection accuracy on the test set. The RPE is defined as RPE = (β̂ - β)^T Σ (β̂ - β)/σ², where Σ is the population covariance matrix of X. Selection accuracy is illustrated by the numbers of true and false positives
of the selected variables. In each example, we replicate the simulation 100 times and report the median RPEs and median true/false positives. The four scenarios are given by:

(1) Example 1: there are 100 and 200 observations in the training and test sets, respectively. The true parameters are β = (3, 1.5, 0, 0, 2, 0, 0, 0) and σ = 3. The pairwise correlation between x_i and x_j is set to be corr(x_i, x_j) = 0.5^|i-j|, i, j = 1, ..., 8. This example creates a sparse model with a few large effects, and the predictors have first-order autoregressive correlations.

(2) Example 2 (adopted from [11]): we simulate 100 and 400 observations in the training and test sets, respectively. We set the true parameters as β = (3, ..., 3, 1.5, ..., 1.5, 0, ..., 0) and σ = 6. The predictors are generated as

x_i = Z + ε^x_i, Z ~ N(0, 1), i = 1, ..., 15,
x_i ~ N(0, 1), i.i.d., i = 16, ..., 40,

where the ε^x_i are independent identically distributed N(0, 0.01), i = 1, ..., 15. This example creates one group from the first 15 highly correlated covariates. The next five covariates are independent but provide signals on the response variable.

(3) Example 3: we simulate 100 and 400 observations in the training and test sets, respectively. We set the true parameters as β = (3, ..., 3, 0, ..., 0) and σ = 15. The predictors are generated as

x_i = Z_1 + ε^x_i, Z_1 ~ N(0, 1), i = 1, ..., 5,
x_i = Z_2 + ε^x_i, Z_2 ~ N(0, 1), i = 6, ..., 10,
x_i = Z_3 + ε^x_i, Z_3 ~ N(0, 1), i = 11, ..., 15,
x_i ~ N(0, 1), i.i.d., i = 16, ..., 40,

where the ε^x_i are independent identically distributed N(0, 0.01), i = 1, ..., 15. There are three equally important groups with five members in each, and 25 noise variables.

(4) Example 4: we simulate 100 and 200 observations in the training and test sets, respectively. We set the true parameters as β = (3, 3, 3, 0, 0, 3, 3, 3, 0, 0, 3, 3, 3, 0, 0, 0, ..., 0). The predictors and the error terms are the same as in Example 3. There are again three equally important groups with five members in each. However, each group contains two noise variables, which have no effect on the response variable but are highly correlated with the other three important variables. There are 31 noise variables in total.

Table 1 summarizes the prediction results in terms of RPE. The methods with the smallest RPE are highlighted in each example. The OLS estimate, which includes all variables in the regression model, has the highest prediction error. The other variable selection methods improve performance, with smaller prediction errors in all the examples. Specifically, elastic net produces
the smallest prediction errors in Example 1, where variables are correlated but not in group structures. gridge and gridge_new produce the smallest prediction errors in most of the examples. Their reductions of RPE in Examples 2-4 are 23.1%, 12.5%, and 37.8%, respectively, compared with elastic net. We also notice that, while estimating the large coefficients close to the true coefficients, gridge tends to select more variables than elastic net, owing to the over-grouping effect of ridge. After we add a hard threshold to gridge, the gridge_new estimators achieve the best performance. Our first method, glars, gives prediction errors comparable with LARS. However, the coefficient estimators show that glars has a clear pattern of selecting groups of variables in Examples 2-4; that is, variables in a common group enter the model together.

Table 1. Median relative prediction errors (RPE) and their standard errors (SE) for the four simulated examples based on 100 replications.

Methods       Example 1 (RPE, SE)   Example 2 (RPE, SE)   Example 3 (RPE, SE)   Example 4 (RPE, SE)
OLS
Ridge
LARS
Elastic net
glars
gridge
gridge_new

Note: The SEs of the RPEs are estimated by using bootstrap with B = 1000 resampling iterations on the 100 RPEs. The bold numbers indicate the methods with the smallest prediction errors.

The grouping effect of glars and gridge is partially demonstrated in Table 2, which compares the methods in terms of variable selection accuracy. Column TP represents the median number of nonzero coefficients correctly identified as nonzero, or true positives. Column FP represents the median number of zero coefficients incorrectly specified as nonzero, or false positives. The OLS and ridge estimators have large true positives but also large false positives. The LARS algorithm greatly lowers the false positives relative to OLS but misses many truly effective variables in Examples 2-4, due to its limitation in analysing data with dependent (group) structures.
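Counting true and false positives from an estimated coefficient vector is straightforward. The helper below is our own illustrative sketch; the example coefficient vector is hypothetical, not a fitted result from the paper.

```python
import numpy as np

def tp_fp(beta_hat, beta_true):
    """True/false positives of a selection: nonzero estimates vs. truth."""
    sel = beta_hat != 0
    truth = beta_true != 0
    return int(np.sum(sel & truth)), int(np.sum(sel & ~truth))

# Example 1's true coefficients with a hypothetical fit:
beta_true = np.array([3.0, 1.5, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0])
beta_hat = np.array([2.8, 0.0, 0.0, 0.1, 1.7, 0.0, 0.0, 0.0])
tp_fp(beta_hat, beta_true)   # → (2, 1)
```

Here two of the three truly nonzero coefficients are recovered (TP = 2) and one zero coefficient is wrongly selected (FP = 1).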
Elastic net and our proposed glars improve on OLS and LARS, especially in Examples 2-4, and glars is better than elastic net in Examples 2 and 4. gridge has larger false positives than glars and elastic net. After adding the threshold, gridge_new achieves the best results. In summary, our simulations show that gridge_new outperforms the existing variable selection methods and has a greater advantage for data with group structures.

Table 2. Median numbers of true positives (TP, nonzero coefficients) and false positives (FP, zero coefficients misspecified as nonzero coefficients) of the four simulations based on 100 replications.

Methods       Example 1 (TP, FP)   Example 2 (TP, FP)   Example 3 (TP, FP)   Example 4 (TP, FP)
OLS
Ridge
LARS
Elastic net
glars
gridge
gridge_new

Note: The bold numbers indicate the methods with the largest true positives and the smallest false positives.
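The simulation designs above can be reproduced in outline. The following numpy sketch generates data as in Example 1 and computes the RPE criterion; the random seed and the OLS baseline fit are our choices for illustration, not the paper's compared estimators.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n_train, sigma = 8, 100, 3.0
beta = np.array([3.0, 1.5, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0])

# AR(1) population covariance: corr(x_i, x_j) = 0.5^|i-j|
Sigma = 0.5 ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
L = np.linalg.cholesky(Sigma)
X = rng.standard_normal((n_train, p)) @ L.T   # rows have covariance Sigma
y = X @ beta + sigma * rng.standard_normal(n_train)

def rpe(beta_hat, beta_true, Sigma, sigma):
    """Relative prediction error (beta_hat-beta)' Sigma (beta_hat-beta)/sigma^2."""
    d = beta_hat - beta_true
    return float(d @ Sigma @ d) / sigma**2

beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]   # OLS as a baseline estimator
```

By construction, the RPE of the true coefficient vector is exactly zero, and any imperfect estimator has a strictly positive RPE since Σ is positive definite.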
4. Prostate cancer example

An example that has been analysed by many variable selection methods, including Lasso and elastic net, concerns prostate cancer, from a study by Stamey et al. [12]. There were 97 observations collected from men who were about to receive a radical prostatectomy. We want to discover the relationship between the level of prostate-specific antigen and several clinical endpoints. The response variable is log(prostate-specific antigen) (lpsa). The clinical endpoints are x_1 log(cancer volume) (lcavol), x_2 log(prostate weight) (lweight), x_3 age, x_4 log(benign prostatic hyperplasia amount) (lbph), x_5 seminal vesicle invasion (svi), x_6 log(capsular penetration) (lcp), x_7 Gleason score (gleason), and x_8 percentage Gleason scores 4 or 5 (pgg45). The correlation between x_7 gleason and x_8 pgg45, and that between x_1 lcavol and x_6 lcp, are both high, indicating a moderate collinearity in the data. We split the data into two parts: a training set with 67 observations and a test set with 30 observations. We choose the tuning parameters by 10-fold CV in the training set. After fitting the model, we calculate prediction errors, which are residual sums of squares, on the test data and compare our proposed methods with OLS, LARS, and elastic net. All predictors are centred and standardized to have mean 0 and standard deviation 1.

Figure 1. Solution paths of LARS, elastic net, glars, and gridge on the prostate cancer data. (Each panel plots the standardized coefficients against |beta|/max|beta|.)
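The preprocessing and evaluation described above (centring the response, standardizing the predictors, and scoring by residual sum of squares on the held-out split) can be sketched as follows. This is an illustrative sketch with names of our choosing; OLS stands in for any of the compared fitting methods.

```python
import numpy as np

def preprocess(X, y):
    """Centre the response and standardize predictors to mean 0, sd 1,
    matching the paper's setup (so no intercept is needed)."""
    y_c = y - y.mean()
    X_s = (X - X.mean(axis=0)) / X.std(axis=0)
    return X_s, y_c

def held_out_rss(X_tr, y_tr, X_te, y_te):
    """Fit OLS on the training split and return the residual sum of
    squares on the test split, i.e. the paper's prediction error."""
    beta = np.linalg.lstsq(X_tr, y_tr, rcond=None)[0]
    resid = y_te - X_te @ beta
    return float(resid @ resid)
```

A perfectly linear relationship yields a held-out RSS of zero, which is a convenient sanity check for the pipeline.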
Table 3. The order of covariates entering the model in the prostate cancer example.

LARS:        lcavol, svi, lweight, pgg45, lbph, gleason, age, lcp
Elastic net: lcavol, svi, lweight, pgg45, lbph, gleason, age, lcp
glars:       {lcavol, lcp}, svi, lweight, {gleason, pgg45}, lbph, age
gridge:      {lcavol, lcp}, svi, lweight, {lbph, gleason, pgg45}, age

Table 4. Coefficient estimators from the OLS, LARS, elastic net, glars, and gridge methods for the prostate cancer example.

Coefficients   OLS   LARS   Elastic net   glars   gridge   gridge_new
lcavol
lweight
age
lbph
svi
lcp
gleason
pgg45

Figure 1 gives the solution paths of LARS, elastic net, glars, and gridge. The orders in which the variables are selected into the models are specified in Table 3. The glars and gridge methods select two groups of variables, {lcavol, lcp} and {gleason, pgg45}, due to their high correlations. Table 4 gives the coefficient estimators from the different methods. All methods suggest that the logarithm of cancer volume (lcavol), the logarithm of prostate weight (lweight), and seminal vesicle invasion (svi) are important variables in explaining the response variable of prostate-specific antigen. LARS and elastic net also select the logarithm of benign prostatic hyperplasia (lbph), gleason, and pgg45, whereas glars and gridge select lcp together with lcavol, and pgg45 together with gleason, because of the grouping effect. The gridge_new method gives the sparsest model, removing gleason and pgg45 after the hard thresholding. Table 5 reports the prediction errors on the test set. Our proposed methods outperform the other existing methods by reducing prediction errors. This example also shows that gridge_new has the best performance, with the lowest prediction error and the most parsimonious model.

Table 5. The prediction errors of OLS, LARS, elastic net, glars, and gridge on the prostate cancer data.
Methods: OLS, LARS, Elastic net, glars, gridge, gridge_new (prediction error for each).

5. An example on SNP data analysis

SNPs are the most common genetic variations in the human genome, occurring once in every several hundred base pairs. An SNP is a position at which two alternative bases occur at appreciable frequency (>1%) in the human population. SNPs can serve as genetic markers for identifying disease
genes by linkage studies in families, linkage disequilibrium in isolated populations, association analysis of patients and controls, and loss-of-heterozygosity studies in tumours [13]. With advances in technology, genome-wide association studies have become popular for detecting the specific DNA variants that contribute to human phenotypes, particularly human diseases. Natural variation in the baseline expression of many genes can be considered as heritable traits. Morley et al. [14] collected microarray and SNP data to localize the DNA variants that contribute to the expression phenotypes. The data consist of 14 families with 56 unrelated individuals (the grandparents). There are 8500 genes on the microarray and 2756 SNP markers genotyped for each individual. The expression level of a given gene is the response variable, and the 2756 SNP markers are the predictors in our model. We code each SNP as 0, 1, or 2 for wild-type homozygous, heterozygous, and mutation (rare) homozygous genotypes, respectively, according to the genotype frequency. We first screen the data to exclude SNPs that have a call rate less than 95% or a minor allele frequency less than 2.5%. The number of SNPs is reduced to 1739 after screening. We then select the 500 most variable SNPs as the potential predictors, where the variability of an SNP is measured by its sample variance. Consider the gene ICAP-1A, which is the top gene in Morley et al.'s [14] Table 1, with the strongest linkage evidence. ICAP-1A, also named ITGB1BP1, encodes a protein binding to the beta 1 integrin cytoplasmic domain. It may play a role during integrin-dependent cell adhesion. Here, we apply multiple linear regression and the proposed group selection methods to search for an optimal set of SNPs for gene ICAP-1A. Linear regression over SNPs coded as 0, 1, and 2 captures the linear relationship between the gene expression and the different types of SNPs, and hence is commonly used as a simple tool in SNP data analysis.
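The screening step described above (call rate at least 95%, minor allele frequency at least 2.5%, then keep the most variable SNPs) can be sketched as follows. This is our own sketch under stated assumptions: genotypes are coded 0/1/2 with NaN for missing calls, and allele frequency is estimated as the genotype mean divided by two.

```python
import numpy as np

def screen_snps(G, min_call=0.95, min_maf=0.025, top_k=500):
    """Sketch of the SNP screening. G is (individuals x SNPs), coded
    0/1/2 with np.nan for missing calls; thresholds follow the text."""
    call_rate = 1.0 - np.isnan(G).mean(axis=0)
    # allele frequency of the coded allele; each person carries two alleles
    p_alt = np.nanmean(G, axis=0) / 2.0
    maf = np.minimum(p_alt, 1.0 - p_alt)
    keep = (call_rate >= min_call) & (maf >= min_maf)
    idx = np.where(keep)[0]
    # rank the remaining SNPs by sample variance; keep the top_k most variable
    order = np.argsort(np.nanvar(G[:, idx], axis=0))[::-1]
    return idx[order[:min(top_k, idx.size)]]
```

Applied to the full 2756-SNP matrix, a filter of this form would reduce the predictor set before the 500 most variable SNPs are retained.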
We split the data into a training set with 42 observations and a test set with 14 observations. Model fitting and the choices of the tuning parameters are based on the training set. The first grouping criterion is set to be the 75th percentile of all correlations between x and y. The second grouping criterion requires the correlation with the leader element to be greater than 0.6.

Table 6. Test prediction errors of LARS, elastic net, glars, and gridge for the SNP data.

Methods                    Test prediction error   Number of genes   Tuning parameter s
LARS
Elastic net (λ_2 = 0.01)
glars
gridge (λ_2 = 0.01)

Table 7. The first 12 steps of predictors selected by LASSO, elastic net, glars, and gridge for the SNP data.

Step    LASSO    Elastic net   glars          gridge
1       x_458    x_458         x_458          x_458
2       x_481    x_481         x_481          x_481
3       x_321    x_321         x_321, x_131   x_321, x_131
4       x_287    x_287         x_287          x_287
5       x_240    x_240         x_240          x_240
6       x_76     x_76          x_406          x_406
7       x_406    x_131         x_76           x_76
8       x_131    x_406         x_102          x_102
9       x_345    x_345         x_345          x_345
10      x_102    x_102         x_498          x_498
11      x_498    x_498         x_30, x_27     x_30, x_27
12      x_30     x_30          x_167          x_167

The prediction error
(residual sum of squares) is evaluated on the test data. Table 6 shows that glars and gridge have lower prediction errors than LARS and elastic net, with about 30 SNPs selected for gene ICAP-1A. We notice that the R² of the LARS fitted model with 24 SNP covariates is 0.779, which supports the hypothesis of Morley et al.'s study [14] that gene expression phenotypes are controlled by genetic variants. On the other hand, the coefficient estimators of all SNP covariates are very small, on the scale of 0.01, suggesting small additive effects of multiple SNPs. Table 7 lists the first 12 steps in which predictors are selected by each algorithm. For instance, variable x_458 enters the model at the first step in all algorithms. At Step 3, glars and gridge depart from LASSO and elastic net due to the grouping effect. The first group consists of two highly correlated SNPs, x_321 and x_131. The two SNPs are on chromosome 3, 14K base pairs apart, in an intergenic region; the two closest genes have no functional annotation. Another group, consisting of the two SNPs x_30 and x_27, is selected by glars and gridge at Step 11. These two SNPs are on chromosome 7, over 2.5 million base pairs apart. SNP x_30 is in an intergenic region, whereas x_27 resides in the gene FOXK1. According to Swiss-Prot functional annotation, FOXK1 is a transcriptional regulator that binds upstream of the myoglobin gene. Our results suggest more SNP associations with a gene expression phenotype than the simple linkage analysis. For example, variables x_240 and x_76 are jointly selected as important covariates for the expression of ICAP-1A by all four variable selection methods.
However, they are not significant in simple regression analysis at the chosen p-value cutoff. These two SNPs are located on the same chromosome as ICAP-1A (chromosome 2), nearly 27 million and 220 million base pairs, respectively, away from ICAP-1A. Figure 2 shows their locations and regression plots. One of the two SNPs is in the promoter region (about 200 base pairs upstream) of gene FAM82A1; the other is in the promoter region (about 300 base pairs upstream) of gene DNER. The significant effects of these two SNPs on gene ICAP-1A may suggest associations among the corresponding genes.

Figure 2. Regression of the expression phenotype of ICAP-1A (expression level, log2) on the genotypes (G,G / A,G / A,A) of the two nearby SNPs. The bottom left arrow points to the location of ICAP-1A.
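The two grouping criteria used in this analysis (a marginal correlation with y above its 75th percentile, and a correlation with the group leader above 0.6) can be sketched in code. This is an illustrative reconstruction under our own naming, not the authors' implementation; the function `form_group` and its defaults are hypothetical.

```python
import numpy as np

def form_group(X, y, t1=0.75, t2=0.6):
    """Sketch of one grouping step: take the predictor most correlated with y
    as the group leader, then add every predictor whose absolute correlation
    with y is at least the t1 quantile of all such correlations (first
    criterion) and whose correlation with the leader exceeds t2 (second
    criterion)."""
    n, p = X.shape
    # absolute marginal correlations between each column of X and y
    cor_xy = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(p)])
    leader = int(np.argmax(cor_xy))        # leader element of the group
    cut1 = np.quantile(cor_xy, t1)         # first criterion: 75th percentile
    group = [leader]
    for j in range(p):
        if j == leader:
            continue
        cor_lead = abs(np.corrcoef(X[:, j], X[:, leader])[0, 1])
        # second criterion: strong correlation with the leader element
        if cor_xy[j] >= cut1 and cor_lead > t2:
            group.append(j)
    return group

# toy example: columns 0 and 1 are nearly collinear, the rest are noise
rng = np.random.default_rng(0)
x0 = rng.normal(size=100)
noise = rng.normal(size=(100, 18))
X = np.column_stack([x0, x0 + 0.1 * rng.normal(size=100), noise])
y = x0 + 0.5 * rng.normal(size=100)
print(form_group(X, y))  # the two nearly collinear predictors are grouped
```

In glars and gridge this grouping is interleaved with the forward selection steps, so a group such as {x321, x131} enters the model together rather than one variable at a time.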
6. Discussion

Traditional forward selection is a heuristic approach and does not guarantee an optimal solution. LARS, a less greedy version of traditional forward selection, however, is shown by Efron et al. [3] to be closely related to Lasso, which possesses optimal properties under appropriate conditions [15-17]. Our proposed methods take advantage of the LARS procedure while aiming at group selection for dependent data. The proposed methods do not require prior information on the underlying group structures but construct groups along the selection procedure. Our grouping criteria consider the joint information of x and y and therefore fit the context of variable selection better than standard clustering on x alone. On the other hand, any prior information on the model or groups can be easily incorporated into the algorithms of glars and gridge by manually selecting certain variables at specific steps. The current methods may be improved by exploring different thresholds (t1, t2) in the grouping definition. One direction for future work is to extend the proposed group selection methods to general regression models, where y may depend on x through a nonlinear function. The proposed methods offer fast approaches that are more appropriate than other variable selection algorithms for data with complicated dependent structures.

Acknowledgements

This work is supported by US National Science Foundation Grant DMS.

References

[1] M. Segal, K. Dahlquist, and B. Conklin, Regression approach for microarray data analysis, J. Comput. Biol. 10(6) (2003).
[2] R. Tibshirani, Regression shrinkage and selection via the lasso, JRSS B 58(1) (1996).
[3] B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani, Least angle regression, Ann. Stat. 32(2) (2004).
[4] H. Zou and T. Hastie, Regularization and variable selection via the elastic net, JRSS B 67(2) (2005).
[5] H.D. Bondell and B.J. Reich, Simultaneous regression shrinkage, variable selection, and supervised clustering of predictors with OSCAR, Biometrics 64(1) (2008).
[6] M.Y. Park, T. Hastie, and R. Tibshirani, Averaged gene expression for regression, Biostatistics 8(2) (2006).
[7] M. Yuan and Y. Lin, Model selection and estimation in regression with grouped variables, JRSS B 68(1) (2006).
[8] S. Ma, X. Song, and J. Huang, Supervised group lasso with applications to microarray data analysis, BMC Bioinform. 8 (2007), 60.
[9] M.Y. Park and T. Hastie, Regularization path algorithms for detecting gene interactions, Technical Report, Department of Statistics, Stanford University.
[10] H. Zou, The adaptive lasso and its oracle properties, JASA 101(476) (2006).
[11] Z.J. Daye and X.J. Jeng, Shrinkage and model selection with correlated variables via weighted fusion, Technical Report, Department of Statistics, Purdue University.
[12] T. Stamey, J. Kabalin, J.I. Johnstone, F. Freiha, E. Redwine, and N. Young, Prostate specific antigen in the diagnosis and treatment of adenocarcinoma of the prostate. II. Radical prostatectomy treated patients, J. Urol. 141(5) (1989).
[13] D.G. Wang, J. Fan, C. Siao, A. Berno, P. Young, R. Sapolsky, G. Ghandour, N. Perkins, E. Winchester, J. Spencer, L. Kruglyak, L. Stein, L. Hsie, T. Topaloglou, E. Hubbell, E. Robinson, M.S. Morris, N. Shen, D. Kilburn, J. Rioux, C. Nusbaum, S. Rozen, T.J. Hudson, R. Lipshutz, M. Chee, and E.S. Lander, Large-scale identification, mapping, and genotyping of single-nucleotide polymorphisms in the human genome, Science 280(5366) (1998).
[14] M. Morley, C.M. Molony, T.M. Weber, J.L. Devlin, K.G. Ewens, R.S. Spielman, and V.G. Cheung, Genetic analysis of genome-wide variation in human gene expression, Nature 430(7001) (2004).
[15] K. Knight and W. Fu, Asymptotics for lasso-type estimators, Technometrics 12(1) (2000).
[16] P. Zhao and B. Yu, On model selection consistency of lasso, J. Mach. Learn. Res. 7 (2006).
[17] C. Zhang and J. Huang, The sparsity and bias of the lasso selection in high-dimensional linear regression, Ann. Stat. 36(4) (2008).