Package Grace. R topics documented: April 9, Type Package

Similar documents
Package covtest. R topics documented:

Package metafuse. October 17, 2016

Package HIMA. November 8, 2017

Package spcov. R topics documented: February 20, 2015

Package abundant. January 1, 2017

Package EMVS. April 24, 2018

Package scio. February 20, 2015

Package denseflmm. R topics documented: March 17, 2017

Package netcoh. May 2, 2016

Package SpatialVS. R topics documented: November 10, Title Spatial Variable Selection Version 1.1

DISCUSSION OF A SIGNIFICANCE TEST FOR THE LASSO. By Peter Bühlmann, Lukas Meier and Sara van de Geer ETH Zürich

A Blockwise Descent Algorithm for Group-penalized Multiresponse and Multinomial Regression

Package RI2by2. October 21, 2016

Package rbvs. R topics documented: August 29, Type Package Title Ranking-Based Variable Selection Version 1.0.

Package horseshoe. November 8, 2016

Package varband. November 7, 2016

Package basad. November 20, 2017

Package effectfusion

Package SimSCRPiecewise

Package R1magic. August 29, 2016

Package sscor. January 28, 2016

Introduction to the genlasso package

MSA220/MVE440 Statistical Learning for Big Data

Package gma. September 19, 2017

A Bootstrap Lasso + Partial Ridge Method to Construct Confidence Intervals for Parameters in High-dimensional Sparse Linear Models

Variable Selection for Highly Correlated Predictors

Package IsingFit. September 7, 2016

Package CovSel. September 25, 2013

Package CPE. R topics documented: February 19, 2015

Package reinforcedpred

Package rarhsmm. March 20, 2018

Package msir. R topics documented: April 7, Type Package Version Date Title Model-Based Sliced Inverse Regression

The MNet Estimator. Patrick Breheny. Department of Biostatistics Department of Statistics University of Kentucky. August 2, 2010

Package IGG. R topics documented: April 9, 2018

Package dhsic. R topics documented: July 27, 2017

Package mmabig. May 18, 2018

Package MultisiteMediation

Package ltsbase. R topics documented: February 20, 2015

Journal of Statistical Software

Package bigtime. November 9, 2017

Package BNN. February 2, 2018

Package R1magic. April 9, 2013

Package jmcm. November 25, 2017

Package rnmf. February 20, 2015

Package SimCorMultRes

Graph Wavelets to Analyze Genomic Data with Biological Networks

Regularization and Variable Selection via the Elastic Net

Package NonpModelCheck

Package sindyr. September 10, 2018

Pre-Selection in Cluster Lasso Methods for Correlated Variable Selection in High-Dimensional Linear Models

Package icmm. October 12, 2017

Package covsep. May 6, 2018

Package pfa. July 4, 2016

A New Combined Approach for Inference in High-Dimensional Regression Models with Correlated Variables

Package tsvd. March 11, 2015

A Confidence Region Approach to Tuning for Variable Selection

Package multivariance

Prediction & Feature Selection in GLM

The lasso, persistence, and cross-validation

Package MACAU2. R topics documented: April 8, Type Package. Title MACAU 2.0: Efficient Mixed Model Analysis of Count Data. Version 1.

High-dimensional regression with unknown variance

Package anoint. July 19, 2015

Bayesian Grouped Horseshoe Regression with Application to Additive Models

Package SpatialNP. June 5, 2018

A robust hybrid of lasso and ridge regression

Package ZIBBSeqDiscovery

Package blme. August 29, 2016

Package ShrinkCovMat

Package HDtest. R topics documented: September 13, Type Package

Package SPreFuGED. July 29, 2016

Package MicroMacroMultilevel

Different types of regression: Linear, Lasso, Ridge, Elastic net, Ro

25 : Graphical induced structured input/output models

Package LIStest. February 19, 2015

Regularization Path Algorithms for Detecting Gene Interactions

Package face. January 19, 2018

Package CompQuadForm

Package Delaporte. August 13, 2017

Package gsarima. February 20, 2015

Package factorqr. R topics documented: February 19, 2015

TECHNICAL REPORT NO. 1091r. A Note on the Lasso and Related Procedures in Model Selection

Package penmsm. R topics documented: February 20, Type Package

High-dimensional regression

Package tsvd. March 24, 2015

Package mbvs. August 28, 2018

Package lsei. October 23, 2017

Package IndepTest. August 23, 2018

Package multdm. May 18, 2018

Package ForwardSearch

Package r2glmm. August 5, 2017

Package MiRKATS. May 30, 2016

Package FinCovRegularization

Package RRR. April 12, 2013

Outlier detection and variable selection via difference based regression model and penalized regression

Homework 1: Solutions

Package polypoly. R topics documented: May 27, 2017

Package matrixnormal

Software for Prediction and Estimation with Applications to High-Dimensional Genomic and Epidemiologic Data. Stephan Johannes Ritter

Indirect High-Dimensional Linear Regression

The Iterated Lasso for High-Dimensional Logistic Regression

Transcription:

Type Package Package Grace April 9, 2017 Title Graph-Constrained Estimation and Hypothesis Tests Version 0.5.3 Date 2017-4-8 Author Sen Zhao Maintainer Sen Zhao <sen-zhao@sen-zhao.com> Description Use the graph-constrained estimation (Grace) procedure (Zhao and Shojaie, 2016 <doi:10.1111/biom.12418>) to estimate graph-guided linear regression coefficients and use the Grace/GraceI/GraceR tests to perform graph-guided hypothesis tests on the association between the response and the predictors. License GPL-3 Depends MASS, glmnet, scalreg LazyData TRUE URL http://onlinelibrary.wiley.com/doi/10.1111/biom.12418/abstract NeedsCompilation no Repository CRAN Date/Publication 2017-04-09 21:14:22 UTC R topics documented: grace............................................. 2 grace.test.......................................... 3 gracei.test.......................................... 6 make.l........................................... 8 Index 9 1

2 grace grace Graph-Constrained Estimation Description Calculate coefficient estimates of Grace based on methods described in Li and Li (2008). Usage grace(y, X, L, lambda.l, lambda.1=0, lambda.2=0, normalize.l=false, K=10, verbose=false) Arguments Y X outcome vector. matrix of predictors. L penalty weight matrix L. lambda.l tuning parameter value for the penalty induced by the L matrix (see details). If a sequence of lambda.l values is supplied, K-fold cross-validation is performed. lambda.1 tuning parameter value for the lasso penalty (see details). If a sequence of lambda.1 values is supplied, K-fold cross-validation is performed. lambda.2 tuning parameter value for the ridge penalty (see details). If a sequence of lambda.2 values is supplied, K-fold cross-validation is performed. normalize.l K verbose whether the penalty weight matrix L should be normalized. number of folds in cross-validation. whether computation progress should be printed. Details The Grace estimator is defined as (ˆα, ˆβ) = arg min α,β Y α1 Xβ 2 2 + lambda.l(β T Lβ) + lambda.1 β 1 + lambda.2 β 2 2 In the formulation, L is the penalty weight matrix. Tuning parameters lambda.l, lambda.1 and lambda.2 may be chosen by cross-validation. In practice, X and Y are standardized and centered, respectively, before estimating ˆβ. The resulting estimate is then rescaled back into the original scale. Note that the intercept ˆα is not penalized. The Grace estimator could be considered as a generalized elastic net estimator (Zou and Hastie, 2005). It penalizes the regression coefficient towards the space spanned by eigenvectors of L with the smallest eigenvalues. Therefore, if L is informative in the sense that Lβ is small, then the Grace estimator could be less biased than the elastic net.

grace.test 3 Value An R list with elements: intercept beta fitted intercept. fitted regression coefficients. Author(s) Sen Zhao References Zou, H., and Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B, 67, 301-320. Li, C., and Li, H. (2008). Network-constrained regularization and variable selection for analysis of genomic data. Bioinformatics, 24, 1175-1182. Examples set.seed(120) n <- 100 p <- 200 L <- matrix(0, nrow = p, ncol = p) for(i in 1:10){ L[((i - 1) * p / 10 + 1), ((i - 1) * p / 10 + 1):(i * (p / 10))] <- -1 } diag(l) <- 0 ind <- lower.tri(l, diag = FALSE) L[ind] <- t(l)[ind] diag(l) <- -rowsums(l) beta <- c(rep(1, 10), rep(0, p - 10)) Sigma <- solve(l + 0.1 * diag(p)) sigma.error <- sqrt(t(beta) %*% Sigma %*% beta) / 2 X <- mvrnorm(n, mu = rep(0, p), Sigma = Sigma) Y <- c(x %*% beta + rnorm(n, sd = sigma.error)) grace(y, X, L, lambda.l = c(0.08, 0.12), lambda.2 = c(0.08, 0.12)) grace.test Graph-Constrained Hypothesis Testing Procedure Description Test for the association between Y and each column of X using the Grace test based on Zhao and Shojaie (2016).

4 grace.test Usage Arguments Y X grace.test(y, X, L=NULL, lambda.l=null, lambda.2=0, normalize.l=false, eta=0.05, C=4*sqrt(3), K=10, sigma.error=null, verbose=false) outcome vector. matrix of predictors. L penalty weight matrix L. lambda.l tuning parameter value for the penalty induced by the L matrix (see details). If a sequence of lambda.l values is supplied, K-fold cross-validation is performed. lambda.2 tuning parameter value for the ridge penalty (see details). If a sequence of lambda.2 values is supplied, K-fold cross-validation is performed. normalize.l eta C K Details sigma.error verbose whether the penalty weight matrix L should be normalized. sparsity parameter η (see details). parameter for the initial estimator (see details). It could also be "cv" or "scaled.lasso", in which case cross-validation or the scaled lasso are applied to estimate the initial estimator. number of folds in cross-validation. error standard deviation. If NULL, scaled lasso is then applied to estimate it. whether computation progress should be printed This function performs the Grace test (if lambda.2 = 0), the GraceI test (if lambda.l = 0) and the GraceR test as introduced in Zhao and Shojaie (2016). The Grace tests examine the null hypothesis β j = 0 conditional on all other covariates, even if the design matrix X has more covariates (columns) than observations (rows). Network information on associations between covariates, which is represented by the L matrix, could be used to improve the power of the test. When L is informative (see Zhao and Shojaie (2016) for details), the Grace test is expected to deliver higher power than the GraceI test and other competing methods which ignores the network information (see e.g. Buhlmann (2013), van de Geer et al. (2014), Zhang and Zhang (2014)). When L is potentially uninformative or inaccurate, the GraceR test could be used. Using data-adaptive choices (e.g. by cross-validation) of lambda.l and lambda.2, the GraceR test could adaptively choose the amount of outside information to be incorporated, which leads to more robust performance. Regardless of the informativeness of L, type-i error rates of the Grace, GraceI and GraceR tests are asymptotically controlled. The Grace tests are based on the following Grace estimator: (ˆα, ˆβ) = arg min α,β Y α1 Xβ 2 2 + lambda.l(β T Lβ) + lambda.2 β 2 2 In the formulation, L is the penalty weight matrix. Tuning parameters lambda.l and lambda.2 may be chosen by cross-validation. In practice, X and Y are standardized and centered, respectively, before estimating ˆβ. The resulting estimate is then rescaled back into the original scale. Note that the intercept ˆα is not penalized.

grace.test 5 Value To perform the Grace tests, the lasso initial estimator is calculated using lasso tuning parameter σ ɛ C log(p)/n. The Grace test requires C to be larger than 2 2. σ ɛ is the error standard deviation, which is estimated using the scaled lasso (Sun and Zhang, 2012). η < 0.5 is the sparsity parameter of β, which controls the level of bias correction. It is assumed that the number of nonzero elements in β, s = o([n/ log(p)] η ). This parameter is usually unknown. Using larger values of η leads to more conservative results. An R list with elements: intercept beta pvalue group.test fitted intercept. fitted regression coefficients. p-values based on the Grace tests. function to perform the group test, with the null hypothesis being that all the regression coefficients in the group equal zero. The argument of this function needs to be an index vector of variables. There is an optimal second argument, which specifies whether the group test should be performed based on the "holm" procedure (default), or based on the "max" test statstic. The output is the p-value of the group test. See examples below. Author(s) Sen Zhao References Li, C., and Li, H. (2008). Network-constrained regularization and variable selection for analysis of genomic data. Bioinformatics, 24, 1175-1182. Sun, T., and Zhang, C.-H. (2012). Scaled sparse linear regression. Biometrika, 99, 879-898. Buhlmann, P. (2013). Statistical significance in high-dimensional linear models. Bernoulli, 19, 1212-1242. van de Geer, S., Buhlmann, P., Rotiv, Y., and Dezeure, R. (2014). On asymptotically optimal confidence regions and tests for high-dimensional models. The Annals of Statistics, 42, 1166-1202. Zhang, C.-H., and Zhang, S.S. (2014). Confidence intervals for low dimensional parameters in high dimensional linear models. Journal of the Royal Statistical Society: Series B, 76, 217-242. Zhao, S., and Shojaie, A. (2016). A signifiance test for graph-constrained estimation. Biometrics, 72, 484-493. Examples set.seed(120) n <- 100 p <- 200 L <- matrix(0, nrow = p, ncol = p) for(i in 1:10){

6 gracei.test L[((i - 1) * p / 10 + 1), ((i - 1) * p / 10 + 1):(i * (p / 10))] <- -1 } diag(l) <- 0 ind <- lower.tri(l, diag = FALSE) L[ind] <- t(l)[ind] diag(l) <- -rowsums(l) beta <- c(rep(1, 10), rep(0, p - 10)) Sigma <- solve(l + 0.1 * diag(p)) sigma.error <- sqrt(t(beta) %*% Sigma %*% beta) / 2 X <- mvrnorm(n, mu = rep(0, p), Sigma = Sigma) Y <- c(x %*% beta + rnorm(n, sd = sigma.error)) grace.test.result <- grace.test(y, X, L, lambda.l = c(0.08, 0.12), lambda.2 = c(0.08, 0.12)) mean(grace.test.result$pvalue[beta == 0] < 0.05) grace.test.result$group.test(1:2) grace.test.result$group.test(3:50) grace.test.result$group.test(11:100) grace.test.result$group.test(1:5, method = "max") grace.test.result <- grace.test(y, X, L, lambda.l = 0.08, lambda.2 = 0.08, C = "cv") mean(grace.test.result$pvalue[beta == 0] < 0.05) gracei.test Graph-Constrained Hypothesis Testing Procedure Description Test for the association between Y and each column of X using the GraceI test based on Zhao and Shojaie (2016). Usage gracei.test(y,x,lambda.2,eta=0.05,c=4*sqrt(3),k=10,sigma.error=null,verbose=false) Arguments Y X outcome vector. matrix of predictors. lambda.2 tuning parameter value for the ridge penalty (see details). If a sequence of lambda.2 values is supplied, K-fold cross-validation is performed. eta sparsity parameter η (see details).

gracei.test 7 C K sigma.error verbose parameter for the initial estimator (see details). It could also be "cv" or "scaled.lasso", in which case cross-validation or the scaled lasso are applied to estimate the initial estimator. number of folds in cross-validation. error standard deviation. If NULL, scaled lasso is then applied to estimate it. whether computation progress should be printed. Details This function performs the GraceI test. See the documentation for grace.test. Value An R list with elements: intercept beta pvalue group.test fitted intercept. fitted regression coefficients. p-values based on the Grace tests. function to perform the group test, with the null hypothesis being that all the regression coefficients in the group equal zero. The argument of this function needs to be an index vector of variables. There is an optimal second argument, which specifies whether the group test should be performed based on the "holm" procedure (default), or based on the "max" test statstic. The output is the p-value of the group test. See examples below. Author(s) Sen Zhao References Li, C., and Li, H. (2008). Network-constrained regularization and variable selection for analysis of genomic data. Bioinformatics, 24, 1175-1182. Sun, T., and Zhang, C.-H. (2012). Scaled sparse linear regression. Biometrika, 99, 879-898. Buhlmann, P. (2013). Statistical significance in high-dimensional linear models. Bernoulli, 19, 1212-1242. van de Geer, S., Buhlmann, P., Rotiv, Y., and Dezeure, R. (2014). On asymptotically optimal confidence regions and tests for high-dimensional models. The Annals of Statistics, 42, 1166-1202. Zhang, C.-H., and Zhang, S.S. (2014). Confidence intervals for low dimensional parameters in high dimensional linear models. Journal of the Royal Statistical Society: Series B, 76, 217-242. Zhao, S., and Shojaie, A. (2016). A signifiance test for graph-constrained estimation. Biometrics, 72, 484-493.

8 make.l make.l Graph Laplacian Matrix Builder Description Usage This function builds the graph Laplacian matrix from an adjacency matrix. Arguments Value make.l(adj,normalize.laplacian=false) adj adjacency matrix of the graph. normalize.laplacian whether the graph Laplacian matrix should be normalized to make all diagonal entries equal to 1. Grace test with the normalized Laplacian matrix is more powerful in identifying hub-covariates. A matrix object of the graph Laplacian matrix. Author(s) Sen Zhao

Index grace, 2 grace.test, 3 gracei.test, 6 make.l, 8 9