Package metafuse. October 17, 2016

Similar documents
Package covtest. R topics documented:

Package Grace. R topics documented: April 9, Type Package

Package HIMA. November 8, 2017

Package penmsm. R topics documented: February 20, Type Package

Package SpatialVS. R topics documented: November 10, Title Spatial Variable Selection Version 1.1

Package IsingFit. September 7, 2016

Package icmm. October 12, 2017

Package netcoh. May 2, 2016

Package flexcwm. R topics documented: July 11, Type Package. Title Flexible Cluster-Weighted Modeling. Version 1.0.

Package REGARMA. R topics documented: October 1, Type Package. Title Regularized estimation of Lasso, adaptive lasso, elastic net and REGARMA

Package JointModel. R topics documented: June 9, Title Semiparametric Joint Models for Longitudinal and Counting Processes Version 1.

Package flexcwm. R topics documented: July 2, Type Package. Title Flexible Cluster-Weighted Modeling. Version 1.1.

Package scio. February 20, 2015

Package anoint. July 19, 2015

Package EnergyOnlineCPM

Package CoxRidge. February 27, 2015

Package ICGOR. January 13, 2017

Package GORCure. January 13, 2017

Package CopulaRegression

Package rbvs. R topics documented: August 29, Type Package Title Ranking-Based Variable Selection Version 1.0.

Package RI2by2. October 21, 2016

Package SimSCRPiecewise

Package ZIBBSeqDiscovery

Package basad. November 20, 2017

Package rnmf. February 20, 2015

Package aspi. R topics documented: September 20, 2016

Package factorqr. R topics documented: February 19, 2015

Package severity. February 20, 2015

Package MACAU2. R topics documented: April 8, Type Package. Title MACAU 2.0: Efficient Mixed Model Analysis of Count Data. Version 1.

Package MultisiteMediation

Package ShrinkCovMat

Package LBLGXE. R topics documented: July 20, Type Package

Package BayesNI. February 19, 2015

Package ICBayes. September 24, 2017

Package ltsbase. R topics documented: February 20, 2015

Package breakaway. R topics documented: March 30, 2016

Package gtheory. October 30, 2016

Package mmabig. May 18, 2018

Package milineage. October 20, 2017

Package SimCorMultRes

Package SPreFuGED. July 29, 2016

Package CPE. R topics documented: February 19, 2015

Package SightabilityModel

Package gwqs. R topics documented: May 7, Type Package. Title Generalized Weighted Quantile Sum Regression. Version 1.1.0

Package hds. December 31, 2016

Package threeboost. February 20, 2015

Package covsep. May 6, 2018

Package MicroMacroMultilevel

Package ARCensReg. September 11, 2016

Package effectfusion

Package Delaporte. August 13, 2017

Package rgabriel. February 20, 2015

Package gma. September 19, 2017

Package jmcm. November 25, 2017

Package RootsExtremaInflections

Package dhga. September 6, 2016

Package TwoStepCLogit

Package generalhoslem

Package r2glmm. August 5, 2017

Package bhrcr. November 12, 2018

Package penalized. February 21, 2018

Package mmm. R topics documented: February 20, Type Package

Package mbvs. August 28, 2018

Package MiRKATS. May 30, 2016

Package rarhsmm. March 20, 2018

Package R1magic. August 29, 2016

Package varband. November 7, 2016

Package spcov. R topics documented: February 20, 2015

Package bivariate. December 10, 2018

Package CorrMixed. R topics documented: August 4, Type Package

Package rsq. January 3, 2018

Package BNN. February 2, 2018

Package SK. May 16, 2018

Package spatial.gev.bma

Package locfdr. July 15, Index 5

Package horseshoe. November 8, 2016

Package hierdiversity

Package FinCovRegularization

Package SEMModComp. R topics documented: February 19, Type Package Title Model Comparisons for SEM Version 1.0 Date Author Roy Levy

Package idmtpreg. February 27, 2018

Package fwsim. October 5, 2018

Package mma. March 8, 2018

Package multivariance

Package mixture. R topics documented: February 13, 2018

Package drgee. November 8, 2016

Package invgamma. May 7, 2017

Package alphaoutlier

Package spef. February 26, 2018

Package mlmmm. February 20, 2015

Package LCAextend. August 29, 2013

Package sklarsomega. May 24, 2018

Package My.stepwise. June 29, 2017

Package EMVS. April 24, 2018

Part 8: GLMs and Hierarchical LMs and GLMs

Package complex.surv.dat.sim

Package dispmod. March 17, 2018

Package depth.plot. December 20, 2015

Package InferenceSMR

Package CovSel. September 25, 2013

Package reinforcedpred

Transcription:

Package metafuse October 17, 2016 Type Package Title Fused Lasso Approach in Regression Coefficient Clustering Version 2.0-1 Date 2016-10-16 Author Lu Tang, Ling Zhou, Peter X.K. Song Maintainer Lu Tang <lutang@umich.edu> Description Fused lasso method to cluster and estimate regression coefficients of the same covariate across different data sets when a large number of independent data sets are combined. Package supports Gaussian, binomial, Poisson and Cox PH models. License GPL-2 Imports glmnet, Matrix, MASS, evd Repository CRAN NeedsCompilation no Date/Publication 2016-10-17 00:37:55 R topics documented: metafuse-package...................................... 2 datagenerator........................................ 3 metafuse........................................... 4 metafuse.l.......................................... 7 Index 9 1

2 metafuse-package metafuse-package Fused Lasso Approach in Regression Coefficient Clustering Description Details Fused lasso method to cluster and estimate regression coefficients of the same covariate across different data sets when a large number of independent data sets are combined. Package supports Gaussian, binomial, Poisson and Cox PH models. Simple to use. Accepts X, y, and sid (numerica data source ID for which data entry belongs to) for regression models. Returns regression coefficient estimates and clusterings patterns of coefficients across different datasets, for each covariate. Provides visualization by fusogram, a dendrogram-type of presentation of coefficient clustering pattern across data sources. Author(s) Lu Tang, Ling Zhou, Peter X.K. Song Maintainer: Lu Tang <lutang@umich.edu> References Lu Tang, and Peter X.K. Song. Fused Lasso Approach in Regression Coefficients Clustering - Learning Parameter Heterogeneity in Data Integration. Journal of Machine Learning Research, 17(113):1-23, 2016. Fei Wang, Lu Wang, and Peter X.K. Song. Fused lasso with the adaptation of parameter ordering in combining multiple studies with repeated measurements. Biometrics, DOI:10.1111/biom.12496, 2016. Examples ########### generate data ########### n <- 200 # sample size in each dataset (can also be a K-element vector) K <- 10 # number of datasets for data integration p <- 3 # number of covariates in X (including the intercept) # the coefficient matrix of dimension K * p, used to specify the heterogeneous pattern beta0 <- matrix(c(0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0, # beta_0 of intercept 0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0, # beta_1 of X_1 0.0,0.0,0.0,0.0,0.5,0.5,0.5,1.0,1.0,1.0), # beta_2 of X_2 K, p) # generate a data set, =c("gaussian", "binomial", "poisson", "cox") data <- datagenerator(n=n, beta0=beta0, ="gaussian", seed=123)

datagenerator 3 # prepare the input for metafuse y <- data$y sid <- data$group X <- data[,-c(1,ncol(data))] ########### run metafuse ########### # fuse slopes of X1 (which is heterogeneous with 2 clusters) metafuse(x=x, y=y, sid=sid, fuse.which=c(1), ="gaussian", intercept=true, alpha=0, # fuse slopes of X2 (which is heterogeneous with 3 clusters) metafuse(x=x, y=y, sid=sid, fuse.which=c(2), ="gaussian", intercept=true, alpha=0, # fuse all three covariates metafuse(x=x, y=y, sid=sid, fuse.which=c(0,1,2), ="gaussian", intercept=true, alpha=0, # fuse all three covariates, with sparsity penalty metafuse(x=x, y=y, sid=sid, fuse.which=c(0,1,2), ="gaussian", intercept=true, alpha=1, # fit metafuse at a given lambda metafuse.l(x=x, y=y, sid=sid, fuse.which=c(0,1,2), ="gaussian", intercept=true, alpha=1, lambda=0.5) datagenerator simulate data Description Simulate a dataset with data from K different sources, for demonstration of metafuse. Usage datagenerator(n, beta0,, seed = NA) Arguments n beta0 seed a vector of length K (the total number of datasets being integrated), specifying the sample sizes of individual datasets; can also be an scalar, in which case the function simulates K datasets of equal sample size a coefficient matrix of dimension K * p, where K is the number of datasets being integrated and p is the number of covariates, including the intercept the type of the response vector, c("gaussian", "binomial", "poisson", "cox"); "gaussian" for continuous response, "binomial" for binary response, "poisson" for count response, "cox" for observed time-to-event response, with censoring indicator the random seed for data generation, default is NA

4 metafuse Details These datasets are artifical, and are used to demonstrate the features of metafuse. In the case when ="cox", the response will contain two vectors, a time-to-event variable time and a censoring indicator status. Value Returns data frame with n*k rows (if n is a scalar), or sum(n) rows (if n is a K-element vector). The data frame contains columns "y", "x1",..., "x_p-1" and "group" if ="gaussian", "binomial" or "poisson"; or contains columns "time", "status", "x1",..., "x_p-1" and "group" if ="cox". Examples ########### generate data ########### n <- 200 # sample size in each dataset (can also be a K-element vector) K <- 10 # number of datasets for data integration p <- 3 # number of covariates in X (including the intercept) # the coefficient matrix of dimension K * p, used to specify the heterogeneous pattern beta0 <- matrix(c(0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0, # beta_0 of intercept 0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0, # beta_1 of X_1 0.0,0.0,0.0,0.0,0.5,0.5,0.5,1.0,1.0,1.0), # beta_2 of X_2 K, p) # generate a data set, =c("gaussian", "binomial", "poisson", "cox") data <- datagenerator(n=n, beta0=beta0, ="gaussian", seed=123) names(data) # if ="cox", returned dataset contains columns "time"" and "status" instead of "y" data <- datagenerator(n=n, beta0=beta0, ="cox", seed=123) names(data) metafuse fit a GLM with fusion penalty for data integraion Description Fit a GLM with fusion penalty on coefficients within each covariate across datasets, generate solution path and fusograms for visualization of the model selection. Usage metafuse(x = X, y = y, sid = sid, fuse.which = c(0:ncol(x)), = "gaussian", intercept = TRUE, alpha = 0, criterion = "EBIC", verbose = TRUE, plots = FALSE, loglambda = TRUE)

metafuse 5 Arguments X y sid fuse.which intercept alpha criterion verbose plots loglambda a matrix (or vector) of predictor(s), with dimensions of N*(p-1), where N is the total sample size of the integrated dataset a vector of response, with length N; when ="cox", y is a data frame with cloumns time and status data source ID of length N, must contain integers numbered from 1 to K a vector of integers from 0 to p-1, indicating which covariates are considered for fusion; 0 corresponds to the intercept; coefficients of covariates not in this vector are homogeneously estimated across all datasets response vector type, "gaussian" if y is a continuous vector, "binomial" if y is binary vector, "poisson" if y is a count vector, "cox" if y is a data frame with cloumns time and status if TRUE, intercept will be included, default is TRUE the ratio of sparsity penalty to fusion penalty, default is 0 (i.e., no variable selection, only fusion) "AIC" for AIC, "BIC" for BIC, "EBIC" for extended BIC,default is "BIC" if TRUE, outputs whenever a fusion event happens, and returns the current value of lambda, default is TRUE if TRUE, create solution paths and fusogram plots to visualize the clustering of regression coefficients across datasets, default is FALSE if TRUE, lambda will be plotted in log-10 scale, default is TRUE Details Adaptive lasso penalty is used. See Zou (2006) for detail. Value A list containing the following items will be returned: criterion alpha the response/model type model selection criterion used the ratio of sparsity penalty to fusion penalty if.fuse whether covariate is assumed to be heterogeneous (1) or homogeneous (0) betahat betainfo the estimated regression coefficients additional information about the fit, including degree of freedom, optimal lambda value, maximum lambda value to fuse all coefficients, and estimated friction of fusion

6 metafuse References Lu Tang, and Peter X.K. Song. Fused Lasso Approach in Regression Coefficients Clustering - Learning Parameter Heterogeneity in Data Integration. Journal of Machine Learning Research, 17(113):1-23, 2016. Fei Wang, Lu Wang, and Peter X.K. Song. Fused lasso with the adaptation of parameter ordering in combining multiple studies with repeated measurements. Biometrics, DOI:10.1111/biom.12496, 2016. Examples ########### generate data ########### n <- 200 # sample size in each dataset (can also be a K-element vector) K <- 10 # number of datasets for data integration p <- 3 # number of covariates in X (including the intercept) # the coefficient matrix of dimension K * p, used to specify the heterogeneous pattern beta0 <- matrix(c(0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0, # beta_0 of intercept 0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0, # beta_1 of X_1 0.0,0.0,0.0,0.0,0.5,0.5,0.5,1.0,1.0,1.0), # beta_2 of X_2 K, p) # generate a data set, =c("gaussian", "binomial", "poisson", "cox") data <- datagenerator(n=n, beta0=beta0, ="gaussian", seed=123) # prepare the input for metafuse y <- data$y sid <- data$group X <- data[,-c(1,ncol(data))] ########### run metafuse ########### # fuse slopes of X1 (which is heterogeneous with 2 clusters) metafuse(x=x, y=y, sid=sid, fuse.which=c(1), ="gaussian", intercept=true, alpha=0, # fuse slopes of X2 (which is heterogeneous with 3 clusters) metafuse(x=x, y=y, sid=sid, fuse.which=c(2), ="gaussian", intercept=true, alpha=0, # fuse all three covariates metafuse(x=x, y=y, sid=sid, fuse.which=c(0,1,2), ="gaussian", intercept=true, alpha=0, # fuse all three covariates, with sparsity penalty metafuse(x=x, y=y, sid=sid, fuse.which=c(0,1,2), ="gaussian", intercept=true, alpha=1,

metafuse.l 7 metafuse.l fit a GLM with fusion penalty for data integraion, with a fixed lambda Description Usage Fit a GLM with fusion penalty on coefficients within each covariate at given lambda. metafuse.l(x = X, y = y, sid = sid, fuse.which = c(0:ncol(x)), = "gaussian", intercept = TRUE, alpha = 0, lambda = lambda) Arguments X y sid fuse.which intercept alpha lambda a matrix (or vector) of predictor(s), with dimensions of N*(p-1), where N is the total sample size of the integrated dataset a vector of response, with length N; when ="cox", y is a data frame with cloumns time and status data source ID of length N, must contain integers numbered from 1 to K a vector of integers from 0 to p-1, indicating which covariates are considered for fusion; 0 corresponds to the intercept; coefficients of covariates not in this vector are homogeneously estimated across all datasets response vector type, "gaussian" if y is a continuous vector, "binomial" if y is binary vector, "poisson" if y is a count vector, "cox" if y is a data frame with cloumns time and status if TRUE, intercept will be included, default is TRUE the ratio of sparsity penalty to fusion penalty, default is 0 (i.e., no variable selection, only fusion) tuning parameter for fusion penalty Details Value Adaptive lasso penalty is used. See Zou (2006) for detail. A list containing the following items will be returned: alpha the response/model type the ratio of sparsity penalty to fusion penalty if.fuse whether covariate is assumed to be heterogeneous (1) or homogeneous (0) betahat betainfo the estimated regression coefficients additional information about the fit, including degree of freedom, optimal lambda value, maximum lambda value to fuse all coefficients, and estimated friction of fusion

8 metafuse.l References Lu Tang, and Peter X.K. Song. Fused Lasso Approach in Regression Coefficients Clustering - Learning Parameter Heterogeneity in Data Integration. Journal of Machine Learning Research, 17(113):1-23, 2016. Fei Wang, Lu Wang, and Peter X.K. Song. Fused lasso with the adaptation of parameter ordering in combining multiple studies with repeated measurements. Biometrics, DOI:10.1111/biom.12496, 2016. Examples ########### generate data ########### n <- 200 # sample size in each dataset (can also be a K-element vector) K <- 10 # number of datasets for data integration p <- 3 # number of covariates in X (including the intercept) # the coefficient matrix of dimension K * p, used to specify the heterogeneous pattern beta0 <- matrix(c(0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0, # beta_0 of intercept 0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0, # beta_1 of X_1 0.0,0.0,0.0,0.0,0.5,0.5,0.5,1.0,1.0,1.0), # beta_2 of X_2 K, p) # generate a data set, =c("gaussian", "binomial", "poisson", "cox") data <- datagenerator(n=n, beta0=beta0, ="gaussian", seed=123) # prepare the input for metafuse y <- data$y sid <- data$group X <- data[,-c(1,ncol(data))] ########### run metafuse ########### # fit metafuse at a given lambda metafuse.l(x=x, y=y, sid=sid, fuse.which=c(0,1,2), ="gaussian", intercept=true, alpha=1, lambda=0.5)

Index Topic Data integration Topic Fused lasso Topic Generalized Linear Models Topic data datagenerator, 3 Topic generator datagenerator, 3 Topic package datagenerator, 3 metafuse, 4 metafuse.l, 7 9