The superpc Package. May 19, 2005


Title Supervised principal components
Version 1.03
Author Eric Bair, R. Tibshirani
Description Supervised principal components for regression and survival analysis. Especially useful for high-dimensional data, including microarray data.
Maintainer Rob Tibshirani <tibs@stanford.edu>
LazyLoad false
LazyData false
License GPL2.0
URL http://www-stat.stanford.edu/~tibs/superpc

R topics documented:

    superpc.cv
    superpc.decorrelate
    superpc.fit.to.outcome
    superpc-internal
    superpc.listfeatures
    superpc.lrtest.curv
    superpc.plot.lrtest
    superpc.plotcv
    superpc.plotred.lrtest
    superpc.predict
    superpc.predict.red
    superpc.predict.red.cv
    superpc.predictionplot
    superpc.rainbowplot
    superpc.train
    Index

superpc.cv    Cross-validation for supervised principal components

Description

This function uses a form of cross-validation to estimate the optimal feature threshold in supervised principal components.

Usage

    superpc.cv(fit, data, n.threshold = 20, n.fold = NULL, folds = NULL, n.compon
        compute.preval = TRUE, xl.mode = c("regular", "firsttime", "onetime", "lasttime"),
        xl.time = NULL, xl.prevfit = NULL)

Arguments

    fit             Object returned by superpc.train
    data            Data object of form described in superpc.train documentation
    n.threshold     Number of thresholds to consider. Default 20.
    n.fold          Number of cross-validation folds. Default is around 10 (the program picks a convenient value based on the sample size).
    folds           List of indices of cross-validation folds (optional)
    n.components    Number of cross-validation components to use: 1, 2 or 3.
    min.features    Minimum number of features to include, in determining range for threshold. Default 5.
    max.features    Maximum number of features to include, in determining range for threshold. Default is total number of features in the dataset.
    compute.fullcv  Should full cross-validation be done?
    compute.preval  Should full pre-validation be done?
    xl.mode         Used by Excel interface only
    xl.time         Used by Excel interface only
    xl.prevfit      Used by Excel interface only

Details

This function uses a form of cross-validation to estimate the optimal feature threshold in supervised principal components. To avoid problems with fitting Cox models to small validation datasets, it uses the "pre-validation" approach of Tibshirani and Efron (2002).

Value

    list(threshold = th, nonzero = nonzero, scor = out, scor.preval = out.preval,
        folds = folds, featurescores.folds = featurescores.folds,
        v.preval = cur2, type = type, call = this.call)

    threshold            Vector of thresholds considered

    nonzero              Number of features exceeding each value of the threshold
    scor.preval          Likelihood ratio scores from pre-validation
    scor                 Full CV scores
    folds                Indices of CV folds used
    featurescores.folds  Feature scores for each fold
    v.preval             The pre-validated predictors
    type                 Problem type
    call                 Calling sequence

Examples

    x<-matrix(rnorm(1000*40),ncol=40)
    y<-10+svd(x[1:60,])$v[,1]+.1*rnorm(40)
    censoring.status<- sample(c(rep(1,30),rep(0,10)))
    aa<-superpc.cv(a,data)


superpc.decorrelate    Decorrelate features with respect to competing predictors

Description

Fits a linear model to the features as a function of some competing predictors. Replaces the features by the residual from this fit. These "decorrelated" features are then used in the superpc model building process, to explicitly look for predictors that are independent of the competing predictors. Useful, for example, when the competing predictors are clinical predictors like stage, grade etc.

Usage

    superpc.decorrelate(x, competing.predictors)

Arguments

    x                     Matrix of features. Different features in different rows, one observation per column.
    competing.predictors  List of one or more competing predictors. Discrete predictors should be factors.

Value

Returns lm (linear model) fit of rows of x on competing predictors.

Examples

    #generate some data
    x<-matrix(rnorm(1000*20),ncol=20)
    y<-10+svd(x[1:30,])$v[,1]+.1*rnorm(20)
    ytest<-10+svd(x[1:30,])$v[,1]+.1*rnorm(20)
    censoring.status<- sample(c(rep(1,17),rep(0,3)))
    censoring.status.test<- sample(c(rep(1,17),rep(0,3)))
    competing.predictors=list(pred1=rnorm(20), pred2=as.factor(sample(c(1,2),replace=TRUE,siz
    # decorrelate x
    foo<-superpc.decorrelate(x,competing.predictors)
    xnew<-t(foo$res)
    # now use xnew in superpc
    data<-list(x=xnew,y=y, censoring.status=censoring.status, featurenames=featurenames)
    # etc. Remember to decorrelate test data in the same way, before making predictions.
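The residual step described for superpc.decorrelate can be illustrated in base R. This is a minimal sketch under stated assumptions, not the package's implementation: `decorrelate_sketch` is a hypothetical helper, and it returns the residual matrix directly (features in rows) rather than an lm fit object.

```r
# Sketch only: regress every feature on the competing predictors and keep
# the residuals. `decorrelate_sketch` is a hypothetical name, not a superpc function.
decorrelate_sketch <- function(x, competing.predictors) {
  # x: p x n matrix, features in rows, one observation per column
  # competing.predictors: list of length-n vectors (factors for discrete ones)
  design <- as.data.frame(competing.predictors)
  X <- model.matrix(~ ., data = design)   # intercept plus dummy-coded factors
  # qr.resid() takes an n x p response matrix and returns least-squares residuals
  res <- qr.resid(qr(X), t(x))
  t(res)                                  # back to features-in-rows layout
}

set.seed(1)
x <- matrix(rnorm(50 * 20), ncol = 20)
competing.predictors <- list(pred1 = rnorm(20),
                             pred2 = as.factor(rep(c(1, 2), 10)))
xnew <- decorrelate_sketch(x, competing.predictors)
```

Each row of `xnew` is orthogonal to the intercept and to every competing predictor, which is what makes the later feature scores reflect information beyond those predictors. As the manual notes, test data would need the same adjustment before prediction.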

superpc.fit.to.outcome    Fit predictive model using outcome of supervised principal components

Description

Fit predictive model using outcome of supervised principal components, via either coxph (for survival data) or lm (for regression data).

Usage

    superpc.fit.to.outcome(fit, data.test, score, competing.predictors = NULL, print

Arguments

    fit                   Object returned by superpc.train
    data.test             Data object for prediction. Same form as data object documented in superpc.train.
    score                 Supervised principal component score, from superpc.predict
    competing.predictors  Optional list of competing predictors to be included in the model
    print                 Should a summary of the fit be printed? Default TRUE.
    iter.max              Max number of iterations used in predictive model fit. Default 5. Currently only relevant for Cox PH model.

Value

Returns summary of coxph or lm fit.

Examples

    #generate some data
    x<-matrix(rnorm(1000*20),ncol=20)
    y<-10+svd(x[1:30,])$v[,1]+.1*rnorm(20)
    ytest<-10+svd(x[1:30,])$v[,1]+.1*rnorm(20)
    censoring.status<- sample(c(rep(1,17),rep(0,3)))
    censoring.status.test<- sample(c(rep(1,17),rep(0,3)))
    data.test<-list(x=x,y=ytest, censoring.status=censoring.status.test, featurenames= featur
    fit<- superpc.predict(a, data, data.test, threshold=1.0, n.components=1, prediction.type=
    superpc.fit.to.outcome(a, data, fit$v.pred)


superpc-internal    Internal superpc functions

Description

Internal superpc functions

Usage

    cor.func(x, y, s0.perc)
    coxfunc(x, y, censoring.status, s0.perc)
    coxscor(x, y, ic, offset = rep(0, length(y)))
    coxvar(x, y, ic, offset = rep(0, length(y)), coxstuff.obj = NULL)
    mysvd(x, n.components = NULL)
    superpc.xl.fit.to.clin(fit, data.test,score, pamr.xl.test.sample.labels, pamr.xl
    superpc.xl.get.threshold.range(train.obj)
    superpc.xl.listgenes.compute(data, train.obj, fit.red, fitred.cv=NULL,num.genes=
    superpc.xl.decorrelate(data, pamr.xl.train.sample.labels, pamr.xl.clindata, pamr
    superpc.xl.decorrelate.test(object.decorr, xtest, pamr.xl.train.sample.labels, p
    superpc.xl.rainbowplot(data, pred, pamr.xl.test.sample.labels, pamr.xl.clindat

Details

These are not to be called by the user.

Author(s)

Eric Bair and Rob Tibshirani

superpc.listfeatures    Return a list of the important predictors

Description

Return a list of the important predictors.

Usage

    superpc.listfeatures(data, train.obj, fit.red, fitred.cv = NULL,
        num.features=NULL, component.number = 1)

Arguments

    data              Data object
    train.obj         Object returned by superpc.train
    fit.red           Object returned by superpc.predict.red, applied to training set
    fitred.cv         (Optional) object returned by superpc.predict.red.cv
    num.features      Number of features to list. Default is all features.
    component.number  Number of principal component (1, 2, or 3) used to determine feature importance scores

Value

Returns matrix of features and their importance scores, in order of decreasing absolute value of importance score. The importance score is the correlation of the reduced predictor and the full supervised PC predictor. It also lists the raw score: for survival data, this is the Cox score for that feature; for regression, it is the standardized regression coefficient. If fitred.cv is supplied, the function also reports the average rank of the gene in the cross-validation folds, and the proportion of times that the gene is chosen (at the given threshold) in the cross-validation folds.

Author(s)

Eric Bair and Rob Tibshirani

Examples

    #generate some data
    x<-matrix(rnorm(1000*40),ncol=40)
    y<-10+svd(x[1:60,])$v[,1]+.1*rnorm(40)
    ytest<-10+svd(x[1:60,])$v[,1]+.1*rnorm(40)
    censoring.status<- sample(c(rep(1,30),rep(0,10)))
    censoring.status.test<- sample(c(rep(1,30),rep(0,10)))

    data.test<-list(x=x,y=ytest, censoring.status=censoring.status.test, featurenames= featur
    fit<- superpc.predict(a, data, data.test, threshold=1.0, n.components=1, prediction.type=
    fit.red<- superpc.predict.red(a,data, data.test,.6)
    superpc.listfeatures(data, a, fit.red, num.features=20)


superpc.lrtest.curv    Compute values of likelihood ratio test from supervised principal components fit

Description

Compute values of likelihood ratio test from supervised principal components fit.

Usage

    superpc.lrtest.curv(object, data, newdata, n.components = 1, threshold = NULL, n

Arguments

    object        Object returned by superpc.train
    data          List of training data, of form described in superpc.train documentation
    newdata       List of test data; same form as training data
    n.components  Number of principal components to compute. Should be 1, 2 or 3.
    threshold     Set of thresholds for scores; default is n.threshold values equally spaced over the range of the feature scores
    n.threshold   Number of thresholds to use; default 20.

Value

    lrtest        Values of likelihood ratio test statistic
    threshold     Thresholds used
    num.features  Number of features exceeding threshold
    type          Type of outcome variable
    call          Calling sequence

Examples

    #generate some data
    x<-matrix(rnorm(1000*20),ncol=20)
    y<-10+svd(x[1:30,])$v[,1]+.1*rnorm(20)
    ytest<-10+svd(x[1:30,])$v[,1]+.1*rnorm(20)
    censoring.status<- sample(c(rep(1,17),rep(0,3)))
    censoring.status.test<- sample(c(rep(1,17),rep(0,3)))
    data.test<-list(x=x,y=ytest, censoring.status=censoring.status.test, featurenames= featur
    fit<- superpc.predict(a, data, data.test, threshold=1.0, n.components=1, prediction.type=
    aa<- superpc.lrtest.curv(a, data, data.test)
    superpc.plot.lrtest(aa)


superpc.plot.lrtest    Plot likelihood ratio test statistics

Description

Plot likelihood ratio test statistics from output of superpc.predict.

Usage

    superpc.plot.lrtest(object.lrtestcurv, call.win.metafile = FALSE)

Arguments

    object.lrtestcurv  Output from superpc.lrtest.curv
    call.win.metafile  For use by PAM Excel interface

Examples

    #generate some data
    x<-matrix(rnorm(1000*40),ncol=40)
    y<-10+svd(x[1:60,])$v[,1]+.1*rnorm(40)
    ytest<-10+svd(x[1:60,])$v[,1]+.1*rnorm(40)
    censoring.status<- sample(c(rep(1,30),rep(0,10)))
    censoring.status.test<- sample(c(rep(1,30),rep(0,10)))
    data.test<-list(x=x,y=ytest, censoring.status=censoring.status.test, featurenames= featur
    aa<-superpc.cv(a, data)
    fit<- superpc.predict(a, data, data.test, threshold=1.0, n.components=1, prediction.type=
    bb<-superpc.lrtest.curv(a,data,data.test)
    superpc.plot.lrtest(bb)

superpc.plotcv    Plot output from superpc.cv

Description

Plots pre-validation results from plotcv, to aid in choosing best threshold.

Usage

    superpc.plotcv(object, cv.type=c("full","preval"),smooth = TRUE, smooth.df = 10,

Arguments

    object             Object returned by superpc.cv
    cv.type            Type of cross-validation used: "full" (default; this is "standard" cross-validation; recommended) or "preval" (pre-validation)
    smooth             Should plot be smoothed? Only relevant to "preval". Default FALSE.
    smooth.df          Degrees of freedom for smooth.spline, default 10. If NULL, then degrees of freedom is estimated by cross-validation.
    call.win.metafile  Ignore: for use by PAM Excel program
    ...                Additional plotting args to be passed to matplot

Examples

    x<-matrix(rnorm(1000*40),ncol=40)
    y<-10+svd(x[1:60,])$v[,1]+.1*rnorm(40)
    censoring.status<- sample(c(rep(1,30),rep(0,10)))
    aa<-superpc.cv(a,data)

    superpc.plotcv(aa)


superpc.plotred.lrtest    Plot likelihood ratio test statistics from supervised principal components predictor

Description

Plot likelihood ratio test statistics from supervised principal components predictor.

Usage

    superpc.plotred.lrtest(object.lrtestred, call.win.metafile=FALSE)

Arguments

    object.lrtestred   Output from either superpc.predict.red or superpc.predict.red.cv
    call.win.metafile  Used only by PAM Excel interface call to function

Examples

    #generate some data
    x<-matrix(rnorm(1000*40),ncol=40)
    y<-10+svd(x[1:60,])$v[,1]+.1*rnorm(40)
    ytest<-10+svd(x[1:60,])$v[,1]+.1*rnorm(40)
    censoring.status<- sample(c(rep(1,30),rep(0,10)))
    censoring.status.test<- sample(c(rep(1,30),rep(0,10)))
    data.test<-list(x=x,y=ytest, censoring.status=censoring.status.test, featurenames= featur
    aa<-superpc.cv(a, data)
    fit<- superpc.predict(a, data, data.test, threshold=1.0, n.components=1, prediction.type=
    fit.red<- superpc.predict.red(a, data, data.test,.6)
    fit.redcv<- superpc.predict.red.cv(fit.red, aa, data,.6)
    superpc.plotred.lrtest(fit.redcv)


superpc.predict    Form principal components predictor from a trained superpc object

Description

Computes supervised principal components, using scores from "object".

Usage

    superpc.predict(object, data, newdata, threshold, n.components = 3, prediction.t

Arguments

    object           Object returned by superpc.train
    data             List of training data, of form described in superpc.train documentation
    newdata          List of test data; same form as training data
    threshold        Threshold for scores: features with abs(score)>threshold are retained.
    n.components     Number of principal components to compute. Should be 1, 2 or 3.
    prediction.type  "continuous" for raw principal component(s); "discrete" for principal component categorized in equal bins; "nonzero" for indices of features that pass the threshold
    n.class          Number of classes into which predictor is binned (for prediction.type="discrete"

Value

    v.pred          Supervised principal components predictor
    u               U matrix from svd of feature matrix x
    d               Singular values from svd of feature matrix x
    which.features  Indices of features exceeding threshold
    n.components    Number of supervised principal components requested
    call            Calling sequence

Examples

    #generate some data
    x<-matrix(rnorm(1000*20),ncol=20)
    y<-10+svd(x[1:30,])$v[,1]+.1*rnorm(20)
    ytest<-10+svd(x[1:30,])$v[,1]+.1*rnorm(20)
    censoring.status<- sample(c(rep(1,17),rep(0,3)))
    censoring.status.test<- sample(c(rep(1,17),rep(0,3)))
    data.test<-list(x=x,y=ytest, censoring.status=censoring.status.test, featurenames= featur
    fit<- superpc.predict(a, data, data.test, threshold=1.0, n.components=1)
    plot(fit$v.pred,ytest)


superpc.predict.red    Feature selection for supervised principal components

Description

Forms reduced models to approximate the supervised principal component predictor.

Usage

    superpc.predict.red(fit, data, data.test, threshold, n.components = 3, n.shrinka
        c("continuous", "discrete"), n.class = 2 )

Arguments

    fit              Object returned by superpc.train
    data             Training data object, of form described in superpc.train documentation
    data.test        Test data object; same form as train
    threshold        Feature score threshold; usually estimated from superpc.cv
    n.components     Number of principal components to examine; should equal 1, 2, etc. up to the number of components used in training
    n.shrinkage      Number of shrinkage values to consider. Default 20.
    shrinkages       Shrinkage values to consider. Default NULL.
    compute.lrtest   Should the likelihood ratio test be computed? Default TRUE.
    sign.wt          Signs of feature weights allowed: "both", "pos", or "neg"
    prediction.type  Type of prediction: "continuous" (default) or "discrete". In the latter, the superpc score is divided into n.class groups.
    n.class          Number of groups for discrete predictor. Default 2.

Details

Soft-thresholding by each of the "shrinkages" values is applied to the PC loadings. This reduces the number of features used in the model. The reduced predictor is then used in place of the supervised PC predictor.

Value

    shrinkages      Shrinkage values used
    lrtest.reduced  Likelihood ratio tests for reduced models
    num.features    Number of features used in each reduced model
    feature.list    List of features used in each reduced model
    coef            Least squares coefficients for each reduced model
    import          Importance scores for features
    wt              Weight for each feature, in constructing the reduced predictor
    v.test          Outcome predictor from reduced models. Array of n.shrinkage by (number of test observations)
    v.test.1df      Outcome combined predictor from reduced models. Array of n.shrinkage by (number of test observations)
    n.components    Number of principal components used
    type            Type of outcome
    call            Calling sequence
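The soft-thresholding of loadings described for superpc.predict.red can be illustrated with a few lines of base R. This is a sketch under stated assumptions rather than the package's code: `soft_threshold`, the toy `loadings`, and the toy feature matrix are all hypothetical.

```r
# Sketch only: soft-threshold a vector of loadings at several shrinkage values.
# All names and numbers here are illustrative, not taken from superpc.
soft_threshold <- function(w, shrinkage) {
  # move each loading toward zero by `shrinkage`; loadings that cross zero become 0
  sign(w) * pmax(abs(w) - shrinkage, 0)
}

loadings   <- c(0.9, -0.4, 0.15, -0.05)   # toy PC loadings for 4 features
shrinkages <- c(0, 0.1, 0.5)

# one column of reduced weights per shrinkage value
reduced <- sapply(shrinkages, function(s) soft_threshold(loadings, s))

# number of features that still carry weight in each reduced model
num.features <- colSums(reduced != 0)

# reduced predictor for toy test data (features in rows, observations in columns)
set.seed(1)
x.test <- matrix(rnorm(4 * 6), nrow = 4)
v.test <- t(reduced) %*% x.test           # one row per shrinkage value
```

Larger shrinkage values zero out more loadings, so each row of `v.test` is a predictor built from fewer features; comparing those rows with the unshrunken predictor is the idea behind the reduced models.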

Examples

    #generate some data
    x<-matrix(rnorm(1000*40),ncol=40)
    y<-10+svd(x[1:60,])$v[,1]+.1*rnorm(40)
    ytest<-10+svd(x[1:60,])$v[,1]+.1*rnorm(40)
    censoring.status<- sample(c(rep(1,30),rep(0,10)))
    censoring.status.test<- sample(c(rep(1,30),rep(0,10)))
    data.test<-list(x=x,y=ytest, censoring.status=censoring.status.test, featurenames= featur
    fit<- superpc.predict(a, data, data.test, threshold=1.0, n.components=1, prediction.type=
    fit.red<- superpc.predict.red(a,data, data.test, threshold=.6)
    superpc.plotred.lrtest(fit.red)


superpc.predict.red.cv    Cross-validation of feature selection for supervised principal components

Description

Applies superpc.predict.red to cross-validation folds generated in superpc.cv. Uses the output to evaluate reduced models, and compare them to the full supervised principal components predictor.

Usage

    superpc.predict.red.cv(fitred, fitcv, data, threshold, sign.wt="both")

Arguments

    fitred     Output of superpc.predict.red
    fitcv      Output of superpc.cv
    data       Training data object
    threshold  Feature score threshold; usually estimated from superpc.cv
    sign.wt    Signs of feature weights allowed: "both", "pos", or "neg"

Value

    lrtest.reduced  Likelihood ratio tests for reduced models
    components      Number of supervised principal components used
    v.preval.red    Outcome predictor from reduced models. Array of num.reduced.models by (number of test observations)
    type            Type of outcome
    call            Calling sequence

Examples

    #generate some data
    x<-matrix(rnorm(1000*40),ncol=40)
    y<-10+svd(x[1:60,])$v[,1]+.1*rnorm(40)
    ytest<-10+svd(x[1:60,])$v[,1]+.1*rnorm(40)
    censoring.status<- sample(c(rep(1,30),rep(0,10)))
    censoring.status.test<- sample(c(rep(1,30),rep(0,10)))
    data.test<-list(x=x,y=ytest, censoring.status=censoring.status.test, featurenames= featur
    aa<-superpc.cv(a, data)
    fit<- superpc.predict(a, data, data.test, threshold=1.0, n.components=1, prediction.type=
    fit.red<- superpc.predict.red(a,data, data.test, threshold=.6)
    fit.redcv<- superpc.predict.red.cv(fit.red, aa, data, threshold=.6)

    superpc.plotred.lrtest(fit.redcv)


superpc.predictionplot    Plot outcome predictions from superpc

Description

Plots outcome predictions from superpc.

Usage

    superpc.predictionplot(train.obj, data, data.test, threshold, n.components=3,
        n.class=2, shrinkage=NULL, call.win.metafile=FALSE)

Arguments

    train.obj          Object returned by superpc.train
    data               List of training data, of form described in superpc.train documentation
    data.test          List of test data; same form as training data
    threshold          Threshold for scores: features with abs(score)>threshold are retained.
    n.components       Number of principal components to compute. Should be 1, 2 or 3.
    n.class            Number of classes for survival stratification. Only applicable for survival data. Default 2.
    shrinkage          Shrinkage to be applied to feature loadings. Default is NULL, meaning no shrinkage.
    call.win.metafile  Used only by Excel interface call to function

Examples

    x<-matrix(rnorm(1000*40),ncol=40)
    y<-10+svd(x[1:60,])$v[,1]+.1*rnorm(40)
    censoring.status<- sample(c(rep(1,30),rep(0,10)))
    superpc.predictionplot(a,data,data,threshold=1)


superpc.rainbowplot    Make rainbow plot of superpc and competing predictors

Description

Makes a heatmap display of outcome predictions from superpc, along with expected survival time, and values of competing predictors.

Usage

    superpc.rainbowplot(data, pred, sample.labels,
        competing.predictors, call.win.metafile=FALSE)

Arguments

    data                  List of (test) data, of form described in superpc.train documentation
    pred                  Superpc score from superpc.predict or superpc.predict.red
    sample.labels         Vector of sample labels of test data
    competing.predictors  List of competing predictors to be plotted
    call.win.metafile     Used only by Excel interface call to function

Details

Any censored survival times are estimated by E(T | T > C), where C is the observed censoring time and the Kaplan-Meier estimate from the training set is used to estimate the expectation.
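The conditional expectation E(T | T > C) can be computed from a Kaplan-Meier curve as C plus the area under the survival curve beyond C, divided by S(C). The following base-R sketch illustrates that calculation; `km_conditional_mean` is a hypothetical helper, and restricting the integral to the largest observed training time is an assumption made here, since the manual does not say how the tail is handled.

```r
# Sketch only: E(T | T > C) from a hand-rolled Kaplan-Meier estimate.
# `km_conditional_mean` is a hypothetical name, not part of superpc.
km_conditional_mean <- function(time, status, C) {
  stopifnot(any(status == 1))                    # need at least one event
  ev     <- sort(unique(time[status == 1]))      # distinct event times
  n.risk <- sapply(ev, function(t) sum(time >= t))
  d      <- sapply(ev, function(t) sum(time == t & status == 1))
  surv   <- cumprod(1 - d / n.risk)              # KM estimate at each event time
  S      <- stepfun(ev, c(1, surv))              # right-continuous step function
  tmax   <- max(time)                            # assumption: integrate only up to tmax
  if (C >= tmax || S(C) <= 0) return(C)
  knots <- c(C, ev[ev > C & ev < tmax], tmax)
  left  <- knots[-length(knots)]
  area  <- sum(S(left) * diff(knots))            # area under S between C and tmax
  C + area / S(C)
}

# toy training data with no censoring: the answer is a plain conditional mean
train.time   <- c(1, 2, 3, 4)
train.status <- c(1, 1, 1, 1)
est0 <- km_conditional_mean(train.time, train.status, C = 0)
est1 <- km_conditional_mean(train.time, train.status, C = 2.5)
```

With uncensored toy data the estimate reduces to the average of the training times that exceed C, which gives a quick sanity check on the step-function integration.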

Examples

    x<-matrix(rnorm(1000*40),ncol=40)
    y<-10+svd(x[1:60,])$v[,1]+ 5*rnorm(40)
    censoring.status<- sample(c(rep(1,30),rep(0,10)))
    ytest<- 10+svd(x[1:60,])$v[,1]+ 5*rnorm(40)
    censoring.status.test<- sample(c(rep(1,30),rep(0,10)))
    competing.predictors.test=list(pred1=rnorm(40), pred2=as.factor(sample(c(1,2),replace=TRUE,size=40)))
    data.test=list(x=x,y=ytest, censoring.status=censoring.status.test, featurenames=featuren
    sample.labels=paste("te",as.character(1:40),sep="")
    pred=superpc.predict(a,data,data.test,threshold=.25, n.components=1)$v.pred
    superpc.rainbowplot(data,pred, sample.labels,competing.predictors=competing.predictors.te


superpc.train    Prediction by supervised principal components

Description

Does prediction of a quantitative regression or survival outcome, by the supervised principal components method.

Usage

    superpc.train(data, type = c("survival", "regression"), s0.perc=NULL)

Arguments

    data     Data object with components x: p by n matrix of features, one observation per column; y: n-vector of outcome measurements; censoring.status: n-vector of censoring status (1 = died or event occurred, 0 = survived, or event was censored), needed for a censored survival outcome
    type     Problem type: "survival" for censored survival outcome, or "regression" for simple quantitative outcome
    s0.perc  Factor for denominator of score statistic, between 0 and 1: the percentile of standard deviation values added to the denominator. Default is 0.5 (the median).

Details

Compute Wald scores for each feature (gene), for later use in superpc.predict and superpc.cv.

Value

    gene.scores=gene.scores, type=type, call = this.call

    feature.scores  Score for each feature (gene)
    type            Problem type
    call            Calling sequence

References

Bair E, Tibshirani R (2004). Semi-supervised methods to predict patient survival from gene expression data. PLoS Biol 2(4): e108. http://www-stat.stanford.edu/~tibs/superpc

Examples

    #generate some example data
    x<-matrix(rnorm(1000*40),ncol=40)
    y<-10+svd(x[1:60,])$v[,1]+.1*rnorm(40)
    censoring.status<- sample(c(rep(1,30),rep(0,10)))
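The score, threshold, and SVD steps documented for superpc.train and superpc.predict can be illustrated for the regression case in base R. This is a minimal sketch under stated assumptions, not the package's code: `feature_scores` and `supervised_pc` are hypothetical helpers, and the score formula (univariate slope divided by its standard error plus the s0.perc quantile of the standard errors) is an assumed reading of the s0.perc description.

```r
# Sketch only: regression-type supervised principal components in base R.
# Helper names and the exact score formula are assumptions, not superpc internals.
feature_scores <- function(x, y, s0.perc = 0.5) {
  n   <- length(y)
  xc  <- x - rowMeans(x)                     # center each feature (row)
  yc  <- y - mean(y)
  sxx <- rowSums(xc^2)
  slope <- as.vector(xc %*% yc) / sxx        # univariate regression of y on each feature
  se <- sqrt(pmax(sum(yc^2) / sxx - slope^2, 0) / (n - 2))
  s0 <- quantile(se, s0.perc, names = FALSE) # offset added to the denominator
  slope / (se + s0)
}

supervised_pc <- function(x, y, threshold, n.components = 1) {
  scores <- feature_scores(x, y)
  keep   <- which(abs(scores) > threshold)   # features with abs(score) > threshold
  stopifnot(length(keep) >= n.components)
  xk <- x[keep, , drop = FALSE]
  xk <- xk - rowMeans(xk)
  s  <- svd(xk, nu = n.components, nv = n.components)
  list(which.features = keep, u = s$u,
       d = s$d[seq_len(n.components)], v.pred = s$v)
}

# toy data: the first 10 of 200 features share a latent signal with y
set.seed(1)
n <- 40; p <- 200
latent <- rnorm(n)
x <- matrix(rnorm(p * n), nrow = p)
x[1:10, ] <- 0.5 * x[1:10, ] + matrix(latent, nrow = 10, ncol = n, byrow = TRUE)
y <- latent + 0.1 * rnorm(n)
fit <- supervised_pc(x, y, threshold = 2)
```

The first column of `fit$v.pred` plays the role of the supervised principal component predictor for the training samples. In practice the threshold would be chosen by cross-validation rather than fixed by hand as in this toy example.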

Index

Topic internal
    superpc-internal

Topic regression
    superpc.cv, superpc.decorrelate, superpc.fit.to.outcome, superpc.listfeatures, superpc.lrtest.curv, superpc.plot.lrtest, superpc.plotcv, superpc.plotred.lrtest, superpc.predict, superpc.predict.red, superpc.predict.red.cv, superpc.predictionplot, superpc.rainbowplot, superpc.train

Topic survival
    superpc.cv, superpc.decorrelate, superpc.fit.to.outcome, superpc.listfeatures, superpc.lrtest.curv, superpc.plot.lrtest, superpc.plotcv, superpc.plotred.lrtest, superpc.predict, superpc.predict.red, superpc.predict.red.cv, superpc.predictionplot, superpc.rainbowplot, superpc.train

Alphabetical
    cor.func (superpc-internal)
    coxfunc (superpc-internal)
    coxscor (superpc-internal)
    coxstuff (superpc-internal)
    coxvar (superpc-internal)
    mysvd (superpc-internal)
    superpc-internal
    superpc.cv
    superpc.decorrelate
    superpc.fit.to.outcome
    superpc.listfeatures
    superpc.lrtest.curv
    superpc.plot.lrtest
    superpc.plotcv
    superpc.plotred.lrtest
    superpc.predict
    superpc.predict.red
    superpc.predict.red.cv
    superpc.predictionplot
    superpc.rainbowplot
    superpc.train
    superpc.xl.decorrelate (superpc-internal)
    superpc.xl.fit.to.clin (superpc-internal)
    superpc.xl.get.threshold.range (superpc-internal)
    superpc.xl.listgenes.compute (superpc-internal)
    superpc.xl.rainbowplot (superpc-internal)