Package covrna. R topics documented: September 7, Type Package

Similar documents
Package GeneExpressionSignature

Package NanoStringDiff

Package flowmap. October 8, 2018

Package aspi. R topics documented: September 20, 2016

Package idmtpreg. February 27, 2018

Package Path2PPI. November 7, 2018

Package multivariance

Package beam. May 3, 2018

Package SK. May 16, 2018

Package polypoly. R topics documented: May 27, 2017

Package SEMModComp. R topics documented: February 19, Type Package Title Model Comparisons for SEM Version 1.0 Date Author Roy Levy

Package RATest. November 30, Type Package Title Randomization Tests

Package EnergyOnlineCPM

Package msir. R topics documented: April 7, Type Package Version Date Title Model-Based Sliced Inverse Regression

Package rnmf. February 20, 2015

RNA-seq. Differential analysis

Package cellscape. October 15, 2018

Package Perc. R topics documented: December 5, Type Package

Package ChIPtest. July 20, 2016

Package intansv. October 14, 2018

Package darts. February 19, 2015

Package clonotyper. October 11, 2018

Package Select. May 11, 2018

Package NTW. November 20, 2018

Package dhga. September 6, 2016

Multivariate Statistical Analysis

Package NarrowPeaks. September 24, 2012

Package CMC. R topics documented: February 19, 2015

Package depth.plot. December 20, 2015

Package ShrinkCovMat

Package generalhoslem

Package CEC. R topics documented: August 29, Title Cross-Entropy Clustering Version Date

The CCA Package. March 20, 2007

Package esaddle. R topics documented: January 9, 2017

Package RSwissMaps. October 3, 2017

Package RTransferEntropy

Package AID. R topics documented: November 10, Type Package Title Box-Cox Power Transformation Version 2.3 Date Depends R (>= 2.15.

Package varband. November 7, 2016

Processing microarray data with Bioconductor

Package bhrcr. November 12, 2018

Package pfa. July 4, 2016

Package paramap. R topics documented: September 20, 2017

Package mpmcorrelogram

Package sscor. January 28, 2016

Package bigtime. November 9, 2017

Package chemcal. July 17, 2018

Package rgabriel. February 20, 2015

Package MultisiteMediation

Package Tcomp. June 6, 2018

Package yacca. September 11, 2018

Package SMFilter. December 12, 2018

Package testassay. November 29, 2016

Package plw. R topics documented: May 7, Type Package

Parametric Empirical Bayes Methods for Microarrays

Package MACAU2. R topics documented: April 8, Type Package. Title MACAU 2.0: Efficient Mixed Model Analysis of Count Data. Version 1.

Package misctools. November 25, 2016

The ICS Package. May 4, 2007

Package cleaver. December 13, 2018

Package vhica. April 5, 2016

Package LIStest. February 19, 2015

Package gwqs. R topics documented: May 7, Type Package. Title Generalized Weighted Quantile Sum Regression. Version 1.1.0

Package HarmonicRegression

Package gtheory. October 30, 2016

Package face. January 19, 2018

Package factorqr. R topics documented: February 19, 2015

Package MPkn. May 7, 2018

Package RcppSMC. March 18, 2018

Package rela. R topics documented: February 20, Version 4.1 Date Title Item Analysis Package with Standard Errors

The nltm Package. July 24, 2006

The cramer Package. June 19, cramer.test Index 5

Dimensionality Reduction

Package drgee. November 8, 2016

Package FinCovRegularization

Package bivariate. December 10, 2018

Package SPreFuGED. July 29, 2016

Package mirlastic. November 12, 2015

Package ICBayes. September 24, 2017

The NanoStringDiff package

ABSSeq: a new RNA-Seq analysis method based on modelling absolute expression differences

Package TwoStepCLogit

Introduction to analyzing NanoString ncounter data using the NanoStringNormCNV package

Package noise. July 29, 2016

Package SSPA. December 9, 2018

Package EMVS. April 24, 2018

Package gma. September 19, 2017

Package funrar. March 5, 2018

Package CCP. R topics documented: December 17, Type Package. Title Significance Tests for Canonical Correlation Analysis (CCA) Version 0.

Package clustergeneration

ffmanova.m MANOVA for MATLAB version 2.0.

Package lomb. February 20, 2015

Package timelines. August 29, 2016

Package IMMAN. May 3, 2018

Package covsep. May 6, 2018

Package genphen. March 18, 2019

Package reinforcedpred

Package HDtest. R topics documented: September 13, Type Package

fishr Vignette - Age-Length Keys to Assign Age from Lengths

Package dhsic. R topics documented: July 27, 2017

Lecture: Mixture Models for Microbiome data

FINM 331: MULTIVARIATE DATA ANALYSIS FALL 2017 PROBLEM SET 3

Transcription:

Type Package Package covrna September 7, 2018 Title Multivariate Analysis of Transcriptomic Data Version 1.6.0 Author <lara.h.urban@ebi.ac.uk> Maintainer <lara.h.urban@ebi.ac.uk> This package provides the analysis methods fourthcorner and RLQ analysis for large-scale transcriptomic data. License GPL (>= 2) LazyData TRUE Depends ade4, Biobase Imports parallel, genefilter, grdevices, stats, graphics biocviews GeneExpression, Transcription Suggests BiocStyle, knitr, rmarkdown VignetteBuilder knitr git_url https://git.bioconductor.org/packages/covrna git_branch RELEASE_3_7 git_last_commit e268086 git_last_commit_date 2018-04-30 Date/Publication 2018-09-07 R topics documented: covrna-package...................................... 2 Baca dataset......................................... 2 ord.............................................. 3 plot.ord........................................... 5 plot.stat........................................... 6 stat.............................................. 7 vis.............................................. 9 Index 11 1

2 Baca dataset covrna-package The covrna package Details covrna (covariate analysis of RNA-Seq data) is a fast and user-friendly R package which implements fourthcorner analysis and RLQ of transcriptomic data. Gene expression data normally comes with covariates of the samples and of the genes. To analyze associations between sample and gene covariates, the fourthcorner analysis tests the statistical significance of the associations by permutation tests while the RLQ visualizes associations within and be-tween the covariates. The fourthcorner analysis and RLQ implemented in the ade4 package are adapted to easily analyze large-scale transcriptomic data. (1) Runtime and storage space are significantly reduced, (2) the analysis accounts for tran-scriptome-specific shapes of the empirical permutation distributions, (3) the analysis is rendered user-friendly by supplying automation, simple design-ing of plots and unsupervised gene filtering. To cite covrna, please use citation("covrna"). For further details, please refer to the vignette by openvignette("covrna") and the man pages. Package: covrna Type: Package License: GPL (>=2) LazyLoad: yes Maintainer: <lara.h.urban@ebi.ac.uk> References To be announced soon. Baca dataset The Baca dataset The integrated Baca dataset contains the ExpressionSet Baca; its assaydata contains deep sequenced RNA-Seq data of Bacillus anthracis under four stress conditions (with four replicates per stress conditions). The raw sequence reads derive from Passalacqua et al. (2012) and are availaible at Gene Expression Omnibus (GEO, accession number GSE36506). We have already mapped,

ord 3 counted and DESeq2 normalised these counts. The phenodata assigns the stress condition, i.e. ctrl, cold, salt and alcohol stress, to the samples. The featuredata contains COG annotations of the genes. Baca Format ExpressionSet ExpressionSet Source GEO GSE36506 References Passalacqua, K. D., Varadarajan, A., Weist, C., Ondov, B. D., Byrd, B. et al. (2012) Strand-Specific RNA-Seq Reveals Ordered Patterns of Sense and Antisense Transcription in Bacillus anthracis. PLoS ONE, 7(8):e43350. data(baca) fdata(baca) pdata(baca) exprs(baca) ord RLQ for transcriptomic data The RLQ visualises the association between and within sample and gene covariates by ordination. It applies generalized singular value decomposition (GSVD) to the fourthcorner matrix, which contains the associations between the sample and gene covariates. This is realised by eigendecomposition of the covariance matrices of the fourthcorner matrix. The name RLQ refers to the three dataframes R, L and Q to be analyzed. The function ord automates the rlq function of the ade4 package. The input has to be given as dataframe or matrix. Dataframe/matrix L [n x p] contains transcriptomic data of p samples across n genes, dataframe/matrix R [n x m] contains m gene covariates across the n genes and dataframe/matrix Q [p x s] contains s sample covariates across the p samples. Alternatively, objects of the class ExpressionSet (with assaydata, phenodata and featuredata) can be used as input. If the argument ExprSet is missing, the function will use the dataframes/matrices R, L and Q as input.

4 ord Genes can be filtered with respect to their expression variance before analysis (argument exprvar); the function will automatically discard the gene covariates which do not annotate any of the remaining genes. Warning: If R and Q are given as matrices, they will be converted to dataframes at the beginning of the function. Warning: If R or Q is missing, it will be replaced by an identity matrix. Then, a principal component analysis of this matrix will be performed what might be time-consuming, depending on the size of the identity matrix. ord(exprset, R=NULL, L=NULL, Q=NULL, exprvar=1, nf=2) Arguments ExprSet R L Q exprvar nf An ExpressionSet of the Biobase package. The ExpressionSet is used as default input. If no ExpressionSet is given, the individual dataframes/matrices R, L and Q can be used as input. A dataframe/matrix containing information about each gene. The number of rows in R must match the number of rows in L. If R is missing, it will be replaced by an identity matrix [n x n]. A dataframe/matrix of gene expression values of genes across samples. A dataframe/matrix containing information about each sample. The number of rows in Q must match the number of columns in L. If Q is missing, it will be replaced by an identity matrix [p x p]. The fraction of most variably expressed genes to take into account. If the functions stat and ord shall be combined, this value has to be the same in both analyses. The number of axes to be considered by ordination. Details The function automates the following steps. Firstly, Correspondence Analysis is applied to gene expression table L. Either Principal Component Analysis (only quantitative variables), Multiple Correspondence Analysis (only categorical variables) or Hillsmith analysis (quantitative and categorical variables) are applied to the covariate tables R and Q. Secondly, RLQ is applied to the results of these ordination methods. The function returns a list ob class ord where: call rank nf RV eig variance lw gives the original call of the function. gives the rank. gives number of axes to be considered by ordination. gives the RV coefficient. gives a vector of the eigenvalues. gives the variance explained by the axes. gives the row weights of the fourthcorner table.

plot.ord 5 cw gives the column weights of the fourthcorner table. lw gives the row weights of the fourthcorner table. tab gives the fourthcorner table. li gives the coordinates of the covariates of R. l1 gives the normed scores of the covariates of R. co gives the coordinates of the covariates of Q. c1 gives the normed scores of the covariates of Q. lr gives the row coordinates of R. mr gives the normed row scores of R. lq gives the row coordinates of Q. mq gives the normed row scores of Q. ar gives projection of axis onto co-inertia axis of R. ar gives projection of axis onto co-inertia axis of Q. ngenes gives the number of analysed genes. data(baca) ordbaca <- ord(exprset = Baca, exprvar = 1, nf = 2) ls(ordbaca) plot(ordbaca) plot.ord Plot RLQ for transcriptomic data The function plot can visualise different features of an ord object by adjusting the argument "feature". By default, a barplot of the variance explained by the axes of the RLQ is plotted (see arguments). ## S3 method for class 'ord' plot(x, feature="variance", xaxis=1, yaxis=2, cex=1, range=2,...)

6 plot.stat Arguments x feature xaxis, yaxis cex range An object of class ord that shall be visualised by ordination. Defines which features of the object shall be visualised: "columns L","rows L", "columns R" and "columns Q" visualise the respective variables as oridnation, "variance" shows a barplot of the variance explained by the axes, "correlation circle R" and "correlation circle Q" visualise the projection of the original space into the ordination space. Define which axes of ordination shall be shown by x- and y-axis, respectively. Defines size of covariate text. The range of the axes can be extended or reduced, e.g. for the case that not all covariates are visible in the default setting.... More plotting parameters can be added. Plot of RLQ. ordbaca <- ord(baca) plot(ordbaca) plot.stat Plot the fourthcorner analysis for transcriptomic data The function plot produces a cross table of the gene and sample covariates of a stat object. Colours indicate positive/negative significance or absence of significance of the assciations (per default: white for non-significant, red for negative significant and red for positive significant associations). ## S3 method for class 'stat' plot(x, col=c("lightgrey","deepskyblue","red"), sig=true, alpha=0.05, show=c("adj","non-adj"), cex=1, ynames, xnames, ytext=1, xtext=1, shiftx=0, shifty=0,...) Arguments x col sig An object of class stat that shall be visualised as a cross table. A vector of three colours. The first colour represents non-significant, the second positive significant, the third negative significant associations in the cross table. If TRUE (default), only covariates involved in at least one significant association are plotted.

stat 7 alpha show cex The significance level. adj or non-adj indicate if adjusted or raw p-values shall be plotted, respectively. The magnitude of the text in the cross table. ynames, xnames Row and column names of the cross table. By default, the column names of R and Q are used, respectively. ytext, xtext Rotation of the row and column names of the cross table. shifty, shiftx Shift of the row and column names to the right or to the left.... More plotting parameters can be added. Plot of fourthcorner analysis. statbaca <- stat(baca, nrcor = 2) plot(statbaca) stat Fourthcorner analysis for transcriptomic data The fourthcorner analysis tests for significant associations between each sample covariate and each gene covariate by statistical permutation tests. The sample and gene covariates can be categorical and/or quantitative. The input has to be given as dataframe or matrix. Dataframe/matrix L [n x p] contains transcriptomic data of p samples across n genes, dataframe/matrix R [n x m] contains m gene covariates across the n genes and dataframe/matrix Q [p x s] contains s sample covariates across the p samples. Alternatively, objects of the class ExpressionSet (with assaydata, phenodata and featuredata) can be used as input. If the argument ExprSet is missing, the function will use the dataframes/matrices R, L and Q as input. The number of permutations is set to 9999 per default to assure significance of p-values after multiple testing correction. As computation time increases with size of the matrices/dataframes and with number of permutations, parallelization across multiple cores is highly recommended. Per default, all except one CPU cores on the current host are used. Genes can be filtered with respect to their expression variance before analysis (argument exprvar); the function will automatically discard the gene covariates which do not annotate any of the remaining genes. Warning: If R and Q are given as matrices, they will be converted to dataframes at the beginning of the function. Warning: If R or Q is missing, it will be replaced by an identity matrix.

8 stat stat(exprset, R=NULL, L=NULL, Q=NULL, npermut=9999, padjust="bh", nrcor=detectcores()-1, exprvar=1) Arguments ExprSet R L Q npermut padjust nrcor exprvar An ExpressionSet of the Biobase package. The ExpressionSet is used as default input. If no ExpressionSet is given, the individual dataframes/matrices R, L and Q can be used as input. A dataframe/matrix containing information about each gene. The number of rows in R must match the number of rows in L. If R is missing, it will be replaced by an identity matrix [n x n]. A dataframe/matrix of gene expression values of genes across samples. A dataframe/matrix containing information about each sample. The number of rows in Q must match the number of columns in L. If Q is missing, it will be replaced by an identity matrix [p x p]. The number of permutations. The method of multiple testing adjustment of the pvalues, see p.adjust.methods for all methods implemented in R. The number of cores to be used. The fraction of most variably expressed genes to take into account. If the functions stat and ord shall be combined, this value has to be the same in both analyses. Details Dependent on the covariate combination, a statistic is calculated based on matrix multiplication of the three tables. This statistic amounts to a correlation coefficient for the association between quantitative-quantitative and quantitative-categorical variables and to a Chi2-related statistic for the association between categorical-categorical variables. The function returns a list of class stat where: stat is a cross table (m x s) with the values of the original statistical tests per covariate combination. pvalue, adj.pvalue are cross tables (m x s) which contain the p-values and adjusted p-values, respectively, of the permutation tests per covariate combination. adjust.method npermut ngenes call shows the applied multiple testing adjustment method. gives the number of permutations per permutation test. gives the number of analysed genes ("all" in the case of no filtering of the genes). gives the original call of the function.

vis 9 data(baca) statbaca <- stat(exprset = Baca, npermut = 999, padjust = "BH", nrcor = 2, exprvar = 1) statbaca$adj.pvalue plot(statbaca) vis Simultaneous visualisation of transcriptomic data by combining fourthcorner analysis and RLQ The vis function simultaneously visualizes the results of the functions stat and ord. Firstly, all covariates of R and Q are visualized by ordination in one plot; covariates involved in at least one significant association are shown in black, other covariates are shown in gray. Then, all covariates that are significantly associated according to stat are connected by lines which color represents the character of their significance. vis(stat, Ord=NULL, alpha=0.05, xaxis=1, yaxis=2, col=c("gray", transblue, transred), alphatrans=0.5, cex=1, rangex=2, rangey=2,...) Arguments Stat Ord alpha xaxis, yaxis col alphatrans cex An object of class stat. An object of class ord. The objects stat and ord should have the same value ngenes. The significance level. Define which axes of ordination shall be shown by x- and y-axis, respectively. A vector of three colors. The first color represents non-significant variables, the second positive significant, the third negative significant associations. Defines degree of transparency of the second and third color. The magnitude of the text in the ordination. rangex, rangey The range of the x axis and y axis can be extended or reduced, e.g. for the case that not all covariates are visible in the default setting.... More plotting parameters can be added. Plot of fourthcorner analysis and RLQ.

10 vis data(baca) statbaca <- stat(baca, nrcor = 2) ordbaca <- ord(baca) vis(stat = statbaca, Ord = ordbaca) vis(ord = ordbaca)

Index Topic dataset Baca dataset, 2 Baca (Baca dataset), 2 Baca dataset, 2 covrna (covrna-package), 2 covrna-package, 2 ord, 3 plot.ord, 5 plot.stat, 6 stat, 7 vis, 9 11