Package ChIPtest. July 20, 2016

Similar documents
Package unbalhaar. February 20, 2015

Package ShrinkCovMat

Package BayesNI. February 19, 2015

Package comparec. R topics documented: February 19, Type Package

Package BNN. February 2, 2018

Package GeneExpressionSignature

Package EnergyOnlineCPM

Package SEMModComp. R topics documented: February 19, Type Package Title Model Comparisons for SEM Version 1.0 Date Author Roy Levy

Package locfdr. July 15, Index 5

Package depth.plot. December 20, 2015

Package severity. February 20, 2015

Package LIStest. February 19, 2015

Package MultisiteMediation

Package gtheory. October 30, 2016

Package aspi. R topics documented: September 20, 2016

Package factorqr. R topics documented: February 19, 2015

Package bivariate. December 10, 2018

Package CovSel. September 25, 2013

Package DTK. February 15, Type Package

Package noise. July 29, 2016

Package darts. February 19, 2015

Package flexcwm. R topics documented: July 11, Type Package. Title Flexible Cluster-Weighted Modeling. Version 1.0.

Package rgabriel. February 20, 2015

Package pfa. July 4, 2016

Package spcov. R topics documented: February 20, 2015

Package R1magic. April 9, 2013

Package milineage. October 20, 2017

Package EL. February 19, 2015

The locfdr Package. August 19, hivdata... 1 lfdrsim... 2 locfdr Index 5

Package flexcwm. R topics documented: July 2, Type Package. Title Flexible Cluster-Weighted Modeling. Version 1.1.

Package RTransferEntropy

Package dhga. September 6, 2016

Package hierdiversity

Package invgamma. May 7, 2017

Package EMVS. April 24, 2018

Package SpatialPack. March 20, 2018

Package covsep. May 6, 2018

Package NORMT3. February 19, 2015

Package dhsic. R topics documented: July 27, 2017

Package mirlastic. November 12, 2015

Package noncompliance

Package idmtpreg. February 27, 2018

Package gma. September 19, 2017

Package NonpModelCheck

Package R1magic. August 29, 2016

Package lomb. February 20, 2015

Package polypoly. R topics documented: May 27, 2017

Package HarmonicRegression

Package clogitboost. R topics documented: December 21, 2015

Package ProDenICA. February 19, 2015

Package RI2by2. October 21, 2016

Package scio. February 20, 2015

Package geoscale. R topics documented: May 14, 2015

Package TSF. July 15, 2017

Package NarrowPeaks. September 24, 2012

Package CCP. R topics documented: December 17, Type Package. Title Significance Tests for Canonical Correlation Analysis (CCA) Version 0.

Package FDRreg. August 29, 2016

Package emg. R topics documented: May 17, 2018

Package LPTime. March 3, 2015

Package ltsbase. R topics documented: February 20, 2015

Package FinCovRegularization

Package eel. September 1, 2015

Package threg. August 10, 2015

Package Blendstat. February 21, 2018

Package SK. May 16, 2018

Package msir. R topics documented: April 7, Type Package Version Date Title Model-Based Sliced Inverse Regression

Package Rsurrogate. October 20, 2016

Package goftest. R topics documented: April 3, Type Package

Package ICGOR. January 13, 2017

Package BGGE. August 10, 2018

Package CPE. R topics documented: February 19, 2015

Package ProbForecastGOP

Package ebal. R topics documented: February 19, Type Package

Package alabama. March 6, 2015

Package flora. R topics documented: August 29, Type Package. Title flora: taxonomical information on flowering species that occur in Brazil

Package CompQuadForm

Package face. January 19, 2018

Package rnmf. February 20, 2015

Package ensemblepp. R topics documented: September 20, Title Ensemble Postprocessing Data Sets. Version

Package NTW. November 20, 2018

Package GORCure. January 13, 2017

Package robustrao. August 14, 2017

Package gsarima. February 20, 2015

Package survidinri. February 20, 2015

Package reinforcedpred

Package esreg. August 22, 2017

Package cccrm. July 8, 2015

Package CMC. R topics documented: February 19, 2015

Package SPreFuGED. July 29, 2016

Package MAPA. R topics documented: January 5, Type Package Title Multiple Aggregation Prediction Algorithm Version 2.0.4

Package SHIP. February 19, 2015

Package jmcm. November 25, 2017

Package symmoments. R topics documented: February 20, Type Package

Package DiscreteLaplace

Package thief. R topics documented: January 24, Version 0.3 Title Temporal Hierarchical Forecasting

Package LBLGXE. R topics documented: July 20, Type Package

Package covtest. R topics documented:

Package separationplot

Package gk. R topics documented: February 4, Type Package Title g-and-k and g-and-h Distribution Functions Version 0.5.

Package pearson7. June 22, 2016

Transcription:

Type Package Package ChIPtest July 20, 2016 Title Nonparametric Methods for Identifying Differential Enrichment Regions with ChIP-Seq Data Version 1.0 Date 2017-07-07 Author Vicky Qian Wu ; Kyoung-Jae Won ; Hongzhe Li <hongzhe@upenn.edu> Maintainer Vicky Qian Wu <wuqian7@gmail.com> Nonparametric Tests to identify the differential enrichment region for two conditions or time-course ChIP-seq data. It includes: data preprocessing function, estimation of a small constant used in hypothesis testing, a kernel-based two sample nonparametric test, two assumption-free two sample nonparametric test. License GPL (>= 2.15.1) NeedsCompilation yes Repository CRAN Date/Publication 2016-07-20 17:59:16 R topics documented: ChIPtest_1.0-package.................................... 2 data1............................................. 3 data4............................................. 4 est.c............................................. 4 NormTransformation.................................... 5 TS_kernel.......................................... 6 TS_twosample....................................... 7 Index 9 1

2 ChIPtest_1.0-package ChIPtest_1.0-package Nonparametric Methods for Identifying Differential Enrichment Regions with ChIP-seq Data Nonparametric Tests to identify the differential enrichment region for two conditions or time-course ChIP-seq data. It includes: data preprocessing function, estimation of a small constant used in hypothesis testing, a kernel-based two sample nonparametric test, two assumption-free two sample nonparametric test. Package: ChIPtest_1.0 Type: Package Version: 1.0 Date: 2016-07-07 License: GPL ( >=2 LazyLoad: yes Author(s) Vicky Qian Wu ; Kyoung-Jae Won ; Hongzhe Li <hongzhe@upenn.edu> Maintainer: Vicky Qian Wu <wuqian7@gmail.com> Data4=NormTransformation(data4) tao=est.c(data1, Data4, max1=5, max4=5) band=54 TS=TS_twosample(Data1, Data4, tao, band, quant=c(0.9,0.9,0.9))

data1 3 data1 Input data (matrix) for condition A In order to identify the genes that show differential histone modification levels between the two conditions, condition A and condition B, ChIPtest consider the upstream 5000 bp region and downstream 2000 bp region around the transcription start site (TSS) for each gene and divide the regions into 280 bins of 25 bps. Since the two ChIP-seq samples are usually sequenced at different depths (total number of reads), the counts were rescaled according to the sequencing depth ratio. In this example, suppose that there are 5 genes and for each gene, there are 280 observations. The input data matrix has 5 rows and 280 columns. Each row represents for one gene, and each column represents for number of short reads covered at one bin after rescaling. Format The format is: num [1:5, 1:280] 0 0 0 0 1.43 0 1.43 0 0 1.43... - attr(*, "dimnames")=list of 2..$ : chr [1:5] "1" "2" "3" "4".....$ : chr [1:280] "V3" "V4" "V5" "V6"... Source T.S. Mikkelsen, et al. Comparative Epigenomic Analysis of Murine and Human Adipogenesis. Cell, 143 (156-169): 1156-1166 (2010)

4 est.c data4 Input data (matrix) for condition B Format Source In order to identify the genes that show differential histone modification levels between the two conditions, condition A and condition B, ChIPtest consider the upstream 5000 bp region and downstream 2000 bp region around the transcription start site (TSS) for each gene and divide the regions into 280 bins of 25 bps. Since the two ChIP-seq samples are usually sequenced at different depths (total number of reads), the counts were rescaled according to the sequencing depth ratio. In this example, suppose that there are 5 genes and for each gene, there are 280 observations. The input data matrix has 5 rows and 280 columns. Each row represents for one gene, and each column represents for number of short reads covered at one bin after rescaling. The format is: num [1:5, 1:280] 0 0 0 0 0 0 0 0 0 0... - attr(*, "dimnames")=list of 2..$ : chr [1:5] "1" "2" "3" "4".....$ : chr [1:280] "V3" "V4" "V5" "V6"... T.S. Mikkelsen, et al. Comparative Epigenomic Analysis of Murine and Human Adipogenesis. Cell, 143 (156-169): 1156-1166 (2010) est.c calculate the biologically relevant value c in the null hypothesis H0: TS=c, in assumption-free nonparametric test If there is no INPUT experiment (No control), treat the genes with read counts fewer than 5 as the "null genes". Test statistics were calculated based on the those "null genes" and take the average to obtain the value c, which is used in the null hypothesis H0: TS=c

NormTransformation 5 est.c(data1, data4, max1 = 5, max4 = 5) Arguments Value data1 data4 Data Matrix (after VST) for condition A Data Matrix (after VST) for condition B max1 Threshold used to decide null genes for condition A. Default as 5 max4 Threshold used to decide null genes for condition B. Default as 5 Data matrix, the default format is N row by M column. Each row represents for one gene, and each column represents for one bin tao value c in the null hypothesis H0: TS=c Data4=NormTransformation(data4) tao=est.c(data1, Data4, max1=5, max4=5) NormTransformation Variance-stabilizing transformation (VST) procedure Assume observed data approximately follow Poisson Distribution. Apply VST procedure to transform the data as approximate normal with constant variance of 1 NormTransformation(data) Arguments data Input ChIP-seq data (counts), which approximately follow Poisson Distribution.

6 TS_kernel Value Please note the input data can be a single value, a vector or a matrix. If it is a matrix, the default format is N row by M column. Each row represents for one gene, and each column represents for one bin. After VST transformation, the return data matrix would follow Normal Distribution with a constant variance 1 TS_kernel Calculate the Test Statistics for kernel-based nonparametric test. Get the difference between two conditions. Apply Kernel smoothing to fit a smooth curve. Estimate variance for each gene and improve the estimation of variance based on all the genes. Derive test statistics and get the rank list of all the genes. TS_kernel(data, band, quantile) Arguments data band quantile difference matrix between two conditions bandwidth used in kernel smoothing threshold used in variance estimation Note 1: Need to chose a bandwidth. Do not recommend to use cross validation (not gene-specific bandwidth) but chose a fixed biological meaningful bandwidth. A fixed bandwidth which can capture the signal profile and smooth out noise would be recommend. The bandwidth used in reference is 20/280. Note 2: quantile value is based on the distribution of variance estimation of each gene. Recommend to use histogram to double check the distribution. Default 0.9 = 90 %

TS_twosample 7 Value TS TS_sign Tmean Kernel based test statistics after WH transformation. Please refer the details in the reference "+" represent for condition B enriched more than condition A; "-" vice versa Original test statistics, which is calculated as integral of square of kernel estimator Data4=NormTransformation(data4) data=data4-data1 band=54 TS=TS_kernel(data, band, quantile=0.9) TS_twosample Three Nonparametric Test Statistics for two sample ChIP-seq data It includes three nonparametric test statistics for two sample differential analysis: kernel based nonparametric test, assumption-free nonparametric test with equal variance estimation and unequal variance estimation. TS_twosample(data1, data4, tao, band, quant) Arguments data1 data4 tao band quant data matrix (after VST) for condition A data matrix (after VST) for condition B the biologically relevant value c in the null hypothesis H0: TS=c, in assumptionfree nonparametric test bandwidth used in kernel smoothing threshold used in variance estimation

8 TS_twosample Value kernel-based test statistics is the same as "TS_kernel" TS_kn Deql Dnun sigma1 sigma4 Ts_yvec Dsum Sev Suv Xg kernel based test statistics assumption-free nonparametric test statistics with equal variance assumption-free nonparametric test statistics with unequal variance variance estimation for conditon A under equal variance assumption variance estimation for condition B under unequal variance assumption Original statistics, which is calculated as integral of square of kernel estimator Original statistics, which is calculated for nonparametric test without smoothing variance estimation under equal variance assumption variance estimation under unequal variance assumption estimation of standard deviation for kernel-based test statistics Data4=NormTransformation(data4) tao=est.c(data1, Data4, max1=5, max4=5) band=54 TS=TS_twosample(Data1, Data4, tao, band, quant=c(0.9,0.9,0.9))

Index Topic ChIPtest, nonparametric test, ChIP-seq, differential enrichment ChIPtest_1.0-package, 2 Topic datasets data1, 3 data4, 4 ChIPtest_1.0 (ChIPtest_1.0-package), 2 ChIPtest_1.0-package, 2 data1, 3 data4, 4 est.c, 4 NormTransformation, 5 TS_kernel, 6 TS_twosample, 7 9