An Omnibus Consistent Adaptive Percentile Modified Wilcoxon Rank Sum Test with Applications in Gene Expression Studies


O. Thas, L. Clement, J.C.W. Rayner, B. Carvalho, and W. Van Criekinge

Supplementary Material

Corresponding author: Olivier Thas, Department of Mathematical Modeling, Statistics and Bioinformatics, Ghent University, Belgium

Web Appendix A

Tables 1, 2, 3 and 4 give the empirical powers. See Section 4 of the paper for the details.

Table 1: Powers of exact permutation tests for location-shift alternatives with different parent distributions (n = m = 10, α = 0.05; powers approximated from 10,000 simulations).

Columns: θ, Test, Uniform, Normal, Cauchy, Chi, Expon., Logistic. The table has four blocks of rows, one per value of θ (the first is θ = 0.0); each block lists the tests WMW, Gastwirth, HFR, LT, MAX, SUM, APMW, Lepage, BWS and t test. Numerical entries are not reproduced here.

Table 2: Powers of permutation tests for location-shift alternatives with different parent distributions (n = m = 25, α = 0.05; powers approximated from 10,000 simulations, p-values based on 200,000 Monte Carlo runs).

Columns: θ, Test, Uniform, Normal, Cauchy, Chi, Expon., Logistic. The table has four blocks of rows, one per value of θ (the first is θ = 0.0); each block lists the tests WMW, Gastwirth, HFR, LT, MAX, SUM, APMW, Lepage, BWS and t test. Numerical entries are not reproduced here.

Web Appendix B

The application of our test to the complete set of genes resulted in a list of significantly differentially expressed genes. The list contains some genes for which differential expression was not detected by the t-test or the WMW test. Some of these genes have been biologically validated by confirming that the gene's promoter is methylated in most patients of the adenoma group. Methylation is an epigenetic process that results in gene silencing, i.e. the gene cannot be expressed when its promoter is methylated. The methylation status was determined in an MSP (Methylation Specific PCR) experiment.

We present the results for the LDOC1L gene with accession number NM , which is known as a gene for a Leucine zipper. The data are presented in Table 1. With the two-sided Welch t-test and the WMW test we find p-values and , both nonsignificant at the α = 0.05 level. The adaptive PMW test gives $R_n = 2.86$ (with $p, r \in [0.01, 0.5]$) and a p-value of 0.02, so the null hypothesis is rejected. The value of $R_n$ corresponds to $S_n(p, r)$ using only a fraction p = 15.71% of the observations in the right tail and a fraction r = 4.41% of the observations in the left tail of the pooled sample. This suggests that the difference between the two distributions is more pronounced for the larger expression values. This is also illustrated in Figure 1 of Web Appendix B, which shows kernel density estimates of the two distributions.

The biological validation implies that the rejection of the null hypothesis is not a false positive result. This case study demonstrates that the APMW test is successful in detecting biologically relevant effects where some other tests fail.
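For readers who want to see how a two-sided WMW p-value can be approximated by Monte Carlo permutation, a minimal Python sketch follows. It is not the code used for the analysis above: the function names `midranks` and `wmw_permutation_p_value` are hypothetical, the expression data are not reproduced in this supplement, and the "+1" correction in the p-value is one common convention rather than necessarily the one used in the paper.

```python
import random


def midranks(values):
    """Return the midrank of every value (ties share the average of their ranks)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j past the block of observations tied with order[i].
        while j < len(order) and values[order[j]] == values[order[i]]:
            j += 1
        for k in range(i, j):
            ranks[order[k]] = (i + 1 + j) / 2  # average of ranks i+1, ..., j
        i = j
    return ranks


def wmw_permutation_p_value(x, y, n_perm=10000, seed=1):
    """Two-sided Monte Carlo permutation p-value for the rank sum of sample x."""
    rng = random.Random(seed)
    ranks = midranks(list(x) + list(y))
    n1, n = len(x), len(x) + len(y)
    expected = n1 * (n + 1) / 2          # null mean of the rank sum
    observed = abs(sum(ranks[:n1]) - expected)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(ranks)               # random relabelling of the pooled sample
        if abs(sum(ranks[:n1]) - expected) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

Calling `wmw_permutation_p_value(sample1, sample2)` on two lists of expression values returns the approximate p-value; increasing `n_perm` reduces the Monte Carlo error at the cost of run time.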

Figure 1: Density estimates of the expression values of the LDOC1L gene in adenoma (full line) and carcinoma patients (dashed line). Horizontal axis: expression value; vertical axis: density.

In the original study (Carvalho et al., 2008, 2009) no corrections for multiple testing were performed. The biologists selected candidate differentially expressed genes by ranking the p-values of the genes that had p-values smaller than 0.05 for both the Welch t-test and the WMW test. They also selected a few genes that had small p-values for the APMW test but were not detected by the other two tests. This resulted in the detection of the leucine zipper gene. We do not advocate this procedure, but it demonstrates that the APMW test is successful in detecting biologically relevant effects where some other tests fail.

Web Appendix C

In the absence of ties, the exact mean and variance of $T_n(p, r)$ under $H_0$ for odd $n$ are

$$E\{T_n(p,r)\} = \frac{n_1}{2n}\left\{n_p(n_p+1) - n_r(n_r+1)\right\}$$

and

$$\mathrm{Var}\{T_n(p,r)\} = \frac{n_1 n_2 n_p (n_p+1)}{12 n^2 (n-1)}\left\{4 n n_p + 2n - 3 n_p^2 - 3 n_p\right\} + 2\,\frac{n_1 n_2 n_p n_r}{4 n^2 (n-1)}(n_p+1)(n_r+1) + \frac{n_1 n_2 n_r (n_r+1)}{12 n^2 (n-1)}\left\{4 n n_r + 2n - 3 n_r^2 - 3 n_r\right\},$$

and the exact mean and variance of $T_n(p, r)$ under $H_0$ for even $n$ are

$$E\{T_n(p,r)\} = \frac{n_1}{2n}\left\{n_p^2 - n_r^2\right\}$$

and

$$\mathrm{Var}\{T_n(p,r)\} = \frac{n_1 n_2 n_p}{12 n^2 (n-1)}\left\{4 n n_p^2 - 3 n_p^3 - n\right\} + 2\,\frac{n_1 n_2 n_p^2 n_r^2}{4 n^2 (n-1)} + \frac{n_1 n_2 n_r}{12 n^2 (n-1)}\left\{4 n n_r^2 - 3 n_r^3 - n\right\}.$$
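The odd-$n$ expressions can be checked numerically on a small example. The Python sketch below compares the closed forms with a brute-force enumeration over all assignments of sample labels. It rests on an assumption that is not stated in this supplement: the scores are taken to be $1, \ldots, n_p$ on the $n_p$ largest ranks, $-1, \ldots, -n_r$ on the $n_r$ smallest ranks, and zero elsewhere, which is a score set under which the stated moments follow from standard permutation theory. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def closed_form_moments(n1, n2, n_p, n_r):
    """Odd-n mean and variance of T_n(p, r) as stated in Web Appendix C."""
    n = n1 + n2
    mean = Fraction(n1, 2 * n) * (n_p * (n_p + 1) - n_r * (n_r + 1))
    c = Fraction(n1 * n2, n * n * (n - 1))
    var = (
        c * Fraction(n_p * (n_p + 1), 12) * (4 * n * n_p + 2 * n - 3 * n_p**2 - 3 * n_p)
        + 2 * c * Fraction(n_p * n_r * (n_p + 1) * (n_r + 1), 4)
        + c * Fraction(n_r * (n_r + 1), 12) * (4 * n * n_r + 2 * n - 3 * n_r**2 - 3 * n_r)
    )
    return mean, var


def enumerated_moments(n1, n2, n_p, n_r):
    """Exact permutation mean and variance under the ASSUMED scores (n_p + n_r <= n)."""
    n = n1 + n2
    # Assumed scores: -n_r, ..., -1 on the lowest ranks, 0 in the middle,
    # 1, ..., n_p on the highest ranks.
    scores = [-(n_r - i) for i in range(n_r)] + [0] * (n - n_p - n_r) + list(range(1, n_p + 1))
    # Every subset of n1 positions is an equally likely placement of sample 1.
    totals = [sum(scores[i] for i in subset) for subset in combinations(range(n), n1)]
    mean = Fraction(sum(totals), len(totals))
    var = Fraction(sum(t * t for t in totals), len(totals)) - mean**2
    return mean, var
```

For instance, `closed_form_moments(3, 4, 2, 2)` and `enumerated_moments(3, 4, 2, 2)` can be compared directly; exact rational arithmetic avoids any rounding ambiguity in the comparison.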

Web Appendix D

Here we present the proofs of the exact mean and variance in the presence of ties (see Appendix A). We give the proofs only for the upper fraction statistic $T_{np}$; the proof for the lower fraction statistic $B_{nr}$ is completely analogous.

Preliminaries

For a given fraction $p$ there are $n_p$ observations of the pooled sample in the upper fraction. Let $N_{p1}$ (abbreviated to $N_p$ below) denote the number of observations of the first sample that appear in the upper fraction, i.e.

$$N_{p1} = \sum_{i=n-n_p+1}^{n} c_i,$$

where $c_i$ is as defined in Section 3 (i.e. $c_i = 1$ if the $i$th smallest observation in the pooled sample comes from sample 1). Note that $N_p$ has a hypergeometric distribution with parameters $n_p$, $n$ and $n_1$, i.e.

$$P(N_p = k) = \frac{\binom{n_p}{k}\binom{n-n_p}{n_1-k}}{\binom{n}{n_1}}.$$

Thus

$$E\{N_p\} = \frac{n_1 n_p}{n}, \qquad \mathrm{Var}\{N_p\} = \frac{n_1 n_p}{n}\,\frac{n-n_p}{n-1}\,\frac{n-n_1}{n},$$

and

$$E\{N_p^2\} = \mathrm{Var}\{N_p\} + \left[E\{N_p\}\right]^2 = \frac{n_1 n_p}{n(n-1)}\left(n - n_p - n_1 + n_1 n_p\right).$$

Let $C_p^t = (c_{n-n_p+1}, \ldots, c_n)$ denote the vector with the sample indicators of the observations in the upper fraction. Then

$$P(C_p = c \mid N_p = k) = \frac{1}{\binom{n_p}{k}}.$$

Hence, for any $i \in \{n-n_p+1, \ldots, n\}$,

$$P(c_{pi} = 1 \mid N_p = k) = \sum_{c:\, c_{pi}=1} P(C_p = c \mid N_p = k) = \frac{\binom{n_p-1}{k-1}}{\binom{n_p}{k}} = \frac{k}{n_p}.$$

Thus also $E\{c_{pi} \mid N_p = k\} = k/n_p$.

Finally, we will also need $P(c_i = 1 \text{ and } c_j = 1 \mid N_p = k)$. We make a distinction between $i = j$ and $i \neq j$. When $i = j$,

$$P(c_i = 1 \text{ and } c_j = 1 \mid N_p = k) = P(c_i = 1 \mid N_p = k) = \frac{k}{n_p}.$$

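When $i \neq j$,

$$P(c_i = 1 \text{ and } c_j = 1 \mid N_p = k) = P(c_i = 1 \mid c_j = 1 \text{ and } N_p = k)\, P(c_j = 1 \mid N_p = k) = \frac{k-1}{n_p-1}\,\frac{k}{n_p}.$$

Exact Mean

$$\begin{aligned}
E\{T_{np}\} &= E_{N_p}\left\{E_{C_p \mid N_p}\left\{E\{T_{np} \mid C_p, N_p\}\right\}\right\} \\
&= E_{N_p} E_{C_p \mid N_p}\left\{\sum_{i=n-n_p+1}^{n} c_i\, a(i,\tau)\right\} \\
&= E_{N_p}\left\{\frac{N_p}{n_p} \sum_{i=n-n_p+1}^{n} a(i,\tau)\right\} \\
&= \frac{1}{n_p} E_{N_p}\{N_p\} \sum_{i=n-n_p+1}^{n} a(i,\tau) = \frac{n_1}{n} A_p(\tau).
\end{aligned}$$

Exact Variance

First we write

$$\begin{aligned}
\mathrm{Var}\{T_{np}\} &= E_{C_p,N_p}\left\{\mathrm{Var}\{T_{np} \mid C_p, N_p\}\right\} + \mathrm{Var}_{C_p,N_p}\left\{E\{T_{np} \mid C_p, N_p\}\right\} \\
&= E_{N_p}\left\{E_{C_p \mid N_p}\left\{\mathrm{Var}\{T_{np} \mid C_p, N_p\} \mid N_p\right\}\right\} \qquad (1) \\
&\quad + E_{N_p}\left\{\mathrm{Var}_{C_p \mid N_p}\left\{E\{T_{np} \mid C_p, N_p\} \mid N_p\right\}\right\} \qquad (2) \\
&\quad + \mathrm{Var}_{N_p}\left\{E_{C_p \mid N_p}\left\{E\{T_{np} \mid C_p, N_p\} \mid N_p\right\}\right\}. \qquad (3)
\end{aligned}$$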
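The hypergeometric moments of $N_p$ and the resulting exact mean can be verified numerically. The following Python sketch computes the moments directly from the probability mass function and obtains $E\{T_{np}\}$ by conditioning on $N_p$, as in the derivation above. The function names and the example scores are hypothetical and serve only as an illustration.

```python
from fractions import Fraction
from math import comb


def hypergeom_pmf(k, n, n1, n_p):
    """P(N_p = k): k of the n_p upper-fraction observations come from sample 1."""
    return Fraction(comb(n_p, k) * comb(n - n_p, n1 - k), comb(n, n1))


def support(n, n1, n_p):
    """Values of k with positive probability."""
    return range(max(0, n1 - (n - n_p)), min(n_p, n1) + 1)


def moments_of_N_p(n, n1, n_p):
    """Return (E{N_p}, Var{N_p}, E{N_p^2}) computed from the pmf."""
    mean = sum(k * hypergeom_pmf(k, n, n1, n_p) for k in support(n, n1, n_p))
    second = sum(k * k * hypergeom_pmf(k, n, n1, n_p) for k in support(n, n1, n_p))
    return mean, second - mean**2, second


def mean_T_by_conditioning(scores, n, n1):
    """E{T_np} = E{(N_p / n_p) * A_p}, where A_p is the sum of the scores."""
    n_p = len(scores)
    a_p = sum(scores)
    return sum(
        hypergeom_pmf(k, n, n1, n_p) * Fraction(k, n_p) * a_p
        for k in support(n, n1, n_p)
    )
```

With $n = 10$, $n_1 = 4$ and $n_p = 3$, `moments_of_N_p(10, 4, 3)` can be compared with the closed forms $n_1 n_p / n$ and $n_1 n_p (n - n_p - n_1 + n_1 n_p) / \{n(n-1)\}$, and `mean_T_by_conditioning` with $(n_1/n) A_p(\tau)$.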

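Term (1) contains $\mathrm{Var}\{T_{np} \mid C_p, N_p\} = 0$ and is thus exactly zero.

For Term (2) we first calculate $\mathrm{Var}_{C_p \mid N_p}\{E\{T_{np} \mid C_p, N_p\} \mid N_p\}$:

$$\begin{aligned}
\mathrm{Var}_{C_p \mid N_p}\left\{E\{T_{np} \mid C_p, N_p\} \mid N_p\right\}
&= \mathrm{Var}_{C_p \mid N_p}\left\{\sum_{i=n-n_p+1}^{n} c_i\, a(i,\tau) \,\Big|\, N_p\right\} \\
&= \sum_{i=n-n_p+1}^{n} \sum_{j=n-n_p+1}^{n} a(i,\tau)\, a(j,\tau)\, \mathrm{Cov}\{c_i, c_j \mid N_p\} \\
&= \sum_{i=n-n_p+1}^{n} \sum_{j=n-n_p+1}^{n} a(i,\tau)\, a(j,\tau) \left[P(c_i = 1 \text{ and } c_j = 1 \mid N_p) - \left(\frac{N_p}{n_p}\right)^2\right] \\
&= \frac{N_p (n_p - N_p)}{n_p^2} \left[A_p^{(2)}(\tau) - \frac{A_p^{(1,1)}(\tau)}{n_p - 1}\right],
\end{aligned}$$

where the last step collects the terms with $i = j$ into $A_p^{(2)}(\tau) = \sum_i a(i,\tau)^2$ and the terms with $i \neq j$ into $A_p^{(1,1)}(\tau) = \sum_{i \neq j} a(i,\tau)\, a(j,\tau)$, with all sums running over the upper fraction. Term (2) then becomes

$$E_{N_p}\left\{\mathrm{Var}_{C_p \mid N_p}\left\{E\{T_{np} \mid C_p, N_p\} \mid N_p\right\}\right\} = \frac{n_1 (n_p - 1)(n - n_1)}{n (n-1) n_p} \left(A_p^{(2)}(\tau) - \frac{A_p^{(1,1)}(\tau)}{n_p - 1}\right).$$

For Term (3) we first calculate $E_{C_p \mid N_p}\{E\{T_{np} \mid C_p, N_p\} \mid N_p\}$:

$$E_{C_p \mid N_p}\left\{E\{T_{np} \mid C_p, N_p\} \mid N_p\right\} = E_{C_p \mid N_p}\left\{\sum_{i=n-n_p+1}^{n} c_i\, a(i,\tau) \,\Big|\, N_p\right\} = \sum_{i=n-n_p+1}^{n} \frac{N_p}{n_p}\, a(i,\tau) = \frac{N_p}{n_p} A_p(\tau).$$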

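Finally,

$$\mathrm{Var}_{N_p}\left\{\frac{N_p}{n_p} A_p(\tau)\right\} = \frac{A_p(\tau)^2}{n_p^2}\, \mathrm{Var}\{N_p\} = A_p(\tau)^2\, \frac{n_1 (n - n_p)(n - n_1)}{n^2 (n-1) n_p}.$$

Now that we have the three terms, and since $A_p(\tau)^2 = \sum_i a(i,\tau)^2 + \sum_{i \neq j} a(i,\tau)\, a(j,\tau) = A_p^{(2)}(\tau) + A_p^{(1,1)}(\tau)$, adding them gives

$$\mathrm{Var}\{T_{np}\} = \frac{n_1 (n - n_1)}{n^2 (n-1)} \left((n-1) A_p^{(2)}(\tau) - A_p^{(1,1)}(\tau)\right).$$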
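The final variance expression can be checked by brute force on a small pooled sample. The Python sketch below evaluates the closed form and compares it with the exact permutation variance obtained by enumerating every placement of the first sample. The function names are hypothetical, and the example scores (which include a tied pair sharing the midscore 5/2) are made up purely for illustration.

```python
from fractions import Fraction
from itertools import combinations


def variance_formula(scores, n, n1):
    """Closed form: n1 (n - n1) / (n^2 (n - 1)) * ((n - 1) A2 - A11)."""
    a2 = sum(a * a for a in scores)      # A_p^(2): sum of squared scores
    a11 = sum(scores) ** 2 - a2          # A_p^(1,1): sum of a_i * a_j over i != j
    return Fraction(n1 * (n - n1), n * n * (n - 1)) * ((n - 1) * a2 - a11)


def variance_by_enumeration(scores, n, n1):
    """Exact permutation variance of T_np over all C(n, n1) label assignments."""
    # Observations outside the upper fraction carry zero weight.
    full = [Fraction(0)] * (n - len(scores)) + [Fraction(a) for a in scores]
    totals = [sum(full[i] for i in subset) for subset in combinations(range(n), n1)]
    mean = sum(totals) / len(totals)
    return sum(t * t for t in totals) / len(totals) - mean**2
```

For example, with `scores = [1, Fraction(5, 2), Fraction(5, 2)]`, `n = 7` and `n1 = 3`, the two functions return the same rational number. Enumeration is only feasible for small $n$, which is the reason a closed form is useful.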

Web Appendix E

The R package APMW contains the R function apmw.test, which performs the APMW test. The function returns an R object of class htest. Type help(apmw.test) at the R command line for more help. The source code of the R package is in the file apmw.zip. The R file apmw example.r shows the R code used to analyse the Leucine zipper gene data; see also Section 5 of the paper and Web Appendix B for the data. Make sure that the compiled C code file, the R source file and the data file are in your R working directory.

References

Carvalho, B., Postma, C., Mongera, S., Hopmans, E., Diskin, S., van de Wiel, M., Van Criekinge, W., Thas, O., Matthäi, A., Cuesta, M., Terhaar, J., Craanen, M., Schröck, E., Ylstra, B., & Meijer, G. (2008). Integration of DNA and expression microarray data unravels seven putative oncogenes on 20q amplicon involved in colorectal adenoma to carcinoma progression. Cellular Oncology, 30,

Carvalho, B., Postma, C., Mongera, S., Hopmans, E., Diskin, S., van de Wiel, M., Van Criekinge, W., Thas, O., Matthäi, A., Cuesta, M., Terhaar, J., Craanen, M., Schröck, E., Ylstra, B., & Meijer, G. (2009). Multiple putative oncogenes at the chromosome 20q amplicon contribute to colorectal adenoma to carcinoma progression. Gut, 58,

Table 3: Powers of permutation tests for location-shift alternatives with different parent distributions (n = m = 50, α = 0.05; powers approximated from 10,000 simulations, p-values based on 200,000 Monte Carlo runs).

Columns: θ, Test, Uniform, Normal, Cauchy, Chi, Expon., Logistic. The table has four blocks of rows, one per value of θ (the first is θ = 0.0); each block lists the tests WMW, Gastwirth, HFR, LT, MAX, SUM, APMW, Lepage, BWS and t test. Numerical entries are not reproduced here.

Table 4: Gene expression levels of the LDOC1L gene (expression values after RMA preprocessing).

Columns: adenoma, carcinoma. Numerical entries are not reproduced here.

Sample Size and Power Calculation in Microarray Studies Using the sizepower package.

Sample Size and Power Calculation in Microarray Studies Using the sizepower package. Sample Size and Power Calculation in Microarray Studies Using the sizepower package. Weiliang Qiu email: weiliang.qiu@gmail.com Mei-Ling Ting Lee email: meilinglee@sph.osu.edu George Alex Whitmore email:

More information

Introduction to Statistical Data Analysis III

Introduction to Statistical Data Analysis III Introduction to Statistical Data Analysis III JULY 2011 Afsaneh Yazdani Preface Major branches of Statistics: - Descriptive Statistics - Inferential Statistics Preface What is Inferential Statistics? The

More information

Comparison of Power between Adaptive Tests and Other Tests in the Field of Two Sample Scale Problem

Comparison of Power between Adaptive Tests and Other Tests in the Field of Two Sample Scale Problem Comparison of Power between Adaptive Tests and Other Tests in the Field of Two Sample Scale Problem Chikhla Jun Gogoi 1, Dr. Bipin Gogoi 2 1 Research Scholar, Department of Statistics, Dibrugarh University,

More information

Nonparametric Tests. Mathematics 47: Lecture 25. Dan Sloughter. Furman University. April 20, 2006

Nonparametric Tests. Mathematics 47: Lecture 25. Dan Sloughter. Furman University. April 20, 2006 Nonparametric Tests Mathematics 47: Lecture 25 Dan Sloughter Furman University April 20, 2006 Dan Sloughter (Furman University) Nonparametric Tests April 20, 2006 1 / 14 The sign test Suppose X 1, X 2,...,

More information

Unit 14: Nonparametric Statistical Methods

Unit 14: Nonparametric Statistical Methods Unit 14: Nonparametric Statistical Methods Statistics 571: Statistical Methods Ramón V. León 8/8/2003 Unit 14 - Stat 571 - Ramón V. León 1 Introductory Remarks Most methods studied so far have been based

More information

PSY 307 Statistics for the Behavioral Sciences. Chapter 20 Tests for Ranked Data, Choosing Statistical Tests

PSY 307 Statistics for the Behavioral Sciences. Chapter 20 Tests for Ranked Data, Choosing Statistical Tests PSY 307 Statistics for the Behavioral Sciences Chapter 20 Tests for Ranked Data, Choosing Statistical Tests What To Do with Non-normal Distributions Tranformations (pg 382): The shape of the distribution

More information

Statistics Applied to Bioinformatics. Tests of homogeneity

Statistics Applied to Bioinformatics. Tests of homogeneity Statistics Applied to Bioinformatics Tests of homogeneity Two-tailed test of homogeneity Two-tailed test H 0 :m = m Principle of the test Estimate the difference between m and m Compare this estimation

More information

Distribution-Free Procedures (Devore Chapter Fifteen)

Distribution-Free Procedures (Devore Chapter Fifteen) Distribution-Free Procedures (Devore Chapter Fifteen) MATH-5-01: Probability and Statistics II Spring 018 Contents 1 Nonparametric Hypothesis Tests 1 1.1 The Wilcoxon Rank Sum Test........... 1 1. Normal

More information

Dr. Maddah ENMG 617 EM Statistics 10/12/12. Nonparametric Statistics (Chapter 16, Hines)

Dr. Maddah ENMG 617 EM Statistics 10/12/12. Nonparametric Statistics (Chapter 16, Hines) Dr. Maddah ENMG 617 EM Statistics 10/12/12 Nonparametric Statistics (Chapter 16, Hines) Introduction Most of the hypothesis testing presented so far assumes normally distributed data. These approaches

More information

Summary of Chapters 7-9

Summary of Chapters 7-9 Summary of Chapters 7-9 Chapter 7. Interval Estimation 7.2. Confidence Intervals for Difference of Two Means Let X 1,, X n and Y 1, Y 2,, Y m be two independent random samples of sizes n and m from two

More information

Visual interpretation with normal approximation

Visual interpretation with normal approximation Visual interpretation with normal approximation H 0 is true: H 1 is true: p =0.06 25 33 Reject H 0 α =0.05 (Type I error rate) Fail to reject H 0 β =0.6468 (Type II error rate) 30 Accept H 1 Visual interpretation

More information

Chapter 18 Resampling and Nonparametric Approaches To Data

Chapter 18 Resampling and Nonparametric Approaches To Data Chapter 18 Resampling and Nonparametric Approaches To Data 18.1 Inferences in children s story summaries (McConaughy, 1980): a. Analysis using Wilcoxon s rank-sum test: Younger Children Older Children

More information

Resampling and the Bootstrap

Resampling and the Bootstrap Resampling and the Bootstrap Axel Benner Biostatistics, German Cancer Research Center INF 280, D-69120 Heidelberg benner@dkfz.de Resampling and the Bootstrap 2 Topics Estimation and Statistical Testing

More information

5 Introduction to the Theory of Order Statistics and Rank Statistics

5 Introduction to the Theory of Order Statistics and Rank Statistics 5 Introduction to the Theory of Order Statistics and Rank Statistics This section will contain a summary of important definitions and theorems that will be useful for understanding the theory of order

More information

THE ROYAL STATISTICAL SOCIETY HIGHER CERTIFICATE

THE ROYAL STATISTICAL SOCIETY HIGHER CERTIFICATE THE ROYAL STATISTICAL SOCIETY 004 EXAMINATIONS SOLUTIONS HIGHER CERTIFICATE PAPER II STATISTICAL METHODS The Society provides these solutions to assist candidates preparing for the examinations in future

More information

Probabilistic Index Models

Probabilistic Index Models Probabilistic Index Models Jan De Neve Department of Data Analysis Ghent University M3 Storrs, Conneticut, USA May 23, 2017 Jan.DeNeve@UGent.be 1 / 37 Introduction 2 / 37 Introduction to Probabilistic

More information

Chapter 3: Statistical methods for estimation and testing. Key reference: Statistical methods in bioinformatics by Ewens & Grant (2001).

Chapter 3: Statistical methods for estimation and testing. Key reference: Statistical methods in bioinformatics by Ewens & Grant (2001). Chapter 3: Statistical methods for estimation and testing Key reference: Statistical methods in bioinformatics by Ewens & Grant (2001). Chapter 3: Statistical methods for estimation and testing Key reference:

More information

Extending the Robust Means Modeling Framework. Alyssa Counsell, Phil Chalmers, Matt Sigal, Rob Cribbie

Extending the Robust Means Modeling Framework. Alyssa Counsell, Phil Chalmers, Matt Sigal, Rob Cribbie Extending the Robust Means Modeling Framework Alyssa Counsell, Phil Chalmers, Matt Sigal, Rob Cribbie One-way Independent Subjects Design Model: Y ij = µ + τ j + ε ij, j = 1,, J Y ij = score of the ith

More information

Contents 1. Contents

Contents 1. Contents Contents 1 Contents 1 One-Sample Methods 3 1.1 Parametric Methods.................... 4 1.1.1 One-sample Z-test (see Chapter 0.3.1)...... 4 1.1.2 One-sample t-test................. 6 1.1.3 Large sample

More information

Permutation Tests. Noa Haas Statistics M.Sc. Seminar, Spring 2017 Bootstrap and Resampling Methods

Permutation Tests. Noa Haas Statistics M.Sc. Seminar, Spring 2017 Bootstrap and Resampling Methods Permutation Tests Noa Haas Statistics M.Sc. Seminar, Spring 2017 Bootstrap and Resampling Methods The Two-Sample Problem We observe two independent random samples: F z = z 1, z 2,, z n independently of

More information

NONPARAMETRICS. Statistical Methods Based on Ranks E. L. LEHMANN HOLDEN-DAY, INC. McGRAW-HILL INTERNATIONAL BOOK COMPANY

NONPARAMETRICS. Statistical Methods Based on Ranks E. L. LEHMANN HOLDEN-DAY, INC. McGRAW-HILL INTERNATIONAL BOOK COMPANY NONPARAMETRICS Statistical Methods Based on Ranks E. L. LEHMANN University of California, Berkeley With the special assistance of H. J. M. D'ABRERA University of California, Berkeley HOLDEN-DAY, INC. San

More information

Formulas and Tables by Mario F. Triola

Formulas and Tables by Mario F. Triola Copyright 010 Pearson Education, Inc. Ch. 3: Descriptive Statistics x f # x x f Mean 1x - x s - 1 n 1 x - 1 x s 1n - 1 s B variance s Ch. 4: Probability Mean (frequency table) Standard deviation P1A or

More information

CHAPTER 8. Test Procedures is a rule, based on sample data, for deciding whether to reject H 0 and contains:

CHAPTER 8. Test Procedures is a rule, based on sample data, for deciding whether to reject H 0 and contains: CHAPTER 8 Test of Hypotheses Based on a Single Sample Hypothesis testing is the method that decide which of two contradictory claims about the parameter is correct. Here the parameters of interest are

More information

Chapter 15: Nonparametric Statistics Section 15.1: An Overview of Nonparametric Statistics

Chapter 15: Nonparametric Statistics Section 15.1: An Overview of Nonparametric Statistics Section 15.1: An Overview of Nonparametric Statistics Understand Difference between Parametric and Nonparametric Statistical Procedures Parametric statistical procedures inferential procedures that rely

More information

CH.9 Tests of Hypotheses for a Single Sample

CH.9 Tests of Hypotheses for a Single Sample CH.9 Tests of Hypotheses for a Single Sample Hypotheses testing Tests on the mean of a normal distributionvariance known Tests on the mean of a normal distributionvariance unknown Tests on the variance

More information

6 Single Sample Methods for a Location Parameter

6 Single Sample Methods for a Location Parameter 6 Single Sample Methods for a Location Parameter If there are serious departures from parametric test assumptions (e.g., normality or symmetry), nonparametric tests on a measure of central tendency (usually

More information

OHSU OGI Class ECE-580-DOE :Statistical Process Control and Design of Experiments Steve Brainerd Basic Statistics Sample size?

OHSU OGI Class ECE-580-DOE :Statistical Process Control and Design of Experiments Steve Brainerd Basic Statistics Sample size? ECE-580-DOE :Statistical Process Control and Design of Experiments Steve Basic Statistics Sample size? Sample size determination: text section 2-4-2 Page 41 section 3-7 Page 107 Website::http://www.stat.uiowa.edu/~rlenth/Power/

More information

Empirical Power of Four Statistical Tests in One Way Layout

Empirical Power of Four Statistical Tests in One Way Layout International Mathematical Forum, Vol. 9, 2014, no. 28, 1347-1356 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/10.12988/imf.2014.47128 Empirical Power of Four Statistical Tests in One Way Layout Lorenzo

More information

Hypothesis Testing. Hypothesis: conjecture, proposition or statement based on published literature, data, or a theory that may or may not be true

Hypothesis Testing. Hypothesis: conjecture, proposition or statement based on published literature, data, or a theory that may or may not be true Hypothesis esting Hypothesis: conjecture, proposition or statement based on published literature, data, or a theory that may or may not be true Statistical Hypothesis: conjecture about a population parameter

More information

Statistical Inference: Estimation and Confidence Intervals Hypothesis Testing

Statistical Inference: Estimation and Confidence Intervals Hypothesis Testing Statistical Inference: Estimation and Confidence Intervals Hypothesis Testing 1 In most statistics problems, we assume that the data have been generated from some unknown probability distribution. We desire

More information

Chapter 7 Comparison of two independent samples

Chapter 7 Comparison of two independent samples Chapter 7 Comparison of two independent samples 7.1 Introduction Population 1 µ σ 1 1 N 1 Sample 1 y s 1 1 n 1 Population µ σ N Sample y s n 1, : population means 1, : population standard deviations N

More information

Introduction to Nonparametric Statistics

Introduction to Nonparametric Statistics Introduction to Nonparametric Statistics by James Bernhard Spring 2012 Parameters Parametric method Nonparametric method µ[x 2 X 1 ] paired t-test Wilcoxon signed rank test µ[x 1 ], µ[x 2 ] 2-sample t-test

More information

Generalized nonparametric tests for one-sample location problem based on sub-samples

Generalized nonparametric tests for one-sample location problem based on sub-samples ProbStat Forum, Volume 5, October 212, Pages 112 123 ISSN 974-3235 ProbStat Forum is an e-journal. For details please visit www.probstat.org.in Generalized nonparametric tests for one-sample location problem

More information

GEOMETRIC -discrete A discrete random variable R counts number of times needed before an event occurs

GEOMETRIC -discrete A discrete random variable R counts number of times needed before an event occurs STATISTICS 4 Summary Notes. Geometric and Exponential Distributions GEOMETRIC -discrete A discrete random variable R counts number of times needed before an event occurs P(X = x) = ( p) x p x =,, 3,...

More information

Lecture 2: Basic Concepts and Simple Comparative Experiments Montgomery: Chapter 2

Lecture 2: Basic Concepts and Simple Comparative Experiments Montgomery: Chapter 2 Lecture 2: Basic Concepts and Simple Comparative Experiments Montgomery: Chapter 2 Fall, 2013 Page 1 Random Variable and Probability Distribution Discrete random variable Y : Finite possible values {y

More information

SPH 247 Statistical Analysis of Laboratory Data. April 28, 2015 SPH 247 Statistics for Laboratory Data 1

SPH 247 Statistical Analysis of Laboratory Data. April 28, 2015 SPH 247 Statistics for Laboratory Data 1 SPH 247 Statistical Analysis of Laboratory Data April 28, 2015 SPH 247 Statistics for Laboratory Data 1 Outline RNA-Seq for differential expression analysis Statistical methods for RNA-Seq: Structure and

More information

Statistics for Managers Using Microsoft Excel Chapter 9 Two Sample Tests With Numerical Data

Statistics for Managers Using Microsoft Excel Chapter 9 Two Sample Tests With Numerical Data Statistics for Managers Using Microsoft Excel Chapter 9 Two Sample Tests With Numerical Data 999 Prentice-Hall, Inc. Chap. 9 - Chapter Topics Comparing Two Independent Samples: Z Test for the Difference

More information

Adaptive Procedures for the Wilcoxon Mann Whitney Test: Seven Decades of Advances

Adaptive Procedures for the Wilcoxon Mann Whitney Test: Seven Decades of Advances Communications in Statistics - Theory and Methods ISSN: 0361-0926 (Print) 1532-415X (Online) Journal homepage: http://www.tandfonline.com/loi/lsta20 Adaptive Procedures for the Wilcoxon Mann Whitney Test:

More information

Application of Variance Homogeneity Tests Under Violation of Normality Assumption

Application of Variance Homogeneity Tests Under Violation of Normality Assumption Application of Variance Homogeneity Tests Under Violation of Normality Assumption Alisa A. Gorbunova, Boris Yu. Lemeshko Novosibirsk State Technical University Novosibirsk, Russia e-mail: gorbunova.alisa@gmail.com

More information

One-Sample Numerical Data

One-Sample Numerical Data One-Sample Numerical Data quantiles, boxplot, histogram, bootstrap confidence intervals, goodness-of-fit tests University of California, San Diego Instructor: Ery Arias-Castro http://math.ucsd.edu/~eariasca/teaching.html

More information

CIVL /8904 T R A F F I C F L O W T H E O R Y L E C T U R E - 8

CIVL /8904 T R A F F I C F L O W T H E O R Y L E C T U R E - 8 CIVL - 7904/8904 T R A F F I C F L O W T H E O R Y L E C T U R E - 8 Chi-square Test How to determine the interval from a continuous distribution I = Range 1 + 3.322(logN) I-> Range of the class interval

More information

Sample Size Estimation for Studies of High-Dimensional Data

Sample Size Estimation for Studies of High-Dimensional Data Sample Size Estimation for Studies of High-Dimensional Data James J. Chen, Ph.D. National Center for Toxicological Research Food and Drug Administration June 3, 2009 China Medical University Taichung,

More information

9-6. Testing the difference between proportions /20

9-6. Testing the difference between proportions /20 9-6 Testing the difference between proportions 1 Homework Discussion Question p514 Ex 9-6 p514 2, 3, 4, 7, 9, 11 (use both the critical value and p-value for all problems. 2 Objective Perform hypothesis

More information

A Regression Framework for Rank Tests Based on the Probabilistic Index Model

A Regression Framework for Rank Tests Based on the Probabilistic Index Model A Regression Framework for Rank Tests Based on the Probabilistic Index Model Jan De Neve and Olivier Thas We demonstrate how many classical rank tests, such as the Wilcoxon Mann Whitney, Kruskal Wallis

More information

Session 3 The proportional odds model and the Mann-Whitney test

Session 3 The proportional odds model and the Mann-Whitney test Session 3 The proportional odds model and the Mann-Whitney test 3.1 A unified approach to inference 3.2 Analysis via dichotomisation 3.3 Proportional odds 3.4 Relationship with the Mann-Whitney test Session

More information

Data are sometimes not compatible with the assumptions of parametric statistical tests (i.e. t-test, regression, ANOVA)

Data are sometimes not compatible with the assumptions of parametric statistical tests (i.e. t-test, regression, ANOVA) BSTT523 Pagano & Gauvreau Chapter 13 1 Nonparametric Statistics Data are sometimes not compatible with the assumptions of parametric statistical tests (i.e. t-test, regression, ANOVA) In particular, data

More information

ON SMALL SAMPLE PROPERTIES OF PERMUTATION TESTS: INDEPENDENCE BETWEEN TWO SAMPLES

ON SMALL SAMPLE PROPERTIES OF PERMUTATION TESTS: INDEPENDENCE BETWEEN TWO SAMPLES ON SMALL SAMPLE PROPERTIES OF PERMUTATION TESTS: INDEPENDENCE BETWEEN TWO SAMPLES Hisashi Tanizaki Graduate School of Economics, Kobe University, Kobe 657-8501, Japan e-mail: tanizaki@kobe-u.ac.jp Abstract:

More information

Research Article A Nonparametric Two-Sample Wald Test of Equality of Variances

Research Article A Nonparametric Two-Sample Wald Test of Equality of Variances Advances in Decision Sciences Volume 211, Article ID 74858, 8 pages doi:1.1155/211/74858 Research Article A Nonparametric Two-Sample Wald Test of Equality of Variances David Allingham 1 andj.c.w.rayner

More information

University of California, Berkeley

University of California, Berkeley University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 2004 Paper 147 Multiple Testing Methods For ChIP-Chip High Density Oligonucleotide Array Data Sunduz

More information

Selection should be based on the desired biological interpretation!

Selection should be based on the desired biological interpretation! Statistical tools to compare levels of parasitism Jen_ Reiczigel,, Lajos Rózsa Hungary What to compare? The prevalence? The mean intensity? The median intensity? Or something else? And which statistical

More information

Module 9: Nonparametric Statistics Statistics (OA3102)

Module 9: Nonparametric Statistics Statistics (OA3102) Module 9: Nonparametric Statistics Statistics (OA3102) Professor Ron Fricker Naval Postgraduate School Monterey, California Reading assignment: WM&S chapter 15.1-15.6 Revision: 3-12 1 Goals for this Lecture

More information

Introduction and Descriptive Statistics p. 1 Introduction to Statistics p. 3 Statistics, Science, and Observations p. 5 Populations and Samples p.

Introduction and Descriptive Statistics p. 1 Introduction to Statistics p. 3 Statistics, Science, and Observations p. 5 Populations and Samples p. Preface p. xi Introduction and Descriptive Statistics p. 1 Introduction to Statistics p. 3 Statistics, Science, and Observations p. 5 Populations and Samples p. 6 The Scientific Method and the Design of

More information

STAT 135 Lab 8 Hypothesis Testing Review, Mann-Whitney Test by Normal Approximation, and Wilcoxon Signed Rank Test.

STAT 135 Lab 8 Hypothesis Testing Review, Mann-Whitney Test by Normal Approximation, and Wilcoxon Signed Rank Test. STAT 135 Lab 8 Hypothesis Testing Review, Mann-Whitney Test by Normal Approximation, and Wilcoxon Signed Rank Test. Rebecca Barter March 30, 2015 Mann-Whitney Test Mann-Whitney Test Recall that the Mann-Whitney

More information

The Chi-Square Distributions

The Chi-Square Distributions MATH 183 The Chi-Square Distributions Dr. Neal, WKU The chi-square distributions can be used in statistics to analyze the standard deviation σ of a normally distributed measurement and to test the goodness

More information

Version 1: Equality of Distributions. 3. F (x) and G(x) represent the distribution functions corresponding to the Xs and Y s, respectively.

Version 1: Equality of Distributions. 3. F (x) and G(x) represent the distribution functions corresponding to the Xs and Y s, respectively. 4 Two-Sample Methods 4.1 The (Mann-Whitney) Wilcoxon Rank Sum Test Version 1: Equality of Distributions Assumptions: Given two independent random samples X 1, X 2,..., X n and Y 1, Y 2,..., Y m : 1. The

More information

Statistical testing. Samantha Kleinberg. October 20, 2009

Statistical testing. Samantha Kleinberg. October 20, 2009 October 20, 2009 Intro to significance testing Significance testing and bioinformatics Gene expression: Frequently have microarray data for some group of subjects with/without the disease. Want to find

More information

A nonparametric two-sample wald test of equality of variances

A nonparametric two-sample wald test of equality of variances University of Wollongong Research Online Faculty of Informatics - Papers (Archive) Faculty of Engineering and Information Sciences 211 A nonparametric two-sample wald test of equality of variances David

More information

APPENDICES APPENDIX A. STATISTICAL TABLES AND CHARTS 651 APPENDIX B. BIBLIOGRAPHY 677 APPENDIX C. ANSWERS TO SELECTED EXERCISES 679

APPENDICES APPENDIX A. STATISTICAL TABLES AND CHARTS 651 APPENDIX B. BIBLIOGRAPHY 677 APPENDIX C. ANSWERS TO SELECTED EXERCISES 679 APPENDICES APPENDIX A. STATISTICAL TABLES AND CHARTS 1 Table I Summary of Common Probability Distributions 2 Table II Cumulative Standard Normal Distribution Table III Percentage Points, 2 of the Chi-Squared

More information

The Difference in Proportions Test

The Difference in Proportions Test Overview The Difference in Proportions Test Dr Tom Ilvento Department of Food and Resource Economics A Difference of Proportions test is based on large sample only Same strategy as for the mean We calculate

More information

MA 575 Linear Models: Cedric E. Ginestet, Boston University Non-parametric Inference, Polynomial Regression Week 9, Lecture 2

1 Bootstrapped Bias and CIs Given a multiple regression model with mean and

INTERVAL ESTIMATION AND HYPOTHESES TESTING

1. IDEA An interval rather than a point estimate is often of interest. Confidence intervals are thus important in empirical work. To construct interval estimates,

Non-parametric tests, part A:

Two types of statistical test: Parametric tests: Based on assumption that the data have certain characteristics or "parameters": Results are only valid if (a) the data are

One-shot Learning of Poisson Distributions Information Theory of Audic-Claverie Statistic for Analyzing cDNA Arrays

Peter Tiňo School of Computer Science University of Birmingham, UK

Statistical Procedures for Testing Homogeneity of Water Quality Parameters

Xu-Feng Niu Professor of Statistics Department of Statistics Florida State University Tallahassee, FL 32306 May-September 2004 1. Nonparametric

Review. DS GA 1002 Statistical and Mathematical Models. Carlos Fernandez-Granda

http://www.cims.nyu.edu/~cfgranda/pages/dsga1002_fall16 Probability and statistics Probability: Framework for dealing with

Drawing Inferences from Statistics Based on Multiyear Asset Returns

Matthew Richardson, James H. Stock, JFE 1989. 1 Motivation: Fama and French (1988), Poterba and Summers (1988) document significant negative correlations

Central Limit Theorem ( 5.3)

Let $X_1, X_2, \ldots$ be a sequence of independent random variables, each having mean $\mu$ and variance $\sigma^2$. Then the distribution of the partial sum $S_n = \sum_{i=1}^{n} X_i$ becomes approximately

Using Tables and Graphing Calculators in Math 11

Graphing calculators are not required for Math 11, but they are likely to be helpful, primarily because they allow you to avoid the use of tables in some

Distribution Fitting (Censored Data)

Summary, Data Input, Analysis Summary, Analysis Options, Goodness-of-Fit Tests, Frequency Histogram, Comparison of Alternative Distributions...

Resampling and the Bootstrap

Axel Benner Biostatistics, German Cancer Research Center INF 280, D-69120 Heidelberg benner@dkfz.de Topics Estimation and Statistical Testing

A Review of the Behrens-Fisher Problem and Some of Its Analogs: Does the Same Size Fit All?

Authors: Sudhir Paul Department of Mathematics and Statistics, University of Windsor, Ontario, Canada (smjp@uwindsor.ca)

Biochip informatics-(i)

Biochip informatics-(i): biochip normalization & differential expression Ju Han Kim, M.D., Ph.D. SNUBI: SNUBiomedical Informatics http://www.snubi snubi.org/ Biochip basics Preprocessing

Non-parametric Hypothesis Testing

Procedures Hypothesis Testing General Procedure for Hypothesis Tests 1. Identify the parameter of interest. 2. Formulate the null hypothesis, $H_0$. 3. Specify an appropriate

Frequency table: Var2 (Spreadsheet1) Count Cumulative Percent Cumulative From To. Percent <x<=

A frequency distribution is a kind of probability distribution. It gives the frequency or relative frequency at which given values have been observed among the data collected. For example, for age, Frequency

SHOTA KATAYAMA AND YUTAKA KANO. Graduate School of Engineering Science, Osaka University, 1-3 Machikaneyama, Toyonaka, Osaka , Japan

A New Test on High-Dimensional Mean Vector Without Any Assumption on Population Covariance Matrix

What to do today (Nov 22, 2018)?

Part 1. Introduction and Review (Chp 1-5) Part 2. Basic Statistical Inference (Chp 6-9) Part 3. Important Topics in Statistics (Chp 10-13) Part 4. Further Topics (Selected

Dover-Sherborn High School Mathematics Curriculum Probability and Statistics

A. DESCRIPTION This is a full-year course designed to introduce students to the basic elements of statistics and probability. Emphasis is placed on understanding terminology and

A3. Statistical Inference Hypothesis Testing for General Population Parameters

POPULATION $H_0: \theta = \theta_0$; $\theta$ is a generic parameter of interest (e.g.,

Non-specific filtering and control of false positives

Richard Bourgon 16 June 2009 bourgon@ebi.ac.uk EBI is an outstation of the European Molecular Biology Laboratory Outline Multiple testing I: overview

f (1 0.5)/n Z =

Math 466/566 - Homework 4. We want to test a hypothesis involving a population proportion. The unknown population proportion is p. The null hypothesis is p = 1/2 and the alternative hypothesis is p > 1/2.

Epidemiology Principles of Biostatistics Chapter 10 - Inferences about two populations. John Koval

Epidemiology 9509 Principles of Biostatistics Chapter 10 - Inferences about John Koval Department of Epidemiology and Biostatistics University of Western Ontario What is being covered 1. differences in

Non-parametric Inference and Resampling

Exercises by David Wozabal (Last update. June 2010) 1 Basic Facts about Rank and Order Statistics 1.1 10 students were asked about the amount of time they spend surfing

This is particularly true if you see long tails in your data. What are you testing? That the two distributions are the same!

Two sample tests (part II): What to do if your data are not distributed normally: Option 1: if your sample size is large enough, don't worry - go ahead and use a t-test (the CLT will take care of non-normal

Finansiell Statistik, GN, 15 hp, VT2008 Lecture 10-11: Statistical Inference: Hypothesis Testing

Gebrenegus Ghilagaber, PhD, Associate Professor April 1, 2008 1 Statistical Inferences: Introduction Recall:

Bivariate Paired Numerical Data

Pearson's correlation, Spearman's ρ and Kendall's τ, tests of independence University of California, San Diego Instructor: Ery Arias-Castro http://math.ucsd.edu/~eariasca/teaching.html

Section 3: Permutation Inference

Yotam Shem-Tov Fall 2015 STAT 239/ PS 236A September 26, 2015 Introduction Throughout these slides we will focus only on randomized experiments, i.e

Information Retrieval

Online Evaluation Ilya Markov i.markov@uva.nl University of Amsterdam Course overview Offline Data Acquisition Data Processing

DESAIN EKSPERIMEN Analysis of Variances (ANOVA) Semester Genap 2017/2018 Jurusan Teknik Industri Universitas Brawijaya

Outline Introduction The Analysis of Variance Models for the Data Post-ANOVA Comparison of Means Sample

Nonparametric Location Tests: k-sample

Nathaniel E. Helwig Assistant Professor of Psychology and Statistics University of Minnesota (Twin Cities) Updated 04-Jan-2017

Relating Graph to Matlab

There are two related course documents on the web. Probability and Statistics Review - should be read by people without statistics background and it is helpful as a review for those with prior statistics

TMA4255 Applied Statistics V2016 (23)

Part 7: Nonparametric tests Signed-Rank test [16.2] Wilcoxon Rank-sum test [16.3] Anna Marie Holand April 19, 2016, wiki.math.ntnu.no/tma4255/2016v/start Outline

Analysis of 2x2 Cross-Over Designs using T-Tests

Chapter 234 Introduction This procedure analyzes data from a two-treatment, two-period (2x2) cross-over design. The response is assumed to be a continuous

ST4241 Design and Analysis of Clinical Trials Lecture 7: Non-parametric tests for PDG data

Department of Statistics & Applied Probability 8:00-10:00 am, Friday, September 2, 2016 Outline Non-parametric

APPENDIX B Sample-Size Calculation Methods: Classical Design

One/Paired-Sample Hypothesis Test for the Mean; Sign test for median difference for a paired sample; Wilcoxon signed-rank test for one or

Math 475. Jimin Ding. August 29, 2013. Department of Mathematics, Washington University in St. Louis

www.math.wustl.edu/ jmding/math475/index.html

Classroom Activity 7 Math 113 Name : 10 pts Intro to Applied Stats

Materials Needed: Bags of popcorn, watch with second hand or microwave with digital timer. Instructions: Follow the instructions on the

Lecture 1: Case-Control Association Testing. Summer Institute in Statistical Genetics 2015

Timothy Thornton and Michael Wu Introduction Association mapping is now routinely being used to identify loci that are involved with complex traits.

GROUPED DATA E.G. FOR SAMPLE OF RAW DATA (E.G. 4, 12, 7, 5, MEAN G x / n STANDARD DEVIATION MEDIAN AND QUARTILES STANDARD DEVIATION

FOR SAMPLE OF RAW DATA (E.G. 4, 12, 7, 5, 11, 6, 9, 7, 11, 5, 4, 7) BE ABLE TO COMPUTE MEAN Σx / n, STANDARD DEVIATION, MEDIAN AND QUARTILES. GROUPED DATA E.G. AGE FREQ. 0-9 53 10-19 4...... 80-89

TABLES AND FORMULAS FOR MOORE Basic Practice of Statistics

Exploring Data: Distributions Look for overall pattern (shape, center, spread) and deviations (outliers). Mean (use a calculator): x = x 1 + x

A comparison study of the nonparametric tests based on the empirical distributions

통계연구 (2015), Vol. 20, No. 3, 1-12. Hyo-Il Park. Abstract: In this study, we propose a nonparametric test based on the empirical