Permutation Tests. Noa Haas Statistics M.Sc. Seminar, Spring 2017 Bootstrap and Resampling Methods

Similar documents
Hypothesis Testing with the Bootstrap. Noa Haas Statistics M.Sc. Seminar, Spring 2017 Bootstrap and Resampling Methods

Political Science 236 Hypothesis Testing: Review and Bootstrapping

Bootstrapping, Permutations, and Monte Carlo Testing

Introduction to Nonparametric Statistics

MA 575 Linear Models: Cedric E. Ginestet, Boston University Non-parametric Inference, Polynomial Regression Week 9, Lecture 2

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review

STAT 135 Lab 5 Bootstrapping and Hypothesis Testing

Lecture 30. DATA 8 Summer Regression Inference

7.2 One-Sample Correlation ( = a) Introduction. Correlation analysis measures the strength and direction of association between

Introduction to Statistical Data Analysis III

Lecture 7: Hypothesis Testing and ANOVA

Performance Evaluation and Comparison

Resampling and the Bootstrap

STAT Section 2.1: Basic Inference. Basic Definitions

Introductory Econometrics

Analysis of Variance. Read Chapter 14 and Sections to review one-way ANOVA.

Introduction to hypothesis testing

Data analysis and Geostatistics - lecture VII

Statistical Inference: Estimation and Confidence Intervals Hypothesis Testing

Statistics - Lecture One. Outline. Charlotte Wickham 1. Basic ideas about estimation

Resampling and the Bootstrap

Bootstrap tests. Patrick Breheny. October 11. Bootstrap vs. permutation tests Testing for equality of location

Chapter 7 Comparison of two independent samples

Resampling Methods. Lukas Meier

Solutions to Practice Test 2 Math 4753 Summer 2005

Empirical Power of Four Statistical Tests in One Way Layout

6 Single Sample Methods for a Location Parameter

Fin285a:Computer Simulations and Risk Assessment Section 2.3.2:Hypothesis testing, and Confidence Intervals

STAT Chapter 9: Two-Sample Problems. Paired Differences (Section 9.3)

Math Review Sheet, Fall 2008

Power Calculations for Preclinical Studies Using a K-Sample Rank Test and the Lehmann Alternative Hypothesis

Dover- Sherborn High School Mathematics Curriculum Probability and Statistics

The Nonparametric Bootstrap

Soc3811 Second Midterm Exam

Answer keys for Assignment 10: Measurement of study variables (The correct answer is underlined in bold text)

1 Overview. Coefficients of. Correlation, Alienation and Determination. Hervé Abdi Lynne J. Williams

The bootstrap. Patrick Breheny. December 6. The empirical distribution function The bootstrap

Nonparametric tests. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 704: Data Analysis I

Statistical inference (estimation, hypothesis tests, confidence intervals) Oct 2018

Permutation tests. Patrick Breheny. September 25. Conditioning Nonparametric null hypotheses Permutation testing

9/2/2010. Wildlife Management is a very quantitative field of study. throughout this course and throughout your career.

z and t tests for the mean of a normal distribution Confidence intervals for the mean Binomial tests

STAT440/840: Statistical Computing

(4) One-parameter models - Beta/binomial. ST440/550: Applied Bayesian Statistics

Lecture Slides for INTRODUCTION TO. Machine Learning. ETHEM ALPAYDIN The MIT Press,

Chapter 1 Statistical Inference

COSC 341 Human Computer Interaction. Dr. Bowen Hui University of British Columbia Okanagan

M(t) = 1 t. (1 t), 6 M (0) = 20 P (95. X i 110) i=1

Gov 2002: 3. Randomization Inference

Chapter 12 - Lecture 2 Inferences about regression coefficient

SPRING 2007 EXAM C SOLUTIONS

First we look at some terms to be used in this section.

Contrasts and Multiple Comparisons Supplement for Pages

Do students sleep the recommended 8 hours a night on average?

10.0 REPLICATED FULL FACTORIAL DESIGN

Lecture 8 Hypothesis Testing

Bootstrapping, Randomization, 2B-PLS

Module 9: Nonparametric Statistics Statistics (OA3102)

Conditioning Nonparametric null hypotheses Permutation testing. Permutation tests. Patrick Breheny. October 5. STA 621: Nonparametric Statistics

IEOR E4703: Monte-Carlo Simulation

the logic of parametric tests

Chi-squared (χ 2 ) (1.10.5) and F-tests (9.5.2) for the variance of a normal distribution ( )

Frequency table: Var2 (Spreadsheet1) Count Cumulative Percent Cumulative From To. Percent <x<=

1) Answer the following questions with one or two short sentences.

Keppel, G. & Wickens, T.D. Design and Analysis Chapter 2: Sources of Variability and Sums of Squares

Hypothesis Testing. File: /General/MLAB-Text/Papers/hyptest.tex

Robustness and Distribution Assumptions

Bootstrap. ADA1 November 27, / 38

Chapter 3: Statistical methods for estimation and testing. Key reference: Statistical methods in bioinformatics by Ewens & Grant (2001).

Testing Research and Statistical Hypotheses

Null Hypothesis Significance Testing p-values, significance level, power, t-tests

This does not cover everything on the final. Look at the posted practice problems for other topics.

Soc 3811 Basic Social Statistics Second Midterm Exam Spring Your Name [50 points]: ID #: ANSWERS

Chapter 2: Resampling Maarten Jansen

Introduction to Statistics with GraphPad Prism 7

Lecture 2: Basic Concepts and Simple Comparative Experiments Montgomery: Chapter 2

Comparison of Two Population Means

EXAM # 2. Total 100. Please show all work! Problem Points Grade. STAT 301, Spring 2013 Name

Hypothesis Testing: One Sample

The t-statistic. Student s t Test

LECTURE 12 CONFIDENCE INTERVAL AND HYPOTHESIS TESTING

Statistical Data Analysis Stat 3: p-values, parameter estimation

Last two weeks: Sample, population and sampling distributions finished with estimation & confidence intervals

Design of Engineering Experiments Part 2 Basic Statistical Concepts Simple comparative experiments

χ test statistics of 2.5? χ we see that: χ indicate agreement between the two sets of frequencies.

Sample Size Determination

Lecture 26. December 19, Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University.

4 Hypothesis testing. 4.1 Types of hypothesis and types of error 4 HYPOTHESIS TESTING 49


W&M CSCI 688: Design of Experiments Homework 2. Megan Rose Bryant

Unit 14: Nonparametric Statistical Methods

Table 1: Fish Biomass data set on 26 streams

One-Sample Numerical Data

Sequential Implementation of Monte Carlo Tests with Uniformly Bounded Resampling Risk

Ch. 7. One sample hypothesis tests for µ and σ

8: Hypothesis Testing

THE ROYAL STATISTICAL SOCIETY 2015 EXAMINATIONS SOLUTIONS HIGHER CERTIFICATE MODULE 3

Business Statistics MEDIAN: NON- PARAMETRIC TESTS

Introduction to Statistics

Practice Problems Section Problems

Transcription:

Permutation Tests Noa Haas Statistics M.Sc. Seminar, Spring 2017 Bootstrap and Resampling Methods

The Two-Sample Problem We observe two independent random samples: F z = z 1, z 2,, z n independently of G y = y 1, y 2,, y m And we wish to test the null hypothesis of no difference between F and G, H 0 : F = G

The Two-Sample Problem An hypothesis test, of which a permutation test is an example, is a formal way of deciding whether or not the data decisively reject H 0 An hypothesis test begins with a test statistic θ. For example, the mean difference - θ = z y We will assume here that if H 0 is not true, we expect to observe large values of θ than if H 0 is true the larger the value of θ we observe, the stronger the evidence against H 0 The achieved significance level is defined as: ASL = Prob H0 θ θ Where the random variable θ has the distribution of θ if H 0 is true The hypothesis test of H 0 consist of computing ASL, and seeing if it is smaller than some predetermined threshold (.05, for example)

The Two-Sample Problem A traditional hypothesis test for the problem: F = N μ T, σ 2, G = N μ C, σ 2 Then the null hypothesis is H 0 : μ T = μ C And we know that under H 0 θ = z y ~N 0, σ 2 1 n + 1 m So we can compute ASL: ASL = 1 Φ σ θ 1 n + 1 m

The Two-Sample Problem How do we calculate the ASL when the null hypothesis H 0 doesn t specifies a single null distribution? In most problems the null hypothesis F = G leaves us with a family of possible null hypothesis distributions, rather than just one. For instance, when θ~n 0, σ 2 1\n + 1\m, the null hypothesis family includes all normal distributions with expectation 0 (since we don t know the true value of σ). In the normal situation, we could approximate the ASL using Student s t distribution. But this only works in the normal case.

Permutation Tests A permutation test (also called a randomization test, or an exact test) is a type of statistical significance test in which the distribution of the test statistic under the null hypothesis is obtained by calculating all possible values of the test statistic under rearrangements of the labels on the observed data points. The idea was introduced by R.A. Fisher in the 1930 s, more as a theoretical argument supporting Student s t- test than as a useful statistical method in its own right. The basic idea is attractively simple and free of mathematical assumptions.

Fisher s Permutation Test We define N = n + m, and let v = v 1, v 2,, v N be the combined and ordered vector of the observed values z, y. Also let g = g 1, g 2,, g N be the vector that indicates the observations original labels. There are N n = N! n!m! possible g vectors. Permutation Lemma: Under H 0 : F = G, the vector g has probability 1/ N n of equaling any one of its possible values Let g indicate any one of the θ = θ g N n The permutation ASL is defined as: possible vectors of g, and ASL perm = Prob perm θ θ = # θ θ / N n

Approximating ASL perm 1. Choose B independent vectors g 1, g 2,, g (B), each being randomly selected without replacement from the set N of all possible such vectors n 2. Evaluate the permutation replications of θ corresponding to each permutation vector, θ b, b = 1,2,, B 3. Approximate ASL perm by ASL perm = # θ b θ /B How should we choose B?

The Mouse Data Sixteen mice were randomly assigned to a treatment group or a control group. Shown are their survival times, in days, following a test surgery. Group Treatment: Control: 94 38 23 52 10 40 Data 197 99 104 51 27 16 141 146 30 46 Mean 86.86 56.22 Estimated Standard Error 25.24 Here, θ = z y = 30.63, and the pooled variance estimator is σ = n z i z 2 m i=1 + y j y 2 j=1 n+m 2 1/2 = 54.21 So the ASL according to Student s t-test is: ASL = Prob t 14 > 30.63 54.21 1 9 + 1 7 =.141 14.14

The Mouse Data Based on B = 1000 permutations, the estimated ASL perm is ASL perm = 131 1000 =.131 Although there are no normality assumptions underlining ASL perm, the result is close to the t-test s. Fisher s main point in introducing permutation tests was to support the use of Student s test in non-normal applications.

Number of Replications Let A = ASL perm and A = ASL perm, then B A is the number of time that the θ b s values exceeded the observed θ, and so has a binomial distribution: B A~Bin B, A A 1 A E A = A; Var A = B The coefficient of variation of A is CV B A = (1 A)/A Controlling CV B A means controlling the affect of the Monte Carlo error on our estimation of ASL perm B 1/2

Number of Replications (1 A)/A 1/2 as a function of A: A:.5.25.1.05.025 (1 A)/A 1/2 : 1 1.73 3.00 4.36 6.24 Number of permutations required to make CV ASL function of the true ASL perm : = k, as a ASL perm :.5.25.1.05.025 B for k =.1: 100 299 900 1901 3894 B for k =.2: 25 75 225 476 974

Permutation Test s Accuracy In general, if H 0 : F = G is true, then Prob H0 ASL perm < α = α For any value of α between 0 and 1, except for small discrepancies caused by discreteness of the permutation distribution. This applies to any statistic θ (median, trimmed means ) Accuracy means that ASL perm won t tend to be misleadingly small when H 0 is true (maintaining type I error). But what about the power? Choosing a poor test statistic θ will result in small probability of rejecting H 0 when it is false low power

Other Test Statistics We can look at the ratio of estimated variances: θ = log σ 2 z /σ 2 y, where, in such case we have no a priori reason to for believing θ will be greater than one rather than less than one. In this situation, it is common to compute two-sided ASL: ASL perm two sided = # θ b > θ /B For example, in the mouse data, after 1000 permutation replications of θ = log σ z 2 /σ y 2 (log transformation has no effect on the permutation results), we get ASL perm two sided =.305

Other Test Statistics We run 4 different permutation tests, according to 4 different test statistics θ 1 = z y; θ 2 = z.15 y.15 ; θ 3 = z.25 y.25 ; θ 4 = z.5 y.5, each using same 1000 replications. Then we rank the evidence against H 0 according to φ = min ASL k, k = 1,2,3,4 k Small values of φ more strongly contradict H 0, but it isn t true that the ASL perm based on φ equals to min ASL k k Instead, for each k and b = 1,, B we calculate A k b = 1 B B i=1 I θk i θk b Then let φ b = min A k b k Which are genuine permutation replications of φ (not obvious, but true..), and therefor ASL perm = # φ b φ /B

Other Test Statistics In the mouse data, we get φ = min k ASL k = ASL 1 =.131 And 170 of the 1000 values of φ b are less than.131, giving ASL perm =.17

Relationship to the Bootstrap Permutation methods tend to apply to only a narrow range of problems. However when they apply, as in testing F = G in a two-sample problem, they give gratifyingly exact answers without parametric assumptions. The bootstrap distribution was originally called the combination distribution. It was designed to extend the virtues of permutation testing to the great majority of statistical problems where there is nothing to permute.

Relationship of Hypothesis Tests to CI and the Bootstrap We can use confidence intervals to calculate ASLs. For H 0 : F = G; θ = z y If we choose α so that θ lo, the lower end of the 1 2α CI for θ, exactly equals 0. Then Prob θ=0 θ θ = α In other words, values of θ that are smaller than θ lo are implausible according to our data. So instead we can ask ourselves what value of α will make the lower end of the bootstrap confidence interval equal to zero? For the percentile method, the answer is α 0 = # θ b < 0 B = ASL % * Note that these are θ b based on the bootstrap non-parametric sampling procedure for constructing CI s and not for estimating F 0

Relationship of Hypothesis Tests to CI and the Bootstrap The permutation ASL is exact, while the bootstrap ASL is approximated. The bootstrap histogram is centered near θ, while the permutation histograms are centered near 0. The bootstrap ASL tests the null hypothesis θ = 0, while the permutation ASL test F = G. The first is more realistic/practical. Still, permutation test usually perform reasonably well even when F = G is far from reasonable. The combination of a point estimate and a confidence interval is usually more informative than just a hypothesis test by itself.