Hypothesis Testing with the Bootstrap
Noa Haas, Statistics M.Sc. Seminar, Spring 2017, Bootstrap and Resampling Methods

Bootstrap Hypothesis Testing
A bootstrap hypothesis test starts with a test statistic $t(\mathbf{x})$ (not necessarily an estimate of a parameter). We seek an achieved significance level
$ASL = \text{Prob}_{H_0}\{t(\mathbf{x}^*) \ge t(\mathbf{x})\}$
where the random variable $\mathbf{x}^*$ has the distribution specified by the null hypothesis $H_0$, denoted $F_0$. Bootstrap hypothesis testing uses a plug-in style estimate of $F_0$.

The Two-Sample Problem
We observe two independent random samples:
$\mathbf{z} = (z_1, z_2, \ldots, z_n) \sim F$ independently of $\mathbf{y} = (y_1, y_2, \ldots, y_m) \sim G$
and we wish to test the null hypothesis of no difference between $F$ and $G$:
$H_0: F = G$

Bootstrap Hypothesis Testing of $F = G$
Denote the combined sample by $\mathbf{x}$ and its empirical distribution by $\hat F_0$. Under $H_0$, $\hat F_0$ provides a nonparametric estimate of the common population that gave rise to both $\mathbf{z}$ and $\mathbf{y}$.
1. Draw $B$ samples of size $n + m$ with replacement from $\mathbf{x}$. Call the first $n$ observations $\mathbf{z}^*$ and the remaining $m$ observations $\mathbf{y}^*$.
2. Evaluate $t(\cdot)$ on each sample: $t(\mathbf{x}^{*b})$.
3. Approximate $ASL_{boot}$ by $\widehat{ASL}_{boot} = \#\{t(\mathbf{x}^{*b}) \ge t(\mathbf{x})\}/B$.
* In the case that large values of $t(\mathbf{x}^{*b})$ are evidence against $H_0$.
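The three steps above can be sketched in a few lines of NumPy. This is an illustrative sketch, not code from the seminar: the function name, the default statistic $\bar z - \bar y$, and the default $B = 1000$ are my choices.

```python
import numpy as np

def bootstrap_test_FG(z, y, t=None, B=1000, rng=None):
    """Bootstrap test of H0: F = G by resampling from the pooled sample.

    Illustrative sketch; uses the convention that large values of t
    are evidence against H0.
    """
    rng = np.random.default_rng(rng)
    if t is None:
        t = lambda z, y: np.mean(z) - np.mean(y)
    z, y = np.asarray(z, float), np.asarray(y, float)
    n, m = len(z), len(y)
    x = np.concatenate([z, y])           # combined sample; its EDF is F0-hat
    t_obs = t(z, y)
    t_star = np.empty(B)
    for b in range(B):
        xb = rng.choice(x, size=n + m, replace=True)
        t_star[b] = t(xb[:n], xb[n:])    # first n -> z*, remaining m -> y*
    return np.mean(t_star >= t_obs)      # ASL_boot
```

With two clearly shifted samples the estimated ASL is near zero; under $H_0$ it behaves like an ordinary p-value.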

Bootstrap Hypothesis Testing of $F = G$ on the Mouse Data
A histogram of bootstrap replications of $t(\mathbf{x}) = \bar z - \bar y$ for testing $H_0: F = G$ on the mouse data. The proportion of values greater than 30.63 is .121. Calculating the studentized statistic
$t(\mathbf{x}) = \dfrac{\bar z - \bar y}{\bar\sigma\sqrt{1/n + 1/m}}$
(approximately pivotal) for the same replications produced $\widehat{ASL}_{boot} = .128$.

Testing Equality of Means
Instead of testing $H_0: F = G$, we wish to test $H_0: \mu_z = \mu_y$ without assuming equal variances. We need estimates of $F$ and $G$ that use only the assumption of a common mean.
1. Define points $\tilde z_i = z_i - \bar z + \bar x$, $i = 1, \ldots, n$, and $\tilde y_i = y_i - \bar y + \bar x$, $i = 1, \ldots, m$. The empirical distributions of $\tilde{\mathbf{z}}$ and $\tilde{\mathbf{y}}$ share a common mean.
2. Draw $B$ bootstrap samples $(\mathbf{z}^*, \mathbf{y}^*)$ with replacement from $(\tilde z_1, \ldots, \tilde z_n)$ and $(\tilde y_1, \ldots, \tilde y_m)$ respectively.
3. Evaluate $t(\cdot)$ on each sample: $t(\mathbf{x}^{*b}) = \dfrac{\bar z^* - \bar y^*}{\sqrt{\bar\sigma_z^{*2}/n + \bar\sigma_y^{*2}/m}}$.
4. Approximate $ASL_{boot}$ by $\widehat{ASL}_{boot} = \#\{t(\mathbf{x}^{*b}) \ge t(\mathbf{x})\}/B$.
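The four steps above translate directly into NumPy. A hedged sketch; the function name and $B$ are my choices, not part of the slides.

```python
import numpy as np

def bootstrap_test_means(z, y, B=1000, rng=None):
    """Bootstrap test of H0: mu_z = mu_y without assuming equal variances.

    Sketch of the centering recipe: shift each sample to the combined
    mean, resample separately, and compare studentized statistics.
    """
    rng = np.random.default_rng(rng)
    z, y = np.asarray(z, float), np.asarray(y, float)
    n, m = len(z), len(y)
    xbar = np.concatenate([z, y]).mean()

    def t_stat(a, b):
        # studentized difference of means, unequal variances
        se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
        return (a.mean() - b.mean()) / se

    zt = z - z.mean() + xbar             # step 1: impose the common mean
    yt = y - y.mean() + xbar
    t_obs = t_stat(z, y)
    t_star = np.array([t_stat(rng.choice(zt, n), rng.choice(yt, m))
                       for _ in range(B)])   # steps 2-3
    return np.mean(t_star >= t_obs)          # step 4: ASL_boot
```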

Permutation Test vs. Bootstrap Hypothesis Testing
Accuracy: In the two-sample problem, $ASL_{perm}$ is the exact probability of obtaining a test statistic as extreme as the one observed. In contrast, the bootstrap explicitly samples from an estimated probability mechanism; $ASL_{boot}$ has no interpretation as an exact probability.
Flexibility: When special symmetry isn't required, bootstrap testing can be applied much more generally than the permutation test (e.g., in the two-sample problem the permutation test is limited to $H_0: F = G$, and it has no direct analogue in the one-sample problem).

The One-Sample Problem
We observe a random sample $\mathbf{z} = (z_1, z_2, \ldots, z_n) \sim F$, and we wish to test whether the mean of the population equals some predetermined value $\mu_0$:
$H_0: \mu_z = \mu_0$

Bootstrap Hypothesis Testing of $\mu_z = \mu_0$
What is the appropriate way to estimate the null distribution? The empirical distribution $\hat F$ is not an appropriate estimate, because it does not obey $H_0$. As before, we can use the empirical distribution of the points $\tilde z_i = z_i - \bar z + \mu_0$, $i = 1, \ldots, n$, which has mean $\mu_0$.

Bootstrap Hypothesis Testing of $\mu_z = \mu_0$
The test is based on the approximate distribution of the test statistic
$t(\mathbf{z}) = \dfrac{\bar z - \mu_0}{\bar\sigma/\sqrt n}$
We sample $B$ times $(\tilde z_1^*, \ldots, \tilde z_n^*)$ with replacement from $(\tilde z_1, \ldots, \tilde z_n)$, and for each sample compute
$t(\tilde{\mathbf{z}}^{*b}) = \dfrac{\bar{\tilde z}^* - \mu_0}{\bar\sigma^*/\sqrt n}$
The estimated ASL is given by $\widehat{ASL}_{boot} = \#\{t(\tilde{\mathbf{z}}^{*b}) \ge t(\mathbf{z})\}/B$.
* In the case that large values of $t(\tilde{\mathbf{z}}^{*b})$ are evidence against $H_0$.
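The one-sample recipe is even shorter in code. An illustrative sketch, assuming NumPy; it follows the slides' convention that large values of $t$ count against $H_0$.

```python
import numpy as np

def bootstrap_test_mean(z, mu0, B=1000, rng=None):
    """One-sample bootstrap t-test of H0: mu_z = mu0 (illustrative sketch)."""
    rng = np.random.default_rng(rng)
    z = np.asarray(z, float)
    n = len(z)
    zt = z - z.mean() + mu0              # centered points obeying H0

    def t_stat(a):
        return (a.mean() - mu0) / (a.std(ddof=1) / np.sqrt(n))

    t_obs = t_stat(z)
    t_star = np.array([t_stat(rng.choice(zt, n)) for _ in range(B)])
    return np.mean(t_star >= t_obs)      # ASL_boot
```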

Testing $\mu_z = \mu_0$ on the Mouse Data
Taking $\mu_0 = 129$, the observed value of the test statistic is
$t(\mathbf{z}) = \dfrac{86.9 - 129}{66.8/\sqrt 7} = -1.67$
(estimating $\sigma$ with the unbiased estimator of the standard deviation). For 94 of 1000 bootstrap samples, $t(\tilde{\mathbf{z}}^{*b})$ was smaller than $-1.67$, and therefore $\widehat{ASL}_{boot} = .094$. For reference, Student's t-test for the same null hypothesis on this data gives
$ASL = \text{Prob}\left\{t_6 < \dfrac{-42.1}{66.8/\sqrt 7}\right\} = \text{Prob}\{t_6 < -1.67\} = .07$

Testing Multimodality of a Population
A mode is defined to be a local maximum, or bump, of the population density. The data, $x_1, \ldots, x_{485}$, are the thicknesses of 485 Mexican stamps from 1872. The number of modes is suggestive of the number of distinct types of paper used in the printing.

Testing Multimodality of a Population
Since the histogram is not smooth, it is difficult to tell from it whether there is more than one mode. A Gaussian kernel density estimate with window size $h$ can be used to obtain a smoother estimate:
$\hat f(t; h) = \dfrac{1}{nh} \sum_{i=1}^{n} \varphi\!\left(\dfrac{t - x_i}{h}\right)$
As $h$ increases, the number of modes in the density estimate is non-increasing.
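The two facts above (counting modes of the kernel estimate, and monotonicity of the mode count in $h$) can be operationalized as follows. A sketch under my own assumptions: the grid resolution, search bounds, and function names are not from the slides, and the monotonicity justifies a simple binary search for the smallest $h$ with at most $k$ modes.

```python
import numpy as np

def n_modes(x, h):
    """Count the local maxima of the Gaussian KDE f(t; h) on a fine grid."""
    x = np.asarray(x, float)
    grid = np.linspace(x.min() - 3 * h, x.max() + 3 * h, 2000)
    u = (grid[:, None] - x[None, :]) / h
    f = np.exp(-0.5 * u ** 2).sum(axis=1) / (len(x) * h * np.sqrt(2 * np.pi))
    # a mode = an interior grid point higher than both neighbours
    return int(((f[1:-1] > f[:-2]) & (f[1:-1] > f[2:])).sum())

def smallest_h(x, k=1, lo=1e-4, hi=1.0, tol=1e-5):
    """Binary search for the smallest h giving at most k modes.

    Valid because the mode count is non-increasing in h; hi must be
    chosen large enough that n_modes(x, hi) <= k.
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if n_modes(x, mid) <= k:
            hi = mid
        else:
            lo = mid
    return hi
```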

Testing Multimodality of a Population
The null hypothesis is $H_0$: number of modes $= 1$, versus the alternative: number of modes $> 1$. Since the number of modes decreases as $h$ increases, there is a smallest value of $h$ such that $\hat f(t; h)$ has exactly one mode; call it $\hat h_1$. In our case, $\hat h_1 \approx .0068$.

Testing Multimodality of a Population
It seems reasonable to use $\hat f(t; \hat h_1)$ as the estimated null distribution for our test of $H_0$: it is the density estimate that uses the least amount of smoothing among all estimates with one mode (conservative). A small adjustment to $\hat f$ is needed, because the kernel formula artificially inflates the variance of the estimate by $\hat h_1^2$. Let $\hat g(\cdot; \hat h_1)$ be the rescaled estimate, which imposes a variance equal to the sample variance. A natural choice of test statistic is $\hat h_1$: a large value of $\hat h_1^*$ is evidence against $H_0$. Putting all of this together, the achieved significance level is
$\widehat{ASL}_{boot} = \text{Prob}_{\hat g(\cdot; \hat h_1)}\{\hat h_1^* > \hat h_1\}$
where each bootstrap sample $\mathbf{x}^*$ is drawn from $\hat g(\cdot; \hat h_1)$.

Testing Multimodality of a Population
Sampling from $\hat g(\cdot; \hat h_1)$ is given by
$x_i^* = \bar x + \left(1 + \hat h_1^2/\hat\sigma^2\right)^{-1/2} \left(y_i^* - \bar x + \hat h_1 \varepsilon_i\right), \quad i = 1, \ldots, n$
where $y_1^*, \ldots, y_n^*$ are sampled with replacement from $x_1, \ldots, x_n$ and the $\varepsilon_i$ are standard normal random variables (this is called the smoothed bootstrap). In the stamps data, none of 500 bootstrap samples had $\hat h_1^* > .0068$, so $\widehat{ASL}_{boot} = 0$. The results can be interpreted in a sequential manner, moving on to higher numbers of modes (Silverman, 1981). When testing $H_0$: number of modes $= 2$, 146 samples out of 500 had $\hat h_2^* > .0033$, which translates to $\widehat{ASL}_{boot} = .292$. In our case, the inference process ends here.
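The sampling formula above is a one-liner in NumPy. A minimal sketch, assuming NumPy; the function name and interface are my choices.

```python
import numpy as np

def smoothed_bootstrap_sample(x, h, rng=None):
    """One draw of size n from the rescaled smoothed density g(.; h).

    Implements x*_i = xbar + (1 + h^2/sigma^2)^(-1/2) (y*_i - xbar + h*eps_i).
    """
    rng = np.random.default_rng(rng)
    x = np.asarray(x, float)
    n = len(x)
    xbar, var = x.mean(), x.var(ddof=1)
    y = rng.choice(x, size=n, replace=True)   # ordinary bootstrap draw
    eps = rng.standard_normal(n)              # Gaussian kernel noise
    # the rescaling keeps the variance of g equal to the sample variance
    return xbar + (y - xbar + h * eps) / np.sqrt(1.0 + h ** 2 / var)
```

The full test repeats this $B$ times, recomputes $\hat h_1^*$ for each sample, and counts how often $\hat h_1^* > \hat h_1$.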

Summary
A bootstrap hypothesis test is carried out using the following:
a) a test statistic $t(\mathbf{x})$
b) an approximate null distribution $\hat F_0$ for the data under $H_0$
Given these, we generate $B$ bootstrap values of $t(\mathbf{x}^*)$ under $\hat F_0$ and estimate the achieved significance level by $\widehat{ASL}_{boot} = \#\{t(\mathbf{x}^{*b}) \ge t(\mathbf{x})\}/B$. The choice of test statistic $t(\mathbf{x})$ and the estimate of the null distribution determine the power of the test. In the stamps example, if the actual population density is bimodal but the Gaussian kernel density does not approximate it accurately, then the suggested test will not have high power. Bootstrap tests are useful when the alternative hypothesis is not well specified. In cases where there is a parametric alternative hypothesis, likelihood or Bayesian methods might be preferable.