STAT 135 Lab 5 Bootstrapping and Hypothesis Testing

STAT 135 Lab 5 Bootstrapping and Hypothesis Testing Rebecca Barter March 2, 2015

The Bootstrap

Bootstrap Suppose that we are interested in estimating a parameter θ from some population with members x_1, ..., x_M, but that this population is so large that we cannot possibly observe every x_i. We have seen that we can take a sample X_1, ..., X_n and generate an estimate, θ̂ (e.g. the MLE or MOM estimate). What if we want to know properties of this estimator, but we don't know any of the theory?

Bootstrap Suppose that we can feasibly take another N samples from our population. Then from each sample we can generate another estimator:

sample from x_1, ..., x_M:
X_1^(1), ..., X_n^(1) → θ̂^(1)
...
X_1^(N), ..., X_n^(N) → θ̂^(N)

We can estimate properties of θ̂ using the estimates θ̂^(1), ..., θ̂^(N). For example:

E(θ̂) ≈ (1/N) ∑_{i=1}^N θ̂^(i)

In most cases, however, it is not feasible to take new samples from our population. How else can we generate samples to estimate properties of θ̂?

Bootstrap Suppose that we know the distribution of the population (e.g. F_θ = Exponential(4)). Then we can simulate N IID samples from F_θ and calculate the estimator from each sample:

simulate from F_θ:
X_1^(1), ..., X_n^(1) → θ̂^(1)
...
X_1^(N), ..., X_n^(N) → θ̂^(N)

We can estimate properties of θ̂ using the estimates θ̂^(1), ..., θ̂^(N). For example:

Var(θ̂) ≈ (1/N) ∑_{i=1}^N ( θ̂^(i) − (1/N) ∑_{j=1}^N θ̂^(j) )²

In most cases, however, we don't know what F_θ is.
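The simulation scheme above can be sketched in a few lines. The lab itself works in R; this is a stdlib-Python stand-in using the Exponential(4) example, where the MLE of the rate is 1 divided by the sample mean (the seed, n, and N are arbitrary choices for illustration):

```python
import random
import statistics

# When F_theta is known -- here Exponential with rate 4 -- simulate N
# samples of size n and compute the estimator from each one.
random.seed(0)
n, N = 50, 1000
rate = 4.0

estimates = []
for _ in range(N):
    sample = [random.expovariate(rate) for _ in range(n)]
    estimates.append(1 / statistics.mean(sample))  # MLE of the rate

# The N estimates approximate the sampling distribution of theta-hat:
print(statistics.mean(estimates))      # approximates E(theta-hat)
print(statistics.variance(estimates))  # approximates Var(theta-hat)
```

The mean of the simulated estimates should land near the true rate 4, and their variance near the asymptotic value rate²/n.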

Bootstrap Bootstrapping is a method which uses random sampling techniques to estimate properties (such as bias, variance, and confidence intervals) of an estimator, θ̂, when we don't know the true distribution, F_θ, of our data (and we cannot feasibly draw new samples from our population).

Bootstrap We will focus on two approaches to generating bootstrap samples:

Parametric bootstrap: estimate θ̂ from our original sample X_1, ..., X_n and generate samples X*_1, ..., X*_n from F_θ̂, which approximates F_θ.

Non-parametric bootstrap: take samples X*_1, ..., X*_n with replacement from our original sample X_1, ..., X_n.

Once we have B bootstrap samples, we can generate B estimates of θ:

X_1*^(1), ..., X_n*^(1) → θ̂^(1)
...
X_1*^(B), ..., X_n*^(B) → θ̂^(B)

We can use these bootstrapped estimators, θ̂^(1), ..., θ̂^(B), to estimate properties of θ̂.

Parametric Bootstrap

Parametric Bootstrap Recall that we are using bootstrapping because we don't know the true parameter θ, but we want to identify the distribution of our estimator θ̂ (e.g. the MLE). Suppose that we have observed a sample X_1, ..., X_n with X_i ~ F_θ. Suppose further that we know the form of F, but θ is unknown (for example, we know that our sample is N(θ, 2) distributed, but we don't know θ). We can calculate θ̂ = T(X_1, ..., X_n) for our sample. To identify the distribution of our estimator, θ̂, we want to obtain more observations of θ̂, but we only have one sample! How can we get more (approximate) values of θ̂?

Parametric Bootstrap Why not approximate the distribution, F_θ, using our estimate, θ̂? We can then draw B samples from the approximated distribution F_θ̂:

simulate from F_θ̂:
X_1*^(1), ..., X_n*^(1) → θ̂^(1)
...
X_1*^(B), ..., X_n*^(B) → θ̂^(B)

We can now estimate properties of θ̂ using these bootstrapped estimates, for example:

Var_boot(θ̂) = (1/B) ∑_{i=1}^B ( θ̂^(i) − (1/B) ∑_{j=1}^B θ̂^(j) )²
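The parametric bootstrap loop can be sketched as follows (in stdlib Python rather than the lab's R). The "observed" data here are hypothetical stand-ins, simulated only so the example runs; the assumed model is the slide's N(θ, sd = 2), with θ estimated by the sample mean:

```python
import random
import statistics

# Parametric bootstrap sketch: assume X_i ~ N(theta, sd = 2), theta unknown.
random.seed(1)
data = [random.gauss(10, 2) for _ in range(25)]  # hypothetical observed sample
theta_hat = statistics.mean(data)                # estimate of theta

B = 2000
boot_estimates = []
for _ in range(B):
    # draw a fresh sample from the FITTED distribution F_theta-hat
    boot_sample = [random.gauss(theta_hat, 2) for _ in range(len(data))]
    boot_estimates.append(statistics.mean(boot_sample))

var_boot = statistics.variance(boot_estimates)
print(var_boot)  # should be close to the theoretical 2**2 / 25 = 0.16
```

Because the fitted model keeps the known sd of 2, the bootstrap variance of the sample mean should track the theoretical value σ²/n = 0.16.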

Non-parametric Bootstrap

Non-parametric Bootstrap Recall that for the parametric bootstrap, we try to approximate the unknown distribution F_θ by using our estimate θ̂ as the parameter. But what if we don't even know the form of F?

Non-parametric Bootstrap For the non-parametric bootstrap, we don't assume any distributional form for F. Instead, we draw our bootstrap samples with replacement from our original sample X_1, ..., X_n (which was itself taken from the population x_1, ..., x_M):

sample with replacement from X_1, ..., X_n:
X_1*^(1), ..., X_n*^(1) → θ̂^(1)
...
X_1*^(B), ..., X_n*^(B) → θ̂^(B)
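A minimal stdlib-Python sketch of this resampling scheme (the data are the ten observations from the exercise later in this lab, reused here purely for illustration; the estimator is the sample mean):

```python
import random
import statistics

# Non-parametric bootstrap: resample WITH replacement from the observed
# data; no distributional form is assumed.
random.seed(2)
data = [78, 86, 97, 91, 83, 89, 92, 88, 79, 68]
n = len(data)

B = 2000
boot_estimates = []
for _ in range(B):
    boot_sample = random.choices(data, k=n)  # n draws with replacement
    boot_estimates.append(statistics.mean(boot_sample))

se_boot = statistics.stdev(boot_estimates)   # bootstrap SE of the sample mean
print(se_boot)
```

The spread of the B bootstrapped estimates stands in for the (unknown) sampling distribution of θ̂; here its standard deviation estimates the standard error of the mean.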

Bootstrap Confidence Intervals

Bootstrap Confidence Intervals Recall that the goal of bootstrapping was to learn about our estimator θ̂ (e.g. to estimate the variance of θ̂ by calculating the variance of the bootstrapped estimators θ̂^(1), ..., θ̂^(B)). We can also use our bootstrapped estimators, θ̂^(1), ..., θ̂^(B), to calculate approximate confidence intervals for θ.

Bootstrap Confidence Intervals Three types of confidence intervals:

Normal interval: θ̂ ± q_{1−α/2} √(Var_boot(θ̂)), where q_{1−α/2} is the 1 − α/2 quantile of the standard normal distribution.

Percentile interval: (θ̂_{(α/2)}, θ̂_{(1−α/2)}), where θ̂_{(γ)} denotes the γ quantile of the bootstrapped estimates θ̂^(1), ..., θ̂^(B).

Pivotal interval: (2θ̂ − θ̂_{(1−α/2)}, 2θ̂ − θ̂_{(α/2)}).
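All three intervals can be computed from the same set of bootstrapped estimates. A stdlib-Python sketch, with α = 0.05, the sample mean as θ̂, and the exercise data reused as hypothetical observations (the empirical quantiles are taken by indexing into the sorted bootstrap estimates, a simple approximation):

```python
import random
import statistics

# Three bootstrap confidence intervals from one set of B resampled estimates.
random.seed(3)
data = [78, 86, 97, 91, 83, 89, 92, 88, 79, 68]
theta_hat = statistics.mean(data)

B, alpha = 2000, 0.05
boot = sorted(statistics.mean(random.choices(data, k=len(data)))
              for _ in range(B))

lo = boot[int((alpha / 2) * B)]           # empirical alpha/2 quantile
hi = boot[int((1 - alpha / 2) * B) - 1]   # empirical 1 - alpha/2 quantile

z = 1.96  # 0.975 quantile of the standard normal
se = statistics.stdev(boot)
normal_int = (theta_hat - z * se, theta_hat + z * se)
percentile_int = (lo, hi)
pivotal_int = (2 * theta_hat - hi, 2 * theta_hat - lo)
print(normal_int, percentile_int, pivotal_int)
```

Note how the pivotal interval reflects the percentile endpoints around θ̂: the upper bootstrap quantile determines the lower pivotal endpoint and vice versa.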

Exercise

Exercise: Bootstrapping Suppose that a sample consists of the following observations: 78, 86, 97, 91, 83, 89, 92, 88, 79, 68. The α-trimmed mean is the average of the inner (1 − 2α) fraction of the values in the sample. For example, if α = 0.2, then the trimmed mean is the average of the inner 60% of the sorted observations: 68, 78, 79, 83, 86, 88, 89, 91, 92, 97. Why might we prefer to use the trimmed mean over the usual sample mean in this case?

Exercise: Bootstrapping We can compute the α = 0.2 trimmed mean in R for the above sample using the command mean(x, trim = 0.2). Suppose that we don't know a general formula for the standard error of the trimmed mean. Our goal is to estimate it using bootstrapping.

1. Suppose that we are fairly confident that the data come from a Normal(µ, 8²) distribution (but we don't know µ). Use the parametric bootstrap to estimate the standard error of the trimmed mean.

2. Suppose we have absolutely no idea what the underlying distribution of the data is. Use the non-parametric bootstrap to estimate the standard error of the trimmed mean.

3. Calculate 95% pivotal confidence intervals for each of these estimates.
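One way to set up parts 1 and 2 (this sketches a possible solution rather than the lab's official R one, and the `trimmed_mean` helper is a stdlib stand-in for R's `mean(x, trim = 0.2)`):

```python
import random
import statistics

random.seed(6)
data = [78, 86, 97, 91, 83, 89, 92, 88, 79, 68]
n, B = len(data), 2000

def trimmed_mean(x, trim=0.2):
    """Average of the inner values after dropping trim*len(x) from each end."""
    s = sorted(x)
    k = int(trim * len(s))
    return statistics.mean(s[k:len(s) - k])

# 1. Parametric bootstrap: assume Normal(mu, sd = 8), estimate mu by x-bar,
#    then resample from the fitted normal.
mu_hat = statistics.mean(data)
par = [trimmed_mean([random.gauss(mu_hat, 8) for _ in range(n)])
       for _ in range(B)]

# 2. Non-parametric bootstrap: resample the data with replacement.
nonpar = [trimmed_mean(random.choices(data, k=n)) for _ in range(B)]

print(statistics.stdev(par), statistics.stdev(nonpar))  # the two SE estimates
```

Part 3 then applies the pivotal-interval formula from the previous slide to each set of bootstrapped trimmed means.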

Hypothesis Testing

Hypothesis Testing Hypothesis testing is a method of using statistical inference to test a hypothesis. For example, suppose that the DMV claims that the average waiting time is 20 minutes, and we want to test whether the average waiting time is in fact more than 20 minutes. To test this hypothesis, we examine the DMV waiting times for a random sample of people and determine whether or not we have enough evidence to show that the average waiting time for the population is more than 20 minutes. In general, we are interested in testing the null hypothesis H_0: µ = 20 against the alternative hypothesis H_1: µ > 20.

Hypothesis Testing The terminology we use when conducting a hypothesis test is that we either: have enough evidence (based on our statistic, T(X_1, ..., X_n)) to reject the null hypothesis H_0 in favor of the alternative hypothesis H_1, or do not have enough evidence to reject the null hypothesis H_0, in which case our data are consistent with the null hypothesis. This doesn't mean that the alternative hypothesis, H_1, is false, just that we don't have enough evidence to show that it is true!

Hypothesis Testing Suppose we are testing a hypothesis about the mean, µ, of a distribution, for example the null hypothesis H_0: N(µ = 5, 4) against the composite alternative hypothesis H_1: N(µ > 5, 4). Given that we have observed a sample X_1, ..., X_20, what statistic, T(X_1, ..., X_20), could we consider in order to determine whether we have enough evidence to suggest that µ > 5 rather than µ = 5?

T(X_1, ..., X_n) = (observed/estimated value − null value) / (SD of estimator) = (X̄_n − µ_0) / SD(X̄_n)

Hypothesis Testing Recall that we are testing the null hypothesis that the true distribution is H_0: N(µ = 5, 4) versus the alternative hypothesis that the true distribution is H_1: N(µ > 5, 4). Suppose that from our sample, X_1, ..., X_20, we observed X̄_n = 5.4 and SD(X̄_n) = SD(X)/√n ≈ 0.45. Then

T(X_1, ..., X_n) = (X̄_n − µ_0) / SD(X̄_n) = (5.4 − 5) / 0.45 ≈ 0.89

This is not a big number: our data seem fairly consistent with the null hypothesis (our observed sample mean is within 1 SD of µ_0 = 5).

Hypothesis Testing Recall that we are testing the null hypothesis that the true distribution is H_0: N(µ = 5, 4) versus the alternative hypothesis that the true distribution is H_1: N(µ > 5, 4). If, on the other hand, we observed X̄_n = 7 and SD(X̄_n) = 0.45, then

T(X_1, ..., X_n) = (X̄_n − µ_0) / SD(X̄_n) = (7 − 5) / 0.45 ≈ 4.44

which is much larger. In this case, we have enough evidence to reject H_0 (that the true mean is 5) in favor of the alternative, H_1 (that the true mean is greater than 5).

Hypothesis Testing How do we determine which values of T(X_1, ..., X_n) are big enough that we can reasonably conclude that our null hypothesis is false? 1. First, we identify the distribution of the test statistic T(X_1, ..., X_n) assuming the null hypothesis is true. 2. We then calculate the probability of observing a test statistic that is as or more extreme (in the direction of the alternative hypothesis) than T(X_1, ..., X_n). This probability is called the p-value.

Hypothesis Testing In our example, we had T(X_1, ..., X_n) = (X̄_n − µ_0) / SD(X̄_n), and if we assume the null hypothesis is true (i.e. X_i ~ N(µ_0, 4), where µ_0 = 5), then T(X_1, ..., X_n) ~ N(0, 1). Next, we want to calculate the probability of observing a test statistic that is as or more extreme than T(X_1, ..., X_n). Since our alternative hypothesis is H_1: µ > 5, a test statistic that is as or more extreme corresponds to a value greater than our observed T(X_1, ..., X_n). If our alternative hypothesis were H_1: µ < 5, a test statistic that is as or more extreme would correspond to a value less than T(X_1, ..., X_n).

Hypothesis Testing In our example, we had X̄_n = 7 and SD(X̄_n) = SD(X)/√n = 2/√20 ≈ 0.45, so that our test statistic is T(X_1, ..., X_n) = (X̄_n − µ_0) / SD(X̄_n) ≈ 4.44. We know that if H_0 is true then T(X_1, ..., X_n) ~ N(0, 1), so the p-value for the hypothesis test with H_1: µ > 5 is given by

P(Z > T(X_1, ..., X_n)) = P(Z > 4.44) ≈ 0.0000045

This tells us that, assuming H_0 is true, the probability of seeing a test statistic as extreme or more extreme than what we have observed is so small that we can reject H_0: µ = 5 in favor of H_1: µ > 5.
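The upper-tail probability above can be computed without statistical tables, since the standard normal CDF is expressible through the complementary error function: P(Z > t) = 0.5·erfc(t/√2). A stdlib-Python sketch of the calculation (using the slide's rounded SD of 0.45):

```python
import math

def upper_tail_p(t):
    # P(Z > t) for standard normal Z, via the complementary error function
    return 0.5 * math.erfc(t / math.sqrt(2))

x_bar, mu_0, sd_xbar = 7.0, 5.0, 0.45
t = (x_bar - mu_0) / sd_xbar   # about 4.44
print(upper_tail_p(t))          # about 4.4e-6, matching the slide
```

Because this p-value is far below any conventional significance level, the rejection of H_0 is clear-cut.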

Exercise

Exercise: Hypothesis Testing Suppose that you had read an article which claimed that the average weight of red pandas is 6.3 kg. However, you recently weighed n = 121 red pandas and found that they had an average weight of 5.1 kg. Assume that it is well known that the standard deviation of the weights of red pandas is 1.4 kg. Do you have enough evidence to claim, at the 5% significance level, that the true average weight of red pandas is in fact less than 6.3 kg?
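One way to set up the computation (this sketches the solution in stdlib Python; the test is one-sided with H_0: µ = 6.3 against H_1: µ < 6.3, and extreme values are now in the lower tail):

```python
import math

# One-sided z-test: H0: mu = 6.3 vs H1: mu < 6.3, known sigma = 1.4, n = 121.
n, x_bar, mu_0, sigma = 121, 5.1, 6.3, 1.4
t = (x_bar - mu_0) / (sigma / math.sqrt(n))
print(t)  # about -9.43, far below the 5% critical value of -1.645

p_value = 0.5 * math.erfc(-t / math.sqrt(2))  # lower-tail P(Z < t)
print(p_value)
```

The test statistic lands deep in the lower tail, so the p-value is essentially zero and there is overwhelming evidence that the true average weight is less than 6.3 kg.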

Hypothesis Testing Recall that we are trying to make a decision on whether to reject H 0 in favor of H 1 based on a statistic T (X 1,..., X n ). Rejection region: the range of values of T (X 1,..., X n ) for which we reject the null hypothesis, H 0. Acceptance region: the range of values of T (X 1,..., X n ) for which we do not reject the null hypothesis, H 0.

Hypothesis Testing There are two primary types of error associated with hypothesis testing:

1. Type I error: the error arising from rejecting H_0 when it is actually true. A Type I error occurs with probability α, and the significance level of the test is equal to α.

2. Type II error: the error arising from accepting H_0 when it is actually false. A Type II error occurs with probability β. The power of the test is equal to 1 − β (the probability of rejecting H_0 when it is actually false), and can be thought of as the ability of the test to detect an effect when the effect actually exists.
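Both error rates can be checked by simulation. A stdlib-Python sketch using the earlier slide's setting (one-sided z-test of H_0: µ = 5 with known sd 2 and n = 20; the alternative value µ = 6, the seed, and the number of repetitions are arbitrary choices for illustration):

```python
import random
import statistics

# Monte Carlo estimates of the Type I error rate and the power for a
# one-sided z-test at alpha = 0.05 (reject when the statistic exceeds 1.645).
random.seed(4)
n, sd, z_crit, reps = 20, 2.0, 1.645, 5000

def rejects(mu_true):
    sample = [random.gauss(mu_true, sd) for _ in range(n)]
    t = (statistics.mean(sample) - 5) / (sd / n ** 0.5)
    return t > z_crit

type1 = sum(rejects(5.0) for _ in range(reps)) / reps  # P(reject | H0 true)
power = sum(rejects(6.0) for _ in range(reps)) / reps  # 1 - beta when mu = 6
print(type1, power)
```

The simulated Type I rate should hover near the nominal α = 0.05, while the power against µ = 6 comes out around 0.72: the test usually, but not always, detects a true mean of 6.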

Hypothesis Testing: The Neyman-Pearson Approach

Hypothesis Testing We will now examine hypothesis testing from the Neyman-Pearson (NP) approach. Suppose that we observe a sample of DMV waiting times, X_1, ..., X_n, and we are interested in identifying the distribution, F_θ, of waiting times. Suppose that we know that F_θ = Poisson(θ), but we don't know θ. Then based on our observed data X_1, ..., X_n, we might want to test the null hypothesis H_0: the true distribution of waiting times is Poisson(θ_0 = 20) against the alternative hypothesis H_1: the true distribution of waiting times is Poisson(θ_1 = 30).

Hypothesis Testing Note that the hypotheses H_0: Poisson(θ_0 = 20) and H_1: Poisson(θ_1 = 30) are both examples of simple hypotheses (each hypothesis assumes only one value for the parameter). All hypotheses in the Neyman-Pearson framework are of this type.

Hypothesis Testing The Neyman-Pearson approach to hypothesis testing: fix α (the probability of a Type I error) at some small pre-defined value, then use the likelihood ratio to find the corresponding test with the highest power (we want to maximize the probability of rejecting H_0 when it is false).

Hypothesis Testing Suppose we are testing the simple null hypothesis H_0: θ = θ_0 against the simple alternative H_1: θ = θ_1. The likelihood ratio is the ratio of the likelihood function, f_0(X), of the data assuming that θ = θ_0 (H_0 is true) to the likelihood function, f_1(X), of the data assuming that θ = θ_1 (H_1 is true):

f_0(X_1, ..., X_n) / f_1(X_1, ..., X_n)
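For the Poisson example above, the likelihood ratio is easy to compute, and in practice it is computed on the log scale for numerical stability. A stdlib-Python sketch (the waiting-time data are simulated stand-ins, and the inversion-based Poisson sampler is only there because the stdlib lacks one):

```python
import math
import random

# Log likelihood ratio log f0(X) - log f1(X) for H0: theta = 20 vs
# H1: theta = 30, using the Poisson log pmf.
random.seed(5)

def poisson_loglik(theta, xs):
    return sum(x * math.log(theta) - theta - math.lgamma(x + 1) for x in xs)

def poisson_sample(theta):
    # crude Poisson sampler by inverting the CDF (stdlib only)
    u, x, p = random.random(), 0, math.exp(-theta)
    cdf = p
    while u > cdf:
        x += 1
        p *= theta / x
        cdf += p
    return x

xs = [poisson_sample(20) for _ in range(15)]  # hypothetical waiting times
log_lr = poisson_loglik(20, xs) - poisson_loglik(30, xs)
print(log_lr)  # positive values favor H0; we reject H0 when the ratio is small
```

Since the data were drawn under H_0 here, the log likelihood ratio will typically come out positive, i.e. the ratio f_0/f_1 exceeds 1 and H_0 is not rejected.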

Hypothesis Testing Assuming H_0 and H_1 are both simple, the Neyman-Pearson lemma tells us that: if the likelihood ratio test (LRT) that rejects H_0 when

f_0(X) / f_1(X) < c

has significance level α (the probability of incorrectly rejecting H_0 when it is true), then any other test with significance level α* ≤ α has power less than or equal to that of the LRT.

Exercise

Exercise: Neyman-Pearson hypothesis testing Suppose we have a sample X_1, ..., X_n such that X_i ~ Normal(µ, 16), where µ is unknown. We want to test the null hypothesis H_0: µ = 10 against the alternative hypothesis H_1: µ = 15. Find the best test with sample size n = 16 and significance level α = 0.05. What is the power of this test?
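Once the likelihood ratio is worked out, it is monotone in the sample mean, so the LRT rejects for large X̄. A stdlib-Python sketch of the resulting cutoff and power (this outlines one possible solution, not the lab's official one):

```python
import math

# NP exercise: X_i ~ N(mu, variance 16), n = 16, H0: mu = 10 vs H1: mu = 15,
# alpha = 0.05. The LRT rejects H0 when the sample mean is large.
n, sigma = 16, 4.0
mu0, mu1, z_alpha = 10.0, 15.0, 1.645   # 1.645 = 0.95 standard normal quantile

se = sigma / math.sqrt(n)               # SD of the sample mean = 1
cutoff = mu0 + z_alpha * se             # reject H0 when X-bar exceeds this
power = 0.5 * math.erfc((cutoff - mu1) / (se * math.sqrt(2)))  # P(X-bar > cutoff | mu1)
print(cutoff, power)
```

The cutoff comes out at 11.645, and because µ_1 = 15 lies several standard errors above it, the power is very close to 1.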