Political Science 236 Hypothesis Testing: Review and Bootstrapping

Rocío Titiunik, Fall 2007

1 Hypothesis Testing

Definition 1.1 Hypothesis. A hypothesis is a statement about a population parameter.

The goal of hypothesis testing is to decide, using a sample from the population, which of two complementary hypotheses is true. In general, the two complementary hypotheses are called the null hypothesis and the alternative hypothesis. If we let $\theta$ be a population parameter and $\Theta$ be the parameter space, we can define these complementary hypotheses as follows.

Definition 1.2 Let $\Theta_0$ and $\Theta_1 \equiv \Theta_0^c$ be a partition of the parameter space $\Theta$. Then the null and alternative hypotheses are defined as follows:

1. Null Hypothesis: $H_0: \theta \in \Theta_0$
2. Alternative Hypothesis: $H_1: \theta \in \Theta_1$

Definition 1.3 Testing Procedure. A testing procedure is a rule, based on the outcome of a random sample from the population under study, used to decide whether to reject $H_0$.

The subset of the sample space for which $H_0$ will be rejected is called the critical region (or rejection region), and its complement is called the acceptance region. In general, a hypothesis test will be specified in terms of a test statistic $T(X_1, X_2, \ldots, X_N) \equiv T(X)$, which is a function of the sample. We can define the critical region formally as follows.

Definition 1.4 Critical Region. The subset $C_c \subset \mathbb{R}^N$ of the sample space for which $H_0$ is rejected is called the critical region and is defined by

$$C_c = \{ x \in \mathbb{R}^N : T(x) > c \}$$

for some $c \in \mathbb{R}$. The value $c$ is called the critical value. The complement of $C_c$, $C_a \equiv C_c^c$, is called the acceptance region.

If we let $C_c^T$ be the critical region of the test statistic $T(X)$ (i.e., $C_c^T$ is such that $C_c = \{ x \in \mathbb{R}^N : T(x) \in C_c^T \}$), a statistical test of $H_0$ against $H_1$ will generally be defined as:

1. $T(x) \in C_c^T \implies$ Reject $H_0$
2. $T(x) \notin C_c^T \implies$ Accept $H_0$

A hypothesis test of $H_0: \theta \in \Theta_0$ against $H_1: \theta \in \Theta_1$ can make one of two types of errors.

Definition 1.5 Type I and Type II Errors. Let $H_0$ be a null hypothesis being tested for acceptance or rejection. The two types of errors that can be made are:

1. Type I Error: rejecting $H_0$ when $\theta \in \Theta_0$ (i.e., when $H_0$ is true)
2. Type II Error: accepting $H_0$ when $\theta \in \Theta_1$ (i.e., when $H_0$ is false)

So a type I error is committed when the statistical test mistakenly rejects the null hypothesis, and a type II error is committed when the test mistakenly accepts the null hypothesis.

The ideal test is one where the hypothesis would always be correctly identified as being either true or false. For such an ideal test to exist, we would have to partition the range of potential sample outcomes in such a way that outcomes in the critical region $C_c$ would occur if and only if $H_0$ were false, and outcomes in the acceptance region $C_a$ would occur if and only if $H_0$ were true. In general, ideal tests cannot be constructed. For $\theta \in \Theta_0$, the test makes a mistake if $x \in C_c$, so the probability of a type I error is $P_\theta(X \in C_c)$; for $\theta \in \Theta_1$, the test makes a mistake if $x \in C_a$, so the probability of a type II error is $P_\theta(X \in C_a)$. Note that $P_\theta(X \in C_c) = 1 - P_\theta(X \in C_a)$.

We will now define the power function of a test. The power function completely summarizes the operating characteristics of a statistical test with respect to the probabilities of making correct and incorrect decisions about $H_0$.

Definition 1.6 Let $H_0: \theta \in \Theta_0$ and $H_1: \theta \in \Theta_1$, and let the critical region $C_c$ define a test of $H_0$. Then the power function of the statistical test is the function of $\theta$ defined by

$$\beta(\theta) \equiv P_\theta(X \in C_c) = \begin{cases} \text{probability of Type I error} & \text{if } \theta \in \Theta_0 \\ 1 - \text{probability of Type II error} & \text{if } \theta \in \Theta_1 \end{cases}$$

In words, the power function gives the probability of rejecting $H_0$ for every value of $\theta \in \Theta$. The value of the power function at a particular point $\theta_p \in \Theta$ is called the power of the test at $\theta_p$ and represents the probability of rejecting $H_0$ if $\theta_p$ were the true value of the parameter vector. The ideal power function is 0 for all $\theta \in \Theta_0$ and 1 for all $\theta \in \Theta_1$. In general, this ideal cannot be attained, and we say that a good test has a power function near 0 for all $\theta \in \Theta_0$ and near 1 for all $\theta \in \Theta_1$. When comparing two tests of a given $H_0$, a test is better if it has lower power for $\theta \in \Theta_0$ and higher power for $\theta \in \Theta_1$, which implies that it has lower probabilities of both type I and type II errors.
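To make the power function concrete, here is a minimal sketch (an illustration added here, not part of the original notes) for the one-sided test about a normal mean with known variance that rejects $H_0: \theta \leq 0$ when $\sqrt{n}\,\bar{X} > z_{1-\alpha}$, for $X_1, \ldots, X_n$ i.i.d. $N(\theta, 1)$; in that case $\beta(\theta) = 1 - \Phi(z_{1-\alpha} - \sqrt{n}\,\theta)$.

```python
import numpy as np
from scipy.stats import norm

def power_function(theta, n=25, alpha=0.05):
    """Power beta(theta) of the test of H0: theta <= 0 vs H1: theta > 0 that
    rejects when sqrt(n) * Xbar > z_{1-alpha}, for X_1, ..., X_n iid N(theta, 1)."""
    z_crit = norm.ppf(1 - alpha)                       # critical value c
    return 1 - norm.cdf(z_crit - np.sqrt(n) * theta)   # P_theta(X in C_c)

for theta in np.linspace(-0.5, 1.0, 7):
    print(f"theta = {theta:5.2f}   beta(theta) = {power_function(theta):.3f}")
# beta(theta) stays at or below alpha for every theta in Theta_0 (theta <= 0)
# and increases toward 1 as theta moves into Theta_1.
```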

We now define the size and level of a test.

Definition 1.7 Size. For $0 \leq \alpha \leq 1$, a test with power function $\beta(\theta)$ is a size-$\alpha$ test if $\sup_{\theta \in \Theta_0} \beta(\theta) = \alpha$.

Definition 1.8 Level. For $0 \leq \alpha \leq 1$, a test with power function $\beta(\theta)$ is a level-$\alpha$ test if $\sup_{\theta \in \Theta_0} \beta(\theta) \leq \alpha$.

In words, the size of a test is the maximum probability of a Type I error associated with a given test rule; the lower the size, the lower the maximum probability of mistakenly rejecting $H_0$. The level of a test is an upper bound on its Type I error probability. The key difference between the two concepts is that the size is the maximum value of $\beta(\theta)$ over $\theta \in \Theta_0$ (i.e., the maximum Type I error probability), while the level is only a bound that need not equal $\beta(\theta)$ for any $\theta \in \Theta_0$, nor equal the supremum of $\beta(\theta)$ over $\theta \in \Theta_0$. Thus, the set of level-$\alpha$ tests contains the set of size-$\alpha$ tests; in other words, a test of $H_0$ having size $\gamma$ is a level-$\alpha$ test for any $\alpha \geq \gamma$. In applications, when we say that $H_0$ is (not) rejected at the $\alpha$ significance level, we often mean that $\alpha$ was the bound on the level of protection against Type I error used when constructing the test. A more accurate statement about the level of protection against Type I error is that $H_0$ is (not) rejected using a size-$\alpha$ test.

2 Bootstrapping Hypothesis Tests

The simplest situation involves a simple null hypothesis $H_0$ that completely specifies the probability distribution of the data. Thus, if we have a sample $x_1, x_2, \ldots, x_n$ from a population with CDF $F$, then $H_0$ specifies that $F = F_0$, where $F_0$ contains no unknown parameters. A statistical test is based on a test statistic $T$ that measures the discrepancy between the data and the null hypothesis. We will follow the convention that large values of $T$ are evidence against $H_0$.

If the null hypothesis is simple and the observed value of the test statistic is denoted by $t$, then the level of evidence against $H_0$ is measured by the significance probability

$$p = P(T \geq t \mid H_0),$$

which is referred to as the p-value. The p-value is effectively the marginal size of test at which the hypothesis would be rejected, based on the observed outcome of $X$. A corresponding notion is that of a critical value $t_p$ for $t$, associated with testing at level $p$: if $t \geq t_p$, then $H_0$ is rejected at level $p$, or $100p\%$. It follows that $t_p$ is defined by

$$P(T \geq t_p \mid H_0) = p.$$

Note that $p$ is what we defined earlier as the size of the test, and the set $\{(x_1, x_2, \ldots, x_n) : T(x_1, \ldots, x_n) \geq t_p\}$ is the level-$p$ critical region of the test. The distribution of $T$ under $H_0$ is called the null distribution of $T$.
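As a concrete illustration (a minimal sketch added here, not from the notes), suppose $H_0$ states that the data are i.i.d. $N(0, 1)$ and take $T$ to be the absolute sample mean. Because $F_0$ involves no unknown parameters, the null distribution of $T$, and hence $p = P(T \geq t \mid H_0)$, can be approximated by simulating directly from $F_0$:

```python
import numpy as np

rng = np.random.default_rng(0)

def T(x):
    """Test statistic; large values are evidence against H0."""
    return abs(x.mean())

x = rng.normal(loc=0.3, scale=1.0, size=40)    # hypothetical observed sample
t_obs = T(x)

# Approximate the null distribution of T by repeated sampling from F0 = N(0, 1).
n_sim = 10_000
t_null = np.array([T(rng.normal(loc=0.0, scale=1.0, size=x.size)) for _ in range(n_sim)])

p_value = np.mean(t_null >= t_obs)             # Monte Carlo estimate of P(T >= t | H0)
print(f"t = {t_obs:.3f}, p-value = {p_value:.4f}")
```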

2.1 How to choose the test statistic

In a parametric setting, there is an explicit form for the sampling distribution of the data with a finite number of unknown parameters. In these cases the alternative hypothesis guides the choice of the test statistic (usually through the likelihood function of the data). In non-parametric settings, no particular forms are specified for the distributions, and hence the appropriate choice of $T$ is less clear. However, the choice of $T$ should always be based on some notion of what is of concern should $H_0$ turn out to be false. In all non-parametric problems, the null hypothesis $H_0$ leaves some parameters unknown and therefore does not completely specify $F$. In this case, the p-value is not well defined, because $P(T \geq t \mid F)$ may depend upon which $F$ satisfying $H_0$ is taken.

2.1.1 Pivot Tests

When $H_0$ concerns a particular parameter value, we can use the equivalence between hypothesis tests and confidence intervals. This equivalence implies that if the value $\theta_0$ is outside a $1 - \alpha$ confidence interval for $\theta$, then $\theta$ differs from $\theta_0$ with p-value less than $\alpha$. A specific form of test based on this equivalence is a pivot test. Suppose that $T$ is an estimator for a scalar $\theta$, with estimated variance $V$. Suppose also that the studentized version of $T$, $Z = (T - \theta)/V^{1/2}$, is a pivot (i.e., its distribution is the same for all relevant $F$, and in particular for all $\theta$). For a one-sided test of $H_0: \theta = \theta_0$ versus $H_1: \theta > \theta_0$, the p-value that corresponds to the observed studentized test statistic $z_0 = (t - \theta_0)/v^{1/2}$ is

$$p = P\left\{ \frac{T - \theta_0}{V^{1/2}} \geq \frac{t - \theta_0}{v^{1/2}} \,\Big|\, H_0 \right\}.$$

However, since $Z$ is a pivot, we have

$$P\left\{ \frac{T - \theta_0}{V^{1/2}} \geq \frac{t - \theta_0}{v^{1/2}} \,\Big|\, H_0 \right\} = P\left\{ Z \geq \frac{t - \theta_0}{v^{1/2}} \,\Big|\, H_0 \right\} = P\left\{ Z \geq \frac{t - \theta_0}{v^{1/2}} \,\Big|\, F \right\},$$

and therefore the p-value can be written as

$$p = P\{Z \geq z_0 \mid F\}.$$

Note that this has a big advantage in the context of bootstrapping, because we do not have to construct a special null-hypothesis sampling distribution.

2.2 Non-Parametric Bootstrap Tests

Testing hypotheses requires that probability calculations be done under the null hypothesis model. This means that the usual bootstrap setting must be modified, since resampling from the empirical CDF $\hat{F}$ and applying the plug-in principle to obtain $\hat{\theta} = t(\hat{F})$ will not give us an estimator of $\theta$ under the null hypothesis $H_0$. In the hypothesis testing context, instead of resampling from the empirical CDF $\hat{F}$, we must resample from an empirical CDF $\hat{F}_0$ that satisfies the relevant null hypothesis $H_0$ (unless, as we mentioned above, we can construct a pivotal test statistic).
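As a minimal sketch of what a null resampling distribution $\hat{F}_0$ can look like (an added illustration, not from the notes), consider a one-sample test of $H_0: \mu = \mu_0$ with $T = \bar{x} - \mu_0$. One empirical CDF that satisfies $H_0$ is the EDF of the data shifted so that its mean equals $\mu_0$; bootstrap samples are then drawn from the shifted data, anticipating the p-value approximation given just below.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=0.4, scale=1.0, size=30)    # hypothetical sample
mu0 = 0.0                                       # H0: mu = mu0

# Shift the data so its mean equals mu0; the EDF of x0 satisfies H0.
x0 = x - x.mean() + mu0

# Resample from F0_hat (i.e., from x0 with replacement) to mimic T under H0.
B = 2000
t_star = np.array([rng.choice(x0, size=x0.size, replace=True).mean() - mu0
                   for _ in range(B)])
t_obs = x.mean() - mu0                          # observed test statistic
p_boot = np.mean(t_star >= t_obs)               # bootstrap p-value (formula below)
print(f"t = {t_obs:.3f}, bootstrap p-value = {p_boot:.3f}")
```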

Once we have decided on the null resampling distribution $\hat{F}_0$, the basic bootstrap test will compute the p-value as

$$p_{boot} = P\{T \geq t \mid \hat{F}_0\}$$

or will approximate it by

$$p_{boot} \approx \frac{\#\{t_b^* \geq t\}}{B},$$

using the results $t_1^*, t_2^*, \ldots, t_B^*$ from $B$ bootstrap samples.

Example 2.1 Difference in means. Suppose we want to compare two population means $\mu_1$ and $\mu_2$ using the test statistic $t = \bar{x}_2 - \bar{x}_1$. We will use the following sample data:

sample1: 82 79 81 79 77 79 79 78 79 82 76 73 64
sample2: 84 86 85 82 77 76 77 80 83 81 78 78 78

If the shapes of the underlying distributions are identical, then under $H_0: \mu_1 = \mu_2$ the two distributions are the same. In this case, it is sensible to choose for $\hat{F}_0$ the pooled empirical CDF of the two samples. Applying this procedure with 1,000 bootstrap samples yielded 52 values of $t^*$ greater than the observed value $t = 80.38 - 77.53 = 2.85$, which implies a p-value of $52/1000 = 0.052$. So we cannot reject the null at 5% (but we can at 5.2%!).
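A minimal sketch of Example 2.1 in Python (added here for illustration; with a finite number of resamples the Monte Carlo p-value will vary around the 0.052 reported above):

```python
import numpy as np

rng = np.random.default_rng(0)
sample1 = np.array([82, 79, 81, 79, 77, 79, 79, 78, 79, 82, 76, 73, 64], dtype=float)
sample2 = np.array([84, 86, 85, 82, 77, 76, 77, 80, 83, 81, 78, 78, 78], dtype=float)
n1, n2 = len(sample1), len(sample2)

t_obs = sample2.mean() - sample1.mean()        # observed t = xbar2 - xbar1

# Under H0 the two distributions coincide, so F0_hat is the pooled EDF:
# draw both bootstrap samples, with replacement, from the combined data.
pooled = np.concatenate([sample1, sample2])
B = 1000
t_star = np.empty(B)
for b in range(B):
    x1_star = rng.choice(pooled, size=n1, replace=True)
    x2_star = rng.choice(pooled, size=n2, replace=True)
    t_star[b] = x2_star.mean() - x1_star.mean()

p_boot = np.mean(t_star >= t_obs)              # #{t*_b >= t} / B
print(f"t = {t_obs:.2f}, bootstrap p-value = {p_boot:.3f}")
```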

2.2.1 Studentized Bootstrap Method

For some problems, it is possible to obtain more stable significance tests by studentizing comparisons. Recall that, because of the relationship between confidence sets and hypothesis tests, such a test can be obtained by calculating a $1 - p$ confidence set with the studentized bootstrap method and concluding that the p-value is less than $p$ if the null hypothesis parameter value falls outside the confidence set. We can also implement this idea by bootstrapping the test statistic directly rather than constructing confidence intervals; in this case, the p-value can be obtained directly.

Suppose that $\theta$ is a scalar with estimator $T$ and that we want to test $H_0: \theta = \theta_0$ against $H_1: \theta > \theta_0$. The method mentioned in the section on pivot tests applies when $Z = (T - \theta)/V^{1/2}$ is approximately a pivot (i.e., its distribution is approximately independent of unknown parameters). Then, with $z_0 = (t - \theta_0)/v^{1/2}$ being the observed studentized test statistic, the bootstrap analog of

$$p = P\{Z \geq z_0 \mid F\}$$

is

$$p = P\{Z^* \geq z_0 \mid \hat{F}\},$$

which we can approximate by bootstrapping without having to decide on a null empirical distribution $\hat{F}_0$.

Example 2.2 Let us continue the example of the difference in means, where we compared two population means $\mu_1$ and $\mu_2$ using the test statistic $t = \bar{x}_2 - \bar{x}_1$. It is reasonable to suppose that the usual two-sample t-statistic

$$Z = \frac{\bar{X}_2 - \bar{X}_1 - (\mu_2 - \mu_1)}{\left( S_2^2/n_2 + S_1^2/n_1 \right)^{1/2}}$$

is approximately pivotal. We take $\hat{F}$ to be the pair of empirical CDFs of the two samples, since no assumptions are made connecting the two distributions. The observed value of the test statistic under the null is

$$z_0 = \frac{\bar{x}_2 - \bar{x}_1}{\left( s_2^2/n_2 + s_1^2/n_1 \right)^{1/2}}.$$

We also calculate $B$ bootstrap values of

$$z^* = \frac{\bar{x}_2^* - \bar{x}_1^* - (\bar{x}_2 - \bar{x}_1)}{\left( s_2^{*2}/n_2 + s_1^{*2}/n_1 \right)^{1/2}}$$

and approximate the p-value by $\#\{z_b^* \geq z_0\}/B$.
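A minimal sketch of Example 2.2 (added for illustration; each sample is resampled from its own empirical CDF, and $z^*$ is centered at the observed difference $\bar{x}_2 - \bar{x}_1$ as in the formula above):

```python
import numpy as np

rng = np.random.default_rng(1)
sample1 = np.array([82, 79, 81, 79, 77, 79, 79, 78, 79, 82, 76, 73, 64], dtype=float)
sample2 = np.array([84, 86, 85, 82, 77, 76, 77, 80, 83, 81, 78, 78, 78], dtype=float)
n1, n2 = len(sample1), len(sample2)

def studentized(x1, x2, shift):
    """z = (xbar2 - xbar1 - shift) / sqrt(s2^2/n2 + s1^2/n1)."""
    se = np.sqrt(x2.var(ddof=1) / len(x2) + x1.var(ddof=1) / len(x1))
    return (x2.mean() - x1.mean() - shift) / se

z0 = studentized(sample1, sample2, shift=0.0)  # observed statistic under H0: mu2 - mu1 = 0

diff_hat = sample2.mean() - sample1.mean()     # plug-in value of mu2 - mu1 under F_hat
B = 2000
z_star = np.empty(B)
for b in range(B):
    x1_star = rng.choice(sample1, size=n1, replace=True)
    x2_star = rng.choice(sample2, size=n2, replace=True)
    z_star[b] = studentized(x1_star, x2_star, shift=diff_hat)

p_value = np.mean(z_star >= z0)                # approximates P{Z* >= z0 | F_hat}
print(f"z0 = {z0:.2f}, studentized bootstrap p-value = {p_value:.3f}")
```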

3 Testing Linear Restrictions in OLS

Consider the problem of testing the null hypothesis

$$H_0: R\beta = r,$$

where the $d \times K$ matrix $R$ is the matrix of restrictions ($d$ being the number of restrictions) and $r$ is a $d \times 1$ vector of constants. The alternative hypothesis is $H_1: R\beta \neq r$. Using standard results for multivariate normal distributions, we know that

$$T_1 \equiv \frac{\left( R\hat{\beta} - r \right)^T \left( R\left( X^T X \right)^{-1} R^T \right)^{-1} \left( R\hat{\beta} - r \right)}{\sigma^2} \sim \chi^2_d$$

and

$$T_2 \equiv \frac{\left( y - X\hat{\beta} \right)^T \left( y - X\hat{\beta} \right)}{\sigma^2} \sim \chi^2_{N-K},$$

which are independent, and hence we have a pivotal statistic given by

$$F = \frac{T_1 / d}{T_2 / (N - K)} = \frac{\left( R\hat{\beta} - r \right)^T \left( R\left( X^T X \right)^{-1} R^T \right)^{-1} \left( R\hat{\beta} - r \right) \big/ d}{\left( y - X\hat{\beta} \right)^T \left( y - X\hat{\beta} \right) \big/ (N - K)} = \frac{\left( R\hat{\beta} - r \right)^T \left( R\left( X^T X \right)^{-1} R^T \right)^{-1} \left( R\hat{\beta} - r \right)}{d\, s^2} \sim F_{d, N-K},$$

where $s^2 = \left( y - X\hat{\beta} \right)^T \left( y - X\hat{\beta} \right) / (N - K)$.

References

Davison, A. C. and D. V. Hinkley. 2006. Bootstrap Methods and their Application. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge: Cambridge University Press.