Economics 583: Econometric Theory I
A Primer on Asymptotics: Hypothesis Testing
Eric Zivot
October 12, 2011
Hypothesis Testing

1. Specify the hypotheses to be tested: H0 (null hypothesis) versus H1 (alternative hypothesis).
2. Specify the significance level α (size) of the test: α = Pr(Reject H0 | H0 is true).
3. Construct a test statistic, T, from the observed data.
4. Use the test statistic T to evaluate the data evidence regarding H0:
   T is big ⇒ evidence against H0
   T is small ⇒ evidence in favor of H0
5. Decide to reject H0 at the specified significance level if the value of T falls in the rejection region:
   T ∈ rejection region ⇒ reject H0

Remark: Usually the rejection region of T is determined by a critical value, cv_α, such that
   T > cv_α ⇒ reject H0
   T ≤ cv_α ⇒ do not reject H0
Typically, cv_α is the (1 − α)·100% quantile of the probability distribution of T under H0, so that
   α = Pr(Reject H0 | H0 is true) = Pr(T > cv_α)
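The critical value as a quantile can be computed directly; a minimal sketch using only Python's standard library, assuming for illustration a one-sided test whose statistic is N(0, 1) under H0 and α = 0.05:

```python
# cv_alpha is the (1 - alpha) quantile of the null distribution of T;
# here T ~ N(0, 1) under H0 is assumed for illustration.
from statistics import NormalDist

alpha = 0.05
cv = NormalDist().inv_cdf(1 - alpha)  # 95% quantile of the standard normal
print(round(cv, 3))                   # 1.645
```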
Decision Making and Hypothesis Tests

                       Reality
Decision               H0 is true      H0 is false
Reject H0              Type I error    No error
Do not reject H0       No error        Type II error

Significance Level (size) of Test
   level = Pr(Type I error) = Pr(Reject H0 | H0 is true)
Goal: Construct the test to have a specified small significance level, e.g. level = 5% or level = 1%.
Power of Test
   power = 1 − Pr(Type II error) = Pr(Reject H0 | H0 is false), or Pr(Reject H0 | H1 is true)
Goal: Construct the test to have high power.
Problem: It is impossible to simultaneously have level → 0 and power → 1. As level → 0, power → 0 as well.
Example (Exact Tests in Finite Samples): Let X1, ..., Xn be iid random variables with Xi ~ N(μ, σ²), where σ² is known. Consider testing the hypotheses
   H0: μ = μ0 vs. H1: μ > μ0
One potential test statistic is the t-statistic
   t_{μ=μ0} = (μ̂ − μ0)/SE(μ̂), μ̂ = (1/n) Σ_{i=1}^n Xi, SE(μ̂) = σ/√n
Intuition: We should reject H0 if t_{μ=μ0} >> 0.
Result: For any fixed n, under H0: μ = μ0,
   t_{μ=μ0} = (μ̂ − μ0)/SE(μ̂) = (1/√n) Σ_{i=1}^n (Xi − μ0)/σ ~ N(0, 1)
Remarks

1. The finite sample distribution of t_{μ=μ0} is N(0, 1), which does not depend on the population parameters μ0 and σ (sometimes called nuisance parameters). When a test statistic does not depend on nuisance parameters, it is called a pivotal statistic or pivot.
2. Let α ∈ (0, 1) denote the significance level (size). Because we have a one-sided test, the rejection region is determined by the critical value cv_α such that
   Pr(Reject H0 | H0 is true) = Pr(Z > cv_α) = α
where Z ~ N(0, 1). Hence, cv_α is the (1 − α)·100% quantile of the standard normal distribution. For example, if α = 0.05 then cv.05 = 1.645.
3. The t-test that rejects when t_{μ=μ0} > cv_α is an exact size-α finite sample test.
4. We can calculate the finite sample distribution of t_{μ=μ0} because we made a number of very strong assumptions: X1, ..., Xn are iid random variables with Xi ~ N(μ, σ²). In particular, if Xi is not normally distributed then t_{μ=μ0} will not be normally distributed in finite samples.
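The exact size of this test can be checked by simulation; a minimal sketch, where the values μ0 = 0, σ = 1, n = 20, and the number of replications are assumptions for illustration:

```python
# Monte Carlo check that the exact t-test rejects about alpha = 5% of the
# time when H0 is true (iid normal data, sigma known).
import random
from statistics import NormalDist

random.seed(0)
mu0, sigma, n, alpha = 0.0, 1.0, 20, 0.05
cv = NormalDist().inv_cdf(1 - alpha)         # one-sided critical value

n_sims = 20_000
rejections = 0
for _ in range(n_sims):
    x = [random.gauss(mu0, sigma) for _ in range(n)]
    mu_hat = sum(x) / n
    t = (mu_hat - mu0) / (sigma / n ** 0.5)  # SE uses the known sigma
    rejections += t > cv

print(rejections / n_sims)                   # close to 0.05
```

Because the test is exact, the rejection frequency is close to 5% for any fixed n, not just for large n.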
Example Continued: Computing Finite Sample Power

Recall,
   power = Pr(Reject H0 | H1 is true)
To compute power, one has to specify H1. Suppose
   H1: μ = μ1 > μ0
Then
   power = Pr( (μ̂ − μ0)/SE(μ̂) > cv_α | H1: μ = μ1 )
Under H1: μ = μ1,
   (μ̂ − μ0)/SE(μ̂) = (μ̂ − μ1 + μ1 − μ0)/SE(μ̂) = (μ̂ − μ1)/SE(μ̂) + (μ1 − μ0)/SE(μ̂) = Z + δ ~ N(δ, 1)
where
   Z = (μ̂ − μ1)/SE(μ̂) ~ N(0, 1), δ = (μ1 − μ0)/SE(μ̂) = √n (μ1 − μ0)/σ
Therefore, under H1: μ = μ1,
   power = Pr( (μ̂ − μ0)/SE(μ̂) > cv_α | H1: μ = μ1 ) = Pr(Z + δ > cv_α) = Pr(Z > cv_α − δ)
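The closed-form expression Pr(Z > cv_α − δ) makes finite-sample power easy to evaluate; a minimal sketch, where the parameter values in the example call are assumptions for illustration:

```python
# Finite-sample power of the one-sided test: Pr(Z > cv_alpha - delta),
# with delta = sqrt(n) * (mu1 - mu0) / sigma.
from statistics import NormalDist

def power(mu0, mu1, sigma, n, alpha=0.05):
    z = NormalDist()
    cv = z.inv_cdf(1 - alpha)
    delta = n ** 0.5 * (mu1 - mu0) / sigma
    return 1 - z.cdf(cv - delta)

print(round(power(mu0=0.0, mu1=0.5, sigma=1.0, n=25), 3))
```

Note that at μ1 = μ0 the function returns exactly α, and it increases monotonically in δ.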
Remarks:
1. Power is monotonically increasing in δ.
2. For any fixed alternative μ1 > μ0, as n → ∞,
   δ = √n (μ1 − μ0)/σ → ∞
and power → 1.
Hypothesis Testing Based on Asymptotic Distributions

Statistical inference in large-sample theory (asymptotic theory) is based on test statistics whose asymptotic distributions are known under the truth of the null hypothesis.
The derivation of the distribution of test statistics in large-sample theory is much easier than in finite-sample theory because we only care about the large-sample approximation to the unknown finite sample distribution:
the LLN, CLT, Slutsky's Theorem, and the Continuous Mapping Theorem (CMT) allow us to derive asymptotic distributions fairly easily.
Example (Asymptotic Tests): Let X1, ..., Xn be a sequence of covariance stationary and ergodic random variables with E[Xi] = μ and var(Xi) = σ². We can write Xi = μ + εi, where εi is a covariance stationary MDS with variance σ². Consider testing the hypotheses
   H0: μ = μ0 vs. H1: μ ≠ μ0
Two asymptotic test statistics are the asymptotic t-statistic and the Wald statistic
   t_{μ=μ0} = (μ̂ − μ0)/âse(μ̂), Wald = (t_{μ=μ0})² = (μ̂ − μ0)²/âvar(μ̂)
   μ̂ = (1/n) Σ_{i=1}^n Xi, âse(μ̂) = σ̂/√n, σ̂² = (1/n) Σ_{i=1}^n (Xi − μ̂)²
The CLT for stationary and ergodic MDS can be used to show that under H0: μ = μ0,
   μ̂ ~A N(μ0, âvar(μ̂)), âvar(μ̂) = σ̂²/n
The Ergodic Theorem and Slutsky's Theorem can be used to show that
   t_{μ=μ0} →A N(0, 1) = Z
Additionally, the CMT can be used to show that
   Wald = (t_{μ=μ0})² →A χ²(1)
If the significance level of each test is 5%, then the asymptotic critical values are determined by
   t-test: Pr(|Z| > cv.05) = 0.05 ⇒ cv.05 = 1.96
   Wald test: Pr(χ²(1) > cv.05) = 0.05 ⇒ cv.05 = 3.84
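Since χ²(1) is the distribution of Z², the Wald critical value is simply the square of the two-sided t critical value; a minimal sketch:

```python
# Two-sided 5% critical values: cv for |Z|, and its square for chi-square(1).
from statistics import NormalDist

alpha = 0.05
cv_t = NormalDist().inv_cdf(1 - alpha / 2)  # Pr(|Z| > cv_t) = 0.05
cv_wald = cv_t ** 2                         # Pr(chi2(1) > cv_wald) = 0.05
print(round(cv_t, 2), round(cv_wald, 2))    # 1.96 3.84
```

This equivalence means the asymptotic t-test and Wald test always agree: |t| > 1.96 exactly when t² > 3.84.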
Remarks:

1. The asymptotic t-test is different from the finite sample t-test in two respects:
(a) The way the standard error is computed (âse(μ̂) vs. SE(μ̂)).
(b) The actual size (significance level) of the asymptotic test equals 5% (the nominal significance level) only as n → ∞. In finite samples, the actual size of the asymptotic test may be smaller or larger than 5%. The difference between the actual size and the nominal size is called the size distortion.
Remarks continued:

2. For fixed alternatives (e.g. H1: μ = μ1 ≠ μ0), the power of the asymptotic tests converges to 1 as n → ∞. That is, the asymptotic tests are consistent tests. To see this, consider the t-test:
   (μ̂ − μ0)/âse(μ̂) = √n (μ̂ − μ0)/σ̂ = √n (μ̂ − μ1 + μ1 − μ0)/σ̂
                     = √n (μ̂ − μ1)/σ̂ + √n (μ1 − μ0)/σ̂ →d N(0, 1) + (±∞) = ±∞
Therefore, as n → ∞,
   power = Pr(reject H0 | H1 is true) = Pr(|t_{μ=μ0}| > 1.96) → 1
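A small simulation illustrates this consistency; a sketch assuming iid N(μ1, 1) data with μ1 = 0.2 (iid draws stand in for the stationary ergodic case, and the sample sizes and replication count are arbitrary choices):

```python
# Rejection frequency of the two-sided asymptotic t-test under the fixed
# alternative mu1 = 0.2; power rises toward 1 as n grows.
import random

random.seed(1)
mu0, mu1 = 0.0, 0.2

def rejection_rate(n, n_sims=2000):
    hits = 0
    for _ in range(n_sims):
        x = [random.gauss(mu1, 1.0) for _ in range(n)]
        mu_hat = sum(x) / n
        sigma2_hat = sum((xi - mu_hat) ** 2 for xi in x) / n  # estimated variance
        t = (mu_hat - mu0) / (sigma2_hat / n) ** 0.5          # uses ase, not SE
        hits += abs(t) > 1.96
    return hits / n_sims

for n in (25, 100, 400):
    print(n, rejection_rate(n))  # rejection rate increases with n
```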
Remarks continued:

3. If there are several asymptotic tests for the same hypotheses, we can compare the asymptotic power of these tests by considering asymptotic power under so-called local alternatives of the form
   H1: μ1 = μ0 + δ/√n, δ fixed
Under this local alternative we can calculate non-trivial asymptotic power. To see this, consider the t-test:
So that
   (μ̂ − μ0)/âse(μ̂) = √n (μ̂ − μ0)/σ̂ = √n (μ̂ − μ1 + μ1 − μ0)/σ̂
                     = √n (μ̂ − μ1)/σ̂ + √n (μ1 − μ0)/σ̂
                     = √n (μ̂ − μ1)/σ̂ + √n (μ0 + δ/√n − μ0)/σ̂
                     = √n (μ̂ − μ1)/σ̂ + δ/σ̂
                     →d N(0, 1) + δ/σ = N(δ/σ, 1)
which depends on δ. Therefore,
   power = Pr(reject H0 | H1 is true) = Pr( |N(δ/σ, 1)| > 1.96 )
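This limiting local power Pr(|N(δ/σ, 1)| > 1.96) can be evaluated directly; a minimal sketch, where σ = 1 and the grid of δ values are assumptions for illustration:

```python
# Asymptotic local power of the two-sided t-test: Pr(|N(delta/sigma, 1)| > cv).
from statistics import NormalDist

def local_power(delta, sigma=1.0, cv=1.96):
    z = NormalDist()
    m = delta / sigma  # mean of the limiting normal under the local alternative
    return (1 - z.cdf(cv - m)) + z.cdf(-cv - m)

for d in (0.0, 1.0, 2.0, 3.0):
    print(d, round(local_power(d), 3))  # rises from about 0.05 toward 1
```

At δ = 0 the local power reduces to the size (about 5%), and it increases with δ but stays below 1, which is what makes local alternatives useful for ranking competing tests.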