18.650 Statistics for Applications. Chapter 5: Parametric hypothesis testing
Cherry Blossom run (1)
The Credit Union Cherry Blossom Run is a 10-mile race that takes place every year in Washington, D.C. In 2009 there were 14,974 participants, and the average running time was 103.5 minutes. Were runners faster in 2012? To answer this question, select n runners from the 2012 race at random and denote by X_1, ..., X_n their running times.
Cherry Blossom run (2)
We can see from past data that the running time has a Gaussian distribution. The variance was 373.
Cherry Blossom run (3)
We are given i.i.d. r.v. X_1, ..., X_n and we want to know whether X_1 ~ N(103.5, 373). This is a hypothesis testing problem. There are many ways this could be false:
1. E[X_1] ≠ 103.5;
2. var[X_1] ≠ 373;
3. X_1 may not even be Gaussian.
We are interested in a very specific question: is E[X_1] < 103.5?
Cherry Blossom run (4)
We make the following assumptions:
1. var[X_1] = 373 (the variance is the same in 2009 and 2012);
2. X_1 is Gaussian.
The only thing that we did not fix is E[X_1] = µ. Now we want to test (only): is µ = 103.5 or is µ < 103.5? By making modeling assumptions, we have reduced the number of ways the hypothesis X_1 ~ N(103.5, 373) may be rejected. The only way it can be rejected is if X_1 ~ N(µ, 373) for some µ < 103.5. We compare an expected value to a fixed reference number (103.5).
Cherry Blossom run (5)
Simple heuristic: "If X̄_n < 103.5, then µ < 103.5." This could go wrong if I randomly pick only fast runners in my sample X_1, ..., X_n.
Better heuristic: "If X̄_n < 103.5 − (something that → 0 as n → ∞), then µ < 103.5."
To make this intuition more precise, we need to take the size of the random fluctuations of X̄_n into account!
Clinical trials (1)
Pharmaceutical companies use hypothesis testing to test if a new drug is effective. To do so, they administer a drug to a group of patients (test group) and a placebo to another group (control group). Assume that the drug is a cough syrup. Let µ_control denote the expected number of expectorations per hour after a patient has used the placebo. Let µ_drug denote the expected number of expectorations per hour after a patient has used the syrup. We want to know if µ_drug < µ_control. We compare two expected values. No reference number.
Clinical trials (2)
Let X_1, ..., X_{n_drug} denote n_drug i.i.d. r.v. with distribution Poiss(µ_drug). Let Y_1, ..., Y_{n_control} denote n_control i.i.d. r.v. with distribution Poiss(µ_control). We want to test if µ_drug < µ_control.
Heuristic: "If X̄_drug < X̄_control − (something that → 0), then conclude that µ_drug < µ_control."
Heuristics (1)
Example 1: A coin is tossed 80 times, and Heads are obtained 54 times. Can we conclude that the coin is significantly unfair?
n = 80, X_1, ..., X_n iid ~ Ber(p); X̄_n = 54/80 ≈ .68.
If it was true that p = .5: by CLT + Slutsky's theorem,
√n (X̄_n − .5)/√(.5(1 − .5)) ≈ N(0, 1).
Our data gives √n (X̄_n − .5)/√(.5(1 − .5)) ≈ 3.22.
Conclusion: it seems quite reasonable to reject the hypothesis p = .5.
Heuristics (2)
Example 2: A coin is tossed 30 times, and Heads are obtained 13 times. Can we conclude that the coin is significantly unfair?
n = 30, X_1, ..., X_n iid ~ Ber(p); X̄_n = 13/30 ≈ .43.
If it was true that p = .5: by CLT + Slutsky's theorem,
√n (X̄_n − .5)/√(.5(1 − .5)) ≈ N(0, 1).
Our data gives √n (X̄_n − .5)/√(.5(1 − .5)) ≈ −.77.
The number −.77 is a plausible realization of a random variable Z ~ N(0, 1). Conclusion: our data does not suggest that the coin is unfair.
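The two standardized statistics above can be reproduced numerically. A minimal Python sketch (not part of the original slides; it computes the statistic from the exact value of X̄_n, which gives about 3.13 and −.73, slightly different from the 3.22 and −.77 quoted above, which round X̄_n to .68 and .43 first):

```python
import math

def coin_z(n, heads, p0=0.5):
    """z-statistic: sqrt(n) * (X_bar - p0) / sqrt(p0 * (1 - p0))."""
    x_bar = heads / n
    return math.sqrt(n) * (x_bar - p0) / math.sqrt(p0 * (1.0 - p0))

z1 = coin_z(80, 54)   # Example 1: about 3.13, far in the tail of N(0, 1)
z2 = coin_z(30, 13)   # Example 2: about -0.73, a plausible N(0, 1) draw
print(z1, z2)
```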
Statistical formulation (1)
Consider a sample X_1, ..., X_n of i.i.d. random variables and a statistical model (E, (P_θ)_{θ∈Θ}). Let Θ_0 and Θ_1 be disjoint subsets of Θ. Consider the two hypotheses:
H_0: θ ∈ Θ_0 vs. H_1: θ ∈ Θ_1.
H_0 is the null hypothesis, H_1 is the alternative hypothesis. If we believe that the true θ is either in Θ_0 or in Θ_1, we may want to test H_0 against H_1. We want to decide whether to reject H_0 (look for evidence against H_0 in the data).
Statistical formulation (2)
H_0 and H_1 do not play a symmetric role: the data is only used to try to disprove H_0. In particular, lack of evidence does not mean that H_0 is true ("innocent until proven guilty"). A test is a statistic ψ ∈ {0, 1} such that:
If ψ = 0, H_0 is not rejected;
If ψ = 1, H_0 is rejected.
Coin example: H_0: p = 1/2 vs. H_1: p ≠ 1/2.
ψ = 1{ √n |X̄_n − .5|/√(.5(1 − .5)) > C }, for some C > 0.
How to choose the threshold C?
Statistical formulation (3)
Rejection region of a test ψ: R_ψ = {x ∈ Eⁿ : ψ(x) = 1}.
Type 1 error of a test ψ (rejecting H_0 when it is actually true): α_ψ : Θ_0 → ℝ, θ ↦ P_θ[ψ = 1].
Type 2 error of a test ψ (not rejecting H_0 although H_1 is actually true): β_ψ : Θ_1 → ℝ, θ ↦ P_θ[ψ = 0].
Power of a test ψ: π_ψ = inf_{θ∈Θ_1} (1 − β_ψ(θ)).
Statistical formulation (4)
A test ψ has level α if α_ψ(θ) ≤ α for all θ ∈ Θ_0.
A test ψ has asymptotic level α if lim_{n→∞} α_ψ(θ) ≤ α for all θ ∈ Θ_0.
In general, a test has the form ψ = 1{T_n > c}, for some statistic T_n and threshold c ∈ ℝ. T_n is called the test statistic. The rejection region is R_ψ = {T_n > c}.
Example (1)
Let X_1, ..., X_n iid ~ Ber(p), for some unknown p ∈ (0, 1). We want to test:
H_0: p = 1/2 vs. H_1: p ≠ 1/2
with asymptotic level α ∈ (0, 1).
Let T_n = √n |p̂_n − 0.5|/√(.5(1 − .5)), where p̂_n is the MLE. If H_0 is true, then by CLT and Slutsky's theorem,
P[T_n > q_{α/2}] → α as n → ∞.
Let ψ_α = 1{T_n > q_{α/2}}.
Example (2)
Coming back to the two previous coin examples: for α = 5%, q_{α/2} = 1.96, so:
In Example 1, H_0 is rejected at the asymptotic level 5% by the test ψ_{5%};
In Example 2, H_0 is not rejected at the asymptotic level 5% by the test ψ_{5%}.
Question: In Example 1, for what levels α would ψ_α not reject H_0? And in Example 2, at which levels α would ψ_α reject H_0?
p-value
Definition: The (asymptotic) p-value of a test ψ_α is the smallest (asymptotic) level α at which ψ_α rejects H_0. It is random: it depends on the sample.
Golden rule: p-value ≤ α ⇔ H_0 is rejected by ψ_α at the (asymptotic) level α. The smaller the p-value, the more confidently one can reject H_0.
Example 1: p-value = P[|Z| > 3.22] ≈ .0013.
Example 2: p-value = P[|Z| > .77] ≈ .44.
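Both p-values follow from the standard normal CDF, which can be written with the error function as Φ(x) = (1 + erf(x/√2))/2. A small Python sketch (not part of the slides; 3.22 is the Example 1 statistic, also quoted as 3.21 elsewhere depending on rounding):

```python
import math

def phi(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def two_sided_p_value(z):
    """P[|Z| > |z|] for Z ~ N(0, 1)."""
    return 2.0 * (1.0 - phi(abs(z)))

p1 = two_sided_p_value(3.22)  # Example 1: about 0.0013 -> reject H0 at level 5%
p2 = two_sided_p_value(0.77)  # Example 2: about 0.44   -> do not reject
print(p1, p2)
```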
Neyman-Pearson's paradigm
Idea: For given hypotheses, among all tests of level/asymptotic level α, is it possible to find one that has maximal power?
Example: The trivial test ψ = 0 that never rejects H_0 has a perfect level (α = 0) but poor power (π_ψ = 0).
Neyman-Pearson's theory provides (the most) powerful tests with given level. In 18.650, we only study several cases.
The χ² distributions
Definition: For a positive integer d, the χ² (pronounced "kai-squared") distribution with d degrees of freedom is the law of the random variable Z_1² + Z_2² + ... + Z_d², where Z_1, ..., Z_d iid ~ N(0, 1).
Examples:
If Z ~ N_d(0, I_d), then ‖Z‖_2² ~ χ²_d;
χ²_2 = Exp(1/2).
Recall that the sample variance is given by
S_n = (1/n) Σ_{i=1}^n (X_i − X̄_n)² = (1/n) Σ_{i=1}^n X_i² − (X̄_n)².
Cochran's theorem implies that for X_1, ..., X_n iid ~ N(µ, σ²), if S_n is the sample variance, then nS_n/σ² ~ χ²_{n−1}.
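The definition can be checked by simulation: since a χ²_d variable is a sum of d squared standard normals, its mean is d and its variance is 2d. A minimal Python sketch (illustrative, not from the slides):

```python
import random

def chi2_sample(d, rng):
    """One draw from chi-squared with d degrees of freedom: sum of d squared N(0, 1)."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(d))

rng = random.Random(0)  # fixed seed for reproducibility
d, n = 3, 100_000
draws = [chi2_sample(d, rng) for _ in range(n)]
mean = sum(draws) / n                           # should be close to d = 3
var = sum((x - mean) ** 2 for x in draws) / n   # should be close to 2d = 6
print(mean, var)
```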
Student's T distributions
Definition: For a positive integer d, the Student's T distribution with d degrees of freedom (denoted by t_d) is the law of the random variable Z/√(V/d), where Z ~ N(0, 1), V ~ χ²_d, and Z is independent of V.
Example: Cochran's theorem implies that for X_1, ..., X_n iid ~ N(µ, σ²), if S_n is the sample variance, then
√(n − 1) (X̄_n − µ)/√S_n ~ t_{n−1}.
Wald's test (1)
Consider an i.i.d. sample X_1, ..., X_n with statistical model (E, (P_θ)_{θ∈Θ}), where Θ ⊆ ℝ^d (d ≥ 1), and let θ_0 ∈ Θ be fixed and given. Consider the following hypotheses:
H_0: θ = θ_0 vs. H_1: θ ≠ θ_0.
Let θ̂_n^MLE be the MLE. Assume the MLE technical conditions are satisfied. If H_0 is true, then
√n I(θ̂_n^MLE)^{1/2} (θ̂_n^MLE − θ_0) →(d) N_d(0, I_d) w.r.t. P_{θ_0}.
Wald's test (2)
Hence,
T_n := n (θ̂_n^MLE − θ_0)ᵀ I(θ̂_n^MLE) (θ̂_n^MLE − θ_0) →(d) χ²_d w.r.t. P_{θ_0}.
Wald's test with asymptotic level α ∈ (0, 1):
ψ = 1{T_n > q_α},
where q_α is the (1 − α)-quantile of χ²_d (see tables).
Remark: Wald's test is also valid if H_1 has the form θ > θ_0 or θ < θ_0 or θ = θ_1...
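As an illustration (not from the slides), Wald's test for H_0: p = 1/2 in the Bernoulli model: here d = 1, the Fisher information is I(p) = 1/(p(1 − p)), and the 5% threshold is the 95% quantile of χ²_1, about 3.841 (from tables):

```python
def wald_statistic_bernoulli(n, heads, p0=0.5):
    """T_n = n * I(p_hat) * (p_hat - p0)^2, with I(p) = 1 / (p * (1 - p))."""
    p_hat = heads / n
    return n * (p_hat - p0) ** 2 / (p_hat * (1.0 - p_hat))

CHI2_1_Q05 = 3.841  # (1 - 0.05)-quantile of chi-squared with 1 d.o.f. (from tables)

t_n = wald_statistic_bernoulli(80, 54)  # coin Example 1: about 11.2
reject = t_n > CHI2_1_Q05               # H0: p = 1/2 is rejected at level 5%
print(t_n, reject)
```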
Likelihood ratio test (1)
Consider an i.i.d. sample X_1, ..., X_n with statistical model (E, (P_θ)_{θ∈Θ}), where Θ ⊆ ℝ^d (d ≥ 1). Suppose the null hypothesis has the form
H_0: (θ_{r+1}, ..., θ_d) = (θ_{r+1}^{(0)}, ..., θ_d^{(0)}),
for some fixed and given numbers θ_{r+1}^{(0)}, ..., θ_d^{(0)}. Let
θ̂_n = argmax_{θ∈Θ} ℓ_n(θ) (MLE)
and
θ̂_n^c = argmax_{θ∈Θ_0} ℓ_n(θ) ("constrained MLE").
Likelihood ratio test (2)
Test statistic:
T_n = 2 (ℓ_n(θ̂_n) − ℓ_n(θ̂_n^c)).
Theorem: Assume H_0 is true and the MLE technical conditions are satisfied. Then
T_n →(d) χ²_{d−r} w.r.t. P_θ.
Likelihood ratio test with asymptotic level α ∈ (0, 1):
ψ = 1{T_n > q_α},
where q_α is the (1 − α)-quantile of χ²_{d−r} (see tables).
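A concrete one-dimensional sketch (not from the slides): the likelihood ratio test of H_0: p = 1/2 in the Bernoulli model, applied to coin Example 1. Here d = 1, r = 0, so T_n is compared to the 95% quantile of χ²_1, about 3.841:

```python
import math

def bernoulli_log_lik(n, heads, p):
    """Log-likelihood of heads successes out of n Bernoulli(p) trials."""
    return heads * math.log(p) + (n - heads) * math.log(1.0 - p)

def lr_statistic(n, heads, p0):
    """T_n = 2 * (l_n(p_hat) - l_n(p0)), with p_hat = heads / n the unconstrained MLE."""
    p_hat = heads / n
    return 2.0 * (bernoulli_log_lik(n, heads, p_hat) - bernoulli_log_lik(n, heads, p0))

CHI2_1_Q05 = 3.841  # 95% quantile of chi-squared with 1 degree of freedom (tables)

t_n = lr_statistic(80, 54, 0.5)  # coin Example 1: about 10.0
print(t_n, t_n > CHI2_1_Q05)     # H0 rejected at asymptotic level 5%
```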
Testing implicit hypotheses (1)
Let X_1, ..., X_n be i.i.d. random variables and let θ ∈ ℝ^d be a parameter associated with the distribution of X_1 (e.g. a moment, the parameter of a statistical model, etc.). Let g: ℝ^d → ℝ^k be continuously differentiable (with k < d). Consider the following hypotheses:
H_0: g(θ) = 0 vs. H_1: g(θ) ≠ 0.
E.g. g(θ) = (θ_1, θ_2) (k = 2), or g(θ) = θ_1 − θ_2 (k = 1), or ...
Testing implicit hypotheses (2)
Suppose an asymptotically normal estimator θ̂_n is available:
√n (θ̂_n − θ) →(d) N_d(0, Σ(θ)).
Delta method:
√n (g(θ̂_n) − g(θ)) →(d) N_k(0, Γ(θ)),
where Γ(θ) = ∇g(θ)ᵀ Σ(θ) ∇g(θ) ∈ ℝ^{k×k}. Assume Σ(θ) is invertible and ∇g(θ) has rank k. Then Γ(θ) is invertible and
√n Γ(θ)^{−1/2} (g(θ̂_n) − g(θ)) →(d) N_k(0, I_k).
Testing implicit hypotheses (3)
Then, by Slutsky's theorem, if Γ(θ) is continuous in θ,
√n Γ(θ̂_n)^{−1/2} (g(θ̂_n) − g(θ)) →(d) N_k(0, I_k).
Hence, if H_0 is true, i.e., g(θ) = 0,
T_n := n g(θ̂_n)ᵀ Γ(θ̂_n)^{−1} g(θ̂_n) →(d) χ²_k.
Test with asymptotic level α: ψ = 1{T_n > q_α}, where q_α is the (1 − α)-quantile of χ²_k (see tables).
The multinomial case: χ² test (1)
Let E = {a_1, ..., a_K} be a finite space and (P_p)_{p∈Δ_K} the family of all probability distributions on E, where
Δ_K = { p = (p_1, ..., p_K) ∈ (0, 1)^K : Σ_{j=1}^K p_j = 1 }.
For p ∈ Δ_K and X ~ P_p,
P_p[X = a_j] = p_j, j = 1, ..., K.
The multinomial case: χ² test (2)
Let X_1, ..., X_n iid ~ P_p, for some unknown p ∈ Δ_K, and let p⁰ ∈ Δ_K be fixed. We want to test:
H_0: p = p⁰ vs. H_1: p ≠ p⁰
with asymptotic level α ∈ (0, 1).
Example: If p⁰ = (1/K, 1/K, ..., 1/K), we are testing whether P_p is the uniform distribution on E.
The multinomial case: χ² test (3)
Likelihood of the model:
L_n(X_1, ..., X_n, p) = p_1^{N_1} p_2^{N_2} ... p_K^{N_K},
where N_j = #{i = 1, ..., n : X_i = a_j}.
Let p̂ be the MLE:
p̂_j = N_j/n, j = 1, ..., K.
p̂ maximizes log L_n(X_1, ..., X_n, p) under the constraint Σ_{j=1}^K p_j = 1.
The multinomial case: χ² test (4)
If H_0 is true, then √n (p̂ − p⁰) is asymptotically normal, and the following holds.
Theorem:
T_n := n Σ_{j=1}^K (p̂_j − p_j⁰)²/p_j⁰ →(d) χ²_{K−1}.
χ² test with asymptotic level α: ψ_α = 1{T_n > q_α}, where q_α is the (1 − α)-quantile of χ²_{K−1}.
Asymptotic p-value of this test: p-value = P[Z > T_n | T_n], where Z ~ χ²_{K−1} and Z ⊥ T_n.
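A minimal sketch of this goodness-of-fit test in Python; the die counts are hypothetical and the 95% quantile of χ²_5, about 11.07, is hard-coded from tables (none of these numbers are from the slides):

```python
def chi2_gof_statistic(counts, p0):
    """T_n = n * sum_j (p_hat_j - p0_j)^2 / p0_j for multinomial counts."""
    n = sum(counts)
    return n * sum((c / n - q) ** 2 / q for c, q in zip(counts, p0))

# Is a die fair?  H0: p = (1/6, ..., 1/6); K = 6, so T_n is compared to chi2 with 5 d.o.f.
counts = [20, 15, 12, 17, 18, 18]   # hypothetical observed face counts, n = 100
p0 = [1.0 / 6.0] * 6
CHI2_5_Q05 = 11.07                  # 95% quantile of chi-squared with 5 d.o.f. (tables)

t_n = chi2_gof_statistic(counts, p0)
print(t_n, t_n > CHI2_5_Q05)        # small T_n: H0 is not rejected at level 5%
```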
The Gaussian case: Student's test (1)
Let X_1, ..., X_n iid ~ N(µ, σ²), for some unknown µ ∈ ℝ, σ² > 0, and let µ_0 ∈ ℝ be fixed and given. We want to test:
H_0: µ = µ_0 vs. H_1: µ ≠ µ_0
with asymptotic level α ∈ (0, 1).
If σ² is known: let T_n = √n (X̄_n − µ_0)/σ. Then T_n ~ N(0, 1) under H_0, and
ψ_α = 1{|T_n| > q_{α/2}}
is a test with (non-asymptotic) level α.
The Gaussian case: Student's test (2)
If σ² is unknown: let
T̃_n = √(n − 1) (X̄_n − µ_0)/√S_n,
where S_n is the sample variance.
Cochran's theorem: X̄_n ⊥ S_n; nS_n/σ² ~ χ²_{n−1}.
Hence, T̃_n ~ t_{n−1}: Student's distribution with n − 1 degrees of freedom.
The Gaussian case: Student's test (3)
Student's test with (non-asymptotic) level α ∈ (0, 1):
ψ_α = 1{|T̃_n| > q_{α/2}},
where q_{α/2} is the (1 − α/2)-quantile of t_{n−1}.
If H_1 is µ > µ_0, Student's test with level α ∈ (0, 1) is:
ψ_α = 1{T̃_n > q_α},
where q_α is the (1 − α)-quantile of t_{n−1}.
Advantages of Student's test: it is non-asymptotic and can be run on small samples.
Drawback of Student's test: it relies on the assumption that the sample is Gaussian.
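A small sketch of the two-sided Student test in Python. The data and the hard-coded 97.5% quantile of t_9, about 2.262, are illustrative; the √(n − 1) factor comes from the slides' convention S_n = (1/n) Σ (X_i − X̄_n)²:

```python
import math

def student_statistic(xs, mu0):
    """T_n = sqrt(n - 1) * (x_bar - mu0) / sqrt(S_n), S_n the biased sample variance."""
    n = len(xs)
    x_bar = sum(xs) / n
    s_n = sum((x - x_bar) ** 2 for x in xs) / n
    return math.sqrt(n - 1) * (x_bar - mu0) / math.sqrt(s_n)

# Hypothetical running times (minutes); testing H0: mu = 103.5 vs. H1: mu != 103.5
xs = [101.2, 99.8, 104.1, 98.7, 102.3, 100.9, 97.5, 103.0, 99.1, 101.6]
T_9_Q025 = 2.262   # (1 - 0.025)-quantile of t with 9 d.o.f. (tables)

t = student_statistic(xs, mu0=103.5)
print(t, abs(t) > T_9_Q025)   # |t| exceeds the threshold: H0 is rejected
```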
Two-sample test: large sample case (1)
Consider two samples, X_1, ..., X_n and Y_1, ..., Y_m, of independent random variables such that
E[X_1] = ... = E[X_n] = µ_X and E[Y_1] = ... = E[Y_m] = µ_Y.
Assume that the variances are known, so assume (without loss of generality) that
var(X_1) = ... = var(X_n) = var(Y_1) = ... = var(Y_m) = 1.
We want to test:
H_0: µ_X = µ_Y vs. H_1: µ_X ≠ µ_Y
with asymptotic level α ∈ (0, 1).
Two-sample test: large sample case (2)
From CLT:
√n (X̄_n − µ_X) →(d) N(0, 1)
and, assuming n/m → γ as n, m → ∞,
√n (Ȳ_m − µ_Y) = √(n/m) · √m (Ȳ_m − µ_Y) →(d) N(0, γ).
Moreover, the two samples are independent, so
√n (X̄_n − Ȳ_m) − √n (µ_X − µ_Y) →(d) N(0, 1 + γ).
Under H_0 (µ_X = µ_Y):
√n (X̄_n − Ȳ_m)/√(1 + n/m) →(d) N(0, 1).
Test: ψ_α = 1{ √n |X̄_n − Ȳ_m|/√(1 + n/m) > q_{α/2} }.
Two-sample T-test
If the variances are unknown but we know that X_i ~ N(µ_X, σ_X²), Y_i ~ N(µ_Y, σ_Y²), then
X̄_n − Ȳ_m ~ N(µ_X − µ_Y, σ_X²/n + σ_Y²/m).
Under H_0:
(X̄_n − Ȳ_m)/√(σ_X²/n + σ_Y²/m) ~ N(0, 1).
For unknown variances:
(X̄_n − Ȳ_m)/√(S_X²/n + S_Y²/m) ≈ t_N,
where
N = (S_X²/n + S_Y²/m)² / ( S_X⁴/(n²(n − 1)) + S_Y⁴/(m²(m − 1)) ).
MIT OpenCourseWare http://ocw.mit.edu 18.650 / 18.6501 Statistics for Applications Fall 2016 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.