Chi-Squared Tests Math 6070, Spring 2006


Davar Khoshnevisan
University of Utah
February XXX, 2006

Contents
1  MLE for Goodness-of-Fit
2  The Multinomial Distribution
3  Application to Goodness-of-Fit
   3.1  Testing for a Finite Distribution
   3.2  Testing for a Density

"Chi-squared tests" refers to a family of statistical tests whose large-sample asymptotics are related in one way or another to the $\chi^2$ family of distributions. These are our first examples of non-parametric procedures. Let me mention one particularly useful chi-squared test here before we get started properly. Consider a population whose distribution is believed to follow a known density function $f$; say, exponential(2). Then there is a chi-squared test that can be used to check our hypothesis. This chi-squared test is easy to implement, and works well for large-sized samples. First, we study a related parametric estimation problem. Then we find a consistent MLE. And finally, we use our estimate to devise a test for our hypothesis. This will be done in several stages. It is likely that you already know something of the resulting $\chi^2$-test (Section 3.1). But here you will find a rigorous proof, in line with the spirit of this course, of the fact that the said $\chi^2$-test actually works. A [convincing but] non-rigorous justification of the same fact can be found in pages 34-36 of the excellent monograph of P. Bickel and K. Doksum [Mathematical Statistics, Holden-Day, First edition, 1977]. There, a likelihood-principle argument is devised, which usually works.

1  MLE for Goodness-of-Fit

Suppose $X_1, \ldots, X_n$ is an independent sample from a finite population. More precisely, let $t_1, \ldots, t_m$ be $m$ distinct numbers (or "types"), and $\theta := (\theta_1, \ldots, \theta_m)$ an $m$-dimensional probability vector. Consider i.i.d. random variables $X_1, \ldots, X_n$ such that $\mathrm{P}\{X_1 = t_i\} = \theta_i$ for all $1 \le i \le m$. That is, each $X$ is of type $t_i$ with probability $\theta_i$.

We wish to find a (actually, "the") MLE for the mass function of the $X_i$'s; that is, the MLE for the vector $\theta := (\theta_1, \ldots, \theta_m)$. Note that, here, $\Theta$ is the collection of all $\theta$ such that $0 \le \theta_1, \ldots, \theta_m \le 1$ and $\theta_1 + \cdots + \theta_m = 1$. This is a kind of parametric problem, of course, and we proceed by computing the MLE.

Define $N := (N_1, \ldots, N_m)$ to be the count statistics for the various types. Using column-vector notation, this means that

  $N = \begin{pmatrix} \sum_{j=1}^n \mathbf{1}\{X_j = t_1\} \\ \vdots \\ \sum_{j=1}^n \mathbf{1}\{X_j = t_m\} \end{pmatrix}.$   (1)

In words, $N_k$ denotes the number of occurrences of type $k$ in the sample $X := (X_1, \ldots, X_n)$. Note that $\sum_{j=1}^m N_j = n$, so the $N_j$'s are obviously dependent r.v.'s. Let us calculate their joint mass function. Let $n = (n_1, \ldots, n_m)$ be an $m$-vector of nonnegative integers such that $\sum_{j=1}^m n_j = n$. Such $n$'s are the possible values of $N$. Moreover,

  $\mathrm{P}_\theta\{N = n\} = \sum \mathrm{P}_\theta\{X_1 = s_1, \ldots, X_n = s_n\},$   (2)

where the sum is taken over all $s = (s_1, \ldots, s_n)$ such that $n_1$ of them are of type $t_1$, $n_2$ of type $t_2$, etc. For all such $s$,

  $\mathrm{P}_\theta\{X_1 = s_1, \ldots, X_n = s_n\} = \theta_1^{n_1} \cdots \theta_m^{n_m}.$   (3)

Moreover, the number of all such $s$'s is exactly

  $\binom{n}{n_1, \ldots, n_m} := \dfrac{n!}{n_1! \cdots n_m!}.$   (4)

Therefore, we have derived the following.

Lemma 1. The random vector $N$ has the multinomial distribution with parameters $(n, \theta)$. That is, for all nonnegative integers $n_1, \ldots, n_m$ such that $n_1 + \cdots + n_m = n$,

  $\mathrm{P}_\theta\{N = n\} = \binom{n}{n_1, \ldots, n_m}\, \theta_1^{n_1} \cdots \theta_m^{n_m}.$   (5)

Consider the special case $m = 2$. Then $N_1 \sim \mathrm{binomial}(n, \theta_1)$, and $N_2 = n - N_1$. So the multinomial distribution is an extension of the binomial in this sense. We write $N \sim \mathrm{Multinomial}(n, \theta)$ in place of the statement that $N$ is multinomial with parameters $(n, \theta)$.
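To make the count statistics concrete, here is a minimal Python sketch (not part of the original notes; the type labels, the probability vector, the seed, and the helper name `count_statistics` are my own illustrative choices). It tabulates $N$ from a categorical sample and compares with a direct multinomial draw, in the spirit of Lemma 1.

```python
import numpy as np

def count_statistics(sample, types):
    """Return the count vector N: N[k] = #{j : X_j == types[k]}."""
    return np.array([np.sum(sample == t) for t in types])

rng = np.random.default_rng(0)
types = np.array([1, 2, 3])          # the m distinct "types" t_1, ..., t_m
theta = np.array([0.2, 0.3, 0.5])    # a true probability vector (arbitrary choice)
n = 1000

# Draw an i.i.d. sample X_1, ..., X_n with P{X = t_i} = theta_i ...
sample = rng.choice(types, size=n, p=theta)
N = count_statistics(sample, types)
print(N, N.sum())                    # the counts add up to n

# ... which has the same law as a single multinomial(n, theta) draw (Lemma 1).
print(rng.multinomial(n, theta))
```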

It turns out that we can compute the MLE for $\theta$ by using $N$ as the basic data. To do so, we use Lemma 1 and first find the log-likelihood as follows:

  $L(\theta) = A + \sum_{i=1}^m N_i \ln \theta_i,$   (6)

where $A$ does not depend on $\theta$. The case where some $\theta_i \in \{0, 1\}$ is trivial and can be ruled out easily. Therefore, we may assume that $0 < \theta_i < 1$ for all $i = 1, \ldots, m$. Thus, for all $l = 1, \ldots, m$,

  $\dfrac{\partial L(\theta)}{\partial \theta_l} = \sum_{i=1}^m \dfrac{N_i}{\theta_i} \, \dfrac{\partial \theta_i}{\partial \theta_l}.$   (7)

As $\theta$ varies in $\Theta$, $\theta_1, \ldots, \theta_{m-1}$ roam freely, subject to the single condition that $\theta_m = 1 - \sum_{i=1}^{m-1} \theta_i$. In other words, $\partial \theta_i / \partial \theta_l = 0$ for all $i = 1, \ldots, m-1$ different from $l$, but $\partial \theta_m / \partial \theta_l = -1$. This proves that $\partial L(\theta) / \partial \theta_l = (N_l/\theta_l) - (N_m/\theta_m)$ for all $l = 1, \ldots, m-1$. Set it equal to zero (for the MLE) to see that there must exist $\gamma$ such that $N_j = \gamma \theta_j$ for all $j = 1, \ldots, m$. But $\sum_{j=1}^m N_j = n$, whereas $\sum_{j=1}^m \theta_j = 1$. Therefore, $\gamma = n$. We have proved the following.

Lemma 2. The MLE for $\theta$ based on $N_1, \ldots, N_m$ is given by

  $\hat\theta := \begin{pmatrix} N_1/n \\ \vdots \\ N_m/n \end{pmatrix}.$   (8)

[To prove that $\partial L/\partial \theta_l = 0$ indeed yields a maximum, take second derivatives and apply similar arguments.]
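A quick numerical sanity check of Lemma 2 (a sketch, not from the notes; comparing against random Dirichlet-drawn probability vectors is just one convenient way to probe the maximum): the log-likelihood (6), evaluated at $\hat\theta = N/n$, should dominate its value at other points of $\Theta$.

```python
import numpy as np

rng = np.random.default_rng(1)
theta_true = np.array([0.2, 0.3, 0.5])   # arbitrary illustrative choice
n = 1000
N = rng.multinomial(n, theta_true)

def log_lik(theta, N):
    # log-likelihood up to the additive constant A in (6)
    return np.sum(N * np.log(theta))

theta_hat = N / n                        # the MLE of Lemma 2

# theta_hat should beat any competing probability vector.
for _ in range(5):
    theta_alt = rng.dirichlet(np.ones(3))
    assert log_lik(theta_hat, N) >= log_lik(theta_alt, N)
print("MLE:", theta_hat)
```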

2  The Multinomial Distribution

Choose and fix a probability vector $\theta$, and assume it is the true one, so that we write $\mathrm{P}$, $\mathrm{E}$, etc. in place of the more cumbersome $\mathrm{P}_\theta$, $\mathrm{E}_\theta$, etc. We can use (1) to say a few more things about the distribution of $N$. Note first that (1) is equivalent to the following:

  $N = \sum_{j=1}^n \begin{pmatrix} \mathbf{1}\{X_j = t_1\} \\ \vdots \\ \mathbf{1}\{X_j = t_m\} \end{pmatrix} := \sum_{j=1}^n Y_j.$   (9)

We can compute directly to find that

  $\mathrm{E} Y_1 = \theta,$   (10)

and

  $\mathrm{Cov}\, Y_1 = \begin{pmatrix} \theta_1(1-\theta_1) & -\theta_1\theta_2 & \cdots & -\theta_1\theta_m \\ -\theta_1\theta_2 & \theta_2(1-\theta_2) & \cdots & -\theta_2\theta_m \\ \vdots & & \ddots & \vdots \\ -\theta_m\theta_1 & -\theta_m\theta_2 & \cdots & \theta_m(1-\theta_m) \end{pmatrix} := Q_0.$   (11)

Because $Y_1, \ldots, Y_n$ are i.i.d. random vectors, the following is a consequence of the preceding, considered in conjunction with the multidimensional CLT.

Theorem 3. As $n \to \infty$,

  $\dfrac{N - n\theta}{\sqrt{n}} \xrightarrow{d} N_m(0, Q_0).$   (12)

There is a variant of Theorem 3 which is more useful to us. Define

  $\tilde N := \begin{pmatrix} (N_1 - n\theta_1)/\sqrt{n\theta_1} \\ \vdots \\ (N_m - n\theta_m)/\sqrt{n\theta_m} \end{pmatrix}.$   (13)

This too is a normalized sum of i.i.d. random vectors $\tilde Y_1, \ldots, \tilde Y_n$, with

  $\mathrm{E}\tilde Y_1 = 0$ and $\mathrm{Cov}\, \tilde Y_1 = Q,$   (14)

where $Q$ is the $(m \times m)$-dimensional covariance matrix given below:

  $Q = \begin{pmatrix} 1-\theta_1 & -\sqrt{\theta_1\theta_2} & \cdots & -\sqrt{\theta_1\theta_m} \\ -\sqrt{\theta_1\theta_2} & 1-\theta_2 & \cdots & -\sqrt{\theta_2\theta_m} \\ \vdots & & \ddots & \vdots \\ -\sqrt{\theta_m\theta_1} & -\sqrt{\theta_m\theta_2} & \cdots & 1-\theta_m \end{pmatrix}.$   (15)

Then we obtain:

Theorem 4. As $n \to \infty$,

  $\tilde N \xrightarrow{d} N_m(0, Q).$   (16)
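The moment computations (10) and (11) are easy to check by simulation. The sketch below (my own illustration; the probability vector, sample size, and seed are arbitrary) builds the indicator vectors $Y_j$ and compares their sample mean and sample covariance with $\theta$ and $Q_0$.

```python
import numpy as np

rng = np.random.default_rng(2)
theta = np.array([0.2, 0.3, 0.5])
m, n = len(theta), 200_000

# Y_j = (1{X_j = t_1}, ..., 1{X_j = t_m}); the rows of Y below are the Y_j's.
X = rng.choice(m, size=n, p=theta)
Y = np.eye(m)[X]

Q0 = np.diag(theta) - np.outer(theta, theta)                # the matrix in (11)
print(np.allclose(Y.mean(axis=0), theta, atol=0.01))        # E Y_1 = theta
print(np.allclose(np.cov(Y, rowvar=False), Q0, atol=0.01))  # Cov Y_1 = Q_0
```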

Remark 5. One should be a little careful at this stage because $Q$ is singular. [For example, consider the case $m = 2$.] However, $N_m(0, Q)$ makes sense even in this case. The reason is this: If $C$ is a nonsingular covariance matrix, then we know what it means for $X \sim N_m(0, C)$, and a direct computation reveals that for all real numbers $t$ and all $m$-vectors $\xi$,

  $\mathrm{E}\left[\exp\left(it\,\xi' X\right)\right] = \exp\!\left(-\tfrac{t^2}{2}\,\xi' C \xi\right).$   (17)

This defines the characteristic function of the random variable $\xi' X$ uniquely. There is a theorem (the Cramér-Wold device) that says that a unique description of the characteristic function of $\xi' X$ for all $\xi$ also describes the distribution of $X$ uniquely. This means that any random vector $X$ that satisfies (17) for all $t \in \mathbf{R}$ and $\xi \in \mathbf{R}^m$ is $N_m(0, C)$. Moreover, this approach does not require $C$ to be invertible. To learn more about multivariate normals, you should attend Math 6010.

As a consequence of Theorem 4 we have the following: As $n \to \infty$,

  $\|\tilde N\|^2 \xrightarrow{d} \|N_m(0, Q)\|^2.$   (18)

[Recall that $\|x\|^2 = x_1^2 + \cdots + x_m^2$.] Next, we identify the distribution of $\|N_m(0, Q)\|^2$. A direct (and somewhat messy) calculation shows that $Q$ is a projection matrix; that is, $Q^2 := QQ = Q$. It follows fairly readily from this that its eigenvalues are all zeros and ones. Here is the proof: Let $\lambda$ be an eigenvalue and $x$ an eigenvector. Without loss of generality, we may assume that $\|x\| = 1$; else, consider the eigenvector $y := x/\|x\|$ instead. This means that $Qx = \lambda x$. Therefore, $Q^2 x = \lambda^2 x$. Because $Q^2 = Q$, we have proved that $\lambda = \lambda^2$, whence it follows that $\lambda \in \{0, 1\}$, as promised.

Write $Q$ in its spectral form. That is,

  $Q = P' \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_m \end{pmatrix} P,$   (19)

where $P$ is an orthogonal $(m \times m)$ matrix. [This is possible because $Q$ is positive semidefinite. You can learn more on this in Math 6010.] Define

  $\Lambda := P' \begin{pmatrix} \sqrt{\lambda_1} & 0 & \cdots & 0 \\ 0 & \sqrt{\lambda_2} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \sqrt{\lambda_m} \end{pmatrix} P.$   (20)

Then $Q = \Lambda'\Lambda$, which is why $\Lambda$ is called the (Penrose) square root of $Q$. Let $Z$ be an $m$-vector of i.i.d. standard normals and consider $\Lambda Z$. This is multivariate normal (check characteristic functions), $\mathrm{E}[\Lambda Z] = 0$, and $\mathrm{Cov}(\Lambda Z) = \Lambda\,\mathrm{Cov}(Z)\,\Lambda' = \Lambda\Lambda' = Q$. In other words, we can view $N_m(0, Q)$ as $\Lambda Z$. Therefore,

  $\|N_m(0, Q)\|^2 \stackrel{d}{=} \|\Lambda Z\|^2 = \sum_{i=1}^m \lambda_i Z_i^2.$   (21)

Because each $\lambda_i \in \{0, 1\}$, we have proved that $\|N_m(0, Q)\|^2 \sim \chi^2_r$, where $r = \mathrm{rank}(Q)$.
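The "messy but elementary" facts about $Q$, that it is a projection matrix and (as stated next) that its rank is $m - 1$, can be verified numerically for any particular $\theta$. A small sketch (again with an arbitrary choice of $\theta$, and added by me as an illustration):

```python
import numpy as np

theta = np.array([0.2, 0.3, 0.5])
m = len(theta)
root = np.sqrt(theta)

# Q from (15): 1 - theta_i on the diagonal, -sqrt(theta_i * theta_j) off it.
Q = np.eye(m) - np.outer(root, root)

print(np.allclose(Q @ Q, Q))          # Q is a projection: Q^2 = Q
print(np.round(np.linalg.eigvalsh(Q), 10))   # eigenvalues are zeros and ones
print(np.linalg.matrix_rank(Q))       # rank(Q) = m - 1
```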

Another messy, but elementary, computation shows that the rank of $Q$ is $m - 1$. [Roughly speaking, this is because the $\theta_i$'s have only one source of linear dependence; namely, that $\theta_1 + \cdots + \theta_m = 1$.] This proves that

  $\|\tilde N\|^2 \xrightarrow{d} \chi^2_{m-1}.$   (22)

Write out the left-hand side explicitly to find the following:

Theorem 6. As $n \to \infty$,

  $\sum_{i=1}^m \dfrac{(N_i - n\theta_i)^2}{n\theta_i} \xrightarrow{d} \chi^2_{m-1}.$   (23)

It took a while to get here, but we will soon be rewarded for our effort.

3  Application to Goodness-of-Fit

3.1  Testing for a Finite Distribution

In light of the development of Section 1, we can paraphrase Theorem 6 as saying that under $\mathrm{P}_\theta$, as $n \to \infty$, $n X_n(\theta) \xrightarrow{d} \chi^2_{m-1}$, where

  $X_n(\theta) := \sum_{i=1}^m \dfrac{(\hat\theta_i - \theta_i)^2}{\theta_i}.$   (24)

Therefore, we are in a position to derive an asymptotic $(1-\alpha)$-level confidence set for $\theta$. Define, for all $a > 0$,

  $C_n(a) := \left\{ \theta \in \Theta : X_n(\theta) \le a/n \right\}.$   (25)

Equation (24) shows that for all $\theta \in \Theta$,

  $\lim_{n\to\infty} \mathrm{P}_\theta\{\theta \in C_n(a)\} = \mathrm{P}\{\chi^2_{m-1} \le a\}.$   (26)

We can find (e.g., from $\chi^2$-tables) a number $\chi^2_{m-1}(\alpha)$ such that

  $\mathrm{P}\{\chi^2_{m-1} > \chi^2_{m-1}(\alpha)\} = \alpha.$   (27)

It follows that

  $C_n\big(\chi^2_{m-1}(\alpha)\big)$ is an asymptotic level $(1-\alpha)$ confidence set for $\theta$.   (28)

The statistic $X_n(\theta)$ was discovered by K. Pearson in 1900, and is therefore called the Pearson $\chi^2$ statistic. Incidentally, the term "standard deviation" is also due to Karl Pearson (1893).
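A Monte Carlo illustration of Theorem 6 (a sketch under one particular choice of $\theta$, $n$, and number of replications, none of which come from the notes): for each simulated count vector we compute the statistic in (23) and compare its empirical upper-$\alpha$ quantile with the $\chi^2_{m-1}$ quantile.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
theta = np.array([0.2, 0.3, 0.5])
m, n, reps = len(theta), 500, 20_000

N = rng.multinomial(n, theta, size=reps)                    # reps count vectors
stat = np.sum((N - n * theta) ** 2 / (n * theta), axis=1)   # statistic in (23)

alpha = 0.05
print(np.quantile(stat, 1 - alpha))          # empirical 95% quantile
print(stats.chi2.ppf(1 - alpha, df=m - 1))   # chi^2_{m-1} quantile, about 5.99
```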

The statistic $X_n(\theta)$ is a little awkward to work with, and this makes the computation of $C_n(\chi^2_{m-1}(\alpha))$ difficult. Nevertheless, it is easy enough to use $C_n(\chi^2_{m-1}(\alpha))$ to test simple hypotheses for $\theta$. Suppose we are interested in the test

  $H_0: \theta = \theta_0$ versus $H_1: \theta \ne \theta_0$,   (29)

where $\theta_0 = (\theta_{01}, \ldots, \theta_{0m})$ is a (known) probability vector in $\Theta$. Then we can reject $H_0$ (at asymptotic level $(1-\alpha)$) if $\theta_0 \notin C_n(\chi^2_{m-1}(\alpha))$. In other words, we reject $H_0$ if

  $X_n(\theta_0) > \dfrac{\chi^2_{m-1}(\alpha)}{n}.$   (30)

Because $X_n(\theta_0)$ is easy to compute, (29) can be carried out with relative ease. Moreover, $X_n(\theta_0)$ has the following interpretation: Under $H_0$,

  $X_n(\theta_0) = \dfrac{1}{n}\sum_{i=1}^m \dfrac{(\mathrm{Obs}_i - \mathrm{Exp}_i)^2}{\mathrm{Exp}_i},$   (31)

where $\mathrm{Obs}_i$ denotes the number of observations in the sample that are of type $i$, and $\mathrm{Exp}_i := n\theta_{0i}$ is the expected number, under $H_0$, of sample points that are of type $i$.
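The recipe in (30)-(31) translates directly into code. Here is a minimal sketch (the function name `pearson_chisq_test` is my own, and `scipy.stats.chi2` supplies the reference distribution); it returns the statistic $X_n(\theta_0)$, the rejection threshold $\chi^2_{m-1}(\alpha)/n$, and the asymptotic $P$-value. Its output can be cross-checked against Example 7 below.

```python
import numpy as np
from scipy import stats

def pearson_chisq_test(counts, theta0, alpha=0.05):
    """Pearson chi-squared test of H0: theta = theta0, from the counts N_1..N_m."""
    counts = np.asarray(counts, dtype=float)
    theta0 = np.asarray(theta0, dtype=float)
    n, m = counts.sum(), len(counts)
    expected = n * theta0                                     # Exp_i = n * theta_{0i}
    X_n = np.sum((counts - expected) ** 2 / expected) / n     # statistic (31)
    threshold = stats.chi2.ppf(1 - alpha, df=m - 1) / n       # reject H0 if X_n > this
    p_value = stats.chi2.sf(n * X_n, df=m - 1)                # asymptotic P-value
    return X_n, threshold, p_value

# Example 7 below: 300, 340, and 360 draws of the digits 0, 1, 2 out of n = 1000.
print(pearson_chisq_test([300, 340, 360], [1/3, 1/3, 1/3]))
```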

Next we study two examples.

Example 7. A certain population contains only the digits 0, 1, and 2. We wish to know if every digit is equally likely to occur in this population. That is, we wish to test

  $H_0: \theta = (\tfrac13, \tfrac13, \tfrac13)$ versus $H_1: \theta \ne (\tfrac13, \tfrac13, \tfrac13)$.   (32)

To this end, suppose we make 1,000 independent draws from this population, and find that we have drawn 300 zeros ($\hat\theta_1 = 0.30$), 340 ones ($\hat\theta_2 = 0.34$), and 360 twos ($\hat\theta_3 = 0.36$). That is (with Obs and Exp recorded as proportions, so that the final column sums directly to $X_n(\theta_0)$):

  Type   Obs    Exp    (Obs - Exp)^2/Exp
  0      0.30   1/3    3.33 x 10^{-3}
  1      0.34   1/3    1.33 x 10^{-4}
  2      0.36   1/3    2.13 x 10^{-3}
  Total  1.00   1.00   0.0056 (approx.)

Suppose we wish to test (32) at the traditional (asymptotic) level 95%. Here, $m = 3$ and $\alpha = 0.05$, and we find from chi-squared tables that $\chi^2_2(0.05) \approx 5.991$. Therefore, according to (30), we reject if and only if $X_n(\theta_0) \approx 0.0056$ is greater than $\chi^2_2(0.05)/1000 \approx 0.005991$. Thus, we do not reject the null hypothesis at asymptotic level 95%. On the other hand, $\chi^2_2(0.1) \approx 4.605$, and $X_n(\theta_0)$ is greater than $\chi^2_2(0.1)/1000 \approx 0.004605$. Thus, had we performed the test at asymptotic level 90%, we would indeed reject $H_0$. In fact, this discussion has shown that the asymptotic $P$-value of our test is somewhere between 0.05 and 0.1. It is more important to report/estimate $P$-values than to make blanket declarations such as "accept $H_0$" or "reject $H_0$." Consult your Math 5080-5090 text(s).

To illustrate this example further, suppose the table of observed vs. expected remains the same, but $n = 2000$. Then $X_n(\theta_0) \approx 0.0056$ is a good deal greater than $\chi^2_2(0.05)/2000 \approx 0.002996$, and we reject $H_0$ soundly at the asymptotic level 95%. In fact, in this case the $P$-value is somewhere between 0.0025 and 0.005 (check this!).

You can find many online $\chi^2$-tables. One such table can be found at http://www.unc.edu/~farkouh/usefull/chi.html. [The unfortunate misspelling "usefull" is not due to me.]

3.2  Testing for a Density

Suppose we wish to know whether a certain data set comes from the exponential(2) distribution (say!). So let $X_1, \ldots, X_n$ be an independent sample. We have

  $H_0: X_1 \sim \mathrm{exponential}(2)$ versus $H_1: X_1 \not\sim \mathrm{exponential}(2)$.   (33)

Now take a fine mesh $0 \le z_1 < z_2 < \cdots < z_{m+1} < \infty$ of real numbers. Then,

  $\mathrm{P}_{H_0}\{z_j \le X_1 \le z_{j+1}\} = 2\int_{z_j}^{z_{j+1}} e^{-2r}\, dr = e^{-2z_j} - e^{-2z_{j+1}}.$   (34)

These are the $\theta_{0j}$'s, and we can test to see whether or not $\theta = \theta_0$, where $\theta_j$ is the true probability $\mathrm{P}\{z_j \le X_1 \le z_{j+1}\}$ for all $j = 1, \ldots, m$. The $\chi^2$-test of the previous section handles just such a situation. That is, let $\hat\theta_j$ denote the proportion of the data $X_1, \ldots, X_n$ that lies in $[z_j, z_{j+1}]$. Then (30) provides us with an asymptotic $(1-\alpha)$-level test of the hypothesis $H_0$ that the data are exponential(2). Other distributions are handled similarly, but $\theta_{0j}$ is in general $\int_{z_j}^{z_{j+1}} f_0(x)\, dx$, where $f_0$ is the density function under $H_0$.
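The binning step is where this differs from Section 3.1, so here is a sketch of it for the exponential(2) null (a hypothetical sample and mesh of my own choosing; `scipy.stats.expon` with scale = 1/2 corresponds to rate 2): the cell probabilities $\theta_{0j}$ come from the CDF as in (34), the $\hat\theta_j$ come from a histogram of the data, and the rest is the test of Section 3.1.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n = 200
data = rng.exponential(scale=0.5, size=n)        # H0 happens to be true here

z = np.array([0.0, 0.6, 1.2, 1.8, 2.4, 3.0])     # the mesh z_1 < ... < z_{m+1}
cdf = stats.expon(scale=0.5).cdf                  # exponential(2) CDF
theta0 = np.diff(cdf(z))                          # theta_{0j} as in (34)

counts, _ = np.histogram(data, bins=z)            # N_j = #{X_i in [z_j, z_{j+1}]}
theta_hat = counts / n                            # the hat-theta_j's

X_n = np.sum((theta_hat - theta0) ** 2 / theta0)            # statistic (24)
p_value = stats.chi2.sf(n * X_n, df=len(theta0) - 1)
print(theta0.round(4), X_n, p_value)
```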

Example 8. Suppose we wish to test (33). Let $m = 5$, and define $z_j = 3(j-1)/5$ for $j = 1, \ldots, 6$ (so the mesh points are 0, 0.6, 1.2, 1.8, 2.4, 3.0). Then,

  $\theta_{0j} = \mathrm{P}_{H_0}\left\{ \dfrac{3(j-1)}{5} \le X_1 \le \dfrac{3j}{5} \right\} = \exp\!\left(-\dfrac{6(j-1)}{5}\right) - \exp\!\left(-\dfrac{6j}{5}\right).$   (35)

That is,

  j    theta_{0j}
  1    0.6988
  2    0.2105
  3    0.0634
  4    0.0191
  5    0.0058

Consider the following (hypothetical) data: $n = 200$, and

  j    theta_{0j}   hat-theta_j
  1    0.6988       0.70
  2    0.2105       0.20
  3    0.0634       0.10
  4    0.0191       0.00
  5    0.0058       0.00

Then we form the $\chi^2$-table, as before, and obtain

  Type   Exp      Obs    (Obs - Exp)^2/Exp
  1      0.6988   0.70   2.06 x 10^{-6}
  2      0.2105   0.20   5.2 x 10^{-4}
  3      0.0634   0.10   0.0211
  4      0.0191   0.00   0.0191
  5      0.0058   0.00   0.0058
  Total  0.9976   1.00   0.0466 (approx.)

Thus, $X_n(\theta_0) \approx 0.0466$. You should check that $0.05 \le P\text{-value} \le 0.1$; in fact, the $P$-value is much closer to 0.05. So, at the asymptotic level 95%, we do not have enough evidence against $H_0$.

Some things to think about: In the totals for "Exp," you will see that the total is 0.9976 and not 1. Can you think of why this has happened? Does this phenomenon render our conclusions false (or weaker than they may appear)? Do we need to fix this by including more types? How can we address it in a sensible way? Would addressing this change the final conclusion?
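To "check this" claim about the $P$-value in Example 8, a short computation suffices (a sketch added by me; it simply replays the table above with $n = 200$ and uses `scipy.stats.chi2` for the tail probability):

```python
import numpy as np
from scipy import stats

n = 200
theta0 = np.array([0.6988, 0.2105, 0.0634, 0.0191, 0.0058])
theta_hat = np.array([0.7, 0.2, 0.1, 0.0, 0.0])

X_n = np.sum((theta_hat - theta0) ** 2 / theta0)      # about 0.0466, as in the table
p_value = stats.chi2.sf(n * X_n, df=len(theta0) - 1)  # chi^2 with m - 1 = 4 d.f.
print(X_n, p_value)                                   # P-value is about 0.054
```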