ML Testing (Likelihood Ratio Testing) for non-Gaussian models
Surya Tokdar

ML test in a slightly different form

Model: X ~ f(x | θ), θ ∈ Θ. Hypothesis H0: θ ∈ Θ0.
Good set: B_c(x) = {θ : l_x(θ) ≥ max_{θ ∈ Θ} l_x(θ) − c²/2}
ML test = reject H0 if B_c(x) ∩ Θ0 = ∅
Same as: reject H0 if max_{θ ∈ Θ} l_x(θ) − max_{θ ∈ Θ0} l_x(θ) > c²/2
Or equivalently: reject H0 if
    Λ(x) = max_{θ ∈ Θ} L_x(θ) / max_{θ ∈ Θ0} L_x(θ) > k

Size calculation

Need to calculate max_{θ ∈ Θ0} P_[X | θ](Λ(X) > k).
Will work for models of the form: X_i ~ iid g(x_i | θ), θ ∈ Θ ⊆ ℝ^p,
and hypothesis H0: Aᵀθ = a0 for a given p × r matrix A and given a0 (typically a0 = 0).
Under regularity conditions: 2 log Λ(X) →d χ²(r) when X_i ~ iid g(x_i | θ0) with θ0 ∈ Θ0.
Follows from asymptotic normality of θ̂_MLE(X).

Asymptotic normality of MLE

Theorem. Assume the following (Cramér) conditions:
1. θ ↦ log g(x | θ) is twice continuously differentiable.
2. θ ↦ g(· | θ) is identifiable.
3. |∂³/∂θ³ log g(x_i | θ)| < h(x) for all θ in a neighborhood of θ0, with E_[X|θ0] h(X_1) < ∞.
4. θ̂_MLE(x) is the unique solution of ∇l_x(θ) = 0 for every x.
If X_i ~ iid g(x_i | θ0), then
    √n (θ̂_MLE − θ0) →d N(0, {I_F(θ0)}⁻¹)
where I_F(θ) = −E_[X_i | θ] ∂²/∂θ² log g(X_1 | θ) = Fisher information (per observation) at θ.

A sketch of a proof

First order Taylor approximation:
    0 = (1/√n) ∇l_X(θ̂_MLE) ≈ (1/√n) ∇l_X(θ0) + {(1/n) ∇²l_X(θ0)} √n (θ̂_MLE − θ0)
Write T_i = ∂/∂θ log g(X_i | θ) at θ = θ0. Then E T_i = 0, Var T_i = I_F(θ0), and
    (1/√n) ∇l_X(θ0) = √n (T̄ − 0) →d N(0, I_F(θ0))   [by CLT]
Write W_i = ∂²/∂θ² log g(X_i | θ) at θ = θ0. Then E W_i = −I_F(θ0), and
    (1/n) ∇²l_X(θ0) = W̄ →p −I_F(θ0)
The rest is Slutsky's theorem applied to √n (θ̂_MLE − θ0).

Three large-n approximations to Λ(x)

Assume θ0 ∈ Θ0, i.e., Aᵀθ0 = a0.
Notation: θ̂_H0 = argmax_{θ ∈ Θ0} l_x(θ) (must have Aᵀθ̂_H0 = a0).
Quadratic, Wald, Rao approximations:
    F(X) = (Aᵀθ̂_MLE − a0)ᵀ {Aᵀ I_X⁻¹ A}⁻¹ (Aᵀθ̂_MLE − a0) ≈ 2 log Λ(X)
    W(X) = n (Aᵀθ̂_MLE − a0)ᵀ {Aᵀ I_F(θ̂_MLE)⁻¹ A}⁻¹ (Aᵀθ̂_MLE − a0)
    S(X) = (1/n) ∇l_X(θ̂_H0)ᵀ {I_F(θ̂_H0)}⁻¹ ∇l_X(θ̂_H0)
2 log Λ(X) − F(X) →p 0, etc.
Each of 2 log Λ(X), F(X), W(X), S(X) →d χ²(r).
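Before the proof sketches, the limiting χ²(r) distribution can be checked empirically. This is a minimal simulation sketch, not from the slides: it assumes a Poisson model X_i ~ iid Poisson(λ) with the point null H0: λ = λ0 (so r = 1), for which 2 log Λ has a closed form.

```python
import numpy as np
from scipy import stats

# Hypothetical illustration: X_i ~ iid Poisson(lam), H0: lam = lam0 (r = 1).
# The unrestricted MLE is the sample mean, so
#   2 log Lambda = 2n * [lam0 - xbar + xbar * log(xbar / lam0)]
def two_log_lambda(x, lam0):
    n, xbar = len(x), x.mean()
    return 2 * n * (lam0 - xbar + xbar * np.log(xbar / lam0))

rng = np.random.default_rng(0)
lam0, n, reps = 4.0, 200, 2000
sims = np.array([two_log_lambda(rng.poisson(lam0, n), lam0)
                 for _ in range(reps)])

# Under H0 the statistic should look like chi^2(1): mean near 1, and the
# test "reject if 2 log Lambda > q_chi2(0.95, 1)" should have size near 0.05.
print(sims.mean())                                  # close to 1
print((sims > stats.chi2.ppf(0.95, df=1)).mean())   # close to 0.05
```

The same experiment with F(X), W(X) or S(X) in place of 2 log Λ(X) would give essentially the same histogram, which is the content of the approximation results above.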
Sketches of proof if you cared - part I

2 log Λ(X) ≈ F(X) follows from the quadratic approximation
    l_x(θ) ≈ l_x(θ̂_MLE) − ½ (θ − θ̂_MLE)ᵀ I_X (θ − θ̂_MLE)
and the corresponding profile log-likelihood of η = Aᵀθ:
    l*_x(η) ≈ l_x(θ̂_MLE) − ½ (η − Aᵀθ̂_MLE)ᵀ {Aᵀ I_X⁻¹ A}⁻¹ (η − Aᵀθ̂_MLE)
and by noting log Λ(x) = l*_x(Aᵀθ̂_MLE) − l*_x(Aᵀθ̂_H0).

Sketches of proof if you cared - part II

2 log Λ(X) ≈ S(X) follows because
    0 = ∇l_x(θ̂_MLE) ≈ ∇l_x(θ̂_H0) + ∇²l_x(θ̂_H0) (θ̂_MLE − θ̂_H0),   (1/n) ∇²l_x(θ̂_H0) ≈ −I_F(θ̂_H0)
and so
    θ̂_MLE − θ̂_H0 ≈ (1/n) I_F(θ̂_H0)⁻¹ ∇l_x(θ̂_H0).
Plug this into the original quadratic approximation and use I_X ≈ n I_F(θ0) ≈ n I_F(θ̂_H0), because θ̂_MLE ≈ θ0.
2 log Λ(X) ≈ W(X) follows similarly; use I_X ≈ n I_F(θ̂_MLE).

Sketches of proof if you cared - part III

It is easiest to prove W(X) →d χ²(r).
By asymptotic normality of the MLE and Slutsky's theorem,
    W(X) →d Zᵀ {Aᵀ I_F(θ0)⁻¹ A}⁻¹ Z   where Z ~ N_r(0, Aᵀ I_F(θ0)⁻¹ A).
But Z ~ N_r(0, Σ) means Zᵀ Σ⁻¹ Z ~ χ²(r).

Back to size calculation

Recall: each of 2 log Λ(X), F(X), W(X), S(X) →d χ²(r).
So each of the following test procedures has (asymptotic) size α:
  ML/LRT: reject H0 if 2 log Λ(x) > q_χ²(1 − α, r)
  Quadratic: reject H0 if F(x) > q_χ²(1 − α, r)
  Wald: reject H0 if W(x) > q_χ²(1 − α, r)
  Rao: reject H0 if S(x) > q_χ²(1 − α, r)
Here q_χ²(u, r) = F_χ²⁻¹(u, r), where F_χ²(x, r) = CDF of χ²(r).
Also, one can calculate p-values as 1 − F_χ²(2 log Λ(x), r), etc.

Which one to use?

Depends on which is easier to compute.
To get W(x) you don't need θ̂_H0, you only need θ̂_MLE.
To get S(x) you only need θ̂_H0, you don't need θ̂_MLE.
In many cases F(X) = W(X) because I_X = n I_F(θ̂_MLE).
This happens in most generalized linear models:
    Y_i ~ ind g(y_i | β, z_i) = h(z_i, y_i) exp{y_i · z_iᵀβ − A(β, z_i)}
where Wald tests are popular choices.
For the multinomial model of category counts, S(X) = Pearson's χ².

Example: low birthweight

Natality data = 500 records on US births in June 1997.
Y_i = 1 if the i-th birth record has birthweight < 2500g, Y_i = 0 otherwise.
z_i = (Cigarettes_i, Black_i) records the daily number of cigarettes and the race of the mother.
Model:
    log{ P(Y_i = 1) / (1 − P(Y_i = 1)) } = β1 + β2 Cigarettes_i + β3 Black_i + β4 Cigarettes_i × Black_i.
Can compute β̂_MLE and I_X (numerically).
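The natality data set itself is not reproduced in these slides, so the following sketch fits the same interaction model to simulated data with hypothetical coefficients. It computes β̂_MLE by Newton-Raphson, forms I_X (the observed information), and carries out a Wald test of H0: β4 = 0 (A picking out the last coordinate, a0 = 0, r = 1).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Simulated stand-in for the natality data (coefficients are hypothetical).
n = 2000
cigs = rng.poisson(3, n).astype(float)
black = rng.binomial(1, 0.3, n).astype(float)
Z = np.column_stack([np.ones(n), cigs, black, cigs * black])  # design matrix
beta_true = np.array([-3.0, 0.08, 1.0, 0.05])
y = rng.binomial(1, 1 / (1 + np.exp(-Z @ beta_true)))

# Newton-Raphson for the logistic MLE; I_X = -Hessian of l_x(beta).
beta = np.zeros(4)
for _ in range(50):
    p = 1 / (1 + np.exp(-Z @ beta))
    score = Z.T @ (y - p)                     # gradient of l_x(beta)
    I_X = Z.T @ (Z * (p * (1 - p))[:, None])  # observed information
    step = np.linalg.solve(I_X, score)
    beta += step
    if np.max(np.abs(step)) < 1e-10:
        break

# Wald test of H0: beta_4 = 0; W = beta_4^2 / Var(beta_4), compared to chi^2(1)
cov = np.linalg.inv(I_X)   # I_X^{-1} at the MLE
W = beta[3] ** 2 / cov[3, 3]
pval = 1 - stats.chi2.cdf(W, df=1)
print(beta, W, pval)
```

Note that only β̂_MLE and I_X⁻¹ are needed for W(x), which is why Wald tests fall out of standard glm() output for free.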
MLE and curvature

From the glm() function in R:

    β̂_MLE = (3.170, 0.079, 1.064, 0.003)

    I_X⁻¹ = [ 0.065  0.003  0.065  0.003
              0.003  0.001  0.003  0.001
              0.065  0.003  0.192  0.011
              0.003  0.001  0.011  0.005 ]

How would you test whether, for a black mother, the probability of low birthweight depends on the number of cigarettes? For a non-black mother?
How would you test whether this dependence is different for black and non-black mothers?

Categorical data & multinomial models

Data: counts of categories formed by one or more attributes.
Tables could be one-way, two-way, etc., depending on how many attributes are used to decide categories.

                        Eye color
    Hair color   Blue   Green   Brown   Black   Total
    Blonde        20     15      18      14       67
    Red           11      4      24       2       41
    Brown          9     11      36      18       74
    Black          8     17      20       4       49
    Total         48     47      98      38      231

Multinomial model

Data are k category counts of n objects:
    X = (X_1, …, X_k),  X_j ≥ 0,  X_1 + X_2 + … + X_k = n.
Model: X ~ Multinomial(n, p), where p = (p_1, …, p_k) ∈ Δ_k.
Δ_k is the k-dim simplex; it contains the k-dim probability vectors b = (b_1, …, b_k), i.e., b_j ≥ 0 for all j and Σ_j b_j = 1.
Multinomial(n, p) has pmf
    f(x | p) = ( n choose x_1 ⋯ x_k ) p_1^{x_1} ⋯ p_k^{x_k}
for x = (x_1, …, x_k) with x_i ≥ 0 and Σ_i x_i = n; f(x | p) = 0 otherwise.

Multinomial model (contd.)

This is an extension of the binomial distribution. In X we record the counts from n independent trials with k outcomes with probabilities given by p.
Consequently X_j ~ Bin(n, p_j) for any single category 1 ≤ j ≤ k. Similarly, for two categories j1 ≠ j2, X_{j1} + X_{j2} ~ Bin(n, p_{j1} + p_{j2}), and so on.

Multinomial model (contd.)

For two-way tables, with k1 rows and k2 columns, we can write X and p either as k1·k2-dim vectors, or more commonly as k1 × k2 matrices.
Even when they are written as matrices, we'll write X ~ Multinomial(n, p) to mean that the multinomial distribution is placed on the vector forms of X and p.
For multi-way tables we'll use arrays of appropriate dimensions to represent X and p.
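The binomial-marginal property X_j ~ Bin(n, p_j) is easy to check by simulation. This quick sketch (not from the slides; the probability vector is hypothetical) draws many multinomial vectors and compares the marginal means and variances with their binomial values n·p_j and n·p_j(1 − p_j).

```python
import numpy as np

rng = np.random.default_rng(2)
n = 231
p = np.array([0.29, 0.18, 0.32, 0.21])  # hypothetical probabilities, sums to 1

draws = rng.multinomial(n, p, size=100_000)  # each row sums to n

# Marginally X_j ~ Bin(n, p_j): mean n*p_j, variance n*p_j*(1-p_j)
print(draws.mean(axis=0))  # close to n * p
print(draws.var(axis=0))   # close to n * p * (1 - p)
```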
MLE

To obtain the MLE of p based on data X = x, we can maximize the log-likelihood function in p with an additional Lagrange component to account for the constraint Σ_{j=1}^k p_j = 1:
    l_x(p, λ) = const + Σ_{j=1}^k x_j log p_j + λ (Σ_{j=1}^k p_j − 1)
The solution in p equals
    p̂_MLE = (x_1/n, …, x_k/n).
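The Lagrange solution p̂_j = x_j/n can be sanity-checked numerically: the log-likelihood at x/n should dominate its value at any other point of the simplex. A small sketch with hypothetical counts:

```python
import numpy as np

x = np.array([20, 15, 18, 14])  # hypothetical category counts
n = x.sum()
p_hat = x / n                   # the Lagrange solution p_j = x_j / n

def loglik(p, x):
    # multinomial log-likelihood up to the combinatorial constant
    return np.sum(x * np.log(p))

# p_hat should beat any other probability vector in the simplex
rng = np.random.default_rng(3)
others = rng.dirichlet(np.ones(len(x)), size=1000)
assert all(loglik(p_hat, x) >= loglik(p, x) for p in others)
print(p_hat, loglik(p_hat, x))
```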
Hypothesis testing: Mendel's peas

Mendel, the founder of modern genetics, studied how physical characteristics are inherited in plants. His studies led him to propose the laws of segregation and independent assortment. We'll test this in a simple context.
Under Mendel's laws, when pure round-yellow and pure green-wrinkled pea plants are cross-bred, the next generation of plant seeds should exhibit a 9:3:3:1 ratio of round-yellow, round-green, wrinkled-yellow and wrinkled-green combinations of shape and color.
In a sample of 556 plants from the next generation, the observed counts for these combinations are (315, 108, 101, 32). Does the data support Mendel's laws?

Formalization

Data: X = (X_1, X_2, X_3, X_4) giving the category counts of the four types of plants.
Model: X ~ Multinomial(n = 556, p), p ∈ Δ_4.
Test H0: p = (9/16, 3/16, 3/16, 1/16).

Hardy-Weinberg equilibrium

The spotting on the wings of Scarlet tiger moths is controlled by a gene that comes in two varieties (alleles) whose combinations (moths have pairs of chromosomes) produce three varieties of spotting pattern: white spotted, little spotted and intermediate.
If the moth population is in Hardy-Weinberg equilibrium (no current selection drift), then these varieties should be in the ratio a² : (1 − a)² : 2a(1 − a), where a ∈ (0, 1) denotes the abundance of the dominant white-spotting allele.
In a sample of 1612 moths, the three varieties were counted to be 1469, 5 and 138. Is the moth population in HW equilibrium?

Formalization

Data: X = (X_1, X_2, X_3), the category counts of the three spotting patterns.
Model: X ~ Multinomial(n = 1612, p), p ∈ Δ_3.
Test H0: p ∈ HW_3, where HW_3 is the subset of Δ_3 containing all vectors of the form (a², (1 − a)², 2a(1 − a)) for some a ∈ (0, 1).

Hair and eye color

Are hair color and eye color two independent attributes?
In this case, writing p in the matrix form p = ((p_ij)), 1 ≤ i ≤ k1 and 1 ≤ j ≤ k2, we want to test if p_ij = p_i^(1) p_j^(2) for some p^(1) ∈ Δ_{k1} and p^(2) ∈ Δ_{k2}.
It's elementary that if p_ij factors as above, then p_i^(1) = Σ_{j=1}^{k2} p_ij and p_j^(2) = Σ_{i=1}^{k1} p_ij.
Writing the row and column totals as p_{i·} and p_{·j}, the test of independence is often represented by
    H0: p_ij = p_{i·} p_{·j} for all i, j.
We'll use I_{k1,k2} to denote the set of p satisfying this factorization.

ML testing

Note dim(Δ_k) = k − 1, as vectors p ∈ Δ_k satisfy Σ_{i=1}^k p_i = 1.
Hypotheses: H0: p ∈ Θ0, where Θ0 is a sub-simplex of Δ_k of dimension q < k:
  Mendel's peas: Θ0 = {(9/16, 3/16, 3/16, 1/16)}, q = 0
  HW equilibrium: Θ0 = HW_3, q = 1
  Independence: Θ0 = I_{k1,k2}, q = k1 + k2 − 2
Could use any of the ML, quadratic, Wald or Rao tests.
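For the Mendel's peas point null, the ML/LRT route is immediate: with p̂_MLE = x/n, the statistic 2 log Λ(x) = 2 Σ_j x_j log{x_j / (n p_{0,j})} is compared to χ²(k − 1) since q = 0. A short sketch using the counts from the slides:

```python
import numpy as np
from scipy import stats

x = np.array([315, 108, 101, 32])   # observed counts (Mendel's peas)
p0 = np.array([9, 3, 3, 1]) / 16    # H0 from Mendel's laws
n = x.sum()                         # 556 plants

# 2 log Lambda = 2 * [l_x(p_hat) - l_x(p0)] with p_hat = x / n
two_log_lam = 2 * np.sum(x * np.log(x / (n * p0)))
pval = 1 - stats.chi2.cdf(two_log_lam, df=len(x) - 1)  # q = 0, so df = k - 1
print(two_log_lam, pval)   # roughly 0.48 and 0.92: no evidence against H0
```

The large p-value says the 9:3:3:1 prediction is entirely consistent with the observed counts.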
Pearson's χ² tests

Rao's test statistic boils down to a very convenient form:
    S(X) = Σ_{j=1}^k (X_j − n p̂_{H0,j})² / (n p̂_{H0,j}) = Σ (Observed − Expected)² / Expected
For categorical data, this was earlier discovered by Pearson, who also ascertained its approximate χ² distribution.
Because this is the Rao statistic, we have S(X) →d χ²(k − 1 − q).

Example 1: point null

For testing H0: p = p0 against p ≠ p0, where p0 = (p_{0,1}, …, p_{0,k}) ∈ Δ_k is a fixed probability vector of interest (as in the Mendel's peas example),
    S(X) = Σ_{j=1}^k (X_j − n p_{0,j})² / (n p_{0,j})
which is asymptotically χ²(k − 1 − 0) = χ²(k − 1).
The size-α Pearson test rejects H0 if S(X) > q_χ²(1 − α, k − 1).

Example 2: parametric form

HW test: H0: p = (η², η(1 − η), η(1 − η), (1 − η)²) for some 0 < η < 1.
To compute p̂_H0, it is equivalent to write the likelihood function in η and maximize:
    L_X(η) = const · {η²}^{x1} {η(1 − η)}^{x2} {η(1 − η)}^{x3} {(1 − η)²}^{x4}
           = const · η^{2x1 + x2 + x3} (1 − η)^{x2 + x3 + 2x4}
and so η̂_MLE = (2x1 + x2 + x3) / (2n).

Example 2: parametric form (contd.)

Once we have η̂_MLE, we can construct
    p̂_H0 = (η̂²_MLE, η̂_MLE(1 − η̂_MLE), η̂_MLE(1 − η̂_MLE), (1 − η̂_MLE)²)
and evaluate S(X). Because Θ0 has dimension q = 1 (only a single number η needs to be known), the asymptotic distribution of S(X) is χ²(k − 2).
The same calculations carry through for a more general parametric form:
    H0: p = (g_1(η), …, g_k(η))
where η ∈ E is a q-dim vector and g_1(η), …, g_k(η) are functions such that (g_1(η), …, g_k(η)) ∈ Δ_k for every η ∈ E.

Example 3: independence

Consider testing H0: p_ij = p_{i·} p_{·j} for all i, j.
To get p̂_H0, we write the likelihood function in terms of p_{i·}, 1 ≤ i ≤ k1, and p_{·j}, 1 ≤ j ≤ k2:
    L_x(p_{1·}, …, p_{k1·}, p_{·1}, …, p_{·k2}) = const · Π_{i=1}^{k1} Π_{j=1}^{k2} (p_{i·} p_{·j})^{x_ij}
        = const · {Π_{i=1}^{k1} p_{i·}^{x_{i·}}} · {Π_{j=1}^{k2} p_{·j}^{x_{·j}}}
where x_{i·} = Σ_{j=1}^{k2} x_ij and x_{·j} = Σ_{i=1}^{k1} x_ij are the margin counts of our two-way table.

Example 3: independence (contd.)

Because (p_{1·}, …, p_{k1·}) ∈ Δ_{k1} and (p_{·1}, …, p_{·k2}) ∈ Δ_{k2}, the maximizer is given by
    p̂_{i·} = x_{i·}/n,   p̂_{·j} = x_{·j}/n,   1 ≤ i ≤ k1, 1 ≤ j ≤ k2.
And so p̂_H0 has coordinates p̂_{H0,ij} = x_{i·} x_{·j} / n², giving
    S(X) = Σ_{i=1}^{k1} Σ_{j=1}^{k2} (X_ij − X_{i·} X_{·j}/n)² / (X_{i·} X_{·j}/n).
Because dim(Θ0) = q = k1 − 1 + k2 − 1,
    S(X) →d χ²(k1·k2 − 1 − k1 + 1 − k2 + 1) = χ²((k1 − 1)(k2 − 1)).
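The Hardy-Weinberg moth data can be run through the same parametric-form recipe. This sketch uses the three-category H0: p = (a², (1 − a)², 2a(1 − a)) from the moth example, for which the analogous likelihood calculation gives â = (2x1 + x3)/(2n); with q = 1, the reference distribution is χ²(3 − 1 − 1) = χ²(1).

```python
import numpy as np
from scipy import stats

x = np.array([1469, 5, 138])   # white, little, intermediate spotting counts
n = x.sum()                    # 1612 moths

# H0: p = (a^2, (1-a)^2, 2a(1-a)). The likelihood is proportional to
# a^(2*x1 + x3) * (1 - a)^(2*x2 + x3), maximized at:
a_hat = (2 * x[0] + x[2]) / (2 * n)

expected = n * np.array([a_hat**2, (1 - a_hat)**2, 2 * a_hat * (1 - a_hat)])
S = np.sum((x - expected) ** 2 / expected)        # Pearson / Rao statistic
pval = 1 - stats.chi2.cdf(S, df=len(x) - 1 - 1)   # q = 1, so df = k - 2 = 1
print(a_hat, S, pval)   # roughly 0.95, 0.83 and 0.36
```

The modest statistic and large p-value are consistent with the population being in HW equilibrium.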
Hair-eye color

Observed counts, with expected counts under independence in parentheses:

                        Eye color
    Hair color   Blue        Green       Brown       Black       Total
    Blonde       20 (13.9)   15 (13.6)   18 (28.4)   14 (11.0)     67
    Red          11 (8.5)     4 (8.3)    24 (17.4)    2 (6.7)      41
    Brown         9 (15.4)   11 (15.1)   36 (31.4)   18 (12.2)     74
    Black         8 (10.2)   17 (10.0)   20 (20.8)    4 (8.1)      49
    Total        48          47          98          38           231

S(x) = 30.9. So the p-value is 1 − F_χ²((4−1)(4−1))(30.9) = 1 − F_χ²(9)(30.9) ≈ 0.
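The slide's numbers can be reproduced with scipy's contingency-table routine, which computes exactly this Pearson statistic from the margin-based expected counts with df = (k1 − 1)(k2 − 1):

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hair (rows: Blonde, Red, Brown, Black) x Eye (cols: Blue, Green, Brown, Black)
table = np.array([[20, 15, 18, 14],
                  [11,  4, 24,  2],
                  [ 9, 11, 36, 18],
                  [ 8, 17, 20,  4]])

# Yates' continuity correction only applies to 2x2 tables, so this is the
# plain Pearson statistic S(x).
stat, pval, dof, expected = chi2_contingency(table)
print(stat, dof, pval)   # S(x) of about 30.9 with df = 9; p-value < 0.001
```

So hair and eye color are decidedly not independent in these data.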