ML Testing (Likelihood Ratio Testing) for non-gaussian models
Surya Tokdar

ML test in a slightly different form

Model $X \sim f(x \mid \theta)$, $\theta \in \Theta$. Hypothesis $H_0 : \theta \in \Theta_0$.
Good set: $B_c(x) = \{\theta : l_x(\theta) \ge \max_{\theta \in \Theta} l_x(\theta) - c^2/2\}$.
ML test = reject $H_0$ if $B_c(x) \cap \Theta_0 = \emptyset$.
Same as: reject $H_0$ if $\max_{\theta \in \Theta} l_x(\theta) - \max_{\theta \in \Theta_0} l_x(\theta) \ge c^2/2$.
Or equivalently: reject $H_0$ if
$$\Lambda(x) = \frac{\max_{\theta \in \Theta} L_x(\theta)}{\max_{\theta \in \Theta_0} L_x(\theta)} \ge k.$$

Size calculation

Need to calculate $\max_{\theta \in \Theta_0} P_{[X \mid \theta]}(\Lambda(X) > k)$.
Will work for models of the form $X_i \overset{\mathrm{IID}}{\sim} g(x_i \mid \theta)$, $\theta \in \Theta \subset \mathbb{R}^p$,
and hypotheses $H_0 : A^T\theta = a_0$ for a given $p \times r$ matrix $A$ and a given $a_0$ (typically $a_0 = 0$).
Under regularity conditions, $2\log\Lambda(X) \overset{d}{\to} \chi^2(r)$ when $X_i \overset{\mathrm{IID}}{\sim} g(x_i \mid \theta_0)$ with $\theta_0 \in \Theta_0$. Follows from asymptotic normality of $\hat\theta_{\mathrm{MLE}}(X)$.

Asymptotic normality of MLE

Theorem. Assume the following (Cramér) conditions:
1. $\theta \mapsto \log g(x \mid \theta)$ is twice continuously differentiable.
2. $\theta \mapsto g(\cdot \mid \theta)$ is identifiable.
3. $\bigl|\frac{\partial^3}{\partial\theta^3}\log g(x_i \mid \theta)\bigr| < h(x)$ for $\theta$ in a neighborhood of $\theta_0$, with $E_{[X \mid \theta_0]}\,h(X_1) < \infty$.
4. $\hat\theta_{\mathrm{MLE}}(x)$ is the unique solution of $\nabla l_x(\theta) = 0$ for every $x$.
If $X_i \overset{\mathrm{IID}}{\sim} g(x_i \mid \theta_0)$ then
$$\sqrt{n}\,(\hat\theta_{\mathrm{MLE}} - \theta_0) \overset{d}{\to} N\bigl(0, \{I^F_1(\theta_0)\}^{-1}\bigr),$$
where $I^F_1(\theta) = -E_{[X_i \mid \theta]}\frac{\partial^2}{\partial\theta^2}\log g(X_1 \mid \theta)$ = Fisher information at $\theta$.

A sketch of a proof

First order Taylor approximation:
$$0 = \tfrac{1}{\sqrt{n}}\nabla l_x(\hat\theta_{\mathrm{MLE}}) \approx \tfrac{1}{\sqrt{n}}\nabla l_x(\theta_0) + \tfrac{1}{n}\nabla^2 l_X(\theta_0)\,\sqrt{n}\,(\hat\theta_{\mathrm{MLE}} - \theta_0).$$
Write $T_i = \nabla_\theta \log g(X_i \mid \theta)$ at $\theta = \theta_0$. Then $E\,T_i = 0$, $\mathrm{Var}\,T_i = I^F_1(\theta_0)$, and $\tfrac{1}{\sqrt{n}}\nabla l_X(\theta_0) = \sqrt{n}\,(\bar T - 0) \overset{d}{\to} N(0, I^F_1(\theta_0))$ [by CLT].
Write $W_i = \frac{\partial^2}{\partial\theta^2}\log g(X_i \mid \theta)$ at $\theta = \theta_0$. Then $E\,W_i = -I^F_1(\theta_0)$ and $\tfrac{1}{n}\nabla^2 l_X(\theta_0) = \bar W \overset{p}{\to} -I^F_1(\theta_0)$.
The rest is Slutsky's theorem applied to $\sqrt{n}\,(\hat\theta_{\mathrm{MLE}} - \theta_0)$.

Three large-n approximations to Λ(x)

Assume $\theta_0 \in \Theta_0$, i.e., $A^T\theta_0 = a_0$. Notation: $\hat\theta_{H_0} = \arg\max_{\theta \in \Theta_0} l_x(\theta)$ (must have $A^T\hat\theta_{H_0} = a_0$).
Quadratic, Wald and Rao approximations:
$$F(X) = (A^T\hat\theta_{\mathrm{MLE}} - a_0)^T\{A^T I_X^{-1} A\}^{-1}(A^T\hat\theta_{\mathrm{MLE}} - a_0) \approx 2\log\Lambda(X),$$
$$W(X) = n\,(A^T\hat\theta_{\mathrm{MLE}} - a_0)^T\{A^T I^F_1(\hat\theta_{\mathrm{MLE}})^{-1} A\}^{-1}(A^T\hat\theta_{\mathrm{MLE}} - a_0),$$
$$S(X) = \tfrac{1}{n}\,\nabla l_X(\hat\theta_{H_0})^T\{I^F_1(\hat\theta_{H_0})\}^{-1}\nabla l_X(\hat\theta_{H_0}).$$
$2\log\Lambda(X) - F(X) \overset{p}{\to} 0$, etc. Each of $2\log\Lambda(X)$, $F(X)$, $W(X)$, $S(X) \overset{d}{\to} \chi^2(r)$.
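As a quick illustration of the three statistics (a minimal R sketch, not part of the slides), take a Poisson($\lambda$) sample and test $H_0 : \lambda = \lambda_0$; then $A = 1$, $r = 1$ and $I^F_1(\lambda) = 1/\lambda$, and the data and the choice $\lambda_0 = 1$ below are arbitrary.

# LRT, Wald and Rao statistics for H0: lambda = lambda0 in a Poisson(lambda) model,
# where I_1^F(lambda) = 1/lambda. Simulated data; n and the true lambda are arbitrary.
set.seed(1)
x <- rpois(200, lambda = 1.3)
n <- length(x); lambda0 <- 1
lambda.mle <- mean(x)                                  # unrestricted MLE
loglik <- function(l) sum(dpois(x, l, log = TRUE))
lrt  <- 2 * (loglik(lambda.mle) - loglik(lambda0))     # 2 log Lambda(x)
wald <- n * (lambda.mle - lambda0)^2 / lambda.mle      # W(x); equals F(x) here since I_X = n / lambda.mle
rao  <- n * (lambda.mle - lambda0)^2 / lambda0         # S(x), the score statistic
qchisq(0.95, df = 1)                                   # common size-0.05 cutoff (r = 1)
1 - pchisq(c(LRT = lrt, Wald = wald, Rao = rao), df = 1)   # p-values from the chi-squared(1) reference

The three statistics differ only in which information estimate they use, and all are referred to the same $\chi^2(1)$ cutoff; that is the practical content of the approximations sketched below.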

Sketches of proof if you cared - part I

$2\log\Lambda(X) \approx F(X)$ follows from the quadratic approximation
$$l_x(\theta) \approx l_x(\hat\theta_{\mathrm{MLE}}) - \tfrac{1}{2}(\theta - \hat\theta_{\mathrm{MLE}})^T I_X (\theta - \hat\theta_{\mathrm{MLE}})$$
and the corresponding profile log-likelihood of $\eta = A^T\theta$,
$$l^*_x(\eta) \approx l_x(\hat\theta_{\mathrm{MLE}}) - \tfrac{1}{2}(\eta - A^T\hat\theta_{\mathrm{MLE}})^T\{A^T I_X^{-1} A\}^{-1}(\eta - A^T\hat\theta_{\mathrm{MLE}}),$$
and by noting that $\log\Lambda(x) = l^*_x(A^T\hat\theta_{\mathrm{MLE}}) - l^*_x(A^T\hat\theta_{H_0})$.

Sketches of proof if you cared - part II

$2\log\Lambda(X) \approx S(X)$ follows because
$$0 = \nabla l_x(\hat\theta_{\mathrm{MLE}}) \approx \nabla l_x(\hat\theta_{H_0}) + \nabla^2 l_x(\hat\theta_{H_0})(\hat\theta_{\mathrm{MLE}} - \hat\theta_{H_0}), \qquad -\tfrac{1}{n}\nabla^2 l_x(\hat\theta_{H_0}) \approx I^F_1(\hat\theta_{H_0}),$$
and so $\hat\theta_{\mathrm{MLE}} - \hat\theta_{H_0} \approx \tfrac{1}{n}\{I^F_1(\hat\theta_{H_0})\}^{-1}\nabla l_x(\hat\theta_{H_0})$. Plug this into the original quadratic approximation and use $\tfrac{1}{n}I_X \approx I^F_1(\theta_0) \approx I^F_1(\hat\theta_{H_0})$, because under $H_0$ both $\hat\theta_{\mathrm{MLE}}$ and $\hat\theta_{H_0}$ converge to $\theta_0$.
$2\log\Lambda(X) \approx W(X)$ follows similarly, using $\tfrac{1}{n}I_X \approx I^F_1(\hat\theta_{\mathrm{MLE}})$.

Sketches of proof if you cared - part III

It is easiest to prove $W(X) \overset{d}{\to} \chi^2(r)$. By asymptotic normality of the MLE and Slutsky's theorem,
$$W(X) \overset{d}{\to} Z^T\{A^T I^F_1(\theta_0)^{-1} A\}^{-1} Z, \quad \text{where } Z \sim N_r\bigl(0,\, A^T I^F_1(\theta_0)^{-1} A\bigr).$$
But $Z \sim N_r(0, \Sigma)$ means $Z^T\Sigma^{-1}Z \sim \chi^2(r)$.

Back to size calculation

Recall: each of $2\log\Lambda(X)$, $F(X)$, $W(X)$, $S(X) \overset{d}{\to} \chi^2(r)$. So each of the following test procedures has asymptotic size $\alpha$:
- ML/LRT: reject $H_0$ if $2\log\Lambda(x) > q_{\chi^2}(1-\alpha, r)$
- Quadratic: reject $H_0$ if $F(x) > q_{\chi^2}(1-\alpha, r)$
- Wald: reject $H_0$ if $W(x) > q_{\chi^2}(1-\alpha, r)$
- Rao: reject $H_0$ if $S(x) > q_{\chi^2}(1-\alpha, r)$
Here $q_{\chi^2}(u, r) = F^{-1}_{\chi^2}(u, r)$, where $F_{\chi^2}(x, r)$ is the CDF of $\chi^2(r)$. Also, one can calculate p-values as $1 - F_{\chi^2}(2\log\Lambda(x), r)$, etc.

Which one to use?

Depends on which is easier to compute.
- To get $W(x)$ you don't need $\hat\theta_{H_0}$, only $\hat\theta_{\mathrm{MLE}}$.
- To get $S(x)$ you only need $\hat\theta_{H_0}$, not $\hat\theta_{\mathrm{MLE}}$.
- In many cases $F(X) = W(X)$ because $I_X = n\,I^F_1(\hat\theta_{\mathrm{MLE}})$. This happens in most generalized linear models, $Y_i \overset{\mathrm{IND}}{\sim} g(y_i \mid \beta, z_i) = h(z_i, y_i)\,e^{y_i z_i^T\beta - A(\beta, z_i)}$, and Wald tests are popular choices there.
- For the multinomial model of category counts, $S(X)$ = Pearson's $\chi^2$.

Example: low birthweight

Natality data = 500 records on US births in June 1997. $Y_i = 1$ if the $i$-th birth record has birthweight < 2500g, $Y_i = 0$ otherwise. $z_i = (\mathrm{Cigarettes}_i, \mathrm{Black}_i)$ records the daily number of cigarettes and the race of the mother. Model:
$$\log\frac{P(Y_i = 1)}{1 - P(Y_i = 1)} = \beta_1 + \beta_2\,\mathrm{Cigarettes}_i + \beta_3\,\mathrm{Black}_i + \beta_4\,\mathrm{Cigarettes}_i\,\mathrm{Black}_i.$$
Can compute $\hat\beta_{\mathrm{MLE}}$ and $I_X$ (numerically).
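Computing $\hat\beta_{\mathrm{MLE}}$ and $I_X$ numerically is exactly what glm() automates. A hedged sketch follows; the data frame natality and its column names low, cigs and black are hypothetical stand-ins for the actual data set, so this shows the mechanics rather than reproducing the numbers reported under "MLE and curvature" below.

# Logistic-regression MLE and inverse observed information via glm().
# `natality`, `low`, `cigs`, `black` are hypothetical names for the data set.
fit <- glm(low ~ cigs * black, family = binomial, data = natality)
beta.hat <- coef(fit)    # hat(beta)_MLE = (beta1, beta2, beta3, beta4)
V <- vcov(fit)           # estimate of I_X^{-1}
summary(fit)             # each reported z value is a coordinatewise Wald test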

MLE and curvature

From the glm() function in R:
$$\hat\beta_{\mathrm{MLE}} = \begin{pmatrix} 3.170 \\ 0.079 \\ 1.064 \\ 0.003 \end{pmatrix}, \qquad I_X^{-1} = \begin{pmatrix} 0.065 & 0.003 & 0.065 & 0.003 \\ 0.003 & 0.001 & 0.003 & 0.001 \\ 0.065 & 0.003 & 0.192 & 0.011 \\ 0.003 & 0.001 & 0.011 & 0.005 \end{pmatrix}.$$
How would you test the null hypothesis that, for a black mother, the probability of low birthweight depends on the number of cigarettes? For a non-black mother? How would you test that this dependence is different for black and non-black mothers?

Categorical data & Multinomial models

Data: counts of categories formed by one or more attributes. Tables could be one-way, two-way, etc., depending on how many attributes are used to decide categories.

                Eye color
Hair color   Blue   Green   Brown   Black   Total
Blonde        20     15      18      14      67
Red           11      4      24       2      41
Brown          9     11      36      18      74
Black          8     17      20       4      49
Total         48     47      98      38     231

Multinomial model

Data are $k$ category counts of $n$ objects: $X = (X_1, \dots, X_k)$, $X_j \ge 0$, $X_1 + X_2 + \cdots + X_k = n$. Model $X \sim \mathrm{Multinomial}(n, p)$, where $p = (p_1, \dots, p_k) \in \Delta_k$. Here $\Delta_k$ is the $k$-dim simplex, containing the $k$-dim probability vectors $b = (b_1, \dots, b_k)$, i.e., $b_j \ge 0$ for all $j$ and $\sum_j b_j = 1$.
$\mathrm{Multinomial}(n, p)$ has pmf
$$f(x \mid p) = \binom{n}{x_1 \cdots x_k}\,p_1^{x_1}\cdots p_k^{x_k}$$
for $x = (x_1, \dots, x_k)$ with $x_i \ge 0$ and $\sum_i x_i = n$; $f(x \mid p) = 0$ otherwise.

Multinomial model (contd.)

This is an extension of the binomial distribution. In $X$ we record the counts from $n$ independent trials with $k$ outcomes, with probabilities given by $p$. Consequently $X_j \sim \mathrm{Bin}(n, p_j)$ for any single category $1 \le j \le k$. Similarly, for two categories $j_1 \ne j_2$, $X_{j_1} + X_{j_2} \sim \mathrm{Bin}(n, p_{j_1} + p_{j_2})$, and so on.

Multinomial model (contd.)

For two-way tables, with $k_1$ rows and $k_2$ columns, we can write $X$ and $p$ either as $k_1 k_2$-dim vectors or, more commonly, as $k_1 \times k_2$ matrices. Even when they're written as matrices, we'll write $X \sim \mathrm{Multinomial}(n, p)$ to mean that the multinomial distribution is placed on the vector forms of $X$ and $p$. For multi-way tables we'll use arrays of appropriate dimensions to represent $X$ and $p$.

MLE

To obtain the MLE of $p$ based on data $X = x$, we can maximize the log-likelihood function in $p$ with an additional Lagrange component to account for the constraint $\sum_{j=1}^k p_j = 1$:
$$l_x(p, \lambda) = \mathrm{const} + \sum_{j=1}^k x_j\log p_j + \lambda\Bigl(\sum_{j=1}^k p_j - 1\Bigr).$$
The solution in $p$ equals
$$\hat p_{\mathrm{MLE}} = \Bigl(\frac{x_1}{n}, \dots, \frac{x_k}{n}\Bigr).$$
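Each of the three questions above is a linear hypothesis $A^T\beta = 0$ on the fitted model, so each can be answered with a one-degree-of-freedom Wald (quadratic) test built from coef() and vcov(). A minimal sketch, again using the hypothetical natality data frame:

# 1-df Wald tests for the three questions; `natality` and its columns are hypothetical names.
fit <- glm(low ~ cigs * black, family = binomial, data = natality)
b <- coef(fit); V <- vcov(fit)       # hat(beta)_MLE and I_X^{-1}
A <- list(black    = c(0, 1, 0, 1),  # cigarette effect for black mothers:     H0: beta2 + beta4 = 0
          nonblack = c(0, 1, 0, 0),  # cigarette effect for non-black mothers: H0: beta2 = 0
          differ   = c(0, 0, 0, 1))  # effects differ between the two groups:  H0: beta4 = 0
wald <- sapply(A, function(a) sum(a * b)^2 / as.numeric(t(a) %*% V %*% a))
1 - pchisq(wald, df = 1)             # one Wald p-value per hypothesis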

Hypothesis testing: Mendel's peas

Mendel, the founder of modern genetics, studied how physical characteristics are inherited in plants. His studies led him to propose the laws of segregation and independent assortment. We'll test this in a simple context.
Under Mendel's laws, when pure round-yellow and pure green-wrinkled pea plants are cross-bred, the next generation of plant seeds should exhibit a 9:3:3:1 ratio of the round-yellow, round-green, wrinkled-yellow and wrinkled-green combinations of shape and color. In a sample of 556 plants from the next generation the observed counts for these combinations are (315, 108, 101, 32). Does the data support Mendel's laws?

Formalization

Data $X = (X_1, X_2, X_3, X_4)$ giving the category counts of the four types of plants. Model $X \sim \mathrm{Multinomial}(n = 556, p)$, $p \in \Delta_4$. Test
$$H_0 : p = \Bigl(\tfrac{9}{16}, \tfrac{3}{16}, \tfrac{3}{16}, \tfrac{1}{16}\Bigr).$$

Hardy-Weinberg equilibrium

The spotting on the wings of Scarlet tiger moths is controlled by a gene that comes in two varieties (alleles) whose combinations (moths have pairs of chromosomes) produce three varieties of spotting pattern: white spotted, little spotted and intermediate. If the moth population is in Hardy-Weinberg equilibrium (no current selection or drift), then these varieties should be in the ratio $a^2 : (1-a)^2 : 2a(1-a)$, where $a \in (0,1)$ denotes the abundance of the dominant white-spotting allele. In a sample of 1612 moths, the three varieties were counted to be 1469, 5 and 138. Is the moth population in HW equilibrium?

Formalization

Data $X = (X_1, X_2, X_3)$, the category counts of the three spotting patterns. Model $X \sim \mathrm{Multinomial}(n = 1612, p)$, $p \in \Delta_3$. Test whether $H_0 : p \in \Delta_3^{\mathrm{HW}}$, where $\Delta_3^{\mathrm{HW}}$ is the subset of $\Delta_3$ containing all vectors of the form $(a^2, (1-a)^2, 2a(1-a))$ for some $a \in (0, 1)$.

Hair and eye color

Are hair color and eye color two independent attributes? In this case, writing $p$ in the matrix form $p = ((p_{ij}))$, $1 \le i \le k_1$ and $1 \le j \le k_2$, we want to test if $p_{ij} = p^{(1)}_i p^{(2)}_j$ for some $p^{(1)} \in \Delta_{k_1}$ and $p^{(2)} \in \Delta_{k_2}$. It's elementary that if $p_{ij}$ factors as above, then $p^{(1)}_i = \sum_{j=1}^{k_2} p_{ij}$ and $p^{(2)}_j = \sum_{i=1}^{k_1} p_{ij}$. Writing the row and column totals as $p_{i\cdot}$ and $p_{\cdot j}$, the test of independence is often represented by
$$H_0 : p_{ij} = p_{i\cdot}\,p_{\cdot j} \quad \text{for all } i, j.$$
We'll use $\Delta^I_{k_1, k_2}$ to denote this set.

ML testing

$\dim(\Delta_k) = k - 1$, as vectors $p \in \Delta_k$ satisfy $\sum_{i=1}^k p_i = 1$. Hypotheses $H_0 : p \in \Theta_0$, where $\Theta_0$ is a sub-simplex of $\Delta_k$ of dimension $q < k$.
- Mendel's peas: $\Theta_0 = \{(\tfrac{9}{16}, \tfrac{3}{16}, \tfrac{3}{16}, \tfrac{1}{16})\}$, $q = 0$.
- HW equilibrium: $\Theta_0 = \Delta_3^{\mathrm{HW}}$, $q = 1$.
- Independence: $\Theta_0 = \Delta^I_{k_1, k_2}$, $q = k_1 + k_2 - 2$.
Could use any of the ML, quadratic, Wald or Rao tests.
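For Mendel's peas the unrestricted MLE is $\hat p = x/n$ and $\Theta_0$ is a single point, so the ML (likelihood ratio) test takes only a few lines of R; a minimal sketch using the counts quoted above:

# ML (likelihood ratio) test of Mendel's 9:3:3:1 hypothesis for the pea counts.
x  <- c(315, 108, 101, 32); n <- sum(x)   # observed counts, n = 556
p0 <- c(9, 3, 3, 1) / 16                  # H0: p = p0, a point null (q = 0)
p.hat <- x / n                            # unrestricted multinomial MLE
lrt <- 2 * sum(x * log(p.hat / p0))       # 2 log Lambda(x); the multinomial coefficient cancels
1 - pchisq(lrt, df = length(x) - 1 - 0)   # asymptotic chi-squared(k - 1 - q) p-value

The statistic comes out to roughly 0.5 on 3 degrees of freedom, so the counts are entirely consistent with the 9:3:3:1 ratio.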

Pearson's χ² tests

Rao's test statistic boils down to a very convenient form:
$$S(X) = \sum_{j=1}^k \frac{(X_j - n\hat p_{H_0,j})^2}{n\hat p_{H_0,j}} = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}.$$
For categorical data, this was earlier discovered by Pearson, who also ascertained its approximate $\chi^2$ distribution. Because this is the Rao statistic, we have $S(X) \overset{d}{\to} \chi^2(k - 1 - q)$.

Example 1: point null

For testing $H_0 : p = p_0$ against $p \ne p_0$, where $p_0 = (p_{0,1}, \dots, p_{0,k}) \in \Delta_k$ is a fixed probability vector of interest (as in Mendel's peas example),
$$S(X) = \sum_{j=1}^k \frac{(X_j - n p_{0,j})^2}{n p_{0,j}},$$
which is asymptotically $\chi^2(k - 1 - 0) = \chi^2(k - 1)$. The size-$\alpha$ Pearson test rejects $H_0$ if $S(X) > q_{\chi^2}(1-\alpha, k-1)$.

Example 2: parametric form

HW test: $H_0 : p = (\eta^2,\ \eta(1-\eta),\ \eta(1-\eta),\ (1-\eta)^2)$ for some $0 < \eta < 1$.
To compute $\hat p_{H_0}$ it is equivalent to write the likelihood function in $\eta$ and maximize:
$$L_X(\eta) = \text{const.}\;\{\eta^2\}^{x_1}\{\eta(1-\eta)\}^{x_2}\{\eta(1-\eta)\}^{x_3}\{(1-\eta)^2\}^{x_4} = \text{const.}\;\eta^{2x_1+x_2+x_3}(1-\eta)^{x_2+x_3+2x_4},$$
and so $\hat\eta_{\mathrm{MLE}} = \frac{2x_1 + x_2 + x_3}{2n}$.
Once we have $\hat\eta_{\mathrm{MLE}}$, we can construct $\hat p_{H_0} = (\hat\eta_{\mathrm{MLE}}^2,\ \hat\eta_{\mathrm{MLE}}(1-\hat\eta_{\mathrm{MLE}}),\ \hat\eta_{\mathrm{MLE}}(1-\hat\eta_{\mathrm{MLE}}),\ (1-\hat\eta_{\mathrm{MLE}})^2)$ and evaluate $S(X)$. Because $\Theta_0$ has dimension $q = 1$ (only a single number $\eta$ needs to be known), the asymptotic distribution of $S(X)$ is $\chi^2(k - 2)$.

Example 2: parametric form (contd.)

The same calculations carry through for a more general parametric form: $H_0 : p = (g_1(\eta), \dots, g_k(\eta))$, where $\eta \in E$ is a $q$-dim vector and $g_1(\eta), \dots, g_k(\eta)$ are functions such that for every $\eta \in E$, $(g_1(\eta), \dots, g_k(\eta)) \in \Delta_k$.

Example 3: independence

Consider testing $H_0 : p_{ij} = p_{i\cdot}\,p_{\cdot j}$ for all $i, j$.
To get $\hat p_{H_0}$, we write the likelihood function in terms of $p_{i\cdot}$, $1 \le i \le k_1$, and $p_{\cdot j}$, $1 \le j \le k_2$:
$$L_x(p_{1\cdot}, \dots, p_{k_1\cdot}, p_{\cdot 1}, \dots, p_{\cdot k_2}) = \text{const.}\prod_{i=1}^{k_1}\prod_{j=1}^{k_2}(p_{i\cdot}\,p_{\cdot j})^{x_{ij}} = \text{const.}\Bigl\{\prod_{i=1}^{k_1} p_{i\cdot}^{x_{i\cdot}}\Bigr\}\Bigl\{\prod_{j=1}^{k_2} p_{\cdot j}^{x_{\cdot j}}\Bigr\},$$
where $x_{i\cdot} = \sum_{j=1}^{k_2} x_{ij}$ and $x_{\cdot j} = \sum_{i=1}^{k_1} x_{ij}$ are the margin counts of our two-way table.

Example 3: independence (contd.)

Because $(p_{1\cdot}, \dots, p_{k_1\cdot}) \in \Delta_{k_1}$ and $(p_{\cdot 1}, \dots, p_{\cdot k_2}) \in \Delta_{k_2}$, the maximizer is given by
$$\hat p_{i\cdot} = \frac{x_{i\cdot}}{n}, \qquad \hat p_{\cdot j} = \frac{x_{\cdot j}}{n}, \qquad 1 \le i \le k_1,\ 1 \le j \le k_2.$$
And so $\hat p_{H_0}$ has coordinates $\hat p_{H_0, ij} = \frac{x_{i\cdot}\,x_{\cdot j}}{n^2}$, giving
$$S(X) = \sum_{i=1}^{k_1}\sum_{j=1}^{k_2}\frac{(X_{ij} - X_{i\cdot}X_{\cdot j}/n)^2}{X_{i\cdot}X_{\cdot j}/n}.$$
Because $\dim(\Theta_0) = q = k_1 - 1 + k_2 - 1$,
$$S(X) \overset{d}{\to} \chi^2(k_1 k_2 - 1 - k_1 + 1 - k_2 + 1) = \chi^2((k_1 - 1)(k_2 - 1)).$$
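Applying the Example 2 recipe to the moth counts from the Hardy-Weinberg example gives a complete Pearson/Rao test in a few lines. A minimal sketch (the closed form for the restricted MLE $\hat a$ below is derived the same way as in Example 2, but for the three-category version of $H_0$):

# Pearson/Rao test of Hardy-Weinberg equilibrium for the moth counts
# (white spotted, little spotted, intermediate).
x <- c(1469, 5, 138); n <- sum(x)
a.hat  <- (2 * x[1] + x[3]) / (2 * n)        # restricted MLE: maximizes a^(2x1+x3) (1-a)^(2x2+x3)
p.hat0 <- c(a.hat^2, (1 - a.hat)^2, 2 * a.hat * (1 - a.hat))
S <- sum((x - n * p.hat0)^2 / (n * p.hat0))  # (Observed - Expected)^2 / Expected
1 - pchisq(S, df = length(x) - 1 - 1)        # k - 1 - q with k = 3, q = 1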

Hair-eye color

Observed counts, with expected counts under independence in parentheses:

                Eye color
Hair color   Blue        Green       Brown       Black       Total
Blonde       20 (13.9)   15 (13.6)   18 (28.4)   14 (11.0)    67
Red          11 (8.5)     4 (8.3)    24 (17.4)    2 (6.7)     41
Brown         9 (15.4)   11 (15.1)   36 (31.4)   18 (12.2)    74
Black         8 (10.2)   17 (10.0)   20 (20.8)    4 (8.1)     49
Total        48          47          98          38          231

$S(x) = 30.9$. So the p-value is $1 - F_{(4-1)(4-1)}(30.9) = 1 - F_9(30.9) \approx 0$.
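The same calculation can be reproduced in R, either by hand from the margins or with chisq.test(), which computes exactly this Pearson statistic for a two-way table:

# Pearson's test of independence for the hair/eye color table.
tab <- matrix(c(20, 15, 18, 14,
                11,  4, 24,  2,
                 9, 11, 36, 18,
                 8, 17, 20,  4),
              nrow = 4, byrow = TRUE,
              dimnames = list(hair = c("Blonde", "Red", "Brown", "Black"),
                              eye  = c("Blue", "Green", "Brown", "Black")))
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)  # X_{i.} X_{.j} / n
S <- sum((tab - expected)^2 / expected)                   # about 30.9
1 - pchisq(S, df = (nrow(tab) - 1) * (ncol(tab) - 1))     # df = 9
chisq.test(tab)                                           # same statistic and p-value in one call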