Maximum Likelihood Large Sample Theory


Maximum Likelihood Large Sample Theory
MIT 18.443
Dr. Kempthorne
Spring 2015

Outline
1. Large Sample Theory of Maximum Likelihood Estimates

Asymptotic Results: Overview

Asymptotic Framework
Data Model: $X_n = (X_1, X_2, \ldots, X_n)$, an i.i.d. sample with joint pdf/pmf
$$f(x_1, \ldots, x_n \mid \theta) = \prod_{i=1}^n f(x_i \mid \theta)$$
Data Realization: $X_n = x_n = (x_1, \ldots, x_n)$
Likelihood of $\theta$ (given $x_n$): $\mathrm{lik}(\theta) = f(x_1, \ldots, x_n \mid \theta)$
$\hat\theta_n$: MLE of $\theta$ given $x_n = (x_1, \ldots, x_n)$
$\{\hat\theta_n,\; n \to \infty\}$: sequence of MLEs indexed by sample size $n$

Results:
Consistency: $\hat\theta_n \xrightarrow{P} \theta$
Asymptotic Variance: $\sigma^2_{\hat\theta_n} = \mathrm{Var}(\hat\theta_n) \approx \kappa(\theta)/n$, where $\kappa(\theta)$ is an explicit function of the pdf/pmf $f(\cdot \mid \theta)$
Limiting Distribution: $\sqrt{n}\,(\hat\theta_n - \theta) \xrightarrow{L} N(0, \kappa(\theta))$

Theorems for Asymptotic Results

Setting: $x_1, \ldots, x_n$ a realization of an i.i.d. sample from a distribution with density/pmf $f(x \mid \theta)$.
$\ell(\theta) = \sum_{i=1}^n \ln f(x_i \mid \theta)$
$\theta_0$: true value of $\theta$
$\hat\theta_n$: the MLE

Theorem 8.5.2.A  Under appropriate smoothness conditions on $f$, the MLE $\hat\theta_n$ is consistent, i.e., for the true value $\theta_0$ and every $\epsilon > 0$,
$$P(|\hat\theta_n - \theta_0| > \epsilon) \to 0.$$
Proof: By the Weak Law of Large Numbers (WLLN),
$$\frac{1}{n}\ell(\theta) \xrightarrow{P} E[\log f(x \mid \theta) \mid \theta_0] = \int \log[f(x \mid \theta)]\, f(x \mid \theta_0)\, dx.$$
(Note!! The statement holds for every $\theta$, given any value of $\theta_0$.)

Theorem 8.5.2.A (continued)

Proof (continued):
The MLE $\hat\theta_n$ maximizes $\frac{1}{n}\ell(\theta)$.
Since $\frac{1}{n}\ell(\theta) \to E[\log f(x \mid \theta) \mid \theta_0]$, $\hat\theta_n$ is close to the $\theta$ maximizing $E[\log f(x \mid \theta) \mid \theta_0]$.
Under smoothness conditions on $f(x \mid \theta)$, $\theta$ maximizes $E[\log f(x \mid \theta) \mid \theta_0]$ if $\theta$ solves
$$\frac{d}{d\theta} E[\log f(x \mid \theta) \mid \theta_0] = 0.$$
Claim: $\theta = \theta_0$ solves this equation:
$$\begin{aligned}
\frac{d}{d\theta} E[\log f(x \mid \theta) \mid \theta_0]
 &= \frac{d}{d\theta} \int \log[f(x \mid \theta)]\, f(x \mid \theta_0)\, dx \\
 &= \int \frac{d}{d\theta}\log[f(x \mid \theta)]\, f(x \mid \theta_0)\, dx
  = \int \frac{\frac{d}{d\theta} f(x \mid \theta)}{f(x \mid \theta)}\, f(x \mid \theta_0)\, dx \\
 \text{(at } \theta = \theta_0\text{)}
 &= \left[\int \frac{d}{d\theta} f(x \mid \theta)\, dx\right]_{\theta = \theta_0}
  = \left[\frac{d}{d\theta} \int f(x \mid \theta)\, dx\right]_{\theta = \theta_0}
  = \frac{d}{d\theta}(1) = 0.
\end{aligned}$$
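Consistency is easy to see numerically. Below is a minimal Python sketch (our own illustration, not from the slides) using an Exponential(rate $\theta_0$) model, for which the MLE is $\hat\theta_n = 1/\bar X$:

```python
# Minimal simulation of MLE consistency (illustrative sketch, not from the slides).
# Model: X_i ~ Exponential(rate theta), f(x|theta) = theta * exp(-theta * x),
# for which the MLE is theta_hat = 1 / xbar.
import numpy as np

rng = np.random.default_rng(0)
theta0 = 2.0

for n in [10, 100, 1_000, 10_000, 100_000]:
    x = rng.exponential(scale=1 / theta0, size=n)
    theta_hat = 1 / x.mean()
    print(f"n = {n:>6d}   theta_hat = {theta_hat:.4f}   "
          f"|error| = {abs(theta_hat - theta0):.4f}")
```

The printed error shrinks as $n$ grows, exactly as the theorem predicts.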

Theorems for Asymptotic Results

Theorem 8.5.B  Under smoothness conditions on $f(x \mid \theta)$,
$$\sqrt{n}\,(\hat\theta_n - \theta_0) \xrightarrow{L} N(0,\, 1/I(\theta_0)),$$
where $I(\theta) = -E\left[\dfrac{\partial^2}{\partial\theta^2} \log f(X \mid \theta)\right]$.
Proof: Using the Taylor approximation to $\ell'(\theta)$ centered at $\theta_0$, consider the following development:
$$0 = \ell'(\hat\theta_n) \approx \ell'(\theta_0) + (\hat\theta_n - \theta_0)\,\ell''(\theta_0)$$
$$\Rightarrow\; -\ell'(\theta_0) \approx (\hat\theta_n - \theta_0)\,\ell''(\theta_0)$$
$$\Rightarrow\; \sqrt{n}\left[\tfrac{1}{n}\ell'(\theta_0)\right] \approx \sqrt{n}\,(\hat\theta_n - \theta_0)\cdot\left[-\tfrac{1}{n}\ell''(\theta_0)\right]$$
By the CLT, $\sqrt{n}\left[\tfrac{1}{n}\ell'(\theta_0)\right] \xrightarrow{L} N(0, I(\theta_0))$ (Lemma A).
By the WLLN, $-\tfrac{1}{n}\ell''(\theta_0) \xrightarrow{P} I(\theta_0)$.
Thus $\sqrt{n}\,(\hat\theta_n - \theta_0) \xrightarrow{L} \frac{1}{I(\theta_0)}\, N(0, I(\theta_0)) = N(0,\, 1/I(\theta_0))$.
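Theorem 8.5.B can also be checked by simulation. For the Exponential(rate $\theta$) model, $I(\theta) = 1/\theta^2$, so $\sqrt{n}\,(\hat\theta_n - \theta_0)$ should have variance close to $\theta_0^2$. A sketch (our own illustration, not from the slides):

```python
# Simulation check of Theorem 8.5.B (illustrative sketch, not from the slides).
# For X ~ Exponential(rate theta), I(theta) = 1/theta^2, so
# sqrt(n) * (theta_hat - theta0) should be approximately N(0, theta0^2).
import numpy as np

rng = np.random.default_rng(0)
theta0, n, reps = 2.0, 2_000, 5_000

x = rng.exponential(scale=1 / theta0, size=(reps, n))
theta_hat = 1 / x.mean(axis=1)                 # MLE for each replication
z = np.sqrt(n) * (theta_hat - theta0)

print("sample var of sqrt(n)(theta_hat - theta0):", z.var().round(3))
print("theoretical 1/I(theta0) = theta0^2      :", theta0 ** 2)
```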

Theorems for Asymptotic Results

Lemma A (extended). For the distribution with pdf/pmf $f(x \mid \theta)$, define
the Score Function: $U(X; \theta) = \frac{\partial}{\partial\theta} \log f(X \mid \theta)$
the (Fisher) Information of the distribution: $I(\theta) = -E\left[\frac{\partial^2}{\partial\theta^2} \log f(X \mid \theta)\right]$
Then under sufficient smoothness conditions on $f(x \mid \theta)$:
(a) $E[U(X; \theta) \mid \theta] = 0$
(b) $\mathrm{Var}[U(X; \theta) \mid \theta] = I(\theta) = E[U(X; \theta)^2 \mid \theta]$
Proof: Differentiate $\int f(x \mid \theta)\, dx = 1$ with respect to $\theta$ two times, interchanging the order of differentiation and integration. (a) follows from the first derivative; (b) follows from the second.
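Spelling out the two differentiations in the proof (a standard calculation, added here for completeness):

```latex
% Lemma A spelled out: differentiate \int f(x \mid \theta)\,dx = 1 twice,
% interchanging differentiation and integration.
\begin{align*}
0 &= \frac{d}{d\theta}\int f(x\mid\theta)\,dx
   = \int \frac{\partial \log f(x\mid\theta)}{\partial\theta}\, f(x\mid\theta)\,dx
   = E\bigl[U(X;\theta)\mid\theta\bigr]
   && \text{(a)} \\[4pt]
0 &= \frac{d^2}{d\theta^2}\int f(x\mid\theta)\,dx
   = \int \frac{\partial^2 \log f(x\mid\theta)}{\partial\theta^2}\, f(x\mid\theta)\,dx
   + \int \Bigl(\frac{\partial \log f(x\mid\theta)}{\partial\theta}\Bigr)^{2} f(x\mid\theta)\,dx \\
  &= -I(\theta) + E\bigl[U(X;\theta)^2\mid\theta\bigr]
   && \text{(b)}
\end{align*}
```

From (a) and (b), $\mathrm{Var}[U(X;\theta) \mid \theta] = E[U^2 \mid \theta] - (E[U \mid \theta])^2 = I(\theta)$.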

Theorems for Asymptotic Results

Qualifications/Extensions
The results require the true $\theta_0$ to lie in the interior of the parameter space.
The results require that the support $\{x : f(x \mid \theta) > 0\}$ not vary with $\theta$.
The results extend to multi-dimensional $\theta$: vector-valued Score Function, matrix-valued (Fisher) Information.

Outline
1. Large Sample Theory of Maximum Likelihood Estimates

Confidence Intervals

Confidence Interval for a Normal Mean
$X_1, X_2, \ldots, X_n$ i.i.d. $N(\mu, \sigma_0^2)$, unknown mean $\mu$ (known $\sigma_0^2$).
Parameter estimate: the sample mean
$$\bar X = \frac{1}{n}\sum_{i=1}^n X_i, \qquad E[\bar X] = \mu, \qquad \mathrm{Var}[\bar X] = \sigma^2_{\bar X} = \sigma_0^2/n.$$
A 95% confidence interval for $\mu$ is a random interval, calculated from the data, that contains $\mu$ with probability 0.95, no matter what the value of the true $\mu$.

Confidence Interval for a Normal Mean

Sampling distribution of $\bar X$:
$$\bar X \sim N(\mu, \sigma^2_{\bar X}) = N(\mu, \sigma_0^2/n), \qquad Z = \frac{\bar X - \mu}{\sigma_{\bar X}} \sim N(0, 1) \;\text{(standard normal)}.$$
For any $0 < \alpha < 1$ (e.g., $\alpha = 0.05$), define $z(\alpha/2)$ by $P(Z > z(\alpha/2)) = \alpha/2$.
By symmetry of the standard normal distribution,
$$P(-z(\alpha/2) < Z < +z(\alpha/2)) = 1 - \alpha,$$
i.e.,
$$P\left(-z(\alpha/2) < \frac{\bar X - \mu}{\sigma_{\bar X}} < +z(\alpha/2)\right) = 1 - \alpha.$$
Re-express the interval for $Z$ as an interval for $\mu$:
$$P\left(\bar X - z(\alpha/2)\,\sigma_{\bar X} < \mu < \bar X + z(\alpha/2)\,\sigma_{\bar X}\right) = 1 - \alpha.$$
The interval $[\bar X \pm z(\alpha/2)\,\sigma_{\bar X}]$ is the $100(1-\alpha)\%$ confidence interval for $\mu$.
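As a concrete illustration, a minimal Python sketch of the z-interval computation (the data vector and $\sigma_0$ below are hypothetical):

```python
# Known-variance z-interval for a normal mean (sketch following the slide).
import numpy as np
from scipy import stats

x = np.array([4.9, 5.3, 5.1, 4.7, 5.4, 5.0, 5.2, 4.8])  # hypothetical data
sigma0 = 0.25                                            # assumed known sd
alpha = 0.05

z = stats.norm.ppf(1 - alpha / 2)      # z(alpha/2) = 1.96 for alpha = 0.05
se = sigma0 / np.sqrt(len(x))          # sigma_xbar = sigma0 / sqrt(n)
lo, hi = x.mean() - z * se, x.mean() + z * se
print(f"95% CI for mu: [{lo:.3f}, {hi:.3f}]")
```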

Confidence Interval for a Normal Mean

Important Properties/Qualifications
The confidence interval is random; the parameter $\mu$ is not random.
$100(1-\alpha)\%$, the confidence level, is the probability that the random interval contains the fixed parameter $\mu$ (the coverage probability of the confidence interval).
Given a data realization $(X_1, \ldots, X_n) = (x_1, \ldots, x_n)$, $\mu$ is either inside the resulting interval or not.
The confidence level describes the long-run reliability of a sequence of confidence intervals constructed in this way.
The confidence interval quantifies the uncertainty of the parameter estimate.
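The point that the interval, not $\mu$, is random can be made concrete by simulation: construct many intervals from repeated samples and count how often they cover the fixed $\mu$. A minimal sketch (all settings hypothetical):

```python
# Coverage check: about 95% of the random intervals should contain the fixed mu.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
mu, sigma0, n, reps, alpha = 5.0, 1.0, 25, 10_000, 0.05
z = stats.norm.ppf(1 - alpha / 2)

x = rng.normal(mu, sigma0, size=(reps, n))
xbar = x.mean(axis=1)
se = sigma0 / np.sqrt(n)
covered = (xbar - z * se < mu) & (mu < xbar + z * se)
print("empirical coverage:", covered.mean())   # close to 0.95
```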

Confidence Intervals for Normal Distribution Parameters

Normal distribution with unknown mean $\mu$ and variance $\sigma^2$:
$X_1, \ldots, X_n$ i.i.d. $N(\mu, \sigma^2)$, with MLEs
$$\hat\mu = \bar X, \qquad \hat\sigma^2 = \frac{1}{n}\sum_{i=1}^n (X_i - \bar X)^2.$$
Confidence interval for $\mu$ based on the $t$-statistic:
$$T = \frac{\bar X - \mu}{S/\sqrt{n}} \sim t_{n-1},$$
where $t_{n-1}$ is Student's $t$ distribution with $n-1$ degrees of freedom and $S^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar X)^2$.
Define $t_{n-1}(\alpha/2)$ by $P(t_{n-1} > t_{n-1}(\alpha/2)) = \alpha/2$.
By symmetry of Student's $t$ distribution,
$$P[-t_{n-1}(\alpha/2) < T < +t_{n-1}(\alpha/2)] = 1 - \alpha,$$
i.e.,
$$P\left[-t_{n-1}(\alpha/2) < \frac{\bar X - \mu}{S/\sqrt{n}} < +t_{n-1}(\alpha/2)\right] = 1 - \alpha.$$

Confidence Interval for a Normal Mean

Re-express the interval for $T$ as an interval for $\mu$:
$$P\left[-t_{n-1}(\alpha/2) < \frac{\bar X - \mu}{S/\sqrt{n}} < +t_{n-1}(\alpha/2)\right]
= P\left[\bar X - t_{n-1}(\alpha/2)\,S/\sqrt{n} < \mu < \bar X + t_{n-1}(\alpha/2)\,S/\sqrt{n}\right] = 1 - \alpha.$$
The interval $[\bar X \pm t_{n-1}(\alpha/2)\,S/\sqrt{n}]$ is the $100(1-\alpha)\%$ confidence interval for $\mu$.
Properties of confidence intervals for $\mu$:
The center is $\bar X = \hat\mu_{MLE}$.
The width is proportional to $\hat\sigma_{\hat\mu_{MLE}} = S/\sqrt{n}$ (random!).
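A matching sketch for the $t$-interval, with the same hypothetical data as before (scipy's $t$ quantile plays the role of $t_{n-1}(\alpha/2)$):

```python
# Unknown-variance t-interval for a normal mean (sketch following the slide).
import numpy as np
from scipy import stats

x = np.array([4.9, 5.3, 5.1, 4.7, 5.4, 5.0, 5.2, 4.8])  # hypothetical data
n, alpha = len(x), 0.05

t = stats.t.ppf(1 - alpha / 2, df=n - 1)   # t_{n-1}(alpha/2)
s = x.std(ddof=1)                          # S, with the 1/(n-1) divisor
lo = x.mean() - t * s / np.sqrt(n)
hi = x.mean() + t * s / np.sqrt(n)
print(f"95% CI for mu: [{lo:.3f}, {hi:.3f}]")
```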

Confidence Interval for a Normal Variance

Normal distribution with unknown mean $\mu$ and variance $\sigma^2$:
$X_1, \ldots, X_n$ i.i.d. $N(\mu, \sigma^2)$, with MLEs
$$\hat\mu = \bar X, \qquad \hat\sigma^2 = \frac{1}{n}\sum_{i=1}^n (X_i - \bar X)^2.$$
Confidence interval for $\sigma^2$ based on the sampling distribution of the MLE $\hat\sigma^2$:
$$\Omega = \frac{n\hat\sigma^2}{\sigma^2} \sim \chi^2_{n-1},$$
where $\chi^2_{n-1}$ is the chi-squared distribution with $n-1$ degrees of freedom.
Define $\chi^2_{n-1}(\alpha^*)$ by $P(\chi^2_{n-1} > \chi^2_{n-1}(\alpha^*)) = \alpha^*$.
Using $\alpha^* = \alpha/2$ and $\alpha^* = 1 - \alpha/2$,
$$P\left[\chi^2_{n-1}(1-\alpha/2) < \Omega < \chi^2_{n-1}(\alpha/2)\right] = 1 - \alpha,$$
i.e.,
$$P\left[\chi^2_{n-1}(1-\alpha/2) < \frac{n\hat\sigma^2}{\sigma^2} < \chi^2_{n-1}(\alpha/2)\right] = 1 - \alpha.$$

Confidence Interval for a Normal Variance

Re-express the interval for $\Omega$ as an interval for $\sigma^2$:
$$P\left[\chi^2_{n-1}(1-\alpha/2) < \frac{n\hat\sigma^2}{\sigma^2} < \chi^2_{n-1}(\alpha/2)\right]
= P\left[\frac{n\hat\sigma^2}{\chi^2_{n-1}(\alpha/2)} < \sigma^2 < \frac{n\hat\sigma^2}{\chi^2_{n-1}(1-\alpha/2)}\right] = 1 - \alpha.$$
The $100(1-\alpha)\%$ confidence interval for $\sigma^2$ is given by
$$\left[\frac{n\hat\sigma^2}{\chi^2_{n-1}(\alpha/2)},\;\; \frac{n\hat\sigma^2}{\chi^2_{n-1}(1-\alpha/2)}\right].$$
Properties of the confidence interval for $\sigma^2$:
It is asymmetrical about the MLE $\hat\sigma^2$.
Its width is proportional to $\hat\sigma^2$ (random!).
A $100(1-\alpha)\%$ confidence interval for $\sigma$ is immediate:
$$\left[\hat\sigma\sqrt{\frac{n}{\chi^2_{n-1}(\alpha/2)}},\;\; \hat\sigma\sqrt{\frac{n}{\chi^2_{n-1}(1-\alpha/2)}}\right].$$
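And a sketch for the variance interval (same hypothetical data; note that the slides' $\chi^2_{n-1}(\alpha^*)$ is an upper-tail point, i.e. `ppf(1 - alpha_star)` in scipy):

```python
# Chi-squared interval for a normal variance (sketch following the slide).
import numpy as np
from scipy import stats

x = np.array([4.9, 5.3, 5.1, 4.7, 5.4, 5.0, 5.2, 4.8])  # hypothetical data
n, alpha = len(x), 0.05

sigma2_hat = x.var(ddof=0)                          # MLE, with the 1/n divisor
upper = stats.chi2.ppf(1 - alpha / 2, df=n - 1)     # chi^2_{n-1}(alpha/2)
lower = stats.chi2.ppf(alpha / 2, df=n - 1)         # chi^2_{n-1}(1 - alpha/2)
lo, hi = n * sigma2_hat / upper, n * sigma2_hat / lower
print(f"95% CI for sigma^2: [{lo:.4f}, {hi:.4f}]")
print(f"95% CI for sigma  : [{np.sqrt(lo):.4f}, {np.sqrt(hi):.4f}]")
```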

Confidence Intervals for Normal Distribution Parameters

Important Features of the Normal Distribution Case
Required use of the exact sampling distributions of the MLEs $\hat\mu$ and $\hat\sigma^2$.
Construction of each confidence interval is based on a pivotal quantity: a function of the data and the parameters whose distribution does not involve any unknown parameters.
Examples of pivotals:
$$T = \frac{\bar X - \mu}{S/\sqrt{n}} \sim t_{n-1}, \qquad \Omega = \frac{n\hat\sigma^2}{\sigma^2} \sim \chi^2_{n-1}.$$

Confidence Intervals Based on Large Sample Theory

Asymptotic Framework (Recap)
Data Model: $X_n = (X_1, X_2, \ldots, X_n)$, an i.i.d. sample with joint pdf/pmf
$$f(x_1, \ldots, x_n \mid \theta) = \prod_{i=1}^n f(x_i \mid \theta)$$
Likelihood of $\theta$ given $X_n = x_n = (x_1, \ldots, x_n)$: $\mathrm{lik}(\theta) = f(x_1, \ldots, x_n \mid \theta)$
$\hat\theta_n = \hat\theta_n(x_n)$: MLE of $\theta$ given $x_n = (x_1, \ldots, x_n)$
$\{\hat\theta_n,\; n \to \infty\}$: sequence of MLEs indexed by sample size $n$

Results (subject to sufficient smoothness conditions on $f$):
Consistency: $\hat\theta_n \xrightarrow{P} \theta$
Asymptotic Variance: $\sigma^2_{\hat\theta_n} = \mathrm{Var}(\hat\theta_n) \approx \dfrac{1}{nI(\theta)}$,
where $I(\theta) = E\left[\left(\frac{d}{d\theta}\log f(x \mid \theta)\right)^2\right] = -E\left[\frac{d^2}{d\theta^2}\log f(x \mid \theta)\right]$
Limiting Distribution: $\sqrt{nI(\theta)}\,(\hat\theta_n - \theta) \xrightarrow{L} N(0, 1)$

Confidence Intervals Based on Large Sample Theory

Large-Sample Confidence Interval
Exploit the limiting pivotal quantity
$$Z_n = \sqrt{nI(\theta)}\,(\hat\theta_n - \theta) \xrightarrow{L} N(0, 1).$$
That is,
$$P(-z(\alpha/2) < Z_n < +z(\alpha/2)) \approx 1 - \alpha$$
$$P\left(-z(\alpha/2) < \sqrt{nI(\theta)}\,(\hat\theta_n - \theta) < +z(\alpha/2)\right) \approx 1 - \alpha$$
$$P\left(-z(\alpha/2) < \sqrt{nI(\hat\theta_n)}\,(\hat\theta_n - \theta) < +z(\alpha/2)\right) \approx 1 - \alpha$$
Note (!): $I(\hat\theta_n)$ is substituted for $I(\theta)$.
Re-express the interval for $Z_n$ as an interval for $\theta$:
$$P\left(\hat\theta_n - \frac{z(\alpha/2)}{\sqrt{nI(\hat\theta_n)}} < \theta < \hat\theta_n + \frac{z(\alpha/2)}{\sqrt{nI(\hat\theta_n)}}\right) \approx 1 - \alpha.$$
The interval $\left[\hat\theta_n \pm \dfrac{z(\alpha/2)}{\sqrt{nI(\hat\theta_n)}}\right]$ is the $100(1-\alpha)\%$ confidence interval (large sample) for $\theta$.
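The whole recipe fits in a few lines. A sketch of a generic helper (the name `mle_ci` is ours, not from the slides), where `info` is the per-observation Fisher information evaluated at the MLE:

```python
# Generic large-sample CI: theta_hat +/- z(alpha/2) / sqrt(n * I(theta_hat)).
# A sketch; `info` is the per-observation Fisher information at the MLE.
import numpy as np
from scipy import stats

def mle_ci(theta_hat: float, info: float, n: int, alpha: float = 0.05):
    """Approximate 100(1-alpha)% CI based on asymptotic normality of the MLE."""
    z = stats.norm.ppf(1 - alpha / 2)
    se = 1 / np.sqrt(n * info)
    return theta_hat - z * se, theta_hat + z * se
```

For the Poisson example that follows, $I(\hat\lambda) = 1/\hat\lambda$, so `mle_ci(0.61, 1/0.61, 200)` reproduces the horse-kick interval two slides below.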

Large-Sample Confidence Intervals

Example 8.5.B. Poisson Distribution
$X_1, \ldots, X_n$ i.i.d. Poisson($\lambda$), $f(x \mid \lambda) = \dfrac{\lambda^x}{x!}e^{-\lambda}$
$$\ell(\lambda) = \sum_{i=1}^n \left[x_i \ln(\lambda) - \lambda - \ln(x_i!)\right]$$
The MLE $\hat\lambda$ solves
$$\frac{d\ell(\lambda)}{d\lambda} = \sum_{i=1}^n \left[\frac{x_i}{\lambda} - 1\right] = 0
\;\Rightarrow\; \hat\lambda = \bar X.$$
$$I(\lambda) = -E\left[\frac{d^2}{d\lambda^2}\log f(x \mid \lambda)\right] = E\left[\frac{X}{\lambda^2}\right] = \frac{1}{\lambda}$$
$$Z_n = \sqrt{nI(\hat\lambda)}\,(\hat\lambda - \lambda) = \frac{\hat\lambda - \lambda}{\sqrt{\hat\lambda/n}} \xrightarrow{L} N(0, 1)$$
Large-Sample Confidence Interval for $\lambda$: the approximate $100(1-\alpha)\%$ confidence interval is
$$[\hat\lambda \pm z(\alpha/2)\,\hat\sigma_{\hat\lambda}] = \left[\bar X \pm z(\alpha/2)\sqrt{\frac{\bar X}{n}}\right].$$

Confidence Interval for Poisson Parameter

Deaths by Horse Kick in the Prussian Army (Bortkiewicz, 1898)
Annual counts of fatalities in 10 corps of the Prussian cavalry over a period of 20 years: n = 200 corps-years worth of data.

Annual Fatalities   Observed
0                   109
1                    65
2                    22
3                     3
4                     1

Model $X_1, \ldots, X_{200}$ as i.i.d. Poisson($\lambda$).
$$\hat\lambda_{MLE} = \bar X = \frac{122}{200} = 0.61, \qquad \hat\sigma_{\hat\lambda_{MLE}} = \sqrt{\hat\lambda_{MLE}/n} = 0.0552$$
For a 95% confidence interval, $z(\alpha/2) = 1.96$, giving
$$0.61 \pm (1.96)(0.0552) = [0.5018,\, 0.7182].$$
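The slide's numbers can be reproduced directly from the frequency table; a sketch in Python:

```python
# Poisson CI for the horse-kick data (reproducing the slide's numbers).
import numpy as np
from scipy import stats

counts = {0: 109, 1: 65, 2: 22, 3: 3, 4: 1}        # observed table
n = sum(counts.values())                            # 200 corps-years
total = sum(k * v for k, v in counts.items())       # 122 fatalities

lam_hat = total / n                                 # 0.61
se = np.sqrt(lam_hat / n)                           # 0.0552
z = stats.norm.ppf(0.975)                           # 1.96
print(f"lambda_hat = {lam_hat:.2f}, 95% CI = "
      f"[{lam_hat - z * se:.4f}, {lam_hat + z * se:.4f}]")  # ~[0.5018, 0.7182]
```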

Confidence Interval for Multinomial Parameter

Definition: Multinomial Distribution
$W_1, \ldots, W_n$ are i.i.d. Multinomial($1$, probs $= (p_1, \ldots, p_m)$) random variables.
The sample space of each $W_i$ is $\mathcal{W} = \{1, 2, \ldots, m\}$, a set of $m$ distinct outcomes, with $P(W_i = k) = p_k$, $k = 1, 2, \ldots, m$.
Define count statistics from the multinomial sample:
$$X_k = \sum_{i=1}^n 1(W_i = k) \;\text{(sum of indicators of outcome } k\text{)}, \qquad k = 1, \ldots, m.$$
$X = (X_1, \ldots, X_m)$ follows a multinomial distribution:
$$f(x_1, \ldots, x_m \mid p_1, \ldots, p_m) = \frac{n!}{\prod_{j=1}^m x_j!}\prod_{j=1}^m p_j^{x_j},$$
where $(p_1, \ldots, p_m)$ is the vector of cell probabilities with $\sum_{i=1}^m p_i = 1$ and $n = \sum_{j=1}^m x_j$ is the total count.
Note: for $m = 2$, the $W_i$ are Bernoulli($p_1$), $X_1$ is Binomial($n, p_1$), and $X_2 = n - X_1$.

MLEs of Multinomial Parameter

Maximum Likelihood Estimation for the Multinomial
Log likelihood of the counts:
$$\ell(p_1, \ldots, p_m) = \log[f(x_1, \ldots, x_m \mid p_1, \ldots, p_m)]
= \log(n!) - \sum_{j=1}^m \log(x_j!) + \sum_{j=1}^m x_j \log(p_j)$$
Note: the log likelihood of the multinomial sample $w_1, \ldots, w_n$ is
$$\ell^*(p_1, \ldots, p_m) = \log[f(w_1, \ldots, w_n \mid p_1, \ldots, p_m)]
= \sum_{i=1}^n \sum_{j=1}^m 1(w_i = j)\log(p_j) = \sum_{j=1}^m x_j \log(p_j).$$
The maximum likelihood estimate of $(p_1, \ldots, p_m)$ maximizes $\ell(p_1, \ldots, p_m)$ (with $x_1, \ldots, x_m$ fixed!).
The maximum is achieved where the differential is zero, subject to the constraint $\sum_{j=1}^m p_j = 1$; apply the method of Lagrange multipliers (worked out below).
Solution: $\hat p_j = x_j/n$, $j = 1, \ldots, m$.
Note: if any $x_j = 0$, then $\hat p_j = 0$ is obtained as a limit.
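Writing out the Lagrange-multiplier step referenced above:

```latex
% Maximize \sum_j x_j \log p_j subject to \sum_j p_j = 1.
\begin{align*}
L(p_1,\dots,p_m,\lambda) &= \sum_{j=1}^m x_j \log p_j
  - \lambda\Bigl(\sum_{j=1}^m p_j - 1\Bigr), \\
\frac{\partial L}{\partial p_j} = \frac{x_j}{p_j} - \lambda = 0
  &\;\Longrightarrow\; p_j = \frac{x_j}{\lambda}, \\
\sum_{j=1}^m p_j = 1
  &\;\Longrightarrow\; \lambda = \sum_{j=1}^m x_j = n,
  \qquad \hat p_j = \frac{x_j}{n}.
\end{align*}
```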

MLEs of Multinomial Parameter

Example 8.5.1.A Hardy-Weinberg Equilibrium
Equilibrium frequency of genotypes AA, Aa, and aa.
$P(a) = \theta$ and $P(A) = 1 - \theta$.
Equilibrium probabilities of the genotypes: $(1-\theta)^2$, $2\theta(1-\theta)$, and $\theta^2$.
Multinomial data: $(X_1, X_2, X_3)$, the counts of AA, Aa, and aa in a sample of size $n$.

Sample Data
Genotype    AA    Aa    aa    Total
Count       X_1   X_2   X_3   n
Frequency   342   500   187   1029

Hardy-Weinberg Equilibrium

Maximum-Likelihood Estimation of $\theta$
$(X_1, X_2, X_3) \sim$ Multinomial($n$, $p = ((1-\theta)^2,\; 2\theta(1-\theta),\; \theta^2)$)
Log likelihood for $\theta$:
$$\begin{aligned}
\ell(\theta) &= \log f(x_1, x_2, x_3 \mid p_1(\theta), p_2(\theta), p_3(\theta)) \\
&= \log\left(\frac{n!}{x_1!\,x_2!\,x_3!}\, p_1(\theta)^{x_1} p_2(\theta)^{x_2} p_3(\theta)^{x_3}\right) \\
&= x_1 \log((1-\theta)^2) + x_2 \log(2\theta(1-\theta)) + x_3 \log(\theta^2) + (\text{non-}\theta\text{ terms}) \\
&= (2x_1 + x_2)\log(1-\theta) + (2x_3 + x_2)\log(\theta) + (\text{non-}\theta\text{ terms})
\end{aligned}$$
First differential of the log likelihood:
$$\ell'(\theta) = -\frac{2x_1 + x_2}{1-\theta} + \frac{2x_3 + x_2}{\theta} = 0
\;\Rightarrow\; \hat\theta = \frac{2x_3 + x_2}{2x_1 + 2x_2 + 2x_3} = \frac{2x_3 + x_2}{2n} = \frac{2(187) + 500}{2(1029)} = 0.4247$$

Hardy-Weinberg Equilibrium

Asymptotic variance of the MLE $\hat\theta$:
$$\mathrm{Var}(\hat\theta) \approx \frac{1}{E[-\ell''(\theta)]}$$
Second differential of the log likelihood:
$$\ell''(\theta) = \frac{d}{d\theta}\left[-\frac{2x_1 + x_2}{1-\theta} + \frac{2x_3 + x_2}{\theta}\right]
= -\frac{2x_1 + x_2}{(1-\theta)^2} - \frac{2x_3 + x_2}{\theta^2}$$
Each $X_i$ is Binomial($n, p_i(\theta)$), so
$$E[X_1] = np_1(\theta) = n(1-\theta)^2, \quad E[X_2] = np_2(\theta) = 2n\theta(1-\theta), \quad E[X_3] = np_3(\theta) = n\theta^2$$
$$E[-\ell''(\theta)] = \frac{2n}{\theta(1-\theta)}
\;\Rightarrow\;
\hat\sigma_{\hat\theta} = \sqrt{\frac{\hat\theta(1-\hat\theta)}{2n}} = \sqrt{\frac{0.4247\,(1 - 0.4247)}{2 \times 1029}} = 0.0109$$

Hardy-Weinberg Model

Approximate 95% Confidence Interval for $\theta$
Interval: $\hat\theta \pm z(\alpha/2)\,\hat\sigma_{\hat\theta}$, with
$\hat\theta = 0.4247$, $\hat\sigma_{\hat\theta} = 0.0109$, and $z(\alpha/2) = 1.96$ (with $\alpha = 1 - 0.95 = 0.05$):
$$0.4247 \pm 1.96 \times (0.0109) = [0.4033,\, 0.4461]$$
Note (!!): In a bootstrap simulation in R of $\hat\theta$, the RMSE($\hat\theta$) $= 0.0109$ is virtually equal to $\hat\sigma_{\hat\theta}$.
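The slide refers to a bootstrap simulation in R; a parametric-bootstrap analogue in Python (a sketch reproducing the comparison, not the original R code) looks like this:

```python
# Parametric bootstrap for the Hardy-Weinberg MLE (Python analogue of the
# R simulation mentioned on the slide; a sketch, not the original code).
import numpy as np

rng = np.random.default_rng(0)
x = np.array([342, 500, 187])                    # AA, Aa, aa counts
n = x.sum()                                      # 1029
theta_hat = (2 * x[2] + x[1]) / (2 * n)          # 0.4247

p_hat = np.array([(1 - theta_hat) ** 2,
                  2 * theta_hat * (1 - theta_hat),
                  theta_hat ** 2])               # fitted cell probabilities

boot = rng.multinomial(n, p_hat, size=10_000)    # bootstrap count tables
theta_boot = (2 * boot[:, 2] + boot[:, 1]) / (2 * n)

rmse = np.sqrt(np.mean((theta_boot - theta_hat) ** 2))
print(f"bootstrap RMSE = {rmse:.4f}")            # close to 0.0109
```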

MIT OpenCourseWare
http://ocw.mit.edu
18.443 Statistics for Applications, Spring 2015
For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.