Chapter 3: Maximum Likelihood Theory

Chapter 3: Maximum Likelihood Theory. Florian Pelgrin (HEC), September-December 2010.

Outline:
1. Introduction: Example
2. Maximum likelihood estimator: Notation; Likelihood and log-likelihood; Maximum likelihood principle; Equivariance principle
3. Fisher information: Score vector; Fisher information matrix
4. Asymptotic results: Overview; Consistency; Asymptotic efficiency; Large sample distribution; Back to the equivariance principle

1. Introduction

Example 1: Suppose that $Y_1, Y_2, \dots, Y_n$ are i.i.d. random variables with $Y_i \sim \mathcal{B}(p)$:
$$Y_i = \begin{cases} 1 & \text{with probability } p \\ 0 & \text{with probability } 1-p \end{cases}$$
where $p$ is an unknown parameter to estimate. The sample $(y_1, y_2, \dots, y_n)$ is observed. This is an explicit assumption regarding the distribution of $Y_i$. Can we find an estimate (estimator) of $p$?

Example 1 (cont'd): The joint distribution of the sample is:
$$P\left(\bigcap_{i=1}^{n}(Y_i = y_i)\right) = P\big((Y_1 = y_1) \cap (Y_2 = y_2) \cap \dots \cap (Y_n = y_n)\big) = \prod_{i=1}^{n} P(Y_i = y_i) = \prod_{i=1}^{n} p^{y_i}(1-p)^{1-y_i} = p^{\sum_i y_i}(1-p)^{n - \sum_i y_i}.$$
The likelihood function is the joint density of the data, except that we treat it as a function of the parameter:
$$L(p \mid y) \equiv L(y; p) = \prod_{i=1}^{n} p^{y_i}(1-p)^{1-y_i}.$$
It describes the likely values of the unknown parameter given the realizations of the random variables.

Example 1 (cont'd): Suppose that two estimates of $p$, given by $\hat{p}_{1,n}(y)$ and $\hat{p}_{2,n}(y)$, are such that
$$L_n(y; \hat{p}_{1,n}(y)) > L_n(y; \hat{p}_{2,n}(y)).$$
The sample we observe, $y = (y_1, \dots, y_n)$, is more likely to have occurred if $p = \hat{p}_{1,n}(y)$ than if $p = \hat{p}_{2,n}(y)$: $\hat{p}_{1,n}(y)$ is the more plausible value.

Example 1 (cont'd): Under suitable regularity conditions, the maximum likelihood estimate (estimator) is defined to be:
$$\hat{p} = \underset{p}{\operatorname{argmax}}\, L(y; p) = \underset{p}{\operatorname{argmax}}\, l(y; p)$$
where $l(y; p) = \log(L(y; p))$ is the log-likelihood function. The maximum likelihood estimate is $\hat{p}(y) = \frac{1}{n}\sum_{i=1}^{n} y_i$; the maximum likelihood estimator is $\hat{p} = \frac{1}{n}\sum_{i=1}^{n} Y_i$.
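As a quick numerical illustration of the Bernoulli example, the following sketch (assuming NumPy and SciPy are available; the sample is simulated purely for illustration) maximizes the log-likelihood numerically and checks that the maximizer matches the closed-form MLE, the sample mean:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Illustrative Bernoulli sample (not from the lecture)
rng = np.random.default_rng(0)
y = rng.binomial(1, 0.3, size=200)

def neg_log_lik(p):
    # l(y; p) = sum_i [ y_i log p + (1 - y_i) log(1 - p) ]
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

res = minimize_scalar(neg_log_lik, bounds=(1e-6, 1 - 1e-6), method="bounded")
print(res.x, y.mean())   # numerical maximizer vs. closed-form MLE (sample mean)
```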

How do we apply the maximum likelihood principle to the multiple linear regression model? What are the main properties of the maximum likelihood estimator? Is it asymptotically unbiased? Is it asymptotically efficient, and under which condition(s)? Is it consistent? What is its asymptotic distribution? What are the main properties of any transformation of the estimator, say $\theta = g(p)$? All of these questions are answered in this lecture.

2. Maximum likelihood estimator
2.1 Notation

Consider the multiple linear regression model:
$$y_i = x_i' b + u_i$$
where the error terms are spherical and the observations $(y_i, x_i)$, $i = 1, \dots, n$, are i.i.d. The joint density function is given by $f(y_i, x_i) \equiv L_i(y_i, x_i; \theta) \equiv L(y_i, x_i; \theta)$, where $\theta = (b, \sigma^2)$. By definition,
$$f(y_i, x_i) = f(y_i \mid x_i) f(x_i)$$
where $f(y_i \mid x_i)$ is the conditional density of $Y_i \mid X_i = x_i$ and $f(x_i)$ is the marginal density of $X_i$.

To obtain the (log-)likelihood function, one needs parametric assumptions:
1. One can specify the conditional distribution of $u \mid X$, i.e. the conditional distribution of $Y \mid X$:
$$u \mid X \sim \mathcal{N}(0_{n \times 1}, \sigma^2 I_n), \quad \text{i.e.} \quad Y \mid X \sim \mathcal{N}(Xb, \sigma^2 I_n).$$
2. One can specify the joint (multivariate) distribution of $(X, Y)$ and the marginal (multivariate) distribution of $X$:
$$\begin{pmatrix} Y \\ X \end{pmatrix} \sim \mathcal{N}\left( \begin{pmatrix} \mathbb{E}Y \\ \mathbb{E}X \end{pmatrix}, \begin{pmatrix} \Sigma_{yy} & \Sigma_{yx} \\ \Sigma_{xy} & \Sigma_{xx} \end{pmatrix} \right) \quad \text{with} \quad X \sim \mathcal{N}(\mathbb{E}X, \Sigma_{xx}).$$

In the first case (conditional distribution), the estimator of $\theta$ can be obtained from the conditional (log-)likelihood function. In the second case (joint distribution), the estimator of $\theta$ can be derived from the joint (log-)likelihood function. The joint likelihood function is the product of the conditional likelihood function and the marginal likelihood function (the information provided by the marginal distribution of $X$). The joint log-likelihood function is the sum of the conditional log-likelihood function and the marginal log-likelihood function.

The conditional and marginal (log-)likelihood functions (and thus the joint and conditional (log-)likelihood functions) are conceptually different, and so are the two corresponding estimators (especially in finite samples). Choosing one or the other depends on the empirical setting. For instance, the distribution of the sample data can be conditionally normal but not jointly normal (e.g., the variables $X$ are arbitrarily determined in some experimental settings). In the sequel, we only consider the conditional maximum likelihood estimator (under the assumption of independent samples).

2.2 Likelihood and log-likelihood

Definition. The (conditional) likelihood function is defined to be:
$$L_n : \mathcal{Y} \times \Theta \to [0, +\infty), \quad ((y, x), \theta) \mapsto L_n(y \mid x; \theta) = \prod_{i=1}^{n} L_i(y_i \mid x_i; \theta).$$
Remark: The conditional likelihood function is the joint conditional density of the data in which the unknown parameter is $\theta$.

Definition. The (conditional) log-likelihood function is defined to be:
$$l_n : \mathcal{Y} \times \Theta \to \mathbb{R}, \quad ((y, x), \theta) \mapsto l_n(y \mid x; \theta) = \sum_{i=1}^{n} \log L_i(y_i \mid x_i; \theta).$$

Application: the multiple linear regression model. Under the conditional normality assumption,
$$f(y_i \mid x_i; \theta) \equiv L_i(y_i \mid x_i; \theta) = (2\pi\sigma^2)^{-\frac{1}{2}} \exp\left( -\frac{(y_i - x_i' b)^2}{2\sigma^2} \right).$$
Therefore
$$L_n(y \mid x; \theta) = \prod_{i=1}^{n} L_i(y_i \mid x_i; \theta) = (2\pi\sigma^2)^{-\frac{n}{2}} \exp\left( -\frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - x_i' b)^2 \right)$$
and
$$l_n(y \mid x; \theta) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log(\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - x_i' b)^2.$$
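To make the formula concrete, here is a minimal sketch (assuming NumPy and SciPy; the data and parameter values are made up for illustration) that evaluates the conditional log-likelihood directly and cross-checks it against a sum of normal log-densities:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
n, k = 100, 3
X = rng.normal(size=(n, k))
b_true, sigma2_true = np.array([1.0, -0.5, 2.0]), 0.5
y = X @ b_true + rng.normal(scale=np.sqrt(sigma2_true), size=n)

def log_lik(b, sigma2, y, X):
    # l_n(y|x; theta) = -n/2 log(2*pi*sigma^2) - (1/(2 sigma^2)) * sum (y_i - x_i'b)^2
    resid = y - X @ b
    return -0.5 * len(y) * np.log(2 * np.pi * sigma2) - resid @ resid / (2 * sigma2)

# Cross-check against the sum of N(x_i'b, sigma^2) log-densities
b, s2 = np.array([0.8, -0.3, 1.5]), 0.7
print(log_lik(b, s2, y, X), norm.logpdf(y, loc=X @ b, scale=np.sqrt(s2)).sum())
```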

2.3 Maximum likelihood principle

Definition. A maximum likelihood estimator of $\theta \in \Theta \subseteq \mathbb{R}^k$ is a solution to the maximization problem:
$$\hat{\theta}_n = \underset{\theta \in \Theta}{\operatorname{argmax}}\, L_n(\theta) \quad \text{or} \quad \hat{\theta}_n = \underset{\theta \in \Theta}{\operatorname{argmax}}\, l_n(\theta).$$

The maximum likelihood principle (cont'd): using the first-order conditions.

Definition. Under suitable regularity conditions, a maximum likelihood estimator of $\theta \in \Theta \subseteq \mathbb{R}^k$ is defined to be the solution of the first-order conditions (likelihood or log-likelihood equations):
$$\frac{\partial L_n}{\partial \theta}(y \mid x; \hat{\theta}_n) = 0_{k \times 1} \quad \text{or} \quad \frac{\partial l_n}{\partial \theta}(y \mid x; \hat{\theta}_n) = 0_{k \times 1}.$$
Remark: Regularity conditions are fundamental!

Application: the multiple linear regression model (cont'd). Under suitable regularity conditions, the first-order conditions are given by:
$$\begin{cases} \dfrac{\partial l_n}{\partial b}(y \mid x; \hat{\theta}_n) = 0_{k \times 1} \\[4pt] \dfrac{\partial l_n}{\partial \sigma^2}(y \mid x; \hat{\theta}_n) = 0 \end{cases}
\quad \Longleftrightarrow \quad
\begin{cases} \dfrac{1}{\hat{\sigma}_n^2} \displaystyle\sum_{i=1}^{n} x_i (y_i - x_i' \hat{b}_n) = 0_{k \times 1} \\[4pt] -\dfrac{n}{2\hat{\sigma}_n^2} + \dfrac{1}{2\hat{\sigma}_n^4} \displaystyle\sum_{i=1}^{n} (y_i - x_i' \hat{b}_n)^2 = 0. \end{cases}$$
The maximum likelihood estimate of $\theta$ is:
$$\hat{b}_n = \left( \sum_{i=1}^{n} x_i x_i' \right)^{-1} \sum_{i=1}^{n} x_i y_i, \qquad \hat{\sigma}_n^2 = \frac{1}{n} \sum_{i=1}^{n} (y_i - x_i' \hat{b}_n)^2.$$
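A minimal numerical sketch of these closed-form solutions (assuming NumPy and SciPy; the data and all names are illustrative): compute $\hat{b}_n$ and $\hat{\sigma}_n^2$ from the formulas and verify that a generic optimizer applied to the negative log-likelihood lands on the same point.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
n, k = 200, 2
X = rng.normal(size=(n, k))
y = X @ np.array([1.0, -1.0]) + rng.normal(scale=0.8, size=n)

# Closed-form MLE: b_hat = (X'X)^{-1} X'y, sigma2_hat = RSS / n
b_hat = np.linalg.solve(X.T @ X, X.T @ y)
sigma2_hat = np.mean((y - X @ b_hat) ** 2)

def neg_log_lik(theta):
    b, log_s2 = theta[:-1], theta[-1]          # parametrize sigma^2 = exp(log_s2) > 0
    s2 = np.exp(log_s2)
    resid = y - X @ b
    return 0.5 * n * np.log(2 * np.pi * s2) + resid @ resid / (2 * s2)

res = minimize(neg_log_lik, x0=np.zeros(k + 1), method="BFGS")
print(b_hat, sigma2_hat)
print(res.x[:-1], np.exp(res.x[-1]))           # should agree up to optimizer tolerance
```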

Second-order conditions: the Hessian matrix evaluated at $\theta = \hat{\theta}_n$ must be negative definite. The Hessian matrix is given by:
$$H = \begin{pmatrix} -\dfrac{1}{\sigma^2} \displaystyle\sum_{i=1}^{n} x_i x_i' & -\dfrac{1}{\sigma^4} \displaystyle\sum_{i=1}^{n} x_i (y_i - x_i' b) \\[4pt] -\dfrac{1}{\sigma^4} \displaystyle\sum_{i=1}^{n} x_i' (y_i - x_i' b) & \dfrac{n}{2\sigma^4} - \dfrac{1}{\sigma^6} \displaystyle\sum_{i=1}^{n} (y_i - x_i' b)^2 \end{pmatrix}$$
and
$$H\big|_{\theta = \hat{\theta}_n} = \begin{pmatrix} -\dfrac{1}{\hat{\sigma}_n^2} \displaystyle\sum_{i=1}^{n} x_i x_i' & 0_{k \times 1} \\ 0_{1 \times k} & -\dfrac{n}{2\hat{\sigma}_n^4} \end{pmatrix}.$$
Given that $X'X$ is positive definite and $\hat{\sigma}_n^2 > 0$, $H\big|_{\theta = \hat{\theta}_n}$ is negative definite and $\hat{\theta}_n$ is a maximum.
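As a quick sanity check on the second-order condition, one can approximate the Hessian of the log-likelihood numerically at the MLE and confirm that all its eigenvalues are negative. A sketch assuming NumPy, with simulated data and a simple central-difference approximation (all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 200, 2
X = rng.normal(size=(n, k))
y = X @ np.array([1.0, -1.0]) + rng.normal(scale=0.8, size=n)
b_hat = np.linalg.solve(X.T @ X, X.T @ y)
sigma2_hat = np.mean((y - X @ b_hat) ** 2)

def log_lik(theta):
    # Log-likelihood in the natural (b, sigma^2) parametrization
    b, s2 = theta[:-1], theta[-1]
    resid = y - X @ b
    return -0.5 * n * np.log(2 * np.pi * s2) - resid @ resid / (2 * s2)

def numerical_hessian(f, x, eps=1e-5):
    # Central-difference Hessian of a scalar function f at point x
    d = len(x)
    H = np.zeros((d, d))
    for i in range(d):
        for j in range(d):
            ei, ej = np.eye(d)[i] * eps, np.eye(d)[j] * eps
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * eps ** 2)
    return H

theta_hat = np.append(b_hat, sigma2_hat)
H = numerical_hessian(log_lik, theta_hat)
print(np.linalg.eigvalsh(H))   # all eigenvalues should be negative at the MLE
```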

2.4 Equivariance principle

Definition. Under suitable regularity conditions, the maximum likelihood estimator of a function $g(\theta)$ of the parameter $\theta$ is $g(\hat{\theta}_n)$, where $\hat{\theta}_n$ is the maximum likelihood estimator of $\theta$.

Example: Suppose $Y_1, \dots, Y_n$ are i.i.d. $\mathcal{E}(\theta)$ (exponential with rate $\theta$). The likelihood function is:
$$L_n(y; \theta) = \prod_{i=1}^{n} \theta \exp(-\theta y_i) = \theta^n \exp\left( -\theta \sum_{i=1}^{n} y_i \right).$$
One gets (the second-order conditions hold):
$$\hat{\theta}_n = \frac{1}{\bar{Y}_n}, \qquad \hat{\theta}_n(y) = \frac{1}{\bar{y}_n}.$$
Consider now the same distribution parametrized by its mean $\lambda = 1/\theta$, with probability density function:
$$f_{Y_i}(y_i; \lambda) = \frac{1}{\lambda} \exp\left( -\frac{y_i}{\lambda} \right).$$

Example (continued): The log-likelihood function is:
$$l(y; \lambda) = -n \log(\lambda) - \frac{1}{\lambda} \sum_{i=1}^{n} y_i.$$
The first-order condition with respect to $\lambda$ is:
$$-\frac{n}{\lambda} + \frac{1}{\lambda^2} \sum_{i=1}^{n} y_i = 0.$$
Since the second-order condition holds, one gets (as is to be expected!):
$$\hat{\lambda}_n = \bar{Y}_n = \frac{1}{\hat{\theta}_n}, \qquad \hat{\lambda}_n(y) = \bar{y}_n = \frac{1}{\hat{\theta}_n(y)}.$$
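A small numerical illustration of this equivariance (assuming NumPy and SciPy; the exponential sample is simulated for illustration): maximizing the likelihood in the rate parametrization and in the mean parametrization yields reciprocal estimates.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(3)
y = rng.exponential(scale=2.0, size=500)   # mean lambda = 2, rate theta = 0.5

# Rate parametrization: l(y; theta) = n log(theta) - theta * sum(y)
nll_rate = lambda t: -(len(y) * np.log(t) - t * y.sum())
# Mean parametrization: l(y; lam) = -n log(lam) - sum(y) / lam
nll_mean = lambda lam: len(y) * np.log(lam) + y.sum() / lam

theta_hat = minimize_scalar(nll_rate, bounds=(1e-8, 50), method="bounded").x
lambda_hat = minimize_scalar(nll_mean, bounds=(1e-8, 50), method="bounded").x
print(theta_hat, 1 / lambda_hat, 1 / y.mean())   # all three should coincide
```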

3. Fisher information
3.1 Score vector

Definition. The score vector, $s$, is the vector formed by the first (partial) derivatives of the (conditional) log-likelihood with respect to the parameters $\theta \in \Theta \subseteq \mathbb{R}^k$:
$$s(\theta) \equiv \frac{\partial l_n}{\partial \theta}(Y \mid x; \theta) = \left( \frac{\partial l_n}{\partial \theta_i}(Y \mid x; \theta) \right)_{1 \le i \le k}.$$
It satisfies:
$$\mathbb{E}_\theta\left[ \frac{\partial l_n}{\partial \theta}(Y \mid x; \theta) \right] = 0_{k \times 1}, \quad \forall x, \theta.$$
Remark: $\mathbb{E}_\theta$ denotes the expectation with respect to the conditional distribution of $Y \mid X$.

Application: the multiple linear regression model. The score vector is given by:
$$s(\theta) = \begin{pmatrix} \dfrac{1}{\sigma^2} \displaystyle\sum_{i=1}^{n} x_i (Y_i - x_i' b) \\[4pt] -\dfrac{n}{2\sigma^2} + \dfrac{1}{2\sigma^4} \displaystyle\sum_{i=1}^{n} (Y_i - x_i' b)^2 \end{pmatrix}$$
and $\mathbb{E}_\theta[s(\theta)] = 0_{(k+1) \times 1}$ since:
$$\mathbb{E}_\theta\left[ \frac{\partial l_n}{\partial b}(Y \mid x; b, \sigma^2) \right] = \frac{1}{\sigma^2} \sum_{i=1}^{n} x_i \big( \mathbb{E}_\theta(Y_i \mid x_i) - x_i' b \big) = 0_{k \times 1}$$
and
$$\mathbb{E}_\theta\left[ -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4} \sum_{i=1}^{n} (Y_i - \mathbb{E}_\theta(Y_i \mid x_i))^2 \right] = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4} \sum_{i=1}^{n} \underbrace{\mathbb{E}_\theta\big[(Y_i - \mathbb{E}(Y_i \mid x_i))^2\big]}_{V_\theta(Y_i \mid x_i) = \sigma^2} = 0.$$
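A minimal simulation sketch of the mean-zero property of the score (assuming NumPy; purely illustrative): average the score over many samples drawn from the conditional model at the true parameter and check that the average is close to the zero vector.

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 50, 2
X = rng.normal(size=(n, k))               # the x's are held fixed (we condition on X)
b0, s2_0 = np.array([1.0, 2.0]), 1.5

def score(y, X, b, s2):
    # Score of the conditional log-likelihood at (b, sigma^2)
    resid = y - X @ b
    return np.concatenate([X.T @ resid / s2,
                           [-n / (2 * s2) + resid @ resid / (2 * s2 ** 2)]])

draws = [score(X @ b0 + rng.normal(scale=np.sqrt(s2_0), size=n), X, b0, s2_0)
         for _ in range(20000)]
print(np.mean(draws, axis=0))              # approximately the zero vector
```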

3.2 Fisher information matrix

Definition. The Fisher information matrix at $x$ is the variance-covariance matrix of the score vector:
$$I_F^x = V_\theta\left[ \frac{\partial l_n}{\partial \theta}(Y \mid x; \theta) \right] = \mathbb{E}_\theta\left[ \frac{\partial l_n}{\partial \theta}(Y \mid x; \theta) \cdot \frac{\partial l_n}{\partial \theta'}(Y \mid x; \theta) \right].$$

Definition. The Fisher information matrix at $x$ is also given by:
$$I_F^x = -\mathbb{E}_\theta\left[ \frac{\partial^2 l_n}{\partial \theta \, \partial \theta'}(Y \mid x; \theta) \right].$$
Remarks:
1. Three equivalent definitions of the Fisher information matrix lead to three different consistent estimates of the Fisher information matrix.
2. Their finite sample properties can be quite different!
3. $I_F^x$ can be defined from the Fisher information matrix for observation $i$.

Definition. The Fisher information matrix for observation $i$ (or at $x_i$) can be defined by:
$$\tilde{I}_F^{x_i}(\theta) = V_\theta\left[ \frac{\partial l_i}{\partial \theta}(Y_i \mid x_i; \theta) \right] = \mathbb{E}_\theta\left[ \frac{\partial l_i}{\partial \theta}(Y_i \mid x_i; \theta) \cdot \frac{\partial l_i}{\partial \theta'}(Y_i \mid x_i; \theta) \right] = -\mathbb{E}_\theta\left[ \frac{\partial^2 l_i}{\partial \theta \, \partial \theta'}(Y_i \mid x_i; \theta) \right].$$

Proposition. The Fisher information matrix at $x = (x_1, \dots, x_n)$ (i.e., for $n$ observations) is given by:
$$I_F^x(\theta) = \sum_{i=1}^{n} \tilde{I}_F^{x_i}(\theta).$$
Remark: In a sampling model (with i.i.d. observations), one has $I_F^x(\theta) = n \tilde{I}_F^{x_i}(\theta)$.

Definition. The average Fisher information matrix for one observation is defined by:
$$\tilde{I}_F(\theta) = \operatorname{plim}_{n \to \infty} \frac{1}{n} I_F^X(\theta).$$
Theorem.
(a) $\tilde{I}_F(\theta) = \mathbb{E}_{X_i}\left[ \tilde{I}_F^{X_i}(\theta) \right]$
(b) $\tilde{I}_F(\theta) = \mathbb{E}\left[ \dfrac{\partial l_i}{\partial \theta}(Y_i \mid X_i; \theta) \cdot \dfrac{\partial l_i}{\partial \theta'}(Y_i \mid X_i; \theta) \right]$
(c) $\tilde{I}_F(\theta) = -\mathbb{E}\left[ \dfrac{\partial^2 l_i}{\partial \theta \, \partial \theta'}(Y_i \mid X_i; \theta) \right]$

A consistent estimator of the Fisher information matrix?

Proposition. If $\hat{\theta}_n$ converges in probability to $\theta_0$, then:
$$\hat{I}_F^{(1)}(\hat{\theta}_{n,ML}) = \frac{1}{n} \sum_{i=1}^{n} \tilde{I}_F^{x_i}(\hat{\theta}_{n,ML}),$$
$$\hat{I}_F^{(2)}(\hat{\theta}_{n,ML}) = \frac{1}{n} \sum_{i=1}^{n} \frac{\partial l_i}{\partial \theta}(y_i \mid x_i; \hat{\theta}_{n,ML}) \cdot \frac{\partial l_i}{\partial \theta'}(y_i \mid x_i; \hat{\theta}_{n,ML}),$$
$$\hat{I}_F^{(3)}(\hat{\theta}_{n,ML}) = -\frac{1}{n} \sum_{i=1}^{n} \frac{\partial^2 l_i}{\partial \theta \, \partial \theta'}(y_i \mid x_i; \hat{\theta}_{n,ML}) = -\frac{1}{n} \frac{\partial^2 l_n}{\partial \theta \, \partial \theta'}(y \mid x; \hat{\theta}_{n,ML})$$
are three consistent estimators of the Fisher information matrix.

These three consistent estimators of the Fisher information matrix are asymptotically equivalent, and none of them is preferable to the others on statistical grounds. The main difficulty is that they can have very different finite sample properties (again!), which can lead to different statistical conclusions for the same problem.
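The following sketch (assuming NumPy; data and names are illustrative) computes the three estimators for the linear regression model at the MLE: the average expected information, the outer-product-of-the-gradient (OPG) average, and minus the average Hessian. They agree asymptotically but differ slightly in finite samples.

```python
import numpy as np

rng = np.random.default_rng(5)
n, k = 300, 2
X = rng.normal(size=(n, k))
y = X @ np.array([0.5, 1.5]) + rng.normal(scale=1.0, size=n)

# Conditional MLE of the linear regression model
b_hat = np.linalg.solve(X.T @ X, X.T @ y)
s2_hat = np.mean((y - X @ b_hat) ** 2)
r = y - X @ b_hat                                   # residuals at the MLE

def score_i(xi, ri):
    # Per-observation score evaluated at the MLE
    return np.concatenate([xi * ri / s2_hat, [-1 / (2 * s2_hat) + ri**2 / (2 * s2_hat**2)]])

def info_i(xi):
    # Per-observation (expected) Fisher information at the MLE
    I = np.zeros((k + 1, k + 1))
    I[:k, :k] = np.outer(xi, xi) / s2_hat
    I[k, k] = 1 / (2 * s2_hat**2)
    return I

def neg_hess_i(xi, ri):
    # Minus the per-observation Hessian of the log-likelihood at the MLE
    H = np.zeros((k + 1, k + 1))
    H[:k, :k] = np.outer(xi, xi) / s2_hat
    H[:k, k] = H[k, :k] = xi * ri / s2_hat**2
    H[k, k] = -1 / (2 * s2_hat**2) + ri**2 / s2_hat**3
    return H

I1 = np.mean([info_i(X[i]) for i in range(n)], axis=0)
I2 = np.mean([np.outer(score_i(X[i], r[i]), score_i(X[i], r[i])) for i in range(n)], axis=0)
I3 = np.mean([neg_hess_i(X[i], r[i]) for i in range(n)], axis=0)
print(np.round(I1, 3), np.round(I2, 3), np.round(I3, 3), sep="\n\n")   # close but not identical
```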

Application: the multiple linear regression model. Computation of $\tilde{I}_F(\theta)$. The Hessian matrix of the log-likelihood function for observation $i$ is:
$$\frac{\partial^2 l_i}{\partial \theta \, \partial \theta'}(y_i \mid x_i; \theta) = \begin{pmatrix} -\dfrac{1}{\sigma^2} x_i x_i' & -\dfrac{1}{\sigma^4} x_i (y_i - x_i' b) \\[4pt] -\dfrac{1}{\sigma^4} x_i' (y_i - x_i' b) & \dfrac{1}{2\sigma^4} - \dfrac{1}{\sigma^6} (y_i - x_i' b)^2 \end{pmatrix}.$$
Taking the expectation with respect to the conditional distribution of $Y_i \mid X_i = x_i$:
$$\mathbb{E}_\theta\left[ \frac{\partial^2 l_i}{\partial \theta \, \partial \theta'}(\cdot) \,\Big|\, x_i \right] = \begin{pmatrix} -\dfrac{1}{\sigma^2} x_i x_i' & 0_{k \times 1} \\ 0_{1 \times k} & -\dfrac{1}{2\sigma^4} \end{pmatrix}.$$
Taking the expectation with respect to the distribution of $X_i$:
$$\tilde{I}_F(\theta) = -\mathbb{E}_{X_i}\left[ \mathbb{E}_\theta\left[ \frac{\partial^2 l_i}{\partial \theta \, \partial \theta'}(\cdot) \,\Big|\, X_i \right] \right] = \begin{pmatrix} \dfrac{1}{\sigma^2} \mathbb{E}(X_i X_i') & 0_{k \times 1} \\ 0_{1 \times k} & \dfrac{1}{2\sigma^4} \end{pmatrix}.$$

4. Asymptotic results
4.1 Overview

Under certain regularity conditions, the maximum likelihood estimator $\hat{\theta}_n$ possesses many appealing properties:
1. The maximum likelihood estimator is consistent.
2. The maximum likelihood estimator is asymptotically normal: $\sqrt{n}(\hat{\theta}_n - \theta_0) \overset{d}{\to} \mathcal{N}(\cdot, \cdot)$.
3. The maximum likelihood estimator is asymptotically optimal or efficient.
4. The maximum likelihood estimator is equivariant: if $\hat{\theta}_n$ is an estimator of $\theta_0$ then $g(\hat{\theta}_n)$ is an estimator of $g(\theta_0)$.

At the same time:
- These properties depend on the explicit distributional assumptions regarding $Y_1, \dots, Y_n$.
- Finite sample properties can be very different from large sample properties: the maximum likelihood estimator is consistent but can be severely biased in finite samples, and the estimation of the variance-covariance matrix can be seriously doubtful in finite samples.

4.2 Consistency

Theorem. Under suitable regularity conditions, $\hat{\theta}_{n,ML} \overset{a.s.}{\to} \theta_0$.
Remark: This implies that $\hat{\theta}_{n,ML} \overset{p}{\to} \theta_0$.

4.3 Asymptotic efficiency

Proposition. An unbiased maximum likelihood estimator of $\theta$ or $g(\theta)$ attains the FDCR (Fréchet-Darmois-Cramér-Rao) lower bound and is thus (asymptotically) efficient.

4.4 Large sample distribution

Theorem. Under suitable regularity conditions,
$$\sqrt{n}(\hat{\theta}_{n,ML} - \theta_0) \overset{d}{\to} \mathcal{N}\left(0, \tilde{I}_F^{-1}(\theta_0)\right), \quad \text{i.e.} \quad \hat{\theta}_{n,ML} \overset{a}{\sim} \mathcal{N}\left(\theta_0, n^{-1} \tilde{I}_F^{-1}(\theta_0)\right).$$
Remark: In a sampling model, $I_F(\theta_0)$ does not depend on $x$ and equals the Fisher information matrix for one observation:
$$\sqrt{n}(\hat{\theta}_{n,ML} - \theta_0) \overset{d}{\to} \mathcal{N}\left(0, I_1^{-1}(\theta_0)\right).$$
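A small Monte Carlo sketch of this result for the Bernoulli example from the introduction (assuming NumPy; all values are illustrative): for $Y_i \sim \mathcal{B}(p)$ the information for one observation is $I_1(p) = 1/(p(1-p))$, so $\sqrt{n}(\hat{p}_n - p_0)$ should have variance close to $p_0(1-p_0)$.

```python
import numpy as np

rng = np.random.default_rng(6)
p0, n, reps = 0.3, 200, 20000

p_hat = rng.binomial(1, p0, size=(reps, n)).mean(axis=1)   # MLE = sample mean, per replication
z = np.sqrt(n) * (p_hat - p0)
print(z.var(), p0 * (1 - p0))     # empirical variance vs. I_1(p0)^{-1} = p0 (1 - p0)
```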

Interpretation: For $n$ large, the distribution of $\hat{\theta}_n$ is approximately normal, with expectation the true unknown parameter and variance-covariance matrix the FDCR lower bound. The maximum likelihood estimator is asymptotically unbiased. The maximum likelihood estimator is asymptotically efficient.

4.5 Back to the equivariance principle

Proposition. Assume H1, H2, H3-H8 hold, and $g$ is a continuously differentiable function of $\theta$ defined from $\mathbb{R}^k$ to $\mathbb{R}^p$. Then:
$$g(\hat{\theta}_n) \overset{a.s.}{\to} g(\theta_0)$$
$$\sqrt{n}\left( g(\hat{\theta}_n) - g(\theta_0) \right) \overset{d}{\to} \mathcal{N}\left( 0, \frac{\partial g}{\partial \theta'}(\theta_0) \, \tilde{I}_F^{-1}(\theta_0) \, \left[ \frac{\partial g}{\partial \theta'}(\theta_0) \right]' \right).$$
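As an illustration of this delta-method result in the exponential example (assuming NumPy; simulated, illustrative values): with $\hat{\lambda}_n = \bar{Y}_n$ and $g(\lambda) = 1/\lambda$, the asymptotic variance of $\sqrt{n}(1/\hat{\lambda}_n - 1/\lambda_0)$ is $g'(\lambda_0)^2 \, \lambda_0^2 = 1/\lambda_0^2$.

```python
import numpy as np

rng = np.random.default_rng(7)
lam0, n, reps = 2.0, 500, 10000

lam_hat = rng.exponential(scale=lam0, size=(reps, n)).mean(axis=1)   # MLE of the mean, per replication
z = np.sqrt(n) * (1 / lam_hat - 1 / lam0)                            # g(lambda) = 1/lambda (the rate)
print(z.var(), 1 / lam0**2)       # empirical variance vs. delta-method variance g'(lam0)^2 * lam0^2
```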

Application: the multiple linear regression model. The inverse (average) Fisher information matrix is given by:
$$\tilde{I}_F^{-1}(\theta_0) = \begin{pmatrix} \sigma_0^2 \left( \mathbb{E}(X_i X_i') \right)^{-1} & 0_{k \times 1} \\ 0_{1 \times k} & 2\sigma_0^4 \end{pmatrix}.$$
Therefore,
$$\sqrt{n}(\hat{b}_{n,ML} - b_0) \overset{d}{\to} \mathcal{N}\left( 0, \sigma_0^2 \left( \mathbb{E}(X_i X_i') \right)^{-1} \right), \qquad \sqrt{n}(\hat{\sigma}^2_{n,ML} - \sigma_0^2) \overset{d}{\to} \mathcal{N}\left( 0, 2\sigma_0^4 \right),$$
and the two vectors $\sqrt{n}(\hat{b}_{n,ML} - b_0)$ and $\sqrt{n}(\hat{\sigma}^2_{n,ML} - \sigma_0^2)$ are asymptotically independent.

A consistent estimate of the Fisher information matrix is given by:
$$\hat{\tilde{I}}_F^{(1)} = \frac{1}{n} \sum_{i=1}^{n} \tilde{I}_F^{x_i}(\hat{\theta}_n) = \begin{pmatrix} \dfrac{1}{\hat{\sigma}_n^2} \left( \dfrac{1}{n} X'X \right) & 0_{k \times 1} \\ 0_{1 \times k} & \dfrac{1}{2\hat{\sigma}_n^4} \end{pmatrix}$$
so that:
$$\hat{b}_{n,ML} \overset{a}{\sim} \mathcal{N}\left( b_0, \hat{\sigma}_n^2 (X'X)^{-1} \right), \qquad \hat{\sigma}^2_{n,ML} \overset{a}{\sim} \mathcal{N}\left( \sigma_0^2, \frac{2\hat{\sigma}_n^4}{n} \right).$$
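A final sketch (assuming NumPy; data simulated for illustration) that turns these formulas into standard errors: estimate the model by maximum likelihood, then read off $\widehat{\mathrm{Var}}(\hat{b}) = \hat{\sigma}^2 (X'X)^{-1}$ and $\widehat{\mathrm{Var}}(\hat{\sigma}^2) = 2\hat{\sigma}^4 / n$.

```python
import numpy as np

rng = np.random.default_rng(8)
n, k = 500, 3
X = rng.normal(size=(n, k))
y = X @ np.array([1.0, 0.0, -2.0]) + rng.normal(scale=1.2, size=n)

b_hat = np.linalg.solve(X.T @ X, X.T @ y)
sigma2_hat = np.mean((y - X @ b_hat) ** 2)           # ML estimate (divides by n, not n - k)

se_b = np.sqrt(np.diag(sigma2_hat * np.linalg.inv(X.T @ X)))   # asymptotic s.e. of b_hat
se_sigma2 = np.sqrt(2 * sigma2_hat**2 / n)                     # asymptotic s.e. of sigma2_hat
print(b_hat, se_b)
print(sigma2_hat, se_sigma2)
```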