Graduate Econometrics I: Maximum Likelihood I


Yves Dominicy
Université libre de Bruxelles, Solvay Brussels School of Economics and Management, ECARES

Outline
1. The Maximum Likelihood Principle
2. Likelihood Equations
3. Asymptotic Properties

1. The Maximum Likelihood Principle

Consider a parametric model $\mathcal{P} = \{P_\theta = l(y;\theta),\ \theta \in \Theta \subset \mathbb{R}^p\}$. A maximum likelihood estimator of $\theta$ is a solution to the maximization problem
$$\max_{\theta \in \Theta} l(y;\theta).$$
Because the solutions to an optimization problem are unchanged when the objective function is transformed by a strictly increasing mapping, we may equivalently solve
$$\max_{\theta \in \Theta} \log l(y;\theta).$$
Taking logs typically makes the objective easier to handle: products of densities become sums. For conditional models the problem is
$$\max_{\theta \in \Theta} l(y \mid x;\theta).$$

ML estimates the unknown parameters by choosing them so that the implied distribution matches the probability distribution of the observed data as closely as possible. The maximization (or optimization) is carried out by finding the parameter values at which the gradient vanishes:
$$\frac{\partial \log l(y;\theta)}{\partial \theta}\bigg|_{\theta = \hat\theta_n} = 0.$$
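As a quick numerical sketch of this condition (in Python, with simulated data; the Gaussian model, sample size and variable names are illustrative choices, not part of the lecture), the gradient reported by the optimizer is approximately zero at the fitted parameters:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
y = rng.normal(loc=2.0, scale=1.5, size=500)  # simulated Gaussian sample

def neg_loglik(params):
    # Negative Gaussian log-likelihood; sigma is log-parametrized
    # so that the optimizer cannot propose a negative scale.
    mu, log_sigma = params
    sigma = np.exp(log_sigma)
    return 0.5 * np.sum(np.log(2 * np.pi * sigma**2) + ((y - mu) / sigma) ** 2)

res = minimize(neg_loglik, x0=[0.0, 0.0], method="BFGS")
print(res.x[0], np.exp(res.x[1]))  # close to the sample mean and std
print(res.jac)                     # gradient approximately zero at the MLE
```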


In other words, ML searches for the distribution in the model that is closest to the empirical distribution according to the Kullback-Leibler discrepancy measure.

Definition. Given $P^*$ with density $f^*(y)$ and $P$ with density $f(y)$,
$$I(P^* \mid P) = E^*\left[\log\frac{f^*(y)}{f(y)}\right] = \int_Y \log\frac{f^*(y)}{f(y)}\, f^*(y)\,dy$$
is the Kullback-Leibler discrepancy between $P^*$ and $P$.

Let $f^*(y) = l(y;\theta_0)$ and $f(y) = l(y;\theta)$. Then
$$I\big(l(y;\theta_0) \mid l(y;\theta)\big) = \int_Y \log\frac{l(y;\theta_0)}{l(y;\theta)}\, l(y;\theta_0)\,dy = \int_Y \log l(y;\theta_0)\, l(y;\theta_0)\,dy - \int_Y \log l(y;\theta)\, l(y;\theta_0)\,dy.$$

Since we want to minimize the discrepancy between $l(y;\theta_0)$ and $l(y;\theta)$, and the first integral above does not depend on $\theta$, it is equivalent to solve
$$\min_\theta\; -\int_Y \log l(y;\theta)\, l(y;\theta_0)\,dy,$$
or to maximize the expected log-likelihood,
$$\max_\theta \int_Y \log l(y;\theta)\, l(y;\theta_0)\,dy,$$
or to maximize its sample counterpart:
$$\max_\theta\; \frac{1}{n}\sum_{i=1}^n \log l(y_i;\theta).$$
We will denote the MLE by $\hat\theta_n$.
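A small simulation (Python; a Gaussian location family with known unit variance, chosen only for illustration) shows the sample counterpart at work: the average log-likelihood peaks near $\theta_0$, exactly where the Kullback-Leibler discrepancy is minimized.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
theta0 = 1.0
y = rng.normal(loc=theta0, scale=1.0, size=2000)  # draws from l(y; theta0)

# Sample counterpart of E_0[log l(Y; theta)] over a grid of candidate thetas.
grid = np.linspace(-1.0, 3.0, 401)
avg_loglik = [norm.logpdf(y, loc=t, scale=1.0).mean() for t in grid]

print(grid[np.argmax(avg_loglik)])  # maximizer is close to theta0 = 1.0
```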

Remark: several problems may be encountered.

1. Non-existence of a solution. Sometimes due to the parameter space being open, or to the log-likelihood having discontinuities in $\theta$.

Property. If the parameter space $\Theta$ is compact (closed and bounded) and the likelihood function $\theta \mapsto l(y;\theta)$ is continuous on $\Theta$, then an MLE exists.

2. Non-uniqueness of the maximizer. More than one parameter value may attain the same maximal likelihood.

Property. If the parameter space $\Theta$ is convex and the log-likelihood function is strictly concave in $\xi = h(\theta)$, where $h(\cdot)$ is a bijective transformation of the parameter, then the MLE exists and is unique.

2. Likelihood Equations

Unconstrained case.

Property. If $\theta = (\theta_1,\dots,\theta_p)' \in \Theta \subset \mathbb{R}^p$, the log-likelihood function is differentiable in $\theta$, and $\hat\theta_n$ belongs to the interior of $\Theta$, then the MLE satisfies
$$\frac{\partial L(y;\hat\theta_n)}{\partial\theta} = \frac{\partial \log l(y;\hat\theta_n)}{\partial\theta} = 0.$$
These equations are called the likelihood equations.

Example. Let $Y_1,\dots,Y_n$ be a random sample drawn from a Poisson distribution $\mathcal{P}(\lambda)$. The log-likelihood function is
$$L(y;\lambda) = -n\lambda + \sum_{i=1}^n y_i \log\lambda - \sum_{i=1}^n \log(y_i!).$$
It attains its maximum at $\hat\lambda_n$ satisfying
$$0 = \frac{\partial L(y;\hat\lambda_n)}{\partial\lambda} = -n + \frac{\sum_{i=1}^n y_i}{\hat\lambda_n} \quad\Longrightarrow\quad \hat\lambda_n = \bar y.$$
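The closed-form solution can be verified numerically (a minimal Python sketch with simulated Poisson data; all names are illustrative):

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import gammaln

rng = np.random.default_rng(2)
y = rng.poisson(lam=3.5, size=1000)

def neg_loglik(lam):
    # Negative Poisson log-likelihood; gammaln(y + 1) = log(y!).
    return len(y) * lam - y.sum() * np.log(lam) + gammaln(y + 1).sum()

res = minimize_scalar(neg_loglik, bounds=(1e-6, 20.0), method="bounded")
print(res.x, y.mean())  # numerical MLE matches the closed form: lambda_hat = y_bar
```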

Constrained case.

Econometric models usually impose constraints on the parameters: $f(\theta) = 0$. Maximization of $L(y;\theta)$ must take these constraints into account. To do so, we introduce a vector $\lambda$ of $r$ Lagrange multipliers and solve
$$\max_\theta\; L(y;\theta) - \lambda' f(\theta).$$
The first-order conditions are
$$\frac{\partial L(y;\hat\theta_n)}{\partial\theta} - \frac{\partial f(\hat\theta_n)'}{\partial\theta}\,\lambda = 0, \qquad f(\hat\theta_n) = 0.$$
The same property as in the unconstrained case holds, provided $f(\theta)$ is a function from $\mathbb{R}^p$ to $\mathbb{R}^r$ with $r \le p$.

Example. Suppose $Y = (Y_1,\dots,Y_n)$ is a sample of Bernoulli draws,
$$P(Y_i = y) = p^y q^{1-y}, \qquad y \in \{0,1\},$$
where $p$ and $q$ are two probabilities satisfying $p + q = 1$, i.e. $p + q - 1 = 0$. The maximization problem is therefore
$$\max_\theta\; L(y;\theta) - \lambda(p + q - 1), \qquad \theta = (p, q)'.$$

First-order conditions:
$$\frac{\partial L}{\partial p} = \frac{\sum_{i=1}^n y_i}{p} - \lambda = 0, \qquad \frac{\partial L}{\partial q} = \frac{n - \sum_{i=1}^n y_i}{q} - \lambda = 0, \qquad p + q - 1 = 0.$$
Hence, substituting $p = 1 - q$,
$$\frac{\sum_{i=1}^n y_i}{1-q} = \lambda = \frac{n - \sum_{i=1}^n y_i}{q} \quad\Longrightarrow\quad \frac{(1-q)\big(n - \sum_{i=1}^n y_i\big) - q\sum_{i=1}^n y_i}{q(1-q)} = 0.$$
Expanding the numerator,
$$n - \sum_{i=1}^n y_i - qn + q\sum_{i=1}^n y_i - q\sum_{i=1}^n y_i = n - \sum_{i=1}^n y_i - qn = 0,$$
so that
$$\hat q_n = 1 - \frac{\sum_{i=1}^n y_i}{n}, \qquad \hat p_n = \frac{\sum_{i=1}^n y_i}{n}.$$
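The same answer comes out of a generic constrained optimizer (a Python sketch with simulated Bernoulli data; the choice of SLSQP and all names here are our own, not part of the lecture):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
y = rng.binomial(n=1, p=0.3, size=500)  # Bernoulli draws
n, s = len(y), y.sum()

def neg_loglik(theta):
    # Log-likelihood in both (p, q); the constraint p + q = 1 is
    # imposed explicitly instead of substituting q = 1 - p.
    p, q = theta
    return -(s * np.log(p) + (n - s) * np.log(q))

res = minimize(neg_loglik, x0=[0.5, 0.5], method="SLSQP",
               bounds=[(1e-9, 1.0), (1e-9, 1.0)],
               constraints=[{"type": "eq", "fun": lambda t: t[0] + t[1] - 1.0}])
print(res.x, s / n)  # p_hat = sum(y)/n and q_hat = 1 - sum(y)/n, as derived
```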

3. Asymptotic Properties

Existence and Consistency

Consider a parametric model and random sampling.

Regularity conditions 1:
A1. The variables $Y_i$, $i = 1,\dots,n$, are i.i.d. with density $f(y;\theta)$, $\theta \in \Theta$.
A2. The parameter space $\Theta$ is compact (closed and bounded).
A3. The true, but unknown, parameter value $\theta_0$ is identified.
A4. The log-likelihood function is continuous with respect to $\theta$.
A5. $E_0(\log f(y_i;\theta))$ exists.

Property (existence and consistency). Under assumptions A1-A5, there exists a sequence of MLEs converging to the true parameter value $\theta_0$.

PROOF (sketch): A2 and A4 ensure the existence of the MLE $\hat\theta_n$, obtained by maximizing $L_n(\theta)$ or $\frac{1}{n}L_n(\theta)$. Since $\frac{1}{n}L_n(\theta) = \frac{1}{n}\sum_{i=1}^n \log f(y_i;\theta)$ is the sample mean of the random variables $\log f(y_i;\theta)$, the LLN gives
$$\frac{1}{n}L_n(\theta) \xrightarrow{p} E_0\big(\log l(Y;\theta)\big).$$

Next, when the convergence is uniform, the maximizer $\hat\theta_n$ converges to the solution of the limit problem:
$$\operatorname{plim}\hat\theta_n = \theta^* = \arg\max_\theta\, E_0\big(\log l(Y;\theta)\big) = \arg\max_\theta \int_Y \log l(y;\theta)\, l(y;\theta_0)\,dy.$$
By the identification condition on $\theta_0$, the solution of the limit problem is unique and equal to $\theta_0$:
$$\theta^* = \theta_0 \quad\Longrightarrow\quad \operatorname{plim}\hat\theta_n = \theta_0.$$
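Consistency is easy to see by simulation. Reusing the Poisson example, where the MLE is the sample mean (a minimal sketch; the chosen $\lambda_0$ and sample sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
lam0 = 2.0

# lambda_hat_n = y_bar should approach lambda_0 as n grows.
for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, rng.poisson(lam=lam0, size=n).mean())
```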

Small variations on the assumptions are possible. In particular, instead of working with the whole parameter space $\Theta$, we may replace A2 by:
A2'. The interior of $\Theta$ is non-empty and $\theta_0$ belongs to the interior of $\Theta$.
We then also need a local LLN, and in this case we work with local maxima instead of global ones.

Asymptotic Distribution

Since the sequence $\hat\theta_n$ converges to $\theta_0$, it is useful to study the asymptotic behaviour of $\hat\theta_n - \theta_0$, or rather to determine its rate of convergence. We need extra regularity conditions.

Regularity conditions 2:
A6. $L_n(\theta)$ is twice differentiable in an open neighbourhood of $\theta_0$.
A7. $I_1(\theta_0) = -E_0\left[\dfrac{\partial^2 \log f(Y_1;\theta_0)}{\partial\theta\,\partial\theta'}\right]$ exists and is non-singular. $I_1$ is the Fisher (expected) information matrix for one observation.
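To make A7 concrete, the Fisher information in the earlier Poisson example can be computed directly (a worked derivation added for illustration): since $\log f(y;\lambda) = -\lambda + y\log\lambda - \log(y!)$,
$$\frac{\partial^2 \log f(y;\lambda)}{\partial\lambda^2} = -\frac{y}{\lambda^2} \quad\Longrightarrow\quad I_1(\lambda_0) = -E_0\left[-\frac{Y_1}{\lambda_0^2}\right] = \frac{E_0[Y_1]}{\lambda_0^2} = \frac{1}{\lambda_0} > 0,$$
so A7 holds for every $\lambda_0 > 0$.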

Property. Under A1, A2', A3-A5, A6-A7, a consistent sequence $\hat\theta_n$ of local maxima is such that $\sqrt{n}(\hat\theta_n - \theta_0)$ converges in distribution to a Gaussian distribution with mean zero and variance-covariance matrix $I_1(\theta_0)^{-1}$:
$$\sqrt{n}(\hat\theta_n - \theta_0) \xrightarrow{d} N\big(0,\, I_1(\theta_0)^{-1}\big).$$

PROOF (sketch): Since $\hat\theta_n$ satisfies the likelihood equations $\frac{\partial L_n(\hat\theta_n)}{\partial\theta} = 0$ and converges to $\theta_0$, a Taylor expansion [1] of the score $\frac{\partial L_n}{\partial\theta}$ in a neighbourhood of $\theta = \theta_0$ gives:

[1] Taylor expansion: $f(x) = \sum_{i=0}^{p} \frac{f^{(i)}(x_0)}{i!}(x - x_0)^i + R_n$, where the sum is a polynomial of degree $p$ and $R_n$ is the remainder.

$$0 = \frac{\partial L_n(\hat\theta_n)}{\partial\theta} = \frac{\partial L_n(\theta_0)}{\partial\theta} + \frac{\partial^2 L_n(\theta_0)}{\partial\theta\,\partial\theta'}(\hat\theta_n - \theta_0) + o_p(1),$$
where the remainder of the expansion is $o_p(1)$. Rearranging,
$$-\frac{\partial^2 L_n(\theta_0)}{\partial\theta\,\partial\theta'}(\hat\theta_n - \theta_0) \approx \frac{\partial L_n(\theta_0)}{\partial\theta},$$
and dividing by $\sqrt{n}$,
$$\underbrace{\left(-\frac{1}{n}\frac{\partial^2 L_n(\theta_0)}{\partial\theta\,\partial\theta'}\right)}_{(1)} \sqrt{n}(\hat\theta_n - \theta_0) \;\approx\; \underbrace{\frac{1}{\sqrt{n}}\frac{\partial L_n(\theta_0)}{\partial\theta}}_{(2)}.$$

(1) $-\frac{1}{n}\frac{\partial^2 L_n(\theta_0)}{\partial\theta\,\partial\theta'} = -\frac{1}{n}\sum_{i=1}^n \frac{\partial^2 \log f(y_i;\theta_0)}{\partial\theta\,\partial\theta'}$ is an empirical mean. By an appropriate LLN it converges to
$$I_1(\theta_0) = -E_{\theta_0}\left[\frac{\partial^2 \log f(y_1;\theta_0)}{\partial\theta\,\partial\theta'}\right].$$

(2) Since the expected score is zero at $\theta_0$,
$$\frac{1}{\sqrt{n}}\frac{\partial L_n(\theta_0)}{\partial\theta} = \frac{1}{\sqrt{n}}\sum_{i=1}^n \frac{\partial \log f(y_i;\theta_0)}{\partial\theta} = \frac{1}{\sqrt{n}}\sum_{i=1}^n \left(\frac{\partial \log f(y_i;\theta_0)}{\partial\theta} - E_{\theta_0}\left[\frac{\partial \log f(y_i;\theta_0)}{\partial\theta}\right]\right),$$

and by the CLT this converges in distribution to
$$N\left(0,\, V_{\theta_0}\!\left[\frac{\partial\log f(y_1;\theta_0)}{\partial\theta}\right]\right) = N\big(0,\, I_1(\theta_0)\big),$$
using the information matrix equality. Collecting (1) and (2):
$$I_1(\theta_0)\,\sqrt{n}(\hat\theta_n - \theta_0) \;\approx\; \frac{1}{\sqrt{n}}\frac{\partial L_n(\theta_0)}{\partial\theta} \xrightarrow{d} N\big(0,\, I_1(\theta_0)\big),$$
and therefore
$$\sqrt{n}(\hat\theta_n - \theta_0) \xrightarrow{d} N\big(0,\, I_1(\theta_0)^{-1} I_1(\theta_0) I_1(\theta_0)^{-1}\big) = N\big(0,\, I_1(\theta_0)^{-1}\big).$$
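The Poisson example gives a quick Monte Carlo check (Python; $\lambda_0$, $n$ and the number of replications are arbitrary choices): since $I_1(\lambda_0) = 1/\lambda_0$, the statistic $\sqrt{n}(\hat\lambda_n - \lambda_0)$ should be approximately $N(0, \lambda_0)$.

```python
import numpy as np

rng = np.random.default_rng(5)
lam0, n, reps = 2.0, 400, 5000

# Each replication: draw a sample, compute sqrt(n) * (lambda_hat - lambda0).
z = np.array([np.sqrt(n) * (rng.poisson(lam=lam0, size=n).mean() - lam0)
              for _ in range(reps)])
print(z.mean(), z.var())  # mean near 0, variance near I_1^{-1} = lambda0 = 2.0
```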

All this implies that, approximately for large $n$,
$$\hat\theta_n \stackrel{a}{\sim} N\big(\theta_0,\, I_n(\theta_0)^{-1}\big),$$
where
$$I_n(\theta_0)^{-1} = \frac{1}{n}\,I_1(\theta_0)^{-1} = \big(n\,I_1(\theta_0)\big)^{-1}$$
and $I_n(\theta_0) = n\,I_1(\theta_0)$ is the Fisher information matrix for $n$ observations. Hence $\hat\theta_n$ is consistent, efficient and asymptotically Gaussian!

$I_n(\theta_0)$ depends on $\theta_0$, which is unknown, but it can be estimated consistently by $n$ times either of
$$\hat I_1(\hat\theta_n) = -\frac{1}{n}\sum_{i=1}^n \frac{\partial^2 \log f(y_i;\hat\theta_n)}{\partial\theta\,\partial\theta'} \qquad\text{or}\qquad \hat I_1(\hat\theta_n) = \frac{1}{n}\sum_{i=1}^n \frac{\partial \log f(y_i;\hat\theta_n)}{\partial\theta}\,\frac{\partial \log f(y_i;\hat\theta_n)}{\partial\theta'}.$$

Property. Let $g$ be a continuously differentiable function of $\theta \in \mathbb{R}^p$ with values in $\mathbb{R}^q$. Then, under the regularity conditions:
(i) $g(\hat\theta_n) \xrightarrow{p} g(\theta_0)$;
(ii) $\sqrt{n}\big(g(\hat\theta_n) - g(\theta_0)\big) \xrightarrow{d} N\left(0,\; \dfrac{\partial g(\theta_0)}{\partial\theta'}\, I_1(\theta_0)^{-1}\, \dfrac{\partial g(\theta_0)'}{\partial\theta}\right).$
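Both information estimators are easy to compute in the Poisson example (a Python sketch under the same illustrative setup as before; recall the Poisson score is $y/\lambda - 1$):

```python
import numpy as np

rng = np.random.default_rng(6)
lam0 = 2.0
y = rng.poisson(lam=lam0, size=5000)
lam_hat = y.mean()  # Poisson MLE

# Hessian-based estimate: -(1/n) sum of d^2 log f / d lambda^2 = y_bar / lam_hat^2.
I_hess = (y / lam_hat**2).mean()

# Outer-product-of-gradients estimate: (1/n) sum of squared scores.
I_opg = ((y / lam_hat - 1.0) ** 2).mean()

print(I_hess, I_opg, 1 / lam0)  # both close to I_1(lambda0) = 1 / lambda0 = 0.5
```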

Why is $\sqrt{n}$ so important in the previous proof? We had, after dividing by $\sqrt{n}$:
$$\underbrace{\left(-\frac{1}{n}\frac{\partial^2 L_n(\theta_0)}{\partial\theta\,\partial\theta'}\right)}_{\text{first reason}} \sqrt{n}(\hat\theta_n - \theta_0) \;\approx\; \underbrace{\frac{1}{\sqrt{n}}\frac{\partial L_n(\theta_0)}{\partial\theta}}_{\text{second reason}}.$$
First reason: the Law of Large Numbers for the Hessian. If we do not divide by $n$, the LLN cannot be applied. Second reason: the Central Limit Theorem for the score.

We had
$$\frac{1}{\sqrt{n}}\sum_{i=1}^n \left(\frac{\partial \log f(y_i;\theta_0)}{\partial\theta} - E_0\left[\frac{\partial \log f(y_i;\theta_0)}{\partial\theta}\right]\right),$$
or equivalently
$$\frac{1}{\sqrt{n}}\left(\sum_{i=1}^n \frac{\partial \log f(y_i;\theta_0)}{\partial\theta} - n\,E_0\left[\frac{\partial \log f(y_i;\theta_0)}{\partial\theta}\right]\right),$$
and the CLT applies here. If we do not divide by $\sqrt{n}$, the CLT cannot be applied (the sum does not converge to a Gaussian).
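The two scalings can be seen side by side in simulation (Python, Poisson score; all numbers are illustrative): scaled by $1/n$ the centered score sum degenerates to zero, while scaled by $1/\sqrt{n}$ its variance stabilizes at $I_1(\theta_0)$.

```python
import numpy as np

rng = np.random.default_rng(7)
lam0, n, reps = 2.0, 1000, 2000

# Centered Poisson score of one observation at theta0: y / lambda0 - 1.
sums = np.array([(rng.poisson(lam=lam0, size=n) / lam0 - 1.0).sum()
                 for _ in range(reps)])

print(np.var(sums / n))           # ~ I_1 / n -> 0: LLN scaling, no Gaussian limit
print(np.var(sums / np.sqrt(n)))  # ~ I_1(lambda0) = 0.5: CLT scaling stabilizes it
```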