Graduate Econometrics I: Maximum Likelihood I

Yves Dominicy
Université libre de Bruxelles
Solvay Brussels School of Economics and Management
ECARES
Consider a parametric model: $\mathcal{P} = \{P_\theta = l(y;\theta),\ \theta \in \Theta \subseteq \mathbb{R}^p\}$.

A maximum likelihood estimator of $\theta$ is a solution to the maximization problem:
$$\max_{\theta \in \Theta} l(y;\theta).$$

Because the solutions to an optimization problem remain unchanged when the objective function is transformed by a strictly increasing mapping, this is equivalent to:
$$\max_{\theta \in \Theta} \log l(y;\theta).$$

Note that taking logs makes the objective function better behaved (closer to linear, with sums replacing products).

For conditional models:
$$\max_{\theta \in \Theta} l(y \mid x;\theta).$$
ML estimates the unknown parameters by choosing them in such a way that the resulting distribution corresponds as closely as possible to the probability distribution of the observed data.

Maximization (or optimization) is done by finding the values that make the gradient equal to zero:
$$\left.\frac{\partial \log l(y;\theta)}{\partial \theta}\right|_{\theta = \hat\theta_n} = 0.$$
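In practice the gradient condition is rarely solved by hand; a numerical optimizer is applied to the negative log-likelihood. The following is a minimal sketch (not from the original slides), assuming a hypothetical i.i.d. normal sample; all values are illustrative:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
y = rng.normal(loc=2.0, scale=1.5, size=500)   # hypothetical sample

def neg_log_lik(params):
    mu, log_sigma = params                     # log-parametrise sigma so it stays positive
    return -np.sum(norm.logpdf(y, loc=mu, scale=np.exp(log_sigma)))

res = minimize(neg_log_lik, x0=[0.0, 0.0], method="BFGS")
mu_hat, sigma_hat = res.x[0], np.exp(res.x[1])
print(mu_hat, sigma_hat)                       # close to the true (2.0, 1.5)
```

The optimizer drives the gradient of the log-likelihood to (numerically) zero, which is exactly the first-order condition above.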
In other words, ML searches for the distribution in the model that is closest to the empirical distribution according to the Kullback-Leibler discrepancy measure.

Definition. Given $P^*$ with density $f^*(y)$ and $P$ with density $f(y)$,
$$I(P \mid P^*) = E^*\left[\log\frac{f^*(y)}{f(y)}\right] = \int_Y \log\frac{f^*(y)}{f(y)}\, f^*(y)\,dy$$
is the Kullback-Leibler discrepancy between $P$ and $P^*$.

Let $f^*(y) = l(y;\theta_0)$ and $f(y) = l(y;\theta)$. Then
$$I(l(y;\theta) \mid l(y;\theta_0)) = \int_Y \log\frac{l(y;\theta_0)}{l(y;\theta)}\, l(y;\theta_0)\,dy = \int_Y \log l(y;\theta_0)\, l(y;\theta_0)\,dy - \int_Y \log l(y;\theta)\, l(y;\theta_0)\,dy.$$
Since we want to minimize the discrepancy between $l(y;\theta_0)$ and $l(y;\theta)$, and the first term above does not depend on $\theta$, it is equivalent to minimize
$$\min_\theta \, -\int_Y \log l(y;\theta)\, l(y;\theta_0)\,dy,$$
or maximize the limit log-likelihood criterion ($\to$ ML)
$$\max_\theta \int_Y \log l(y;\theta)\, l(y;\theta_0)\,dy,$$
or maximize the sample counterpart:
$$\max_\theta \frac{1}{n}\sum_{i=1}^n \log l(y_i;\theta).$$

We will denote the MLE by $\hat\theta_n$.
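To make the KL argument concrete, here is an illustrative numerical check (an addition, with hypothetical values): for $N(\theta,1)$ densities, the discrepancy $I(l(\cdot;\theta) \mid l(\cdot;\theta_0))$ computed by quadrature equals $(\theta-\theta_0)^2/2$ and is minimized at $\theta = \theta_0$:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

mu0 = 1.0  # hypothetical true mean theta_0

def kl_to_true(mu):
    # \int log( f(y; mu0) / f(y; mu) ) f(y; mu0) dy, over (effectively) the real line
    integrand = lambda y: (norm.logpdf(y, mu0) - norm.logpdf(y, mu)) * norm.pdf(y, mu0)
    value, _ = quad(integrand, -10.0, 12.0)
    return value

for mu in [0.0, 0.5, 1.0, 1.5]:
    print(mu, kl_to_true(mu))  # matches (mu - mu0)**2 / 2; zero at mu = mu0
```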
Remark: there are a number of problems that may be encountered.

1. Non-existence of a solution: sometimes due to the fact that the parameter space is open or the log-likelihood has discontinuities in $\theta$.

Property. If the parameter space $\Theta$ is compact (closed and bounded) and if the likelihood function $\theta \mapsto l(y;\theta)$ is continuous on $\Theta$, then an MLE exists.

2. Non-uniqueness of the maximizer: when more than one parameter value gives the same maximal likelihood.

Property. If the parameter space $\Theta$ is convex and if the log-likelihood function is strictly concave in $\xi = h(\theta)$, where $h(\cdot)$ is a bijective transformation of the parameter, then the MLE exists and is unique.
Unconstrained

Property. If $\theta = (\theta_1,\ldots,\theta_p)' \in \Theta \subseteq \mathbb{R}^p$, the log-likelihood function is differentiable in $\theta$, and $\hat\theta_n$ belongs to the interior of $\Theta$, then the MLE satisfies:
$$\frac{\partial L(y;\hat\theta_n)}{\partial\theta} = \frac{\partial \log l(y;\hat\theta_n)}{\partial\theta} = 0.$$
These equations are called the likelihood equations.

Example: Let $Y_1,\ldots,Y_n$ be a random sample drawn from a Poisson distribution $\mathcal{P}(\lambda)$. The log-likelihood function is:
$$L(y;\lambda) = -n\lambda + \sum_{i=1}^n y_i \log\lambda - \sum_{i=1}^n \log(y_i!).$$
It attains a maximum at $\hat\lambda_n$ satisfying:
$$0 = \frac{\partial L(y;\hat\lambda_n)}{\partial\lambda} = -n + \frac{\sum_{i=1}^n y_i}{\hat\lambda_n} \iff \hat\lambda_n = \bar y.$$
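A quick numerical cross-check of this example (an illustration, with a hypothetical sample): maximizing the Poisson log-likelihood by a one-dimensional optimizer recovers the sample mean, as the likelihood equation predicts.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import poisson

rng = np.random.default_rng(1)
y = rng.poisson(lam=3.5, size=1000)            # hypothetical Poisson sample

neg_log_lik = lambda lam: -np.sum(poisson.logpmf(y, lam))
res = minimize_scalar(neg_log_lik, bounds=(1e-6, 20.0), method="bounded")
print(res.x, y.mean())                         # numerical MLE vs. sample mean: equal
```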
Constrained

Econometric models usually have constraints on the parameters: $f(\theta) = 0$.

Maximization of $L(y;\theta)$ must take the constraints $f(\theta) = 0$ into account. To do so, we introduce a vector $\lambda$ of $r$ Lagrange multipliers and we maximize:
$$\max_\theta \, L(y;\theta) - \lambda' f(\theta).$$

The first-order conditions are:
$$\begin{cases} \dfrac{\partial L(y;\hat\theta_n)}{\partial\theta} - \dfrac{\partial f(\hat\theta_n)'}{\partial\theta}\,\lambda = 0, \\[1ex] f(\hat\theta_n) = 0. \end{cases}$$

The same property as for the unconstrained case holds as long as $f(\theta)$ is a function from $\mathbb{R}^p$ to $\mathbb{R}^r$ with $r \le p$.
Constrained

Example: Suppose that $Y$, the number of successes in the sample $(Y_1,\ldots,Y_n)$, follows a binomial distribution:
$$P(Y = y) = \binom{n}{y} p^y q^{n-y},$$
where $p$ and $q$ are two probabilities such that:
$$p + q = 1 \iff p + q - 1 = 0.$$

Therefore the maximization problem is:
$$\max_\theta \, L(y;\theta) - \lambda(p + q - 1),$$
where $\theta = (p, q)'$.
Constrained

First-order conditions:
$$\frac{\partial L}{\partial p} = \frac{\sum_{i=1}^n y_i}{p} - \lambda = 0, \qquad \frac{\partial L}{\partial q} = \frac{n - \sum_{i=1}^n y_i}{q} - \lambda = 0, \qquad p + q - 1 = 0.$$

Then, substituting $p = 1 - q$:
$$\frac{\sum_{i=1}^n y_i}{1-q} = \lambda = \frac{n - \sum_{i=1}^n y_i}{q} \;\Longrightarrow\; \frac{(1-q)\left(n - \sum_{i=1}^n y_i\right) - q\sum_{i=1}^n y_i}{q(1-q)} = 0$$
$$\;\Longrightarrow\; n - \sum_{i=1}^n y_i - qn = 0 \;\Longrightarrow\; \hat q_n = 1 - \frac{\sum_{i=1}^n y_i}{n}, \qquad \hat p_n = \frac{\sum_{i=1}^n y_i}{n}.$$
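The same constrained problem can be solved symbolically. The sketch below is an addition (the symbol s stands for $\sum_i y_i$); it sets up the Lagrangian and solves the first-order conditions with sympy:

```python
import sympy as sp

p, q, lam, n, s = sp.symbols("p q lam n s", positive=True)  # s = sum of the y_i
L = s * sp.log(p) + (n - s) * sp.log(q)        # log-likelihood, constants dropped
lagrangian = L - lam * (p + q - 1)

sol = sp.solve(
    [sp.diff(lagrangian, p), sp.diff(lagrangian, q), p + q - 1],
    [p, q, lam],
    dict=True,
)
print(sol)  # p = s/n, q = (n - s)/n, lam = n, as derived above
```

Note that the multiplier comes out as $\lambda = n$, consistent with the derivation above.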
Existence and Consistency

Consider a parametric model and random sampling.

Regularity conditions 1:
A1 The variables $Y_i$, $i = 1,\ldots,n$, are i.i.d. with density $f(y;\theta)$, $\theta \in \Theta$.
A2 The parameter space $\Theta$ is compact (= closed and bounded).
A3 The true, but unknown, parameter value $\theta_0$ is identified.
A4 The log-likelihood function is continuous with respect to $\theta$.
A5 $E_0(\log f(Y_i;\theta))$ exists.
Existence and Consistency

Property (Existence and consistency). Under assumptions A1-A5, there exists a sequence of MLEs converging to the true parameter value $\theta_0$.

PROOF (sketch): A2 and A4 ensure the existence of the MLE $\hat\theta_n$, obtained by maximizing $L_n(\theta)$ or $\frac{1}{n}L_n(\theta)$. Since $\frac{1}{n}L_n(\theta) = \frac{1}{n}\sum_{i=1}^n \log f(y_i;\theta)$ can be interpreted as the sample mean of the random variables $\log f(y_i;\theta)$, by the LLN:
$$\frac{1}{n}L_n(\theta) \xrightarrow{\ p\ } E_0(\log l(Y;\theta)).$$
Existence and Consistency

Next, when the convergence is uniform, the solution $\hat\theta_n$ converges to the solution of the limit problem:
$$\operatorname{plim}\hat\theta_n = \theta^* = \arg\max_\theta E_0(\log l(Y;\theta)) = \arg\max_\theta \int_Y \log l(y;\theta)\, l(y;\theta_0)\,dy.$$

By the identification condition on $\theta_0$, the solution to the limit problem is unique and equal to $\theta_0$:
$$\theta^* = \theta_0 \;\Longrightarrow\; \operatorname{plim}\hat\theta_n = \theta_0.$$
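A small simulation makes the consistency result tangible (an illustration with hypothetical values): for an exponential model with rate $\theta_0$, the MLE $\hat\theta_n = 1/\bar y$ drifts toward $\theta_0$ as $n$ grows.

```python
import numpy as np

rng = np.random.default_rng(2)
theta0 = 2.0                                   # hypothetical true rate

for n in [50, 500, 5_000, 50_000]:
    y = rng.exponential(scale=1 / theta0, size=n)
    theta_hat = 1 / y.mean()                   # closed-form MLE of the exponential rate
    print(n, theta_hat)                        # approaches theta0 = 2.0
```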
Existence and Consistency

Small variations on the assumptions are possible. In particular, instead of working with the whole parameter space $\Theta$, we may replace A2 by:

A2' The interior of $\Theta$ is non-empty and $\theta_0$ belongs to the interior of $\Theta$.

We then also need a local LLN, and in this case we work with local maxima instead of global ones.
Asymptotic Distribution

Since the sequence $\hat\theta_n$ converges to $\theta_0$, it is useful to consider the asymptotic behaviour of $\hat\theta_n - \theta_0$, or rather, to determine the rate of convergence.

We need extra regularity conditions.

Regularity conditions 2:
A6 $L_n(\theta)$ is twice differentiable in an open neighbourhood of $\theta_0$.
A7 $I_1(\theta_0) = E_0\left(-\dfrac{\partial^2 \log f(Y_1;\theta_0)}{\partial\theta\,\partial\theta'}\right)$ exists and is non-singular.

$I_1$ is the Fisher (expected) information matrix for one observation.
Asymptotic Distribution

Property. Under A1, A2', A3-A5 and A6-A7, a consistent sequence $\hat\theta_n$ of local maxima is such that $\sqrt{n}(\hat\theta_n - \theta_0)$ converges in distribution to a Gaussian distribution with mean zero and variance-covariance matrix $I_1(\theta_0)^{-1}$:
$$\sqrt{n}(\hat\theta_n - \theta_0) \xrightarrow{\ d\ } N\!\left(0,\, I_1(\theta_0)^{-1}\right).$$

PROOF (sketch): Since $\hat\theta_n$ satisfies $\frac{\partial L_n(\hat\theta_n)}{\partial\theta} = 0$ and converges to $\theta_0$, a Taylor expansion (1) of the score $\frac{\partial L_n}{\partial\theta}$ in a neighbourhood of $\theta_0$ gives:

(1) Taylor expansion:
$$f(x) = \sum_{i=0}^p \frac{f^{(i)}(x_0)}{i!}(x - x_0)^i + R_n,$$
where the sum is a $p$-degree polynomial and $R_n$ is the remainder.
Asymptotic Distribution

$$0 = \frac{\partial L_n(\hat\theta_n)}{\partial\theta} = \frac{\partial L_n(\theta_0)}{\partial\theta} + \frac{\partial^2 L_n(\theta_0)}{\partial\theta\,\partial\theta'}(\hat\theta_n - \theta_0) + o_p(1),$$
where the remainder of the expansion is $o_p(1)$. Rearranging:
$$-\frac{\partial^2 L_n(\theta_0)}{\partial\theta\,\partial\theta'}(\hat\theta_n - \theta_0) \approx \frac{\partial L_n(\theta_0)}{\partial\theta},$$
and dividing by $\sqrt{n}$:
$$\underbrace{\left(-\frac{1}{n}\frac{\partial^2 L_n(\theta_0)}{\partial\theta\,\partial\theta'}\right)}_{1}\,\sqrt{n}(\hat\theta_n - \theta_0) \approx \underbrace{\frac{1}{\sqrt{n}}\frac{\partial L_n(\theta_0)}{\partial\theta}}_{2}.$$
Asymptotic Distribution

Term 1:
$$-\frac{1}{n}\frac{\partial^2 L_n(\theta_0)}{\partial\theta\,\partial\theta'} = -\frac{1}{n}\sum_{i=1}^n \frac{\partial^2 \log f(y_i;\theta_0)}{\partial\theta\,\partial\theta'}$$
is an empirical mean. By an appropriate LLN it converges to:
$$I_1(\theta_0) = E_{\theta_0}\left(-\frac{\partial^2 \log f(Y_1;\theta_0)}{\partial\theta\,\partial\theta'}\right).$$

Term 2:
$$\frac{1}{\sqrt{n}}\frac{\partial L_n(\theta_0)}{\partial\theta} = \frac{1}{\sqrt{n}}\sum_{i=1}^n \frac{\partial \log f(y_i;\theta_0)}{\partial\theta} = \frac{1}{\sqrt{n}}\sum_{i=1}^n \left(\frac{\partial \log f(y_i;\theta_0)}{\partial\theta} - E_{\theta_0}\!\left(\frac{\partial \log f(Y_i;\theta_0)}{\partial\theta}\right)\right),$$
where the last equality uses the fact that the expected score at $\theta_0$ is zero.
Asymptotic Distribution

By the CLT, term 2 converges in distribution to
$$N\!\left(0,\, V_{\theta_0}\!\left(\frac{\partial \log f(Y_1;\theta_0)}{\partial\theta}\right)\right) = N(0, I_1(\theta_0)),$$
using the information-matrix equality. Collecting 1 and 2:
$$I_1(\theta_0)\,\sqrt{n}(\hat\theta_n - \theta_0) \approx \frac{1}{\sqrt{n}}\frac{\partial L_n(\theta_0)}{\partial\theta},$$
and:
$$I_1(\theta_0)\,\sqrt{n}(\hat\theta_n - \theta_0) \xrightarrow{\ d\ } N(0, I_1(\theta_0))$$
$$\sqrt{n}(\hat\theta_n - \theta_0) \xrightarrow{\ d\ } N\!\left(0,\, I_1(\theta_0)^{-1} I_1(\theta_0) I_1(\theta_0)^{-1}\right) = N\!\left(0,\, I_1(\theta_0)^{-1}\right).$$
Asymptotic Distribution

All this implies that, approximately:
$$\hat\theta_n \overset{a}{\sim} N\!\left(\theta_0,\, I_n(\theta_0)^{-1}\right),$$
where
$$I_n(\theta_0)^{-1} = \frac{1}{n} I_1(\theta_0)^{-1} = (n\, I_1(\theta_0))^{-1}$$
and $I_n(\theta_0)$ is the Fisher information matrix for $n$ observations.

Hence, $\hat\theta_n$ is consistent, asymptotically efficient and asymptotically Gaussian!
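The asymptotic distribution can also be checked by simulation (an illustration with hypothetical values): for the Poisson model, $I_1(\lambda_0) = 1/\lambda_0$, so $\sqrt{n}(\hat\lambda_n - \lambda_0)$ should be approximately $N(0, \lambda_0)$.

```python
import numpy as np

rng = np.random.default_rng(3)
lambda0, n, reps = 3.0, 2_000, 5_000           # hypothetical settings

draws = rng.poisson(lam=lambda0, size=(reps, n))
lam_hat = draws.mean(axis=1)                   # MLE (sample mean) in each replication
z = np.sqrt(n) * (lam_hat - lambda0)

print(z.mean(), z.var())                       # roughly 0 and I_1(lambda0)^{-1} = lambda0 = 3.0
```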
Asymptotic Distribution

$I_n(\theta_0)$ depends on $\theta_0$, which is unknown. But $I_1(\theta_0)$ (and hence $I_n(\theta_0) = n\,I_1(\theta_0)$) can be estimated consistently by:
$$\hat I_1(\hat\theta_n) = -\frac{1}{n}\sum_{i=1}^n \frac{\partial^2 \log f(y_i;\hat\theta_n)}{\partial\theta\,\partial\theta'} \qquad\text{or}\qquad \hat I_1(\hat\theta_n) = \frac{1}{n}\sum_{i=1}^n \frac{\partial \log f(y_i;\hat\theta_n)}{\partial\theta}\,\frac{\partial \log f(y_i;\hat\theta_n)}{\partial\theta'}.$$

Property. Let $g$ be a continuously differentiable function of $\theta \in \mathbb{R}^p$ with values in $\mathbb{R}^q$. Then, under the regularity conditions:
i) $g(\hat\theta_n) \xrightarrow{\ p\ } g(\theta_0)$;
ii) $\sqrt{n}\left(g(\hat\theta_n) - g(\theta_0)\right) \xrightarrow{\ d\ } N\!\left(0,\, \dfrac{\partial g(\theta_0)}{\partial\theta'}\, I_1(\theta_0)^{-1}\, \dfrac{\partial g(\theta_0)'}{\partial\theta}\right)$.
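Both information estimates can be computed directly. A sketch for the Poisson example (an addition, with a hypothetical sample), where the score is $y/\lambda - 1$ and the Hessian is $-y/\lambda^2$:

```python
import numpy as np

rng = np.random.default_rng(4)
y = rng.poisson(lam=3.0, size=5_000)           # hypothetical sample
lam_hat = y.mean()                             # Poisson MLE

score = y / lam_hat - 1.0                      # d log f / d lambda at lambda_hat
hessian = -y / lam_hat**2                      # d^2 log f / d lambda^2 at lambda_hat

I_hess = -hessian.mean()                       # minus the average Hessian
I_opg = (score**2).mean()                      # average outer product of the score
print(I_hess, I_opg, 1 / lam_hat)              # both approximate I_1 = 1/lambda_0
```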
Asymptotic Distribution

Why is $\sqrt{n}$ so important in the previous proof? For two reasons.

After dividing by $\sqrt{n}$ we had:
$$\underbrace{\left(-\frac{1}{n}\frac{\partial^2 L_n(\theta_0)}{\partial\theta\,\partial\theta'}\right)}_{\text{first reason}}\,\sqrt{n}(\hat\theta_n - \theta_0) \approx \underbrace{\frac{1}{\sqrt{n}}\frac{\partial L_n(\theta_0)}{\partial\theta}}_{\text{second reason}}.$$

First reason for $\sqrt{n}$: the Law of Large Numbers for the Hessian. If we do not divide the Hessian by $n$, the LLN cannot be applied.

Second reason for $\sqrt{n}$: the Central Limit Theorem for the score.
Asymptotic Distribution

We had:
$$\frac{1}{\sqrt{n}}\sum_{i=1}^n \left(\frac{\partial \log f(y_i;\theta_0)}{\partial\theta} - E_0\!\left(\frac{\partial \log f(Y_i;\theta_0)}{\partial\theta}\right)\right),$$
or equivalently
$$\sqrt{n}\left(\frac{1}{n}\sum_{i=1}^n \frac{\partial \log f(y_i;\theta_0)}{\partial\theta} - E_0\!\left(\frac{\partial \log f(Y_i;\theta_0)}{\partial\theta}\right)\right),$$
and the CLT applies here. If we do not normalize by $\sqrt{n}$, the CLT cannot be applied (the term does not converge to a Gaussian distribution).