Chapter 1: A Brief Review of Maximum Likelihood, GMM, and Numerical Tools
Joan Llull
Microeconometrics, IDEA PhD Program
Maximum Likelihood
The Likelihood Principle

Likelihood principle: our estimate of $\theta$ is the one that maximizes the likelihood of our sample $(y, X) = ((y_1, x_1), \dots, (y_N, x_N))$.

The likelihood is the probability of observing the sample: $\Pr[y, X; \theta]$ for discrete data, $f(y, X; \theta)$ for continuous data.

The likelihood function is $\mathcal{L}_N(\theta) \equiv f(y, X; \theta) = f(y \mid X; \theta) f(X; \theta)$, which for i.i.d. data satisfies $f(y, X; \theta) = \prod_{i=1}^{N} f(y_i, x_i; \theta)$.

We assume $f(X; \theta) = f(X)$, so we can focus on $f(y \mid X; \theta)$. Hence, our object of interest is the (conditional) log-likelihood function:

$$L_N(\theta) \equiv \sum_{i=1}^{N} \ln f(y_i \mid x_i; \theta).$$
The Maximum Likelihood Estimator (MLE)

The MLE is defined by the following optimization problem:

$$\hat{\theta}_{ML} \equiv \arg\max_{\theta \in \Theta} \frac{1}{N} L_N(\theta) = \arg\max_{\theta \in \Theta} \frac{1}{N} \sum_{i=1}^{N} \ln f(y_i \mid x_i; \theta).$$

This estimator is:
- Fully parametric
- An extremum estimator
- An m-estimator

The FOC of the problem is:

$$\frac{1}{N} \sum_{i=1}^{N} \frac{\partial \ln f(y_i \mid x_i; \hat{\theta}_{ML})}{\partial \theta} = 0.$$
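To fix ideas, here is a minimal sketch of ML estimation, assuming a probit model as an illustrative dgp; the simulated data and the names (`neg_loglik`, `theta0`) are hypothetical, not part of the notes.

```python
# A minimal MLE sketch, assuming a probit dgp (a hypothetical example).
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def neg_loglik(theta, y, X):
    """Negative average probit log-likelihood (we minimize, hence the sign)."""
    xb = X @ theta
    # ln f(y_i | x_i; theta) = y ln Phi(x'theta) + (1 - y) ln(1 - Phi(x'theta))
    ll = y * norm.logcdf(xb) + (1 - y) * norm.logcdf(-xb)
    return -ll.mean()

rng = np.random.default_rng(0)
N = 1_000
X = np.column_stack([np.ones(N), rng.normal(size=N)])
theta0 = np.array([0.5, -1.0])                         # true parameters
y = (X @ theta0 + rng.normal(size=N) > 0).astype(float)

res = minimize(neg_loglik, x0=np.zeros(2), args=(y, X), method="BFGS")
print(res.x)  # the MLE, close to theta0 for large N
```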
Identification

The true parameter vector $\theta_0$ is identified if there are no observationally equivalent parameters. More formally, $\theta_0$ is identified if

$$\Pr\left[f(y \mid x; \theta) \neq f(y \mid x; \theta_0)\right] > 0 \quad \forall \theta \neq \theta_0,$$

which ensures that the (strict) Kullback-Leibler inequality used in the consistency argument below holds.
Regularity conditions

If the following two assumptions hold:

i. The specified density $f(y \mid x; \theta)$ is the data generating process (dgp)
ii. The support of $y$ does not depend on $\theta$

then the regularity conditions are satisfied:

$$E_f\left[\frac{\partial \ln f(y \mid x; \theta)}{\partial \theta}\right] = 0,$$

$$E_f\left[\frac{\partial^2 \ln f(y \mid x; \theta)}{\partial \theta \, \partial \theta'}\right] = -E_f\left[\frac{\partial \ln f(y \mid x; \theta)}{\partial \theta} \frac{\partial \ln f(y \mid x; \theta)}{\partial \theta'}\right].$$

The latter condition is a.k.a. the information matrix equality (note the minus sign: minus the expected Hessian equals the expected outer product of the score).
Consistency

Using identification and the first regularity condition:

$$E[\ln f(y \mid x; \theta)] < E[\ln f(y \mid x; \theta_0)] \quad \forall \theta \neq \theta_0.$$

By the Law of Large Numbers (LLN):

$$\frac{1}{N} \sum_{i=1}^{N} \ln f(y_i \mid x_i; \theta) \xrightarrow{p} E[\ln f(y \mid x; \theta)].$$

Then, as $N \to \infty$,

$$\hat{\theta}_{ML} = \arg\max L_N(\theta) \xrightarrow{p} \arg\max L_0(\theta) = \theta_0,$$

whenever:

i. The parameter space $\Theta$ is compact
ii. $L_N(\theta)$ is measurable for all $\theta$

(together with uniform convergence in probability of $\frac{1}{N} L_N(\theta)$ to $L_0(\theta)$, which justifies interchanging $\arg\max$ and the probability limit).
Asymptotic distribution

Using a first-order Taylor expansion of the FOC around $\theta_0$:

$$\sqrt{N}(\hat{\theta} - \theta_0) = -\left(\frac{1}{N} \sum_{i=1}^{N} \frac{\partial^2 l_i(\theta^*)}{\partial \theta \, \partial \theta'}\right)^{-1} \frac{1}{\sqrt{N}} \sum_{i=1}^{N} \frac{\partial l_i(\theta_0)}{\partial \theta},$$

where $l_i(\theta) \equiv \ln f(y_i \mid x_i; \theta)$, and $\theta^*$ is between $\hat{\theta}$ and $\theta_0$.

Assuming i.i.d. observations and the regularity and consistency conditions, by the LLN:

$$\left(\frac{1}{N} \sum_{i=1}^{N} \frac{\partial^2 l_i(\theta^*)}{\partial \theta \, \partial \theta'}\right)^{-1} \xrightarrow{p} E\left[\frac{\partial^2 l_i(\theta_0)}{\partial \theta \, \partial \theta'}\right]^{-1}.$$

By the CLT:

$$\frac{1}{\sqrt{N}} \sum_{i=1}^{N} \frac{\partial l_i(\theta_0)}{\partial \theta} \xrightarrow{d} \mathcal{N}\left(0, \, E\left[\frac{\partial l_i(\theta_0)}{\partial \theta} \frac{\partial l_i(\theta_0)}{\partial \theta'}\right]\right).$$

Finally, using the Cramér theorem and the IM equality:

$$\sqrt{N}(\hat{\theta} - \theta_0) \xrightarrow{d} \mathcal{N}(0, \Omega_0), \quad \Omega_0 = -E\left[\frac{\partial^2 l_i(\theta_0)}{\partial \theta \, \partial \theta'}\right]^{-1} = E\left[\frac{\partial l_i(\theta_0)}{\partial \theta} \frac{\partial l_i(\theta_0)}{\partial \theta'}\right]^{-1}.$$

Since $\Omega_0 = IM^{-1}$, it attains the Cramér-Rao lower bound (the MLE is efficient).
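As a concrete illustration of $\Omega_0 = E[(\partial l_i/\partial\theta)(\partial l_i/\partial\theta')]^{-1}$, here is a hedged sketch computing standard errors for the probit example above via the outer product of scores; the score formula is the standard probit one, and all variable names continue the earlier hypothetical example.

```python
# Standard errors from the outer product of scores, the sample analogue of
# E[s_i s_i']^{-1} (valid under the IM equality). Continues the probit sketch.
def probit_scores(theta, y, X):
    """N x K matrix of scores d ln f(y_i | x_i; theta) / d theta."""
    xb = X @ theta
    # generalized residual: phi(xb) (y - Phi(xb)) / (Phi(xb) (1 - Phi(xb)))
    lam = norm.pdf(xb) * (y - norm.cdf(xb)) / (norm.cdf(xb) * (1 - norm.cdf(xb)))
    return lam[:, None] * X

S = probit_scores(res.x, y, X)           # scores evaluated at the MLE
Omega_hat = np.linalg.inv(S.T @ S / N)   # estimate of Omega_0
se = np.sqrt(np.diag(Omega_hat / N))     # Avar(theta_hat) = Omega_0 / N
print(se)
```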
Generalized Method of Moments
General formulation

Let $\theta$ be the parameter vector of interest, defined by the set of moments (or orthogonality conditions):

$$E[\psi(w; \theta)] = 0,$$

where:
- $w$ is a (vector) random variable,
- $\psi(\cdot)$ is a vector function such that $\dim(\psi) \geq \dim(\theta)$.
Estimation

Consider a random sample with $N$ observations $\{w_i\}_{i=1}^{N}$. GMM estimation is based on the sample moment conditions:

$$b_N(\theta) \equiv \frac{1}{N} \sum_{i=1}^{N} \psi(w_i; \theta).$$

The GMM estimator minimizes the quadratic distance of $b_N(\theta)$ from zero:

$$\hat{\theta}_{GMM} \equiv \arg\min_{\theta \in \Theta} b_N(\theta)' W_N b_N(\theta),$$

where $W_N$ is positive semi-definite and $\operatorname{rank}(W_N) \geq \dim(\theta)$.

If the problem is just-identified ($\dim(\psi) = \dim(\theta)$): $b_N(\hat{\theta}_{GMM}) = 0$.
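A minimal sketch of GMM estimation, assuming linear IV moments $\psi(w; \theta) = z(y - x'\theta)$ as a hypothetical example; the simulated data and the names (`b_N`, `gmm_objective`) are illustrative, not from the notes.

```python
# Minimal GMM sketch, assuming linear IV moments psi(w; theta) = z (y - x'theta).
import numpy as np
from scipy.optimize import minimize

def b_N(theta, y, X, Z):
    """Sample moment vector b_N(theta) = (1/N) sum_i z_i (y_i - x_i'theta)."""
    u = y - X @ theta
    return Z.T @ u / len(y)

def gmm_objective(theta, y, X, Z, W):
    """Quadratic form b_N(theta)' W b_N(theta)."""
    b = b_N(theta, y, X, Z)
    return b @ W @ b

rng = np.random.default_rng(1)
N = 2_000
z = rng.normal(size=(N, 2))                  # two instruments
v = rng.normal(size=N)
x = z @ np.array([1.0, 0.5]) + v             # endogenous regressor
y = 2.0 * x + v + rng.normal(size=N)         # true theta = 2
X, Z = x[:, None], z

W = np.eye(Z.shape[1])                       # initial weighting matrix
res = minimize(gmm_objective, x0=np.zeros(1), args=(y, X, Z, W))
print(res.x)                                 # close to 2 for large N
```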
Consistency

GMM is an extremum estimator, so the general consistency results hold, as in MLE. The conditions and intuition are similar to MLE:

- The parameter space $\Theta \subseteq \mathbb{R}^K$ is compact.
- $W_N b_N(\theta) \xrightarrow{p} W_0 E[\psi(w; \theta)]$.
- Identification: $W_0 E[\psi(w; \theta)] = 0 \iff \theta = \theta_0$.

If these conditions hold, $\hat{\theta}_{GMM} \xrightarrow{p} \theta_0$.
Asymptotic distribution

Following similar steps as in the MLE case, if:

- $\hat{\theta}_{GMM}$ is a consistent estimator of $\theta_0$,
- $\theta_0$ is in the interior of $\Theta$,
- $\psi(w; \theta)$ is once differentiable with respect to $\theta$,
- $D_N(\theta) \equiv \partial b_N(\theta)/\partial \theta' \xrightarrow{p} D_0(\theta)$, with $D_0(\theta)$ continuous at $\theta = \theta_0$,
- for $D_0 \equiv D_0(\theta_0)$, the matrix $D_0' W_0 D_0$ is non-singular,
- $\sqrt{N} b_N(\theta_0) \xrightarrow{d} \mathcal{N}(0, V_0)$, with $V_0 = E[\psi(w; \theta_0) \psi(w; \theta_0)']$,

then $\sqrt{N}(\hat{\theta}_{GMM} - \theta_0) \xrightarrow{d} \mathcal{N}(0, \Omega_0)$, with:

$$\Omega_0 = (D_0' W_0 D_0)^{-1} D_0' W_0 V_0 W_0 D_0 (D_0' W_0 D_0)^{-1}.$$
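To see the sandwich formula at work, here is a hedged sketch estimating $\Omega_0$ for the hypothetical IV example above; it reuses `res.x`, `y`, `X`, `Z`, `W`, and `N` from that sketch, and uses that $D_N(\theta) = -Z'X/N$ for linear IV moments.

```python
# Sandwich variance for the hypothetical IV example above, evaluated at the
# first-step estimate res.x with W = I.
D = -(Z.T @ X) / N                      # D_N(theta) = -Z'X/N for linear IV
P = Z * (y - X @ res.x)[:, None]        # rows are psi(w_i; theta_hat)
V = P.T @ P / N                         # estimate of V_0
A = np.linalg.inv(D.T @ W @ D)
Omega = A @ D.T @ W @ V @ W @ D @ A     # (D'WD)^{-1} D'W V W D (D'WD)^{-1}
se = np.sqrt(np.diag(Omega / N))        # Avar(theta_hat) = Omega_0 / N
print(se)
```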
Optimal weighting matrix

Efficiency is achieved with any $W_N$ that delivers $W_0 = \kappa V_0^{-1}$. This includes $W_N = V_0^{-1}$ (unfeasible), but also $W_N = \hat{V}_N^{-1}$, where $\hat{V}_N$ is any consistent estimator of $V_0$.

The optimal GMM estimator is implemented in two steps (see the sketch below):

1. Obtain $\hat{\theta}_{GMM}(W_N^0)$ for an initial guess $W_N^0$.
2. Re-estimate using

$$\hat{W}^{opt} \equiv \left(\sum_{i=1}^{N} \psi(w_i; \hat{\theta}_{GMM}(W_N^0)) \, \psi(w_i; \hat{\theta}_{GMM}(W_N^0))'\right)^{-1}$$

as the new weighting matrix.
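A hedged sketch of the two-step procedure, continuing the hypothetical IV example (it reuses `gmm_objective`, `res`, `y`, `X`, and `Z` from above; `psi_matrix` is an illustrative helper, not from the notes).

```python
# Two-step optimal GMM, continuing the hypothetical IV example above.
def psi_matrix(theta, y, X, Z):
    """N x dim(psi) matrix with rows psi(w_i; theta) = z_i (y_i - x_i'theta)."""
    return Z * (y - X @ theta)[:, None]

theta1 = res.x                                   # step 1: estimate with W_N^0 = I
P = psi_matrix(theta1, y, X, Z)
W_opt = np.linalg.inv(P.T @ P)                   # step 2: (sum_i psi_i psi_i')^{-1}
res2 = minimize(gmm_objective, x0=theta1, args=(y, X, Z, W_opt))
print(res2.x)                                    # efficient GMM estimate
```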
Numerical Methods
Differentiation

We use the definition of a derivative:

$$f'(x) = \lim_{\epsilon \to 0} \frac{f(x + \epsilon) - f(x)}{\epsilon},$$

approximated as

$$f'(x) \approx \frac{f(x + h) - f(x)}{h}$$

for a small $h$ (e.g. $10^{-6}$).

More accurate (and more costly) is the two-sided differential:

$$f'(x) \approx \frac{f(x + h) - f(x - h)}{2h}.$$

To compute a gradient $\nabla f(x)$, one element of $x$ is moved at a time.
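A minimal sketch of both formulas, moving one element of $x$ at a time; the step $h = 10^{-6}$ follows the slide, and the function names are illustrative.

```python
# One-sided and two-sided numerical gradients (names are illustrative).
import numpy as np

def num_gradient(f, x, h=1e-6, two_sided=True):
    """Approximate the gradient of f at x, perturbing one element at a time."""
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for k in range(x.size):
        e = np.zeros_like(x)
        e[k] = h
        if two_sided:
            grad[k] = (f(x + e) - f(x - e)) / (2 * h)  # two-sided difference
        else:
            grad[k] = (f(x + e) - f(x)) / h            # one-sided difference
    return grad

print(num_gradient(lambda v: v[0] ** 2 + 3 * v[1], [1.0, 2.0]))  # ~ [2, 3]
```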
Newton-Raphson optimization

Originally conceived for finding roots. It approximates the function by the tangent line at the current point and finds its zero (an iterative procedure):

$$0 \approx f(x_n) + f'(x_n)(x_{n+1} - x_n) \quad \Rightarrow \quad x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}.$$

The extension to optimization is natural (apply the root-finding iteration to $f'$):

$$x_{n+1} = x_n - \frac{f'(x_n)}{f''(x_n)}.$$

In the multivariate case:

$$x_{n+1} = x_n - [H_f(x_n)]^{-1} \nabla f(x_n).$$
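A minimal sketch of the multivariate iteration under the assumption that the gradient and Hessian are supplied by the user; all names and the quadratic test function are illustrative.

```python
# Newton-Raphson for optimization: x_{n+1} = x_n - H(x_n)^{-1} grad(x_n).
import numpy as np

def newton_raphson(grad, hess, x0, tol=1e-8, max_iter=100):
    """Iterate until the Newton step is smaller than tol."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        step = np.linalg.solve(hess(x), grad(x))  # H^{-1} grad, without inverting
        x = x - step
        if np.max(np.abs(step)) < tol:
            break
    return x

# Example: minimize f(x) = (x1 - 1)^2 + 2 (x2 + 3)^2, with minimum at (1, -3)
grad = lambda v: np.array([2 * (v[0] - 1), 4 * (v[1] + 3)])
hess = lambda v: np.array([[2.0, 0.0], [0.0, 4.0]])
print(newton_raphson(grad, hess, x0=[0.0, 0.0]))
```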
Integration

Numerical integration (quadrature) consists of a weighted sum of a finite set of evaluations of the integrand. The integration weights depend on the method and on the desired precision.

Deterministic methods: midpoint rule, trapezoidal rule, Simpson's rule, Gaussian quadrature, ...

Alternative: Monte Carlo integration.
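A minimal sketch contrasting the two approaches for $E[x^2]$ with $x \sim \mathcal{N}(0, 1)$ (true value 1); the node count and the number of draws are illustrative choices.

```python
# Gaussian quadrature vs. Monte Carlo integration for E[x^2], x ~ N(0,1).
import numpy as np

g = lambda x: x ** 2

# Gauss-Hermite quadrature approximates integral of g(x) exp(-x^2) dx by
# sum_j w_j g(x_j); the change of variables x -> sqrt(2) x adapts it to N(0,1).
nodes, weights = np.polynomial.hermite.hermgauss(10)
quad = np.sum(weights * g(np.sqrt(2) * nodes)) / np.sqrt(np.pi)

# Monte Carlo: average g over random draws from N(0,1).
draws = np.random.default_rng(2).normal(size=100_000)
mc = g(draws).mean()

print(quad, mc)  # both close to 1
```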