MLE and GMM
Li Zhao, SJTU
Spring 2017
Outline
1 MLE
2 GMM
3 Binary Choice Models
Maximum Likelihood Estimation - Introduction
- For a linear model $y = X\beta + \varepsilon$, we can use OLS/2SLS, etc. MLE can estimate both linear and non-linear models.
- Basic idea: specify a parametric pdf for your observed data, then find the parameter values that make your data most likely.
- If the distributional assumption is correct, MLE is efficient.
Maximum Likelihood Estimation
- Random, i.i.d. sample $y_1, y_2, \ldots, y_n$.
- The likelihood function is the joint distribution of $(y_1, \ldots, y_n)$: $L(y;\theta) = f(y_1, y_2, \ldots, y_n;\theta)$.
- The maximum likelihood estimator $\hat\theta_{MLE}$ maximizes $L(y;\theta)$.
- Because the $y_i$'s are independent, $L(y;\theta) = \prod_i f(y_i;\theta)$.
- We often use the logarithm; equivalently, $\hat\theta_{MLE}$ maximizes $LL(y;\theta) = \sum_i \ln f(y_i;\theta)$.
MLE Example: Normal Distribution
If $y_1, y_2, \ldots, y_n$ are an i.i.d. sample from $N(\mu, \sigma^2)$, the likelihood function is
  $f(y_1, \ldots, y_n \mid \mu, \sigma^2) = \prod_i \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(y_i - \mu)^2}{2\sigma^2}\right)$.
$(\hat\mu, \hat\sigma^2)$ maximize the log likelihood function
  $LL(y_1, \ldots, y_n \mid \mu, \sigma^2) = -n\ln\sigma - \frac{n}{2}\ln(2\pi) - \frac{1}{2\sigma^2}\sum_i (y_i - \mu)^2$.
$(\hat\mu, \hat\sigma^2)$ satisfy the two FOCs:
  $\frac{\partial LL}{\partial \mu} = \frac{1}{\sigma^2}\sum_i (y_i - \mu) = 0 \;\Rightarrow\; \hat\mu_{MLE} = \frac{1}{n}\sum_i y_i$,
  $\frac{\partial LL}{\partial \sigma} = -\frac{n}{\sigma} + \frac{1}{\sigma^3}\sum_i (y_i - \mu)^2 = 0 \;\Rightarrow\; \hat\sigma^2 = \frac{1}{n}\sum_i (y_i - \hat\mu)^2$.
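As a quick numerical check (a minimal sketch, not from the slides): simulate normal data and confirm that maximizing the log likelihood numerically reproduces the closed-form answers above. The log-sigma parameterization is my own convenience to keep sigma positive during an unconstrained search.

% Simulate data with true mu = 2, sigma = 3.
rng(1);
n = 1000;
y = 2 + 3*randn(n,1);
% Negative log likelihood with t = [mu; log(sigma)].
negLL = @(t) n*t(2) + n/2*log(2*pi) + sum((y - t(1)).^2)/(2*exp(2*t(2)));
t_hat = fminsearch(negLL, [0; 0]);
mu_hat     = t_hat(1);          % should be close to mean(y)
sigma2_hat = exp(2*t_hat(2));   % should be close to mean((y - mean(y)).^2)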
MLE Example: Tobit
Latent model: $y^* = x\beta + \varepsilon$, $\varepsilon \sim N(0, \sigma^2)$, with censoring
  $y = y^*$ if $y^* > 0$, and $y = 0$ if $y^* \le 0$.
If $y_i > 0$, its density function is
  $f(y_i \mid \beta, \sigma^2) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(y_i - x_i\beta)^2}{2\sigma^2}\right)$.
If $y_i = 0$, its probability function is
  $\Pr(y_i = 0 \mid \beta, \sigma^2) = \Phi(-x_i\beta/\sigma)$.
The log likelihood function is
  $LL(y;\theta) = \sum_i \ln\!\left[ 1(y_i > 0)\,\frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(y_i - x_i\beta)^2}{2\sigma^2}\right) + 1(y_i = 0)\,\Phi(-x_i\beta/\sigma) \right]$.
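A Matlab sketch of this likelihood in the style of the Bernoulli and probit functions later in the deck; the function name LL_tobit and the [beta; log(sigma)] parameter packing are my own illustrative choices, not from the slides.

function LL = LL_tobit(y, X, theta)
    % Negative Tobit log likelihood; theta = [beta; log(sigma)].
    b     = theta(1:end-1);
    sigma = exp(theta(end));         % log parameterization keeps sigma > 0
    f1 = normpdf(y, X*b, sigma);     % density for uncensored observations
    f0 = normcdf(-X*b/sigma);        % Pr(y = 0) for censored observations
    f  = f1.*(y>0) + f0.*(y==0);
    LL = -sum(log(f));
end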
GMM - Introduction
- GMM is a generic method for estimating parameters in statistical models.
- It applies when the full shape of the data's distribution function may not be known, so maximum likelihood estimation is not applicable. GMM is an alternative based on minimal assumptions.
- GMM estimation is often possible where a likelihood analysis is extremely difficult. As we will see soon, many applications in empirical IO end up with a set of moment conditions.
- GMM was developed by Lars Peter Hansen in 1982 as a generalization of the method of moments. Hansen shared the 2013 Nobel Prize in Economics in part for this work.
Moments
In GMM, we wish to build estimators around conditions such as
  $E[g(y_i, x_i; \theta)] = 0$.
- We need at least as many "identifying moments" as parameters.
- We may impose more moments than parameters, in which case not all sample moments can hold simultaneously.
GMM can encompass many estimation techniques we are familiar with (see the worked OLS case below):
- OLS: $E[x_i \varepsilon_i] = 0$.
- IV: $E[z_i \varepsilon_i] = 0$.
- MLE: $E\!\left[\frac{\partial LL}{\partial \theta}\right] = 0$.
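To make the OLS case concrete (a standard derivation, not spelled out on the slide): the sample analogue of $E[x_i \varepsilon_i] = 0$ is
  $\frac{1}{n}\sum_i x_i (y_i - x_i'\beta) = \frac{1}{n} X'(y - X\beta) = 0$,
and solving for $\beta$ gives $\hat\beta = (X'X)^{-1} X'y$, the familiar OLS estimator. With exactly as many moments as parameters, the moment conditions hold exactly in the sample and the weighting matrix is irrelevant.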
GMM Estimator
We can write the moments in expectation form:
  $E[g(y_i, x_i; \theta)] = 0$.
To estimate $\theta$, we specify a positive definite matrix (called the weighting matrix) $W_n$ and find the parameters that minimize the following generalized distance:
  $\hat\theta_{GMM} = \arg\min_\theta Q(\theta)$, where $Q(\theta) = g_n(\theta)'\, W_n\, g_n(\theta)$
and $g_n(\theta)$ is the sample average of the moments,
  $g_n(\theta) = \frac{1}{n}\sum_i g(y_i, x_i; \theta)$.
Example: Method of Moments Estimator of the Mean
Assume that $\{y_1, \ldots, y_n\}$ are random variables drawn from a population with expectation $\mu$.
We have a single moment condition
  $g(y_i; \mu) = E(y_i - \mu) = 0$.
The sample average of the moment is
  $g_n(\mu) = \frac{1}{n}\sum_i (y_i - \mu)$.
The MM estimator is the minimizer
  $\hat\mu_{MM} = \arg\min_\mu \left(\frac{1}{n}\sum_i (y_i - \mu)\right)^2$,
which gives $\hat\mu_{MM} = \frac{1}{n}\sum_i y_i$, the sample average.
Example: Instrumental Variables
  $y = X\beta + \varepsilon$.
The moment conditions are
  $g(\beta) = E[z_i(y_i - x_i\beta)] = 0$.
The corresponding sample moments are given by
  $g_n(\beta) = \frac{1}{n} Z'(y - X\beta)$.
When the number of instruments is greater than the number of parameters, we have more moments than unknowns.
$\hat\beta_{GMM}$ minimizes
  $\left(\frac{1}{n} Z'(y - X\beta)\right)' W_n \left(\frac{1}{n} Z'(y - X\beta)\right)$.
Taking the FOC,
  $\hat\beta_{GMM} = (X'Z\, W_n\, Z'X)^{-1} X'Z\, W_n\, Z'y$.
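A closed-form Matlab sketch of this estimator (the variable names are my own; it assumes data matrices y (n x 1), X (n x k), and Z (n x m) with m >= k are in memory). With $W_n = (Z'Z/n)^{-1}$, the standard choice used here, the formula reproduces 2SLS.

n = size(Z,1);
W = inv(Z'*Z/n);                           % 2SLS weighting matrix
beta_gmm = (X'*Z*W*Z'*X) \ (X'*Z*W*Z'*y);  % closed-form linear GMM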
Example: Bernoulli
A Bernoulli random variable $y$ takes only two values, 0 and 1, with probabilities $1-p$ and $p$.
Its mean and variance are $p$ and $p(1-p)$.
It has density function
  $f(y \mid p) = p^y (1-p)^{1-y}$.
Matlab illustration: MLE, MM and GMM estimation of the Bernoulli distribution (a simulated sample is sketched below).
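A minimal data-generating step (my own, with an arbitrary true value p = 0.3) so the estimation snippets on the next slides run end to end:

rng(1);
n = 1000;
y = double(rand(n,1) < 0.3);   % y(i) = 1 with probability 0.3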
Bernoulli - MLE
function LL = LL_bernoulli(y, p)
    f1 = p;
    f0 = 1 - p;
    f  = f1.*(y==1) + f0.*(y==0);   % likelihood of each observation
    LL = -sum(log(f));              % negative log likelihood (to minimize)
end

p0 = 0.5;                           % starting value
A = [1; -1]; b = [1; 0];            % linear constraints: p <= 1, p >= 0
p_mle = fmincon(@(p) LL_bernoulli(y,p), p0, A, b);
Bernoulli - MM and GMM
Method of Moments
function Q = MM_bernoulli(y, p)
    Q = (p - mean(y))^2;            % squared distance of the mean moment
end
p_mm = fmincon(@(p) MM_bernoulli(y,p), p0, A, b);

GMM
function Q = GMM_bernoulli(y, p)
    Q1 = (p - mean(y))^2;           % mean moment
    Q2 = (p*(1-p) - var(y,1))^2;    % variance moment: p(1-p) vs. sample variance
    Q  = Q1 + Q2;                   % identity weighting of the two moments
end
p_gmm = fmincon(@(p) GMM_bernoulli(y,p), p0, A, b);
Efficient GMM Estimation
- The variance of $\hat\theta_{GMM}$ depends on the weight matrix $W_n$.
- The efficient GMM estimator has the smallest possible (asymptotic) variance.
- Let $S$ be the variance-covariance matrix of the moments $g(y_i, x_i; \theta)$. It can be shown that the optimal weight matrix $W_n$ has the property that
  $\operatorname{plim} W_n^{OPT} = S^{-1}$.
- Intuition: a moment with small variance is informative and should receive a large weight.
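For reference, the standard asymptotic result behind this claim (not stated on the slide): with optimal weighting,
  $\sqrt{n}\,(\hat\theta_{GMM} - \theta_0) \xrightarrow{d} N\!\left(0,\; (G' S^{-1} G)^{-1}\right), \quad G = E\!\left[\frac{\partial g(y_i, x_i; \theta_0)}{\partial \theta'}\right]$,
and any other positive definite weight matrix yields a weakly larger asymptotic variance.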
Two-Step Efficient GMM
We need the optimal weight matrix, but it depends on the parameters.
Two-step efficient GMM:
- Step 1: choose an initial weight matrix, for example $I$, and find a consistent but less efficient first-step GMM estimator
  $\hat\theta^{[1]} = \arg\min_\theta g_n(\theta)'\, W^{[1]}\, g_n(\theta)$.
- Step 2: let
  $W^{[2]} = \left[\frac{1}{n}\sum_i g(y_i, \hat\theta^{[1]})\, g(y_i, \hat\theta^{[1]})'\right]^{-1}$.
  Find the efficient estimator
  $\hat\theta^{[2]} = \arg\min_\theta g_n(\theta)'\, W^{[2]}\, g_n(\theta)$.
The estimator is not unique, as it depends on the initial weight matrix $W^{[1]}$.
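Continuing the IV sketch from the earlier slide, a minimal two-step implementation might look like the following; the variable names carry over from that sketch, and the heteroskedasticity-robust second-step weight is the standard choice for linear IV moments.

n  = size(Z,1);
% Step 1: 2SLS-style first step with W1 = inv(Z'Z/n).
W1 = inv(Z'*Z/n);
b1 = (X'*Z*W1*Z'*X) \ (X'*Z*W1*Z'*y);
% Step 2: weight by the inverse var-cov matrix of the moments z_i*e_i.
e  = y - X*b1;
Ze = bsxfun(@times, Z, e);          % row i is e_i * z_i'
S  = Ze'*Ze/n;                      % (1/n) sum_i e_i^2 * z_i * z_i'
b2 = (X'*Z/S*Z'*X) \ (X'*Z/S*Z'*y); % efficient second-step estimator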
Bernoulli Random Variable
A Bernoulli random variable $y$ takes only two values, 0 and 1, with probabilities $1-p$ and $p$.
It has density function
  $f(y \mid p) = p^y (1-p)^{1-y}$.
Extension from Bernoulli to Binary Choice Models
Consider the case in which $p$, the probability of the event $y = 1$ (success), varies across individuals: $p_i$ is a function of covariates $X_i$,
  $p_i = F(X_i)$.
The choice of functional form $F(\cdot)$ is up to you, and different choices yield different models:
- Linear probability model: $\Pr(y = 1 \mid X_i) = X_i\beta$.
- Probit: $\Pr(y = 1 \mid X_i) = \Phi(X_i\beta)$, $\Pr(y = 0 \mid X_i) = 1 - \Phi(X_i\beta)$.
- Logit: $\Pr(y = 1 \mid X_i) = \frac{\exp(X_i\beta)}{1 + \exp(X_i\beta)}$, $\Pr(y = 0 \mid X_i) = \frac{1}{1 + \exp(X_i\beta)}$.
Matlab
Probit
function LL = LL_probit(y, X, b)
    f1 = normcdf(X*b);              % Pr(y = 1 | X)
    f0 = 1 - f1;
    f  = f1.*(y==1) + f0.*(y==0);
    LL = -sum(log(f));
end

Logit
function LL = LL_logit(y, X, b)
    my_exp = exp(X*b);
    f1 = my_exp./(1 + my_exp);      % Pr(y = 1 | X)
    f0 = 1./(1 + my_exp);
    f  = f1.*(y==1) + f0.*(y==0);
    LL = -sum(log(f));
end
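A usage sketch (my own, with made-up simulation settings): generate probit data and recover beta by minimizing the negative log likelihood with fminunc.

rng(1);
n = 1000;
X = [ones(n,1), randn(n,1)];                 % constant plus one regressor
beta_true = [0.5; -1];
y = double(rand(n,1) < normcdf(X*beta_true)); % probit data-generating process
b_hat = fminunc(@(b) LL_probit(y,X,b), zeros(2,1));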
Summary
Maximum likelihood:
- Commonly used in nonlinear models.
- Efficient if the parametric assumption is correct.
GMM:
- Relaxes parametric assumptions; can be useful in cases where MLE is difficult to use.
- GMM is very popular in empirical IO.