MLE and GMM. Li Zhao, SJTU. Spring 2017.


Outline
1. MLE
2. GMM
3. Binary Choice Models

Maximum Likelihood Estimation - Introduction

For a linear model y = Xβ + ε, we can use OLS, 2SLS, etc. MLE can estimate both linear and non-linear models. The basic idea: specify a parametric pdf for the observed data, then find the parameter values that make the data most likely. If the distributional assumption is correct, MLE is efficient.

Maximum Likelihood Estimation

Take a random i.i.d. sample y_1, y_2, ..., y_n. The likelihood function is the joint density of (y_1, ..., y_n):

L(y; θ) = f(y_1, y_2, ..., y_n; θ).

The maximum likelihood estimator θ̂_MLE maximizes L(y; θ). Because the y_i are independent,

L(y; θ) = ∏_i f(y_i; θ).

We usually work with the logarithm: equivalently, θ̂_MLE maximizes

LL(y; θ) = Σ_i ln f(y_i; θ).

MLE Example: Normal Distribution

If y_1, y_2, ..., y_n are an i.i.d. sample from N(µ, σ²), the likelihood function is

f(y_1, ..., y_n | µ, σ²) = ∏_i (1 / (σ√(2π))) exp(−(y_i − µ)² / (2σ²)).

(µ̂, σ̂²) maximize the log likelihood function

LL(y_1, ..., y_n | µ, σ²) = −n ln σ − (n/2) ln(2π) − (1/(2σ²)) Σ_i (y_i − µ)².

(µ̂, σ̂²) satisfy the two first-order conditions:

∂LL/∂µ = (1/σ²) Σ_i (y_i − µ) = 0   ⟹   µ̂_MLE = (1/n) Σ_i y_i,
∂LL/∂σ = −n/σ + (1/σ³) Σ_i (y_i − µ)² = 0   ⟹   σ̂² = (1/n) Σ_i (y_i − µ̂)².
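As a numerical check (a Python sketch with simulated data, not from the slides), maximizing this log likelihood recovers exactly the sample mean and the 1/n sample variance predicted by the first-order conditions:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
y = rng.normal(loc=2.0, scale=1.5, size=5000)

def negll(theta, y):
    # negative log likelihood of N(mu, sigma^2); sigma parameterized
    # as exp(log_sigma) to keep it positive during optimization
    mu, log_sigma = theta
    sigma = np.exp(log_sigma)
    return (len(y) * np.log(sigma)
            + 0.5 * len(y) * np.log(2 * np.pi)
            + np.sum((y - mu) ** 2) / (2 * sigma ** 2))

res = minimize(negll, x0=[0.0, 0.0], args=(y,))
mu_hat = res.x[0]
sigma2_hat = np.exp(res.x[1]) ** 2
```

The numerical maximizer coincides with the closed-form solutions: mu_hat equals the sample average and sigma2_hat equals the (biased, 1/n) sample variance.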

MLE Example: Tobit

The latent model is y* = xβ + ε, ε ~ N(0, σ²), with observed outcome

y = y* if y* > 0, and y = 0 if y* ≤ 0.

If y_i > 0, its density function is

f(y_i | β, σ²) = (1/σ) φ((y_i − x_iβ)/σ) = (1/(σ√(2π))) exp(−(y_i − x_iβ)²/(2σ²)).

If y_i = 0, its probability is

Pr(y_i = 0 | β, σ²) = Φ(−x_iβ/σ).

The log likelihood function is

LL(y; θ) = Σ_i [ 1(y_i > 0) ln((1/σ) φ((y_i − x_iβ)/σ)) + 1(y_i = 0) ln Φ(−x_iβ/σ) ].
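A Python sketch of this censored likelihood (the simulated data and the true parameter values 1, 2, and σ = 1 are illustrative assumptions; the slides use no specific dataset):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(1)
n = 5000
x = rng.normal(size=n)
y_star = 1.0 + 2.0 * x + rng.normal(size=n)  # latent outcome
y = np.maximum(y_star, 0.0)                  # observed: censored at zero

def tobit_negll(theta, y, x):
    b0, b1, log_sigma = theta
    sigma = np.exp(log_sigma)
    xb = b0 + b1 * x
    pos = y > 0
    # uncensored observations contribute the scaled normal density
    ll = np.sum(norm.logpdf((y[pos] - xb[pos]) / sigma) - np.log(sigma))
    # censored observations contribute Phi(-x_i beta / sigma)
    ll += np.sum(norm.logcdf(-xb[~pos] / sigma))
    return -ll

res = minimize(tobit_negll, x0=[0.0, 0.0, 0.0], args=(y, x), method="BFGS")
b0_hat, b1_hat = res.x[0], res.x[1]
```

With a correctly specified likelihood, the estimates should be close to the true (1, 2) used in the simulation.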

Outline
1. MLE
2. GMM
3. Binary Choice Models

GMM - Introduction

GMM is a generic method for estimating parameters in statistical models. It applies to cases where the full shape of the data's distribution function may not be known, so that maximum likelihood estimation is not applicable; GMM is an alternative based on minimal assumptions. GMM estimation is often possible where a likelihood analysis is extremely difficult. As we will see soon, many applications in empirical IO end up with some moment conditions. GMM was developed by Lars Peter Hansen in 1982 as a generalization of the method of moments; Hansen shared the 2013 Nobel Prize in Economics in part for this work.

Moments

In GMM, we build estimators around conditions of the form

E[g(y_i, x_i; θ)] = 0.

We need at least as many "identifying moments" as parameters. We may impose more moments than parameters, in which case not all sample moments can hold simultaneously. GMM encompasses many estimation techniques we are familiar with:

OLS: E[x_i ε_i] = 0.
IV: E[z_i ε_i] = 0.
MLE: E[∂ ln f(y_i; θ)/∂θ] = 0 (the score has mean zero).
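The OLS case is worth seeing concretely: the least-squares fit makes the sample analogue of E[x_i ε_i] = 0 hold exactly (a minimal NumPy sketch with simulated data, an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + regressor
y = X @ np.array([1.0, 0.5]) + rng.normal(size=n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat

# sample analogue of E[x_i eps_i]: zero up to machine precision
moment = X.T @ resid / n
```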

GMM Estimator

We write the moments in expectation form: E[g(y_i, x_i; θ)] = 0. To estimate θ, we specify a positive definite matrix W_n (called the weighting matrix) and find the parameters that minimize the following generalized distance:

θ̂_GMM = arg min_θ Q(θ), where Q(θ) = g_n(θ)' W_n g_n(θ)

and g_n(θ) is the sample average of the moments,

g_n(θ) = (1/n) Σ_i g(y_i, x_i; θ).

Example: Method of Moments Estimator of the Mean

Assume that {y_1, ..., y_n} are random variables drawn from a population with expectation µ. We have a single moment condition

g(y_i; µ) = E[y_i − µ] = 0.

The sample average of the moment is

g_n(µ) = (1/n) Σ_i (y_i − µ).

The MM estimator minimizes the squared sample moment:

µ̂_MM = arg min_µ ((1/n) Σ_i (y_i − µ))²,

which gives µ̂_MM = (1/n) Σ_i y_i, the sample average.
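A quick numerical confirmation (a Python sketch with simulated data, an illustrative assumption): minimizing the squared sample moment lands exactly on the sample average.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(3)
y = rng.normal(loc=4.0, size=1000)

# squared sample moment ((1/n) sum (y_i - mu))^2
Q = lambda mu: np.mean(y - mu) ** 2

mu_hat = minimize_scalar(Q).x
```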

Example: Instrumental Variables

For the model y = Xβ + ε, the moment conditions are

g(β) = E[z_i(y_i − X_iβ)] = 0.

The corresponding sample moments are given by

g_n(β) = (1/n) Z'(y − Xβ).

When the number of instruments is greater than the number of regressors, we have more moments than unknowns. β̂_GMM minimizes

((1/n) Z'(y − Xβ))' W_n ((1/n) Z'(y − Xβ)).

Taking the FOC,

β̂_GMM = (X'Z W_n Z'X)^{−1} X'Z W_n Z'y.
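The closed form above can be exercised directly (a NumPy sketch; the data-generating process with an endogenous regressor is an illustrative assumption). OLS is biased because x is correlated with the error through the confounder u, while the instrument z recovers the true coefficient:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 20000
z = rng.normal(size=n)                      # instrument
u = rng.normal(size=n)                      # unobserved confounder
x = z + 0.5 * u + 0.5 * rng.normal(size=n)  # endogenous regressor
eps = u + 0.5 * rng.normal(size=n)          # correlated with x via u
y = 2.0 * x + eps                           # true beta = 2

X = x.reshape(-1, 1)
Z = z.reshape(-1, 1)
W = np.eye(Z.shape[1])  # identity weight (just-identified here, so W is moot)

# beta_hat = (X'Z W Z'X)^{-1} X'Z W Z'y
XZ = X.T @ Z
A = XZ @ W @ XZ.T
b = XZ @ W @ (Z.T @ y)
beta_iv = np.linalg.solve(A, b)[0]

beta_ols = np.dot(x, y) / np.dot(x, x)  # biased upward by cov(x, eps)
```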

Example: Bernoulli

A Bernoulli random variable y takes only the two values 0 and 1, with probabilities 1 − p and p. Its mean and variance are p and p(1 − p). It has density function

f(y | p) = p^y (1 − p)^{1−y}.

Matlab illustration: MLE, MM, and GMM estimation of the Bernoulli distribution.

Bernoulli - MLE

function LL = LL_bernoulli(y,p)
    f1 = p;
    f0 = 1 - p;
    f  = f1.*(y==1) + f0.*(y==0);
    LL = -sum(log(f));
end

p0 = 0.5;
A  = [1; -1];          % constraints p <= 1 and -p <= 0
b  = [1; 0];
p_mle = fmincon(@(p) LL_bernoulli(y,p), p0, A, b);

Bernoulli - MM and GMM

Method of Moments:

function Q = MM_bernoulli(y,p)
    Q = (p - mean(y))^2;
end

p_mm = fmincon(@(p) MM_bernoulli(y,p), p0, A, b);

GMM:

function Q = GMM_bernoulli(y,p)
    Q1 = (p - mean(y))^2;
    Q2 = (p*(1-p) - var(y))^2;   % variance moment: Var(y) = p(1-p)
    Q  = Q1 + Q2;
end

p_gmm = fmincon(@(p) GMM_bernoulli(y,p), p0, A, b);

Efficient GMM Estimation

The variance of θ̂_GMM depends on the weight matrix W_n. The efficient GMM estimator has the smallest possible (asymptotic) variance. Let S be the variance-covariance matrix of g(y_i; θ). It can be shown that the optimal weight matrix has the property

plim W_n^OPT = S^{−1}.

Intuition: a moment with small variance is informative and should receive a large weight.

Two-Step Efficient GMM

We need the optimal weight matrix, but it depends on the parameters. Two-step efficient GMM:

Step 1: choose an initial weight matrix, for example W_[1] = I, and find a consistent but less efficient first-step GMM estimator

θ̂_[1] = arg min_θ g_n(θ)' W_[1] g_n(θ).

Step 2: let

W_[2] = [ (1/n) Σ_i g(y_i; θ̂_[1]) g(y_i; θ̂_[1])' ]^{−1},

and find the efficient estimator

θ̂_[2] = arg min_θ g_n(θ)' W_[2] g_n(θ).

The estimator is not unique, as it depends on the initial weight matrix W_[1].
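The two steps can be sketched in Python. The example below uses a Poisson sample (an illustrative choice, not from the slides): mean and variance both equal λ, giving two moment conditions for one parameter, so the weighting matrix genuinely matters.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(5)
y = rng.poisson(lam=3.0, size=10000).astype(float)

def gbar(lam):
    # stacked sample moments: E[y] - lambda and Var[y] - lambda
    return np.array([np.mean(y) - lam, np.mean((y - lam) ** 2) - lam])

def Q(lam, W):
    g = gbar(lam)
    return g @ W @ g

# Step 1: identity weight matrix gives a consistent first-step estimate.
lam1 = minimize_scalar(lambda l: Q(l, np.eye(2)),
                       bounds=(0.1, 10.0), method="bounded").x

# Step 2: estimate S = E[g_i g_i'] at lam1, then reweight by S^{-1}.
gi = np.column_stack([y - lam1, (y - lam1) ** 2 - lam1])
S = gi.T @ gi / len(y)
lam2 = minimize_scalar(lambda l: Q(l, np.linalg.inv(S)),
                       bounds=(0.1, 10.0), method="bounded").x
```

Both steps estimate λ consistently; the second step downweights the noisier variance moment.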

Outline
1. MLE
2. GMM
3. Binary Choice Models

Bernoulli Random Variable

A Bernoulli random variable y takes only the two values 0 and 1, with probabilities 1 − p and p. It has density function f(y | p) = p^y (1 − p)^{1−y}.

Extension from Bernoulli to Binary Choice Models

Consider the case in which p, the probability of the event y = 1 (success), varies across individuals: p_i is a function of covariates X_i, p_i = F(X_i β). The choice of functional form F(·) is up to you, and different choices give different models.

Linear probability model: Pr(y = 1 | X_i) = X_i β.
Probit: Pr(y = 1 | X_i) = Φ(X_i β), Pr(y = 0 | X_i) = 1 − Φ(X_i β).
Logit: Pr(y = 1 | X_i) = exp(X_i β) / (1 + exp(X_i β)), Pr(y = 0 | X_i) = 1 / (1 + exp(X_i β)).

Matlab

Probit:

function LL = LL_probit(y,X,b)
    f1 = normcdf(X*b);
    f0 = 1 - f1;
    f  = f1.*(y==1) + f0.*(y==0);
    LL = -sum(log(f));
end

Logit:

function LL = LL_logit(y,X,b)
    my_exp = exp(X*b);
    f1 = my_exp./(1+my_exp);
    f0 = 1./(1+my_exp);
    f  = f1.*(y==1) + f0.*(y==0);
    LL = -sum(log(f));
end
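For readers without MATLAB, the same logit likelihood in Python, fit on simulated data (the sample size and true coefficients are illustrative assumptions):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit  # logistic function exp(t)/(1+exp(t))

rng = np.random.default_rng(6)
n = 20000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([0.5, -1.0])
y = (rng.random(n) < expit(X @ beta_true)).astype(float)

def logit_negll(b, y, X):
    p1 = expit(X @ b)
    # pick Pr(y=1) or Pr(y=0) per observation; clip guards log(0)
    p = np.clip(np.where(y == 1, p1, 1 - p1), 1e-12, 1.0)
    return -np.sum(np.log(p))

b_hat = minimize(logit_negll, x0=np.zeros(2), args=(y, X)).x
```

The logit log likelihood is globally concave, so the optimizer recovers the coefficients used in the simulation.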

Summary

Maximum likelihood: commonly used in nonlinear models; efficient if the parametric assumption is correct.

GMM: relaxes parametric assumptions; can be useful in cases where MLE is difficult to use. GMM is very popular in empirical IO.