Chapter 1: A Brief Review of Maximum Likelihood, GMM, and Numerical Tools
Joan Llull
Microeconometrics. IDEA PhD Program


Maximum Likelihood

The Likelihood Principle

Likelihood principle: our estimate of $\theta$ is the one that maximizes the likelihood of our sample $(y, X) = ((y_1, x_1'), \ldots, (y_N, x_N'))'$.

Likelihood is the probability of observing the sample: $\Pr[y, X; \theta]$ for discrete data, $f(y, X; \theta)$ for continuous data.

The likelihood function is $L_N(\theta) \equiv f(y, X; \theta) = f(y \mid X; \theta) f(X; \theta)$, which for i.i.d. data is $f(y, X; \theta) = \prod_{i=1}^{N} f(y_i, x_i; \theta)$.

We assume $f(X; \theta) = f(X)$, so we can focus on $f(y \mid X; \theta)$. Hence, our object of interest is the (conditional) log-likelihood function:

$$\mathcal{L}_N(\theta) \equiv \sum_{i=1}^{N} \ln f(y_i \mid x_i; \theta).$$
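As a concrete illustration (mine, not from the slides), here is a minimal sketch of a conditional log-likelihood, assuming a probit model $\Pr[y_i = 1 \mid x_i] = \Phi(x_i'\theta)$; the function name and the simulated data are hypothetical.

```python
import numpy as np
from scipy.stats import norm

def probit_loglik(theta, y, X):
    """Conditional log-likelihood sum_i ln f(y_i | x_i; theta) for a probit model."""
    p = norm.cdf(X @ theta)             # Pr[y_i = 1 | x_i; theta]
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# Hypothetical simulated sample, just to make the sketch runnable.
rng = np.random.default_rng(0)
N, theta0 = 1000, np.array([0.5, -1.0])
X = np.column_stack([np.ones(N), rng.normal(size=N)])
y = (X @ theta0 + rng.normal(size=N) > 0).astype(float)
print(probit_loglik(theta0, y, X))
```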

The Maximum Likelihood Estimator (MLE)

The MLE is defined by the following optimization problem:

$$\hat{\theta}_{ML} \equiv \arg\max_{\theta \in \Theta} \frac{1}{N} \mathcal{L}_N(\theta) = \arg\max_{\theta \in \Theta} \frac{1}{N} \sum_{i=1}^{N} \ln f(y_i \mid x_i; \theta).$$

This estimator is:
- Fully parametric
- An extremum estimator
- An m-estimator

The FOC of the problem is:

$$\frac{1}{N} \sum_{i=1}^{N} \frac{\partial \ln f(y_i \mid x_i; \hat{\theta}_{ML})}{\partial \theta} = 0.$$
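Continuing the hypothetical probit sketch above, in practice the maximization is handed to a numerical optimizer; this uses scipy's generic BFGS routine as a stand-in, not the chapter's own method (Newton-Raphson is reviewed in the numerical tools section below).

```python
from scipy.optimize import minimize

# Maximize L_N(theta) by minimizing its negative; BFGS approximates the
# gradient by finite differences (cf. the numerical differentiation slide).
res = minimize(lambda th: -probit_loglik(th, y, X), x0=np.zeros(2), method="BFGS")
theta_hat = res.x   # should be close to the hypothetical theta0 above
```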

Identification

The true parameter vector $\theta_0$ is identified if there are no observationally equivalent parameters. More formally, $\theta_0$ is identified if the Kullback-Leibler inequality is satisfied:

$$\Pr\left[f(y \mid x; \theta) \neq f(y \mid x; \theta_0)\right] > 0 \quad \forall\, \theta \neq \theta_0.$$
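A one-line reasoning step that the slides use implicitly (standard, via Jensen's inequality for the strictly concave logarithm): for any $\theta \neq \theta_0$,

$$E_{f_0}\left[\ln\frac{f(y \mid x; \theta)}{f(y \mid x; \theta_0)}\right] < \ln E_{f_0}\left[\frac{f(y \mid x; \theta)}{f(y \mid x; \theta_0)}\right] = \ln \int f(y \mid x; \theta)\,dy = \ln 1 = 0,$$

where the inequality is strict precisely because the two densities differ with positive probability. This delivers the strict inequality $E[\ln f(y \mid x; \theta)] < E[\ln f(y \mid x; \theta_0)]$ used in the consistency argument below.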

Regularity conditions

If the following two assumptions hold:
i. The specified density $f(y \mid x; \theta)$ is the data generating process (dgp)
ii. The support of $y$ does not depend on $\theta$

then the regularity conditions are satisfied:

$$E_f\left[\frac{\partial \ln f(y \mid x; \theta)}{\partial \theta}\right] = 0$$

$$E_f\left[\frac{\partial^2 \ln f(y \mid x; \theta)}{\partial \theta\, \partial \theta'}\right] = -E_f\left[\frac{\partial \ln f(y \mid x; \theta)}{\partial \theta} \frac{\partial \ln f(y \mid x; \theta)}{\partial \theta'}\right].$$

The latter condition is a.k.a. the information matrix equality.
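Both conditions are easy to verify by simulation for a simple model. A standalone sketch (mine) for $y \sim N(\theta_0, 1)$, where the score is $(y - \theta)$ and the second derivative of the log density is $-1$:

```python
import numpy as np

# For y ~ N(theta, 1): d ln f/d theta = (y - theta), d^2 ln f/d theta^2 = -1.
rng = np.random.default_rng(0)
theta0 = 2.0
y = rng.normal(theta0, 1.0, size=1_000_000)

score = y - theta0
print(score.mean())              # ~ 0: first regularity condition
print(-1.0, -np.mean(score**2))  # both ~ -1: information matrix equality
```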

Consistency

Using identification and the first regularity condition:

$$E[\ln f(y \mid x; \theta)] < E[\ln f(y \mid x; \theta_0)] \quad \forall\, \theta \neq \theta_0.$$

By the Law of Large Numbers (LLN):

$$\frac{1}{N} \sum_{i=1}^{N} \ln f(y_i \mid x_i; \theta) \xrightarrow{p} E[\ln f(y \mid x; \theta)].$$

Then, as $N \to \infty$, $\hat{\theta}_{ML} = \arg\max \mathcal{L}_N(\theta) \xrightarrow{p} \arg\max \mathcal{L}_0(\theta) = \theta_0$, where $\mathcal{L}_0(\theta) \equiv E[\ln f(y \mid x; \theta)]$, whenever:
i. The parameter space $\Theta$ is compact
ii. $\mathcal{L}_N(\theta)$ is measurable for all $\theta$

Asymptotic distribution

Using a first order Taylor expansion of the FOC around $\theta_0$:

$$\sqrt{N}(\hat{\theta} - \theta_0) = -\left(\frac{1}{N} \sum_{i=1}^{N} \frac{\partial^2 l_i(\theta^*)}{\partial \theta\, \partial \theta'}\right)^{-1} \frac{1}{\sqrt{N}} \sum_{i=1}^{N} \frac{\partial l_i(\theta_0)}{\partial \theta},$$

where $l_i(\theta) \equiv \ln f(y_i \mid x_i; \theta)$, and $\theta^*$ is between $\hat{\theta}$ and $\theta_0$.

Assuming i.i.d. observations and regularity + consistency conditions, by the LLN:

$$\left(\frac{1}{N} \sum_{i=1}^{N} \frac{\partial^2 l_i(\theta^*)}{\partial \theta\, \partial \theta'}\right)^{-1} \xrightarrow{p} E\left[\frac{\partial^2 l_i(\theta_0)}{\partial \theta\, \partial \theta'}\right]^{-1}.$$

By the CLT:

$$\frac{1}{\sqrt{N}} \sum_{i=1}^{N} \frac{\partial l_i(\theta_0)}{\partial \theta} \xrightarrow{d} \mathcal{N}\left(0,\; E\left[\frac{\partial l_i(\theta_0)}{\partial \theta} \frac{\partial l_i(\theta_0)}{\partial \theta'}\right]\right).$$

Finally, using Cramér's theorem and the IM equality:

$$\sqrt{N}(\hat{\theta} - \theta_0) \xrightarrow{d} \mathcal{N}(0, \Omega_0), \quad \Omega_0 = -E\left[\frac{\partial^2 l_i(\theta_0)}{\partial \theta\, \partial \theta'}\right]^{-1} = E\left[\frac{\partial l_i(\theta_0)}{\partial \theta} \frac{\partial l_i(\theta_0)}{\partial \theta'}\right]^{-1}.$$

Since $\Omega_0 = IM^{-1}$, it attains the Cramér-Rao lower bound (efficient estimator).
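In practice $\Omega_0$ is estimated by inverting the outer product of scores (or the average Hessian) at $\hat{\theta}$. A sketch continuing the hypothetical probit example above (it reuses `theta_hat`, `y`, `X`, and `norm` from those blocks; names are mine):

```python
def per_obs_loglik(theta, y, X):
    """l_i(theta) = ln f(y_i | x_i; theta) for each observation (probit)."""
    p = norm.cdf(X @ theta)
    return y * np.log(p) + (1 - y) * np.log(1 - p)

def scores(theta, y, X, h=1e-6):
    """Per-observation numerical scores via two-sided differences (see below)."""
    S = np.zeros((len(y), len(theta)))
    for k in range(len(theta)):
        e = np.zeros(len(theta)); e[k] = h
        S[:, k] = (per_obs_loglik(theta + e, y, X)
                   - per_obs_loglik(theta - e, y, X)) / (2 * h)
    return S

S = scores(theta_hat, y, X)
Omega_hat = np.linalg.inv(S.T @ S / len(y))  # outer-product estimate of Omega_0
se = np.sqrt(np.diag(Omega_hat / len(y)))    # asymptotic standard errors
```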

Generalized Method of Moments

General formulation

Let $\theta$ be the parameter vector of interest, defined by the set of moments (or orthogonality conditions):

$$E[\psi(w; \theta)] = 0,$$

where:
- $w$ is a (vector) random variable,
- and $\psi(\cdot)$ is a vector function such that $\dim(\psi) \geq \dim(\theta)$.

Estimation

Consider a random sample with $N$ observations $\{w_i\}_{i=1}^{N}$. GMM estimation is based on the sample moment conditions:

$$b_N(\theta) \equiv \frac{1}{N} \sum_{i=1}^{N} \psi(w_i; \theta).$$

The GMM estimator minimizes the quadratic distance of $b_N(\theta)$ from zero:

$$\hat{\theta}_{GMM} \equiv \arg\min_{\theta \in \Theta} b_N(\theta)' W_N b_N(\theta),$$

where $W_N$ is positive semi-definite, and $\text{rank}(W_N) \geq \dim(\theta)$. If the problem is just-identified ($\dim(\psi) = \dim(\theta)$): $b_N(\hat{\theta}_{GMM}) = 0$.
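A minimal standalone sketch (mine, with hypothetical simulated data) using linear IV moments $\psi(w; \theta) = z(y - x\theta)$, over-identified with two instruments, and an identity weighting matrix as a first pass:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical data: scalar theta_0 = 2, two instruments z for regressor x.
rng = np.random.default_rng(0)
N = 2000
z = rng.normal(size=(N, 2))
x = z @ np.array([1.0, 0.5]) + rng.normal(size=N)
y = 2.0 * x + rng.normal(size=N)

def b_N(theta):
    """Sample moments b_N(theta) = (1/N) sum_i z_i (y_i - x_i * theta)."""
    return (z * (y - x * theta)[:, None]).mean(axis=0)

def Q(theta, W):
    """GMM criterion b_N(theta)' W b_N(theta)."""
    b = b_N(theta)
    return b @ W @ b

W0 = np.eye(2)   # initial weighting matrix
theta_1 = minimize(Q, x0=np.zeros(1), args=(W0,)).x
```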

Consistency

GMM is an extremum estimator, so general consistency results hold, as in MLE. Conditions and intuition are similar to MLE:
- The parameter space $\Theta \subseteq \mathbb{R}^K$ is compact.
- $W_N b_N(\theta) \xrightarrow{p} W_0 E[\psi(w; \theta)]$.
- Identification: $W_0 E[\psi(w; \theta)] = 0 \iff \theta = \theta_0$.

If these conditions hold, $\hat{\theta}_{GMM} \xrightarrow{p} \theta_0$.

Asymptotic distribution

Following similar steps as in the MLE case, if:
- $\hat{\theta}_{GMM}$ is a consistent estimator of $\theta_0$,
- $\theta_0$ is in the interior of $\Theta$,
- $\psi(w; \theta)$ is once differentiable with respect to $\theta$,
- $D_N(\theta) \equiv \partial b_N(\theta)/\partial \theta' \xrightarrow{p} D_0(\theta)$, with $D_0(\theta)$ continuous at $\theta = \theta_0$,
- For $D_0 \equiv D_0(\theta_0)$, the matrix $D_0' W_0 D_0$ is non-singular,
- $\sqrt{N} b_N(\theta_0) \xrightarrow{d} \mathcal{N}(0, V_0)$, with $V_0 = E[\psi(w; \theta_0)\psi(w; \theta_0)']$,

then $\sqrt{N}(\hat{\theta}_{GMM} - \theta_0) \xrightarrow{d} \mathcal{N}(0, \Omega_0)$, with:

$$\Omega_0 = (D_0' W_0 D_0)^{-1} D_0' W_0 V_0 W_0 D_0 (D_0' W_0 D_0)^{-1}.$$

Optimal weighting matrix

Efficiency is achieved with any $W_N$ that delivers $W_0 = \kappa V_0^{-1}$. This includes $W_N = V_0^{-1}$ (unfeasible), but also $W_N = \hat{V}_N^{-1}$, where $\hat{V}_N$ is any consistent estimator of $V_0$.

The optimal GMM estimator is implemented in two steps (see the sketch below):
1. Obtain $\hat{\theta}_{GMM}(W_N^0)$ for an initial guess $W_N^0$.
2. Re-estimate using

$$\hat{W}^{opt} = \left(\sum_{i=1}^{N} \psi(w_i; \hat{\theta}_{GMM}(W_N^0))\, \psi(w_i; \hat{\theta}_{GMM}(W_N^0))'\right)^{-1}$$

as the new weighting matrix.
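The second step, continuing the hypothetical IV sketch above (scaling by $1/N$ is immaterial since any $\kappa V_0^{-1}$ is optimal):

```python
# Step 2: rebuild the weighting matrix from step-1 residual moments.
psi = z * (y - x * theta_1)[:, None]    # psi(w_i; theta_GMM(W^0)), N x 2
W_opt = np.linalg.inv(psi.T @ psi / N)  # consistent estimate of V_0^{-1}
theta_2 = minimize(Q, x0=theta_1, args=(W_opt,)).x
print(theta_2)                          # ~ 2.0, the true value by construction
```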

Numerical Methods

Differentiation

We use the definition of a derivative:

$$f'(x) = \lim_{\epsilon \to 0} \frac{f(x + \epsilon) - f(x)}{\epsilon} \quad \Rightarrow \quad f'(x) \approx \frac{f(x + h) - f(x)}{h},$$

for a small $h$ (e.g. $10^{-6}$).

More accurate (and costly) is the two-sided difference:

$$f'(x) \approx \frac{f(x + h) - f(x - h)}{2h}.$$

To compute a gradient $\nabla f(x)$, one element of $x$ is moved at a time.
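A standalone sketch of both formulas (function names mine), perturbing one coordinate at a time:

```python
import numpy as np

def grad_forward(f, x, h=1e-6):
    """One-sided (forward) differences: one element of x moved at a time."""
    g = np.zeros_like(x, dtype=float)
    for k in range(len(x)):
        e = np.zeros_like(x, dtype=float); e[k] = h
        g[k] = (f(x + e) - f(x)) / h
    return g

def grad_central(f, x, h=1e-6):
    """Two-sided (central) differences: more accurate, twice the function calls."""
    g = np.zeros_like(x, dtype=float)
    for k in range(len(x)):
        e = np.zeros_like(x, dtype=float); e[k] = h
        g[k] = (f(x + e) - f(x - e)) / (2 * h)
    return g

f = lambda x: x[0]**2 + np.exp(x[1])
print(grad_central(f, np.array([1.0, 0.0])))   # ~ [2, 1]
```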

Newton-Raphson optimization

Originally conceived for finding roots. It approximates the function by its tangent line and finds the intercept (iterative procedure):

$$0 \approx f(x_n) + f'(x_n)(x_{n+1} - x_n) \quad \Rightarrow \quad x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}.$$

The extension to optimization is natural (apply the root-finding recursion to $f'$):

$$x_{n+1} = x_n - \frac{f'(x_n)}{f''(x_n)}.$$

In the multivariate case:

$$x_{n+1} = x_n - [H_f(x_n)]^{-1} \nabla f(x_n).$$
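A compact standalone sketch of the multivariate update (names mine), tested on a quadratic, for which Newton converges in a single step:

```python
import numpy as np

def newton_optimize(grad, hess, x0, tol=1e-8, max_iter=100):
    """Newton-Raphson: x_{n+1} = x_n - H^{-1} grad, until the step is tiny."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        step = np.linalg.solve(hess(x), grad(x))
        x = x - step
        if np.linalg.norm(step) < tol:
            break
    return x

# Example: minimize f(x) = x'Ax/2 - b'x; gradient is Ax - b, Hessian is A.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
x_star = newton_optimize(lambda x: A @ x - b, lambda x: A, np.zeros(2))
print(x_star, np.linalg.solve(A, b))   # Newton hits the exact minimizer
```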

Integration

Numerical integration (quadrature) consists of a weighted sum of a finite set of evaluations of the integrand. Integration weights depend on the method and on the desired precision.

Deterministic methods: midpoint rule, trapezoidal rule, Simpson's rule, Gaussian quadrature, ...

Alternative: Monte Carlo integration.
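A standalone sketch (mine) comparing the two approaches for $E[g(z)]$ with $z \sim N(0, 1)$, where Gauss-Hermite quadrature is the natural deterministic choice:

```python
import numpy as np

# Approximate E[g(z)] for z ~ N(0,1) with g(z) = z^2 (true value: 1).
g = lambda z: z**2

# Gauss-Hermite quadrature: with the change of variable z = sqrt(2) x,
# E[g(z)] ~ (1/sqrt(pi)) * sum_j w_j g(sqrt(2) x_j).
nodes, weights = np.polynomial.hermite.hermgauss(10)
quad = (weights * g(np.sqrt(2) * nodes)).sum() / np.sqrt(np.pi)

# Monte Carlo integration: a simple average over draws from N(0,1).
draws = np.random.default_rng(0).normal(size=100_000)
mc = g(draws).mean()

print(quad, mc)   # both ~ 1; quadrature is exact here, MC has sampling noise
```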