Maximum Likelihood Estimation. Econometrics II. Ricardo Mora.


Maximum Likelihood Estimation
Econometrics II, Department of Economics, Universidad Carlos III de Madrid
Máster Universitario en Desarrollo y Crecimiento Económico

General Approaches to Parameter Estimation

Are there general approaches to estimation that produce estimators with good properties, such as consistency and efficiency?
- Least Squares: OLS, FGLS, FE
- Method of Moments: assume $\theta_0 = g(E[Y])$ and replace population moments by sample moments: $\hat\theta = g(E_N[y_i])$. Examples: OLS, FGLS, IV, FE
- Maximum Likelihood (ML): loosely speaking, it chooses the $\hat\theta$ under which the observed sample has the highest estimated density

Why Should We Use ML?
- Nice asymptotic results under mild conditions
- Easy to implement for non-linear models: labor force participation decisions, employment decisions, marriage/divorce decisions, the number of children a couple wants, large investment project decisions, choice of means of transport... that is, useful for discrete choices
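To make the plug-in idea concrete, here is a minimal Python sketch (illustrative, not from the course materials) for $Y \sim U[0, \theta_0]$, a case where the two approaches give genuinely different estimators: the method of moments inverts $E[Y] = \theta_0/2$, while ML picks the parameter value that makes the observed sample most likely, which here is the sample maximum.

```python
import numpy as np

rng = np.random.default_rng(4)
theta0 = 5.0
y = rng.uniform(0, theta0, size=100)

theta_mm = 2 * y.mean()   # method of moments: theta = g(E[Y]) with g(m) = 2m
theta_ml = y.max()        # ML: L(theta) = theta^(-n) * 1{theta >= max_i y_i}
print(theta_mm, theta_ml)
```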

Basic Setup

Let $\{y_1, y_2, \dots, y_N\}$ be a sample from a population, each observation with density $f(y; \theta_0)$. We know $f$ but do not know $\theta_0$.
We assume that the observations $\{y_1, y_2, \dots, y_N\}$ are independent, so that
$$f(y_1, y_2, \dots, y_N; \theta_0) = f(y_1; \theta_0) f(y_2; \theta_0) \cdots f(y_N; \theta_0)$$
Likelihood function: the function obtained for a given sample after replacing the true $\theta_0$ by any $\theta$:
$$L(\theta) = f(y_1; \theta) f(y_2; \theta) \cdots f(y_N; \theta)$$
$L(\theta)$ is a random variable because it depends on the sample.

Definition. The maximum likelihood estimator of $\theta_0$, $\hat\theta_{ML}$, is the value of $\theta$ that maximizes the likelihood function $L(\theta)$.
Usually it is more convenient to work with the logarithm of the likelihood function:
$$\ell(\theta) = \log L(\theta) = \sum_{i=1}^N \log f(y_i; \theta)$$

Example: Bernoulli (1/4)

$Y$ is Bernoulli: it takes value 1 with probability $p_0$ and value 0 with probability $1 - p_0$. The likelihood for observation $i$ is
$$f(y_i; p_0) = \begin{cases} p_0 & \text{if } y_i = 1 \\ 1 - p_0 & \text{if } y_i = 0 \end{cases}$$
Let $n_1$ be the number of observations equal to 1. Then, under iid sampling,
$$L(p) = p^{n_1} (1-p)^{N - n_1}$$

Example: Bernoulli (2/4)

Each sample gives us one likelihood function.
Suppose we observe $\{0,1,0,0,0\}$. Then the likelihood is $L(p) = p(1-p)^4$.
Suppose instead we observe the sample $\{1,0,0,1,1\}$, where ones are more frequent. Now we have a different likelihood: $L(p) = p^3(1-p)^2$.

Example: Bernoulli (3/4)

Sample $\{0,1,0,0,0\}$ versus sample $\{1,0,0,1,1\}$.
[Figure: the two likelihood functions $L(p)$ plotted over $p \in [0,1]$; the likelihood for the first sample peaks at $p = 0.2$, the one for the second sample at $p = 0.6$.]
With the first sample, $\hat p = 0.2$; with the second sample, $\hat p = 0.6$.

Example: Bernoulli (4/4)

The maximum likelihood estimator is the value that maximizes $L(p) = p^{n_1}(1-p)^{N-n_1}$. The same $\hat p$ maximizes the logarithm of the likelihood function:
$$\ell(p) = n_1 \log p + (N - n_1)\log(1-p)$$
Setting the derivative to zero:
$$\frac{\partial \ell(p)}{\partial p} = 0 \;\Rightarrow\; \frac{n_1}{\hat p} = \frac{N - n_1}{1 - \hat p} \;\Rightarrow\; \hat p = \frac{n_1}{N}$$
With $\{0,1,0,0,0\}$, $\hat p = 1/5 = 0.2$; with $\{1,0,0,1,1\}$, $\hat p = 3/5 = 0.6$.
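As a quick numerical check, here is a minimal Python sketch (hypothetical, not part of the slides) that evaluates the Bernoulli log-likelihood on a grid and confirms the closed-form result $\hat p = n_1/N$ for both samples.

```python
import numpy as np

def bernoulli_loglik(p, y):
    """Log-likelihood l(p) = n1*log(p) + (N - n1)*log(1 - p)."""
    n1 = np.sum(y)
    return n1 * np.log(p) + (len(y) - n1) * np.log(1 - p)

for sample in ([0, 1, 0, 0, 0], [1, 0, 0, 1, 1]):
    y = np.array(sample)
    grid = np.linspace(0.001, 0.999, 999)   # avoid log(0) at the endpoints
    p_hat_grid = grid[np.argmax(bernoulli_loglik(grid, y))]
    p_hat_closed = y.mean()                  # closed form: n1 / N
    print(sample, round(p_hat_grid, 3), p_hat_closed)
```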

Computing the MLE

ML estimates are sometimes easy to compute, as in the previous example.
In the linear regression model with normal errors, ML coincides with OLS.
Sometimes, however, there is no algebraic solution to the maximization problem. It is then necessary to use some sort of nonlinear maximization procedure: STATA will take care of this (a small numerical sketch follows the next slide).

Classical Assumptions

Gauss-Markov assumptions:
- A1 Linearity: $y = \beta_0' x + \varepsilon$
- A2 Random sampling
- A3 Conditional mean independence: $E[y \mid x] = \beta_0' x$
- A4 Invertibility of the variance-covariance matrix
- A5 Homoskedasticity: $Var[\varepsilon \mid x] = \sigma_0^2$
Normality:
- A6 Normality: $y \mid x \sim N(\beta_0' x, \sigma_0^2)$
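When no algebraic solution exists, the log-likelihood is maximized numerically. Below is a minimal, hypothetical Python sketch of what such a routine does (standing in for STATA's internal optimizer, not reproducing it): it fits a Poisson regression, a model whose first-order conditions have no closed-form solution, by minimizing the negative log-likelihood with scipy.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
beta_true = np.array([0.5, 1.0])
y = rng.poisson(np.exp(beta_true[0] + beta_true[1] * x))

def neg_loglik(beta):
    # Poisson log-likelihood: sum_i [ y_i*(xb)_i - exp((xb)_i) - log(y_i!) ];
    # the log(y_i!) term is dropped because it does not depend on beta.
    xb = beta[0] + beta[1] * x
    return -np.sum(y * xb - np.exp(xb))

res = minimize(neg_loglik, x0=np.zeros(2), method="BFGS")
print(res.x)  # should be close to beta_true for a sample this size
```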

Basic Setup

Let $\{y_1, y_2, \dots, y_N\}$ be an iid sample from a population with $y \mid x \sim N(\beta_0' x, \sigma_0^2)$. We aim to estimate $\theta_0 = (\beta_0, \sigma_0^2)$.
Because of the iid assumption, the joint density of $\{y_1, y_2, \dots, y_N\}$ is simply the product of the conditional densities:
$$f(y_1, y_2, \dots, y_N \mid x_1, \dots, x_N; \theta_0) = f(y_1 \mid x_1; \theta_0) f(y_2 \mid x_2; \theta_0) \cdots f(y_N \mid x_N; \theta_0)$$
Note that $y \mid x \sim N(\beta_0' x, \sigma_0^2)$ is equivalent to $\varepsilon \sim N(0, \sigma_0^2)$. This implies that
$$f_{Y|X}(y_i \mid x_i; \theta_0) = f_\varepsilon(y_i - \beta_0' x_i; \theta_0)$$

Density of the Error Term

We have $\varepsilon \sim N(0, \sigma_0^2)$, so what is its density? We can use the following trick:
1. $\varepsilon \sim N(0, \sigma_0^2)$ implies that $\varepsilon/\sigma_0 \sim N(0,1)$
2. $\varepsilon/\sigma_0 \sim N(0,1)$ implies that $CDF_\varepsilon(z) = \Pr(\varepsilon \le z) = \Phi(z/\sigma_0)$
3. since the density of any continuous random variable is the first derivative of its CDF, the chain rule gives
$$f_\varepsilon(z; \theta_0) = \frac{1}{\sigma_0}\,\phi\!\left(\frac{z}{\sigma_0}\right)$$

Density of the Sample

Since
$$f_\varepsilon(z; \theta_0) = \frac{1}{\sigma_0}\,\phi\!\left(\frac{z}{\sigma_0}\right) \quad\text{and}\quad f_{Y|X}(y_i \mid x_i; \theta_0) = f_\varepsilon(y_i - \beta_0' x_i; \theta_0)$$
and
$$f(y_1, \dots, y_N \mid x_1, \dots, x_N; \theta_0) = \prod_i f(y_i \mid x_i; \theta_0)$$
we have that
$$f(y_1, \dots, y_N \mid x_1, \dots, x_N; \theta_0) = \prod_i \left\{ \frac{1}{\sigma_0}\,\phi\!\left(\frac{y_i - \beta_0' x_i}{\sigma_0}\right) \right\}$$

The Log-likelihood (1/2)

The likelihood replaces the true parameter values by free arguments:
$$L(\beta, \sigma) = \prod_i \left\{ \frac{1}{\sigma}\,\phi\!\left(\frac{y_i - \beta' x_i}{\sigma}\right) \right\}$$
Taking the log makes the problem easier:
$$\log L(\beta, \sigma) = \sum_i \left\{ \log\frac{1}{\sigma} + \log \phi\!\left(\frac{y_i - \beta' x_i}{\sigma}\right) \right\}$$

The Log-likelihood (2/2)

The first term inside the sum is a constant across observations:
$$\log L(\beta, \sigma) = -N \log \sigma + \sum_i \log \phi\!\left(\frac{y_i - \beta' x_i}{\sigma}\right)$$
and given that $\phi\!\left(\frac{y_i - \beta' x_i}{\sigma}\right) = \frac{1}{\sqrt{2\pi}} \exp\!\left[-\frac{(y_i - \beta' x_i)^2}{2\sigma^2}\right]$, we have that
$$\log L(\beta, \sigma) = N \log\frac{1}{\sqrt{2\pi}\,\sigma} - \frac{1}{2}\sum_i \left(\frac{y_i - \beta' x_i}{\sigma}\right)^2$$

The ML Estimator: FOC

The ML estimator is the value of $(\beta, \sigma^2)$ at which the log-likelihood is maximized. We obtain the maximum by setting the partial derivatives with respect to $(\beta, \sigma^2)$ to zero.
With respect to $\beta$, this implies
$$\frac{1}{\hat\sigma^2} \sum_i x_i (y_i - \hat\beta' x_i) = 0, \quad\text{which implies}\quad \sum_i x_i (y_i - \hat\beta' x_i) = 0$$
(the OLS normal equations). With respect to $\sigma^2$, this implies
$$\hat\sigma^2 = \frac{1}{N} \sum_i (y_i - \hat\beta' x_i)^2$$
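A minimal sketch (illustrative, with simulated data) that confirms both first-order conditions numerically: maximizing the normal log-likelihood returns the OLS coefficients, and the ML variance estimate divides the sum of squared residuals by $N$.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + one regressor
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.5, size=n)

def neg_loglik(theta):
    beta, log_sigma = theta[:2], theta[2]   # parameterize sigma by its log, so sigma > 0
    sigma = np.exp(log_sigma)
    resid = y - X @ beta
    # negative of: -n*log(sigma) - sum(resid^2)/(2*sigma^2), constants dropped
    return n * np.log(sigma) + np.sum(resid**2) / (2 * sigma**2)

res = minimize(neg_loglik, x0=np.zeros(3), method="BFGS")
beta_ml, sigma2_ml = res.x[:2], np.exp(res.x[2])**2

beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
sigma2_foc = np.mean((y - X @ beta_ols)**2)   # SSR / N, not SSR / (N - k)

print(beta_ml, beta_ols)        # coincide up to optimizer tolerance
print(sigma2_ml, sigma2_foc)
```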

Some Final Comments

The MLE $\hat\beta$ is exactly the same estimator as OLS.
$\hat\sigma^2$ is not the same as the unbiased estimator $s^2 = \frac{1}{N-1}\sum_i (y_i - \hat\beta' x_i)^2$; rather, $\hat\sigma^2 = \frac{N-1}{N}\, s^2$.
$\hat\sigma^2$ is biased, but the bias disappears as $N$ increases (a small simulation illustrating this follows the consistency conditions below).

Consistency

Assumptions:
- finite-sample identification: $\ell(\theta)$ takes different values for different $\theta$
- sampling: a law of large numbers is satisfied by $\frac{1}{n}\sum_i \ell_i(\theta)$
- asymptotic identification: maximizing $\ell(\theta)$ provides a unique way to determine the parameter in the limit as the sample size tends to infinity

Under these conditions, the ML estimator is consistent: $\operatorname{plim} \hat\theta_{ML} = \theta_0$.
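A hypothetical simulation sketch of the bias claim, in the simplest case where $\hat\beta' x_i$ is just the sample mean: at each sample size, $\hat\sigma^2 = \frac{N-1}{N}s^2$ underestimates $\sigma_0^2$ on average, and the gap shrinks as $N$ grows.

```python
import numpy as np

rng = np.random.default_rng(2)
sigma2_true = 4.0

for n in (10, 100, 1000):
    # Approximate E[sigma2_hat] by averaging over many simulated samples
    draws = rng.normal(scale=np.sqrt(sigma2_true), size=(20_000, n))
    sigma2_hat = draws.var(axis=1)   # ML estimator: divides SSR by n (ddof=0)
    print(n, sigma2_hat.mean())      # roughly (n-1)/n * 4, approaching 4
```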

Asymptotic Normality

Assumptions:
- consistency
- $\ell(\theta)$ is differentiable and attains an interior maximum
- a central limit theorem can be applied to the gradient

Under these conditions the ML estimator is asymptotically normal:
$$n^{1/2}(\hat\theta - \theta_0) \to N(0, \Sigma) \text{ as } n \to \infty, \quad\text{where } \Sigma = \operatorname{plim}\left(-\frac{1}{n}\sum_i H_i\right)^{-1}$$
and $H_i$ is the Hessian contribution of observation $i$.

Asymptotic Efficiency and Variance Estimation

If $\ell(\theta)$ is differentiable and attains an interior maximum, the MLE must be at least as asymptotically efficient as any other consistent estimator that is asymptotically unbiased.
Consistent estimators of the asymptotic variance-covariance matrix:
- empirical Hessian: $\widehat{var}_H(\hat\theta) = \left[-\frac{1}{n}\sum_i H_i(\hat\theta)\right]^{-1}$
- BHHH (outer product of the gradient): $\widehat{var}_{BHHH}(\hat\theta) = \left[\frac{1}{n}\sum_i g_i(\hat\theta)\, g_i(\hat\theta)^T\right]^{-1}$
- the sandwich estimator: valid even if the model is misspecified (the robust option in STATA)
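An illustrative sketch (scalar Bernoulli case, so every "matrix" is 1x1, and not from the slides) comparing the empirical-Hessian and BHHH estimates of the asymptotic variance of $\hat p$; both should approximate the true value $p_0(1-p_0)$.

```python
import numpy as np

rng = np.random.default_rng(3)
p0 = 0.3
y = rng.binomial(1, p0, size=2_000)
p_hat = y.mean()   # MLE

# Per-observation gradient and Hessian of log f(y_i; p) = y_i*log(p) + (1-y_i)*log(1-p)
g = y / p_hat - (1 - y) / (1 - p_hat)
H = -y / p_hat**2 - (1 - y) / (1 - p_hat)**2

sigma_hessian = 1.0 / (-H.mean())   # [ -(1/n) * sum_i H_i ]^{-1}
sigma_bhhh = 1.0 / np.mean(g**2)    # [  (1/n) * sum_i g_i^2 ]^{-1}

# For the Bernoulli model the two coincide exactly at p_hat (information equality),
# and both estimate the true asymptotic variance p0*(1 - p0) = 0.21.
print(sigma_hessian, sigma_bhhh, p0 * (1 - p0))
```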

Summary

- ML estimates are the values that maximize the likelihood function
- Under the Gauss-Markov assumptions plus normality of the error term, $\hat\beta_{ML}$ is exactly the same estimator as $\hat\beta_{OLS}$
- Under general assumptions, ML is consistent, asymptotically normal, and asymptotically efficient