Econometrics I, Estimation


Econometrics I, Estimation. Department of Economics, Stanford University. September 2008. Part I.

Parameter, Estimator, Estimate. A parameter is a feature of the population. An estimator is a function of the data (the sample): a random variable, a statistic. An estimate is a particular realization of an estimator. Analog principle: replace the population distribution in the definition of the parameter with the empirical distribution, so population moments are replaced by sample moments. Consistency follows from the LLN.
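As a brief illustration of the analog principle (an example added here, not in the original slides), the population mean and variance have sample analogs obtained by swapping the population distribution for the empirical distribution:

$$\mu = E X \;\longrightarrow\; \hat\mu = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad \sigma^2 = E(X - EX)^2 \;\longrightarrow\; \hat\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (X_i - \hat\mu)^2 .$$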

Sample $X = \{X_1, X_2, \ldots, X_n\}$. Estimator $\hat\theta = \phi(X)$. Example: $\hat\theta = \frac{1}{n}\sum_{i=1}^n X_i$, or even $\hat\theta = X_1$. Measures of closeness: $X$ is better than $Y$ if (i) $P(|X - \theta| \le |Y - \theta|) = 1$; (ii) $P(|X - \theta| > \epsilon) \le P(|Y - \theta| > \epsilon)$ for every $\epsilon > 0$; (iii) $P(|X - \theta| < |Y - \theta|) \ge P(|X - \theta| > |Y - \theta|)$; (iv) $E(X - \theta)^2 \le E(Y - \theta)^2$. $X$ and $Y$ might not be rankable using the first two criteria, but are always rankable using the last two. The last one, mean square error, is the most commonly used.

Decomposing mean square error: $E(\hat\theta - \theta)^2 = V(\hat\theta) + (E\hat\theta - \theta)^2$. Other loss functions $L(x)$, in addition to $L(x) = x^2$, can be used. Let $X$ and $Y$ be two estimators of $\theta$. We say $X$ is more efficient than $Y$ if $EL(X - \theta) \le EL(Y - \theta)$ for all $\theta \in \Theta$ and $EL(X - \theta) < EL(Y - \theta)$ for at least one $\theta$. $\hat\theta$ is inadmissible if there is another estimator that is more efficient in the sense of the above definition; otherwise it is admissible. Bayes estimators, and limits of Bayes estimators, are admissible.
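The decomposition follows by adding and subtracting $E\hat\theta$ (a standard step, spelled out here for completeness):

$$E(\hat\theta - \theta)^2 = E\big[(\hat\theta - E\hat\theta) + (E\hat\theta - \theta)\big]^2 = E(\hat\theta - E\hat\theta)^2 + (E\hat\theta - \theta)^2 = V(\hat\theta) + \mathrm{Bias}(\hat\theta)^2,$$

since the cross term $2(E\hat\theta - \theta)\,E(\hat\theta - E\hat\theta)$ is zero.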

Making a choice among estimators is difficult. Possible strategies: Bayesian: prior weights on the parameter space. Minimax estimator: $\hat\theta = \arg\min_{\tilde\theta} \max_{\theta \in \Theta} E(\tilde\theta - \theta)^2$. Restricting attention to a subclass of estimators: e.g. linear estimators, unbiased estimators, equivariant estimators. $\hat\theta$ is unbiased if $E\hat\theta = \theta$ for all $\theta \in \Theta$.

The sample mean is the best linear unbiased estimator (BLUE) of the population mean: $V(\bar X_n) \le V\big(\sum_{t=1}^n a_t X_t\big)$ for all $a_t$ satisfying $E\sum_{t=1}^n a_t X_t = \mu$. But the sample mean can be dominated by a biased linear estimator, an unbiased nonlinear estimator, or a biased nonlinear estimator. One can also use asymptotic properties to select estimators, in particular by comparing asymptotic variances.
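A quick way to see the BLUE property in the i.i.d. case (a sketch added here, assuming $V(X_t) = \sigma^2$): unbiasedness requires $\sum_t a_t = 1$, and

$$V\Big(\sum_{t=1}^n a_t X_t\Big) = \sigma^2 \sum_{t=1}^n a_t^2 \;\ge\; \sigma^2\,\frac{\big(\sum_{t=1}^n a_t\big)^2}{n} = \frac{\sigma^2}{n} = V(\bar X_n),$$

by the Cauchy-Schwarz inequality, with equality only when $a_t = 1/n$ for all $t$.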

Asymptotic properties. Consistency: $\hat\theta \xrightarrow{p} \theta$. Asymptotic normality: $\sqrt{n}(\hat\theta - \theta) \xrightarrow{d} N(0, \sigma^2)$. The maximum likelihood estimator is usually consistent and asymptotically normal (CAN). Definition: for $X = \{X_1, \ldots, X_n\}$, $\hat\theta_{MLE} = \arg\max_{\theta \in \Theta} \log L(X|\theta)$, where $L(X|\theta) = P(X|\theta)$ for a discrete sample and $L(X|\theta) = f(X|\theta)$ for a continuous sample.

Under the i.i.d. sampling assumption, $\log L(X|\theta) = \sum_{t=1}^n \log p(x_t|\theta)$ for discrete data. Example: $n$ tosses of a coin, $X_t = 1$ if the $t$th toss is a head and $0$ otherwise.

$$L = \prod_{t=1}^n p^{x_t}(1-p)^{1-x_t}, \qquad \log L = \Big(\sum_{t=1}^n x_t\Big)\log p + \Big(n - \sum_{t=1}^n x_t\Big)\log(1-p).$$

Example 2: $X \sim B(n, p)$, observed $X = k$:

$$L = \binom{n}{k} p^k (1-p)^{n-k}, \qquad \log L = \log \binom{n}{k} + k\log p + (n-k)\log(1-p).$$
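Setting the score to zero in the coin-toss example (a step spelled out here) gives the familiar answer:

$$\frac{d\log L}{dp} = \frac{\sum_t x_t}{p} - \frac{n - \sum_t x_t}{1-p} = 0 \quad\Longrightarrow\quad \hat p_{MLE} = \frac{1}{n}\sum_{t=1}^n x_t,$$

and the binomial example yields the same estimator, $\hat p_{MLE} = k/n$.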

Continuous data, i.i.d. sample: $\log L(X|\theta) = \sum_{t=1}^n \log f(x_t|\theta)$ for continuous data. Example: $\{X_t\}$, $t = 1, \ldots, n$, i.i.d., $X_t \sim N(\mu, \sigma^2)$.

$$L = \prod_{t=1}^n \frac{1}{\sqrt{2\pi}\,\sigma} \exp\Big(-\frac{1}{2\sigma^2}(x_t - \mu)^2\Big), \qquad \log L = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log\sigma^2 - \frac{1}{2\sigma^2}\sum_{t=1}^n (x_t - \mu)^2.$$

Score equations:

$$\frac{\partial\log L}{\partial\mu} = \frac{1}{\sigma^2}\sum_{t=1}^n (x_t - \mu) = 0, \qquad \frac{\partial\log L}{\partial\sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{t=1}^n (x_t - \mu)^2 = 0.$$
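Solving the two score equations (a step added here; the solution reappears in the normal example below) gives

$$\hat\mu = \frac{1}{n}\sum_{t=1}^n x_t = \bar x, \qquad \hat\sigma^2 = \frac{1}{n}\sum_{t=1}^n (x_t - \bar x)^2.$$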

Computation can be difficult. Newton-Raphson, if the log likelihood is smooth (several times differentiable): expand

$$Q(\theta) \approx Q(\hat\theta_1) + Q'(\hat\theta_1)(\theta - \hat\theta_1) + \tfrac{1}{2} Q''(\hat\theta_1)(\theta - \hat\theta_1)^2,$$

and maximize the quadratic approximation to obtain the update

$$\hat\theta_2 = \hat\theta_1 - \big[Q''(\hat\theta_1)\big]^{-1} Q'(\hat\theta_1).$$

Other gradient methods: BHHH, etc. Non-gradient-based methods: Nelder-Mead (Matlab), simulated annealing. Bayesian methods: Markov chain Monte Carlo.
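A minimal Newton-Raphson sketch in Python (the function names and the Bernoulli illustration are my own, not from the slides), iterating the update above with analytic first and second derivatives of the log likelihood:

```python
import numpy as np

def newton_raphson(score, hessian, theta0, tol=1e-10, max_iter=100):
    """Maximize the objective by iterating theta <- theta - hessian(theta)^{-1} score(theta)."""
    theta = float(theta0)
    for _ in range(max_iter):
        step = score(theta) / hessian(theta)   # scalar-parameter case
        theta = theta - step
        if abs(step) < tol:
            break
    return theta

# Illustration: coin-toss (Bernoulli) log likelihood from the earlier slide.
rng = np.random.default_rng(0)
x = rng.binomial(1, 0.3, size=1000)
n, s = len(x), x.sum()

score   = lambda p: s / p - (n - s) / (1 - p)          # d log L / dp
hessian = lambda p: -s / p**2 - (n - s) / (1 - p)**2   # d^2 log L / dp^2

p_hat = newton_raphson(score, hessian, theta0=0.5)
print(p_hat, x.mean())   # Newton-Raphson solution coincides with the sample mean
```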

Information matrix equality: for $L(\theta) \equiv L(X|\theta)$,

$$E\,\frac{\partial^2 \log L(\theta_0)}{\partial\theta\,\partial\theta'} = -\,E\,\frac{\partial \log L(\theta_0)}{\partial\theta}\,\frac{\partial \log L(\theta_0)}{\partial\theta'}.$$

Note that the expectation is taken under $\theta_0$. Two-step proof. First, the mean of the score function is zero at the truth, as long as the support does not depend on $\theta$:

$$E\,\frac{\partial \log L(\theta)}{\partial\theta} = E\,\frac{1}{L(\theta)}\frac{\partial L(\theta)}{\partial\theta} = \int \frac{\partial L(\theta)}{\partial\theta}\,dy = \frac{\partial}{\partial\theta}\int L(\theta)\,dy = 0.$$

Second, differentiate this identity again:

$$0 = \frac{\partial}{\partial\theta'}\int \frac{\partial \log L(\theta)}{\partial\theta}\,L(\theta)\,dy = \int \frac{\partial^2 \log L(\theta)}{\partial\theta\,\partial\theta'}\,L(\theta)\,dy + \int \frac{\partial \log L(\theta)}{\partial\theta}\,\frac{\partial \log L(\theta)}{\partial\theta'}\,L(\theta)\,dy = E\,\frac{\partial^2 \log L(\theta)}{\partial\theta\,\partial\theta'} + E\,\frac{\partial \log L(\theta)}{\partial\theta}\,\frac{\partial \log L(\theta)}{\partial\theta'}.$$

Cramér-Rao lower bound. For any unbiased estimator, $E\hat\theta(X) = \theta$. In general,

$$V(\hat\theta) \;\ge\; \Big(-E\,\frac{\partial^2 \log L(\theta)}{\partial\theta\,\partial\theta'}\Big)^{-1} = \Big(V\Big(\frac{\partial \log L(x)}{\partial\theta}\Big)\Big)^{-1}.$$

Proof by Cauchy-Schwarz:

$$\mathrm{Cov}\Big(\hat\theta(x), \frac{\partial \log L(x)}{\partial\theta}\Big) = E\,\hat\theta(x)\,\frac{\partial \log L(x)}{\partial\theta} = \int \hat\theta(x)\,\frac{1}{L(x)}\frac{\partial L(x)}{\partial\theta}\,L(x)\,dx = \int \hat\theta(x)\,\frac{\partial L(x)}{\partial\theta}\,dx = \frac{\partial}{\partial\theta}\int \hat\theta(x)\,L(x)\,dx = \frac{\partial}{\partial\theta}\,E\hat\theta = \frac{\partial}{\partial\theta}\,\theta = I.$$

Hence

$$V(\hat\theta) \;\ge\; \mathrm{Cov}\Big(\hat\theta(x), \frac{\partial \log L(x)}{\partial\theta}\Big)\, V\Big(\frac{\partial \log L(x)}{\partial\theta}\Big)^{-1} \mathrm{Cov}\Big(\frac{\partial \log L(x)}{\partial\theta}, \hat\theta(x)\Big) = V\Big(\frac{\partial \log L(x)}{\partial\theta}\Big)^{-1}.$$
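As a check (an example added here), for an i.i.d. $N(\mu, \sigma^2)$ sample with $\sigma^2$ known,

$$V\Big(\frac{\partial\log L}{\partial\mu}\Big) = V\Big(\frac{1}{\sigma^2}\sum_{t=1}^n (X_t - \mu)\Big) = \frac{n}{\sigma^2}, \qquad V(\bar X) = \frac{\sigma^2}{n},$$

so the sample mean attains the Cramér-Rao bound.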

Asymptotic properties. M (maximization or minimization) estimator theory: $\hat\theta = \arg\max_{\theta \in \Theta} Q_n(\theta)$. $Q_n(\theta)$ is random, for example $Q_n(\theta) = \frac{1}{n}\log L_n(\theta) = \frac{1}{n}\sum_{t=1}^n \log f(X_t; \theta)$. Typically $Q_n(\theta) \xrightarrow{p} Q(\theta)$ for a deterministic $Q(\theta)$, in some uniform sense over $\theta \in \Theta$. Usually $Q(\theta) = E Q_n(\theta)$ or $Q(\theta) = \lim_{n\to\infty} E Q_n(\theta)$. If $Q(\theta)$ is uniquely maximized at $\theta_0$, then we should expect $\hat\theta \xrightarrow{p} \theta_0$.

Consistency of the maximum likelihood estimator. Uniform law of large numbers: $Q_n(\theta) = \frac{1}{n}\sum_{t=1}^n \log f(X_t; \theta) \xrightarrow{p} Q(\theta) = E\log f(X; \theta)$. $E\log f(X; \theta)$ is uniquely maximized at $\theta_0$ by Jensen's inequality:

$$Q(\theta) - Q(\theta_0) = E\log f(X;\theta) - E\log f(X;\theta_0) = E\log\frac{f(X;\theta)}{f(X;\theta_0)} < \log E\,\frac{f(X;\theta)}{f(X;\theta_0)} = \log \int_{x:\,f(x;\theta_0)>0} f(x;\theta)\,dx \;\le\; 0.$$

$E\log f_0(X) - E\log f(X;\theta)$ is also called the Kullback-Leibler information criterion (KLIC) between $f(X;\theta)$ and $f_0(X)$. Misspecification: it is possible that there is no $\theta_0$ such that $f_0(X) = f(X;\theta_0)$.

Asymptotic normality of the maximum likelihood estimator. Holds when $\theta_0 \in \mathrm{int}(\Theta)$, an interior point, and when the support of $X$ does not depend on $\theta$. Then $E\,\frac{\partial \log L(x|\theta_0)}{\partial\theta} = 0$. With probability converging to 1, a mean-value expansion of the first-order condition gives

$$0 = \frac{\partial \log L}{\partial\theta}\Big|_{\hat\theta} = \frac{\partial \log L}{\partial\theta}\Big|_{\theta_0} + \frac{\partial^2 \log L}{\partial\theta\,\partial\theta'}\Big|_{\bar\theta}\,(\hat\theta - \theta_0),$$

so that

$$\sqrt{n}\,(\hat\theta - \theta_0) = -\Big(\frac{1}{n}\frac{\partial^2 \log L}{\partial\theta\,\partial\theta'}\Big|_{\bar\theta}\Big)^{-1} \frac{1}{\sqrt{n}}\frac{\partial \log L}{\partial\theta}\Big|_{\theta_0},$$

where $\bar\theta$ lies between $\hat\theta$ and $\theta_0$.

Central limit theorem:

$$\frac{1}{\sqrt{n}}\frac{\partial \log L}{\partial\theta}\Big|_{\theta_0} \xrightarrow{d} N(0, \Omega), \qquad \Omega = V\Big(\frac{\partial \log f(x;\theta_0)}{\partial\theta}\Big).$$

Locally uniform law of large numbers:

$$\frac{1}{n}\frac{\partial^2 \log L}{\partial\theta\,\partial\theta'} \xrightarrow{p} H = E\,\frac{\partial^2 \log f(x;\theta_0)}{\partial\theta\,\partial\theta'}.$$

By Slutsky, $\sqrt{n}(\hat\theta - \theta_0) \xrightarrow{d} N(0, H^{-1}\Omega H^{-1})$. By the information matrix equality, $\Omega = -H$, so $H^{-1}\Omega H^{-1} = -H^{-1} = \Omega^{-1}$.
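A short simulation sketch (my own illustration, using the Bernoulli MLE $\hat p = \bar x$ from the earlier coin example) checks that $\sqrt{n}(\hat p - p_0)$ is approximately $N(0, \Omega^{-1})$ with $\Omega^{-1} = p_0(1 - p_0)$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p0, reps = 500, 0.3, 5000

# The MLE of p in each replication is the sample mean.
p_hat = rng.binomial(1, p0, size=(reps, n)).mean(axis=1)
z = np.sqrt(n) * (p_hat - p0)

print(z.var())        # should be close to p0 * (1 - p0) = 0.21
print(p0 * (1 - p0))
```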

Consistent estimates of the asymptotic variance. Average outer product of the score function:

$$\hat\Omega = \frac{1}{n}\sum_{t=1}^n \frac{\partial \log f(x_t, \hat\theta)}{\partial\theta}\,\frac{\partial \log f(x_t, \hat\theta)}{\partial\theta'}.$$

Average Hessian:

$$\hat H = \frac{1}{n}\sum_{t=1}^n \frac{\partial^2 \log f(x_t; \hat\theta)}{\partial\theta\,\partial\theta'}.$$

Sandwich formula: $\hat H^{-1}\hat\Omega\hat H^{-1}$. The sandwich formula is correct even under misspecification (pseudo-likelihood).
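A minimal sketch of the sandwich estimator in Python (the per-observation score and Hessian below are for the $N(\mu, \sigma^2)$ model of the next slide; an assumed illustration, not the slides' own code):

```python
import numpy as np

def sandwich_variance(scores, hessians):
    """scores: (n, k) per-observation scores; hessians: (n, k, k) per-observation Hessians.
    Returns H^{-1} Omega H^{-1}, the estimated asymptotic variance of sqrt(n)(theta_hat - theta_0)."""
    n = scores.shape[0]
    omega = scores.T @ scores / n          # average outer product of the score
    h = hessians.mean(axis=0)              # average Hessian
    h_inv = np.linalg.inv(h)
    return h_inv @ omega @ h_inv

# Normal example: theta = (mu, sigma^2), score and Hessian evaluated at the MLE.
rng = np.random.default_rng(2)
x = rng.normal(loc=1.0, scale=2.0, size=2000)
mu, s2 = x.mean(), x.var()

u = x - mu
scores = np.column_stack([u / s2, -0.5 / s2 + u**2 / (2 * s2**2)])
hessians = np.empty((len(x), 2, 2))
hessians[:, 0, 0] = -1.0 / s2
hessians[:, 0, 1] = hessians[:, 1, 0] = -u / s2**2
hessians[:, 1, 1] = 0.5 / s2**2 - u**2 / s2**3

print(sandwich_variance(scores, hessians))   # approx diag(sigma^2, 2*sigma^4) = diag(4, 32)
```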

Normal example continued: $X_t \sim N(\mu, \sigma^2)$.

$$\hat\mu = \frac{1}{n}\sum_{t=1}^n x_t = \bar x, \qquad \hat\sigma^2 = \frac{1}{n}\sum_{t=1}^n (x_t - \bar x)^2.$$

$$\log f\big(x_t; \theta = (\mu, \sigma^2)\big) = -\frac{1}{2}\log(2\pi) - \frac{1}{2}\log\sigma^2 - \frac{1}{2\sigma^2}(x_t - \mu)^2.$$

Score:

$$\frac{\partial \log f}{\partial\theta} = \begin{pmatrix} \frac{1}{\sigma^2}(x_t - \mu) \\[4pt] -\frac{1}{2\sigma^2} + \frac{1}{2\sigma^4}(x_t - \mu)^2 \end{pmatrix}.$$

Note that

$$V\Big(\frac{\partial \log f}{\partial\theta}\Big) = -E\,\frac{\partial^2 \log f}{\partial\theta\,\partial\theta'} = \begin{pmatrix} \frac{1}{\sigma^2} & 0 \\ 0 & \frac{1}{2\sigma^4} \end{pmatrix},$$

using $E(x_t - \mu)^3 = 0$ and $V\big((x_t - \mu)^2\big) = E(x_t - \mu)^4 - \big(E(x_t - \mu)^2\big)^2 = 2\sigma^4$.
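Putting this together with the general result above (a concluding step added for completeness), the MLE in the normal model satisfies

$$\sqrt{n}\begin{pmatrix} \hat\mu - \mu \\ \hat\sigma^2 - \sigma^2 \end{pmatrix} \xrightarrow{d} N\left(0, \begin{pmatrix} \sigma^2 & 0 \\ 0 & 2\sigma^4 \end{pmatrix}\right),$$

since the asymptotic variance is $\Omega^{-1}$.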