University of Pavia. M Estimators. Eduardo Rossi


Criterion Function

A basic unifying notion is that most econometric estimators are defined as the minimizers of certain functions constructed from the sample data, called criterion functions, $C_T$.

Parameter space: $\Theta \subseteq \mathbb{R}^p$. The true value of the parameter: $\theta_0 \in \Theta$.

An estimator of $\theta_0$ based on a data set of size $T$ is given by
$$\hat\theta_T = \arg\min_{\theta \in \Theta} C_T(\theta) \qquad (1)$$

For instance, the OLS estimator for the multiple linear regression model is an M-estimator with
$$C_T(\theta) = \sum_t (y_t - x_t'\theta)^2.$$
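
A minimal sketch of this idea (not from the original notes; all names such as `beta_true` are illustrative): OLS obtained by numerically minimizing the criterion $C_T(\theta)=\sum_t (y_t - x_t'\theta)^2$, checked against the closed-form solution.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
T, p = 200, 3
X = rng.normal(size=(T, p))
beta_true = np.array([1.0, -0.5, 2.0])
y = X @ beta_true + rng.normal(size=T)

def C_T(theta):
    """Criterion function: sum of squared residuals."""
    resid = y - X @ theta
    return resid @ resid

res = minimize(C_T, x0=np.zeros(p), method="BFGS")
ols_closed_form = np.linalg.solve(X.T @ X, X.T @ y)
print(res.x)            # numerical minimizer of C_T
print(ols_closed_form)  # closed-form OLS; the two should agree
```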

The analysis is centered around the properties of $C_T$ and its derivatives:
$$g_T(\theta) = \frac{\partial C_T}{\partial \theta} = \begin{pmatrix} \dfrac{\partial C_T}{\partial \theta_1} \\ \vdots \\ \dfrac{\partial C_T}{\partial \theta_p} \end{pmatrix} \quad (p \times 1) \qquad (2)$$

$$Q_T(\theta) = \frac{\partial^2 C_T}{\partial \theta \, \partial \theta'} = \begin{pmatrix} \dfrac{\partial^2 C_T}{\partial \theta_1^2} & \cdots & \dfrac{\partial^2 C_T}{\partial \theta_1 \partial \theta_p} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial^2 C_T}{\partial \theta_p \partial \theta_1} & \cdots & \dfrac{\partial^2 C_T}{\partial \theta_p^2} \end{pmatrix} \quad (p \times p) \qquad (3)$$
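
A minimal sketch (illustrative helpers, not from the notes): central finite differences for the score $g_T(\theta)$ and Hessian $Q_T(\theta)$ of any criterion function such as the `C_T` defined above.

```python
import numpy as np

def numerical_gradient(C, theta, h=1e-5):
    """Central-difference approximation to g_T(theta) = dC/dtheta."""
    theta = np.asarray(theta, dtype=float)
    g = np.zeros_like(theta)
    for i in range(theta.size):
        e = np.zeros_like(theta); e[i] = h
        g[i] = (C(theta + e) - C(theta - e)) / (2 * h)
    return g

def numerical_hessian(C, theta, h=1e-4):
    """Central-difference approximation to Q_T(theta) = d2C/dtheta dtheta'."""
    theta = np.asarray(theta, dtype=float)
    p = theta.size
    Q = np.zeros((p, p))
    for i in range(p):
        for j in range(p):
            ei = np.zeros(p); ei[i] = h
            ej = np.zeros(p); ej[j] = h
            Q[i, j] = (C(theta + ei + ej) - C(theta + ei - ej)
                       - C(theta - ei + ej) + C(theta - ei - ej)) / (4 * h**2)
    return Q
```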

$$g_T(\theta) = \frac{\partial C_T}{\partial \theta} \qquad (4)$$
$$Q_T(\theta) = \frac{\partial^2 C_T}{\partial \theta \, \partial \theta'} \qquad (5)$$

All these quantities are random variables and functions on the parameter space.

The criterion function can be thought of as having the generic form
$$C_T(\theta) = \phi\!\left(\frac{1}{T}\sum_{t=1}^{T} \psi_t(\theta)\right)$$
where $\phi(\cdot)$ is a continuous function of $r$ variables and $\psi_t(\theta)$ is an $r$-vector of continuous functions. The factor $1/T$ ensures that $C_T$ is bounded in the limit as $T \to \infty$.
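
A minimal sketch (all names illustrative, not from the notes) of the generic form $C_T(\theta) = \phi\big(\tfrac{1}{T}\sum_t \psi_t(\theta)\big)$, instantiated for the scalar case $r = 1$ with $\phi$ the identity: once for OLS and once for a Gaussian likelihood with known unit variance.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 100
x = rng.normal(size=T)
y = 1.5 * x + rng.normal(size=T)
data = list(zip(x, y))

def make_criterion(phi, psi_t, data):
    """Build C_T(theta) from phi, a per-observation function psi_t, and the sample."""
    def C_T(theta):
        return phi(np.mean([psi_t(theta, obs) for obs in data]))
    return C_T

# OLS: phi is the identity, psi_t(theta) = (y_t - theta * x_t)^2
C_ols = make_criterion(lambda m: m, lambda th, obs: (obs[1] - th * obs[0]) ** 2, data)

# Gaussian MLE with known unit variance: psi_t is the negative log-density
# (up to a constant), so C_T is minus the average log-likelihood.
C_mle = make_criterion(lambda m: m, lambda th, obs: 0.5 * (obs[1] - th * obs[0]) ** 2, data)

print(C_ols(1.5), C_mle(1.5))
```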

Estimators that fit this framework include:
OLS
GLS
Maximum Likelihood Estimator: $\psi_t$ is a scalar, corresponding to the negative of the log-probability or log-probability density associated with each observation.
Method of Moments Estimator: the solution to sets of equations relating data moments and unknown parameters, the moments in question being replaced by sample averages to obtain the estimates.
GMM

Asymptotic properties

Under regularity conditions, to be established in each particular case, M-estimators satisfy the properties usually demanded of econometric estimators.

Let $(\Omega, \mathcal{F})$ be a measurable space, and $\Theta$ a compact (closed and bounded) subset of $\mathbb{R}^p$. Let
$$C : \Omega \times \Theta \to \mathbb{R} \qquad (6)$$
be a function such that $C(\cdot, \theta)$ is $\mathcal{F}/\mathcal{B}$-measurable for every $\theta \in \Theta$ and $C(\omega, \cdot)$ is continuous on $\Theta$ for every $\omega \in F$, for some $F \in \mathcal{F}$. Then there exists an $\mathcal{F}/\mathcal{B}^p$-measurable function $\hat\theta : \Omega \to \Theta$ such that, for all $\omega \in F$,
$$C(\omega, \hat\theta(\omega)) = \inf_{\theta \in \Theta} C(\omega, \theta). \qquad (7)$$

Random variable: $X : \Omega \to \mathbb{R}$, where $\Omega$ is the sample space equipped with a $\sigma$-field of events $\mathcal{F}$; on the range side, $(\mathbb{R}, \mathcal{B}, \mu)$, where $\mathcal{B}$ denotes the Borel field of $\mathbb{R}$ and $\mu$ is a probability measure on $\mathbb{R}$. If for every $B \in \mathcal{B}$ there exists a set $A \in \mathcal{F}$ such that $A = X^{-1}(B)$, the function $X(\cdot)$ is said to be $\mathcal{F}/\mathcal{B}$-measurable. A random variable is a measurable mapping from a probability space to the real line.

Measurability of $C(\cdot, \theta)$ means that the criterion can be treated as a random variable when the data are random variables, having a well-defined distribution.

The important conditions are:
1. Continuity of $C(\omega, \cdot)$ as a function of $\theta$ for given data
2. Compactness of $\Theta$

Compactness of $\Theta$. Boundedness is an innocuous requirement: implausibly large parameter values are unreasonable. Closedness matters because we cannot exclude closure points: the minimum of a function on a set may not exist as a member of the set unless the set is closed. The asymptotic analysis is in any case only concerned with the behaviour of the estimator on a small open neighbourhood of $\theta_0$.

Consistency

To analyze the consistency problem we need some new convergence concepts. Let $\{X_t(\theta),\, t = 1, 2, \ldots\}$ be a random sequence whose members are functions of the parameters $\theta$, on a domain $\Theta$.

Definition. If $X_t(\theta) \overset{p}{\to} X(\theta)$ for every $\theta \in \Theta$, $X_t$ is said to converge in probability to $X$ pointwise on $\Theta$.

Definition. If $\sup_{\theta \in \Theta} |X_t(\theta) - X(\theta)| \overset{p}{\to} 0$, $X_t$ is said to converge in probability to $X$ uniformly on $\Theta$.

In general, pointwise convergence does not imply uniform convergence.
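
A minimal sketch (illustrative, not from the notes) of the uniform-convergence definition: here $C_T(\theta) = \tfrac{1}{T}\sum_t (y_t - \theta)^2$ with $y_t \sim N(\theta_0, 1)$, whose nonstochastic limit is $C(\theta) = (\theta_0 - \theta)^2 + 1$, and the supremum over a grid of $\theta$ values shrinks as $T$ grows.

```python
import numpy as np

rng = np.random.default_rng(1)
theta0 = 0.5
theta_grid = np.linspace(-2.0, 3.0, 501)

for T in (50, 500, 5000, 50000):
    y = theta0 + rng.normal(size=T)
    C_T = np.array([np.mean((y - th) ** 2) for th in theta_grid])
    C_lim = (theta0 - theta_grid) ** 2 + 1.0
    print(T, np.max(np.abs(C_T - C_lim)))  # sup_theta |C_T - C| shrinks with T
```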

Theorem. If
1. $\Theta$ is compact
2. $C_T(\theta) \overset{p}{\to} C(\theta)$ (a non-stochastic function of $\theta$) uniformly in $\Theta$
3. $\theta_0 \in \operatorname{int}(\Theta)$ is the unique minimum of $C(\theta)$
then $\hat\theta_T \overset{p}{\to} \theta_0$.

Condition 2 can also be stated in the form
$$\sup_{\theta \in \Theta} |C_T(\theta) - C(\theta)| \overset{p}{\to} 0 \qquad (8)$$
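
A minimal sketch (illustrative assumptions throughout): the argmin of the $C_T(\theta)$ from the previous snippet, computed over a compact parameter space, approaches $\theta_0$ as $T$ grows, as the consistency theorem predicts.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(2)
theta0 = 0.5
for T in (50, 500, 5000, 50000):
    y = theta0 + rng.normal(size=T)
    # minimize C_T over a compact parameter space Theta = [-10, 10]
    fit = minimize_scalar(lambda th: np.mean((y - th) ** 2),
                          bounds=(-10, 10), method="bounded")
    print(T, fit.x)  # theta_hat_T -> theta0 = 0.5
```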

Assumption. $C_T$ is twice differentiable with respect to $\theta$ at all interior points of $\Theta$, with probability 1. The expectations $E(C_T)$, $E(g_T)$, and $E(Q_T)$ exist, with
$$E(C_T) \to C, \qquad E(g_T) \to g, \qquad E(Q_T) \to Q$$
uniformly in $\Theta$.

Theorem. Given the above assumptions and $\theta_0 \in \operatorname{int}(\Theta)$, if $\theta_0$ is the unique minimum of $C$ then $g(\theta_0) = 0$ and $Q(\theta_0)$ is positive definite.

Proof. If $\theta$ is an interior point of $\Theta$,
$$\frac{\partial}{\partial \theta} E[C_T(\theta)] = E[g_T(\theta)] \qquad (9)$$
$$\frac{\partial^2}{\partial \theta\, \partial \theta'} E[C_T(\theta)] = E[Q_T(\theta)] \qquad (10)$$

In fact, by the following theorem:

Theorem. If, for fixed real $\theta$, $f(\theta)$ is an integrable random variable; and if, for almost all $\omega$ in the sample space, $f(\theta, \omega)$ is a function of $\theta$, differentiable at the point $\theta_0$, with
$$\left|\frac{f(\theta_0 + h, \omega) - f(\theta_0, \omega)}{h}\right| \le d(\omega) \qquad (11)$$
for all $h$ in an open neighborhood of 0 not depending on $\omega$, where $d$ is an integrable random variable;

then
$$E\!\left(\left.\frac{df}{d\theta}\right|_{\theta=\theta_0}\right) = \left.\frac{dE(f)}{d\theta}\right|_{\theta=\theta_0}. \qquad (12)$$
This is called differentiation under the integral sign.

$E(g_T(\theta_0)) = 0$ and $E(Q_T(\theta_0))$ positive definite are necessary and sufficient for identification (a unique local minimum of $E(C_T)$). The result follows letting $T$ tend to $\infty$.

Uniform convergence of $C_T$ depends essentially on the vectors $\psi_t(\theta)$: if their average converges uniformly, this property carries over to $C_T$ by Slutsky's Theorem, given that $\phi(\cdot)$ is continuous. A sufficient condition on the sequences $\{\psi_{it}(\theta),\, t = 1, 2, \ldots\}$ for $i = 1, \ldots, r$ is stochastic uniform equicontinuity ($\psi_{it}(\theta)$ uniformly continuous, not merely for every finite $t$, but also in the limit).

Identification

Condition 3 is called the identification condition, ensuring that the problem is properly specified. If condition 3 is satisfied for a given $\theta_0$, this defines the probability limit of the estimator by construction. The basic requirement for a consistent estimator is that the true structure can be distinguished from alternative possibilities, given sufficient data. This requirement fails when there is observational equivalence.

Definition. Structures $\theta_1$ and $\theta_2$ are said to be observationally equivalent with respect to $C_T$ if for every $\varepsilon > 0$ there exists $T_\varepsilon \ge 1$ such that for all $T \ge T_\varepsilon$
$$P(|C_T(\theta_1) - C_T(\theta_2)| < \varepsilon) > 1 - \varepsilon \qquad (13)$$

The criterion changes by only an arbitrarily small amount between points $\theta_1$ and $\theta_2$ of $\Theta$, with probability close to 1. In almost every sample of size exceeding $T_\varepsilon$, it is not possible to distinguish between the two structures using $C_T$.

Definition. The true structure $\theta_0$ is said to be globally (locally) identified by $C_T$ if no other point in $\Theta$ (in an open neighborhood of $\theta_0$) is observationally equivalent to $\theta_0$ with respect to $C_T$.

Identification by $C_T$ is equivalent to condition 3 in the following sense:

Theorem. Let $C_T(\theta) \overset{p}{\to} C(\theta)$ uniformly in $\Theta$ and let $\theta_0$ be a global (local) minimum of $C(\theta)$. The minimum is unique (unique in an open neighborhood of $\theta_0$) if and only if $\theta_0$ is asymptotically globally (locally) identified by $C_T$.

Local identification and global identification coincide when $C_T$ is quadratic in $\theta$, so that any minimum is known to be unique.

Asymptotic Normality

The form of the criterion function, together with assumptions on the data series, must be such as to lead to the result
$$\sqrt{T}\, g_T(\theta_0) \overset{L}{\to} N(0, A_0), \qquad A_0 = \lim_{T \to \infty} T\, E[g_T(\theta_0)\, g_T(\theta_0)'],$$
while, under the assumption of consistency,
$$g_T(\theta_0) \overset{p}{\to} 0. \qquad (14)$$
To obtain this CLT, the vector in question must have the following generic form:
$$g_T(\theta_0) = J_T \frac{1}{T} \sum_{t=1}^{T} v_t \qquad (15)$$

where $J_T$, of dimension $(p \times q)$, is a random matrix with $J_T \overset{p}{\to} J$, $J$ a finite limit, and $E[v_t] = 0$, with $v_t$ of dimension $(q \times 1)$.

The essential step is a quadratic approximation argument based on an application of the mean value theorem.

Theorem. If $\sqrt{T}\, g_T(\theta_0) \overset{L}{\to} N(0, A_0)$ and $Q_T(\theta) \overset{p}{\to} Q(\theta)$

uniformly in a neighborhood of $\theta_0$, then
$$\sqrt{T}(\hat\theta_T - \theta_0) \overset{asy}{\sim} -Q_0^{-1} \sqrt{T}\, g_T(\theta_0) \overset{L}{\to} N(0, V_0)$$
where $V_0 = Q_0^{-1} A_0 Q_0^{-1}$.

To prove the theorem we need the following

Lemma. If $Y_T(\theta)$ is a stochastic function of $\theta$, $Y_T(\theta) \overset{p}{\to} Y(\theta)$ uniformly in an open neighbourhood of $\theta_0$, and $\hat\theta_T \overset{p}{\to} \theta_0$, then
$$Y_T(\hat\theta_T) \overset{p}{\to} Y(\theta_0).$$
This can be thought of as an extension of Slutsky's Theorem. In this case there is a sequence of stochastic functions (i.e., functions depending on random variables other than $\hat\theta_T$) converging to a limit.

Slutsky's Theorem. Let $\{X_T\}$ be a random sequence converging in probability to a constant $a$, and let $g(\cdot)$ be a function continuous at $a$. Then $g(X_T) \overset{p}{\to} g(a)$.

Cramér's Theorem. Let $\{Y_T\}$ and $\{X_T\}$ be random sequences, with $Y_T \overset{d}{\to} Y$ and $X_T \overset{p}{\to} a$ (a constant). Then
1. $X_T + Y_T \overset{d}{\to} a + Y$
2. $X_T Y_T \overset{d}{\to} aY$
3. $Y_T / X_T \overset{d}{\to} Y/a$ when $a \ne 0$.

Cramér-Wold device. If $\{X_T\}$ is a sequence of random vectors, $X_T \overset{D}{\to} X$ if and only if for all conformable fixed vectors $\lambda$, $\lambda' X_T \overset{D}{\to} \lambda' X$.

Proof. Letting $g_{Ti}$ denote the $i$-th element of $g_T$, and noting that $g_{Ti}(\hat\theta_T) = 0$ by definition of $\hat\theta_T$, the mean value theorem yields
$$g_{Ti}(\theta_0) + \left(\left.\frac{\partial g_{Ti}}{\partial \theta'}\right|_{\theta = \theta^*_{Ti}}\right)(\hat\theta_T - \theta_0) = 0, \qquad i = 1, \ldots, p$$
where the vector $\theta^*_{Ti}$ is
$$\theta^*_{Ti} = \theta_0 + \lambda_i(\hat\theta_T - \theta_0) \quad \text{for } 0 \le \lambda_i \le 1.$$
Stacking these equations up to form a column vector,
$$g_T(\theta_0) + Q^*_T(\hat\theta_T - \theta_0) = 0 \qquad (p \times 1)$$
where $Q^*_T$ denotes $Q_T$ with row $i$ evaluated at the point $\theta^*_{Ti}$, for each $i$.

After multiplying by $\sqrt{T}$,
$$\sqrt{T}\left[g_T(\theta_0) + Q^*_T(\hat\theta_T - \theta_0)\right] = 0$$
$$\sqrt{T}(\hat\theta_T - \theta_0) = -(Q^*_T)^{-1} \sqrt{T}\, g_T(\theta_0).$$
Since $\hat\theta_T \overset{p}{\to} \theta_0$, also $\theta^*_{Ti} \overset{p}{\to} \theta_0$. It follows from $Q_T(\theta) \overset{p}{\to} Q(\theta)$ and the Lemma that
$$Q^*_T \overset{p}{\to} Q(\theta_0).$$
Finally, from Cramér's Theorem it follows that
$$-(Q^*_T)^{-1} \sqrt{T}\, g_T(\theta_0) \overset{d}{\to} -Q(\theta_0)^{-1} W$$

where $W \sim N(0, A_0)$; then
$$\sqrt{T}(\hat\theta_T - \theta_0) \overset{d}{\to} N(0, Q(\theta_0)^{-1} A_0\, Q(\theta_0)^{-1}),$$
noting that $Q(\theta_0)$ is symmetric.

Estimation of Covariance Matrices

To construct confidence intervals and test hypotheses requires replacing $A_0$ and $Q_0$ with consistent estimates. The estimator of $Q_0$ is $Q_T(\hat\theta)$. Recall that
$$A_0 = \lim_{T \to \infty} T\, E[g_T(\theta_0)\, g_T(\theta_0)'].$$
Assume
$$\frac{1}{\sqrt{T}} \sum_{t=1}^{T} v_t \overset{d}{\to} N(0, B_0);$$
then
$$\sqrt{T}\, g_T(\theta_0) \overset{d}{\to} N(0, J B_0 J') \quad \text{and} \quad A_0 = J B_0 J'.$$
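
A minimal sketch (illustrative, not the notes' code) of the plug-in "sandwich" covariance $\hat V = \hat Q^{-1} \hat A \hat Q^{-1}$ for OLS with $C_T(\theta) = \tfrac{1}{T}\sum_t (y_t - x_t'\theta)^2$, where the per-observation score is $v_t = -2 x_t (y_t - x_t'\theta)$; the heteroskedastic error design is an assumption made for the example.

```python
import numpy as np

rng = np.random.default_rng(3)
T, p = 500, 2
X = np.column_stack([np.ones(T), rng.normal(size=T)])
beta = np.array([1.0, 2.0])
y = X @ beta + rng.normal(size=T) * (1 + 0.5 * np.abs(X[:, 1]))  # heteroskedastic errors

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
u = y - X @ beta_hat
scores = -2 * X * u[:, None]              # v_t(theta_hat), one row per t
Q_hat = 2 * (X.T @ X) / T                 # Hessian of C_T
A_hat = scores.T @ scores / T             # outer-product estimate of A_0
V_hat = np.linalg.inv(Q_hat) @ A_hat @ np.linalg.inv(Q_hat)
se = np.sqrt(np.diag(V_hat) / T)          # asymptotic standard errors of beta_hat
print(beta_hat, se)
```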

In practice $v_t(\theta)$ must be replaced by $v_t(\hat\theta)$. In general,
$$B = \frac{1}{T}\sum_{t=1}^{T}\sum_{s=1}^{T} E[v_t v_s'] = \frac{1}{T}\sum_{t=1}^{T} E[v_t v_t'] + \frac{1}{T}\sum_{t=2}^{T} E[v_t v_{t-1}' + v_{t-1} v_t'] + \frac{1}{T}\sum_{t=3}^{T} E[v_t v_{t-2}' + v_{t-2} v_t'] + \cdots + \frac{1}{T} E[v_T v_1' + v_1 v_T'].$$
If $v_t$ is an uncorrelated sequence, such as a vector martingale difference sequence (e.g. vector white noise) or a serially independent series, then $E[v_t v_s'] = 0$ for $t \ne s$, and in this case
$$\hat B_T = \frac{1}{T}\sum_{t=1}^{T} v_t v_t'.$$

In incompletely specified dynamic models, the orthogonality of the terms cannot be assumed. We cannot use the direct sample counterpart: simply removing the expected value gives
$$\frac{1}{T}\sum_{t=1}^{T}\sum_{s=1}^{T} v_t v_s'$$
which is of rank 1 for any $T$ and does not converge in probability. Finiteness of $B$ implies that each of the sequences $\{E[v_t v_{t-m}'],\, m = 1, 2, 3, \ldots\}$ is summable: many of the $E[v_t v_{t-j}' + v_{t-j} v_t']$ are zero or close to zero.

Newey and West (1987):
$$\hat B_T = \frac{1}{T}\left[\sum_{t=1}^{T} v_t v_t' + \sum_{j=1}^{T-1} w_{Tj} \sum_{t=j+1}^{T} \left(v_t v_{t-j}' + v_{t-j} v_t'\right)\right]$$
where $w_{Tj} = k(j/m_T)$ for some function $k(x)$ that is suitably close to 0 for large values of $x$. The bandwidth $m_T$ is a function increasing to infinity with $T$, although more slowly. Setting
$$k(x) = \begin{cases} 1 & |x| \le 1 \\ 0 & \text{otherwise} \end{cases}$$
omits terms involving lags larger than $m_T$.
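
A minimal sketch (not the notes' code) of the Newey-West estimator $\hat B_T$ using the Bartlett weights $w_{Tj} = k(j/m_T)$ with $k(x) = 1 - |x|$; the bandwidth `m_T` and the AR(1) test series are illustrative assumptions.

```python
import numpy as np

def newey_west(v, m_T):
    """v: (T, q) array of (mean-zero) v_t vectors; returns the (q, q) estimate of B."""
    v = np.asarray(v, dtype=float)
    T = v.shape[0]
    B = v.T @ v / T                          # lag-0 term (1/T) sum_t v_t v_t'
    for j in range(1, m_T + 1):
        w = max(0.0, 1.0 - j / m_T)          # Bartlett taper, zero at j = m_T
        Gamma_j = v[j:].T @ v[:-j] / T       # (1/T) sum_{t>j} v_t v_{t-j}'
        B += w * (Gamma_j + Gamma_j.T)
    return B

# usage sketch: an AR(1) series whose long-run variance is 1/(1 - rho)^2 = 6.25
rng = np.random.default_rng(4)
T, rho = 20000, 0.6
v = np.empty(T)
v[0] = rng.normal()
for t in range(1, T):
    v[t] = rho * v[t - 1] + rng.normal()
print(newey_west(v[:, None], m_T=50))        # should be roughly 6.25
```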

$k(\cdot)$ is a kernel function.

Bartlett:
$$k(x) = \begin{cases} 1 - |x| & |x| \le 1 \\ 0 & \text{otherwise} \end{cases}$$

Parzen:
$$k(x) = \begin{cases} 1 - 6x^2 + 6|x|^3 & |x| \le \tfrac{1}{2} \\ 2(1 - |x|)^3 & \tfrac{1}{2} \le |x| \le 1 \\ 0 & \text{otherwise} \end{cases}$$

The role of the kernel is to taper the contribution of the longer lags smoothly to zero, rather than cutting it off abruptly. The kernel estimators are consistent estimators of $B$, but with some kernels (such as the truncated one) the estimate may not be positive semidefinite in finite samples.
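
A minimal sketch (illustrative) of the Bartlett and Parzen kernels used to weight the lag terms in the long-run covariance estimator; the bandwidth value is an arbitrary example.

```python
import numpy as np

def bartlett(x):
    x = abs(x)
    return 1.0 - x if x <= 1.0 else 0.0

def parzen(x):
    x = abs(x)
    if x <= 0.5:
        return 1.0 - 6.0 * x**2 + 6.0 * x**3
    elif x <= 1.0:
        return 2.0 * (1.0 - x)**3
    return 0.0

# weights w_Tj = k(j / m_T) taper smoothly to zero as the lag j approaches m_T
m_T = 5
print([round(bartlett(j / m_T), 3) for j in range(m_T + 1)])
print([round(parzen(j / m_T), 3) for j in range(m_T + 1)])
```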