University of Pavia

M Estimators

Eduardo Rossi
Criterion Function

A basic unifying notion is that most econometric estimators are defined as the minimizers of certain functions constructed from the sample data, called criterion functions, $C_T$.

Parameter space: $\Theta \subseteq \mathbb{R}^p$. The true value of the parameter: $\theta_0 \in \Theta$.

An estimator of $\theta_0$ based on a data set of size $T$ is given by
$$\hat{\theta}_T = \arg\min_{\theta \in \Theta} C_T(\theta) \tag{1}$$

For instance, the OLS estimator for the MLR model is an M-estimator with $C_T(\theta) = \sum_t (y_t - x_t'\theta)^2$.

Eduardo Rossi - Econometrics 10
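The definition in (1) can be illustrated numerically. The sketch below is not from the slides: the data are simulated, and `scipy.optimize.minimize` is just one of many possible minimizers. It treats OLS as an M-estimator by minimizing the sum-of-squares criterion directly, then compares the result with the closed-form OLS solution.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
T, p = 200, 3
X = rng.normal(size=(T, p))
theta0 = np.array([1.0, -2.0, 0.5])      # "true" parameter of the simulated model
y = X @ theta0 + rng.normal(size=T)

def C_T(theta):
    """Criterion function: sum of squared residuals."""
    resid = y - X @ theta
    return resid @ resid

# M-estimator: minimize the criterion over the parameter space
res = minimize(C_T, x0=np.zeros(p), method="BFGS")
theta_hat = res.x

# Closed-form OLS solution for comparison
theta_ols = np.linalg.solve(X.T @ X, X.T @ y)
assert np.allclose(theta_hat, theta_ols, atol=1e-4)
```

The same recipe applies to any smooth criterion: only the body of `C_T` changes.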
Criterion Function

The analysis is centered on the properties of $C_T$ and its derivatives:
$$g_T(\theta) = \frac{\partial C_T}{\partial \theta} = \begin{pmatrix} \partial C_T/\partial\theta_1 \\ \vdots \\ \partial C_T/\partial\theta_p \end{pmatrix} \quad (p \times 1) \tag{2}$$
$$Q_T(\theta) = \frac{\partial^2 C_T}{\partial \theta \, \partial \theta'} = \begin{pmatrix} \partial^2 C_T/\partial\theta_1^2 & \cdots & \partial^2 C_T/\partial\theta_1\,\partial\theta_p \\ \vdots & \ddots & \vdots \\ \partial^2 C_T/\partial\theta_p\,\partial\theta_1 & \cdots & \partial^2 C_T/\partial\theta_p^2 \end{pmatrix} \quad (p \times p) \tag{3}$$
Criterion Function

$$g_T(\theta) = \frac{\partial C_T}{\partial \theta} \tag{4}$$
$$Q_T(\theta) = \frac{\partial^2 C_T}{\partial \theta \, \partial \theta'} \tag{5}$$

All these quantities are random variables and functions on the parameter space. The criterion function can be thought of as having the generic form
$$C_T(\theta) = \phi\left( \frac{1}{T} \sum_{t=1}^T \psi_t(\theta) \right)$$
where $\phi(\cdot)$ is a continuous function of $r$ variables and $\psi_t(\theta)$ is an $r$-vector of continuous functions. The factor $1/T$ ensures that $C_T$ is bounded in the limit as $T \to \infty$.
Criterion Function

- OLS
- GLS
- Maximum Likelihood Estimator: $\psi_t$ is scalar and corresponds to the negative of the log-probability or log-probability density associated with each observation.
- Method of Moments Estimator: solution to sets of equations relating data moments and unknown parameters, the moments in question being replaced by sample averages to obtain the estimates.
- GMM
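To make the MLE case concrete, here is a minimal sketch of my own (not from the slides): $\phi$ is the identity and $\psi_t(\theta)$ is the negative Gaussian log-density, so the criterion is the average negative log-likelihood. Its minimizer reproduces the familiar closed-form ML estimates for a Gaussian sample.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
T = 500
y = rng.normal(loc=2.0, scale=1.5, size=T)

def C_T(theta):
    """Average negative Gaussian log-likelihood: psi_t(theta) is the
    negative log-density of observation t; phi is the identity."""
    mu, log_sigma = theta          # log-parametrization keeps sigma > 0
    sigma2 = np.exp(2 * log_sigma)
    return 0.5 * np.mean(np.log(2 * np.pi * sigma2) + (y - mu) ** 2 / sigma2)

res = minimize(C_T, x0=np.array([0.0, 0.0]), method="BFGS")
mu_hat, sigma_hat = res.x[0], np.exp(res.x[1])

# The ML estimates coincide with the sample mean and the (ddof=0) sample sd
assert np.isclose(mu_hat, y.mean(), atol=1e-3)
assert np.isclose(sigma_hat, y.std(), atol=1e-3)
```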
Asymptotic properties

Under regularity conditions, to be established in particular cases, M-estimators satisfy the properties usually demanded of econometric estimators.

Let $(\Omega, \mathcal{F})$ be a measurable space, and $\Theta$ a compact (closed and bounded) subset of $\mathbb{R}^p$. Let
$$C : \Omega \times \Theta \to \mathbb{R} \tag{6}$$
be a function such that $C(\cdot, \theta)$ is $\mathcal{F}/\mathcal{B}$-measurable for every $\theta \in \Theta$, and $C(\omega, \cdot)$ is continuous on $\Theta$ for every $\omega \in F$, for some $F \in \mathcal{F}$. Then there exists an $\mathcal{F}/\mathcal{B}^p$-measurable function $\hat\theta : \Omega \to \Theta$ such that for all $\omega \in F$,
$$C(\omega, \hat\theta(\omega)) = \inf_{\theta \in \Theta} C(\omega, \theta) \tag{7}$$
Asymptotic properties

Random variable: $X : \Omega \to \mathbb{R}$, with $\Omega$ the sample space and $\mathcal{F}$ a $\sigma$-field of events. On the range side we have $(\mathbb{R}, \mathcal{B}, \mu)$, where $\mathcal{B}$ denotes the Borel field of $\mathbb{R}$ and $\mu$ is a probability measure on $\mathbb{R}$. If for every $B \in \mathcal{B}$ the preimage $X^{-1}(B)$ belongs to $\mathcal{F}$, the function $X(\cdot)$ is said to be $\mathcal{F}/\mathcal{B}$-measurable. A r.v. is a measurable mapping from a probability space to the real line.

Measurability of $C(\cdot, \theta)$ means that $C$ can be treated as a r.v. when the data are random variables, having a well defined distribution.
Asymptotic properties

The important conditions are:

1. Continuity of $C(\omega, \cdot)$ as a function of $\theta$ for given data
2. Compactness of $\Theta$
Asymptotic properties

Compactness of $\Theta$. Boundedness is an innocuous requirement: arbitrarily large parameter values are unreasonable. Closedness matters because the minimum of a function on a set may not exist as a member of the set unless the set is closed, so we cannot exclude points that are closure points. In any case, the asymptotic analysis is only concerned with the behavior of the estimator on a small open neighbourhood of $\theta_0$.
Consistency

To analyze the consistency problem we need some new convergence concepts. Let $\{X_t(\theta), t = 1, 2, \ldots\}$ be a random sequence whose members are functions of parameters $\theta$, on a domain $\Theta$.

Definition. If $X_t(\theta) \xrightarrow{p} X(\theta)$ for every $\theta \in \Theta$, $X_t$ is said to converge in probability to $X$ pointwise on $\Theta$.

Definition. If $\sup_{\theta \in \Theta} |X_t(\theta) - X(\theta)| \xrightarrow{p} 0$, $X_t$ is said to converge in probability to $X$ uniformly on $\Theta$.

In general, pointwise convergence does not imply uniform convergence.
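A simple deterministic illustration of the gap between the two notions (my own example, not in the slides; for non-random sequences the probability limit is just the ordinary limit): $X_t(\theta) = \theta^t$ on $\Theta = [0,1]$ converges pointwise but not uniformly, because the sup-deviation does not shrink as $t$ grows.

```python
import numpy as np

# X_t(theta) = theta**t on Theta = [0, 1]; the pointwise limit is
# X(theta) = 0 for theta < 1 and X(1) = 1.
theta_grid = np.linspace(0.0, 1.0, 10001)
X_lim = np.where(theta_grid < 1.0, 0.0, 1.0)

# Pointwise: at any fixed theta < 1 the sequence vanishes
assert 0.5 ** 1000 < 1e-100

# Uniform convergence fails: the sup-deviation over the grid stays
# near 1 however large t is (points just below theta = 1 lag behind)
for t in [10, 100, 1000]:
    sup_dev = np.max(np.abs(theta_grid ** t - X_lim))
    print(t, float(sup_dev))
assert sup_dev > 0.9
```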
Consistency

Theorem. If

1. $\Theta$ is compact;
2. $C_T(\theta) \xrightarrow{p} C(\theta)$ (a non-stochastic function of $\theta$) uniformly in $\Theta$;
3. $\theta_0 \in \operatorname{int}(\Theta)$ is the unique minimum of $C(\theta)$;

then $\hat\theta_T \xrightarrow{p} \theta_0$.

Condition 2 can also be stated in the form
$$\sup_{\theta \in \Theta} |C_T(\theta) - C(\theta)| \xrightarrow{p} 0 \tag{8}$$
Consistency

Assumption. $C_T$ is twice differentiable with respect to $\theta$, at all interior points of $\Theta$, with probability 1. The expectations $E(C_T)$, $E(g_T)$, and $E(Q_T)$ exist, with
$$E(C_T) \to C, \qquad E(g_T) \to g, \qquad E(Q_T) \to Q$$
uniformly in $\Theta$.

Theorem. Given the above assumptions and $\theta_0 \in \operatorname{int}(\Theta)$, if $\theta_0$ is the unique minimum of $C$ then $g(\theta_0) = 0$ and $Q(\theta_0)$ is positive definite.

Proof. If $\theta$ is an interior point of $\Theta$,
$$\frac{\partial}{\partial \theta} E[C_T(\theta)] = E[g_T(\theta)] \tag{9}$$
$$\frac{\partial^2}{\partial \theta \, \partial \theta'} E[C_T(\theta)] = E[Q_T(\theta)] \tag{10}$$
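As a concrete check of these conditions, one can work out the OLS case; the display below is my own illustration (using the $1/T$-normalized criterion and the standard linear-model moment conditions), not part of the slides.

```latex
C_T(\theta) = \frac{1}{T}\sum_{t=1}^{T}(y_t - x_t'\theta)^2, \qquad
g_T(\theta) = -\frac{2}{T}\sum_{t=1}^{T} x_t (y_t - x_t'\theta), \qquad
Q_T(\theta) = \frac{2}{T}\sum_{t=1}^{T} x_t x_t'.

\text{With } y_t = x_t'\theta_0 + u_t \text{ and } E[x_t u_t] = 0:\quad
E[g_T(\theta_0)] = 0, \qquad
E[Q_T(\theta_0)] = \frac{2}{T}\sum_{t=1}^{T} E[x_t x_t'] \succ 0,
```

the latter holding whenever the regressors are not collinear, so both conclusions of the theorem are visible directly.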
Consistency

In fact, by the following theorem.

Theorem. If

- for fixed real $\theta$, $f(\theta)$ is an integrable random variable;
- for almost all $\omega$ in the sample space, $f(\theta, \omega)$ is a function of $\theta$, differentiable at the point $\theta_0$, with
$$\left| \frac{f(\theta_0 + h, \omega) - f(\theta_0, \omega)}{h} \right| \le d(\omega) \tag{11}$$
for all $h$ in an open neighborhood of 0 not depending on $\omega$, where $d$ is an integrable random variable;
Consistency

then
$$E\left( \left. \frac{df}{d\theta} \right|_{\theta = \theta_0} \right) = \left. \frac{dE(f)}{d\theta} \right|_{\theta = \theta_0}. \tag{12}$$
This is called differentiation under the integral sign.

The conditions $E(g_T(\theta_0)) = 0$ and $E(Q_T(\theta_0))$ positive definite are necessary and sufficient for identification (a unique local minimum of $E(C_T)$). The result follows letting $T$ tend to $\infty$.
Consistency

Uniform convergence of $C_T$ depends essentially on the vectors $\psi_t(\theta)$: if their average converges uniformly, this property carries over to $C_T$ by Slutsky's Theorem, given that $\phi(\cdot)$ is continuous. A sufficient condition on the sequences $\{\psi_{it}(\theta), t = 1, 2, \ldots\}$ for $i = 1, \ldots, r$ is stochastic uniform equicontinuity: $\psi_{it}(\theta)$ is uniformly continuous, not merely for every finite $t$, but also in the limit.
Identification

Condition 3 is called the identification condition, ensuring that the problem is properly specified. If condition 3 is satisfied for a given $\theta_0$, this defines the probability limit of the estimator by construction. The basic requirement for a consistent estimator is that the true structure can be distinguished from alternative possibilities, given sufficient data. This requirement fails when there is observational equivalence.
Identification

Definition. Structures $\theta_1$ and $\theta_2$ are said to be observationally equivalent with respect to $C_T$ if for every $\varepsilon > 0$ there exists $T_\varepsilon \ge 1$ such that for all $T \ge T_\varepsilon$
$$P(|C_T(\theta_1) - C_T(\theta_2)| < \varepsilon) > 1 - \varepsilon \tag{13}$$
The criterion changes by only an arbitrarily small amount between points $\theta_1$ and $\theta_2$ of $\Theta$, with probability close to 1. In almost every sample of size exceeding $T_\varepsilon$, it is not possible to distinguish between the two structures using $C_T$.
Identification

Definition. The true structure $\theta_0$ is said to be globally (locally) identified by $C_T$ if no other point in $\Theta$ (in an open neighborhood of $\theta_0$) is observationally equivalent to $\theta_0$ with respect to $C_T$.

Identification by $C_T$ is equivalent to condition 3 in the following sense:

Theorem. Let $C_T(\theta) \xrightarrow{p} C(\theta)$ uniformly in $\Theta$ and let $\theta_0$ be a global (local) minimum of $C(\theta)$. The minimum is unique (unique in an open neighborhood of $\theta_0$) if and only if $\theta_0$ is asymptotically globally (locally) identified by $C_T$.
Identification

Local identification and global identification coincide when $C_T$ is quadratic in $\theta$, so that any minimum is known to be unique.
Asymptotic Normality

The form of the criterion function, together with assumptions on the data series, must be such as to lead to the result
$$\sqrt{T}\, g_T(\theta_0) \xrightarrow{L} N(0, A_0), \qquad A_0 = \lim_{T \to \infty} T\, E[g_T(\theta_0)\, g_T(\theta_0)']$$
while, under the assumption of consistency,
$$g_T(\theta_0) \xrightarrow{p} 0 \tag{14}$$
To obtain this CLT the vector in question must have the following generic form
$$g_T(\theta_0) = J_T \frac{1}{T} \sum_{t=1}^T v_t \tag{15}$$
Asymptotic Normality

where $J_T$, $(p \times q)$, is a random matrix with $J_T \xrightarrow{p} J$, $J$ a finite limit, and $E[v_t] = 0$, $v_t$ being $(q \times 1)$.

The essential step is a quadratic approximation argument based on an application of the mean value theorem.

Theorem. If $\sqrt{T}\, g_T(\theta_0) \xrightarrow{L} N(0, A_0)$ and $Q_T(\theta) \xrightarrow{p} Q(\theta)$
Asymptotic Normality

uniformly in a neighborhood of $\theta_0$, then
$$\sqrt{T}(\hat\theta_T - \theta_0) \overset{asy}{\sim} -Q_0^{-1} \sqrt{T}\, g_T(\theta_0) \xrightarrow{L} N(0, V_0)$$
where $V_0 = Q_0^{-1} A_0 Q_0^{-1}$.
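The sandwich form $V_0 = Q_0^{-1} A_0 Q_0^{-1}$ is what heteroskedasticity-robust (White) standard errors compute in the OLS case. The sketch below is my own illustration on simulated data (the constant factors of 2 in $g_T$ and 2 in $Q_T$ cancel inside the sandwich and are dropped): it builds the sample analogues of $Q_0$ and $A_0$ and combines them.

```python
import numpy as np

rng = np.random.default_rng(2)
T = 5000
X = np.column_stack([np.ones(T), rng.normal(size=T)])
theta0 = np.array([1.0, 2.0])
# Heteroskedastic errors: variance depends on the regressor
u = rng.normal(size=T) * np.sqrt(0.5 + X[:, 1] ** 2)
y = X @ theta0 + u

theta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ theta_hat

Q_hat = X.T @ X / T                          # sample analogue of Q_0
A_hat = (X * resid[:, None] ** 2).T @ X / T  # sample analogue of A_0
Q_inv = np.linalg.inv(Q_hat)
V_hat = Q_inv @ A_hat @ Q_inv                # sandwich: Avar of sqrt(T)(theta_hat - theta_0)

robust_se = np.sqrt(np.diag(V_hat) / T)      # robust standard errors of theta_hat
```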
Asymptotic Normality

To prove the theorem we need the following

Lemma. If $Y_T(\theta)$ is a stochastic function of $\theta$, $Y_T(\theta) \xrightarrow{p} Y(\theta)$ uniformly in an open neighbourhood of $\theta_0$, and $\hat\theta_T \xrightarrow{p} \theta_0$, then
$$Y_T(\hat\theta_T) \xrightarrow{p} Y(\theta_0)$$
This can be thought of as an extension of Slutsky's Theorem: in this case there is a sequence of stochastic functions (i.e., functions depending on random variables other than $\hat\theta_T$) converging to a limit.
Asymptotic Normality

Slutsky's Theorem. Let $\{X_T\}$ be a random sequence converging in probability to a constant $a$, and let $g(\cdot)$ be a function continuous at $a$. Then $g(X_T) \xrightarrow{p} g(a)$.

Cramér's Theorem. Let $\{Y_T\}$ and $\{X_T\}$ be random sequences, with $Y_T \xrightarrow{d} Y$ and $X_T \xrightarrow{p} a$ (a constant). Then

1. $X_T + Y_T \xrightarrow{d} a + Y$
2. $X_T Y_T \xrightarrow{d} aY$
3. $Y_T / X_T \xrightarrow{d} Y/a$ when $a \neq 0$.

Cramér-Wold device. If $\{X_T\}$ is a sequence of random vectors, $X_T \xrightarrow{D} X$ if and only if for all conformable fixed vectors $\lambda$, $\lambda' X_T \xrightarrow{D} \lambda' X$.
Asymptotic Normality

Proof. Letting $g_{Ti}$ denote the $i$-th element of $g_T$, with $g_{Ti}(\hat\theta_T) = 0$ by definition of $\hat\theta_T$, the mean value theorem yields
$$g_{Ti}(\theta_0) + \left( \left. \frac{\partial g_{Ti}}{\partial \theta'} \right|_{\theta = \theta^*_{Ti}} \right) (\hat\theta_T - \theta_0) = 0, \qquad i = 1, \ldots, p$$
where the vector $\theta^*_{Ti}$ is
$$\theta^*_{Ti} = \theta_0 + \lambda_i (\hat\theta_T - \theta_0) \quad \text{for some } 0 \le \lambda_i \le 1$$
Stacking these equations up to form a column vector,
$$g_T(\theta_0) + Q^*_T (\hat\theta_T - \theta_0) = 0 \quad (p \times 1)$$
where $Q^*_T$ denotes $Q_T$ with row $i$ evaluated at the point $\theta^*_{Ti}$, for each $i$.
Asymptotic Normality

After multiplying by $\sqrt{T}$,
$$\sqrt{T} \left[ g_T(\theta_0) + Q^*_T (\hat\theta_T - \theta_0) \right] = 0$$
$$\sqrt{T}(\hat\theta_T - \theta_0) = -(Q^*_T)^{-1} \sqrt{T}\, g_T(\theta_0)$$
Since $\hat\theta_T \xrightarrow{p} \theta_0$, also $\theta^*_{Ti} \xrightarrow{p} \theta_0$ for each $i$. It follows from $Q_T(\theta) \xrightarrow{p} Q(\theta)$ and the Lemma that
$$Q^*_T \xrightarrow{p} Q(\theta_0)$$
Finally, from Cramér's Theorem it follows that
$$-(Q^*_T)^{-1} \sqrt{T}\, g_T(\theta_0) \xrightarrow{d} -Q(\theta_0)^{-1} W$$
Asymptotic Normality

where $W \sim N(0, A_0)$; then
$$\sqrt{T}(\hat\theta_T - \theta_0) \xrightarrow{d} N(0, Q(\theta_0)^{-1} A_0 Q(\theta_0)^{-1})$$
noting that $Q(\theta_0)$ is symmetric.
Estimation of Covariance Matrices

Constructing confidence intervals and testing hypotheses requires replacing $A_0$ and $Q_0$ with consistent estimates. The estimator of $Q_0$ is $Q_T(\hat\theta)$. For
$$A_0 = \lim_{T \to \infty} T\, E[g_T(\theta_0)\, g_T(\theta_0)']$$
assume
$$\frac{1}{\sqrt{T}} \sum_{t=1}^T v_t \xrightarrow{d} N(0, B_0)$$
Then
$$\sqrt{T}\, g_T(\theta_0) \xrightarrow{d} N(0, J B_0 J') \quad \text{and} \quad A_0 = J B_0 J'$$
Estimation of Covariance Matrices

In practice $v_t(\theta)$ must be replaced by $v_t(\hat\theta)$. Now
$$\bar{B} = \frac{1}{T} \sum_{t=1}^T \sum_{s=1}^T E[v_t v_s'] = \frac{1}{T} \sum_{t=1}^T E[v_t v_t'] + \frac{1}{T} \sum_{t=2}^T E[v_t v_{t-1}' + v_{t-1} v_t'] + \frac{1}{T} \sum_{t=3}^T E[v_t v_{t-2}' + v_{t-2} v_t'] + \cdots + \frac{1}{T}\, E[v_T v_1' + v_1 v_T']$$
If $v_t$ is an uncorrelated sequence, such as a vector martingale difference sequence (e.g. vector white noise) or a serially independent series, then $E[v_t v_s'] = 0$ for $t \neq s$, and in this case
$$\hat{B}_T = \frac{1}{T} \sum_{t=1}^T v_t v_t'$$
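A quick numerical check of the uncorrelated case (my own sketch, with simulated i.i.d. scores): the outer-product average recovers the contemporaneous covariance, which here is the whole long-run variance.

```python
import numpy as np

rng = np.random.default_rng(3)
T, q = 20000, 2
# v_t: serially uncorrelated (here i.i.d.) mean-zero scores with covariance Sigma
Sigma = np.array([[1.0, 0.3],
                  [0.3, 2.0]])
L = np.linalg.cholesky(Sigma)
v = rng.normal(size=(T, q)) @ L.T

# With E[v_t v_s'] = 0 for t != s, the long-run variance reduces to the
# contemporaneous one, estimated by the outer-product average
B_hat = v.T @ v / T
assert np.allclose(B_hat, Sigma, atol=0.1)
```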
Estimation of Covariance Matrices

In incompletely specified dynamic models, the orthogonality of the terms cannot be assumed. We cannot use the direct sample counterpart of $\bar{B}$: simply removing the expected value gives
$$\frac{1}{T} \sum_{t=1}^T \sum_{s=1}^T v_t v_s'$$
which is of rank 1 for any $T$, and does not converge in probability. However, $\bar{B} < \infty$ implies that each of the sequences $\{E[v_t v_{t-m}'], m = 1, 2, 3, \ldots\}$ is summable, so many of the terms $E[v_t v_{t-j}' + v_{t-j} v_t']$ are zero or close to zero.
Estimation of Covariance Matrices

Newey and West (1987):
$$\hat{B}_T = \frac{1}{T} \left[ \sum_{t=1}^T v_t v_t' + \sum_{j=1}^{T-1} w_{Tj} \sum_{t=j+1}^T \left( v_t v_{t-j}' + v_{t-j} v_t' \right) \right]$$
where $w_{Tj} = k(j/m_T)$ for some function $k(x)$ that is suitably close to 0 for large values of $x$. $m_T$, called the bandwidth, is a function increasing to infinity with $T$, although more slowly. Setting
$$k(x) = \begin{cases} 1 & |x| \le 1 \\ 0 & \text{otherwise} \end{cases}$$
omits terms involving lags larger than $m_T$.
Estimation of Covariance Matrices

$k(\cdot)$ is a kernel function.

Bartlett:
$$k(x) = \begin{cases} 1 - |x| & |x| \le 1 \\ 0 & \text{otherwise} \end{cases}$$

Parzen:
$$k(x) = \begin{cases} 1 - 6x^2 + 6|x|^3 & |x| \le \frac{1}{2} \\ 2(1 - |x|)^3 & \frac{1}{2} \le |x| \le 1 \\ 0 & \text{otherwise} \end{cases}$$

The role of the kernel is to taper the contribution of the longer lags smoothly to zero, rather than cutting it off abruptly. The kernel estimators are consistent estimators of $\bar{B}$, but the estimate may not be positive semidefinite in finite samples.
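The estimator and the two kernels above can be sketched as follows. This is my own implementation for illustration: the AR(1) scores and the bandwidth choice `m_T = 50` are assumptions for the demo, and for an AR(1) with coefficient $\rho$ and unit innovation variance the true long-run variance is $1/(1-\rho)^2$, which the estimate should approach.

```python
import numpy as np

def bartlett(x):
    return np.maximum(1.0 - np.abs(x), 0.0)

def parzen(x):
    ax = np.abs(x)
    k = np.where(ax <= 0.5, 1 - 6 * ax**2 + 6 * ax**3, 2 * (1 - ax)**3)
    return np.where(ax <= 1.0, k, 0.0)

def hac(v, m_T, kernel=bartlett):
    """Newey-West style long-run variance estimator for a (T x q)
    array of mean-zero scores v_t."""
    T, q = v.shape
    B = v.T @ v / T                      # lag-0 term
    for j in range(1, T):
        w = kernel(j / m_T)
        if w == 0.0:                     # kernel support exhausted
            break
        Gamma_j = v[j:].T @ v[:-j] / T   # lag-j autocovariance
        B += w * (Gamma_j + Gamma_j.T)
    return B

# AR(1) scores with rho = 0.5: true long-run variance = 1 / (1 - 0.5)^2 = 4
rng = np.random.default_rng(4)
T, rho = 100000, 0.5
e = rng.normal(size=T)
v = np.empty(T)
v[0] = e[0]
for t in range(1, T):
    v[t] = rho * v[t - 1] + e[t]

B_hat = hac(v.reshape(-1, 1), m_T=50)[0, 0]
print(B_hat)   # should be in the vicinity of 4
```

The truncated kernel of the previous slide is recovered by `kernel=lambda x: float(abs(x) <= 1.0)`, at the cost of the possible loss of positive semidefiniteness noted above.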