Lecture 2: Consistency of M-estimators

Instructor: Department of Economics, Stanford University
Prepared by Wenbo Zhou, Renmin University

References
Takeshi Amemiya, 1985, Advanced Econometrics, Harvard University Press.
Newey and McFadden, 1994, Chapter 36, Volume 4, The Handbook of Econometrics.

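Consistency
Distinction between global and local consistency.
Global condition: If $\Theta$ is compact, $\sup_{\theta \in \Theta} |Q_n(\theta) - Q(\theta)| \xrightarrow{p} 0$, and $Q(\theta) < Q(\theta_0)$ for $\theta \neq \theta_0$, then $\hat\theta \xrightarrow{p} \theta_0$, where $\hat\theta = \arg\max_{\theta \in \Theta} Q_n(\theta)$.
Local condition: If $N$ is a neighborhood around $\theta_0$, $\sup_{\theta \in N} \left| \frac{\partial Q_n(\theta)}{\partial\theta} - \frac{\partial Q(\theta)}{\partial\theta} \right| \xrightarrow{p} 0$, and $Q(\theta) < Q(\theta_0)$ for $\theta \neq \theta_0$ and $\theta \in N$, then $\inf_{\theta \in \hat\Theta} |\theta - \theta_0| \xrightarrow{p} 0$, where $\hat\Theta$ denotes the set of $\theta$ for which $\frac{\partial Q_n(\theta)}{\partial\theta} = 0$.
For the local consistency condition, check 1) $\frac{\partial Q(\theta_0)}{\partial\theta} = 0$ and 2) $\frac{\partial^2 Q(\theta_0)}{\partial\theta\,\partial\theta'}$ negative definite.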

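Consistency for MLE
Let $L(y_1, \ldots, y_n, \theta)$ be the joint density for i.i.d. data $y_1, \ldots, y_n$. Then
$$Q_n(\theta) \equiv \frac{1}{n} \log L(y_1, \ldots, y_n, \theta) = \frac{1}{n} \sum_{t=1}^{n} \log f(y_t, \theta).$$
Change the assumptions to:
1) $\theta_0$ is identified, i.e. $\theta \neq \theta_0 \Rightarrow f(y_t, \theta) \neq f(y_t, \theta_0)$.
2) $E \sup_{\theta \in \Theta} |\log f(y; \theta)| < \infty$.
Identification implies $Q(\theta) < Q(\theta_0)$, since by Jensen's inequality (expectations taken under $f(y; \theta_0)$)
$$E \log \frac{f(y; \theta)}{f(y; \theta_0)} < \log E \frac{f(y; \theta)}{f(y; \theta_0)} = \log \int f(y; \theta)\, dy = \log 1 = 0.$$
Condition 2 is a dominance condition for stochastic equicontinuity.
MLE consistency holds even if the support of the data depends on the parameter.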

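In the general case when $y_t$ is not i.i.d.,
$$E \log \frac{L(y_1, \ldots, y_n; \theta)}{L(y_1, \ldots, y_n; \theta_0)} \le \log E \frac{L(y_1, \ldots, y_n; \theta)}{L(y_1, \ldots, y_n; \theta_0)}$$
still holds, but justifying the strict $<$ is harder.
When the global condition fails or $\Theta$ is not compact, the local condition may still hold.
Example: Mixture of normal distributions, $y_t \sim \lambda N(\mu_1, \sigma_1^2) + (1 - \lambda) N(\mu_2, \sigma_2^2)$, with likelihood
$$L = \prod_{t=1}^{n} \left[ \frac{\lambda}{\sqrt{2\pi}\,\sigma_1} \exp\left( -\frac{(y_t - \mu_1)^2}{2\sigma_1^2} \right) + \frac{1 - \lambda}{\sqrt{2\pi}\,\sigma_2} \exp\left( -\frac{(y_t - \mu_2)^2}{2\sigma_2^2} \right) \right].$$
Set $\mu_1 = y_1$ and let $\sigma_1 \to 0$; then $L$ increases to $\infty$. Hence the global MLE cannot be consistent, but the local MLE is.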
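The unbounded likelihood in the mixture example can be checked numerically. The sketch below is only an illustration: the five data points, the fixed values $\lambda = 0.5$, $\mu_2 = 1$, $\sigma_2 = 1$, and the function name `mixture_loglik` are all assumptions made for the example, not part of the lecture. It centers the first component on $y_1$ and shrinks $\sigma_1$.

```python
import math

def mixture_loglik(ys, lam, mu1, sigma1, mu2, sigma2):
    """Log-likelihood of an i.i.d. sample from
    lam * N(mu1, sigma1^2) + (1 - lam) * N(mu2, sigma2^2)."""
    def normal_pdf(y, mu, sigma):
        return math.exp(-0.5 * ((y - mu) / sigma) ** 2) / (math.sqrt(2 * math.pi) * sigma)
    return sum(
        math.log(lam * normal_pdf(y, mu1, sigma1) + (1 - lam) * normal_pdf(y, mu2, sigma2))
        for y in ys
    )

ys = [-1.3, 0.2, 0.9, 1.7, 2.4]          # hypothetical data
sigmas = [1.0, 1e-1, 1e-2, 1e-4, 1e-8]   # sigma_1 shrinking toward 0

# Set mu_1 = y_1; the other parameters are held at arbitrary illustrative values.
logliks = [mixture_loglik(ys, 0.5, ys[0], s, 1.0, 1.0) for s in sigmas]
for s, ll in zip(sigmas, logliks):
    print(f"sigma_1 = {s:g}: log L = {ll:.3f}")
```

The log-likelihood keeps growing as $\sigma_1$ shrinks, because the first component places an ever taller spike on $y_1$ while the second component keeps every other observation's density bounded away from zero.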

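Consistency for GMM
$Q_n(\theta) = -g_n(\theta)' W g_n(\theta)$, for $g_n(\theta) = \frac{1}{n} \sum_{t=1}^{n} g(z_t, \theta)$, where $W$ is the positive definite weighting matrix.
If $\sup_{\theta \in \Theta} |g_n(\theta) - E g(z_t, \theta)| \xrightarrow{p} 0$, and $E g(z_t, \theta) = 0$ iff $\theta = \theta_0$, then $\hat\theta \equiv \arg\max_\theta Q_n(\theta) \xrightarrow{p} \theta_0$ (equivalently, $\hat\theta$ minimizes $g_n(\theta)' W g_n(\theta)$).
Global identification in nonlinear GMM models is usually difficult to verify and is assumed.
But identification in linear models usually reduces to the condition that the var-cov matrix of the regressors is full rank, i.e. $E x_t x_t'$ full rank for i.i.d. models, and $\lim_{n \to \infty} \frac{1}{n} \sum_{t=1}^{n} x_t x_t'$ full rank for fixed regressors.
For least squares, the sample objective $\frac{1}{n} \sum_{t=1}^{n} (y_t - x_t'\beta)^2$ has population counterpart $E (y - x'\beta)^2$. $\beta_0$ is identified iff $E x_t x_t'$ is full rank, since
$$E (y - x'\beta)^2 - E (y - x'\beta_0)^2 = E [x'(\beta - \beta_0)]^2 = (\beta - \beta_0)' E x_t x_t' (\beta - \beta_0) > 0 \quad \text{if } \beta \neq \beta_0.$$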

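Quantile Regression
The conditional $\tau$th quantile of $y_t$ given $x_t$ is a linear regression function $x_t'\beta_0$, i.e.
$$\Pr(y_t \le x_t'\beta_0 \mid x_t) \equiv F_y(x_t'\beta_0 \mid x_t) = \tau.$$
The $\tau = \frac{1}{2}$ quantile is the median.
Population moment condition:
$$E\left[ (\tau - 1(y_t \le x_t'\beta_0))\, x_t \right] = E\left[ (\tau - \Pr(y_t \le x_t'\beta_0 \mid x_t))\, x_t \right] = 0.$$
Sample moment condition:
$$0 \approx \frac{1}{n} \sum_{t=1}^{n} x_t \left( \tau - 1(y_t \le x_t'\hat\beta) \right) = \frac{1}{n} \sum_{t=1}^{n} x_t \left[ \tau\, 1(y_t > x_t'\hat\beta) - (1 - \tau)\, 1(y_t \le x_t'\hat\beta) \right].$$
Integrate the condition back to obtain the convex objective function $Q_n(\beta)$.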

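Objective function for QR:
$$Q_n(\beta) = \frac{1}{n} \sum_{t=1}^{n} \left[ \tau - 1(y_t \le x_t'\beta) \right] (y_t - x_t'\beta) = \frac{1}{n} \sum_{t=1}^{n} \left[ \tau (y_t - x_t'\beta)^+ + (1 - \tau)(y_t - x_t'\beta)^- \right],$$
where $u^+ = \max(u, 0)$ and $u^- = \max(-u, 0)$.
When $\tau = \frac{1}{2}$, $Q_n(\beta)$ is proportional to $\frac{1}{n} \sum_{t=1}^{n} |y_t - x_t'\beta|$, which gives the Least Absolute Deviation (LAD) regression; it looks for the conditional median.
Also, $E x_t x_t'$ being full rank implies global consistency for the linear quantile regression model.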
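As a small check of the QR objective, the sketch below evaluates it for an intercept-only model ($x_t = 1$) and minimizes it over a grid. The data values, the grid, and the names `check_loss` and `qr_objective` are assumptions chosen for illustration; a grid search is used only because it is easy to read, not because it is how quantile regression is computed in practice.

```python
import statistics

def check_loss(u, tau):
    """Check function (tau - 1[u <= 0]) * u, i.e. tau * u^+ + (1 - tau) * u^-."""
    return (tau - (1.0 if u <= 0 else 0.0)) * u

def qr_objective(ys, q, tau):
    """Sample objective Q_n for an intercept-only model, x_t = 1."""
    return sum(check_loss(y - q, tau) for y in ys) / len(ys)

ys = [2.0, -1.0, 0.5, 7.0, 3.0, 0.0, 1.5]    # hypothetical data
grid = [i / 100 for i in range(-200, 801)]   # candidate intercepts on [-2, 8]

# Minimize the (convex) objective over the grid for two quantile levels.
q_median = min(grid, key=lambda q: qr_objective(ys, q, 0.5))
q_upper = min(grid, key=lambda q: qr_objective(ys, q, 0.75))

print(q_median, statistics.median(ys), q_upper)
```

With $\tau = \frac{1}{2}$ the grid minimizer coincides with the sample median, which is the LAD property stated above; with $\tau = 0.75$ it moves to an upper order statistic of the sample.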

$Q_n(\beta)$ for QR has two features:
1) $Q_n(\beta)$ is convex, so pointwise convergence is sufficient for uniform convergence over compact $\Theta$, and the parameter space does not have to be compact.
2) No moment conditions are needed for $y_t$ to obtain pointwise convergence. This is done by subtracting $Q_n(\beta_0)$ and showing $Q_n(\beta) - Q_n(\beta_0) \xrightarrow{p} Q(\beta) - Q(\beta_0)$: by the triangle inequality, each summand of $Q_n(\beta) - Q_n(\beta_0)$ is bounded in absolute value by $|x_t'(\beta - \beta_0)|$, which does not involve $y_t$.
Concavity and noncompact parameter set: when $Q_n(\theta)$ is concave (for maximization, or convex for minimization), pointwise convergence $\Rightarrow$ uniform convergence, and a local maximum of $Q(\theta)$ $\Rightarrow$ global consistency.

Uniform Convergence (in probability)
Definition: $\hat Q(\theta)$ converges in probability to $Q(\theta)$ uniformly over the compact set $\theta \in \Theta$ if
$$\forall \epsilon > 0, \quad \lim_{T \to \infty} P\left( \sup_{\theta \in \Theta} |\hat Q(\theta) - Q(\theta)| > \epsilon \right) = 0.$$
Consistency of M-Estimators: If $Q_T(\theta)$ converges in probability to $Q(\theta)$ uniformly, $Q(\theta)$ is continuous and uniquely maximized at $\theta_0$, and $\hat\theta = \arg\max Q_T(\theta)$ over the compact parameter set $\Theta$, plus continuity and measurability of $Q_T(\theta)$, then $\hat\theta \xrightarrow{p} \theta_0$.
Consistency of estimated var-cov matrix: Note that it is sufficient for uniform convergence to hold over a shrinking neighborhood of $\theta_0$.

Conditions for Uniform Convergence: Equicontinuity
First think about a sequence of deterministic functions $f_n(\theta)$.
Uniform equicontinuity for $f_n(\theta)$:
$$\lim_{\delta \to 0} \sup_{n} \sup_{|\theta' - \theta| < \delta} |f_n(\theta') - f_n(\theta)| = 0.$$
What if $f_n(\theta)$ may be discontinuous but the size of the jump goes to 0?
Asymptotic uniform equicontinuity for $f_n(\theta)$:
$$\lim_{\delta \to 0} \limsup_{n \to \infty} \sup_{|\theta' - \theta| < \delta} |f_n(\theta') - f_n(\theta)| = 0.$$
Uniform convergence of $f_n(\theta)$: for $\Theta$ compact, $\sup_{\theta \in \Theta} |f_n(\theta)| \to 0$ if and only if $f_n(\theta) \to 0$ for each $\theta$ and $f_n$ is asymptotically uniformly equicontinuous.
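A deterministic example may help show why pointwise convergence alone is not enough. The two functions below are illustrative choices, not taken from the lecture: `bump` is $f_n(\theta) = n\theta e^{-n\theta}$ on $[0, 1]$, which goes to 0 at every fixed $\theta$ but is not asymptotically uniformly equicontinuous, and `flat` is $f_n(\theta) = \theta / n$, which is. The supremum is approximated on a finite grid.

```python
import math

def sup_on_grid(f, n, points=10001):
    """Approximate sup over theta in [0, 1] of |f(n, theta)| on an evenly spaced grid."""
    return max(abs(f(n, i / (points - 1))) for i in range(points))

def bump(n, theta):
    # Converges to 0 at every fixed theta, but peaks at theta = 1/n with height 1/e.
    return n * theta * math.exp(-n * theta)

def flat(n, theta):
    # Lipschitz in theta with constant 1/n, hence uniformly equicontinuous.
    return theta / n

ns = [10, 100, 1000]
sup_bump = [sup_on_grid(bump, n) for n in ns]   # stays near 1/e
sup_flat = [sup_on_grid(flat, n) for n in ns]   # shrinks like 1/n
pointwise_bump = [bump(n, 0.5) for n in ns]     # shrinks at the fixed point theta = 0.5

print(sup_bump, sup_flat, pointwise_bump)
```

For `bump` the supremum does not shrink even though every fixed $\theta$ converges, because the peak moves toward 0 and steepens as $n$ grows; that is the failure of asymptotic equicontinuity.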

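Then the stochastic case $Q_n(\theta)$.
Definition: A sequence of random functions $Q_n(\theta)$ is stochastically uniformly equicontinuous if
$$\forall \epsilon > 0, \quad \lim_{\delta \to 0} \limsup_{n \to \infty} P\left( \sup_{|\theta - \theta'| < \delta} |Q_n(\theta) - Q_n(\theta')| > \epsilon \right) = 0.$$
Uniform convergence in probability: If $Q_n(\theta) \xrightarrow{p} 0$ for each $\theta$, and $Q_n(\theta)$ is stochastically equicontinuous on $\theta \in \Theta$ compact, then $\sup_{\theta \in \Theta} |Q_n(\theta)| \xrightarrow{p} 0$.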

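Lipschitz Condition for Stochastic Equicontinuity
A simple sufficient condition for stochastic equicontinuity, for the case where the objective function is smooth, differentiable, etc.
Lipschitz condition: For $\theta, \theta' \in \Theta$, if $|Q_n(\theta) - Q_n(\theta')| \le B_n\, d(\theta, \theta')$, where $\lim_{\delta \to 0} \sup_{|\theta - \theta'| < \delta} d(\theta, \theta') = 0$ and $B_n = O_p(1)$, then $Q_n(\theta)$ is stochastically equicontinuous.
Example: Suppose $Q_n(\theta) = \frac{1}{n} \sum_{t=1}^{n} f(z_t, \theta)$, $z_t$ i.i.d., and $f(z_t, \theta)$ is differentiable with derivative $f_\theta(z_t, \theta)$. Then by Taylor expansion, for intermediate points $\bar\theta_t \in (\theta, \theta')$,
$$|Q_n(\theta) - Q_n(\theta')| \le \frac{1}{n} \sum_{t=1}^{n} |f_\theta(z_t, \bar\theta_t)|\, |\theta - \theta'|.$$
If $b(z_t) = \sup_{\theta \in \Theta} |f_\theta(z_t, \theta)|$ is such that $E b(z_t) < \infty$, then the Lipschitz condition holds with $B_n = \frac{1}{n} \sum_{t=1}^{n} b(z_t)$.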
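The Lipschitz bound in the example can be verified directly on simulated data. The sketch below assumes a particular $f(z, \theta) = \cos(z\theta)$ with standard normal $z_t$ and $\Theta = [0, 2]$, none of which come from the lecture; for this choice $|f_\theta(z, \theta)| = |z \sin(z\theta)| \le |z|$, so $b(z) = |z|$ and $E b(z) < \infty$.

```python
import math
import random

random.seed(0)
zs = [random.gauss(0.0, 1.0) for _ in range(500)]  # hypothetical i.i.d. sample

def Q_n(theta):
    """Sample average of the illustrative f(z, theta) = cos(z * theta)."""
    return sum(math.cos(z * theta) for z in zs) / len(zs)

# Envelope of the derivative: b(z) = |z|, so B_n is the sample mean of |z_t|.
B_n = sum(abs(z) for z in zs) / len(zs)

# Check |Q_n(a) - Q_n(b)| <= B_n * |a - b| on random pairs in Theta = [0, 2].
pairs = [(random.uniform(0, 2), random.uniform(0, 2)) for _ in range(200)]
violations = [
    (a, b) for a, b in pairs
    if abs(Q_n(a) - Q_n(b)) > B_n * abs(a - b) + 1e-12  # small floating-point slack
]
print(B_n, len(violations))
```

No pair violates the bound, and $B_n$ is a sample mean with finite expectation, so it is $O_p(1)$ by the law of large numbers.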

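Uniform WLLN
But what to do when the Lipschitz condition is not applicable?
Uniform WLLN: If $\Theta$ is compact, $y_t$ is i.i.d., $g(y_t, \theta)$ is continuous in $\theta$ for each $y_t$ a.s., $E g(y_t, \theta) = 0$, and $E \sup_{\theta \in \Theta} |g(y_t, \theta)| < \infty$, then
$$\forall \epsilon > 0, \quad \lim_{n \to \infty} P\left( \sup_{\theta \in \Theta} \left| \frac{1}{n} \sum_{t=1}^{n} g(y_t, \theta) \right| > \epsilon \right) = 0.$$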
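A small simulation can illustrate the statement of the uniform WLLN. The sketch below assumes $g(y, \theta) = \cos(\theta y) - e^{-\theta^2 / 2}$ with $y_t \sim N(0, 1)$ and $\Theta = [0, 2]$; this choice is an illustration only. It satisfies the conditions because $E \cos(\theta y) = e^{-\theta^2 / 2}$ and $|g| \le 2$. The supremum over $\Theta$ is approximated on a 41-point grid.

```python
import math
import random

def sup_deviation(n, rng, grid):
    """sup over the grid of |(1/n) sum_t g(y_t, theta)| for one simulated sample."""
    ys = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return max(
        abs(sum(math.cos(th * y) for y in ys) / n - math.exp(-0.5 * th * th))
        for th in grid
    )

rng = random.Random(12345)
grid = [2.0 * i / 40 for i in range(41)]   # grid approximation to Theta = [0, 2]

sup_small = sup_deviation(100, rng, grid)      # one draw with n = 100
sup_large = sup_deviation(20000, rng, grid)    # one draw with n = 20000
print(sup_small, sup_large)
```

Each call is a single random draw, so the printed values only suggest the pattern: with the larger sample the worst-case deviation over the grid is small, in line with convergence of the supremum to zero in probability.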

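Proof: Use pointwise convergence + stochastic equicontinuity.
1) $E \sup_{\theta \in \Theta} |g(y_t, \theta)| < \infty \Rightarrow E |g(y_t, \theta)| < \infty$ for each $\theta$, so use the SLLN to conclude $\frac{1}{n} \sum_{t=1}^{n} g(y_t, \theta) \xrightarrow{a.s.} 0$ for each $\theta$.
2) Verify stochastic equicontinuity for $\frac{1}{n} \sum_{t=1}^{n} g(y_t, \theta)$:
$$\sup_{|\theta - \theta'| < \delta} \left| \frac{1}{n} \sum_{t=1}^{n} g(y_t, \theta) - \frac{1}{n} \sum_{t=1}^{n} g(y_t, \theta') \right| \le \sup_{|\theta - \theta'| < \delta} \frac{1}{n} \sum_{t=1}^{n} |g(y_t, \theta) - g(y_t, \theta')| \le \frac{1}{n} \sum_{t=1}^{n} \sup_{|\theta - \theta'| < \delta} |g(y_t, \theta) - g(y_t, \theta')|.$$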

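Therefore, by Markov's inequality,
$$\lim_{\delta \to 0} \limsup_{n \to \infty} P\left( \sup_{|\theta - \theta'| < \delta} \left| \frac{1}{n} \sum_{t=1}^{n} g(y_t, \theta) - \frac{1}{n} \sum_{t=1}^{n} g(y_t, \theta') \right| > \epsilon \right)$$
$$\le \lim_{\delta \to 0} \limsup_{n \to \infty} P\left( \frac{1}{n} \sum_{t=1}^{n} \sup_{|\theta - \theta'| < \delta} |g(y_t, \theta) - g(y_t, \theta')| > \epsilon \right)$$
$$\le \lim_{\delta \to 0} \limsup_{n \to \infty} \frac{E \sum_{t=1}^{n} \sup_{|\theta - \theta'| < \delta} |g(y_t, \theta) - g(y_t, \theta')|}{n \epsilon} = \frac{1}{\epsilon} \lim_{\delta \to 0} E \sup_{|\theta - \theta'| < \delta} |g(y_t, \theta) - g(y_t, \theta')|.$$
Finally, use uniform continuity of $g(y_t, \theta)$ (because $\Theta$ is compact) and the dominated convergence theorem (DOM) to conclude that this limit is 0, since $\lim_{\delta \to 0} \sup_{|\theta - \theta'| < \delta} |g(y_t, \theta) - g(y_t, \theta')| = 0$ almost surely, and
$$E \sup_{\delta} \sup_{|\theta - \theta'| < \delta} |g(y_t, \theta) - g(y_t, \theta')| \le E\, 2 \sup_{\theta \in \Theta} |g(y_t, \theta)| < \infty.$$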