The Uniform Weak Law of Large Numbers and the Consistency of M-Estimators of Cross-Section and Time Series Models

Herman J. Bierens
Pennsylvania State University
September 16, 2005

1. The uniform weak law of large numbers

In econometrics we often have to deal with sample means of random functions. A random function is a function that is a random variable for each fixed value of its argument. In cross-section econometrics random functions usually take the form of a function $g(Z,\theta)$ of a random vector $Z$ and a non-random vector $\theta$. For example, consider a Logit model:

$$P[Y_j = y \mid X_j] = \frac{y + (1-y)\exp(-\alpha - \beta^T X_j)}{1 + \exp(-\alpha - \beta^T X_j)}, \qquad y = 0, 1,$$

where $Y_j \in \{0,1\}$ is the dependent variable and $X_j \in \mathbb{R}^k$ is a vector of explanatory variables. Denoting $Z_j = (Y_j, X_j^T)^T$, and given a random sample $\{Z_1, Z_2, \ldots, Z_n\}$, the log-likelihood function involved takes the form $\sum_{j=1}^n g(Z_j,\theta)$, where

$$g(Z_j,\theta) = \ln\big[Y_j + (1-Y_j)\exp(-\alpha - \beta^T X_j)\big] - \ln\big[1 + \exp(-\alpha - \beta^T X_j)\big] = Y_j(\alpha + \beta^T X_j) - \ln\big[1 + \exp(\alpha + \beta^T X_j)\big], \quad \text{where } \theta = (\alpha, \beta^T)^T. \qquad (1)$$

For such functions we can extend the weak law of large numbers for i.i.d. random variables to a Uniform Weak Law of Large Numbers (UWLLN):

Theorem 1: Let $Z_j$, $j = 1,\ldots,n$, be a random sample from a $k$-variate distribution. Let $g(z,\theta)$ be a Borel measurable function on $\mathcal{Z} \times \Theta$, where $\mathcal{Z} \subset \mathbb{R}^k$ is a Borel set such that $P[Z_j \in \mathcal{Z}] = 1$, and $\Theta$ is a compact subset of $\mathbb{R}^m$, such that for each $z \in \mathcal{Z}$, $g(z,\theta)$ is a continuous function on $\Theta$. Furthermore, let

$$E\Big[\sup_{\theta\in\Theta} |g(Z_j,\theta)|\Big] < \infty. \qquad (2)$$

Then $\mathrm{plim}_{n\to\infty} \sup_{\theta\in\Theta} \big|(1/n)\sum_{j=1}^n g(Z_j,\theta) - E[g(Z_1,\theta)]\big| = 0$.

Note that subsets of Euclidean spaces are compact if and only if they are closed and bounded; see, for example, Bierens (2004), Appendix II, Theorem II.2. The original proof of the stronger result $\sup_{\theta\in\Theta} \big|(1/n)\sum_{j=1}^n g(Z_j,\theta) - E[g(Z_1,\theta)]\big| \to 0$ a.s. was given in the seminal paper of Jennrich (1969). This proof is explained in detail in Bierens (2004, Appendix to Chapter 6).

The condition that the random vectors $Z_j$ are i.i.d. can be relaxed, because the result in Theorem 1 also holds for strictly stationary time series processes with a vanishing memory:

Definition 1: A (vector) time series process $X_t \in \mathbb{R}^k$ is strictly stationary if for arbitrary integers $m_1 < m_2 < \cdots < m_n$ the joint distribution of $(X_{t-m_1}^T, \ldots, X_{t-m_n}^T)^T$ does not depend on the time index $t$.

Definition 2: A (vector) time series process $X_t \in \mathbb{R}^k$ has a vanishing memory if all the sets in the remote $\sigma$-algebra $\mathfrak{F}_{-\infty} = \bigcap_t \sigma\big(\{X_{t-j}\}_{j=0}^{\infty}\big)$ have either probability zero or one.

Note that if the $X_t$'s are independent, then by Kolmogorov's zero-one law the time series $X_t$ has a vanishing memory. It has been shown in Bierens (2004, Theorem 7.4) that:

Theorem 2: If $X_t \in \mathbb{R}^k$ is a strictly stationary time series process with vanishing memory, and $E[\|X_t\|] < \infty$, then $\mathrm{plim}_{n\to\infty} (1/n)\sum_{t=1}^n X_t = E[X_1]$.

I will use this result to prove the following more general version of Theorem 1. To be able to generalize the UWLLN to the time series case where the random functions involved depend on

the entire past of the time series rather than on a finite-dimensional vector of variables, I will reformulate and prove Theorem 1 under slightly different moment conditions.

Theorem 3: Let $Z_t \in \mathbb{R}^k$ be a strictly stationary vector time series process with a vanishing memory (which includes the case that the $Z_t$'s are i.i.d.), defined on a common probability space $\{\Omega, \mathfrak{F}, P\}$. Let $g(z,\theta)$ be a Borel measurable real function on $\mathcal{Z} \times \Theta_0$, where $\mathcal{Z} \subset \mathbb{R}^k$ is a Borel set such that $P[Z_t \in \mathcal{Z}] = 1$, and $\Theta_0$ is an open subset of $\mathbb{R}^m$, such that for each $z \in \mathcal{Z}$, $g(z,\theta)$ is a continuous function on $\Theta_0$. Furthermore, let $\Theta$ be a compact subset of $\Theta_0$. Finally, assume that for each $\theta_* \in \Theta$ there exists an arbitrarily small $\delta > 0$, possibly depending on $\theta_*$, such that

$$E\Big[\sup_{\|\theta-\theta_*\|\le\delta} g(Z_1,\theta)\Big] < \infty, \qquad E\Big[\inf_{\|\theta-\theta_*\|\le\delta} g(Z_1,\theta)\Big] > -\infty. \qquad (3)$$

Then $\mathrm{plim}_{n\to\infty} \sup_{\theta\in\Theta} \big|(1/n)\sum_{t=1}^n g(Z_t,\theta) - E[g(Z_1,\theta)]\big| = 0$.

Proof: Observe from condition (3) that for each $\theta \in \Theta$, $E[g(Z_1,\theta)]$ is well-defined. Actually, due to the compactness of $\Theta$, (3) implies (2) [Exercise: Why?], so that the latter is a weaker condition than (3). Moreover, it follows from condition (3), the continuity of $g(z,\theta)$ in $\theta$, and the dominated convergence theorem that

$$\lim_{\delta\downarrow 0} E\Big[\sup_{\|\theta-\theta_*\|\le\delta} g(Z_1,\theta) - \inf_{\|\theta-\theta_*\|\le\delta} g(Z_1,\theta)\Big] = 0, \qquad (4)$$

pointwise in $\theta_* \in \Theta$. Therefore, for an arbitrary $\varepsilon > 0$ and each $\theta_* \in \Theta$ we can choose a positive number $\delta(\theta_*,\varepsilon)$ such that, with

$$N(\theta_*,\varepsilon) = \{\theta \in \Theta_0: \|\theta - \theta_*\| < \delta(\theta_*,\varepsilon)\}, \qquad (5)$$

we have

$$0 \le E\Big[\sup_{\theta \in N(\theta_*,\varepsilon)} g(Z_1,\theta) - \inf_{\theta \in N(\theta_*,\varepsilon)} g(Z_1,\theta)\Big] < \varepsilon. \qquad (6)$$

Next, observe that the sets (5) are open, so that $\bigcup_{\theta_* \in \Theta} N(\theta_*,\varepsilon)$ is an open covering of $\Theta$. Then by the compactness of $\Theta$ there exists a finite sub-covering of $\Theta$:

$$\Theta \subset \bigcup_{i=1}^K N(\theta_i,\varepsilon), \qquad (7)$$

where $K$ and the vectors $\theta_i \in \Theta$ depend on $\varepsilon$. Using the easy inequality $\sup_x |f(x)| \le |\sup_x f(x)| + |\inf_x f(x)|$, it is not hard to verify that for each $\theta_i \in \Theta$,

$$\begin{aligned}
\sup_{\theta \in N(\theta_i,\varepsilon)} \Big|(1/n)\sum_{t=1}^n g(Z_t,\theta) - E[g(Z_1,\theta)]\Big|
&\le 2\,\Big|(1/n)\sum_{t=1}^n \sup_{\theta \in N(\theta_i,\varepsilon)} g(Z_t,\theta) - E\Big[\sup_{\theta \in N(\theta_i,\varepsilon)} g(Z_1,\theta)\Big]\Big| \\
&\quad + 2\,\Big|(1/n)\sum_{t=1}^n \inf_{\theta \in N(\theta_i,\varepsilon)} g(Z_t,\theta) - E\Big[\inf_{\theta \in N(\theta_i,\varepsilon)} g(Z_1,\theta)\Big]\Big| \qquad (8) \\
&\quad + 2\,\Big|E\Big[\sup_{\theta \in N(\theta_i,\varepsilon)} g(Z_1,\theta)\Big] - E\Big[\inf_{\theta \in N(\theta_i,\varepsilon)} g(Z_1,\theta)\Big]\Big|.
\end{aligned}$$

It follows from Theorem 2 that the first two terms on the right-hand side of (8) converge in probability to zero, and from (6) that the last term is less than $2\varepsilon$. Hence,

$$\sup_{\theta\in\Theta} \Big|(1/n)\sum_{t=1}^n g(Z_t,\theta) - E[g(Z_1,\theta)]\Big| \le \max_{1\le i\le K}\, \sup_{\theta \in N(\theta_i,\varepsilon)} \Big|(1/n)\sum_{t=1}^n g(Z_t,\theta) - E[g(Z_1,\theta)]\Big| \le R_n(\varepsilon) + 2\varepsilon, \qquad (9)$$

where $\mathrm{plim}_{n\to\infty} R_n(\varepsilon) = 0$. Theorem 3 now follows straightforwardly from (9). Q.E.D.

In time series econometrics there are quite a few cases where we need a UWLLN for functions $g_t(\theta)$ depending on $U_{t-j}$ for all $j \ge 0$. In that case $g_t(\theta)$ takes a more general form as a random function:

Definition 3: Let $\{\Omega, \mathfrak{F}, P\}$ be the probability space. A random function $f(\theta)$ on a subset $\Theta$ of a Euclidean space is a mapping $f(\omega,\theta): \Omega \times \Theta \to \mathbb{R}$ such that for each Borel set $B$ in $\mathbb{R}$ and each $\theta \in \Theta$, $\{\omega \in \Omega: f(\omega,\theta) \in B\} \in \mathfrak{F}$.
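Before moving on, the UWLLN can be checked numerically. The following sketch (ours, not part of the original notes) takes the Logit log-likelihood $g(Z,\theta)$ of (1) with a single regressor $X \in \{-1,+1\}$, so that $E[g(Z_1,\theta)]$ can be computed exactly, and verifies that the uniform deviation over a grid approximating a compact $\Theta$ is small for large $n$; all design and parameter choices are illustrative.

```python
import numpy as np

# Simulate a Logit model with scalar regressor X in {-1, +1} (illustrative design).
rng = np.random.default_rng(0)
alpha0, beta0, n = 0.5, 1.0, 50_000
x = rng.choice([-1.0, 1.0], size=n)
p = 1.0 / (1.0 + np.exp(-(alpha0 + beta0 * x)))
y = (rng.uniform(size=n) < p).astype(float)

grid = np.linspace(-2.0, 2.0, 41)   # grid over Theta = [-2,2]^2, step 0.1
sup_dev = 0.0
for a in grid:
    for b in grid:
        # Sample mean of g(Z_j, theta), second form in eq. (1)
        eta = a + b * x
        qhat = np.mean(y * eta - np.log1p(np.exp(eta)))
        # Exact E[g(Z_1, theta)]: X = +/-1 with probability 1/2 each
        q = 0.0
        for xv in (-1.0, 1.0):
            p0 = 1.0 / (1.0 + np.exp(-(alpha0 + beta0 * xv)))
            ev = a + b * xv
            q += 0.5 * (p0 * ev - np.log1p(np.exp(ev)))
        sup_dev = max(sup_dev, abs(qhat - q))

# The grid maximum of |Qhat - Q| is small, as the UWLLN asserts.
assert sup_dev < 0.1
```

The grid maximum is only a proxy for the supremum over $\Theta$, but since $g$ is continuous in $\theta$ it gives a faithful picture of the uniform convergence.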

Definition 4: A random function $f(\theta)$ on a subset $\Theta$ of a Euclidean space is almost surely continuous on $\Theta$ if there exists a set $A$ with probability one such that for each $\omega \in A$, $f(\omega,\theta)$ is continuous in $\theta \in \Theta$.

For example, let $Z_t \in \mathbb{R}$ be a stationary Gaussian moving average process of order 1 [alias an MA(1) process]:

$$Z_t = U_t - \alpha_0 U_{t-1}, \qquad |\alpha_0| < 1, \qquad U_t \sim \text{i.i.d. } N(0,\sigma_0^2). \qquad (10)$$

Then backwards substitution of $U_t = \alpha_0 U_{t-1} + Z_t$ yields $U_t = \sum_{j=0}^{\infty} \alpha_0^j Z_{t-j}$, hence

$$Z_t = -\sum_{j=1}^{\infty} \alpha_0^j Z_{t-j} + U_t. \qquad (11)$$

Thus, denoting $\mathfrak{F}_t = \sigma(U_t, U_{t-1}, U_{t-2}, \ldots)$, the distribution of $Z_t$ conditional on $\mathfrak{F}_{t-1}$ is normal with conditional expectation $-\sum_{j=1}^{\infty} \alpha_0^j Z_{t-j}$ and conditional variance $\sigma_0^2$. If the $Z_t$'s were observable for all $t \le n$, a version of the log-likelihood would take the form $\sum_{t=1}^n g_t(\theta)$, where

$$g_t(\theta) = -\frac{1}{2\sigma^2}\Big(\sum_{j=0}^{\infty} \alpha^j Z_{t-j}\Big)^2 - \frac{1}{2}\ln(\sigma^2) - \ln\sqrt{2\pi}, \qquad \theta = (\alpha,\sigma^2)^T, \qquad (12)$$

is a random function. In that case we need to reformulate Theorem 3 as follows.

Theorem 4: Let $\mathfrak{F}_t = \sigma(U_t, U_{t-1}, U_{t-2}, \ldots)$, where $U_t$ is a time series process with vanishing memory. Let $g_t(\theta)$ be a sequence of a.s. continuous random functions on an open subset $\Theta_0$ of a Euclidean space, and let $\Theta$ be a compact subset of $\Theta_0$. If for each $\theta_* \in \Theta$ there exists an arbitrarily small $\delta > 0$ such that

(a) $g_t(\theta_*)$, $\sup_{\|\theta-\theta_*\|\le\delta} g_t(\theta)$ and $\inf_{\|\theta-\theta_*\|\le\delta} g_t(\theta)$ are $\mathfrak{F}_t$-measurable and strictly stationary,

(b) $E\big[\sup_{\|\theta-\theta_*\|\le\delta} g_1(\theta)\big] < \infty$, $E\big[\inf_{\|\theta-\theta_*\|\le\delta} g_1(\theta)\big] > -\infty$,

then $\mathrm{plim}_{n\to\infty} \sup_{\theta\in\Theta} \big|(1/n)\sum_{t=1}^n g_t(\theta) - E[g_1(\theta)]\big| = 0$.
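The MA(1) example can be made concrete. The sketch below (ours, with illustrative parameter values) simulates (10), evaluates a truncated version of the random function $g_t(\theta)$ in (12), and checks two facts: with $U_{-1} := 0$ the backward substitution (11) recovers $U_t$ exactly (the sum telescopes), and the sample mean of $Z_t$ is close to $E[Z_1] = 0$, as Theorem 2 asserts for this strictly stationary process with vanishing memory.

```python
import numpy as np

# Simulate the MA(1) process (10) with U_{-1} := 0 (illustrative values).
rng = np.random.default_rng(7)
alpha0, sigma2_0, n = 0.5, 1.0, 1_000
u = rng.normal(scale=np.sqrt(sigma2_0), size=n)
z = u.copy()
z[1:] -= alpha0 * u[:-1]            # Z_t = U_t - alpha0 * U_{t-1}

def g_t(z, t, alpha, sigma2):
    """Truncated version of the random function g_t(theta) in eq. (12)."""
    u_t = np.sum(alpha ** np.arange(t + 1) * z[t::-1])   # sum_{j=0}^{t} alpha^j Z_{t-j}
    return -u_t**2 / (2.0 * sigma2) - 0.5 * np.log(sigma2) - 0.5 * np.log(2.0 * np.pi)

# With U_{-1} = 0 the backward substitution (11) is exact: the recovered
# u_t(alpha0) equals U_t, and g_t at the true theta is the N(0, sigma2_0)
# log-density evaluated at U_t.
t = 50
u_rec = np.sum(alpha0 ** np.arange(t + 1) * z[t::-1])
assert np.isclose(u_rec, u[t])
val = g_t(z, t, alpha0, sigma2_0)
assert np.isclose(val, -u[t]**2 / 2.0 - 0.5 * np.log(2.0 * np.pi))

# Theorem 2 illustration: the sample mean of Z_t is near E[Z_1] = 0.
assert abs(z.mean()) < 0.15
```

In practice the truncation at the observed past is what makes (12) feasible; the truncation error vanishes geometrically because $|\alpha_0| < 1$.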

2. Consistency of M-estimators

Theorems 3 and 4 are important tools for proving consistency of parameter estimators. A large class of estimators is obtained by maximizing or minimizing an objective function of the form $(1/n)\sum_{t=1}^n g_t(\theta)$, for example maximum likelihood estimators and nonlinear least squares estimators. These estimators are called M-estimators (where the M indicates that the estimator is obtained by Maximizing or Minimizing a Mean of random functions). Suppose that the conditions of Theorem 4 are satisfied, and that the parameter vector of interest is

$$\theta_0 = \operatorname{argmax}_{\theta\in\Theta} E[g_1(\theta)]. \qquad (13)$$

Note that "argmax" is shorthand notation for the argument at which the function involved is maximal. It then seems natural to use

$$\hat\theta = \operatorname{argmax}_{\theta\in\Theta} (1/n)\sum_{t=1}^n g_t(\theta) \qquad (14)$$

as an estimator of $\theta_0$. Indeed, under some mild conditions this estimator is consistent:

Theorem 5 (Consistency of M-estimators): Let $\hat\theta = \operatorname{argmax}_{\theta\in\Theta} \hat Q(\theta)$ and $\theta_0 = \operatorname{argmax}_{\theta\in\Theta} Q(\theta)$, where $\hat Q(\theta) = (1/n)\sum_{t=1}^n g_t(\theta)$ and $Q(\theta) = E[\hat Q(\theta)] = E[g_1(\theta)]$. If $\theta_0$ is unique, then under the conditions of Theorem 4, $\mathrm{plim}_{n\to\infty} \hat\theta = \theta_0$.

Proof: Since a continuous function on a compact set attains its maximum in that set [see, for example, Bierens (2004, Appendix II)], it follows that $\hat\theta \in \Theta$ and $\theta_0 \in \Theta$. Moreover, by the same result it follows from the continuity of $Q(\theta)$ and the uniqueness of $\theta_0$ that for every $\varepsilon > 0$ for which the set $\{\theta \in \Theta: \|\theta - \theta_0\| \ge \varepsilon\}$ is non-empty,

$$Q(\theta_0) > \sup_{\theta\in\Theta:\,\|\theta-\theta_0\|\ge\varepsilon} Q(\theta). \qquad (15)$$

[Exercise: Why?] Now by the definition of $\theta_0$,

$$0 \le Q(\theta_0) - Q(\hat\theta) = Q(\theta_0) - \hat Q(\theta_0) + \hat Q(\theta_0) - Q(\hat\theta) \le Q(\theta_0) - \hat Q(\theta_0) + \hat Q(\hat\theta) - Q(\hat\theta) \le 2\sup_{\theta\in\Theta} \big|\hat Q(\theta) - Q(\theta)\big|, \qquad (16)$$

where the second inequality uses $\hat Q(\theta_0) \le \hat Q(\hat\theta)$, and it follows from Theorem 4 that the right-hand side of (16) converges in probability to zero. Thus:

$$\mathrm{plim}_{n\to\infty} Q(\hat\theta) = Q(\theta_0). \qquad (17)$$

Moreover, (15) implies that for arbitrary $\varepsilon > 0$ there exists a $\delta > 0$ such that $Q(\theta_0) - Q(\hat\theta) \ge \delta$ if $\|\hat\theta - \theta_0\| \ge \varepsilon$, hence

$$P\big[\|\hat\theta - \theta_0\| > \varepsilon\big] \le P\big[Q(\theta_0) - Q(\hat\theta) \ge \delta\big]. \qquad (18)$$

Combining (17) and (18), the theorem under review follows. Q.E.D.

It is easy to verify that Theorem 5 carries over to the "argmin" case.

References

Bierens, H. J. (2004): Introduction to the Mathematical and Statistical Foundations of Econometrics. Cambridge, U.K.: Cambridge University Press.

Jennrich, R. I. (1969): "Asymptotic Properties of Non-Linear Least Squares Estimators," Annals of Mathematical Statistics 40, 633-643.
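As a closing numerical illustration of Theorem 5 (ours, not part of the original notes): the M-estimator (14) for the Logit model (1) with one regressor, computed by brute-force argmax of $\hat Q(\theta)$ over a grid approximating the compact set $\Theta = [-2,2]^2$. The true parameter values, sample size, and grid are illustrative choices.

```python
import numpy as np

# Simulate a Logit model (illustrative values) and compute the M-estimator (14)
# as a grid argmax of Qhat over Theta = [-2, 2]^2.
rng = np.random.default_rng(1)
alpha0, beta0, n = 0.5, 1.0, 20_000
x = rng.normal(size=n)
p = 1.0 / (1.0 + np.exp(-(alpha0 + beta0 * x)))
y = (rng.uniform(size=n) < p).astype(float)

grid = np.linspace(-2.0, 2.0, 41)   # step 0.1
best_q, theta_hat = -np.inf, None
for a in grid:
    for b in grid:
        eta = a + b * x
        # Qhat(theta) = (1/n) sum_j g(Z_j, theta), second form in eq. (1)
        q = np.mean(y * eta - np.log1p(np.exp(eta)))
        if q > best_q:
            best_q, theta_hat = q, np.array([a, b])

# Consistency (Theorem 5): theta_hat is close to theta_0 = (alpha0, beta0),
# up to sampling error and the grid resolution.
assert np.all(np.abs(theta_hat - np.array([alpha0, beta0])) < 0.2)
```

The grid search matches the argmax-over-a-compact-set framing of Theorem 5 exactly; in applications one would of course use a numerical optimizer instead.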