Analogy Principle. Asymptotic Theory, Part II. James J. Heckman, University of Chicago. Econ 312. This draft: April 5, 2006.


Consider four methods:

1. Maximum Likelihood Estimation (MLE)
2. (Nonlinear) Least Squares
3. Method of Moments
4. Generalized Method of Moments (GMM)

These methods are not always distinct for a particular problem. Consider the classical normal linear regression model: $y = X\beta_0 + \varepsilon$.

Under the standard assumptions: 1. $\varepsilon_i \sim N(0, \sigma^2)$, i.i.d.; 2. $X$ non-stochastic; and 3. $X'X$ full rank, OLS is all four rolled into one. In this lecture we will show how one basic principle, the Analogy Principle, underlies all of these modern econometric methods.

1 Analogy Principle: The Large Sample Version

This is originally due to Karl Pearson or Goldberger. The intuition behind the analogy principle is as follows: suppose we know some properties that are satisfied by the "true parameter" in the population. If we can find a parameter value that causes the sample to mimic the properties of the population, we might use this parameter value to estimate the true parameter.

The main points of this principle are set out in §1.1. The conditions for consistency of the estimator are discussed in §1.2. In §1.3, an abstract example with a finite parameter set illustrates the application of the analog principle and the role of regularity conditions in ensuring consistency of the estimator.

1.1 Outline of the Analog Principle

The ideas constituting the analog principle can be grouped under four steps:

1. The model. Suppose there exists a true model in the population, $M(\theta_0) = 0$, an implicit equation. $\theta_0$ is the true parameter vector, and the model is defined for other values (at least some others) of $\theta$.

2. Criterion function. Based on the model, construct a criterion function of model and data, $G(\theta)$, which has property $\mathcal{P}$ in the population when $\theta = \theta_0$ (the true parameter vector).

3. Analog in sample. In the sample, construct an analog to $G$, $G_N(\theta)$, which has the following properties: (a) $G_N(\theta) \to G(\theta)$ almost surely uniformly in $\theta$, where $N$ is the sample size; and (b) $G_N(\theta)$ mimics in the sample the properties of $G(\theta)$ in the population.¹

4. The estimator. Let $\hat\theta_N$ be the estimator of $\theta_0$ formed from the sample analog: the value that causes the sample $G_N(\cdot)$ to have property $\mathcal{P}$.

¹ Hereafter, we suppress the dependence of the criterion function and the sample analog on the data $(y, x)$ for notational convenience.

1.2 Consistency and Regularity

Definition 1 (Consistency) We say that an estimator $\hat\theta_N$ is a consistent estimator of the parameter $\theta_0$ if $\hat\theta_N \xrightarrow{p} \theta_0$.

In general, to prove that the estimator $\hat\theta_N$ formed using the analog principle is consistent, we need to make the following two assumptions, referred to as the regularity conditions:

1. Identification condition. Generally this states that only $\theta_0$ causes $G$ to possess property $\mathcal{P}$, at least in some neighborhood of $\theta_0$; i.e. $\theta_0$ must be at least locally identified.

2. Uniform convergence of the analog function. We need the sample analog $G_N(\theta)$ to converge nicely. In particular, we need the convergence of the sample analog criterion function to the population criterion function, $G_N(\theta) \to G(\theta)$, to be uniform in $\theta$.

The first condition ensures that $\theta_0$ can be identified. If there is more than one value of $\theta$ that causes $G$ to have property $\mathcal{P}$, then we cannot be sure that only one value of $\theta$ will cause $G_N$ to assume property $\mathcal{P}$ in the sample; in this case we may not be able to determine what $\hat\theta_N$ estimates. The second condition is a technical one ensuring that if $G_N$ has a property such as continuity or differentiability, this property is inherited by the limit $G$. Specific proofs of consistency depend on which property $\mathcal{P}$ we suppose $G$ has when $\theta = \theta_0$.

1.3 An Abstract Example

The intuition underlying the analog principle and the role of the regularity conditions is illustrated in the simple (non-stochastic) example below. Assume an abstract model $M(\theta_0)$ which leads to a criterion function $G(\theta)$ with the property $\mathcal{P}$ := "$G(\theta)$ is maximized at $\theta = \theta_0$" (in the population). Construct a sample analog of the criterion function such that $G_N(\theta) \to G(\theta)$.

Select $\hat\theta_N$ to maximize $G_N(\theta)$ for each $N$. Then, if convergence is well behaved (which we get under the regularity conditions), we have convergence in the sample to the maximizer in the population; i.e. we have $\hat\theta_N \to \theta_0$ as $N \to \infty$.

Now suppose $\Theta = \{1, 2, 3\}$ is a finite set, so that $\hat\theta_N$ assumes only one of three values. Then under regularity condition 1 (q.v. §1.2), $G(\theta)$ is maximized at exactly one of these. Further, by construction we have $G_N(\theta) \to G(\theta)$ for each $\theta \in \Theta = \{1, 2, 3\}$. Note that the rule picks $\hat\theta_N = \arg\max_{\theta \in \Theta} G_N(\theta)$. This estimator must work well (i.e. be consistent for $\theta_0$) for $N$ big enough, because $G_N(\theta) \to G(\theta)$ for each $\theta$. Why? We loosely sketch a proof of the consistency of $\hat\theta_N$ for $\theta_0$ below.

Say $\theta_0 = 2$, so that $G(\theta)$ is maximized at $\theta = 2$: $G(2) > G(1)$ and $G(2) > G(3)$. Now $G_N(\theta) \to G(\theta)$ for each $\theta$. This implies that as $N$ gets large, $|G_N(\theta) - G(\theta)| \to 0$; in other words, as $N$ gets big we get $G_N(\theta)$ arbitrarily close to $G(\theta)$.

Now suppose that even for very large $N$, $G_N(\theta)$ is not maximized at 2 but, say, at 1. Then we have $G_N(1) - G_N(2) \ge 0$. Under regularity condition 2 (q.v. §1.2) this would imply $G(1) - G(2) \ge 0$, a contradiction. Hence the estimator $\hat\theta_N \to \theta_0$. Here property $\mathcal{P}$ is an example of the extremum principle. This principle involves either maximization or minimization of the criterion function. Examples include the maximum likelihood (maximization) and nonlinear least squares (minimization) methods.
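The sketch above can be simulated (a minimal illustration with a hypothetical criterion, not from the text): take $G(\theta) = -E[(z-\theta)^2]$ with $z \sim N(2,1)$, so $G$ is maximized over $\Theta = \{1,2,3\}$ at $\theta_0 = 2$, and let $G_N$ replace the expectation with a sample average:

```python
import numpy as np

def G_N(theta, z):
    """Sample analog of G(theta) = -E[(z - theta)^2]."""
    return -np.mean((z - theta) ** 2)

rng = np.random.default_rng(0)
theta_grid = [1, 2, 3]   # finite parameter set Theta
theta0 = 2               # hypothetical true parameter: G is maximized at 2

for N in (10, 100, 10_000):
    z = rng.normal(loc=theta0, scale=1.0, size=N)
    theta_hat = max(theta_grid, key=lambda t: G_N(t, z))
    print(N, theta_hat)
```

For small $N$ the argmax can land on a wrong point of $\Theta$, but as $N$ grows each $G_N(\theta)$ settles near $G(\theta)$ and the argmax settles on $\theta_0$.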

2 Overview of Convergence Concepts for Non-stochastic Functions

Most estimators involve forming functions of the available data. These functions of data can be viewed as sequences of functions, with the index being the number of data points available. With sufficient data (large samples), these functions of sample data converge nicely (under certain conditions), ensuring that the estimators we form have good properties, such as consistency and asymptotic normality.

In this section, we examine certain basic concepts about convergence of sequences of functions. The key idea here is that of uniform convergence; we shall broadly motivate why this notion of convergence is required. For simplicity, we look at non-stochastic functions and non-stochastic convergence notions. Broadly construed, the same ideas also apply to stochastic functions; analogous convergence concepts are applicable in the stochastic case.

2.1 Pointwise Convergence

Let $f_n : X \to \mathbb{R}$ be a sequence of real-valued functions. For each $x \in X$, form the sequence $\{f_n(x)\}_{n=1}^{\infty}$. Let $E$ be the set of points $x$ for which $\{f_n(x)\}$ converges, and define $f(x) := \lim_{n\to\infty} f_n(x)$ for $x \in E$. We then call $f(x)$ the limit function, and say that $\{f_n(x)\}_{n=1}^{\infty}$ converges pointwise to $f(x)$ on $E$. Nota bene: pointwise convergence is not enough to guarantee that if $f_n$ has a property (continuity, differentiability, etc.), that property is inherited by $f$.

Inheritance requires uniform convergence. Usually, this is a sufficient condition for the properties of $f_n$ to be shared by the limit function. Some of the properties we will be interested in are summarized below:

1. Does $f_n \to f$, with each $f_n$ continuous, imply that $f$ is continuous?
2. Does $f_n \to f$, with $\lim_{x \to x_0} f_n(x) = f_n(x_0)$, imply $\lim_{x\to x_0} f(x) = f(x_0)$? I.e., does $\lim_{x\to x_0} \lim_{n\to\infty} f_n(x) = \lim_{n\to\infty} \lim_{x\to x_0} f_n(x)$?
3. Does $\lim_{n\to\infty} \int f_n(x)\,dx = \int \lim_{n\to\infty} f_n(x)\,dx = \int f(x)\,dx$ (where $f(x)$ is the limit function)?

The answer to all three questions is: no, not in general. The pointwise convergence of $f_n(x)$ to $f(x)$ is generally not sufficient to guarantee that these properties hold. This is demonstrated by the examples which follow.

Example 1 Consider $f_n(x) = x^n$, $x \in [0, 1]$. Here $f_n(x)$ is continuous for every $n$, but the limit function is
$$f(x) = \begin{cases} 0 & \text{if } 0 \le x < 1, \\ 1 & \text{if } x = 1. \end{cases}$$
This is discontinuous at $x = 1$.

Example 2 Consider $f_n(x) = \dfrac{x}{x+n}$, $x \ge 0$. Then the limit function is $f(x) = \lim_{n\to\infty} f_n(x) = 0$ for each fixed $x$; this implies $\lim_{x\to\infty}\lim_{n\to\infty} f_n(x) = 0$. But we have $\lim_{x\to\infty} f_n(x) = 1$ for every fixed $n$; this implies
$$\lim_{n\to\infty}\lim_{x\to\infty} f_n(x) = 1 \ne 0 = \lim_{x\to\infty}\lim_{n\to\infty} f_n(x).$$

Example 3 Consider $f_n(x) = nx(1 - x^2)^n$, $x \in [0, 1]$. Then the limit function is $f(x) = \lim_{n\to\infty} f_n(x) = 0$ for each fixed $x$. This implies that $\int_0^1 f(x)\,dx = 0$. But we also get
$$\lim_{n\to\infty} \int_0^1 f_n(x)\,dx = \lim_{n\to\infty} \frac{n}{2n+2} = \frac{1}{2}.$$²

² See Rudin, Chapter 3 for concepts relating to convergence of sequences.
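Example 3 can be checked numerically (a minimal sketch; the grid size is arbitrary): the trapezoid-rule integral of $f_n(x) = nx(1-x^2)^n$ matches the closed form $n/(2n+2) \to 1/2$, while $f_n(x) \to 0$ at each fixed $x$:

```python
import numpy as np

def f(n, x):
    """f_n(x) = n * x * (1 - x^2)^n on [0, 1]."""
    return n * x * (1.0 - x ** 2) ** n

x = np.linspace(0.0, 1.0, 200_001)
for n in (10, 100, 1000):
    y = f(n, x)
    integral = np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x))  # trapezoid rule
    exact = n / (2 * n + 2)                                  # closed form
    print(n, integral, exact)

# Pointwise, f_n -> 0 at every fixed x, so the integral of the limit is 0:
print(f(1000, 0.5))  # essentially zero
```

The mass of $f_n$ escapes into an ever-narrower spike near $x = 0$, which is exactly why the limit and the integral cannot be interchanged here.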

2.2 Uniform Convergence

A sequence of functions $\{f_n\}$ converges uniformly to $f$ on $E$ if
$$(\forall \epsilon > 0)(\exists N_\epsilon)(\forall n \ge N_\epsilon)(\forall x \in E)\quad |f_n(x) - f(x)| < \epsilon,$$
where $N_\epsilon$ depends on $\epsilon$ but not on $x$; i.e. $f(x) - \epsilon < f_n(x) < f(x) + \epsilon$. Intuitively, for any $\epsilon$, as $n$ gets large enough, $f_n(x)$ lies in a band of uniform thickness $2\epsilon$ around the limit function $f(x)$.

Note that the convergence displayed in the examples in §2.1 did not satisfy the notion of uniform convergence. We have theorems that ensure that the inheritance properties referred to in §2.1 are satisfied when convergence is uniform.³

³ See Rudin, Chapter 7. Theorems 7.12, 7.11, and 7.16 respectively ensure that the three properties listed in §2.1 hold.
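The sup-norm definition makes the failure in Example 1 concrete (a minimal check in plain Python): for $f_n(x) = x^n$ on $[0,1)$, the point $x = 2^{-1/n}$ gives $f_n(x) = 1/2$ for every $n$, so $\sup_x |f_n(x) - 0|$ never falls below $1/2$; by contrast $g_n(x) = x/n$ converges uniformly, with sup exactly $1/n$:

```python
# Example 1 revisited: f_n(x) = x^n on [0, 1) converges pointwise to 0,
# but not uniformly: at x = 2**(-1/n), f_n(x) = 1/2 for every n.
for n in (10, 100, 1000):
    x = 2.0 ** (-1.0 / n)
    print(n, x, x ** n)        # x ** n == 0.5 up to rounding

# Contrast: g_n(x) = x / n on [0, 1] converges uniformly to 0,
# since sup_x |g_n(x)| = 1/n -> 0 independently of x.
for n in (10, 100, 1000):
    sup_g = max(i / 1000 / n for i in range(1001))
    print(n, sup_g)            # equals 1/n
```
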

3 Applications of the Analog Principle

3.1 Moment Principle

In this case, the criterion function is an equation connecting population moments and $\theta_0$:
$$G(\theta) = \text{(function of population moments of } [y, x]\text{)} = 0 \quad \text{at } \theta = \theta_0. \tag{1}$$
Thus the property $\mathcal{P}$ here is not maximization or minimization (as in the extremum principle), but setting some function of the population moments (generally) to zero at $\theta = \theta_0$, so that solving the sample analog of equation (1) we get the estimator: $\hat\theta_N$ = (the same function of the) sample moments of $[y, x]$.

When forming the estimator analog in the sample, we can consider two cases.

3.1.1 Case A

Here we solve for $\theta_0$ in the population equation, obtaining $\theta_0$ = (a function of the) population moments of $[y, x]$. From this equation we construct the simple analog $\hat\theta_N$ = (the same function of the) sample moments of $[y, x]$. Given regularity condition 1 (q.v. §1.2), we get $\hat\theta_N \xrightarrow{p} \theta_0$. Note that we do not need regularity condition 2.

Example 4 (OLS)

1. The model. $y = x'\beta_0 + \varepsilon$, with $E(\varepsilon) = 0$, $\mathrm{Var}(\varepsilon) = \sigma^2$, $E(x\varepsilon) = 0$, $E(xx') = \Sigma_{xx}$ positive definite, and $E(xy) = \Sigma_{xy}$.

2. Moment principle (criterion function). In the population, $E(xy) = E(xx'\beta_0) + E(x\varepsilon)$, i.e. $\Sigma_{xy} = \Sigma_{xx}\beta_0 + 0$, so $\beta_0 = \Sigma_{xx}^{-1}\Sigma_{xy}$.

3. Analog in sample.
$$\hat\beta_N = \left(\frac{1}{N}\sum_{i=1}^N x_i x_i'\right)^{-1} \left(\frac{1}{N}\sum_{i=1}^N x_i y_i\right).$$
Now the r.h.s. converges in probability to $\Sigma_{xx}^{-1}\Sigma_{xy}$. Thus $\hat\beta_N \xrightarrow{p} \beta_0$.
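The Case A analog can be sketched in code (a minimal simulation; the true $\beta_0$ and design below are hypothetical): form the sample moment matrices $\frac{1}{N}\sum x_i x_i'$ and $\frac{1}{N}\sum x_i y_i$ and solve, mimicking $\beta_0 = \Sigma_{xx}^{-1}\Sigma_{xy}$:

```python
import numpy as np

rng = np.random.default_rng(42)
N = 50_000
beta0 = np.array([1.0, -2.0, 0.5])   # hypothetical true parameter

X = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])
eps = rng.normal(scale=1.0, size=N)
y = X @ beta0 + eps

# Sample analogs of the population moments E[xx'] and E[xy]:
Sxx = X.T @ X / N
Sxy = X.T @ y / N
beta_hat = np.linalg.solve(Sxx, Sxy)   # analog of Sigma_xx^{-1} Sigma_xy

print(beta_hat)   # close to beta0 for large N
```
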

3.1.2 Case B

Here we form the criterion function, form the sample analog of the criterion function, and then solve for the estimator. We require condition 1 (q.v. §1.2) and uniform convergence (condition 2) to get consistency of the estimator, i.e. $\hat\theta_N \xrightarrow{p} \theta_0$.

Example 5 (OLS, another analogy)

1. The model. As in §3.1.1.
2. Moment principle (criterion function). In the population: $G = E(x\varepsilon) = 0$, where $\varepsilon = y - x'\beta_0$.

3. Analog in sample. Here we define a sample analog of $\varepsilon$: $\hat\varepsilon_i = y_i - x_i'\hat\beta$. This mimics $\varepsilon$ in the criterion function, so we can form the sample analog of the criterion function by substituting $\hat\varepsilon$ for $\varepsilon$ in the expression for $G$ above:
$$G_N = \frac{1}{N}\sum_{i=1}^N x_i\,(y_i - x_i'\hat\beta) = 0.$$
4. The estimator. Here we pick $\hat\beta_N$ by solving $G_N = 0$ to arrive at the relation
$$\hat\beta_N = \left(\sum_{i=1}^N x_i x_i'\right)^{-1} \sum_{i=1}^N x_i y_i.$$

Remarks. Recall that in Case A (q.v. §3.1.1) we did not form $G_N$, but first solved for $\theta_0$ and then formed $\hat\theta_N$ directly, skipping step 3 of Case B. Observe that when $x_i = 1$, $\hat\beta_N$ is the sample mean: $\hat\beta_N = \frac{1}{N}\sum_{i=1}^N y_i$. OLS can also be interpreted as an application of the extremum principle (s.v. §3.2).

3.2 Extremum Principle

We saw in §1.3 that under the extremum principle, property $\mathcal{P}$ provides that $G(\theta)$ achieves a maximum or minimum at $\theta = \theta_0$ in the population. This principle underlies the nonlinear least squares (NLS) and maximum likelihood estimation (MLE) methods. As their names suggest, MLE chooses $\theta$ to attain a maximum, while NLS chooses $\theta$ to attain a minimum.

The OLS estimator can be seen as a special case of the NLS estimator, and can be viewed as an extremum estimator. In this section we analyze OLS and NLS as extremum estimators. MLE is examined in more detail in §4.

Example 1 (OLS as an extremum principle)

1. The model. We assume the true model
$$y = x'\beta_0 + \varepsilon = x'\beta + x'(\beta_0 - \beta) + \varepsilon,$$
so that
$$(y - x'\beta)^2 = (\beta_0 - \beta)'\,x x'\,(\beta_0 - \beta) + 2\,\varepsilon\, x'(\beta_0 - \beta) + \varepsilon^2.$$

2. Criterion function. $G(\beta) = E\big[(y - x'\beta)^2\big]$. From the model assumptions we have $E(x\varepsilon) = 0$. Thus
$$G(\beta) = (\beta_0 - \beta)'\,\Sigma_{xx}\,(\beta_0 - \beta) + \sigma^2.$$

So $G$ is minimized (with respect to $\beta$) at $\beta = \beta_0$. Here, then, property $\mathcal{P}$ is an example of the extremum principle: $G$ is minimized when $\beta = \beta_0$ (the true parameter vector).

3. Analog in sample. A sample analog of the criterion function is constructed as
$$G_N(\beta) = \frac{1}{N}\sum_{i=1}^N (y_i - x_i'\beta)^2.$$

We can show that this analog satisfies the key requirement of the analog principle:
$$\mathrm{plim}\; G_N(\beta) = \mathrm{plim}\; \frac{1}{N}\sum_{i=1}^N (y_i - x_i'\beta)^2 = (\beta_0 - \beta)'\,\Sigma_{xx}\,(\beta_0 - \beta) + \sigma^2 = G(\beta)$$
(assuming conditions for application of some LLN are satisfied; q.v. Part I of these lectures).

4. The estimator. We pick $\hat\beta_N$ to minimize $G_N$. Under standard regularity conditions, we can show that we get a contradiction unless $\hat\beta_N \xrightarrow{p} \beta_0$.

Example 2 (NLS as an extremum principle)

1. The model. We assume that the following model holds in the population:
$$y = g(x; \theta_0) + \varepsilon \quad \text{(non-linear model)} = g(x;\theta) + \big[g(x;\theta_0) - g(x;\theta)\big] + \varepsilon.$$
Assume $(y_i, x_i)$ i.i.d. Then the model implies that $E(y \mid x) = g(x; \theta_0)$.

2. Criterion function. Choose the criterion function as
$$G(\theta) = E\big[(y - g(x;\theta))^2\big] = E\big[(g(x;\theta_0) - g(x;\theta))^2\big] + \sigma^2.$$
Then $G$ is minimized at $\theta = \theta_0$ (a true parameter value). If $\theta = \theta_0$ is the only such value, the model is identified with respect to the criterion, and regularity condition 1 (q.v. §1.2) is satisfied.

3. Analog in sample.
$$G_N(\theta) := \frac{1}{N}\sum_{i=1}^N \big(y_i - g(x_i;\theta)\big)^2.$$
As in the OLS case, we can show that $\mathrm{plim}\; G_N = G$.

4. The estimator. We construct the NLS estimator as $\hat\theta_N = \arg\min_\theta G_N(\theta)$. We thus choose $\hat\theta_N$ to minimize $G_N(\theta)$. Reductio ad absurdum, using $G_N(\theta_0) \to G(\theta_0)$ and $G_N(\hat\theta_N) \le G_N(\theta_0)$ together with uniform convergence of $G_N$ to $G$, verifies that $\hat\theta_N \xrightarrow{p} \theta_0$.
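The NLS extremum construction can be sketched with a hypothetical regression function $g(x;\theta) = e^{\theta x}$ and $\theta_0 = 0.7$ (both choices are illustrative, not from the text); a crude grid search plays the role of the argmin:

```python
import numpy as np

rng = np.random.default_rng(7)
theta0 = 0.7                      # hypothetical true parameter

N = 20_000
x = rng.uniform(0.0, 2.0, size=N)
y = np.exp(theta0 * x) + rng.normal(scale=0.3, size=N)

def G_N(theta):
    """Sample analog: mean squared residual at theta."""
    return np.mean((y - np.exp(theta * x)) ** 2)

grid = np.linspace(0.0, 1.5, 1501)   # crude extremum search over Theta
theta_hat = grid[np.argmin([G_N(t) for t in grid])]
print(theta_hat)   # close to theta0
```

In practice a derivative-based optimizer would replace the grid, but the grid makes the extremum-principle logic explicit: we pick the $\theta$ that minimizes the sample criterion.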

Remark. The NLS estimator could also be derived as a moment estimator, just as in the OLS example (q.v. §3.1).

1. The model. Same as the non-linear model above. We have $\varepsilon = y - g(x; \theta_0)$.
2. Criterion function. $G = E\big[\varepsilon \,\partial g(x;\theta_0)/\partial\theta\big] = 0$. Note that this is only one implication of the model. We may now write
$$G(\theta) = E\Big[\big(y - g(x;\theta)\big)\,\frac{\partial g(x;\theta)}{\partial\theta}\Big] = 0 \quad \text{at } \theta = \theta_0.$$
3. Analog in sample.
$$G_N(\theta) := \frac{1}{N}\sum_{i=1}^N \big(y_i - g(x_i;\theta)\big)\,\frac{\partial g(x_i;\theta)}{\partial\theta}.$$

4. The estimator. Find $\hat\theta_N$ which sets $G_N = 0$ (or as close to zero as possible).

4 Maximum Likelihood

The maximum likelihood estimation (MLE) method is an example of the extremum principle. In this section, we look at the ideas underlying MLE and examine the regularity conditions and convergence notions in more detail for this estimator.

4.1 The Model

Suppose that the joint density of the data is $f(y, x; \theta_0) = f(y \mid x; \theta_0)\, h(x)$. Assume that $x$ is exogenous, i.e. the density of $x$ is uninformative about $\theta_0$. Also assume random sampling. We arrive at the likelihood function
$$\mathcal{L} = \prod_{i=1}^N f(y_i, x_i; \theta_0).$$
Taking $(y, x)$ as data, $\mathcal{L}$ becomes a function of $\theta$. The log likelihood function is
$$\ln \mathcal{L} = \sum_{i=1}^N \ln f(y_i, x_i; \theta) = \sum_{i=1}^N \ln f(y_i \mid x_i; \theta) + \sum_{i=1}^N \ln h(x_i).$$

4.2 Criterion Function

In the population, define the criterion function as
$$G(\theta) = E_{\theta_0}\big[\ln f(y, x; \theta)\big] = \int \ln f(y, x; \theta)\, f(y, x; \theta_0)\, dy\, dx.$$
(We assume this integral exists.) We pick the $\theta$ that maximizes $\mathcal{L}$. Note that this is an extremum-principle application of the analogy principle.

Claim. The criterion function $G(\theta)$ is maximized at $\theta = \theta_0$.

Proof.
$$E_{\theta_0}\left[\frac{f(y, x; \theta)}{f(y, x; \theta_0)}\right] = 1 \quad \text{because} \quad \int \frac{f(y, x; \theta)}{f(y, x; \theta_0)}\, f(y, x; \theta_0)\, dy\, dx = \int f(y, x; \theta)\, dy\, dx = 1.$$

Applying Jensen's inequality, concavity of the $\ln$ function implies $E[\ln(\cdot)] \le \ln E[\cdot]$, so
$$E_{\theta_0}\left[\ln \frac{f(y, x; \theta)}{f(y, x; \theta_0)}\right] \le \ln E_{\theta_0}\left[\frac{f(y, x; \theta)}{f(y, x; \theta_0)}\right] = \ln 1 = 0,$$
i.e.
$$E_{\theta_0}\big[\ln f(y, x; \theta)\big] \le E_{\theta_0}\big[\ln f(y, x; \theta_0)\big].$$
We get global identification in the population if the inequality is strict for all $\theta \ne \theta_0$.
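The claim can be illustrated by Monte Carlo (a hypothetical normal-location example, not from the text): the sample average of $\ln f(z;\mu)$, an estimate of $E_{\theta_0}[\ln f(z;\mu)]$ for $z \sim N(\mu_0, 1)$, peaks near the true mean $\mu_0$:

```python
import numpy as np

rng = np.random.default_rng(1)
mu0 = 1.5                                   # hypothetical true mean
z = rng.normal(loc=mu0, scale=1.0, size=100_000)

def avg_loglik(mu):
    """Sample estimate of E_{mu0}[ln f(z; mu)] for the N(mu, 1) density."""
    return np.mean(-0.5 * np.log(2 * np.pi) - 0.5 * (z - mu) ** 2)

grid = np.linspace(0.0, 3.0, 301)
best = grid[np.argmax([avg_loglik(m) for m in grid])]
print(best)   # near mu0
```
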

4.3 Analog in Sample

Construct a sample analog of the criterion function as
$$G_N(\theta) := \frac{1}{N}\sum_{i=1}^N \ln f(y_i, x_i; \theta).$$

4.4 The Estimator and its Properties

We form the estimator as $\hat\theta_N := \arg\max_\theta G_N(\theta)$ (we assume this exists).

Local form of the principle. In the local form, we use first- and second-order conditions (FOC, SOC) to arrive at the argmax. Recall that we have the criterion function
$$G(\theta) = \int \ln f(y, x; \theta)\, f(y, x; \theta_0)\, dy\, dx = E_{\theta_0}\big[\ln f(y, x; \theta)\big].$$
Maximization of the criterion function yields first- and second-order conditions:

FOC: $\displaystyle \int \frac{\partial \ln f(y, x; \theta)}{\partial \theta}\, f(y, x; \theta_0)\, dy\, dx = 0$;

SOC: $\displaystyle \int \frac{\partial^2 \ln f(y, x; \theta)}{\partial \theta\, \partial \theta'}\, f(y, x; \theta_0)\, dy\, dx$ negative definite.

Accordingly, we require for the sample analog:
$$\frac{1}{N}\sum_{i=1}^N \frac{\partial \ln f(y_i, x_i; \theta)}{\partial \theta} = 0, \qquad \frac{1}{N}\sum_{i=1}^N \frac{\partial^2 \ln f(y_i, x_i; \theta)}{\partial \theta\, \partial \theta'} \text{ negative definite.}$$
For local identification, we require that the second-order condition be satisfied locally around the point solving the FOC. For global identification, we need the SOC to hold for every $\theta$.

Either way (e.g. directly by a grid search or using the FOC/SOC), we have the same basic idea. For each $N$, we pick $\hat\theta_N$ such that $G_N(\hat\theta_N) \ge G_N(\theta)$ for all $\theta$. Now if $G_N(\hat\theta_N) \to G(\mathrm{plim}\,\hat\theta_N)$ (uniform convergence), we get a contradiction with $G(\theta_0)$ being assumed the maximum value unless $\mathrm{plim}\,\hat\theta_N = \theta_0$. To be more precise, we must check whether $G_N \to G$ uniformly (almost surely). It remains to cover certain concepts and definitions for random functions.
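A concrete instance of the local form, using a hypothetical exponential model $f(z;\lambda) = \lambda e^{-\lambda z}$: the sample score $\frac{1}{N}\sum \partial \ln f/\partial\lambda = 1/\lambda - \bar z$ vanishes at $\hat\lambda = 1/\bar z$, and $\partial^2 \ln f/\partial\lambda^2 = -1/\lambda^2$ is negative everywhere, so the FOC solution is the global maximizer:

```python
import numpy as np

rng = np.random.default_rng(3)
lam0 = 2.0                                  # hypothetical true rate
z = rng.exponential(scale=1.0 / lam0, size=50_000)

def score(lam):
    """Sample FOC for f(z; lam) = lam * exp(-lam * z):
    (1/N) sum of d ln f / d lam = 1/lam - mean(z)."""
    return 1.0 / lam - np.mean(z)

lam_hat = 1.0 / np.mean(z)                  # solves the sample FOC exactly
print(lam_hat, score(lam_hat))

# SOC: d^2 ln f / d lam^2 = -1/lam^2 < 0 for all lam > 0, so the FOC
# solution is the global maximizer of the sample log likelihood.
```
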

5 Some Concepts and Definitions for Random (Stochastic) Functions

In §5.1 we define random functions and examine some fundamental properties of such functions. In §5.2 we define convergence concepts for sequences of random functions.

5.1 Random Functions and Some Properties

Definition 2 (Random Function) Let $(\Omega, \mathcal{A}, P)$ be a probability space and let $\Theta \subseteq \mathbb{R}^k$. A real function $g(\theta, \omega)$ on $\Theta \times \Omega$ is called a random function on $\Theta$ if
$$(\forall c \in \mathbb{R}^1)(\forall \theta \in \Theta)\quad \{\omega : g(\theta, \omega) \le c\} \in \mathcal{A}.$$
We can then assign a probability to the event $\{g(\theta, \omega) \le c\}$.

Proposition. If $g(\theta, \omega)$ is a continuous real-valued function on $\Theta \times \Omega$ where $\Theta$ is compact, then $\bar g(\omega) = \sup_{\theta \in \Theta} g(\theta, \omega)$ and $\underline g(\omega) = \inf_{\theta \in \Theta} g(\theta, \omega)$ are continuous functions of $\omega$.

Proof. See Stokey, Lucas, and Prescott, Chapter 3, for definitions of sup and inf and for the proof of this proposition (q.v. Theorem 3.6).

Proposition. If for almost all values of $\omega$, $g(\theta, \omega)$ is continuous with respect to $\theta$ at the point $\theta_0$, and if for all $\theta$ in a neighborhood of $\theta_0$ we have $|g(\theta, \omega)| \le g_1(\omega)$ with $E[g_1(\omega)] < \infty$, then
$$\lim_{\theta \to \theta_0} \int g(\theta, \omega)\, dP(\omega) = \int g(\theta_0, \omega)\, dP(\omega),$$
i.e. $\lim_{\theta \to \theta_0} E[g(\theta, \omega)] = E[g(\theta_0, \omega)]$.

Proof. This is a version of the dominated convergence theorem. See, inter alios, Royden, Chapter 4.

Proposition. If for almost all values of $\omega$ and for a fixed value of $\theta$, (a) $\dfrac{\partial g(\theta, \omega)}{\partial \theta}$ exists (in a neighborhood of $\theta$), and (b) $\left|\dfrac{g(\theta + \delta, \omega) - g(\theta, \omega)}{\delta}\right| \le g_2(\omega)$ with $E[g_2(\omega)] < \infty$, for $0 < |\delta| < \delta_0$, $\delta_0$ independent of $\omega$, then
$$\frac{\partial}{\partial \theta} \int g(\theta, \omega)\, dP(\omega) = \int \frac{\partial g(\theta, \omega)}{\partial \theta}\, dP(\omega),$$
i.e. $\dfrac{\partial}{\partial \theta} E[g(\theta, \omega)] = E\left[\dfrac{\partial g(\theta, \omega)}{\partial \theta}\right]$.

5.2 Convergence Concepts for Random Functions

In Part I (asymptotic theory) we defined convergence concepts for random variables. Here we define analogous concepts for random functions.

Definition 3 (Almost Sure Convergence) Let $g_N(\theta, \omega)$ and $g(\theta, \omega)$ be random functions on $\Theta$ for each $N$. Then $g_N(\theta, \omega)$ converges almost surely to $g(\theta, \omega)$ as $N \to \infty$ if, for every fixed $\theta$,
$$P\big\{\omega : \lim_{N \to \infty} g_N(\theta, \omega) = g(\theta, \omega)\big\} = 1;$$
i.e. if for every fixed $\theta$ the set $N_\theta$ of $\omega$ such that $g_N(\theta, \omega) \not\to g(\theta, \omega)$ has probability zero.

The union $N = \bigcup_{\theta \in \Theta} N_\theta$ may have a non-negligible probability even though any one set $N_\theta$ has negligible probability. We avoid this by the following definition.

Definition 4 (Almost Sure Uniform Convergence) $g_N(\theta, \omega) \to g(\theta, \omega)$ almost surely uniformly in $\theta$ if $\sup_{\theta \in \Theta} |g_N(\theta, \omega) - g(\theta, \omega)| \to 0$ almost surely as $N \to \infty$; i.e. if
$$P\big\{\omega : \lim_{N \to \infty} \sup_{\theta \in \Theta} |g_N(\theta, \omega) - g(\theta, \omega)| = 0\big\} = 1.$$
In this case, the negligible set is not indexed by $\theta$.

Definition 5 (Convergence in Probability) Let $g_N(\theta, \omega)$ and $g(\theta, \omega)$ be random functions on $\Theta$. Then $g_N(\theta, \omega) \to g(\theta, \omega)$ in probability uniformly in $\theta$ on $\Theta$ if, for every $\epsilon > 0$,
$$\lim_{N \to \infty} P\big\{\omega : \sup_{\theta \in \Theta} |g_N(\theta, \omega) - g(\theta, \omega)| > \epsilon\big\} = 0.$$

Theorem 1 (Strong Uniform Law of Large Numbers) Let $\{z_i\}$ be a sequence of i.i.d. random $k \times 1$ vectors. Let $g(z, \theta)$ be a continuous real function on $Z \times \Theta$, where $\Theta$ is compact (closed and bounded, so every open cover has a finite subcover). Define $g^*(z) = \sup_{\theta \in \Theta} |g(z, \theta)|$. Let $F(z)$ be the distribution of $z$. Assume $E[g^*(z)] < \infty$. Then
$$\frac{1}{N}\sum_{i=1}^N g(z_i, \theta) \xrightarrow{\text{a.s. unif.}} \int g(z, \theta)\, dF(z).$$

This is a type of LLN, and could be modified for the non-i.i.d. case. In that case, each $z_i$ has its own distribution $F_i$, and we would require that
(a) $\frac{1}{N}\sum_{i=1}^N F_i \to F$, and
(b) $\sup_N \frac{1}{N}\sum_{i=1}^N E\big[g^*(z_i)^{1+\delta}\big] < \infty$ for some $\delta > 0$.
Then
$$\frac{1}{N}\sum_{i=1}^N g(z_i, \theta) \xrightarrow{\text{a.s. unif.}} \int g(z, \theta)\, dF(z).$$
Note that in either case we need a bound on $E[g^*(z)]$.
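Theorem 1 can be visualized in a small simulation (hypothetical choices, not from the text: $g(z, \theta) = (z - \theta)^2$, $z \sim N(0,1)$, compact $\Theta = [-1, 1]$, so $E\,g(z,\theta) = 1 + \theta^2$): the sup over $\theta$ of the deviation between the sample average and the population mean shrinks as $N$ grows:

```python
import numpy as np

rng = np.random.default_rng(5)
theta = np.linspace(-1.0, 1.0, 201)   # grid over the compact set Theta

def sup_dev(z):
    """sup over Theta of |(1/N) sum_i g(z_i, theta) - E g(z, theta)|,
    with g(z, theta) = (z - theta)^2 and E g = 1 + theta^2 for z ~ N(0, 1)."""
    sample = np.array([np.mean((z - t) ** 2) for t in theta])
    return np.max(np.abs(sample - (1.0 + theta ** 2)))

for N in (100, 10_000, 200_000):
    z = rng.normal(size=N)
    print(N, sup_dev(z))   # shrinks roughly like 1/sqrt(N)
```
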

References

1. Amemiya, Advanced Econometrics, 1985, Chapter 3.
2. Greenberg and Webster, Advanced Econometrics, 1991, Chapter 1.
3. Newey and McFadden, "Large Sample Estimation and Hypothesis Testing," in Handbook of Econometrics, Volume IV, Chapter 36, 1994.
4. Royden, Real Analysis, 1968, Chapter 4.
5. Rudin, Principles of Mathematical Analysis, 1976, Chapters 3 and 7.
6. Stokey, Lucas, and Prescott, Recursive Methods in Economic Dynamics, 1989, Chapter 3.