Elements of Asymptotic Theory. James L. Powell Department of Economics University of California, Berkeley


Objectives of Asymptotic Theory

While exact results are available for, say, the distribution of the classical least squares estimator for the normal linear regression model, and for other leading special combinations of distributions and statistics, distributional results are generally unavailable when statistics are nonlinear in the underlying data and/or when the joint distribution of those data is not parametrically specified. Fortunately, most statistics of interest can be shown to be smooth functions of sample averages of some observable random variables, in which case, under fairly general conditions, their distributions can be approximated by a normal distribution, usually with a covariance matrix that shrinks to zero as the number of random variables in the averages increases. Thus, as the sample size increases, the gap between the true distribution of the statistic and its approximating distribution asymptotes to zero, motivating the term "asymptotic theory."

There are two main steps in the approximation of the distribution of a statistic by a normal distribution. The first step is the demonstration that a sample average converges (in an appropriate sense) to its expectation, and that a standardized version of the statistic has a distribution function that converges to a standard normal. Such results are called limit theorems; the former (approximation of a sample average by its expectation) is called a law of large numbers (abbreviated "LLN"), and the latter (approximation of the distribution of a standardized sample average by a standard normal distribution) is known as a central limit theorem (abbreviated "CLT"). There are a number of LLNs and CLTs in the statistics literature, which impose different combinations of restrictions on the number of moments of the variables being averaged and the dependence between them; some basic versions are given below.
The second step in deriving a normal approximation of the distribution of a smooth function of sample averages is the demonstration that this smooth function is approximately a linear function of sample averages (which is itself a sample average). Useful results for this step, which use the usual calculus approximation of smooth (differentiable) functions by linear functions, are grouped under the heading "Slutsky theorems," after the author of one of the particularly useful results. So, as long as the conditions of the limit theorems and Slutsky theorems apply, smooth functions of sample averages are approximately sample averages

themselves, and thus their distribution functions are approximately normal. The whole apparatus of asymptotic theory is made up of the regularity conditions for applicability of the limit and Slutsky theorems, along with the rules for mathematical manipulation of the approximating objects. In order to discuss approximation of a statistic by a constant, or of its distribution by a normal distribution, we first have to extend the usual definitions of limits of deterministic sequences to accommodate sequences of random variables (yielding the concept of "convergence in probability") or their distribution functions ("convergence in distribution").

Convergence in Distribution and Probability

For most applications of asymptotic theory to statistical problems, the objects of interest are sequences of random vectors (or matrices), say X_N or Y_N, which are typically statistics indexed by the sample size N; the object is to find simple approximations for the distribution functions of X_N or Y_N which can be made arbitrarily close to their true distribution functions for N large enough. Writing the cumulative distribution function of X_N as

    F_N(x) ≡ Pr{X_N ≤ x}

(where the argument x has the same dimension as X_N and the inequality holds only if it is true for each component), we define convergence in distribution (sometimes called "convergence in law") in terms of convergence of the function F_N(x) to some fixed distribution function F(x) at any point x where F is continuous. That is, the random vector X_N converges in distribution to X, denoted X_N →d X as N → ∞, if

    lim_{N→∞} F_N(x) = F(x)

at all points x where F(x) is continuous. A few comments on this definition and its implications are warranted:

1. The limiting random vector X in this definition is really just a placeholder for the distribution function F(x); there need be no real random vector X which has the limiting distribution function. In fact, it is often customary to replace X with the limiting distribution function F(x) in the

definition, i.e., to write X_N →d F(x) as N → ∞; if, say, F(x) is a multivariate standard normal, then we would write X_N →d N(0, I), where the condition "N → ∞" is usually not stated, but implied.

2. If X_N →d F(x), then we can use the c.d.f. F(x) to approximate probabilities for X_N if N is large, i.e.,

    Pr{X_N ∈ A} ≅ ∫_A dF(x),

where the symbol "≅" means that the difference between the left- and right-hand sides goes to zero as N increases.

3. We would like to construct statistics so that their limiting distributions are of convenient (tabulated) form, e.g., a normal or chi-squared distribution. This often involves standardizing the statistics by subtracting their expectations and dividing by their standard deviations (or the matrix equivalent).

4. If lim_{N→∞} F_N(x) is discontinuous, it may not be a distribution function, which is why the qualification "at all points where F(x) is continuous" has to be appended to the definition of convergence in distribution. To make this more concrete, suppose X_N ~ N(0, 1/N), so that

    F_N(x) = Φ(√N x) → { 0 if x < 0;  1/2 if x = 0;  1 if x > 0 }.

Obviously here lim_{N→∞} F_N(x) is not right-continuous, so we would never use it to approximate probabilities for X_N (since it isn't a c.d.f.). Instead, a reasonable approximating function for F_N(x) would be

    F(x) ≡ 1{x ≥ 0},

where 1{A} is the indicator function of the statement A, i.e., it equals one if A is true and is zero otherwise. The function F(x) is the c.d.f. of a degenerate random variable X which equals zero with probability one, which is a pretty good approximation for F_N(x) when N is large for this problem.

The last example, convergence of the c.d.f. F_N(x) to the c.d.f. of a degenerate ("nonrandom") random variable, is a special case of the other key convergence concept for asymptotic theory, convergence in probability. Following Ruud's text, we define convergence in probability of the sequence of random vectors X_N to the (nonrandom) vector x, denoted

    plim X_N = x  or  X_N →p x as N → ∞,

if

    X_N →d x,

i.e.,

    F_N(c) ≡ Pr{X_N ≤ c} → 1{x ≤ c}

whenever all components of x are different from those of c. (The symbol "plim" stands for the term "probability limit.") We can generalize this definition of convergence in probability of X_N to a constant x to permit X_N to converge in probability to a nondegenerate random vector X, by specifying that X_N →p X means that X_N − X →d 0. An equivalent definition of convergence in probability, which is the definition usually seen in textbooks, is that X_N →p x if, for any number ε > 0,

    Pr{||X_N − x|| > ε} → 0

as N → ∞. Thus X_N converges in probability to x if all the probability mass for X_N eventually ends up in an ε-neighborhood of x, no matter how small ε is. Showing the equivalence of these two definitions of probability limit is a routine exercise in probability theory and real analysis.

A stronger form of convergence of X_N to a nonrandom vector x, which can often be used to show convergence in probability, is quadratic mean convergence. We say

    X_N →qm x as N → ∞

if

    E[||X_N − x||²] → 0.

If this condition can be shown, then it follows that X_N converges in probability to x because of Markov's Inequality: if Z is a nonnegative (scalar) random variable, i.e., Pr{Z ≥ 0} = 1, then for any K > 0,

    Pr{Z > K} ≤ E(Z)/K.

The proof of this inequality is worth remembering; it is based upon a decomposition of E(Z) (which may be infinite) as

    E(Z) = ∫₀^∞ z dF_Z(z)
         = ∫₀^K z dF_Z(z) + ∫_K^∞ (z − K) dF_Z(z) + ∫_K^∞ K dF_Z(z).

Since the first two terms in this sum are nonnegative, it follows that

    E(Z) ≥ ∫_K^∞ K dF_Z(z) = K · Pr{Z > K},

from which Markov's inequality immediately follows. A better-known variant of this inequality, Tchebysheff's Inequality, takes Z = (X − E(X))² for a scalar random variable X (with finite second moment) and K = ε² for any ε > 0, from which it follows that

    Pr{|X − E(X)| > ε} = Pr{(X − E(X))² > ε²} ≤ Var(X)/ε²,

since Var(X) = E(Z) for this example. A similar consequence of Markov's inequality is that convergence in quadratic mean implies convergence in probability: if E[||X_N − x||²] → 0,

then

    Pr{||X_N − x|| > ε} ≤ E[||X_N − x||²]/ε² → 0

for any ε > 0.

There is another, stronger version of convergence of a random variable to a limit, known as almost sure convergence (or "convergence almost everywhere" or "convergence with probability one"), which is denoted X_N →a.s. x. Rather than give the detailed definition of this form of convergence, which is best left to a more advanced course, it is worth pointing out that it is indeed stronger than convergence in probability (which is sometimes called "weak convergence" for that reason), but that, for the usual purposes of asymptotic theory, the weaker concept is sufficient.

Limit Theorems

With the concepts of convergence in probability and distribution, we can more precisely state how a sample average is approximately equal to its expectation, or how its standardized version is approximately normal. The first result, on convergence in probability, is the

Weak Law of Large Numbers (WLLN): If X̄_N ≡ (1/N) Σ_{i=1}^N X_i is a sample average of scalar, i.i.d. random variables {X_i}_{i=1}^N which have E(X_i) = μ and Var(X_i) = σ² finite, then X̄_N →p μ = E(X_i).

Proof: By the usual calculations for means and variances of sums of i.i.d. random variables, E(X̄_N) = μ and Var(X̄_N) = σ²/N = E(X̄_N − μ)², which tends to zero as N → ∞. By definition, X̄_N tends to μ in quadratic mean, and thus in probability.

Inspection of the proof of the WLLN suggests a more general result for averages of random variables that are not i.i.d., namely, that X̄_N − E(X̄_N) →p 0 if

    (1/N²) Σ_{i=1}^N Σ_{j=1}^N Cov(X_i, X_j) → 0.
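The quadratic-mean argument in the proof above is easy to check by simulation. The following is a minimal sketch (not from the text; the normal distribution, sample sizes, and seed are our own illustrative choices): the Monte Carlo estimate of E(X̄_N − μ)² should be close to σ²/N, and the tail probability Pr{|X̄_N − μ| > ε} should respect the Chebyshev bound σ²/(Nε²).

```python
import numpy as np

# Sketch of the WLLN via quadratic mean convergence (illustrative numbers):
# for i.i.d. draws with mean mu and variance sigma2, E(Xbar_N - mu)^2 equals
# sigma2/N, and Chebyshev's inequality bounds the tail probability.
rng = np.random.default_rng(2)
mu, sigma2, N, n_reps, eps = 2.0, 4.0, 400, 10_000, 0.5

draws = rng.normal(loc=mu, scale=np.sqrt(sigma2), size=(n_reps, N))
xbar = draws.mean(axis=1)

msq = np.mean((xbar - mu) ** 2)           # Monte Carlo E(Xbar_N - mu)^2, about sigma2/N = 0.01
p_out = np.mean(np.abs(xbar - mu) > eps)  # Monte Carlo Pr{|Xbar_N - mu| > eps}
cheb_bound = sigma2 / (N * eps ** 2)      # Chebyshev bound = 0.04
```

Increasing N shrinks both `msq` and `cheb_bound` at rate 1/N, which is the quadratic-mean convergence the proof uses.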

This general condition can be specialized to yield other LLNs for heterogeneous (non-constant variances) or dependent (nonzero covariances) data. Other laws of large numbers exist which relax conditions on the existence of second moments, and which show almost sure convergence (these are known as "strong laws of large numbers").

The next main result, on approximate normality of sample averages, again applies to i.i.d. data:

Central Limit Theorem (Lindeberg-Lévy): If X̄_N ≡ (1/N) Σ_{i=1}^N X_i is a sample average of scalar, i.i.d. random variables {X_i}_{i=1}^N which have E(X_i) = μ and Var(X_i) = σ² finite, then

    Z_N ≡ (X̄_N − E(X̄_N))/√Var(X̄_N) = √N (X̄_N − μ)/σ →d N(0, 1).

The proof of this result is a bit more involved, requiring manipulation of characteristic functions, and will not be presented here. The result of this CLT is often rewritten as

    √N (X̄_N − μ) →d N(0, σ²),

which points out that X̄_N converges to its mean μ at exactly the rate √N increases, so that the product eventually "balances out" to yield a random variable with a normal distribution. Another way to rewrite the consequence of the CLT is

    X̄_N ~A N(μ, σ²/N),

where the symbol "~A" means "is approximately distributed as" (or "is asymptotically distributed as"). It is very important here to make the distinction between the limiting distribution of the standardized mean Z_N, which is a standard normal, and the asymptotic distribution of the (non-standardized) sample mean X̄_N, which is actually a sequence of approximating normal distributions with variances that shrink to zero at speed 1/N. In short, a limiting distribution cannot depend upon N, which has passed to its (infinite) limit, while an asymptotic distribution can involve the sample size N.

Like the Weak Law of Large Numbers, the Lindeberg-Lévy Central Limit Theorem admits a number of different generalizations (often named after their authors) which relax the assumption of i.i.d. data while

imposing various stronger restrictions on the existence of moments or the tail behavior of the true data distributions.

To this point, the WLLN and CLT apply only to scalar random variables {X_i}; to extend them to random vectors {X_i}, we can use the intriguingly-named Cramér-Wold device, which is really a theorem stating that if the scalar random variable λ′X_N converges in distribution to λ′X for any fixed vector λ having the same dimension as X_N, that is, if

    λ′X_N →d λ′X for any λ,

then the vector X_N converges in distribution to X,

    X_N →d X.

This result immediately yields a multivariate version of the Lindeberg-Lévy CLT: a sample mean X̄_N of i.i.d. random vectors {X_i} with E(X_i) = μ and V(X_i) = Σ has

    √N (X̄_N − μ) →d N(0, Σ).

And, since convergence in probability is a special case of convergence in distribution, the Cramér-Wold device also immediately yields a multivariate version of the WLLN.

Slutsky Theorems

To extend the limit theorems for sample averages to statistics which are functions of sample averages, asymptotic theory uses smoothness properties of those functions (i.e., continuity and differentiability) to approximate them by polynomials, usually constant or linear functions. The simplest of these approximation results is the continuity theorem, which states that probability limits share an important property of ordinary limits, namely, that the probability limit of a continuous function is the value of that function evaluated at the probability limit:

Continuity Theorem: If X_N →p x₀ and the function g(x) is continuous at x = x₀, then g(X_N) →p g(x₀).

Proof: For any ε > 0, continuity of the function g(x) at x₀ implies that there is some δ > 0 such that ||X_N − x₀|| ≤ δ implies ||g(X_N) − g(x₀)|| ≤ ε. Thus

    Pr{||g(X_N) − g(x₀)|| ≤ ε} ≥ Pr{||X_N − x₀|| ≤ δ}.

But the probability on the right-hand side converges to one because X_N →p x₀, so the probability on the left-hand side does, too.

With the continuity theorem, it should be clear that probability limits are more conveniently manipulated than, say, mathematical expectations, since, for example, if β̂_N ≡ Ȳ_N/X̄_N is an estimator of the ratio of the means of Y_i and X_i,

    β₀ ≡ E(Y_i)/E(X_i) ≡ μ_Y/μ_X,

then β̂_N →p β₀ as long as μ_X ≠ 0, even though E(β̂_N) ≠ β₀ in general (and may not even be well-defined).

Another key approximation result is Slutsky's Theorem, which states that the sum (or product) of a random vector that converges in probability and another that converges in distribution itself converges in distribution:

Slutsky's Theorem: If the random vectors X_N and Y_N have the same length, and if X_N →p x₀ and Y_N →d Y, then X_N + Y_N →d x₀ + Y and X_N′Y_N →d x₀′Y.

The typical application of Slutsky's theorem is to estimators that can be represented in terms of a linear approximation involving some statistics that converge either in probability or in distribution. To be more concrete, suppose the (scalar) estimator θ̂_N can be decomposed as

    √N (θ̂_N − θ₀) = X_N′Y_N + Z_N,

where X_N →p x₀, Y_N →d N(0, Ω), and Z_N →p 0.
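The ratio-of-means example above can be illustrated with a short simulation. This is a sketch under our own assumptions (the particular distributions, sample size, and seed are illustrative choices, not from the text): by the continuity theorem, β̂_N = Ȳ_N/X̄_N converges in probability to β₀ = μ_Y/μ_X even though E(β̂_N) need not equal β₀ in any finite sample.

```python
import numpy as np

# Sketch of the continuity theorem for the ratio estimator (illustrative
# numbers): mu_X = 5 is safely away from zero, and y is built so that
# E(Y_i) = 10, giving beta_0 = E(Y_i)/E(X_i) = 2.
rng = np.random.default_rng(4)
N = 100_000

x = rng.normal(loc=5.0, scale=1.0, size=N)   # mu_X = 5
y = 2.0 * x + rng.standard_normal(N)         # E(Y_i) = 10, so beta_0 = 2

beta_hat = y.mean() / x.mean()               # close to beta_0 for large N
```

Each of Ȳ_N and X̄_N obeys the WLLN, and g(a, b) = a/b is continuous wherever b ≠ 0, which is exactly why the condition μ_X ≠ 0 is needed.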

This decomposition is often obtained by a first-order Taylor's series expansion of θ̂_N, and the components X_N, Y_N, and Z_N are sample averages to which a LLN or CLT can be applied; an example is the so-called "delta method" discussed below. Under these conditions, Slutsky's theorem implies that

    √N (θ̂_N − θ₀) →d N(0, x₀′Ω x₀).

Both the continuity theorem and Slutsky's theorem are special cases of the continuous mapping theorem, which says that convergence in distribution, like convergence in probability, interchanges with continuous functions, as long as they are continuous everywhere:

Continuous Mapping Theorem: If X_N →d X and the function g(x) is continuous for all x, then g(X_N) →d g(X).

The continuity requirement for g(x) can be relaxed to hold only on the support of X if needed. The continuous mapping theorem can be used to get approximations for the distributions of test statistics based upon asymptotically-normal estimators; for example, if

    Z_N ≡ √N (θ̂_N − θ₀) →d N(0, I),

then

    T ≡ Z_N′Z_N = N (θ̂_N − θ₀)′(θ̂_N − θ₀) →d χ²_p,

where p ≡ dim{θ̂_N}.

The final approximation result discussed here, which is an application of Slutsky's theorem, is the so-called delta method, which states that continuously-differentiable functions of asymptotically-normal statistics are themselves asymptotically normal:

The Delta Method: If θ̂_N is a random vector which is asymptotically normal, with

    √N (θ̂_N − θ₀) →d N(0, Ω)

for some θ₀, and if g(θ) is continuously differentiable at θ = θ₀ with Jacobian matrix

    G₀ ≡ ∂g(θ₀)/∂θ′

that has full row rank, then

    √N (g(θ̂_N) − g(θ₀)) →d N(0, G₀ Ω G₀′).

Proof: First suppose that g(θ) is scalar-valued. Since θ̂_N converges in probability to θ₀ (by Slutsky's Theorem), and since ∂g(θ)/∂θ′ is continuous at θ₀, the Mean Value Theorem of differential calculus implies that

    g(θ̂_N) = g(θ₀) + [∂g(θ*_N)/∂θ′](θ̂_N − θ₀)

for some value θ*_N between θ̂_N and θ₀ (with arbitrarily high probability). Since θ̂_N →p θ₀, it follows that θ*_N →p θ₀, so

    ∂g(θ*_N)/∂θ′ →p G₀ ≡ ∂g(θ₀)/∂θ′.

Thus, by Slutsky's Theorem,

    √N (g(θ̂_N) − g(θ₀)) = [∂g(θ*_N)/∂θ′] · √N (θ̂_N − θ₀) →d G₀ · N(0, Ω) = N(0, G₀ Ω G₀′).

To extend this argument to the case where g(θ̂_N) is vector-valued, use the same argument to show that any (scalar) linear combination λ′g(θ̂_N) has the asymptotic distribution given by

    √N (λ′g(θ̂_N) − λ′g(θ₀)) →d N(0, λ′G₀ Ω G₀′λ),

from which the result follows by the Cramér-Wold device.
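The scalar case of the delta method can be sketched numerically. In the illustration below (our own choices of numbers and setup, not from the text), θ̂_N = X̄_N with √N(X̄_N − μ) →d N(0, σ²), and g(θ) = θ², so the Jacobian is G₀ = 2μ and the delta method predicts √N(X̄_N² − μ²) to be approximately N(0, 4μ²σ²). To keep the sketch short, X̄_N is drawn directly from its exact N(μ, σ²/N) distribution rather than built from raw data.

```python
import numpy as np

# Monte Carlo sketch of the delta method with g(theta) = theta^2
# (illustrative numbers): predicted limiting variance is
# G_0^2 * sigma2 = (2*mu)^2 * sigma2 = 36.
rng = np.random.default_rng(6)
mu, sigma2, N, n_reps = 3.0, 1.0, 2_000, 20_000

xbar = rng.normal(loc=mu, scale=np.sqrt(sigma2 / N), size=n_reps)
stat = np.sqrt(N) * (xbar ** 2 - mu ** 2)   # sqrt(N)(g(Xbar_N) - g(mu))

var_stat = stat.var()                        # about 4 * mu**2 * sigma2 = 36
```

The mean-value expansion in the proof is visible here: √N(x̄² − μ²) = √N(x̄ − μ)(x̄ + μ), and the factor (x̄ + μ) converges in probability to 2μ = G₀, so Slutsky's theorem delivers the stated limit.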